How to Win a Data Science Competition: Learn from Top Kagglers

Start Date: 07/05/2020

Course Type: Common Course

Course Link:

About Course

If you want to break into competitive data science, then this course is for you! Participating in predictive modelling competitions can help you gain practical experience and improve and harness your data modelling skills in domains such as credit, insurance, marketing, natural language processing, sales forecasting and computer vision, to name a few. At the same time, you get to do it in a competitive context against thousands of participants, each trying to build the most predictive algorithm. Pushing each other to the limit can result in better performance and smaller prediction errors, and being able to achieve high ranks consistently can help you accelerate your career in data science. In this course, you will learn to analyse and competitively solve such predictive modelling tasks. When you finish this class, you will:
- Understand how to solve predictive modelling competitions efficiently and learn which of the skills obtained are applicable to real-world tasks.
- Learn how to preprocess the data and generate new features from various sources such as text and images.
- Be taught advanced feature engineering techniques, such as generating mean encodings, using aggregated statistical measures or finding nearest neighbors, as a means to improve your predictions.
- Be able to form reliable cross-validation methodologies that help you benchmark your solutions and avoid overfitting or underfitting when tested with unobserved (test) data.
- Gain experience of analysing and interpreting the data. You will become aware of inconsistencies, high noise levels, errors and other data-related issues such as leakages, and you will learn how to overcome them.
- Acquire knowledge of different algorithms and learn how to efficiently tune their hyperparameters and achieve top performance.
- Master the art of combining different machine learning models and learn how to ensemble them.
- Get exposed to past (winning) solutions and code and learn how to read them.
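The course lists mean encodings alongside cross-validation and data leakage. As an illustration only (not the course's own code), here is a minimal plain-Python sketch of out-of-fold mean (target) encoding: each training row is encoded with its category's target mean computed on the other folds, so the row's own label cannot leak into its feature. The function names `category_means` and `oof_mean_encode`, the modulo-based fold assignment and the global-mean fallback for unseen categories are all assumptions made for this example.

```python
from collections import defaultdict


def category_means(categories, targets):
    """Return (per-category target mean, global target mean) over the given rows."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    global_mean = sum(targets) / len(targets)
    means = {cat: sums[cat] / counts[cat] for cat in sums}
    return means, global_mean


def oof_mean_encode(categories, targets, n_folds=5):
    """Out-of-fold mean encoding for training rows (hypothetical helper).

    Each row is encoded with the target mean of its category computed on the
    other folds only, so a row's own target never leaks into its feature.
    Rows are assigned to folds by index modulo n_folds, an illustrative choice
    (shuffled or stratified folds are more common in practice).
    """
    if n_folds < 2 or len(categories) < n_folds:
        raise ValueError("need n_folds >= 2 and at least n_folds rows")
    n = len(categories)
    encoded = [0.0] * n
    for fold in range(n_folds):
        train_idx = [i for i in range(n) if i % n_folds != fold]
        valid_idx = [i for i in range(n) if i % n_folds == fold]
        means, global_mean = category_means(
            [categories[i] for i in train_idx],
            [targets[i] for i in train_idx],
        )
        for i in valid_idx:
            # Categories unseen in the other folds fall back to those folds' global mean.
            encoded[i] = means.get(categories[i], global_mean)
    return encoded


if __name__ == "__main__":
    cats = ["a", "a", "b", "b", "a", "a", "b", "b"]
    ys = [1, 1, 0, 0, 1, 1, 0, 1]

    # Training feature: out-of-fold means, so no row sees its own target.
    print(oof_mean_encode(cats, ys, n_folds=2))
    # [1.0, 1.0, 0.5, 0.0, 1.0, 1.0, 0.5, 0.0]

    # Test feature: means over the full training set; unseen "z" gets the global mean.
    means, global_mean = category_means(cats, ys)
    print([means.get(c, global_mean) for c in ["b", "z"]])
    # [0.25, 0.625]
```

Note how the last training row (category "b", target 1) is encoded as 0.0: its own label is excluded, which is exactly the leakage the out-of-fold scheme prevents. In a pandas workflow the per-fold means would typically come from `df.groupby(...)[target].mean()` on the training folds; smoothing category means toward the global mean is a common refinement not shown here.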
Disclaimer: This is not a machine learning course in the general sense. This course will teach you how to get high-rank solutions against thousands of competitors, with a focus on the practical usage of machine learning methods rather than the theoretical underpinnings behind them.

Prerequisites:
- Python: work with DataFrames in pandas, plot figures in matplotlib, import and train models from scikit-learn, XGBoost and LightGBM.
- Machine Learning: basic understanding of linear models, k-NN, random forest, gradient boosting and neural networks.

Course Syllabus

This week we will introduce you to competitive data science. You will learn about competitions' mechanics, the difference between competitions and real-life data science, and the hardware and software that people usually use in competitions. We will also briefly recap the major ML models frequently used in competitions.


Course Introduction

Want to become a data ninja? This course is for those who are interested in becoming a data ninja, or who would like to make sure that they understand the basics of data science. It provides an overview of how data scientists work, how data are processed and stored, and the role of data in competitive games. We will look at the top data science projects from 2009 through 2014 and help you identify opportunities to apply these concepts through practice. Data science is a fluid and exciting area, and many data science projects and initiatives are in progress. Data science projects require strong interest, and the skills needed to carry one out may include data analysis, machine learning, data visualization, data-driven decision-making and more. This course is for those of you who want to understand the tools that data scientists use and would like to know how to identify opportunities to apply these tools to solve a problem. The course is divided into four modules:
- Week 1: You will be challenged to complete a simple task to earn a small token of data science expertise.
- Week 2: You will become familiar with a particular data science project and will use these resources to fully apply the data science knowledge you gained in Week 1.
- Week 3: You will continue to build your application, using the skills you learned in Week 1 to process and visualize data, apply machine learning, and apply data-driven decision…

Course Tag

Data Analysis, Feature Extraction, Feature Engineering, XGBoost

Related Wiki Topic

Article | Example
How People Learn | How People Learn is the title of an educational psychology book edited by John D. Bransford, Ann L. Brown, and Rodney R. Cocking and published by the United States National Academy of Sciences' National Academies Press. The committee on "How People Learn" also wrote "How Students Learn: History, Mathematics, and Science in the Classroom" as a follow-up.
Data science | Data science, also known as data-driven science, is an interdisciplinary field about scientific methods, processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured, similar to Knowledge Discovery in Databases (KDD).
Learning How to Learn | Written in response to enquiries about the Sufi tradition, "Learning How to Learn: Psychology and Spirituality in the Sufi Way" presents traditional teaching stories and anecdotes and articles from newspapers to illustrate prerequisites to Sufi learning. One such prerequisite is that the learner should organise their basic human needs so as to be able to give adequate attention to their studies. The second section of the book is dedicated entirely to Shah's theory on the human need to give and receive attention.
Learn to Talk | In 1990, RecRec Music re-issued "Learn to Talk" together with Skeleton Crew's next album "The Country of Blinds" on a single compilation CD, "Learn to Talk / Country of Blinds", omitting "Los Colitos" and "Life At The Top" from "Learn to Talk", and "Money Crack" from "The Country of Blinds".
Data science | Data science is a "concept to unify statistics, data analysis and their related methods" in order to "understand and analyze actual phenomena" with data.
Learning How to Learn | Shortly before he died, Shah stated that his books form a complete course that could fulfil the function he had fulfilled while alive. As such, "Learning How to Learn: Psychology and Spirituality in the Sufi Way" can be read as part of a whole course of study.
Data science | In 2013, the IEEE Task Force on Data Science and Advanced Analytics was launched, and the first international conference, the IEEE International Conference on Data Science and Advanced Analytics, was launched in 2014. In 2014, the American Statistical Association section on Statistical Learning and Data Mining renamed its journal to "Statistical Analysis and Data Mining: The ASA Data Science Journal" and in 2016 changed its section name to "Statistical Learning and Data Science". In 2015, the International Journal on Data Science and Analytics was launched by Springer to publish original work on data science and big data analytics. In 2013, the first "European Conference on Data Analysis (ECDA)" was organised in Luxembourg, and the European Association for Data Science (EuADS) was established in August 2015. In September 2015, the Gesellschaft für Klassifikation (GfKl) added "Data Science Society" to its name at the third ECDA conference at the University of Essex, Colchester, UK.
Learning How to Learn | Richard Smoley and Jay Kinney, writing in "Hidden Wisdom: A Guide to the Western Inner Traditions" (2006), described "Learning How to Learn" as one of Shah's best works. They noted that the book provided a solid orientation to Shah's "psychological" approach to Sufi work and added that Shah, at his best, provides "insights that inoculate students against much of the nonsense in the spiritual marketplace."
Learning How to Learn | Learning How to Learn: Psychology and Spirituality in the Sufi Way is a book by the writer Idries Shah that was first published by Octagon Press in 1978. Later editions by Harper & Row (1981) and Penguin Books (1985, 1993, 1996) include an introduction by Nobel Prize winner Doris Lessing.
Data science | In April 2002, the International Council for Science: Committee on Data for Science and Technology (CODATA) started the "Data Science Journal", a publication focused on issues such as the description of data systems, their publication on the internet, applications and legal issues. Shortly thereafter, in January 2003, Columbia University began publishing "The Journal of Data Science", which provided a platform for all data workers to present their views and exchange ideas. The journal was largely devoted to the application of statistical methods and quantitative research. In 2005, the National Science Board published "Long-lived Digital Data Collections: Enabling Research and Education in the 21st Century", defining data scientists as "the information and computer scientists, database and software engineers and programmers, disciplinary experts, curators and expert annotators, librarians, archivists, and others, who are crucial to the successful management of a digital data collection" whose primary activity is to "conduct creative inquiry and analysis."
Data science | Turing Award winner Jim Gray imagined data science as a "fourth paradigm" of science (empirical, theoretical, computational and now data-driven) and asserted that "everything about science is changing because of the impact of information technology" and the data deluge.
Learn How to Read and Write, Son | Learn How to Read and Write, Son is a 1981 Greek comedy film directed by Thodoros Maragos and starring Vasilis Diamantopoulos, Nikos Kalogeropoulos, Kostas Tsakonas and Anna Mantzourani.
How to Win! | How to WIN! is Maria Bamford's second comedy album, following "The Burning Bridges Tour". It was recorded at Cap City Comedy Club in Austin, Texas, from November 15 to 19, 2005.
Data science | He initiated the modern, non-computer-science usage of the term "data science" and advocated that statistics be renamed data science and statisticians data scientists.
Data science | "Data Scientist" has become a popular occupation, with Harvard Business Review dubbing it "The Sexiest Job of the 21st Century" and McKinsey & Company projecting a global excess demand of 1.5 million new data scientists. Universities are offering master's courses in data science. Shorter private bootcamps are also offering data science certificates, ranging from student-paid programs like General Assembly to employer-paid programs like The Data Incubator.
How Children Learn | How Children Learn is a nonfiction book by educator John Caldwell Holt, first published in 1967. A revised edition was released in 1983, with new chapters and commentaries.
Data science | It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, data mining, databases, and visualization.
Data science | The term "data science" (originally used interchangeably with "datalogy") has existed for over thirty years and was used initially as a substitute for computer science by Peter Naur in 1960. In 1974, Naur published "Concise Survey of Computer Methods", which freely used the term data science in its survey of the contemporary data processing methods that are used in a wide range of applications.
How to Design Programs | The book therefore carefully introduces more and more complex kinds of data, which sets it apart from every other introductory programming book. It starts from "atomic" forms of data and then progresses to "compound" forms of data, including data that can be arbitrarily large. For each kind of data definition, the book explains how to organize the program in principle, thus enabling a programmer who encounters a new form of data to still construct a program systematically.
Data science | Although use of the term "data science" has exploded in business environments, many academics and journalists see no distinction between data science and statistics. Writing in Forbes, Gil Press argues that data science is a buzzword without a clear definition and has simply replaced “business analytics” in contexts such as graduate degree programs. In the question-and-answer section of his keynote address at the Joint Statistical Meetings of the American Statistical Association, noted applied statistician Nate Silver said, “I think data-scientist is a sexed up term for a statistician...Statistics is a branch of science. Data scientist is slightly redundant in some way and people shouldn’t berate the term statistician.”