Open Source tools for Data Science

Start Date: 07/05/2020

Course Type: Common Course

Course Link:

About Course

What are some of the most popular data science tools, how do you use them, and what are their features? In this course, you'll learn about Jupyter Notebooks, RStudio IDE, Apache Zeppelin, and Data Science Experience. You will learn what each tool is used for, which programming languages it can execute, and its features and limitations. With the tools hosted in the cloud on Cognitive Class Labs, you will be able to test each tool and follow instructions to run simple code in Python, R, or Scala. To end the course, you will create a final project with a Jupyter Notebook on IBM Data Science Experience and demonstrate your proficiency in preparing a notebook, writing Markdown, and sharing your work with your peers.
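The final project centers on a Jupyter Notebook that mixes Markdown and code. As a rough sketch (not part of the course materials), the snippet below builds a minimal notebook by hand with Python's standard `json` module, assuming the nbformat 4 JSON layout. The helper name `make_notebook` and the cell contents are hypothetical; in practice you would create the notebook directly in the Jupyter interface.

```python
import json


def make_notebook(markdown_text: str, code_text: str) -> dict:
    """Return a minimal nbformat-4 notebook with one Markdown cell and one code cell.

    Hypothetical helper for illustration; assumes the nbformat 4.4 JSON layout.
    """
    markdown_cell = {
        "cell_type": "markdown",
        "metadata": {},
        # Notebook cell sources are stored as a list of lines, newlines kept.
        "source": markdown_text.splitlines(keepends=True),
    }
    code_cell = {
        "cell_type": "code",
        "execution_count": None,  # the cell has not been run yet
        "metadata": {},
        "outputs": [],
        "source": code_text.splitlines(keepends=True),
    }
    return {
        "cells": [markdown_cell, code_cell],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }


nb = make_notebook(
    "# My Final Project\n\nA short **Markdown** introduction.",
    "print('Hello, Jupyter!')",
)
# Serialize to JSON text; saving this to a file ending in .ipynb gives a notebook file.
notebook_json = json.dumps(nb, indent=1)
```

Because the sketch sets no kernel metadata, Jupyter will probably ask you to pick a kernel when you open the saved file.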


Course Introduction

Data science is a fast-growing area that is rapidly becoming a scientific discipline of the future. Data scientists are drawn to the fast pace of change, the collision of new and old ideas, new and unexpected project opportunities, and the challenges of maintaining a high-quality data science program. This course will take you on a journey to explore the open source software and data science tools used by data scientists across the world. You will learn about the various open source tools and platforms that data scientists use to collect, process, store, and share data with their colleagues, as well as the tools and platforms they use to manage the data science process.

Open Data Resources
Collecting and Processing Data
Collecting and Processing Data & Storage
Managing Data Science Projects

Open Source Software Development Methods in Python

This course focuses on Python programming techniques for manipulating software sources and then building applications from those modified sources. It also covers topics such as modularity, functions, classes, inheritance, and polymorphism. We will also introduce the Python logging module and show how to use it to debug your programs. In the second half of the course, we will introduce the Python subprocess module for running and controlling other programs. We will use Python
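To illustrate the two standard-library modules mentioned above, here is a minimal sketch (an assumption about what such an exercise might look like, not the course's own code). `logging` records debug and info messages, while a hypothetical helper `run_child` uses `subprocess.run` to launch a second Python interpreter and capture its output.

```python
import logging
import subprocess
import sys

# Configure the root logger once; level DEBUG lets debug and info messages through.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s:%(name)s:%(message)s")
logger = logging.getLogger("course_demo")  # hypothetical logger name


def run_child(code: str) -> str:
    """Run a Python snippet in a separate interpreter and return its stripped stdout.

    Hypothetical helper for illustration only.
    """
    cmd = [sys.executable, "-c", code]  # reuse the interpreter running this script
    logger.debug("Running command: %s", cmd)
    # capture_output collects stdout/stderr; text=True decodes them to str;
    # check=True raises CalledProcessError if the child exits with a non-zero code.
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    logger.info("Child exited with return code %d", result.returncode)
    return result.stdout.strip()


output = run_child("print(6 * 7)")
print(output)  # → 42
```

Using `sys.executable` keeps the child on the same interpreter as the parent. With `check=True`, a failing child raises `subprocess.CalledProcessError` instead of failing silently, and the log messages then show which command was attempted.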

Course Tag

Related Wiki Topic

Article Example
Open science Big scientific projects are more likely to practice open science than small projects. Different projects conduct, advocate, develop tools for, or fund open science, and many organizations run multiple projects. For example, the Allen Institute for Brain Science conducts numerous open science projects while the Center for Open Science has projects to conduct, advocate, and create tools for open science.
Open science data In 2010 the Panton Principles were launched, advocating Open Data in science and setting out four principles with which providers must comply for their data to be considered Open.
Open science data Open science data is a type of open data focused on making the observations and results of scientific activities available for anyone to analyze and reuse. While the idea of open science data has been actively promoted since the 1950s, the rise of the Internet has significantly lowered the cost and time required to publish or obtain data.
Open science Other advocates concentrate on educating scientists about appropriate open science software tools. Education is available as training seminars, e.g., Software Sustainability Institute's Software Carpentry project; as domain-specific training materials, e.g., Software Sustainability Institute's Data Carpentry project; and as materials for teaching graduate classes, e.g., the Open Science Training Initiative. Many organizations also provide education in the general principles of open science.
Talend Open Studio for Data Quality Talend Open Studio for Data Quality is an open source computer software project for data profiling. The project is driven by commercial open source vendor Talend.
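Data profiling means summarizing what a dataset actually contains: how many values are missing, how many are distinct, which value is most common, and what range numeric columns span. The sketch below is not Talend's API or method; it is a minimal standard-library illustration that assumes records arrive as a list of dictionaries with string values. The function names `profile_column` and `profile_table` are hypothetical.

```python
from collections import Counter


def profile_column(values):
    """Return simple profile statistics for one column of raw values (hypothetical helper)."""
    # Treat None and empty strings as missing.
    non_null = [v for v in values if v not in (None, "")]
    # Try to interpret each present value as a number.
    numeric = []
    for v in non_null:
        try:
            numeric.append(float(v))
        except (TypeError, ValueError):
            pass
    profile = {
        "count": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        # Ties go to the value seen first.
        "most_common": Counter(non_null).most_common(1)[0][0] if non_null else None,
    }
    # Report a range only when every present value parsed as a number.
    if numeric and len(numeric) == len(non_null):
        profile["min"] = min(numeric)
        profile["max"] = max(numeric)
    return profile


def profile_table(rows):
    """Profile every column that appears in a list of dict records (hypothetical helper)."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


rows = [
    {"city": "Oslo", "temp": "12.5"},
    {"city": "Lima", "temp": ""},
    {"city": "Oslo", "temp": "19"},
]
report = profile_table(rows)
```

Here the `city` column shows two distinct values with "Oslo" most common. The `temp` column shows one missing value and a numeric range of 12.5 to 19.0.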
Open science data In 2013 the G8 Science Ministers released a Statement supporting a set of principles for open scientific research data.
Open science The Fecher and Frieske model can aid discussions of broad initiatives. For example, e-science promotes massive shared computational work and Science 2.0 provides tools for increased collaboration so both could be argued to be in the "infrastructure school". Open science data and data publishing both advocate releasing data and could be argued as the "measurement school" or perhaps part of the "democratic school". Each of these initiatives covers a broad range of projects with a common perception of improving science.
Open science According to the FOSTER taxonomy, open science can include aspects of open access, open data, and the open source movement, since modern science requires software to process data and information. Open research computation also addresses the problem of reproducibility of scientific results.
Open data Open data is the idea that some data should be freely available to everyone to use and republish as they wish, without restrictions from copyright, patents or other mechanisms of control. The goals of the open data movement are similar to those of other "open" movements such as open source, open hardware, open content and open access. The philosophy behind open data has been long established (for example in the Mertonian tradition of science), but the term "open data" itself is recent, gaining popularity with the rise of the Internet and World Wide Web and, especially, with the launch of open-data government initiatives such as and
Open science data In 2015 the World Data System of the International Council for Science adopted a new set of Data Sharing Principles to embody the spirit of 'open science'. These Principles are in line with the data policies of national and international initiatives, and they express core ethical commitments operationalized in the WDS Certification of trusted data repositories and services.
Open science data In 2011 was launched to realize the approach of the Linked Open Science to openly share and interconnect scientific assets like datasets, methods, tools and vocabularies.
Data science In the 2010-2011 time frame, data science software reached an inflection point where open source software started supplanting proprietary software. The use of open source software enables modifying and extending the software, and it allows sharing of the resulting algorithms.
Open data While the open-science-data movement long predates the Internet, the availability of fast, ubiquitous networking has significantly changed the context of Open science data, since publishing or obtaining data has become much less expensive and time-consuming.
Open science data The concept of open access to scientific data was institutionally established with the formation of the World Data Center system (now the World Data System), in preparation for the International Geophysical Year of 1957–1958. The International Council of Scientific Unions (now the International Council for Science) established several World Data Centers to minimize the risk of data loss and to maximize data accessibility, further recommending in 1955 that data be made available in machine-readable form.
Space Telescope Science Data Analysis System The Space Telescope Science Data Analysis System (STSDAS) is an IRAF-based suite of astronomical software for reducing and analyzing astronomical data. It contains general purpose tools and packages for processing data from the Hubble Space Telescope. STSDAS is produced by Space Telescope Science Institute (STScI). The STSDAS software is in the public domain and the source code is available.
Open-source learning Two years of data indicate that open-source learning students show higher levels of participation in classes, are more likely to apply for scholarships and register for Advanced Placement (AP) courses and exams, and are enthusiastic during and after the completion of their classes. The tools and techniques of Open Source Learning train students to be innovators and help them build a positive online presence, both essential skills for progression in 21st-century academics and careers.
Open data Open data can come from any source. This section lists some of the fields that publish (or at least discuss publishing) a large amount of open data.
Open science A variety of computer resources support open science. These include software like the Open Science Framework from the Center for Open Science to manage project information, data archiving and team coordination; distributed computing services like Ibercivis to utilize unused CPU time for computationally intensive tasks; and services like to provide crowdsourced funding for research projects.
Open Data Institute The ODI Labs team creates tools, techniques and standards for open data publishing. Flagship ODI Labs products include Open Data Certificates, which show that data has been published in a sustainable and reusable way, and an Open Data Maturity Model and associated Open Data Pathway tool for organisations to assess their data practices (developed in collaboration with the Department for Environment, Food and Rural Affairs (Defra)).
Data source A data source is any of the following types of sources for (mostly) digitized data: