Modern Big Data Analysis with SQL Specialization

Start Date: 07/14/2019

Course Type: Specialization Course

Course Link: https://www.coursera.org/specializations/cloudera-big-data-analysis-sql

Explore 1600+ online courses from top universities. Join Coursera today to learn data science, programming, business strategy, and more.

About Course

This Specialization teaches the essential skills for working with large-scale data using SQL. Maybe you are new to SQL and you want to learn the basics. Or maybe you already have some experience using SQL to query smaller-scale data with relational databases. Either way, if you are interested in gaining the skills necessary to query big data with modern distributed SQL engines, this Specialization is for you. Most courses that teach SQL focus on traditional relational databases, but today, more and more of the data that’s being generated is too big to be stored there, and it’s growing too quickly to be efficiently stored in commercial data warehouses. Instead, it’s increasingly stored in distributed clusters and cloud storage. These data stores are cost-efficient and infinitely scalable. To query these huge datasets in clusters and cloud storage, you need a newer breed of SQL engine: distributed query engines, like Hive, Impala, Presto, and Drill. These are open source SQL engines capable of querying enormous datasets. This Specialization focuses on Hive and Impala, the most widely deployed of these query engines. This Specialization is designed to provide excellent preparation for the Cloudera Certified Associate (CCA) Data Analyst certification exam. You can earn this certification credential by taking a hands-on practical exam using the same SQL engines that this Specialization teaches—Hive and Impala.

Course Syllabus

Foundations for Big Data Analysis with SQL
Analyzing Big Data with SQL
Managing Big Data in Clusters and Cloud Storage

Deep Learning Specialization on Coursera

Course Introduction

Learn Data Analysis for Big Data. Master using SQL for data analysis on distributed big data systems

Course Tag

Related Wiki Topic

Article Example
Big data The Workshops on Algorithms for Modern Massive Data Sets (MMDS) bring together computer scientists, statisticians, mathematicians, and data analysis practitioners to discuss algorithmic challenges of big data.
Big data Big data analysis is often shallow compared to analysis of smaller data sets. In many big data projects, there is no large data analysis happening, but the challenge is the extract, transform, load part of data preprocessing.
Social data analysis Social data analysis is a style of analysis in which people work in a social, collaborative context to make sense of data. The term was introduced by Martin Wattenberg in 2005 and recently also addressed as big social data analysis in relation to big data computing.
Azure SQL Data Warehouse SQL Data Warehouse integrates with data tools such as Azure Machine Learning for advanced analytics, Azure Data Factory for data orchestration, Azure Stream Analytics for real-time analytics, and Power BI for data visualization. Standard query tools like SQL Management Studio (SSMS), SQLCMD, and SQL Server Data Tools can also be used to execute queries.
Data analysis Data integration is a precursor to data analysis, and data analysis is closely linked to data visualization and data dissemination. The term "data analysis" is sometimes used as a synonym for data modeling.
Big data Big data is a term for data sets that are so large or complex that traditional data processing application software is inadequate to deal with them. Challenges include capture, storage, analysis, data curation, search, sharing, transfer, visualization, querying, updating and information privacy. The term "big data" often refers simply to the use of predictive analytics, user behavior analytics, or certain other advanced data analytics methods that extract value from data, and seldom to a particular size of data set. "There is little doubt that the quantities of data now available are indeed large, but that’s not the most relevant characteristic of this new data ecosystem."
Big data The use and adoption of big data within governmental processes allows efficiencies in terms of cost, productivity, and innovation, but does not come without its flaws. Data analysis often requires multiple parts of government (central and local) to work in collaboration and create new and innovative processes to deliver the desired outcome. Below are some examples of initiatives the governmental big data space.
Microsoft SQL Server SQL Server Analysis Services adds OLAP and data mining capabilities for SQL Server databases. The OLAP engine supports MOLAP, ROLAP and HOLAP storage modes for data. Analysis Services supports the XML for Analysis standard as the underlying communication protocol. The cube data can be accessed using MDX and LINQ queries.
Data analysis Analysis of data, also known as data analytics, is a process of inspecting, cleansing, transforming, and modeling data with the goal of discovering useful information, suggesting conclusions, and supporting decision-making. Data analysis has multiple facets and approaches, encompassing diverse techniques under a variety of names, in different business, science, and social science domains.
Microsoft SQL Server Business Intelligence Development Studio (BIDS) is the IDE from Microsoft used for developing data analysis and Business Intelligence solutions utilizing the Microsoft SQL Server Analysis Services, Reporting Services and Integration Services. It is based on the Microsoft Visual Studio development environment but is customized with the SQL Server services-specific extensions and project types, including tools, controls and projects for reports (using Reporting Services), Cubes and data mining structures (using Analysis Services). For SQL Server 2012 and later, this IDE has been renamed SQL Server Data Tools (SSDT).
SQL Server Reporting Services "SQL Server Reporting Services (SSRS)" is a server-based report generating software system from Microsoft. It is part of suite of Microsoft SQL Server services, including SSAS (SQL Server Analysis Services) and SSIS (SQL Server Integration Services). While SSAS enables users to construct special databases for fast analysis of very large amounts of data, and while SSIS enables users to integrate data from many sources outside Microsoft SQL Server, SSRS enables users to quickly and easily generate reports from Microsoft SQL Server databases.
Data analysis The most important distinction between the initial data analysis phase and the main analysis phase, is that during initial data analysis one refrains from any analysis that is aimed at answering the original research question. The initial data analysis phase is guided by the following four questions:
Data analysis In the main analysis phase either an exploratory or confirmatory approach can be adopted. Usually the approach is decided before data is collected. In an exploratory analysis no clear hypothesis is stated before analysing the data, and the data is searched for models that describe the data well. In a confirmatory analysis clear hypotheses about the data are tested.
Big data Research on the effective usage of information and communication technologies for development (also known as ICT4D) suggests that big data technology can make important contributions but also present unique challenges to International development. Advancements in big data analysis offer cost-effective opportunities to improve decision-making in critical development areas such as health care, employment, economic productivity, crime, security, and natural disaster and resource management. Additionally, user-generated data offers new opportunities to give the unheard a voice. However, longstanding challenges for developing regions such as inadequate technological infrastructure and economic and human resource scarcity exacerbate existing concerns with big data such as privacy, imperfect methodology, and interoperability issues.
Big data Big data often poses the same challenges as small data; and adding more data does not solve problems of bias, but may emphasize other problems. In particular data sources such as Twitter are not representative of the overall population, and results drawn from such sources may then lead to wrong conclusions. Google Translate—which is based on big data statistical analysis of text—does a good job at translating web pages. However, results from specialized domains may be dramatically skewed.
Data analysis The choice of analyses to assess the data quality during the initial data analysis phase depends on the analyses that will be conducted in the main analysis phase.
Data analysis expressions Data Analysis Expressions (DAX) is the native formula and query language for Microsoft PowerPivot, Power BI Desktop and SQL Server Analysis Services (SSAS) Tabular models. DAX includes some of the functions that are used in Excel formulas, and additional functions that are designed to work with relational data and perform dynamic aggregation. It is, in part, an evolution of the Multidimensional Expression (MDX) language developed by Microsoft for Analysis Services multidimensional models (often called cubes) combined with Excel formula functions. It is designed to be simple and easy to learn, while exposing the power and flexibility of PowerPivot and SSAS tabular models.
Data analysis Barriers to effective analysis may exist among the analysts performing the data analysis or among the audience. Distinguishing fact from opinion, cognitive biases, and innumeracy are all challenges to sound data analysis.
Data Mining Extensions Data Mining Extensions (DMX) is a query language for Data Mining Models supported by Microsoft's SQL Server Analysis Services product.
Big data In the provocative article "Critical Questions for Big Data", the authors title big data a part of mythology: "large data sets offer a higher form of intelligence and knowledge [...], with the aura of truth, objectivity, and accuracy". Users of big data are often "lost in the sheer volume of numbers", and "working with Big Data is still subjective, and what it quantifies does not necessarily have a closer claim on objective truth". Recent developments in BI domain, such as pro-active reporting especially target improvements in usability of big data, through automated filtering of non-useful data and correlations.