Modernizing Data Lakes and Data Warehouses with GCP

Start Date: 01/24/2021

Course Type: Common Course

Course Link:

About Course

The two key components of any data pipeline are data lakes and warehouses. This course highlights use cases for each type of storage and dives into the available data lake and warehouse solutions on Google Cloud Platform in technical detail. It also describes the role of a data engineer, explains the benefits of a successful data pipeline to business operations, and examines why data engineering should be done in a cloud environment. Learners will get hands-on experience with data lakes and warehouses on Google Cloud Platform using QwikLabs.

Course Syllabus

Building a Data Lake


Course Introduction

Modernizing Data Lakes and Data Warehouses with GCP focuses on modern data lake and data warehouse architectures and the trends and issues surrounding them. You will learn about these architectures and their different levels of abstraction, illustrated with concrete examples. The full course consists of 4 modules spread over 6 weeks. Week 1 introduces modern data lake and data warehouse architectures. Week 2 presents the data lake abstraction levels. Week 3 presents the data warehouse abstraction levels. Week 4 examines modern data lake and data warehouse architectures from a data warehousing perspective. All of the features of this course are available for free, but it does not offer a certificate upon completion.

Course Tag

Related Wiki Topic

Article Example
Data warehouse The environment for data warehouses and marts includes the following:
Data lake James Dixon, then chief technology officer at Pentaho, coined the term to contrast it with data mart, which is a smaller repository of interesting attributes extracted from raw data. He argued that data marts have several inherent problems, and promoted data lakes. These problems are often referred to as information siloing. PricewaterhouseCoopers said that data lakes could "put an end to data silos." In their study on data lakes they noted that enterprises were "starting to extract and place data for analytics into a single, Hadoop-based repository."
Data warehouse Data warehouses (DW) often resemble the hub and spokes architecture. Legacy systems feeding the warehouse often include customer relationship management and enterprise resource planning, generating large amounts of data. To consolidate these various data models, and facilitate the extract transform load process, data warehouses often make use of an operational data store, the information from which is parsed into the actual DW. To reduce data redundancy, larger systems often store the data in a normalized way. Data marts for specific reports can then be built on top of the Data warehouse.
Staging (data) A staging area, or landing zone, is an intermediate storage area used for data processing during the extract, transform and load (ETL) process. The data staging area sits between the data source(s) and the data target(s), which are often data warehouses, data marts, or other data repositories.
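The staging-area pattern above can be sketched in a few lines: extract raw records into an intermediate staging copy, transform them there, then load the result into the target. This is a minimal illustration, not any specific ETL product's API; all record and field names are made up for the example.

```python
# Minimal ETL sketch: extract raw rows into a staging area,
# transform them there, then load into the target store.
# Field names ("name", "amt") and schemas are illustrative only.

def extract(source_rows):
    """Copy raw records into a staging area, isolated from the source."""
    return list(source_rows)

def transform(staged_rows):
    """Clean and reshape staged records for the target schema."""
    return [
        {"customer": r["name"].strip().title(), "amount": float(r["amt"])}
        for r in staged_rows
        if r.get("amt") is not None  # drop records that cannot be loaded
    ]

def load(target, rows):
    """Append transformed rows to the target data store."""
    target.extend(rows)

source = [{"name": " alice ", "amt": "12.50"}, {"name": "BOB", "amt": None}]
warehouse = []
load(warehouse, transform(extract(source)))
print(warehouse)  # [{'customer': 'Alice', 'amount': 12.5}]
```

Because the staging copy sits between source and target, transformations can fail or be rerun without touching either system, which is the main point of a landing zone.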
Data Quality Campaign 4. Data Repositories: Build state data repositories (e.g. data warehouses) that integrate student,
Data migration Data integration, by contrast, is a permanent part of the IT architecture, and is responsible for the way data flows between the various applications and data stores - and is a process rather than a project activity. Standard ETL technologies designed to supply data from operational systems to data warehouses would fit within the latter category.
BusinessObjects Data Integrator The Data Integrator product consists primarily of a Data Integrator Job Server and the Data Integrator Designer. It is commonly used for building data marts, ODS systems and data warehouses, etc.
Alternative data An especially promising option is the full reporting of trade credit data, such as records from cash and carry warehouses.
Clinical data management Standard operating procedures (SOPs) describe the process to be followed in conducting data management activities and support the obligation to follow applicable laws and guidelines (e.g. ICH GCP and 21CFR Part 11) in the conduct of data management activities.
Data warehouse In regards to source systems listed above, Rainer states, "A common source for the data in data warehouses is the company's operational databases, which can be relational databases".
Data mart While transactional databases are designed to be updated, data warehouses or marts are read only. Data warehouses are designed to access large groups of related records. Data marts improve end-user response time by allowing users to have access to the specific type of data they need to view most often by providing the data in a way that supports the collective view of a group of users.
Data mart Organizations build data warehouses and data marts because the information in the database is not organized in a way that makes it readily accessible, requiring queries that are too complicated or resource-consuming.
Data integration Since 2011, data hub approaches have been of greater interest than fully structured (typically relational) Enterprise Data Warehouses. Since 2013, data lake approaches have risen to the level of Data Hubs. (See all three search terms popularity on Google Trends.) These approaches combine unstructured or varied data into one location, but do not necessarily require an (often complex) master relational schema to structure and define all data in the Hub.
Data Data is a set of values of qualitative or quantitative variables. An example of qualitative data would be an anthropologist's handwritten notes about his or her interviews with people of an Indigenous tribe. Pieces of data are individual pieces of information. While the concept of data is commonly associated with scientific research, data is collected by a huge range of organizations and institutions, including businesses (e.g., sales data, revenue, profits, stock price), governments (e.g., crime rates, unemployment rates, literacy rates) and non-governmental organizations (e.g., censuses of the number of homeless people by non-profit organizations).
Data monetization Data monetization, a form of monetization, is generating revenue from available data sources or real-time streamed data by instituting the discovery, capture, storage, analysis, dissemination, and use of that data. Said differently, it is the process by which data producers, data aggregators and data consumers, large and small, exchange, sell, or trade data. Data monetization leverages data generated through business operations as well as data associated with individual actors and with electronic devices and sensors participating in the internet of things. The ubiquity of the internet of things is generating location data and other data from sensors and mobile devices at an ever-increasing rate. When this data is collated against traditional databases, the value and utility of both sources of data increases, leading to tremendous potential to mine data for social good, research and discovery, and achievement of business objectives. Closely associated with data monetization are the emerging data-as-a-service models for transactions involving data by the data item.
Tidy data Tidy data provide standards and concepts for data cleaning, and with tidy data there’s no need to start from scratch and reinvent new methods for data cleaning.
Data lake Many companies also use cloud storage services such as Amazon S3. There is a gradual academic interest in the concept of data lakes, for instance, Personal DataLake at Cardiff University to create a new type of data lake which aims at managing big data of individual users by providing a single point of collecting, organizing, and sharing personal data.
Data mart According to the Inmon school of data warehousing, tradeoffs inherent with data marts include limited scalability, duplication of data, data inconsistency with other silos of information, and inability to leverage enterprise sources of data.
Data warehouse Data warehouses are optimized for analytic access patterns. Analytic access patterns generally involve selecting specific fields and rarely if ever 'select *' as is more common in operational databases. Because of these differences in access patterns, operational databases (loosely, OLTP) benefit from the use of a row-oriented DBMS whereas analytics databases (loosely, OLAP) benefit from the use of a column-oriented DBMS. Unlike operational systems which maintain a snapshot of the business, data warehouses generally maintain an infinite history which is implemented through ETL processes that periodically migrate data from the operational systems over to the data warehouse.
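The row-oriented versus column-oriented distinction above can be made concrete with a toy sketch. In a row store, each record's fields sit together; in a column store, each field's values sit together, so an analytic query that touches one field scans only that field's data. The layouts and field names below are illustrative, not taken from any particular DBMS.

```python
# Toy contrast of row-oriented (OLTP-style) vs column-oriented
# (OLAP-style) layouts. Field names are illustrative only.

rows = [  # row-oriented: each record stored as a unit
    {"id": 1, "region": "EU", "revenue": 100.0},
    {"id": 2, "region": "US", "revenue": 250.0},
    {"id": 3, "region": "EU", "revenue": 75.0},
]

columns = {  # column-oriented: each field stored contiguously
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "revenue": [100.0, 250.0, 75.0],
}

# Analytic query: total revenue. The row store must visit every
# record and pick out one field; the column store reads one list.
total_from_rows = sum(r["revenue"] for r in rows)
total_from_columns = sum(columns["revenue"])
print(total_from_rows, total_from_columns)  # 425.0 425.0
```

Real columnar engines add compression and vectorized scans on top of this layout, but the access-pattern advantage for "select a few fields over many rows" is already visible in the sketch.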
Data steward A data steward is a person responsible for the management and fitness of data elements - both the content and metadata. Data stewards have a specialist role that incorporates processes, policies, guidelines and responsibilities for administering organizations' entire data in compliance with policy and/or regulatory obligations. A data steward may share some responsibilities with a data custodian.