




























Introduction to Data Engineering and Data Science: comparing the roles of data scientists and data engineers
Typology: Lecture notes
Uploaded on 05/30/2020





























In this first chapter, you will:
- Be exposed to the world of data engineering.
- Explore the differences between a data engineer and a data scientist.
- Get an overview of the various tools data engineers use.
- Expand your understanding of how cloud technology plays a role in data engineering.

- Preparing data for analytics is hard.
- Data is often the biggest challenge of self-service analytics.
- Self-service analytics allows end users to analyze their data easily by building their own reports and modifying existing ones with little to no training.

- Half of all organizations access external data sources.
- Data is scattered.

The database needs to be optimized so that it becomes:
- faster to query
- free of corrupt data

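One common way to make a database faster to query is to add an index on the column used for filtering. The sketch below is only an illustration: it uses SQLite from Python's standard library, and the `orders` table, its columns, and the index name are hypothetical, not taken from the lecture. It asks SQLite for its query plan before and after the index is created.

```python
import sqlite3

def build_orders_db(n_rows=1000):
    """Create an in-memory database with a hypothetical `orders` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
        [(i % 100, float(i)) for i in range(n_rows)],
    )
    return conn

def query_plan(conn, sql, params=()):
    """Return SQLite's textual query plan for a statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each plan row holds the human-readable detail.
    return " ".join(row[-1] for row in rows)

QUERY = "SELECT SUM(amount) FROM orders WHERE customer_id = ?"

conn = build_orders_db()
plan_before = query_plan(conn, QUERY, (42,))  # full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
plan_after = query_plan(conn, QUERY, (42,))   # lookup through the index

print(plan_before)
print(plan_after)
```

Before the index exists, the plan reports a scan of the whole table; afterwards it reports a search that uses `idx_orders_customer`. The exact wording of the plan text varies between SQLite versions.
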
- In comes the data engineer to the rescue.

The tasks of a data engineer consist of:
- developing a scalable data architecture (schema)
- streamlining data acquisition
- setting up processes that bring data together from several sources
- safeguarding data quality by cleaning up corrupt data

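Two of the tasks above, bringing data together from several sources and cleaning up corrupt data, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the two sources (a CSV export and a list of API records), the field names, and the validity rules are all hypothetical.

```python
import csv
import io

# Two hypothetical sources: a CSV export and records from an application API.
CSV_EXPORT = """customer_id,email,age
1,ann@example.com,34
2,,41
3,carl@example.com,not_a_number
"""
API_RECORDS = [
    {"customer_id": "4", "email": "dee@example.com", "age": "29"},
    {"customer_id": "1", "email": "ann@example.com", "age": "34"},  # duplicate of a CSV row
]

def parse_record(raw):
    """Return a cleaned record, or None if the raw record is corrupt."""
    try:
        customer_id = int(raw["customer_id"])
        age = int(raw["age"])
    except (KeyError, TypeError, ValueError):
        return None
    email = (raw.get("email") or "").strip().lower()
    # Assumed validity rules: an email must contain "@", an age must be plausible.
    if "@" not in email or not 0 < age < 130:
        return None
    return {"customer_id": customer_id, "email": email, "age": age}

def combine_sources(*sources):
    """Merge records from several sources, dropping corrupt rows and duplicates."""
    clean, rejected, seen = [], 0, set()
    for source in sources:
        for raw in source:
            record = parse_record(raw)
            if record is None:
                rejected += 1
            elif record["customer_id"] not in seen:
                seen.add(record["customer_id"])
                clean.append(record)
    return clean, rejected

csv_rows = csv.DictReader(io.StringIO(CSV_EXPORT))
customers, rejected_count = combine_sources(csv_rows, API_RECORDS)
print(customers, rejected_count)
```

Here the row with the empty email and the row with the non-numeric age are rejected, and the duplicate customer from the second source is skipped, so only customers 1 and 4 remain.
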
- Data engineers design, build, and maintain data architectures for large-scale applications. This career path requires strong software engineering skills.
- Essentially, a data engineer needs the skills to build a data pipeline that connects all the pieces of the data ecosystem and to keep it up and running.
- Data engineering is the first, and arguably the most crucial, step in a successful data strategy. Data engineers make sure data scientists have the data they need to perform data science.

- To see just how important data engineering is for data science, consider the hierarchy of needs proposed by Monica Rogati.

- There are some differences between the tasks of data scientists and the tasks of data engineers.
- Below are three essential tasks that need to happen in a data-driven company. Can you find the one that best fits the job of a data engineer?
  - Apply a statistical model to a large dataset to find outliers.
  - Set up scheduled ingestion of data from the application databases to an analytical database.
  - Come up with a database schema for an application.

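To make the second task in the list above concrete, here is a minimal sketch of incremental ingestion from an application database into an analytical database. It assumes two in-memory SQLite databases and a hypothetical `signups` table. The function `ingest_new_rows` is illustrative only, and the scheduling itself is left out.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a hypothetical `signups` table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE signups (id INTEGER PRIMARY KEY, plan TEXT)")
    return db

def ingest_new_rows(app, dwh):
    """Copy rows not yet present in the analytical database; return the count."""
    # The highest id already ingested acts as a watermark.
    last_id = dwh.execute("SELECT COALESCE(MAX(id), 0) FROM signups").fetchone()[0]
    new_rows = app.execute(
        "SELECT id, plan FROM signups WHERE id > ? ORDER BY id", (last_id,)
    ).fetchall()
    dwh.executemany("INSERT INTO signups (id, plan) VALUES (?, ?)", new_rows)
    dwh.commit()
    return len(new_rows)

app_db, analytics_db = make_db(), make_db()

app_db.executemany("INSERT INTO signups (plan) VALUES (?)", [("free",), ("pro",)])
first_run = ingest_new_rows(app_db, analytics_db)   # copies both existing rows

app_db.execute("INSERT INTO signups (plan) VALUES ('free')")
second_run = ingest_new_rows(app_db, analytics_db)  # copies only the new row

print(first_run, second_run)
```

In practice a scheduler such as cron or a workflow tool would call a function like this at a fixed interval. Because each run copies only rows above the watermark, repeated runs do not duplicate data.
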
Classify each task as belonging to the data engineer (red) or the data scientist (blue):
- Cloud technology
- Mining data for patterns
- Monitor business processes
- Streamline data acquisition
- Clean statistical outliers in data
- Set up processes to bring together data
- Statistical modeling
- Develop scalable data architecture
- Predictive models using machine learning
- Clean corrupt data

- Data engineers are expert users of database systems.
- A database is a computer system that holds large amounts of data.
- Applications rely on databases to provide certain functionality.
- Other databases are used for analysis.
- The data engineer’s task begins and ends at databases.

- Data engineers use tools for quickly processing data, for example to:
  - clean data
  - aggregate data
  - join data
- Huge amounts of data have to be processed. That is where parallel processing comes into play.
- Data engineers use clusters of machines to process data.
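
The clean, aggregate, and join steps, along with the idea of splitting work across workers, can be sketched as follows. This is an illustration under stated assumptions, not a cluster setup: a local thread pool stands in for a cluster of machines, so it shows the split, process, and combine pattern rather than any real speedup. The sales rows and country lookup are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical input data.
SALES = [
    {"country": "NL", "amount": 10},
    {"country": "BE", "amount": 5},
    {"country": "", "amount": 7},     # missing country
    {"country": "NL", "amount": -3},  # negative amount
    {"country": "BE", "amount": 21},
    {"country": "NL", "amount": 15},
]
COUNTRY_NAMES = {"NL": "Netherlands", "BE": "Belgium"}

def clean(rows):
    """Drop rows with a missing country or a negative amount."""
    return [r for r in rows if r["country"] and r["amount"] >= 0]

def aggregate(rows):
    """Sum the amount per country for one chunk of rows."""
    totals = Counter()
    for r in rows:
        totals[r["country"]] += r["amount"]
    return totals

def process_chunk(rows):
    """Clean and aggregate a single chunk; this is the unit of parallel work."""
    return aggregate(clean(rows))

def parallel_totals(rows, n_chunks=4):
    """Split rows into chunks, process them concurrently, and merge the results."""
    chunks = [rows[i::n_chunks] for i in range(n_chunks)]
    combined = Counter()
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        for partial in pool.map(process_chunk, chunks):
            combined.update(partial)  # adds the partial sums together
    return combined

def join_names(totals, names):
    """Join per-country totals with a lookup table of country names."""
    return {names[code]: total for code, total in totals.items() if code in names}

report = join_names(parallel_totals(SALES), COUNTRY_NAMES)
print(report)
```

On a real cluster, each chunk would be sent to a different machine and the partial results combined afterwards. The structure of the computation stays the same.
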