Working with Data – Overview, Lecture notes of Database Management Systems (DBMS)

An overview of the CS102 course on working with data. It highlights the explosion in data-driven scientific discovery, business practices, medicine, education, politics, and societal interventions, and the growing ability to collect data across many domains. The document covers the promises of working with data, data tools and techniques (basic data manipulation and analysis, data mining, and machine learning), pitfalls in working with data, and data systems and platforms. It also discusses using data to build models and make predictions, with examples of data analysis and data mining techniques.

Typology: Lecture notes

2019/2020

Uploaded on 05/11/2023

Working with Data – Overview
CS102
Spring 2020


Data is Everywhere

  • Explosion in data-driven scientific discovery, business practices, medicine, education, politics, societal interventions, …
  • And it’s just the beginning
      • Ability to collect data across many domains will continue to accelerate
      • Data analysis techniques will continue to improve

“Data is the oil of the 21st century”

The Two Steps of Working with Data

(1) Collect data
    Via computers, sensors, people, events, …
(2) Do something with it
    Make decisions, confirm hypotheses, gain insights, predict future, …

“Data Science” = Going from (1) to (2)

This Overview

  • Promises of working with data
      • Applications and services
  • Data tools and techniques
      • Database management systems
      • Data mining and machine learning
  • Pitfalls in working with data
      • Correlation and causation
      • Underfitting and overfitting
      • Privacy and a few others
  • Data systems and platforms

Traffic

(1) Collect data
(2) Do something with it

Recommender Systems

(1) Collect data
  • music, news, friends, romantic partners, and many more!
(2) Do something with it

Sports

(1) Collect data
(2) Do something with it

Ocean Health

44,000 sensors, over 2 billion measurements
Physical, chemical, biological …
(1) Collect and curate data
(2) Do something with it

And Many More

  • Weather prediction
  • Medical diagnosis
  • Financial markets
  • Resource management
  • Computational social science
  • Smart buildings and cities
  • The list goes on and on, and it’s still early days

Data Tools and Techniques

  • Basic Data Manipulation and Analysis
      Performing well-defined computations or asking well-defined questions (“queries”)
  • Data Mining
      Looking for patterns in data
  • Machine Learning
      Using data to build models and make predictions
  • Data Visualization
      Graphical depiction of data
  • Data Collection and Preparation

Basic Data Manipulation and Analysis

Performing well-defined computations or asking well-defined questions (“queries”)

  • Average January low temperature for each country over last 20 years
  • Number of items over $100 bought by females between ages 20 and 30
  • Frequency of specific medicine relieving specific symptoms
  • The ten stocks whose price varied the most over the past year

Tools:

  • Spreadsheets
  • Relational (SQL) database systems
  • “NoSQL” / scalable systems
  • Programming languages with data support (e.g., Python, R)
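
As a rough illustration of a well-defined query, the sketch below answers the "items over $100 bought by females between ages 20 and 30" question with SQL, using Python's built-in sqlite3 module. The table name `purchases`, its columns, the sample rows, and the helper `count_expensive_items` are all hypothetical and invented for this example. It also assumes the age range is inclusive.

```python
import sqlite3

# Hypothetical schema and made-up rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (item TEXT, price REAL, gender TEXT, age INTEGER)"
)
rows = [
    ("laptop", 950.0, "F", 25),
    ("headphones", 120.0, "F", 29),
    ("notebook", 4.5, "F", 22),
    ("monitor", 210.0, "M", 27),
    ("camera", 480.0, "F", 41),
]
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)


def count_expensive_items(conn, min_price=100, gender="F", min_age=20, max_age=30):
    """Count purchases priced above min_price by buyers of the given
    gender whose age lies in [min_age, max_age] (inclusive)."""
    query = """
        SELECT COUNT(*) FROM purchases
        WHERE price > ? AND gender = ? AND age BETWEEN ? AND ?
    """
    (count,) = conn.execute(query, (min_price, gender, min_age, max_age)).fetchone()
    return count


result = count_expensive_items(conn)
print(result)
```

The same question could be asked in a spreadsheet with filters or in R. The point is only that the question has a single, precisely defined answer for a given data set.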

Data Mining

Looking for patterns in data

  • Items X, Y, Z are bought together frequently
  • People who like movie X also like movie Y
  • Patients who respond well to medicines X and Y also respond well to medicine Z
  • Students going to the same university are frequently online friends
  • Wealthier people are moving from cities to suburbs
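
The "bought together frequently" pattern can be sketched as a simple co-occurrence count over shopping baskets. This is a minimal illustration, not a full frequent-itemset algorithm. The function name `frequent_pairs`, the `min_support` threshold, and the baskets are assumptions made up for the example.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # Deduplicate and sort so ("X", "Y") and ("Y", "X") count as one pair.
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Made-up baskets, for illustration only.
baskets = [
    ["X", "Y", "Z"],
    ["X", "Y"],
    ["Y", "Z", "W"],
    ["X", "Y", "Z"],
]
pairs = frequent_pairs(baskets, min_support=3)
print(pairs)
```

Unlike a query, nobody specified in advance which pairs to look for. The pattern (here, which items co-occur) is discovered from the data.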

Machine Learning

Using data to build models and make predictions

  • Customers who are women over age 20 are likely to respond to an advertisement
  • Students with good grades are predicted to do well on the SAT
  • The temperature of a city can be estimated as the average of its nearby cities, unless some of the cities are on the coast or in the mountains

Roughly: Basic data analysis and data mining give answers from the available data, while machine learning uses the available data to make predictions about missing or future data

  • Regression
  • Classification
  • Clustering
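
To make the "build a model, then predict" idea concrete, here is a minimal regression sketch for the grades and SAT example: fit a straight line to known (GPA, SAT) pairs by ordinary least squares, then use it to predict a score for a GPA that is not in the data. The numbers are made up (and deliberately lie exactly on a line), and the names `fit_line` and `predict` are hypothetical helpers, not part of any library.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(model, x):
    """Apply a fitted (intercept, slope) model to a new input."""
    intercept, slope = model
    return intercept + slope * x


# Made-up training data, for illustration only.
gpa = [2.0, 2.5, 3.0, 3.5, 4.0]
sat = [1000, 1100, 1200, 1300, 1400]

model = fit_line(gpa, sat)
predicted = predict(model, 3.2)  # a GPA that does not appear in the data
print(model, predicted)
```

The prediction for a GPA of 3.2 is not an answer read off from the available data. It comes from the fitted model, which is the distinction drawn above between data analysis and machine learning.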