CS 102: Working with Data, Lecture notes of Database Management Systems (DBMS)

Typology: Lecture notes

2022/2023

Uploaded on 05/11/2023

CS 102: Working with Data

Tools and Techniques

Spring 2020

Course Staff

Instructor

Jennifer Widom

Course Assistants

Leo Mehr (head)

Kyle D’Souza

Tara Iyer

Aamir Rasheed

What’s This Course About?

“Aimed at non-CS undergraduate and graduate students who want to learn a variety of tools and techniques for working with data. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole, are now being made on the basis of analyzing data sets. This course provides a broad and practical introduction to working with data: data analysis techniques including databases, data mining, machine learning, and data visualization; data analysis tools including spreadsheets, Tableau, relational databases and SQL, Python, and R; introduction to network analysis and unstructured data. Tools and techniques are hands-on but at a cursory level, providing a basis for future exploration and application. Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or other equivalent programming experience.”

Who Should Take It?

“Aimed at non-CS undergraduate and graduate students who want to learn a variety of tools and techniques for working with data. Many of the world's biggest discoveries and decisions in science, technology, business, medicine, politics, and society as a whole, are now being made on the basis of analyzing data sets. This course provides a broad and practical introduction to working with data: data analysis techniques including databases, data mining, machine learning, and data visualization; data analysis tools including spreadsheets, Tableau, relational databases and SQL, Python, and R; introduction to network analysis and unstructured data. Tools and techniques are hands-on but at a cursory level, providing a basis for future exploration and application. Prerequisites: comfort with basic logic and mathematical concepts, along with high school AP computer science, CS106A, or other equivalent programming experience.”

Who Shouldn’t Take It?

Computer Science or MCS students (except by petition)

Who’s Taking It – Spring 2020

§ Undergraduate, Masters, MBA, MD, PhD

§ All seven of Stanford’s schools, 42 different majors

Majors represented: American Studies; Asian American Studies; Biology; Business Administration; Chemistry; Civil Engineering; Civil & Environmental Engineering; Comparative Studies in Race & Ethnicity; Comparative Literature; Computer Science; Earth System Science; Earth Systems; East Asian Studies; Economics; Education; Electrical Engineering; Engineering; Energy Resources Engineering; English; Environment and Resources; Environmental Systems Engineering; Feminist, Gender, & Sexuality Studies; Geological Sciences; History; Human Biology; Individually Designed Major; International Relations; Law; Linguistics; Management; Management Science & Engineering; Materials Science & Engineering; Math & Computational Science; Mechanical Engineering; Medicine; Philosophy; Political Science; Public Policy; Science, Technology, & Society; Sociology; Theater and Performance Studies; Undeclared

Ordering of Course Topics

§ Data Analysis & Visualization Using Spreadsheets

§ Advanced Data Visualization Using Tableau

§ Relational Databases and SQL

§ Python for Data Analysis & Visualization

§ Machine Learning – Regression, Classification, Clustering

§ Using Python for Machine Learning

§ The R Language

§ Data Mining Algorithms

§ Data Mining Using Python (and SQL)

§ Network Analysis

§ Unstructured Data

§ Correlation and Causation

Assigned Work

Assignment/Project | Assigned | Due

Assignment # Spreadsheets for Data Analysis and Visualization | April 13 | April 20

Project # Personal Data Analysis | April 13 | April 27 May 18

Assignment # Data Visualization Using Tableau, SQL | April 20 | April 30

Assignment # Python for Data Analysis and Visualization | April 30 | May 9

Assignment # Machine Learning, R Language | May 18 | May 25

Project # Movie-Rating Predictions | May 18 | June 1

Assignment # Data Mining, Network Analysis | May 28 | June 5

Honor Code

Under the Honor Code at Stanford, you are expected to submit your own original work for assignments, projects, and exams. On many occasions when working on assignments or projects (but never exams!) it is useful to ask others – the instructor, the TAs, or other students – for hints, or to talk generally about aspects of the assignment. Such activity is both acceptable and encouraged, but you must indicate on all submitted work any assistance that you received. Any assistance received that is not given proper citation will be considered a violation of the Honor Code. In any event, you are responsible for understanding, writing up, and being able to explain all work that you submit. The course staff will pursue aggressively all suspected cases of Honor Code violations, and they will be handled through official University channels.

Logistics

§ Units - 4 for undergraduates, 3-4 for graduates

§ WAYS requirement - Applied Quantitative Reasoning (WAY-AQR)

§ Textbook? No

§ Readings? Recommended

§ Class “attendance” – Expected

Ø Hands-on activities

Ø Only cursory notes

Ø All class material fair game for exams