Project information (CAMP), Assignments of Technology

This document contains information regarding a project for a camp.

Typology: Assignments

2024/2025

Uploaded on 03/07/2026

ira-chakraborty

Capstone Project

Throughout the summer, you have learned data science and its applications on various datasets. Now, use this knowledge to analyze a real-world dataset of your choice.

Dataset

The idea of this project is that you will use a dataset you are passionate about. It can be anything: a dataset from another class (e.g., think of any data you have worked with in Excel), a dataset you found online, or a dataset you gather yourself. Some ideas include, but are not limited to:

  • A dataset about a hobby you are interested in (e.g., vacation destinations, best beaches, fashion trends, Instagram, music)
  • A dataset about something you enjoy doing or watching (e.g., swimming, volleyball, Rocket League, Illini Football)
  • A dataset about a topic related to your major (e.g., economics, communications, political science)
  • Any dataset that means something to you.

The dataset:

  • Must be non-trivial, meaning it has at least 200 data points: the number of rows multiplied by the number of columns must be greater than or equal to 200.
  • Must NOT be a dataset we used in labs / class.

There is a link on Canvas dedicated to helping you find datasets!
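Once the data is loaded, you can check the 200-data-point rule directly. `df.shape` returns `(rows, columns)`, and their product is the number of data points. pandas also exposes the same product as `df.size`. The helper name `has_enough_data_points` and the small table below are made up for illustration; they are not part of the assignment.

```python
import pandas as pd


def has_enough_data_points(df: pd.DataFrame, minimum: int = 200) -> bool:
    """Return True if rows x columns is at least `minimum` (200 for this project)."""
    n_rows, n_cols = df.shape  # df.size gives the same product
    return n_rows * n_cols >= minimum


# Made-up example: 50 rows x 4 columns = 200 data points, which just qualifies.
example = pd.DataFrame({
    "a": range(50),
    "b": range(50),
    "c": range(50),
    "d": range(50),
})
print(example.shape)                    # (50, 4)
print(has_enough_data_points(example))  # True

# Dropping one row leaves 49 x 4 = 196 data points, which falls short.
print(has_enough_data_points(example.iloc[:49]))  # False
```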

Deliverables

  1. Project Report
  2. Presentation
  3. One Slide

Project Report

This is a Jupyter notebook (.ipynb) file similar to the ones we have worked with all summer long. It must have five sections, each clearly labeled with a Markdown cell. There is a guide to Markdown syntax linked on Canvas.
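If you want a starting point, Python's standard `json` module can generate a starter notebook with one Markdown heading per section. This works because an .ipynb file is plain JSON in the nbformat 4 layout. This is a minimal sketch under that assumption. The file name `capstone_template.ipynb` and the `##` heading style are hypothetical choices. Creating the cells by hand in Jupyter or Colab works just as well.

```python
import json

# Section titles from the assignment; the "## n." heading style is one possible choice.
SECTIONS = [
    "1. Dataset",
    "2. Exploratory Data Analysis",
    "3. Exploratory Data Visualization",
    "4. Data Science",
    "5. Overall Summary",
]


def markdown_cell(text: str) -> dict:
    """Build a Markdown cell in the nbformat 4 JSON layout."""
    return {"cell_type": "markdown", "metadata": {}, "source": [text]}


def code_cell() -> dict:
    """Build an empty code cell in the nbformat 4 JSON layout."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": [],
    }


cells = []
for title in SECTIONS:
    cells.append(markdown_cell(f"## {title}"))
    if title != SECTIONS[-1]:  # the summary section is Markdown only
        cells.append(code_cell())

notebook = {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}

# Hypothetical file name; open the result in Jupyter or upload it to Colab.
with open("capstone_template.ipynb", "w", encoding="utf-8") as f:
    json.dump(notebook, f, indent=1)
```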

The five sections:

  1. Dataset:
    • In Markdown, explain what dataset you chose and why you chose it. Include why it is meaningful to you and how you went about finding it. Then, in Python, load your dataset into a DataFrame. The code to do that in Colab is given below.

        import pandas as pd
        from google.colab import drive

        # Mount Google Drive so Colab can read files stored there.
        drive.mount('/content/drive')

        # Replace file.csv with the path to your own dataset in Drive.
        df = pd.read_csv('/content/drive/My Drive/file.csv')

  2. Exploratory Data Analysis:
    • In Markdown, explain which descriptive statistics give a broad overview of the data (e.g., size, shape, notable summary statistics). In Python, perform exploratory data analysis.
  3. Exploratory Data Visualization:
    • In Python, create at least one data visualization. It does not need to be complex, but it should showcase something from your EDA or data science analysis. In Markdown, provide at least a two-sentence summary of this result.
  4. Data Science:
    • In Markdown, clearly state at least one question you have about your dataset and how you plan to use Python to answer it. This may involve cleaning the data or selecting a subset of it. You can use any technique you learned this summer that goes beyond simple descriptive statistics: regression, hypothesis testing, correlation, simulation, or ideas from any of the labs or lectures. In Python, do the data science!
  5. Overall Summary:
    • In Markdown, summarize your dataset, findings, and visualization. A good summary gives a complete overview of your work in 1-2 paragraphs (a paragraph is at least 5 sentences) without going into the code. This might be the summary you would share in a future interview if someone asked you, “What is a data science project you did on your own?”
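For the Exploratory Data Analysis section, a handful of pandas calls covers the size, shape, and summary statistics mentioned above. The sketch below runs them on a small made-up table. The `genre`, `minutes`, and `streams` columns are hypothetical. With your own data, `df` would be the DataFrame loaded in the Dataset section.

```python
import pandas as pd

# Made-up stand-in for your dataset; in Colab, df would come from pd.read_csv(...).
df = pd.DataFrame({
    "genre": ["pop", "rock", "pop", "jazz", "rock", "pop"],
    "minutes": [3.2, 4.1, 2.9, 5.6, 3.8, 3.5],
    "streams": [120, 85, 150, 30, 95, None],
})

print(df.shape)                    # (rows, columns): the size of the dataset
print(df.dtypes)                   # data type of each column
print(df.isna().sum())             # number of missing values per column
print(df.describe())               # count, mean, std, min, quartiles, max of numeric columns
print(df["genre"].value_counts())  # how often each category appears
```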

Each section should contain several sentences AND several lines of code, with the exception of section three, where one line of code to create a plot is enough.
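In the Exploratory Data Visualization section, a single pandas call can produce the plot. This sketch assumes a made-up numeric column named `minutes` and a hypothetical output file name. The two label lines after the plot call are optional but make the figure easier to read.

```python
import matplotlib
matplotlib.use("Agg")  # draw off-screen when run as a script; not needed in Colab/Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up data standing in for your own DataFrame.
df = pd.DataFrame({"minutes": [3.2, 4.1, 2.9, 5.6, 3.8, 3.5, 4.4, 3.0]})

# The one line that makes the plot: a histogram of one numeric column.
ax = df["minutes"].plot(kind="hist", bins=5, title="Song length distribution")

# Optional labels that make the figure easier to read.
ax.set_xlabel("Length (minutes)")
ax.set_ylabel("Number of songs")
plt.savefig("minutes_hist.png")  # hypothetical file name
```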

You DO NOT need to explain any data science methods in your report. Assume that we know data science but not your specific dataset. Explain the results of your code and why you chose specific methods but not what each line of code does.
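The Data Science section might use a permutation test, which is hypothesis testing by simulation. It shuffles the group labels many times to see how often chance alone produces a difference as large as the one observed. The question, group names, and numbers below are all made up. In the report, you would explain the resulting p-value and why this test fits your question, not what each line does.

```python
import numpy as np

# Made-up question: do pop songs get more streams than rock songs?
pop = np.array([120, 150, 135, 160, 110, 145])
rock = np.array([85, 95, 100, 90, 130, 80])

observed = pop.mean() - rock.mean()

rng = np.random.default_rng(seed=0)  # fixed seed so the result is reproducible
combined = np.concatenate([pop, rock])
n_pop = len(pop)
n_sims = 10_000

# Shuffle the group labels many times to see what differences arise by chance alone.
simulated = np.empty(n_sims)
for i in range(n_sims):
    shuffled = rng.permutation(combined)
    simulated[i] = shuffled[:n_pop].mean() - shuffled[n_pop:].mean()

# One-sided p-value: share of shuffles at least as large as the observed difference.
p_value = np.mean(simulated >= observed)
print(f"Observed difference: {observed:.1f} streams")
print(f"Permutation p-value: {p_value:.4f}")
```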

Presentation

Your presentation will be short: 5-7 minutes, delivered to the entire class. Create a slideshow for the presentation; DO NOT just put your report on screen and read it to the class. The slides should consist mainly of graphs and pictures generated by your report, described with short bullet points rather than full paragraphs, and they should contain ZERO code.

The presentation should contain three key things:

  • A summary of the problem you are trying to analyze and the data you used to complete this analysis.
  • Mention the data science methods you used.
      - Good example: “I used simple linear regression and it showed me that a house of X size is estimated to be Y price.”
      - Bad example: “I used model.predict(df[ind]) to give me predictions for the price of houses.”
  • Discuss the results and conclusions.
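The good example above comes from a result like the one below. You fit a simple linear regression, then turn a prediction into a plain-language sentence. This sketch uses `numpy.polyfit` on a made-up set of house sizes and prices. Your class may have used a different library, such as the `model.predict` style mentioned in the bad example. The code itself belongs in the report; only the sentence belongs on a slide.

```python
import numpy as np

# Made-up data: house size (square feet) and sale price (dollars).
size = np.array([1000, 1500, 2000, 2500, 3000])
price = np.array([200_000, 260_000, 330_000, 390_000, 450_000])

# Simple linear regression: fit price = slope * size + intercept by least squares.
slope, intercept = np.polyfit(size, price, deg=1)

new_size = 1800
estimate = slope * new_size + intercept

# This plain-language sentence, not the code, is what goes on the slide.
print(f"A house of {new_size} sq ft is estimated to sell for about ${estimate:,.0f}.")
# A house of 1800 sq ft is estimated to sell for about $300,800.
```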

One Slide

This should be one slide containing only the most important information from your report and your presentation. It will be used at the gallery walk, so you should know everything on the slide without having to look at it. Examples are posted on Canvas.

The one slide should contain your 3-4 best plots, and each plot should provide unique information.
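One way to produce 3-4 distinct plots for the slide is a 2 x 2 grid of matplotlib subplots saved as a single image. The data, column names, and file name below are made up for illustration. Each panel shows a different aspect of the data so that no plot repeats another.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering when run as a script; not needed in Colab/Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in data.
df = pd.DataFrame({
    "genre": ["pop", "rock", "pop", "jazz", "rock", "pop", "jazz", "rock"],
    "minutes": [3.2, 4.1, 2.9, 5.6, 3.8, 3.5, 6.0, 4.4],
    "streams": [120, 85, 150, 30, 95, 140, 25, 90],
})

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

# Each panel answers a different question, so no two plots repeat information.
df["genre"].value_counts().plot(kind="bar", ax=axes[0, 0], title="Songs per genre")
df["minutes"].plot(kind="hist", bins=4, ax=axes[0, 1], title="Song length")
df.plot(kind="scatter", x="minutes", y="streams", ax=axes[1, 0], title="Length vs. streams")
df.groupby("genre")["streams"].mean().plot(
    kind="bar", ax=axes[1, 1], title="Average streams by genre"
)

fig.tight_layout()
fig.savefig("one_slide_plots.png", dpi=200)  # hypothetical file name; paste the image into the slide
```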