C797 data science and analytics Study Guide, Exams of Data Structures and Algorithms

Massachusetts Institute of Technology (MIT), Data Structures and Algorithms

Typology: Exams

2022/2023

Available from 05/24/2023

DrShirleyAurora 🇺🇸


C797 data science and analytics Study Guide

Column charts - ✔Use to compare data across categories (requires categorical or ordinal data; displays variables vertically)

Line graphs - ✔Use to display continuous data over time

Histogram - ✔Use to visually display the frequency distribution of continuous (quantitative) data, for example to check normality (whether the data are normally distributed around the mean)

Pie charts - ✔Use when you have one row or column of data and want to know how much one data point is in relation to the whole. Especially useful for displaying portions or percentages

Bar chart - ✔Similar to column charts but better when your labels are long (categorical or ordinal data; there are spaces between bars and it is displayed horizontally)

X-Y scatter plot - ✔Shows the relationship among numeric values in several data series, or plots two groups of numbers as one series of XY coordinates (requires continuous data)

Donut chart - ✔Shows the relationship of parts to a whole like a pie chart but can contain more than one data series (sections can be continuous or categorical parts of a whole)

Bubble charts - ✔Plot continuous data arranged in columns on a worksheet so that the X values are listed in the first column and the corresponding Y values and bubble size values are listed in adjacent columns.

Geospatial maps - ✔Visually depict the prevalence and occurrence of a condition or disease geographically using polygons. There are 2 types of data used in mapping: vector data and raster data

Raster Data - ✔A grid-based format for storing location-based data in a geographic information system in which each equally-sized cell or pixel contains a value that represents geographic data such as land.

Vector Data - ✔A format for storing location-based data in a geographic information system that uses latitude and longitude coordinates to represent geographic features with points, lines, and other complex shapes.
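
The difference between the raster and vector formats described above can be sketched with plain Python structures. This is only an illustration: the grid values (0 = water, 1 = land), the coordinates, and the helper names `raster_value` and `land_fraction` are all hypothetical and are not taken from any GIS library.

```python
# Raster data: a grid of equally sized cells, each holding one value.
# The codes are made up for illustration: 0 = water, 1 = land.
raster = [
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [0, 0, 1, 1],
]


def raster_value(grid, row, col):
    """Return the value stored in the cell at (row, col)."""
    return grid[row][col]


def land_fraction(grid):
    """Share of cells whose value is 1 (land in this made-up coding)."""
    cells = [value for row in grid for value in row]
    return sum(1 for value in cells if value == 1) / len(cells)


# Vector data: features described by (latitude, longitude) coordinates.
# All coordinates below are hypothetical.
vector_features = {
    "point": (42.36, -71.06),
    "line": [(42.36, -71.06), (42.37, -71.05)],
    "polygon": [(42.0, -71.0), (42.0, -70.0), (43.0, -70.0), (43.0, -71.0)],
}

print(raster_value(raster, 1, 1))
print(land_fraction(raster))
print(len(vector_features["polygon"]), "polygon vertices")
```

A raster answers "what value is at this cell?", while a vector feature answers "where exactly is this shape?", which is why the two are stored so differently.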


Nominal Data - ✔Categorical data with no inherent order: gender, type of pet, hair color, eye color

Ordinal Data - ✔An arbitrary numerical scale where the exact numerical value has no significance other than to rank a set of data points. Deals with the order or position of items such as words, letters, symbols, or numbers arranged in a hierarchical order: 1st place, 2nd place; big to small

Interval Data - ✔Data made up of consistent units or intervals but without a true zero. Higher numbers mean more of something while lower numbers mean less of something: temperature in °C or °F, calendar time. (Height and weight have a true zero, so they are ratio data.)

Ratio Data - ✔Data similar to interval data, except that it has a meaningful zero point and the ratio of two data points is meaningful: age (you can't be younger than age 0)

Measures of Center (Central Tendency) - ✔mean, median, mode

Interquartile range - ✔The difference between the upper and lower quartiles (find the median of the values below the overall median and the median of the values above it to get the first and third quartiles)

Midrange - ✔The sum of the lowest and highest data values, divided by 2

Range - ✔The difference between the highest and lowest scores in a distribution

Standard deviation - ✔A quantity calculated to indicate how far the values in a group typically deviate from the mean

Proportion - ✔A part of something expressed as a ratio or percentage of the whole

Frequency - ✔The count of something, not the sum (10 + 10 + 15 = 35, but the count is 3)

Slope equation - ✔m = (y2 - y1) / (x2 - x1)

Slope-intercept form - ✔y = mx + b, where m is the slope and b is the y-intercept of the line (where the line crosses the y-axis, x = 0, so y = b)

Box and whisker plot - ✔

Sample variance - ✔Standard deviation squared = s² (larger than the standard deviation whenever the standard deviation is greater than 1)

Standard deviation - ✔The square root of the variance = s
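
The summary measures listed above can be checked on a small data set. The following is a minimal sketch using Python's standard `statistics` module; the `scores` list and the helper names `quartiles_median_of_halves`, `summarize`, and `slope` are made up for illustration. The quartiles use the median-of-halves method described in the interquartile range entry, which is one of several conventions and may differ from what a spreadsheet or `statistics.quantiles` reports.

```python
from statistics import mean, median, mode, stdev, variance


def quartiles_median_of_halves(values):
    """First and third quartiles as the medians of the lower and upper halves.

    With an odd number of values, the middle value belongs to neither half.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half:] if n % 2 == 0 else data[half + 1:]
    return median(lower), median(upper)


def summarize(values):
    """Collect the measures of center and spread from the study guide."""
    q1, q3 = quartiles_median_of_halves(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "mode": mode(values),
        "range": max(values) - min(values),
        "midrange": (min(values) + max(values)) / 2,
        "iqr": q3 - q1,
        "sample_variance": variance(values),  # s squared
        "sample_sd": stdev(values),           # s, the square root of the variance
        "frequency": len(values),             # the count, not the sum
    }


def slope(x1, y1, x2, y2):
    """Slope of the line through (x1, y1) and (x2, y2): (y2 - y1) / (x2 - x1)."""
    return (y2 - y1) / (x2 - x1)


scores = [2, 4, 4, 5, 7, 9, 11]  # hypothetical data
for name, value in summarize(scores).items():
    print(f"{name}: {value}")
print("slope:", slope(1, 2, 3, 8))
```

Note that `variance` and `stdev` divide by n - 1 (the sample versions); `pvariance` and `pstdev` in the same module are the population versions.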

Null hypothesis - ✔The hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error.

Failing to reject the null hypothesis - ✔Indicates that there is not enough evidence of a statistically significant difference between the means of the groups in the study; it does not prove that the means are equal

Reject the null hypothesis - ✔When you have enough statistical evidence to show a difference or an association (the p value is less than alpha)

Alternative hypothesis - ✔The hypothesis that states there is a difference between two or more sets of data.

Empirical Rule (68-95-99.7 Rule) - ✔Only works with a normal distribution: about 68% of data falls within 1 standard deviation of the mean, about 95% within 2 SD, and about 99.7% within 3 SD.

Independent Variable (IV) - ✔The variable that a researcher actively manipulates and that, if the hypothesis is correct, will cause a change in the dependent variable

Dependent Variable (DV) - ✔The measured outcome of a study; the responses of the subjects in a study.

Discrete data (variables) - ✔Non-continuous, categorical variables with no numeric relationship between the categories (nominal; analyzed with chi-square). Weak: summaries are limited to counts, percentages, and the mode when using categorical data.

Chi-square test - ✔A statistical method of testing for an association between two categorical variables. Specifically, it tests for the equality of two frequencies or proportions.

Continuous data - ✔Data that can take any value (within a range); summarized with the mean, SD, and range

Likert Scale - ✔A numerical scale used to assess attitudes; includes a set of possible answers with labeled anchors on each extreme (strongly disagree, disagree ... strongly agree). Best shown with bar graphs

Sampling distribution - ✔A distribution of a statistic obtained by selecting all the possible samples of a specific size from a population

ANOVA (analysis of variance) - ✔An inferential statistical test for comparing the means of three or more groups
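
The percentages in the Empirical Rule entry above and the p-value decision rule can both be verified numerically. This is a small sketch using `statistics.NormalDist` from the Python standard library; the function names `share_within` and `reject_null` are hypothetical, and the default `alpha=0.05` is only a commonly used significance level assumed here, not a value given in the study guide.

```python
from statistics import NormalDist


def share_within(k, mu=0.0, sigma=1.0):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


def reject_null(p_value, alpha=0.05):
    """Reject the null hypothesis only when the p value is below alpha."""
    return p_value < alpha


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.4f}")

print(reject_null(0.01))  # p below alpha
print(reject_null(0.20))  # p above alpha
```

The shares do not depend on the particular mean or standard deviation, which is why the rule can be stated once for every normal distribution.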

t-test - ✔A statistical test used to evaluate the size and significance of the difference between two means

Correlation test - ✔Measures the strength of the relationship between two variables (Spearman's rank correlation is the non-parametric version). The closer the coefficient is to 1 or -1, the stronger the correlation; the closer to 0, the weaker. A coefficient of 0 corresponds to a flat (horizontal) best-fit line, meaning no linear correlation

F statistic - ✔A ratio of two measures of variance that is compared with a critical value to determine the significance of results and draw conclusions about the hypothesis. Used with ANOVA and Levene's test

Non-parametric tests - ✔Do not assume a normal distribution. Examples: chi-squared, Fisher's exact test, Mann-Whitney, Wilcoxon, and Kruskal-Wallis

Parametric tests - ✔Assume the sample is representative of the population and normally distributed. Use interval or ratio data only. Have more statistical power. Examples: t-test and ANOVA

Linear regression - ✔Y = a + b(x) + e; finding the best-fitting line by estimating the intercept a and the slope b

Multiple regression - ✔Y = a + b1(x1) + b2(x2) + e; a regression model that estimates the relationship between the dependent variable and two or more independent variables
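
The linear regression and correlation entries above can be tied together in one short calculation. The sketch below fits Y = a + b(x) + e by ordinary least squares and computes Pearson's correlation coefficient from the same sums; ordinary least squares is assumed here as the usual meaning of "best-fitting line", and the function name `least_squares` and the sample data are made up for illustration.

```python
from math import sqrt


def least_squares(xs, ys):
    """Return (a, b, r): intercept, slope, and Pearson correlation for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squared deviations and of cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx               # slope of the best-fitting line
    a = mean_y - b * mean_x     # intercept: the line passes through the means
    r = sxy / sqrt(sxx * syy)   # correlation, always between -1 and 1
    return a, b, r


# Hypothetical paired observations.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b, r = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]  # the e term for each point
print(f"Y = {a:.3f} + {b:.3f}(x), r = {r:.4f}")
print("sum of residuals:", round(sum(residuals), 10))
```

The slope b and the correlation r share the same numerator, so they always have the same sign, and b is 0 exactly when r is 0.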
