C797 2023 Data Science and Analytics Study Guide., Exams of Nursing

Typology: Exams

2022/2023

Available from 08/31/2023

Terrie001
Column charts - ✔Use to compare data across categories (requires
categorical or ordinal data; displays variables vertically)
Line graphs - ✔Use to display continuous data over time
Histogram - ✔Use to visually check normality (how the frequencies of
continuous, quantitative data are distributed around the mean) for a set of
data points
Pie charts - ✔Use when you have one row or column of data and you want to
know how large one data point is in relation to the whole. Especially useful
when displaying portions or percentages
bar chart - ✔Similar to column charts but better to use when your labels are
long (categorical or ordinal data; there are spaces between bars and the bars
are displayed horizontally)
x-y scatter plot - ✔Shows the relationship among numeric values in several
data series, or plots two groups of numbers as one series of XY coordinates
(requires continuous data)
donut chart - ✔Shows the relationship of parts to a whole like a pie chart but
can contain more than one data series (the sections of a whole can be
continuous or categorical)
Bubble charts - ✔Plot continuous data arranged in columns on a worksheet so
that the X values are listed in the first column and the corresponding Y
values and bubble size values are listed in adjacent columns.
Geospatial maps - ✔Visually depict the prevalence and occurrence of a
condition or disease geographically using polygons. There are two types of
data used in mapping: vector data and raster data
Raster Data - ✔A grid-based format for storing location-based data in a
geographic information system in which each equally sized cell or pixel
contains a value that represents geographic data such as land cover.
Vector Data - ✔A format for storing location-based data in a geographic
information system that uses latitude and longitude coordinates to represent
geographic features with points, lines, and other complex shapes.
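
To make the raster/vector distinction concrete, here is a minimal Python sketch. The grid values, the coordinates, and the names `raster`, `clinic`, `road`, `district`, and `land_fraction` are made up for illustration and do not come from the study guide.

```python
# Raster data: a grid of equally sized cells, each holding one value.
# Hypothetical land-cover codes: 0 = water, 1 = land.
raster = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]

# Vector data: features described by (latitude, longitude) coordinates.
clinic = (40.76, -111.89)                       # a point
road = [(40.70, -111.95), (40.76, -111.89)]     # a line: ordered points
district = [(40.7, -112.0), (40.7, -111.8),     # a polygon: corner points
            (40.8, -111.8), (40.8, -112.0)]


def land_fraction(grid):
    """Share of raster cells coded as land (value 1)."""
    cells = [value for row in grid for value in row]
    return sum(cells) / len(cells)


print(land_fraction(raster))
```

The raster answers "what is at this cell?", while the vector features answer "where exactly is this thing?", which is why disease-prevalence maps drawn with polygons are typically built on vector data.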

Nominal Data - ✔Categorical: gender, type of pet, hair color, eye color
ordinal data - ✔an arbitrary numerical scale where the exact numerical value has no significance other than to rank a set of data points. Deals with the order or position of items such as words, letters, symbols, or numbers arranged in a hierarchical order: 1st place, 2nd place; big to small
Interval Data - ✔Data made up of consistent units or intervals but without a true zero. Higher numbers mean more of something while lower numbers mean less of something: temperature in °C or °F, time of day
Ratio Data - ✔Data that is similar to interval data, except that it has a meaningful zero point and the ratio of two data points is meaningful: age (you can't be younger than age 0), height, weight
Measures of Center (Central Tendency) - ✔mean, median, mode
interquartile range - ✔The difference between the upper and lower quartiles (find the median of the lower half and the median of the upper half of the data, on either side of the overall median, to get your first and third quartiles)
Midrange - ✔the sum of the lowest and highest data values, divided by 2
range - ✔the difference between the highest and lowest scores in a distribution
standard deviation - ✔a quantity calculated to indicate the extent of deviation for a group as a whole
proportion - ✔A part of a whole, expressed as a ratio or percentage
Frequency - ✔The count of something, not the sum (10 + 10 + 15 = 35, but the count is 3)
Slope equation - ✔m = (y2 - y1) / (x2 - x1)
slope-intercept form - ✔y = mx + b, where m is the slope and b is the y-intercept of the line (where the line crosses the Y axis, x = 0, so y = b)
box and whisker plot - ✔
sample variance - ✔Standard deviation squared = s² (larger than the standard deviation whenever the standard deviation is greater than 1)
standard deviation - ✔the square root of the variance = s
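
The measures of spread defined above can be checked on a small data set. This is a minimal sketch using Python's standard `statistics` module; the `scores` list is hypothetical, and the quartiles use the median-of-halves method described in the interquartile range entry (other software may use slightly different quartile rules).

```python
import statistics

scores = [2, 4, 4, 5, 7, 9, 10, 12]   # hypothetical data, not from the guide

frequency = len(scores)                        # the count, not the sum
data_range = max(scores) - min(scores)         # highest minus lowest
midrange = (max(scores) + min(scores)) / 2     # (lowest + highest) / 2

# Quartiles by the median-of-halves method: split the sorted data around
# the overall median, then take the median of each half.
ordered = sorted(scores)
half = len(ordered) // 2
q1 = statistics.median(ordered[:half])         # first quartile
q3 = statistics.median(ordered[-half:])        # third quartile
iqr = q3 - q1                                  # interquartile range

variance = statistics.variance(scores)         # sample variance (n - 1)
std_dev = statistics.stdev(scores)             # square root of the variance

print(frequency, data_range, midrange, q1, q3, iqr)
print(round(variance, 3), round(std_dev, 3))
```

Squaring `std_dev` gives back `variance`, which is the relationship the two flashcards on sample variance and standard deviation describe.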

null hypothesis - ✔the hypothesis that there is no significant difference between specified populations, any observed difference being due to sampling or experimental error
failing to reject the null hypothesis - ✔indicates that the study did not find a statistically significant difference between the means of the groups; it does not prove that the means are equal
reject the null hypothesis - ✔when you have enough statistical strength to show a difference or an association (the p-value is less than alpha)
alternative hypothesis - ✔The hypothesis that states there is a difference between two or more sets of data
Empirical Rule (68-95-99.7 Rule) - ✔Only works with a normal distribution: 68% of data falls within 1 standard deviation on either side of the mean, 95% within 2 SD, and 99.7% within 3 SD
Independent Variable (IV) - ✔the variable that a researcher actively manipulates and that, if the hypothesis is correct, will cause a change in the dependent variable
Dependent Variable (DV) - ✔The measured outcome of a study; the responses of the subjects in a study
Discrete data (variables) - ✔Non-continuous, categorical variables with no relationship between the categories (nominal data; analyzed with chi-square). Weak: limited to number, percent, and mode when using categorical data
Chi-square test - ✔A statistical method of testing for an association between two categorical variables. Specifically, it tests for the equality of two frequencies or proportions
continuous data - ✔Data that can take any value (within a range); has a mean, SD, and range
Likert Scale - ✔a numerical scale used to assess attitudes; includes a set of possible answers with labeled anchors on each extreme (strongly disagree, disagree, ..., strongly agree); best shown with bar graphs
sampling distribution - ✔a distribution of statistics obtained by selecting all the possible samples of a specific size from a population
ANOVA (analysis of variance) - ✔an inferential statistical test for comparing the means of three or more groups
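
The empirical rule percentages can be reproduced from the standard normal distribution, and the usual decision rule for a hypothesis test fits in one line. This is a minimal sketch using `statistics.NormalDist` from the Python standard library; the helper names `share_within` and `decide`, the default alpha of 0.05, and the example p-values are assumptions chosen for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def share_within(k):
    """Fraction of a normal distribution within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


def decide(p_value, alpha=0.05):
    """Reject the null hypothesis only when the p-value is below alpha."""
    return "reject null" if p_value < alpha else "fail to reject null"


for k in (1, 2, 3):
    print(k, round(share_within(k), 4))   # roughly 68%, 95%, 99.7%

print(decide(0.03))   # hypothetical p-value below alpha
print(decide(0.20))   # hypothetical p-value above alpha
```

Note that "fail to reject" is the only alternative to "reject": a large p-value means the data did not show a difference, not that the groups are proven equal.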

t-test - ✔a statistical test used to evaluate the size and significance of the difference between two means
correlation test - ✔Test that measures the strength of the relationship between two variables (Pearson's r is parametric; Spearman's rank correlation is non-parametric). The closer the coefficient is to 1 or -1, the stronger the correlation. The closer to 0, the weaker the correlation. A correlation of 0 corresponds to a flat slope, so when the best-fitting line is horizontal there is no correlation
F statistic - ✔a ratio of two measures of variance that is compared with a critical value to determine the significance of results and draw conclusions about the hypothesis. Used with ANOVA and Levene's test
Non-parametric tests - ✔Do not assume a normal distribution. Examples: chi-square, Fisher's exact test, Mann-Whitney, Wilcoxon, and Kruskal-Wallis
parametric tests - ✔Assume the sample is representative of the population and is normally distributed. Use interval or ratio data only. Have more statistical power. Examples: t-test and ANOVA
Linear regression - ✔Y = a + b(x) + e; finding the best-fitting line by estimating the intercept a and the slope b
multiple regression - ✔Y = a + b1(x1) + b2(x2) + e; a regression model that estimates the relationship between the dependent variable and two or more independent variables
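
As an illustration of the linear regression and correlation entries, the sketch below fits Y = a + b(x) + e by ordinary least squares and computes Pearson's correlation coefficient by hand. The `x` and `y` values are hypothetical, and least squares is assumed here as the meaning of "best-fitting"; the guide itself does not name the fitting method.

```python
from statistics import mean

x = [1, 2, 3, 4, 5]   # hypothetical independent variable
y = [2, 4, 5, 4, 5]   # hypothetical dependent variable

x_bar, y_bar = mean(x), mean(y)

# Sums of squares and cross-products around the means.
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b = sxy / sxx                    # slope of the best-fitting line
a = y_bar - b * x_bar            # intercept: the line passes through the means
r = sxy / (sxx * syy) ** 0.5     # Pearson correlation coefficient

# e in the model: what the fitted line leaves unexplained for each point.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(round(a, 3), round(b, 3), round(r, 3))
```

The slope and the correlation always share a sign, and a slope of 0 gives r = 0, which matches the flashcard's point that a horizontal line means no correlation.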