C797 data science and analytics Study Guide (1).docx, Exams of Nursing

Typology: Exams

2024/2025

Available from 05/23/2025

augustine-kinyua

Column charts - Use to compare data across categories (requires categorical or
ordinal data; displays variables vertically)
Line graphs - Use to display continuous data over time
Histogram - Use to visually display normality (normal distribution of
continuous/quantitative data; frequency of data around the mean) for a set of
data points
Pie charts - Use when you have one row or column of data and you want to know
how much one data point is in relation to the whole. Especially useful when
displaying portions or percentages
Bar chart - Similar to column charts but better to use when your labels are long
(categorical or ordinal data; there are spaces between bars and it is displayed
horizontally)
X-Y scatter plot - Shows the relationship among numeric values in several data
series, or plots two groups of numbers as one series of XY coordinates (requires
continuous data)
Donut chart - Shows the relationship of parts to a whole like a pie chart but can
contain more than one data series (can be continuous or categorical sections of a
whole)
Bubble charts - Plot continuous data arranged in columns on a worksheet so that
the X values are listed in the first column and the corresponding Y values and
bubble size values are listed in adjacent columns.
Geospatial maps - Visually depict the prevalence and occurrence of a condition or
disease geographically using polygons. There are 2 types of data used in mapping:
vector data and raster data
Raster Data - A grid-based format for storing location-based data in a
geographic information system in which each equally sized cell or pixel contains a
value that represents geographic data such as land.
Vector Data - A format for storing location-based data in a geographic
information system that uses latitude and longitude coordinates to represent geographic features with points, lines, and other complex shapes.
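
The difference between the two storage formats can be sketched with plain Python structures. This is only an illustration: the land-cover codes, the feature names, and every coordinate below are made up, and a real geographic information system would use dedicated file formats rather than lists.

```python
# Raster: a grid of equally sized cells, each holding one value.
# Hypothetical land-cover codes: 0 = water, 1 = land.
raster = [
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
]

# Vector: features described by (latitude, longitude) coordinates.
# All coordinates are invented for illustration.
clinic_point = (40.0, -75.0)                                  # a point
road_line = [(40.0, -75.0), (40.1, -75.1), (40.2, -75.1)]     # a line
county_polygon = [(40.0, -75.0), (40.0, -74.5),
                  (40.5, -74.5), (40.5, -75.0)]               # a polygon

# Counting cells with a given value is a typical raster operation.
land_cells = sum(cell == 1 for row in raster for cell in row)
print(land_cells)
```

In the raster, location is implied by a cell's row and column; in the vector features, location is stored explicitly as coordinates.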

(? is larger than deviation)
standard deviation - the square root of the variance (σ)
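
As a quick check of the relationship between variance and standard deviation, here is a minimal sketch using Python's standard library. The data values are hypothetical, and the population formulas (dividing by n) are an assumption; a sample calculation would divide by n - 1 instead.

```python
import math
import statistics

# Hypothetical data set (illustrative values only).
values = [2, 4, 4, 4, 5, 5, 7, 9]

# Population variance: the mean of the squared deviations from the mean.
variance = statistics.pvariance(values)

# Standard deviation: the square root of the variance.
std_dev = math.sqrt(variance)

print(variance, std_dev)
```

For these values the mean is 5, the variance is 4, and the standard deviation is 2, which matches what `statistics.pstdev(values)` returns.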

positively skewed distribution (right-skewed) - A distribution where the scores pile up on the left side and taper off to the right. (Mode < median < mean)
negatively skewed distribution (left-skewed) - A distribution in which most scores pile up at the right end of the scale. (Mean < median < mode)
Leptokurtic distribution - A frequency distribution that has a tendency toward peakedness (leaping curve)
mesokurtic distribution - Normal distribution curve
Platykurtic distribution - Flatter and more spread out than a normal curve. (Memory: 'Plat' sounds like 'flat')
sample statistic - A measurable characteristic of a sample. (For example, the sample mean, "x-bar")
population parameter - A characteristic or measure of a population. (Often can't be calculated directly due to population size; for example, the population mean, mu)
confidence interval - Statistical range, with a given probability, that takes random error into account
Lower Quartile (Q1) - The median of the lower half of a set of data.
Upper Quartile (Q3) - The median of the upper half of the data
Confidence interval formula - mean ± z × SE

  • For a 95% CI, z is 1.96, which can be rounded to 2 for easier calculation (t can be used in place of z, but you don't need to know that; it will nearly always just be 2 for this class)
p-value < 0.05 - Statistically significant; reject the null hypothesis (if p > 0.05, you fail to reject the null hypothesis)
Pearson's correlation - Lowercase r
multiple correlation coefficient - Uppercase R; measures the strength of the linear relationship among 3 or more variables (one outcome and two or more predictors)
Alpha Value (Level of Significance) - Probability of rejecting a true null hypothesis (type 1 error)
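
The confidence interval formula above (mean ± z × SE) can be worked through in a few lines. This is a sketch under stated assumptions: the sample values are hypothetical, and SE is taken to be the standard error of the mean (sample standard deviation divided by the square root of n), which the guide does not spell out.

```python
import math
import statistics

# Hypothetical sample (illustrative values only).
sample = [98, 102, 101, 97, 103, 99, 100, 100]

mean = statistics.mean(sample)

# Assumed definition: SE = sample standard deviation / sqrt(n).
se = statistics.stdev(sample) / math.sqrt(len(sample))

z = 1.96  # value given in the guide for a 95% CI
lower = mean - z * se
upper = mean + z * se

print(f"mean = {mean:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

Replacing 1.96 with 2, as the guide suggests, widens the interval only slightly.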

sampling distribution - A distribution of statistics obtained by selecting all the possible samples of a specific size from a population
ANOVA (analysis of variance) - An inferential statistical test for comparing the means of three or more groups
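
To make the sampling distribution definition concrete, the sketch below lists every possible sample of size 2 from a tiny population and computes each sample mean. The population values and the sample size are hypothetical, and sampling without replacement is an assumption made for the illustration.

```python
from itertools import combinations
from statistics import mean

# Hypothetical tiny population (illustrative values only).
population = [2, 4, 6, 8, 10]
sample_size = 2

# Every possible sample of the given size, drawn without replacement.
samples = list(combinations(population, sample_size))

# The sampling distribution of the mean: one sample mean per possible sample.
sample_means = [mean(s) for s in samples]

print(len(samples))
print(sorted(sample_means))
print(mean(sample_means), mean(population))
```

The mean of all the sample means equals the population mean, which is why a sample statistic such as the sample mean can be used to estimate a population parameter.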

t-test - A statistical test used to evaluate the size and significance of the difference between two means
correlation test - Non-parametric test that measures the strength of a relationship between two variables. The closer the coefficient is to 1 (or -1), the stronger the correlation. The closer to 0, the weaker the correlation. Since 0 corresponds to a flat slope, when the line is horizontal there is no correlation
F statistic - A ratio of two measures of variance that is compared with a critical value to determine the significance of results and draw conclusions about the hypothesis. Used with ANOVA and Levene's test
Non-parametric tests - Data are not normally distributed. Examples: chi-squared, Fisher's exact probability, Mann-Whitney, Wilcoxon, and Kruskal-Wallis
parametric tests - Sample is representative of the population and is normally distributed. Use interval or ratio data only. Have more statistical power. Examples: t-test and ANOVA
Linear regression - Y = a + b(X) + e; finding the best-fitting line by finding the slope
multiple regression - Y = a + b1(X1) + b2(X2) + e; regression model that estimates the relationship between the dependent variable and two or more independent variables
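
The linear regression equation Y = a + b(X) + e can be fitted by hand with ordinary least squares, and Pearson's r falls out of the same sums. This is a minimal sketch: the paired observations are hypothetical, and least squares is assumed as the meaning of "best-fitting line", which the guide does not state explicitly.

```python
import math

# Hypothetical paired observations (illustrative values only).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products around the means.
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)

b = sxy / sxx                    # slope
a = mean_y - b * mean_x          # intercept
r = sxy / math.sqrt(sxx * syy)   # Pearson's correlation coefficient

# e: the part of each Y value that the fitted line does not explain.
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]

print(f"Y = {a:.2f} + {b:.2f}(X), r = {r:.3f}")
```

The slope and r always share the same sign, and the residuals of a least-squares line sum to zero (up to floating-point rounding).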