C797 Data Science & Analytics UNIT 1 Study Guide, Exams of Nursing

C797 Data Science & Analytics UNIT 1 Study Guide

Typology: Exams

2022/2023

Available from 05/24/2023

DrShirleyAurora

C797 Data Science & Analytics UNIT 1
Study Guide
Descriptive statistics - numerical or graphical summaries of data, and may include
charts, graphs, and simple summary statistics such as means and standard deviations
to describe characteristics of a population sample
Inferential statistics - statistical techniques (e.g., chi-square test, the t test, the one-
way ANOVA) that allow conclusions to be drawn about the relationships found among
different variables in a population sample; how one variable is related to other variables
explanatory studies - Studies that have the primary purpose of elucidating the
relationships among variables; often collected through observational studies
chi-square test - statistical test of association between two categorical variables
Randomized controlled trials - experimental designs in which study participants are
randomly assigned to an intervention group or a control group
experimental studies are considered - the gold standard for causal inference
A study plan - a written presentation of how the researcher is going to obtain and
analyze the numerical data needed to answer the research questions.
Independent variables - those that are manipulated or that may affect the outcome
Dependent variables - variables that are expected to change in response to the
characteristics, exposures or interventions being studied
Research Designs most seen in Healthcare research - observational studies, quasi-
experimental studies, and experimental studies
Observational studies - a phenomenon is simply observed and no intervention is
instituted; appropriate when the purpose of the study is descriptive, when the
hypotheses are exploratory, or when it is not possible to manipulate the exposure being
studied
three main types of observational studies - cross-sectional studies, case-control
studies, and longitudinal studies (also referred to as "cohort studies")
Cross-sectional studies - collection of data on the study participants' current outcome
status and exposure status at one point in time

Case-control studies - collection of data about the study participants' current outcome status (e.g., whether or not they have the health outcome being studied) and past exposure status
Longitudinal studies - collect data at more than one point in time, following study participants forward in time to identify future outcomes; provide stronger evidence for causation than cross-sectional or case-control studies
Quasi-experimental & Experimental designs - the researcher is an active agent in the work; participants are followed forward in time; measurements are taken at two or more separate points in time; some type of intervention is studied; there is a treatment group and a control group; the two designs differ in the amount of control the experimenter has over external sources of bias and random error
Sampling - the process of selecting a portion of the population to represent the entire population
Random sampling - the selection of a group of subjects from a population so that each individual is chosen entirely by chance; also called "self-weighted sampling"
nonrandom sampling - convenience or subjective judgment is used to decide who is chosen for the sample
four stages of statistical analysis - data entry, cleaning the data, describing the sample, and inferential testing
Assumptions - statements that are taken to be true even though the direct evidence of the truth is either absent or not well documented
Limitations - weaknesses or handicaps that potentially limit the validity of the results
Delimitations - boundaries within which the study was deliberately confined
Dissemination Plan - getting the information learned from the work out into the world for use (peer-reviewed journals, conferences, etc.)
Data governance - has an operational focus on policies, processes, and practices that address accuracy, validity, completeness, timeliness, and integrity of the data; applies to data at the business unit, functional area, or departmental level
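The difference between random and nonrandom sampling can be sketched in a few lines of Python. This is only an illustration: the population of 100 participant IDs, the sample size of 10, and the seed are all assumptions made for the example, not values from the study guide.

```python
import random

# Hypothetical population: 100 participant IDs (illustrative only).
population = list(range(1, 101))

# Random sampling: every individual has the same chance of being chosen.
rng = random.Random(42)  # fixed seed so the draw is reproducible
random_sample = rng.sample(population, k=10)

# Nonrandom (convenience) sampling: take whoever is easiest to reach,
# modeled here as simply the first 10 IDs on the list.
convenience_sample = population[:10]

print("random sample:     ", sorted(random_sample))
print("convenience sample:", convenience_sample)
```

The convenience sample always contains the same low-numbered IDs, so anything that correlates with position on the list would bias it; the random draw has no such built-in pattern.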

Deidentification - patient identifiers are removed from health information; mitigates privacy risks
Reliability of a scale - indicates how free it is from random error
test-retest reliability - one of the ways to measure the reliability of a test; giving the test to the same people on two different occasions to see if you get the same results
internal consistency - a measure of how reliable a test is; it verifies that the items making up the scale are measuring the same attribute
validity of a scale - the degree to which it measures what it is supposed to measure (content validity, construct validity, criterion validity)
closed or open-ended questionnaire - closed: respondents are given a defined number of answers to the questions, making it easy to code. Open-ended: respondents are free to respond in their own way, not restricted by a defined list of answers; answers can be categorized for data collection
descriptive statistics - describe the findings within the population being studied; results pertain only to that population, with no attempt to generalize
inferential statistics - drawing conclusions about a population based on sample data from that population
central tendency - the sample mean (average), median (midpoint), mode (most frequently occurring value)
measures of variability - range (difference between the largest and smallest values), variance (how far the numbers are spread out), standard deviation (how much variation exists from the average/mean)
skewness - how symmetrical the distribution of a variable is
kurtosis - peakedness or flatness of a distribution of the sample data
shape of a sample - includes modality and outliers of the sample data
problems with using only "the mean" as your marker for central tendency - if there are extreme outliers, they will pull the mean higher or lower than the typical value really is
standard deviation - average distance of scores from the mean
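The measures of central tendency and variability above, and the warning about relying only on the mean, can be checked with Python's standard `statistics` module. The length-of-stay numbers below are made up for illustration, and the helper name `describe` is an assumption of this sketch.

```python
import statistics

# Hypothetical lengths of stay in days (illustrative data, not from the guide).
stays = [2, 3, 3, 4, 5, 3, 4]
stays_with_outlier = stays + [60]  # one extreme value added

def describe(values):
    """Return basic measures of central tendency and variability."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "std_dev": statistics.stdev(values),      # sample standard deviation
    }

base = describe(stays)
skewed = describe(stays_with_outlier)

for name in ("mean", "median", "mode", "range"):
    print(f"{name:>6}: without outlier = {base[name]}, with outlier = {skewed[name]}")
```

A single outlier moves the mean from about 3.4 to 10.5 days, while the median only shifts from 3 to 3.5, which is why the median is often reported alongside the mean for skewed data.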

probability - reported as a "p-value" in inferential statistics; relates to the confidence level in your results: when you make inferences from sample data, how sure are you that they hold true for the population?
"a priori" probability - aka theoretical or classical probability; used when making statistical inferences about the data; the probability can be inferred without actually collecting the data first
"a posteriori" probability - aka empirical or relative frequency probability; estimated from the data after the data is collected
sample space - the set of all possible outcomes of a study
probability distribution - the set of probabilities associated with each event in the sample space
p(A) - the marginal probability that event A will occur
p(Ā) - the probability that event A will not occur
p(A|B) - the conditional probability that event A will occur if event B occurs
p(A∩B) - the joint probability that both events A and B will occur; also called the intersection of A and B
p(A∪B) - the probability that event A will happen and/or event B will happen; also called the union of A and B
Addition rule - p(A∪B) = p(A) + p(B) − p(A∩B); used to compute the probability that either one of two events will occur, meaning one, the other, or both will occur
Multiplication rule - p(A∩B) = p(B) × p(A|B); allows certain types of probabilities to be computed from other probabilities
Independence of events A and B - if p(A) = p(A|B), then A and B are independent
Probability theory -
  1. The probability that each event will occur must be greater than or equal to 0 and less than or equal to 1.
  2. The sum of the probabilities of all the mutually exclusive outcomes of the sample space is equal to 1. Mutually exclusive outcomes are those that cannot occur at the same time (e.g., on any given flip, a coin can be either heads or tails but not both).
  3. The probability that either of two mutually exclusive events, A or B, will occur is the sum of their individual probabilities.
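The probability notation and rules above can be verified on a small sample space. The sketch below assumes one roll of a fair six-sided die; the events A (even roll), B (roll greater than 3), and C (roll of 1 or 2), and the helper names `p` and `p_given`, are chosen only for illustration.

```python
from fractions import Fraction

# Sample space for one roll of a fair six-sided die (equally likely outcomes).
sample_space = {1, 2, 3, 4, 5, 6}

def p(event):
    """Probability of an event: favorable outcomes / all outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

def p_given(a, b):
    """Conditional probability p(a|b) for equally likely outcomes."""
    return Fraction(len(a & b), len(b))

A = {2, 4, 6}   # roll is even
B = {4, 5, 6}   # roll is greater than 3
C = {1, 2}      # roll is 1 or 2

# Addition rule: p(A or B) = p(A) + p(B) - p(A and B)
union_by_rule = p(A) + p(B) - p(A & B)

# Multiplication rule: p(A and B) = p(B) * p(A|B)
joint_by_rule = p(B) * p_given(A, B)

# Independence check: A and an event X are independent if p(A) == p(A|X)
a_b_independent = p(A) == p_given(A, B)
a_c_independent = p(A) == p_given(A, C)

print("p(A or B)  =", union_by_rule, "direct:", p(A | B))
print("p(A and B) =", joint_by_rule, "direct:", p(A & B))
print("A, B independent?", a_b_independent)
print("A, C independent?", a_c_independent)
```

Using `Fraction` keeps the arithmetic exact, so the rule-based results can be compared to the directly counted probabilities with `==` rather than a floating-point tolerance.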

chi-square test - a statistical method of testing for an association between two categorical variables; specifically, it tests for the equality of two frequencies or proportions
one sample t-test - allows us to test whether a sample mean significantly differs from a hypothesized value
parameter - an unknown value for an entire population; it therefore has to be estimated using a smaller sample representative of the entire population
Greek letters represent - the population data
Roman letters represent - the sample data
statistics - the collection, organization, & interpretation of data; used to describe/summarize data, identify associations & relationships, and make predictions or generalizations
data - factual information expressed in numbers (numeric variables) or narrative form (string variables); reported as a mean, mode, and frequency distribution
qualitative research utilizes what kind of data - narrative (string variables)
quantitative research utilizes what kind of data - numeric form (numeric variables)
variables - a characteristic being measured that varies among individuals, events, or objects; an attribute that describes a person/place/thing
univariate vs. bivariate data - when a study looks at one variable vs. two variables
independent variable - the variable the researcher manipulates; the "cause"
dependent variable - the outcome the researcher measures; the "effect"
discrete (non-continuous) quantitative variables - a variable that can take on only distinct, countable values, with nothing in between (e.g., number of children, number of hospital visits)
continuous quantitative variable - a variable that can take on any value between its minimum and its maximum value (e.g., height, blood glucose levels, blood pressures)
nominal data - lowest level of data measurement; "name only" categories of data such as gender or ethnicity
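To make the chi-square test and the one sample t-test more concrete, the sketch below computes both test statistics by hand with the standard library. The 2x2 table, the sample values, and the hypothesized mean are invented for illustration, and the function names are assumptions of this example; converting a statistic to a p-value needs a distribution table or a statistics library and is left out.

```python
import math

def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(sample)
    mean = sum(sample) / n
    variance = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
    standard_error = math.sqrt(variance / n)
    return (mean - mu0) / standard_error, n - 1

# Hypothetical 2x2 table: rows = exposed / unexposed, columns = outcome yes / no.
table = [[30, 20],
         [10, 40]]
chi2, chi2_df = chi_square_statistic(table)

# Hypothetical sample compared against a hypothesized mean of 5.
t_stat, t_df = one_sample_t([4, 6, 8, 6, 6], mu0=5)

print(f"chi-square = {chi2:.3f} with df = {chi2_df}")
print(f"t = {t_stat:.3f} with df = {t_df}")
```

For reference, the critical chi-square value at the 0.05 level with 1 degree of freedom is 3.841, so a statistic above that would be judged significant at that level.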

Ordinal data - can be placed in meaningful order but has no true zero and does not have meaningful intervals, such as a pain scale or health status (poor, fair, good)
Interval data - placed in meaningful order and have meaningful intervals but no true zero, such as temperature in Fahrenheit or Celsius
Ratio data - placed in a meaningful order, have meaningful intervals, and have a true zero, such as age, height, or weight
inferential statistics - drawing conclusion