Fundamentals of Statistical Analysis, Exams of Research Methodology

An overview of key concepts in statistical analysis, including the differences between categorical, interval, and ratio variables, the concepts of random variation and frequency distributions, measures of dispersion and shape, sampling distributions, hypothesis testing, and common statistical tests. It covers important principles of the scientific method, research process, and ethical considerations in data collection and analysis. Likely useful for students in a variety of university-level courses related to statistics, research methods, and data analysis across disciplines such as psychology, sociology, business, and public health.

Typology: Exams

2023/2024

Available from 10/28/2024

wil-mug

QUANTITATIVE RESEARCH METHODS; STUDY GUIDE EXAMS WITH ANSWERS / GUARANTEED PASS

What is a nominal/categorical variable? - ANSWER A variable that is not inherently numeric, e.g. eye colour, sex.

What is an ordinal variable? - ANSWER A categorical variable whose possible values have an inherent order, but where differences between values are not meaningful, e.g. exam grade, ranking pain on a scale of 1-10.

What is a continuous variable? - ANSWER Inherently numeric, differences in values are meaningful, and there are many possible values, e.g. weight, height, BP, age.

What are the 2 different types of continuous variables? - ANSWER Interval and ratio.

Difference between interval and ratio scale? - ANSWER On an interval scale, the definition of zero is arbitrary, e.g. temperature in °C. On a ratio scale, zero isn't arbitrary, e.g. weight, height, so it makes sense to compute the ratio of 2 values.

What is systematic variation? - ANSWER Variation due to identified (and measured) factors, i.e. variation in one variable attributable to variation in other variables. Also called 'between group' variation or 'co-variation'. It is captured by models.

What is random variation? - ANSWER Variation due to unidentified (and unmeasured) factors. Also called 'within group' variation or 'error' variation. It is variation in a variable not attributed to variation in other variables, so it is NOT explained by the model.

What is a frequency distribution? - ANSWER A table listing the values (or ranges of values) of a variable together with the observed frequencies with which they occur in a sample. E.g. coin flipping: 12 observations, 1 variable (side). If heads comes up 5 times and tails 7 times, the frequency distribution is H: 5, T: 7.

What is a relative frequency distribution? - ANSWER Lists relative frequencies for each value, rather than absolute frequencies (basically a proportion/percentage).

What is a common method for visualising frequency distributions?
- ANSWER Histograms. The relative area under the bars corresponds to probability.

What is meant by the 'location' on a frequency distribution/histogram? - ANSWER The average or 'central' value, e.g. mean, median.

What's the best way to describe location in symmetrical/skewed graphs? - ANSWER Use the mean if the distribution is symmetrical. If it is quite skewed, use the median as the measure of location, with the interquartile range as the matching measure of spread (the median and IQR aren't as affected by extreme values).

How do you describe dispersion/spread of a graph? - ANSWER Variance, standard deviation, range.

What are the two ways of describing the shape of a graph? - ANSWER Skewness and kurtosis.

What is skewness? - ANSWER A measure of how asymmetrical a distribution is. If it's 0, the distribution is not skewed.

What is the standard error of the mean? - ANSWER A standard error is the standard deviation of the sampling distribution of a statistic. The standard error of the mean measures how precisely a sample mean estimates the population mean.

How do you calculate the standard error? - ANSWER Standard deviation divided by the square root of the sample size.

How do you calculate the distribution of the sample mean? E.g. N (4.7, 0.36), sample size 100 - ANSWER Divide the variance by the sample size: 0.36/100 = 0.0036. So the sample mean is distributed N (4.7, 0.0036).

How to get the standard error of the sample mean from this: N (170, 100), sample size 100 - ANSWER The sample mean is distributed N (170, 100/100), so N (170, 1). Take the square root of the variance, 1, to get SE = 1.

The height of a population can be assumed to follow a normal distribution with mean 173.7 cm and standard deviation 3 cm. What range of heights would you expect 95% of the population to lie within? - ANSWER Mean +/- (1.96 x SD):
173.7 - (1.96 x 3) = 173.7 - 5.88 = 167.82
173.7 + (1.96 x 3) = 173.7 + 5.88 = 179.58
So the range is 167.82 to 179.58 cm.

Outline of hypothesis testing? - ANSWER
(1) Definition of population and variable(s) of interest
(2) Assume a 'statistical model' (a distribution)
(3) Provide a 'Null Hypothesis', denoted 'H0:', in terms of parameter values for the population (e.g.
μ = 0)
(4) Comparison of observed data with that expected from (2) and (3) (often involving a test statistic)
(5) A statement of probability: how likely was the observed data if (2) and (3) are correct?
(6) Interpretation: is there evidence against H0, should it be rejected?

What is the null hypothesis? - ANSWER The hypothesis that assumes no effect, e.g. no difference between the 2 groups.

What is the test statistic? - ANSWER After collecting the data, you substitute values from the sample into a formula, which gives the value of the test statistic. This reflects the amount of evidence in the data against the null hypothesis.

What is the p-value? - ANSWER The P-value is the probability of obtaining the observed results, or something more extreme, if the null hypothesis is true.

What does the value of the p-value say about the null hypothesis? - ANSWER The smaller the P-value, the greater the evidence against the null hypothesis. Conventionally, if the P-value is less than 0.05, there is considered to be sufficient evidence to reject the null hypothesis, and the results are called significant.

What does it mean if the p-value is over 0.05? - ANSWER There is insufficient evidence to reject the null hypothesis. This doesn't mean the null hypothesis is true, just that there isn't enough evidence against it.

What is a t test? - ANSWER A very commonly used test based on the assumption of a normal distribution, used to make inferences about the mean.

What are the 2 types of t test? - ANSWER
• Paired/single sample t test: before and after test
• Two sample/simple independent t test: difference between 2 independent groups

What is a type 1 error? - ANSWER Rejecting the null hypothesis when it is true, and concluding that there is an effect when actually there is none.

What is a type 2 error? - ANSWER Not rejecting the null hypothesis when it is false, and concluding that there is no evidence of an effect when one really exists.

How is alpha related to type 1 errors?
- ANSWER
• The maximum chance of making a type 1 error is denoted by alpha
• This is the significance level of the test: we reject the null hypothesis if the P-value is less than the significance level, i.e. if P < alpha
• The value of alpha must be decided before collecting data; it is usually set to 0.05

What is the power? - ANSWER The power is the probability of rejecting the null hypothesis when it is false, i.e. THE PROBABILITY OF FINDING A DIFFERENCE/RESULT IF ONE EXISTS. Ideally you'd want the power to be 100%, but there's always a chance of a type 2 error.

What are assumptions of all statistical tests? - ANSWER
• Random (or representative) sample
• Independent observations: statistical results are only valid when all subjects are sampled from the same population and each has been selected independently of the others
• Accurate data (i.e. no systematic errors in data collection)

What's an assumption of t tests? - ANSWER For a paired test, assume that the differences (changes) within the one group are normally distributed. With 2 independent groups, assume that the results in both groups are normally distributed.

What do confidence intervals do? - ANSWER Confidence intervals express precision or margin of error, and so let you make a general conclusion from limited data.

What does a 95% CI mean? - ANSWER Informally, you're 95% sure that the true value lies within that range. More precisely, if the study were repeated many times, about 95% of the intervals calculated this way would contain the true value.

How do you calculate a CI? - ANSWER For a 95% CI, take the difference in means +/- (1.96 x SE), often rounded to 2 x SE.

What is the effect size? - ANSWER The size of the change the researchers want to be able to detect, e.g. how much they want the drug to change things by.

What does the variability mean when calculating sample size? - ANSWER When comparing means, the required sample size depends on the expected value of the standard deviation. If there is a lot of variation, you'll need bigger samples. E.g. estimate the baseline rate to be 30%.

When doing a case control study, what do you usually calculate at the end?
- ANSWER Odds ratio.

Say in a case control study: odds of smoking in lung cancer = 647/2 = 323.5; odds of smoking in no lung cancer = 622/27 = 23.04. Odds ratio? What do you infer from it? - ANSWER 323.5/23.04 = 14.04 ~ 14 → Therefore lung cancer patients have 14 times the odds of smoking compared with non-lung cancer patients. Equivalently, smokers have 14 times the odds of being lung cancer patients compared with non-smokers.

What is recall bias? - ANSWER If cases recall their exposure differently from controls, this is recall bias. It is more of a problem in case control studies.

What is selection bias in case control studies? - ANSWER
• Controls are used to estimate the exposure rate in the population, so selection bias will occur if the controls are not representative
• Can also occur if exposed cases are more likely to be selected than unexposed cases

Advantages of case control studies? - ANSWER
• Quick, no follow-up period
• Can study relatively rare diseases

Disadvantages of case control studies? - ANSWER
• Recall bias
• Only study one disease

Definition of bias? - ANSWER Bias is a systematic departure from the true value of a measure.

How do you avoid selection bias in RCTs? - ANSWER Randomisation.

How do you avoid performance bias in RCTs? - ANSWER Equal treatment of groups apart from the intervention, blinding.

What is stratified randomisation? - ANSWER Strata are constructed based on values of prognostic variables, and randomisation is carried out within each stratum. E.g. stratified by sex.

What is intention to treat analysis? - ANSWER ITT analysis includes every subject who is randomized, analysed according to their randomized treatment assignment. It ignores noncompliance, protocol deviations, withdrawal, and anything else that happens after randomization → so even if someone who was supposed to get drug A gets drug B, they are still analysed in the drug A group, because you INTENDED to treat them with drug A.

What is per-protocol analysis?
- ANSWER This analysis is restricted to the participants who fulfil the protocol. So if 100 people were supposed to get the treatment and 20 don't end up getting it, those 20 are not included in the analysis.

What is the advantage of ITT? - ANSWER It avoids the bias associated with non-random loss of participants, and it reflects what happens in the real world.

What is the difference between parametric and non-parametric tests? - ANSWER Tests that are based on an assumption about the distribution of values in the population (usually Normal) are called parametric tests. Nonparametric tests do not assume sampling from a Normal distribution.

What does a paired test mean? - ANSWER Usually means that a variable is measured in each subject before and after an intervention. It can also be:
• Twins or siblings recruited as pairs, where each gets a different treatment
• A part of the body on one side is treated with control, and the other side with the experimental treatment (e.g. eyes)

What does an unpaired t test do? - ANSWER
• Compares two means, i.e. tests the difference between the means of 2 independent groups
• E.g. comparing pulse rate in groups of people taking 2 different drugs

What is the Mann-Whitney test used for? - ANSWER
• Compares the average ranks or medians of two unpaired groups, i.e. looks for a difference between the 2 medians
• E.g. comparing the self-reported pain score (1-10) of patients taking 2 different drugs
• Data don't have to follow a Normal distribution (so it can be used if that assumption is violated)

What is a Wilcoxon matched paired test used for? - ANSWER
• Compares the average ranks or medians of 2 paired groups
• E.g. comparing the amount of skin inflammation (assessed on a scale of 1-10) between the right arm treated with one cream and the left arm treated with another

What is the Chi-squared test used for? - ANSWER Basically, the chi squared test checks whether the distributions of categorical variables differ from each other.
A large chi squared test statistic (and so a small P-value) means that there is evidence of a relationship. So it's good for analysis of cohort/case-control results.

What is an assumption of the chi-squared test? - ANSWER That the 'expected' count is greater than 5 in all cells of the table.

Important Features of the Scientific Method - ANSWER
• Direct observation
• Clearly defined variables
• Clearly defined methods
• Empirically testable
• Elimination of alternatives
• Statistical justification
• Self-correcting process

The Steps in the Research Process - ANSWER
1. Asking the question: identifying the need for a question to be answered or a problem to be solved.
2. Identifying the important factors: in general, pick factors that haven't been identified before and that add to the body of knowledge.
3. Formulating a hypothesis: an "if...then" statement. A good hypothesis poses a question in a testable form.

Components of Informed Consent - ANSWER
• Identify researcher
• Describe survey topic
• Describe target sample
• Identify sponsor
• Describe purpose of research
• Promise anonymity and confidentiality
• Give "good faith" estimate of required time commitment
• State participation is voluntary
• State item non-response is acceptable
• Ask for permission

Participant Confidentiality - ANSWER
• Provide non-disclosure of data subsets and subjects
• Minimize instruments requiring ID
• Obtain signed non-disclosure forms
• Restrict access to ID
• Reveal ID/specific data only with written consent

Validity - ANSWER Refers to the degree to which a study accurately reflects or assesses the specific concept that the researcher is attempting to measure.

Reliability - ANSWER The consistency of your measurement, or the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects.

General Source - ANSWER Provides an overview of a topic and provides leads to where more information can be found.

Secondary Source - ANSWER Provides a level of information "once removed" from the original work.
Primary Source - ANSWER The original reports of the original work or experience.

Face Validity - ANSWER Does the measure, on the face of it, seem to measure what is intended?

Construct Validity - ANSWER If a measure has construct validity, it measures what it purports to measure.

External Validity - ANSWER Refers to our ability to generalize the results of our study to other settings.

Personal Interviewing - ANSWER
• Helps in enlisting cooperation
• Probing for more clarification
• Helps with complex instructions
• Multi-method data collection
• Rapport
• Longer interview can be done in person
• More costly, trained interviewers
• Total time period longer
• Some samples may be difficult to reach

Telephone interview - ANSWER
• Lower costs
• Better/easier access to certain populations
• Quicker

Telephone survey - ANSWER
• Sampling limitations
• Nonresponses
• Limits on response alternatives
• Less appropriate for personal questions

Question Content - ANSWER
• Should this question be asked?
• Is the question of proper scope and coverage?
• Can the participant adequately answer this question as asked?
• Will the participant willingly answer this question as asked?

Guidelines for Question Sequencing - ANSWER
• Interesting topics early
• Classification questions later
• Sensitive questions later
• Simple items early
• Transition between topics

Designing a reliable instrument - ANSWER
• Make sure wording is complete: "Age?" vs. "What was your age on your last birthday?"
• Ensure consistent meaning to all respondents
• Standardize response type
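
Worked check of the calculation cards: several cards in this guide involve arithmetic (standard error of the mean, the 95% range for heights, the odds ratio, the chi-squared test). The Python sketch below is one way to re-check those figures; it is not part of the original guide. The function names are made up for illustration, and treating the smoking counts (647, 2, 622, 27) as the four cells of a 2x2 table is an assumption based on how the odds ratio card uses them.

```python
import math


def standard_error(variance, n):
    """Standard error of the sample mean: sqrt(variance / n) = SD / sqrt(n)."""
    return math.sqrt(variance / n)


def central_95_range(mean, sd, z=1.96):
    """Range holding about 95% of a Normal population: mean +/- 1.96 x SD."""
    return mean - z * sd, mean + z * sd


def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table: (a / b) / (c / d).

    a, b = exposed and unexposed cases; c, d = exposed and unexposed controls.
    """
    return (a / b) / (c / d)


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared statistic and smallest expected count, 2x2 table."""
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = a + b + c + d
    statistic = 0.0
    smallest_expected = float("inf")
    for i in range(2):
        for j in range(2):
            # Expected count if rows and columns were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed[i][j] - expected) ** 2 / expected
            smallest_expected = min(smallest_expected, expected)
    return statistic, smallest_expected


# Figures taken from the cards in this guide.
print(standard_error(100, 100))                # N (170, 100), sample size 100
low, high = central_95_range(173.7, 3)         # height example
print(round(low, 2), round(high, 2))
print(round(odds_ratio(647, 2, 622, 27), 2))   # smoking and lung cancer
stat, smallest = chi_squared_2x2(647, 2, 622, 27)
print(round(stat, 2), smallest)
```

For the smoking table the chi-squared statistic comes out at about 22, well above 3.84 (the 5% critical value for 1 degree of freedom), and the smallest expected count is 14.5, so the expected-count assumption holds. A statistic that large is evidence of an association between smoking and lung cancer.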