A comprehensive overview of the key concepts in descriptive and inferential statistics, including measures of central tendency, dispersion, probability, hypothesis testing, and statistical significance. It also covers the use of SPSS, a powerful statistical software package, for data management, analysis, and presentation. The document emphasizes the importance of understanding how data are analyzed and interpreted, and the need for sound decision making based on a thorough understanding of the statistical methods employed. With explanations of various statistical tests and their applications, it serves as a resource for students, researchers, and professionals seeking to strengthen their skills in data analysis and interpretation.
Descriptive statistics
- used to describe the basic features of data
- provide simple summaries
- allow you to present quantitative data in a manageable form
- run the risk of distorting the original data or losing important detail

Inferential statistics
- used to try to reach conclusions beyond the immediate data alone
  --> to make inferences about the population
  --> or to make predictions about what might happen in the future
- used to test hypotheses
- underlying assumption that sampling is random

Hypothesis testing
- when we use inferential statistics, we are testing hypotheses
- hypotheses are typically written in null form: there is NO difference between group A and group B
- if the result of the test IS significant (usually a p value of less than .05), we reject the null hypothesis and conclude that there is a significant difference
- if the result of the test is NOT significant, we accept (or fail to reject) the null hypothesis and conclude that there is no significant difference

What is a variable?
- a measurable factor, characteristic, or attribute of an individual or a system
- a numeric value or a characteristic that can vary over time or between individuals
- a characteristic that may assume more than one set of values

What is a group?
- the possible responses for a categorical variable
- the categories created when the values of a continuous variable are collapsed into specific groups
What are values?
- the code that identifies the legitimate options for a given variable
- a numerical quantity or text representing the characteristic being measured

Categorical variables
- discrete variables: there is a finite number of potential values
- qualitative
- typically a grouping variable
- can be nominal, ordinal, or interval data
- can run descriptive statistics such as frequencies and percentages, but not measures of central tendency

Continuous variables
- a variable that can theoretically assume an infinite number of values between any two points on the scale (BP, weight, temperature)
- quantitative
- ordinal (sometimes), interval, or ratio data

Measures of central tendency
- the mean (average value)
  --> population mean represented by mu
  --> sample mean represented by x-bar
- the median (the midpoint of a distribution; the same number of scores fall above the median as below it)
- the mode (the most frequently occurring value)

What is the standard deviation?
- the most common measure of dispersion from the mean
- a measure of how spread out a distribution is
  --> a large standard deviation = data points are far from the mean
  --> a small standard deviation = data are closely clustered around the mean
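As a rough illustration (not part of the original notes), the descriptives above can be requested with SPSS syntax along the following lines; the variable names sex (categorical) and sbp (a continuous blood pressure measure) are hypothetical placeholders.

* Frequencies and percentages for a categorical variable (no mean/median/mode).
FREQUENCIES VARIABLES=sex
  /ORDER=ANALYSIS.

* Mean, median, mode, and standard deviation for a continuous variable.
FREQUENCIES VARIABLES=sbp
  /FORMAT=NOTABLE
  /STATISTICS=MEAN MEDIAN MODE STDDEV.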
What is probability?
- chance behavior is unpredictable in the short run but has a predictable pattern in the long run
- the probability of any outcome of a random phenomenon is the proportion of times the outcome would occur in a very long series of repetitions

Probability distribution
- a table or equation that links each outcome of an experiment with its probability of occurring
- suppose you flip a coin twice --> possible outcomes: HH, HT, TH, TT, each with probability 1/4

Probability and the p value
- the p value represents the probability of error involved in accepting the observed result as valid (or representative)
- typically a p value of .05 is the threshold for accepting that the relationship between variables is not a result of chance
- a p value of .05 indicates that there is a 5% probability that the relationship is a fluke
- the lower the p value, the stronger the evidence and the more certain we can be about the relationship

Statistical significance
- the probability that the observed relationship between variables, or difference between means, did not occur by chance
- tells us something about the degree to which the result is true
- statistical significance does NOT mean the result:
  --> is of practical significance
  --> demonstrates a large effect in the population
  --> is important

Confidence interval
- the range of values that you can be certain contains the true mean in the population
- a 95% confidence interval means that if you took repeated samples and computed the 95% CI for each sample, 95% of the intervals would contain the population mean
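A hedged sketch of how a 95% confidence interval for a mean can be obtained in SPSS through the Explore procedure; sbp is again a hypothetical variable name, and the subcommands mirror what the Explore dialog typically pastes.

* 95% confidence interval for the mean of a continuous variable.
EXAMINE VARIABLES=sbp
  /PLOT NONE
  /STATISTICS DESCRIPTIVES
  /CINTERVAL 95
  /MISSING LISTWISE.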
Correlation coefficients
- test for a linear relationship between two continuous variables
- the relationship can be positive or negative
- range from +1 to -1; 0 indicates no relationship
- the larger the correlation coefficient (in absolute value), the more closely the two measures vary together or in opposite directions
- do not indicate causality
- variables must contain interval or ratio data
- data must be normally distributed
- there is no relationship between the strength (size) of the correlation and its statistical significance
- a correlation report also reports the significance level
  --> the likelihood that the reported correlation may be due to chance
- a challenge in interpreting correlation coefficients is judging the strength of the relationship
  --> multiple scales exist, each describing the different levels differently
- scatterplots are helpful in interpreting the strength of correlation coefficients
- null = there is no relationship between __ and __

T-tests
- the dependent variable must be continuous; the test evaluates the difference between the two groups of a categorical variable
- compare the means of 2 groups
- data must be normally distributed
- must check for equality of variance
- independent 2-sample = used when the groups are independent of each other (ex: one group given a medication and another group given a placebo)
- paired = used when the two groups of observations are based on the same sample of subjects (ex: BP before and after an intervention)
- null = there is no difference between __ and __
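A minimal syntax sketch for the correlation and t-tests just described, assuming hypothetical variables sbp and weight (continuous), group coded 1 = medication / 2 = placebo, and sbp_pre / sbp_post for the paired case.

* Pearson correlation between two continuous variables.
CORRELATIONS
  /VARIABLES=sbp weight
  /PRINT=TWOTAIL NOSIG
  /MISSING=PAIRWISE.

* Independent 2-sample t-test (groups are independent of each other).
T-TEST GROUPS=group(1 2)
  /VARIABLES=sbp
  /CRITERIA=CI(.95).

* Paired t-test (same subjects measured before and after an intervention).
T-TEST PAIRS=sbp_pre WITH sbp_post (PAIRED)
  /CRITERIA=CI(.95).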
ANOVA
- the dependent variable must be continuous; the test evaluates differences among the groups of a categorical variable with more than two groups
- data must be normally distributed
- compares the means of the different groups
- the result only determines whether there is a significant difference somewhere; it does not tell you where
  --> to find out where the differences are, you must run additional tests (called post hoc tests)
- null = there is no difference between __ and __

Chi-square test
- both variables are categorical
- measured variables must be independent
- evaluates whether there is a relationship between two variables (by testing the difference between theoretically expected and observed frequencies)
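A similar sketch for a one-way ANOVA with a post hoc test and for a chi-square test; group (a factor with more than two categories), sex, and smoker are hypothetical variable names.

* One-way ANOVA with a Tukey post hoc test to locate the differences.
ONEWAY sbp BY group
  /MISSING ANALYSIS
  /POSTHOC=TUKEY ALPHA(0.05).

* Chi-square test of the relationship between two categorical variables.
CROSSTABS
  /TABLES=sex BY smoker
  /STATISTICS=CHISQ
  /CELLS=COUNT EXPECTED.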
What is SPSS?
- Statistical Package for the Social Sciences
- a powerful tool used to manage and analyze data
- user friendly because it is menu driven

What can you do with SPSS?
- create a data file
- edit data
- manage data (see the syntax sketch after this list)
  --> compute new variables
  --> recode variables
  --> select specific cases
  --> sort cases
  --> merge files
- create graphs and charts
- frequencies
- descriptive statistics (mean, median, mode, standard deviation)
- crosstabs and chi-square
- correlations
- t-tests
- one-way analysis of variance (ANOVA)
- linear regression
- logistic regression
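One possible sketch of the data-management commands behind the list above; the variable names (sbp, dbp, age, sex) and the cut points are hypothetical.

* Compute a new variable from existing ones.
COMPUTE pulse_pressure = sbp - dbp.

* Recode a continuous variable into groups (collapse into categories).
RECODE age (LOWEST THRU 39=1) (40 THRU 59=2) (60 THRU HIGHEST=3) INTO agegrp.

* Sort cases, then keep only selected cases (SELECT IF drops the rest from the active file).
SORT CASES BY age (A).
SELECT IF (sex = 1).
EXECUTE.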
Sound decision making requires an understanding of...
- how the data were analyzed
- why the analysis was performed in a specific manner

SPSS file types
- data = .sav file
- output = .spo file
- syntax = .sps file

A data file's 2 views
- Data view
  --> variables in columns
  --> cases in rows
- Variable view
  --> allows you to change how the data appear (ex: the number of decimal places that will show)
  --> enables you to include information that will simplify your life (ex: labels that make it easier to read results)

Variable view settings
- Name: can appear at the head of each column in the data view
  --> no spaces
  --> cannot begin with a number
- Type: numeric, dollar, string
- Width: the maximum number of characters allowed in the field
- Label: can appear in the data view header
  --> invaluable when reading results
- Values: include value labels so you don't have to remember whether 1 means male or female
- Columns: important if importing data from flat files
- Missing: codes used for missing data
  --> listing these codes tells SPSS not to use these data points in analysis
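The Variable view settings above can also be set in syntax; a brief sketch with hypothetical names, labels, and a missing-value code:

* Variable and value labels make output far easier to read.
VARIABLE LABELS sex 'Sex of participant'
  /sbp 'Systolic blood pressure (mmHg)'.
VALUE LABELS sex 1 'Male' 2 'Female'.

* Declare a user-missing code so SPSS excludes it from analysis.
MISSING VALUES sbp (999).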
Output file
- shows the results of your work
- can be set to show the log in the output
  --> the log shows the syntax, or instructions to SPSS, that describes what you have done in the file
- results can be copied and pasted into a Word file
- results can be copied into an Excel file
- output tables are easy to manipulate if you paste with "match destination formatting"
  --> sort the results differently
  --> adjust the number of decimal places
  --> copy and paste into tables that contain results from multiple tests

Syntax file
- since SPSS writes the syntax (instructions) for you, you don't have to do anything with this file
- but if you are going to run the same tests again, it may be to your advantage to use it
- syntax is NOT written into the file automatically; you must tell SPSS to do that

SPSS steps
Step 1: Read data in
- enter data directly into SPSS
- open an SPSS data file
- import data from another proprietary program (Excel, Access, SAS)
- read raw text data (CSV, fixed-width, tab-separated)
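Step 1 in syntax form, as a rough and self-contained sketch: a few cases are entered inline, and the variable names and values are hypothetical; an existing .sav file would instead be opened with GET FILE.

* Enter a handful of cases directly (hypothetical variables and values).
DATA LIST FREE / id sex age sbp.
BEGIN DATA
1 1 34 118
2 2 51 135
3 1 47 129
END DATA.

* An existing SPSS data file would be opened with, for example:
* GET FILE='C:\data\study.sav'.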
Step 2: Organize your data file
- assign variable labels
- assign value labels
- these help you easily identify the different variables and the values used in each

Step 3: Check data for accuracy
- run frequencies and descriptive statistics to check whether the data are reasonable
  --> look for outliers (> 3 standard deviations)
  --> look for bad data entry
- verify that variables are what you think they are

Step 4: Begin working with your data
- create new variables
  --> compute new variables from existing ones
- recode
  --> collapse categories
  --> transform string variables into numeric
- always check your data after making changes to your variables

Next steps
- merge one file with another
- select a subsample
- perform statistical analyses
- present results
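Finally, a sketch pulling Steps 3 and 4 together: screening a continuous variable with descriptives and saved z-scores, and converting a string variable to numeric. Zsbp is the standardized variable SPSS creates when /SAVE is requested, and group_str is a hypothetical string variable.

* Step 3: descriptives plus saved z-scores (Zsbp) to flag values beyond about 3 SD.
DESCRIPTIVES VARIABLES=sbp
  /SAVE
  /STATISTICS=MEAN STDDEV MIN MAX.

* Step 4: transform a string variable into a numeric one.
AUTORECODE VARIABLES=group_str
  /INTO group_num
  /PRINT.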