









Module 1 – Introduction to Statistics
Typology: Study Guides, Projects, Research










Students who are studying Social and Natural Sciences such as Psychology, Economics, Sociology, or Business are often surprised to learn how much of the curriculum in their field focuses on understanding and learning how to analyze data. Because statistics is an essential part of these fields, students need to understand how to use statistics and how to interpret statistical results.
The science of statistics provides students with guidelines for describing data and for interpreting patterns of differences that may occur within certain groups of data or between different groups. Statistics also provides sound and scientifically rigorous techniques for handling and analyzing data when making inferences from numerical results. Statistics has the power to transform data into knowledge that can be used to understand and solve real world problems.
Statistics is the science of collecting, organizing, and summarizing data and drawing conclusions about the population based on those results.

Alternative Definition: Statistics is a collection of methods for decision making in the face of uncertainty on the basis of numerical data and calculated risks.

The two major branches of statistics are Descriptive and Inferential Statistics; these branches are not mutually exclusive, but rather build on one another. They are presented below:

Descriptive Statistics

Descriptive statistics are statistical procedures that are used to describe characteristics of data sets. Descriptive statistics refers to the measurement of data that is presently occurring within all subjects. This type of statistic involves methods of organizing, picturing, and summarizing information from samples or populations so that we can communicate our results as accurately and completely as possible. For example, one of the first things that we would want to do with our data is to graph them, to calculate means (averages) and other measures, and to look for extreme scores or oddly shaped distributions of scores. These procedures are called descriptive statistics because they are primarily aimed at describing the data. They also allow us to simplify large quantities of data in an understandable and logical manner.

Definition: The branch of statistics which deals with collecting, organizing, summarizing and graphing a data set is called descriptive statistics.

Inferential Statistics

Once we have described our data in detail and are satisfied that we understand what the numbers have to say on a superficial level, we will employ Inferential Statistics. Inferential statistics refers to assessing
preschools, our sample would not be representative of the entire population, and any conclusions drawn from that sample may be very limited.
A sample is a subset of members from the entire population.

Example 1.1: In a recent study of the proportion of college students addicted to nicotine, it was found that out of 13,847 students, a total of 36.2% of students smoked cigarettes. A researcher randomly selected a group of 86 of the nearly 14,000 students and determined that 37.1% were addicted to nicotine. In this particular study, the size of the population was 13,847 students and the size of the sample was 86 students.

Note: As the size of the sample increases, the sample percentage tends to become closer to the population percentage. This means that we can reduce the error in estimation by increasing the sample size. Using samples to draw conclusions about a population is also much faster and more cost effective than using the entire population for the study.

Parameter and Statistic: In statistical research, the measurements obtained can be classified into two categories: Parameter and Statistic. One important distinction is that statistics are numbers or indices provided by our sample (e.g., the average preschool vocabulary score in the sample) that are used to estimate the same values within the population (called parameters) in the process of statistical inference.

Parameter

A parameter is a true value in the population of interest. When describing parameters, the population mean is most often used. The following are examples of parameters:
A Parameter is a numerical measurement describing a certain characteristic of a population.

Statistic

A statistic is a numerical quantity that describes characteristics of the sample. A statistic is a number you calculate from your sample data in order to estimate the parameter. The following are examples of statistics:
A Statistic is a numerical measurement describing a certain characteristic of a sample.

SECTION 1.2: Statistical Data and Design of Experiments

Statistical data

In simple terms, data is the information that is compiled during any type of experiment or study. Statistics is a science that consists of different models of collecting, summarizing, and graphing measurements from populations or samples. The compilation of this information is called a data set. Some data sets consist of numerical values while others consist of nonnumeric measurements.
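Looking back at Example 1.1, the relationship between a parameter and a statistic, and the note about sample size, can be illustrated with a short simulation. This is only a sketch under stated assumptions: the population is synthetic (13,847 entries built so that about 36.2% are marked as smokers), and the helper name `sample_percentage` and the sample sizes other than 86 are hypothetical choices for illustration, not part of the original study.

```python
import random


def sample_percentage(population, n, rng):
    """Percentage of 1s in a simple random sample of size n (the statistic)."""
    sample = rng.sample(population, n)  # sampling without replacement
    return 100 * sum(sample) / n


rng = random.Random(0)  # fixed seed so the run is repeatable

# Synthetic population (assumption): 1 = smoker, 0 = non-smoker.
N = 13847
smokers = round(0.362 * N)
population = [1] * smokers + [0] * (N - smokers)

# The parameter is computed from every member of the population.
parameter = 100 * sum(population) / N
print(f"population percentage (parameter): {parameter:.1f}%")

# Statistics are computed from samples; larger samples tend to land closer.
for n in (86, 1000, 5000, N):
    statistic = sample_percentage(population, n, rng)
    print(f"n = {n:>5}: sample percentage (statistic) = {statistic:.1f}%")
```

Any single small sample may land close to or far from the parameter by chance; what shrinks as the sample grows is the typical size of the error. When the sample includes the whole population, the statistic equals the parameter.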
Data is a set of numerical or categorical values of the variable.

Quantitative and Qualitative Data: Quantitative data are purely numeric. A quantitative variable has a value or numerical measurement for which operations such as addition or averaging make sense. Quantitative data deals with data which can be measured. For example, the number of packages delivered to a particular business each month is quantitative data. On the other hand, qualitative data deals with descriptions. This type of data can be
Continuous Data is data that represent values that correspond to some continuous scale without interruptions.
one category cannot be in any other category and they never overlap. One category is not “greater” or “better” than any other; the categories are merely different. For example, the sex of a person is a nominal variable. A male is considered one category and a female another. From the sports arena, basketball jersey numbers would be considered nominal in the sense that a player who has a high jersey number is not necessarily better or worse than a player with a low jersey number. Considering political parties as another example, one can be registered to vote as a Democrat, a Republican, or a member of any other political party, but cannot be registered with all of them at once.
Data at nominal level of measurement is categorized by names, labels and categories. At this level of measurement data is qualitative only and there is no order. Colors, Models, Brands, etc. are examples of such data.

Ordinal Measures

The second type of measurement scale is Ordinal measures. Like a categorical scale, this may involve different qualitative categories, but these particular categories are ranked in some kind of order. However, differences between data values either cannot be determined or are meaningless. For example, runners in a race finish in different places. In this example, runners can finish in 1st place, 2nd place, 3rd place, and so on to “last place”. These numbers are not truly numeric because adding first place to second place does not equal third place, but they do represent different degrees of quality. In terms of runners, the runner who came in first is certainly “better” than the person who came in third place and the runner who finished third is certainly “better” than the person who came in “last.” Similarly, if we designated employees within a manufacturing firm according to their rank (e.g., 1 = executive, 2 = managerial, 3 = administrative, 4 = factory line worker), the numbers have no true numeric meaning (“factory” is not twice as much as “managerial”), but they are in an ordinal (or “ordered”) form, so that executive is “higher” than managerial or administrative positions.

While the categories in an ordinal scale are ranked, we cannot discern the distance or difference between two categories just from their values. While our 1st place runner is the “best,” we have no idea from this label how much better she is than the 2nd place or 3rd place runner. The 2nd place runner could be one second or ten minutes behind the winner; an ordinal scale does not provide this information.
Data at ratio level of measurement can be arranged in order and the difference between entries is meaningful. Data at the ratio level have a true zero. Ratio level of measurement has the same characteristics as interval level of measurement with an added starting point of zero. Speed, Distance, Marriages, etc. are a few examples of such data. It is clear that for the above examples there is a natural zero starting point.

Independent and Dependent Variables

In any experiment, the variables of interest are classified as Independent and Dependent.

Independent Variable
Dependent Variable

The dependent variable is the outcome variable. It is the aspect of behavior that is affected by the independent variable. It is dependent on other variables, in particular (at least this is usually what researchers are predicting) the independent variable. For example, Dr. Echo may be interested in the effects of sleep deprivation on young adults’ driving performance. He hypothesizes that individuals who are sleep deprived will exhibit poorer driving performance, and he wants to see if sleep deprivation is actually causing poor driving performance. He decides to conduct an experiment where he systematically varies the level of sleep deprivation (the independent variable) and then measures how those variations relate to variations in driving performance (the dependent variable).

SECTION 1.3: Sampling Methods

There are different sampling methods to select research participants whose responses will make up our sample. We list a few for your review:
Random Sampling

The best and easiest method is called Random Sampling. This is where research participants are chosen from a specified population in such a way that each individual has an equal probability of being selected.
A random sample is one where each member has an equal chance or probability of being selected.

For example, suppose George wanted to study the level of job satisfaction among workers in a certain factory. In order to choose a random sample of 50 workers from the total population of 500 workers, he could begin with a list of all of the workers, and use a table of random numbers, or a computer program that generates random numbers, to randomly select 50 workers. With a small population of interest, random sampling is not too difficult to work with. However, with a very large population (like preschool children’s vocabulary scores), true random sampling may not be possible and a Convenience Sample is more likely to be used.

Convenience Sampling

Convenience sampling is often used when the population is too large. This method is often used because the sample is readily available and the responses are easily accessed. Continuing the example of job satisfaction in a factory, a convenience sample would be to survey only those employees who were in one particular room of the factory.
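George's random selection of 50 workers out of 500, described above, can be sketched with a computer program that generates random numbers. This is a minimal illustration, not a prescribed procedure: the worker IDs 1 to 500, the function name `simple_random_sample`, and the seed value are all hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Pick n distinct members so that each member is equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)  # sampling without replacement


# Hypothetical list of worker IDs standing in for George's list of 500 workers.
workers = list(range(1, 501))

chosen = simple_random_sample(workers, 50, seed=42)
print(sorted(chosen))
```

Fixing the seed only makes the run repeatable for checking; in an actual study the seed would normally be left unset so that the selection is not predictable in advance.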
A convenience sample is one which uses results that are easy to obtain.

Systematic Sampling

The next method is called Systematic Sampling. This method consists of selecting one unit at random from the first k units, numbered 1 to k, and then selecting every kth unit after it.
A systematic sample is one where we select a starting point and then select every kth member of the population.
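The systematic sampling rule (a random start among the first k units, then every kth unit) can be sketched as follows. The population of 100 numbered units, the choice k = 10, and the function name `systematic_sample` are assumptions made only for this illustration.

```python
import random


def systematic_sample(population, k, seed=None):
    """Choose a random start among the first k units, then take every kth unit."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, i.e. one of units 1..k
    return population[start::k]


# Hypothetical population of 100 units numbered 1 to 100.
units = list(range(1, 101))

sample = systematic_sample(units, k=10, seed=1)
print(sample)
```

With 100 units and k = 10 the sample always contains 10 units spaced exactly 10 apart; only the starting point is random.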
For example, in order to obtain the percentage of voters favoring a Republican candidate for the presidency, three suburbs are randomly selected from all possible suburbs and all the qualified voters in them are questioned.
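The suburb example above, a design commonly called cluster sampling, can be sketched in code: whole groups are chosen at random and every member of each chosen group is questioned. Everything in the data here is made up for illustration: the ten suburbs, the 200 voters per suburb, the simulated responses, and the function name `cluster_sample` are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters whole groups and keep every member of each."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen_names}


# Hypothetical data: 10 suburbs, each with 200 qualified voters.
# True means the voter favors the candidate (simulated, not real results).
data_rng = random.Random(0)
suburbs = {
    f"suburb_{i}": [data_rng.random() < 0.5 for _ in range(200)]
    for i in range(1, 11)
}

selected = cluster_sample(suburbs, 3, seed=7)

# Question all voters in the three selected suburbs.
responses = [vote for voters in selected.values() for vote in voters]
percent_favoring = 100 * sum(responses) / len(responses)
print(sorted(selected), f"{percent_favoring:.1f}% favor the candidate")
```

Unlike a simple random sample of individual voters, the randomness here operates at the level of suburbs, which is why every voter in a selected suburb is included.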
Chapter One Exercises