and Statistics
BMAL 590 Quantitative Research Techniques and Statistics
Test
Decision Analysis (Section 8)
- Which one of the following would not be considered a state of nature for a business firm? Minimum wage regulations
- Assume an investment is made a significant number of times using the same probabilities and payoffs. In this case, the average payoff per investment represents.
- The level of doubt regarding a decision situation where both the possible states of nature and their exact probabilities of occurrence are known is which of the following?
- The difference between expected payoff under certainty and expected value of the best act without certainty is the. Expected value of perfect information (EVPI)
- Which of the following regarding EMV/EOL is false? "The EMV decision is always different from the EOL decision" is a FALSE statement.
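The relationship between EMV, EOL, and the expected value of perfect information can be checked with a small sketch. The payoff table, the act names, and the state probabilities below are made-up assumptions for illustration only; they do not come from the course material.

```python
# Hypothetical payoff table (all numbers are illustrative assumptions).
# Each act maps to its payoffs under three states of nature.
payoffs = {
    "bonds":  [40, 45, 50],
    "stocks": [-20, 30, 120],
    "cash":   [30, 30, 30],
}
probs = [0.3, 0.5, 0.2]  # assumed probabilities of the three states


def expected(values, probabilities):
    """Probability-weighted average of a list of values."""
    return sum(v * p for v, p in zip(values, probabilities))


# Expected monetary value (EMV) of each act; choose the largest.
emv = {act: expected(row, probs) for act, row in payoffs.items()}
best_emv_act = max(emv, key=emv.get)

# Opportunity loss = best payoff available in a state minus the act's payoff.
n_states = len(probs)
best_by_state = [max(row[j] for row in payoffs.values()) for j in range(n_states)]
eol = {
    act: expected([best_by_state[j] - row[j] for j in range(n_states)], probs)
    for act, row in payoffs.items()
}
best_eol_act = min(eol, key=eol.get)

# Expected payoff under certainty minus the EMV of the best act gives EVPI.
expected_payoff_under_certainty = expected(best_by_state, probs)
evpi = expected_payoff_under_certainty - emv[best_emv_act]

print(best_emv_act, best_eol_act, evpi, eol[best_eol_act])
```

In this sketch the act with the largest EMV is also the act with the smallest EOL, and that smallest EOL equals the EVPI, which is why the statement that the two decisions always differ is false.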
Analysis of Variance (Section 7)
- The F-statistic in a one-way ANOVA represents the. Ratio of the variance between the group means to the variance within the groups. It is used to determine whether there are significant differences between the group means.
- In ONE WAY ANALYSIS OF VARIANCE we can observe the effect on the response variable of at least two factors. False: one-way ANOVA examines the effect of a single factor.
- The distribution of the test statistic for analysis of variance is the. F distribution
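As a rough sketch of how the one-way ANOVA F-statistic is built from between-group and within-group variation, the following computes it by hand. The three groups and their values are hypothetical, and the function name `one_way_f` is an illustrative choice.

```python
# Three hypothetical groups (made-up values, for illustration only).
groups = [
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [8.0, 9.0, 10.0],
]


def one_way_f(groups):
    """Return the one-way ANOVA F-statistic: MS between / MS within."""
    k = len(groups)                        # number of groups
    n = sum(len(g) for g in groups)        # total number of observations
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within


f_stat = one_way_f(groups)
print(f_stat)
```

A large F value indicates that the group means differ by more than the within-group variation would suggest; the value is compared against an F distribution with k - 1 and n - k degrees of freedom.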
What is Statistics? (section 1)
- A sample of 500 athletes is taken from a population of 11,000 Olympic athletes to measure work ethic. As a result we. Can make consistent inferences each time work ethic is the outcome
- When data is collected in a statistical study for only a portion or subset of all elements of interest we are using a. Sample
Data Collecting and Sampling (section 2)
- When a person receives an email questionnaire and places it in their deleted items without responding, they are contributing to. Non-response error
- The difference between a sample mean and the population mean is called the. Sampling error
Introduction to hypothesis Testing (Section 5)
- A type I error occurs when we. Reject a true null hypothesis
- In a criminal trial where the null hypothesis states that the defendant is innocent, a Type II error is made when. A guilty defendant is found not guilty
- The p-value of the test is the. The smallest α at which the null hypothesis can be rejected
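To make the link between the p-value and the significance level α concrete, here is a minimal sketch of a one-sided z test. The hypothesized mean, the known standard deviation, the sample size, and the sample mean are all assumed numbers chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

# Assumed setup: H0: mu = 100 versus H1: mu > 100, with sigma known.
mu_0 = 100.0        # hypothesized population mean (assumption)
sigma = 15.0        # known population standard deviation (assumption)
n = 36              # sample size (assumption)
sample_mean = 105.0 # observed sample mean (assumption)

# Standardize the sample mean under H0.
z = (sample_mean - mu_0) / (sigma / sqrt(n))

# One-sided p-value: probability of a z at least this large if H0 is true.
p_value = 1 - NormalDist().cdf(z)


def reject_null(p, alpha):
    """Reject H0 when the p-value does not exceed alpha."""
    return p <= alpha


print(z, p_value, reject_null(p_value, 0.05), reject_null(p_value, 0.01))
```

With these numbers the null hypothesis is rejected at α = .05 but not at α = .01, because the p-value lies between the two levels.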
Section 1- What is Statistics?
What is Statistics?
- Statistics is a way to get information from data. It is a tool for creating new understanding from a set of numbers.
Descriptive Statistics
- Descriptive statistics is one of two branches of statistics. It focuses on methods of organizing, summarizing, and presenting data in a convenient and informative way.
  o Graphical techniques allow statistics practitioners to present data in ways that make it easy for the reader to extract useful information.
    ▪ A histogram (bar graph) can show whether the data are evenly distributed across the range of values, fall symmetrically from a center peak (normal distribution), have a peak with more of the data falling to one side (skewed distribution), or have two or more peaks (bimodal or multimodal)
  o Numerical techniques - rather than providing raw data, the professor may share only summary data with the student. One frequently used method calculates the average, or mean
- Measure of central location- the mean (average) is one such measure, it is the sum of all data values divided by the number of values
- Range - the simplest measure of variability, is calculated by subtracting the smallest number from the largest.
- Median - midpoint of the distribution, where 50% of the data values are higher and 50% are lower (note that the mean and median will not necessarily be an observed test score).
- Mode - the most frequently occurring data value
- Variance - the average squared deviation from the mean. To compute it, the difference between each data value and the mean is calculated and squared, and the squared differences are averaged. If the differences are not squared, their sum will always be 0.
- Standard deviation - simply the square root of the variance; it gets the variability measure back to the same units as the data
- Negatively skewed if the mean is to the left of the peak (the peak is to the right); positively skewed if the mean is to the right of the peak (the peak is to the left)
Inferential Statistics
- Inferential statistics is a body of methods used to draw conclusions or inferences about characteristics of a population based on sample data
  o An example of inferential statistics is exit polling during elections
  o Practitioners can control the fraction of the time an inference will be correct, typically setting it between 90% and 99%
Key Statistical Concepts
- Statistical inference problems involve three concepts:
  o population - the group of all items of interest to a statistics practitioner. It is frequently very large and may in fact be infinitely large. It does not necessarily refer to a group of people
    ▪ parameter - a descriptive measure of a population; it represents the information we need
  o sample - a set of data drawn from the population
  o statistical inference - we use statistics to make inferences about parameters. Statistical inference is the process of making an estimate, prediction, or decision about a population based on sample data.
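The descriptive measures defined in this section (mean, median, mode, range, variance, and standard deviation) can be computed with Python's standard `statistics` module. This is only a sketch, and the five test scores are hypothetical.

```python
import statistics

scores = [70, 70, 80, 85, 95]  # hypothetical test scores

mean = statistics.mean(scores)        # sum of values divided by the count
median = statistics.median(scores)    # midpoint of the sorted values
mode = statistics.mode(scores)        # most frequently occurring value
data_range = max(scores) - min(scores)

# Deviations from the mean always sum to zero, which is why they are squared.
deviations = [x - mean for x in scores]

# pvariance averages the squared deviations over all n values, matching the
# definition in the notes; statistics.variance would divide by n - 1 instead.
variance = statistics.pvariance(scores)
std_dev = statistics.pstdev(scores)   # square root of the variance

print(mean, median, mode, data_range, sum(deviations), variance, std_dev)
```

Here the mean (80) is to the right of the mode (70), so this small data set is an example of positive skew.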
Section 2- Data Collecting and Sampling
Methods of collecting data
- Statistics is a tool for converting data into information
- A number of methods produce data
  o Data are the observed values of a variable
  o We define a variable or variables of interest to us and then proceed to collect observations of those variables.
- Three popular methods to collect data for statistical analysis:
  o Direct observation - ex. the number of customers entering a bank per hour
    ▪ Simplest method of obtaining data
    ▪ Data are said to be observational
    ▪ Has many drawbacks, including that it is difficult to produce useful information in a meaningful way
    ▪ Advantage is low cost
  o Experiments - ex. new ways to produce things to minimize costs
    ▪ The sample is split into two groups, one that does something and one that does not; the results from the two groups are then evaluated
  o Surveys - one of the most familiar data collecting methods. They solicit information from people concerning such things as their income, family size, and opinions on various issues. The majority are conducted for private use.
    ▪ Response rate - the proportion of all people selected who complete the survey
    ▪ Low response rate - can destroy the validity of any conclusion resulting from the statistical analysis. Need to ensure the data are reliable.
    ▪ Personal interview - many researchers believe this is the best way to survey people; it involves an interviewer soliciting information from a respondent. Has a higher response rate. Main disadvantage is the cost.
    ▪ Telephone interview - usually less expensive, but also less personal, with a lower expected response rate
    ▪ Self-administered questionnaire - usually mailed to a sample of people. Inexpensive, but usually has a low response rate and a high number of incorrect responses due to misunderstood questions
Questionnaire Design
- Must be well thought out. Key design principles include:
  o Keep the questionnaire as short as possible
  o Ask short, simple, clearly worded questions
  o Start with demographic questions
  o Use dichotomous (yes/no) and multiple-choice questions for simplicity
  o Use open-ended questions cautiously
  o Avoid leading questions
  o Try the questionnaire on a small number of people first to uncover problems
  o Think about the way you intend to use the collected data when preparing the questionnaire
Sampling
- Chief motives for examining a sample rather than a population are cost and practicality
- Target population – the population about which we want to draw inferences
- Sampled population- actual population from which the sample has been taken
- Sampled and target populations should be close to one another
- Non-sampling error - more serious than sampling error because taking a larger sample won't diminish the size or the possibility of occurrence of this error
  o Results from mistakes made in the acquisition of the data and from the sample observations being selected improperly
  o Three types of non-sampling errors:
    ▪ 1 - Data acquisition errors - arise from the recording of incorrect responses. May be the result of incorrect measurements taken because of faulty equipment, mistakes made during transcription from primary sources, inaccurate recording of data due to misinterpretation of terms, or inaccurate responses to questions concerning sensitive issues
    ▪ 2 - Non-response error - refers to error or bias introduced when responses are not obtained from some members of the sample. When this happens, the sample observations may not be representative of the target population, resulting in biased results. The response rate (the proportion of all people selected who complete the survey) is a key survey parameter and helps in understanding the validity of the survey and the sources of non-response error
    ▪ 3 - Selection bias - occurs when the sampling plan is such that some members of the target population cannot possibly be selected for inclusion in the sample. Together with non-response error selection bias
- When responses are not received from a sampled person bias is introduced
QUIZ Section 2
-Which of the following statements is true regarding the design of a good survey? The questions should be kept as short as possible
-Which method of data collection is involved when a researcher counts and records the number of students wearing backpacks on campus in a given day? Direct Observation
-A manager at an electronics store wants to know if customers who purchased a video recorder over the last 12 months are satisfied with their products. If there are 4 different brands of video recorders made by the company, which sampling strategy would be best to use? Stratified random sample
-Which of the following types of samples are almost always biased? Self-selected samples
-____ is an expected error based only on the observations limited to a sample taken from a population. Sampling error
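The ideas of sampling error and stratified random sampling from this section can be illustrated with a small simulation. Everything in it is an assumption made for the sketch: the population of 1,000 customers, the four brand labels, the satisfaction scores, and the sample sizes.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1,000 customers, each with a brand (A-D)
# and a satisfaction score between 1 and 10.
brands = ["A", "B", "C", "D"]
population = [(brands[i % 4], rng.uniform(1, 10)) for i in range(1000)]
population_mean = sum(score for _, score in population) / len(population)

# Simple random sample of 100 customers.
srs = rng.sample(population, 100)
srs_mean = sum(score for _, score in srs) / len(srs)

# Sampling error: difference between the sample mean and the population mean.
sampling_error = srs_mean - population_mean

# Stratified random sample: 25 customers drawn at random from each brand,
# so every brand is guaranteed to be represented.
stratified = []
for brand in brands:
    stratum = [customer for customer in population if customer[0] == brand]
    stratified.extend(rng.sample(stratum, 25))

counts = {b: sum(1 for c in stratified if c[0] == b) for b in brands}
print(round(sampling_error, 3), counts)
```

The simple random sample leaves the number of customers per brand to chance, while the stratified sample fixes it, which is why stratification suits the four-brand question above.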
- Marginal probability is a measure of the likelihood that a particular event will occur, regardless of whether another event occurs.
  o Marginal probabilities are computed by adding across rows or down columns; they are so named because they are calculated in the margins of the table
    ▪ Ex. With MBA program grads running mutual funds, adding the top-MBA entries gives .40, so 40% of all mutual fund managers graduated from a top MBA program. The marginal probabilities must still add to 1, so the non-top-MBA probability is .60 or 60%
Conditional Probability
- Conditional probability is used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event.
  o It is called a conditional probability because we want to know the probability given a certain condition
  o Ex. The probability that a fund will outperform the market, given that the manager graduated from a top MBA program
- Conditional probabilities are written as P(A|B) read as the probability of A given B
- Calculating conditional probabilities raises the question of whether the two events are related
Independence
- One of the objectives in calculating a conditional probability is to determine whether the two events are related. In particular, we would like to know whether they are independent events.
- Two events are said to be independent if: P(A|B)=P(A) or P(B|A)=P(B)
- Independent - two events are independent if the probability of one event is not affected by the occurrence of another event.
- Ignore mutually exclusive combinations
- In each combination in the example the two events are dependent. In this type of problem, when one combination is dependent all 4 will be dependent, and vice versa. This rule does not apply to any other situation.
Union
- Union is another combination of events. The union of events A and B is the event that occurs when either A or B or both occur, denoted A ∪ B
- Ex. To determine the probability that a randomly selected fund outperforms the market or the manager graduated from a top MBA program, we need to compute the probability of the union of the two events. The union occurs when:
  o The fund outperforms the market and the manager graduated from a top MBA program
  o The fund outperforms the market and the manager did not graduate from a top MBA program
  o The fund does not outperform the market and the manager graduated from a top MBA program
Complement Rule
- The complement of event A is the event that occurs when event A does not occur. It is denoted Ac and consists of all sample points that are “not in A”
  o The complement rule defined here derives from the fact that the probability of an event and the probability of the event's complement must sum to 1
- The complement rule is P(Ac) = 1 - P(A) for any event A
  o Ex. In the roll of a die, the probability that the number “1” is rolled is 1/6, so the probability that some number other than “1” is rolled is 1 - 1/6 = 5/6
Multiplication Rule
- The multiplication rule is used to calculate the joint probability of two events. It is based on the formula for conditional probability defined earlier.
  o P(A|B) = P(A ∩ B)/P(B)
  o We derive the multiplication rule by multiplying both sides by P(B): P(A ∩ B) = P(B) × P(A|B)
  o Equivalently, the joint probability of any two events is P(A ∩ B) = P(A) × P(B|A)
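The rules covered so far (marginal and conditional probability, independence, union, complement, and multiplication) can all be traced through one joint probability table. The sketch below uses the mutual fund example; the four joint probabilities are assumed values chosen to agree with the .40 and .60 marginals quoted above, and the key names are illustrative.

```python
# Joint probabilities for the mutual fund example. The four cell values are
# assumptions chosen to match the .40/.60 marginals quoted in the notes.
joint = {
    ("top_mba", "outperforms"): 0.11,
    ("top_mba", "does_not"): 0.29,
    ("not_top_mba", "outperforms"): 0.06,
    ("not_top_mba", "does_not"): 0.54,
}


def marginal(index, value):
    """Add the joint probabilities across a row (index 0) or column (index 1)."""
    return sum(p for key, p in joint.items() if key[index] == value)


p_top = marginal(0, "top_mba")       # P(manager from a top MBA program)
p_out = marginal(1, "outperforms")   # P(fund outperforms the market)
p_both = joint[("top_mba", "outperforms")]

# Conditional probability: P(outperforms | top MBA) = P(both) / P(top MBA).
p_out_given_top = p_both / p_top

# Independence check: the events are independent only if P(A|B) = P(A).
independent = abs(p_out_given_top - p_out) < 1e-9

# Union (addition rule): P(A or B) = P(A) + P(B) - P(A and B).
p_union = p_top + p_out - p_both

# Complement rule: P(not top MBA) = 1 - P(top MBA).
p_not_top = 1 - p_top

# Multiplication rule recovers the joint probability: P(B) * P(A|B).
p_both_again = p_top * p_out_given_top

print(p_top, p_out, p_out_given_top, independent, p_union, p_not_top)
```

Under these assumed values the conditional probability (.275) differs from the marginal probability of outperforming (.17), so the two events are dependent.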
- In many examples conditional probability measures the probability that an event occurs given that a possible cause of the event has occurred.
- Bayes' law is the technique we use to compute the probability of one of the possible causes of a particular event
- Ex. An MBA applicant is considering a GMAT prep course
- P(A|B) = P(A ∩ B)/P(B) = .052/.259 = .201, so the chances are 20.1% when the prep course is taken
- Prior probabilities, P(A) and P(Ac), are so called because they are determined prior to the decision about taking the preparatory course
- Posterior probability (or revised probability), P(A|B), is so called because the prior probability is revised after the decision about taking the prep course
- Bayes' law can also be expressed as a formula for an algebraic approach
Identifying the correct method
- Key issue in determining which probability method to use is whether joint probabilities are provided or are required
- If joint probabilities are given:
  o We can compute marginal probabilities by adding across rows or down columns
  o We can use joint and marginal probabilities to compute conditional probabilities, for which a formula is available. This allows us to determine whether the events described by the table are independent or dependent.
  o We can also use the addition rule to compute the probability that either of the two events occurs
- If joint probabilities are required (not given):
  o We need to apply some or all 3 of the probability rules where one or more joint probabilities are required
  o Multiplication rule (either by formula or probability tree) to calculate the probability of intersections
  o Addition rule for mutually exclusive events when we want to add the joint probabilities
  o Complement rule to determine the probability of an event that occurs when another event does not occur
  o Bayes' law to calculate new conditional probabilities
- First step in assigning a probability is to create an exhaustive and mutually exclusive list of outcomes.
- Second step is to use the classical, relative frequency, or subjective approach and assign probabilities to the outcomes. A variety of methods are available to compute the probability of other events. These methods include probability rules and trees. An important application of these rules is Bayes' law, which allows us to compute conditional probabilities from other forms of probability
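Bayes' law, as applied in the GMAT prep-course example, can be sketched in a few lines. The notes quote only P(A ∩ B) = .052 and P(B) = .259; the prior and the two conditional probabilities below are assumptions chosen to reproduce those figures, with A standing for the outcome of interest and B for taking the prep course.

```python
# GMAT prep-course example. The prior and the two likelihoods below are
# assumptions chosen to reproduce the .052 and .259 quoted in the notes.
p_a = 0.10            # prior P(A), assumed
p_b_given_a = 0.52    # P(B|A): took the prep course given A, assumed
p_b_given_ac = 0.23   # P(B|Ac): took the prep course given not A, assumed

# Complement rule gives the other prior.
p_ac = 1 - p_a

# Multiplication rule gives the two joint probabilities involving B.
p_a_and_b = p_a * p_b_given_a
p_ac_and_b = p_ac * p_b_given_ac

# Addition rule for mutually exclusive events gives the marginal P(B).
p_b = p_a_and_b + p_ac_and_b

# Bayes' law: posterior P(A|B) = P(A and B) / P(B).
posterior = p_a_and_b / p_b

print(round(p_a_and_b, 3), round(p_b, 3), round(posterior, 3))
```

The prior of .10 is revised upward to a posterior of about .201 once it is known that the prep course was taken.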
QUIZ Section 3 Bayes’ Law is used to compute. Posterior Probabilities
The classical approach describes a probability. In terms of the proportion of times that an event can be theoretically expected to occur
If a set of events includes all possible outcomes of an experiment these events are considered to be. Exhaustive
Which statement is not correct? If event A does not occur, then its complement Ac will also not occur
parameters precisely
  o As a consequence, the finite population correction factor is usually omitted
- If X is normal, X̄ (the sample mean) is normal. If X is non-normal, X̄ is approximately normal for sufficiently large sample sizes. The definition of sufficiently large depends on the extent of non-normality of X.
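A quick simulation gives a feel for this result. The sketch below tosses a fair die, a clearly non-normal variable, and looks at how the means of repeated samples behave. The sample size of 30, the 5,000 repetitions, and the seed are arbitrary choices made for illustration.

```python
import random
import statistics
from math import sqrt

rng = random.Random(0)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Mean of n tosses of a fair six-sided die."""
    return statistics.mean([rng.randint(1, 6) for _ in range(n)])


n = 30                                   # tosses per sample (arbitrary)
means = [sample_mean(n) for _ in range(5000)]

# Population values for a single fair die.
faces = [1, 2, 3, 4, 5, 6]
mu = statistics.mean(faces)              # 3.5
sigma = statistics.pstdev(faces)         # about 1.708

# The sample means should center on mu with spread close to sigma / sqrt(n).
mean_of_means = statistics.mean(means)
sd_of_means = statistics.pstdev(means)
theoretical_sd = sigma / sqrt(n)

print(round(mean_of_means, 3), round(sd_of_means, 3), round(theoretical_sd, 3))
```

Plotting a histogram of `means` would show the roughly bell-shaped pattern described above, even though a single toss is uniformly distributed.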
Creating the Sampling Distribution Empirically
- To create the sampling distribution empirically, we can actually toss the dice repeatedly, calculating the mean for each sample, counting the number of times each value of X̄ occurs, and computing the relative frequencies to estimate the theoretical probabilities.
- The disadvantage is the excessive amount of time this takes
Contents of a 32-oz bottle
- Ex. A foreman at a bottling plant observed that the amount of soda in a 32-oz bottle is actually a normally distributed random variable, with a mean of 32.2 oz and a standard deviation of .3 oz
  o We want to find P(X > 32), where X is normally distributed with μ = 32.2 and σ = .3
  o P(X > 32) = P(Z > (32 - 32.2)/.3) = P(Z > -.67) = 1 - .2514 = .7486
  o There is about a 75% chance that a bottle of soda contains more than 32 oz
Salaries of business school graduates
- We want to find the probability that the sample mean is less than $750 (earned per week by business school graduates): P(X̄ < 750)
- The distribution of X, the weekly income, is likely to be positively skewed, but not sufficiently so to make the distribution of X̄ non-normal. As a result we may assume that X̄ is normal with mean μx̄ = μ = 800 and standard deviation σx̄ = 20 (the value implied by the z score of -2.5 in the next step)
- Thus P(X̄ < 750) = P(Z < (750 - 800)/20) = P(Z < -2.5) = .5 - .4938 = .0062
- The probability of observing a sample mean as low as $750 when the population mean is $800 is extremely small, so the event is quite unlikely
Using the Sampling Distribution for Inference
- P(-1.96 < Z < 1.96) = .95
- The middle 95% of a normal distribution leaves tails of .025 on both the left and right of the distribution. The z scores associated with those tails of .025 are ±1.96
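The normal-distribution calculations in the bottle and salary examples, along with the ±1.96 cutoffs, can be reproduced with `statistics.NormalDist` from the Python standard library. This is a sketch rather than the method used in the course, which relies on the z table; the standard error of 20 for the salary example is the value implied by a z score of -2.5 and is treated here as an assumption.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Bottle example: X is normal with mean 32.2 oz and standard deviation .3 oz.
bottle = NormalDist(mu=32.2, sigma=0.3)
p_more_than_32 = 1 - bottle.cdf(32)

# The z table rounds z to -.67, which gives a slightly different answer.
p_table_style = 1 - standard_normal.cdf(-0.67)

# Salary example: sample mean is normal with mean 800 and an assumed
# standard error of 20 (the value implied by z = -2.5).
salary_mean = NormalDist(mu=800, sigma=20)
p_below_750 = salary_mean.cdf(750)

# Middle 95% of the standard normal and the cutoff leaving .025 in each tail.
p_middle = standard_normal.cdf(1.96) - standard_normal.cdf(-1.96)
z_cutoff = standard_normal.inv_cdf(0.975)

print(round(p_more_than_32, 4), round(p_table_style, 4))
print(round(p_below_750, 4), round(p_middle, 4), round(z_cutoff, 2))
```

The small gap between the exact bottle probability and the table-style value comes only from rounding z to two decimals before looking it up.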
Sampling Distribution of a proportion
- The sample proportion is the proportion of successes when we are looking only for a yes-or-no outcome. A success is getting the outcome we are interested in, even if that outcome is a broken item.
- These are binomial experiments because they have only two outcomes (success or failure), so the number of successes has a binomial distribution
  o The binomial distribution is a discrete distribution because it can take on only whole-number values
  o The binomial distribution parameter is p, the probability of success in any trial
- To compute binomial probabilities we have to assume that p is known. In the real world, however, p is unknown, requiring a statistics practitioner to estimate its value from a sample.
  o The sample proportion is the estimator of a population proportion; that is, we count the number of successes in a sample and compute P(hat) = X/n, where X is the number of successes and n is the sample size.
  o When we take a sample of size n, we are actually conducting a binomial experiment, and as a result X is binomially distributed. Thus the probability of any value of P(hat) can be calculated from the corresponding value of X
  o Suppose we have a binomial experiment with n = 10 and p = .4. To find the probability that the sample proportion P(hat) is less than or equal to .50, we find the probability that X is less than or equal to 5, because 5/10 = .50
    ▪ P(P(hat) ≤ .50) = P(X ≤ 5) = .8338
    ▪ We can calculate the probability associated with other values of P(hat) similarly
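The n = 10, p = .4 calculation above can be checked directly from the binomial formula. The helper name `binomial_cdf` is an illustrative choice; the sketch uses only `math.comb` from the standard library.

```python
from math import comb


def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial random variable with n trials and success probability p."""
    return sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(k + 1))


n, p = 10, 0.4

# P(P(hat) <= .50) is the same event as P(X <= 5), because 5/10 = .50.
k = int(0.50 * n)
p_hat_at_most_half = binomial_cdf(k, n, p)

print(round(p_hat_at_most_half, 4))
```

Probabilities for other values of the sample proportion follow the same pattern: convert the proportion to a count of successes, then sum the binomial probabilities up to that count.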