STATISTICS
UNIT 1: PROBABILITY
1. Random experiment and basics of probability
Important terms
- Random experiment: a process leading to an uncertain outcome ( e.g. tossing a coin: there are two possible outcomes, but we don't know in advance which one will occur )
- Basic outcome: a possible outcome of a random experiment ( e.g. when you graduate, the possible outcomes include finding a full-time job, finding a part-time job, pursuing further education… )
- Sample space: the collection of all possible outcomes of a random experiment
- Event: any subset of basic outcomes from the sample space. For example, finding a job
Intersection of events: if A and B are two events in a sample space S, then the intersection, A ∩ B, is the set of all outcomes in S that belong to both A and B. For example, if A is finding a job and B is education, the intersection is education combined with a part-time job
- A: finding a job ( FT job, PT job, PT job + education )
- B: education ( education, education + PT job )
- C: year off
Between A and C there is no intersection: A ∩ C = Ø, the empty set
A and B are mutually exclusive events if they have no basic outcomes in common
- A ∩ B ≠ Ø: A and B are not mutually exclusive
- A ∩ B = Ø: A and B are mutually exclusive
For example, when rolling a die
- A: even numbers
- B: odd numbers
- Intersection: none
Union of events: if A and B are two events in a sample space S, then the union, A ∪ B, is the set of all outcomes in S that belong to either A or B
[Figure: Venn diagram in which the entire shaded area represents A ∪ B]
Collectively exhaustive events: when the union of the events is equal to the sample space, we say they are collectively exhaustive
- The complement of an event A is the set of all basic outcomes in the sample space that do not belong to A. The complement is denoted Ā
- A ∪ Ā = S ( A and Ā are collectively exhaustive )
- A ∩ Ā = Ø ( A and Ā are mutually exclusive )
EXAMPLE
- Sample space: the possible outcomes of rolling a die, S = { 1, 2, 3, 4, 5, 6 }
- A: number rolled is even → A = { 2, 4, 6 }
- B: number rolled is at least 4 → B = { 4, 5, 6 }
- Complements: Ā = { 1, 3, 5 }, B̄ = { 1, 2, 3 }
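The die example maps directly onto set operations. The following is a minimal sketch using Python's built-in sets; the variable names are illustrative and not part of the notes.

```python
# Die example: events represented as Python sets
S = {1, 2, 3, 4, 5, 6}   # sample space
A = {2, 4, 6}            # number rolled is even
B = {4, 5, 6}            # number rolled is at least 4

intersection = A & B     # outcomes that belong to both A and B
union = A | B            # outcomes that belong to A or B
complement_A = S - A     # outcomes in S that are not in A
complement_B = S - B     # outcomes in S that are not in B

print(intersection, union, complement_A, complement_B)
print((A | complement_A) == S)      # A and its complement are collectively exhaustive
print((A & complement_A) == set())  # A and its complement are mutually exclusive
```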
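We have 8 candidates, 5 men and 3 women, for 4 positions. A = no women are hired
Denominator ( ways to choose 4 of the 8 candidates ): $C^{8}_{4} = \frac{8!}{4!\,4!} = 70$
Numerator ( ways to choose 4 of the 5 men ): $C^{5}_{4} = 5$
P(A) = 5/70 = 1/14 ≈ 0.0714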
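The counting in the hiring example can be checked with the standard library. This is a small sketch, assuming each group of 4 candidates is equally likely to be hired; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

total = comb(8, 4)      # ways to choose 4 of the 8 candidates
no_women = comb(5, 4)   # ways to choose all 4 hires from the 5 men

p_A = Fraction(no_women, total)  # P(A) = favourable outcomes / total outcomes
print(total, no_women, p_A, float(p_A))
```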
Rolling a 1: P = 1/6 ≈ 0.1667
In n = 100 rolls, a 1 comes up n_A = 15 times → P(A) = n_A / n = 15/100 = 0.15
A = family with income > 75000
- Stated belief: P(A) is at least 40%
- Census → P(A) = 31496 / 54343 ≈ 0.5796
- Tax agency → P(A) = 32047 / 55100 ≈ 0.5816
Probability postulates
- If A is any event in the sample space S, then $0 \le P(A) \le 1$
- Let A be an event in S, and let $O_i$ denote the basic outcomes. Then $P(A) = \sum_{A} P(O_i)$, where the sum runs over all the basic outcomes in A
- P(S) = 1 → the probability of the sample space is ALWAYS 1
Example: A = { 2, 4, 6 } → P(A) = 3/6 = 0.5, or P(A) = P(2) + P(4) + P(6) = 1/6 + 1/6 + 1/6 = 3/6
Probability rules
- The addition rule: the probability of the union of two events is $P(A \cup B) = P(A) + P(B) - P(A \cap B)$
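As a quick check of the addition rule, the sketch below reuses the die events from the earlier example (A = even, B = at least 4) and assumes all six outcomes are equally likely. The helper `prob` is a hypothetical name introduced only for this illustration.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}   # sample space of a fair die
A = {2, 4, 6}            # even number
B = {4, 5, 6}            # at least 4

def prob(event):
    """Probability of an event when all outcomes in S are equally likely."""
    return Fraction(len(event), len(S))

lhs = prob(A | B)                       # P(A ∪ B) computed directly
rhs = prob(A) + prob(B) - prob(A & B)   # P(A) + P(B) - P(A ∩ B)
print(lhs, rhs)
```

Subtracting P(A ∩ B) avoids counting the outcomes 4 and 6 twice.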
Probability table
Addition rule example
2. Conditional probability and statistical independence
Conditional probability
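A conditional probability is the probability of one event, given that another event has occurred:
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, provided $P(B) > 0$ → the conditional probability of A given that B has occurred
Example ( one card drawn from a standard 52-card deck ): P( Red ∩ Ace ) = 2/52, which agrees with the multiplication rule P( Red | Ace ) · P( Ace ) = 2/4 · 4/52 = 2/52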
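The card example can be reproduced by enumerating a deck. This is a sketch that assumes a standard 52-card deck with every card equally likely; the names `colour_of` and `prob` are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
colour_of = {"hearts": "red", "diamonds": "red", "clubs": "black", "spades": "black"}
deck = list(product(ranks, colour_of))   # 52 (rank, suit) pairs

def prob(cards):
    """Probability of drawing one of the given cards from the full deck."""
    return Fraction(len(cards), len(deck))

aces = [card for card in deck if card[0] == "A"]
red_aces = [card for card in aces if colour_of[card[1]] == "red"]

p_ace = prob(aces)                        # P(Ace) = 4/52
p_red_and_ace = prob(red_aces)            # P(Red ∩ Ace) = 2/52
p_red_given_ace = p_red_and_ace / p_ace   # P(Red | Ace) = P(Red ∩ Ace) / P(Ace)
print(p_red_given_ace, p_red_given_ace * p_ace == p_red_and_ace)
```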
Statistical independence
Two events are statistically independent if and only if: $P(A \cap B) = P(A)\,P(B)$
- Events A and B are independent when the probability of one event is not affected by the occurrence of the other → knowing that one occurred tells us nothing about the other
- If A and B are independent, event B does not affect the probability of event A. In probability terms, the conditional probability is equal to the unconditional probability: the probability of A given B is the same as the probability of A → P( A | B ) = P(A)
- If A and B are independent then $P(A \mid B) = P(A)$ if $P(B) > 0$, and $P(B \mid A) = P(B)$ if $P(A) > 0$
Statistical independence example
- Of the cars on a used car lot, 70% have air conditioning ( AC ) and 40% have a CD player. 20% of the cars have both
- Are the events AC and CD statistically independent?
- Is P( AC ∩ CD ) = P(AC) · P(CD)?
- P( AC ∩ CD ) = 0.2 ≠ P(AC) · P(CD) = 0.7 · 0.4 = 0.28, so AC and CD are not independent
Mutually exclusive is not the same as independent
- A ∩ B = Ø
- P( A ∩ B ) = 0
- Consider an event and its complement, AC and non-AC. They are mutually exclusive, because a car that has AC cannot also lack AC, so the probability of the intersection is 0. They are also perfectly related rather than independent: when one occurs, the other cannot
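The independence check for the used car lot can be written out in a few lines. This is a minimal sketch using the percentages from the example; `isclose` is used because floating-point products are not exact.

```python
from math import isclose

p_ac = 0.7     # P(AC)
p_cd = 0.4     # P(CD)
p_both = 0.2   # P(AC ∩ CD)

product_of_marginals = p_ac * p_cd                   # value P(AC ∩ CD) would take under independence
independent = isclose(p_both, product_of_marginals)  # compare with a tolerance
p_ac_given_cd = p_both / p_cd                        # P(AC | CD), to compare with P(AC)

print(product_of_marginals, independent, p_ac_given_cd)
```

Here P( AC | CD ) differs from P(AC), which is another way of seeing that the two events are not independent.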
Bivariate probabilities
        B1            B2            …    Bk
A1      P(A1 ∩ B1)    P(A1 ∩ B2)    …    P(A1 ∩ Bk)
A2      P(A2 ∩ B1)    P(A2 ∩ B2)    …    P(A2 ∩ Bk)
…       …             …             …    …
Ah      P(Ah ∩ B1)    P(Ah ∩ B2)    …    P(Ah ∩ Bk)
Example
A = frequency watching Netflix → { regular, occasional, never }
Overinvolvement ratio
- B1 = buyer
- B2 = non buyer
- A1 = seen new ad
- This indicates whether the ad is effective or not
- The probability of event A1 conditional on event B1, divided by the probability of A1 conditional on event B2, is defined as the overinvolvement ratio: $\frac{P(A_1 \mid B_1)}{P(A_1 \mid B_2)}$
- An overinvolvement ratio greater than 1 implies that event A1 increases the conditional odds ratio in favor of B1
- Example: overinvolvement ratio = P( A1 | B1 ) / P( A1 | B2 ) = 2 > 1 → the new ad is effective
3. Bayes Theorem
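- $P(E_i \mid A) = \frac{P(A \mid E_i)\,P(E_i)}{P(A)} = \frac{P(A \mid E_i)\,P(E_i)}{P(A \mid E_1)\,P(E_1) + P(A \mid E_2)\,P(E_2) + \dots + P(A \mid E_k)\,P(E_k)}$
- Where $E_i$ = ith event of k mutually exclusive and collectively exhaustive events
- A = new event that might impact $P(E_i)$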
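Bayes' theorem for k mutually exclusive and collectively exhaustive events can be sketched as a short function. The function name `posteriors` and its argument names are hypothetical, and the numbers passed in are those of the drilling example that follows in the notes.

```python
def posteriors(priors, likelihoods):
    """Return P(E_i | A) for each i, given P(E_i) and P(A | E_i).

    Assumes the events E_i are mutually exclusive and collectively
    exhaustive, so that the priors sum to 1.
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(A | E_i) * P(E_i)
    p_a = sum(joint)                                      # P(A), the denominator
    return [j / p_a for j in joint]

# Drilling example: E_1 = successful well, E_2 = unsuccessful well, A = detailed test
priors = [0.4, 0.6]        # P(S), P(U)
likelihoods = [0.6, 0.2]   # P(D | S), P(D | U)
post = posteriors(priors, likelihoods)
print(post)
```

The first entry of `post` is P( S | D ) and the second is P( U | D ); together they sum to 1.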
Bayes’ Theorem Example
- A drilling company has estimated a 40% chance of striking oil for their new well
- A detailed test has been scheduled for more information. Historically 60% of successful wells have had detailed tests, and 20% of unsuccessful wells have had detailed tests
- Given that this well has been scheduled for a detailed test, what is the probability that the well will be successful?
- S = successful well
- U = unsuccessful well
- D = detailed test
- ND = no detailed test
- $P(S \mid D) = \frac{P(D \mid S)\,P(S)}{P(D)}$
- P(S) = 0.4, so P(U) = 0.6
- P( D | S ) = 0.6
- P( D | U ) = 0.2
- P(D) = P( D | S ) · P(S) + P( D | U ) · P(U) = 0.6 · 0.4 + 0.2 · 0.6 = 0.36
- Equivalently, P(D) = P( D ∩ S ) + P( D ∩ U ) = 0.24 + 0.12 = 0.36
- P( S | D ) = 0.24 / 0.36 ≈ 0.667
Chapter 5: DISCRETE RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS
Introduction to probability distributions
- Random variable: represents a possible numerical value from a random experiment
- Discrete: countable ( number of customers arriving / number of sales from 10 customers )
- Continuous: any value in a continuum ( maximum daily temperature / yearly income )
Discrete random variables
- Can only take on a countable number of values
- Roll a die twice and let X be the number of times 4 comes up ( 0, 1, 2 )
- Toss a coin 5 times and let X be the number of heads ( 0, 1, 2, 3, 4, 5 )
Discrete probability distribution
Experiment: toss 2 coins and let X be the number of heads
- P(0) = 0.25, P(1) = 0.5, P(2) = 0.25
- P( at least 1 head ) = P(1) + P(2) = 0.5 + 0.25 = 0.75
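The distribution for the two-coin experiment can be built by listing the four equally likely outcomes. This is a small sketch assuming fair coins; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

outcomes = list(product("HT", repeat=2))                 # HH, HT, TH, TT
counts = Counter(outcome.count("H") for outcome in outcomes)

# Probability function of X = number of heads
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

p_at_least_one = sum(p for x, p in pmf.items() if x >= 1)  # P(1) + P(2)
print(pmf, p_at_least_one)
```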
Functions of random variables
- If P(x) is the probability function of a discrete random variable X and g(X) is some function of X, then the expected value of function g is $E[g(X)] = \sum_{x} g(x)\,P(x)$
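The expected value of a function of X is a probability-weighted sum, which can be sketched as follows. The probability function used here is the one for the number of heads in two fair coin tosses, and the helper name `expected` is hypothetical.

```python
from fractions import Fraction

# Probability function of X = number of heads in two fair coin tosses
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def expected(g, pmf):
    """E[g(X)]: sum of g(x) * P(x) over all values x of X."""
    return sum(g(x) * p for x, p in pmf.items())

mean = expected(lambda x: x, pmf)                 # g(x) = x gives E[X]
second_moment = expected(lambda x: x ** 2, pmf)   # g(x) = x^2 gives E[X^2]
variance = second_moment - mean ** 2              # Var(X) = E[X^2] - (E[X])^2
print(mean, second_moment, variance)
```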
Linear functions of random variables
- Let a and b be any constants
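- a) $E(a) = a$ and $Var(a) = 0$
- if a random variable always takes the value a, it will have mean a and variance 0
- b) $E(bX) = b\,\mu_X$ and $Var(bX) = b^2\,\sigma^2_X$
- the expected value of b·X is b·E(X)
- Let random variable X have mean $\mu_X$ and variance $\sigma^2_X$
- Let a and b be any constants
- Let Y = a + bX
- Then the mean and variance of Y are $\mu_Y = E(a + bX) = a + b\,\mu_X$ and $\sigma^2_Y = Var(a + bX) = b^2\,\sigma^2_X$
- So that the standard deviation of Y is $\sigma_Y = |b|\,\sigma_X$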
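The rules for Y = a + bX can be verified numerically by computing the mean and variance of Y directly from its distribution. In this sketch the constants a = 3 and b = -2 are arbitrary illustrative values, and X is again the number of heads in two fair coin tosses.

```python
from fractions import Fraction

pmf_x = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
a, b = 3, -2   # arbitrary constants chosen for illustration

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

# Distribution of Y = a + bX: each value of X maps to one value of Y (b != 0)
pmf_y = {a + b * x: p for x, p in pmf_x.items()}

mean_y_direct, var_y_direct = mean(pmf_y), variance(pmf_y)
mean_y_rule = a + b * mean(pmf_x)         # a + b * mu_X
var_y_rule = b ** 2 * variance(pmf_x)     # b^2 * sigma_X^2
print(mean_y_direct, mean_y_rule, var_y_direct, var_y_rule)
```

Because the variance is multiplied by b squared, the sign of b does not matter, which is why the standard deviation uses |b|.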
Probability functions
The binomial distribution
Bernoulli Distribution
- Building block for binomial
- Consider only two outcomes: “ success” or “failure”
- Let P denote the probability of success
- Let 1 - P be the probability of failure
- Define random variable X:
- x = 1 if success, x = 0 if failure
- Then the Bernoulli probability function is P(0) = 1 - P and P(1) = P
- Bernoulli probability distribution with P = 0.4:
- P(0) = P( X = 0 ) = 1 - P = 0.6
- P(1) = P( X = 1 ) = P = 0.4
- A marketing research firm receives survey responses of "yes, I will buy" or "no, I will not"
- New job applicants either accept the offer or reject it
Binomial distribution formula
$P(x) = \frac{n!}{x!\,(n-x)!}\,P^{x}\,(1-P)^{n-x}$
- P(x) = the probability of x successes in n trials, with probability of success P on each trial
- x = number of successes in the sample ( x = 0, 1, 2, ..., n )
- n = sample size ( number of trials or observations )
- P = probability of success
Example: calculating a binomial probability
- What is the probability of one success in five observations if the probability of success is 0.1? x = 1, n = 5, and P = 0.1
- $P(1) = \frac{5!}{1!\,4!}\,(0.1)^{1}\,(0.9)^{4} = 5 \cdot 0.1 \cdot 0.6561 = 0.32805$
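The binomial formula translates into a one-line function. This is a minimal sketch using only the standard library; the function name `binomial_pmf` is hypothetical.

```python
from math import comb

def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials,
    each with success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

p_one = binomial_pmf(1, 5, 0.1)                        # one success in five observations
total = sum(binomial_pmf(x, 5, 0.1) for x in range(6)) # probabilities over x = 0..5
print(round(p_one, 5), round(total, 10))
```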
Binomial distribution
- The shape of the binomial distribution depends on the values of P and n
- [Figures: binomial distributions for n = 5 at two different values of P]
The hypergeometric distribution
- The binomial distribution assumes a sample of n independent trials: the probability of success on any trial does not depend on the outcome of any other trial and is always P
- The hypergeometric distribution applies instead to n trials in a sample taken from a finite population of size N
- Sample taken without replacement
- Outcomes of trials are dependent
- Concerned with finding the probability of X successes in the sample where there are S successes in the population
Example: imagine a population of N = 15 with 9 women and 6 men. We draw a sample of 5 without replacement, X is the number of women in the sample, and we want P( X = 3 ). The probability that the first person drawn is a woman is 9/15, but if she is, the probability for the second draw is 8/14, so the probability of success changes from one draw to the next. This happens because the sample is large relative to the population and we are sampling without replacement
Hypergeometric distribution formula
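$P(x) = \frac{C^{S}_{x}\;C^{N-S}_{n-x}}{C^{N}_{n}}$
where N = population size, S = number of successes in the population, n = sample size and x = number of successes in the sample
Example with N = 8, S = 3 and n = 4:
$P(\text{at least 1 A}) = \frac{C^{3}_{1}\,C^{5}_{3}}{C^{8}_{4}} + \frac{C^{3}_{2}\,C^{5}_{2}}{C^{8}_{4}} + \frac{C^{3}_{3}\,C^{5}_{1}}{C^{8}_{4}} = \frac{30 + 30 + 5}{70} = \frac{65}{70} \approx 92.86\%$
or P( at least 1 A ) = 1 - P(0) = 1 - ( 5/8 · 4/7 · 3/6 · 2/5 ) = 92.86%, where
$P(0) = \frac{C^{3}_{0}\,C^{5}_{4}}{C^{8}_{4}} = \frac{5}{70} \approx 7.14\%$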
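The hypergeometric calculations above can be reproduced with exact fractions. This sketch assumes the usual parameterisation (N = population size, S = successes in the population, n = sample size); the function name `hypergeom_pmf` is hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, S, n):
    """Probability of x successes in a sample of n drawn without
    replacement from a population of N that contains S successes."""
    return Fraction(comb(S, x) * comb(N - S, n - x), comb(N, n))

# N = 8, S = 3, n = 4: at least one success, computed two ways
p_at_least_one = sum(hypergeom_pmf(x, 8, 3, 4) for x in range(1, 4))
p_zero = hypergeom_pmf(0, 8, 3, 4)

# Population of 15 with 9 women, sample of 5: exactly 3 women
p_three_women = hypergeom_pmf(3, 15, 9, 5)

print(p_at_least_one, 1 - p_zero, float(p_at_least_one))
print(p_three_women, float(p_three_women))
```

Using the complement, 1 - P(0), needs one term instead of three, which is why it is usually the quicker route for "at least one" questions.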
The Poisson distribution: apply when
- You wish to count the number of times an event occurs in a given continuous interval
- The probability that an event occurs in one subinterval is very small and is the same for all subintervals
- The number of events that occur in one subinterval is independent of the number of events that occur in the other subintervals
- There can be no more than one occurrence in each subinterval
- The average number of events per unit is λ ( lambda )
Poisson distribution formula
$P(x) = \frac{e^{-\lambda}\,\lambda^{x}}{x!}$
where
- x = number of successes per unit
- λ = expected number of successes per unit
- e = base of the natural logarithm system ( 2.71828... )
Examples:
- Number of people using an ATM per hour
- Customer service department receiving more than x calls in the next minute
- Employee receiving a certain number of emails in a given week
- A certain number of calls in a call center between 8 and 9 am
- At least x vehicles stopping at the traffic light during a 15-minute time period
Poisson distribution characteristics
- Mean: $\mu = E(X) = \lambda$
- Variance and standard deviation: $\sigma^2 = \lambda$ and $\sigma = \sqrt{\lambda}$, where λ = expected number of successes per unit
Poisson distribution example
A chair manufacturer knows that the number of defective chairs per thousand varies, with an average of 1.6 ( defective units per 1000 ). The number of defective chairs per 1000 can be modelled by the Poisson distribution with μ = 1.6
What is the probability of no more than 2 defective chairs in a randomly chosen sample of 1000 chairs? We will calculate P( X ≤ 2 ) in two ways
In Excel: =POISSON.DIST( x, λ, cumulative ), where the last argument is 0 to indicate a non-cumulative probability, P( X = x ), and 1 to indicate a cumulative probability, P( X ≤ x )
$P(X = 0) = \frac{e^{-2}\,2^{0}}{0!} = e^{-2} \approx 0.1353$
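The chair question can also be worked outside Excel. The sketch below takes λ = 1.6 from the stated average of defective chairs per 1000, which is an assumption about the intended parameter, and computes P( X ≤ 2 ) in two ways: by summing the individual probabilities and from the expanded closed form. The function name `poisson_pmf` is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """Probability of exactly x events when the expected number is lam."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 1.6   # assumed: average number of defective chairs per 1000

# Way 1: add the non-cumulative probabilities P(0) + P(1) + P(2)
p_sum = sum(poisson_pmf(x, lam) for x in range(3))

# Way 2: the same sum written out, e^(-lam) * (1 + lam + lam^2 / 2)
p_closed = exp(-lam) * (1 + lam + lam ** 2 / 2)

print(round(p_sum, 4), round(p_closed, 4))
```

Way 1 corresponds to three non-cumulative calls (last argument 0 in Excel) added together, while a single cumulative call (last argument 1) returns the total directly.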