




























































































The 60-day Engineering Statistics course, inspired by IIT, teaches essential statistical methods for engineers to analyze data, make informed decisions, and optimize processes.
Typology: Lecture notes





























































































In this course, Engineering Statistics, we will learn how to use statistics to make inferences from data and how to apply these methods in the emerging fields of artificial intelligence and machine learning. The course first revises basic probability and basic distributions, and then studies further distributions, including parameterized distributions and exponential families of distributions. Once we understand the basic distributions, we will discuss how to generate data from a given distribution, how to infer whether data comes from a certain distribution, and how to extract information from distributions. We will study hypothesis testing, compute p-values, and develop statistical tests such as the t-test and the F-test. Within hypothesis testing, we will also use nonparametric methods such as the chi-square test, the Kolmogorov-Smirnov test, and the Lilliefors test. Finally, we will use Python as a basic programming language and see how its statistics-related libraries can be used to understand data and analyze it to draw inferences.
Simplified Examples: Coin Toss and Dice Throw
To understand the concepts of probability, we start with simple examples such as a coin toss or the throw of a die. These examples involve random outcomes that are relatively easy to understand and reason about. When we talk about a random experiment, we need to understand its possible outcomes and events.
The sample space, denoted by Ω (omega), is the set of all possible outcomes of an experiment. For flipping a coin, the sample space is {H, T} (heads or tails). Rolling a die has six possible outcomes, represented by the numbers 1 to 6. In more complex scenarios, the sample space can be a continuous interval, such as a range of room temperatures.
An event is any subset of the sample space. For a coin flip, the events are the empty set, {H} (heads), {T} (tails), and {H, T} (either heads or tails). For a die roll, examples of events are the outcomes divisible by 2, {2, 4, 6}; the odd outcomes, {1, 3, 5}; and the outcomes divisible by 3, {3, 6}. A sample space with n elements has 2^n events, counting the empty set and the whole sample space. This count applies only to finite sample spaces.
Operations can be performed on events. For example, finding the chance that a die roll is both even and divisible by 3 involves combining two events. We say that an event occurs when one of the outcomes in it happens.
Complement: The complement of an event E, denoted E', is the set of outcomes not in E. The union of E and its complement is the entire sample space Ω. The intersection of E and its complement is the null set, as they have no common elements.
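The counting claim for finite sample spaces, and the basic set operations on events, can be checked directly with Python sets. This is only a minimal sketch: the helper name all_events and the variable names are illustrative and do not come from the lecture.

```python
from itertools import chain, combinations

def all_events(sample_space):
    """Return every subset (event) of a finite sample space as a frozenset."""
    outcomes = list(sample_space)
    subsets = chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1)
    )
    return [frozenset(s) for s in subsets]

coin = {"H", "T"}
die = {1, 2, 3, 4, 5, 6}

coin_events = all_events(coin)   # 2**2 events
die_events = all_events(die)     # 2**6 events

even = {2, 4, 6}
div_by_3 = {3, 6}
even_and_div_by_3 = even & div_by_3   # intersection of two events
even_or_div_by_3 = even | div_by_3    # union of two events
not_even = die - even                 # complement of "even" within the sample space

print(len(coin_events), len(die_events))
print(even_and_div_by_3, even_or_div_by_3, not_even)
```

The event "even and divisible by 3" comes out as the single outcome 6, and an event together with its complement covers the whole sample space while sharing no outcome.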
Union and Intersection: For two events E and F, the union E ∪ F is the set containing all elements in either E or F. The intersection E ∩ F is the set containing all elements common to both E and F.
Mutually Exclusive: Two events E and F are mutually exclusive if they have no common elements, that is, their intersection is the null set.
Probability: Probability is a measure of the likelihood of an event occurring. A probability space consists of a sample space Ω, a σ-algebra F (a collection of subsets of Ω, the events), and a probability function P that assigns a value between 0 and 1 to each event in F.
Basic Properties of Probability:
Nonnegativity: The probability of any event is nonnegative.
Normalization: The probability of the entire sample space Ω is 1.
Finite Additivity: If two events E1 and E2 are mutually exclusive, the probability of their union equals the sum of their individual probabilities, P(E1 ∪ E2) = P(E1) + P(E2).
Now, I am interested in the event E, which consists of the even numbers 2, 4, 6, and so on, and I want to find its probability. The probability of E can be thought of as the probability of 2 happening, or 4 happening, or 6 happening, and so on. Are the events "2 happens" and "4 happens" mutually exclusive? Yes, they are. So I can treat each of these single-outcome events as mutually exclusive and add up their probabilities to get the probability of E. But how many terms do I need to add? Let's go back to the basics of probability. First, a probability should be greater than or equal to 0, which clearly holds here. Second, the probabilities of all possible outcomes should sum to 1. Here the outcomes are 1, 2, 3, 4, and so on without end, and their probabilities do sum to 1. Now I can write E as a union of events: let E1 = {2}, E2 = {4}, and so on. E is the union of these events, and they are mutually exclusive. For mutually exclusive events, the probability of the union is found by adding the individual probabilities. Here, however, I need to add countably many mutually exclusive events, so I need to extend the third axiom of probability: if E1, E2, and so on is a sequence of mutually exclusive events defined on the same sample space Ω, then the probability of their union is the sum of their individual probabilities, P(E1 ∪ E2 ∪ ...) = P(E1) + P(E2) + .... This is how we extend finite additivity to countably many mutually exclusive events. But what about the uncountable case?
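To make countable additivity concrete, the sketch below assumes a specific assignment of probabilities that the lecture does not give: outcome n has probability 1/2^n. Under that assumption the outcome probabilities sum to 1, and the probability of the even outcomes is the sum of 1/4 + 1/16 + ..., which converges to 1/3. The truncation point N and the function name p_outcome are illustrative choices.

```python
from fractions import Fraction

def p_outcome(n):
    """Assumed probability of outcome n (n = 1, 2, 3, ...): P({n}) = 1 / 2**n."""
    return Fraction(1, 2 ** n)

N = 60  # truncation point for the infinite sums (illustrative)

# Normalization: the outcome probabilities should add up to (almost) 1.
total = sum(p_outcome(n) for n in range(1, N + 1))

# Countable additivity: P(E) is the sum over the disjoint events {2}, {4}, {6}, ...
p_even = sum(p_outcome(n) for n in range(2, N + 1, 2))

print(float(total))
print(float(p_even))
```

Increasing N only adds smaller and smaller terms, so the partial sums settle down, which is what the infinite sum in the extended axiom means.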
Let's consider an example to understand the uncountable case. Imagine a square region described by coordinates x and y, each ranging from 0 to 1, and suppose the outcome of the experiment is a point in this region. For a proper probability function, the probability of the whole region, that is, of all (x, y) with x and y between 0 and 1, must be 1. Now suppose we assign a strictly positive number p(x, y) to every point in the region. Because there are uncountably many points, the sum of these numbers cannot be 1: it is infinite. This contradicts the normalization requirement. Therefore p(x, y) must be 0 for some points; in fact, it can be strictly positive for at most countably many of them. This example shows that additivity does not extend from the countable case to the uncountable case: the probability of an uncountable union of single points cannot in general be obtained by adding up the point probabilities. This is a general rule and not specific to this example. Note that we are not concerned with the actual masses assigned to each point, only with the fact that some of them must be 0 for the overall probability function to be valid.
Probability and statistics admit various interpretations and applications. A recurring idea is that of a limit, for example the behavior of a quantity as the number of repetitions of an experiment grows to infinity. Working with infinitely many points or repetitions can be challenging, but it is essential for understanding probability and statistics.
Probability can be interpreted in different ways. One interpretation is based on likelihood: outcomes that are equally likely are assigned equal probabilities. Another is the frequentist view, which takes the probability of an event to be the fraction of times the event occurs when the experiment is repeated indefinitely.
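The frequentist view can be illustrated with a small simulation. The sketch below assumes a fair die simulated with Python's pseudo-random generator and tracks the fraction of rolls that come up even; the function name relative_frequency, the seed, and the sample sizes are illustrative choices, not part of the lecture.

```python
import random

def relative_frequency(num_rolls, seed=0):
    """Fraction of simulated fair-die rolls that come up even."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = sum(1 for _ in range(num_rolls) if rng.randint(1, 6) % 2 == 0)
    return hits / num_rolls

# The fraction should drift toward 3/6 = 0.5 as the number of rolls grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(n))
```

A finite simulation can only suggest the limit; the frequentist definition refers to what happens as the number of repetitions grows without bound.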
In probability, we often want to determine whether the occurrence of one event tells us anything about another. Consider rolling a die twice, and two events e and f: e is the event that the sum of the two outcomes is 6, and f is the event that the first outcome is 4. The probability of e is 5/36, while the probability of f is 6/36 (1/6). By the definition of independence, e and f are independent if the probability of their intersection equals the product of their probabilities; otherwise they are dependent. Here the intersection requires the first outcome to be 4 and the second to be 2, so its probability is 1/36, whereas the product is (5/36)(1/6) = 5/216. The two differ, so e and f are dependent. Intuitively, the first outcome does provide information about the sum being 6: if the first outcome is 4, a sum of 6 is still possible, but if the first outcome is 6, it is not.
Now suppose instead that we are interested in the sum being 7, with the first outcome again being 4. There is still a chance that the sum is 7, so it may look as though these events are also dependent. However, the probability of the sum being 7 is 6/36 = 1/6, the probability of the first outcome being 4 is 1/6, and the probability of their intersection (first outcome 4, second outcome 3) is 1/36, which equals the product of the individual probabilities. So, by the definition, the events are independent. Why does the apparent dependency disappear? Whatever the first outcome is, any number from 1 to 6, exactly one value of the second outcome makes the sum 7. Knowing whether or not the first outcome is 4 therefore does not improve our knowledge about the sum being 7, and the events are truly independent.
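The two-dice independence checks can be verified by enumerating all 36 equally likely outcomes. This is a minimal sketch assuming two rolls of a fair die; the helper names prob and independent are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first, second) of rolling a fair die twice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event given as a predicate on (first, second)."""
    favourable = sum(1 for o in outcomes if event(o))
    return Fraction(favourable, len(outcomes))

def independent(a, b):
    """Check the definition: P(A and B) == P(A) * P(B)."""
    return prob(lambda o: a(o) and b(o)) == prob(a) * prob(b)

def first_is_4(o):
    return o[0] == 4

def sum_is_6(o):
    return o[0] + o[1] == 6

def sum_is_7(o):
    return o[0] + o[1] == 7

print(prob(sum_is_6), prob(sum_is_7), prob(first_is_4))
print(independent(sum_is_6, first_is_4))   # sum 6 vs. first roll 4
print(independent(sum_is_7, first_is_4))   # sum 7 vs. first roll 4
```

Running it shows the product rule failing for "sum is 6" and holding for "sum is 7", in line with the argument that every first roll leaves exactly one way to reach 7.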
When dealing with more than two events, we can extend the definition of independence. A finite set of events e1, e2, e3, up to en is independent if, for every subset of these events, the probability of the intersection equals the product of the individual probabilities. For example, for the subset e1, e3, and e4, we need to check that the probability of their intersection equals the product of their three probabilities, and this must hold for all subsets. The number of subsets to check grows exponentially with the number of events, which makes the check expensive. Instead, we often use a weaker notion called pairwise independence. Here we consider only pairs of events and check that the probability of each pair's intersection equals the product of the two probabilities. This simplifies the calculation, as we only need to check n choose 2 conditions (one per pair of events), which is quadratic in n. It is important to note that pairwise independence does not imply independence of the entire set of events.
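A standard example, not taken from the lecture, shows why pairwise independence is weaker. Assume two fair coin tosses and three events: A (first toss is heads), B (second toss is heads), and C (both tosses agree). The sketch below checks the product rule for every pair and for the full triple; all names are illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

outcomes = list(product("HT", repeat=2))  # two fair coin tosses, 4 outcomes

events = {
    "A": {o for o in outcomes if o[0] == "H"},   # first toss is heads
    "B": {o for o in outcomes if o[1] == "H"},   # second toss is heads
    "C": {o for o in outcomes if o[0] == o[1]},  # both tosses agree
}

def prob(event):
    """Probability of an event (a set of outcomes) under equal likelihood."""
    return Fraction(len(event), len(outcomes))

def product_rule_holds(names):
    """True if P(intersection) equals the product of the individual probabilities."""
    joint = set(outcomes)
    expected = Fraction(1)
    for name in names:
        joint &= events[name]
        expected *= prob(events[name])
    return prob(joint) == expected

# n choose 2 pairwise checks, then the extra check on the whole collection.
pairwise = all(product_rule_holds(pair) for pair in combinations(events, 2))
mutual = pairwise and product_rule_holds(tuple(events))

print(pairwise, mutual)
```

Every pair passes, but the triple fails: A, B, and C all occur together only on two heads, which has probability 1/4 rather than 1/8.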
In another example, suppose we have a fair die with outcomes ranging from 1 to 6. We define a random variable X as the sum of the outcomes obtained on two rolls of the die. The possible values of X range from 2 to 12. To compute the probability that X equals 3, we consider the outcomes that result in a sum of 3, which in this case are (1, 2) and (2, 1). Each of the 36 equally likely outcomes has probability 1/36, so P(X = 3) = 2/36 = 1/18.
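The full distribution of X can be tabulated the same way, by counting how many of the 36 equally likely outcomes produce each sum. A minimal sketch, with illustrative variable names:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many (first, second) pairs give each possible sum.
counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))

# Probability of each value of X = sum of the two rolls.
pmf = {x: Fraction(c, 36) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)
```

The probabilities rise from 1/36 at X = 2 to a peak at X = 7 and fall back to 1/36 at X = 12, and together they add up to 1.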
The cumulative distribution function (CDF) of a random variable X at a point x is the probability that X takes a value less than or equal to x. Mathematically:
F(x) = P(X ≤ x)
For example, consider a random variable X that takes the values 1, 2, and 3 with probabilities 1/2, 1/3, and 1/6 respectively. We can plot the CDF as a graph in which the horizontal axis represents x and the vertical axis represents the cumulative probability:
For x < 1, the CDF is 0
For 1 ≤ x < 2, the CDF is 1/2
For 2 ≤ x < 3, the CDF is 5/6
For x ≥ 3, the CDF is 1
The CDF has certain properties:
It is nondecreasing in x
As x approaches infinity, the CDF approaches 1
As x approaches minus infinity, the CDF approaches 0
It is right continuous
To find the probability that X lies between two points a and b, with a < b, we subtract the CDF at a from the CDF at b:
P(a < X ≤ b) = F(b) - F(a)
The probability that X is strictly less than x, P(X < x), can be computed using the limiting notion, as the limit of F(y) as y approaches x from the left:
P(X < x) = lim (y → x, y < x) F(y)
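The three-valued example can be turned into a small step-function CDF. The sketch below uses exact fractions; the function names cdf and prob_strictly_less are illustrative, and for a discrete random variable the left limit of F at x is simply the total mass strictly below x.

```python
from fractions import Fraction

# PMF from the example: values 1, 2, 3 with probabilities 1/2, 1/3, 1/6.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 3: Fraction(1, 6)}

def cdf(x):
    """F(x) = P(X <= x): total mass at values not exceeding x."""
    return sum((p for v, p in pmf.items() if v <= x), Fraction(0))

def prob_strictly_less(x):
    """P(X < x): the left limit of F at x, i.e. the mass strictly below x."""
    return sum((p for v, p in pmf.items() if v < x), Fraction(0))

print(cdf(0.5), cdf(1), cdf(1.5), cdf(2), cdf(3))
print(cdf(3) - cdf(1))          # P(1 < X <= 3)
print(prob_strictly_less(2))    # P(X < 2)
```

The jump of the CDF at each value equals the mass there, so cdf(2) - prob_strictly_less(2) recovers P(X = 2) = 1/3.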
A probability mass function (PMF) is used for a discrete random variable, while a probability density function (PDF) is used for a continuous random variable. With a PMF, the random variable takes discrete values, and the probabilities assigned to these values must add up to 1. The probability assigned to each value is called the mass at that point. With a PDF, the probability that the random variable falls within a subset A of the real line is given by the integral of a function f(x) over A. If this holds for every subset A, the random variable is called continuous, and f(x) is called its probability density function.
Properties of PDF
If the subset A is the entire real line, the probability is 1, so the area under the PDF curve is 1.
If A is a finite interval [a, b], the probability is given by integrating f(x) from a to b.
If A is a single point a, the probability is 0.
Interpreting PDF
The PDF indicates the rate of change of the mass in the neighborhood of a particular value x. It does not represent the probability at that point, as the probability at a single point is always 0 for continuous random variables.
Relation between PDF and CDF
The PDF can be obtained by differentiating the cumulative distribution function (CDF): the PDF at a point x is the rate of change of the CDF at that point. If a continuous random variable has a PDF, its CDF is differentiable at every point where the PDF is continuous.
Continuity and Differentiability
For a continuous random variable with a PDF, the CDF must be continuous, with no jumps in the CDF curve, and differentiable wherever the PDF is continuous.
The probability of a small interval is approximated by the formula P(a ≤ X ≤ a + ε) ≈ ε * f(a). This formula computes the probability of a small region around a by multiplying the value of the probability density function (PDF) at a by the width ε of the region. The value f(a) is the rate at which the probability changes in the neighborhood of a. The formula is only valid when ε is small; for a large interval it does not make sense, and the integral of f must be used instead. The PDF at a point a, denoted f(a), measures how likely the random variable X is to be near a. It does not give the probability of getting exactly a, but rather indicates the probability of the outcome falling in the neighborhood of a. With this understanding, we can now study some commonly used distributions.
Standard Discrete Random Variables
Bernoulli random variable: Denoted by Bernoulli(p) and parameterized by p. It has two possible outcomes, 1 and 0, with probabilities p and 1 - p respectively. It is commonly used to model coin tosses or other binary outcomes.
Binomial random variable: Denoted by Binomial(n, p) with parameters n and p. It has n+1 possible outcomes, ranging from 0 to n, and the probability of outcome i is (n choose i) * p^i * (1 - p)^(n-i). It is useful for counting the number of successes in a fixed number of trials.
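The binomial formula can be evaluated directly to confirm that its n+1 masses add up to 1. This is a minimal sketch; the values n = 10 and p = 0.3 are illustrative and not taken from the lecture, and the function name binomial_pmf is an assumption.

```python
from math import comb

def binomial_pmf(i, n, p):
    """P(X = i) for X ~ Binomial(n, p): (n choose i) * p**i * (1 - p)**(n - i)."""
    return comb(n, i) * p ** i * (1 - p) ** (n - i)

n, p = 10, 0.3  # illustrative values, not from the lecture
masses = [binomial_pmf(i, n, p) for i in range(n + 1)]

print(len(masses))   # n + 1 possible outcomes
print(sum(masses))   # should be very close to 1 (floating-point rounding aside)

# With n = 1 the binomial has mass 1 - p at 0 and p at 1.
print(binomial_pmf(0, 1, p), binomial_pmf(1, 1, p))
```

With a single trial the binomial has only the outcomes 0 and 1, which is the Bernoulli case with success probability p.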
In the previous lectures, we discussed discrete random variables such as Bernoulli, binomial, geometric, and Poisson. Today, we will talk about continuous random variables.
Uniform Distribution
The first example of a continuous random variable is the uniform distribution. It is denoted Uniform(a, b), where a and b are real numbers with a < b. The random variable X is uniformly distributed between a and b. Its probability density function is 1/(b-a) for x between a and b, and 0 outside this range.
Uses
Modeling someone's height or weight
Modeling temperature
Discrete Uniform Distribution
For discrete random variables, the uniform distribution means that all values have the same probability. For example, if a discrete random variable X takes the values x1, x2, x3, up to xn, then the probability that X takes any particular one of these values is 1/n. The probability mass function of this distribution is therefore constant over the values xi.
Exponential Distribution
An exponential random variable is positive-valued, taking values between 0 and infinity. It is denoted Exp(lambda), where lambda is a strictly positive parameter. Its probability density function is lambda * exp(-lambda * x) for x > 0, and 0 otherwise.
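The exponential density gives a convenient check of two facts about PDFs: the area under the curve is 1, and the probability of a small interval of width eps starting at a is close to eps times the density at a. The sketch below assumes lambda = 2 and uses the closed-form CDF 1 - exp(-lambda * x), which follows from integrating the density but is not stated in the lecture; the midpoint-rule helper integrate is also an illustrative choice.

```python
import math

def exp_pdf(x, lam):
    """Density of Exp(lam): lam * exp(-lam * x) for x > 0, and 0 otherwise."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0

def exp_cdf(x, lam):
    """CDF of Exp(lam), obtained by integrating the density from 0 to x."""
    return 1.0 - math.exp(-lam * x) if x > 0 else 0.0

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width

lam = 2.0            # illustrative rate, not from the lecture
a, eps = 0.5, 1e-4   # a point and a small interval width

# Area under the density; the tail beyond 20 is negligible for lam = 2.
area = integrate(lambda x: exp_pdf(x, lam), 0.0, 20.0)

exact = exp_cdf(a + eps, lam) - exp_cdf(a, lam)   # P(a < X <= a + eps)
approx = eps * exp_pdf(a, lam)                    # small-interval approximation

print(area)
print(exact, approx)
```

The two interval probabilities agree to several significant digits for this small eps; with a wide interval the approximation degrades and the integral or the CDF difference is needed.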
Use
The exponential distribution is often used to model lifetimes, such as the time a bulb will work before it breaks down.
Gaussian Distribution
The Gaussian distribution, also known as the normal distribution, is one of the most commonly used distributions. It is denoted N(mu, sigma^2), where mu is the mean and sigma^2 is the variance. The random variable X takes values over the entire real line. The probability density function of this distribution is (1/sqrt(2*pi*sigma^2)) * exp(-(x-mu)^2/(2*sigma^2)).
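The Gaussian density can be checked numerically in the same spirit: its area should be 1, its mean should be mu, and it should be symmetric about mu. The sketch below assumes mu = 1 and sigma^2 = 4 purely for illustration and uses a simple midpoint rule; neither the values nor the helper names come from the lecture.

```python
import math

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2), where sigma2 is the variance."""
    return math.exp(-(x - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    width = (hi - lo) / steps
    return sum(f(lo + (k + 0.5) * width) for k in range(steps)) * width

mu, sigma2 = 1.0, 4.0   # illustrative values, not from the lecture
sigma = math.sqrt(sigma2)

# Integrate over mu +/- 10 standard deviations; the tails beyond are negligible.
lo, hi = mu - 10 * sigma, mu + 10 * sigma
area = integrate(lambda x: gaussian_pdf(x, mu, sigma2), lo, hi)
mean = integrate(lambda x: x * gaussian_pdf(x, mu, sigma2), lo, hi)

print(area, mean)
# Symmetry about mu: equal density at equal distances on either side.
print(gaussian_pdf(mu - 1.5, mu, sigma2), gaussian_pdf(mu + 1.5, mu, sigma2))
```

Changing mu shifts the curve without changing its shape, while a larger sigma2 lowers the peak and spreads the mass out; the area stays 1 in both cases.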
Gaussian and Rayleigh distributions are commonly used to model random quantities such as errors and noise. These distributions have specific characteristics that make them suitable for this purpose.
Gaussian Distribution
The Gaussian distribution, also known as the normal distribution, is characterized by two parameters: μ (mu) and σ^2 (sigma squared). μ determines the center of the distribution, while σ^2 determines its spread or width. If μ is larger, the distribution shifts to the right. If σ^2 is larger, the distribution becomes wider. The Gaussian distribution is commonly used for quantities that can be both positive and negative, such as errors and noise. It is symmetric around μ and can model the combined effect of random factors on an outcome, such as wind velocity, humidity, and temperature in a shooting scenario.
Rayleigh Distribution
The Rayleigh distribution is derived from the Gaussian distribution and is characterized by a single parameter, σ (sigma), which must be positive. It takes positive real values between 0 and infinity.
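One standard way the Rayleigh distribution arises from the Gaussian, which the notes above do not spell out, is as the length of a two-dimensional vector whose components are independent N(0, σ^2) random variables. The sketch below relies on that construction, on the standard Rayleigh density (x/σ^2) * exp(-x^2/(2σ^2)) for x ≥ 0, and on the known mean σ * sqrt(π/2); the value σ = 1.5, the seed, and the function names are illustrative assumptions.

```python
import math
import random

def rayleigh_pdf(x, sigma):
    """Standard Rayleigh density: (x / sigma**2) * exp(-x**2 / (2 * sigma**2)) for x >= 0."""
    if x < 0:
        return 0.0
    return (x / sigma ** 2) * math.exp(-x ** 2 / (2 * sigma ** 2))

def sample_rayleigh(sigma, rng):
    """Length of a 2-D vector whose components are independent N(0, sigma**2)."""
    return math.hypot(rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))

sigma = 1.5                      # illustrative value, not from the lecture
rng = random.Random(0)           # fixed seed for repeatability
samples = [sample_rayleigh(sigma, rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
theory_mean = sigma * math.sqrt(math.pi / 2)   # known mean of the Rayleigh distribution

print(min(samples) >= 0.0)           # samples are never negative
print(sample_mean, theory_mean)      # should be close for a large sample
print(rayleigh_pdf(sigma, sigma))    # density at x = sigma, where it peaks
```

Because a length cannot be negative, the samples stay in the range from 0 to infinity, matching the support described for the Rayleigh distribution.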