




Free Python course, English, intermediate programming
Type: Notes





Write in the Q&A board and we will send it to you as soon as possible.
Please refer to this article which thoroughly explains how to attain your certificate of completion! https://support.udemy.com/hc/en-us/articles/229603868-Certificate-of-Completion
Unfortunately, such issues occur from time to time. You can try and refresh the page several times. If this does not solve your problem, you can wait for a while or contact Udemy support at: https://support.udemy.com/hc/en-us
Well, read above! If you have downloaded the PDF, refer to Section 1, Lecture 3, where we explain how to download all the materials for the course. For your convenience, here is a link to the associated lecture: https://www.udemy.com/the-data-science-course-complete-data-science-bootcamp/learn/v4/t/lecture/12955006?start=
A note from your instructor: I really prefer the more math-heavy books. Especially after completing this course, they will help you reinforce what you have learned and give you a better understanding of the mechanics behind the formulas and the techniques used. In my opinion, introductory statistics books will overlap a lot with what we do in this course and will not prove that beneficial.
Light and fun: An interesting read that may greatly aid your understanding is 'A Field Guide to Lies and Statistics: A Neuroscientist on How to Make Sense of a Complex World' by Daniel Levitin. I found the book quite fun and not technically heavy at all. It looks into the different ways people misinterpret statistics (often on purpose).
If you are comfortable with math: I'd suggest 'Probability and Statistical Inference' by Nitis Mukhopadhyay. It is one of the better readings on the topic (probability, distributions, confidence intervals, hypothesis testing, Bayesian statistics).
Then, for regressions (in a business / economics context), I would strongly suggest 'Econometric Analysis' by William Greene. That book is especially good, as it has several appendices covering the linear algebra, probability, etc. that you may not know or may have forgotten. Keep in mind that these two books are really math heavy (probability, linear algebra, etc.).
If you want something more programming oriented: One of the classics is 'Python Data Science Handbook' by Jake VanderPlas. You probably know this one. It is especially good because it looks into both NumPy and pandas and how to manipulate data using them.
Unfortunately, you cannot currently download this course on a desktop computer. However, if you can use a mobile phone to watch this course, you can download the Udemy App. Then, you can download the lectures and watch our content offline.
Downloading Courses on iOS: https://support.udemy.com/hc/en-us/articles/229603928-Downloading-Courses-on-iOS
Downloading Courses on Android: https://support.udemy.com/hc/en-us/articles/115006973308-Downloading-Courses-on-Android
Lecture Specific Questions
Let's say, for example, that a scientist in a biomedical field is doing research using animals: some animals are given treatment A, some are given treatment B, and the rest are not given any treatment (control group). Each group has 10 animals, so we have 30 in total. In this scenario, the population in question is all animals. Why? Because we want to find a cure for all animals, right? We don't want to find a cure for just 30 of them. There's no point in doing that.
What about these three groups? Now we've got 3 samples of 10 animals each. In real life, by the way, we would usually draw one sample of 30 and then randomly divide it into 3. For all practical purposes the two approaches are equivalent, but nobody is going to perform the sampling 3 times (10 animals each). Once this is done, we have 3 groups of animals. Ideally, we want each of these 3 samples to be random and representative.
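As a rough illustration of "draw 30, then randomly divide into 3", here is a minimal Python sketch. The population of 1,000 animal IDs, the seed, and the helper name `split_into_groups` are assumptions made for the example, not part of the course materials.

```python
import random

def split_into_groups(subjects, n_groups, seed=None):
    """Shuffle the subjects and cut them into n_groups equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_groups
    return [shuffled[i * size:(i + 1) * size] for i in range(n_groups)]

# Hypothetical population: 1,000 animal IDs (assumption for illustration only).
population = range(1000)

# One random draw of 30 animals, then a random split into 3 groups of 10.
sample = random.Random(42).sample(population, 30)
treatment_a, treatment_b, control = split_into_groups(sample, 3, seed=42)
```

Every animal in the sample lands in exactly one group, and which group it lands in depends only on the shuffle, not on any property of the animal.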
Let's explore another example. Imagine you are trying to cure all animals, and you've picked 10 dogs. That would be highly uninformative. What about horses, pigs, cows, cats, elephants? Given the same treatment, elephants will definitely react in a different way than dogs.
That would be true, but the general idea is different: the biggest sampling issues come from the fact that people can take representative samples in a NON-random manner (intentionally, to prove some wrong statement). See the example with languages above. If a sample is not representative, it may also be because of a lack of knowledge on the part of the person sampling, which makes it non-random too (unintentionally).
What is meant is 'weight' as a variable, not 'weight of a person'. Sorry about that. As you know, the weight of nothingness is 0, so weight can be 0. The weight of a person should have the same properties as weight, right? The fact that it is not achievable does not mean it is not a true zero. This is unlike the temperature example, where 0 degrees Celsius is a deceptive 0.
In statistics, the null hypothesis is the statement we are trying to reject. We define it that way, so we can reject it with a statistical test. Conventionally, we include the equality sign in the null hypothesis.
For a thorough walkthrough of creating a histogram, refer to the course notes for Descriptive Statistics (Section 10, Lecture 26 in the Course Content tab). For your convenience, here's a link to that lecture: https://www.udemy.com/the-data-science-course-complete-data-science-bootcamp/learn/v4/t/lecture/
We use the one-tailed test when the null hypothesis is an inequality. In that case, only values on one side of the null can reject it. For instance, if the null hypothesis states that "a" is greater than or equal to "b", then we can only reject it if "a - b" is significantly lower than 0. If instead our null stated that the two are equal, we would have an equation that looks like "a - b = 0". We can reject this hypothesis if "a - b" is significantly greater or significantly lower than 0. In this case, we have two possible ways of rejecting the null (if the value is significantly 1) lower or 2) higher than the hypothesized value), which is why we need a "two-tailed" test.
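To make the difference concrete, here is a small Python sketch that turns one test statistic into a lower-tail, an upper-tail, and a two-tailed p-value. It assumes the statistic follows a standard normal distribution (a Z-test); the value -1.8 and the function name `p_values` are made up for illustration.

```python
from statistics import NormalDist

def p_values(z):
    """Return lower-tail, upper-tail and two-tailed p-values for a Z-score."""
    std_normal = NormalDist()          # mean 0, standard deviation 1
    lower = std_normal.cdf(z)          # used when the null is "a >= b"
    upper = 1 - std_normal.cdf(z)      # used when the null is "a <= b"
    two_tailed = 2 * min(lower, upper) # used when the null is "a = b"
    return {"lower": lower, "upper": upper, "two_tailed": two_tailed}

# Hypothetical standardized value of "a - b".
result = p_values(-1.8)
```

With this made-up Z-score, the lower-tail p-value falls below 0.05 while the two-tailed p-value does not, so the same data would reject "a >= b" at the 5% level but would not reject "a = b".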
To make this as easy to follow as possible, we are going to examine one of the exercises from the course. Go to the exercises available in Section 14, Lecture 74 and examine task number 3: https://www.udemy.com/the-data-science-course-complete-data-science-bootcamp/learn/v4/t/lecture/
We use the same logic as in the lecture on confidence intervals for the difference of two means (independent samples, population variances assumed to be equal) -> Section 13, Lecture 70 (https://www.udemy.com/the-data-science-course-complete-data-science-bootcamp/learn/v4/t/lecture/10764516?start=180). Back in Lecture 70, we found that the confidence interval for the difference between apple prices in NY and apple prices in LA is entirely positive. We concluded that, with 95% confidence, the difference of the means is between 0.47 and 0.92. In any case, it is positive. If the average apple price in NY MINUS the average apple price in LA is positive, then apples in NY are more expensive. Using the same logic, we construct the confidence intervals in the practical example.
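The sketch below shows one way such an interval could be computed in Python for independent samples with a pooled variance. The price lists are invented for illustration (they are not the course data and will not reproduce 0.47 and 0.92), and the critical value 2.101 is the standard t-table entry for 18 degrees of freedom at 95% confidence.

```python
from math import sqrt
from statistics import mean, variance

def pooled_ci(sample_x, sample_y, t_crit):
    """Confidence interval for mean(x) - mean(y), assuming equal variances."""
    n_x, n_y = len(sample_x), len(sample_y)
    # Pooled sample variance: weighted average of the two sample variances.
    pooled_var = ((n_x - 1) * variance(sample_x)
                  + (n_y - 1) * variance(sample_y)) / (n_x + n_y - 2)
    margin = t_crit * sqrt(pooled_var / n_x + pooled_var / n_y)
    diff = mean(sample_x) - mean(sample_y)
    return diff - margin, diff + margin

# Hypothetical apple prices (assumption, not the data from the lecture).
ny = [3.8, 3.9, 4.1, 3.7, 4.0, 3.6, 4.2, 3.9, 3.8, 4.0]
la = [3.1, 3.3, 3.0, 3.4, 3.2, 3.5, 3.1, 3.3, 3.2, 3.4]

# 10 + 10 - 2 = 18 degrees of freedom; t-table value for 95% confidence.
low, high = pooled_ci(ny, la, t_crit=2.101)
```

If both ends of the interval are above 0, as they are for these made-up prices, the reading is the same as in the lecture: the first group's mean is higher than the second's.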
The main takeaway from the CLT is that no matter the underlying distribution, the distribution of the sample means is approximately normal (given a big enough sample). This means that we can take 1 sample (a SINGLE ONE!) and say: imagine its mean was taken from the sampling distribution of the sample mean. That distribution is approximately normal, right? Because of the CLT. So, when we are testing, we can use this insight and treat the sample mean as if it is coming from a normal distribution (because it is -> the sampling distribution of the sample means). And since the hypothesis tests we use depend on normality, this proves extremely useful.
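A quick simulation can make this tangible. The sketch below draws many samples from a clearly non-normal distribution (exponential with mean 1 and standard deviation 1) and looks at how the sample means behave. The distribution, sample size, number of samples, and seed are all arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

def sample_means(n_samples, sample_size, seed=0):
    """Means of n_samples samples drawn from an exponential(1) distribution."""
    rng = random.Random(seed)
    return [
        mean([rng.expovariate(1.0) for _ in range(sample_size)])
        for _ in range(n_samples)
    ]

sample_size = 50
means = sample_means(n_samples=2000, sample_size=sample_size)

center = mean(means)                  # should be close to the population mean, 1
spread = stdev(means)                 # should be close to 1 / sqrt(sample_size)
expected_spread = 1 / sqrt(sample_size)

# Share of sample means within 1.96 standard errors of the population mean;
# for a normal sampling distribution this is about 95%.
within = sum(abs(m - 1) <= 1.96 * expected_spread for m in means) / len(means)
```

Even though individual exponential draws are heavily skewed, the sample means cluster around 1 with a spread close to 1 / sqrt(50), which is what the normal approximation predicts.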
Unlike Z-scores, with T-statistics we do not have a single table of values that matches T-scores with their associated P-values. This comes from the fact that every T-statistic requires an associated number of degrees of freedom. This means that if we wanted an associated P-value for any T-score with any degrees of freedom, we would require a 3-dimensional "table". For a detailed discussion on the topic, please check this question: https://www.udemy.com/the-data-science-course-complete-data-science-bootcamp/learn/v4/questions/ Thus, convention dictates that we have a single table, which includes numerous degrees of freedom but only has data for the most commonly used P-values. In short, the information you are looking for is NOT in the t-table, because it was intentionally cut out.
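When the table does not list the value you need, the P-value can be computed directly. The sketch below is one possible approach, not the course's method: it integrates the Student's t density numerically with Simpson's rule, using only the standard library. The function names are hypothetical, and in practice a statistics library would normally do this for you.

```python
from math import gamma, pi, sqrt

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t_score, df, steps=2000):
    """Two-tailed P-value via Simpson's rule on [0, |t_score|].

    steps must be even; math.gamma overflows for very large df
    (a few hundred), where the normal distribution is a good substitute.
    """
    upper = abs(t_score)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # probability between 0 and |t_score|
    return 1 - 2 * area           # what is left in both tails

# The same T-score gives different P-values for different degrees of freedom.
p_small_df = two_tailed_p(2.0, df=5)
p_large_df = two_tailed_p(2.0, df=30)
```

As a sanity check against the usual t-table, `two_tailed_p(2.101, 18)` and `two_tailed_p(2.228, 10)` both come out at roughly 0.05, matching the tabulated 95% critical values for 18 and 10 degrees of freedom.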