Practical 3 for stats, Exercises of Statistics

It includes some R computations and exercises for Stats.

Typology: Exercises

2022/2023

Uploaded on 01/07/2024

aarya-nepal

University of Geneva
Statistics
GSEM
Fall semester
Practical 3
Exploratory Data Analysis - Numerical Variables
Goals: This practical continues the Exploratory Data Analysis of practical 2, this time for
numerical data. Numerical variables, both discrete and continuous, can be summarized
numerically by quantiles and by other statistics that measure center or dispersion. The data
can also be displayed with frequency tables, cumulative distributions, histograms/barplots,
kernel density plots, boxplots, violin plots and Q-Q plots. The goal of this practical is to
become more familiar with R and, in turn, to better understand these summary statistics.
1 Exercise 1: Tables
In this part of the practical we see how to represent data through tables and extract some
information from them. For this purpose consider a dataset recording the wins, defeats
and goals of the Geneva hockey team HC Servette. The variables in file Hockey.csv are as
follows:
ScoredGoals: Number of goals for HC Servette.
ConcededGoals: Number of goals against HC Servette.
HomeMatch: binary variable indicating whether HC Servette played a home or an away match
(Home=1; Away=0).
Winner: binary variable indicating if HC Servette won or lost (Won=1; Lost=0).
1. Import the dataset into R and store it in a data frame called Hockey.
Hockey = read.table("path/Hockey.csv", sep = ";", header = TRUE)
2. We would like to have the variables HomeMatch and Winner as factors. First, verify their
type. We can use str() and class() to inspect the nature of the variables.
str(Hockey)
class(Hockey$HomeMatch)
class(Hockey$Winner)
3. Since they are not factors, recode them so that they are (see practical 2).
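Steps 1 to 3 can be tried out on a small stand-in file before working with the real data. The sketch below is only an illustration: the four rows are invented, the file is written to a temporary location, and the labels "Away"/"Home" and "Lost"/"Won" are one possible choice taken from the variable descriptions above.

```r
# Invented mini data set written to a temporary file, standing in for Hockey.csv
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "ScoredGoals;ConcededGoals;HomeMatch;Winner",
  "3;2;1;1",
  "1;4;0;0",
  "2;5;1;0",
  "4;1;0;1"
), tmp)

# Step 1: import (sep = ";" because the fields are separated by semicolons)
toy <- read.table(tmp, sep = ";", header = TRUE)

# Step 2: inspect the types; the 0/1 columns are read as integers, not factors
str(toy)
class(toy$HomeMatch)

# Step 3: recode the 0/1 codes as factors with readable labels
toy$HomeMatch <- factor(toy$HomeMatch, levels = c(0, 1), labels = c("Away", "Home"))
toy$Winner    <- factor(toy$Winner,    levels = c(0, 1), labels = c("Lost", "Won"))
class(toy$HomeMatch)
```

Giving both levels and labels explicitly fixes which code maps to which label, instead of relying on the order in which the values happen to appear.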
Supposing that the event of playing a home or away match is random, answer the following
questions using the R functions table() and xtabs().
4. Build a frequency table for the number of goals scored by HC Servette.
5. Transform the frequencies of the table into proportions.


(a) What is the probability that HC Servette scores exactly 3 goals?

(b) What is the probability that HC Servette has more than 4 goals scored against them?
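As a minimal sketch of how a frequency table turns into estimated probabilities, the example below uses ten invented matches (not the Hockey.csv data). The column names mirror the ones described above; everything else is hypothetical.

```r
# Invented counts for ten matches (illustration only)
toy <- data.frame(
  ScoredGoals   = c(2, 3, 0, 3, 1, 5, 3, 2, 4, 3),
  ConcededGoals = c(1, 2, 4, 5, 3, 0, 6, 2, 5, 1)
)

# Absolute frequencies, with table() and with the formula interface of xtabs()
freq   <- table(toy$ScoredGoals)
freq_x <- xtabs(~ ScoredGoals, data = toy)

# Relative frequencies (proportions that sum to 1)
prop <- prop.table(freq)

# Estimated probability of scoring exactly 3 goals: read it off the table ...
p3 <- prop["3"]
# ... or compute the share of matches satisfying the condition directly
p3_direct <- mean(toy$ScoredGoals == 3)

# Estimated probability of conceding more than 4 goals
p_more4 <- mean(toy$ConcededGoals > 4)
```

Taking the mean of a logical vector gives the proportion of TRUE values, which is often the shortest way to estimate the probability of an event such as "more than 4".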

  6. Build a contingency table showing the number of victories of HC Servette according to whether they played at home or away.

(a) How many home matches did they lose?

(b) What is the probability that they win a match?

(c) What is the probability that they lose an away match?

(d) What is the probability that they win a match knowing that it is a home match?

Bonus: What would the expected number of away defeats be if winning or losing were independent of the fact of playing a home or an away match?
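The sketch below shows, on ten invented matches, how joint, marginal and conditional proportions can be read from a two-way table, and how expected counts under independence are formed as (row total × column total) / grand total. The data and object names are hypothetical and only serve to illustrate the mechanics.

```r
# Invented outcomes for ten matches (illustration only)
toy <- data.frame(
  HomeMatch = factor(c("Home", "Home", "Home", "Home", "Away",
                       "Away", "Away", "Away", "Away", "Home"),
                     levels = c("Away", "Home")),
  Winner    = factor(c("Won", "Won", "Lost", "Won", "Lost",
                       "Lost", "Won", "Lost", "Lost", "Won"),
                     levels = c("Lost", "Won"))
)

# Two-way contingency table: rows = HomeMatch, columns = Winner
tab <- xtabs(~ HomeMatch + Winner, data = toy)
addmargins(tab)                        # counts with row and column totals

joint   <- prop.table(tab)             # joint proportions, e.g. P(Away and Lost)
by_row  <- prop.table(tab, margin = 1) # conditional on the row, e.g. P(Won | Home)
p_win   <- colSums(tab)["Won"] / sum(tab)  # marginal proportion of wins

# Expected counts if Winner were independent of HomeMatch
expected <- outer(rowSums(tab), colSums(tab)) / sum(tab)
expected["Away", "Lost"]
```

The margin argument of prop.table() decides what is conditioned on: margin = 1 makes each row sum to 1, margin = 2 each column.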

2 Exercise 2: Barplot/Histogram for Discrete Numerical Data

For discrete numerical data, we can either use the function hist() or barplot(), but with some care.

  1. Compare the following two histograms:

hist(Hockey$ScoredGoals)
hist(Hockey$ScoredGoals, breaks = -0.5:8.5)

  2. What do you observe? Which graphical representation is correct? Why?
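One way to investigate the difference is to look at the bin counts rather than the picture. By default hist() uses right-closed bins and includes the lowest value in the first bin, so breaks that fall on the integers pool two values in the first bin. The sketch below uses invented counts and sets breaks = 0:6 explicitly; that the default rule picks integer breaks for a given data set is an assumption to verify by inspecting the breaks component of the returned object.

```r
# Invented goal counts (illustration only)
goals <- c(0, 1, 1, 2, 2, 2, 3, 3, 4, 6)

# plot = FALSE returns the histogram object without drawing it
h_integer <- hist(goals, breaks = 0:6,      plot = FALSE)  # breaks on the integers
h_centred <- hist(goals, breaks = -0.5:6.5, plot = FALSE)  # one integer per bin

h_integer$counts   # first bin is [0, 1], so the 0s and 1s are counted together
h_centred$counts   # bins (-0.5, 0.5], (0.5, 1.5], ...: one value per bin

# Exact frequencies for comparison
exact <- table(factor(goals, levels = 0:6))
exact
```

With breaks at the half-integers, each bin contains exactly one possible value, so the bin counts agree with the frequency table.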

Alternatively, we can use the function barplot() (designed for categorical variables).

  1. Compare the following two barplots:

barplot(table(factor(Hockey$ScoredGoals)))
barplot(table(factor(Hockey$ScoredGoals, levels = 0:8)))

    See the help file of factor() for the definition of levels.

  2. What do you observe? Which graphical representation is correct? Why?
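The effect of the levels argument can be seen directly in the tables that are passed to barplot(). The sketch below uses invented counts in which the value 5 never occurs; the call to pdf(NULL) is only there so that the code also runs without a display and can be dropped in an interactive session.

```r
# Invented goal counts in which the value 5 never occurs (illustration only)
goals <- c(0, 1, 1, 2, 2, 2, 3, 3, 4, 6)

# Without levels, only the values that actually occur become categories
t_observed <- table(factor(goals))
names(t_observed)

# With levels = 0:6, every possible value gets a category, even with count 0
t_full <- table(factor(goals, levels = 0:6))
names(t_full)

pdf(NULL)            # null graphics device; omit when working interactively
barplot(t_observed)  # the bar for 6 sits right next to the bar for 4
barplot(t_full)      # an empty slot at 5 keeps the axis evenly spaced
invisible(dev.off())
```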

For the correct plot,

  1. Insert the title “Goals scored by HC Servette”.
  2. Modify the R command to obtain a horizontal barplot.
  3. What kind of probability distribution could these data follow? Hint: the variable counts the number of events within a given time frame. How could you check your hypothesis graphically?
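The sketch below illustrates these three steps on invented counts. The title text and the choice of a Poisson distribution (one reading of the hint, with the rate estimated by the sample mean) are assumptions made for the illustration, not part of the exercise data. As before, pdf(NULL) only keeps the code runnable without a display.

```r
# Invented goal counts (illustration only)
goals <- c(0, 1, 1, 2, 2, 2, 3, 3, 4, 6)
tab   <- table(factor(goals, levels = 0:6))

pdf(NULL)  # null graphics device; omit when working interactively

# main sets the title, horiz = TRUE draws horizontal bars
barplot(tab, main = "Goals scored (toy data)", horiz = TRUE, las = 1)

# Graphical check of a Poisson hypothesis: observed proportions next to
# the Poisson probabilities with the rate estimated by the sample mean
lambda_hat <- mean(goals)
observed   <- as.numeric(prop.table(tab))
expected   <- dpois(0:6, lambda = lambda_hat)

barplot(rbind(observed, expected), beside = TRUE, names.arg = 0:6,
        legend.text = c("observed", "Poisson"),
        main = "Observed proportions vs. Poisson probabilities")

invisible(dev.off())
```

If the two sets of bars have a similar shape, the Poisson hypothesis is plausible; large systematic gaps suggest another distribution.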