Package ‘Lock5Data’
July 23, 2021
Title Datasets for ``Statistics: UnLocking the Power of Data''
Version 3.0.0
Maintainer Robin Lock
Description Datasets for the third edition of ``Statistics: Unlocking the Power of Data'' by Lock^5.
Includes versions of datasets from earlier editions.
Depends R (>= 3.5.0)
License GPL-2
Encoding UTF-8
LazyData true
RoxygenNote 7.1.1
NeedsCompilation no
Author Robin Lock [aut, cre]
Repository CRAN
Date/Publication 2021-07-22 22:40:10 UTC
R topics documented:

  • Lock5Data-package
  • ACS
  • ACS2010
  • AllCountries
  • AllCountries1e
  • AllCountries2e
  • APMultipleChoice
  • April14Temps
  • April14Temps1e
  • April14Temps2e
  • BaseballHits1e
  • BaseballHits2014
  • BaseballHits2019
  • BaseballSalaries2015
  • BaseballSalaries2019

Lock5Data-package Lock5 Datasets

  • BaseballTimes
  • Benford
  • BikeCommute
  • BodyFat
  • BodyTemp50
  • BootAtlantaCorr
  • CaffeineTaps
  • CAOSExam
  • CarbonDioxide
  • CarbonDioxide2e
  • CarDepreciation
  • Cars2015
  • Cars2020
  • Cereal
  • CityTemps
  • CityTemps2e
  • CocaineTreatment
  • ColaCalcium
  • CollegeScores
  • CollegeScores2yr
  • CollegeScores4yr
  • CommuteAtlanta
  • CommuteStLouis
  • CompassionateRats
  • CricketChirps
  • DDS
  • DecemberFlights
  • DecemberFlights2e
  • DietDepression
  • Digits
  • DogOwner
  • DrugResistance
  • EducationLiteracy
  • EducationLiteracy2e
  • ElectionMargin
  • EmployedACS
  • EmployedACS2010
  • ExerciseHours
  • FacebookFriends
  • FatMice18
  • FireAnts
  • FisherIris
  • FishGills12
  • FishGills3
  • Flight179
  • Flight433
  • Flight433_2e
  • FloridaLakes
  • FootballBrain
  • ForestFires
  • GeneticDiversity
  • GlobalInternet2010
  • GlobalInternet2019
  • GolfRound
  • GPAbySex
  • GSWarriors2016
  • GSWarriors2019
  • HappyPlanetIndex
  • HeatCognition
  • HeightData
  • HockeyPenalties2011
  • HockeyPenalties2019
  • HollywoodMovies
  • HollywoodMovies2011
  • HollywoodMovies2013
  • HomesForSale
  • HomesForSale2e
  • HomesForSaleCA
  • HomesForSaleCA2e
  • HomesForSaleCanton
  • HomesForSaleCanton2e
  • HomesForSaleNY
  • HomesForSaleNY2e
  • HomingPigeons
  • Honeybee
  • HoneybeeCircuits
  • HoneybeeWaggle
  • HotDogs1e
  • HotDogs2015
  • HotDogs2019
  • HouseStarts2015
  • HouseStarts2018
  • HumanTears25
  • HumanTears50
  • Hurricanes2014
  • Hurricanes2018
  • ICUAdmissions
  • ImmuneTea
  • InkjetPrinters
  • LifeExpectancyVehicles
  • LifeExpectancyVehicles1e
  • LifeExpectancyVehicles2e
  • LightatNight
  • LightatNight4Weeks
  • LightatNight8Weeks
  • MalevolentUniformsNFL
  • MalevolentUniformsNHL
  • MammalLongevity
  • ManhattanApartments
  • ManhattanApartments2011
  • MarriageAges
  • MastersGolf
  • MateChoice
  • MentalMuscle
  • MiamiHeat
  • MindsetMatters
  • MustangPrice
  • NBAPlayers2011
  • NBAPlayers2015
  • NBAPlayers2019
  • NBAStandings2011
  • NBAStandings2016
  • NBAStandings2019
  • NFLContracts2015
  • NFLContracts2019
  • NFLPreSeason2014
  • NFLPreseason2019
  • NFLScores2011
  • NFLScores2018
  • NHANES
  • NutritionStudy
  • OlympicMarathon2008
  • OlympicMarathon2012
  • OlympicMarathon2016
  • OrganicEffect
  • OttawaSenators
  • OttawaSenators2010
  • OttawaSenators2019
  • PASeniors
  • PizzaGirl
  • PumpkinBeer
  • QuizPulse10
  • RandomP50N200
  • RestaurantTips
  • RetailSales
  • RetailSales2011
  • RockandRoll2012
  • RockandRoll2015
  • RockandRoll2019
  • SalaryGender
  • SampColleges
  • SampColleges2yr
  • SampColleges4yr
  • SampCountries
  • SampCountries1e
  • SampCountries2e
  • SandP500
  • SandP5001e
  • SandP5002e
  • SandwichAnts
  • SandwichAnts2
  • SkateboardPrices
  • SleepCaffeine
  • SleepStudy
  • Smiles
  • SpeedDating
  • SplitBill
  • StatGrades
  • StockChanges
  • StorySpoilers
  • StressedMice
  • StudentSurvey
  • SynchronizedMovement
  • TenCountries
  • TenCountries1e
  • TenCountries2e
  • TextbookCosts
  • ToenailArsenic
  • TrafficFlow
  • USStates
  • USStates1e
  • USStates2e
  • WaterStriders
  • WaterTaste
  • Wetsuits
  • YoungBlood
  • Index

Description

Datasets for the first, second, and third editions of Statistics: Unlocking the Power of Data by Lock^5

Source

The full public dataset can be downloaded at https://www.census.gov/programs-surveys/acs/microdata.html, and the full list of variables is at https://www.census.gov/programs-surveys/acs/microdata/documentation.html

ACS2010 American Community Survey - 2010

Description

Data from a sample of individuals in the 2010 American Community Survey

Format

A dataset with 1000 observations on the following 9 variables.

Sex               0=female and 1=male
Age               Age (years)
Married           0=not married and 1=married
Income            Wages and salary for the past 12 months (in $1,000’s)
HoursWk           Hours of work per week
Race              asian, black, white, or other
USCitizen         1=citizen and 0=noncitizen
HealthInsurance   1=have health insurance and 0=no health insurance
Language          1=native English speaker and 0=other

Details

The American Community Survey, administered by the US Census Bureau, is given every year to a random sample of about 3.5 million households (about 3% of all US households). Data on a random sample of 1% of all US residents are made public (after ensuring anonymity), and we have selected a random sub-sample of n = 1000 from the 2010 data for this dataset.

** From 2e - dataset has been updated for 3e **

Source

The full public dataset can be downloaded at http://www.census.gov/acs/www/data_documentation/pums_data/, and the full list of variables is at http://www.census.gov/acs/www/Downloads/data_documentation/pums/DataDict/PUMSDataDict10.pdf.

AllCountries All Countries

Description

Data on the countries of the world

Format

A data frame with 217 observations on the following 26 variables.

Country          Country name
Code             Three-letter code for country
LandArea         Size in 1000 sq. km.
Population       Population in millions
Density          Number of people per square kilometer
GDP              Gross Domestic Product (in $US) per capita
Rural            Percentage of population living in rural areas
CO2              CO2 emissions (metric tons per capita)
PumpPrice        Price for a liter of gasoline ($US)
Military         Percentage of government expenditures directed toward the military
Health           Percentage of government expenditures directed towards healthcare
ArmedForces      Number of active duty military personnel (in 1,000’s)
Internet         Percentage of the population with access to the internet
Cell             Cell phone subscriptions (per 100 people)
HIV              Percentage of the population with HIV
Hunger           Percent of the population considered undernourished
Diabetes         Percent of the population diagnosed with diabetes
BirthRate        Births per 1000 people
DeathRate        Deaths per 1000 people
ElderlyPop       Percentage of the population at least 65 years old
LifeExpectancy   Average life expectancy (years)
FemaleLabor      Percent of females 15 - 64 in the labor force
Unemployment     Percent of labor force unemployed
Energy           Kilotons of oil equivalent
Electricity      Electric power consumption (kWh per capita)
Developed        Categories for kilowatt hours per capita: 1=under 2500, 2=2500 to 5000, 3=over 5000
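The units and codings above can be made concrete with a short sketch. It is written in Python purely for illustration and is not part of the package. The function names are hypothetical, the treatment of a value of exactly 5000 kWh (placed in category 2) is an assumption, since the manual says only "2500 to 5000" and "over 5000", and the density formula assumes Density is derived from Population and LandArea in the units listed.

```python
def developed_category(kwh_per_capita):
    """Map electricity use (kWh per capita) to the 1/2/3 coding described for Developed."""
    if kwh_per_capita < 2500:
        return 1  # under 2500
    if kwh_per_capita <= 5000:
        return 2  # 2500 to 5000 (upper bound included by assumption)
    return 3      # over 5000


def density_per_sq_km(population_millions, land_area_thousand_sq_km):
    """People per square kilometer from Population (millions) and LandArea (1000 sq. km)."""
    people = population_millions * 1_000_000
    sq_km = land_area_thousand_sq_km * 1_000
    return people / sq_km


# Hypothetical values chosen for illustration; they are not rows of the dataset.
examples = [800.0, 2500.0, 4999.9, 5000.0, 12000.0]
print([developed_category(x) for x in examples])  # prints [1, 2, 2, 2, 3]
print(density_per_sq_km(10.0, 500.0))             # prints 20.0
```

Because Population is in millions and LandArea is in thousands of square kilometers, the ratio of the two columns must be multiplied by 1000 to be on the same scale as Density.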

Source

Data collected from the World Bank website, worldbank.org.

AllCountries2e AllCountries - 2e

Description

Data on the countries of the world

Format

A dataset with 215 observations on the following 25 variables.

Country          Name of the country
LandArea         Size in 1000 sq. kilometers
Population       Population in millions
Density          Number of people per square kilometer
GDP              Gross Domestic Product (in $US) per capita
Rural            Percentage of population living in rural areas
CO2              CO2 emissions (metric tons per capita)
PumpPrice        Price for a liter of gasoline ($US)
Military         Percentage of government expenditures directed toward the military
Health           Percentage of government expenditures directed towards healthcare
ArmedForces      Number of active duty military personnel (in 1,000’s)
Internet         Percentage of the population with access to the internet
Cell             Cell phone subscriptions (per 100 people)
HIV              Percentage of the population with HIV
Hunger           Percent of the population considered undernourished
Diabetes         Percent of the population diagnosed with diabetes
BirthRate        Births per 1000 people
DeathRate        Deaths per 1000 people
ElderlyPop       Percentage of the population at least 65 years old
LifeExpectancy   Average life expectancy (years)
FemaleLabor      Percent of females 15 - 64 in the labor force
Unemployment     Percent of labor force unemployed
Energy           Energy usage (kilotons of oil equivalent)
Electricity      Electric power consumption (kWh per capita)
Developed        Categories for kilowatt hours per capita: 1=under 2500, 2=2500 to 5000, 3=over 5000

Details

Data for each variable were collected for years between 2012 and 2014. Within a variable all country measurements are from the same year, but the year may vary between different variables depending on availability.

** From 2e - dataset has been updated for 3e **

Source

Data collected from the World Bank website, worldbank.org.

APMultipleChoice AP Multiple Choice

Description

Correct responses on Advanced Placement multiple choice exams

Format

A dataset with 400 observations on the following variable.

Answer Correct response: A, B, C, D, or E

Details

Correct responses from multiple choice sections for a sample of released Advanced Placement exams

Source

Sample exams from several disciplines at http://apcentral.collegeboard.com

April14Temps April 14th Temperatures

Description

Temperatures in Des Moines, IA and San Francisco, CA on April 14th

Format

A data frame with 25 observations on the following 3 variables.

Year           1995 to 2019
DesMoines      Temperature in Des Moines (degrees F)
SanFrancisco   Temperature in San Francisco (degrees F)

Details

Average temperature for the day of April 14th in each of 25 years from 1995 to 2019.

** Data set updated for 3e (earlier versions are now April14Temps1e and April14Temps2e) **

Details

Average temperature for the day of April 14th in each of 21 years beginning in 1995.

** From 2e - dataset has been updated for 3e **

Source

The University of Dayton Average Daily Temperature Archive at http://academic.udayton.edu/kissock/http/Weather/citylistUS.htm

BaseballHits1e Baseball Hits

Description

Number of hits, wins, and other stats for MLB teams - 2011

Format

A dataset with 30 observations on the following 14 variables.

Team             Name of baseball team
League           Either American (AL) or National (NL) League
Wins             Number of wins for the season
Runs             Number of runs scored
Hits             Number of hits
Doubles          Number of doubles
Triples          Number of triples
HomeRuns         Number of home runs
RBI              Number of runs batted in
StolenBases      Number of stolen bases
CaughtStealing   Number of times caught stealing
Walks            Number of walks
Strikeouts       Number of strikeouts
BattingAvg       Team batting average

Details

Data from the 2010 Major League Baseball regular season.

** From 1e - dataset has been updated for 2e **

Source

http://www.baseball-reference.com/leagues/MLB/2011-standard-batting.shtml

BaseballHits2014 Baseball Hits - 2014

Description

Number of hits, wins, and other stats for MLB teams - 2014

Format

A dataset with 30 observations on the following 14 variables.

Team             Name of baseball team (3-character code)
League           Either AL or NL
Wins             Number of wins for the season
Runs             Number of runs scored
Hits             Number of hits
Doubles          Number of doubles
Triples          Number of triples
HomeRuns         Number of home runs
RBI              Number of runs batted in
StolenBases      Number of stolen bases
CaughtStealing   Number of times caught stealing
Walks            Number of walks
Strikeouts       Number of strikeouts
BattingAvg       Team batting average

Details

Data from the 2014 Major League Baseball regular season.

** From 2e - dataset has been updated for 3e **

Source

http://www.baseball-reference.com/leagues/MLB/2014-standard-batting.shtml

BaseballHits2019 Baseball Team Statistics (2019)

Description

Number of hits, wins, and other stats for MLB teams in 2019

Details

Yearly salary (in millions of dollars) for all players on the rosters of Major League Baseball teams at the start of the 2015 season.

** From 2e - dataset has been updated for 3e **

Source

http://www.usatoday.com/sports/mlb/salaries

BaseballSalaries2019 MLB Player Salaries in 2019

Description

Opening Day salaries for all Major League Baseball players in 2019

Format

A data frame with 877 observations on the following 4 variables.

Name     Player’s name
Salary   2019 season salary (in millions)
Team     Abbreviated team name
POS      Code for player’s main position

Details

Yearly salary (in millions of dollars) for all players on the rosters of Major League Baseball teams at the start of the 2019 season.

** Updated for 3e (earlier version for 2015 is at BaseballSalaries2015). **

Source

https://databases.usatoday.com/mlb-salaries/

BaseballTimes Baseball Game Times

Description

Information for a sample of 30 Major League Baseball games played during the 2011 season

Format

A dataset with 30 observations on the following 9 variables.

Away       Away team name
Home       Home team name
Runs       Total runs scored (both teams)
Margin     Margin of victory
Hits       Total number of hits (both teams)
Errors     Total number of errors (both teams)
Pitchers   Total number of pitchers used (both teams)
Walks      Total number of walks (both teams)
Time       Elapsed time for game (in minutes)

Details

Data from a sample of boxscores for Major League Baseball games played in August 2011.

Source

http://www.baseball-reference.com/boxes/2011.shtml

Benford Benford data

Description

Two examples to test Benford’s Law

Format

A dataset with 9 observations on the following 4 variables.

Digit      Leading digit (1-9)
BenfordP   Expected proportion according to Benford’s law
Address    Frequency as a first digit in an address
Invoices   Frequency as the first digit in invoice amounts

Details

Leading digits from 1188 addresses sampled from a phone book and 7273 amounts from invoices sampled at a company.
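As a hedged illustration of the comparison these data support, the sketch below computes the proportions Benford's law predicts for leading digits 1 to 9, namely log10(1 + 1/d), and converts a column of digit frequencies into proportions on the same scale. It is written in Python for illustration only; the function names are hypothetical, and the counts in the example are made up rather than taken from the Address or Invoices columns.

```python
import math


def benford_proportion(digit):
    """Expected proportion of leading digit `digit` (1-9) under Benford's law."""
    return math.log10(1 + 1 / digit)


def observed_proportions(counts):
    """Convert leading-digit frequencies (digits 1..9, in order) to proportions."""
    total = sum(counts)
    return [c / total for c in counts]


expected = {d: benford_proportion(d) for d in range(1, 10)}
for d, p in expected.items():
    print(d, round(p, 3))

# Made-up frequencies for illustration only (not the dataset's values).
hypothetical_counts = [30, 18, 12, 10, 8, 7, 6, 5, 4]
for d, obs in zip(range(1, 10), observed_proportions(hypothetical_counts)):
    print(d, round(obs, 3), round(expected[d], 3))
```

The nine expected proportions sum to 1, so observed proportions from either frequency column can be compared with BenfordP digit by digit.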

Source

Thanks to Prof. Richard Cleary for providing the data

Bodyfat   Percent body fat
Age       Age in years
Weight    Weight in pounds
Height    Height in inches
Neck      Neck circumference in cm.
Chest     Chest circumference in cm.
Abdomen   Abdomen circumference in cm.
Ankle     Ankle circumference in cm.
Biceps    Extended biceps circumference in cm.
Wrist     Wrist circumference in cm.

Details

This is a subset of a larger sample of men who each had a percent body fat estimated by an underwater weighing technique. Other measurements were taken to see how they might be used to predict the body fat percentage.

Source

These data were contributed by Roger Johnson, then at Carleton University, to the Datasets Archive at the Journal of Statistics Education.
https://ww2.amstat.org/publications/jse/v4n1/datasets.johnson.html
The data were originally supplied by Dr. A. Garth Fisher, Human Performance Research Center, Brigham Young University, Provo, Utah 84602.

BodyTemp50 Body Temperatures

Description

Sample of 50 body temperatures

Format

A data frame with 50 observations on the following 3 variables.

BodyTemp   Body temperature in degrees F
Pulse      Pulse rate (beats per minute)
Sex        F=Female, M=Male

Details

Body temperatures and pulse rates for a sample of 50 healthy adults. Note the Sex variable was labeled as Gender in earlier versions of this dataset. We acknowledge that this binary dichotomization is not a complete or inclusive representation of reality.

Source

Shoemaker, "What’s Normal: Temperature, Gender and Heartrate", Journal of Statistics Education, Vol. 4, No. 2 (1996) http://jse.amstat.org/v4n2/datasets.shoemaker.html

BootAtlantaCorr Bootstrap Correlations for Atlanta Commutes

Description

Bootstrap correlations between Time and Distance for 500 commuters in Atlanta

Format

A dataset with 1000 observations on the following variable.

CorrTimeDist Correlation between Time and Distance for a bootstrap sample of Atlanta commuters

Details

Correlations for bootstrap samples of Time vs. Distance for the data on Atlanta commuters in CommuteAtlanta.
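The manual gives the source only as "Computer simulation", so the following is a minimal sketch of how a column like CorrTimeDist could be produced, not the authors' procedure. It resamples (distance, time) pairs with replacement and recomputes the Pearson correlation for each resample. It is written in Python for illustration; the function names, the random seed, and the synthetic commuter data are all assumptions standing in for CommuteAtlanta.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_correlations(xs, ys, reps, rng):
    """Correlation for each of `reps` resamples of the (x, y) pairs, drawn with replacement."""
    n = len(xs)
    results = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]  # same indices for x and y keeps pairs intact
        results.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    return results


rng = random.Random(1)  # arbitrary seed, for reproducibility only
# Synthetic stand-in for 500 commuters; NOT the CommuteAtlanta data.
distance = [rng.uniform(1, 40) for _ in range(500)]
minutes = [2 * d + rng.gauss(0, 10) for d in distance]

corrs = bootstrap_correlations(distance, minutes, 1000, rng)
print(len(corrs), round(min(corrs), 3), round(max(corrs), 3))
```

Each resample has the same size as the original sample (500 pairs), and 1000 resamples give one correlation each, matching the 1000 observations on a single variable described above.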

Source

Computer simulation

CaffeineTaps Caffeine Taps

Description

Finger tap rates with and without caffeine

Format

A dataset with 20 observations on the following 2 variables.

Taps    Number of finger taps in one minute
Group   Treatment with levels Caffeine, NoCaffeine