




























































































Title Datasets for ``Statistics: UnLocking the Power of Data''
Version 3.0.
Maintainer Robin Lock
Description Datasets for the third edition of ``Statistics: Unlocking the Power of Data'' by Lock^5. Includes versions of datasets from earlier editions.
Depends R (>= 3.5.0)
License GPL-
Encoding UTF-
LazyData true
RoxygenNote 7.1.
NeedsCompilation no
Author Robin Lock [aut, cre]
Repository CRAN
Date/Publication 2021-07-22 22:40:10 UTC
Lock5Data-package
ACS
ACS2010
AllCountries
AllCountries1e
AllCountries2e
APMultipleChoice
April14Temps
April14Temps1e
April14Temps2e
BaseballHits1e
BaseballHits2014
BaseballHits2019
BaseballSalaries2015
BaseballSalaries2019
Lock5Data-package Lock5 Datasets
Source
The full public dataset can be downloaded at https://www.census.gov/programs-surveys/acs/microdata.html, and the full list of variables is at https://www.census.gov/programs-surveys/acs/microdata/documentation.html
ACS2010 American Community Survey - 2010
Description
Data from a sample of individuals in the 2010 American Community Survey
Format
A dataset with 1000 observations on the following 9 variables.
Sex              0=female and 1=male
Age              Age (years)
Married          0=not married and 1=married
Income           Wages and salary for the past 12 months (in $1,000’s)
HoursWk          Hours of work per week
Race             asian, black, white, or other
USCitizen        1=citizen and 0=noncitizen
HealthInsurance  1=have health insurance and 0=no health insurance
Language         1=native English speaker and 0=other
Details
The American Community Survey, administered by the US Census Bureau, is given every year to a random sample of about 3.5 million households (about 3% of all US households). Data on a random sample of 1% of all US residents are made public (after ensuring anonymity), and we have selected a random sub-sample of n = 1000 from the 2010 data for this dataset.
** From 2e - dataset has been updated for 3e **
Source
The full public dataset can be downloaded at http://www.census.gov/acs/www/data_documentation/pums_data/, and the full list of variables is at http://www.census.gov/acs/www/Downloads/data_documentation/pums/DataDict/PUMSDataDict10.pdf.
AllCountries All Countries
Description
Data on the countries of the world
Format
A data frame with 217 observations on the following 26 variables.
Country         Country name
Code            Three-letter code for country
LandArea        Size in 1000 sq. km.
Population      Population in millions
Density         Number of people per square kilometer
GDP             Gross Domestic Product (in $US) per capita
Rural           Percentage of population living in rural areas
CO2             CO2 emissions (metric tons per capita)
PumpPrice       Price for a liter of gasoline ($US)
Military        Percentage of government expenditures directed toward the military
Health          Percentage of government expenditures directed toward healthcare
ArmedForces     Number of active duty military personnel (in 1,000’s)
Internet        Percentage of the population with access to the internet
Cell            Cell phone subscriptions (per 100 people)
HIV             Percentage of the population with HIV
Hunger          Percent of the population considered undernourished
Diabetes        Percent of the population diagnosed with diabetes
BirthRate       Births per 1000 people
DeathRate       Deaths per 1000 people
ElderlyPop      Percentage of the population at least 65 years old
LifeExpectancy  Average life expectancy (years)
FemaleLabor     Percent of females 15 - 64 in the labor force
Unemployment    Percent of labor force unemployed
Energy          Kilotons of oil equivalent
Electricity     Electric power consumption (kWh per capita)
Developed       Categories for kilowatt hours per capita: 1=under 2500, 2=2500 to 5000, 3=over 5000
Source
Data collected from the World Bank website, worldbank.org.
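The units in the AllCountries variable list can be checked with a little arithmetic. The following is a minimal Python sketch, independent of the R package, showing how a density in people per square kilometer follows from Population (millions) and LandArea (1000 sq. km.), and how the three-level Developed coding could be applied to an Electricity value. The function names are hypothetical, and both the assumption that Density was derived this way and the handling of the exact boundary values 2500 and 5000 are guesses based on the wording "under 2500", "2500 to 5000", and "over 5000".

```python
def density_per_sq_km(population_millions, land_area_thousand_sq_km):
    """People per square kilometer, given the units used in the variable list."""
    people = population_millions * 1_000_000      # Population is in millions
    sq_km = land_area_thousand_sq_km * 1_000      # LandArea is in 1000 sq. km.
    return people / sq_km


def developed_category(kwh_per_capita):
    """Map electricity use (kWh per capita) to the 1/2/3 coding.

    Assumption: 2500 and 5000 themselves fall in category 2.
    """
    if kwh_per_capita < 2500:
        return 1
    if kwh_per_capita <= 5000:
        return 2
    return 3


if __name__ == "__main__":
    # Made-up inputs for illustration only; these are not values from the dataset.
    print(density_per_sq_km(10.0, 100.0))   # 10 million people on 100,000 sq. km.
    print(developed_category(3200.0))
```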
AllCountries2e AllCountries - 2e
Description
Data on the countries of the world
Format
A dataset with 215 observations on the following 25 variables.
Country         Name of the country
LandArea        Size in 1000 sq. kilometers
Population      Population in millions
Density         Number of people per square kilometer
GDP             Gross Domestic Product (in $US) per capita
Rural           Percentage of population living in rural areas
CO2             CO2 emissions (metric tons per capita)
PumpPrice       Price for a liter of gasoline ($US)
Military        Percentage of government expenditures directed toward the military
Health          Percentage of government expenditures directed toward healthcare
ArmedForces     Number of active duty military personnel (in 1,000’s)
Internet        Percentage of the population with access to the internet
Cell            Cell phone subscriptions (per 100 people)
HIV             Percentage of the population with HIV
Hunger          Percent of the population considered undernourished
Diabetes        Percent of the population diagnosed with diabetes
BirthRate       Births per 1000 people
DeathRate       Deaths per 1000 people
ElderlyPop      Percentage of the population at least 65 years old
LifeExpectancy  Average life expectancy (years)
FemaleLabor     Percent of females 15 - 64 in the labor force
Unemployment    Percent of labor force unemployed
Energy          Energy usage (kilotons of oil equivalent)
Electricity     Electric power consumption (kWh per capita)
Developed       Categories for kilowatt hours per capita: 1=under 2500, 2=2500 to 5000, 3=over 5000
Details
Data for each variable were collected for years between 2012 and 2014. Within a variable all country measurements are from the same year, but the year may vary between different variables depending on availability. ** From 2e - dataset has been updated for 3e **
Source
Data collected from the World Bank website, worldbank.org.
APMultipleChoice AP Multiple Choice
Description
Correct responses on Advanced Placement multiple choice exams
Format
A dataset with 400 observations on the following variable.
Answer Correct response: A, B, C, D, or E
Details
Correct responses from multiple choice sections for a sample of released Advanced Placement exams
Source
Sample exams from several disciplines at http://apcentral.collegeboard.com
April14Temps April 14th Temperatures
Description
Temperatures in Des Moines, IA and San Francisco, CA on April 14th
Format
A data frame with 25 observations on the following 3 variables.
Year          1995 to 2019
DesMoines     Temperature in Des Moines (degrees F)
SanFrancisco  Temperature in San Francisco (degrees F)
Details
Average temperature for the day of April 14th in each of 25 years from 1995 to 2019.
** Data set updated for 3e (earlier versions are now April14Temps1e and April14Temps2e) **
Details
Average temperature for the day of April 14th in each of 21 years from 1995- ** From 2e - dataset has been updated for 3e **
Source
The University of Dayton Average Daily Temperature Archive at http://academic.udayton.edu/kissock/http/Weather/citylistUS.htm
BaseballHits1e Baseball Hits
Description
Number of hits, wins, and other stats for MLB teams - 2011
Format
A dataset with 30 observations on the following 14 variables.
Team            Name of baseball team
League          Either American (AL) or National (NL) League
Wins            Number of wins for the season
Runs            Number of runs scored
Hits            Number of hits
Doubles         Number of doubles
Triples         Number of triples
HomeRuns        Number of home runs
RBI             Number of runs batted in
StolenBases     Number of stolen bases
CaughtStealing  Number of times caught stealing
Walks           Number of walks
Strikeouts      Number of strikeouts
BattingAvg      Team batting average
Details
Data from the 2010 Major League Baseball regular season. ** From 1e - dataset has been updated for 2e **
Source
http://www.baseball-reference.com/leagues/MLB/2011-standard-batting.shtml
BaseballHits2014 Baseball Hits - 2014
Description
Number of hits, wins, and other stats for MLB teams - 2014
Format
A dataset with 30 observations on the following 14 variables.
Team            Name of baseball team (3-character code)
League          Either AL or NL
Wins            Number of wins for the season
Runs            Number of runs scored
Hits            Number of hits
Doubles         Number of doubles
Triples         Number of triples
HomeRuns        Number of home runs
RBI             Number of runs batted in
StolenBases     Number of stolen bases
CaughtStealing  Number of times caught stealing
Walks           Number of walks
Strikeouts      Number of strikeouts
BattingAvg      Team batting average
Details
Data from the 2014 Major League Baseball regular season. ** From 2e - dataset has been updated for 3e **
Source
http://www.baseball-reference.com/leagues/MLB/2014-standard-batting.shtml
BaseballHits2019 Baseball Team Statistics (2019)
Description
Number of hits, wins, and other stats for MLB teams in 2019
Details
Yearly salary (in millions of dollars) for all players on the rosters of Major League Baseball teams at the start of the 2015 season. ** From 2e - dataset has been updated for 3e **
Source
http://www.usatoday.com/sports/mlb/salaries
BaseballSalaries2019 MLB Player Salaries in 2019
Description
Opening Day salaries for all Major League Baseball players in 2019
Format
A data frame with 877 observations on the following 4 variables.
Name    Player’s name
Salary  2019 season salary (in millions)
Team    Abbreviated team name
POS     Code for player’s main position
Details
Yearly salary (in millions of dollars) for all players on the rosters of Major League Baseball teams at the start of the 2019 season. ** Updated for 3e (earlier version for 2015 is at BaseballSalaries2015). **
Source
https://databases.usatoday.com/mlb-salaries/
BaseballTimes Baseball Game Times
Description
Information for a sample of 30 Major League Baseball games played during the 2011 season
Format
A dataset with 30 observations on the following 9 variables.
Away      Away team name
Home      Home team name
Runs      Total runs scored (both teams)
Margin    Margin of victory
Hits      Total number of hits (both teams)
Errors    Total number of errors (both teams)
Pitchers  Total number of pitchers used (both teams)
Walks     Total number of walks (both teams)
Time      Elapsed time for game (in minutes)
Details
Data from a sample of boxscores for Major League Baseball games played in August 2011.
Source
http://www.baseball-reference.com/boxes/2011.shtml
Benford Benford data
Description
Two examples to test Benford’s Law
Format
A dataset with 9 observations on the following 4 variables.
Digit     Leading digit (1-9)
BenfordP  Expected proportion according to Benford’s law
Address   Frequency as a first digit in an address
Invoices  Frequency as the first digit in invoice amounts
Details
Leading digits from 1188 addresses sampled from a phone book and 7273 amounts from invoices sampled at a company.
Source
Thanks to Prof. Richard Cleary for providing the data
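For readers who want to see where the BenfordP column comes from, Benford's law gives the expected proportion of leading digit d as log10(1 + 1/d). The following is a minimal Python sketch, independent of the R package, that computes those proportions and scales them to a sample size such as the 1188 addresses or 7273 invoices mentioned in Details. The function names are hypothetical, and the sketch does not reproduce the observed Address or Invoices frequencies, which are available only in the dataset itself.

```python
import math


def benford_proportion(digit):
    """Expected proportion of leading digit `digit` (1-9) under Benford's law."""
    if digit not in range(1, 10):
        raise ValueError("leading digit must be an integer from 1 to 9")
    return math.log10(1 + 1 / digit)


# Expected proportions for digits 1 through 9; these sum to 1.
expected = {d: benford_proportion(d) for d in range(1, 10)}


def expected_counts(n):
    """Expected number of values with each leading digit in a sample of size n."""
    return {d: n * p for d, p in expected.items()}


if __name__ == "__main__":
    for d, p in expected.items():
        print(d, round(p, 3))
    # Sample size taken from the Details section (addresses from a phone book).
    print(expected_counts(1188))
```

Comparing these expected counts with the observed frequencies is the natural setup for a chi-square goodness-of-fit test.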
Bodyfat  Percent body fat
Age      Age in years
Weight   Weight in pounds
Height   Height in inches
Neck     Neck circumference in cm.
Chest    Chest circumference in cm.
Abdomen  Abdomen circumference in cm.
Ankle    Ankle circumference in cm.
Biceps   Extended biceps circumference in cm.
Wrist    Wrist circumference in cm.
Details
This is a subset of a larger sample of men who each had a percent body fat estimated by an underwater weighing technique. Other measurements were taken to see how they might be used to predict the body fat percentage.
Source
These data were contributed by Roger Johnson, then at Carleton University, to the Datasets Archive at the Journal of Statistics Education. https://ww2.amstat.org/publications/jse/v4n1/datasets.johnson.html The data were originally supplied by Dr. A. Garth Fisher, Human Performance Research Center, Brigham Young University, Provo, Utah 84602.
BodyTemp50 Body Temperatures
Description
Sample of 50 body temperatures
Format
A data frame with 50 observations on the following 3 variables.
BodyTemp  Body temperature in degrees F
Pulse     Pulse rate (beats per minute)
Sex       F=Female, M=Male
Details
Body temperatures and pulse rates for a sample of 50 healthy adults. Note the Sex variable was labeled as Gender in earlier versions of this dataset. We acknowledge that this binary dichotomization is not a complete or inclusive representation of reality.
Source
Shoemaker, "What’s Normal: Temperature, Gender and Heartrate", Journal of Statistics Education, Vol. 4, No. 2 (1996) http://jse.amstat.org/v4n2/datasets.shoemaker.html
BootAtlantaCorr Bootstrap Correlations for Atlanta Commutes
Description
Bootstrap correlations between Time and Distance for 500 commuters in Atlanta
Format
A dataset with 1000 observations on the following variable.
CorrTimeDist Correlation between Time and Distance for a bootstrap sample of Atlanta commuters
Details
Correlations for bootstrap samples of Time vs. Distance for the data on Atlanta commuters in CommuteAtlanta.
Source
Computer simulation
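The simulation behind a column like CorrTimeDist can be sketched as follows: resample the commuters with replacement, keeping each commuter's Time and Distance paired, and record the correlation of each resample. This is a minimal Python sketch, independent of the R package, and it is not the authors' actual simulation code. The function names are hypothetical, and the 500 (distance, time) pairs are synthetic stand-ins generated here, not the CommuteAtlanta data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def bootstrap_correlations(xs, ys, reps=1000, seed=0):
    """Correlations from `reps` bootstrap samples of the (x, y) pairs."""
    rng = random.Random(seed)
    n = len(xs)
    results = []
    for _ in range(reps):
        # Draw n cases with replacement, keeping each x paired with its y.
        idx = [rng.randrange(n) for _ in range(n)]
        results.append(pearson([xs[i] for i in idx], [ys[i] for i in idx]))
    return results


# Synthetic stand-in data (NOT the real commuter data): 500 made-up commutes.
_rng = random.Random(1)
distance = [_rng.uniform(1, 40) for _ in range(500)]
time = [10 + 1.5 * d + _rng.gauss(0, 10) for d in distance]

boot = bootstrap_correlations(distance, time, reps=1000)

# A 95% percentile interval: middle 95% of the sorted bootstrap correlations.
ordered = sorted(boot)
interval = (ordered[25], ordered[974])
```

With the real data, `boot` would play the role of the 1000 values in CorrTimeDist, and its spread estimates the standard error of the sample correlation.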
CaffeineTaps Caffeine Taps
Description
Finger tap rates with and without caffeine
Format
A dataset with 20 observations on the following 2 variables.
Taps   Number of finger taps in one minute
Group  Treatment with levels Caffeine and NoCaffeine