Scatterplots and Correlation: Exercises and Questions, Exams of Statistics


Typology: Exams

2023/2024

Available from 08/28/2024

joseh-karis
joseh-karis 🇰🇪


Gary Sandoval is a photographer who is wondering if there is an association between the number of photographs he takes and percent cloud coverage. His record is shown in the scatterplot.

How many photographs did he take when the cloud coverage was 10 percent or more?

a.) 550

b.) 450

c.) 750

d.) 300

This scatterplot shows the performance of a pressure sensor using two variables, pressure and voltage.

Select the answer choice that accurately describes the data's form, direction, and strength in the scatterplot.

a.) Form: Non-Linear

Direction: Negative

Strength: Strong

b.) Form: Non-Linear

Direction: Positive

Strength: Weak

c.) Form: Linear

Direction: Negative

Strength: Weak

d.) Form: Linear

Direction: Positive

Strength: Moderate

Which statement is true regarding correlation?

a.) Correlation is a quantitative measure of the form between two variables, as seen on a scatterplot.

b.) Correlation can be used to determine the direction of the relationship between two variables.

c.) Correlation can only be negative.

d.) Correlation can only be positive.

Sam rolls two dice, one labeled “x” and the other “y.” He rolls each of the dice six times and records the (x, y) measurements as follows:

Roll      "x" die   "y" die
Roll 1    1         4
Roll 2    2         3
Roll 3    2         2
Roll 4    5         2
Roll 5    4         4
Roll 6    6         5

For the "x" die, the mean is 3.3 and the standard deviation is 2.0. For the "y" die, the mean is 3.3 and the standard deviation is 1.2.

Using the formula below or Excel, find the correlation coefficient, r, for this set of outcomes Sam rolled. Answer choices are rounded to the nearest hundredth.

a.) 0.

b.) 0.

c.) 0.

d.) 0.

pressure.

The correlation coefficient, r, is -0.

b.) 71.8% of the variation in pressure can be explained by the voltage.

The correlation coefficient, r, is -0.

c.) 48.5% of the variation in voltage can be explained by the pressure.

The correlation coefficient, r, is -0.

d.) 26.5% of the variation in pressure can be explained by the voltage.

The correlation coefficient, r, is -0.

Using the provided scatterplot, select the correct direction of the blue outlier.

a.) The outlier is in neither the x- nor the y-direction.

b.) The outlier is in the y-direction.

c.) The outlier is in the x-direction.

d.) The outlier is in both the x- and y-directions.

Which of the following statements is true?

a.) High correlation does not necessarily imply causation.

b.) Only a correlation equal to 1 implies causation.

c.) A correlation equal to 1 or -1 implies causation.

d.) Only a correlation equal to -1 implies causation.

Stacey finds a scatterplot that shows data for nine schools. It relates the percentage of students receiving free lunches to the percentage of students wearing a bicycle helmet. The plot shows a strong negative correlation.

Stacey recalls that correlation does not imply causation. In this example, Stacey sees that increasing the percentage of free lunches would not cause children to use their bicycle helmets less.

Identify the confounding variable that is causing Stacey's observed association.

a.) School budget

b.) Helmet brands

c.) Parents' annual salary

d.) The number of bikes at each school

Which of the following is NOT a guideline for establishing causality?

a.) Check if the effect is present or absent when the explanatory variable is present or absent.

b.) Check if the effect is present or absent when the response variable is present or absent.

c.) Look for cases where correlation remains while other factors vary.

d.) Perform a randomized, controlled experiment.

For a Biology assignment, Lisa collected data on plant growth of a sunflower every week for 9 weeks. When Lisa first planted the sunflower, it was 10 centimeters tall. The time (in weeks) is plotted against the height (in centimeters) as shown below.

James takes two data points from the weight and feed cost data set to calculate a slope, or average rate of change. A guinea pig weighs 3 pounds and costs $3.50 per week to feed, while a Chihuahua weighs 4.8 pounds and costs $6.20 per week to feed.

Using weight as the explanatory variable, what is the slope of the line between these two points? Answer choices are rounded to the nearest hundredth.

a.) $0.67 / lb.

b.) $0.36 / lb.

c.) $1.50 / lb.

d.) $2.80 / lb.

A clinic has recorded the age, x, versus weight, y, of many babies for their first 12 months of life, and claim the line of best fit is ŷ = 0.60x + 3.3, where y is in kg, and x is in months.

A new baby, who is 10 months and weighs 10 kg, is added to the clinic records.

What is the residual of the data for this new baby?

a.) 0. kg

b.) –0.4 kg

c.) 0. kg

d.) –0.7 kg

Jaime finished analyzing a set of data with an explanatory variable x and a response variable y.

He finds that the mean and standard deviation for x are 5.43 and 1.12, respectively. The mean and standard deviation for y are 10.32 and 2.69, respectively.

The correlation was found to be 0.893.

Select the correct slope and y-intercept for the least-squares line.

a.) Slope = 2. y-intercept = -

b.) Slope = 0. y-intercept = -1.

c.) Slope = -2. y-intercept = -

d.) Slope = -0. y-intercept = -

This scatterplot shows the performance of a pressure sensor using two variables, pressure and voltage.

The equation for the least-squares regression line to this data set is

The predicted value for the voltage for a pressure of 50 MPa is.

a.) 2560 mV

b.) 2582 mV

c.) 2502 mV

d.) 2580 mV

Answer Key

Question Answer

1 c

Concept: Scatterplot Rationale: To find the total number of photographs taken when the cloud coverage was 10% or more, we add the values at 10% and above.

At 10%, there were 300 photographs. At 11%, there were 200 photographs. At 12%, there were 250 photographs.

So the total is 300 + 200 + 250 = 750 photographs.

2 c

Concept: Describing Scatterplots Rationale: Looking at the data, there is a lot of variation (scatter), which means the relationship is weak or moderate. As pressure goes up, the voltage goes down, so the direction is negative. Finally, the form appears to be relatively linear, since a straight line would capture the data fairly well.

3 d

Concept: Explanatory and Response Variables Rationale: The variable on the vertical axis is the outcome, or response, while the variable on the horizontal axis is the explanatory variable. So voltage is the response variable and pressure is the explanatory variable.

4 b

Concept: Correlation Rationale: We note that correlation is a measure of the strength and the direction of the linear association between two quantitative variables.

5 a

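Concept: Correlation Rationale: In order to get the correlation, we can use the formula r = [1 / (n - 1)] × Σ [(x - x̄) / s_x] × [(y - ȳ) / s_y], where x̄ and ȳ are the means and s_x and s_y are the standard deviations of the two variables.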

Correlation can be quickly calculated by using Excel. Enter the values and use the function "=CORREL(".
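As a cross-check on the rationale above, the correlation for Sam's six rolls can also be computed in a few lines of Python. This is a minimal sketch that assumes the standard sample-correlation formula (products of z-scores divided by n - 1); the helper names `mean`, `sample_sd`, and `correlation` are illustrative and are not part of the original exam.

```python
import math

# (x, y) outcomes of Sam's six rolls, copied from the table in the question
x = [1, 2, 2, 5, 4, 6]
y = [4, 3, 2, 2, 4, 5]

def mean(values):
    return sum(values) / len(values)

def sample_sd(values):
    # Sample standard deviation (divides by n - 1)
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))

def correlation(xs, ys):
    # Sum of z-score products, divided by n - 1
    mx, my = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    products = (((a - mx) / sx) * ((b - my) / sy) for a, b in zip(xs, ys))
    return sum(products) / (len(xs) - 1)

r = correlation(x, y)
print(round(sample_sd(x), 1), round(sample_sd(y), 1))  # 2.0 1.2
print(round(r, 2))  # 0.28
```

The standard deviations printed on the first line agree with the 2.0 and 1.2 given in the question.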

14 c

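Concept: Linear Equation Algebra Review Rationale: In order to get the slope, we can use the formula: slope = (y2 - y1) / (x2 - x1). Using the information provided, the two points are (3 lb., $3.50) and (4.8 lb., $6.20). We can note that: slope = ($6.20 - $3.50) / (4.8 lb. - 3 lb.) = $2.70 / 1.8 lb. = $1.50 / lb.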

15 a

Concept: Residuals Rationale: Recall that to get the residual, we take the actual value - predicted value. The baby's actual age is 10 months and actual weight is 10 kg, so we simply need the predicted weight. Using the regression line, we can say: ŷ = 0.60(10) + 3.3 = 9.3 kg.

The predicted weight is 9.3 kg. So the residual is: 10 kg - 9.3 kg = 0.7 kg.
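The same residual calculation can be sketched in Python. The function names `predicted_weight_kg` and `residual` are hypothetical helpers introduced here for illustration; the coefficients come from the clinic's line of best fit, ŷ = 0.60x + 3.3.

```python
def predicted_weight_kg(age_months):
    # Line of best fit from the question: y-hat = 0.60x + 3.3
    return 0.60 * age_months + 3.3

def residual(actual, predicted):
    # Residual = actual value - predicted value
    return actual - predicted

predicted = predicted_weight_kg(10)   # baby is 10 months old
res = residual(10, predicted)         # baby actually weighs 10 kg
print(round(predicted, 1), round(res, 1))  # 9.3 0.7
```

A positive residual means the actual weight lies above the line of best fit.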

16 a

Concept: Finding the Least-Squares Line Rationale: We first want to get the slope. We can use the formula: b1 = r × (s_y / s_x)

To then get the intercept, we can solve for the y-intercept by using the following formula: b0 = ȳ - b1 × x̄

We know the slope, b1, and we can use the mean of x and the mean of y for the variables x̄ and ȳ to solve for the y-intercept, b0.
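Plugging Jaime's summary statistics into the two standard least-squares formulas (slope = r × s_y / s_x, intercept = ȳ - slope × x̄) can be sketched as follows. The variable names are illustrative only, and the sketch assumes the slope is carried at full precision into the intercept calculation.

```python
# Summary statistics given in the question
x_mean, x_sd = 5.43, 1.12
y_mean, y_sd = 10.32, 2.69
r = 0.893

# Slope of the least-squares line: r times the ratio of standard deviations
slope = r * y_sd / x_sd

# The least-squares line passes through (x_mean, y_mean)
intercept = y_mean - slope * x_mean

print(round(slope, 2), round(intercept, 2))  # 2.14 -1.33
```

Rounding the slope to 2.14 before computing the intercept would give about -1.30 instead, so the order of rounding matters when comparing against answer choices.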

17 b

Concept: Interpreting Intercept and Slope Rationale: When interpreting the linear slope we generally substitute in a value of 1. So we can note that in general, as x increases by 1 unit, the slope tells us how the outcome changes. So for this equation, we can note that as x (height) increases by 1 inch, the outcome (weight) will increase by 1.2 kg on average.

18 c

Concept: Predictions from Best-Fit Lines Rationale: In order to get the predicted voltage when the pressure is 50 MPa, we simply substitute the value 50 in our equation for x. So we can note that: