Analyzing Price Differences: A Scatterplot Study of Grocery Costs in Kauai and Bend, Study notes of Data Mining

Instructions for creating a scatterplot to compare the cost of food items in Kauai and Bend. The data collected includes the cost of various non-sale food items in both locations. The document also discusses the concept of best fit lines and how to use Excel to determine the percentage increase in grocery costs on Kauai compared to Bend.

Typology: Study notes

2021/2022

Uploaded on 08/05/2022

dirk88


Correlation and Data Mining

Quiz 1.

A few years ago, when my fam went on a vacation to Kauai, quite a few of my friends told me, “Dude! Bring a ton of money – everything’s twice as expensive there!” So, being me, I decided to check. 

The first time we went food shopping on Kauai, I recorded the cost of a bunch of non-sale food items that I could also find in Bend. Then, when we got home, I did the same here. Please open the attached spreadsheet and see what I collected!

  1. (2 points) Start by creating a scatterplot of these data. Make sure it has labels on the independent and dependent axes, and a good title. If you need help, remember the “how to make a scatterplot” videos on the resource page (and the likely 12.4 billion videos on YouTube). Include a screenshot of your scatterplot as your answer to this question.

That’s a pretty nice positive correlation! Pretty strong, too – those points are nicely clumped around what appears to be a line moving diagonally up.

Now, some of you might have learned about “best fit lines” in previous math (or stat) classes. You have to be careful when you use them (something you’ll learn in later stat classes), but in this case, it’s pretty darned clear that the data is trending linearly upward.

The problem, though, is – which line? I mean, two points determine any line, but which two points?

Many of you may have learned the “connect the first and last data points” method:

But, even that isn’t (always) correct^1. So what is correct? Welp, we need to make sure that our line “fits” the data “the best” …which means it needs to “miss” the data “the least” (if that sounds confusing as all get out, don’t worry – I made a video I’ll share with ya later that explains it!).
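One common way to make “misses the data the least” precise is least squares: add up the squared vertical gaps between each point and the line, and pick the line with the smallest total. Assuming that is the sense meant here (Excel’s linear trendline is a least-squares fit), here is a minimal Python sketch comparing it with the “connect the first and last points” line. The prices are made up for illustration; they are not the spreadsheet data.

```python
# Hypothetical (Bend price, Kauai price) pairs in dollars; NOT the quiz data.
bend = [1.00, 2.00, 3.00, 4.00, 5.00]
kauai = [2.00, 2.50, 5.00, 5.50, 8.00]


def sum_squared_misses(slope, intercept, xs, ys):
    """Total of the squared vertical gaps between each point and the line."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


def endpoints_line(xs, ys):
    """Slope and intercept of the line through the first and last data points."""
    slope = (ys[-1] - ys[0]) / (xs[-1] - xs[0])
    return slope, ys[0] - slope * xs[0]


def least_squares_line(xs, ys):
    """Slope and intercept that minimize the sum of squared misses."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


ep_slope, ep_intercept = endpoints_line(bend, kauai)
ls_slope, ls_intercept = least_squares_line(bend, kauai)

print("endpoints line misses:    ", sum_squared_misses(ep_slope, ep_intercept, bend, kauai))
print("least-squares line misses:", sum_squared_misses(ls_slope, ls_intercept, bend, kauai))
```

Whatever data you feed it, the least-squares total can never be larger than the endpoints total, because the least-squares line is chosen to make that total as small as possible.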

And, even better? Excel has that feature built in! Check this out!

  2. (2 points) Rad! But, hang on – that equation has a fixed cost of about 73 cents. What would that mean, in the context of our data?

Well, now – that doesn’t make any sense! If something costs nothing in Bend, it should also cost nothing on Kauai^2! Let’s fix that now! On a Windows machine, right-click on the best-fit line (“trendline”) and then left-click “Format Trendline”. Then, set the intercept to zero:

When you do that, the equation will update and (sensibly) go through the origin. Yay!
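For the curious, here is a rough Python sketch of what a zero-intercept fit does, assuming it is the usual least-squares line through the origin (slope = Σxy / Σx²). The prices are hypothetical stand-ins, not the spreadsheet data, and `slope_through_origin` is just a name chosen for this example.

```python
def slope_through_origin(xs, ys):
    """Least-squares slope of y = m*x, with the intercept fixed at zero."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


# Hypothetical prices in dollars; NOT the quiz data.
bend = [1.00, 2.00, 4.00]
kauai = [1.50, 3.00, 6.00]

m = slope_through_origin(bend, kauai)

# With no fixed cost, the slope alone says how much more Kauai costs:
# a slope of 2 would mean "twice as expensive", i.e. a 100% increase.
percent_more = (m - 1) * 100
print(f"slope = {m:.2f}, so Kauai runs about {percent_more:.0f}% more than Bend")
```

Once the line goes through the origin, the slope is the whole story: it is the multiplier that turns a Bend price into a Kauai price.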

(^1) Actually, in 25 years of making these lines, I’ve never seen one go exactly through those two points.

(^2) Even though nothing costs “nothing” in Hawai’i. I mean, I guess sunsets are free (except that you need to pay to get to them).

Quiz 2.

There’s a fabulous number in mathematics called the “golden ratio” (or “golden section”). It’s so rad, it even has its own Greek letter: Φ (that’s “phi”, pronounced “fee”, to you and me).

  1. (2 points) Do some Googling and tell me a) what the numerical value of Φ is, and b) also one thing that makes it so rad! (By “rad” I mean, “Why does it get its own cool letter? What properties could it have that possibly warrant that?”)

So, as you may have just discovered, people find Φ in all sorts of places (we’ll have a follow-up quiz next week about it, if you like). But here’s one that stuck with me for years!

(Source: https://edu.glogster.com/glog/golden-ratio-in-the-human-body/1wtj52n00hq?=glogpedia-source)

Now, although some of this stuff borders on numerology, I still find these kinds of ideas fascinating – because they can be tested! So, for a few years, I tested this one in my classes. I had my students record the lengths of their hands (wrist to middle finger) and their forearms (wrist to crook of elbow), and we collected the data. Open the spreadsheet that accompanies this exam to see our results!

  2. (2 points) Create a scatterplot for this data! Make sure your axes are labeled, and give your plot a good title. Include a screenshot of it here.

Now…there seems to be a (fairly loose) positive correlation there.

Well, that makes sense, right? I mean, as our hand size goes up, we’d hope that our forearm size would too. But… are our forearms equal to the Golden Ratio times our hands? Let’s see!

In cell D3, type the formula =C3/B3. Then drag that puppy down so it’s applied to every data pair. This will divide the forearm length by the hand length. If Φ is there, it’ll show up somewhere in this column.
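If you’d rather see that ratio column outside of Excel, here is a small Python sketch of the same idea: divide each forearm by its hand, sort the results, and find the one nearest Φ. The measurements are invented for illustration (they are not the class data), and `PHI` and `measurements` are names made up for this example.

```python
PHI = (1 + 5 ** 0.5) / 2  # the golden ratio, about 1.618

# Hypothetical (hand_cm, forearm_cm) pairs; NOT the class data.
measurements = [(18.0, 27.0), (17.0, 27.5), (19.0, 29.0), (16.5, 28.0)]

# Same job as =C3/B3 dragged down the column, then sorted smallest to largest.
ratios = sorted(forearm / hand for hand, forearm in measurements)

# The ratio that lands nearest to phi.
closest = min(ratios, key=lambda r: abs(r - PHI))

for r in ratios:
    print(f"{r:.3f}")
print(f"closest to phi: {closest:.3f} (off by {abs(closest - PHI):.3f})")
```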

  3. (2 points) Take a look through that column. Do you see anything close to phi? Write a sentence or two! Hint: if you get tired of looking through the list, you can sort the data like this, which might make it easier!

OK, so a couple came “close”. I wouldn’t expect any of them to be exactly equal to phi, since phi is one of those numbers (like π) whose decimal representation goes on forever without repeating.

But! We’re in a stat class, not a math class. 😊 We don’t care if things are exactly equal, because nothing in the world is ever exactly equal to anything else. We care if two things deviate so much as to be assumed different^3. So, in this case: does our data imply a difference, on average, from the belief that the forearm should be about 1.618 times as long as the hand?

Let’s see! Add a best fit line to your scatterplot (you might have to refer back to Quiz 1 if you don’t know how). Also make sure to set the intercept to zero (if we had a zero-length hand, we’d likely have a zero-length forearm). Also add its equation to the graph so you can see it.

  4. (1 point) If Φ is there, on average, what part of the equation of the line should it be represented by?

OK! So, I see that it’s close to 1.618…but – is it close enough?

Here’s where we look a little ahead. Soon, in class, we’ll talk about things called Margins of Error (MOE). Here’s a crash course: the only way to see if Φ is in people’s forearms and hands is to measure every single person’s forearms and hands. Since we can’t do that, we measure a sample of their forearms and hands – and then we have to admit that, since we didn’t measure everyone, whatever result we get is a little “off”. The MOE is a number that we add and subtract to whatever our measurement is; the two numbers we get are then an interval between which we think the true measurement lies!
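The add-and-subtract step is simple enough to sketch in a few lines of Python. The numbers below are placeholders invented for this example (not results from the class data or from any poll), and `moe_interval` and `is_plausible` are hypothetical helper names.

```python
def moe_interval(estimate, moe):
    """Return (low, high): the estimate minus and plus the margin of error."""
    return estimate - moe, estimate + moe


def is_plausible(value, estimate, moe):
    """True if value falls inside the interval estimate +/- moe."""
    low, high = moe_interval(estimate, moe)
    return low <= value <= high


# Hypothetical sample result and margin of error; NOT real results.
sample_slope = 1.60
margin = 0.05

low, high = moe_interval(sample_slope, margin)
print(f"interval: {low:.2f} to {high:.2f}")
print("is 1.618 inside?", is_plausible(1.618, sample_slope, margin))
```

If the value we’re curious about lands inside the interval, the sample gives us no reason to call it different; if it lands outside, that’s when we start to suspect a real difference.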

I’ll give ya an example: from the news today:

(Source: https://news.gallup.com/poll/284285/new-high-americans-satisfied-personal-life.aspx)

So, in other words, there is, definitely, a percentage of Americans who are “satisfied with their personal lives.” But the only way to say that exactly “90% of Americans” are is to have asked every single one! Let’s check!

(^3) For example – when you buy 12 ounces of coffee at a shop, it doesn’t have exactly 12 ounces in it. It’s close. But, if it gets too far away from 12 ounces, you’ll either be upset or stoked. 😊

Quiz 3.

Note: this quiz isn’t a correlation or data mining quiz; it’s just an extension of a topic from last week’s quiz, in case that interested you.

In the last quiz, we took a look at the “Golden Ratio” – a magnificent little number that seems to show up in a ton of places…everywhere from fine art to Romanesco. If you want to do this quiz (and didn’t do Quiz 2 from last week), you might wanna go back up there and take a look first.

You may have learned (in your Googling) that the Golden Ratio gets a symbol: φ. Why? Well, like π and its lesser-known cousin, e, φ tends to show up quite a bit in the world (hence, it gets a cool name). Here are a couple of cool places φ shows up:

In the Mona Lisa, the ratio of the subject’s face, length to width, is approximately φ. In fact, there are many occurrences of φ in the Mona Lisa – this is just one of them.

In the Parthenon, the length to width ratio, again, is approximately φ.

The nautilus shell’s long side to short side ratio is about φ, as well.

Here’s one of my favorites: In any regular (all sides and angles equal) pentagon, if you draw in the diagonals, you get a five-pointed regular star. The lengths of some of those segments formed can be combined into a ratio of φ…for example, the ratio of DB to DC is φ.

  1. (2 points) Find two other lengths in that figure that have, as their ratio, a value of φ. You might want to start by Googling “golden ratio pentagon” – unless you love doing geometry!

  2. (4 points…2 for each) Do a little more research and tell me how the Golden Ratio is used at Cardas Audio – an Oregon-based audio company!
  3. (2 points) In the Fibonacci Series (1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, etc.), each successive element is formed by adding the previous two together. What does φ have to do with this series? #googleitup

At this point, you might be wondering where this magical 1.618 came from. It has to do with something in geometry called “similarity”…in a “golden rectangle” (that is, a rectangle whose sides are in the golden ratio), if you cut off a square, the smaller rectangle you leave behind is also golden (and, thereby, similar to the larger one).
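If you want to see that “cut off a square” property with actual numbers before touching any algebra, here is a quick Python check. It takes for granted that φ = (1 + √5)/2 and uses a rectangle with short side 1; `cut_off_square` is a helper name made up for this sketch.

```python
PHI = (1 + 5 ** 0.5) / 2  # the golden ratio, about 1.618


def cut_off_square(long_side, short_side):
    """Slice a short_side-by-short_side square off the rectangle.

    Returns the leftover rectangle as (long side, short side).
    """
    leftover = long_side - short_side
    return max(short_side, leftover), min(short_side, leftover)


# Start with a golden rectangle: long side phi, short side 1.
new_long, new_short = cut_off_square(PHI, 1.0)
print(f"leftover rectangle ratio: {new_long / new_short:.6f}")

# For contrast, a plain 2-by-1 rectangle does not keep its shape.
plain_long, plain_short = cut_off_square(2.0, 1.0)
print(f"2-by-1 leftover ratio:    {plain_long / plain_short:.6f}")
```

The golden rectangle leaves behind a smaller copy of itself, while the 2-by-1 rectangle leaves behind a square, which is a different shape entirely.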

(extra 3 points) The large rectangle shown at left is “golden” – that is, the ratio of its long side to its short side is φ. The small rectangle is also golden; that is, the ratio of its long side to its short side is also φ. Walk me through some algebra that shows me where the 1.618 (or, exactly, (1 + √5)/2) arises!