Introduction Data Science Using R, Essays (university) of Economics

Iraq University College (IUC), Economics

Introduction Data Science Using R

Typology: Essays (university)

2017/2018

Uploaded on 03/27/2018

sam-king 🇮🇶

Data Science: Key Concepts

CHAPTER 1

This chapter looks at five disruptions that data science has caused in the marketplace. Once the context and its importance are understood, it is easy to simplify and demonstrate what data science actually is. We will also compare traditional architecture with data science and see why signal detection matters; signal detection is the subject of chapter 2, and the machine learning techniques that support it are studied from chapter 8 onwards, although a few machine learning concepts are covered in this chapter. The chapter also discusses solution architecture and the three critical components required for any solution.

FIVE DISRUPTIVE PRODUCTS

Five disruptive products recently launched in the marketplace are discussed below:

1. A very simple Japanese app
2. Healthcare app
3. Coursera
4. Sensory device in the agriculture sector
5. Autonomous car

THE JAPANESE APP

The first is a very simple Japanese app that helps two people discover each other. Every individual answers a set of questions, and the answers produce a characteristics score that indicates whether the person likes music or books, their viewpoints on philosophy and religion, and so on. Whatever the parameters are, the questions have to be answered, and each person gets a score attached to each question answered.

Introduction to Data Science Using ‘R’

The other score attached to the device is location. If the device is carried while walking on the street, the app tells you how many people with similar scores are within a 1 km radius. This enables strangers to look one another up and have coffee, chat or get to know one another better. Using the similarity score and location, they are able to discover one another.

Disruption: An app that leveraged and capitalized on the new social norm of casual meetups. It revolutionized the way people find others with similar tastes and interests. It uses data to find patterns and clusters in a huge set of entries and presents them to users in a meaningful way, which is the ‘right match’ in this case: turning data into insights.

FIGURE 1.1 Japanese dating app
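The matching idea described above, a similarity score combined with a 1 km radius, can be sketched in R. This is only an illustration under stated assumptions: the users, their scores and coordinates are invented, and cosine similarity, the haversine distance and the 0.9 threshold are choices made for this sketch, not details of the actual app.

```r
# Hypothetical users: questionnaire scores (0-10) plus a location.
users <- data.frame(
  name       = c("Aiko", "Ben", "Chie", "Dai"),
  music      = c(9, 8, 2, 9),
  books      = c(7, 6, 1, 8),
  philosophy = c(3, 4, 9, 2),
  lat        = c(35.6595, 35.6620, 35.6600, 35.7100),
  lon        = c(139.7005, 139.7040, 139.7010, 139.8100),
  stringsAsFactors = FALSE
)

# Cosine similarity between two score vectors (1 = same profile direction).
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Great-circle (haversine) distance in kilometres.
haversine_km <- function(lat1, lon1, lat2, lon2) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  h <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * 6371 * asin(sqrt(h))
}

# People within radius_km of `who` whose similarity is at least min_sim.
# Both thresholds are assumptions made for this sketch.
find_matches <- function(users, who, radius_km = 1, min_sim = 0.9) {
  score_cols <- c("music", "books", "philosophy")
  me <- users[users$name == who, ]
  others <- users[users$name != who, ]
  others$similarity <- apply(others[, score_cols], 1, cosine_similarity,
                             b = unlist(me[, score_cols]))
  others$distance_km <- haversine_km(me$lat, me$lon, others$lat, others$lon)
  others[others$similarity >= min_sim & others$distance_km <= radius_km,
         c("name", "similarity", "distance_km")]
}

matches <- find_matches(users, "Aiko")
print(matches)
```

In this invented data only Ben is both similar to Aiko and nearby: Chie is close but has a very different profile, and Dai has a similar profile but is several kilometres away.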

THE HEALTHCARE APP

The second is in the healthcare space. In this healthcare app, a heart implant communicates information such as heart rate and the condition of the heart to your mobile phone in real time. The mobile app also communicates remotely with the doctor.

Disruption: Fewer visits to the clinic and lower non-medical costs. Continuous monitoring of organ health replaces one-time data captured during a physician visit. This presents an opportunity to track patterns, with a higher chance of identifying an anomaly and hence acting early.


to continuously improve the quality of the content delivered by collecting data on focus and topics of interest from thousands of students across the world. By redesigning the UX and fine-tuning content, Coursera disrupted the way online education had been delivered by predecessors such as Khan Academy and MIT OCW.

SENSORY DEVICE IN AGRICULTURE SECTOR

The fourth disruptive product is in the agriculture sector. Agriculture is a big part of the Netherlands’ economy, and the country is well known for its cheese and butter. One of the problems farmers face there is understanding the health of cows that are carrying calves. They have therefore attached a sensory device to the cows’ ears, through which farmers can remotely monitor their cows’ health (the data is communicated via satellite).

Disruption: Sensors change livestock farming by monitoring cattle health, so action can be taken immediately if an animal is unwell. This helps detect disease in time and, through prediction, helps prevent it from spreading to the other cows.

FIGURE 1.4 Cows fitted with sensors in the Netherlands

AUTONOMOUS CAR

Lastly, the autonomous car. An autonomous car is special in that it moves without a driver. A device tracks and scans the car’s surroundings at high speed. It has the intelligence to process all kinds of real-time information and communicate it back to the steering wheel.

Disruption: By processing data from images and supplementary sensors, self-driving cars create a virtual world through which they navigate. By reacting far faster than a human driver, they aim to eliminate accidents driven by human error and to reduce traffic congestion. The result is a significant improvement in time and fuel efficiency whilst saving lives.

FIGURE 1.5 Google’s autonomous car

A look at all five uses shows one thing common to all of them: a data product working behind the scenes, silently humming. Creating a data product requires a data science process, which unearths patterns from the data and builds them into a bigger product. The five examples touch our everyday life: how our health is taken care of, how we learn, how we fall in love, how we farm and how we drive are all increasingly touched by data products. Data science needs to be an integral part of any organization; otherwise there is a very high probability of losing ground in the marketplace. One of the biggest secrets of winners is that they are able to see patterns faster. So a core team that uses data science techniques to process all the structured and unstructured data, looks for patterns in it and acts on them in real time is what most companies are aiming at today.

DATA SCIENCE Vs TRADITIONAL METHODS

It’s similar to an iceberg floating on water. Most organizations see just the tip of the iceberg. For example, they know only how much they are selling.


DEMYSTIFYING MACHINE LEARNING

The goal of a data scientist is to use data to discover signals that cause changes and that ultimately have an impact on the revenue of the firm. Even for a data scientist, it is humanly impossible to analyze big data unaided. With the aid of a computer it becomes feasible, yet a computer can only compute what has been programmed into it. So how do data scientists cope when analysis of the data requires the computer to pick up the ‘trends’ on its own? This is where machine learning comes in.

Machine learning is an application of artificial intelligence that enables computing systems to perform tasks through a process of “self-learning” without being specifically programmed for them. As data scientists cannot pinpoint exactly what sorts of patterns the computer should recognize, machine learning comes in extremely handy. It lets the computer adapt automatically to new patterns and signals in data while “learning” from, or recognizing, previous trends and computations. When Google’s search bar uses “auto-complete” before you finish typing your query, it is an example of machine learning: the Google server has learnt to give you ‘predictions’ of what you might want to search for based on your previous search history.

We will now familiarize ourselves with five techniques.

TECHNIQUE 1: SEGMENTATION

This process involves breaking data into chunks based on shared characteristics. The analyst then picks the clusters through an iterative process, looking for uniqueness between segments. We could segment on demographics, needs, behavior and so on. The statistical techniques that we use for segmentation are K-means, hierarchical clustering and discriminant analysis, as shown in Figure 1.7. Some business questions that are answered by segmentation are:

What are the behavioral personas of customers that lie buried in my raw customer transactions in the database? This is explained in Figure 1.
Which specific customer behavior discriminates a high-value segment from a low-value segment? This is explained in Figure 1.
How do customer behavior segments migrate across time, and what does that reveal to us? This is explained in Figure 1. and 1.


FIGURE 1.7 A real-life customer segmentation case study

FIGURE 1.8 Behavioral components considered for fleet card segmentation

FIGURE 1.9 Dimensions of fleet behavior measured and segmented


FIGURE 1.12 Segmentation in banking industry

The greater the similarity within a group, the better the cluster. The greater the dissimilarity between groups, the more distinct the clusters. One clustering technique is k-means. It separates the data into the best-suited groups based on the information the algorithm already has. To produce effective data mining results, one has to specify the number of clusters, K, that will be created. Each cluster has a centre point called the centroid, and each observation is assigned to one centroid. Associating every observation with the nearest centroid (the cluster mean) creates K clusters. The mean of each cluster is then recalculated and becomes the new centroid. These two steps, assignment and recalculation, are repeated until convergence is reached, that is, until the assignments no longer change.

FIGURE 1.13 The mathematics behind clustering
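The two repeated steps of k-means described above can be written out in a few lines of base R. The function below is a teaching sketch, not the book's code: the name `simple_kmeans`, its `init_rows` argument and the simulated data are assumptions made for this example. In practice one would call R's built-in `kmeans`, shown at the end for comparison.

```r
# A minimal k-means (Lloyd's algorithm) in base R, for illustration only.
# `init_rows` (hypothetical argument) selects the observations used as the
# starting centroids; by default they are drawn at random.
simple_kmeans <- function(x, k, init_rows = sample(nrow(x), k),
                          max_iter = 100) {
  x <- as.matrix(x)
  centroids <- x[init_rows, , drop = FALSE]
  cluster <- rep(0L, nrow(x))
  for (iter in seq_len(max_iter)) {
    # Step 1: assign every observation to its nearest centroid
    # (squared Euclidean distance).
    dists <- sapply(seq_len(k), function(j) {
      rowSums((x - matrix(centroids[j, ], nrow(x), ncol(x), byrow = TRUE))^2)
    })
    new_cluster <- max.col(-dists, ties.method = "first")
    # Convergence: stop once no observation changes cluster.
    if (all(new_cluster == cluster)) break
    cluster <- new_cluster
    # Step 2: recompute each centroid as the mean of its cluster.
    for (j in seq_len(k)) {
      if (any(cluster == j)) {
        centroids[j, ] <- colMeans(x[cluster == j, , drop = FALSE])
      }
    }
  }
  list(cluster = cluster, centers = centroids, iterations = iter)
}

set.seed(42)
# Simulated data: two well-separated groups of 20 observations each.
x <- rbind(
  cbind(rnorm(20, mean = 0), rnorm(20, mean = 0)),
  cbind(rnorm(20, mean = 10), rnorm(20, mean = 10))
)

# Fixed starting rows (one from each group) keep the example reproducible.
fit_manual <- simple_kmeans(x, k = 2, init_rows = c(1, 40))

# R's built-in implementation, for comparison.
fit_builtin <- kmeans(x, centers = 2, nstart = 10)
print(table(manual = fit_manual$cluster, builtin = fit_builtin$cluster))
```

The cross-table compares the two partitions; cluster labels are arbitrary, so agreement shows up as one non-zero cell per row rather than as matching numbers.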


TECHNIQUE 2: UNSTRUCTURED TEXT MINING

The second technique is unstructured text mining. Here we use data to discover signals and process changes that create an impact. Let’s take a store manager as an example. A store manager gets a lot of feedback, much of it as unstructured data, and it is very important to process this into a structure. Text mining can process all this feedback and give a glimpse of customer sentiment, known as sentiment analysis: how many people like the store and how many don’t.

Another example of text mining is in the health care domain. A patient goes to a doctor and gets admitted to a hospital. He comes in contact with a doctor, a lab technician and a nurse. The doctor records the state of his health, the lab technician tests his blood and writes down some inferences, and the nurse regularly checks his vital parameters and makes notes on his condition. All this data can be run through a text mining activity to triangulate the state of the patient by mashing up the three datasets: the doctor’s, the lab technician’s and the nurse’s.

FIGURE 1.14 Real world unstructured text mining in the health care
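The store-feedback example can be illustrated with a very small word-counting sketch in base R. The comments and the two word lists are invented for this illustration, and counting positive and negative words is only one simple way to approximate sentiment; it is not presented here as the method the book uses.

```r
# Hypothetical customer feedback for a store.
feedback <- c(
  "Great service and friendly staff",
  "The queue was slow and the store was dirty",
  "Love the fresh produce, great prices",
  "Rude cashier, poor experience"
)

# Tiny hand-made word lists; real projects use much larger lexicons.
positive_words <- c("great", "friendly", "love", "fresh")
negative_words <- c("slow", "dirty", "rude", "poor")

# Split a comment into lower-case words, dropping punctuation.
tokenize <- function(text) {
  words <- strsplit(tolower(gsub("[^[:alpha:] ]", " ", text)), " +")[[1]]
  words[words != ""]
}

# Score = number of positive words minus number of negative words.
sentiment_score <- function(text) {
  words <- tokenize(text)
  sum(words %in% positive_words) - sum(words %in% negative_words)
}

scores <- vapply(feedback, sentiment_score, integer(1), USE.NAMES = FALSE)
labels <- ifelse(scores > 0, "like", ifelse(scores < 0, "dislike", "neutral"))

# How many people like the store and how many don't.
print(table(labels))
```

The unstructured comments end up as a structured table of counts, which is the kind of summary a store manager can act on.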

Another example of text mining is in the insurance domain. Auto insurance companies deal with a large number of claims every day for collision damage, fire and theft damage, and accidental damage. A large amount of time and money is spent identifying fraudulent claims. One of the datasets used is historical claims, coverage and settlements. A text mining solution


Recommenders are one aspect of data analytics that most of us have come in contact with, in some form or other, in our day-to-day activities. When you buy items from a grocery store, you often get coupons to use on your next purchase. When you visit a salon, you may be offered a combo deal with a manicure or pedicure along with your haircut rather than just the haircut. This generally happens because the company knows that people who buy one set of products, e.g. eggs and milk, may buy bread and cheese next, and hence the recommendation arrives via coupons. Likewise, newcomers to the salon have in the past generally tried a manicure along with a haircut, so when you go for the first time the company markets both services to you.

FIGURE 1.16 Forecasting

FIGURE 1.17 Recommenders
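The eggs-and-milk example can be sketched as a simple co-occurrence count in base R. The baskets below are invented, and the `recommend` function is a hypothetical helper written for this illustration; real recommenders use richer techniques than counting which items appear together.

```r
# Hypothetical shopping baskets, one character vector per customer visit.
baskets <- list(
  c("eggs", "milk", "bread"),
  c("eggs", "milk", "bread", "cheese"),
  c("eggs", "milk", "cheese"),
  c("eggs", "milk", "bread"),
  c("eggs", "coffee")
)

# Recommend the items that most often appear alongside everything in `bought`.
recommend <- function(baskets, bought, top_n = 2) {
  # Keep only the baskets that contain every item already bought.
  has_all <- vapply(baskets, function(b) all(bought %in% b), logical(1))
  # Collect the other items found in those baskets.
  other_items <- unlist(lapply(baskets[has_all],
                               function(b) setdiff(b, bought)))
  if (length(other_items) == 0) return(character(0))
  # Rank the other items by how often they co-occur.
  counts <- sort(table(other_items), decreasing = TRUE)
  head(names(counts), top_n)
}

suggestions <- recommend(baskets, c("eggs", "milk"))
print(suggestions)
```

With these baskets, customers who bought eggs and milk most often also bought bread, followed by cheese, so those are the items a coupon would promote.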


Snapshot of Machine Learning Techniques

We have just seen a few machine learning techniques above. The graph below displays a snapshot of six techniques and a brief idea of what falls under each of them. We will study machine learning as a whole topic in chapter 8.

FIGURE 1.18 Machine learning reference architecture

REFERENCE ARCHITECTURE

The reference architecture for any of these projects has three layers. The machine learning reference architecture processes raw data to provide a big picture that combines all the major and minor aspects of the data being analyzed. Real-time data analysis is carried out using an optimization process across these layers:

1. The Store Layer captures and stores data; one can store the data in Hadoop, Hive, HANA or any other database. More important than storing is what is done with the data and how signals are extracted from it, and that is where data science comes in.
2. The Sense Layer: in this layer a text-mining model or a scoring model is used to detect a pattern. It mines the collected data for historical trends and patterns that act as reference points; this pattern is then monetized.
3. The Respond Layer compares previous trends with the latest data collected, to predict an outcome and to recommend the next decision to be taken. This is where the analysis of the data is presented in relatable


on keyword usage frequencies in a visualization format, producing some accurate and some unexpected findings. At the foundation of each micro genre is a broader genre, like romance, comedy or drama. From here Netflix has other advanced filters, like location, time period and actors, to get to the precise category. Netflix uses algorithms to cater to its customer base: a human analyzes each film, and it is then put through a Netflix process that brings out certain patterns. For example, micro genres about friendship and marriage were among the most common. He also discovered that “drama” and “classic” were among Netflix’s most widely used adjectives for movies.

HANDS ON SEGMENTATION

You will learn how to download R in chapter 2. Here you can have a quick view of some R commands specific to the clustering technique and its visualization.

1. To segment the dataset we use the “kmeans” command to find clusters. To view the cluster statistics we input the “fit” command.

2. To visualize the data we use the “ggplot” command. On the X axis we have plotted average purchase value and on the Y axis average items purchased, for two cities, Bangalore and Chennai.


3. Another way of visualizing is in 3D format as shown in the figure below using the “scatterplot3d” command.
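The three steps can be tried on simulated data as follows. This is a sketch under several assumptions: the book's dataset is not shown in this preview, so the shopper values are invented; `fit` is assumed to be the name under which the `kmeans` result is stored; and the third column used for the 3D plot is made up purely to have three dimensions. The plotting steps need the `ggplot2` and `scatterplot3d` packages and are skipped if they are not installed.

```r
set.seed(1)
# Simulated shoppers in two cities; all values are invented for illustration.
shoppers <- data.frame(
  city = rep(c("Bangalore", "Chennai"), each = 25),
  avg_purchase_value = c(rnorm(25, 500, 40), rnorm(25, 1500, 40)),
  avg_items_purchased = c(rnorm(25, 5, 1), rnorm(25, 15, 1))
)
# Standardize the columns so both variables carry equal weight.
features <- scale(shoppers[, c("avg_purchase_value", "avg_items_purchased")])

# 1. Segment with kmeans; printing `fit` shows cluster sizes, centres
#    and sums of squares.
fit <- kmeans(features, centers = 2, nstart = 10)
print(fit)
shoppers$cluster <- factor(fit$cluster)

# 2. Scatter plot with ggplot2, if the package is installed.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(
    shoppers,
    ggplot2::aes(x = avg_purchase_value, y = avg_items_purchased,
                 colour = cluster, shape = city)
  ) +
    ggplot2::geom_point()
  print(p)
}

# 3. 3D view with scatterplot3d, if the package is installed.
#    `visits_per_month` is a made-up third variable.
if (requireNamespace("scatterplot3d", quietly = TRUE)) {
  shoppers$visits_per_month <- c(rnorm(25, 2, 0.5), rnorm(25, 6, 0.5))
  scatterplot3d::scatterplot3d(
    shoppers$avg_purchase_value,
    shoppers$avg_items_purchased,
    shoppers$visits_per_month,
    color = as.integer(shoppers$cluster),
    xlab = "Average purchase value",
    ylab = "Average items purchased",
    zlab = "Visits per month"
  )
}
```

Because the two simulated cities are far apart on both variables, the two clusters found by `kmeans` line up with the cities; real data is rarely this clean.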
