1641 Business Intelligence Assignment 2, Cheat Sheet of Artificial Intelligence


Typology: Cheat Sheet

2022/2023

Uploaded on 04/25/2023

jen-nguyen
Higher Nationals in Computing
Unit 14: Business Intelligence
ASSIGNMENT 2
Assessor name: NGUYEN XUAN SAM
Learner’s name: NGUYEN DAN QUE
ID: GCS210143
Class: GCS0905A
Subject code: 1641
Assignment due: 11/03/2023 Assignment submitted: 11/03/2023

ASSIGNMENT 1 FRONT SHEET

Qualification: BTEC Level 5 HND Diploma in Computing

Unit number and title: Unit 14: Business Intelligence

Submission date:

Date Received 1st submission:

Re-submission Date:

Date Received 2nd submission:

Student Name: Nguyen Dan Que

Student ID: GCS

Class: GCS0905A

Assessor name: Nguyen Xuan Sam

Student declaration

I certify that the assignment submission is entirely my own work and I fully understand the

consequences of plagiarism. I understand that making a false declaration is a form of malpractice.

Student’s signature

Grading grid

P1 P2 M1 M2 D1 D

ASSIGNMENT 2 BRIEF

Student Name/ID Number: Nguyen Dan Que / GCS

Unit Number and Title: 14: Business Intelligence

Academic Year: 2018

Unit Tutor:

Assignment Title: Assignment 2: Apply BI tools & techniques and their impact

Issue Date:

Submission Date: 11/03/

IV Name & Date:

Submission Format

Part I: Project submission. This should be a zip / rar folder of your project, including all necessary files to run your project. There should be a link to your Tableau work on Tableau Public cloud.

Part II: The submission is in the form of a group written report. This should be written in a concise, formal business style using single spacing and font size 12. You are required to make use of headings, paragraphs and subsections as appropriate, and all work must be supported with research and referenced using the Harvard referencing system. Please also provide a bibliography using the Harvard referencing system.

Part III: The team needs to present its point of view about how business intelligence tools can contribute to effective decision-making, as well as the legal issues involved in exploiting user data for business intelligence. You may need to research specific examples of organizations that use BI tools to enhance or improve their business and evaluate how they can use BI tools to extend their target audience and become more competitive within the market.

Unit Learning Outcomes

LO3 Demonstrate the use of business intelligence tools and technologies

Assignment Brief

(Continued from previous scenario)

Your next task is to demonstrate to the board of directors how business intelligence can be applied to the company's current business processes. To demonstrate BI, you need to prepare a presentation about BI and related tools & techniques and a demonstration on a real company dataset.

For the presentation, you need to:

  • Explain the general concept of BI
  • Introduce some tools / techniques for BI and their general applications

For the demonstration, you need to:

  • Provide one or more datasets extracted from the company's business processes. Explain the dataset.
  • Show how you pre-process data for later analysis, explaining each step and its purpose
  • Design dashboards to show your analysis of the pre-processed data. Explain clearly the purpose of the dashboards and charts. Suggestions should be made after the analysis

During the demonstration, you need to collect feedback and comments from users to review how well your dashboard design meets user or business requirements and what customization is needed for future use.

The team needs to present its point of view about how business intelligence tools can contribute to effective decision-making, as well as the legal issues involved in exploiting user data for business intelligence. You may need to research specific examples of organizations that use BI tools to enhance or improve their business and evaluate how they can use BI tools to extend their target audience and become more competitive within the market.

In summary, you need to submit a PDF report that includes 4 parts: your presentation, result of demonstration and review of user feedback, point of view on BI contribution and legal issues.

Assignment 2 answers

1. Introduction

1.1 Overview of problems

The economy and people's needs are constantly changing, and buying a house is no exception. House prices are not stable and are affected by many different factors, while the needs of home buyers are difficult to meet. The purpose of this model is therefore to predict house prices from factors such as architecture, location and quality.

Figure 1: Housing price factors

In machine learning, we can predict price trends from available information. This approach is used in many applications around us, such as online shopping and entertainment, and it can be applied to house price prediction in the same way.

1.2 Motivations

With the increasing demand for housing, meeting customers' needs is becoming increasingly difficult. When deciding to buy a house, customers want a convenient location, reasonable spaciousness and many other features, but above all the price must be cheap or at least reasonable. Applying machine learning to house price prediction could ease the pressure on the real estate market, and it would also make it easier for customers to find a suitable home at a reasonable price. In this report, I will predict home prices in King County, United States.

1.3 Objectives

In this report, we focus on the following objectives:

  • How the number of bedrooms affects the price of the house, and how strong that effect is.
  • How the living area affects the price of the house.
  • How the number of bathrooms affects the price of the house, and how strong that effect is.
  • How the grade of the house affects the price of the house.

The dataset contains the following fields:

  • Date: the date the house was sold
  • Price: sale price of the house
  • Bedrooms: number of bedrooms in the house
  • Bathrooms: number of bathrooms in the house
  • Sqft_living: living area of the house, in square feet
  • Sqft_lot: area of the lot, in square feet
  • Floors: number of floors
  • Waterfront: whether the house is next to a body of water
  • View: rating of the view from the house
  • Condition: overall condition of the house
  • Grade: overall grade of the house
  • Sqft_above: area of the house above ground level, in square feet
  • Sqft_basement: area of the basement, in square feet
  • Yr_built: the year the house was built
  • Yr_renovated: the year the house was renovated
  • Zipcode: zip code of the house
  • Lat: latitude coordinate
  • Long: longitude coordinate
  • Sqft_living15: interior living area of the 15 nearest neighbors' houses, in square feet
  • Sqft_lot15: lot area of the 15 nearest neighbors, in square feet
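The fields listed above usually need type conversion and a check for missing values before they can be analysed. The following is only a minimal sketch of such a pre-processing step, not the procedure used in this report. It assumes a CSV export with lowercase column names and a date format such as 20140502T000000 (both are assumptions), and the two sample rows are invented for illustration.

```python
import csv
import io
from datetime import datetime

# Hypothetical sample with a subset of the listed fields; values are invented.
SAMPLE_CSV = """date,price,bedrooms,bathrooms,sqft_living,grade
20140502T000000,350000,3,1.5,1600,7
20150115T000000,,4,2.5,2400,8
"""

def load_rows(text):
    """Parse CSV text, convert each field to a proper type,
    and skip rows whose price is missing."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        if not raw["price"]:
            continue  # drop incomplete records instead of guessing a price
        rows.append({
            # Assumed date format: YYYYMMDD followed by T and HHMMSS
            "date": datetime.strptime(raw["date"], "%Y%m%dT%H%M%S").date(),
            "price": float(raw["price"]),
            "bedrooms": int(raw["bedrooms"]),
            "bathrooms": float(raw["bathrooms"]),
            "sqft_living": int(raw["sqft_living"]),
            "grade": int(raw["grade"]),
        })
    return rows

rows = load_rows(SAMPLE_CSV)
```

Dropping incomplete rows is only one option; filling missing values or flagging them for review may suit other datasets better.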

3. Proposed model

3.1 Correlation

A correlational research design investigates relationships between variables without the researcher controlling or manipulating any of them. (Bhandari P, 2021)

A correlation measures the strength and/or direction of the relationship between two (or more) variables. The direction of a correlation can be either positive or negative. (Bhandari P, 2021)
Table 2: Correlation explanation

Positive correlation: both variables change in the same direction.

Negative correlation: the variables change in opposite directions.

Zero correlation: there is no relationship between the variables.

Figure 2 : correlation coefficient

There are many types of correlation coefficient. The most common one is Pearson's, because it allows strong inferences. (Bhandari P, 2021)

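$$r_{xy} = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$

where:

$r_{xy}$ is the strength of the correlation between variables x and y

n is the sample size

∑ is the sum of what follows

x is every x-variable value

y is every y-variable value

xy is the product of each x-variable score and the corresponding y-variable score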
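As an illustration, Pearson's coefficient can be computed directly from the sums defined above. This is a minimal sketch in plain Python; the function name `pearson_r` and the sample values are assumptions made for the example and are not taken from the King County dataset.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient:
    (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(
        (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
    )
    if denominator == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return numerator / denominator

# Invented values: price rises in a straight line with living area.
sqft = [1000, 1500, 2000, 2500]
price = [200, 300, 400, 500]
r_positive = pearson_r(sqft, price)                  # same direction
r_negative = pearson_r(sqft, [500, 400, 300, 200])   # opposite direction
```

A result close to 1 indicates a strong positive correlation, a result close to -1 a strong negative one, and a result close to 0 little or no linear relationship.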

3.2 Linear regression

Linear regression is one of the most popular modeling techniques because, in addition to

explaining the relationship between variables (like correlation), it also gives an equation that

The coefficient of determination (R²) measures how well a statistical model predicts an

outcome. (Turney S, 2022)

The lowest possible value of R² is 0 and the highest possible value is 1. Put simply, the better

a model is at making predictions, the closer its R² will be to 1. (Turney S, 2022)

You can choose one of two formulas to calculate the coefficient of determination:

Formula 1

$$R^2 = r^2$$

where r is the Pearson correlation coefficient.

Formula 2

$$R^2 = 1 - \frac{RSS}{TSS}$$

where RSS is the sum of squared residuals and TSS is the total sum of squares.
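The second formula (one minus the sum of squared residuals over the total sum of squares) can be sketched in a few lines of Python. The function name `r_squared` and the numbers are illustrative assumptions, not results from this report.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: R^2 = 1 - RSS / TSS."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_obs = sum(observed) / len(observed)
    # RSS: squared differences between observed and predicted values
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    # TSS: squared differences between observed values and their mean
    tss = sum((y - mean_obs) ** 2 for y in observed)
    if tss == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1 - rss / tss

observed = [3.0, 5.0, 7.0, 9.0]
perfect = r_squared(observed, [3.0, 5.0, 7.0, 9.0])    # predictions match exactly
mean_only = r_squared(observed, [6.0, 6.0, 6.0, 6.0])  # always predicts the mean
```

A model that predicts every value exactly has no residual error, while a model that always predicts the mean explains none of the variation.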

Adjusted R-squared

The adjusted coefficient of determination (adjusted R-squared) is a modification of the coefficient of determination that takes the number of variables in a data set into account. It also penalizes you for points that do not correspond to the model. (Vogt, 2005)

$$R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}$$

n is the sample size

k is the number of independent variables

$R^2$ is the coefficient of determination
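The adjustment can be sketched as follows, using the standard form 1 - (1 - R²)(n - 1)/(n - k - 1) with n and k as defined above. The function name and the input values are assumptions chosen for illustration.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - k - 1).

    r2: coefficient of determination
    n:  sample size
    k:  number of independent variables
    """
    if n - k - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Same R^2 and sample size, different numbers of predictors (invented values).
adj_two_vars = adjusted_r_squared(0.5, n=11, k=2)
adj_three_vars = adjusted_r_squared(0.5, n=11, k=3)
```

With R² held fixed, adding a predictor lowers the adjusted value, which is how the measure discourages adding variables that do not improve the fit.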

3.5 Model estimation

To measure model accuracy, we use mean absolute error (MAE), mean squared error (MSE) and root mean squared error (RMSE).

Mean absolute error (MAE)

Absolute error expresses the amount of error in a measurement: it is the difference between the measured value and the "actual" value. The mean absolute error is the average of these absolute errors.

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |x_i - x|$$

n is the number of errors,

Σ is the summation symbol (which means “add them all up”),

$|x_i - x|$ is the absolute error.
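A minimal sketch of the calculation, pairing each measured value with its "actual" value; the function name and the numbers are invented for illustration.

```python
def mean_absolute_error(measured, actual):
    """Average of the absolute differences between measured and actual values."""
    if len(measured) != len(actual) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(m - a) for m, a in zip(measured, actual)) / len(measured)

# Absolute errors are 1, 0, 3 and 0, so the mean is 4 / 4.
mae = mean_absolute_error([10, 12, 9, 15], [11, 12, 12, 15])
```

Because every error counts in proportion to its size, MAE is expressed in the same unit as the measured quantity.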

Mean square error (MSE)

Mean squared error (MSE) measures the amount of error in a statistical model. It is the average squared difference between the observed and predicted values. When a model has no error, the MSE is 0, and its value rises as model error increases. Mean squared deviation (MSD) is another name for the mean squared error. (Frost J)

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

$y_i$ is the i-th observed value.

$\hat{y}_i$ is the corresponding predicted value.

n is the number of observations.
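The calculation can be sketched in Python as below. The second function takes the square root of the MSE, which gives the root mean squared error listed among the accuracy measures for this section. The function names and values are illustrative assumptions.

```python
import math

def mean_squared_error(observed, predicted):
    """Average squared difference between observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(
        (y - y_hat) ** 2 for y, y_hat in zip(observed, predicted)
    ) / len(observed)

def root_mean_squared_error(observed, predicted):
    """Square root of the MSE, in the same unit as the observed values."""
    return math.sqrt(mean_squared_error(observed, predicted))

# Squared errors are 1, 0, 1 and 4, so the mean is 6 / 4.
observed = [3, 5, 7, 9]
predicted = [2, 5, 8, 11]
mse = mean_squared_error(observed, predicted)
rmse = root_mean_squared_error(observed, predicted)
```

Squaring gives large errors more weight than small ones, so a single badly predicted house raises MSE more than it raises MAE.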

Root mean square error (RMSE)

The Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far away data points are from the

4. Simulating scenario and results

4.1 Package installation

I'm using Tableau for my project. I downloaded the program from https://www.tableau.com/.

After downloading and running the program, you'll see the main page as below.

Figure 3: Tableau main page

The taskbar is on the left side of the main page. In the “To a file” section, I chose “More…” and added the dataset.

Figure 4: Tableau work space

After adding the dataset, you'll see the work space, where I can start working on my project.

4.2 Correlation

For the correlation of this dataset, I used the heat map below:

4.3 Scenarios

4.3.1 Dashboard

Figure 6: Dashboard

A dashboard supports data visualization by giving the user an overview of the data. The dashboard above shows the relationship between price and several features: living area in square feet, bathrooms, bedrooms and grade.

4.3.2 Price and number of bedrooms

How does the number of bedrooms affect house prices?

Figure 7: Price versus Bedrooms

The regression line shows a positive linear relationship between the number of bedrooms and house prices: as the number of bedrooms increases, house prices also tend to increase. However, most houses with 3, 4 and 5 bedrooms have approximately the same price, and some bigger houses even have lower prices. Customers often worry that a house with many bedrooms will be expensive, but the model suggests that the number of bedrooms does not affect the price very much. A weak influence does not mean the feature can be ignored, though. I think it still needs to be considered when buying a house.
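To make the idea of a regression line concrete, here is a minimal least-squares sketch in plain Python. It is not the trend line computed by Tableau for Figure 7; the function name `fit_line` and the bedroom and price values are invented for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: number of bedrooms and price in thousands of dollars.
bedrooms = [2, 3, 3, 4, 4, 5]
price_k = [300, 420, 380, 450, 470, 500]
slope, intercept = fit_line(bedrooms, price_k)

# Estimated price (in thousands) for a hypothetical 4-bedroom house.
estimate_4_bedrooms = slope * 4 + intercept
```

A positive slope corresponds to the upward regression line described above; how much of the price variation the line explains is a separate question, answered by measures such as R².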