1641 business intelligence, Cheat Sheet of Artificial Intelligence

Your company is currently working in [Assumed Domain] for 2 years. For a new, young company, the competition in the market is very high. Therefore, the Board of Director has decided to apply Business Intelligence to improve the company business process by making better decisions. The Board of Directors assigns a small group including you in Research & Development Department to study business intelligence to apply for the company in the coming years. You need to research about business processes and decision support processes in the company and identify the types of data (unstructured, semi-structured or structured) generated by these processes with examples. You also need to research about current software used in the business process or decision support process and evaluate these usages (benefits and drawbacks). Next you need to understand the types of support for decision-making at different levels (operational, tactical and strategic) within the company and study which business

Typology: Cheat Sheet

2022/2023

Uploaded on 04/25/2023

jen-nguyen
jen-nguyen 🇻🇳

3 documents

1 / 26

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Higher Nationals in Computing
Unit 14: Business Intelligence
ASSIGNMENT 1
Assessor name: NGUYEN XUAN SAM
Learner’s name: NGUYEN DAN QUE
ID: GCS210143
Class: GCS0905A
Subject code: 1641
Assignment due: 04/03/2023 Assignment submitted: 04/03/2023
pf3
pf4
pf5
pf8
pf9
pfa
pfd
pfe
pff
pf12
pf13
pf14
pf15
pf16
pf17
pf18
pf19
pf1a

Partial preview of the text

Download 1641 business intelligence and more Cheat Sheet Artificial Intelligence in PDF only on Docsity!

Higher Nationals in Computing

Unit 14: Business Intelligence

ASSIGNMENT 1

Assessor name: NGUYEN XUAN SAM

Learner’s name: NGUYEN DAN QUE

ID: GCS

Class: GCS0905A

Subject code: 1641

Assignment due: 04/03/2023 Assignment submitted: 04/03/

ASSIGNMENT 1 FRONT SHEET

Qualification BTEC Level 5 HND Diploma in Computing Unit number and title Unit 14: Business Intelligence Submission date 04/03/2023 Date Received 1st submission Re-submission Date Date Received 2nd submission Student Name Nguyen Dan Que Student ID GCS Class GCS0905A Assessor name Nguyen Xuan Sam Student declaration I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism. I understand that making a false declaration is a form of malpractice.

Student’s signature Grading grid P1 P2 M1 M2 D1 D

ASSIGNMENT 1 BRIEF

Student Name/ID Number Nguyen Dan Que/GCS

Unit Number and Title 14: Business Intelligence Academic Year 2019 - 2020 Unit Tutor Assignment Title Number & Assignment 1: Discover business process and BI technologies

Issue Date Submission Date IV Name & Date

Submission Format The submission is in the form of a Microsoft® PowerPoint® style presentation to be presented to your colleagues. The presentation can include links to performance data with additional speaker notes and a bibliography using the Harvard referencing system. T copy. You are required to make effective use of headings, bullet points and subsections,he presentation slides for the findings should be submitted with speaker notes as one as appropriate. Your research should be referenced using the Harvard referencing The recommended word limit is 500 words, including speaker notes, although you will not system. be penalised for exceeding the total word limit.

Unit Learning Outcomes LO1 making. Discuss business processes and the mechanisms used to support business decision- LO2 Compare the tools and technologies associated with business intelligence functionality Assignment Brief Your company is currently working in [Assumed Domain] for 2 years. For a new, young company, the competition in the market is ver decided to apply Business Intelligence to improve the company business process byy high. Therefore, the Board of Director has making better decisions. The Board of Directors assigns Department to study business intelligence to apply for the company in the coming year a small group including you in Research & Developments. You need to research about business processes and decision support processes in the company and identify the types of data generated by these processes with examples (unstructured, semi. You also need to research about current-structured or structured) software used in the business process or decision support process and evaluate these usages (benefits and drawbacks) Next you need to understand the types of support for decision. -making at different levels (operational, tactical and strategic) within the company and study which business

intelligence features can help on that types of support. Study the information systems or technologies (of BI) can be used in this case, compare and co which should be used. ntrast them to conclude Your group needs to present the research results to the board in a presentation of 30 minutes.

Learning Outcomes and Assessment Criteria Pass Merit Distinction

LO1 Discuss business processes and the mechanisms used to support business decision-making

D1 Evaluate the benefits and drawbacks of using application software as a mechanism for business processing.

P1 Examine, using examples, the terms ‘Business Process’ and ‘Supporting Processes’.

M1 Differentiate between unstructured and semi-structured data within an organisation.

LO2 intelligence functionality Compare the tools and technologies associated with business^ D2 contrast a range of^ Compare and information systems and technologies that can be used to support organisations at operational, tactical and strategic levels.

P2 support available for business Compare the types of decision levels within an organisation.-making at varying

M2 the key features of business Justify, with specific examples, intelligence functionality.

Assignment 1 answers

1. Introduction

1. 1 Overview of problems Nowadays, the progress of the economy and human needs are constantly changing, buying a house is no exception. However, the prices of houses are not stable and are always affected by many different factors. On the other hand, the needs of home buyers are difficult to meet. Therefore, the purpose of the model to predict the house price is based on many factors such as architecture, location, quality, ...

Figure 1: Housing price factors In machine learning, we can predict the trend of price based on available information, this is applicable in most of the applications around us such as online shopping, entertainment, ... Similarly, this is also applied to price prediction.

1. 2 Motivations With the increasing demand for housing, meeting the needs of customers is also becoming increasingly difficult. When deciding to buy a house, customers always want to find a house with a convenient location, reasonable spaciousness, and many other requirements, but especially, the price of the house must be cheap or at least reasonable. Applying machine learning to house price prediction will significantly reduce the pressure on real estate. At the

same time, customers will also find it easier to find a suitable home for them at a reasonable price. In this report, I will predict home prices in King County, United State.

1.3 Objectives In this report, there will be some objectives that we’re focusing on:

  • How the number of bedrooms affect the price of the house. Will it have much impact on it?
  • How areas of living house affect the price of the house.
  • How the number of bathrooms affect the price of the house. Will it have much impact on it?
  • How the grade of the house affect the price of the house.

Table 1 : First 5 rows of data from dataset

2.2.3 Dataset description As mentioned above, this dataset contains 21 features:

  • Id: unique identifier of a house
  • Date: the date someone bought the house
  • Price: price of the house
  • Bedrooms: number of bedrooms in the house
  • Bathrooms: number of bathrooms in the house
  • Sqft_living: the width of the house in feet
  • Sqft_lot: the width of the lot in feet
  • Floors: number of floors
  • Waterfront: house that is next to water area
  • View: View of the house, usually housed with nice view
  • Condition: overall condition of the house
  • Grade: overall grade of the house
  • Sqft_above: area that is above the ground of the house
  • Sqft_basement: living square of the basement
  • Yr_built: the year the house was finished building
  • Yr_renovated: year that the house renovated
  • Zipcode: zipcode of the house
  • Lat: latitude coordinate
  • Long: longitude coordinate
  • Sqft_living15: The area of the interior where the 15 closest neighbors’ living spaces are located
  • Sqft_lot15: the area of the 15 closest neighbors’ nearest land lots

3. Proposed model

3.1 Correlation A correlational research strategy looks into correlations between variables without allowing the researcher to control or manipulate any of them. (Bhandari P, 2021) A correlation is a measurement of the intensity and/or direction of a link between two (or more) variables. A correlation’s direction might be either positive or negative. (Bhandari P,

Table 2: Correlation explanation Positive correlation Both variables change in the same way. Negative correlation Variables shift in opposing directions. Zero correlation There is no connection between the variables.

Figure 2 : correlation coefficient There are also many types of correlation coefficients, the most common one is Pearson’s due to its strong inferences. (Bhandari P, 2021)

𝑟 = (^) √[𝑛Σ𝑥 2 𝑛 (^) −Σ 𝑥𝑦(Σ𝑥^ −) 2 (]Σ(𝑥𝑛)Σ(𝑦Σ 2 𝑦 (^) −) (Σ𝑦) (^2) ) (^) (1)

Rxy is strength of the correlation between variables x and y n is sample size ∑ is sum of what follows

Β 1 X 1 is the regression coefficient (Β 1 ) of the first independent variable (X 1 ) (a.k.a. the effect that increasing the value of the independent variable has on the predicted y value) … is do the same for however many independent variables you are testing Βn Xn is the regression coefficient of the last independent variable e is model error (a.k.a. how much variation there is in our estimate of ) (Bevans R, 2020)

3.4 R-squared and adjusted R-squared R-squared The coefficient of determination (R²) measures how well a statistical model predicts an outcome. (Turney S, 2022) The lowest possible value of R² is 0 and the highest possible value is 1. Put simply, the better a model is at making predictions, the closer its R² will be to 1. (Turney S, 2022) You can choose one out of two formulas to calculate the coefficient of determination: Formula 1 𝑅^2 = (𝑟)^2 ( 4 )

r is Pearson correlation coefficient Formula 2

𝑅^2 = 1 − 𝑅𝑆𝑆 𝑇𝑆𝑆^ (^5 )

RSS is sum of squared residuals TSS is total sum of squares Adjusted R-squared The Adjusted Coefficient of Determination (Adjusted R-squared) is a Coefficient of Determination modification that takes the number of variables in a data set into account. It also penalizes you for points that do not correspond to the model. (Vogt, 2005)

𝑅 − 𝑠𝑞(𝑎𝑑𝑗) = 𝑅𝐴^2 = 1 − ( 1 − 𝑅^2 )( 𝑛 −𝑛^ −𝑘 −^1 1 ) (^6 )

n is sample size k is number of independent variables R^2 is coefficient of determination

3.5 Model estimation To measure model accuracy, we use Mean absolute (MAE), Mean square error (MSE) and Root mean square error (RMSE). Mean absolute error (MAE) The degree of inaccuracy in your measurements is expressed as absolute error. It represents the discrepancy between the measured and "actual" values.

𝑀𝐴𝐸 = 1 𝑛 Σ𝑖^ 𝑛=^1 |𝑥𝑖−𝑥| (^7 )

n is the number of errors, Σ is summation symbol (which means “add them all up”), |xi – x| is the absolute errors. Mean square error (MSE) The degree of inaccuracy in statistical models is measured by mean squared error (MSE). The average squared difference between the observed and expected values is calculated. When there is no error in a model, the MSE is 0. When model inaccuracy rises, so does its value. The mean squared deviation is another name for the mean squared error (MSD). (Frost J)

𝑀𝑆𝐸 = Σ(𝑦𝑖^ 𝑛−^ ŷ i)^2 (^8 )

4. Simulating scenario and results

4.1 Package installation Firstly, I downloaded Anaconda navigator on https://www.anaconda.com. After downloaded and run the program, you’ll see the main page as figure 3 below.

Figure 3: Anaconda navigator For this work, I’m using Jupyter lab on Anaconda. JupyterLab is the latest web-based interactive development environment for notebooks, code, and data. First, the two libraries needed for data exploration and analysis are Pandas and Numpy. The matplotlib and seaborn libraries are used in visualization. The statsmodels libraries is used for modeling. Seaborn is used for creating heatmap.

Figure 4: Added libraries

4.2 Correlation For the correlation of this dataset, I used the heat map below:

Figure 5: Heatmap I collected some high correlation pairs as well as some features I think are important when looking for a house:

  • Price and sqft_living
  • Price and bathrooms
  • Price and grade
  • Price and bedrooms

Figure 7 : OLS regression result of bedrooms Looking at the result above, the R-squared = 0.095 means the number of bedrooms affect only 9.5% of the house price. The figure below show the relationship between number of bedrooms and the house prices.

Figure 8 : visualized model of bedrooms

The regression line shows that the relationship between number of bedrooms and the house prices is linear positive. As the number of bedrooms increases, house prices will also increase. However, the price of most houses with 5 rooms is not too different from that of houses with 4 rooms. On the other hand, most houses with 3, 4, and 5 bedrooms have approximately the same price. There are even bigger houses with lower prices. When buying a house, customers are often afraid of the problem that if the house has many bedrooms, the price will often be expensive. But based on the model, it can be seen that the number of rooms does not affect the price of the house too much. Not much influence does not mean that it is not an issue that needs attention. I think this feature still need to be focused on when buying a house. The formula used for this model: 𝑦ℎ𝑎𝑡 = 1. 217 𝑒 + 05 ∗ 𝑥 − 1. 298 𝑒 + 05

4.3.2 Prices and square feet living

Figure 9 : OLS regression result for sqft_living