Higher Nationals in Computing
Unit 13: Computing Research Project
House Price Prediction in King County
Assessor name: NGUYEN XUAN SAM
Learner’s name: NGUYEN TIN HOC
ID: GCS
Class: GCS1004B
Subject code: 1639
Assignment due: December 12th
Assignment submitted: December 12th
ASSIGNMENT 1 FRONT SHEET
Qualification: BTEC Level 5 HND Diploma in Computing
Unit number and title: Unit 13: Computing Research Project
Submission date: 12/12/2023
Date Received 1st submission:
Re-submission Date:
Date Received 2nd submission:
Student Name: Nguyen Tin Hoc
Student ID: GCS
Class: GCS210848
Assessor name: Nguyen Xuan Sam
Student declaration: I certify that the assignment submission is entirely my own work and I fully understand the consequences of plagiarism. I understand that making a false declaration is a form of malpractice.
Student’s signature: Hoc
Grading grid: P1 P2 P3 P4 P5 M1 M2 M3 D1 D
Contents
- I. Introduction
- 1.1 Real estate business process
- 1.2 Motivation
- 1.3 Objective
- II. Related works
- 2.1 Dataset
- 2.2 Basic linear regression
- 2.2.1 Download Anaconda and launch JupyterLab
- 3. Result
- 3.1 Library
- 3.2 Price affected by Age of the house
- 3.3 Square footage Analysis
- 3.4 Check if price and sqft_living and age_built follow the normal distribution
- 3.5 Price vs. Grade
- Figure 1 Real estate business process
- Figure 2 Data info
- Figure 3 Data style
- Figure 4 Heatmap
- Figure 5 data.describe
- Figure 6 data.describe
- Figure 7 Download Anaconda
- Figure 8 Anaconda Home
- Figure 9 JupyterLab
- Figure 10 Library
- Figure 11: Price vs. Age of the house
- Figure 12: Python code
- Figure 13: Price vs Age of the house
- Figure 14: Python code
- Figure 15: sqft_living vs. sqft_living15 and sqft_living vs. sqft_above
- Figure 16: Python code
- Figure 17: sqft_square vs normal distribution and price vs normal distribution
- Figure 18: age_built vs normal distribution
- Figure 19: Price vs Grade
- Figure 20: Python code
- Figure 21: Python code
- Figure 22: Price vs Grade
Size and Layout
the real estate market. This includes statistics, customer feedback, competitor analysis, and market demand. Businesses and investors use this information to make informed decisions and develop effective strategies for their upcoming projects. Conducting thorough market research is crucial for the success of any project, serving as a decisive factor in its outcome.
Figure 1 Real estate business process
Vietnam’s residential property prices continue to rise strongly, as demand and supply plummet. In Hanoi, the country’s capital city, the average price of apartments rose by 16.08% y-o-y (12.32% inflation-adjusted) to US$2,007 per square meter (SQM) in Q1 2023, based on figures released by JLL Vietnam. On a quarterly basis, prices increased 2.35% during the latest quarter (1.59% inflation-adjusted). Likewise in Ho Chi Minh City (HCMC), apartment prices rose by 10.32% in Q1 2023 from a year earlier (6.74% inflation-adjusted), to an average of US$3,229 per SQM. Quarter-on-quarter, apartment prices in the city increased 2.7% in Q1 (1.94% inflation-adjusted).
1.2 Motivation
The foundations of high levels of transparency in the real estate sector include strictly enforced laws and regulations, high-quality, easily accessible market information and performance benchmarks, clear and fair practices, and high professional standards. To fulfill this role and operate efficiently, the real estate sector needs to be highly transparent. These foundations enable governments to operate efficiently, bringing long-term benefits to local communities and the environment, while helping businesses and investors to make decisions with confidence (Jeremy, 2018).
When people decide to purchase a home, they search for one that fits all of their specifications and is affordable. With the aid of machine learning, we can estimate home prices and judge whether a particular home is better suited for purchase or for resale at a higher price. In this report, we predict housing prices for King County, Washington (WA). Predictive algorithms for home prices in regions such as King County are complicated and difficult to use. Real estate sales prices in King County may be affected by a number of independent factors, and some characteristics, such as size, location, and housing area, can influence the price significantly.
1.3 Objective
In this report, I will concentrate on current data analysis with the following objectives:
- Price affected by Age of the house
- Square footage Analysis
- Check if price and sqft_living and age_built follow the normal distribution
- Price vs. Grade
II. Related works
2.1 Dataset
I obtained the data from Kaggle (Lemsalu, 2017). The data set includes King County, Washington, home values from May 2014 to May 2015. There are 21 columns and more than 21,000 entries in the raw dataset. The price column is the dependent variable, and all other columns, aside from id and date, are independent features. I chose this topic because predicting house prices is a common and pertinent subject for business analysts, and it offers a practical scenario for analysis. The dataset is rich and diverse, featuring a wide array of independent variables such as bedrooms, bathrooms, living area, lot size, and various other features. These variables encompass different types, including categorical, numerical, and time-based data. The central focus of this analysis is the dependent variable: the sale price of houses within King County, USA, during the period from May 2014 to May 2015. It is important to emphasize that this dataset specifically covers house sale prices in King County, including Seattle, which gives a focused and localized perspective on the real estate market within that area.
- yr_renovated: The year of the house’s last renovation ('0' if never renovated)
- zipcode: 5-digit zip code
- Lat: Latitude coordinate
- Long: Longitude coordinate
- sqft_living15: Living room area in 2015, implying some renovations might have occurred.
- sqft_lot15: Lot size area in 2015, suggesting potential renovations.
Checking the datatype and number of nulls and rows
Figure 2 Data info
The dataset consists of more than 21 thousand entries detailing house prices, where all columns, except for the date, contain numerical information. There are no missing or empty values, which indicates that the dataset is complete. Some features, such as "grade", represent specific categories rather than continuous numbers. These categorical features are already stored as numbers and do not require additional adjustments. The data types are mainly integers and decimals, which makes the data simple to read and helps prevent confusion during analysis.
Style:
- Data types: five features are floats, fifteen are integers, and one is an object.
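The datatype and null checks described above can be sketched roughly as follows. This is a minimal illustration on a made-up three-row table, not the Kaggle file itself; the frame name `sample`, the column subset, and all values are assumptions chosen only to show the pandas calls.

```python
import pandas as pd

# Hypothetical sample with invented values; the real data would be read
# from the downloaded CSV with pd.read_csv(...).
sample = pd.DataFrame(
    {
        "date": ["2014-10-13", "2014-12-09", "2015-02-25"],
        "price": [300000.0, 450000.0, 250000.0],
        "bedrooms": [3, 4, 2],
        "sqft_living": [1200, 2500, 800],
        "grade": [7, 8, 6],
    }
)

# Column names, non-null counts and dtypes in one printed summary
sample.info()

# Number of missing values per column
null_counts = sample.isnull().sum()

# How many columns there are of each data type
dtype_counts = sample.dtypes.astype(str).value_counts()

print(null_counts)
print(dtype_counts)
```

On the full dataset the same three calls would give the row count, confirm that no column has nulls, and tally the float, integer, and object columns.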
Figure 3 Data style
As shown in Figure 2, the data contain several data types, mainly int and float. This helps avoid confusion when reading the data and makes it easier to understand.
Heatmap
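A heatmap of this kind is usually drawn from a correlation matrix. The sketch below only computes such a matrix with pandas on a small invented table; the frame name `toy` and its values are assumptions, and it is not the code behind the report's figure.

```python
import pandas as pd

# Invented five-row table using column names from the report
toy = pd.DataFrame(
    {
        "price": [250000, 300000, 450000, 520000, 700000],
        "sqft_living": [900, 1200, 1800, 2100, 2900],
        "grade": [6, 7, 7, 8, 9],
        "age_built": [60, 45, 30, 12, 5],
    }
)

# Pairwise Pearson correlations between all numeric columns
corr = toy.corr()
print(corr.round(2))
```

The resulting `corr` frame could then be passed to `seaborn.heatmap(corr, annot=True)` to colour each cell by its correlation value.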
Figure 5 data.describe
Figure 6 data.describe
2.2 Basic linear regression
Before turning to the analysis, this section briefly introduces Anaconda and its installation. Anaconda is a distribution of the Python and R programming languages tailored for scientific computing. Its primary purpose is to simplify the management and deployment of packages. This distribution includes data science packages and is compatible with Windows, Linux, and macOS. Anaconda Distribution, also known as Anaconda Individual Edition, is a free product offered by Anaconda, Inc. However, other products from the company, such as Anaconda Team Edition and Anaconda Enterprise Edition, are not free of charge.
2.2.1 Download Anaconda and launch JupyterLab
Figure 7 Download Anaconda
The first step in using Python for data analysis with Jupyter is to install Anaconda. To install it, follow these steps:
- Visit the Anaconda website.
- Click on the Download button and choose the suitable operating system for your device.
- Anaconda's default installation includes Python, and the bundled version is widely used. Select the installer that matches your computer's system type and click Install.
- Once the installation is finished, run the software and follow the system-specific instructions.
3. Result
3.1 Library
Figure 10 Library
- NumPy: NumPy, a prominent open-source Python library, provides essential mathematical tools and handles large matrices and multidimensional data efficiently. It serves as a versatile tool for random number generation, managing complex data structures, and performing advanced operations in linear algebra.
- pandas: Pandas is a Python library for data analysis and data manipulation. It offers flexible and convenient data structures such as DataFrames and Series, making it easier to work with tabular data. Pandas is used for reading, processing, and analyzing data from various sources such as CSV files, Excel sheets, and SQL databases. It simplifies tasks related to data cleaning, transformation, and aggregation.
- Matplotlib: Matplotlib, an open-source toolkit, is tailored for visualizing and exploring numerical data. It enables the creation of various graphics, including graphs, scatterplots, and histograms, aiding in data visualization and analysis.
- statsmodels.api: Statsmodels is a Python package dedicated to statistical modeling and research. It provides researchers and analysts with tools for activities such as hypothesis testing and logistic regression, facilitating in-depth statistical analyses.
- seaborn: Seaborn is a data visualization library built on top of Matplotlib. It provides a high-level interface for creating complex and visually appealing statistical plots, including distribution plots, violin plots, and box plots. It is commonly used in exploratory data analysis and data presentation tasks.
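As a quick sanity check after installing Anaconda, the libraries listed above can be tested for availability before they are imported. The sketch below uses only the standard library; the dictionary name `LIBRARIES` and the helper `is_installed` are hypothetical, and the aliases shown are the customary community ones rather than something taken from the report's own code.

```python
import importlib.util

# Module name -> conventional alias used in most tutorials
LIBRARIES = {
    "numpy": "np",
    "pandas": "pd",
    "matplotlib.pyplot": "plt",
    "statsmodels.api": "sm",
    "seaborn": "sns",
}


def is_installed(module_name):
    """Return True if the top-level package can be found on this system."""
    top_level = module_name.split(".")[0]
    return importlib.util.find_spec(top_level) is not None


status = {name: is_installed(name) for name in LIBRARIES}

for name, alias in LIBRARIES.items():
    state = "ok" if status[name] else "missing"
    print(f"import {name} as {alias}  ->  {state}")
```

Any library reported as missing would need to be installed, for example through Anaconda's package manager, before the notebook cells that use it can run.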
Scenario 1. Price affected by Age of the house
Figure 14 Python code
Figure 11: Price vs. Age of the house
Figure 12: Python code
Figure 15: sqft_living vs. sqft_living15 and sqft_living vs. sqft_above
The relationship between the living area of a house (sqft_living) and the neighborhood standards
(sqft_living15) shows a strong positive correlation, as depicted by a linear trend in the plot. This
suggests that houses in the area are similar in size, hinting at a shared community. Similarly,
comparing the living area (sqft_living) with the above-ground area (sqft_above) reveals a
significant positive correlation. Houses without basements have a living area equal to their
above-ground space, indicating a consistent pattern.
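The correlations discussed above can be checked numerically as well as visually. The following is a minimal sketch on synthetic data, not the Kaggle records: it assumes a `sqft_basement` column and the relation sqft_living = sqft_above + sqft_basement, and it fabricates `sqft_living15` as the living area plus random noise purely for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 500

# Synthetic above-ground areas; about half the houses get no basement
sqft_above = rng.integers(600, 3500, size=n)
sqft_basement = np.where(
    rng.random(n) < 0.5, 0, rng.integers(100, 1200, size=n)
)

houses = pd.DataFrame(
    {
        "sqft_above": sqft_above,
        "sqft_basement": sqft_basement,
        "sqft_living": sqft_above + sqft_basement,  # assumed relation
    }
)
# Assumed stand-in for the neighbourhood figure: living area plus noise
houses["sqft_living15"] = houses["sqft_living"] + rng.normal(0, 300, size=n)

# Pearson correlation between pairs of columns
corr_living15 = houses["sqft_living"].corr(houses["sqft_living15"])
corr_above = houses["sqft_living"].corr(houses["sqft_above"])

# Houses without a basement: living area equals above-ground area
no_basement = houses[houses["sqft_basement"] == 0]
same_area = (no_basement["sqft_living"] == no_basement["sqft_above"]).all()

print(f"sqft_living vs sqft_living15: {corr_living15:.2f}")
print(f"sqft_living vs sqft_above:    {corr_above:.2f}")
print(f"no-basement houses have equal areas: {same_area}")
```

On the real dataset the same `Series.corr` calls would quantify how strong each linear trend in Figure 15 is.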
Scenario 3. Check if price and sqft_living and age_built follow the normal distribution
Figure 16: Python code
Figure 17: sqft_square vs normal distribution and price vs normal distribution
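The report checks normality by plotting each variable against a normal curve. As a complementary numeric check, and not the method used for the figures, skewness and excess kurtosis can be compared with the value 0 expected for a normal distribution. The sketch below uses synthetic log-normal "prices" as an assumption; the helper `shape_summary` is hypothetical.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)

# Synthetic right-skewed prices (log-normal), not the Kaggle data
price = pd.Series(rng.lognormal(mean=13.0, sigma=0.5, size=5000), name="price")
log_price = np.log(price)


def shape_summary(series):
    """Skewness and excess kurtosis; both are near 0 for normal data."""
    return {"skew": series.skew(), "excess_kurtosis": series.kurt()}


raw = shape_summary(price)
logged = shape_summary(log_price)

print("raw price:", raw)
print("log price:", logged)
```

A clearly positive skew for the raw values and values near zero after the log transform would suggest that the variable is right-skewed rather than normal, which is the kind of pattern a histogram overlay also reveals.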