

Data analytics 2026 exam practice questions
Typology: Assignments


A retail company collects daily sales data from its stores. Recently, management noticed that profits are decreasing even though sales volume appears to be increasing. As a Data Analyst, you are asked to investigate the issue using the company’s dataset, which contains:
House ID  Size (sqft)  Bedrooms  Age (years)  Location  Price ($)
2         1500         2         5            B         350,
3         2500         4         20           C         600,
4         NaN          3         15           B         400,
5         1800         NaN       8            A         450,

Assignment Tasks (provide statistical representations)
1. Show the calculations for replacing missing values using mean and median imputation. Compare the results: which method is better for this dataset, and why?
2. Standardize the numerical features (Size, Bedrooms, Age) using Z-score normalization.
3. Detect potential outliers in the Price column using Z-scores.
4. Convert the Location column into a one-hot encoding.
5. Standardize the Size, Bedrooms, and Age columns using Z-score normalization. Show all calculations.
6. Scale the Size column using min-max normalization. Show the formula and the results, and compare Z-score scaling with min-max scaling. In which situations is each method the better choice?
7. Create a new feature called Price per Sqft, defined as Price per Sqft = Price / Size. Show the calculated values for all houses.
8. Suppose some houses have very high Age values compared to others. Suggest a transformation to reduce skewness in the Age column (e.g., log transformation). Show an example calculation.
9. Why is data preprocessing important before training a machine learning model? Give at least 2 reasons.
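
The preprocessing steps named in the tasks above can be sketched in plain Python. This is a minimal illustration, not a model answer: the helper names (`impute`, `z_scores`, `min_max`, `one_hot`, `z_outliers`, `price_per_sqft`) are hypothetical, the Z-score uses the population standard deviation (an assumption, since the assignment does not say which to use), and the data covers only the four rows visible in the table. The Price values appear truncated in the source, so no Price figures are computed here.

```python
import math
from statistics import mean, median, pstdev

# Values visible in the table; None stands in for NaN.
# Any rows not shown in the source are not represented here.
size = [1500, 2500, None, 1800]
bedrooms = [2, 4, 3, None]
age = [5, 20, 15, 8]
location = ["B", "C", "B", "A"]


def impute(values, strategy=mean):
    """Replace None with the mean (or median) of the observed values."""
    observed = [v for v in values if v is not None]
    fill = strategy(observed)
    return [fill if v is None else v for v in values]


def z_scores(values):
    """Z-score: (x - mean) / population standard deviation (assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def min_max(values):
    """Min-max scaling: (x - min) / (max - min), giving values in [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """One 0/1 indicator per distinct label, with labels in sorted order."""
    categories = sorted(set(labels))
    return [{f"Location_{c}": int(label == c) for c in categories}
            for label in labels]


def z_outliers(values, threshold=3.0):
    """Indices whose |z| exceeds the threshold (3 is a common convention)."""
    return [i for i, z in enumerate(z_scores(values)) if abs(z) > threshold]


def price_per_sqft(price, size_sqft):
    """Price per Sqft = Price / Size, for one house."""
    return price / size_sqft


size_mean = impute(size, mean)        # fills with (1500 + 2500 + 1800) / 3
size_median = impute(size, median)    # fills with the middle value, 1800
bedrooms_median = impute(bedrooms, median)
age_z = z_scores(age)
age_scaled = min_max(age)
age_log = [math.log1p(a) for a in age]  # log(1 + x), defined even at x = 0
location_encoded = one_hot(location)
```

On these four rows, mean imputation fills the missing Size with about 1933.3 while median imputation fills it with 1800; the gap comes from the 2500 sqft house pulling the mean upward. Note also that with the population standard deviation, |z| can never exceed sqrt(n - 1), so a threshold of 3 cannot flag anything in a table this small, and `threshold` is left as a parameter for that reason.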