







Question 11. Using the crime data set uscrime.txt from Questions 8.2, 9.1, and 10.1, build a regression model using:
1. Stepwise regression
2. Lasso
3. Elastic net
The stepwise regression summary shows that the model was reduced from 15 predictors to 8, with an adjusted R² of 0.7444. A few of the retained features have marginally high p-values and could also be removed, depending on how model simplicity is weighed against performance.

Stepwise Regression Model:
Crime = -6426.1 + 93.32 M + 180.12 Ed + 102.65 Po1 + 22.34 M.F - 6086.63 U1 + 187.35 U2 + 61.33 Ineq - 3796.03 Prob

Lasso and Elastic Net
For elastic net, alpha = 0.8 was selected because it gave the highest R² value.

                     Ridge   Elastic Net   Lasso
Alpha                0       0.8           1
Lambda from model
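Adjusted R² penalizes R² for the number of predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1). The minimal sketch below converts between the two measures and recovers the ordinary R² implied by the reported 0.7444. It assumes n = 47 observations (the number of states in uscrime) and p = 8 predictors retained by the stepwise model. The function names are illustrative.

```python
def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n observations and p predictors (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def r2_from_adjusted(adj_r2, n, p):
    """Invert adjusted_r2 to recover the ordinary R^2."""
    return 1 - (1 - adj_r2) * (n - p - 1) / (n - 1)


# Assumes n = 47 states in uscrime and p = 8 predictors kept by stepwise selection.
r2 = r2_from_adjusted(0.7444, n=47, p=8)
print(round(r2, 4))                          # prints 0.7889
print(round(adjusted_r2(r2, n=47, p=8), 4))  # prints 0.7444
```

Because 0.7444 is itself rounded, the recovered R² of about 0.789 is approximate. The gap between the two values is the price of fitting 8 predictors on only 47 observations.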
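The fitted stepwise equation can be written directly as a prediction function. This is a sketch under two assumptions: predictors are supplied on the same scale the model was fitted on, and they are keyed by their uscrime column names (Po1 is the 1960 police expenditure column). The coefficients are the two-decimal values reported, so predictions are approximate. `predict_crime` is a hypothetical helper.

```python
# Coefficients of the stepwise model, keyed by uscrime column name.
# Rounded to two decimals as reported, so predictions are approximate.
INTERCEPT = -6426.1
COEFS = {
    "M": 93.32, "Ed": 180.12, "Po1": 102.65, "M.F": 22.34,
    "U1": -6086.63, "U2": 187.35, "Ineq": 61.33, "Prob": -3796.03,
}


def predict_crime(row):
    """Predicted Crime for one observation given as {column name: value}.

    Columns not in the model (e.g. So, Po2) are ignored; a missing model
    column raises KeyError instead of being silently treated as zero.
    """
    missing = [name for name in COEFS if name not in row]
    if missing:
        raise KeyError(f"missing predictors: {missing}")
    return INTERCEPT + sum(coef * row[name] for name, coef in COEFS.items())
```

Raising on missing columns is a deliberate choice. A dropped predictor would otherwise shift the prediction by its full coefficient without any warning.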
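Ridge, lasso, and elastic net differ only in the mixing parameter alpha of the penalty. The minimal pure-Python sketch below illustrates the technique; it is not glmnet's implementation. It assumes the glmnet-style objective (1/2n)·||y - b0 - Zb||² + λ·[(1 - alpha)/2·||b||² + alpha·||b||₁] on standardized predictors Z, and fits it by cyclic coordinate descent with soft-thresholding. Standardizing first matters because the penalty treats every coefficient equally, so predictors measured in large units would otherwise be penalized differently from those in small units. All names are hypothetical.

```python
def standardize(X):
    """Center each column and scale it to unit variance (dividing by n).

    Assumes no column is constant (its standard deviation would be zero).
    """
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    sds = [(sum((row[j] - means[j]) ** 2 for row in X) / n) ** 0.5 for j in range(p)]
    Z = [[(row[j] - means[j]) / sds[j] for j in range(p)] for row in X]
    return Z, means, sds


def soft_threshold(z, gamma):
    """S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def elastic_net(X, y, lam, alpha, max_iter=1000, tol=1e-10):
    """Coordinate descent for
        (1/2n)||y - b0 - Z b||^2 + lam * ((1 - alpha)/2 * ||b||^2 + alpha * ||b||_1)
    on standardized predictors Z. alpha = 0 is ridge, alpha = 1 is lasso.
    Returns (b0, b) with b on the standardized scale.
    """
    Z, _, _ = standardize(X)
    n, p = len(Z), len(Z[0])
    b0 = sum(y) / n                  # Z is centered, so the intercept is mean(y)
    r = [yi - b0 for yi in y]        # current residuals (all b_j start at 0)
    b = [0.0] * p
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(p):
            # Inner product of column j with the partial residual that excludes
            # feature j; since mean(Z_j^2) = 1 this equals mean(Z_j * r) + b_j.
            zj = sum(Z[i][j] * r[i] for i in range(n)) / n + b[j]
            new_bj = soft_threshold(zj, lam * alpha) / (1 + lam * (1 - alpha))
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Z[i][j] * delta  # keep residuals in sync with b
                b[j] = new_bj
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return b0, b
```

Calling `elastic_net(X, y, lam, alpha=0.8)` corresponds to the elastic-net setting above; `alpha=0` gives ridge and `alpha=1` gives lasso. In practice λ would typically be chosen by cross-validation. To express a coefficient in original units, divide it by its column's standard deviation.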
To determine the value of 10 different yes/no features to the market value of a house (large yard, solar roof, etc.), a real estate agent plans to survey 50 potential buyers, showing a fictitious house with different combinations of features. To reduce the survey size, the agent wants to show just 16 fictitious houses. Use R’s FrF2 function (in the FrF2 package) to find a fractional factorial design for this experiment: what set of features should each of the 16 fictitious houses have? Note: the output of FrF2 is “1” (include) or “-1” (don’t include) for each feature.

Solution:
Methodology: The call FrF2(nruns = 16, nfactors = 10, factor.names = features) generates the fractional factorial design. Here, nruns = 16 specifies that 16 experimental combinations (fictitious houses) are generated, and nfactors = 10 specifies that 10 binary (yes/no) features are considered. The features are labeled F1 through F10 using features <- c("F1", "F2", ..., "F10"). The set.seed(1) command makes the results reproducible, since FrF2 randomizes the run order by default. Finally, a summary column is added to the output listing which features are included (i.e., coded as 1) in each combination.

Result and analysis: A full factorial design with 10 binary (yes/no) features at 2 levels each would require 2¹⁰ = 1024 combinations, one for every possible combination of the 10 features. Surveying that many combinations would be impractical and costly. Using the FrF2 function reduces the number of combinations to just 16 (a 1/64 fraction, 2¹⁰⁻⁶), while the main effects of all 10 features can still be estimated. The trade-off is that a 16-run design for 10 factors is at best resolution III, so main effects are aliased with two-factor interactions and the estimates assume those interactions are negligible. The features included in each combination are shown above (column: Included_Features).

Question 13. For each of the following distributions, give an example of data that you would expect to follow this distribution (besides the examples already discussed in class).
a. Binomial
b. Geometric
c. Poisson
d. Exponential
e. Weibull

Solution:

Binomial
Definition: The binomial distribution models the number of successes (or failures) in a fixed number of independent trials, each with the same probability of success.
Example: A quality engineer inspects 100 randomly selected wafers each day and counts how many are defective or fall outside control limits.

Geometric
Definition: The geometric distribution models the number of independent trials, each with the same probability of failure, needed to observe the first failure.
Example: After monthly maintenance, a particle monitoring system records the number of tests until the first contamination event occurs.

Poisson
Definition: The Poisson distribution models the number of events in a fixed time interval when events occur independently at a constant average rate (typically rare events).
Example: A semiconductor plant records the number of power outages or trips that occur each year.

Exponential
Definition: The exponential distribution models the time between events (such as failures) that occur randomly and independently at a constant rate, i.e., the waiting time between events of a Poisson process.
Example: The time between random failures of a machine’s hardware components is monitored.

Weibull
Definition: The Weibull distribution models component lifetimes, especially when the failure rate is not constant (e.g., it increases over time due to wear-out). With a constant failure rate it reduces to the exponential distribution.
Example: The lifespan of a semiconductor tool transfer fork used in high-temperature processes is tracked until it fails due to wear or heat damage.
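The three discrete distributions above can be made concrete with their probability mass functions. The sketch below uses only the standard library and is not taken from the original solution. The parameter values are hypothetical and chosen purely for illustration: a 1% defect rate over 100 wafers, a 10% chance of contamination per test, and an average of 2 outages per year.

```python
from math import comb, exp, factorial


def binomial_pmf(k, n, p):
    """P(X = k): k defective wafers among n, each defective with probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def geometric_pmf(k, p):
    """P(first contamination event occurs on test k), for k = 1, 2, ..."""
    return (1 - p) ** (k - 1) * p


def poisson_pmf(k, lam):
    """P(k outages in a year) when outages occur at an average rate lam per year."""
    return lam**k * exp(-lam) / factorial(k)


# Hypothetical parameters, for illustration only.
print(binomial_pmf(2, n=100, p=0.01))  # exactly 2 defective wafers out of 100
print(geometric_pmf(3, p=0.1))         # first contamination on the 3rd test
print(poisson_pmf(0, lam=2.0))         # a year with no outages
```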
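The exponential and Weibull entries are linked through the failure (hazard) rate. For a Weibull lifetime with shape k and scale λ, h(t) = (k/λ)(t/λ)^(k-1). With k = 1 the hazard is constant and the distribution is exponential. A value of k > 1 gives the increasing, wear-out failure rate described for the transfer fork, and k < 1 gives a decreasing rate. The sketch below assumes a hypothetical scale of 1000 operating hours.

```python
from math import exp


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate h(t) = (k/lam) * (t/lam)**(k - 1), for t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def weibull_survival(t, shape, scale):
    """P(lifetime > t) = exp(-(t/lam)**k); shape = 1 is the exponential case."""
    return exp(-((t / scale) ** shape))


# Hypothetical scale of 1000 operating hours, purely for illustration.
for k in (0.5, 1.0, 2.0):
    rates = [weibull_hazard(t, k, 1000.0) for t in (100.0, 500.0, 900.0)]
    print(f"shape={k}: hazard at 100/500/900 h = {[f'{h:.2e}' for h in rates]}")
```

Only the shape 1 row prints a constant hazard. That constant rate is the memoryless property that makes the exponential distribution suitable for purely random failures but not for wear-out.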
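The 16-run design for 10 features in the housing question can be reproduced in plain Python using the standard construction. Take a full 2⁴ factorial in F1 to F4, then define F5 to F10 as products of those base columns (the design generators). The generators below are a hypothetical but valid choice for a 2¹⁰⁻⁶ design. FrF2 chooses its own generators and randomizes the run order by default, so its output will generally differ. `included_features` mimics the Included_Features summary column.

```python
from itertools import product
from math import prod

FEATURES = [f"F{i}" for i in range(1, 11)]

# Hypothetical generators: F1-F4 form a full 2^4 design and F5-F10 are
# products of those base columns. This is one valid 2^(10-6) choice, not
# necessarily FrF2's. F10 = F1*F2 gives the length-3 defining word F1*F2*F10,
# so the design is resolution III.
GENERATORS = {
    "F5": ("F1", "F2", "F3"),
    "F6": ("F1", "F2", "F4"),
    "F7": ("F1", "F3", "F4"),
    "F8": ("F2", "F3", "F4"),
    "F9": ("F1", "F2", "F3", "F4"),
    "F10": ("F1", "F2"),
}


def fractional_factorial():
    """Return the 16 runs as dicts mapping each feature to +1 (include) or -1."""
    runs = []
    for levels in product((-1, 1), repeat=4):  # all 16 combinations of F1-F4
        run = dict(zip(FEATURES[:4], levels))
        for name, parents in GENERATORS.items():
            run[name] = prod(run[p] for p in parents)
        runs.append(run)
    return runs


def included_features(run):
    """Equivalent of the Included_Features summary column: features coded +1."""
    return [f for f in FEATURES if run[f] == 1]


for i, run in enumerate(fractional_factorial(), start=1):
    print(f"House {i:2d}: {', '.join(included_features(run)) or '(none)'}")
```

Every column is balanced (eight houses with each feature and eight without), and every pair of columns is orthogonal. This is what allows all 10 main effects to be estimated from only 16 houses.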