



Data Modeling for Machine Learning Overview
Typology: Exams




Data Collection and Preparation - CORRECT ANSWER: Gather and preprocess raw data for analysis.
Feature Selection and Engineering - CORRECT ANSWER: Identify and modify key variables influencing outcomes.
Model Selection - CORRECT ANSWER: Choose appropriate machine learning algorithms for tasks.
Training the Model - CORRECT ANSWER: Feed data to model for learning and error minimization.
Evaluation and Validation - CORRECT ANSWER: Measure model accuracy using various performance metrics.
Model Tuning - CORRECT ANSWER: Adjust hyperparameters to improve model accuracy.
Deployment and Monitoring - CORRECT ANSWER: Implement model in production and track performance.
Data Relationships - CORRECT ANSWER: Uncover dependencies and patterns in data.
Data Quality - CORRECT ANSWER: Address issues like missing values and outliers.
Feature Engineering - CORRECT ANSWER: Create new features to enhance model performance.
Reducing Complexity - CORRECT ANSWER: Simplify datasets to facilitate analysis and visualization.
Model Interpretability - CORRECT ANSWER: Understand how input data affects model outcomes.
Scalability and Efficiency - CORRECT ANSWER: Blueprint for data flow to support larger datasets.
Consistent Data Preparation - CORRECT ANSWER: Standardized models support reuse across projects.
Descriptive Models - CORRECT ANSWER: Analyze historical data to uncover patterns.
Clustering - CORRECT ANSWER: Group similar data points based on features.
Association Rule Mining - CORRECT ANSWER: Identify correlations between variables in datasets.
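To make the Data Quality card concrete, here is a minimal sketch of handling missing values and outliers with the Python standard library only. The helper names `impute_median` and `iqr_outliers`, the sample readings, and the choice of median imputation and the 1.5 x IQR rule are all assumptions made for illustration; the cards do not prescribe a particular method.

```python
from statistics import median, quantiles


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def iqr_outliers(values, k=1.5):
    """Return values lying more than k * IQR outside the quartiles.

    The 1.5 * IQR threshold is a common rule of thumb, assumed here.
    """
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]


# Hypothetical sensor readings with two missing entries and one extreme value.
raw = [12.0, None, 11.5, 13.0, 95.0, 12.5, None, 11.0]
clean = impute_median(raw)
print(clean)                # missing entries filled with 12.25
print(iqr_outliers(clean))  # -> [95.0]
```

Whether a flagged value should be dropped, capped, or kept depends on the dataset; the sketch only identifies candidates.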
Dimensionality Reduction - CORRECT ANSWER: Reduce features while retaining important information.
Predictive Models - CORRECT ANSWER: Make predictions about future events using data.
Regression - CORRECT ANSWER: Model relationship between dependent and independent variables.
Classification - CORRECT ANSWER: Categorize data points into predefined classes.
Time Series Forecasting - CORRECT ANSWER: Predict future values using historical time-based data.
Prescriptive Models - CORRECT ANSWER: Suggest optimal actions based on predictions.
Recommendation Systems - CORRECT ANSWER: Suggest items based on user behavior and preferences.
Optimization Models - CORRECT ANSWER: Find best solutions from various decision scenarios.
Mean Squared Error - CORRECT ANSWER: Metric for measuring prediction accuracy in regression.
F1 Score - CORRECT ANSWER: Harmonic mean of precision and recall.
Cross-Validation - CORRECT ANSWER: Technique for assessing model performance and avoiding overfitting.
ARIMA Models - CORRECT ANSWER: Used for time series forecasting of trends.
Logistics - CORRECT ANSWER: Management of resources and supply chains.
Resource Allocation - CORRECT ANSWER: Distribution of resources for optimal efficiency.
Manufacturing Processes - CORRECT ANSWER: Methods used to produce goods and services.
Decision Support Systems (DSS) - CORRECT ANSWER: Tools for informed decision-making through scenario evaluation.
Monte Carlo Simulations - CORRECT ANSWER: Statistical methods for predicting outcomes in risk management.
Data Preprocessing - CORRECT ANSWER: Cleaning and structuring raw data for analysis.
Data Cleaning - CORRECT ANSWER: Removing errors and inconsistencies from raw data.
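The Mean Squared Error and F1 Score cards can be checked with a few lines of arithmetic. The sketch below is hand-written for illustration (the functions `mse` and `f1` and the sample numbers are assumptions, not library calls): MSE averages the squared differences between true and predicted values, and F1 is the harmonic mean 2PR / (P + R).

```python
def mse(y_true, y_pred):
    """Mean of squared differences between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def f1(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero (assumed convention)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Squared errors are 1, 0 and 4, so the mean is 5 / 3.
error = mse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])

# The harmonic mean is pulled toward the lower value: 2 * 0.5 * 1.0 / 1.5 = 2 / 3.
score = f1(0.5, 1.0)

print(round(error, 4), round(score, 4))
```

Because the harmonic mean penalizes imbalance, a model with perfect recall but 0.5 precision scores about 0.67 rather than the arithmetic mean of 0.75.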
Wrapper Methods - CORRECT ANSWER: Evaluate feature subsets by model performance.
Recursive Feature Elimination (RFE) - CORRECT ANSWER: Removes least important features recursively.
Exhaustive Search - CORRECT ANSWER: Tests all feature combinations for selection.
Embedded Methods - CORRECT ANSWER: Feature selection integrated during model training.
Lasso Regression - CORRECT ANSWER: Linear regression with L1 regularization for sparsity.
Decision Trees - CORRECT ANSWER: Algorithms that select informative features for splits.
Random Forests - CORRECT ANSWER: Ensemble method using multiple decision trees.
Feature Importance Visualization - CORRECT ANSWER: Shows influence of features on model predictions.
SHAP Values - CORRECT ANSWER: Measure of feature contribution from game theory.
LIME - CORRECT ANSWER: Explains individual predictions using simpler models.
Data Splitting - CORRECT ANSWER: Dividing data for training and testing purposes.
Overfitting - CORRECT ANSWER: Model memorizes training data, failing on new data.
Model Evaluation - CORRECT ANSWER: Assessing model performance on unseen data.
Train-Test Split - CORRECT ANSWER: Commonly 80% training, 20% testing ratio.
Stratified Sampling - CORRECT ANSWER: Preserves class distribution in splits.
K-Fold Cross-Validation - CORRECT ANSWER: Divides data into k folds for training/testing.
Leave-One-Out Cross-Validation (LOOCV) - CORRECT ANSWER: Each sample is used once as the test set; suited to small datasets.
scikit-learn - CORRECT ANSWER: Library for machine learning with various utilities.
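The K-Fold and LOOCV cards describe the same splitting idea at different values of k. As a rough sketch, the generator below (a hypothetical helper named `k_fold_indices`, written with the standard library rather than scikit-learn) yields train and test index lists so that every sample lands in the test set exactly once. It does not shuffle, which is an assumption made to keep the output easy to follow; shuffling first is usual in practice.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Five folds over ten samples: each round trains on 8 and tests on 2 (an 80/20 split).
folds = list(k_fold_indices(10, 5))
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)

# Setting k equal to the number of samples gives leave-one-out cross-validation.
loo = list(k_fold_indices(4, 4))
```

With k = 5 each round reproduces the 80/20 ratio from the Train-Test Split card, but over five different test sets instead of one.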
train_test_split - CORRECT ANSWER: Function to split data into training and testing sets.
cross_val_score - CORRECT ANSWER: Simplifies the k-fold cross-validation process.
TensorFlow - CORRECT ANSWER: Library for deep learning and neural networks.
PyTorch - CORRECT ANSWER: Flexible library for deep learning applications.
AutoML Platforms - CORRECT ANSWER: Tools simplifying machine learning for non-experts.
Supervised Learning - CORRECT ANSWER: Models trained on labeled data for predictions.
Linear Regression - CORRECT ANSWER: Predicts continuous outcomes using linear relationships.
Support Vector Machines (SVM) - CORRECT ANSWER: Classifies by maximizing the margin between classes.
Unsupervised Learning - CORRECT ANSWER: Models trained on unlabeled data to find patterns.
K-Means Clustering - CORRECT ANSWER: Partitions data into k clusters based on means.
Principal Component Analysis (PCA) - CORRECT ANSWER: Reduces dimensionality while retaining significant information.
Semi-Supervised Learning - CORRECT ANSWER: Uses few labeled and many unlabeled data points.
Self-Supervised Learning - CORRECT ANSWER: Creates labels from data structure for training.
Imbalanced Data - CORRECT ANSWER: One class significantly outnumbers others in dataset.
Biased Predictions - CORRECT ANSWER: Models favor majority class, neglecting minority class.
Poor Generalization - CORRECT ANSWER: Model fails to learn from minority class data.
Misleading Evaluation Metrics - CORRECT ANSWER: Standard metrics like accuracy can be deceptive.
Precision - CORRECT ANSWER: True positives divided by total predicted positives.
Recall - CORRECT ANSWER: True positives divided by actual positives.
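The Imbalanced Data, Misleading Evaluation Metrics, Precision and Recall cards fit together in one small worked example. The sketch below uses made-up labels (5 positives, 95 negatives) and a hypothetical classifier that always predicts the majority class; the helper names and the convention of returning 0.0 when a denominator is zero are assumptions for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    return tp, fp, fn, tn


def precision(tp, fp):
    """True positives divided by all predicted positives (0.0 if none predicted)."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """True positives divided by all actual positives (0.0 if none exist)."""
    return tp / (tp + fn) if tp + fn else 0.0


# Made-up imbalanced dataset: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95
# Hypothetical model that always predicts the majority (negative) class.
y_pred = [0] * 100

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
acc = (tp + tn) / len(y_true)
print(acc)                # -> 0.95
print(precision(tp, fp))  # -> 0.0
print(recall(tp, fn))     # -> 0.0
```

Accuracy of 0.95 looks strong, yet the model never finds a single positive case, which is why precision and recall are reported alongside accuracy on imbalanced data.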