Download Machine Learning Mastery Roadmap and more Study notes Computer science in PDF only on Docsity!
THE COMPLETE ROADMAP TO MASTER MACHINE LEARNING — (WITHOUT DEEP LEARNING) — COMPREHENSIVE LEARNING ROADMAP From Fundamentals to Advanced Techniques — Your Complete Guide to Machine Learning Excellence 27 MAJOR CATEGORIES kk 27. SCALING & BEYOND Deploy, scale and stay ahead ~ 21. EVALUATION (23) CLEAR ROADMAP Evaluate and validate J STEP-BY-STEP model performance 865+ TOPICS 10. ALGORITHMS COVERED Explore core ML algorithms {} PRACTICAL & APPLIED 3. DATA PREPARATION Ge ah Prepare and understand your data BEST PRACTICES & PITFALLS IDEAL FOR 1. FUNDAMENTALS BEGINNERS & Yn Build strong PRACTITIONERS START foundations oy COMPREHENSIVE PRACTICAL wa REAL-WORLD EXPERT COVERAGE Y=) INSIGHTS 411 APPLICATIONS = GUIDANCE 27 MAJOR CATEGORIES | 865+ TOPICS | END-TO-END MASTERY THE COMPLETE ROADMAP TO MASTER MACHINE LEARNING ——— (WITHOUT DEEP LEARNING) —— COMPREHENSIVE LEARNING ROADMAP GTISTO TORO TO TOTOTOTORORS) ok ® (Ra) 27 MAJOR CATEGORIES ,& ® @ @ 5 | IS | DS) Be | Machine Learning Fundamentals and Types Problem Formulation and ML Workflow Data Preparation and Train/Validation/Test Split Model Concepts and Fundamentals Loss Functions and Objectives Evaluation Metrics and Performance Analysis End-to-End ML Workflow and Pipelines Common ML Pitfalls and Best Practices Linear Models and Regression Classification Algorithms Tree-Based Models Ensemble Methods and Theory Boosting Algorithms Support Vector Machines (SVM) SOTOTSTOR ST OROTOROEO BORG) | 865+ TOPICS COVERED * TABLE OF CONTENTS : o90 °o0°e oa KA aps b ttl | He | cA) AS He Op © © *& | @ Els id: | | Oe Unsupervised Learning Algorithms Dimensionality Reduction Techniques Anomaly Detection Methods Probabilistic Models and Bayesian Methods Recommendation Systems and Ranking Hyperparameter Optimization and Tuning Model Evaluation and Validation Imbalanced Data Handling Ensemble Theory and Analysis Interpretability and Explainability Model Selection and Comparison Scaling Machine Learning Other Important ML Topics END-TO-END MASTERY ROADMAP 02 TRANSFORM IDEAS INTO IMPACTFUL ML SOLUTIONS PROBLEM DEFINITION (2) Problem understanding and scoping i Lo alll Bs Baseline establishment Defining success metrics and KPIs G Expected performance targets 0} PROBLEM FORMULATION AND ML WORKFLOW Sil 3] Constraints and requirements gathering Aj a fl Business problem to ML problem translation GQ Feasibility analysis = Data availability assessment Computational requirements fee Timeline and resource planning ML WORKFLOW AND PIPELINE 4 —o A SP BS oA [01] (02) {03 | {04 | [05 | {06 | [07 | { 08 | [ 09 | Data Exploratory Data Feature Model Hyperparameter Model Deployment — Maintenance Collection Data Preprocessing Engineering Selection & Tuning Evaluation & & & Retraining & Sourcing Analysis & Cleaning Training Validation Monitoring Cycles (EDA) Gather and Understand Clean, handle Create and Choose the Optimize model Measure Deploy model — Update model identify data through missing values select relevant best model parameters for performance and monitor and improve relevant insights and and prepare features and train better and validate in real-time continuously data visualization data it performance results =e ° ML project lifecycle (yz Stakeholder management =, Agile vs waterfall in ML projec Risk assessment and mitigation G ME gaa progects LN ae g SUCCESSFUL ML PROJECT tS Experiment tracking and versioning Documentation best practices Q) Team communication 49} CLEAR PROBLEMS Define the right problem to solve ML pipeline G * EFFICIENT WORKFLOW Follow a structured GREAT ML MODELS START WITH A WELL-DEFINED PROBLEM AND A SOLID PLAN. 6) Budget and resource allocation all ML project metrics and KPls @ Post-deployment analysis m2 ee MAXIMUM IMPACT Deliver value and drive business success OOo SMART MANAGEMENT ce) Plan, collaborate and | manage effectively e >>? ROADMAP SECTION — 03 2) DATA COLLECTION Data sources and acquisition Identify and gather data from reliable internal and external sources. Synthetic data generation Create artificial data to augment datasets or solve data scarcity. Data quality assessment Evaluate accuracy, completeness, consistency, and validity of data. UY Data volume requirements Determine the right amount of data needed for robust model performance. Data representativeness Ensure data reflects the real-world population and use-case scenario. 000 DATA PREPARATION AND TRAIN/VALIDATION/TEST SPLIT QUALITY DATA. SMART SPLITS. BETTER MODELS. a x7: n° 0 toy goes e-ere Sampling strategies o-@-0 $.4.6 Use random, stratified, or systematic sampling ‘ to collect unbiased data. AYA Bias in data collection Detect and mitigate bias to ensure fairness and reduce model skew. (_) _ Privacy and ethical considerations A Protect user privacy and follow ethical data practices. Data licensing and rights with data licenses. Data retention and deletion and secure deletion. Ensure proper usage rights and compliance Define policies for data storage duration Train/validation/test split concepts Divide data into subsets for training, tuning, and final evaluation. Holdout evaluation Set aside a portion of data as a holdout set for unbiased evaluation. @ G Cross-validation techniques Improve generalization estimates using multiple training/validation splits. <> ag 4 K-fold cross-validation Ooo Split data into K folds and rotate validation across folds. 8 = Stratified cross-validation €@ — Maintain class distribution across folds for classification tasks. (3) DATA PREPROCESSING Missing value handling Impute, remove, or flag missing values appropriately. Outlier detection and treatment Detect outliers and cap, transform, or remove them. Sealing and normalization Scale features to a similar range to improve madel performance. Standardization Transform features to have zero mean and unit variance. One-hot encoding Convert categorical variables into binary vectors. SD it & Be BETTER DATA Build models on high-quality data SMART SPLITS Evaluate fairly and avoid overfitting DATA SPLITTING STRATEGIES Time-series cross-validation Respect temporal order when validating on time-series data. Ee Q 90 Group-based cross-validation fn) Keep related groups together in the same fold to prevent leakage. oD Nested cross-validation Use inner loop for tuning and outer loop for unbiased evaluation. 9-2 Bootstrap and resampling 6-6-6 Generate samples with replacement to estimate model stability. Evaluation set design Design evaluation strategy aligned with business goals and metrics. DATA SPLIT ILLUSTRATION Full Dataset | OOOOeooEeoo 1 4 L Train Validation Test (60-80%) (10-20%) (10-2056) GOOD PRACTICES @) Prevent data leakage @) Maintain distribution @) Use appropriate strategy ©) Keep test set untouched Label encoding Convert categorical labels to integer values. Feature binning and discretization Convert continuous features into discrete bins, = Handling categorical features Encode or transform categorical variables effectively. Handling text features Tokenization, vectorization, and PREPROCESSING PIPELINE EXAMPLE 7-6 -@-@O-y Handling high-cardinality features Use hashing, grouping, or target encoding techniques: embedding techniques. Handling temporal features Extract date/time components, lags, rolling stats, etc. | {o> CLEAN PIPELINE Prepare data for maximum impact Raw Data Clean Transform Encode Model Ready WHY IT MATTERS © Cleaner data leads to better insights @ Proper splits ensure reliable evaluation © Good preprocessing boosts model performance © Robust workflow reduces errors and rework pa IMPROVED MODELS BUSINESS VALUE a 1 Higher accuracy and Data to decisions L that drive results better generalization ROADMAP SECTION 05 AND OBJECTIVES MEASURE ERROR. OPTIMIZE OBJECTIVES. BUILD BETTER MODELS. LOSS FUNCTIONS Objective: min L£(@) REGRESSION LOSS FUNCTIONS e-c.e Mean Squared Error (MSE) q R J Quantile Loss =5¥ al pely ae PAG P J e ui) 95°59 — Penalizes larger errors Luge = = y (wi Hd) Pearl Useful tor quantile 9-9*9 quadratically. fz] regression, | OE *. -Ife< op) Mean Absolute Error (MAE) 12 cs, Custom Loss Functions al []__ Penatizeserors nearly, Lave = = > lve ANS, _ Tailor the toss to specitic =15 (us He) more robust to outliers. ist business needs, Niel Mean Absolute Pe tage E " uae solute Percentage Error F tose ; B ay nye Weighted Loss Functions iz MAPE = a4 =— Oy, Hi) | Bah Ries a ol on Assign different importance £ = y w; ey, Fo) to samples. i=1 percentage error. Huber Loss 0 -iy* ly—al <6 sleiel Multi-output Regression Loss igs Combines MSE and ne, | £6 = Py cae ee : TIDE) Aaregate iss across (as DY Cys dy) abut th otters (ly-#1— 56) otherwise multiple targets. i=1 j=l Log-cosh Loss itll Loss Scaling and Normalization L : 5 —# Smooth apprasimation of MAE, Liogcosh = — i log cosh(y; — Hi) (] | Normalize loss for stable and £= = less sensitive to outliers. Lares faster training. CLASSIFICATION LOSS FUNCTIONS Binary Cross-Entropy Loss _ 1d >——) Squared Hinge Loss i= Standard foss for binary Lace = — n > log (Be) h / .| Penatizes margin violations | 4sq-hinge = » max(0, 1 - yif(x)) classification, +(1-y,)log (1 - 8] quadratically. t=1 Categorical Cross-Entropy Loss iad > 7) Log Loss (Logistic Loss) n For multi-class classification Lece = we 3 8 Vik log (Bax) we ,| Equivalent to binary cross- log = — y log (By) (one-hot labels). f=1 k=l entropy. f=1 Sparse Categorical Cross-Entropy r i. é Custom Classification Losses — 12 For multi-class with Lecce = — n Y le (Pry) ‘gi Design task-specific f= i Dy, e(%, Bi) integer labels. = classification losses, l t=1 Focal Loss aye) Weighted Cross-Entropy © iy 4 | 1< a Liocat = ar y (1-p,)* ™ log (Be) Handles class imbalance by focusing on hard examples, Hinge Loss Maximizes margin between | Linge = = =) max(0, 1 — yf (2;)) classes, Account for class imbalance using class weights. Class-Balanced Loss Rebalance based on effective number of samples, Luce = = 1} ivy; log (By,) Wey ® © © REGULARIZATION OBJECTIVES L1 Regularization (Lasso) 7 | Lis = Law +45 bol \eieed Fi Encourages sparsity in model weights. L2 Regularization (Ridge) Penalizes large weights, promotes smoothness. | Liz = Lata +A), w? j Elastic Net Regularization 4 eguianation | fey = Lao + Ay, [rol Combines L1 and L2 2 3 +422) wi regularization. | ae Aj = hy- i~ Bernoullt(p) Dropout Regularization Randomly drops neurons to prevent co-adaptation. i Early Stopping a = Training Loss Stop training when validation + Validation Loss Stoy” performance stops improving Weight Decay | L2 regularization applied during | Wray = w, — (VL + Aw,) optimization (e.g. in AdamW). | | x Maximum Margin Objectives 1 2 Ye?ce Maximize margin b nai [pel co ‘gin between 2 classes (e.9., SVM). s.t. y(w' a, +6) >1 Information Theoretic Objectives Maximize mutual information of minimize entropy. Multi-task Learning Objectives Learn multiple related tasks simultaneausly. “ye © Adversarial Robustness Objectives Make models robust to w) max I(Z;Y) — BI(Z;X) K £= » aly k=l min max L(fo(x +6), y) adversarial perturbations. o [Kilse Epochs - Sf CHOOSE THE RIGHT Loss OPTIMIZE SMARTER | <7, GENERALIZE BETTER bot & Align loss with problem Sf Better objectives lead to = df Regularization improves. PIE"
> ( - f0? i=1 ease = |} 3 iy 902 53 b- ACTUAL vs PREDICTED Average of the absolute errors. pa Mean Absolute Percentage Error (MAPE) _ 100% % yi- ve Average of absolute percentage errors. MAPE . ve ¥ : Actual values | _ Resquared (coetficient of determination) F D7 (vi — D2) be Predicted values Proportion of variance explained by the model, R*=1- EH)? ¥ : Mean of actual values 2| Adjusted R-squared I = R R-squared adjusted for number of predictors. R* =1-(1-R?) eel ODA NeW ae Mean Absolute Scaled Error (MASE) MASE = MAE (= MAE scaled by naive forecast error. Toy Liem+1 1% — Femi So) Lower is better for Median Absolute Error ‘i = error-based metrics. TIL] Median of absolute errors (robust to outliers}. MedAE = median(|yi — Bil) Higher is better for Explained Variance Score = Var(y — 9) 2 i 1 = Var(y — j) / Vary), EVS =1—- Very) R®, EVS, and correlation. -;4 Spearman and Pearson Correlation Cou(x.y) “= Measure strength and direction of linear! Tpearson = Tey Typearman = corr(rank(x),rank(y)) monotonic relationships. (2) CLASSIFICATION METRICS ROC CURVE cae ® Accuracy Re _TP + IN _ = Probability that the model Proportion of correct predictions, curacy = TP+TN + FP + EN & ranks a random positive © Precision ae TP 3 higher then a random Proportion af true positives among Precision = 5 pp negative. predicted positives. oO — 1.0 = Perfect Tr FPR(1- Specificity) = 1 bol ©} Recall (Sensitivity) Recall = 5 ay 0.5 = Random Proportion of true positives among TN PRECISION-RECALL CURVE actual positives. Specificity = aaa ‘ AUC-PR Specificity [ 1 1) Proportion of true negatives among f= 2 + Precision - Recall z Paitin ue a actual negatives. ~ Precision + Recail recall cane iG) Fi-score Fes 2 Recall ie Hpereoenc meen of preceoy anders 2 bag Loss (Logarithmic Loss) () F-beta scores ms ; =: Penalizes confident wrong predictions. Weighted harmonic mean of precision | | Artal Predicted Positive Hoses Matthews Correlation Coefficient (MCC) and recall, Positive TP FN Balanced measure even for imbalanced data. (True Positive) | (Fat ) * QE Confusion matrix ee Ee) ee eas OO table of TP, TN, FP, EN counts, Negative Fe oe actuals, adjusted for chance. (False Positive) (Tree Negative) 3 ess, Hamming Loss Fraction of misclassified labels. / RANKING AND RECOM Mean Average Precision (MAP) Mean of average precision across queries. Considers position and retevance with discount Mean Reciprocal Rank (MRR) Pracision@K Proportion of relevant items.in tap-K. Recall@k Proportion af relevant items in top-K. Normalized Discounted Cumulative Gain (NDCG) Inverse of rank of First relevant item, averaged. MENDATION METRICS 1 MAP = = > AP(q) Mesures ere Q 2 a ranking eusity = DOCEK rccartstening NDECEK. IDCGEIK ad relevance. ueR= 2 is Focuses om how try get rank, — ta fnteeiecsen rem wretreved P@K = ae eanrihex K recommenditions R@K = iio) oes How mary retevant [Relevant] ters are ‘caphaedin top-k = Hit Rate = 1, if hit Whether at least one relevant item is Hit@K = otherwise in top-K. \ = © Coverage Metrics noes lagu Items Raccmnncede| How much of the item catalogis | Coverage = — ~ [Torof Jrems| facommented across users. Diversity Metrics ‘ Promotes dissimilar items in Drowrsty = 1 ~ aah 7} i sim(t. J) recommendation lists. ay wy Serendipity | Based.on relevance and Recommending unexpected yet unexplainedness. relevant items. Ge) Novelty Jog (Popularity of ito) Promoting new or less-popular items, Silhouette Score Mezaures how similar an object is to its own cluster compared to other clusters. Range: [~1, 1] Davies-Bouldin Index (081) Average senilarity ratio of each chister with its most, ‘similar chester. Lower is better. Calinski-Harabasz Index (CHI) Ratio of betneen-clurter dispersion to within-ckaster daperson. Higher (s batter, Ounn Index Ratio of minimum intty-cluster distance to marimar Intra-cluster diameter. Higher is better, Gap Statistic Compares within-chater Gepertion with that expected ® RD OO] @ if O4B© CLUSTERING AND UNSUPERVISED METRICS ny ot) = at) Q aC) = Faxlaliy ea) k ek a, +0, &) Gy bi per d(c),c;) (i55) _ Tr(By) n-k o- cil TrWy) k-1 Wes Dunn = min¢ejd(c,, cj) €) max, Ay Gap(8) = E* log(W)|— log) Inertia (Within-Cluster Sum of Squares} Sum of squared distances of samples to thelr closest cluster center, Lower is better. Adjusted Rand Index (ARI) k Inertia = s » |la- all? i=Lxet; Index — Expected Measures similarity between chstering and tine isbets, | ARJ = ————— Range: [-1, 1} (1 is perfect. Max — Expected Normalized Mutual Information (NMI) 2-1U:¥) Measures mutual information between clustering NMI = ————_— and true tsbels. Range: (0. 1] (1 is pedect). H(U)+H(V) e k Purity . 1 Fraction ofthe largest class in each cluster. | Purity = = x mat Ie; bj) i=l Range: (0, 1] (1 is perfect). Entropy Measure of cluster label impurity. Lower is berter. k 1 Entropy = —— 97} py log (pis) t= 5 BEST PRACTICES Gj Choose metrics aligned with business goals, under a null reference datnbutica. Use multiple metrics for a@comprehensive view, i Analyze metrics across: diferent data slices. Monitor ower time and detect performance drift, Beware of data leakage in evaluation, Y ROADMAP SECTION —EE COMMON ML PITFALLS 08 AND BEST PRACTICES AVOID MISTAKES. BUILD BETTER MODELS. DELIVER RELIABLE RESULTS. ----> = @ a 6 m Ee . (1) DATA-RELATED PITFALLS Data leakage Information from the target or future data unintentionally used in training. Temporal leakage Using future information to predict the past. Violates causality. Scale leakage Fitting scalers/encaders on the full dataset instead of training data only. Information leakage Leaking label-proximal information through features or preprocessing. Train-test contamination Overlap or near-duplicate samples across train and test sets. Biased sampling Sampling that does not reflect the true population distribution. Unrepresentative data Training data does not represent deployment data or real-world use. Class imbalance problems Severe class imbalance leading to biased models and poor recall. Outlier contamination Outliers or noisy labels that distort learning and evaluation, Missing not at random (MNAR) Missingness depends on unobserved values; can bias results if ignored. a (2) MODEL-RELATED PITFALLS Overfitting Model learns noise instead of signal; performs well on train, poorly on unseen data, Underfitting Model too simple to capture the underlying patterns; high bias. Model selection bias Choosing the best model based on test data or repeated peeking. Hyperparameter tuning bias Tuning on test/validation data leads to overoptimistic performance estimates. Publication bias Only reporting successful results while ignoring failed experiments. Multiple comparisons problem Trying many models/metrics and picking the best inflates Type | error rate, Multiplicity in statistical testing Not adjusting for multiple hypothesis tests increases false discoveries. Model misspecification Incorrect assumptions or functional forms lead to systematic errors. Inappropriate baseline Weak or missing baselines make results hard to interpret. Ignoring domain knowledge Disregarding expert knowledge leads to unrealistic features or misleading conclusions. (3) BEST PRACTICES a Hd oS PR Start simple, add complexity gradually Begin with simple models and iterate based on evidence, Establish strong baselines Compare against meaningful baselines to create value. Use proper evaluation protocols Use cross-validation, time-based splits, or other appropriate methods. Validate on truly held-out data Keep a final test set untouched until the very end. Document everything Record decisions, assumptions, configs, and results. Reproducibility standards Fix random seeds, capture environments, and ensure reproducibility. Code review processes Review code for correctness, robustness, and clarity. Experiment tracking Track experiments, metrics, and artifacts systematically. Version control Use version control for code, data schemas, and configurations. Automated testing Add unit tests, data tests, and integration tests to catch issues early. 2 a Bi Measure what matters Be skeptical Oo Collaborate fa Iterate and learn Focus on impact Align metrics with business Question surprising results Worl with demain experts eg Use feedback loaps to Build models that solve goals and real-world impact. and validate thoroughly. and peers. continuously improve. teal problems reliably. /\ Remember: There are no shortcuts to building reliable ML systems. Rigor, discipline, and continuous learning are your best tools. X sy P ROADMAP SECTION LINEAR MODELS FOUNDATIONS TO ADVANCED MODELS FOR PREDICTION AND INFERENCE. a Fa (1) LINEAR REGRESSION BEST AT UNE (2) REGULARIZED LINEAR REGRESSION ‘= Simple linear regression «Ridge regression (L2) COEFFICIENT SHRINKAGE PATHS 3 Medel ane predictor eae Shrinks coefficients P y= Bot Bixte min jy ~ XB + AlBILE £1 o 4] ee near regression Lasso regression (L1) 2 3 x Uo Mu oe ae 2 fx Encourages sparsity. & 4 = Bo + Piz, +++ + Bory +e = . ze Ie a a min Jy ~ XB + AlBlh ‘ i Least squares estimation 4 Sas a Set [ <°"| Minimize sum of squared errors (SSE). Elastic Net log(A} a cS Combination of Li. and L2. mn 2 (ue) main Jy XIE + (eli + +52 ne) N i . CV CURVE (SELECT A} lormal Equations Regularization parameter selection 5. Closed-form solution. Choose A (and ct) wisely. 5 A B= (XTX) UXTy Use CV, AIC, BIC, or grid search. a > Geometric interpretation fi} Cross-validation for regularization nd = Projection of y onta fh k-fold CV to select best A. Ri column space of X. i , log(A) A kh fli fn { ) Feature selection with Lasso esurapt ars BE Nneer TED resent Sets some coetficients exactly to zero. Linearity, independence, homoscedasticity, LASSO SPARSITY normality of errors, no periect multicollinearity, fr Grouped Lasso Many coefficiants Lf Residual analysis. [s) Select groups of features. shrink to zero far ae adequacy. a J Adaptive Lasse y Oooo s8o000000 Look for patterns in residuals. —— eee Jyh, Data-driven weights. | ry R-squared and adjusted R-squared DECOMPOSITION OF VARIATION x = wi18,)) y= gel. PENALTY FUNCTIONS 9 wy R yc} SSR xy, BSEVtR-p—a) st ah 7 iar — U (Lasso) Ral-=- Ra1-— | sst= Nw -H — i ST SsT/@—1) YZ SCAD penalty e L2 (Ridge) Y poate. ie SSR= Fi (j- 9) (Explained) - Reduces bias of Lasso. 2 SEN verall significance af model. 4 f J a pu (SST = SSED/p SSE =E(yi ih)? Residual) Ordinal regression - SEH (n—p— 1) SST = SSR + SSE all For ordered categorical outcomes. ©) Statistical significance testing a — Cumulative link models. t-tests for individual coefficients. y, - yy a 4 SEB) - ——— (3) GENERALIZED LINEAR MODELS ADVANCED LINEAR MODELS — gowstvsois — F Logistic regression COMMON LINK FUNCTIONS ep Robust regression | Binary outcome. at . 1 tes Less sensitive to outliers. i logit(p) = tog ( ~) = Xp Distribution Link atu) =i Huber, pet: So Poisson regression Gaussian Identity g(4) = 1 ° Quantile regression alo Counts. ; ¥ Model conditional quantiles. | , i aby (S22 min -z log(u) = XB at | Waal ig tee (79) 2 Xu Peli — a 2 QUANTILE REGRESSION (x) Negative binomial regression =, | a, - Functional data regression — =09 Overdispersed counts. Poisson Log g(#) = log(s) 0 Predict from functions. ae ck yl iF =. = = at [ X(t)B(t)dt + — 7505 Tog) = HB a Ror yoat [X(t)B(t}dt +e See (4 * Gamma regression 4 —— Mixed-effects models =. —t=01 Positive continuous. Inverse Inverse sq. oOo Include fixed and random effects. 1 ee = XB or log(u) = X; Gaussian gtu) = a i li kf 4 ee Se a et a MIXED-EFFECTS EXAMPLE me ONE TIONS NB a Hierarchical linear models & Connect mean to linear predictor. (canonical) og: a{H) = logue) bit Nested structure, 2? Pak Lane >] Quasi-likelihood ; be paar 3 * [2] Relax distributional assumption: Multilevel models <—e ” — + | utional assumptions, MERN SARISNEE RELATIONSHIP Generalization of HLM. * a @ oe o Generalized estimating equations (GEE) | pistribution var(Y) ° R ‘ i : ° ‘andom intercepts and slopes rq Group 2 Group k Aye) Population-averaged effects. aE 4 7 13 Allow group-specific effects. b i uae = : . Gaussian o Zero-inflated models 7 RR Bayesian linear regression BAYESIAN POSTERIOR Extra zeros beyond assumed dist. Poisson u Priors. on B, a”. 5) Bastaiaent ‘ =- Prior 3 sterior via Bayes’ rule *, alt Hurdle models Binomial B(l-a~) ' \,o7> Likelihood o Two-part model. in aa ay Sparse linear models — Posterior Gamma ut High-dimensional p 2 n. “49 Overdispersion handling Lasso, elastic net, splke-and-slab. oo Ha ees eae . o°o Use NB, quasi-likelihood, Reo: Binanset | Cali 63 Covariate shift adaption 2 = . robust SEs. - —— a Adjust for Prain(X) # Prat(X). | B, Ke Reweighting, importance weights. BEST PRACTICES mat Understand Explore data and Prevent overfitting Choose the right Interpret models, ( ) assumptions and | visualize residuals | with regularization | model for your | validate results, check diagnostics. thoroughly. and CV. data and goal. document decisions. 4y- LINEAR MODELS ARE SIMPLE, INTERPRETABLE, AND POWERFUL — THE FOUNDATION OF MODERN ML. SECTION ROADMAP ENSEMBLE METHODS 11 | AND THEORY COMBINING STRENGTHS. REDUCING ERRORS. BUILDING ROBUST PREDICTORS. io Ss Ensemble Better Predictions (1) ENSEMBLE FUNDAMENTALS i (2) Reape F BAGGING PROCESS 888 Ensemble diversity WHY ENSEMBLES WORK? Bootstrap aggregating (Bagging) C00 Models make different errors. a Train models on bootstrap = Dataset if les. co Ensemble averaging vise he ros aoe ae Sa Reduces variance and stabilizes Bagging algorithm | | — predictions. ™ \ ! — ce. sak a > Aggregate. [ ial | f Soete | |: [ Boot M | ayy Weighted averaging | ie SA Qut-of-bag (O08) estimation Assign higher weights to stronger - (__j Use OOB samples to estimate models. Lower Error — G performance. (‘a Voting mechanisms ———— Random Forests Hogpblne sos premlins ty xolins ERROR DECOMPOSITION Bagging + random feature Ensemble error decomposition schlecht Error = Bias? + Variance + Noise. @ Bias? é Extremely Randomized Trees Ensembles reduce variance. @ Variance (Extra Trees) RANDOM FOREST SCHEMA a. 3 R its at each node. £T) Ensemble diversity measures I Noise leacom splits at each nods All Features ‘cL! Quantity dissimilarity between models, ©°9 subspace sampling 0-0-8 eo a oO 00 Use random subset of features. Tree Tree? Tree M G Correlation between ensemble members CORRELATION IMPACT fa ; 5 a Lower correlation + better ensembles. Z Feature bagging es “yen bee SDB Randomly sample features with wy Oracle performance soeet . replacement. Best achievable by selecting the best i Isolation Forests a High correlation ° member for each instance. wee aoe as Isolation-based anomaly detection ane with random trees. o Ensemble generalization bounds ©: Totally Ranidom: Tress Embed Theoretical quaraatees on ensemble ‘otally Random Trees Em| ng pews GENERALIZATION ROUNDS Pe Use random splits for embedding 008 ESTIMATION -fi=p _ and similarity. oer e0 © In-bag 7 | Ensemble selection Fens SE (4) + PE 1 Bagging variable importance ope ts POR ol ad ‘Choose a good subset of models. 4 dg l Measure importance across all oo Oe oF n. M: number ef models, pt avg. coreation | bogged mode. @ +0 © © “OOBsamples ~) (3) VOTING AND STACKING (4) ENSEMBLE THEORY DIVERSITY VS PERFORMANCE jorit i VOTING TYPES 2 Ensemble diversity theary 5 ene, NaprEy voting , oe Oiversity laads to better ensembles. a AA Class with most votes wins. Hard Voting Soft Voting _ 3 Optimal r = Kuncheva diversity measures & Diversity Weighted voting eeee Class Probables S's Measure dissimilarity between Bc Use model weights in voting. eeee classifiers e | : : & h Soft voting @eos * Disagreement measures 7 High FIX. Average predicted probabilities, | x Fraction of instances where models |_ me Gngely bi disagree. @GQ Hard voting [ ciass2 Q-statistic MARGIN DISTRIBUTION (9 Vote on class labels. , Measure beyond chance agreement. i <> Voting classifiers Correlation coefficient Wrong ‘Correct ios) Combine homogeneous models. ‘STACKING ARCHITECTURE p Lower correlation improves (Margin < 0) (Margin > 0} Stacking algorith Level 0 (Bae Models) ensemble, acking algorithms. —— = i = Combine heterogeneous models [Mode] “yr [Model & 1 Ensemble margin A 0 1 Roan he | Difference between top 2 class Margin Mateeamertatning Lave! | ileta-Leamner) votes/probabillties. sole ran ‘fi ERROR VS CORRELATION & : ie Meta Model Classifier strength and accuracy 5 Train on base model predictions. [meses Lf Stronger classifiers + better 5 £ - 4 a wer p 4A Blending Final Prediction enon bles: B | Better +*y Use holdout set for meta-training. @ Bias-variance decomposition 5 | i. 5, BLENDING (HOLOOUT BASED) in ensembles i Multi-level stacking Ensembles reduce variance mainly. Average Correlation (a) Stacking on multiple layers. [trainser ] — [ Hoidoutset | Double-fault measure ERROR DECOMPOSITION Ensemble pruning it | Probability that two classifiers 3 5 ; pa ok Remowe weak or redundant models. Base Models |->| Meta-Learmer | make error on same instance. B= Bias? + a Entropy of ensemble A Blas ( ededge Nalse ; ror from cl @) Nenu of uncertainty in ensemble sims posta fl abutter predictions, assumptions = 6) SUMMARY CHEAT SHEET KEY FORMULA (INTUITION) oe Diversity More # Better Balance (5. Evaluate Smartly Ca Prune & Select = 1-p a J0%n Dierent errors aa Teo many sitwlar Strength, diversity | |2=| useoo8,cv, and | ee Keep the useful, Eeng = E ( pe ) 20 make powerful = models give and correlation == proper metrics, remove the useless. | M ensembles. diminishing returns. matter, | E: avg. individual error, p: avg. correlation, M: #models: S— = BEST PRACTICES Ensure diversity “ 4 _ Start simple, Track and analyze : Monitor correlation o> Use 00B or CV 7 f t ly ok in data, features, | & between’ nicdels | 1008! for reliable q add complexity | model and ensemble or algorithms. : estimation. when needed. performance. y GREAT ENSEMBLES DON’T JUST COMBINE MODELS — THEY COMBINE WISDOM. ROADMAP SECTION BOOSTING ALGORITHMS Weak Learners: okezC) Boosting Strong Algorithm Predictor 6+ G-@ ors TURNING WEAK LEARNERS INTO POWERFUL PREDICTORS @ (@ life o AdaBoost algorithm ADABOOST WORKFLOW Gradient Boosting Machines (GBM) LS-Boost (Least Squares Boosting) & 7 Sequentially focus on é Build madels sequentially to minimize Minimizes squared error using hard-to-classify examples. Training Data differentiable loss functions. least squares fitting. . (Diu, (= 4) eee ; 52's oo OL, _ Loss function derivatives Margin-based boosting f ses exponential loss to = — wu dients tirst'derivath : Bs ieee i GHaghtticlaesined nots | dy tue ne ee a ives) Directly optimizes the classification more heavily. = i margin: Train weak | Weight updating ater fu) Residual fitting ; Trust region boosting AYE inerease weschts ®t misslesuined | Fit new model to the negative Restricts each step to stay within instances, decrease others. ies te a gradient (residuals), a safe “trust region”. ‘ompute error €; 7 ite weights wy. (i) il il A + Ensemble combining rule [See ORR wea) / Ne search and me P Totally Corrective Boosting Weighted majority vote l ey eee ae ne eae Refits all weak learners at each of weak learners. i U ite nea: 7 paze, iteration to reduce training error. Fiz) = a(x) + ahe(z) ae ee AdaBoost.R2 for regression : = = oe eu win eaUrGats Early stopping in boosting po Extension of AdaBoost { sai Nees a orien Stop adding learners when validation for continuous targets. | Final prediction: ) Ri i Z a performance stops improving. sian(F7(x)) | H (ut) AdaBoost.M1 and M2 . 2 Eonteael ; OM Column subsampling - i Leaf-wise growth, histogram-based, oon (2) M1 for multi-class (discrete), and highly efficient BOG Randomly sample a subset of M2forconfidence-rated outputs, i features at each iteration. WEIGHT UPDATING (BINARY) CatBoost Real AdaBoost tr) : : J|\_ Uses predicted probabilities Correctly classified kh ee a he pelea ia le ‘ with ordered boosting, Randomly sample a subset of instead of class labels. eeeee ooo | cee ac a = instances at each iteration, GentleAdaBoost Wipeasnczs alh past obarh iaeee boasting ; nae = cz Smoother updates, less sensitive si] Bin continuous features to speed up lonotone constraints to outliers and noise. training and reduce memory. Enforce monotonicity in predictions Misclassified == for specific features. LogitBoost es Seals Leaf-wise vs level-wise growth = = Do does logistic regression 4euua Les!-wise Level-wise Custom loss functions in boosting via stagewise updates. Weight increases (best-first) (breadth-first) ‘Te Use domain-specific or custom ee Reo tL differentiable loss functions. i iti it * * = —— - Stagewise additive modeling a Ps, as ae Add weak learners one at a time io) Multi-class boosting to minimize training loss. e Extend boosting to handle EXPONENTIAL LOSS @ Newton boosting oie) aera fata " Uses second-order derivatives i = p79F(x) Ly, F(a) =e (Hessian) for better convergence. =r = z = BOOSTING INSIGHT ee GRADIENT BOOSTING PROCESS Each iteration focuses more on what I Pucaet the current model gets wrang. | aa ll Fit weak Update aft + | learner || As) = Fal) Data { Weak | { "Weak Strong rex) fel) | |e vba) ) | Model1 || Model 2 Model 2-10 1 2 P| : \ = = =a— Margin w(x) Objective: Mig =, Lye F(x) + Q0F) rom i — OCF): regularization term Fosus shutts to-hare.csamples KEY TAKEAWAYS Boosting turns many Focus on mistakes. } Gradient boosting Regularization, subsampling State-of-the-art models G) weak learners into | l to improve model uses gradients to and early stopping help | tg like XGBoost, LightGBM, astrong ensemble. | 0 al accuracy iteratively. solve optimization prevent overfitting. and CatBoost are based problems. on boosting. BEST PRACTICES Use proper learning rate Use early stopping Apply subsampling ‘34 Tune number of trees, Monitor training and prop Ing ee r 9 shrinkage): small n, with validation set (rows & columns) for 4=]| max depth, min child ww .| validation loss during 9 uf ¥ better generalization, I to avoid overfitting. robustness and speed. weight, and regularization. =< boosting process. g splee y BOOSTING IS THE ART OF CORRECTING MISTAKES — ITERATION BY ITERATION, WE BUILD EXCELLENCE. Unsupervised Discover Structure Learning & Insights ROADMAP SECTION DISCOVERING HIDDEN STRUCTURE. FINDING PATTERNS WITHOUT LABELS. (1) CLUSTERING METHODS (2) ADVANCED CLUSTERING | (3) CLUSTERING EVALUATION K-means clustering Gaussian Mixture Models (GMM) @ Silhouette score Partitions data into K Assumes data is generated from Measures how well each point clusters by minimizing @ mixture of Gaussian distributions, lies within its cluster within-clusler variance, T T Range: [-1, 1] {higher is better) |. Lada a _ -_ { 1 “h | Ss i K-medoids clustering SSS Davies-Bouldin index Uses actual data points as Expectation-Maximization (EM) oe See Ay betwee AS cluster centers (medoids), algorithm — (Below is better). 7 Iteratively estimates hidden is E-step M-step DB= — >) max Ry variables and parameters (Estimate) Maxinize) K a1 dai ies Hierarchical clustering mecha Builds a hierarchy (dendrogram) = &&) Calinski-Harabasz index of nested clusters, | Spectral clustering Ratio of between-cluster Uses eigenvalues of similarity dispersion to within-cluster fr : matrix to reduce dimensions dispersion (Higher is better). oS Agglomerative clustering bora chistring. ppteciehes tio Bottom-up approach: start a\> - from individual points and : \ Graph-based clustering Models data as a graph and Ed partitions it using graph theory, merge clusters. Dunn index Ratio of minimum inter-cluster (> Divisive clustering = Top-down approach: start distance to manimum intracluster with all paints and recursively A = ae diameter (Higher is better). 5% Hileomploripariag Sele © = i -- a ib i ad Gap statistic Gap Linkage criteria PS Sages DERECHOS gs i Ie, Compares within-cluster disparsion Dafine distance between clusters to that expected under a null reference distribution. in hierarchical clustering, * Single = Average ze Mean Shift clustering Finds clusters by shifting points | towards high-density regions, Kernel K-means Applies K-means in a high- Number of clusters (k} Stability-based evaluation Measures consistency of clusters across different subsamples ‘Of perturbations. & split clusters. oO Pei & Affinity Propagation a a3 = Complete © Ward © © ® DBSCAN Density-based clustering that finds arbitrarily shaped clusters |, and detects noise. dimensional feature space Clustering tendency via kernel trick. © Density-based clustering , = assessment Groups dense regions separated /- ae, i CE Cp ale Evaluates whether data has by low-density areas. Possibilistic clustering m meaningful clustering structure. Allows partial membership Xz =i (degree of belonging) to clusters, : =e - + Et Core points and density a ong fa Hopkins statistic i) 05 1 Grapes Wir RRIF a Tests clustering tendency re points have at least MinPts comparing nearest neighbors Random No Chistered ‘wathin Epsilon neighborhood. Self-Organizing Meps (SOM) ©) rei aad of dala vs random points, neni Projects high-dimensional data i ae onto a low-dimensional grid ees Dip Epsilon-neighborhood preserving topology Ne ae Ww beads bags i Tests unimodality vs multimodality y i eig! : ; of the data distribution, Unmodal Mubimodal around a point, Growing Neural Gas Inceementally grows a neural Cluster validity indices network to represent the data General term for indices that distribution. measure quality of clustering solutions. @® © © = UNSUPERVISED LEARNING AT A GLANCE APPLICATIONS a ay ae oe = Customer segmentation &- Rs 2 Q =? é => q (oo — =C= + Anomaly detection eof eo o @ + Markel analysis + Image segmentation Unlabeled Discover Group: Evaluate Extract Bi 6 u + Bioinformatics Data Patterns Similar Points Clusters Insights + Recommendation systems BEST PRACTICES \ Understand your ODO standardize Try multiple Use multiple Check stability a, Visualize results Validate clusters data distribution | oo 7 features when \§ algorithms. and | all] evaluation metrics, across runs and | » @ inreduced | with domain and scale. oO OF pecassary, parameters, lo not just one. subsamples. dimensions. knowledge. “ UNSUPERVISED LEARNING REVEALS THE UNKNOWN. GOOD CLUSTERS, BETTER DECISIONS. ROADMAP SECTION DIMENSIONALITY REDUCTION © 15 | TECHNIQUES SIMPLIFY DATA. PRESERVE INFORMATION. REVEAL INSIGHTS. %3 Component 2 Low-p Component 1 (Simplified) (Complex) — —— =i +| | @| @| © —* LINEAR DIMENSIONALITY REDUCTION Principal Component Analysis (PCA) Projects data onto orthogonal ‘components that capture maximum variance. Eigenvalue decomposition Decompose covariance matrix £ = VAV" to obtain eigenvalues (A) and eigenvectors (V) E=Vav . Singular Value Decomposition (SVD) Factorize data matixX =UEv?, X= E Used for PCA and noise reduction. Variance explained Measures how much information is retained by the components. Explained Variance Ratio = pat Ay Scree plot Fi Plot eigenvalues In descending = order to chogse number of f components. va: Principal axes New axes (PCs) are orthogonal and ordered by explained varianee. Whitening transformation Transforms data to have unit oo 9. variance and zero covariance, \* > ye Z=AYEVT (X—y) sa Independent Component Analysis (ICA) Finds statistically independent © [ Pivieutls independent components, not just uncorrelated. | 7 penis BAe Canonical Correlation Analysis (CCA) Finds linear projections that maximize correlation between two sets of variables. Linear Discriminant Analysis (LDA) Finds projection that maximizes = NONLINEAR DIMENSIONALITY REDUCTION Kernel PCA Applies PCA in a high-dimensional feature space via kernel trick, Kernel Manifold learning Assumes data lies on a low-dimensional manifold embedded in high-D space. Preserves geodesic distances ° on the manifold. o > «¢ 6 = Be is =e 2 o ——___—— Locally Linear Embedding (LLE) @ Preserves local neighborbood ion -o-90 relationships. Spock 4 ‘ o Laplacian Eigenmaps Uses graph Laplacian to preserve Pl > “ ¥ oo local structure. c f fe NE Ee Diffusion maps 22 Captures data geometry through diffusion process on a graph. t-SNE Preserves local structure and reveals clusters in 20/30. eta 548 as Ore UMAP Fast manifold learning that preserves both local and global structure. TriMap Optimizes triplet relationships to maintain global structure. Ls (3) FEATURE SELECTION METHODS FILTER METHODS all Correlation-based selection Select features with high correlation with target and low inter-correlation. (6% Chi-square test for x) feature selection Measures independence és c between categorical ra ‘i x (0, —E,,)* features and target. 9 2° = ij Ey isl j= Information gain based selection Select features that provide maximum information about the target. om” IG(Y,X) = H(Y) — H(¥|X) WRAPPER METHODS Recursive Feature Elimination (RFE) Recursively train model and remove least important ome «BGO + 80-2 Forward selection Start with no features and add the best feature ateach step, EC) + i) +| (-] + Backward elimination Start with all features and the worst featur nathaes 7 B00 ~BGl EMBEDDED METHODS Embedded methods Feature selection occurs during model training (e.g., Lasso, Tree-based). @! @l) ©] @| O ili arametric t-SNE a 3 ae = Cass senereahy: Estemetsicsss Low-D Stability selection Run) Run2 = RunM Uses a neural network to learn Embedding ; th the low-dimensional mapping. Select features that are | wl al as consistently chosen across u uw High-D Data —> Site _ multiple subsamples. Q sale CO Features DIMENSIONALITY REDUCTION AT A GLANCE Why Reduce Linear vs Nonlinear Key Trade-off Applications Goal Dimensions? Linear: Global Information Visualization Keep what * Reduce noise structure | Preservation | Compression ) matters sinevesticany Nonlinear: Local ee sone Remove what s Msvelice ee & complex Simplicity Frsprmeessing ; doesn't. ® Avoid overfitting ‘Structure Feature engineering ~ BEST PRACTICES Standardize data Use scree plot or Preserve interpretability Validate reduced Beware of over-compression il before applying pt cross-validation to when it impacts | representations with | /\ that removes important Ql most techniques, choose components. decision-making. downstream tasks. information. P 4 SIMPLIFY WISELY. REDUCE DIMENSIONS, NOT UNDERSTANDING. ROADMAP SECTION 1 PROBABILISTIC MODELS 7 | AND BAYESIAN METHODS AX * Ge: MODELING UNCERTAINTY. LEARNING FROM DATA. MAKING BETTER DECISIONS. Prior Beliefs Data Evidence Posterior Knowledge = te (1) BAYESIAN FUNDAMENTALS Bayes theorem : (et) Update beliefs using evidence. L PUD) = P(D) P(H) () Prior distribution Initial beliet before seeing data, 33 ; 120) Likelihood function epi J Probability of data given hypothesis. D Posterior distribution P(H|D) Updated belief after seeing data. go Bayesian inference Combine prior and likelihood to get posterior. Conjugate priors Prior and posterior are in the same family. @) © @-A Beta-Binomial conjugacy petaia,s) Posterior Beta prior with Binomial ‘a > f. likelihood. Normal-Normal conjugacy yj, 92) posterior Normal prior with Normal i mi ® © likelihood. . (2) BAYESIAN MODELS Naive Bayes classifier is Assumes conditional independence | , *, ,% among features. e Bayesian linear regression Place priors over weights. yoXwte, e~.N(0,0%) Bayesian logistic regression ~~ For binary classification problems, Gaussian processes (GP) Distribution over functions. rom ee F(x) ~ GP(m(x), ke, x") <7 S| Be a (ee Kernel methods for GP Use kernel K(x, x’) to define : High ‘ " similarity in high-dimensional space. | " i, tow GP mean and covariance Mean m(x) functions Define the shape and smoothness js of functions. Covariance k(s, x") GP inference Predictive distribution at new points, | _. — Observed ae + @ ‘aan Baad ne 5 - A INFERENCE PARADIGMS o"%o oon . Initialize Propose Accept / Repeat & Input Kernel Compute Predictive Sf _ eh, 8 a Reject Collect Data Matrix Posterior Distribution isi heoeaviole ieuael = — (intractable) (BP / MCMC) (Fast) WHY PROBABILISTIC MODELS MATTER . Quantify | a] Incorporate | Prevent By Make better | Generalize to new O uncertainty prior knowledge overfitting decisions and unseen data BEST PRACTICES Choose priors —| Check model v ed Use MCMC for Validate with Interpret results in | ioo“e, Monitor uncertainty, @ | |Z= | | all | | Bes carefully. = assumptions. of%¢ complex models. al held-out data. %, domain context. “not just accuracy. io PROBABILISTIC THINKING LEADS TO ROBUST MODELS AND RELIABLE DECISIONS. ROADMAP SECTION RECOMMENDATION SYSTEMS 18 | AND RANKING DELIVER THE RIGHT ITEM TO THE RIGHT USER AT THE RIGHT TIME. O Recommendation Engine Recommended Items. 2-S- Bag RECOMMENDATION FUNDAMENTALS Eta Content-based filtering Recommend items similar to those a user liked in the past. Feature extraction f recommendations Extract informative features from items (e.g., T-IDF, embeddings, tags). sports tech action Al music fun Item similarity computation he desea Compute similarity between items using cosine similarity, "2 User similarity computation Compute similarity between users based on their preferences. a Collaborative filtering Recommend based on the collective behavior of users. RECOMMENDATION METHODS Bo Memory-based methods Use similarity to make predictions from neighbors. User-based collaborative filtering Find users with similar preferences i and recommend what they liked. Ot Os Os, Cee oe f ‘ Item-based collaborative filtering Recommend items similar ta those a user has interacted with, “Sty ag Model-based methods Learn a model from data to predict preferences. = Matrix factorization mm fp . flee Factorize user-item matrix ied A R= Pgh R P Singular Value Decomposition (SVD) ADVANCED RECOMMENDATION Factorization Machines Madel interactions between features efficiently. F=wot Vm + ; ye U,V) Xity ixp o Tensor Factorization Capture multi-way interactions (e.g., user x item x time). User or Context-aware recommendations Incorporate context such as time, location, device, mood. GME © Sequential recommendations Model user behavior sequences: with RNN/Transformer. [=f] Point-of-Interest (POI) RANKING AND LEARNING-TO-RANK Ranking problem formulation Given a user/query, order items by relevance. ? —- BEBE Ranked list 1 2 3 n Pointwise ranking 2 Oot Predict a relevance score for each item independently. fa Be | 0.38 ihe | iq 0.10 Pairwise ranking Learn preferences. between pairs of items. = B] > [al eases fg Le fle} verb Listwise ranking 1 Model the entire ranked fst 2 to optimize-a list-level metric, User-item interaction matrix ft lz fs =. In aul s f Decompose R = UEV" recommendations RankNet (pairwise) Matrix R where R,,; is the ca 1.0/1 ../0 F Recommend places based an ict pairwit interaction (explicit/implicit). uz |Q 1 0 ../1) jester al sea ialennese Neural network to Predict pairwise alalla > ‘ preference probabilities, tet 0 0.1 : FEES 4] (ie | res Non-negative Matrix Factorization (NMF) fo Sparsity problem in fiepeshate] Factorize R= WH, WH > 0 Cross-domain patty : dati LambdaRank recommendations recommendations bet eer chem interacting : Leverage information across Optimize ranking metrics directly g . : Fs different domains. using gradient boosting, are missing. EE | note = Weighted matrix factorization h te : * ne " [] *. BR | MAP Cold-start problem i : 7 a p lb O 5 Incerporate eonfidence/wesias sTojalc is Multi-stakeholder LambdaMART New users or items with little fa ) : for implicit feedback. —1.[3/0'e]a) eae en dnabe imbda or no interaction history. aula ened Bakes pereearees oh lacy sae boosted decision woo br Implicit feedback models Parties (users, providers, platforms), op otal: Popularity bias Learn from clicks, views, purchases, RASA ee : : Over-recommending papular ratings as implicit signals, ; Learning-to-Rank evaluation ; items to many users. ty hae * Fairness in recommendations Metrics: NDCG, MAP, MRR, mo Ensure fair exposure and: avoid Precision@k, Recall@k. lala a bias and discrimination. a Bayesian factorization ae 5 Filter bubbles yo ak uae a -Or__ Diversity in recommendations Rank aggregation Use Bayesian priors and probabilistic ai Provide varied and novel Combine rankings from multiple Limited exposure leads to inference for robustness. recommendations models or experts. model 1 —4 Py narrowed preferences. Mode 2 + —} AA Explainable recommendations Models 4 martina f AN, Provide reasons: “Because you SES — ee om ‘Aand 8°. Expert ranking oO - aietauee ys yon Leverage domain experts ta Cee} A = liked Aand 8° | improve ranking quality. i“ RECOMMENDATION PIPELINE 1, Data Collection 2, Preprocessing 3. Modeling 4, Serving {ro 5. Feedback Loop Q Interactions, content, > Cleaning, feature ? Train recommendation Generate top-N > we Collect feedback and context, metadata engineering models recommendations update models BEST PRACTICES OOD Handle sparsity Address cold-start Evaluate with Balance accuracy re Monitor in production Protect privacy with robust aps with side information | HO, offline and online | with diversity | —_ and adapt to i] and ensure algorithms. and hybrid models. metrics, and fairness. user feedback. transparency. o BETTER RECOMMENDATIONS. BETTER EXPERIENCES. STRONGER RELATIONSHIPS.