

A concise overview of key concepts and techniques in machine learning. It covers supervised and unsupervised learning, algorithms such as support vector machines (SVMs) and decision trees, and related concepts such as high-dimensionality, feature selection, and feature engineering. It also touches on applications of machine learning in bioinformatics and genomics.
Typology: Exercises


Machine Learning

Machine learning ✔✔ a form of artificial intelligence in which the program learns from patterns in data rather than being explicitly programmed
Supervised learning ✔✔ type of machine learning in which the response variable is known
Unsupervised learning ✔✔ type of machine learning in which the response variable is unknown
R ✔✔ a programming language built for statistical computing that is often used for machine learning
WEKA ✔✔ a graphical program (or a visual programming language) for machine learning
High-dimensionality ✔✔ a machine-learning problem in which the number of dimensions (features) is much greater than the number of cases
Clustering ✔✔ part of unsupervised learning in which similar cases are grouped together
Classification ✔✔ the process of using machine learning to assign cases to classes based on patterns found in data (examples: classifying tumors as malignant or benign; classifying emails as spam or not spam)
SVM ✔✔ (Support Vector Machine) a machine learning algorithm that computes a hyperplane to separate different classes of data points (example: an SVM could be used to compute a hyperplane that separates data points representing cancer from data points representing not cancer)
OVA ✔✔ (One vs. All) an approach to multi-class problems that builds one SVM per class, each comparing that class to all of the other classes (example: in a problem regarding the diagnosis of 14 different cancers, 14 SVMs would be built, such as breast cancer vs. everything else, prostate cancer vs. everything else, etc.)
Decision tree ✔✔ a machine learning algorithm that builds a tree-shaped map of rules on the features, which can be used to determine which class a specific case falls into
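The One vs. All card above can be illustrated with a small sketch. It only shows how one multi-class label list becomes one binary labeling per class (the input each of the per-class SVMs would be trained on); it does not train any SVM. The function name `one_vs_all_labels` and the sample labels are hypothetical, chosen for illustration.

```python
def one_vs_all_labels(labels):
    """Turn one multi-class label list into one binary label list per class.

    For each class, cases of that class are marked 1 and every other
    case is marked 0 ("this class vs. everything else").
    """
    classes = sorted(set(labels))
    return {
        cls: [1 if label == cls else 0 for label in labels]
        for cls in classes
    }


# Hypothetical labels for five cases drawn from three cancer types.
labels = ["breast", "prostate", "lung", "breast", "lung"]
problems = one_vs_all_labels(labels)

for cls, binary in problems.items():
    print(cls, "vs. everything else:", binary)
```

With three classes this produces three binary problems, in the same way that the 14-cancer example would produce 14.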
Multi-class ✔✔ more than 2 classes
Binary class ✔✔ 2 classes
Gene expression ✔✔ (from Wikipedia) the process by which information from a gene is used in the synthesis of a functional gene product. Products are often proteins, but in non-protein-coding genes such as ribosomal RNA (rRNA), transfer RNA (tRNA) or small nuclear RNA (snRNA) genes, the product is a functional RNA.
Gene expression levels ✔✔ (from Wikipedia) the level at which a particular gene is expressed within a cell, tissue or organism. Ideally, expression is measured by detecting the final gene product (for many genes this is the protein); however, it is often easier to detect one of the precursors, typically mRNA, and infer the gene expression level from it.
DNA microarray ✔✔ technology used to determine gene expression levels
Poorly differentiated cancer ✔✔ a cancer whose cells look so unlike normal tissue that its origin is difficult to determine
LOOCV / cross-validation ✔✔ cross-validation: randomly splitting the data into n groups, then training the model on n-1 groups and testing it on the remaining group, with a different group used for testing each time; LOOCV (leave-one-out cross-validation) is the special case in which n = the number of cases
Feature selection ✔✔ normally used in high-dimensionality problems to pick the features that play a bigger role in making a prediction
Feature engineering ✔✔ transforming features to create new ones, typically to capture non-linear relationships (examples: log, squaring, etc.)
S2N ✔✔ (signal-to-noise ratio) how strong the signal is relative to the noise; if the data is really noisy (many missing values, nonsensical values, etc.), the signal is hard to recover and the S2N is low; if the data is clean, the S2N is high
Synonyms for feature ✔✔ attribute, predictor, variable, independent variable, dimension
Synonyms for response variable ✔✔ prediction, signal, dependent variable
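The cross-validation card above describes a splitting procedure, which can be sketched in plain Python. This is a minimal illustration that only produces the train/test index splits; no model is trained. The function name `cross_validation_splits`, its parameters, and the fixed seed are assumptions made for the example.

```python
import random


def cross_validation_splits(n_cases, n_groups, seed=0):
    """Return a list of (train_indices, test_indices), one pair per group.

    Cases are shuffled, dealt into n_groups groups, and each group is
    used once as the test set while the other n_groups - 1 groups form
    the training set.
    """
    indices = list(range(n_cases))
    random.Random(seed).shuffle(indices)  # random assignment to groups

    # Deal the shuffled indices round-robin into n_groups groups.
    groups = [indices[g::n_groups] for g in range(n_groups)]

    splits = []
    for g, test in enumerate(groups):
        train = [i for h, other in enumerate(groups) if h != g for i in other]
        splits.append((sorted(train), sorted(test)))
    return splits


# 3 groups over 6 cases: each round trains on 4 cases and tests on 2.
three_fold = cross_validation_splits(n_cases=6, n_groups=3)

# LOOCV: n_groups equals the number of cases, so each test set has 1 case.
loocv = cross_validation_splits(n_cases=6, n_groups=6)

for train, test in loocv:
    print("train:", train, "test:", test)
```

Setting `n_groups` equal to `n_cases` is all that distinguishes LOOCV from ordinary cross-validation in this sketch.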