Presentation about Apache Spark MLlib: graphics and mind maps for Data Analysis (and Programming) with R

Universität Duisburg-Essen Data Analysis (and Programming) with R

Prof. Khbaiz Hadil

Apache Spark MLlib • Introduction • Features • Classification • Clustering • MLlib pipeline concept • Pros and cons • Installation • Demo/use cases • Conclusion • Resources

Type: graphics and mind maps

2021/2022

Uploaded on 03.12.2022

hadeel-khibaiz 🇩🇪

Main Concepts

Introduction
Features
- Classification
- Clustering
- MLlib pipeline concept
Pros and cons
Installation
Demo/use cases
Conclusion
Resources

Introduction

High-Level Goals

MLlib is Apache Spark's machine learning library. Its goals are to:
- make practical ML scalable and easy
- simplify the development and deployment of scalable machine learning pipelines

The primary ML API for Spark is now the DataFrame-based API in spark.ml; the RDD-based API in spark.mllib is in maintenance mode.
What are the implications?
- MLlib will still support the RDD-based API in spark.mllib with bug fixes
- MLlib will not add new features to the RDD-based API
- Spark 2.x documentation expected the RDD-based API to be removed in Spark 3.0; in practice, spark.mllib still ships with Spark 3.x in maintenance mode
Why is MLlib switching to the DataFrame-based API?
- DataFrames provide a more user-friendly API than RDDs
- DataFrames facilitate practical ML pipelines, particularly feature transformations

Features

Classification

MLlib provides algorithms for classification, such as Decision Tree, Naive Bayes, …
Naive Bayes is a simple probabilistic classifier that assumes independence between every pair of features.

Clustering

K-means is one of the most commonly used clustering algorithms; it groups data points into a predefined number of clusters.
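As a rough illustration of what "a predefined number of clusters" means, the following plain-Python sketch runs Lloyd's algorithm, the standard iteration behind k-means, on a handful of 2-D points. It does not use Spark. The function name `kmeans`, the sample points, and the choice of the first k points as initial centers are assumptions made for this example; MLlib's own implementation runs distributed and uses a more careful initialization.

```python
import math

def kmeans(points, k, max_iter=20):
    """Cluster 2-D points into k groups with Lloyd's algorithm (toy version)."""
    # Assumption for this sketch: use the first k points as initial centers.
    centers = [points[i] for i in range(k)]
    assignments = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: move each center to the mean of its points.
        new_centers = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centers.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:  # converged: centers stopped moving
            break
        centers = new_centers
    return centers, assignments

# Hypothetical sample data: two visibly separate groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.5, 2.0), (8.0, 8.0), (1.0, 0.5), (9.0, 8.5)]
centers, assignments = kmeans(points, k=2)
print(assignments)
print(centers)
```

The number of clusters `k` is fixed up front, which is exactly the "predefined" part: the algorithm never decides on its own how many groups exist.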

MLlib Pipeline Concept

[Diagram: Load/Clean Data, Transformer, Estimator, Evaluator]

The evaluator measures model performance and helps automate the model tuning process.
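To make the evaluator's role concrete, here is a small plain-Python sketch (not Spark code) of automated tuning: each candidate value of a hyperparameter is used to build a model, an evaluator scores every model on held-out data, and the best-scoring candidate wins. The threshold classifier, the `accuracy` evaluator, and the data are hypothetical stand-ins; in Spark, classes such as `CrossValidator` and `ParamGridBuilder` in `pyspark.ml.tuning` play this role.

```python
def make_model(threshold):
    """Hypothetical model: predict label 1 when the feature exceeds the threshold."""
    return lambda x: 1 if x > threshold else 0

def accuracy(model, data):
    """Evaluator: fraction of (feature, label) pairs the model predicts correctly."""
    correct = sum(1 for x, label in data if model(x) == label)
    return correct / len(data)

def tune(candidates, validation_data):
    """Score one model per candidate threshold; return the best (threshold, score)."""
    scored = [(t, accuracy(make_model(t), validation_data)) for t in candidates]
    return max(scored, key=lambda pair: pair[1])

# Hypothetical validation set of (feature, label) pairs.
validation_data = [(0.2, 0), (0.4, 0), (0.6, 1), (0.7, 1), (0.9, 1), (0.3, 0)]
best_threshold, best_score = tune([0.1, 0.5, 0.8], validation_data)
print(best_threshold, best_score)
```

Because the evaluator reduces each model to a single comparable number, the search over candidates can run without manual inspection.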

Pipeline

- Represents an ML workflow
- Consists of a set of stages
- Leverages the uniform API of transformers and estimators
- Is itself a type of estimator, trained with fit()
- Can be persisted
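The relationship between transformers, estimators and a pipeline can be sketched in plain Python. This is only an illustration of the pattern, not Spark's implementation: the classes `Lowercase`, `MajorityLabel` and `MajorityLabelModel` are hypothetical, and rows are simple dicts rather than DataFrames. In the sketch, a transformer offers `transform()`, an estimator offers `fit()` and returns a fitted model (itself a transformer), and a pipeline is an estimator whose `fit()` runs its stages in order.

```python
class Lowercase:
    """Transformer: turns rows into new rows without learning anything."""
    def transform(self, rows):
        return [{**r, "text": r["text"].lower()} for r in rows]

class MajorityLabelModel:
    """Fitted model (also a transformer): adds a constant prediction."""
    def __init__(self, label):
        self.label = label
    def transform(self, rows):
        return [{**r, "prediction": self.label} for r in rows]

class MajorityLabel:
    """Estimator: fit() learns the most frequent label and returns a model."""
    def fit(self, rows):
        labels = [r["label"] for r in rows]
        return MajorityLabelModel(max(set(labels), key=labels.count))

class PipelineModel:
    """Result of Pipeline.fit(): a chain of transformers applied in order."""
    def __init__(self, stages):
        self.stages = stages
    def transform(self, rows):
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows

class Pipeline:
    """A pipeline is itself an estimator: fit() fits each stage in order."""
    def __init__(self, stages):
        self.stages = stages
    def fit(self, rows):
        fitted = []
        for stage in self.stages:
            if hasattr(stage, "fit"):     # estimator: train it on the current rows
                stage = stage.fit(rows)
            rows = stage.transform(rows)  # pass transformed rows to the next stage
            fitted.append(stage)
        return PipelineModel(fitted)

# Hypothetical training rows.
train = [
    {"text": "Spark ML", "label": 1},
    {"text": "RDD", "label": 0},
    {"text": "Pipelines", "label": 1},
]
model = Pipeline([Lowercase(), MajorityLabel()]).fit(train)
result = model.transform([{"text": "New DOC"}])
print(result)
```

The uniform `fit()`/`transform()` interface is what lets the pipeline treat every stage the same way, whatever it does internally.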

Extraction

- Extracting features from raw data
- Word2Vec is an estimator that takes sequences of words representing documents and trains a Word2VecModel
- The model maps each word to a unique fixed-size vector
- These vectors can then be used as features for prediction, document similarity calculations, …
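How fixed-size word vectors become document features can be shown with a small plain-Python sketch. The three-dimensional vectors below are invented for illustration (a real Word2Vec model learns them from text). The sketch averages the word vectors of each document, which is also how Spark's `Word2VecModel` turns a document into a single vector, and then compares documents by cosine similarity.

```python
import math

# Hypothetical word vectors; a trained Word2Vec model would supply these.
word_vectors = {
    "spark":   (0.9, 0.1, 0.0),
    "cluster": (0.8, 0.2, 0.1),
    "recipe":  (0.0, 0.9, 0.3),
    "pasta":   (0.1, 0.8, 0.4),
}

def document_vector(words):
    """Average the vectors of all known words in a document."""
    vectors = [word_vectors[w] for w in words if w in word_vectors]
    return tuple(sum(dim) / len(vectors) for dim in zip(*vectors))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

doc_a = document_vector(["spark", "cluster"])
doc_b = document_vector(["cluster", "spark"])
doc_c = document_vector(["recipe", "pasta"])

sim_same_topic = cosine_similarity(doc_a, doc_b)
sim_other_topic = cosine_similarity(doc_a, doc_c)
print(sim_same_topic, sim_other_topic)
```

Documents built from the same words end up with the same vector, while documents on unrelated topics point in clearly different directions, which is what makes these vectors usable as features.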

Pros & Cons

Pros
- Scalability
- Performance
- User-friendly APIs
- Integration with Spark SQL, Streaming & GraphX

Cons
- Configurability
- Reliability
- High memory consumption
