Data Analysis
Andrea Wiggins
IST 400/
April 14, 2008
Data Analysis
- Data are collected, created, and kept for the purpose of analysis
- Without analysis, it’s just a bunch of bits
- Data managers need familiarity with analysis practices
http://flickr.com/photos/techne/100055322/
Overview
- Types of data analysis
- Requirements for analysis
- Basic steps in data analysis
- Types of tools
- Scientific analysis workflows
- Types of analysis output
Requirements for Analysis
- Data
- Analysis design
- Analysis tools
- Computing resources to run the analysis
- Human expertise
http://www.flickr.com/photos/anikarenina/369089979/
Complexity
- Both analytic and computational complexity are relevant
- Some operations are “cheap” and others are “expensive”
- Number of calculations required: every function is made of other functions
- Execution in serial versus parallel
processing: how many tasks at once?
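To make "cheap" versus "expensive" concrete, here is a minimal sketch that counts the basic operations two common tasks need as the data grows. The two tasks (adding up n values, and comparing every pair of n values) and the function names are illustrative assumptions, not examples taken from the lecture.

```python
def count_sum_operations(n):
    """Adding up n values takes one addition per value."""
    operations = 0
    for _ in range(n):
        operations += 1
    return operations


def count_pairwise_operations(n):
    """Comparing every pair of n values takes n * (n - 1) / 2 comparisons."""
    operations = 0
    for i in range(n):
        for _ in range(i + 1, n):
            operations += 1
    return operations


if __name__ == "__main__":
    for n in (10, 100, 1000):
        print(n, count_sum_operations(n), count_pairwise_operations(n))
    # 10 10 45
    # 100 100 4950
    # 1000 1000 499500
```

A hundredfold increase in data makes the sum a hundred times more work, but the pairwise comparison roughly ten thousand times more work.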
Serial & Parallel Processes
Serial
Parallel
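The difference between the two can be sketched in a few lines of Python. This is a rough illustration, assuming the tasks are independent of each other; `slow_square` is a hypothetical stand-in for one unit of work, with a short sleep simulating time spent waiting (for example on a disk or network). Threads are used here for simplicity; in CPython, CPU-heavy work usually needs separate processes to see a real speedup.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Hypothetical task: the sleep simulates waiting on I/O."""
    time.sleep(0.05)
    return x * x


def run_serial(values):
    # One task at a time, each starting only after the previous one ends.
    return [slow_square(v) for v in values]


def run_parallel(values, workers=4):
    # Up to `workers` tasks in flight at once; map() keeps the input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(slow_square, values))


if __name__ == "__main__":
    data = list(range(8))

    start = time.perf_counter()
    serial = run_serial(data)
    serial_time = time.perf_counter() - start

    start = time.perf_counter()
    parallel = run_parallel(data)
    parallel_time = time.perf_counter() - start

    print("same results:", serial == parallel)
    print(f"serial {serial_time:.2f}s, parallel {parallel_time:.2f}s")
```

Both versions return the same answers; only the number of tasks running at once changes.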
Small Scale Computing
- Regular microcomputers like your laptop
- Ordinary consumer PCs are able to do some significant computational work
http://www.flickr.com/photos/cayusa/431036565/
Moderate Scale Computing
- Relatively small, locally-managed clusters
- Google’s smallest cluster: 13 servers
- Reservoir Simulation Joint Industry Project’s cluster: http://www.cpge.utexas.edu/rsjip/
Confirmatory Data Analysis
- Uses statistical tests to confirm or falsify hypotheses
- You know what you’re looking for
- Analysis is usually carefully planned in advance
http://flickr.com/photos/activitystory/105110622/
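As one possible illustration of a confirmatory test, the sketch below uses a permutation test to check a hypothesis chosen in advance: that two groups differ in their mean. The permutation test is just one of many statistical tests and is an assumption of this example, as are the function name and the made-up measurements.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in means; returns a p-value."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Reassign the group labels at random and recompute the difference.
        rng.shuffle(pooled)
        difference = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if difference >= observed:
            at_least_as_extreme += 1
    # The +1 terms count the observed arrangement itself.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


if __name__ == "__main__":
    # Made-up measurements for illustration only.
    control = [1, 2, 3, 4, 5, 6]
    treated = [11, 12, 13, 14, 15, 16]
    print("p-value:", permutation_test(control, treated))
```

A small p-value means a difference this large rarely appears when the group labels are shuffled at random, which counts as evidence against the "no difference" hypothesis.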
Exploratory Data Analysis
- Methods used for data mining
- Nontrivial knowledge discovery from data
- Looking at data to form hypotheses for
CDA testing (on a different data set)
- Don’t always know what you’re looking for, analysis evolves over time
- Caution: sometimes you find what
you’re looking for, even if it isn’t there!
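The caution can be demonstrated with pure noise. The sketch below generates one random "target" variable and many random "candidate" variables that have nothing to do with it, then reports the strongest correlation found. The sample sizes, seed, and function names are assumptions chosen for illustration.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(var_x * var_y)


def strongest_spurious_correlation(n_obs=20, n_candidates=200, seed=1):
    """Search unrelated random variables for the best match to a random target."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_obs)]
        best = max(best, abs(pearson(target, candidate)))
    return best


if __name__ == "__main__":
    print("strongest correlation in pure noise:",
          round(strongest_spurious_correlation(), 2))
```

With only 20 observations and 200 candidates, some candidate will usually look moderately to strongly correlated by chance alone, which is why hypotheses found this way should be tested on a different data set.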
Qualitative Data Analysis
- Most common in social sciences, where data sets are usually smaller
- Uses a variety of methods to analyze non-numerical data
- Many qualitative analysis methods are difficult or impossible to automate
http://flickr.com/photos/valix/939388335/
Context of Analysis
- Scientific inquiry
- Business intelligence
- Monitoring
- Carefully planned regular reporting
- As-needed ad hoc analysis
http://flickr.com/photos/makou0629/1145908929/
Data Analysis: Revise & Run
- Revise the analysis & test again
- Also known as debugging
- Good idea to compare manually and automatically computed results when possible to verify that everything works
- Repeat as needed
- Run the full analysis when ready
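Comparing manual and automatic results can be as simple as the sketch below: a small data set whose answer is easy to work out by hand is run through the analysis code and through an independent library function, and all three values must agree. The data set and function name are illustrative assumptions.

```python
import math
from statistics import pstdev


def population_std(values):
    """The analysis code under test: population standard deviation."""
    average = sum(values) / len(values)
    return math.sqrt(sum((v - average) ** 2 for v in values) / len(values))


# Small sample chosen so the answer is easy to compute by hand:
# mean = 40 / 8 = 5; squared deviations 9, 1, 1, 1, 0, 0, 4, 16 sum to 32;
# 32 / 8 = 4; square root = 2.
SAMPLE = [2, 4, 4, 4, 5, 5, 7, 9]
HAND_COMPUTED_STD = 2.0

if __name__ == "__main__":
    computed = population_std(SAMPLE)
    assert math.isclose(computed, HAND_COMPUTED_STD), "differs from hand result"
    assert math.isclose(computed, pstdev(SAMPLE)), "differs from library result"
    print("manual, library, and analysis code agree:", computed)
```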
Data Analysis: Save Output
- Save/export the analysis results, artifacts, and appropriate metadata
- Data selection criteria, sample, analysis design version
- When analysis was run, by whom
- System details
- Other relevant details dictated by your context of inquiry
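One way to keep results and metadata together is to write them to a single file. The sketch below saves both as JSON; the field names, the `save_results` function, and the choice of JSON are all assumptions for illustration, and a real project would add whatever its context of inquiry requires.

```python
import json
import platform
from datetime import datetime, timezone


def save_results(path, results, selection_criteria, design_version, analyst):
    """Write analysis results plus descriptive metadata to one JSON file."""
    record = {
        "results": results,
        "metadata": {
            "selection_criteria": selection_criteria,  # which data were used
            "design_version": design_version,          # which analysis design
            "run_at": datetime.now(timezone.utc).isoformat(),  # when
            "run_by": analyst,                                 # by whom
            "system": {                                        # system details
                "platform": platform.platform(),
                "python": platform.python_version(),
            },
        },
    }
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(record, handle, indent=2)
    return record


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    save_results(
        "analysis_output.json",
        results={"mean": 5.0, "std": 2.0},
        selection_criteria="all rows with a non-missing reading",
        design_version="v3",
        analyst="example analyst",
    )
```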
Data Analysis: Use
- Write up the results
- Often requires returning to the raw data, analyzed data, and other information about the analysis
- Questions always arise…
- Something looks out of place, doesn’t make sense, can’t possibly be true
- Double-check everything: results, analysis records, analysis metadata
Very Important Details
- Data formats
- Format/s of raw data in source/s
- Format/s required for analysis
- Format/s of outputs: image, csv, statistics, descriptive text, etc.
- Data manipulation
- Moving from source to analysis to usable results, without losing/abusing anything
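A small example of how a format conversion can quietly lose information: an identifier with leading zeros is damaged if it is read as a number. The sketch below, using made-up CSV data and hypothetical column names, keeps the identifier as text, converts only the measurement, and checks that a round trip reproduces the source exactly.

```python
import csv
import io

# Made-up source data: site_id is an identifier, not a quantity.
RAW = "site_id,reading\n00123,4.50\n00456,7.25\n"


def parse(text):
    """Read CSV text into typed records, keeping identifiers as strings."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        rows.append({"site_id": row["site_id"],
                     "reading": float(row["reading"])})
    return rows


def to_csv(rows):
    """Write records back out in the source format (two decimal places)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["site_id", "reading"],
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow({"site_id": row["site_id"],
                         "reading": f"{row['reading']:.2f}"})
    return buffer.getvalue()


if __name__ == "__main__":
    print("round trip preserved the data:", to_csv(parse(RAW)) == RAW)
    # Reading the identifier as a number drops the leading zeros.
    print("as text:", "00123", "| as integer:", int("00123"))
```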
Analysis Workflows
- Scientists “need access to tools and services that help ensure that metadata are automatically captured or created in real-time” - Cyberinfrastructure Vision for 21st Century Discovery
- Taverna Workbench demo video
- Example of a scientific workflow analysis tool, used for genetics - and social science!
- http://floss.syr.edu/Presentations/TavernaDemoRedux.m4v
Analysis Outputs
- Most analysis starts as numbers and ends up as words
- Scholarly articles
- White papers
- Technical reports
- Visualizations
- More on Wednesday
http://flickr.com/photos/thiru/278930492/
Dashboard Reports
- At-a-glance reports for regular, ongoing monitoring
- Uses many visualizations
- Usually intended for managers & executives
http://flickr.com/photos/jauladeardilla/345883088/
Concluding Thoughts
- Understanding how data is used will help you manage it better
- Planning ahead makes data analysis go
more smoothly
- Data analysis almost never goes perfectly
- Analysis is the fun part of research,
when discoveries are made