Data Analysis - Lecture Slides - Fall 2008 | IST 400, Study Guides, Projects, Research of Information Technology

Material Type: Project; Class: Selected Topics; Subject: Information Studies; University: Syracuse University; Term: Spring 2009;

Uploaded on 08/09/2009



Data Analysis

Andrea Wiggins

IST 400/600

April 14, 2008

Data Analysis

  • Data are collected, created, and kept for the purpose of analysis
  • Without analysis, it’s just a bunch of bits
  • Data managers need familiarity with analysis practices

http://flickr.com/photos/techne/100055322/

Overview

  • Types of data analysis
  • Requirements for analysis
  • Basic steps in data analysis
  • Types of tools
  • Scientific analysis workflows
  • Types of analysis output

Requirements for Analysis

  • Data
  • Analysis design
  • Analysis tools
  • Computing resources to run the analysis
  • Human expertise

http://www.flickr.com/photos/anikarenina/369089979/

Complexity

  • Both analytic and computational complexity are relevant
  • Some operations are “cheap” and others are “expensive”
  • Number of calculations required - every function is made of other functions
  • Execution in serial versus parallel processing: how many tasks at once?
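
A minimal sketch of “cheap” versus “expensive” operations, assuming Python and a hypothetical duplicate-detection task that is not part of the slides. Both functions answer the same question, but the number of basic operations they need grows very differently with the size of the input.

```python
def has_duplicates_pairwise(values):
    """Compare every pair: n * (n - 1) / 2 comparisons in the worst case."""
    comparisons = 0
    for i in range(len(values)):
        for j in range(i + 1, len(values)):
            comparisons += 1
            if values[i] == values[j]:
                return True, comparisons
    return False, comparisons


def has_duplicates_with_set(values):
    """Remember what has been seen: one set lookup per value."""
    seen = set()
    lookups = 0
    for value in values:
        lookups += 1
        if value in seen:
            return True, lookups
        seen.add(value)
    return False, lookups


data = list(range(1000))  # hypothetical input with no duplicates (worst case)
print(has_duplicates_pairwise(data))  # (False, 499500)
print(has_duplicates_with_set(data))  # (False, 1000)
```

At 1,000 values the pairwise version already does roughly 500 times more work, and the gap widens as the data set grows.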

Serial & Parallel Processes

Serial

Parallel
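
A rough illustration of the serial/parallel distinction, assuming Python's standard `concurrent.futures` module. The `slow_task` function and its 0.1-second sleep are hypothetical stand-ins for one unit of analysis work that spends its time waiting (for example on a file or a network request).

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_task(n):
    """Stand-in for one unit of work; the sleep simulates waiting on I/O."""
    time.sleep(0.1)
    return n * n


def run_serial(inputs):
    """One task at a time, each starting only after the previous one ends."""
    return [slow_task(n) for n in inputs]


def run_parallel(inputs, workers=4):
    """Several tasks in flight at once; map() returns results in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(slow_task, inputs))


def timed(func, *args):
    """Return the function's result together with the elapsed seconds."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


inputs = [1, 2, 3, 4]
serial_result, serial_seconds = timed(run_serial, inputs)
parallel_result, parallel_seconds = timed(run_parallel, inputs)
print(serial_result == parallel_result)  # same answers either way
print(f"serial: {serial_seconds:.2f}s, parallel: {parallel_seconds:.2f}s")
```

Threads help here because the tasks mostly wait. For CPU-heavy calculations in Python, a process pool is the usual substitute, and the speedup is limited by the number of available cores.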

Small Scale Computing

  • Regular microcomputers like your laptop
  • Ordinary consumer PCs are able to do some significant computational work

http://www.flickr.com/photos/cayusa/431036565/

Moderate Scale Computing

  • Relatively small, locally-managed clusters
    • Google’s smallest cluster: 13 servers
    • Reservoir Simulation Joint Industry Project’s cluster: http://www.cpge.utexas.edu/rsjip/

Confirmatory Data Analysis

  • Uses statistical tests to confirm or falsify hypotheses
  • You know what you’re looking for
  • Analysis is usually carefully planned in advance

http://flickr.com/photos/activitystory/105110622/
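
The slides do not name a specific statistical test, so the following is only one possible illustration: an exact two-sided permutation test on the difference in group means, written with the Python standard library. The hypothesis (the treatment changes the measurement) is fixed before looking at the data, and the two small groups are made-up numbers.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_test(group_a, group_b):
    """Two-sided exact permutation test on the difference in means.

    Returns the share of all possible regroupings whose difference is at
    least as large as the observed one (the p-value).
    """
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    extreme = 0
    total = 0
    for chosen in combinations(range(len(pooled)), len(group_a)):
        chosen = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance so floating-point rounding cannot hide a tie.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


control = [12, 15, 11, 14]   # hypothetical measurements
treated = [21, 19, 23, 20]   # hypothetical measurements
p_value = exact_permutation_test(control, treated)
print(f"p = {p_value:.4f}")  # p = 0.0286
```

Only 2 of the 70 possible regroupings are as extreme as the observed split, so the result would count as significant at the conventional 0.05 level chosen in advance.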

Exploratory Data Analysis

  • Methods used for data mining
    • Nontrivial knowledge discovery from data
  • Looking at data to form hypotheses for CDA (confirmatory data analysis) testing on a different data set
  • Don’t always know what you’re looking for; analysis evolves over time
  • Caution: sometimes you find what you’re looking for, even if it isn’t there!
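
The caution above can be made concrete with a small simulation, sketched here under the assumption that every variable is pure random noise. The function names, sample sizes, and seed are all hypothetical choices for illustration. Searching enough unrelated variables almost always turns up one that looks strongly correlated with the target.

```python
import random


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_points=10, n_candidates=1000, seed=0):
    """Return the largest |r| between a noise target and noise candidates."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson_r(target, candidate)))
    return best


best = strongest_chance_correlation()
print(f"strongest |r| among 1000 pure-noise variables: {best:.2f}")
```

This is why a pattern found during exploration should be tested on a different data set before it is treated as a finding.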

Qualitative Data Analysis

  • Most common in social sciences, where data sets are usually smaller
  • Uses a variety of methods to analyze non-numerical data
  • Many qualitative analysis methods are difficult or impossible to automate

http://flickr.com/photos/valix/939388335/

Context of Analysis

  • Scientific inquiry
  • Business intelligence
  • Monitoring
    • Carefully planned regular reporting
    • As-needed ad hoc analysis

http://flickr.com/photos/makou0629/1145908929/

Data Analysis: Revise & Run

  • Revise the analysis & test again
    • Also known as debugging
    • Good idea to compare manually and automatically computed results when possible to verify that everything works
    • Repeat as needed
  • Run the full analysis when ready
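
One way to compare manual and automated results, sketched in Python under the assumption that the analysis is a simple summary. The `summarize` function and the four-value sample are hypothetical; the point is to pick a sample small enough to work out by hand and check it before running the full analysis.

```python
from math import isclose
from statistics import mean, median


def summarize(values):
    """The automated analysis step under test."""
    return {"n": len(values), "mean": mean(values), "median": median(values)}


# Tiny sample whose summary is easy to compute by hand:
# sum = 2 + 4 + 4 + 10 = 20, so mean = 20 / 4 = 5
# middle two sorted values are 4 and 4, so median = 4
sample = [2, 4, 4, 10]
expected_by_hand = {"n": 4, "mean": 5, "median": 4}

computed = summarize(sample)
mismatches = {
    key: (expected_by_hand[key], computed[key])
    for key in expected_by_hand
    if not isclose(computed[key], expected_by_hand[key])
}
if mismatches:
    raise ValueError(f"manual and automated results differ: {mismatches}")
print("manual and automated results agree; ready for the full run")
```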

Data Analysis: Save Output

  • Save/export the analysis results, artifacts, and appropriate metadata

  • Data selection criteria, sample, analysis design version
  • When analysis was run, by whom
  • System details
    • Time to run, exceptions
  • Other relevant details dictated by your context of inquiry
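
A minimal sketch of saving results together with the metadata listed above, assuming Python and JSON files. The function name `run_and_save`, the file names, and the metadata field names are all hypothetical; a real project would adapt them to its own context of inquiry.

```python
import json
import platform
import sys
import tempfile
import time
from datetime import datetime, timezone
from pathlib import Path


def run_and_save(analysis, data, out_dir, selection_criteria, design_version, run_by):
    """Run the analysis, then write results.json and metadata.json."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    started = datetime.now(timezone.utc)
    clock = time.perf_counter()
    exceptions = []
    try:
        results = analysis(data)
    except Exception as exc:  # record the failure instead of losing it
        results = None
        exceptions.append(repr(exc))
    metadata = {
        "selection_criteria": selection_criteria,
        "sample_size": len(data),
        "design_version": design_version,
        "run_at_utc": started.isoformat(),
        "run_by": run_by,
        "system": {
            "platform": platform.platform(),
            "python": sys.version.split()[0],
        },
        "seconds_to_run": time.perf_counter() - clock,
        "exceptions": exceptions,
    }
    (out_dir / "results.json").write_text(json.dumps(results, indent=2))
    (out_dir / "metadata.json").write_text(json.dumps(metadata, indent=2))
    return metadata


# Usage with made-up values, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    run_and_save(
        analysis=lambda rows: {"total": sum(rows)},
        data=[1, 2, 3],
        out_dir=tmp,
        selection_criteria="all rows from the hypothetical 2008 survey",
        design_version="v1",
        run_by="analyst",
    )
    saved = json.loads((Path(tmp) / "results.json").read_text())
    print(saved)  # {'total': 6}
```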

Data Analysis: Use

  • Write up the results
    • Often requires returning to the raw data, analyzed data, and other information about the analysis
  • Questions always arise…
    • Something looks out of place, doesn’t make sense, can’t possibly be true
    • Double-check everything: results, analysis records, analysis metadata

Very Important Details

  • Data formats
    • Format/s of raw data in source/s
    • Format/s required for analysis
    • Format/s of outputs: image, csv, statistics, descriptive text, etc.
  • Data manipulation
    • Moving from source to analysis to usable results, without losing/abusing anything
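
The move from source format to analysis format to output format can be sketched as follows, assuming CSV as both the raw and the output format and Python dictionaries as the analysis format. The column names and values are invented for illustration. The checks at the end guard against losing rows or silently turning a missing value into a number.

```python
import csv
import io

# Hypothetical raw data; site B has a missing reading.
RAW_CSV = "site,reading\nA,1.5\nB,\nC,2.5\n"


def load_rows(text):
    """Source format (CSV text) -> analysis format (dicts with typed values)."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        raw = record["reading"].strip()
        # Keep a missing reading as None rather than guessing 0.
        reading = float(raw) if raw else None
        rows.append({"site": record["site"], "reading": reading})
    return rows


def write_rows(rows):
    """Analysis format -> output format (CSV text); missing values stay empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["site", "reading"], lineterminator="\n"
    )
    writer.writeheader()
    for row in rows:
        reading = "" if row["reading"] is None else row["reading"]
        writer.writerow({"site": row["site"], "reading": reading})
    return buffer.getvalue()


rows = load_rows(RAW_CSV)
round_trip = load_rows(write_rows(rows))
print(len(rows))           # 3 (no rows lost on the way in)
print(round_trip == rows)  # True (nothing changed on the way out)
```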

Analysis Workflows

  • Scientists “need access to tools and services that help ensure that metadata are automatically captured or created in real-time” - Cyberinfrastructure Vision for 21st Century Discovery
  • Taverna Workbench demo video
    • Example of a scientific workflow analysis tool, used for genetics - and social science!
    • http://floss.syr.edu/Presentations/TavernaDemoRedux.m4v
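
The idea of capturing metadata automatically as a workflow runs can be illustrated with a small Python decorator. This is not how Taverna works and uses none of its interfaces; the `tracked` decorator, the `PROVENANCE` log, and the two steps are hypothetical names invented for this sketch.

```python
import functools
import time
from datetime import datetime, timezone

PROVENANCE = []  # hypothetical in-memory log, one record per executed step


def tracked(step):
    """Wrap a workflow step so that running it also records metadata."""
    @functools.wraps(step)
    def wrapper(data):
        started = datetime.now(timezone.utc).isoformat()
        clock = time.perf_counter()
        result = step(data)
        PROVENANCE.append({
            "step": step.__name__,
            "started_utc": started,
            "seconds": time.perf_counter() - clock,
            "items_in": len(data),
            "items_out": len(result),
        })
        return result
    return wrapper


@tracked
def drop_missing(values):
    """Remove missing values before analysis."""
    return [v for v in values if v is not None]


@tracked
def normalize(values):
    """Scale values so the largest becomes 1.0."""
    top = max(values)
    return [v / top for v in values]


result = normalize(drop_missing([2, None, 4, 8]))
print(result)  # [0.25, 0.5, 1.0]
for record in PROVENANCE:
    print(record["step"], record["items_in"], "->", record["items_out"])
```

Because the record is written by the wrapper rather than by the analyst, the metadata exist even when nobody remembers to write them down.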

Analysis Outputs

  • Most analysis starts as numbers and ends up as words
    • Scholarly articles
    • White papers
    • Technical reports
  • Visualizations
    • More on Wednesday

http://flickr.com/photos/thiru/278930492/

Dashboard Reports

  • At-a-glance reports for regular, ongoing monitoring
  • Uses many visualizations
  • Usually intended for managers & executives

http://flickr.com/photos/jauladeardilla/345883088/

Concluding Thoughts

  • Understanding how data is used will help you manage it better
  • Planning ahead makes data analysis go more smoothly
  • Data analysis almost never goes perfectly
  • Analysis is the fun part of research, when discoveries are made