Grids and Workflows: Scientific Workflow Execution on Data Grids, Papers of Computer Science

An overview of scientific workflows and Grids, focusing on workflow design, scheduling, fault tolerance, and data movement. It covers systems such as Kepler, Chimera, and GridDB, and discusses the challenges of executing workflows on the Grid. Students and researchers in computer science, data science, and engineering may find it useful for understanding the concepts and technologies behind Grid computing and scientific workflow execution.

Uploaded on 08/16/2009

Grids and Workflows

Overview

  • Scientific workflows and Grids
    • Taxonomy
    • Example systems
      • Kepler revisited
  • Data Grids
    • Chimera
    • GridDB

Executing Scientific Workflows on Grids

  • Grids can address many challenges of scientific workflow execution
    • Scalability
    • Detached execution
  • Many systems have been developed to aid in design and execution of Grid workflows

Taxonomy

  • Classifies 4 elements of workflow systems in context of Grid computing
    • Workflow design
    • Workflow scheduling
    • Fault tolerance
    • Data movement

Workflow Design

  • Workflow Model/Specification defines the workflow, including task definition and structure definition
    • Abstract model
      • Workflow specified without referring to specific resources
    • Concrete model
      • Binds workflow tasks to specific resources
    • Applications that use an abstract model can generate the concrete model before or during execution
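The step from an abstract to a concrete model can be sketched as below. This is a minimal illustration, not the mechanism of any particular system: the `AbstractTask` class, the `bind` function, the capability labels, and the host names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AbstractTask:
    """A task described only by what it needs, not by where it runs."""
    name: str
    requires: str                       # hypothetical capability label, e.g. "cpu"
    depends_on: list = field(default_factory=list)


def bind(tasks, resources):
    """Produce a concrete model: a mapping from task name to resource name.

    `resources` maps a hypothetical resource name to the set of
    capabilities it offers. The first matching resource wins.
    """
    concrete = {}
    for task in tasks:
        for resource, capabilities in resources.items():
            if task.requires in capabilities:
                concrete[task.name] = resource
                break
        else:
            raise LookupError(f"no resource offers {task.requires!r} for {task.name}")
    return concrete


# Abstract model: structure and requirements only, no hosts named.
abstract_workflow = [
    AbstractTask("extract", requires="storage"),
    AbstractTask("simulate", requires="cpu", depends_on=["extract"]),
    AbstractTask("plot", requires="cpu", depends_on=["simulate"]),
]

# Made-up resource catalogue used to generate the concrete model.
grid = {
    "archive.example.org": {"storage"},
    "cluster.example.org": {"cpu", "storage"},
}

concrete_workflow = bind(abstract_workflow, grid)
print(concrete_workflow)
```

Calling `bind` once before execution corresponds to generating the concrete model up front; calling it per task as each one becomes ready would correspond to generating it during execution.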

Workflow Design

  • Workflow Composition System enables users to assemble components into workflows
    • User-directed
      • Users edit workflows directly
      • Language-based (e.g., XML)
      • Graph-based (e.g., Kepler)
    • Automatic
      • Generate workflows from higher-level requirements, e.g., data products, input values
      • Difficult to capture functionality of components

Workflow Scheduling

  • How to map workflows onto resources?
  • Decisions can be based on current task or subworkflow (local) or entire workflow (global)
  • Global decisions may produce better results, but high overhead
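The trade-off between local and global decisions can be illustrated with a toy cost model. Everything here is an assumption made for the sketch: the run times, the fixed transfer cost, and the exhaustive search standing in for a global scheduler.

```python
from itertools import product

# Hypothetical run times (seconds) for a three-task chain on two resources.
RUN_TIME = {
    "A": {"r1": 4, "r2": 5},
    "B": {"r1": 9, "r2": 2},
    "C": {"r1": 9, "r2": 2},
}
TASKS = ["A", "B", "C"]           # executed in this order
RESOURCES = ["r1", "r2"]
TRANSFER = 6                       # assumed cost of moving data between resources


def total_cost(placement):
    """Run time of every task plus a transfer cost at each resource change."""
    cost = sum(RUN_TIME[t][r] for t, r in zip(TASKS, placement))
    cost += sum(TRANSFER for a, b in zip(placement, placement[1:]) if a != b)
    return cost


def schedule_local():
    """Decide one task at a time, looking only at the current task."""
    placement = []
    for task in TASKS:
        def step_cost(resource):
            moved = bool(placement) and placement[-1] != resource
            return RUN_TIME[task][resource] + (TRANSFER if moved else 0)
        placement.append(min(RESOURCES, key=step_cost))
    return tuple(placement)


def schedule_global():
    """Score every placement of the whole workflow and keep the cheapest."""
    candidates = list(product(RESOURCES, repeat=len(TASKS)))
    return min(candidates, key=total_cost), len(candidates)


local = schedule_local()
best, examined = schedule_global()
print("local :", local, total_cost(local))
print("global:", best, total_cost(best), "after examining", examined, "placements")
```

With these numbers the local scheduler starts on `r1` because it is fastest for the first task, then pays a transfer to move to `r2`. The global scheduler accepts a slower first task to avoid the transfer, but it has to examine every placement, and that count grows exponentially with the number of tasks.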

Workflow Scheduling

  • How to translate abstract models to concrete models?
  • Static: concrete models generated before execution
    • User directed or simulation based
  • Dynamic: make decisions at runtime
    • Prediction-based or just in time
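A rough sketch of the difference between the two approaches follows. The round-robin static plan and the least-loaded dynamic rule are assumptions chosen for brevity, and the `load` dictionary stands in for live monitoring data.

```python
def plan_static(tasks, resources):
    """Static: fix the whole mapping before execution (round-robin here)."""
    return {task: resources[i % len(resources)] for i, task in enumerate(tasks)}


def run_dynamic(tasks, load):
    """Dynamic, just in time: pick the least-loaded resource as each task starts.

    `load` maps resource name to its current queue length and is updated
    as tasks are dispatched.
    """
    mapping = {}
    for task in tasks:
        target = min(load, key=load.get)
        mapping[task] = target
        load[target] += 1
    return mapping


tasks = ["t1", "t2", "t3"]
static_plan = plan_static(tasks, ["r1", "r2"])
dynamic_plan = run_dynamic(tasks, {"r1": 3, "r2": 0})
print(static_plan)
print(dynamic_plan)
```

The static plan ignores the fact that `r1` is already busy, while the dynamic rule sends work to `r2` until the queues even out.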

Fault Tolerance

  • Failures may occur for a variety of reasons: network failure, overloaded resource conditions, non-availability of components
  • Failure handling: task-level and workflow-level
    • Task-level: mask the effects of the failure
    • Workflow-level: manipulate workflow structure

Fault Tolerance

  • Task level
    • Retry
    • Alternate resource
    • Checkpoint/restart
    • Replication
  • Workflow level
    • Alternate task
    • Redundancy
    • User-defined exception handling
    • Rescue workflow
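Two of the task-level techniques, retry and alternate resource, can be combined in a few lines. This is a sketch only: `run_with_recovery` and `flaky_task` are hypothetical names, and failure is modelled as a `RuntimeError`. Checkpoint/restart, replication, and the workflow-level techniques are not shown.

```python
def run_with_recovery(task, resources, max_retries=2):
    """Try `task` on each resource in turn, retrying before moving on.

    `task` is any callable taking a resource name; it signals failure by
    raising RuntimeError. Returns (result, log of attempts).
    """
    attempts = []
    for resource in resources:                        # alternate resource
        for attempt in range(1, max_retries + 1):     # retry on the same resource
            try:
                result = task(resource)
                attempts.append((resource, attempt, "ok"))
                return result, attempts
            except RuntimeError as error:
                attempts.append((resource, attempt, str(error)))
    raise RuntimeError(f"task failed on all resources: {attempts}")


def flaky_task(resource):
    """Hypothetical task that only succeeds on the second resource."""
    if resource == "r1":
        raise RuntimeError("r1 overloaded")
    return f"output from {resource}"


result, log = run_with_recovery(flaky_task, ["r1", "r2"])
print(result)
for entry in log:
    print(entry)
```

The caller sees only the successful result, which is what masking the effects of a failure at task level means: the workflow structure is untouched.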

Intermediate Data Movement

  • Centralized
    • Easy to implement
    • Good when large-scale data flow is not required
  • Mediated
    • Intermediate data managed by a distributed data management system
    • Good when the data should be kept for later use
  • Peer-to-Peer
    • Good for large-scale data transfer
    • But more difficult to deploy
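One way to see why the centralized approach suits small data flows is to count the copies each approach needs. The sketch below assumes a hypothetical central point called `hub` and models a workflow as a list of producer/consumer host pairs. The mediated approach is left out because its cost depends on the data management system used.

```python
def transfers(edges, mode, hub="hub"):
    """List the (source, destination) copies needed for each workflow edge.

    `edges` are (producer_host, consumer_host) pairs. In "centralized" mode
    every intermediate result passes through `hub`; in "p2p" mode it goes
    straight from producer to consumer.
    """
    if mode == "centralized":
        return [hop for src, dst in edges for hop in ((src, hub), (hub, dst))]
    if mode == "p2p":
        return list(edges)
    raise ValueError(f"unknown mode: {mode}")


edges = [("h1", "h2"), ("h2", "h3")]
print(transfers(edges, "centralized"))
print(transfers(edges, "p2p"))
```

Under this model the centralized approach makes two copies per edge and sends all of them through one host, while peer-to-peer makes one copy per edge but requires every host to be able to reach every other.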

Some examples

  • Kepler
  • Taverna
  • Triana
  • GrADS
  • Pegasus

Taverna

  • Workflow management system of the myGrid project
  • Workflow can be expressed either graphically (Kepler-like GUI) or in an XML-based language (SCUFL)
  • Allows implicit iteration over incoming datasets
  • Allows multithreading to speed up iteration
  • Good for services capable of simultaneous processing, e.g., those backed by a cluster
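The idea of implicit iteration with multithreading can be approximated in plain Python. This is not Taverna's API or implementation: `invoke` and `call_service` are hypothetical stand-ins, and a thread pool plays the role of the concurrent service calls.

```python
from concurrent.futures import ThreadPoolExecutor


def call_service(sequence):
    """Stand-in for a remote service call that handles one item."""
    return sequence.upper()


def invoke(service, value, max_threads=4):
    """Call `service` once for a single value, or once per item for a list."""
    if isinstance(value, list):
        with ThreadPoolExecutor(max_workers=max_threads) as pool:
            return list(pool.map(service, value))    # results keep input order
    return service(value)


single = invoke(call_service, "acgt")
many = invoke(call_service, ["acgt", "ttga", "ccat"])
print(single)
print(many)
```

The workflow author writes the service for one item and the iteration happens implicitly when a list arrives. The threads only help if the service behind the call can process several requests at once, which is the cluster-backed case the slide mentions.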

Triana

  • Visual workflow-oriented data analysis environment
  • Clients can log in to Triana Controlling Service (TCS)
  • TCS can execute locally or distribute based on distribution policy
    • Parallel: no host-based communication
    • Peer-to-peer: intermediate data passed between hosts

  • Resources dynamically allocated