e-Science is scientific research enabled by widely distributed computational resources and carried out in collaboration among several institutes. One of the issues in making use of e-Science infrastructure is defining complex workflows (compositions of many tasks and their dependencies). We propose employing Rake as a workflow definition language. In contrast to a Makefile, a Rakefile is written in an internal DSL and so can take advantage of Ruby's scripting power, which is required to define complex scientific workflows. To execute Rake workflows on distributed computing resources, we are developing Pwrake, a Parallel Workflow extension for Rake. Pwrake is designed to work on Gfarm, a wide-area distributed file system. Gfarm provides a unified file system and consistent file timestamps across distributed computers, as well as high-performance parallel I/O. We demonstrate the power of Rake as a workflow language and the scalable performance of distributed computing using Pwrake and Gfarm.
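As a rough illustration of what "internal DSL" means here, the sketch below defines a small two-stage workflow with plain Rake, generating one file task per input in an ordinary Ruby loop. It is not taken from the presentation: the module name `SketchWorkflow`, the file names, and the upper-casing "processing" step are all hypothetical stand-ins, and the script runs with standard Rake on a local temporary directory rather than with Pwrake on Gfarm.

```ruby
require "rake"
require "tmpdir"

# Hypothetical two-stage workflow (all names are illustrative assumptions):
#   rawN.txt -> outN.txt   (one independent task per input)
#   out*.txt -> summary.txt (one merge task depending on all outputs)
module SketchWorkflow
  extend Rake::DSL

  # Defines the tasks for `count` inputs in `dir` and returns the
  # name of the final target.
  def self.define(dir, count)
    # Because Rake is an internal DSL, tasks can be generated with
    # ordinary Ruby iteration instead of Makefile pattern rules.
    results = (1..count).map do |i|
      src = File.join(dir, "raw#{i}.txt")
      dst = File.join(dir, "out#{i}.txt")
      file dst => src do |t|
        # Stand-in for a real processing step: upper-case the input.
        File.write(t.name, File.read(t.prerequisites.first).upcase)
      end
      dst
    end

    summary = File.join(dir, "summary.txt")
    file summary => results do |t|
      # Merge step: concatenate all intermediate results in order.
      File.write(t.name, t.prerequisites.map { |p| File.read(p) }.join)
    end
    summary
  end
end

Dir.mktmpdir do |dir|
  3.times { |i| File.write(File.join(dir, "raw#{i + 1}.txt"), "sample #{i + 1}\n") }
  summary = SketchWorkflow.define(dir, 3)
  Rake::Task[summary].invoke # builds out1..out3 first, then the summary
  puts File.read(summary)
end
```

The per-input tasks do not depend on one another, which is the kind of structure a parallel executor can exploit; plain Rake, as used in this sketch, simply invokes them one after another.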
Pwrake: a Distributed Workflow Engine for e-Science
This presentation, by Masahiro Tanaka, is licensed under a Creative Commons Attribution ShareAlike 3.0
Version: 1.0 (506) by Coby Randquist on 2013-04-27