Cloud-aware data intensive workflow scheduling on volunteer computing systems

Toktam Ghafarian, Bahman Javadi

Research output: Contribution to journal › Article › peer-review

49 Citations (Scopus)

Abstract

Volunteer computing systems offer substantial computing power to scientific communities for running large, data-intensive scientific workflows. However, these environments provide only a best-effort infrastructure for executing high-performance jobs. This work schedules scientific, data-intensive workflows on a hybrid of a volunteer computing system and Cloud resources to improve the utilization of these environments and increase the percentage of workflows that meet their deadlines. The proposed workflow scheduling system partitions a workflow into sub-workflows so as to minimize data dependencies among them. These sub-workflows are then distributed across volunteer resources according to resource proximity and a load-balancing policy, and the execution time of each sub-workflow on the selected volunteer resources is estimated in this phase. If a sub-workflow misses its sub-deadline because of a long waiting time, it is considered for re-scheduling onto public Cloud resources. This re-scheduling improves system performance by increasing the percentage of workflows that meet their deadlines. The proposed Cloud-aware data-intensive scheduling algorithm increases the percentage of workflows that meet their deadlines by 75% on average compared with executing the workflows on the volunteer resources.
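
The re-scheduling rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the `SubWorkflow` fields, the `place` function, and the assumption that the finish time is simply the estimated waiting time plus the estimated execution time are all hypothetical, chosen only to show the sub-deadline check that moves a sub-workflow from volunteer resources to the public Cloud.

```python
from dataclasses import dataclass


@dataclass
class SubWorkflow:
    # All fields are illustrative assumptions, not taken from the paper.
    name: str
    est_wait: float      # estimated waiting time on volunteer resources (s)
    est_exec: float      # estimated execution time on volunteer resources (s)
    sub_deadline: float  # sub-deadline assigned to this sub-workflow (s)


def place(sub_workflows):
    """Map each sub-workflow name to 'volunteer' or 'cloud'.

    Assumed rule: keep a sub-workflow on volunteer resources if its
    estimated finish time (wait + execution) meets the sub-deadline,
    otherwise re-schedule it onto public Cloud resources.
    """
    placement = {}
    for sw in sub_workflows:
        est_finish = sw.est_wait + sw.est_exec
        if est_finish <= sw.sub_deadline:
            placement[sw.name] = "volunteer"
        else:
            placement[sw.name] = "cloud"
    return placement


subs = [
    SubWorkflow("sw1", est_wait=10, est_exec=50, sub_deadline=100),
    SubWorkflow("sw2", est_wait=90, est_exec=50, sub_deadline=100),
]
placement = place(subs)
print(placement)
```

Here `sw2` has a long estimated wait, so under the assumed rule it is the one sent to the Cloud, while `sw1` stays on volunteer resources.
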
Original language: English
Article number: 2663
Pages (from-to): 87-97
Number of pages: 11
Journal: Future Generation Computer Systems
Volume: 51
Publication status: Published - 2015

Keywords

  • cloud computing
  • infrastructure
  • scheduling
