71
pages
English
Documents
Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus
Découvre YouScribe en t'inscrivant gratuitement
Découvre YouScribe en t'inscrivant gratuitement
71
pages
English
Ebook
Obtenez un accès à la bibliothèque pour le consulter en ligne En savoir plus
Laboratoire de l’Informatique du
Parallélisme
ÉcoleNormaleSupérieuredeLyon
UnitéMixtedeRechercheCNRS INRIA ENSLYON
on 5668
Minimizing the stretch
when scheduling flows
of divisible requests
Arnaud Legrand ,
October 2006Alan Su ,
Fr´ed´eric Vivien
oResearch Report N 2006-19
ÉcoleNormaleSupérieurede
Lyon
46Allée d’Italie,69364LyonCedex 07,France
Téléphone: +33(0)4.72.72.80.37
Télécopieur: +33(0)4.72.72.80.80
Adresseélectronique: lip@ens-lyon.frMinimizing the stretch
when scheduling flows
of divisible requests
Arnaud Legrand , Alan Su , Fr´ed´eric Vivien
October 2006
Abstract
In this paper, we consider the problem of scheduling distributed biological se-
quence comparison applications. This problem lies in the divisible load frame-
work with negligible communication costs. Thus far, very few results have
been proposed in this model. We discuss and select relevant metrics for this
framework: namely max-stretch and sum-stretch. We explain the relation-
ship between our model and the preemptive uni-processor case, and we show
how to extend algorithms that have been proposed in the literature for the
uni-processor model to the divisible multi-processor problem domain. We re-
call known results on closely related problems, we show how to minimize the
max-stretch on unrelated machines either in the divisible load model or with
preemption,wederivenewlowerboundsonthecompetitiveratioofanyon-line
algorithm, we present new competitiveness results for existing algorithms, and
we develop several new on-line heuristics. We also address the Pareto opti-
mization of max-stretch. Then, we extensively study the performance of these
algorithms and heuristics in realistic scenarios. Our study shows that all pre-
viously proposed guaranteed heuristics for max-stretch for the uni-processor
model prove to be inefficient in practice. In contrast, we show our on-line
algorithms based on linear programming to be near-optimal solutions for max-
stretch. Our study also clearly suggests heuristics that are efficient for both
metrics, although a combined optimization is in theory not possible in the
general case.
Keywords: Bioinformatics, heterogeneous computing, scheduling, divisible
load, linear programming, stretch
R´esum´eDanscerapport,nousnousint´eressonsa`l’ordonnancementd’applicationscom-
parant de mani`ere distribu´ee des s´equences biologiques. Ce probl`eme se situe
dans le domaine des tacˆ hes divisibles avec couˆts de communications n´egli-
geables. Jusqu’a` pr´esent, tr`es peu de r´esultats ont ´et´e publi´es pour ce mod`ele.
Nous discutons et s´electionnons des m´etriques appropri´ees pour notre cadre de
travail, a` savoir le max-stretch et le sum-stretch. Nous expliquons les relations
entre notre mod`ele et le cadre mono-processeur avec pr´eemption, et nous mon-
trons comment ´etendre au cadre des tˆaches divisibles sur multi-processeur les
algorithmes propos´es dans la litt´erature pour le cas mono-processeur avec pr´e-
emption.Nousrappelonslesr´esultatsconnuspourdesprobl´ematiquesproches,
nous montrons comment minimiser le max-stretch sur des machines non cor-
r´el´ees (que les tacˆ hes soient divisibles ou simplement pr´eemptibles), nous ob-
tenons de nouvelles bornes inf´erieures de comp´etitivit´e pour tout algorithme
on-line, nous pr´esentons de nouveaux r´esultats de comp´etitivit´e pour des al-
gorithms de la litt´erature, et nous proposons de nouvelles heuristiques on-line.
Nous nous int´eressons ´egalement au probl`eme de la minimisation Pareto du
max-stretch. Ensuite, nous ´etudions, de mani`ere extensive, les performances
de tous ces algorithmes et de toutes ces heuristiques, et ce dans un cadre r´ea-
liste. Notre ´etude montre que les solutions garanties existantes minimisant le
max-stretch sur un processeur sont inefficaces dans notre cadre de travail. Ce-
pendant, nous montrons que nos algorithmes on-line bas´es sur la programma-
tionlin´eaireontdesperformancesprochesdel’optimalpourlemax-stretch.En
outre, notre ´etude sugg`ere clairement les heuristiques qui sont efficaces pour
lesdeuxm´etriques,bienquel’optimisationsimultan´eepourcesdeuxm´etriques
soit en th´eorie impossible dans le cas g´en´eral.
Mots-cl´es: Bioinformatique, ordonnancement, tacˆ hes divisibles,
programmation lin´eaire, flot pond´er´e, plates-formes h´et´erog`enes
21 Introduction
Theproblemofsearchinglarge-scalegenomicandproteomicsequencedatabanks
is an increasingly important bioinformatics problem. The results we present
in this paper concern the deployment of such applications in heterogeneous
parallel computing environments. In fact, this application is a part of a larger
class of applications, in which each task in the application workload exhibits an
“affinity” for particular nodes of the targeted computational platform. In the
genomic sequence comparison scenario, the presence of the required databank
on a particular node is the sole factor that constrains task placement decisions.
In this context, task affinities are determined by location and replication of the
sequence databanks in the distributed platform.
Numerous efforts to parallelize biological sequence comparison applications
havebeenrealized(e.g.,[10,12,28]). Theseeffortsarefacilitatedbythefactthat
such biological sequence comparison algorithms are typically computationally
intensive, embarrassingly parallel workloads. In the scheduling literature, this
computational model is effectively a divisible workload scheduling problem with
negligiblecommunicationoverheads. Theworkpresentedinthispaperconcerns
this application model, particularly in the context of online scheduling (i.e., in
which the scheduler has no knowledge of any job in the workload in advance of
its release date). Thus far, this specific problem has not been considered in the
scheduling literature.
Asidefromdivisibility,themaindifferencewithclassicalschedulingproblems
lies in the fact that the platforms we target are shared by many users. Con-
sequently, we need to ensure a certain degree of fairness between the different
users and requests. Defining a fair objective that accounts for the various job
characteristics (release date, processing time) is thus the first difficulty to over-
come. After having presented our motivating application and our framework in
Section 2, we review various classical metrics in Section 3 and conclude that the
stretch of a job is an appropriate basis for evaluation. As a consequence, we
mainlyfocusonthemax-stretchandsum-stretchmetrics. Tohaveagoodback-
ground on related objectives functions and results, in Section 4 we focus on the
max-flow and sum-flow metrics. Then in Section 5 we study sum-stretch opti-
mization, in Section 6 offline max-stretch optimization, and in Section 7 Pareto
offline optimization of max-stretch. Building on the previous sections, we focus
in Section 8 on the online optimization of max-stretch. This paper contains
no section devoted to the related work as the related work will be discussed
throughout this article. However, in Section 9 we summarize the known and
new results on complexity. Finally, we present in Section 10 an experimental
evaluation of the aforementioned heuristics, and we conclude in Section 11.
The main contributions of this work are:
Off-line sum-flow and sum-stretch. We show that sum-flow mini-
mization is NP-complete on unrelated machines under the divisible load
P
model (hR|r ,div| Fi is NP-complete). We also show that sum-stretchj j
minimization is NP-complete on one machine without preemption andP
also on unrelated machines under the divisible load model (h1|r| Sij jP
andhR|r ,div| Si are NP-complete).j j
Off-line max weighted flow. We present polynomial-time algo-
rithms to solve the minimization of max weighted flow, off-line, on unre-
??2 A. Legrand, A. Su, F. Vivien
lated machines, in the divisible load model and in the preemptive model:
hR|r ;div|maxw Fi andhR|r ;pmtn|maxw Fi are polynomial.j j j j j j
We also propose heuristics to solve the off-line Pareto minimization of
max weighted flow, either on one machine or on unrelated machines. We
present some cases in which these heuristics are optimal and we prove
that the off-line Pareto minimization of max-flow on unrelated machines
is NP-complete.
On-line sum-stretch and max-stretch. We show that FCFS is Δ-
competitive for the sum-stretch and max-stretch metrics on one machine,
where Δ denotes the ratio of the sizes of the largest and shortest jobs
submitted to the system. We also prove that no on-line algorithm has
simultaneously better competitive ratios for these two metrics.
We show that no online algorithm has a competitive ratio less than or
equal to 1.19484 for the minimization of sum-stretch, or less than or equal
√
1 2−1to Δ for the minimization of max-stretch. (The previous known
2
11 3bounds were respectively 1.036 and Δ .)
2
Forminimizingthesum-stretchononemachinewithpreemption,weshow
that Smith’s ratio rule —which is then equivalent to shortest processing
time— is n