Tools and Algorithms for Coping with Uncertainty in Application Scheduling on Distributed Platforms

Tools and Algorithms for Coping with Uncertainty in Application Scheduling on Distributed Platforms. PhD Defense, Louis-Claude Canon. Supervisor: Emmanuel Jeannot.

October 18, 2010

Canon — Uncertainty in Scheduling — October 18, 2010 — 1 / 42


Context

Distributed Computing

Distributed Platform: a network of entities. In a computing platform, each entity has one processor. Use-case examples: data sharing, parallel computation.

Computation examples:
- Web services: Amazon.com.
- Scientific simulations: weather prediction (e.g., the simulation of typhoon Mawar in 2005).
- Research projects: SETI@home.


Context

Application

Task: a parallel application consists of a set of tasks.

Input and Output: each task processes input data and produces a result.

Precedence: some tasks require the results of other tasks (the precedences are specified by a task graph).

[Figure: task graph with t1 → t2 and t1 → t3, then t2 → t4 and t3 → t4, with transferred data d12, d13, d24, d34; a task t1 reads input di and writes output do.]

Context

Platform

A parallel computing platform consists of a set of interconnected processors. Each task can be computed by one machine. Execution durations may be a function of the processor speeds and task costs. Communication durations are determined by the network capacity.


Context

Scheduling

Schedule Structure: a schedule may be defined by (non-exhaustively):
- a mapping: each task is assigned to a processor
- dates: start and end times of each execution
- an order: the order in which the tasks must be executed

Scheduling Strategies:
- offline: scheduling decisions are taken before any computation
- online: decisions are taken while the tasks are executed

Examples of Criteria:
- efficiency: total duration of a schedule
- fairness: for multiple users/organizations
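The schedule structure above (mapping, dates, order) can be sketched in code. This is a minimal illustration, not the representation used in the thesis; the `Schedule` class and its fields are hypothetical names.

```python
from dataclasses import dataclass, field

@dataclass
class Schedule:
    # Hypothetical sketch: the three components listed on the slide.
    mapping: dict = field(default_factory=dict)  # task -> processor
    dates: dict = field(default_factory=dict)    # task -> (start, end)
    order: dict = field(default_factory=dict)    # processor -> tasks in order

    def assign(self, task, proc, start, duration):
        self.mapping[task] = proc
        self.dates[task] = (start, start + duration)
        self.order.setdefault(proc, []).append(task)

    def makespan(self):
        # Cmax: the latest end time over all scheduled tasks.
        return max(end for _, end in self.dates.values())

s = Schedule()
s.assign("t1", "p1", 0, 3)
s.assign("t2", "p1", 3, 2)
s.assign("t3", "p2", 3, 4)
print(s.makespan())  # 7
```

The makespan here is simply the efficiency criterion (total duration) mentioned above.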


Context

Uncertainty

Object and its consequence:
- computation duration → unpredictability of the performance
- computation success → failure of the schedule
- result correctness → invalidity of the solution


Context

Uncertainty Characteristics

Nature [Haimes, 2009]:
- methodological: limitations of the method (e.g., model simplification)
- epistemic: inaccessible knowledge (e.g., online task submission)
- aleatory: stochastic variability (e.g., hardware faults)

Origin: hardware, software, human.

Context

Uncertainty and Scheduling

Object · Nature · Origin · Type · Criterion · Problem¹
- computation duration · methodological · hardware, software · optimization · robustness · R | prec | Cmax
- computation success · aleatory · hardware · evaluation · reliability · R | prec | Cmax
- result correctness · epistemic · software, human · characterization · precision · R | online-time-nclv | ΣCi

¹ R. L. Graham, E. L. Lawler, J. K. Lenstra, and A. H. G. Rinnooy Kan: Optimization and approximation in deterministic sequencing and scheduling: a survey. Annals of Discrete Mathematics, 5:287–326, 1979.

Context

Outline
1. Context
2. Robustness
3. Reliability
4. Precision
5. Conclusion

Robustness

Outline
1. Context
2. Robustness
   - Louis-Claude Canon and Emmanuel Jeannot, Evaluation and Optimization of the Robustness of DAG Schedules in Heterogeneous Environments, IEEE TPDS 21(4), April 2010.
   - Louis-Claude Canon and Emmanuel Jeannot, Scheduling Strategies for the Bicriteria Optimization of the Robustness and Makespan, NIDISC 2008, Miami, Florida, April 2008.
   - Louis-Claude Canon and Emmanuel Jeannot, A Comparison of Robustness Metrics for Scheduling DAGs on Heterogeneous Systems, HeteroPar'07, Austin, Texas, September 2007.
3. Reliability
4. Precision
5. Conclusion


Robustness

Application and Platform Models

Application: the parallel application is specified by a task graph. For each precedence constraint, data need to be transferred.

[Figure: task graph t1 → t2, t3; t2, t3 → t4, with data d12, d13, d24, d34.]

Platform: the parallel computing platform consists of a set of processors. Processors are unrelated: the duration of each task is specific to the executing processor. Each pair of machines is interconnected by a dedicated link.


Robustness

Uncertainty

Uncertainty on Computation Durations: evaluating analytically the duration of a computation is difficult because the application and the platform are complex (methodological uncertainty). Durations are random variables.

Random Variables: each duration is represented by a set of values and probabilities. Each probability gives the likelihood that the corresponding duration occurs during a given execution.

[Figure: discrete distribution of a duration (probability vs. time).]
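A discrete duration distribution of this kind can be sampled with inverse-transform sampling. This is a minimal sketch; the values and probabilities are illustrative, not taken from the experiments.

```python
import random

# Illustrative discrete random variable: a task duration with three outcomes.
durations = [4, 6, 10]
probs = [0.6, 0.3, 0.1]  # must sum to 1

def sample_duration(rng=random):
    # Inverse-transform sampling: draw u in [0, 1) and walk the CDF.
    u, acc = rng.random(), 0.0
    for d, p in zip(durations, probs):
        acc += p
        if u < acc:
            return d
    return durations[-1]

# Expected duration of a single execution.
expected = sum(d * p for d, p in zip(durations, probs))
print(expected)  # 4*0.6 + 6*0.3 + 10*0.1 ≈ 5.2
```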


Robustness

Evaluation of the Efficiency

Efficiency: efficiency is defined by the duration of the schedule execution (Cmax). Evaluating this duration consists in evaluating a stochastic DAG.

Stochastic DAG Evaluation: we prove the problem to be #P'-Complete. #P' is a generalization of counting problems (#P) to reliability evaluation problems [Bodlaender et al., 2004].

[Figure: stochastic DAG with vertices v1–v4 carrying variables X1–X4 and edges carrying variables X12, X13, X23, X24, X34.]

Remark on the Complexity Class: any #P' problem based on an NP-Complete problem is #P'-Complete. However, this evaluation problem corresponds to a P problem.
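Since exact evaluation of a stochastic DAG is #P'-Complete, a common practical workaround is Monte Carlo simulation: sample every duration, propagate finish times along the DAG, and repeat. This is a hedged sketch under simplifying assumptions (no communication delays; the graph and the two-point distributions are illustrative), not the method from the thesis.

```python
import random

edges = {"t1": [], "t2": ["t1"], "t3": ["t1"], "t4": ["t2", "t3"]}  # predecessors
topo = ["t1", "t2", "t3", "t4"]                                     # topological order
dist = {t: ([2.0, 4.0], [0.5, 0.5]) for t in topo}                  # (values, probs)

def sample(values, probs, rng):
    # Inverse-transform sampling of a discrete distribution.
    u, acc = rng.random(), 0.0
    for v, p in zip(values, probs):
        acc += p
        if u < acc:
            return v
    return values[-1]

def simulate_makespan(rng):
    # One scenario: each task starts when all its predecessors have finished.
    finish = {}
    for t in topo:
        start = max((finish[p] for p in edges[t]), default=0.0)
        values, probs = dist[t]
        finish[t] = start + sample(values, probs, rng)
    return max(finish.values())

rng = random.Random(0)
samples = [simulate_makespan(rng) for _ in range(10_000)]
mean = sum(samples) / len(samples)  # Monte Carlo estimate of E[Cmax]
```

The empirical distribution of `samples` approximates the distribution of Cmax that the robustness measures below are computed on.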


Robustness

Uncertainty-Related Criterion

Robustness: capacity of a system to maintain its performance despite variations (a criterion depending on an efficiency measure).

Robust Schedule: random variables model the variations in the inputs. A schedule is robust if its total duration remains stable despite the variations of the task durations.

[Figure: narrow vs. wide distributions of the total duration.]


Robustness

Related Work

Robustness Measures:
- [Wu et al., 1994] Same methodological approach as ours, but with unavailabilities.
- [Ali et al., 2004] Method for defining a relevant robustness measure.
- [Shestak et al., 2006] Stochastic and deterministic robustness metrics.
- [Bölöni et al., 2002] Total slack and differential entropy.

Optimization:
- [Gao, 1995] Insertion of temporal protection.
- [Davenport et al., 2001] Insertion of temporal slack.
- [Fargier et al., 2003] Scheduling techniques using fuzzy logic.

Robustness

Robustness Measures

Many measures exist in the literature. For a robust schedule, we expect:
- a small standard deviation of the total duration
- a small differential entropy of the total duration
- a large expected slack (large temporal protection)
- values of the stochastic metrics close to 1
- a lateness probability close to 0
- a small 99th percentile of the total duration (almost equivalent to the expected value of the total duration)
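Several of the measures above can be estimated directly from simulated makespan samples. A minimal sketch (function names and sample values are illustrative, not from the thesis):

```python
import math

def std_dev(xs):
    # Population standard deviation of the total duration.
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

def percentile(xs, q):
    # Empirical q-quantile (e.g., q = 0.99 for the 99th percentile).
    ys = sorted(xs)
    return ys[min(len(ys) - 1, int(q * len(ys)))]

def lateness_probability(xs, deadline):
    # Fraction of executions whose total duration exceeds the deadline.
    return sum(1 for x in xs if x > deadline) / len(xs)

makespans = [9.0, 10.0, 10.5, 11.0, 12.0, 9.5, 10.0, 10.5]
print(std_dev(makespans))
print(percentile(makespans, 0.99))
print(lateness_probability(makespans, 11.0))
```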

Empirical Comparison

- Application: layered task graph with 1000 tasks, 10% uncertainty, beta distribution
- Platform: 50 processors (2.5 GFLOPS each)
- Schedules: 5000 randomly generated


Robustness

Selected Measures

Robustness Measure: the standard deviation of the total duration is equivalent to the entropy of the total duration and to one of the stochastic metrics.

Expected Value of the Slack: invalid measure of robustness (no correlation with the other robustness measures).

Multi-criteria Problem: there is a correlation between the expected value and the standard deviation of the total duration. However, efficiency and robustness are not equivalent.


Robustness

Pareto Region Study

Methods:
- Multi-criteria evolutionary algorithm: we prove its convergence.
- Greedy construction: aggregate both criteria and schedule each task by making the best local choice.

[Figure: standard deviation vs. mean of the total duration for the search space and for the SW, SE, NW, and Random heuristics.]

Conclusion: several methods that estimate the Pareto front.
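Given candidate schedules evaluated on the two minimized criteria (mean makespan, standard deviation), the Pareto front can be extracted with a simple non-dominated filter. A hedged sketch, not the thesis algorithm; the candidate points are illustrative.

```python
def pareto_front(points):
    # Keep the points that no other point dominates (both criteria minimized).
    front = []
    for p in points:
        dominated = any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)
        if not dominated:
            front.append(p)
    return front

# (mean makespan, standard deviation) of hypothetical schedules.
candidates = [(10.0, 0.9), (11.0, 0.4), (10.5, 0.6), (12.0, 0.5), (10.0, 1.2)]
print(pareto_front(candidates))  # dominated points are removed
```

The quadratic filter is fine for a few thousand schedules; sorting-based sweeps are faster for larger sets.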

Reliability

Outline
1. Context
2. Robustness
3. Reliability
   - Anne Benoit, Louis-Claude Canon, Emmanuel Jeannot and Yves Robert, On the complexity of task graph scheduling with transient and fail-stop failures, submitted to Journal of Scheduling.
4. Precision
5. Conclusion


Reliability

Application and Platform Models

Application: the parallel application is specified by a task graph.

[Figure: task graph t1 → t2, t3; t2, t3 → t4.]

Platform: the parallel computing platform consists of a set of processors. Processors are unrelated: the duration of each task is specific to the executing processor. Synchronizations on the network are immediate.


Reliability

Fault Model

Uncertainty on Computation Successes: each machine may fail during the execution of a task with a non-zero probability (aleatory uncertainty).

Transient failures: an execution fails but the processor recovers immediately. Examples: arithmetic/software errors, recoverable hardware faults (power loss).

Fail-stop failures: a processor dies until the end of the schedule (all its remaining tasks fail). Examples: hardware resource crashes, recovery of a loaned machine by its user during a cycle-stealing episode.


Reliability

Scheduling with Replication

General policy: each task can be scheduled after at least one replica of each of its predecessors has finished (if there is no failure).

Strict policy: a task must be scheduled after the end times of all the replicas of its predecessors. Also called the replication-for-reliability scheme.

[Figure: Gantt chart of replicas of t1 and t2 on processors p1–p4.]

Reliability

Uncertainty-Related Measure

Reliability: the reliability of a static schedule is the probability that it terminates successfully. A schedule is successful if every task has at least one successful replica. The execution of a replica is successful if at least one replica of each of its predecessors is successfully executed and if the processor does not fail during the execution (and has not yet been subjected to a fail-stop failure).


Reliability

Related Work

Bi-criteria Scheduling:
- [Dongarra et al., 2007] Scheduling without replication.
- [Jeannot et al., 2008] Scheduling without replication.
- [Girault et al., 2009] Strict scheduling with transient faults.

Reliability Block Diagram:
- [Bream, 1995] Introduces a diagram-based technique for reliability evaluation. This problem is considered difficult (NP-Hard).


Reliability

Reliability Evaluation

Main Contribution: we prove the general problem to be #P'-Complete.

Intuition of the Exponential Evaluation: the problem can be solved by an exponential algorithm that considers all equiprobable scenarios. The solution is then found by counting the number of scenarios that lead to a successful schedule.

[Figure: Gantt chart of replicas of t1 and t2 on processors p1–p4, under different failure scenarios.]
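The exponential evaluation sketched above can be illustrated for a tiny instance: enumerate every success/failure outcome of the replicas and sum the probabilities of the scenarios in which every task has at least one successful replica. A hedged sketch under simplifying assumptions (transient failures, independent tasks with no precedence; replicas and probabilities are illustrative).

```python
from itertools import product

# (task, processor, success probability) for each replica.
replicas = [("t1", "p1", 0.9), ("t1", "p3", 0.8),
            ("t2", "p2", 0.9), ("t2", "p4", 0.7)]
tasks = {"t1", "t2"}

def reliability():
    total = 0.0
    # One iteration per success/failure scenario: 2^(number of replicas).
    for outcome in product([True, False], repeat=len(replicas)):
        prob = 1.0
        ok = {t: False for t in tasks}
        for (task, _, p), success in zip(replicas, outcome):
            prob *= p if success else (1 - p)
            if success:
                ok[task] = True
        if all(ok.values()):  # every task has a successful replica
            total += prob
    return total

print(reliability())  # (1 − 0.1·0.2)·(1 − 0.1·0.3) ≈ 0.9506
```

The 2^n blow-up is exactly why the general problem, with precedences and fail-stop failures, is hard.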


Reliability

Monotonic Chain Case

Monotonicity Property: a schedule of chains is monotonic if the success of any task on processor pj depends only on the successes of the first j processors.

Evaluation Algorithm: the reliability of a monotonic schedule π on a platform with fail-stop failures is rel(π) = Pr[J_nm], with:

Pr[J_i1] = Pr[E_i1]
Pr[J_1j] = Pr[J_1,j−1] + (1 − Pr[J_1,j−1]) Pr[E_1j]
Pr[J_ij] = Pr[J_i,j−1] + (Pr[J_ρ(i,j)−1,j−1] − Pr[J_i,j−1]) Pr[E_ij]

Reliability

Summary

[Diagram: complexity of the reliability evaluation problems GT, GF, GFm, ST, SF, SFm, SFt, linked by reduction arrows (≤m).]

Legend: T = transient, F = fail-stop, G = general, S = strict, m = monotonic, t = triangular. Dark gray: #P'-Complete; light gray: polynomial; white: open; blue: significant contribution; arrow: reduction (≤m).

Precision

Outline
1. Context
2. Robustness
3. Reliability
4. Precision
   - Louis-Claude Canon, Emmanuel Jeannot and Jon Weissman, A Dynamic Approach for Characterizing Collusion in Desktop Grids, IEEE IPDPS, Atlanta, Georgia, April 2010.
5. Conclusion


Precision

Application and Platform Models

Application: the parallel application is specified by a set of independent tasks (t1, t2, t3, t4).

Platform: the parallel computing platform consists of a set of processors. Processors are unrelated: the duration of each task is specific to the executing processor. Machines are connected to a central server via the Internet.


Precision

Uncertainty

Uncertainty on Result Correctness: the results returned by each machine may be incorrect (epistemic uncertainty). 35% of SETI@home participants have returned at least one incorrect result [Kondo et al., 2007]. These Byzantine faults are due to unreliability or to malicious behaviors (e.g., for credit increase).

Incorrectness Induces Redundancy: each task can be assigned to several machines. One of the received results must be selected as the final one by a certification mechanism.


Precision

Threat Model

Collusion: some machines produce the same incorrect result for a given task.

[Figure: colluders and non-colluders returning results for Task1 (collusion) and Task2 (no collusion).]

Colluding groups: machines can be partitioned into several groups; machines in the same group always return the same result for a given task. Collusion occurs with a given probability. There may be cooperation between distinct colluding groups.


Precision

Characterization Problem

Objective: study the machine behaviors:
- estimate the probability that any pair of machines gives the same incorrect result for the same task
- estimate the composition of the colluding groups

Inputs: a chronological succession of events:
- <d, p, t, r>: at time d, machine p finishes task t and returns result r
- <d, t>: at time d, task t finishes


Precision

Related Work

Scheduling:
- [Zhao et al., 2005] Quizzes.
- [Domingues et al., 2007] Intermediate verifications.
- [Silaghi et al., 2009] Use of a reputation system to detect colluders.
- [Krings et al., 2005] A posteriori verification of results.
- [Wong, 2005] No unreliability; verification of results.

Reputation Systems:
- [Kamvar et al., 2003] EigenTrust algorithm.
- [Jøsang, 1999] Subjective logic that allows reaching a consensus.


Precision

Interaction Model

Two Interaction Representations: interaction between machines i and j:
- collusion: machines i and j collude together (collusion estimation c_ij)
- agreement: machines i and j agree together (agreement estimation a_ij)

Relations (given that the largest group has index 1):

a_ij ≤ 1 + 2·c_ij − c_ii − c_jj
c_ij ≤ (1 + a_ij − a_1i − a_1j) / 2
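The two bounds above translate directly into code. A minimal sketch (function names and example values are illustrative):

```python
def agreement_upper_bound(c_ij, c_ii, c_jj):
    # a_ij <= 1 + 2*c_ij - c_ii - c_jj
    return 1 + 2 * c_ij - c_ii - c_jj

def collusion_upper_bound(a_ij, a_1i, a_1j):
    # c_ij <= (1 + a_ij - a_1i - a_1j) / 2, with group 1 the largest group
    return (1 + a_ij - a_1i - a_1j) / 2

print(agreement_upper_bound(0.2, 0.1, 0.1))
print(collusion_upper_bound(0.8, 0.5, 0.5))
```

These relations let one representation (agreement, which is observable) bound the other (collusion, which is not).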


Precision

Online Algorithm

Data structure: nodes correspond to sets of machines; edges correspond to interaction characteristics (here, agreement estimates).

[Figure: agreement graph whose nodes are merged as the estimates evolve.]

Algorithm: initially, each machine is in a singleton. Estimated groups are then successively merged and split.
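The merge step of this online grouping can be illustrated with a union-find structure: starting from singletons, merge machines whose pairwise agreement estimate exceeds a threshold. A hedged sketch only; the split step is omitted and the estimates and threshold are illustrative, not from the thesis.

```python
parent = {}

def find(x):
    # Union-find with path halving.
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def merge(x, y):
    rx, ry = find(x), find(y)
    if rx != ry:
        parent[rx] = ry

# Hypothetical pairwise agreement estimates.
agreement = {("m1", "m2"): 0.95, ("m2", "m3"): 0.1, ("m3", "m4"): 0.9}
THRESHOLD = 0.8
for (a, b), est in agreement.items():
    if est > THRESHOLD:
        merge(a, b)

groups = {}
for m in ["m1", "m2", "m3", "m4"]:
    groups.setdefault(find(m), set()).add(m)
print(sorted(map(sorted, groups.values())))  # [['m1', 'm2'], ['m3', 'm4']]
```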

[Figure: collusion RMSD vs. time (days) for pair and group estimations, as a function of the collusion probability, annotated with the agreement/collusion convergence times and stabilized accuracies.]

Time required for convergence:
- Convergence time: time needed to achieve a desirable accuracy.
- Stabilized accuracy: accuracy achieved after a large number of events.

Conclusion

Outline
1. Context
2. Robustness
3. Reliability
4. Precision
5. Conclusion

Conclusion

Summary

Object · Nature · Origin · Type · Criterion · Problem
- computation duration · methodological · hardware, software · optimization · robustness · R | prec | Cmax
- computation success · aleatory · hardware · evaluation · reliability · R | prec | Cmax
- result correctness · epistemic · software, human · characterization · precision · R | online-time-nclv | ΣCi


Conclusion

Results

Robustness:
- comparison of robustness measures (selection of the standard deviation)
- multi-criteria methods that tackle robustness

Reliability:
- taxonomy of reliability problems
- complexity classes of the corresponding evaluation problems

Precision:
- strong model of adversity
- methods for estimating machine behaviors

Generic conclusion: probabilistic dimension and multi-criteria formulation.


Conclusion

Future Directions

- Defeat collusion using our proposed characterization system
- Develop online scheduling algorithms (relevant in case of high uncertainty)
- Explore other multi-criteria techniques (e.g., the ε-constraint method)
- Propose tractable uncertainty models
- Design guaranteed algorithms

Conclusion

Questions

Thank you for your attention. Questions, comments, remarks...

Conclusion

Defeating the Collusion

Uncertainty-Related Criteria:
- precision: proportion of correctly certified results
- overhead: average duplication ratio

[Figure: precision and overhead of BOINC vs. CAA on three trace files — (Overnet, 1000), (Microsoft, 1500), (SETI@Home, 2000).]


Conclusion

Literature on Collusion

Wong, An authentication protocol in Web-computing, IPDPS 2006:
- Proposes a ring-based scheduling approach (duplication ratio of 2).
- Estimates the fraction and probability of collusion (given that non-colluding machines do not fail).
- Certifies the results based on a probabilistic analysis (using only the results of the active tasks).
- Extension with an audit mechanism that takes the estimation into account (better than random sampling).

Taufer, Anderson et al., Homogeneous Redundancy, IPDPS 2005:
- Characterizes a divergent application: large numerical differences in the results generated by different machines. Fuzzy comparison is not sufficient.
- Homogeneous redundancy: assign tasks to numerically equivalent machines (same software and platform characteristics).

Conclusion

Multi-criteria Optimization

[Figure: objective space (f1, f2) with solutions o1, o2, o3, o4.]


Conclusion

Schedule Size

Inputs:
- Number of tasks: n
- Number of processors: m
- Duration of every possible execution (each task on each processor): w_ij
- Longest duration: W = max_ij w_ij
- The size of the input is O(nm log(W)).

Schedule providing Dates:
- Largest date: O(nW) (if every task is scheduled on the slowest processor).
- With duplication, each task may be scheduled on each processor: there are O(nm) dates.
- A schedule requires O(nm log(nW)) space for encoding the dates.

Conclusion

Stochastic DAG Evaluation

Type of random variables vs. DAG structure:
- Discrete: Chain — regular domain; Join — all; Series-parallel — regular domain.
- Non-discrete: Chain — normal, gamma, Erlang; Join — exponential, Weibull; Series-parallel — none.