Tools and Algorithms for Coping with Uncertainty in Application Scheduling on Distributed Platforms
PhD Defense
Louis-Claude CANON
Supervisor: Emmanuel JEANNOT
October 18, 2010

CANON — Uncertainty in Scheduling — October 18, 2010
Context

Distributed Computing

Distributed Platform
Network of entities. In a computing platform, each entity has one processor.
Use case examples: data sharing, parallel computation.

Computation
- Web services: Amazon.com.
- Scientific simulations: weather prediction (e.g., simulation of typhoon Mawar in 2005).
- Research projects: SETI@home.
Context

Application

Task
A parallel application consists of a set of tasks (t1, t2, t3, t4).

Input and Output
Each task processes input data (di) and produces a result (do).

Precedence
Some tasks require the results of other tasks; the precedences are specified by a task graph in which each precedence constraint carries a data transfer (e.g., d12 from t1 to t2).
Context

Platform

Platform
A parallel computing platform consists of a set of interconnected processors.
Each task can be computed by one machine.
The execution durations may be a function of the processor speeds and task costs.
Communication durations are determined by the network capacity.
Context

Scheduling

Schedule Structure
A schedule may be defined by (non-exhaustively):
- a mapping: each task is assigned to a processor
- dates: start and end times of each execution
- an order: the order in which the tasks must be executed

Scheduling Strategies
- offline: scheduling decisions are taken before any computation
- online: decisions are taken while the tasks are executed

Examples of Criteria
- efficiency: total duration of a schedule
- fairness: for multiple users/organizations
Context

Uncertainty

Object
- computation duration
- computation success
- result correctness

Consequence
- unpredictability of the performance
- failure of the schedule
- invalidity of the solution
Context

Uncertainty Characteristics

Nature [Haimes, 2009]
- methodological: limitations of the method (e.g., model simplification)
- epistemic: inaccessible knowledge (e.g., online task submission)
- aleatory: stochastic variability (e.g., hardware faults)

Origin
- hardware
- software
- human
Context

Uncertainty and Scheduling

Object                Nature          Origin              Type              Criterion    Problem¹
computation duration  methodological  hardware, software  optimization      robustness   R|prec|Cmax
computation success   aleatory        hardware            evaluation        reliability  R|prec|Cmax
result correctness    epistemic       software, human     characterization  precision    R|online-time-nclv|ΣCi

¹ R. L. Graham, E. L. Lawler, J. K. Lenstra and A. H. G. Rinnooy Kan: Optimization and approximation in deterministic sequencing and scheduling: a survey. Annals of Discrete Mathematics, 5:287–326, 1979.
Context

Outline
1. Context
2. Robustness
3. Reliability
4. Precision
5. Conclusion
Robustness

Outline
1. Context
2. Robustness
   - Louis-Claude Canon and Emmanuel Jeannot, Evaluation and Optimization of the Robustness of DAG Schedules in Heterogeneous Environments, IEEE TPDS 21(4), April 2010.
   - Louis-Claude Canon and Emmanuel Jeannot, Scheduling Strategies for the Bicriteria Optimization of the Robustness and Makespan, NIDISC 2008, Miami, Florida, April 2008.
   - Louis-Claude Canon and Emmanuel Jeannot, A Comparison of Robustness Metrics for Scheduling DAGs on Heterogeneous Systems, HeteroPar'07, Austin, Texas, September 2007.
3. Reliability
4. Precision
5. Conclusion
Robustness

Application and Platform Models

Application
The parallel application is specified by a task graph. For each precedence constraint, data need to be transferred (e.g., d12 from t1 to t2).

Platform
The parallel computing platform consists of a set of processors.
Processors are unrelated: the duration of each task is specific to the executing processor.
Each pair of machines is interconnected by a dedicated link.
Robustness

Uncertainty

Uncertainty on Computation Durations
Evaluating analytically the duration of a computation is difficult because the application and the platform are complex (methodological uncertainty). Durations are random variables.

Random Variables
Each duration is represented by a set of values and probabilities (a discrete distribution). Each probability gives the likelihood that the corresponding duration occurs during a given execution.
[Figure: discrete probability distribution of a duration over time]
Robustness

Evaluation of the Efficiency

Efficiency
Efficiency is defined by the duration of the schedule execution (Cmax). Evaluating this duration consists in evaluating a stochastic DAG (vertices vi with random durations Xi, edges with random durations Xij).

Stochastic DAG Evaluation
We prove the problem to be #P'-Complete. #P' is a generalization of counting problems (#P) to reliability evaluation problems [Bodlaender et al., 2004].

Remark on the Complexity Class
Any #P' problem based on an NP-Complete problem is #P'-Complete. However, this evaluation problem corresponds to a P problem.
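Since the exact evaluation is #P'-Complete in general, a common practical workaround is Monte Carlo simulation. The sketch below is an illustration, not the method of the thesis: the diamond graph and the uniform distributions are made-up examples. It samples the task durations and propagates finish times along a topological order (a stochastic longest path).

```python
import random

def sample_makespan(preds, sample, rng):
    """One Monte Carlo realization of the makespan of a stochastic DAG.

    preds: dict task -> list of predecessors, keys in topological order.
    sample: dict task -> callable(rng) returning one random duration.
    """
    finish = {}
    for t in preds:  # relies on the topological ordering of the keys
        start = max((finish[p] for p in preds[t]), default=0.0)
        finish[t] = start + sample[t](rng)
    return max(finish.values())

# Diamond graph t1 -> {t2, t3} -> t4, uniform durations (hypothetical example).
rng = random.Random(42)
preds = {"t1": [], "t2": ["t1"], "t3": ["t1"], "t4": ["t2", "t3"]}
sample = {t: (lambda r: r.uniform(1.0, 3.0)) for t in preds}
runs = [sample_makespan(preds, sample, rng) for _ in range(10000)]
mean = sum(runs) / len(runs)
```

With enough samples, the empirical distribution of `runs` approximates the makespan distribution whose exact computation is intractable.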
Robustness

Uncertainty-Related Criterion

Robustness
Capacity of a system to maintain its performance despite variations (criterion depending on an efficiency measure).

Robust Schedule
Random variables model the variations in the inputs. A schedule is robust if its total duration remains stable despite the variations of the task durations.
[Figure: narrow vs. wide distribution of the total duration]
Robustness

Related Work

Robustness Measures
- [Wu et al., 1994] Same methodological approach as ours but with unavailabilities.
- [Ali et al., 2004] Method for defining a relevant robustness measure.
- [Shestak et al., 2006] Stochastic robustness metric and deterministic robustness metric.
- [Bölöni et al., 2002] Total slack and differential entropy.

Optimization
- [Gao, 1995] Insertion of temporal protection.
- [Davenport et al., 2001] Insertion of temporal slack.
- [Fargier et al., 2003] Scheduling techniques using fuzzy logic.
Robustness

Robustness Measures

Many measures exist in the literature. For a robust schedule, we expect:
- a small standard deviation of the total duration
- a small differential entropy of the total duration
- a large expected slack (large temporal protection)
- values close to 1 for the stochastic metrics
- a lateness probability close to 0
- a small 99th percentile of the total duration (almost equivalent to the expected value of the total duration)
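Several of the measures above can be computed directly from sampled total durations. The following is a minimal sketch (function name and metric selection are my own, not from the thesis) that takes a list of simulated makespans and a lateness deadline:

```python
import statistics

def robustness_metrics(makespans, deadline):
    """Empirical robustness indicators from sampled total durations.

    makespans: simulated schedule durations; deadline: lateness threshold.
    Small std_dev, small lateness_prob and small 99th percentile all
    indicate a robust schedule, as listed above.
    """
    makespans = sorted(makespans)
    n = len(makespans)
    return {
        "mean": statistics.mean(makespans),
        "std_dev": statistics.stdev(makespans),
        "lateness_prob": sum(m > deadline for m in makespans) / n,
        "percentile_99": makespans[min(n - 1, int(0.99 * n))],
    }
```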
Robustness

Empirical Comparison
- Application: layered task graph with 1000 tasks, 10% uncertainty, beta distributions.
- Platform: 50 processors (2.5 GFLOPS each).
- Schedules: 5000 randomly generated.
Robustness

Selected Measures

Robustness Measure
The standard deviation of the total duration is equivalent to the entropy of the total duration and to one of the stochastic metrics.

Expected Value of the Slack
Invalid measure of robustness (no correlation with the other robustness measures).

Multi-criteria Problem
Correlation between the expected value and the standard deviation of the total duration. However, efficiency and robustness are not equivalent.
Robustness

Pareto Region Study

Methods
- Multi-criteria evolutionary algorithm: we prove its convergence.
- Greedy construction: aggregate both criteria and schedule each task by making the best local choice.
[Figure: search space, standard deviation vs. mean of the total duration, with the SW, SE, NW and Random heuristics]

Conclusion
Several methods that estimate the Pareto front.
Reliability

Outline
1. Context
2. Robustness
3. Reliability
   - Anne Benoit, Louis-Claude Canon, Emmanuel Jeannot and Yves Robert, On the complexity of task graph scheduling with transient and fail-stop failures, submitted to Journal of Scheduling.
4. Precision
5. Conclusion
Reliability

Application and Platform Models

Application
The parallel application is specified by a task graph.

Platform
The parallel computing platform consists of a set of processors.
Processors are unrelated: the duration of each task is specific to the executing processor.
Immediate synchronizations occur on the network.
Reliability

Fault Model

Uncertainty on Computation Successes
Each machine may fail during the execution of a task with a non-zero probability (aleatory uncertainty).

Transient failures
An execution fails but the processor recovers immediately. Examples: arithmetic/software errors, recoverable hardware faults (power loss).

Fail-stop failures
A processor dies until the end of the schedule (all its remaining tasks fail). Examples: hardware resource crashes, recovery of a loaned machine by a user during a cycle-stealing episode.
Reliability

Scheduling with Replication

General policy
Each task can be scheduled after at least one replica of each of its predecessors has finished (if there is no failure).
[Figure: Gantt chart with replicas of t1 and t2 on processors p1–p4]

Strict policy
A task must be scheduled after all the end times of the replicas of its predecessors. Also called the replication-for-reliability scheme.
Reliability

Uncertainty-Related Measure

Reliability
The reliability of a static schedule is the probability that it terminates successfully. A schedule is successful if all tasks have at least one successful replica. The execution of a replica is successful if at least one replica of each of its predecessors is successfully executed and if the processor does not fail during the execution (and has not yet been subjected to a fail-stop failure).
Reliability

Related Work

Bi-criteria Scheduling
- [Dongarra et al., 2007] Scheduling without replication.
- [Jeannot et al., 2008] Scheduling without replication.
- [Girault et al., 2009] Strict scheduling with transient faults.

Reliability Block Diagram
- [Bream, 1995] Introduces a diagram-based technique for reliability evaluation. This problem is considered difficult (NP-Hard).
Reliability

Reliability Evaluation

Main Contribution
We prove the general problem to be #P'-Complete.

Intuition of the Exponential Evaluation
The problem can be solved by an exponential algorithm that considers each equiprobable scenario. The solution is then found by counting the number of scenarios that lead to a successful schedule.
[Figure: Gantt charts of the failure scenarios for the replicas of t1 and t2 on processors p1–p4]
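One tractable special case helps build intuition: under the strict policy with transient failures only, the success events of the tasks decouple, and the reliability reduces to a product (this is the polynomial case studied by Girault et al.; the function below is my own minimal sketch of it, assuming independent replica failures).

```python
from math import prod

def strict_transient_reliability(replicas):
    """Reliability of a strict-policy schedule under independent transient failures.

    replicas: dict task -> list of failure probabilities of its replicas.
    With the strict policy, a task succeeds iff at least one of its replicas
    runs without a fault, and the schedule succeeds iff every task succeeds;
    these events are independent, hence the product.
    """
    return prod(1.0 - prod(fs) for fs in replicas.values())
```

For example, a task with two replicas failing with probability 0.1 each, followed by a task with a single replica failing with probability 0.2, gives (1 − 0.01) × 0.8. In the general case (general policy, or fail-stop failures), no such closed form exists, which is what the #P'-Completeness result formalizes.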
Reliability

Monotonic Chain Case

Monotonicity Property
A schedule of chains is monotonic if the success of any task on processor pj depends only on the successes of the first j processors.

Evaluation Algorithm
The reliability of a monotonic schedule π on a platform with fail-stop failures is rel(π) = Pr[J_{n,m}], with:
  Pr[J_{i,1}] = Pr[E_{i,1}]
  Pr[J_{1,j}] = Pr[J_{1,j−1}] + (1 − Pr[J_{1,j−1}]) · Pr[E_{1,j}]
  Pr[J_{i,j}] = Pr[J_{i,j−1}] + (Pr[J_{ρ(i,j)−1,j−1}] − Pr[J_{i,j−1}]) · Pr[E_{i,j}]
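The recursion above is a straightforward dynamic program over tasks and processors. The sketch below transcribes it literally; the meanings of E, J and ρ are taken from the slide, and ρ is passed in as an assumed callable since its definition is not reproduced here.

```python
def monotonic_reliability(E, rho):
    """Dynamic program for the reliability recursion above.

    E[i][j]: Pr[E_ij], the success probability of task i's replica on
    processor j (1-based indices to match the formulas; row/column 0 unused).
    rho(i, j): the index function from the slide (assumed given).
    Returns rel(pi) = Pr[J_{n,m}] via the table J[i][j] = Pr[J_ij].
    """
    n = len(E) - 1       # tasks 1..n
    m = len(E[1]) - 1    # processors 1..m
    J = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):          # base case: first processor
        J[i][1] = E[i][1]
    for j in range(2, m + 1):
        J[1][j] = J[1][j - 1] + (1 - J[1][j - 1]) * E[1][j]
        for i in range(2, n + 1):
            J[i][j] = J[i][j - 1] + (J[rho(i, j) - 1][j - 1] - J[i][j - 1]) * E[i][j]
    return J[n][m]
```

For a single task (n = 1), the recursion collapses to the usual "at least one processor succeeds" expression, which gives a quick sanity check.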
Reliability

Summary

Problem taxonomy (G: General policy, S: Strict policy; T: Transient, F: Fail-stop; m: monotonic, t: triangular):
GT, GF, GFm, ST, SF, SFm, SFt
Legend: dark gray = #P'-Complete; light gray = polynomial; white = open; blue = significant contribution; arrow = reduction (≤m).
[Figure: reduction diagram between the problem classes]
Precision

Outline
1. Context
2. Robustness
3. Reliability
4. Precision
   - Louis-Claude Canon, Emmanuel Jeannot and Jon Weissman, A Dynamic Approach for Characterizing Collusion in Desktop Grids, IEEE IPDPS, Atlanta, Georgia, April 2010.
5. Conclusion
Precision

Application and Platform Models

Application
The parallel application is specified by a set of independent tasks (t1, t2, t3, t4).

Platform
The parallel computing platform consists of a set of processors.
Processors are unrelated: the duration of each task is specific to the executing processor.
Machines are connected to a central server via the Internet.
Precision

Uncertainty

Uncertainty on Result Correctness
The results returned by each machine may be incorrect (epistemic uncertainty). 35% of SETI@home participants have given at least one incorrect result [Kondo et al., 2007]. These Byzantine faults are due to unreliability or to malicious behaviors (for credit increase).

Incorrectness Induces Redundancy
Each task can be assigned to several machines. One of the received results must be selected as the final one by a certification mechanism.
Precision

Threat Model

Collusion
Some machines produce the same incorrect result for a given task.
[Figure: for Task1, the non-colluders agree while the colluders collude; for Task2, no collusion occurs]

Colluding groups
Machines can be partitioned into several groups: machines in the same group always return the same result for a given task. Collusion occurs with a given probability. There may be cooperation between distinct colluding groups.
Precision

Characterization Problem

Objective
Study the machine behaviors:
- estimate the probability that any pair of machines gives the same incorrect result for the same task
- estimate the composition of the colluding groups

Inputs
Chronological succession of events:
- <d, p, t, r>: at time d, machine p finishes task t and returns result r
- <d, t>: at time d, task t finishes
Precision

Related Work

Scheduling
- [Zhao et al., 2005] quizzes
- [Domingues et al., 2007] intermediate verifications
- [Silaghi et al., 2009] use of a reputation system to detect colluders
- [Krings et al., 2005] a posteriori verification of results
- [Wong, 2005] no unreliability and verification of results

Reputation System
- [Kamvar et al., 2003] EigenTrust algorithm
- [Jøsang, 1999] subjective logic that allows reaching consensus
Precision

Interaction Model

Two Interaction Representations
Interaction between machines i and j:
- collusion: machines i and j collude together (collusion estimation c_ij)
- agreement: machines i and j agree together (agreement estimation a_ij)

Relations
  a_ij ≤ 1 + 2·c_ij − c_ii − c_jj
  c_ij ≤ (1 + a_ij − a_1i − a_1j) / 2   (given that the index of the largest group is 1)
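The second relation gives a directly computable upper bound on a collusion estimate from the observable agreement estimates. A minimal sketch (function name mine; the matrix uses 0-based indices, so the "largest group" is index 0 here rather than 1):

```python
def collusion_upper_bound(a, i, j, largest=0):
    """Upper bound on the collusion estimate c_ij from agreement estimates.

    a: symmetric matrix of pairwise agreement estimates a_ij in [0, 1];
    `largest` is the index of the largest group (assumed non-colluding
    majority). Implements c_ij <= (1 + a_ij - a_{1i} - a_{1j}) / 2
    from the slide, clamped to a valid probability.
    """
    bound = (1.0 + a[i][j] - a[largest][i] - a[largest][j]) / 2.0
    return max(0.0, min(1.0, bound))
```

Intuitively, two machines that both agree strongly with the majority group cannot collude much with each other, which is exactly what the bound expresses.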
Precision

Online Algorithm

Data structure
Nodes correspond to sets of machines. Edges correspond to interaction characteristics (agreement here).
[Figure: agreement graph whose nodes are progressively merged groups of machines]

Algorithm
Initially, each machine is in a singleton. Estimated groups are successively merged and split.
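To make the merge step concrete, here is a hypothetical sketch of one pass (the threshold, the function name and the group-level agreement callable are my assumptions, not the algorithm of the paper): two estimated groups whose agreement estimate is high enough are fused into one node.

```python
def merge_pass(groups, agreement, threshold=0.95):
    """One merge step of a (hypothetical) group-estimation pass.

    groups: list of frozensets of machine ids (the current graph nodes).
    agreement(g, h): estimated agreement between two groups (assumed callable).
    Fuses the first pair whose agreement exceeds `threshold`; a symmetric
    split step (not shown) would undo merges when estimates drop.
    """
    for x in range(len(groups)):
        for y in range(x + 1, len(groups)):
            if agreement(groups[x], groups[y]) > threshold:
                merged = groups[x] | groups[y]
                rest = [g for k, g in enumerate(groups) if k not in (x, y)]
                return rest + [merged]
    return groups  # nothing to merge
```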
Precision

Convergence
[Figure: collusion RMSD as a function of time (days) and of the collusion probability, showing the agreement/collusion convergence times and the agreement/collusion stabilized accuracies]

- Convergence time: time needed to achieve a desirable accuracy.
- Stabilized accuracy: accuracy achieved after a large number of events.
Conclusion

Outline
1. Context
2. Robustness
3. Reliability
4. Precision
5. Conclusion
Conclusion

Summary

Object                Nature          Origin              Type              Criterion    Problem
computation duration  methodological  hardware, software  optimization      robustness   R|prec|Cmax
computation success   aleatory        hardware            evaluation        reliability  R|prec|Cmax
result correctness    epistemic       software, human     characterization  precision    R|online-time-nclv|ΣCi
Conclusion

Results

Robustness
- comparison of robustness measures (selection of the standard deviation)
- multi-criteria methods that tackle robustness

Reliability
- taxonomy of reliability problems
- complexity classes of the corresponding evaluation problems

Precision
- strong model of adversity
- methods for estimating machine behaviors

Generic conclusion
- probabilistic dimension and multi-criteria formulation
Conclusion

Future Directions
- Defeat collusion using our proposed characterization system
- Develop online scheduling algorithms (relevant in case of high uncertainty)
- Explore other multi-criteria techniques (e.g., ε-constraint method)
- Propose tractable uncertainty models
- Design guaranteed algorithms
Conclusion

Questions
Thank you for your attention. Questions, comments, remarks, ...
Conclusion

Defeating the Collusion

Uncertainty-Related Criteria
- precision: proportion of correctly certified results
- overhead: average duplication ratio

Effect of the trace
[Figure: precision and overhead of BOINC vs. CAA on the (Overnet, 1000), (Microsoft, 1500) and (SETI@Home, 2000) trace files]
Conclusion

Literature on Collusion

Wong, An authentication protocol in Web-computing, IPDPS 2006
- Proposes a ring-based scheduling approach (duplication ratio of 2).
- Estimates the fraction and probability of collusion (given that non-colluding machines do not fail).
- Certifies the results based on a probabilistic analysis (using only the results of the active tasks).
- Extension with an audit mechanism that takes the estimation into account (better than random sampling).

Taufer, Anderson et al., Homogeneous Redundancy, IPDPS 2005
- Characterizes a divergent application: large numerical differences in the results generated by different machines. Fuzzy comparison is not sufficient.
- Homogeneous redundancy: assign tasks to numerically equivalent machines (same software and platform characteristics).
Conclusion

Multi-criteria Optimization
[Figure: Pareto dominance in the (f1, f2) criteria space with solutions o1–o4]
Conclusion

Schedule Size

Inputs
- Number of tasks: n
- Number of processors: m
- Duration of every possible execution (each task on each processor): wij
- Longest duration: W = max_ij wij
The size of the input is O(nm log W).

Schedule providing dates
- Largest date: O(nW) (when every task is scheduled on the slowest processor).
- With duplication, each task may be scheduled on each processor: there are O(nm) dates.
- A schedule requires O(nm log(nW)) space for encoding the dates.
Conclusion

Stochastic DAG Evaluation

Random variable types for which the evaluation is tractable, by graph structure:

Type          Chain                  Join                  Series-parallel
Discrete      regular domain         all                   regular domain
Non-discrete  normal, gamma, Erlang  exponential, Weibull  none