Bibliography for Storage Model

March 14, 2012

Contents

1 Analytical Model
2 Statistical Model
3 General Simulation
4 Parallel File System Simulation
5 Misc
6 TODO
7 Modeling Storage Devices from Simitci's Book
8 Modeling Storage Networks from Simitci's Book
9 Books

1 Analytical Model

[1, 2, 3]

References

[1] Robert Ross. Reactive Scheduling for Parallel I/O Systems. PhD thesis, Clemson University, 2000.

Proposes an analytical model for estimating read and write request times on a single I/O server in Appendix A. The model captures disk, network, and memory effects and assumes constant bandwidths. Each request time is divided into a setup time and a transfer time. The network and disk transfers overlap over some period, which is determined by the state of the memory: the larger the free memory, the larger this overlap. Finally, dirty page writeback interference is accounted for. Two potential errors: Equations A.7 to A.9 do not correctly formalize the overlap (when the fraction is one third, it means that half of the transfer occurs simultaneously on the network and on the disk), and the ability to cache entire requests does not improve overlapping. Contention is not taken into account in the case of multiple requests. An improved bandwidth model is proposed to capture the lower bandwidth of small requests (each bandwidth has two modes, slow and fast, and the transition between them is linear in request size).
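As a concrete reading of this decomposition, here is a minimal sketch (my own simplification, not Ross's Equations A.7 to A.9); the overlap fraction and the two-mode bandwidth shape are illustrative assumptions:

```python
def bandwidth(size, bw_slow, bw_fast, size_lo, size_hi):
    # Two-mode bandwidth: slow mode for small requests, fast mode for
    # large ones, with a transition linear in request size (assumed shape).
    if size <= size_lo:
        return bw_slow
    if size >= size_hi:
        return bw_fast
    frac = (size - size_lo) / (size_hi - size_lo)
    return bw_slow + frac * (bw_fast - bw_slow)

def request_time(size, net_bw, disk_bw, setup, overlap):
    # Setup time plus transfer time; 'overlap' in [0, 1] is the fraction
    # of the shorter transfer hidden behind the longer one (the larger
    # the free memory, the larger the overlap).
    t_net = size / net_bw
    t_disk = size / disk_bw
    return setup + max(t_net, t_disk) + (1.0 - overlap) * min(t_net, t_disk)
```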

[2] Huseyin Simitci. An analytical throughput model for bulk data transfers over wide area networks. In IEEE International Symposium on Cluster Computing and the Grid, CCGrid'04, pages 602–609, April 2004.

Proposes an analytical model, based on queueing theory, to predict transfer times between two remote sites. The model is a closed queueing network in which there is a fixed number of concurrent transfers. Their durations are determined by latencies and waiting periods at each network equipment. As there is no closed form for the expected duration of a transfer, a Mean Value Analysis is performed. Lower and upper bounds are also proposed. I was not able to reproduce the theoretical curves of Figures 6 and 7, even when trying to infer the missing values. Also, I thought of an approximation: equation 9 could contain the average of the S_i instead of their sum. This is an approximation because it assumes that each queue is equally filled, which is true only when their service times are equal. More details about queueing theory are in The Art of Computer Systems Performance Analysis by Jain (1991).
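For reference, the exact MVA recursion for such a closed network is short. This is the textbook algorithm (see Jain's book cited above), not necessarily the paper's exact formulation; the service demands are illustrative inputs:

```python
def mva(demands, n_transfers):
    # Exact Mean Value Analysis for a closed network of single-server
    # FCFS queues; demands[i] is the total service demand (visit ratio
    # times service time) at station i.
    queues = [0.0] * len(demands)          # mean queue lengths
    throughput, cycle = 0.0, 0.0
    for n in range(1, n_transfers + 1):
        # Residence time at each station seen by an arriving transfer.
        residence = [d * (1.0 + q) for d, q in zip(demands, queues)]
        cycle = sum(residence)             # expected transfer duration
        throughput = n / cycle
        queues = [throughput * r for r in residence]
    return throughput, cycle

# e.g. 8 concurrent transfers over three network stages:
# mva([0.010, 0.002, 0.030], 8)
```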

[3] Yuhui Deng and Frank Wang. Exploring the performance impact of stripe size on network attached storage systems. Journal of Systems Architecture, 54(8):787–796, 2008.

Studies SANs that are based on RAID (which aggregates the performance, storage capacity, and reliability of several disks). Proposes an analytical performance model for determining the time to read or write data. Varki et al. suggest modeling only the cache and the disks to predict SAN performance. Overall, the paper's presentation is very clean, and it is interesting for its discussion of several notions: sub-command combination (aggregating several reads or writes in order to reposition the disk drive only once), the augmented storage interface (for accessing the size of a track), and scatter/gather (which allows DMA with non-contiguous data in the disk cache). These principles are used to motivate the paper's thesis: stripe size has a negligible performance impact. The analytical model that supports this hypothesis relies on basic mathematics. The empirical study does not validate the model as such, only its consequence (the hypothesis that stripe size has little impact). The I/O benchmark is Bonnie++ and mostly focuses on small files (90% of the requests); however, their impact on the performance can still be negligible. Additionally, the statistical dispersion is not assessed, and it is not clear that a maximum of 2% CPU utilization indicates that the CPU has a negligible performance impact. The study is tied to disk drive effects; what about solid-state drives?
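A back-of-the-envelope version of [3]'s stripe-size argument, in my notation and with my simplifications rather than the paper's model: with sub-command combination repositioning each drive only once, a request of size $R$ striped over $N$ drives of bandwidth $b$ costs roughly

$$T(R) \approx t_{pos} + \frac{R}{N\,b},$$

where $t_{pos}$ is the single repositioning time per drive. The stripe size $s$ does not appear, as long as $R \ge N s$ keeps all drives busy.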

2 Statistical Model

[1, 2, 3]

References

[1] Jay Lofstead, Fang Zheng, Qing Liu, Scott Klasky, Ron Oldfield, Todd Kordenbrock, Karsten Schwan, and Matthew Wolf. Managing variability in the I/O performance of petascale storage systems. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC '10, pages 1–12. ACM, November 2010.

Characterizes a performance issue due to I/O interferences in HPC. Internal interferences occur when several processes of the same application access the same storage resource (inducing contention). External interferences are related to other concurrent applications. The paper describes several experiments that characterize these interferences on 3 machines (among which Jaguar and Franklin) using some applications and the IOR benchmark. When the load increases, congestion is observed (the aggregated bandwidth decreases). These interferences cause imbalance in the usage of the storage system. The considered applications are assumed to consist of a set of processes, each writing periodically to a file on a specific storage node. To tackle the imbalance problem, the proposed mechanism distributes the file over the available nodes when necessary. The proposed adaptive writes are thus standard writes that are performed on less loaded storage resources than where they were initially supposed to be performed.
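A minimal sketch of that adaptive-write decision (the interface and the imbalance threshold are hypothetical; the paper's mechanism lives inside the I/O middleware):

```python
def write_target(assigned, nodes, load, imbalance=2.0):
    # Adaptive write: keep the statically assigned storage node unless it
    # is far more loaded than the least loaded node, in which case the
    # (otherwise standard) write is redirected there.
    best = min(nodes, key=lambda n: load[n])
    if load[assigned] > imbalance * max(load[best], 1e-9):
        return best
    return assigned
```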

[2] Mario Lassnig, Thomas Fahringer, Vincent Garonne, Angelos Molfetas, and Miguel Branco. Identification, modelling and prediction of non-periodic bursts in workloads. In IEEE International Symposium on Cluster Computing and the Grid, CCGrid'10, pages 485–494, May 2010.

Provides empirical evidence showing that classical metrics are not sensitive to incorrect predictions of bursts in batch environments. Proposes a prediction method for non-periodic bursts and a metric to assess the quality of such methods. As stated in the conclusion, the work is preliminary and its goal was to give a proof of concept and to establish lower bounds on which to improve. Some flaws in the content and the presentation: the proposed method is hard to understand and is explained only textually. Algorithm 2 and its description do not match (the text says that superfluous predictions are penalized, whereas lines 15 to 19 of Algorithm 2 concern identified bursts). Algorithm 2's inputs are not clean (useless parameters and raw data): two lists would be sufficient, the predicted bursts and the identified bursts. It is unclear whether the proposed method is missing from the final evaluation. Moreover, the proposed metric for evaluating burst predictions leads to many inconclusive values (its usefulness is questionable). There is a fundamental question: how can non-periodic bursts be predicted? The paper assumes that any burst occurs after some precondition (the durations between bursts are similar), which seems related to some periodic behavior (a contradiction?). Even though the work is still in early development, the presentation provides enough elements to allow an independent continuation (and the authors invite researchers to use their DQ2 trace). Mentions mass-storage systems (CASTOR, dCache, DPM). Several good general ideas: instead of predicting bursts, estimate the probability that a burst will occur; RMSE is close to MAE when there is no error concerning outliers; because of time constraints, do slow science and present partial results rather than hastening towards poor contributions.
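To make the "two lists would be sufficient" remark concrete, here is a sketch of a burst-prediction score over exactly those two lists (an illustrative metric of my own, not the paper's Algorithm 2):

```python
def burst_score(predicted, identified, tolerance):
    # A prediction is a hit if an identified burst occurs within
    # 'tolerance' time units of it; superfluous predictions and missed
    # bursts both lower the score.
    hits = {p for p in predicted
            if any(abs(p - b) <= tolerance for b in identified)}
    superfluous = len(predicted) - len(hits)
    missed = sum(1 for b in identified
                 if all(abs(p - b) > tolerance for p in predicted))
    denom = len(hits) + superfluous + missed
    return len(hits) / denom if denom else 1.0
```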

[3] Michael P. Kasick, Jiaqi Tan, Rajeev Gandhi, and Priya Narasimhan. Black-box problem diagnosis in parallel file systems. In USENIX Conference on File and Storage Technologies, FAST'10, February 2010.

Rather an empirical approach to fault diagnosis than any statistical modelling. Proposes an approach to detect and diagnose faults in stripe-based parallel file systems, i.e., those in which any file is distributed over all the servers (PVFS, Lustre). Diagnosis differs from detection in that they want to identify the fault cause among the following set: detectable increased disk load (e.g., updatedb); undetectable increased disk load; detectable increased network load (a backup process); undetectable increased network load (packet loss). The approach assumes that performance requirements are unknown (black-box method). To achieve this goal, a set of metrics about network and I/O is studied. The hypothesis is that in stripe-based systems, any fault will induce an asymmetry among the data servers. The final solution is based on a succession of thresholds (between five and ten) and a training phase, which makes its applicability questionable. Also, the experiments use at most 10 clients and 12 servers, and Ganesha is not compared against for the detection part. Uses several benchmarks: dd (only one file, without I/O from clients), IOzone, and PostMark (many small files, stressing the metadata servers). Despite its length, the article is exemplary in its presentation: the hypothesis on which the solution is based is emphasized (and supporting observations are provided), the assumptions clearly identify the applicative context, many discussions provide a deep understanding of the phenomena, and extensive details are given about the experiments (even the disk and network card characteristics). Note the concept of read-ahead for Lustre (reading the next stripes from the other servers even when one read is still pending). Also, TroveMethod is set to directio in the PVFS configuration. Mentions the Kullback-Leibler divergence.
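The asymmetry hypothesis lends itself to a simple peer-comparison check; a sketch with a single metric and a single threshold, whereas the actual diagnosis layers five to ten thresholds over several metrics:

```python
from statistics import median

def suspect_servers(metric, threshold=0.25):
    # 'metric' maps each data server to, e.g., its smoothed disk or
    # network load; in a fault-free stripe-based file system these should
    # be roughly symmetric, so large deviations from the peer median are
    # flagged as potential faults (threshold value is illustrative).
    med = median(metric.values())
    return [s for s, v in metric.items()
            if abs(v - med) > threshold * max(med, 1e-9)]
```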

3 General Simulation

[1, 2]

References

[1] Garrett R. Yaun, David Bauer, Harshad L. Bhutada, Christopher D. Carothers, Murat Yuksel, and Shivkumar Kalyanaraman. Large-scale network simulation techniques: Examples of TCP and OSPF models. ACM SIGCOMM Computer Communication Review, 33(3):27–41, July 2003.

Presents the ROSS and ROSS.Net simulators, which are based on optimistic parallel techniques: simulation objects are allowed to process events independently and can undo computation when a causality error occurs (reverse computation). The system is stated to exploit parallelism well.
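The reverse-computation idea in miniature (a toy state in Python; ROSS itself is a C library): every event handler is paired with a reverse handler that undoes it exactly, so a speculatively processed event can be rolled back without checkpointing the whole state.

```python
class Counter:
    # Toy logical-process state for optimistic simulation.
    def __init__(self):
        self.value = 0

    def handle(self, delta):
        # Forward execution of an event, possibly speculative.
        self.value += delta

    def reverse(self, delta):
        # Rollback after a causality error: undo the forward handler.
        self.value -= delta
```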


[2] A. F. Rodrigues, K. S. Hemmert, B. W. Barrett, C. Kersey, Ron Oldfield, M. Weston, R. Risen, J. Cook, P. Rosenfeld, E. Cooper-Balis, and B. Jacob. The structural simulation toolkit. ACM SIGMETRICS Performance Evaluation Review, 38(4):37–42, March 2011.

SST includes a collection of hardware component models, including processors, disks, memories, and networks at different levels of accuracy. SST uses a parallel, component-based discrete event simulation core based on MPI. Users are able to leverage the multi-scale nature of SST by trading off accuracy, complexity, and time to solution.

4 Parallel File System Simulation

[1, 2, 3]

References

[1] Ning Liu, Christopher D. Carothers, Jason Cope, Philip Carns, Robert Ross, Adam Crume, and Carlos Maltzahn. Modeling a leadership-scale storage system. In International Conference on Parallel Processing and Applied Mathematics, PPAM'11, September 2011.

Discrete-event simulator of a storage system whose multi-layer architecture relies on PVFS. While details are given about the architecture, few are given regarding the simulation techniques: each element is modeled as a logical process that consists of three buffers (incoming, outgoing, and processing). Buffers are connected according to the simulated network, and the time to process each entry is not defined. It is based on the discrete-event simulator ROSS. The simulator is shown to be unable to capture network contention, despite some suggestions for extending the model.
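A sketch of that logical-process structure; the per-entry processing time, left undefined in the paper, appears here as an explicit assumed parameter:

```python
from collections import deque

class StorageElement:
    # Each simulated element is a logical process with incoming,
    # processing, and outgoing buffers; entries move one hop per step.
    def __init__(self, service_time=1.0):
        self.incoming = deque()
        self.processing = deque()
        self.outgoing = deque()
        self.service_time = service_time   # assumed per-entry time
        self.clock = 0.0

    def step(self):
        # Advance the pipeline by one service slot.
        if self.processing:
            self.outgoing.append(self.processing.popleft())
        if self.incoming:
            self.processing.append(self.incoming.popleft())
        self.clock += self.service_time
```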

[2] E. Molina-Estolano, Carlos Maltzahn, J. Bent, and S. A. Brandt. Building a parallel file system simulator. Journal of Physics: Conference Series, 180(1), 2009.

Presents the IMPIOUS simulator for fast evaluation of parallel file system designs. Does not present the specifics of the underlying models. The parameters of the storage system model include the data placement strategy (round robin, randomized, biased towards emptier nodes), the resource locking protocol, the redundancy mechanism, the client buffer cache (synchronous or not, pages or arbitrary extents), the number of clients, the number of data providers, the page size, and the network and disk models. The model is instantiated for several parallel file systems: PVFS, PanFS, and Ceph. On each machine of the storage system, a NaiveFS model is used on top of Disksim. Their experiments make use of the PatternIO benchmark. While the simulations are considerably imprecise, the trends are correctly reported by the proposed simulator.
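The three placement strategies listed above, sketched; the bias toward emptier nodes is one plausible interpretation, here proportional to free space:

```python
import random

def place(block_id, nodes, free_space, strategy):
    # Returns the data server that should store the given block.
    if strategy == "round_robin":
        return nodes[block_id % len(nodes)]
    if strategy == "randomized":
        return random.choice(nodes)
    # "biased": favor emptier nodes, with probability proportional
    # to their remaining free space (assumed weighting).
    weights = [free_space[n] for n in nodes]
    return random.choices(nodes, weights=weights, k=1)[0]
```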

[3] Bradley Settlemyer. A Study of Client-side Caching in Parallel File Systems. PhD thesis, Clemson University, 2009.

PhD thesis on caching for parallel I/O. Focused on a model designed for a specific architecture (PVFS). It presents the Hecios simulator for validating caching strategies (for small messages, the efficiency is limited by network and disk latencies). Hecios is based on INET from OMNeT++ for simulating TCP transfers.

5 Misc

[1]

References

[1] Jinoh Kim, A. Chandra, and Jon B. Weissman. Passive network performance estimation for large-scale, data-intensive computing. IEEE Transactions on Parallel and Distributed Systems, 22(8):1365–1373, 2011.

Proposes a system for disseminating information with the objective of predicting network performance in a distributed system. The prediction is based on prior measurements, and the approach is passive (no explicit additional measurements). Instead of performing point-to-point measurements between each pair of nodes, the system propagates all relevant information and indirectly infers the estimation when possible. The dissemination approach is the core contribution and is optimized with two mechanisms: critical measures are disseminated immediately, while others are buffered and sent together later; and confirming measures are discarded. The approach is validated with PlanetLab traces, GridFTP workloads, and S3 data traces.
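A sketch of those two dissemination mechanisms; the relative-change thresholds are illustrative, not the paper's:

```python
def classify_measure(new, estimate, critical=0.5, confirming=0.05):
    # Critical measures (large change) are disseminated immediately,
    # confirming measures (negligible change) are discarded, and the
    # rest are buffered to be sent together later.
    change = abs(new - estimate) / max(abs(estimate), 1e-9)
    if change >= critical:
        return "send_now"
    if change <= confirming:
        return "discard"
    return "buffer"
```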

6 TODO

[1, 2, 3] [4, 5, 6, 7, 8, 9] [10, 11, 12, 13] [14, 15, 16, 17, 18, 19]

References

[1] Pedro Velho, Lucas Schnorr, Henri Casanova, and Arnaud Legrand. Flow-level network models: have we reached the limits? Research Report RR-7821, INRIA, November 2011.

[2] Pedro Velho and Arnaud Legrand. Accuracy study and improvement of network simulation in the SimGrid framework. In International Conference on Simulation Tools and Techniques, SIMUTools '09, pages 13:1–13:10, Brussels, Belgium, 2009. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering).

[3] Dror G. Feitelson. Workload modeling for performance evaluation. In Performance Evaluation of Complex Systems: Techniques and Tools, pages 114–141, London, UK, 2002. Springer-Verlag.

[4] Abhinav Bhatele and Laxmikant V. Kalé. Quantifying network contention on large parallel machines. Parallel Processing Letters, 19(4):553–572, 2009.

[5] Li Ou, Xubin He, S. L. Scott, Zhiyong Xu, and Yung-Chin Fang. Design and evaluation of a high performance parallel file system. In IEEE Conference on Local Computer Networks, pages 100–107, November 2005.

[6] Huseyin Simitci and Daniel A. Reed. Adaptive disk striping for parallel input/output. In IEEE Symposium on Mass Storage Systems, pages 88–102. IEEE, 1999.

[7] Mohd Nazri Ismail and Abdullah Mohd Zin. Comparing the accuracy of simulation model with local area network, wide area network and test-bed for remote data transfers measurement. International Journal of Soft Computing Applications, 3:164–186, June 2008.

[8] William T. C. Kramer and Clint Ryan. Performance variability of highly parallel architectures. In International Conference on Computational Science, ICCS'03, pages 560–569, Berlin, Heidelberg, 2003. Springer-Verlag.

[9] Lavanya Ramakrishnan and Daniel A. Reed. Performability modeling for scheduling and fault tolerance strategies for scientific workflows. In International Symposium on High Performance Distributed Computing, HPDC '08, pages 23–34, New York, NY, USA, 2008. ACM.

[10] Nancy Tran and Daniel A. Reed. Automatic ARIMA time series modeling for adaptive I/O prefetching. IEEE Transactions on Parallel and Distributed Systems, 15(4):362–377, 2004.

[11] Yonggang Liu and Renato Figueiredo. Towards simulation of parallel file system scheduling algorithms with PFSsim. In IEEE International Workshop on Storage Network Architecture and Parallel I/O, May 2011.

[12] Nisheeth K. Vishnoi. The impact of noise on the scaling of collectives: the nearest neighbor model. In International Conference on High Performance Computing, HiPC'07, pages 476–487, Berlin, Heidelberg, 2007. Springer-Verlag.

[13] Gengbin Zheng, Gagan Gupta, Eric Bohm, Isaac Dooley, and Laxmikant V. Kalé. Simulating large scale parallel applications using statistical models for sequential execution blocks. In IEEE International Conference on Parallel and Distributed Systems, ICPADS '10, pages 221–228, Washington, DC, USA, 2010. IEEE Computer Society.

[14] Mario Lassnig, Thomas Fahringer, Vincent Garonne, Angelos Molfetas, and Martin Barisits. A similarity measure for time, frequency, and dependencies in large-scale workloads. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC '11, pages 43:1–43:11, New York, NY, USA, November 2011. ACM.

[15] Hai Nguyen and Amy Apon. Hierarchical performance measurement and modeling of the Linux file system. In International Conference on Performance Engineering, ICPE '11, pages 73–84, New York, NY, USA, 2011. ACM.

[16] Kalyan S. Perumalla. µπ: a scalable and transparent system for simulating MPI programs. In International Conference on Simulation Tools and Techniques, SIMUTools '10, pages 62:1–62:6, Brussels, Belgium, 2010. ICST (Institute for Computer Sciences, Social-Informatics and Telecommunications Engineering).

[17] Steve C. Chiu, Wei-keng Liao, and Alok N. Choudhary. Distributed smart disks for I/O-intensive workloads on switched interconnects. Future Generation Computer Systems, 22(5):643–656, April 2006.

[18] Mustafa Uysal, Guillermo A. Alvarez, and Arif Merchant. A modular, analytical throughput model for modern disk arrays. In International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems, MASCOTS '01, pages 183–193, Washington, DC, USA, 2001. IEEE Computer Society.

[19] Prabu Dorairaj, Devi Prasad Bhukya, and Lakshminarayana Prasad Kantam. Performance tuning of storage system using design of experiments.

7 Modeling Storage Devices from Simitci's Book

[1, 2, 3, 4, 5, 6]

References

[1] Elizabeth Shriver, Arif Merchant, and John Wilkes. An analytic behavior model for disk drives with readahead caches and request reordering. ACM SIGMETRICS Performance Evaluation Review, 26(1):182–191, June 1998.

[2] Elizabeth Varki. Response time analysis of parallel computer and storage systems. IEEE Transactions on Parallel and Distributed Systems, 12(11):1146–1161, November 2001.

[3] Daniel A. Menascé, Odysseas I. Pentakalos, and Yelena Yesha. An analytic model of hierarchical mass storage systems with network-attached storage devices. ACM SIGMETRICS Performance Evaluation Review, 24(1):180–189, May 1996.

[4] Edward K. Lee and Randy H. Katz. An analytic performance model of disk arrays. ACM SIGMETRICS Performance Evaluation Review, 21(1):98–109, June 1993.

[5] Michelle Y. Kim and Asser N. Tantawi. Asynchronous disk interleaving: Approximating access delays. IEEE Transactions on Computers, 40(7):801–810, July 1991.

[6] Bruce L. Jacob, Peter M. Chen, Seth R. Silverman, and Trevor N. Mudge. An analytical model for designing memory hierarchies. IEEE Transactions on Computers, 45(10):1180–1194, October 1996.


8 Modeling Storage Networks from Simitci's Book

[1, 2, 3] Check IEEE Conference on Massive Data Storage.

References

[1] Jitendra Padhye, Victor Firoiu, Don Towsley, and Jim Kurose. Modeling TCP throughput: a simple model and its empirical validation. ACM SIGCOMM Computer Communication Review, 28(4):303–314, October 1998.

[2] K. Voruganti and P. Sarkar. An analysis of three gigabit networking protocols for storage area networks. In IEEE International Conference on Performance, Computing and Communications, pages 259–265, April 2001.

[3] Yao-Long Zhu, Shu-Yu Zhu, and Hui Xiong. Performance analysis and testing of the storage area network. In IEEE Symposium on Massive Data Storage, USA, April 2002.

9 Books

[1, 2, 3, 4]

References

[1] Raj Jain. The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling. Wiley-Interscience, New York, NY, April 1991.

[2] Douglas C. Montgomery. Design and Analysis of Experiments. Wiley, 8th edition, April 2012.

[3] Erol Gelenbe and Guy Pujolle. Introduction to Queueing Networks. Wiley, 2nd edition, July 1998.

[4] Huseyin Simitci. Storage Network Performance Analysis. Wiley, 2003.
