Article

Article title STOCHASTIC SIMULATION AND INDICES ESTIMATIONS OF STRUCTURAL REDUNDANCY OF LARGE-SCALE COMPUTER SYSTEMS
Authors V.A. Pavsky, K.V. Pavsky
Section SECTION I. PRINCIPLES OF CONSTRUCTION, ARCHITECTURE AND HARDWARE BASE SUPERCOMPUTERS
Month, Year 12, 2014
Index UDC 681.324, 519.21
Abstract A mathematical model for estimating the reliability of distributed computer systems (CS) operating with a reserve is constructed using methods of queueing theory. The model is formalized as a system of differential equations. Failure statistics for cluster CSs suggest that the time between failures is best described by a Weibull distribution with shape parameter values of 0.73 and 0.78. However, the model with these parameters is laborious and has no analytical solution, whereas an analytical solution is possible for a shape parameter value of 1 (exponential distribution). This analytical solution, which allows reliability indices to be calculated, is obtained. The functional dependence of the probability of low performance of the computer system on the reserve size is found, and estimates for this probability are given. A calculation of the mathematical expectation and variance of the number of failed machines is proposed. The formulas are derived by methods that yield a system of equations for the moments without finding the state probabilities. The formulas and their estimates are suitable for reverse engineering. The results of the analytical modeling are confirmed by simulation modeling.
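
The contrast between the Weibull shape values quoted in the abstract (0.73, 0.78) and the exponential case (shape 1) can be illustrated with a minimal sampling sketch. This is not the authors' model or their simulation: the mean time between failures, the threshold, the sample size, and all function names are hypothetical choices made for illustration. The sketch only relies on the standard fact that a Weibull distribution with scale s and shape k has mean s·Γ(1 + 1/k), so that all three shapes can be compared at the same mean.

```python
import math
import random


def weibull_scale_for_mean(mean, shape):
    """Scale s such that a Weibull(s, shape) variable has the given mean.

    Uses E[T] = s * Gamma(1 + 1/shape). For shape == 1 this returns `mean`,
    which is the exponential case mentioned in the abstract.
    """
    return mean / math.gamma(1.0 + 1.0 / shape)


def sample_times_between_failures(mean, shape, n, rng):
    """Draw n times between failures with the requested mean and shape."""
    scale = weibull_scale_for_mean(mean, shape)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
    return [rng.weibullvariate(scale, shape) for _ in range(n)]


def early_failure_fraction(samples, threshold):
    """Fraction of sampled intervals shorter than `threshold`."""
    return sum(t < threshold for t in samples) / len(samples)


if __name__ == "__main__":
    rng = random.Random(2014)   # fixed seed, for repeatability only
    mean_tbf = 100.0            # hypothetical mean time between failures
    threshold = 10.0            # hypothetical "short interval" cutoff
    for shape in (0.73, 0.78, 1.0):
        samples = sample_times_between_failures(mean_tbf, shape, 100_000, rng)
        empirical_mean = sum(samples) / len(samples)
        print(shape, round(empirical_mean, 1),
              round(early_failure_fraction(samples, threshold), 3))
```

At equal mean, shape values below 1 put more probability on short intervals between failures than the exponential case does, which is one reason the two assumptions can lead to different reliability estimates.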

Keywords Distributed computer systems; reserve; mathematical models; reliability; indices estimations; Weibull distribution; analysis.
References 1. Top500 Supercomputer Sites. Available at: http://www.top500.org (Accessed 10 November 2014).
2. Nikolic S. High Performance Computing Directions: The Drive to ExaScale Computing, Trudy Mezhdunarodnoy nauchnoy konferentsii “Parallel'nye vychislitel'nye tekhnologii (PaVT’2012)” [Proceedings of the International scientific conference “Parallel computing technologies (PaVT’2012)”]. Novosibirsk, 2012. Available at: http://pavt.susu.ru/2012/talks/Nikolic.pdf (Accessed 10 November 2014).
3. Khoroshevskiy V.G. Arkhitektura vychislitel'nykh sistem [Architecture of computing systems]. Moscow: MGTU im. Baumana, 2008, 520 p.
4. Pavskiy V.A., Pavskiy K.V., Khoroshevskiy V.G. Matematicheskaya model' i raschet pokazateley funktsionirovaniya vychislitel'nykh sistem so strukturnoy izbytochnost'yu [Mathematical model and calculation of indices of computer systems functioning with structural redundancy], Izvestiya YuFU. Tekhnicheskie nauki [Izvestiya SFedU. Engineering Sciences], 2012, No. 5 (130), pp. 37-41.
5. Saati T.L. Elementy teorii massovogo obsluzhivaniya i ee prilozheniya [Elements of queueing theory and its applications]. 3rd ed. Moscow: Knizhnyy dom «LIBROKOM», 2010, 520 p.
6. Khoroshevskiy V.G., Pavskiy V.A., Pavskiy K.V. Raschet pokazateley zhivuchesti raspredelennykh vychislitel'nykh sistem [Calculation of survivability indices of distributed computing systems], Vestnik Tomskogo gosudarstvennogo universiteta. Upravlenie, vychislitel'naya tekhnika i informatika [Vestnik of Tomsk State University. Control, computer engineering and informatics], 2011, No. 2 (15), pp. 81-88.
7. Ovcharov L.A. Prikladnye zadachi teorii massovogo obsluzhivaniya [Applied problems of queueing theory]. Moscow: Mashinostroenie, 1969, 324 p.
8. Schroeder B., Gibson G.A. A large-scale study of failures in high-performance computing systems, Proceedings of the International Conference on Dependable Systems and Networks (DSN2006), Philadelphia, PA, USA, June 25-28, 2006, 10 p.
9. Analyzing failure data. Available at: http://www.pdl.cmu.edu/FailureData/ (Accessed 10 November 2014).
