Authors V.V. Getmanskiy, E.O. Movchan, A.E. Andreev
Month, Year 11, 2016
Index UDC 004.272.32, 519.683.4, 519.683.8
DOI 10.18522/2311-3103-2016-11-2739
Abstract This paper considers one way to speed up multibody system dynamics simulation: optimizing the code for CPU architectures with vector registers. The problem is relevant because a large number of computations must be performed in a short time. The mathematical formulation of the problem and the computational algorithm are described. Efficient code is developed for a dynamic stress-strain solver for bodies in a complex mechanism. The code runs on processors that support the SIMD operations of the SSE, AVX, FMA and KNC (Xeon Phi Knights Corner) instruction set extensions. The discrete element method is used as one implementation of multibody system dynamics simulation. The most computationally expensive parts of the code are the right-hand side calculation using three-dimensional matrix-vector transformations, the calculation of Euler angles and rotation matrices, and numerical integration by the fourth-order Runge-Kutta method. The computational algorithm has limited scalability under parallel computing because of strong data dependencies between parallel code branches, so code optimization is important for speeding up the computations. A special data format for storing matrices and vectors in memory and efficient vectorization of matrix-vector operations are considered. Block multiplication is developed for matrices and vectors whose dimension exceeds the vector register length. When the dimension of the matrices and vectors is smaller than the vector register length (single-precision floating-point data for AVX and double-precision data for KNC), special microalgorithms are developed that pack several matrix rows and vectors into a register with element permutation. The microalgorithms are implemented using intrinsic functions for each vector instruction set. Vectorization yields a speedup of up to 3 times. The computation time of the intrinsics-based implementation is compared with that of compiler auto-vectorization.
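As an illustration of vectorizing a small matrix-vector product with intrinsic functions, the following is a minimal sketch in C. It is not the authors' microalgorithm: the padded column-major layout, the `Mat3Padded` type and the `mat3_mul_vec` function are assumptions introduced here for illustration only. The sketch uses SSE intrinsics when the compiler reports SSE support and falls back to an equivalent scalar loop otherwise.

```c
#include <assert.h>

#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_SSE 1
#else
#define HAVE_SSE 0
#endif

/* Hypothetical storage format (an assumption, not taken from the paper):
 * a 3x3 matrix kept column by column, each column padded to 4 floats so
 * that one column fills one 128-bit SSE register. The padding lane is 0. */
typedef struct {
    float col[3][4];
} Mat3Padded;

/* y = A * x for a padded 3x3 matrix.
 * x and y point to 4 floats; only the first 3 lanes carry data.
 * The product is formed as x[0]*col0 + x[1]*col1 + x[2]*col2, which needs
 * no horizontal additions inside a register. */
void mat3_mul_vec(const Mat3Padded *a, const float x[4], float y[4])
{
    assert(a != 0 && x != 0 && y != 0);
#if HAVE_SSE
    /* Broadcast each component of x and scale the matching column. */
    __m128 acc = _mm_mul_ps(_mm_loadu_ps(a->col[0]), _mm_set1_ps(x[0]));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a->col[1]), _mm_set1_ps(x[1])));
    acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(a->col[2]), _mm_set1_ps(x[2])));
    _mm_storeu_ps(y, acc);
#else
    /* Scalar fallback with the same order of operations. */
    for (int i = 0; i < 4; ++i) {
        y[i] = a->col[0][i] * x[0] + a->col[1][i] * x[1] + a->col[2][i] * x[2];
    }
#endif
}
```

For example, storing a rotation by 90 degrees about the z axis as the columns (0, 1, 0), (-1, 0, 0) and (0, 0, 1) and applying it to the vector (1, 2, 3) gives (-2, 1, 3). The unaligned load and store intrinsics are used so that the sketch does not depend on how the caller allocates memory; an aligned layout would allow `_mm_load_ps` and `_mm_store_ps` instead.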

Keywords Auto-vectorization; code optimization; intrinsic; vector registers; multibody dynamics; SIMD.
