Flexible Aerodynamic Solver Technology in an HPC environment
I. Mary, N. Alferez, J.M. Legouez
Computational Fluid Dynamics Department

Outline
• A few words about ONERA (Office National d'Études et de Recherches Aérospatiales)
• HPC for direct numerical simulation of turbulence
• Application: dynamic stall of rotor blades

ONERA: a state research laboratory dedicated to aerospace
• Prospective, long-term research
• Expert advisor to the government
• Innovative solutions for industry

A fleet of test facilities unrivaled in Europe
• 150 experimental test rigs and dedicated metrology systems
• Combustion, aeroelasticity, optics, instrumentation and sensing, space environment

Europe's leading center of expertise in large wind tunnels
• Global clientele
• Half of the European fleet
• Key resources for Airbus and Dassault
• 50 years of working for industry
• Speed envelope from Mach 0.1 to Mach 20
• Research/experimentation synergies and integration

ONERA: close to our partners
[Map of ONERA sites and their industrial neighbours: Ile-de-France, 1,275 employees (Meudon, Châtillon, Palaiseau; Dassault, EADS, Thales, MBDA, SNPE, Safran, Total, Astech aerospace cluster, Université Paris Saclay); Nord-Pas-de-Calais, 91 employees (Lille); Midi-Pyrénées, 453 employees (Toulouse, Fauga-Mauzac; Airbus, Thales, Dassault, EADS Astrium, Thales Alenia Space, Safran, Total, Aerospace Valley); Rhône-Alpes, 162 employees (Modane-Avrieux, large wind tunnels); Provence-Alpes-Côte d'Azur, 48 employees (Salon de Provence; Airbus Helicopters, Dassault, Pegase cluster); plus Brussels (EU, EDA).]
A balanced business portfolio: 1/3 civil, 1/3 defense, 1/3 dual-use.

Two legacy codes, CEDRE and elsA, used for CFD by Airbus & Safran

elsA: multi-purpose CFD simulation platform
• Internal and external aerodynamics, from low subsonic to high supersonic
• Compressible 3-D Navier-Stokes equations, moving deformable bodies
• Aircraft, helicopters, turbomachinery, CROR, missiles, launchers…
Design and implementation:
→ Object-oriented
→ Kernel in C++/Fortran, millions of lines
→ User interface in Python
→ Python-CGNS interface for CGNS extraction and coupling with external software
→ CPU and parallel efficiency on a large panel of computer platforms
A new code was needed to work on the HPC implementation without the constraints of a large legacy code: the FAST project.

Flexible Aerodynamic Solver Technology: FAST
• Components, all built on CGNS/Python with C and Fortran kernels: a transfer component, the CASSIOPEE pre/post-processing toolbox, co-processing, a mesh generation and adaptation component, and HPC "interior points" fluid solvers on structured and on unstructured grids; elsA, Cèdre and Nastran are coupled as black boxes
• Python is used for user scripting and as glue between the pre/post-processing and solver modules
• The CGNS/Python standard is used for data representation in memory: no data copies are needed between modules (a minimal sketch of this zero-copy layering is given below)
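To make the "no data copy" point concrete, here is a minimal, hypothetical sketch of this three-layer pattern (not FAST code; the module name fastdemo and the function scale_field are invented): the Python layer owns the arrays stored in the CGNS/Python tree, and a C kernel modifies them in place through the buffer protocol, so no data crosses the layers by copy.

    /* fastdemo.c -- hypothetical sketch of the Python-glue / C-kernel layering.
     * Build, assuming CPython headers are installed:
     *   cc -shared -fPIC $(python3-config --includes) fastdemo.c -o fastdemo.so
     */
    #define PY_SSIZE_T_CLEAN
    #include <Python.h>

    /* Scale a contiguous float64 field in place (no copy of the data). */
    static PyObject *scale_field(PyObject *self, PyObject *args)
    {
        Py_buffer buf;
        double factor;
        /* "w*" requests a writable buffer: a pointer into the caller's memory */
        if (!PyArg_ParseTuple(args, "w*d", &buf, &factor))
            return NULL;
        double *a = (double *)buf.buf;            /* assumes C-contiguous float64 */
        Py_ssize_t n = buf.len / (Py_ssize_t)sizeof(double);
        for (Py_ssize_t i = 0; i < n; ++i)
            a[i] *= factor;                       /* in-place update */
        PyBuffer_Release(&buf);
        Py_RETURN_NONE;
    }

    static PyMethodDef methods[] = {
        {"scale_field", scale_field, METH_VARARGS, "Scale a float64 array in place."},
        {NULL, NULL, 0, NULL}
    };

    static struct PyModuleDef moduledef = {
        PyModuleDef_HEAD_INIT, "fastdemo", NULL, -1, methods
    };

    PyMODINIT_FUNC PyInit_fastdemo(void) { return PyModule_Create(&moduledef); }

A numpy array held in a CGNS/Python tree node exposes this buffer protocol directly, so a call like fastdemo.scale_field(array, 2.0) works on the tree's memory without any intermediate copy.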
HPC required for turbulence modelling and optimization
Cutoff wave number k_c of the resolved spatial scales:
• DNS: k_c > k_Kolmogorov, all scales resolved
• LES: k_energy-containing < k_c < k_Kolmogorov
• RANS: k_c → 0, all scales modelled
Full airplane simulation:
• RANS is affordable nowadays, but a shape optimization process requires thousands of simulations
• DNS affordable around 2050, if Moore's law still holds

High fidelity flow simulation
Need for high-fidelity data to describe the turbulence finely.
• Main applications:
- prediction of sources for aeroacoustics
- comprehension of physical phenomena
- databases for the development of turbulence models
• Complementarity between experiments and numerical simulation (LES or DNS): zones where measurements are difficult, confinement effects
• Some recent collaborative examples with experimental teams: vortex breakdown, tail-shake interaction, slat noise sources, impinging hot jet

High fidelity flow simulation
Choice of the mathematical model for the fluid problem:
• Compressible Navier-Stokes equations: Newtonian fluid, perfect gas
• Turbulence modelling: DNS, LES, hybrid RANS/LES or RANS/DNS
Key points for LES/DNS simulations, in order of importance:
• Mesh resolution (drives 70% of the flow solution)
• Low-dissipation convective scheme
• Physical duration of the simulation (often too short due to CPU limitations)
• SGS model (weak influence if the mesh is adequate, hazardous otherwise…)
The success of LES/DNS relies mainly on the capacity to solve a huge number of degrees of freedom, which requires large supercomputer resources and efficient solvers.

Example of DNS resolution for a transitional wall-bounded flow
• Mesh convergence study: Mach = 0.25 (U = 100 m/s), Reθ = 600–1400
• DNS box size: 6 × 0.5 × 3 cm³; 200 million cells; 50,000 Δt; 5×10¹³ degrees of freedom (200×10⁶ cells × 5 variables × 5×10⁴ time steps)
• 48 h on 1,000 Nehalem cores
• C. Laurent PhD (D. Arnal and A. Lerat)
[Figure: Q criterion coloured by streamwise velocity]

FastS solver
• 3-layer Python/C/Fortran module:
- Python for scripting and glue
- C for memory management
- Fortran for the computational loops
• Multiblock structured solver for the nonlinear Navier-Stokes (NS) equations:
- 5 variables for a 3D problem: density, velocities, pressure
- the PDE contains d/dx and d²/dx² operators; the solver stencil spreads over 21 neighbouring cells
- after optimisation, 20 arrays of size Ndim are needed to solve the NS equations over Ndim cells, in 64-bit floats
• 2nd-order finite volumes, with a hybrid centered/upwind scheme driven by a flow-regularity sensor (a generic sketch of such a blend is given after this slide)
• Time integration: explicit RK3 (a low-storage example appears below), or a 2nd-order implicit method (Gear + Newton + LU-SGS)
• Parallelism based on a hybrid MPI/OpenMP approach
• Careful memory design to optimise cache accesses on superscalar processors
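The slides do not detail the regularity sensor, so the following is only a generic illustration of the centered/upwind blending idea, written for 1D scalar advection with a Jameson-type pressure sensor; the sensor form, the scaling factor and all names are assumptions, not the FastS discretisation.

    #include <math.h>

    /* Jameson-type regularity sensor: ~0 in smooth regions, O(1) across
     * strong pressure variations (p is assumed positive). */
    static double regularity_sensor(const double *p, int i)
    {
        double num = fabs(p[i+1] - 2.0*p[i] + p[i-1]);
        double den = p[i+1] + 2.0*p[i] + p[i-1];
        return num / den;
    }

    /* Numerical flux at interface i+1/2 for q_t + a q_x = 0: a low-dissipation
     * 2nd-order centered flux in smooth flow, shifted toward a dissipative
     * 1st-order upwind flux where the sensor fires. */
    static double hybrid_flux(const double *q, const double *p, double a, int i)
    {
        double f_centered = 0.5 * a * (q[i] + q[i+1]);
        double f_upwind   = (a > 0.0) ? a * q[i] : a * q[i+1];
        double phi = fmin(1.0, 10.0 * regularity_sensor(p, i)); /* illustrative scaling */
        return (1.0 - phi) * f_centered + phi * f_upwind;
    }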
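For the explicit time integration, the slides say only "RK3". A common low-storage choice in compressible solvers is Williamson's 2N-storage 3-stage Runge-Kutta scheme; that FastS uses this exact variant is an assumption. A minimal sketch:

    /* One time step of Williamson's low-storage RK3.  q is the solution,
     * dq and r are scratch arrays, rhs(q, r, n) writes the residual R(q)
     * into r.  Since A[0] = 0, dq needs no initialisation. */
    void rk3_step(double *q, double *dq, double *r, int n, double dt,
                  void (*rhs)(const double *q, double *r, int n))
    {
        static const double A[3] = { 0.0, -5.0/9.0, -153.0/128.0 };
        static const double B[3] = { 1.0/3.0, 15.0/16.0, 8.0/15.0 };
        for (int s = 0; s < 3; ++s) {
            rhs(q, r, n);                     /* residual of the current state */
            for (int i = 0; i < n; ++i) {
                dq[i] = A[s] * dq[i] + dt * r[i];
                q[i] += B[s] * dq[i];
            }
        }
    }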
HPC status
• Distributed memory: weak scaling is easy to reach over O(5000) MPI processes:

Cores                           8       32      64      128     256     2165    4096
CPU per subiteration and cell   0.90μs  0.90μs  0.91μs  0.93μs  0.95μs  0.98μs  1.06μs

- Hybrid MPI/OpenMP for scalability over O(50000) cores
• Shared memory (OpenMP): scaling is more difficult (synchronisation, NUMA)
• Efficiency at the node level is the most difficult to obtain:
- DRAM access is the bottleneck for CFD → improve the use of the L1–L3 caches; implementing the cache-blocking technique requires a deep rewrite of the computational kernels (see the second sketch after the Taylor-Green results below)
- Efficient use of the SIMD units is crucial: OpenMP 4 directives (simd, aligned, …), no intrinsics; avoid spilling of the vector registers

Example of the OMP strategy for a Westmere bi-socket node (1)
• Zone split per "socket" (splitting the work, not the memory): the work is split automatically across the 2 sockets
• Manual synchronisation at the socket interface (explicit lock and flush)
• Improved memory placement (first-touch policy; see the first sketch after the Taylor-Green results below)

Example of the OMP strategy for a Westmere bi-socket node (2)
• "Thread" splitting inside socket 1 (threads 1 to 6): [figure: the zone is tiled into th1…th6 blocks, mirrored at the block interfaces]
• The work is split automatically across the cores of the socket
• Manual lock and flush at the thread interfaces
• Adjustable size for the thN blocks (cache blocking)
• Better L3 cache sharing due to the stencil

Taylor-Green vortex (Re=1600): HPC efficiency (1)
• Explicit time integration (RK3), Cartesian 300×300×300 grid
[Flow visualisations at t = 0, t = 8 and t = 16]

Taylor-Green vortex (Re=1600): HPC efficiency (2)
Effect of cache blocking and vectorization on an Intel Ivy Bridge node (20 cores); CPU×core per cell per subiteration, in μs:

Cache-block size (Icache × Jcache × Kcache)   FastS-novec   FastS-avx
300 × 300 × 300                               2.94          1.49
300 × 300 × 6                                 2.72          1.36
300 × 300 × 5                                 0.58          0.32
300 × 40 × 5                                  0.52          0.31
300 × 20 × 5                                  0.47          0.26
300 × 2 × 3                                   0.42          0.23
300 × 1 × 4                                   0.41          0.20
300 × 1 × 5                                   0.40          0.21

• FastS speedup: cache blocking + vectorization = 14; vectorization alone = 2 (4 in theory)
• Optimisation at the core level is more important than MPI/OMP optimisation
• Improvement of the vectorization is in progress (Intel IPCC)
• The Python overhead is negligible if the domain is large enough (overhead ≈ the computation of 5,000 cells)
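Two sketches follow. The first illustrates the first-touch placement mentioned in the OMP-strategy slide: assuming threads are pinned to sockets (e.g. with OMP_PLACES=sockets OMP_PROC_BIND=close), each thread initialises exactly the part of the field it will later update, so the pages are mapped on its own socket's memory. A plain static schedule stands in here for FastS's manual lock/flush splitting.

    #include <stdlib.h>
    #include <omp.h>

    /* First-touch allocation: pages are physically mapped on the NUMA node
     * of the thread that first writes them, so init and compute must use
     * the same static iteration split. */
    double *alloc_field(long n)
    {
        double *a = malloc(n * sizeof *a);   /* virtual pages only, not mapped yet */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; ++i)
            a[i] = 0.0;                      /* first touch -> local placement */
        return a;
    }

    void axpy(double *a, const double *b, long n)
    {
        /* identical split: each thread reuses its NUMA-local pages */
        #pragma omp parallel for schedule(static)
        for (long i = 0; i < n; ++i)
            a[i] += 0.5 * b[i];
    }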
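The second sketch shows the cache-blocking and SIMD ingredients behind the table above: the j and k loops of a stencil kernel are tiled (the Icache/Jcache/Kcache sizes), and the innermost i loop is vectorized with an OpenMP 4 directive rather than intrinsics. The 7-point stencil, the block sizes and the names are illustrative, not the FastS kernel.

    /* Blocked 7-point stencil update on an (NI+2)x(NJ+2)x(NK+2) grid with one
     * ghost layer: the j/k tiles keep the working set in cache, the inner i
     * loop is vectorized via OpenMP 4 (no intrinsics). */
    #define NI 300
    #define NJ 300
    #define NK 300
    #define JB 20   /* Jcache */
    #define KB 5    /* Kcache */
    #define IDX(i,j,k) (((k)*(NJ+2) + (j))*(NI+2) + (i))

    void smooth(const double *restrict q, double *restrict qn)
    {
        #pragma omp parallel for collapse(2) schedule(static)
        for (int k0 = 1; k0 <= NK; k0 += KB)
        for (int j0 = 1; j0 <= NJ; j0 += JB)
            for (int k = k0; k < k0 + KB && k <= NK; ++k)
            for (int j = j0; j < j0 + JB && j <= NJ; ++j) {
                #pragma omp simd
                for (int i = 1; i <= NI; ++i)
                    qn[IDX(i,j,k)] = q[IDX(i,j,k)]
                        + 0.1 * (q[IDX(i-1,j,k)] + q[IDX(i+1,j,k)]
                               + q[IDX(i,j-1,k)] + q[IDX(i,j+1,k)]
                               + q[IDX(i,j,k-1)] + q[IDX(i,j,k+1)]
                               - 6.0 * q[IDX(i,j,k)]);
            }
    }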
Taylor-Green vortex (Re=1600): HPC efficiency (3)
• Potential for improvement: the arithmetic intensity is close to 2 for the Cartesian solver and 3 for the curvilinear solver
• There is still room for improvement in the curvilinear solver

Taylor-Green vortex (Re=1600): HPC efficiency (4)
• Westmere node (12 cores) and Haswell node (24 cores), compact thread affinity
[Scaling plots, showing NUMA access and L3 saturation effects]

Example of a computation by DNS: stall phenomenon on rotor blades due to a laminar separation bubble
• High-speed forward flight: low speed and high angle of attack (AoA) on the retreating side of the rotor disc → dynamic stall
• Consequences: large vibratory stresses, aeroelastic instabilities
[Figure: angle of attack distribution over the rotor disc in high-speed forward flight]

Introduction: flow physics of an airfoil near stall
McCroskey & Philippe (1974):
• Laminar separation and turbulent reattachment (LSB)
• Turbulent boundary layer with an adverse pressure gradient
• Trailing-edge separation
LSB study by Horton (1968):
• Laminar separation
• Inflexion point, shear layer (Kelvin-Helmholtz convective instability)
• Turbulent reattachment

Introduction: flow physics of dynamic stall for a moving airfoil
Doligalski et al. (1994):
• Formation and spillage of the leading-edge vortex (LEV): LSB bursting?
• Dynamic stall: the lift overshoot is linked with the advection of the LEV
• Re < 10,000: strong interaction of the LEV with the wall
Gaster (1966), Horton (1968), Owen & Klanfer (1953):
• Experimental investigation of the LSB on a flat plate
• Influence of the Reynolds number and the pressure gradient on the bubble size
• Short and long bubbles, bubble bursting

Introduction: stall prediction by RANS modelling
• Inaccurate (delayed) prediction of the stall, static or dynamic, regardless of the choice of RANS and transition models
[Figures: dynamic stall and static stall predictions]

Introduction: recent studies of LSB physics and modelling
Stable LSB on flat-plate and airfoil configurations:
• Transition process: TS and KH instabilities, and a very fast transition to 3D flow (Watmuff JFM 1999, Alam & Sandham JFM 2000)
• Absolute/convective instability (Yang & Voke JFM 2000, Marxen et al. JFM 2004, Jones et al. JFM 2008)
• Acoustic feedback loop between the LSB and the trailing edge (Jones et al. JFM 2010)
• The upstream disturbance amplitude affects the size of the LSB (Alam & Sandham JFM 2000)
• LSB modelling for RANS (Spalart & Strelets JFM 2000, Laurent et al. Comput. Fluids 2011, Richez et al. TCFD 2008)
LSB bursting on flat-plate and airfoil configurations:
• Flat plate: switch between short and long LSB through a dynamic variation of the perturbation amplitude (Marxen & Henningson JFM 2011)
• LSB bursting on an airfoil (present study)

High fidelity LES of LSB bursting on an airfoil leading to stall
Objective: study the leading-edge stall mechanism as a dynamical process, triggered by a small incidence variation, to improve the understanding of this complex transient flow.
[Sketch: lift coefficient Cy versus incidence α — stable attached turbulent boundary layer (TBL) on the suction side up to the maximum Cy at αs; detached TBL (stall state) beyond αs + ε]
Tools:
• High-fidelity LES of the airfoil moving from αs to αs + ε
• Ensemble averaging of the transient process by repeating the numerical experiment with different initial conditions at αs
Flow configuration:
• NACA 0012 at Re = 10⁵: LSB and TBL on the suction side at an affordable CPU cost
• Critical incidence and variation determined by LES: αs = 10.55° and ε = 3° or 0.25°

LES of LSB bursting on an airfoil: motion details
• Smooth ramp-up motion from 10.55° to 10.8° or 13.55°, beginning at T0
• Motion duration of 2.2 c/U for the base case ω0; fast case ω = 10 ω0; slow case ω = ω0/2
• Slow motion to reduce the flow perturbation (for ω0):
- leading-edge velocity 3 times smaller than in usual dynamic stall studies
- no angular velocity and no angular acceleration at the beginning and end of the motion
• Limited effect of the motion on the pressure distribution (for ω0):
- little effect on the pressure gradient along the suction side
- no change in bubble size

LES of LSB bursting on an airfoil: computational details
• Number of grid points: 160 million (nearly DNS resolution)
• Δt = 0.15 μs = 2.75×10⁻⁴ c/U; M = 0.16
• 480 Westmere cores (GENCI); 1 c/U (3,600 time steps) computed in 1 hour
• Accuracy checked by a convergence study
• 1 chord of spanwise extent (flow periodicity)
• Deterministic disturbance input: u′_pert = 5×10⁻⁵ U sin(ω_pert t)
• Resolution at the suction wall before the motion (the most constraining configuration):

Direction     Points   Δl at reattachment
Streamwise    634      6
Wall-normal   351      0.8
Spanwise      900      7

• Δl < 10 η (Kolmogorov scale) for the stalled flow configuration, in all directions above the suction side
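As a quick consistency check of these numbers (assuming a sea-level sound speed a ≈ 340 m/s):

\[
U = M\,a \approx 0.16 \times 340~\mathrm{m/s} \approx 54~\mathrm{m/s}, \qquad
\frac{c}{U} = \frac{\Delta t}{2.75\times10^{-4}} = \frac{1.5\times10^{-7}~\mathrm{s}}{2.75\times10^{-4}} \approx 5.5\times10^{-4}~\mathrm{s},
\]
\[
c \approx U \cdot \frac{c}{U} \approx 3~\mathrm{cm}, \qquad
\frac{c/U}{\Delta t} \approx \frac{5.5\times10^{-4}}{1.5\times10^{-7}} \approx 3600~\text{time steps per } c/U,
\]

so the chord is about 3 cm and the quoted 3,600 time steps per convective time are mutually consistent.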
Steady states around αc: high-Reynolds-number physics
[Q criterion, attached flow, α = 10.55°, Q = 500]
• Transition process (Jones et al. JFM 2008): large spanwise 2D vortices, then 3D structures
• Short LSB: 0.15c
• Turbulent boundary layer over 80% of the chord

Time evolution of the bubble bursting
https://www.youtube.com/watch?v=2ZMKWB3tQV8
Base case motion at ω0.

Effects of stall on lift and drag
• The initial turbulent state before the motion does not affect the fast transient process for these motion parameters
• Two different regimes appear after the end of the motion
[Plots: lift and drag versus T* − T0*, with the beginning (T* = T0*) and the end of the motion marked]

Unsteady analysis: spanwise and short-time averages
• Time evolution of spanwise- and short-time- (6% c/U) averaged data

Stall development: time evolution of the displacement thickness
Bubble bursting, ensemble average, case ω0 (isoline = vanishing skin-friction coefficient). In the LSB, the displacement thickness ≈ the distance between the shear layer and the wall:
• Constant value (T* − T0* = 0 to 5)
• The shear layer moves away from the wall (T* − T0* = 5 to 10)
• The shear layer moves back closer to the wall, down to a constant distance (T* − T0* = 10 to 16)

Stall development: upstream motion of the transition point
Bubble bursting, ensemble average, case ω0:
• The maximum of the turbulent kinetic energy in the boundary layer gives the estimated location of transition

Effect of the motion: cases ω = 10 ω0 and ω = ω0/2
• No significant change in the position of the transition point between the different motion laws
• Same LSB growth rate
• Same initial flow condition (T0* = 14.7)

Bursting criterion for RANS
• Diwan et al. (JFM 2006) criterion: P < −28, with ΔU the variation of the external velocity along ΔX
• [Plot: the Diwan criterion evaluated for the three different motions]

Conclusions: HPC optimisation
• A deep modification of the source code is required:
- memory access
- algorithms
- a move toward generic coding for future hardware adaptations
• Efficiency is weakly affected by MPI transfers in CFD (Nproc ≤ 4000)
• Hybrid MPI/OpenMP:
- no large gain compared with full MPI (Nproc ≤ 4000)
- debugging is more difficult: "race conditions" are still hard to track
• Implicit algorithms are difficult to optimise (cache blocking impossible)
- work in progress to allow time-consistent simulations with local time stepping
• 1 Pflops on 30,000 Skylake cores (estimate)

Conclusions: LES/DNS simulations
• One measurement in a wind tunnel = 1 month of petaflop computation:
- Re = 1,000,000
- model size ≈ 1 meter
- U < 100 m/s
- CPU cost compared with the wind tunnel?
• HPC is very useful in turbulence:
- databases for turbulence modelling (DNS)
- understanding complex flow phenomena: DNS or LES where the Reynolds number is affordable; (U)RANS or hybrid RANS/LES for industrial applications