Staggered mesh methods for MHD and charged particle simulations of astrophysical turbulence
Åke Nordlund
Niels Bohr Institute for Astronomy, Physics, and Geophysics, University of Copenhagen

Context examples
- Star formation: the IMF is a result of the statistics of MHD turbulence
- Planet formation: gravitational fragmentation (or not!)
- Stars: turbulent convection determines the boundary conditions (BCs) for stellar structure
- Stellar coronae & chromospheres: heated by magnetic dissipation

Charged particle contexts
- Solar flares: To what extent is MHD OK? Particle acceleration mechanisms? Reconnection & dissipation?
- Gamma-ray bursts: Relativistic collisionless shocks? Does the Weibel instability create B? Synchrotron radiation or jitter radiation?

Overview
- MHD methods: Godunov-like vs. direct; staggered mesh vs. centered methods
- Radiative transfer: fast & cheap methods
- Charged particle dynamics: methods & examples

Solving the (M)HD Partial Differential Equations (PDEs)
Godunov-type methods
- Solve the local Riemann problem (approximately); OK in ideal-gas hydro
- MHD: 7 waves, 648 combinations (cf. Schnack's talk); Constrained Transport (CT)
- Gets increasingly messy when adding gravity, a non-ideal equation of state (ionization), radiation, ...
Direct methods
- Evaluate the right-hand sides (RHS) with high-order spatial derivatives & interpolations: spectral, compact, or local stencils (e.g. 6th-order derivatives, 5th-order interpolations)
- Step the solution forward in time: Runge-Kutta type methods (e.g. 3rd order), Adams-Bashforth, Hyman's method
- RK3-2N saves memory: it uses only F and dF/dt (hence 2N)

Which variables? Conservative!
- Mass, momentum, and internal energy, not total energy: consider cases where magnetic or kinetic energy dominates (the internal energy would then be a small difference between large numbers)
- Total energy is nevertheless well conserved: e.g. in a Mach 5 supersonic 3D-turbulence test (Wengen), total energy changes by less than 0.5%

Dissipation
- Working with internal energy also means that all dissipation (kinetic to thermal, magnetic to thermal) must be explicit
- Shock- and current-sheet-capturing schemes: the negative part of the velocity divergence captures shocks; the same treatment of the cross-field velocity captures current sheets

Advantages
- Much simpler: HD ~700 flops/point (6th/5th order in space), vs. ENZO ~10,000 flops/point and FLASH ~20,000 flops/point; MHD ~1,100 flops/point
- Trivial to extend: non-ideal equation of state, radiative energy transfer, relativistic

Direct method: Disadvantages?
- Smaller Courant numbers allowed: the 3-sub-step limit is ~0.6 (runs at 0.5); the 2-sub-step limit is ~0.4 (runs at 0.333). PPM typically runs at 0.8, a further factor of 1.6 per full step (unless directionally split)
- Comparison of hydro flops: ~2,000 (direct, 3 sub-steps) vs. ~10,000 (ENZO/PPM, FLASH/PPM)
- Need to also compare flops per second (cache use?)

Perhaps much more diffusive?
The 2D implosion test indicates not:
- a square area with a central, rotated low-pressure square generates a thin 'jet' with vortex pairs
- the jet moves very slowly, in approximate pressure equilibrium; it is essentially a wrinkled 2D contact discontinuity
- see Jim Stone's test pages, with references

2D Implosion Test

Imagine: non-ideal EOS + shocks + radiation + conduction along B
- Ionization: large to small across a shock
- Radiation: thick to thin across a shock
- Heat conduction only along B
- ... Riemann solver? Any volunteers?
- Operator and/or direction split? With anisotropic resistivity & heat conduction?!

Non-ideal EOS + radiation + MHD: Validation?
Godunov-type methods
- No exact solutions to check against
- Difficult to validate
Direct methods
- Need only check conservation laws: mass & momentum (no direct change) and energy conservation, which is easy to verify
- Valid equations + stable methods give valid results

Staggered Mesh Code (Nordlund et al.)
- Cell-centered mass and thermal energy densities
- Face-centered momenta and magnetic fields
- Edge-centered electric fields and electric currents
Advantages:
- simplicity; OpenMP (MPI between boxes)
- consistency (e.g., div B = 0)
- conservative; handles extreme Mach numbers

Code Philosophy
- Simplicity: F90/95 for ease of development; simplicity minimizes the operator count
- Conservative: per-volume variables
- Accuracy: 6th/5th order in space, 3rd order in time; can nevertheless handle SNe in the ISM
- Speed: about 650,000 zone-updates/sec on a laptop

Code Development Stages
1. Simplest possible code: dynamic allocation; F95 array-valued function calls (on a P4 the speed is the SAME as with subroutine calls)
2. SMP/OMP version: OpenMP directives added; no need to recompile for different resolutions; uses auto-parallelization and/or OMP on SUN, SGI & IBM
3. MPI version for clusters: implemented with CACTUS (see www.cactuscode.org); scales to an arbitrary number of CPUs

CACTUS
- Provides the "flesh" (application interface): handles cluster communication, e.g. MPI (but not limited to MPI); handles GRID computing (presently experimental); handles grid refinement and adaptive meshes (AMR not yet available)
- "Thorns" (applications and services): parallel I/O; parameter control (live!); diagnostic output (X-Y plots, JPEG slices, isosurfaces)

MHD
$$\mathbf{J} = \nabla\times\mathbf{B},\quad \mathbf{E} = \eta\,\mathbf{J},\quad Q_m = \mathbf{J}\cdot\mathbf{E},\quad \mathbf{F} = \mathbf{J}\times\mathbf{B},\quad \mathbf{E} \leftarrow \mathbf{E} - \mathbf{u}\times\mathbf{B},\quad \frac{\partial\mathbf{B}}{\partial t} = -\nabla\times\mathbf{E}$$

mhd.f90 Example Code

stagger-code/src-simple
Makefile (with includes for OS- and host-dependencies); subdirectories with optional code: INITIAL (initial values), BOUNDARIES, EOS (equation of state), FORCING, EXPLOSIONS, COOLING, EXPERIMENTS

SUBROUTINE mhd(eta,Ux,Uy,Uz,Bx,By,Bz,dpxdt,dpydt,dpzdt,dedt,dBxdt,dBydt,dBzdt)
  USE params
  USE stagger
  real, dimension(mx,my,mz) :: &
    eta,Ux,Uy,Uz,Bx,By,Bz,dpxdt,dpydt,dpzdt,dedt,dBxdt,dBydt,dBzdt
!hpf$ distribute (*,*,block) :: &
!hpf$   eta,Ux,Uy,Uz,Bx,By,Bz,dpxdt,dpydt,dpzdt,dedt,dBxdt,dBydt,dBzdt
  real, allocatable, dimension(:,:,:) :: &
    Jx,Jy,Jz,Ex,Ey,Ez, &
    Bx_y,Bx_z,By_x,By_z,Bz_x,Bz_y,scr1,scr2
!hpf$ distribute (*,*,block) :: &
!hpf$   Jx,Jy,Jz,Ex,Ey,Ez, &
!hpf$   Bx_y,Bx_z,By_x,By_z,Bz_x,Bz_y,scr1,scr2
  ! ... (computation of J and E not shown on the slide)
!---------------------------------------------------
! Induction Equation
! Magnetic field's time derivative, dBdt = - curl(E)
!---------------------------------------------------
  dBxdt = dBxdt + ddzup(Ey) - ddyup(Ez)
  dBydt = dBydt + ddxup(Ez) - ddzup(Ex)
  dBzdt = dBzdt + ddyup(Ex) - ddxup(Ey)

stagger-code/src (SMP production)
Ditto Makefile and subdirectories

  integer :: iz   ! loop index (declaration not shown on the slide)
  ! ...
!---------------------------------------------------
! Magnetic field's time derivative, dBdt = - curl(E)
!---------------------------------------------------
  call ddzup_set(Ey, scr1) ; call ddyup_set(Ez, scr2)
!$omp parallel do private(iz)
  do iz=1,mz
    dBxdt(:,:,iz) = dBxdt(:,:,iz) + scr1(:,:,iz) - scr2(:,:,iz)
  end do
  call ddxup_set(Ez, scr1) ; call ddzup_set(Ex, scr2)
!$omp parallel do private(iz)
  do iz=1,mz
    dBydt(:,:,iz) = dBydt(:,:,iz) + scr1(:,:,iz) - scr2(:,:,iz)
  end do
  call ddyup_set(Ex, scr1) ; call ddxup_set(Ey, scr2)
!$omp parallel do private(iz)
  do iz=1,mz
    dBzdt(:,:,iz) = dBzdt(:,:,iz) + scr1(:,:,iz) - scr2(:,:,iz)
  end do

CACTUS_Stagger_Code
The code becomes a "thorn" in the CACTUS "flesh":

SUBROUTINE mhd(CCTK_ARGUMENTS)
  USE hd_params
  USE stagger_params
  USE stagger
  IMPLICIT NONE
  DECLARE_CCTK_ARGUMENTS
  DECLARE_CCTK_PARAMETERS
  DECLARE_CCTK_FUNCTIONS
  CCTK_REAL, allocatable, dimension(:,:,:) :: &
    Jx, Jy, Jz, Ex, Ey, Ez, &
    Bx_y, Bx_z, By_x, By_z, Bz_x, Bz_y
  ! ...

Physics (staggered mesh code)
Equation of state
Qualitative: H+He+Me; accurate: lookup table
Opacity
Qualitative: H-minus; accurate: lookup table
Radiative energy transfer
Qualitative: vertical + a few (4) rays; accurate: comprehensive set of rays

Staggered Mesh Code Details
- Dynamic memory allocation: any grid size, no recompilation
- Parallelized: shared memory via OpenMP (and auto-) parallelization; MPI directly (Galsgaard) or via CACTUS
- Organization: Makefile includes
  - Experiments with selectable features: EXPERIMENTS/$(EXPERIMENT).mkf (equation of state, cooling & conduction, boundaries)
  - OS and compiler dependencies hidden in OS/$(MACHTYPE).f90, OS/$(HOST).mkf, OS/$(COMPILER).mkf

Radiative Transfer Requirements
- Comprehensive: need at least 20-25 (double) rays, i.e. 4-5 frequency bins (recent paper) times at least 5 directions
- Speed issue: would like 25 rays to add negligible time

Benchmark Timing Results (microseconds/point/substep; 128x105x128 grid)
                                 Pentium 4, 2 GHz      Alpha EV7, 1.3 GHz
                                 (dcsc.sdu.dk)         (hyades)
Step                             time     accum        time     accum
mass+momentum (fixed mesh)       1.80     1.80         1.57     1.57
mhd (fixed mesh)                 1.01     2.81         0.93     2.50
energy (fixed mesh)              0.42     3.23         0.37     2.87
equation of state: ideal         -        3.23         -        2.87
equation of state: H+He subr.    0.98     -            -        -
equation of state: H+He table    0.11     3.33         -        2.87
opacity: H-minus                 0.20     -            -        -
opacity: lookup table            0.09     3.42         -        2.87
(The slide also lists variable-mesh rows, without timings.)

Radiative transfer               time/ray rays         time/ray rays
Feautrier                        0.026    132          0.046    63
Hermite splines                  0.027    129          0.047    61
Integral                         0.045    76           0.080    36

The "rays" column appears to give the number of rays per point that cost as much as the rest of the step (e.g. 3.42/0.026 ≈ 132 on the Pentium 4).

Altix Itanium-2 Scaling (plot)

Applications
- Star formation
- Planet formation
- Stars
- Stellar coronae & chromospheres

Star Formation (Nordlund & Padoan 2002)
Key feature: intermittency! What does it mean in this context? How does it influence star formation?
- Collapsing features are relatively well defined!
- Low-density, high-velocity gas fills most of the volume!
- High-density, low-velocity features occupy very little space, but carry much of the mass!
- Inertial dynamics in most of the volume!
This greatly simplifies our understanding!
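A toy calculation makes the intermittency statement concrete. It assumes, as is common for supersonic isothermal turbulence but not stated on these slides, that s = ln(rho/<rho>) is Gaussian with variance sigma^2 and mean -sigma^2/2; weighting by mass (by rho = e^s) then shifts the mean to +sigma^2/2. The function name, the width sigma = 2, and the threshold are hypothetical choices for illustration only.

```python
import math


def dense_gas_fractions(sigma, s_crit):
    """Volume and mass fractions of gas with ln(rho/<rho>) > s_crit.

    Assumption (not from the slides): s = ln(rho/<rho>) is Gaussian with
    variance sigma**2 and mean -sigma**2/2, so the mean density is <rho>.
    The mass-weighted PDF of s is the same Gaussian shifted to +sigma**2/2.
    """
    half_var = 0.5 * sigma * sigma
    scale = sigma * math.sqrt(2.0)
    # Gaussian tail: P(s > s_crit) = erfc((s_crit - mean) / (sigma*sqrt(2))) / 2
    volume_fraction = 0.5 * math.erfc((s_crit + half_var) / scale)
    mass_fraction = 0.5 * math.erfc((s_crit - half_var) / scale)
    return volume_fraction, mass_fraction


if __name__ == "__main__":
    # Illustrative width sigma = 2 and threshold s_crit = 2 (rho ~ 7.4 <rho>)
    vol, mass = dense_gas_fractions(sigma=2.0, s_crit=2.0)
    print(f"volume fraction: {vol:.4f}, mass fraction: {mass:.4f}")
```

With these illustrative values, about 2% of the volume (0.0228) holds half of the mass (0.5000), which is the "little space, much of the mass" behaviour described above.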
Turbulence Diagnostics of Molecular Clouds
Padoan, Boldyrev, Langer & Nordlund, ApJ 2002 (astro-ph/0207568)

Numerical (250³ simulation) & Analytical IMF
Padoan & Nordlund (astro-ph/0205019)

Low Mass IMF
Padoan & Nordlund, ApJ 2004 (astro-ph/0205019)

Planet formation: gas collapse

Coronal Heating
- Initial magnetic field: potential extrapolation of AR 9114
- Coronal heating: TRACE 195 loops
- Current sheet hierarchy
- Current sheet hierarchy: close-up

Scan through hierarchy: dissipation
Note that all features rotate as we scan through; this means that these current sheets are all curved in the 3rd dimension. Hm, the dissipation looks pretty intermittent: large, nice empty areas to ignore with an AMR code, right?

Electric current J
This is still the dissipation. Let's replace it by the electric current, as a check! Hm, not quite as empty, but the electric current is at least mostly weak, right?

From J to log(J)
So, let's replace the current with the log of the current, to see the levels of the hierarchy better!

Log of the electric current
Not really much to win with AMR here, if we want to cover the hierarchy!

Solar & stellar surface MHD
- Faculae
- Sunspots
- Chromospheres
- Coronae

Faculae: Center-to-Limb Variation

Radiative transfer
- 'Exact' radiative energy transfer is not expensive: it allows up to ~100 rays per point for 2x the CPU time, and it parallelizes well (with MPI or OpenMP)
- Reasons for not using Flux Limited Diffusion: it does not give the right answer (e.g. missing shadows), and it is not cheaper

Radiative Transfer: Significance
- Cosmology: end of the Dark Ages
- Star formation: feedback (evaporation of molecular clouds); dense phases of the collapse
- Planet formation: external illumination of discs; structure and cooling of discs
- Stellar surfaces: surface cooling, the driver of convection

Radiative transfer methods
- Fast local solvers: Feautrier schemes, the fastest (often); optimized integral solutions, the simplest
- A new approach to parallelizing RT: solve within each domain, with no boundary radiation; then propagate and accumulate the solutions globally

Phew, 7 variables (3 spatial coordinates, 2 angles, frequency, and time)!?!
- Give up, adopting some approximation (moments of the radiation field, flux-limited diffusion)? Did someone say "shadows"??
- Or, solve it as it stands, with fast solvers, in parallel? Did someone say "difficult"?

Rays Through Each Grid Point
Interpolate the source function to rays in each plane.

How many rays are needed?
- Depends entirely on the geometry
- For stellar surfaces, surprisingly few! 1 vertical + 4 slanted, rotating rays give 1% accuracy in the mean Q and a few % in the fluctuating Q
- 8 rays vs. 48 rays: see plots

Radiative transfer steps
1. Interpolate the source function(s) and opacity
2. Solve along rays (may be done in parallel: distribute rays)
3. Interpolate back to the rectangular mesh: a simple translation of planes (fast), the inverse of the 1st interpolation (negative shift)
4. Add up: integrate over angles (and possibly frequencies or bins)

Along straight rays, solve
$$\frac{dI}{d\tau} = I - S.$$
Or actually, solve directly for the cooling, $q = I - S$:
$$\frac{dq}{d\tau} = q - \frac{dS}{d\tau}.$$
(Figure panels: source function (input); new source function (input).)

Formal (and useful) solutions
For simplicity, let's consider the standard formulation
$$\frac{dI}{d\tau} = I - S,$$
which, for $\tau \le \tau_0$, has the formal solution
$$I(\tau) = I(\tau_0)\, e^{-|\tau_0-\tau|} + \int_{\tau}^{\tau_0} S(\tau')\, e^{-|\tau'-\tau|}\, d\tau'.$$
Doubly useful:
- As a direct method: very accurate if S(τ) is piecewise parabolic; the slowness of exp() can be largely avoided
- As a basis for domain decomposition: add 'remote' contributions separately!
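The integral form becomes a direct solver once a shape is assumed for S between grid points. The sketch below is a hypothetical illustration, not the code behind these slides: it takes S piecewise linear (the slides recommend piecewise parabolic for higher accuracy) and uses optical depth t measured along the direction of propagation, so the equation reads dI/dt = S - I. That is the slide's dI/dτ = I - S with τ increasing against the direction of propagation.

```python
import math


def formal_solution(tau, source, i_start):
    """Integral-form solution of dI/dt = S - I along one ray.

    tau      -- optical depth along the propagation direction (increasing)
    source   -- source function S at the same points
    i_start  -- incoming intensity at tau[0]
    Assumption: S varies linearly between neighbouring points, so each
    interval contributes exactly  w0*S[k] + w1*S[k+1].
    """
    intensity = [i_start]
    for k in range(len(tau) - 1):
        dtau = tau[k + 1] - tau[k]
        ex = math.exp(-dtau)                  # attenuation across the interval
        e0 = -math.expm1(-dtau)               # integral of exp(-(dtau-t)) dt over [0, dtau]
        if dtau < 1e-4:
            e1 = dtau * dtau * (0.5 - dtau / 6.0)   # series avoids cancellation
        else:
            e1 = dtau - e0                    # integral of t*exp(-(dtau-t)) dt
        w1 = e1 / dtau                        # weight of downstream point S[k+1]
        w0 = e0 - w1                          # weight of upstream point S[k]
        intensity.append(intensity[-1] * ex + w0 * source[k] + w1 * source[k + 1])
    return intensity
```

Because the interval weights are exact integrals, the scheme reproduces the analytic solution I(t) = a + b(t - 1) + (I0 - a + b)e^(-t) for a linear source S = a + bt, and tends to I = S in optically thick regions.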
Direct solution, integral form

How to parallelize (Heinemann, Dobler, Nordlund & Brandenburg, in prep.)
- Solve for the intensity generated internally in each domain, separately and in parallel
- Then propagate and accumulate the boundary intensities, modified only by trivial optical depth factors

Putting it together

The Transfer Equation & Parallelization
(Animation frames: analytic solution; processors along the ray direction; an intrinsic calculation in each domain, followed by a sequence of communication steps along the ray direction.)

Pencil Code (Brandenburg et al.)
CPU time per ray-point: about 160 nsec/point/ray (ignore the outlier, caused by a bad node distribution). Can be improved by a factor of 4-5!
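The two-step parallelization above can be checked on a single ray. This is a hypothetical serial emulation, assuming a piecewise-constant source function within each cell and illustrative function names: each "domain" first solves with zero incoming intensity (the part that would run in parallel), then only the boundary intensities are passed downstream, each attenuated by exp(-τ) of the domain it crosses.

```python
import math


def solve_segment(dtaus, sources, i_in):
    """Exact solution of dI/dt = S - I through cells of constant S.
    Returns the intensity at every cell interface, starting with i_in."""
    out = [i_in]
    for dtau, s in zip(dtaus, sources):
        ex = math.exp(-dtau)
        out.append(out[-1] * ex + s * (1.0 - ex))
    return out


def decomposed_solution(domains, i_boundary):
    """domains: list of (dtaus, sources) pairs, ordered along the ray."""
    # Step 1: intrinsic solutions with zero incoming intensity. In a real
    # code each domain does this on its own processor, concurrently.
    intrinsic = []
    for dtaus, sources in domains:
        cum_tau = [0.0]
        for dtau in dtaus:
            cum_tau.append(cum_tau[-1] + dtau)
        intrinsic.append((solve_segment(dtaus, sources, 0.0), cum_tau))
    # Step 2: propagate boundary intensities (the only sequential part):
    # I_out = I_intrinsic_out + I_in * exp(-tau_domain).
    incoming, i_in = [], i_boundary
    for intr, cum_tau in intrinsic:
        incoming.append(i_in)
        i_in = intr[-1] + i_in * math.exp(-cum_tau[-1])
    # Step 3: each domain adds its attenuated incoming intensity locally.
    return [[v + i0 * math.exp(-t) for v, t in zip(intr, cum_tau)]
            for (intr, cum_tau), i0 in zip(intrinsic, incoming)]
```

Because the transfer equation is linear in I, the decomposed result agrees with one serial sweep to round-off; only the boundary intensities and one optical-depth factor per domain have to be communicated.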
CPU-time per point (Pencil Code)

Timing Results, Stagger Code (microseconds/point/substep; 128x105x128 grid)
                                 Pentium 4, 2 GHz      Alpha EV7, 1.3 GHz
                                 (dcsc.sdu.dk)         (hyades)
Step                             time     accum        time     accum
mass+momentum (fixed mesh)       1.80     1.80         1.57     1.57
mhd (fixed mesh)                 1.01     2.81         0.93     2.50
energy (fixed mesh)              0.42     3.23         0.37     2.87
equation of state: lookup table  0.11     3.33         -        2.87
opacity: lookup table            0.09     3.42         -        2.87

Radiative transfer               time/ray rays         time/ray rays
Feautrier                        0.026    132          0.046    63
Hermite                          0.027    129          0.047    61
Integral                         0.045    76           0.080    36

Radiative Transfer Conclusions
The methods are
- conceptually simple
- fast
- robust
- scale well in parallel environments

Collisionless shocks
Not an artist's rendering! Shows electric current filaments in a collisionless shock simulation with ~10⁹ particles and ~3×10⁹ mesh zones.

Particle-in-Cell (PIC) code
- Based on an original 2-D, non-relativistic code by Michael Hesse, GSFC
- 3-D, relativistic version developed by Frederiksen, Haugbølle, Hededal & Nordlund, Copenhagen

Steps
- Relativistic particle move, using B & E: uses γ-relativistic momenta; about 3×10⁵ particle updates/sec on a P4 laptop; parallelizes nearly linearly (OpenMP on Altix)
- Gather fields: n_i, n_e, j_i, j_e; 2nd order: Triangular Shaped Clouds (TSC)
- Push B & E: staggered in space and time
- Electrostatic solver

Use of Maxwell's Equations in the code
Fields on the mesh:
$$\frac{1}{c^2}\frac{\partial \mathbf{E}}{\partial t} = \nabla\times\mathbf{B} - \mu_0\mathbf{J},\qquad \frac{\partial \mathbf{B}}{\partial t} = -\nabla\times\mathbf{E},\qquad \nabla\cdot\mathbf{B} = 0,\qquad \nabla\cdot\mathbf{E} = \rho/\epsilon_0$$
Basic tests: wave propagation, etc.

Sampled particles

Example: Single electron

Electron & proton circling in separate orbits
Relativistic; γ = 10. NOTE: resolution implications of high γ! Far field: synchrotron radiation.

The Weibel Instability
- Well known and understood
- First principles; anisotropic PDFs: Weibel 1959, Fried 1959, Yoon & Davidson 1987
- Numerical studies, electron-positron, 2-D: Wallace & Epperlein 1991, Yang et al. 1994, Kazimura et al. 1998 (ApJ)
- Numerical studies, relativistic, ion-electron: Califano et al. 1997, '98, '99, '00, '01, '02, ...
- Application to GRBs: Medvedev & Loeb 1999, Medvedev 2000, '01, ...

The Weibel Instability (two-stream) (Weibel 1959, Medvedev & Loeb 1999)

Experiments
- 3-D
- Cold beam from the left: of the order of a 200x200x800 mesh and 10⁹ particles; carries negligible magnetic field
- Hits a denser plasma, initially field-free
- Weibel instability: B, E

So, what is this?
- A Weibel-like instability at high γ
- Initial scales ~ skin depth; conventional expectation: restricted to the skin depth
- Generated fields propagate at v ~ c; fluctuations 'ride' on the beam; losses supported by the beam population
- Scales grow down the line!!

Coherent Structures in Collisionless Shocks
Electron and ion current channels, along and across.

Ion and electron structures

A non-Fermi acceleration scenario
Electrons are accelerated instantaneously inside the Debye cylinder surrounding the ion current channels.
Hededal, Haugbølle, Frederiksen and Nordlund (2004), astro-ph/0408558

Electron path near ion channel
CH note: 10%-40% optically dark (HETE, BeppoSAX); 50% detected in radio.
Hededal, Haugbølle, Frederiksen and Nordlund (2004), astro-ph/0408558

Perspectives for the future
- Star formation: Is turbulent fragmentation the main mechanism? How important are magnetic fields for the IMF? Include radiative transfer during collapse! Magnetic fields are also important during collapse!
- Planet formation: RT is important for the initial conditions ... as well as for disc structure and cooling
- Stellar surfaces: include approximate RT in simulations of the chromosphere

Solar Plans
(Figure: planned simulation domain covering convection from granulation to supergranulation scales, faculae, sunspots, chromosphere, and corona; dimension labels 20 Mm, 30 Mm, 50 Mm.)
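The PIC steps listed earlier include a relativistic particle move using B & E with γ-relativistic momenta, but the slides do not say which integrator is used. A widely used choice is the Boris scheme; the hypothetical sketch below assumes it, in units with c = 1, with E and B already gathered to the particle position (e.g. with TSC weights), and with q_over_m as an illustrative parameter name.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def boris_push(u, e_field, b_field, q_over_m, dt):
    """Advance u = gamma*v (c = 1) by one step of the relativistic Boris scheme."""
    h = 0.5 * q_over_m * dt
    # first half of the electric kick
    u_minus = tuple(u[i] + h * e_field[i] for i in range(3))
    # magnetic rotation, using the Lorentz factor at mid-step
    gamma_mid = math.sqrt(1.0 + sum(c * c for c in u_minus))
    t = tuple(h * b / gamma_mid for b in b_field)
    s = tuple(2.0 * c / (1.0 + sum(x * x for x in t)) for c in t)
    u_prime = tuple(a + c for a, c in zip(u_minus, cross(u_minus, t)))
    u_plus = tuple(a + c for a, c in zip(u_minus, cross(u_prime, s)))
    # second half of the electric kick
    return tuple(u_plus[i] + h * e_field[i] for i in range(3))


def move(x, u, dt):
    """Position update x += (u / gamma) * dt, with c = 1."""
    gamma = math.sqrt(1.0 + sum(c * c for c in u))
    return tuple(x[i] + u[i] / gamma * dt for i in range(3))
```

The magnetic part of the update is a pure rotation, so in a pure magnetic field |u| (and hence γ) is conserved to round-off, as needed for orbits like the γ = 10 circling-particle example.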