ICECUBE DATA ANALYSIS
PHYS-F-467 Winter 2011
Kael Hanson

PYTHON

WHY PYTHON

Python is much easier to learn than C++ - see documentation at www.python.org
Python is a more concise language: you can do more with fewer keystrokes
Python is better suited for interactive use than CINT - especially with IPython
There is an extensive “standard library” which comes with all Python distributions and does everything from bzip to XML/RPC
Scientific / mathematical support with add-on packages Numpy and Scipy
Extensible with C/C++ : this is how IceCube uses it. C++ for speed, Python to glue everything together at the higher level.

PYTHON IN A FEW MINUTES

Python is close to many other languages you have seen
Dynamically typed : you don’t need to worry about variable declarations. Variables can be
    Integers
    Floats
    Complex
    Strings
    Arrays
    Objects
Object oriented
Good package structure facilitates library creation
Watch the whitespace! Indentation is part of the syntax.

INTEGERS

    >>> a = 15
    >>> a/2
    7
    >>> b = 0x58900
    >>> b
    362752
    >>> a * 60 + 4
    904

Note the truncation in a/2: dividing two integers gives an integer in Python 2.
0x58900 is a hexadecimal literal.

FLOATS AND COMPLEX

    >>> a = 15.0
    >>> a / 2
    7.5
    >>> b = 2.44E-55
    >>> b
    2.4399999999999999e-55
    >>> z = 5 - 4j
    >>> z
    (5-4j)
    >>> z.conjugate()
    (5+4j)
    >>> z * z.conjugate()
    (41+0j)
    >>>

No truncation this time! 2.44E-55 is scientific notation.
It’s kind of neat that Python supports complex numbers out-of-the-box. Good if you are a signal processor.

ARRAYS & LISTS

An array is a possibly heterogeneous collection of items. Could be strings, integers, floats, complex, objects, other arrays ...

    >>> a = [ 6, 26.0, [ 4, 7 ] ]
    >>> a[2]
    [4, 7]
    >>> a.append([])
    >>> a[2:]
    [[4, 7], []]
    >>> len(a)
    4
    >>> b = [ 5, 90., -1 ]
    >>> max(b)
    90.0

Note the index notation and slice notation.
You can even slice backwards to invert the array:

    >>> a = range(6)
    >>> a
    [0, 1, 2, 3, 4, 5]
    >>> a[-1]
    5
    >>> a[3:5]
    [3, 4]
    >>> a[4:1:-1]
    [4, 3, 2]
    >>> a[-1::-1]
    [5, 4, 3, 2, 1, 0]

DICTIONARIES

You may know these better as maps - they allow non-integer indices (but you could use integers, too)

    >>> d = { 'you' : 1, 'me' : 2 }
    >>> d['me']
    2

Nice testing for key existence

    >>> 'him' in d
    False
    >>> 'you' in d
    True

Get the keys, or the (key, value) pairs, as lists

    >>> d.keys()
    ['me', 'you']
    >>> d.items()
    [('me', 2), ('you', 1)]

SORTING

The in-place sorting interface in Python is tied to mutable sequences: i.e. lists, and not tuples or strings

    >>> u = [ 25, 4, 6, 12, 19, 177, 201 ]
    >>> u.sort()
    >>> u
    [4, 6, 12, 19, 25, 177, 201]
    >>> u.sort(cmp=lambda x, y: cmp(x % 5, y % 5))
    >>> u
    [25, 6, 201, 12, 177, 4, 19]

Note what happens when we try to do this to a string:

    >>> p = "Cette phrase est pas triée"
    >>> p.sort()
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    AttributeError: 'str' object has no attribute 'sort'

FOLLOWING THIS TOPIC A BIT ...

If you really need to, it is always possible to turn an immutable sequence into a mutable list and then sort that - but look at what happens

    >>> p = list(p)
    >>> p
    ['C', 'e', 't', 't', 'e', ' ', 'p', 'h', 'r', 'a', 's', 'e', ' ', 'e', 's', 't', ' ', 'p', 'a', 's', ' ', 't', 'r', 'i', '\xc3', '\xa9', 'e']
    >>> p.sort()
    >>> p
    [' ', ' ', ' ', ' ', 'C', 'a', 'a', 'e', 'e', 'e', 'e', 'e', 'h', 'i', 'p', 'p', 'r', 'r', 's', 's', 's', 't', 't', 't', 't', '\xa9', '\xc3']
    >>> ''.join(p)
    '    Caaeeeeehipprrssstttt\xa9\xc3'

Not what you expected, right? Why?

LAST SLIDE ON THIS ...

The problem is the encoding: a plain Python 2 string is a sequence of bytes, and here the text is encoded as UTF-8. When the string is turned into a list, the ‘é’ character is represented by two elements (two bytes) in the list, which then get separated by the sort.
Solution: use Unicode to ensure that each character is represented by exactly one element in the broken-up list.
You can do this by putting a ‘u’ before the string:

    >>> p = u"Cette phrase est pas triée"
    >>> list(p)
    [u'C', u'e', u't', u't', u'e', u' ', u'p', u'h', u'r', u'a', u's', u'e', u' ', u'e', u's', u't', u' ', u'p', u'a', u's', u' ', u't', u'r', u'i', u'\xe9', u'e']
    >>> p = list(p)
    >>> p.sort()
    >>> print ''.join(p)
        Caaeeeeehipprrssstttté

ITERATION

Python offers very rich iteration over sequences. Look up generators for an even fancier way to represent very large, possibly infinite, sequences. A typical iteration idiom is:

    >>> p = "Cette phrase est pas triée"
    >>> num_e = 0
    >>> for x in p:
    ...     if x == 'e':
    ...         num_e += 1
    ...
    >>> print num_e
    5

There are much more succinct ways to do this, however, like:

    >>> len( [ x for x in p if x=='e' ] )
    5

The expression in square brackets is called a list comprehension. If you just want the equivalent of a for (int i = 0; i < 10000000; i++) in C++ then

    >>> for x in xrange(10000000):
    ...     pass

DEFINING FUNCTIONS

Function declaration is pretty simple. Note the use of default arguments and multiple return values. Arguments are passed as object references: rebinding a parameter inside the function does not affect the caller, but modifying a mutable argument (a list, say) in place does.

    >>> def myfun(x, y, z=10):
    ...     print x, y, z
    ...     return x+1, y+2, z+3
    ...
    >>> myfun(4, 10, 20)
    4 10 20
    (5, 12, 23)

Often you need a quick function as an argument to another function but you don’t feel like writing the full declaration. The shorthand for such an anonymous function is called a lambda expression (borrowed from LISP)

    >>> map( lambda x, y, z: (x, y, z), (0, 1, 2), (2, 1, 0), (1, 0, -1) )
    [(0, 2, 1), (1, 1, 0), (2, 0, -1)]

CLASSES

    from collections import deque

    class RollingMultiplicityTrigger:
        def __init__(self, win=10000L, mult=3):
            self.hdl = deque()      # rolling buffer of the most recent hits
            self.mult = mult        # multiplicity threshold
            self.win = win          # coincidence window, same units as hit.utc
            self.hits = list()      # hits collected for the current trigger
            self.toc = 0            # countdown of hits still to be read out

        def onHit(self, hit):
            rtl = tuple()
            self.hdl.append(hit)
            if len(self.hdl) > self.mult:
                hout = self.hdl.popleft()
                if self.toc > 0:
                    self.toc -= 1
                    self.hits.append(hout)
                    if self.toc == 0:
                        rtl = tuple(self.hits)
                        self.hits = list()
            if (self.hdl[-1].utc - self.hdl[0].utc) <= self.win:
                self.toc = self.mult
            return rtl

STANDARD LIBRARY

Too much to cover - see http://docs.python.org/library/index.html for the complete on-line docs of the library.
To use functions from either the standard packages or from your own packages, make sure the package is locatable via your PYTHONPATH environment variable (not needed for the system-installed and standard packages) and use the import statement:

    >>> from struct import pack, unpack
    >>> s = pack(">iiq", 8, 200, 5319082742)
    >>> len(s)
    16
    >>> unpack(">iiq", s)
    (8, 200, 5319082742)

ICECUBE

WHY “RAW” DATA?

First of all, it’s not really raw data - IceCube internal, yes.
The only other common file format would be ROOT and I don’t have any ROOT files handy.
Most I3 → ROOT converters will suffer some loss of info.
I find Python much more compact and expressive than C++, especially handy when you are typing interactively.
Python is so cool that you can still use ROOT (or ROOT is so cool that you can wrap it in Python, take your pick).

WHY ICECUBE DATA?

When you compare I3 to CMS data you must appreciate its simplicity:
    1 type of detector - DOM. It is a bit complicated due to waveform readout.
    Calibration transform to physical quantities is complex but already taken care of by Level 2.
    Event data is just a header plus a list of DOM records.
The IceTray data structure is conceptually just a dictionary with a bunch of arbitrary keys which are filled at run time. There is no static event type. This is both good and bad:
    Good - very flexible : you can stuff anything you want into the events
    Bad - very flexible : common things like event # and time can live in any object and only convention keeps the names from being arbitrary strings. It can get a little confusing what is where.

ANALYSIS LEVELS

Raw data : 2.2 kHz / 10 MB/s to disk.
“Pole” filter level : calibrations / reconstructions applied on cluster at S. Pole.
Need a 1:10 filter:raw ratio in order to fit the data over the satellite link.
Level 2 : more reconstructions applied in the North on a bigger cluster. No filtering done : data still globally used by all analyses (OK, muon astronomy uses only DST which is split off at an earlier stage)
Level 3 : filtered about 1:10, specialized to analysis type.

FILTERS

    Filter Name             Rate [Hz]  Description
    Cascade                 25.7       Electromagnetic cascade event topology selection filter
    DeepCore                17.2       IceCube DeepCore
    EHE                     1.4        Extreme high energy filter
    MinBias                 3.8        Simple prescale on the raw data
    Galactic Center HE      11.2       High Energy Galactic Center
    Galactic Center LE      39.5       Low Energy Galactic Center
    IceTop 3 Station        7.9        IceTop 3
    IceTop 3 / In-Ice SMT   3.4        IceTop 3 / Deep Ice Coincidence
    IceTop 8 Station        1.6        IceTop 8
    IceTop 8 / In-Ice SMT   0.5        IceTop 8 / Deep Ice Coincidence
    Low Energy Contained    11.7       Low energy contained event - there is a veto on the outer strings
    Low Up                  17.8       Low energy upward muon filter
    Muon Filter             33.8       Standard upward or HE downward muon filter
    Physics Min Bias        1.0        Basically SMT 1 prescaled by 200
    SLOP                    0.9        Slow particle trigger

    Total Filter Rate: 160 Hz

ROOT

PY/ROOT

ROOT can be used from Python! All that is needed is a special compile of ROOT to get the bindings correct. It is then a fairly straightforward task to guess how to use this new tool combination:

    In [4]: import ROOT
    In [5]: hQ = ROOT.TH1F("hQ", "Charge Histo for 13-30", 100, 0, 10)
    In [6]: hQ.Draw()
    <TCanvas::MakeDefCanvas>: created default TCanvas with name c1
    In [7]:

ICETRAY

ABOUT ...

The IceCube data analysis framework is based on C++ wrapped in Python - you have two basic choices:
    Write C++ modules and chain them together with Python scripts. This is fastest and best for modules that will see a lot of events - at the lower levels of analysis
    Use straight Python to access the object

ENVIRONMENT SETUP

Create a subdirectory called PHYS467 and do your work in there.
You are going to need a few variables in your shell environment.
Put the following into a shell script which you can source when you want to use the IceCube / IceTray environment:

    export ROOTSYS=$I3SOFT/DPORTS/i3tools/root-v5.24.00b
    export LD_LIBRARY_PATH=$ROOTSYS/lib
    PATH=$ROOTSYS/bin:$PATH
    LD_LIBRARY_PATH=$ROOTSYS/lib:$LD_LIBRARY_PATH

Now source the file - yeah, you need the dot.

    [khanson@lxpub2 ~]$ . i3env

Finally source the setup file for the IceRec workspace:

    [khanson@lxpub2 ~]$ . $I3SOFT/IceRec/V03-03-02/build/env-shell.sh
    ************************************************************************
    *                                                                      *
    *                   W E L C O M E  to  I C E T R A Y                   *
    *                                                                      *
    *              Version icerec.releases.V03-03-02     r74791            *
    *                                                                      *
    *                You are welcome to visit our Web site                 *
    *                        http://icecube.umd.edu                        *
    *                                                                      *
    ************************************************************************

    Icetray environment has:
       I3_SRC       = /x4500_mnt/pool5/ice3/i3soft.x86_64/IceRec/V03-03-02/src
       I3_BUILD     = /x4500_mnt/pool5/ice3/i3soft.x86_64/IceRec/V03-03-02/build
       I3_PORTS     = /x4500_mnt/pool5/ice3/i3soft.x86_64/DPORTS/i3tools
       Python       = Python 2.4.3
    [khanson@lxpub2 ~]$

There! Everything should be ready to go now.

GETTING STARTED

We are going to look at the data from early IC79 in June 2010 :

    /data/IC79/exp/filtered/level2/Level2_IC79_data_Run00115994_Part00000020.i3.gz

Start IPython and load up the IceCube Python libraries:

    [khanson@lxpub2 phys467]$ ipython
    Python 2.4.3 (#1, Sep 10 2009, 18:34:35)
    Type "copyright", "credits" or "license" for more information.

    IPython 0.8.4 -- An enhanced Interactive Python.
    ?         -> Introduction and overview of IPython's features.
    %quickref -> Quick reference.
    help      -> Python's own help system.
    object?   -> Details about 'object'. ?object also works, ?? prints more.

    In [1]: from icecube import icetray, dataio, dataclasses, jebclasses
    In [2]:

OPEN THE DATAFILE

Open the I3 file using the dataio.I3File class. Note this is now calling into C++ code but you don’t know and probably don’t really care. Grab a frame with the pop_physics() method of the file.
Then print the frame to see what you have. I don’t have space to print the whole frame here.

    In [2]: i3f = dataio.I3File("/data/IC79/exp/filtered/level2/Level2_IC79_data_Run00115994_Part00000020.i3.gz")
    In [3]: frame = i3f.pop_physics()
    In [4]: print frame
    [ I3Frame (Physics):
      'BadDomsList' [Physics] ==> I3Vector<OMKey> (2937)
      'BadDomsListSLC' [Physics] ==> I3Vector<OMKey> (2937)
      'DrivingTime' [Physics] ==> I3Time (38)
      'FilterMask' [Physics] ==> I3Map<string, I3FilterResult> (749)
      'FiniteRecoCuts' [Physics] ==> I3FiniteCuts (80)
      'FiniteRecoFit' [Physics] ==> I3Particle (150)
      'FiniteRecoLlh' [Physics] ==> I3StartStopParams (61)
      'I3EventHeader' [Physics] ==> I3EventHeader (91)

ACCESSING FILTER INFO

    In [5]: filters = frame['FilterMask']
    In [6]: print filters
    {
        CascadeFilter_10 : 0 , 0
        DeepCoreFilter_10 : 0 , 0
        EHEFilter_10 : 0 , 0
        FilterMinBias_10 : 1 , 0
        GalacticCenterFilter_HE_10 : 0 , 0
        GalacticCenterFilter_LE_10 : 1 , 1
        I3DAQDecodeException : 0 , 0
        ICOnlineL2Filter_10 : 0 , 0
        IceTopMuonCalibration_10 : 0 , 0
        IceTopSTA3_10 : 0 , 0
        IceTopSTA3_InIceSMT_10 : 0 , 0
        IceTopSTA8_10 : 0 , 0
        IceTopSTA8_InIceSMT_10 : 0 , 0
        InIceSMT_IceTopCoincidence_10 : 0 , 0
        LID : 0 , 0
        LowEnergyContainedFilter_10 : 0 , 0
        LowUpFilter_10 : 0 , 0
        MoonFilter_10 : 0 , 0
        MuonFilter_10 : 0 , 0
        PhysicsMinBiasTrigger_10 : 0 , 0
        SlopFilter_10 : 0 , 0
        SunFilter_10 : 0 , 0
    }

To access an object in the frame you just have to look it up using its string key. Caveat: not all objects have been wrapped in Python, so you may get an exception trying to access some. Most objects will do something smart if you try to print them. Also try TAB, which invokes IPython completion and basically gives you a hint about the member functions that the object supports.

Filters is actually itself a dictionary of FilterResult objects, each of which has a ConditionPassed and a PrescalePassed member variable. The filter has fired if both are True.
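As a minimal sketch of that rule, the snippet below uses plain Python stand-ins rather than the real I3FilterResult objects, so it runs without IceTray. The FakeFilterResult class and the filter_fired helper are hypothetical names introduced only for illustration; the ConditionPassed and PrescalePassed attribute names and the sample values are taken from the FilterMask printed above. It is written with print() so that it also runs on current Python versions.

```python
class FakeFilterResult(object):
    """Hypothetical stand-in for an I3FilterResult (illustration only)."""
    def __init__(self, condition_passed, prescale_passed):
        self.ConditionPassed = bool(condition_passed)
        self.PrescalePassed = bool(prescale_passed)


def filter_fired(result):
    """A filter has fired only if both condition and prescale passed."""
    return result.ConditionPassed and result.PrescalePassed


# A subset of the values from the FilterMask printed above.
filters = {
    'FilterMinBias_10': FakeFilterResult(1, 0),
    'GalacticCenterFilter_LE_10': FakeFilterResult(1, 1),
    'MuonFilter_10': FakeFilterResult(0, 0),
}

# Names of the filters that fired, in alphabetical order.
fired = sorted(name for name, result in filters.items()
               if filter_fired(result))
print(fired)
```

With these stand-in values only GalacticCenterFilter_LE_10 satisfies both conditions; FilterMinBias_10 passes its condition but not its prescale.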
    In [7]: for filtername, result in filters:
       ...:     if result.ConditionPassed:
       ...:         print filtername, result.PrescalePassed
       ...:
       ...:
    FilterMinBias_10 False
    GalacticCenterFilter_LE_10 True

The filter which passed on this event was the Galactic Center Low Energy. Note that FilterMinBias always passes - it has no condition - but its prescale is not usually set. Go ahead and pop a few more frames out of this file and inspect the filters.

Write a function which, given an I3File and the name of a filter, keeps reading events from the file until it finds one where that filter passed with its prescale set, stops reading at that event, and returns the frame, or None if it reached the end-of-file. The typical idiom to test for EOF is

    In [39]: nevt = 0
    In [40]: while i3f.more():
       ....:     frame = i3f.pop_physics()
       ....:     nevt += 1

EVENT INFORMATION

    In [48]: evt = frame['I3EventHeader']
    In [49]: evt.
    evt.CONFIG_IN_TRANSITION  evt.EndTime            evt.EventID
    evt.GetDataStream         evt.OK                 evt.RunID
    evt.StartTime             evt.State              evt.StateType
    evt.SubRunID              evt.UNKNOWN_STATE      evt.__class__
    evt.__copy__              evt.__deepcopy__       evt.__delattr__
    evt.__dict__              evt.__doc__            evt.__getattribute__
    evt.__hash__              evt.__init__           evt.__instance_size__
    evt.__module__            evt.__new__            evt.__reduce__
    evt.__reduce_ex__         evt.__repr__           evt.__setattr__
    evt.__str__               evt.__weakref__
    In [49]: print evt.RunID, evt.EventID, evt.StartTime, evt.EndTime
    115994 12217749 2010-06-04 08:57:17.589,820,454,7 UTC 2010-06-04 08:57:17.589,841,512,6 UTC

PULSE HITS

Without getting into too much detail about the IceCube secret analysis methods - let it suffice to say that the waveforms from the DOMs are deconvolved using statistical methods into individual PMT pulses.
The best algorithm currently places them in the ‘NFEMergedPulses’ frame element:

    In [72]: pulses = frame['NFEMergedPulses']
    In [73]: pulses.keys()
    Out[73]:
    [OMKey(13,30), OMKey(19,39), OMKey(21,29), OMKey(27,42),
     OMKey(29,30), OMKey(29,31), OMKey(83,19), OMKey(83,22),
     OMKey(19,41), OMKey(27,43), OMKey(29,32), OMKey(84,23),
     OMKey(20,27), OMKey(27,44), OMKey(81,20), OMKey(84,38)]
    In [74]: rp = pulses[icetray.OMKey(84,23)][0]
    In [75]: print rp.Charge, rp.Time, rp.Width
    0.622214024105 10767.6224455 98.2307650331

Charge is in p.e., Time is in ns since start of event, Width is in 0.1 ns units.

Exercise A: Pick a channel and histogram the Charge from [0, 5], Time from [5000, 15000], and the Width from [0, 1000]
Exercise B: Histogram the total number of channels, hits, and charge in events in separate histograms.
B.2: Choose only EHE filtered events and re-histogram.

TRACK RECONSTRUCTIONS

    In [9]: linefit = frame['LineFit']
    In [10]: print linefit
    [ I3Particle MajorID : 4258366462548825510
                 MinorID : 601524
                 Zenith : 0.312473
                 Azimuth : 2.20683
                 X : 108.013
                 Y : 51.0589
                 Z : -180.022
                 Time : 11816.9
                 Energy : nan
                 Speed : 0.325006
                 Length : nan
                 Type : unknown
                 Shape : InfiniteTrack
                 Status : OK
                 Location : Anywhere
    ]

LineFit is a pretty simplistic fit - it just tries to fit the track to a line, with no scattering of photons in the ice. It’s not too bad for long tracks but its principal use is as a first guess for the more complex NLLS fits. The only really interesting thing in the ‘Params’ is the LFVel variable, which is often used to select good µ: these tend to have LFVel close to the speed of light, 0.3 m / ns.

    In [14]: from icecube import linefit
    In [15]: lineFitParams = frame['LineFitParams']
    In [17]: lineFitParams.LFVel
    Out[17]: 0.32500569196165863

Exercise: Histogram the linefit zenith angles (maybe cos(linefit.zenith) is better), and linefit velocities. Make a 2D scatterplot. Compare linefit zenith / azimuth against SPE fit quantities.
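One possible starting point for the histogram part of this exercise, sketched in plain Python so that it runs without IceTray or ROOT. The histogram helper and the list of zenith angles are hypothetical; in a real session the angles would be collected from the 'LineFit' object of each frame, and ROOT.TH1F could do the binning instead. It is written with print() so that it also runs on current Python versions.

```python
import math


def histogram(values, nbins, lo, hi):
    """Count values into nbins equal-width bins covering [lo, hi]."""
    counts = [0] * nbins
    width = (hi - lo) / float(nbins)
    for v in values:
        if v < lo or v > hi:
            continue  # ignore values outside the histogram range
        # A value exactly equal to hi goes into the last bin.
        i = min(int((v - lo) / width), nbins - 1)
        counts[i] += 1
    return counts


# Hypothetical zenith angles in radians (the first one is the LineFit
# zenith printed above; the others are made up for illustration).
zeniths = [0.312473, 0.270266, 1.2, 2.0, 3.0]
cos_zen = [math.cos(z) for z in zeniths]

# Four bins in cos(zenith) from -1 (straight up-going) to +1 (down-going).
counts = histogram(cos_zen, 4, -1.0, 1.0)
print(counts)
```

Binning in cos(zenith) rather than zenith gives bins of equal solid angle, which is one reason it may be the better variable here.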
MORE ON TRACKS

You get energy too for some track reconstructions.

    In [55]: print frame['MPEFitMuE']
    [ I3Particle MajorID : 16460434401484526989
                 MinorID : 40
                 Zenith : 0.270266
                 Azimuth : 3.45852
                 X : 64.0221
                 Y : -97.9014
                 Z : 34.7174
                 Time : 10565
                 Energy : 1518.35
                 Speed : 0.299792
                 Length : nan
                 Type : unknown
                 Shape : InfiniteTrack
                 Status : OK
                 Location : Anywhere
    ]

Track quality is also available in the ‘FitParams’ blocks:

    In [46]: fitInfo = frame['SPEFit4FitParams']
    In [47]: fitInfo.
    fitInfo.__class__          fitInfo.__delattr__        fitInfo.__dict__
    fitInfo.__doc__            fitInfo.__getattribute__   fitInfo.__hash__
    fitInfo.__init__           fitInfo.__instance_size__  fitInfo.__module__
    fitInfo.__new__            fitInfo.__reduce__         fitInfo.__reduce_ex__
    fitInfo.__repr__           fitInfo.__setattr__        fitInfo.__str__
    fitInfo.__weakref__        fitInfo.logl               fitInfo.ndof
    fitInfo.nmini              fitInfo.rlogl

SIMULATION

At some point you will need to understand how your detector reacts to known stimuli - including signal (neutrino signals) and background (cosmic ray muons)
IceCube simulation does not use the GEANT framework but rather was developed from other bases:
    Physics generation : background : Corsika (KASKADE)
    Physics generation : signal : NuGen (IceCube); GeNIE
    Ice optics
        Photonics - tabulated photon densities
        PPC - direct photon tracking on GPUs
    Detector simulation : IceSim (IceCube)

SIMULATED EVENTS

    In [1]: from icecube import icetray, dataio, dataclasses, simclasses
    In [2]: i3f = dataio.I3File("/data/IC79/sim/corsika/level2/5741/Level2_IC79_corsika.005741.000000.i3.gz")
    In [3]: frame = i3f.pop_physics()

“Is it real or is it Memorex?” -- famous ad slogan of the 1980’s

Ideally, you want your simulation to look as much like data as is possible, down to the way it is presented to the analysis tools.
IceSim data comes as I3 files which have a very similar structure to I3 files from real data - let’s look at one:

    In [4]: print frame
    [ I3Frame (Physics):
      'BadDomsList' [Physics] ==> I3Vector<OMKey> (1673)
      'BadDomsListSLC' [Physics] ==> I3Vector<OMKey> (1673)
      'CorsikaWeightMap' [Physics] ==> I3Map<string, double> (376)
      'DrivingTime' [Physics] ==> I3Time (38)
      'FilterMask' [Physics] ==> I3Map<string, I3FilterResult> (732)
      ...
      'I3EventHeader' [Physics] ==> I3EventHeader (91)
      'I3MCTree' [Physics] ==> I3Tree<I3Particle> (6822)
      'I3TriggerHierarchy' [Physics] ==> I3Tree<I3Trigger> (134)
      'IceTopRawData' [Physics] ==> I3Map<OMKey, vector<I3DOMLaunch> > (46)
      'InIceRawData' [Physics] ==> I3Map<OMKey, vector<I3DOMLaunch> > (7536)
      'LineFit' [Physics] ==> I3Particle (150)
      'LineFitParams' [Physics] ==> I3LineFitParams (71)
      'LineFit_SLC' [Physics] ==> I3Particle (150)
      'LineFit_SLCParams' [Physics] ==> I3LineFitParams (71)
      'MCHitSeriesMap' [Physics] ==> I3Map<OMKey, vector<I3MCHit> > (7546)
      'MMCTrackList' [Physics] ==> I3Vector<I3MMCTrack> (304)
      ...
    ]

I3MCTREE

    In [5]: mctree = frame['I3MCTree']
    In [6]: for p in mctree.GetPrimaries(): print p
       ...:
    [ I3Particle MajorID : 12790075154199624444
                 MinorID : 205
                 Zenith : 0.406889
                 Azimuth : 4.77803
                 X : 119.443
                 Y : -361.277
                 Z : 1949.99
                 Time : 3525.23
                 Energy : 4747.12
                 Speed : 0.299792
                 Length : -1
                 Type : PPlus
                 Shape : Primary
                 Status : NotSet
                 Location : IceTop
    ]

Notice that tracks and particles are both represented by the I3Particle C++ class.
The MC tree is a tree structure rooted in one or more primary particles. Each particle can have daughters, which can have daughters of their own, and so on.
The following is a recursive Python generator for stepping through the MC tree hierarchy:

    def walkMCTree(tree, p):
        yield p
        for d in tree.GetDaughters(p):
            for x in walkMCTree(tree, d):
                yield x

You can use it like this:

    In [43]: plist = list( walkMCTree(mctree, mctree.GetPrimaries()[0]) )
    In [44]: len(plist)
    Out[44]: 53

Of course, like this, lineage is lost!
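One way to keep some of the lineage, sketched under assumptions: the generator below yields (depth, particle) pairs instead of bare particles, so each particle's generation in the shower is retained. It relies only on the GetDaughters(p) call used above. FakeTree is a hypothetical stand-in for the real I3MCTree so that the sketch runs without IceTray, and the particle names are made up.

```python
class FakeTree(object):
    """Hypothetical stand-in for an I3MCTree (illustration only)."""
    def __init__(self, daughters):
        self._daughters = daughters  # maps particle -> list of daughters

    def GetDaughters(self, p):
        return self._daughters.get(p, [])


def walkMCTreeWithDepth(tree, p, depth=0):
    """Yield (depth, particle) pairs, each parent before its daughters."""
    yield depth, p
    for d in tree.GetDaughters(p):
        for item in walkMCTreeWithDepth(tree, d, depth + 1):
            yield item


# A made-up shower: a primary with two daughters, one of which decays.
tree = FakeTree({'proton': ['mu1', 'pion'], 'pion': ['mu2', 'nu']})

# Print the tree with one level of indentation per generation.
for depth, particle in walkMCTreeWithDepth(tree, 'proton'):
    print('  ' * depth + particle)
```

Yielding (parent, particle) pairs instead of the depth would be an equally small change if the full parent-daughter relation is needed.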
SEARCHING FOR NEUTRINOS

The general procedure is as follows:
Develop cuts which reduce the background but keep a sufficient level of signal. Ideally you optimize something - maybe:
    Model rejection factor (MRF) : APP 19 (2003) 393 or http://arxiv.org/pdf/astro-ph/0209350
    Model discovery potential
Along the way, try not to bias your result by optimizing cuts tailored exactly to the measured data set. Some sort of data blindness is usually a good idea: scramble the data; use a burn sample; just look in the background region - all are possible techniques.
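Two of these blinding techniques can be sketched in a few lines of plain Python. This is an illustration under assumptions, not IceCube's actual procedure: the event tuples, the 1-in-10 burn fraction, the random seed, and the function names are all hypothetical. The burn sample keeps a fixed fraction of the events, chosen by event ID, for developing cuts; the scrambling replaces each event's azimuth with a random value while leaving the zenith untouched.

```python
import math
import random


def burn_sample(events, one_in=10):
    """Keep the events whose ID is divisible by one_in."""
    return [ev for ev in events if ev[0] % one_in == 0]


def scramble_azimuth(events, rng):
    """Return copies of the events with a random azimuth between 0 and 2*pi."""
    return [(eid, zen, rng.uniform(0.0, 2.0 * math.pi))
            for eid, zen, azi in events]


# Hypothetical events: (event_id, zenith, azimuth), angles in radians.
events = [(i, 0.01 * i, 1.0) for i in range(100)]

burn = burn_sample(events)
# A fixed seed keeps the scrambling reproducible.
scrambled = scramble_azimuth(events, random.Random(467))
print(len(burn), len(scrambled))
```

Cuts would then be tuned on the burn sample or on the scrambled events only, and applied unchanged to the full data set afterwards.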