Neural networks in data analysis
Michal Kreps, KIT
ISAPP Summer Institute 2009

Outline
1. What are neural networks
2. Basic principles of NN
3. What are they useful for
4. Training of NN
5. Available packages
6. Comparison to other methods
7. Some examples of NN usage

I'm not an expert on neural networks, only an advanced user who understands the principles.
Goal: help you understand the principles so that
→ neural networks don't look like a black box to you
→ you have a starting point to use neural networks yourself

Tasks to solve
Nature poses many different tasks; some are easy to solve algorithmically, some are difficult or impossible.
Example: reading handwritten text
- lots of personal specifics
- recognition often needs to work with only partial information
- practically impossible to do algorithmically
But our brains can solve such tasks extremely fast → use the brain as inspiration.

Tasks to solve
Why would an (astro-)particle physicist care about recognition of handwritten text?
Because it is a classification task on not very regular data: do we see the letter A or H?
→ Classification tasks occur very often in our analyses: is this event signal or background?
Neural networks have some advantages in this task:
- they can learn from known examples (data or simulation)
- they can evolve into complex non-linear models
- they can easily deal with complicated correlations among the inputs
- going to more inputs does not cost additional examples

Down to principles
A detailed look shows a huge number of neurons.
Neurons are interconnected, each having many inputs from other neurons.
To build an artificial neural network:
- zoom in and build a model of a neuron
- connect many neurons together

Model of neuron
- Get the inputs $x_i$
- Combine the inputs into a single quantity $S = \sum_i w_i x_i$
- Pass $S$ through an activation function; most common: $A(x) = \frac{2}{1 + e^{-x}} - 1$
- Give the output

Building neural network
Connecting neurons can lead to complicated or even more complicated structures, or to simple layered structures with an input layer, a hidden layer and an output layer.
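As a minimal sketch of this neuron model (the weighted sum $S$ and the activation $A(x) = 2/(1+e^{-x}) - 1$ from the slide above; the numbers and variable names are illustrative, not from the lecture):

```python
import numpy as np

def activation(x):
    """A(x) = 2 / (1 + exp(-x)) - 1, mapping any input to the interval (-1, 1)."""
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def neuron_output(inputs, weights):
    """Single neuron: weighted sum S = sum_i w_i x_i, passed through the activation."""
    s = np.dot(weights, inputs)
    return activation(s)

# Illustrative example with three inputs
x = np.array([0.5, -1.2, 2.0])
w = np.array([0.8, 0.1, -0.4])
print(neuron_output(x, w))
```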
Output of a feed-forward NN
Output of node k in the second (hidden) layer: $o_k = A\left(\sum_i w^{1\to 2}_{ik} x_i\right)$
Output of node j in the output layer: $o_j = A\left(\sum_k w^{2\to 3}_{kj} o_k\right)$
Putting both together: $o_j = A\left(\sum_k w^{2\to 3}_{kj}\, A\left(\sum_i w^{1\to 2}_{ik} x_i\right)\right)$
More layers are usually not needed; if more layers are involved, the output follows the same logic.
In the following we stick to three layers (input, hidden, output) as here.
The knowledge is stored in the weights.

Tasks we can solve
Classification: a binary decision whether an event belongs to one of two classes.
- Is this particle a kaon or not?
- Is this event a candidate for the Higgs?
- Will Germany become soccer world champion in 2010?
→ The output layer contains a single node.
Conditional probability density: obtain a probability density for each event.
- What is the decay time of a decay?
- What is the energy of a particle given the observed shower?
- What is the probability that a customer will cause a car accident?
→ The output layer contains many nodes.

NN work flow
(Shown as a schematic diagram in the slides.)

Neural network training
After assembling the neural network, the weights $w^{1\to 2}_{ik}$ and $w^{2\to 3}_{kj}$ are unknown; they are initialised to default or random values.
Before using the neural network for the planned task, we need to find the best set of weights.
We need some measure which tells us which set of weights performs best ⇒ the loss function.
Training is the process in which, based on known example events, we search for the set of weights that minimises the loss function.
It is the most important part of using neural networks: bad training can result in bad or even wrong solutions, and there are many traps in the process, so one needs to be careful.

Loss function
Practically any function which allows you to decide which neural network is better.
→ In particle physics it can be the one which results in the best measurement.
In practical applications two functions are commonly used ($T_{ji}$ is the true value, $-1$ for one class and $+1$ for the other, and $o_{ji}$ is the actual neural network output):
Quadratic loss function: $\chi^2 = \sum_j w_j \chi^2_j = \sum_j w_j \tfrac{1}{2} \sum_i (T_{ji} - o_{ji})^2$
→ $\chi^2 = 0$ for correct decisions and $\chi^2 = 2$ for wrong ones.
Entropy loss function: $E_D = \sum_j w_j E_{D,j} = \sum_j w_j \sum_i -\log\left(\tfrac{1}{2}(1 + T_{ji}\, o_{ji})\right)$
→ $E_D = 0$ for correct decisions and $E_D = \infty$ for wrong ones.

Conditional probability density
Classification is the easy case: we need a single output node, and if the output is above a chosen value the event is class 1 (signal), otherwise class 2 (background).
The question now is how to obtain a probability density.
Use many nodes in the output layer, each answering the question whether the value is larger than some threshold.
As an example, take $n = 10$ nodes and a true value of 0.63.
The vector of true values for the output nodes is $(1, 1, 1, 1, 1, 1, -1, -1, -1, -1)$: node $j$ answers the question whether the value is larger than $j/10$.
The output vector then represents the integrated (cumulative) probability density; after differentiation we obtain the desired probability density.
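A minimal NumPy sketch of the three-layer feed-forward output and the two loss functions quoted above (the weight matrices and data are made-up placeholders, not trained values):

```python
import numpy as np

def activation(x):
    # A(x) = 2/(1 + exp(-x)) - 1
    return 2.0 / (1.0 + np.exp(-x)) - 1.0

def feed_forward(x, w12, w23):
    """o_j = A( sum_k w23[k, j] * A( sum_i w12[i, k] * x_i ) ) for a three-layer network."""
    hidden = activation(x @ w12)       # hidden-layer outputs o_k
    return activation(hidden @ w23)    # output-layer outputs o_j

def quadratic_loss(T, o, w=1.0):
    """chi^2 contribution of one event: w * 1/2 * sum_i (T_i - o_i)^2."""
    return w * 0.5 * np.sum((T - o) ** 2)

def entropy_loss(T, o, w=1.0):
    """Entropy loss contribution of one event: w * sum_i -log(1/2 * (1 + T_i * o_i))."""
    return w * np.sum(-np.log(0.5 * (1.0 + T * o)))

# Toy network: 3 inputs, 3 hidden nodes, 1 output node, random weights
rng = np.random.default_rng(0)
x = np.array([0.2, -1.0, 0.7])
w12 = rng.normal(size=(3, 3))
w23 = rng.normal(size=(3, 1))
o = feed_forward(x, w12, w23)
T = np.array([1.0])   # true class: signal
print(o, quadratic_loss(T, o), entropy_loss(T, o))
```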
Interpretation of NN output
In classification tasks, a higher NN output in general means a more signal-like event.
→ If the quadratic or entropy loss function is used and the NN is well trained (⇔ loss function minimised), the NN output rescaled to the interval [0, 1] can be interpreted as a probability.
→ The purity P(o) = N_selected signal(o) / N_selected(o) should then be a linear function of the NN output.
Looking at the expression for the output, $o_j = A\left(\sum_k w^{2\to 3}_{kj}\, A\left(\sum_i w^{1\to 2}_{ik} x_i\right)\right)$, the NN is simply a function which maps the n-dimensional input space onto a 1-dimensional space.
(Slide shows the purity as a function of the network output.)

How to check quality of training
Define the quantities:
- Signal efficiency: ε_s = N(selected signal) / N(all signal)
- Efficiency: ε = N(selected) / N(all)
- Purity: P = N(selected signal) / N(selected)
Then check, using control plots:
- the distribution of the NN output for the two classes: how good is the separation?
- the purity as a function of the NN output: are we at the minimum of the loss function?
- the signal purity versus signal efficiency, where each point corresponds to a cut on the NN output: the best point is (1, 1); compare the training to the S/B reference.
- the signal efficiency versus efficiency, again with each point a cut on the NN output: compare to the best achievable curve and to a random decision.

Issues in training
Main issue: finding the minimum in a high-dimensional space.
5 input nodes with 5 nodes in the hidden layer and one output node already give 5×5 + 5 = 30 weights.
Imagine the task of finding the deepest valley in the Alps (only 2 dimensions): being put at some random place, it is easy to find some minimum, but what is in the next valley?
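A small sketch of how these quality checks could be computed from the outputs of a trained network, scanning a cut and evaluating the quantities defined above (the toy outputs are made up for illustration):

```python
import numpy as np

def quality_scan(outputs, is_signal, cuts):
    """For each cut on the NN output, compute signal efficiency, efficiency and purity."""
    n_all = len(outputs)
    n_all_signal = np.sum(is_signal)
    rows = []
    for cut in cuts:
        selected = outputs > cut
        n_sel = np.sum(selected)
        n_sel_signal = np.sum(selected & is_signal)
        eff_s = n_sel_signal / n_all_signal                   # N(selected signal) / N(all signal)
        eff = n_sel / n_all                                   # N(selected) / N(all)
        purity = n_sel_signal / n_sel if n_sel else np.nan    # N(selected signal) / N(selected)
        rows.append((cut, eff_s, eff, purity))
    return rows

# Toy NN outputs in [-1, 1]: signal peaking near +1, background near -1
rng = np.random.default_rng(1)
outputs = np.concatenate([np.clip(rng.normal(0.6, 0.3, 1000), -1, 1),
                          np.clip(rng.normal(-0.6, 0.3, 1000), -1, 1)])
is_signal = np.concatenate([np.ones(1000, bool), np.zeros(1000, bool)])
for cut, es, e, p in quality_scan(outputs, is_signal, np.linspace(-0.9, 0.9, 7)):
    print(f"cut = {cut:+.2f}   eps_s = {es:.2f}   eps = {e:.2f}   purity = {p:.2f}")
```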
Issues in training
In training we start with:
- a large number of weights which are far from their optimal values
- input values in unknown ranges
→ large chance of numerical issues in the calculation of the loss function.
Statistical issues come from low statistics in training:
- in many places in physics this is not much of an issue, as the training statistics can be large
- in industry it can be quite a big issue to obtain a large enough training dataset
- but also in physics it can sometimes be very costly to obtain a large enough training sample
Learning by heart (overtraining), when the NN has enough weights to do that.

How to treat issues
All the issues mentioned can in principle be treated by the neural network package:
- Regularisation, for numerical issues at the beginning of training.
- Weight decay ("forgetting"): helps against overtraining on statistical fluctuations; the idea is that effects which are not significant are not shown often enough to keep the weights that remember them.
- Pruning of connections: remove connections which do not carry significant information.
- Preprocessing: avoids numerical issues and helps the search for a good minimum through a good starting point and decorrelation.

NeuroBayes preprocessing
- Bin the input variable into bins of equal statistics (flattening the distribution).
- Calculate the purity for each bin and do a spline fit.
- Use the purity instead of the original value, and in addition transform it to mean µ = 0 and RMS = 1.
(Slides show an example input distribution, the flattened distribution, the spline fit of the purity versus bin number, and the final transformed distribution; a code sketch follows after the package list below.)

Electron ID at CDF
Comparing two different NN packages with exactly the same inputs:
preprocessing is important and can improve the result significantly.

Available packages
Not an exhaustive list:
- NeuroBayes (http://www.phi-t.de): the best I know; commercial, but the fee is small for scientific use; originally developed in a high-energy physics experiment.
- ROOT/TMVA: part of the ROOT package; can be used for playing around, but I would advise using something else for serious applications.
- SNNS, the Stuttgart Neural Network Simulator (http://www.ra.cs.uni-tuebingen.de/SNNS/): written in C, but a ROOT interface exists; the best open-source package I know about.
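A rough illustration of the NeuroBayes-style preprocessing described above (equal-statistics binning, bin-wise purity, transformation to mean 0 and RMS 1). This is a simplified sketch, not the actual NeuroBayes implementation, and it replaces the spline fit by the raw bin purities:

```python
import numpy as np

def purity_transform(x, is_signal, n_bins=100):
    """Preprocess one input variable: equal-statistics bins -> bin purity -> standardise."""
    # Bin edges chosen so that each bin contains roughly the same number of events
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1))
    bin_idx = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_bins - 1)

    # Purity = fraction of signal events in each bin (NeuroBayes would smooth this with a spline fit)
    purity = np.array([is_signal[bin_idx == b].mean() if np.any(bin_idx == b) else 0.5
                       for b in range(n_bins)])

    # Replace each value by the purity of its bin, then transform to mean 0 and RMS 1
    transformed = purity[bin_idx]
    return (transformed - transformed.mean()) / transformed.std()

# Toy example: one variable in which signal and background differ
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(1.0, 1.0, 5000), rng.normal(-1.0, 1.0, 5000)])
is_signal = np.concatenate([np.ones(5000, bool), np.zeros(5000, bool)])
print(purity_transform(x, is_signal)[:5])
```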
NN vs. other multivariate techniques
Several times I have seen statements of the type "my favourite method is better than the other multivariate techniques".
Usually a lot of time is spent on understanding and properly training the favourite method, while the other methods are tried just for a quick comparison, without understanding or tuning; often you will not even learn any details of how the other methods were set up.
With TMVA it is now easy to compare different methods; in such comparisons I often saw the Fisher discriminant win.
From some tests we found that the NN implemented in TMVA is relatively bad, so no wonder that it is usually worse than the other methods.
It is usually useful to understand the method you are using; none of them is a black box.

Practical tips
- No need for several hidden layers; one should be sufficient for you.
- The number of nodes in the hidden layer should not be very large → risk of learning by heart.
- The number of nodes in the hidden layer should not be very small → not enough capacity to learn your problem.
→ Usually O(number of inputs) nodes in the hidden layer (a code sketch follows a few slides below).
- Understand your problem: the NN gets numbers and can dig information out of them, but you know the meaning of those numbers.

Practical tips
Don't throw away the knowledge you acquired about the problem before starting to use a NN.
→ A NN can do a lot, but you don't need it to discover what you already know.
- If you already know that a given event is background, throw it away.
- If you know about good inputs, use them rather than the raw data from which they are calculated.
- Use the neural network to learn what you don't know yet.

Physics examples
Mentioning only work done by the Karlsruhe HEP group:
- "Stable" particle identification at DELPHI, CDF and now Belle
- Properties (E, φ, θ, Q-value) of inclusively reconstructed B mesons
- Decay time in semileptonic B_s decays
- Heavy quark resonance studies: X(3872), B**, B_s**, Λ_c*, Σ_c
- B flavour tagging and B_s mixing (DELPHI, CDF, Belle)
- CP violation in B_s → J/ψ φ decays (CDF)
- b tagging for high-p_T physics (CDF, CMS)
- Single top and Higgs searches (CDF)
- B hadron full reconstruction (Belle)
- Optimisation of the KEKB accelerator

B_s** at CDF
Neural network selection of B_s** → B+ K- candidates (B+ → J/ψ K+), CDF Run 2, 1.0 fb^-1.
(Slides show the neural network output for signal and background and the mass difference M(B+ K-) - M(B+) - M(K-) with the B_s1 and B_s2* signals.)
- First observation of the B_s1
- Most precise masses
- Background taken from data, signal from MC

X(3872) mass at CDF
The X(3872) was the first of the charmonium-like states; a lot of effort goes into understanding its nature.
(Slides show the network output for signal and background and the J/ψ ππ mass distribution for total, rejected and selected candidates, CDF II preliminary, 2.4 fb^-1.)
A powerful selection at CDF gives the largest sample, leading to the best mass measurement.
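Picking up the practical tips from a few slides back: one possible off-the-shelf setup with a single hidden layer of O(number of inputs) nodes. This uses scikit-learn, which is not among the packages listed in the lecture, and a tanh activation, which is closely related to the activation quoted earlier; the data are made up:

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# Toy data: 8 input variables, two classes (signal = 1, background = 0)
rng = np.random.default_rng(3)
n = 5000
X = np.vstack([rng.normal(0.3, 1.0, size=(n, 8)),
               rng.normal(-0.3, 1.0, size=(n, 8))])
y = np.concatenate([np.ones(n, int), np.zeros(n, int)])

# One hidden layer with roughly as many nodes as inputs, as recommended above
n_inputs = X.shape[1]
net = MLPClassifier(hidden_layer_sizes=(n_inputs,), activation="tanh",
                    max_iter=500, random_state=0)
net.fit(X, y)
print(net.predict_proba(X[:3])[:, 1])   # signal probability for the first few events
```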
Single top search
A complicated analysis with many different multivariate techniques; a NN is used to tag whether a jet is a b-quark jet or not (KIT flavour separator).
(Slides show the flavour separator output, normalised to unit area, and the final data plot of the NN output for all channels, CDF II preliminary, 3.2 fb^-1, with the single top, ttbar, Wbb+Wcc, Wc, Wqq, diboson, Z+jets and QCD contributions, MC normalised to the SM prediction.)
Part of the big program leading to the discovery of single top production.

B flavour tagging at CDF
Flavour tagging at hadron colliders is a difficult task, with a lot of potential for improvements.
In the plot, concentrate on the continuous lines: an example of not-that-good separation.

Training with weights
(Slides show the Mass(Λ_c π) - Mass(Λ_c) distribution, CDF Run II preliminary, L = 2.4 fb^-1, with about 23300 signal and 472200 background events, and the fitted Σ_c(2455)++ → Λ_c+ π+ and Σ_c(2520)++ → Λ_c+ π+ signals.)
Often one knows only statistically whether an event is signal or background, or has a sample with a mixture of signal and background plus a pure sample for one class.
One can then train on data only, using per-event weights; this selection is an example of such an application (a code sketch follows after the examples below).
→ Use only data in the selection
→ Can compare data to simulation
→ Can reweight simulation to match data

Examples outside physics
- Medicine and pharma research: e.g. effects and undesirable effects of drugs, early tumour recognition
- Banks: e.g. credit scoring (Basel II), financial time series prediction, valuation of derivatives, risk-minimised trading strategies, client valuation
- Insurances: e.g. risk and cost prediction for individual clients, probability of contract cancellation, fraud recognition, fairness in tariffs
- Retail chain stores: turnover prognosis for individual articles/stores

Data Mining Cup
The world's largest student competition: the Data-Mining-Cup.
Very successful in competition with other data-mining methods:
- 2005: fraud detection in internet trading
- 2006: price prediction in eBay auctions
- 2007: coupon redemption prediction
- 2008: lottery customer behaviour prediction

NeuroBayes turnover prognosis
(Figure slide.)

Individual health costs
Pilot project for a large private health insurance: prognosis of the costs in the following year for each insured person, with confidence intervals.
4 years of training, test on the following year.
Results: a probability density for each customer/tariff combination; very good test results!
Potential for an objective cost reduction in health management.
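Returning to the "training with weights" slide above: a minimal sketch of how per-event weights could enter the quadratic loss quoted earlier. How the weights themselves are obtained (e.g. from a mass fit separating signal and background statistically) is analysis-specific; the numbers here are illustrative only:

```python
import numpy as np

def weighted_quadratic_loss(T, o, event_weights):
    """chi^2 = sum_j w_j * 1/2 * (T_j - o_j)^2 with a statistical weight w_j per event."""
    return np.sum(event_weights * 0.5 * (T - o) ** 2)

# Toy example: targets, network outputs and statistical weights for three events
T = np.array([1.0, -1.0, 1.0])     # targets: signal = +1, background = -1
o = np.array([0.7, -0.2, 0.1])     # network outputs for the three events
w = np.array([0.9, 0.8, 0.3])      # per-event weights, e.g. signal/background probabilities from a fit
print(weighted_quadratic_loss(T, o, w))
```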
Prognosis of sport events
Results: probabilities for home / tie / guest.

Conclusions
Neural networks are a powerful tool for your analysis.
They are not geniuses, but they can do marvellous things for a genius.
A lecture of this kind cannot really make you an expert, but I hope I was successful with:
→ giving you enough insight so that a neural network is not a black box anymore
→ teaching you enough not to be afraid to try
Feel free to talk to me; I'm at Campus Süd, building 30.23, room 9-11.