Selected Topics in Particle Physics, Avner Soffer, Spring 2007

Simplest variable combination: diagonal cut

Combining variables
• Many variables that weakly separate signal from background
• Often correlated distributions
• Complicated to deal with or to use in a fit
• Easiest to combine into one simple variable
[Plots: Fisher-discriminant and neural-network output distributions for BB signal MC and qq continuum background MC.]

Input variables for neural net
[Plots: Legendre, Fisher, log(Δz), cos θ_T, log(K-D DOCA), and lepton tagging (BtgElectronTag & BtgMuonTag), shown for signal, BB background, and cc+uds.]

Uncorrelated, (approximately) Gaussian-distributed variables
• "Gaussian-distributed" means the distribution of $v$ is $p(v) \propto \exp\!\left(-\frac{(v - \langle v\rangle)^2}{2\sigma^2}\right)$
• How to combine the information?
• Option 1: $V = v_1 + v_2$
• Option 2: $V = v_1 - v_2$
• Option 3: $V = a_1 v_1 + a_2 v_2$
• What are the best weights $a_i$?
• How about $a_i = \langle v_i^s\rangle - \langle v_i^b\rangle$, the difference between the signal and background means?
[Plot: signal and background distributions in the (v_1, v_2) plane.]

Incorporating spreads in v_i
• $\langle v_1^s\rangle - \langle v_1^b\rangle > \langle v_2^s\rangle - \langle v_2^b\rangle$, but $v_2$ has smaller spreads and more actual separation between S and B
• $a_i = \big(\langle v_i^s\rangle - \langle v_i^b\rangle\big) \,/\, \big((\sigma_i^s)^2 + (\sigma_i^b)^2\big)$, where $(\sigma_i^s)^2 = \langle (v_i^s - \langle v_i^s\rangle)^2\rangle = \sum_e (v_{ie}^s - \langle v_i^s\rangle)^2 / N$ is the RMS spread of the $v_i$ distribution in a pure signal sample ($\sigma_i^b$ is defined similarly)
• You may be familiar with the form $\langle (v - \langle v\rangle)^2\rangle = \langle v^2\rangle + \langle v\rangle^2 - 2\langle v\rangle^2 = \langle v^2\rangle - \langle v\rangle^2$
[Plot: signal and background spreads in v_1 and v_2.]

Linearly correlated, Gaussian-distributed variables
• Linear correlation:
  – $\langle v_1\rangle = \langle v_1\rangle_0 + c\,v_2$
  – $(\sigma_1)^2$ independent of $v_2$
• $a_i = \big(\langle v_i^s\rangle - \langle v_i^b\rangle\big) / \big((\sigma_i^s)^2 + (\sigma_i^b)^2\big)$ doesn't account for the correlation
• Recall $(\sigma_i^s)^2 = \langle (v_i^s - \langle v_i^s\rangle)^2\rangle$
• Replace it with the covariance matrix $C_{ij}^s = \langle (v_i^s - \langle v_i^s\rangle)(v_j^s - \langle v_j^s\rangle)\rangle$
• $a_i = \sum_j \big[(C^s + C^b)^{-1}\big]_{ij}\,\big(\langle v_j^s\rangle - \langle v_j^b\rangle\big)$, using the inverse of the sum of the S and B covariance matrices
• Fisher discriminant: $F = \sum_i a_i v_i$

Fisher discriminant properties
• Best S-B separation for a linearly correlated set of Gaussian-distributed variables
• Non-Gaussian-ness of $v$ is usually not a problem…
• There must be a mean difference, $\langle v_i^s\rangle - \langle v_i^b\rangle \neq 0$ (a variable whose means agree but whose spreads differ can be given one by taking its absolute value)
• Need to calculate the $a_i$ coefficients using (correctly simulated) Monte Carlo (MC) signal and background samples
• Should validate using control samples (true for any discriminant)

More properties
• $F$ is more Gaussian than its inputs (virtual calorimeter example)
• Central limit theorem:
  – If $x_j$ ($j = 1, \ldots, n$) are independent random variables with means $\langle x_j\rangle$ and variances $\sigma_j^2$, then for large $n$ the sum $\sum_j x_j$ is a Gaussian-distributed variable with mean $\sum_j \langle x_j\rangle$ and variance $\sum_j \sigma_j^2$
• $F$ can usually be fit with 2 Gaussians or a bifurcated Gaussian
• A cut on $F$ corresponds to an $(n-1)$-dimensional plane cut through the $n$-dimensional variable space
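The recipe on the Fisher slides above reduces to a few lines of linear algebra. Below is a minimal NumPy sketch (not from the lecture; all function and variable names are illustrative) that computes the coefficients $a_i$ from MC signal and background samples and evaluates $F$ per event:

```python
import numpy as np

def fisher_coefficients(sig, bkg):
    """sig, bkg: shape (n_events, n_vars) arrays of MC samples.
    Returns a_i = sum_j [(C^s + C^b)^-1]_ij (<v_j^s> - <v_j^b>)."""
    mean_diff = sig.mean(axis=0) - bkg.mean(axis=0)
    cov_sum = np.cov(sig, rowvar=False) + np.cov(bkg, rowvar=False)
    return np.linalg.solve(cov_sum, mean_diff)  # solves (C^s + C^b) a = mean diff

def fisher(events, a):
    """Fisher discriminant F = sum_i a_i v_i for each event."""
    return events @ a

# Toy usage: two linearly correlated Gaussian variables with a mean difference.
rng = np.random.default_rng(0)
cov = [[1.0, 0.6], [0.6, 1.0]]
sig = rng.multivariate_normal([1.0, 0.5], cov, size=10_000)
bkg = rng.multivariate_normal([0.0, 0.0], cov, size=10_000)
a = fisher_coefficients(sig, bkg)
F_sig, F_bkg = fisher(sig, a), fisher(bkg, a)  # 1D distributions to cut or fit on
```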
Nonlinear correlations
• Linear methods (Fisher) are not optimal for such cases
• May fail altogether if there is no S-B mean difference

Artificial neural networks
• "Complex nonlinearity"
• Each neuron
  – takes many inputs
  – outputs a response-function value
• The output of each neuron serves as input for the others
• Neurons are divided among layers for efficiency
• The weight $w_{ij}^l$ between neuron $i$ in layer $l$ and neuron $j$ in layer $l+1$ is calculated using a MC "training sample"

Response functions
• Neuron output $= \rho(\text{inputs, weights}) = \alpha(\kappa(\text{inputs, weights}))$
• Common usage:
  – $\alpha$ = linear in the output layer
  – $\alpha$ = tanh in the hidden layer
  – $\kappa$ = sum in the hidden and output layers

Training (calculating weights)
• Event $a$ ($a = 1 \ldots N$) has an input variable vector $x = (x_1, \ldots, x_{n_{\mathrm{var}}})$
• For each event, calculate the deviation of the network output $y(x_a; w)$ from the desired value $\hat{y}_a$ (0 for background, 1 for signal)
• Calculate the error function for random starting values $w$ of the weights, typically the summed squared deviation $E(w) = \sum_a \tfrac{1}{2}\big(y(x_a; w) - \hat{y}_a\big)^2$

Training
• Change the weights so as to cause the steepest decline in $E$, i.e., gradient descent: $w \to w - \eta\,\nabla_w E$
• "Online learning": remove the sums (update the weights event by event)
  – Requires a randomized training sample

What architecture to use?
• Weierstrass theorem: for a multilayer perceptron, one hidden layer is sufficient to approximate a continuous correlation function to any precision, if the number of neurons in the layer is high enough
• Alternatively, several hidden layers with fewer neurons may converge faster and be more stable
• Instability problems:
  – the output distribution changes with different samples

What variables to use?
• Improvement with added variables: [formula on the original slide not recovered]
• Importance of variable $i$: [formula on the original slide not recovered]

More info
• A cut on a NN output = a non-linear slice through the $n$-dimensional variable space
• The NN output shape can be (approximately) Gaussianized: $q \to q' = \tanh^{-1}\!\left[\dfrac{q - \tfrac{1}{2}(q_{\max} + q_{\min})}{\tfrac{1}{2}(q_{\max} - q_{\min})}\right]$
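To make the training slides concrete, here is a minimal sketch of a one-hidden-layer perceptron in the notation above ($\alpha = \tanh$ in the hidden layer, linear output, $\kappa$ = weighted sum), trained by online steepest descent on the squared deviation. It is an illustration under assumed choices (learning rate, epoch count, initialization), not the lecture's implementation; the final helper applies the Gaussianization transform quoted on the last slide.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_mlp(x, y, n_hidden=5, eta=0.01, epochs=20):
    """x: (N, n_var) inputs; y: (N,) desired outputs (0 = background, 1 = signal)."""
    W1 = rng.normal(scale=0.5, size=(n_hidden, x.shape[1]))  # input -> hidden weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=n_hidden)                # hidden -> output weights
    b2 = 0.0
    for _ in range(epochs):
        for a in rng.permutation(len(x)):   # randomized sample: "online learning"
            h = np.tanh(W1 @ x[a] + b1)     # hidden layer: alpha = tanh, kappa = sum
            q = W2 @ h + b2                 # output layer: linear response
            d = q - y[a]                    # deviation from the desired value
            g = d * W2 * (1.0 - h**2)       # backpropagated gradient of E_a = d^2/2
            W2 -= eta * d * h               # steepest-descent updates, one event at a time
            b2 -= eta * d
            W1 -= eta * np.outer(g, x[a])
            b1 -= eta * g
    return W1, b1, W2, b2

def mlp_output(x, W1, b1, W2, b2):
    """Network output q for an array of events."""
    return np.tanh(x @ W1.T + b1) @ W2 + b2

def gaussianize(q, qmin, qmax):
    """q -> q' = atanh[(q - (qmax + qmin)/2) / ((qmax - qmin)/2)]."""
    return np.arctanh((q - 0.5 * (qmax + qmin)) / (0.5 * (qmax - qmin)))
```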