Information Theory in Neuroscience
Noise, probability and information theory
MSc Neuroscience, Prof. Jan Schnupp, [email protected]

Neural Responses are Noisy
[Figure: raster plots of responses recorded in cat A1 to recordings of sheep and frog sounds, 0-800 ms.]
Seventeen identical repetitions of a stimulus do not produce 17 identical spike patterns. How much information does an individual response convey about the stimulus?

Joint and Marginal Probabilities
A plausible hypothetical example:

                  Neuron responds   Neuron does not respond   (marginal p(s))
Stimulus on            0.35                  0.05                  0.4
Stimulus off           0.15                  0.45                  0.6
(marginal p(r))        0.50                  0.50

Joint Probabilities and Independence
Let s be "stimulus present" and r be "neuron responds".
p(s,r) = p(r,s) is the probability that the stimulus is present and the neuron responds (joint probability).
p(s|r) is the probability that the stimulus was present given that the neuron responded (conditional probability).
Note: p(s|r) = p(s,r)/p(r)
If r and s are independent, then p(s,r) = p(s) · p(r).
Therefore, if r and s are independent, then p(s|r) = p(s): knowing that the neuron responded does not change my view of how likely it is that there was a stimulus, i.e. the response does not carry information about the stimulus.

What is Information?
If I tell you something you already know, I don't give you any (new) information. If I tell you something that you could easily have guessed, I give you only a little information. The less likely a message, the more "surprising" it is: surprise = 1/p. The information content of a message is given by the order of magnitude of the message's "surprise":
I = log2(1/p) = -log2(p)
Examples:
"A is the first letter of the alphabet": p = 1, I = -log2(1) = 0
"I flipped a coin, it came up heads": p = 0.5, I = -log2(0.5) = 1
"His phone number is 928 399": p = 1/10^7, I = log2(10^7) ≈ 23.25

"Entropy" S(s) or H(s)
S(s) = -Σ_s p(s) · log2(p(s))
Measures "uncertainty" about a message s.
Equal to the "average" information content of messages from a particular source. Note that, to estimate entropy, the statistical properties of the source must be known, i.e. one must know what values s can take and how likely each value is (p(s)).

Entropy of flipping a fair coin:
S = -(1/2 · log2(1/2) + 1/2 · log2(1/2)) = -2 · 1/2 · (-1) = 1 bit

Convention: 0 · log(0) = 0. Entropy of flipping a trick coin with "heads" on both sides:
S = -(1 · log2(1) + 0 · log2(0)) = -(0 + 0) = 0 bits

Entropy of rolling a die:
S = -6 · 1/6 · log2(1/6) = -log2(1/6) = log2(6) ≈ 2.585 bits

If two random processes are statistically independent, their entropies add.

Outcome of 2 coin flips:   HH    HT    TH    TT
Probability:              1/4   1/4   1/4   1/4

In this example: S(coin1, coin2) = -4 · 1/4 · log2(1/4) = 2 = S(coin1) + S(coin2)

If two processes are not independent, their joint entropy is less than the sum of the individual entropies: S(s,r) ≤ S(s) + S(r)

Outcome of 2 coin flips:   HH    HT    TH    TT
Probability:              1/2     0     0   1/2

In this example the two coins are linked so that their outcomes are 100% correlated. S(s) = S(r) = 1, so S(s) + S(r) = 2, but S(s,r) = -2 · 1/2 · log2(1/2) = 1.

"Mutual Information" I(r,s)
I(r,s) = S(r) + S(s) - S(r,s)
I(r,s) = Σ_{r,s} p(r,s) · log2( p(r,s) / (p(r) · p(s)) )
Also sometimes called the "transmitted information" T(r;s). Equal to the difference between the sum of the individual entropies and the joint entropy. Measures how much the uncertainty about one random variable is reduced if the value of another random variable is known.
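The surprise, entropy and mutual-information formulas above can be sketched in a few lines of Python; the numbers below reproduce the coin, die, phone-number and correlated-coin examples, plus the hypothetical neuron from the joint-probability table:

```python
import math

def information(p):
    """Information content (bits) of a message with probability p: I = -log2(p)."""
    return -math.log2(p)

def entropy(probs):
    """Shannon entropy in bits: S = -sum_s p(s) * log2(p(s)),
    using the convention 0 * log(0) = 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(r,s) = sum_{r,s} p(r,s) * log2(p(r,s) / (p(r) * p(s)))
    for a 2-D table of joint probabilities."""
    p_r = [sum(row) for row in joint]           # marginal over columns
    p_s = [sum(col) for col in zip(*joint)]     # marginal over rows
    return sum(p * math.log2(p / (p_r[i] * p_s[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

print(information(0.5))                # a fair coin flip carries 1 bit
print(round(information(1e-7), 2))     # the 7-digit phone number: 23.25 bits
print(entropy([0.5, 0.5]))             # fair coin: 1 bit
print(round(entropy([1/6] * 6), 3))    # die: log2(6), about 2.585 bits
# Two perfectly correlated coins (HH and TT each with p = 1/2):
# I = S(s) + S(r) - S(s,r) = 1 + 1 - 1 = 1 bit
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))
# The hypothetical neuron from the joint/marginal probability table:
print(round(mutual_information([[0.35, 0.05], [0.15, 0.45]]), 3))
```

This is a minimal sketch, not a general-purpose library; the function names are ours, and the joint table is assumed to be a list of rows that sums to 1.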
Traffic Light Example: Swiss Drivers

Relative freq (estimated prob):   Red   Green
Stop                              1/2     0
Go                                  0    1/2

I(r,s) = Σ_{r,s} p(r,s) · log2( p(r,s) / (p(r) · p(s)) )
Here: I(light, behaviour) = 1/2 · log2( (1/2) / (1/2 · 1/2) ) + 0 + 1/2 · log2( (1/2) / (1/2 · 1/2) ) + 0 = log2(2) = 1 bit

Traffic Light Example: Egyptian Drivers

Relative freq:   Red    Green
Stop             0.2    0.05
Go               0.3    0.45

Note: in this case p(Stop) = 0.25, hence the entropy of the driver's behaviour is S ≈ 0.8113 < 1.
Here: I(light, behaviour) = 0.2 · log2(0.2 / (0.25 · 0.5)) + 0.3 · log2(0.3 / (0.75 · 0.5)) + 0.05 · log2(0.05 / (0.25 · 0.5)) + 0.45 · log2(0.45 / (0.75 · 0.5)) ≈ 0.0913 bits

Hypothetical Example
[Figure: scatter plot of response against stimulus intensity, showing an inverted-U relationship.]
Non-monotonic (quadratic) relationship between stimulus and response: there is no (linear, first-order) correlation between stimulus and response. Nevertheless, the response is informative about the stimulus, e.g. a large response implies a mid-level stimulus. The correlation is zero, but the mutual information is large.

Estimating Information in Spike Counts: an Example
[Figure: spatial receptive field of an A1 neuron plotted over azimuth and elevation (left), and the division of space into 24 "sectors" (right).]
24 "sectors", p(s) = 1/24, S(s) = log2(24) ≈ 4.585 bits.
Data from Mrsic-Flogel et al., Nature Neurosci (2003). Spatial receptive fields of A1 neurons were mapped out using "virtual acoustic space" stimuli. Left panel: the diameter of the dots is proportional to the spike count. Space was carved up into 24 "sectors" (right panel). The question is: what is the mutual information between spike count and sector of space?

Estimating Information in Spike Counts, continued
I(r,s) = Σ_{r,s} p(r,s) · log2( p(r,s) / (p(r) · p(s)) )
We use the relative frequencies (how often we observed 0, 1, 2, … spikes when the stimulus was in sector 1, 2, 3, …) as estimates for p(r,s). p(s) is fixed by the experimenter, and p(r) is estimated from the pooled responses.
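This relative-frequency ("plug-in") estimate can be sketched as follows. The count table is made up for illustration (3 hypothetical sectors crossed with spike counts 0-3); the second example draws stimulus and response independently, so its true mutual information is exactly zero, and the positive estimate it nevertheless produces illustrates the bias problem discussed below.

```python
import math
import random

def plugin_mi(counts):
    """Plug-in ('naive') mutual information estimate in bits from a table of
    observed counts: rows = stimuli, columns = response categories."""
    n = sum(sum(row) for row in counts)
    p_s = [sum(row) / n for row in counts]           # stimulus marginals
    p_r = [sum(col) / n for col in zip(*counts)]     # response marginals
    return sum((c / n) * math.log2((c / n) / (p_s[i] * p_r[j]))
               for i, row in enumerate(counts)
               for j, c in enumerate(row) if c > 0)

# Hypothetical counts: 3 stimulus sectors x 4 possible spike counts (0-3).
counts = [[10, 8, 2, 0],     # sector 1: mostly low spike counts
          [2, 8, 8, 2],      # sector 2: intermediate spike counts
          [0, 2, 8, 10]]     # sector 3: mostly high spike counts
print(round(plugin_mi(counts), 3))      # roughly 0.49 bits

# Control: stimulus and response drawn independently, so the true mutual
# information is zero, yet the plug-in estimate from only 60 trials is
# almost surely above zero -- the bias problem.
random.seed(1)
shuffled = [[0] * 4 for _ in range(3)]
for _ in range(60):
    shuffled[random.randrange(3)][random.randrange(4)] += 1
print(plugin_mi(shuffled) > 0)
```

The estimator and the counts are illustrative sketches, not the analysis pipeline used in the studies cited here.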
These values are then plugged into the formula above, giving I(s,r) = 0.7019 bits.

Difficulties with Estimating Mutual Information: Bias!
To calculate transmitted information, we use observed frequencies as estimates for the true underlying probabilities. However, to estimate probabilities (particularly of rare events) accurately, one needs a lot of data. Inaccuracies in the estimates of p(s,r) tend to lead to overestimates of the information content. Example: when the responses from the previous example were randomly reassigned to stimulus classes, the randomisation should have led to statistical independence and hence zero information; nevertheless, a value of I(s,r) = 0.1281 bits was obtained.

Estimating Information in Spike Patterns: the Eskandar, Richmond & Optican (1992) Experiment
Monkeys were trained to perform delayed non-match-to-target tasks with a set of Walsh patterns. Neural responses in area TE of inferotemporal cortex were recorded while the monkeys performed the task.

IT Responses
Example of responses recorded by Eskandar et al.: different Walsh patterns produced different response patterns as well as different spike counts.

Principal Component Analysis of Response Patterns
PCA makes it possible to summarize complex response shapes with relatively few numbers (the "coefficients" of the first few principal components).

Eskandar et al.: Results
Spike count plus the first 3 PCA coefficients (T3, gray bars) transmit 30% more information about stimulus identity ("pattern") than spike count alone (TS, white bars). Most of the IT response is attributable to stimulus identity (which Walsh pattern?), and only a little to task "context" (sample, match or non-match stimulus).

Rat "Barrel" Cortex
Rat S1 has a large "barrel field" in which the vibrissae are represented.

Spike Latency Coding in Rat Somatosensory Cortex
Panzeri et al. (Neuron 29: 769-777, 2001) recorded from the D2 barrel while stimulating the D2 whisker as well as the surrounding whiskers (response PSTHs are shown in the original figure). While spike counts were not very informative about which whisker was stimulated, response latency carried large amounts of information.

Applications of Information Theory in Neuroscience: Some Further Examples
Tovee et al. (J Neurophysiol 1993) found that the first 50 ms or so of the response of "face cells" in monkey inferotemporal cortex contain most of the information carried by the entire response pattern.
Machens et al. (J Neurosci 2001) found that grasshopper auditory neurons transmit information about sound stimuli with the highest efficiency when the properties of those stimuli match the time scales and amplitude distributions of natural songs.
Mrsic-Flogel et al. (Nature Neurosci 2003) found that responses of A1 neurons in adult ferrets carry more information about the spatial location of a sound stimulus than do responses of infant neurons.
Li et al. (Nature Neurosci 2004) found that the mutual information between visual stimuli and V1 responses can depend on the task the animal is performing (attention?).

Information Theory in Neuroscience: a Summary
Transmitted information measures how much the uncertainty about one random variable can be reduced by observing another. Two random variables are "mutually informative" if they are not statistically independent (p(x,y) ≠ p(x) · p(y)). However, information measures are agnostic about how the information should best be decoded, or indeed about how much (if any) of the information contained in a spike train can be decoded and used by the brain. Information theory treats neurons merely as "transmission channels" and assumes that the receiver (i.e. "higher" brain structures) knows about the possible states and their entropies. Real neurons have to be encoders and decoders as much as they are transmission channels.
The information content of a spike train is hard to measure accurately, but at least rough (and potentially useful) estimates can sometimes be obtained.

Further Reading
Trappenberg, T. P. (2002). Fundamentals of Computational Neuroscience. Oxford University Press, Oxford.
Rolls, E. T., and Treves, A. (1998). Neural Networks and Brain Function. Oxford University Press, Oxford, appendix 2.
Rieke, F. (1997). Spikes: Exploring the Neural Code. MIT Press, Cambridge, Mass.; London.
Eskandar, E. N., Richmond, B. J., and Optican, L. M. (1992). Role of inferior temporal neurons in visual memory. I. Temporal encoding of information about visual images, recalled images, and behavioral context. J Neurophysiol 68: 1277-1295.
Furukawa, S., and Middlebrooks, J. C. (2002). Cortical representation of auditory space: information-bearing features of spike patterns. J Neurophysiol 87: 1749-1762.
Panzeri, S., Petersen, R. S., Schultz, S. R., Lebedev, M., and Diamond, M. E. (2001). The role of spike timing in the coding of stimulus location in rat somatosensory cortex. Neuron 29: 769-777.