The Boltzmann Machine
Psych 419/719
March 1, 2001

Recall Constraint Satisfaction…
• We have a network of units and connections…
• Finding an optimal state involves relaxation: letting the network settle into a configuration that maximizes a goodness function
• This is done by annealing

Simulated Annealing
• Update unit states according to a probability distribution, which is based on:
– The input to the unit. Higher input = greater odds of being on
– The temperature. High temperature = more random. Low temperature = deterministic function of input
• Start with a high temperature, and gradually reduce it

Constraint Satisfaction Networks Have Nice Properties
• Can settle into stable configurations based on partial or noisy information
• Can do pattern completion
• Have well-formed attractors corresponding to stable states
• BUT: How can we make such a network learn?

What about Backprop?
• Two problems:
– Tends to split the probability distributions
– If the input is ambiguous (say, the word LEAD), the output reflects that distribution. Not like the Necker cube
– Also: not very biologically plausible
– Error gradients travel backwards along connections. Neurons don't seem to do this

We Need Hidden Units
• Hidden units are needed to solve XOR-style problems
• In these networks, we have a set of symmetric connections between units
• Some units are visible and others are hidden

The Boltzmann Machine: Memorizing Patterns
• Here, we want to train the network on a set of patterns
• We want the network to learn about the statistics of, and relationships between, the parts of the patterns
• Not really performing an explicit mapping (like backprop is good for)

How it Works
• Step 1. Pick an example
• Step 2. Run the network in the positive phase
• Step 3. Run the network in the negative phase
• Step 4. Compare the statistics of the two phases
• Step 5. Update the weights based on those statistics
• Step 6. Go to step 1 and repeat
• (A minimal code sketch of this whole loop appears just after the "Why it Works" slide.)

Step 1: Pick Example
• Pretty simple. Just select an example at random.

Step 2. The Positive Phase
• Clamp our visible units with the pattern specified by our current example
• Let the network settle using the simulated annealing method
• Record the outputs of the units
• Start again with our example, settling again and recording the units again

Step 3. The Negative Phase
• Here, we don't clamp the network units. We just let the network settle to some state as before.
• Do this several times, again recording the unit outputs

Step 4. Compare Statistics
• For each pair of units, we compute the odds that both units are coactive (both on) during the positive phase. Do the same for the negative phase.
• If we have n units, this gives us two n x n matrices of probabilities
• p_ij is the probability that unit i and unit j are both on (p⁺_ij for the positive phase, p⁻_ij for the negative phase)

Step 5: Update Weights

Δw_ij = k (p⁺_ij − p⁻_ij)

• Change each weight according to the difference between the co-activation probabilities in the positive and negative phases
• Here, k is like a learning rate

Why it Works
• This reduces the difference between what the network settles to when the inputs are clamped and what it settles to when it is allowed to free-run
• So, the weights learn about what kinds of visible units go together
• Recruits hidden units to help learn higher-order relationships
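Below is a minimal Python sketch of the whole training loop described above, intended only to show the shape of the algorithm. The function names (`settle`, `coactivation`, `train`), the temperature schedule, the number of settling runs, and the learning rate are all illustrative assumptions, and the statistics are gathered over all patterns per sweep rather than strictly one example at a time.

```python
import numpy as np

rng = np.random.default_rng(0)

def settle(w, state, clamped, temps=(5.0, 2.0, 1.0, 0.5), steps_per_temp=20):
    """Simulated annealing: stochastically update unclamped binary units while
    the temperature is lowered according to an (illustrative) schedule."""
    for T in temps:
        for _ in range(steps_per_temp):
            i = rng.integers(len(state))
            if clamped[i]:
                continue                            # clamped units keep their values
            net = w[i] @ state                      # net input to unit i
            p_on = 1.0 / (1.0 + np.exp(-net / T))   # higher input / lower T -> more deterministic
            state[i] = 1.0 if rng.random() < p_on else 0.0
    return state

def coactivation(w, patterns, n_units, clamp_visible, n_settles=10):
    """Probability, averaged over settling runs, that each pair of units is on together."""
    stats = np.zeros((n_units, n_units))
    runs = 0
    for pattern in patterns:
        for _ in range(n_settles):
            state = rng.integers(0, 2, n_units).astype(float)
            clamped = np.zeros(n_units, dtype=bool)
            if clamp_visible:                       # positive phase: visible units fixed to the pattern
                state[:len(pattern)] = pattern
                clamped[:len(pattern)] = True
            state = settle(w, state, clamped)
            stats += np.outer(state, state)         # 1 where both units ended up on
            runs += 1
    return stats / runs

def train(patterns, n_hidden=4, k=0.05, epochs=50):
    n = len(patterns[0]) + n_hidden                 # visible units first, hidden units after
    w = np.zeros((n, n))                            # symmetric weights, no self-connections
    for _ in range(epochs):
        p_plus = coactivation(w, patterns, n, clamp_visible=True)    # positive (clamped) phase
        p_minus = coactivation(w, patterns, n, clamp_visible=False)  # negative (free-running) phase
        dw = k * (p_plus - p_minus)                 # delta w_ij = k (p+_ij - p-_ij)
        np.fill_diagonal(dw, 0.0)                   # keep no self-connections
        w += dw
    return w

# Example: memorize two 4-bit patterns
w = train([np.array([1., 1., 0., 0.]), np.array([0., 0., 1., 1.])])
```

Because the co-activation matrices are symmetric, the weight matrix stays symmetric, matching the "symmetric connections" assumption on the earlier slides.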
Can Be Used For Mappings Too
• Here, the positive phase involves clamping both the input and output units and letting the network settle
• The negative phase involves clamping just the input units
• The network learns that, given the input, it should settle to a state where the output units are what they should be

Contrastive Hebbian Learning
• Very similar to a normal Boltzmann machine, except we can have units whose outputs are a deterministic function of their input (like the logistic)
• As before, we have two phases: positive and negative

Contrastive Hebbian Learning Rule

Δw_ij = k (a⁺_i a⁺_j − a⁻_i a⁻_j)

• Weight updates are based on actual unit outputs, not on the probabilities that both units are on
• (A small code sketch of this update appears at the end of the handout.)

Problems
• Weight explosion. If weights get too big too early, the network will get stuck in one goodness optimum.
– Can be alleviated with weight decay
• Settling time. Time to process an example is long, due to the settling process.
• Learning time. Takes a lot of presentations to learn.
• Symmetric weights?

Phases? Sleep?
• It has been suggested that something like the minus phase might be happening during sleep:
• Spontaneous correlations between hidden units (not those driven by external input) get subtracted off. They will vanish unless driven by external input while awake.
• Not a lot of evidence to support this conjecture.
• We can learn while awake!

For Next Time
• Optional reading handed out
• Ends the section on learning internal representations. Next: biologically plausible learning.
• Remember:
– No class next Thursday
– Homework 3 due March 13
– Project proposal due March 15. See web page.
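For reference, here is a small sketch of just the contrastive Hebbian weight update from the rule slide above. It assumes the settled activations from the two phases have already been obtained (e.g., by a settling procedure like the one sketched earlier); `chl_update` and the example activation values are made up for illustration.

```python
import numpy as np

def chl_update(w, a_plus, a_minus, k=0.05):
    """Contrastive Hebbian update: delta w_ij = k (a+_i a+_j - a-_i a-_j).
    a_plus / a_minus are the settled, real-valued activations from the
    positive (outputs clamped) and negative (outputs free) phases."""
    dw = k * (np.outer(a_plus, a_plus) - np.outer(a_minus, a_minus))
    np.fill_diagonal(dw, 0.0)              # no self-connections
    return w + dw

# Example with made-up settled activations for a 4-unit network
w = np.zeros((4, 4))
a_plus = np.array([1.0, 0.9, 0.1, 0.8])    # positive phase (targets clamped)
a_minus = np.array([1.0, 0.9, 0.4, 0.3])   # negative phase (outputs free)
w = chl_update(w, a_plus, a_minus)
```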