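Brain Damage: Algorithms for Network Pruning
Andrew Yip
HMC, Fall 2003

The Idea
• Networks with too many weights "over-train" (overfit) the training data and, as a result, generalize poorly.
• Goal: a technique that substantially reduces the size of the network without reducing validation accuracy.
• By reducing the network's complexity, pruning may also improve its generalization.

History
• Removing a weight means setting it to 0 and freezing it.
• The first attempts at network pruning removed the weights of smallest magnitude.
• Another approach minimizes a cost function that combines the training error with a measure of network complexity.

LeCun's Take
• Derive a more theoretically sound order for removing weights, based on the second derivatives of the error function. A perturbation $\delta U$ of the parameter vector changes the error by

$$\delta E = \sum_i g_i\,\delta u_i + \frac{1}{2}\sum_i h_{ii}\,\delta u_i^2 + \frac{1}{2}\sum_{i \neq j} h_{ij}\,\delta u_i\,\delta u_j + O(\lVert \delta U \rVert^3)$$

where $g_i = \partial E / \partial u_i$ and $h_{ij} = \partial^2 E / \partial u_i\,\partial u_j$.
• Assuming training has reached a minimum ($g_i = 0$), that the Hessian is diagonal (off-diagonal terms dropped), and that the error is quadratic (cubic term dropped), this reduces to

$$\delta E = \frac{1}{2}\sum_i h_{ii}\,\delta u_i^2$$

Deleting parameter $u_k$ corresponds to $\delta u_k = -u_k$, so the resulting increase in error is $h_{kk}\,u_k^2 / 2$.

Computing the 2nd Derivatives
• Network expressed as:

$$x_i = f(a_i), \qquad a_i = \sum_j w_{ij}\,x_j$$

• Diagonals of the Hessian, where $V_k$ is the set of connections $(i, j)$ that share parameter $u_k$:

$$h_{kk} = \sum_{(i,j) \in V_k} \frac{\partial^2 E}{\partial w_{ij}^2}$$

• Second derivatives:

$$\frac{\partial^2 E}{\partial w_{ij}^2} = \frac{\partial^2 E}{\partial a_i^2}\,x_j^2$$

For hidden units (again dropping off-diagonal terms):

$$\frac{\partial^2 E}{\partial a_i^2} = f'(a_i)^2 \sum_l w_{li}^2\,\frac{\partial^2 E}{\partial a_l^2} + f''(a_i)\,\frac{\partial E}{\partial x_i}$$

For output units, with $E = \sum_i (d_i - x_i)^2$:

$$\frac{\partial^2 E}{\partial a_i^2} = 2 f'(a_i)^2 - 2 (d_i - x_i)\, f''(a_i)$$

The Recipe
• Train the network until a local minimum is obtained
• Compute the second derivatives $h_{kk}$ for each parameter
• Compute the saliencies $s_k = h_{kk}\,u_k^2 / 2$
• Delete the low-saliency parameters
• Iterate (retrain, then repeat the steps above)

Results
[Figure: Results of OBD compared to magnitude-based damage]

Results Continued
[Figure: Comparison of MSE with retraining versus without retraining]

LeCun's Conclusions
• Optimal Brain Damage reduced the number of parameters by up to a factor of four, and overall recognition accuracy increased.
• OBD can be used either as an automatic pruning tool or as an interactive one.

Babak Hassibi: Return of LeCun
• Several problems arise from LeCun's simplifying assumptions.
• For smaller networks, OBD can choose the wrong parameter to delete.
• The inverse Hessian can be calculated recursively, yielding a more accurate approximation than OBD's diagonal one.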
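The OBD recipe from the earlier slides can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not LeCun's implementation: the "network" is a single linear output unit ($f(a) = a$) trained by plain gradient descent on $E = \sum_p (d_p - x_p)^2$, and the toy data, `TRUE_W`, and every function name are hypothetical. Because $f' = 1$ and $f'' = 0$, the output-unit formula gives $\partial^2 E / \partial a^2 = 2$, so $h_{kk} = \sum_p 2 x_{pk}^2$. The inputs are chosen so that the smallest weight is not the least salient one, which is what the comparison with magnitude-based damage is about.

```python
"""Minimal OBD sketch: one linear output unit, E = sum_p (d_p - x_p)^2."""

# Hypothetical toy data: four patterns whose input columns are mutually
# orthogonal but differ in scale; targets are noise-free outputs of TRUE_W.
TRUE_W = [0.5, 0.2, 2.0]
PATTERNS = [
    [1.0, 3.0, 0.2],
    [1.0, -3.0, -0.2],
    [-1.0, 3.0, -0.2],
    [-1.0, -3.0, 0.2],
]
TARGETS = [sum(w * x for w, x in zip(TRUE_W, xs)) for xs in PATTERNS]


def output(w, xs):
    """Linear unit: x = f(a) = a = sum_j w_j x_j."""
    return sum(wj * xj for wj, xj in zip(w, xs))


def error(w):
    """Training error E = sum_p (d_p - x_p)^2."""
    return sum((d - output(w, xs)) ** 2 for xs, d in zip(PATTERNS, TARGETS))


def train(w, alive, lr=0.02, steps=10000):
    """Step 1 (and retraining): gradient descent; deleted weights stay frozen at 0."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xs, d in zip(PATTERNS, TARGETS):
            residual = d - output(w, xs)
            for k, xk in enumerate(xs):
                grad[k] -= 2.0 * residual * xk  # dE/dw_k
        w = [wk - lr * gk if alive[k] else 0.0
             for k, (wk, gk) in enumerate(zip(w, grad))]
    return w


def hessian_diagonal():
    """Step 2: h_kk = sum_p (d2E/da^2) * x_pk^2. For a linear output unit,
    d2E/da^2 = 2 f'(a)^2 - 2 (d - x) f''(a) = 2. In a nonlinear network this
    would depend on w and be obtained by back-propagating second derivatives."""
    n = len(PATTERNS[0])
    return [sum(2.0 * xs[k] ** 2 for xs in PATTERNS) for k in range(n)]


def saliencies(w, h):
    """Step 3: s_k = h_kk * u_k^2 / 2."""
    return [hk * wk ** 2 / 2.0 for hk, wk in zip(h, w)]


def obd_prune(n_delete):
    """Steps 1-5: train, score, delete one lowest-saliency live weight per
    round (one at a time, for simplicity), retrain, repeat."""
    alive = [True] * len(TRUE_W)
    w = train([0.0] * len(TRUE_W), alive)
    deleted = []
    for _ in range(n_delete):
        s = saliencies(w, hessian_diagonal())
        k = min((j for j in range(len(w)) if alive[j]), key=lambda j: s[j])
        alive[k] = False  # "remove" = set to 0 and freeze
        w[k] = 0.0
        deleted.append(k)
        w = train(w, alive)
    return w, deleted


if __name__ == "__main__":
    w = train([0.0] * 3, [True] * 3)
    s = saliencies(w, hessian_diagonal())
    print("weights:   ", [round(v, 4) for v in w])
    print("saliencies:", [round(v, 4) for v in s])
    print("magnitude pruning deletes w%d" % min(range(3), key=lambda k: abs(w[k])))
    print("OBD deletes w%d" % min(range(3), key=lambda k: s[k]))
    print("OBD deletion order over two rounds:", obd_prune(2)[1])
```

Run as a script, it reports trained weights of about [0.5, 0.2, 2.0] and saliencies of about [1.0, 1.44, 0.64]: magnitude-based damage would delete w1, the smallest weight, raising E by 1.44, whereas OBD deletes w2, the largest weight, raising E by only 0.64. Because the input columns here are orthogonal, the true Hessian is diagonal and OBD's approximation is exact; with correlated inputs the dropped $h_{ij}$ terms matter, which is the situation Hassibi's objection concerns.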
**Insert Math Here** (I have no idea what I'm talking about)

The MONK's Problems
• A set of problems that involve classifying artificial robots described by six discrete-valued attributes
• Binary decision problems, e.g. (head_shape = body_shape)
• In the 1991 study, back-propagation with weight decay was found to be the most accurate solution at the time.

Results: Hassibi Wins

| Problem | Method | Training accuracy (%) | Testing accuracy (%) | # weights |
|---|---|---|---|---|
| MONK1 | BPWD | 100 | 100 | 58 |
| MONK1 | OBS | 100 | 100 | 14 |
| MONK2 | BPWD | 100 | 100 | 39 |
| MONK2 | OBS | 100 | 100 | 15 |
| MONK3 | BPWD | 93.4 | 97.2 | 39 |
| MONK3 | OBS | 93.4 | 97.2 | 4 |

BPWD = back-propagation with weight decay; OBS = Optimal Brain Surgeon.

References
• Le Cun, Yann. "Optimal Brain Damage." AT&T Bell Laboratories, 1990.
• Hassibi, Babak, and David Stork. "Optimal Brain Surgeon and General Network Pruning." Ricoh California Research Center, 1993.
• Thrun, S. B. "The MONK's Problems." CMU, 1991.

Questions?
(Brain background courtesy of Brainburst.com)