Brain Damage: Algorithms for
Network Pruning
Andrew Yip
HMC Fall 2003
The Idea
• Networks with excessive weights “over-train” on the data and, as a result, generalize poorly.
• Goal: a technique that can effectively reduce the size of the network without reducing validation accuracy.
• Hopefully, by reducing complexity, network pruning can increase the generalization ability of the net.
History
• Removing a weight means setting it to 0 and freezing it
• The first attempts at network pruning removed the weights of smallest magnitude (see the sketch below)
• A related approach: minimize a cost function composed of the training error plus a measure of network complexity
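A rough sketch of that magnitude-based baseline, in NumPy; the function name prune_by_magnitude, the flat weight array, and the fraction argument are illustrative choices, not details from the cited papers.

import numpy as np

def prune_by_magnitude(weights, fraction):
    """Zero out (and mark as frozen) the smallest-magnitude weights.

    weights  -- flat array of network parameters
    fraction -- fraction of parameters to remove, e.g. 0.1
    """
    n_prune = int(fraction * weights.size)
    # Indices of the weights with the smallest absolute value
    prune_idx = np.argsort(np.abs(weights))[:n_prune]
    mask = np.zeros(weights.size, dtype=bool)
    mask[prune_idx] = True
    pruned = weights.copy()
    pruned[mask] = 0.0      # removing a weight = setting it to zero...
    return pruned, mask     # ...the mask marks which weights stay frozen at zero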
LeCun’s Take
• Derive a more theoretically sound criterion for the order of weight removal, using a second-order Taylor expansion of the error function:
\delta E = \sum_i g_i\,\delta u_i + \frac{1}{2}\sum_i h_{ii}\,\delta u_i^2 + \frac{1}{2}\sum_{i \neq j} h_{ij}\,\delta u_i\,\delta u_j + O(\|\delta U\|^3)

\delta E \approx \frac{1}{2}\sum_i h_{ii}\,\delta u_i^2
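Here g_i = \partial E / \partial u_i and h_{ij} = \partial^2 E / \partial u_i \partial u_j, where the u_i are the network parameters. The second line rests on three approximations named in LeCun's paper, restated here as a short LaTeX note:

\begin{itemize}
  \item Extremal approximation: training has converged to a (local) minimum, so $g_i \approx 0$.
  \item Diagonal approximation: cross terms are neglected, $h_{ij} \approx 0$ for $i \neq j$.
  \item Quadratic approximation: the remainder $O(\|\delta U\|^3)$ is dropped.
\end{itemize}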
Computing the 2nd Derivatives
• Network expressed as:
  x_i = f(a_i), \qquad a_i = \sum_j w_{ij} x_j
• Diagonals of the Hessian (summing over the set V_k of connections (i,j) that share parameter u_k):
  h_{kk} = \sum_{(i,j) \in V_k} \frac{\partial^2 E}{\partial w_{ij}^2}
• Second derivatives, back-propagated layer by layer (a NumPy sketch follows below):
  \frac{\partial^2 E}{\partial w_{ij}^2} = \frac{\partial^2 E}{\partial a_i^2}\, x_j^2
  \frac{\partial^2 E}{\partial a_i^2} = f'(a_i)^2 \sum_l w_{li}^2 \frac{\partial^2 E}{\partial a_l^2} + f''(a_i)\, \frac{\partial E}{\partial x_i}
  \frac{\partial^2 E}{\partial a_i^2} = 2 f'(a_i)^2 - 2\,(d_i - x_i)\, f''(a_i) \qquad \text{(output units, with } E = \sum_i (d_i - x_i)^2\text{)}
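A minimal NumPy sketch of this second-derivative back-propagation, assuming a single hidden layer, tanh units in both layers, squared error E = sum (d - y)^2, and a single training example; summing the per-example results over the training set gives the h_kk used below. All names are illustrative.

import numpy as np

def f(a):   return np.tanh(a)
def fp(a):  return 1.0 - np.tanh(a) ** 2        # f'(a)
def fpp(a): return -2.0 * np.tanh(a) * fp(a)    # f''(a)

def diagonal_hessian(W1, W2, x, d):
    # Forward pass: a_i = sum_j w_ij x_j,  x_i = f(a_i)
    a1 = W1 @ x;  h = f(a1)
    a2 = W2 @ h;  y = f(a2)

    # First-derivative backprop, needed for the f'' terms
    dE_dy  = -2.0 * (d - y)                     # E = sum (d - y)^2
    dE_da2 = fp(a2) * dE_dy
    dE_dh  = W2.T @ dE_da2

    # Second-derivative backprop (diagonal approximation)
    d2E_da2 = 2.0 * fp(a2) ** 2 - 2.0 * (d - y) * fpp(a2)             # output units
    d2E_da1 = fp(a1) ** 2 * ((W2 ** 2).T @ d2E_da2) + fpp(a1) * dE_dh

    # d^2E/dw_ij^2 = (d^2E/da_i^2) * x_j^2, layer by layer
    return np.outer(d2E_da1, x ** 2), np.outer(d2E_da2, h ** 2)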
The Recipe
• Train the network until a local minimum is reached
• Compute the second derivatives h_{kk} for each parameter
• Compute the saliencies s_k = h_{kk} u_k^2 / 2
• Delete the low-saliency parameters
• Iterate (the full loop is sketched below)
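Putting the steps together, a sketch of the pruning loop; train_to_minimum and diagonal_hessian_all are hypothetical helpers (retrain the surviving weights, and return h_kk for every parameter, e.g. by summing the per-example values from the sketch above).

import numpy as np

def obd_step(u, alive, n_delete, train_to_minimum, diagonal_hessian_all):
    u = train_to_minimum(u, alive)        # 1. train to a local minimum
    h = diagonal_hessian_all(u, alive)    # 2. second derivatives h_kk
    s = 0.5 * h * u ** 2                  # 3. saliencies s_k = h_kk u_k^2 / 2
    s[~alive] = np.inf                    #    ignore weights already deleted
    kill = np.argsort(s)[:n_delete]       # 4. lowest-saliency parameters
    alive = alive.copy(); alive[kill] = False
    u = u.copy();         u[kill] = 0.0   #    delete = set to zero and freeze
    return u, alive                       # 5. iterate by calling obd_step again

Repeated calls, each deleting a small batch of parameters, reproduce the train / measure / delete / iterate cycle described above.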
Results
Results of OBD Compared to Magnitude-Based Damage
Results Continued
Comparison of MSE with Retraining versus w/o Retraining
LeCun’s Conclusions
• Optimal Brain Damage reduced the number of parameters by up to a factor of four, while recognition accuracy increased.
• OBD can be used either as an automatic
pruning tool or an interactive one.
Babak Hassibi: Return of LeCun
• Several problems arise from LeCun’s simplifying assumptions
• For smaller networks, OBD can choose the wrong parameter to delete
• It is possible to recursively calculate the inverse Hessian, yielding a more accurate approximation (sketched below)
• Saliency of parameter q: L_q = \frac{u_q^2}{2\,[H^{-1}]_{qq}}
• When u_q is deleted, the remaining weights are adjusted by \delta u = -\frac{u_q}{[H^{-1}]_{qq}}\, H^{-1} e_q
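A sketch of the Optimal Brain Surgeon bookkeeping, assuming the paper's approximation H ~ (1/P) * sum_k X_k X_k^T, where X_k is the gradient of the network output for training case k; the alpha constant used to initialize H^{-1} and all other names are illustrative.

import numpy as np

def obs_inverse_hessian(X, alpha=1e-4):
    """Recursively build H^-1 for H ~ (1/P) * sum_k X_k X_k^T.

    X     -- array of shape (P, n): one output-gradient vector per training case
    alpha -- small constant; H^-1 is initialized to (1/alpha) * I
    """
    P, n = X.shape
    H_inv = np.eye(n) / alpha
    for k in range(P):
        HX = H_inv @ X[k]
        H_inv -= np.outer(HX, HX) / (P + X[k] @ HX)   # rank-one (Sherman-Morrison) update
    return H_inv

def obs_step(u, H_inv):
    """Delete the weight with the smallest saliency and adjust the rest."""
    saliency = u ** 2 / (2.0 * np.diag(H_inv))        # L_q = u_q^2 / (2 [H^-1]_qq)
    q = int(np.argmin(saliency))
    u_new = u - (u[q] / H_inv[q, q]) * H_inv[:, q]    # update all weights at once
    u_new[q] = 0.0                                    # the deleted weight lands exactly at zero
    return u_new, q

The rank-one update is what keeps maintaining H^-1 tractable: no matrix is ever explicitly inverted, which is the practical advantage over recomputing the full Hessian after each deletion.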
The MONK’s Problems
• A set of problems involving classifying artificial robots based on six discrete-valued attributes
• Binary decision problems, e.g. (head_shape = body_shape)
• Study performed in 1991; back-propagation with weight decay was found to be the most accurate method at the time
Results: Hassibi Wins
                 Training (%)   Test (%)   # weights
MONK1   BPWD         100           100         58
        OBS          100           100         14
MONK2   BPWD         100           100         39
        OBS          100           100         15
MONK3   BPWD          93.4          97.2       39
        OBS           93.4          97.2        4
References
• Le Cun, Yann. “Optimal Brain Damage.” AT&T Bell Laboratories, 1990.
• Hassibi, Babak, and David Stork. “Optimal Brain Surgeon and General Network Pruning.” Ricoh California Research Center, 1993.
• Thrun, S. B. “The MONK’s Problems.” CMU, 1991.
Questions?