Deep Neural Networks are Easily Fooled
by Girish Dharamveer Sukhwani
Introduction
• Given the near-human ability of DNNs to classify visual objects, questions arise about the differences between computer and human vision.
• Recent studies reveal that changing an image in a way imperceptible to the human eye can cause a DNN to mislabel it.
• This paper shows another way in which DNN and human vision differ.
• Images that are completely unrecognizable to humans can be created, which DNNs believe to be recognizable objects with 99% confidence.
Introduction contd.
• Images that are given high prediction scores by CNNs are used.
• Evolutionary algorithms or gradient ascent are used on these images to create fooling images (a gradient-ascent sketch follows this list).
• DNN models that have performed well on MNIST and ImageNet are used.
• It seems that it is not easy to prevent MNIST DNNs from being fooled by retraining them with fooling images labeled as such.
• Even if the DNNs did learn to classify fooling images while training, a new batch of fooling images can be produced that fools these new networks, even after many iterations of training.
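A minimal sketch of the gradient-ascent route to a fooling image, assuming PyTorch and a hypothetical pretrained `model` that returns class logits (this is not the paper's actual code):

```python
# Sketch: gradient ascent on the input image to maximize one class score.
# `model` is a hypothetical pretrained classifier returning class logits.
import torch

def gradient_ascent_fooling(model, target_class, steps=200, lr=0.5,
                            shape=(1, 3, 224, 224)):
    model.eval()
    image = torch.zeros(shape, requires_grad=True)  # start from a blank image
    for _ in range(steps):
        score = model(image)[0, target_class]       # logit for the target class
        score.backward()                            # d(score) / d(image)
        with torch.no_grad():
            image += lr * image.grad                # ascend the class score
            image.clamp_(0.0, 1.0)                  # keep pixels in valid range
            image.grad.zero_()
    return image.detach()
```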
Deep Neural Network Models
Two models used:
a) LeNet (Yann LeCun):
• Good handwritten digit recognizer.
• Trained with backpropagation in a feedforward network.
• Many hidden layers.
• Many maps of replicated units in each layer.
• Pooling of the outputs of nearby replicated units.
b) AlexNet (Alex Krizhevsky):
• ImageNet classifier (1.3 million high-resolution images).
• 7 hidden layers, not counting the max-pooling layers.
• Early layers were convolutional and the last two were globally connected.
• ReLU activations, with local response normalization.
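A minimal PyTorch sketch of a LeNet-style network; the layer sizes here are illustrative assumptions, not the exact configuration from the paper:

```python
# Sketch of a LeNet-style convolutional network for 28x28 MNIST input;
# layer sizes are illustrative, not the paper's exact configuration.
import torch.nn as nn

lenet_style = nn.Sequential(
    nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(),   # maps of replicated units
    nn.MaxPool2d(2),                             # pooling of nearby outputs
    nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 4 * 4, 120), nn.ReLU(),       # globally connected layers
    nn.Linear(120, 10),                          # 10 digit classes (MNIST)
)
```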
Deep Neural Net Models contd.
• LeNet architecture (diagram shown on slide)
• AlexNet architecture (diagram shown on slide)
Generating images with evolution
• Evolutionary algorithms (EAs) are optimization algorithms inspired by Darwinian evolution.
• An evolutionary algorithm involves the following steps (see the sketch after this list):
i. Compute prediction scores for all images in the population.
ii. Selection: select images with high prediction scores (fitness).
iii. Crossover: recombine various combinations of features from the selected images.
iv. Mutation: change certain features to make offspring differ from their parents.
v. Evaluate the prediction scores and replace images with low scores.
• Two algorithms are used, since there are two types of encodings (genomes).
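A minimal sketch of such a loop for directly encoded images (illustrative, not the paper's exact MAP-Elites setup; crossover is omitted for brevity). `predict_confidence(image, target_class)` is a hypothetical wrapper around the trained DNN:

```python
# Sketch: evolve directly encoded grayscale images to maximize one class's
# prediction score. `predict_confidence` is a hypothetical DNN wrapper.
import random

def evolve(predict_confidence, target_class, shape=(28, 28),
           pop_size=20, generations=200, mutation_rate=0.1):
    # Direct encoding: one grayscale integer per pixel.
    def random_image():
        return [[random.randint(0, 255) for _ in range(shape[1])]
                for _ in range(shape[0])]

    def mutate(img):
        return [[random.randint(0, 255) if random.random() < mutation_rate
                 else px for px in row] for row in img]

    population = [random_image() for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(population,
                        key=lambda im: predict_confidence(im, target_class),
                        reverse=True)
        parents = scored[:pop_size // 2]          # selection: keep the fittest
        children = [mutate(p) for p in parents]   # mutation: perturb pixels
        population = parents + children           # replace low scorers
    return max(population,
               key=lambda im: predict_confidence(im, target_class))
```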
Evolutionary Algorithms
• Direct Encoding:
• One grayscale integer for each pixel (MNIST).
• Three integers (H, S, V) for each pixel (ImageNet).
• Indirect Encoding:
• Compositional Pattern-Producing Network (CPPN).
• CPPNs are similar to Artificial Neural Networks (ANNs).
• A CPPN takes the (x, y) position of a pixel as input and outputs a grayscale value (MNIST) or a tuple of HSV (Hue, Saturation, Value) color values (ImageNet) for that pixel (see the sketch below).
• CPPNs have weights, activation functions, and neurons like ANNs.
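A minimal sketch of a CPPN-style generator for the MNIST case; the tiny architecture and activation choices are illustrative assumptions:

```python
# Sketch of a CPPN-style image generator: a small network maps each pixel's
# (x, y) position to a grayscale value. Architecture is illustrative.
import math

def cppn_pixel(x, y, weights):
    # One hidden layer with mixed activation functions, as in CPPNs.
    h1 = math.sin(weights[0] * x + weights[1] * y)
    h2 = math.tanh(weights[2] * x + weights[3] * y)
    out = math.tanh(weights[4] * h1 + weights[5] * h2)
    return int((out + 1) / 2 * 255)   # map [-1, 1] to [0, 255] grayscale

def render(weights, size=28):
    return [[cppn_pixel(x / size, y / size, weights)
             for x in range(size)] for y in range(size)]

img = render([0.5, -1.2, 2.0, 0.7, 1.5, -0.8])  # the genome is the weight list
```

Evolution then mutates the weight list (the genome) rather than individual pixels, which is why CPPN images look regular.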
Results
MNIST:
Irregular Images
• Directly encoded images.
• The DNN mislabeled unrecognizable images.
• Up to 50 generations: confidence ≥ 99.99%.
• By 200 generations: median confidence was 99.99%.
Regular Images
• CPPN encodings.
• The DNN mislabeled unrecognizable images.
• The results were the same as those for irregular images.
Results contd.
ImageNet:
• MNIST DNNs might have been easily fooled because they are trained on a small dataset that could allow for overfitting.
• To test this, they used a larger dataset (ImageNet).
Irregular Images
• Directly encoded images.
• Less successful at producing high-confidence images, even after 20,000 generations.
• Median confidence: 21.59%.
• Evolution did manage to produce high-confidence images for 45 classes: confidence ≥ 99.99%.
Regular Images
• CPPN encodings.
• Initially, confidence ≥ 99.99%.
• After 5,000 generations, the median confidence was 88.11%.
Can DNNs generalize?
• Do DNNs learn the same features for each class?
• They tested with two DNNs (DNNA and DNNB) in two situations:
• Both have identical architectures and training, but different initializations.
• Both have different architectures.
• Most images gave confidence scores greater than or equal to 99.99%.
• Some images did score high on DNNA but not on DNNB.
Can DNNs train on evolved images?
• The first iteration trains on the original dataset.
• After every iteration, evolved images are produced and added to an extra class n+1, called "fooling images" (see the sketch below).
• In each iteration, we train on the new dataset, which is the output of the previous iteration.
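A minimal sketch of this retraining loop; `train`, `evolve_fooling_images`, and the `dataset` list of (image, label) pairs are hypothetical helpers, not the paper's code:

```python
# Sketch: each iteration adds newly evolved fooling images as an extra
# class (n+1) and retrains. `train` and `evolve_fooling_images` are
# hypothetical helpers standing in for the paper's pipeline.
def iterative_retraining(dataset, n_classes, iterations=5):
    fooling_class = n_classes              # extra label for fooling images
    model = None
    for _ in range(iterations):
        model = train(dataset, n_classes + 1)          # retrain on new dataset
        fooled = evolve_fooling_images(model)          # evolve against new model
        dataset += [(img, fooling_class) for img in fooled]  # add to class n+1
    return model
```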
Discussion
• Why are DNNs fooled by unrecognizable images?
• The difference between discriminative models and generative models matters here.
• Discriminative models create decision boundaries that partition the input space into classification regions.
• In a high-dimensional input space, the area a discriminative model allocates to a class may be much larger than the area occupied by training examples for that class.
• Synthetic images far from the decision boundary and deep inside a classification region can produce high-confidence predictions, even though they are far from the natural images in the class (a tiny illustration follows).
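A tiny illustration of this point with a linear discriminative model; scikit-learn's LogisticRegression stands in for the DNN here (my illustration, not from the paper):

```python
# A discriminative model can assign near-certain confidence to a point far
# from any training example, because it lies deep inside one classification
# region. LogisticRegression stands in for the DNN.
from sklearn.linear_model import LogisticRegression

X = [[0, 0], [0, 1], [1, 0], [1, 1]]   # training points near the origin
y = [0, 0, 1, 1]                        # two classes split along the x-axis
clf = LogisticRegression().fit(X, y)

far_point = [[100, 0]]                  # nowhere near the training data
print(clf.predict_proba(far_point))     # class 1 probability is ~1.0
```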
My Theory
• The unrecognizable images produced by evolutionary algorithms are created from the original image.
• Could it be that they carry traces of the original image, which the DNN captures and classifies with high confidence as recognizable?
• If so, then speaking from the perspective of genetics, a DNA sequence could be used to regenerate another possible DNA sequence from several generations before or after the current one.
• This may sound like a crazy idea, but I assure you that I am not.
An example to support my theory
• Consider the following family tree (diagram shown on slide):
An example to support my theory (contd.)
• Say we represent a relationship between two people in the form: A R B
• where A and B are names of people, and R is the relationship between them.
• We train a neural net with hidden layers containing 6 units (a sketch follows).
• The weights associated with each hidden unit are shown in the image.
• After understanding what the weights in each hidden unit represent, does my theory seem possible?
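A minimal sketch of such an "A R B" network, assuming one-hot encodings and a 6-unit hidden layer; all sizes are illustrative assumptions:

```python
# Sketch: predict person B from one-hot encodings of person A and
# relationship R, through a 6-unit hidden layer as described on the slide.
# All sizes are illustrative assumptions.
import torch
import torch.nn as nn

n_people, n_relations = 24, 12

model = nn.Sequential(
    nn.Linear(n_people + n_relations, 6),  # 6 hidden units
    nn.Sigmoid(),
    nn.Linear(6, n_people),                # scores over possible persons B
)

def encode(a_index, r_index):
    x = torch.zeros(n_people + n_relations)
    x[a_index] = 1.0                 # one-hot person A
    x[n_people + r_index] = 1.0      # one-hot relationship R
    return x

logits = model(encode(0, 3))         # scores for B given (A=0, R=3)
```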
Thank You