Deep Neural Networks are Easily Fooled
by Girish Dharamveer Sukhwani

Introduction
• Given the near-human ability of DNNs to classify visual objects, questions arise about the differences between computer and human vision.
• Recent studies reveal that changing an image in a way imperceptible to human eyes can cause a DNN to mislabel it.
• This paper shows another way in which DNN and human vision differ.
• Images are created that are completely unrecognizable to humans, yet the DNNs believe them to be recognizable objects with 99% confidence.

Introduction contd.
• Images that receive high prediction scores from the DNNs are used.
• Evolutionary algorithms or gradient ascent are applied to these images to create fooling images.
• DNN models that have performed well on MNIST and ImageNet are used.
• It turns out not to be easy to prevent MNIST DNNs from being fooled by retraining them with fooling images labeled as such.
• Even when the DNNs learn to classify fooling images during training, a new batch of fooling images can be produced that fools the new networks, even after many iterations of retraining.

Deep Neural Network Models
Two models are used:
a) LeNet (Yann LeCun):
• A good handwritten-digit recognizer.
• Uses backpropagation in a feedforward network.
• Many hidden layers.
• Many maps of replicated units in each layer.
• Pooling of the outputs of nearby replicated units.
b) AlexNet (Alex Krizhevsky):
• ImageNet classifier (1.3 million high-resolution images).
• 7 hidden layers, not counting some max-pooling layers.
• Early layers were convolutional; the last two were globally connected.
• ReLU activation functions, plus normalization.

Deep Neural Net Models contd.
• LeNet architecture (diagram not reproduced in this transcript).
• AlexNet architecture (diagram not reproduced in this transcript).

Generating images with evolution
• Evolutionary algorithms (EAs) are optimization algorithms inspired by Darwinian evolution.
• An evolutionary algorithm involves the following steps (a minimal sketch of this loop appears after the next slide):
i. Compute prediction scores for all images in the current population.
ii. Selection: select images with high prediction scores (fitness).
iii. Crossover: form various combinations of a set of features.
iv. Mutation: change certain features to make them differ from the original features.
v. Evaluate the prediction scores and replace images with low scores.
• Two algorithms are used, since there are two types of encodings (genomes).

Evolutionary Algorithms
• Direct Encoding:
• One grayscale integer for each pixel (MNIST).
• Three integers (H, S, V) for each pixel (ImageNet).
• Indirect Encoding:
• Compositional Pattern-Producing Network (CPPN).
• CPPNs are similar to artificial neural networks (ANNs).
• A CPPN takes the (x, y) position of a pixel as input and outputs a grayscale value (MNIST) or a tuple of HSV (hue, saturation, value) color values (ImageNet) for that pixel.
• Like ANNs, CPPNs have weights, activation functions, and neurons (a minimal CPPN sketch also appears below).
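To make the evolutionary loop above concrete, here is a minimal sketch of the direct-encoding variant. The score function is a hypothetical stand-in for the DNN's confidence in the target class (a toy brightness score here, not the paper's model), and the population size, mutation rate, and omission of crossover are simplifications rather than the paper's exact setup.

```python
import numpy as np

def toy_score(image):
    """Toy stand-in for the DNN's confidence in the target class;
    a real run would feed `image` through the trained network instead."""
    return float(image.mean()) / 255.0

def evolve_direct(score_fn, shape=(28, 28), pop_size=50,
                  generations=200, mutation_rate=0.1, seed=0):
    """Direct encoding: one grayscale integer per pixel (MNIST-style).

    Each generation keeps the fitter half of the population and refills
    it with mutated copies (crossover is omitted for brevity).
    """
    rng = np.random.default_rng(seed)
    pop = rng.integers(0, 256, size=(pop_size, *shape), dtype=np.uint8)
    for _ in range(generations):
        fitness = np.array([score_fn(img) for img in pop])  # i. evaluate
        elite = pop[np.argsort(fitness)[-pop_size // 2:]]   # ii. select
        children = elite.copy()                             # iv. mutate
        mask = rng.random(children.shape) < mutation_rate
        children[mask] = rng.integers(0, 256, size=int(mask.sum()),
                                      dtype=np.uint8)
        pop = np.concatenate([elite, children])             # v. replace
    fitness = np.array([score_fn(img) for img in pop])
    return pop[fitness.argmax()], fitness.max()

best_image, best_confidence = evolve_direct(toy_score)
```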
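The indirect encoding can be sketched just as briefly. One caveat: the paper evolves both the topology and the weights of the CPPN (via the NEAT algorithm), whereas this illustration hard-wires a tiny fixed topology (a sine unit, a Gaussian unit, and a sigmoid output) and only varies six connection weights.

```python
import numpy as np

def cppn_grayscale(w, width=28, height=28):
    """Minimal fixed-topology CPPN: maps each pixel's (x, y) position to
    a grayscale value through pattern-producing activation functions."""
    xs = np.linspace(-1.0, 1.0, width)
    ys = np.linspace(-1.0, 1.0, height)
    x, y = np.meshgrid(xs, ys)
    h1 = np.sin(w[0] * x + w[1] * y)                      # sine unit
    h2 = np.exp(-((w[2] * x) ** 2 + (w[3] * y) ** 2))     # Gaussian unit
    out = 1.0 / (1.0 + np.exp(-(w[4] * h1 + w[5] * h2)))  # sigmoid output
    return (out * 255).astype(np.uint8)

# Render one 28x28 image from six random connection weights; evolution
# would mutate these weights (and, in the paper, the topology via NEAT).
rng = np.random.default_rng(0)
image = cppn_grayscale(rng.normal(scale=3.0, size=6))
```

Because the output is a smooth function of pixel position, CPPN images contain regular, compressible patterns, which is why they are called "regular images" in contrast to the directly encoded "irregular" ones in the results below.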
Results
MNIST, Irregular Images:
• Directly encoded images.
• The DNN mislabeled unrecognizable images.
• Within 50 generations: confidence ≥ 99.99%.
• By 200 generations: median confidence was 99.99%.

MNIST, Regular Images:
• CPPN-encoded images.
• The DNN mislabeled unrecognizable images.
• The results were the same as those for the irregular images.

Results contd.
ImageNet:
• MNIST DNNs might have been easily fooled because they are trained on a small dataset that could allow overfitting.
• To rule this out, a larger dataset (ImageNet) was used.

ImageNet, Irregular Images:
• Directly encoded images.
• Evolution was less successful at producing high-confidence images, even after 20,000 generations.
• Median confidence: 21.59%.
• Evolution did manage to produce high-confidence images for 45 classes: confidence ≥ 99.99%.

ImageNet, Regular Images:
• CPPN-encoded images.
• Initially, confidence ≥ 99.99%.
• After 5,000 generations, the median confidence is 88.11%.

Can DNNs generalize?
• Do DNNs learn the same features for each class?
• The authors tested two DNNs (DNNA and DNNB) in two situations:
• Both have identical architectures and training, but different initializations.
• The two have different architectures.
• Most images gave confidence scores greater than or equal to 99.99%.
• Some images scored high on DNNA but not on DNNB.

Can DNNs train on evolved images?
• The first iteration trains on the original dataset.
• After every iteration, evolved images are produced and added to class n+1; these are called "fooling images".
• Each iteration then trains on the new dataset, which is the output of the previous iteration.

Discussion
• Why are DNNs fooled by unrecognizable images?
• The answer lies in the difference between discriminative models and generative models.
• Discriminative models create decision boundaries that partition the data into classification regions.
• In a high-dimensional input space, the area a discriminative model allocates to a class may be much larger than the area occupied by the training examples for that class.
• Synthetic images far from the decision boundary and deep inside a classification region may produce high-confidence predictions, even though they are far from the natural images in that class.

My Theory
• The unrecognizable images produced by the evolutionary algorithms are created from original images.
• Could it be possible that they retain traces of the original image, which the DNN captures and classifies with high confidence as recognizable?
• If so, then speaking from the perspective of genetics, a DNA sequence can be used to regenerate another possible DNA sequence from several generations before or after the current one.
• This could be a crazy idea, but I assure you that I am not.

An example to support my theory
• Consider the following family tree (diagram not reproduced in this transcript).

An example to support my theory (contd.)
• Say we represent a relationship between two people in the form A R B,
• where A and B are the names of people, and R is the relationship between them.
• We train a neural net whose hidden layers contain 6 units.
• The weights associated with each hidden unit are shown in the image.
• After understanding what the weights in each hidden unit represent, does my theory seem possible?

Thank You