Deep Neural Networks are Easily Fooled
by Girish Dharamveer Sukhwani

Introduction
• Given the near-human ability of DNNs to classify visual objects, questions arise about how computer vision and human vision differ.
• Recent studies reveal that changing an image in a way imperceptible to the human eye can cause a DNN to mislabel it.
• This paper shows another way in which DNN and human vision differ.
• Images that are completely unrecognizable to humans are created, which DNNs believe to be recognizable objects with 99% confidence.

Introduction contd.
• Images to which CNNs assign high prediction scores are used.
• Evolutionary algorithms or gradient ascent are applied to these images to create fooling images.
• DNN models that perform well on MNIST and ImageNet are used.
• It is not easy to prevent MNIST DNNs from being fooled by retraining them with fooling images labeled as such.
• Even if the DNNs learn to classify fooling images during training, a new batch of fooling images can be produced that fools the retrained networks, even after many iterations of training.

Deep Neural Network Models
Two models used:
a) LeNet (Yann LeCun):
• Good hand-written digit recognizer.
• Uses backpropagation in a feedforward network.
• Many hidden layers.
• Many maps of replicated units in each layer.
• Pooling of the outputs of nearby replicated units.
b) AlexNet (Alex Krizhevsky):
• ImageNet classifier (1.3 million high-res images).
• 7 hidden layers, not counting some max pooling layers.
• Early layers were convolutional and the last two were globally connected.
• ReLU activations, with local response normalization.

Deep Neural Net Models contd.
• LeNet Architecture
• AlexNet Architecture

Generating images with evolution
• Evolutionary algorithms (EAs) are optimization algorithms inspired by Darwinian evolution.
• An evolutionary algorithm involves the following steps:
i. Compute prediction scores for all images in the training set.
ii. Selection: Select images with high prediction scores (fitness).
iii. Crossover: Form various combinations of a set of features.
iv. Mutation: Change certain features so that they differ from the original features.
v. Evaluate the prediction scores and replace images with low prediction scores.
• Two algorithms are used, since there are two types of encodings (genomes).

Evolutionary Algorithms
• Direct Encoding:
  • One grayscale integer for each pixel (MNIST).
  • Three integers (H, S, V) for each pixel (ImageNet).
• Indirect Encoding:
  • Compositional Pattern-Producing Network (CPPN).
  • CPPNs are similar to Artificial Neural Networks (ANNs).
  • A CPPN takes the (x, y) position of a pixel as input and outputs a grayscale value (MNIST) or a tuple of HSV (Hue, Saturation, Value) color values (ImageNet) for that pixel.
  • Like an ANN, it has weights, activations, and neurons.

Results
MNIST:
Irregular Images
• Directly encoded images.
• The DNN mislabeled unrecognizable images.
• Within 50 generations: confidence ≥ 99.99%.
• By 200 generations: median confidence was 99.99%.
Regular Images
• CPPN encodings.
• The DNN mislabeled unrecognizable images.
• The results were the same as those for irregular images.

Results contd.
ImageNet:
• MNIST DNNs might have been easily fooled because they are trained on a small dataset that could allow for overfitting.
• To check this, a larger dataset (ImageNet) was used.
Irregular Images
• Directly encoded images.
• Less successful at producing high-confidence images, even after 20,000 generations.
• Median confidence: 21.59%.
• Evolution did manage to produce high-confidence images for 45 classes: confidence ≥ 99.99%.
Regular Images
• CPPN encodings.
• Initially, confidence ≥ 99.99%.
• After 5000 generations, median confidence is 88.11%.

Can DNNs generalize?
• Do DNNs learn the same features for each class?
• They tested with two DNNs (DNNA and DNNB) in two situations:
  • Both have identical architectures and training, but different initializations.
  • The two have different architectures.
• Most images received confidence scores greater than or equal to 99.99%.
• Some images scored high on DNNA but not on DNNB.

Can DNNs train on evolved images?
• The first iteration trains on the original dataset.
• After every iteration, evolved images are produced and added to the dataset as a new class n+1, called “fooling images”.
• In each iteration, the network is trained on the new dataset produced by the previous iteration.

Discussion
• Why are DNNs fooled by unrecognizable images?
• The answer may lie in the difference between discriminative models and generative models.
• Discriminative models create decision boundaries that partition data into classification regions.
• In a high-dimensional input space, the area a discriminative model allocates to a class may be much larger than the area occupied by training examples for that class.
• Synthetic images far from the decision boundary and deep inside a classification region may produce high-confidence predictions, even though they are far from the natural images in the class.

My Theory
• The unrecognizable images produced by evolutionary algorithms are created from the original image.
• Could it be possible that they have traces of the original image which the DNN captures and classifies with high confidence as recognizable?
• If so, then speaking from the perspective of genetics, a DNA sequence can be used to regenerate another possible DNA sequence which is several generations before or after the current generation.
• This could be a crazy idea, but I assure you that I am not.

An example to support my theory
• Consider the following tree:

An example to support my theory (contd.)
• Say we represent a relationship between two people in the form: A R B
• where A and B are names of people, and R is the relationship between them.
• We train a neural net with hidden layers containing 6 units.
• The weights associated with each hidden unit are represented in the image.
• After understanding what the weights in each hidden unit represent, does my theory seem possible?

Thank You
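
Backup: the selection, crossover, and mutation steps from the "Generating images with evolution" slides can be sketched in a few lines of Python with a direct encoding (one value per pixel). This is a minimal illustration under stated assumptions, not the paper's implementation: the "classifier" is a hypothetical stand-in (a fixed random linear layer followed by softmax), the population starts from random noise, and the names make_toy_classifier and evolve_fooling_image, along with all parameter values, are made up for this example.

```python
import numpy as np


def make_toy_classifier(n_pixels, n_classes, rng):
    """Hypothetical stand-in for a trained DNN: random linear layer + softmax."""
    weights = rng.normal(0.0, 1.0, size=(n_pixels, n_classes))

    def predict(images):
        # images: array of shape (n_images, n_pixels) with values in [0, 1]
        logits = images @ weights
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        exp = np.exp(logits)
        return exp / exp.sum(axis=1, keepdims=True)

    return predict


def evolve_fooling_image(predict, target_class, n_pixels, rng,
                         pop_size=40, generations=200, mutation_rate=0.05):
    """Evolve a directly encoded image that the classifier scores highly."""
    # Direct encoding: the genome is simply the vector of pixel intensities.
    population = rng.random((pop_size, n_pixels))
    n_parents = pop_size // 2
    n_children = pop_size - n_parents
    history = []  # best confidence seen at the start of each generation

    for _ in range(generations):
        # Fitness = confidence the classifier assigns to the target class.
        fitness = predict(population)[:, target_class]
        history.append(float(fitness.max()))

        # Selection: keep the highest-scoring half of the population.
        parents = population[np.argsort(fitness)[::-1][:n_parents]]

        # Crossover: each child takes every pixel from one of two parents.
        mothers = parents[rng.integers(0, n_parents, size=n_children)]
        fathers = parents[rng.integers(0, n_parents, size=n_children)]
        take_mother = rng.random(mothers.shape) < 0.5
        children = np.where(take_mother, mothers, fathers)

        # Mutation: re-draw a small fraction of pixels at random.
        mutate = rng.random(children.shape) < mutation_rate
        children = np.where(mutate, rng.random(children.shape), children)

        # Replacement: the low-scoring half is replaced by the children.
        population = np.vstack([parents, children])

    fitness = predict(population)[:, target_class]
    best = int(np.argmax(fitness))
    return population[best], float(fitness[best]), history


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    predict = make_toy_classifier(n_pixels=64, n_classes=10, rng=rng)
    image, confidence, history = evolve_fooling_image(predict, 3, 64, rng)
    print(f"first generation best: {history[0]:.4f}, final: {confidence:.4f}")
```

Because the parents are carried over unchanged, the best confidence does not drop from one generation to the next. To move toward the setting in the slides, predict would be replaced by a real trained network; an indirect encoding would replace the pixel vector with the parameters of a CPPN that maps (x, y) to a pixel value.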