May 15, 2016

Reverse image recognition


Recently I started exploring TensorFlow, a wonderful library for machine intelligence released by Google. As the tutorial puts it, the “Hello World” of machine learning is the recognition of handwritten digits. There is a standard database of handwritten digits, containing 60,000 training images and their labels; it is called MNIST.

Exercise 1 of the TensorFlow tutorial is building a one-layer neural network with softmax normalization of the outputs. It achieves 92% accuracy in recognition. Exercise 2 is training a small convolutional neural network. The first convolutional layer computes 32 features from 5x5 patches, is max-pooled, and feeds into a second convolutional layer with 64 features, which is then processed by a densely connected layer. The final accuracy rises from 92% to approximately 99%.
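
For reference, the Exercise 1 model fits in a dozen lines. Here is a minimal sketch close to the tutorial's code, written against the TensorFlow 1.x-era API (the learning rate and batch size follow the tutorial):

    import tensorflow as tf
    from tensorflow.examples.tutorials.mnist import input_data

    mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

    # One-layer model: 784 pixel intensities in, softmax over 10 digits out.
    x = tf.placeholder(tf.float32, [None, 784])
    W = tf.Variable(tf.zeros([784, 10]))
    b = tf.Variable(tf.zeros([10]))
    y = tf.nn.softmax(tf.matmul(x, W) + b)

    # Cross-entropy against the one-hot labels, minimized by gradient descent.
    y_ = tf.placeholder(tf.float32, [None, 10])
    cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y), axis=1))
    train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(1000):
            batch_xs, batch_ys = mnist.train.next_batch(100)
            sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})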

When thinking about how the neural network operates, I realized that handwritten images are only a small subset of the images the network can classify as “digits”. The network has 10 outputs, each corresponding to a specific digit. Pixel intensities are fed into 28x28 = 784 input neurons and processed by the network so that each output holds a number from 0 to 1, which roughly translates to how sure the network is that the input was the corresponding digit.

The network is uncertain about random images: the maximal values in the output layer rarely exceed 0.6. On the other hand, the network is usually pretty sure about the answer when presented with MNIST examples: the maximal values in the output layer are typically greater than 0.9, and often very close to 1.
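
The comparison itself boils down to a few lines. In this sketch, predict(img) is a hypothetical helper that runs the trained network on a 28x28 image and returns its 10 softmax outputs, and mnist_test_images is a hypothetical array of test images:

    import numpy as np

    # Peak confidence on random noise vs. real MNIST test images.
    # predict(img) and mnist_test_images are assumed helpers, see above.
    random_conf = [predict(np.random.rand(28, 28)).max() for _ in range(100)]
    mnist_conf = [predict(img).max() for img in mnist_test_images[:100]]
    print("random images:", np.mean(random_conf))  # rarely above ~0.6
    print("MNIST images:", np.mean(mnist_conf))    # typically above 0.9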

So I wondered: what other images would be recognized by the convolutional network as digits with certainty close to 1? This is like making identikits of handwritten numbers by asking the network questions. First, I took a blank base image, generated 500 random masks in which pixels were randomly perturbed, and added the masks to the blank base to obtain perturbed image candidates. Then I ran the neural network and asked it which candidate has maximal likeness to a specific digit by checking the corresponding output channel. The best image is kept and used as the base for the next iteration, until the network is absolutely sure that what it perceives is an image of a digit.
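
The whole search is a simple stochastic hill climb. Here is a minimal sketch of it, again assuming the hypothetical predict(img) helper; the noise amplitude and mask sparsity are illustrative guesses, not the exact values I used:

    import numpy as np

    def dream_digit(predict, digit, n_masks=500, n_iters=1000, target=0.999):
        """Hill-climb from a blank image toward one the network
        classifies as `digit` with near-certainty."""
        base = np.zeros((28, 28))
        best_score = predict(base)[digit]
        for _ in range(n_iters):
            # Perturb a few random pixels of the base image, n_masks ways.
            masks = 0.1 * np.random.randn(n_masks, 28, 28)
            masks *= np.random.rand(n_masks, 28, 28) < 0.05
            candidates = np.clip(base + masks, 0.0, 1.0)
            # Keep the candidate the network likes best for this digit.
            scores = [predict(c)[digit] for c in candidates]
            i = int(np.argmax(scores))
            if scores[i] > best_score:
                base, best_score = candidates[i], scores[i]
            if best_score >= target:
                break
        return base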



I repeated this process several times for all digits. Here is what I got:



Some basic observations:
  • The process of image generation is random, but most of the time the resulting images are similar within each class and share some easily recognizable features that distinguish that class from the others.
  • One can easily see 0 and 6 in the computer-generated images of zero and six. However, it takes some imagination to see 3 in the F-like generated shapes or 4 in the u-like shapes. 1 and 9 look like a total mess.
  • Sometimes the search got stuck in a local minimum, and none of the generated noisy images could improve recognition of the base image above a certain level. But in most cases the confidence easily rose to 0.99 and above. 7 and 9 were the most difficult images to articulate: the network converged to 0.99 in 30% of the “9” runs and in 50% of the “7” runs.


So, when all traces of human civilization are gone except for the last handwritten digit recognition neural network, alien archeologists could reconstruct how we wrote digits:



April 3, 2016

A border and a twist

One of the things that inevitably pops up in any simulation is the limit on available computational power. In the particular case of simulating a 2D lattice model, there are two general ways to cope with this limitation. The first and most straightforward is to make the border of the lattice “special” in some way. For example, cells in the bulk might have 8 neighbors, cells on the border 5 neighbors, and cells in the corners only 3. Usually this means that the behaviour of the system changes at the border, but when done right it does not lead to any catastrophic failure during the simulation. Here is an example of what I got after simulating a small grid with borders:
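
For concreteness, here is a minimal sketch of such a bordered neighbor lookup (my illustration, not the exact simulation code): positions outside the lattice are simply skipped, so bulk, edge, and corner cells naturally end up with 8, 5, and 3 neighbors.

    def neighbor_sum_bordered(grid, i, j):
        """Sum the Moore neighbors of cell (i, j) in a 2D numpy
        array, skipping positions that fall outside the lattice."""
        n, m = grid.shape
        total = 0
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                ni, nj = i + di, j + dj
                if (di, dj) != (0, 0) and 0 <= ni < n and 0 <= nj < m:
                    total += grid[ni, nj]
        return total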



The second way to cope with limited computational resources is to use periodic boundary conditions. The simplest case of periodic boundary conditions is that of the Asteroids game, where adjacent screens wrap onto each other, and shells fired at the right edge of the screen reappear at the left edge:
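
In code, periodic boundaries are just modular arithmetic on the indices; a sketch in the same spirit as the bordered version above:

    def neighbor_sum_periodic(grid, i, j):
        """Sum the Moore neighbors with wrap-around: index -1 and
        index n both map back onto the lattice, so every cell has
        exactly 8 neighbors."""
        n, m = grid.shape
        return sum(grid[(i + di) % n, (j + dj) % m]
                   for di in (-1, 0, 1)
                   for dj in (-1, 0, 1)
                   if (di, dj) != (0, 0))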


The overall shape of the simulation field is a torus, or more precisely, a flat torus:


But there are more twisted ways to stitch the simulation field together than just tiling screens next to each other. Imagine I took the top edge, twisted it, and glued it to the bottom.
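
Topologically, this single twist turns the flat torus into a Klein bottle. In index terms it means that crossing the top or bottom edge mirrors the horizontal coordinate; here is a sketch of the lookup (the choice of which axis twists is mine, for illustration):

    def wrap_twisted(i, j, n, m):
        """Map an out-of-range (i, j) onto an n x m lattice. Columns
        wrap normally; crossing the top or bottom edge mirrors the
        column -- the twist."""
        if not 0 <= i < n:
            i %= n
            j = m - 1 - j  # left and right swap on vertical wrap
        j %= m
        return i, j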



There may be even more twists to the way the ends of the screen are glued together. Here is the general blueprint of how it would look:
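
Extending the previous sketch, each pair of opposite edges can independently be glued straight or with a flip; again, just an illustration:

    def wrap_general(i, j, n, m, twist_rows=False, twist_cols=False):
        """General gluing of an n x m field: no twist gives a torus,
        one twist a Klein bottle, and twisting both pairs of edges
        gives the doubly twisted field discussed below."""
        if not 0 <= i < n:
            i %= n
            if twist_rows:
                j = m - 1 - j
        if not 0 <= j < m:
            j %= m
            if twist_cols:
                i = n - 1 - i
        return i, j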


The overall pattern of the Ising model would be different. In a chaotic, high-temperature regime these changes would be imperceptible, but when the temperature is low, the pattern acquires certain characteristic features. Here is how these features manifest themselves when the field is annealed a number of times:

Regular patterns have a certain degree of rectangularity to them, while twisted patterns are more diagonal-like. Situations where the top and the bottom have opposite colors can only happen in doubly twisted simulations.
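
For completeness, the annealing itself is ordinary Metropolis dynamics. Here is a minimal sketch of one sweep, assuming the standard nearest-neighbor Ising model with coupling J = 1 and a neighbors(i, j) callback that encodes whichever gluing is in use; annealing then just means running sweeps while gradually lowering T:

    import numpy as np

    def metropolis_sweep(spins, T, neighbors):
        """One Metropolis sweep over an n x m grid of +/-1 spins."""
        n, m = spins.shape
        for _ in range(n * m):
            i, j = np.random.randint(n), np.random.randint(m)
            # Energy cost of flipping spin (i, j), with J = 1.
            dE = 2 * spins[i, j] * sum(spins[a, b] for a, b in neighbors(i, j))
            if dE <= 0 or np.random.rand() < np.exp(-dE / T):
                spins[i, j] = -spins[i, j]
        return spins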