Perceptual Photomosaic

November 28, 2018

During the summer of 2018, I worked as a research assistant at the Ryerson Vision Lab, funded by the Undergraduate Research Opportunities (URO) program. During that time I worked on a paper that was accepted to the Workshop on Computer Vision for Fashion, Art and Design at the European Conference on Computer Vision (ECCV 2018). We proposed an end-to-end method for learning to generate photomosaic art using a fully convolutional neural network (FCN). This post is a little bit about that. The paper itself can be found here, and the unofficial code is here.

The Idea

Given an input image and a set of template images, our network learned to tile the templates so that the resulting mosaic approximates the original image. First, a network pre-trained on classification (VGG16) was used to encode the input image. A two-layer CNN decoded these features, and a softmax layer produced coefficients for a linear combination of the templates. These coefficients were multiplied elementwise with the input templates and the products summed, producing a new image with the same pixel-wise resolution as the original, but with an effective resolution of (W/Tx) × (H/Ty), where W × H was the resolution of the input and Tx × Ty the resolution of a template. We then fed this output image back into VGG16: since the pre-trained network extracts meaningful feature representations, comparing the features of the mosaic against those of the input provided a self-supervisory signal via a feature reconstruction loss. We found empirically that the network produced better outputs when this loss was computed in a multi-scale fashion.
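To make the pipeline concrete, here is a minimal PyTorch sketch of the forward pass. The decoder widths, the bilinear resizing of the coefficient maps to the tile grid, and the assumption that the input dimensions divide evenly by the template size are my own simplifications for illustration, not the exact configuration from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import vgg16


class MosaicNet(nn.Module):
    def __init__(self, templates):
        # templates: (K, C, Ty, Tx) stack of template tiles.
        super().__init__()
        self.register_buffer("templates", templates)
        k = templates.shape[0]
        # Frozen VGG16 encoder, pre-trained on classification.
        self.encoder = vgg16(pretrained=True).features
        for p in self.encoder.parameters():
            p.requires_grad = False
        # Two-layer CNN decoder producing one logit map per template.
        self.decoder = nn.Sequential(
            nn.Conv2d(512, 256, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(256, k, kernel_size=1),
        )

    def forward(self, x):
        b, c, h, w = x.shape
        _, _, ty, tx = self.templates.shape
        gy, gx = h // ty, w // tx            # effective mosaic resolution
        feats = self.encoder(x)              # (B, 512, H/32, W/32)
        logits = self.decoder(feats)
        # Resize logits to one coefficient vector per tile location.
        logits = F.interpolate(logits, size=(gy, gx), mode="bilinear",
                               align_corners=False)
        # Softmax over the template axis: per-tile mixing coefficients.
        coeffs = F.softmax(logits, dim=1)    # (B, K, Gy, Gx)
        # Linear combination: weight each template by its coefficient at
        # each tile location, then sum over templates.
        tiles = torch.einsum("bkij,kcyx->bcijyx", coeffs, self.templates)
        # Stitch the (Gy, Gx) grid of tiles back into a full image.
        mosaic = tiles.permute(0, 1, 2, 4, 3, 5).reshape(b, c, gy * ty, gx * tx)
        return mosaic
```

The self-supervised objective can then be sketched as a feature reconstruction loss between the mosaic and the input, summed over several scales. The scale factors below are placeholders, not the values used in the paper.

```python
def feature_reconstruction_loss(encoder, x, mosaic):
    # Compare VGG16 feature maps of the mosaic against those of the
    # original image; no labels are required.
    with torch.no_grad():
        target = encoder(x)
    return F.mse_loss(encoder(mosaic), target)


def multiscale_loss(encoder, x, mosaic, scales=(1.0, 0.5, 0.25)):
    # Sum the feature reconstruction loss over downsampled copies of
    # the input and the mosaic (scale factors are assumptions).
    loss = 0.0
    for s in scales:
        xs = x if s == 1.0 else F.interpolate(
            x, scale_factor=s, mode="bilinear", align_corners=False)
        ms = mosaic if s == 1.0 else F.interpolate(
            mosaic, scale_factor=s, mode="bilinear", align_corners=False)
        loss = loss + feature_reconstruction_loss(encoder, xs, ms)
    return loss
```

Because the encoder is frozen, only the two-layer decoder is trained, and the softmax keeps the template selection differentiable so the whole pipeline can be optimized end-to-end.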

Problems

Lessons Learned