Siamese networks were first introduced by Bromley and LeCun [1] in the early 1990s to solve signature verification as an image matching problem. A similar Siamese architecture was independently proposed for fingerprint identification by Baldi and Chauvin [2] in 1992. Later, in 2015, Gregory Koch et al. [3] proposed using Siamese neural networks for one-shot image recognition.

Siamese neural networks consist of two twin networks joined at their final layer by a distance layer, which is trained to predict whether two images belong to the same category. The networks that compose the Siamese architecture are called twins because all their weights and biases are tied, so both branches compute exactly the same function. This symmetry is important because the prediction should not change when the two input images are swapped. Weight tying also makes the networks much faster to train, since the number of parameters is reduced significantly.
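To make the weight tying concrete, here is a minimal sketch in plain Python. It is not the architecture from [1] or [3]: the single dense layer, the tanh activation, the Euclidean distance and all the names (embed, siamese_distance, weights, biases) are assumptions chosen for illustration. The point it shows is that both inputs pass through the same parameters, so swapping them leaves the distance unchanged.

```python
import math
import random


def embed(x, weights, biases):
    """One dense layer with tanh (illustrative assumption).
    The same weights and biases are used for both twins."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def siamese_distance(x1, x2, weights, biases):
    """Euclidean distance between the two twin embeddings."""
    e1 = embed(x1, weights, biases)  # first twin
    e2 = embed(x2, weights, biases)  # second twin, identical parameters
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(e1, e2)))


# Hypothetical toy parameters: 4 input features mapped to 3 embedding units.
rng = random.Random(0)
weights = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
biases = [rng.uniform(-1, 1) for _ in range(3)]

x1 = [0.1, 0.5, -0.3, 0.9]
x2 = [0.7, -0.2, 0.4, 0.0]

d12 = siamese_distance(x1, x2, weights, biases)
d21 = siamese_distance(x2, x1, weights, biases)  # inputs swapped
```

Because only one set of parameters exists, there is half as much to learn as with two independent branches of the same size.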

The last layer compares the outputs of the twin networks and pushes the distance towards 0 for images of the same category and towards 1 for images of different categories. The contrastive loss function introduced by Yann LeCun et al. [4] is used for this purpose:

L(Y, Dw) = (1 - Y) * (1/2) * Dw^2 + Y * (1/2) * max(0, m - Dw)^2

where Dw is the distance output by the network, Y is the true label (0 for a pair from the same category, 1 for a pair from different categories) and m is the margin, whose value is set by the maximum distance chosen (1).
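The contrastive loss of [4] for a single pair can be sketched in a few lines of Python. The function name and argument names are hypothetical, and the sketch assumes the convention Y = 0 for a same-category pair and Y = 1 for a different-category pair, with the margin m defaulting to 1 as in the text.

```python
def contrastive_loss(d_w, y, margin=1.0):
    """Contrastive loss for one pair, following the form in [4].

    d_w    -- distance output by the network for the pair
    y      -- 0 if the pair is from the same category, 1 otherwise (assumed convention)
    margin -- m, the distance beyond which different pairs stop being penalised
    """
    # Same-category pairs are penalised for being far apart.
    similar_term = (1 - y) * 0.5 * d_w ** 2
    # Different-category pairs are penalised only while closer than the margin.
    dissimilar_term = y * 0.5 * max(0.0, margin - d_w) ** 2
    return similar_term + dissimilar_term
```

With m = 1, a same-category pair at distance 0 and a different-category pair at distance 1 or more both give zero loss, which matches the target distances described above.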

Aside from the first applications mentioned, these networks can be used in many applications that involve a differentiation task. In addition, the evolution of neural networks based on convolutional architectures was crucial for applying these methods to more complex images. In this post we tackle one of the most interesting applications of these networks: one-shot learning. This method consists of predicting an image’s category just by comparing it to a single example of each class.

Under this approach, a Siamese network is used to learn image representations via a supervised metric-based approach, and then that network’s features are reused for one-shot learning without any retraining. In other words, we first train a neural network to discriminate between the class identities of image pairs, and then generalize it to the one-shot task by evaluating each new image in a pairwise manner against the single example of each class. The last layer thus maps each image into a vector space where images belonging to the same class are at distance zero and images of different classes are at distance one. This can be viewed as a space transform in which each class sits at the centre of a hypersphere.
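The pairwise evaluation step can be sketched as follows. This is an illustrative assumption rather than the procedure of [3]: it presumes the embeddings have already been produced by a trained twin network, uses Euclidean distance, and the names euclidean, one_shot_predict and support_embeddings are hypothetical.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def one_shot_predict(query_embedding, support_embeddings):
    """Return the label whose single example lies closest to the query.

    support_embeddings -- dict mapping each class label to the embedding
                          of its one reference image
    """
    return min(
        support_embeddings,
        key=lambda label: euclidean(query_embedding, support_embeddings[label]),
    )


# Toy 2-D embeddings standing in for the output of the trained network.
support = {"cat": [0.0, 0.0], "dog": [1.0, 1.0]}
prediction = one_shot_predict([0.1, 0.2], support)
```

No retraining is needed to add a class: a new entry in the support dictionary is enough, since the network is only used to compute embeddings.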

In conclusion, the discriminative ability of Siamese networks is an interesting quality exploited in a wide range of applications. Apart from the ones mentioned, authorship verification and face recognition [5] are also among them. One-shot learning is an interesting problem for which Siamese networks are a solution, but not the only one. Other successful approaches have been developed, such as prototypical networks, conditional networks, matching networks and many more. These networks can be used to implement quality control systems for surface inspection in industrial environments, Computer Aided Diagnosis (CAD) tools, image search engines based on visual similarity (CBIR), etc.

These kinds of techniques are being applied in the H2020 European project PICCOLO (GA No. 732111), led by the Tecnalia Computer Vision team.

[1] J. Bromley, I. Guyon, Y. LeCun, E. Säckinger, and R. Shah. Signature verification using a “Siamese” time delay neural network. In Proceedings of the 6th International Conference on Neural Information Processing Systems (NIPS’93), San Francisco, 737-744, 1993.

[2] P. Baldi, Y. Chauvin. Neural Networks for Fingerprint Recognition. Neural Computation, 1993.

[3] G. R. Koch. Siamese Neural Networks for One-Shot Image Recognition. 2015.

[4] R. Hadsell, S. Chopra, and Y. LeCun. Dimensionality Reduction by Learning an Invariant Mapping. New York, NY, USA, pp. 1735-1742, 2006.

[5] M. Wang, W. Deng. Deep Face Recognition: A Survey. 2018.

Disclaimer: Due to privacy and data protection law, the images in this post have been obtained from free external sources.