
Siamese neural network

I have been studying the architecture of the Siamese neural network introduced by Yann LeCun and his colleagues in 1994 for signature verification ("Signature Verification Using a Siamese Time Delay Neural Network", NIPS 1994).

Image from “Probabilistic Siamese Network for Learning Representations” by Chen Liu (University of Toronto, 2013).

I had some problems understanding the general architecture of this Siamese neural network model, and discussed it with a friend on Cross Validated. I think I finally understand it, so now I have moved on to the next step: implementing it.

We ended up agreeing that the overall algorithm should look something like this:

  • Create the convolutional neural network convNetA for the 1st signature.
  • Create the convolutional neural network convNetB for the 2nd signature.
  • Tie the convNetA weights to the convNetB weights.
  • Use the cosine similarity function to compute the loss (see the sketch after this list).
  • Run the training (forward and backward passes).
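
For reference, here is how I imagine these steps could map onto standard nn modules, with nn.Linear standing in for the real convolutional network and nn.CosineEmbeddingCriterion as the loss; I am not sure that criterion is the right choice for the cosine similarity step:

require 'nn'

-- the two "towers" of the Siamese network (nn.Linear is only a stand-in for the real convnet)
convNetA = nn.Sequential()
convNetA:add(nn.Linear(5, 2))
convNetB = convNetA:clone('weight', 'bias')   -- ties the weights of the two towers

siamese = nn.ParallelTable()
siamese:add(convNetA)
siamese:add(convNetB)

-- cosine-based loss: target +1 = same signer, -1 = different signer
criterion = nn.CosineEmbeddingCriterion(0.5)

-- one training step on a pair of (random, fake) signatures
pair = {torch.rand(5), torch.rand(5)}
label = 1                                      -- +1: this pair should be similar
embeddings = siamese:forward(pair)             -- table holding the two embeddings
loss = criterion:forward(embeddings, label)
siamese:zeroGradParameters()
gradEmbeddings = criterion:backward(embeddings, label)
siamese:backward(pair, gradEmbeddings)
siamese:updateParameters(0.01)

With a criterion, the gradient with respect to the network output comes from criterion:backward(), so I would not have to build it by hand as I do below.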

I'm new to Torch, so I do not really know whether I got this right. Here is my first attempt, which computes the cosine similarity inside the network with nn.CosineDistance:

-- training
function gradientUpdate(perceptron, dataset, target, learningRate, max_iterations)

  for i = 1, max_iterations do

    -- forward(input) takes an input object and computes the corresponding output
    -- of the module; is this output the cosine similarity of the pair?
    local predictionValue = perceptron:forward(dataset)

    io.write(" pre-predictionValue = " .. predictionValue[1] .. "\n")

    -- gradOutput = -target: updateParameters() below does
    -- params = params - learningRate * gradParams, so a negative gradient with
    -- respect to the output makes the update *increase* the cosine similarity
    -- when target = +1 (and decrease it when target = -1)
    local gradientWrtOutput = torch.Tensor({-target})

    -- zero the parameter gradients accumulated by previous backward() calls
    perceptron:zeroGradParameters()

    -- backpropagation step through the module, with respect to the given input
    perceptron:backward(dataset, gradientWrtOutput)

    -- vanilla update: params = params - learningRate * gradParams
    perceptron:updateParameters(learningRate)
  end

end

require "os"
require "nn"

input_number = 5
output_number = 2

-- imagine we have one network we are interested in, it is called "perceptronAAA"
perceptronAAA = nn.Sequential()
perceptronAAA:add(nn.Linear(input_number, output_number))

-- But we want to push examples towards or away from each other
-- so we make another copy of it called perceptronBBB
-- this *shares* the same weights via the set command, but has its own set of temporary gradient storage
-- that's why we create it again (so that the gradients of the pair don't wipe each other)
perceptronBBB = perceptronAAA:clone('weight', 'bias')

-- we make a parallel table that takes a pair of examples as input. they both go through the same (cloned) perceptron
-- ParallelTable is a container module that, in its forward() method, applies the i-th member module to the i-th input, and outputs a table of the set of outputs.
parallel_table = nn.ParallelTable()
parallel_table:add(perceptronAAA)
parallel_table:add(perceptronBBB)

-- now we define our top-level network that takes this parallel table and computes the cosine
-- distance between the pair of outputs
perceptron = nn.Sequential()
perceptron:add(parallel_table)
perceptron:add(nn.CosineDistance())
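-- (if I understand the docs correctly, nn.CosineDistance actually outputs the cosine
-- *similarity* of its two input vectors, a value in [-1, 1], despite its name)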


-- lets make two example vectors
x_vector = torch.rand(input_number)
y_vector = torch.rand(input_number)
dataset = {x_vector, y_vector}

function dataset:size() return #dataset end -- size() is required by nn.StochasticGradient, though gradientUpdate() above does not use it


max_iterations = 100
learnRate = 0.1
target = 1 -- +1 means this pair should be considered similar; gradientUpdate() negates it to build the gradient with respect to the output

-- TRAINING:

-- push x_vector and y_vector towards each other: their cosine similarity should increase towards +1

gradientUpdate(perceptron, dataset, target, learnRate, max_iterations)
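
To convince myself that the network output really is the cosine similarity (the question in the comment inside gradientUpdate), here is a small sanity check I would add, reusing the variables defined above:

-- compare the network output with the cosine similarity computed by hand
out = perceptron:forward(dataset)
a = perceptronAAA:forward(x_vector)
b = perceptronBBB:forward(y_vector)
manual = a:dot(b) / (a:norm() * b:norm())
print(out[1], manual) -- the two values should coincide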

Do you think this is a correct implementation of a Siamese neural network with a cosine similarity loss? Or can you see any errors or anything wrong in it?