
Siamese neural network

I have been studying the architecture of the Siamese neural network introduced by Yann LeCun and his colleagues in 1994 for signature verification ("Signature Verification Using a Siamese Time Delay Neural Network", NIPS 1994).

Image from “Probabilistic Siamese Network for Learning Representations” by Chen Liu (University of Toronto, 2013).

I had some problems understanding the general architecture of this Siamese neural network model, and discussed it with a friend on Cross Validated. I think I finally understand it, so now I have moved on to the next step: implementing it.

We ended up agreeing that the overall algorithm should look something like this:

  • Create the convolutional neural network convNetA for the 1st signature.
  • Create the convolutional neural network convNetB for the 2nd signature.
  • Tie the convNetA weights to the convNetB weights.
  • Use the cosine similarity function to compute the loss (see the sketch after this list).
  • Run the training (forward and backward passes).
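
For reference, here is how I imagine these steps could map onto standard nn modules, with nn.Linear standing in for the real convolutional network and nn.CosineEmbeddingCriterion as the loss; I am not sure that criterion is the right choice for the cosine similarity step:

require 'nn'

-- the two "towers" of the Siamese network (nn.Linear is only a stand-in for the real convnet)
convNetA = nn.Sequential()
convNetA:add(nn.Linear(5, 2))
convNetB = convNetA:clone('weight', 'bias')   -- ties the weights of the two towers

siamese = nn.ParallelTable()
siamese:add(convNetA)
siamese:add(convNetB)

-- cosine-based loss: target +1 = same signer, -1 = different signer
criterion = nn.CosineEmbeddingCriterion(0.5)

-- one training step on a pair of (random, fake) signatures
pair = {torch.rand(5), torch.rand(5)}
label = 1                                      -- +1: this pair should be similar
embeddings = siamese:forward(pair)             -- table holding the two embeddings
loss = criterion:forward(embeddings, label)
siamese:zeroGradParameters()
gradEmbeddings = criterion:backward(embeddings, label)
siamese:backward(pair, gradEmbeddings)
siamese:updateParameters(0.01)

With a criterion, the gradient with respect to the network output comes from criterion:backward(), so I would not have to build it by hand as I do below.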

I'm new to Torch, so I do not really know whether I got this right. Here is my first attempt, which computes the cosine similarity inside the network with nn.CosineDistance:

-- training
function gradientUpdate(perceptron, dataset, target, learningRate, max_iterations)

  for i = 1, max_iterations do

    -- forward(input) takes an input object and computes the corresponding output
    -- of the module; is this output the cosine similarity of the pair?
    local predictionValue = perceptron:forward(dataset)

    io.write(" pre-predictionValue = " .. predictionValue[1] .. "\n")

    -- gradOutput = -target: updateParameters() below does
    -- params = params - learningRate * gradParams, so a negative gradient with
    -- respect to the output makes the update *increase* the cosine similarity
    -- when target = +1 (and decrease it when target = -1)
    local gradientWrtOutput = torch.Tensor({-target})

    -- zero the parameter gradients accumulated by previous backward() calls
    perceptron:zeroGradParameters()

    -- backpropagation step through the module, with respect to the given input
    perceptron:backward(dataset, gradientWrtOutput)

    -- vanilla update: params = params - learningRate * gradParams
    perceptron:updateParameters(learningRate)
  end

end

require "os"
require "nn"

input_number = 5
output_number = 2

-- imagine we have one network we are interested in, it is called "perceptronAAA"
perceptronAAA = nn.Sequential()
perceptronAAA:add(nn.Linear(input_number, output_number))

-- But we want to push examples towards or away from each other
-- so we make another copy of it called perceptronBBB
-- this *shares* the same weights via the set command, but has its own set of temporary gradient storage
-- that's why we create it again (so that the gradients of the pair don't wipe each other)
perceptronBBB = perceptronAAA:clone('weight', 'bias')

-- we make a parallel table that takes a pair of examples as input. they both go through the same (cloned) perceptron
-- ParallelTable is a container module that, in its forward() method, applies the i-th member module to the i-th input, and outputs a table of the set of outputs.
parallel_table = nn.ParallelTable()
parallel_table:add(perceptronAAA)
parallel_table:add(perceptronBBB)

-- now we define our top-level network that takes this parallel table and computes the cosine
-- distance between the pair of outputs
perceptron = nn.Sequential()
perceptron:add(parallel_table)
perceptron:add(nn.CosineDistance())
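-- (if I understand the docs correctly, nn.CosineDistance actually outputs the cosine
-- *similarity* of its two input vectors, a value in [-1, 1], despite its name)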


-- lets make two example vectors
x_vector = torch.rand(input_number)
y_vector = torch.rand(input_number)
dataset = {x_vector, y_vector}

function dataset:size() return #dataset end -- size() is required by nn.StochasticGradient, though gradientUpdate() above does not use it


max_iterations = 100
learnRate = 0.1
target = 1 -- +1 means this pair should be considered similar; gradientUpdate() negates it to build the gradient with respect to the output

-- TRAINING:

-- push x_vector and y_vector towards each other: their cosine similarity should increase towards +1

gradientUpdate(perceptron, dataset, target, learnRate, max_iterations)
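
To convince myself that the network output really is the cosine similarity (the question in the comment inside gradientUpdate), here is a small sanity check I would add, reusing the variables defined above:

-- compare the network output with the cosine similarity computed by hand
out = perceptron:forward(dataset)
a = perceptronAAA:forward(x_vector)
b = perceptronBBB:forward(y_vector)
manual = a:dot(b) / (a:norm() * b:norm())
print(out[1], manual) -- the two values should coincide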

Do you think this is a correct implementation of a Siamese neural network with a cosine similarity loss? Or can you see any errors or anything wrong in it?