
I'm currently using this repo to perform NLP and learn more about CNNs using my own dataset, and I keep running into an error regarding a shape mismatch:

ValueError: Target size (torch.Size([64])) must be the same as input size (torch.Size([15]))

     10 }
     11 for epoch in tqdm(range(params['epochs'])):
---> 12     train_loss, train_acc = train(model, train_iterator, optimizer, criterion)
     13     valid_loss, valid_acc = evaluate(model, valid_iterator, criterion)
     14     epoch_mins, epoch_secs = epoch_time(start_time, end_time)

     57         print("PredictionShapeAfter:")
     58         print(predictions.shape)
---> 59         loss = criterion(predictions, batch.l)
     60 
     61         acc = binary_accuracy(predictions, batch.l)
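
For what it's worth, I can reproduce the same ValueError with just the loss, outside the training loop. This is only a sketch assuming the criterion is nn.BCEWithLogitsLoss (which I believe is what the repo uses); it rejects any input/target shape mismatch:

import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
predictions = torch.randn(15)     # shape my model is actually producing
truth = torch.zeros(64)           # shape of my labels
criterion(predictions, truth)     # ValueError: Target size (torch.Size([64])) must be the same as input size (torch.Size([15]))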

Doing some digging, I found that my CNN's prediction is a different size from the ground truth it's being compared against:

  Input Shape:
    torch.Size([15, 64])
    Truth Shape:
    torch.Size([64])
    embedded unsqueezed: torch.Size([15, 1, 64, 100])
    cat shape: torch.Size([15, 300])
    Prediction Shape Before Squeeze:
    torch.Size([15, 1])
    PredictionShapeAfter:
    torch.Size([15])

The model is using the first dimension of the input as the batch dimension of its prediction (the last shape in this list). Is this a common problem, and is there a way to rectify it?
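
For reference, here is a minimal sketch of what I think is happening (the vocabulary size, embedding dimension, and filter size below are just illustrative, not my real hyperparameters): nn.Conv2d treats dim 0 of its input as the batch dimension, so whatever comes first in the text tensor ends up as the "batch" of predictions.

import torch
import torch.nn as nn

text = torch.randint(0, 100, (15, 64))          # [sent_len, batch], as my iterator produces it
embedded = nn.Embedding(100, 100)(text)         # [15, 64, 100]
embedded = embedded.unsqueeze(1)                # [15, 1, 64, 100]
conv = nn.Conv2d(1, 100, kernel_size=(3, 100))
print(conv(embedded).shape)                     # torch.Size([15, 100, 62, 1]) -- dim 0 (15) is treated as the batch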

My Model:

class CNN(nn.Module):
    def __init__(self, vocab_size, embedding_dim, n_filters, filter_sizes, output_dim, 
                 dropout, pad_idx):
        super().__init__()
                
        self.embedding = nn.Embedding(vocab_size, embedding_dim, padding_idx = pad_idx)
        
        self.convs = nn.ModuleList([
                                    nn.Conv2d(in_channels = 1, 
                                              out_channels = n_filters, 
                                              kernel_size = (fs, embedding_dim)) 
                                    for fs in filter_sizes
                                    ])
        
        self.fc = nn.Linear(len(filter_sizes) * n_filters, output_dim)
        
        self.dropout = nn.Dropout(dropout)
        
    def forward(self, text):
        # text: [sent_len, batch] -- e.g. [15, 64] in the debug output above
        embedded = self.embedding(text)        # [sent_len, batch, emb_dim]
        embedded = embedded.unsqueeze(1)       # [sent_len, 1, batch, emb_dim]
        print(f"embedded unsqueezed: {embedded.shape}")
        conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs]
        pooled = [F.max_pool1d(conv, conv.shape[2]).squeeze(2) for conv in conved]
        cat = self.dropout(torch.cat(pooled, dim = 1))
        print(f"cat shape: {cat.shape}")
        return self.fc(cat)                    # [sent_len, output_dim] -- this is where the 15 comes from

My Training function:

def train(model, iterator, optimizer, criterion):
    
    epoch_loss = 0
    epoch_acc = 0
    
    model.train()
    
    for batch in iterator:
        
        optimizer.zero_grad()

        print("InputShape:")
        print(batch.t.shape)
        print("Truth Shape:")
        print(batch.l.shape)

        predictions = model(batch.t)
        print("Prediction Shape Before Squeeze:")
        print(predictions.shape)

        predictions = predictions.squeeze(1)
        print("PredictionShapeAfter:")
        print(predictions.shape)
        loss = criterion(predictions, batch.l)
        
        acc = binary_accuracy(predictions, batch.l)
        
        loss.backward()
        
        optimizer.step()
        
        epoch_loss += loss.item()
        epoch_acc += acc.item()
        
    return epoch_loss / len(iterator), epoch_acc / len(iterator)

My full code can be found at this link.

2 Comments

  • I'm assuming your labels are of shape 64? That means you are passing the wrong output_dim to your model. What does batch.l represent in your data? What do the numbers themselves represent? (Commented Apr 7, 2021 at 4:45)
  • I didn't take a detailed look at the code, but shouldn't the number of ground-truth labels match the number of samples in the batch? If your input data is torch.Size([15, 64]), shouldn't your labels be torch.Size([15]), considering you have one label for each sample? (Commented Apr 7, 2021 at 5:56)

1 Answer

Your issue is here:

self.convs = nn.ModuleList([
                            nn.Conv2d(in_channels = 1, 
                                      out_channels = n_filters, 
                                      kernel_size = (fs, embedding_dim)) 
                            for fs in filter_sizes
                            ])

You are inputting data of shape [15, 1, 64, 100], which the convolutions interpret as a batch of 15 single-channel images of height 64 and width 100.

What it appears you want is a batch of size 64, so swap those dimensions first:

...
embedded = embedded.swapdims(0,2)
conved = [F.relu(conv(embedded)).squeeze(3) for conv in self.convs] 
...
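
As a quick sanity check (a sketch using only the shapes printed in the question, not the actual model), the swap moves the batch of 64 to where Conv2d expects it; on older PyTorch versions without swapdims, transpose(0, 2) does the same thing:

import torch

embedded = torch.randn(15, 1, 64, 100)   # [sent_len, 1, batch, emb_dim], as printed in the question
swapped = embedded.swapdims(0, 2)        # equivalent to embedded.transpose(0, 2)
print(swapped.shape)                     # torch.Size([64, 1, 15, 100]) -- batch of 64, single-channel 15x100 "images"

An alternative, if the iterator is built with torchtext, is to make the batch dimension come first in the data itself (e.g. batch_first=True on the Field), so no swap is needed inside forward.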

2 Comments

When I implement these changes, I get a different error: RuntimeError: Calculated padded input size per channel: (4 x 100). Kernel size: (5 x 100). Kernel size can't be greater than actual input size. Do you have any idea how to solve this? It seems like it's a problem with the iterator being funky.
@NakulUpadhya your padded input has been reduced in size so that it is now too small for the next convolutional filter. This happens because your "images" are only as tall as your sequences (15 in your debug batch, and apparently as short as 4 in the failing one), which is smaller than the 5-row filter.
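
A common workaround for that RuntimeError, sketched below under the assumption that some batches contain sequences shorter than the largest filter (the error message shows a padded length of 4 against a filter of height 5), is to pad the sequence dimension up to the largest filter size before the convolutions. pad_to_min_len is a hypothetical helper, not part of the original code:

import torch.nn.functional as F

def pad_to_min_len(embedded, min_len):
    # embedded: [batch, 1, sent_len, emb_dim] after the swap
    sent_len = embedded.shape[2]
    if sent_len < min_len:
        # F.pad pads the last dims first: (left, right) for emb_dim, (top, bottom) for sent_len
        embedded = F.pad(embedded, (0, 0, 0, min_len - sent_len))
    return embedded

# inside forward(), after the swap, something like:
# embedded = pad_to_min_len(embedded, 5)   # 5 = the failing filter height from the error; max(filter_sizes) in general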
