0 votes
0 answers
28 views

TypeError: Transformer.transform() missing 1 required positional argument: 'yy'

I am working on a school finder, and I am just trying to convert eastings and northings to latitude and longitude. The code above is to check the nearest school to the user. The problem is that apparently I ...
Litcoder
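
A minimal sketch of the usual cause and fix, assuming pyproj is the library involved (the CRS codes and coordinates below are assumptions, not taken from the question): this error typically appears when Transformer.transform is called on the class rather than on a Transformer instance, so the first coordinate binds to self and the second positional argument 'yy' goes missing.

from pyproj import Transformer

# Build the transformer ONCE as an instance; calling Transformer.transform(e, n)
# on the class itself is one likely cause of "missing 1 required positional argument: 'yy'".
transformer = Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True)

easting, northing = 529090, 179645            # hypothetical example values
lon, lat = transformer.transform(easting, northing)  # always_xy=True: (x, y) in, (lon, lat) out
print(lat, lon)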
0 votes
1 answer
608 views

AttributeError: 'DynamicCache' object has no attribute 'seen_tokens'

I'm following the Hands-On Large Language Models book to learn more about LLMs. I'm trying to generate text using the "microsoft/Phi-3-mini-4k-instruct" model which is used in the book. ...
Quinten
  • 42.7k
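
A sketch of the usual diagnosis (an assumption, not the book's exact fix): the DynamicCache attribute seen_tokens was deprecated and later removed from transformers, so code written against an older release breaks on a newer one. Checking the installed version and generating with the library's standard API is a reasonable starting point.

import transformers
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

print(transformers.__version__)  # compare against the version the book's code targets

model_name = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto", device_map="auto")

generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Write a haiku about transformers.", max_new_tokens=50)[0]["generated_text"])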
0 votes
0 answers
150 views

Why did my Transformer model not work well when dealing with single-cell multi-omic data?

The complete code and data are available at: Google Drive. I'm working on a high-dimensional regression problem and have built a Transformer-based model in PyTorch. While the model trains, I'm observing ...
氢氰酸
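
Since the full code is only linked, here is a minimal sketch of a Transformer-encoder regressor for high-dimensional features; every layer size and the feature-tokenization choice below are assumptions for illustration, not taken from the linked code.

import torch
import torch.nn as nn

class TransformerRegressor(nn.Module):
    def __init__(self, d_model=64, n_heads=4, n_layers=2, n_targets=1):
        super().__init__()
        # Treat each scalar input feature as a "token" embedded into d_model dimensions.
        self.feature_embed = nn.Linear(1, d_model)
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, n_targets)

    def forward(self, x):                               # x: (batch, n_features)
        tokens = self.feature_embed(x.unsqueeze(-1))    # (batch, n_features, d_model)
        encoded = self.encoder(tokens)                  # (batch, n_features, d_model)
        pooled = encoded.mean(dim=1)                    # mean-pool over feature tokens
        return self.head(pooled)                        # (batch, n_targets)

model = TransformerRegressor()
print(model(torch.randn(8, 200)).shape)                 # torch.Size([8, 1])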
0 votes
0 answers
169 views

Cannot import `QwenForCausalLM` after installing `v4.51.3-Qwen2.5-Omni-preview` tag; pip installs 4.52.0.dev0 instead

Description: I am trying to install the Hugging Face Transformers version that supports the Qwen2.5-Omni model. According to the official docs, the correct tag to install is v4.51.3-Qwen2.5-Omni-...
Promit Dey Sarker Arjan
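
A quick sanity check, not a guaranteed fix: a reported version of 4.52.0.dev0 suggests the development branch was installed rather than the tag, and the expected class name may also differ by version. The snippet below only inspects what actually got installed; it assumes nothing about the correct class name.

import transformers

print(transformers.__version__)  # should match the tag's version, not a .dev0 build

# List exported names containing "Qwen" to find the model class provided by this
# build instead of guessing; the class name for Qwen2.5-Omni is version-dependent.
print([name for name in dir(transformers) if "Qwen" in name])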
1 vote
1 answer
91 views

Can I use a custom attention layer while still leveraging a pre-trained BERT model?

In the paper “Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks”, they multiply a similarity matrix with the attention scores inside the attention layer. I want to ...
Blockchain Kid
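
A sketch of the general idea rather than the paper's exact code: compute scaled dot-product attention with query/key/value weights copied from a pretrained BERT layer, and multiply a prior similarity matrix into the scores before the softmax. A single attention head is used here for clarity; wiring this back into the full BertModel is left out.

import math
import torch
import torch.nn as nn
from transformers import BertModel

class PriorGuidedAttention(nn.Module):
    def __init__(self, bert_layer, hidden_size=768):
        super().__init__()
        src = bert_layer.attention.self
        # Reuse the pretrained projection weights from the chosen BERT layer.
        self.query, self.key, self.value = src.query, src.key, src.value
        self.scale = math.sqrt(hidden_size)

    def forward(self, hidden_states, prior):            # prior: (batch, seq, seq)
        q = self.query(hidden_states)
        k = self.key(hidden_states)
        v = self.value(hidden_states)
        scores = q @ k.transpose(-1, -2) / self.scale    # (batch, seq, seq)
        scores = scores * prior                          # inject the prior similarity matrix
        weights = torch.softmax(scores, dim=-1)
        return weights @ v

bert = BertModel.from_pretrained("bert-base-uncased")
attn = PriorGuidedAttention(bert.encoder.layer[0])
x = torch.randn(1, 10, 768)
prior = torch.ones(1, 10, 10)                            # neutral placeholder prior
print(attn(x, prior).shape)                              # torch.Size([1, 10, 768])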
1 vote
1 answer
162 views

(NVIDIA/nv-embed-v2) ImportError: cannot import name 'MISTRAL_INPUTS_DOCSTRING' from 'transformers.models.mistral.modeling_mistral'

My code: from transformers import AutoTokenizer, AutoModel model_name = "NVIDIA/nv-embed-v2" tokenizer = AutoTokenizer.from_pretrained(model_name) model = AutoModel.from_pretrained(...
6zL
  • 21
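
A sketch of the usual diagnosis (an assumption, not from the NV-Embed docs): the model's remote code imports a symbol that newer transformers releases no longer expose, so it generally needs an older transformers version. Checking the installed version and loading with trust_remote_code=True is the first step; the exact version to pin is not stated here.

import transformers
from transformers import AutoTokenizer, AutoModel

print(transformers.__version__)  # if the ImportError persists, try an older transformers release

model_name = "NVIDIA/nv-embed-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)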
0 votes
1 answer
33 views

Is multi-head self-attention in a Transformer permutation-invariant or equivariant, and how can I see it in practice?

I read that a function f is equivariant if f(P(x)) = P(f(x)), where P is a permutation. To check what equivariance and permutation invariance mean, I wrote the following code: import torch import torch....
fenaux
  • 47
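
A minimal sketch of one way to see it in practice: self-attention without positional encodings is permutation-equivariant (permuting the input permutes the output the same way), and a pooled readout of it is permutation-invariant. The dimensions below are arbitrary.

import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=16, num_heads=4, batch_first=True).eval()

x = torch.randn(1, 6, 16)                  # (batch, seq, dim)
perm = torch.randperm(6)
x_perm = x[:, perm, :]

with torch.no_grad():
    out, _ = attn(x, x, x)                 # self-attention
    out_perm, _ = attn(x_perm, x_perm, x_perm)

# Equivariance: f(P(x)) == P(f(x))
print(torch.allclose(out[:, perm, :], out_perm, atol=1e-5))              # True
# Invariance of a pooled summary: the mean over the sequence ignores order
print(torch.allclose(out.mean(dim=1), out_perm.mean(dim=1), atol=1e-5))  # True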
0 votes
0 answers
36 views

Does Temporal Fusion Transformer Learn Global Trends Across the Entire Time Series?

I'm using the Temporal Fusion Transformer (TFT) to train on time series data, aiming to make real-time forecasts for a specific process unit at any point in time during operation. However, for ...
YoungJoo Park
0 votes
0 answers
61 views

Why does adding token and positional embeddings in transformers work?

In transformer models, I've noticed that token embeddings and positional embeddings are added together before being passed into the attention layers: import torch import torch.nn as nn class ...
Yilmaz
  • 50.9k
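
A minimal sketch of the pattern being asked about: token and positional embeddings live in the same d_model-dimensional space and are summed elementwise, so each position's vector carries both "what the token is" and "where it is", and the model can learn to keep the two signals separable. The sizes below are assumptions for illustration.

import torch
import torch.nn as nn

vocab_size, max_len, d_model = 1000, 32, 64
tok_emb = nn.Embedding(vocab_size, d_model)
pos_emb = nn.Embedding(max_len, d_model)

ids = torch.randint(0, vocab_size, (2, 10))   # (batch, seq)
positions = torch.arange(10).unsqueeze(0)     # (1, seq), broadcast over the batch
x = tok_emb(ids) + pos_emb(positions)         # (2, 10, 64), fed to the attention layers
print(x.shape)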
0 votes
0 answers
63 views

Why is attention scaled by sqrt(d_k) in Transformer architectures?

I have this code in a transformer model: keys = x @ W_key queries = x @ W_query values = x @ W_value attention_scores = queries @ keys.T # keys.shape[-1]**0.5: used to scale the attention scores before ...
Yilmaz
  • 50.9k
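
A small numeric illustration of the standard argument: for unit-variance queries and keys the raw dot products have variance on the order of d_k, which pushes the softmax toward a one-hot distribution and shrinks its gradients; dividing by sqrt(d_k) keeps the scores' variance near 1.

import torch

d_k = 512
q = torch.randn(10000, d_k)
k = torch.randn(10000, d_k)

raw = (q * k).sum(dim=-1)              # dot products of random unit-variance vectors
scaled = raw / d_k ** 0.5

print(raw.var().item())                # roughly 512 (~d_k)
print(scaled.var().item())             # roughly 1

scores = torch.randn(1, 8) * d_k ** 0.5             # scores at the unscaled magnitude
print(torch.softmax(scores, dim=-1))                 # nearly one-hot -> tiny gradients
print(torch.softmax(scores / d_k ** 0.5, dim=-1))    # much softer distribution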
0 votes
0 answers
87 views

Training and validation losses do not reduce when fine-tuning ViTPose from huggingface

I am trying to fine-tune a transformer/encoder based pose estimation model available here at: https://huggingface.co/docs/transformers/en/model_doc/vitpose When passing "labels" attribute to ...
Soham Bhaumik
1 vote
0 answers
45 views

Why is day_size set to 32 in temporal embedding code?

I am trying to understand the code for the temporal embedding inside the Autoformer implementation in PyTorch. https://github.com/thuml/Autoformer/blob/main/layers/Embed.py class TemporalEmbedding(nn....
prem
  • 439
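
A sketch of the likely reason (an inference from how embedding tables work, not a statement from the repository's docs): day-of-month is used directly as an embedding index and takes values 1..31, and nn.Embedding indices run from 0 to size-1, so the table needs at least 32 rows for index 31 to be valid, mirroring choices such as month_size covering 0..12.

import torch
import torch.nn as nn

day_size, d_model = 32, 512
day_embed = nn.Embedding(day_size, d_model)

day_of_month = torch.tensor([1, 15, 31])   # raw calendar values, no shift to 0-based
print(day_embed(day_of_month).shape)       # torch.Size([3, 512])
# With day_size=31, index 31 would raise an index-out-of-range error.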
2 votes
1 answer
79 views

Logits Don't Change in a Custom Reimplementation of a CLIP model [PyTorch]

The problem The similarity scores are almost the same for texts that describe both a photo of a cat and a dog (the photo is of a cat). Cat similarity: tensor([[-3.5724]], grad_fn=<MulBackward0>) ...
Yousef
  • 51
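
A reference sketch of the CLIP-style similarity computation, under the assumption that near-identical scores for different captions often come from skipping the L2 normalization or the learned temperature; the embedding tensors here are random placeholders, not outputs of the question's model.

import torch
import torch.nn.functional as F

image_emb = torch.randn(1, 512)              # output of the image projection head (placeholder)
text_emb = torch.randn(2, 512)               # e.g. "a photo of a cat", "a photo of a dog"
logit_scale = torch.tensor(2.6593).exp()     # CLIP initializes this to ln(1/0.07), ~14.3 after exp

image_emb = F.normalize(image_emb, dim=-1)   # L2-normalize both sides
text_emb = F.normalize(text_emb, dim=-1)

logits_per_image = logit_scale * image_emb @ text_emb.t()   # (1, 2)
print(logits_per_image.softmax(dim=-1))      # probabilities over the two captions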
2 votes
1 answer
168 views

I keep getting this error even though CUDA is available: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu

I'm training a transformer model using RLlib's PPO algorithm, but I encounter a device mismatch error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, ...
Thanasis Mpoulionis
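
A generic sketch of the fix pattern, since the RLlib-specific wiring is not shown in the excerpt: make sure the model and every tensor it consumes are moved to the same device before the forward pass.

import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(8, 2).to(device)              # stand-in for the transformer model

batch = {"obs": torch.randn(4, 8)}                    # e.g. a batch built on the CPU
batch = {k: v.to(device) for k, v in batch.items()}   # move tensors alongside the model

out = model(batch["obs"])                             # no cuda:0 / cpu mismatch now
print(out.device)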
0 votes
0 answers
68 views

SageMaker Real-Time Endpoint Timeout Issues with Lambda for Parallel Data Processing

I’m new to AWS and struggling with an architecture involving AWS Lambda and a SageMaker real-time endpoint. I’m trying to process large batches of data rows efficiently, but I’m running into timeout ...
Kabir Juneja
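
A sketch of one common pattern (the endpoint name and payload format are placeholders, not from the question): split large batches into small chunks and invoke the real-time endpoint once per chunk so each call stays well under the invocation timeout; for very large jobs, SageMaker asynchronous inference or batch transform is usually a better fit than a real-time endpoint behind Lambda.

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
ENDPOINT_NAME = "my-realtime-endpoint"     # placeholder

def predict_in_chunks(rows, chunk_size=100):
    results = []
    for i in range(0, len(rows), chunk_size):
        chunk = rows[i:i + chunk_size]
        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Body=json.dumps({"inputs": chunk}),
        )
        # Assumes the endpoint returns a JSON list of predictions per chunk.
        results.extend(json.loads(response["Body"].read()))
    return results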
