1,110 questions
0 votes · 0 answers · 28 views
TypeError: Transformer.transform() missing 1 required positional argument: 'yy'
I am working on a school finder and I am just trying to convert eastings and northings to latitudes and longitudes. The code above is to check the nearest school to the user. The problem is that apparently I ...
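A minimal sketch of the usual pyproj usage, assuming the question is about pyproj's Transformer (the missing 'yy' argument in the error matches its transform(xx, yy) signature) and British National Grid coordinates; the example values are illustrative, not from the question:
from pyproj import Transformer
# transform() takes the two coordinates as separate positional arguments (xx, yy);
# passing a single (easting, northing) tuple raises the "missing 'yy'" TypeError.
transformer = Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True)
easting, northing = 530000, 180000  # example coordinates
lon, lat = transformer.transform(easting, northing)
print(lat, lon)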
0 votes · 1 answer · 608 views
AttributeError: 'DynamicCache' object has no attribute 'seen_tokens'
I'm following the Hands-On Large Language Models book to learn more about LLMs. I'm trying to generate text using the "microsoft/Phi-3-mini-4k-instruct" model which is used in the book. ...
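One common cause (an assumption here, since the full traceback is not shown) is a version mismatch: the model's remote code references DynamicCache.seen_tokens, which newer transformers releases removed. A minimal generation sketch that relies on the library's built-in Phi-3 implementation instead of the remote code:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_id = "microsoft/Phi-3-mini-4k-instruct"
# Omitting trust_remote_code=True makes transformers use its own Phi3 classes,
# which do not depend on the removed DynamicCache.seen_tokens attribute.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Explain attention in one sentence.", max_new_tokens=50)[0]["generated_text"])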
0 votes · 0 answers · 150 views
Why does my Transformer model not work well when dealing with single-cell multi-omic data?
The complete code and data are available at: Google Disk
I'm working on a high-dimensional regression problem and have built a Transformer-based model in PyTorch. While the model trains, I'm observing ...
0 votes · 0 answers · 169 views
Cannot import `QwenForCausalLM` after installing `v4.51.3-Qwen2.5-Omni-preview` tag; pip installs 4.52.0.dev0 instead
Description:
I am trying to install the Hugging Face Transformers version that supports the Qwen2.5-Omni model. According to the official docs, the correct tag to install is v4.51.3-Qwen2.5-Omni-...
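A sketch of one way to install directly from the Git tag (the tag name is taken from the question; whether the tag's own version string reads exactly 4.51.3 is not verified here):
# Installing the exact release tag, rather than a branch, pins the revision pip builds from:
#   pip install "git+https://github.com/huggingface/transformers.git@v4.51.3-Qwen2.5-Omni-preview"
import transformers
print(transformers.__version__)  # check which build actually got installed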
1 vote · 1 answer · 91 views
Can I use a custom attention layer while still leveraging a pre-trained BERT model?
In the paper “Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks”, they multiply a similarity matrix with the attention scores inside the attention layer. I want to ...
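A rough sketch of one way to do this, assuming a hypothetical CustomBertSelfAttention subclass that injects the similarity matrix; the pre-trained weights are preserved because only the self-attention modules are swapped after loading:
import torch.nn as nn
from transformers import BertModel
from transformers.models.bert.modeling_bert import BertSelfAttention

class CustomBertSelfAttention(BertSelfAttention):
    # Hypothetical subclass: override forward() to multiply the prior-knowledge
    # similarity matrix into the attention scores (details omitted).
    pass

model = BertModel.from_pretrained("bert-base-uncased")
for layer in model.encoder.layer:
    custom = CustomBertSelfAttention(model.config)
    custom.load_state_dict(layer.attention.self.state_dict())  # keep the pre-trained weights
    layer.attention.self = custom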
1 vote · 1 answer · 162 views
(NVIDIA/nv-embed-v2) ImportError: cannot import name 'MISTRAL_INPUTS_DOCSTRING' from 'transformers.models.mistral.modeling_mistral'
My code:
from transformers import AutoTokenizer, AutoModel
model_name = "NVIDIA/nv-embed-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(...
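A hedged guess at the cause, plus a sketch: the model's remote code imports MISTRAL_INPUTS_DOCSTRING, a private constant that recent transformers releases removed, so the custom modeling file fails to import. Under that assumption, pinning an older transformers release (the exact version is not verified here; check the model card) avoids the error:
#   pip install "transformers==<version listed on the model card>"
from transformers import AutoTokenizer, AutoModel

model_name = "NVIDIA/nv-embed-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# trust_remote_code=True is needed because the model ships its own modeling code.
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)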
0 votes · 1 answer · 33 views
Is Multi-Head Self-Attention in Transformers permutation-invariant or equivariant, and how can I see it in practice?
I read that a function f is equivariant if f(P(x)) = P(f(x)), where P is a permutation.
So to check what equivariant and permutation-invariant mean, I wrote the following code
import torch
import torch....
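A small self-contained check, assuming a single nn.MultiheadAttention layer with batch_first=True and no positional encoding: permuting the input tokens should permute the output rows the same way (equivariance), whereas an invariant function would return identical outputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(1, 5, 8)                                  # (batch, tokens, dim)
perm = torch.randperm(5)

out, _ = attn(x, x, x)                                    # self-attention on the original order
out_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])    # self-attention on permuted tokens

# Equivariant: permuting the input permutes the output rows identically.
print(torch.allclose(out[:, perm], out_perm, atol=1e-6))  # True
# Invariant would instead require out == out_perm, which is generally False here.
print(torch.allclose(out, out_perm, atol=1e-6))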
0 votes · 0 answers · 36 views
Does Temporal Fusion Transformer Learn Global Trends Across the Entire Time Series?
I'm using the Temporal Fusion Transformer (TFT) to train on time series data, aiming to make real-time forecasts for a specific process unit at any point in time during operation.
However, for ...
0 votes · 0 answers · 61 views
Why does adding token and positional embeddings in transformers work?
In transformer models, I've noticed that token embeddings and positional embeddings are added together before being passed into the attention layers:
import torch
import torch.nn as nn
class ...
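A compact sketch of the pattern being asked about, with made-up sizes (not taken from the question): both embeddings live in the same d_model-dimensional space, so the sum gives every token a token-dependent and a position-dependent component that the attention layers can disentangle.
import torch
import torch.nn as nn

vocab_size, max_len, d_model = 1000, 128, 64               # illustrative sizes
tok_emb = nn.Embedding(vocab_size, d_model)
pos_emb = nn.Embedding(max_len, d_model)

ids = torch.randint(0, vocab_size, (2, 10))                # (batch, seq_len)
positions = torch.arange(ids.size(1)).unsqueeze(0)         # (1, seq_len), broadcast over the batch
x = tok_emb(ids) + pos_emb(positions)                      # (2, 10, 64), fed to the attention layers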
0 votes · 0 answers · 63 views
Why is attention scaled by sqrt(d_k) in Transformer architectures?
I have this code in a transformer model:
keys = x @ W_key
queries = x @ W_query
values = x @ W_value
attention_scores = queries @ keys.T
# keys.shape[-1]**0.5: used to scale the attention scores before ...
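A completed version of that snippet, assuming small illustrative shapes: each raw score is a sum of d_k products, so its variance grows with d_k; dividing by sqrt(d_k) keeps the softmax inputs in a range where the softmax does not saturate.
import torch

torch.manual_seed(0)
d_in, d_k = 16, 8
x = torch.randn(5, d_in)                       # 5 tokens
W_query = torch.randn(d_in, d_k)
W_key = torch.randn(d_in, d_k)
W_value = torch.randn(d_in, d_k)

queries = x @ W_query
keys = x @ W_key
values = x @ W_value

attention_scores = queries @ keys.T
# keys.shape[-1]**0.5 == sqrt(d_k): scale the scores before the softmax
attention_weights = torch.softmax(attention_scores / keys.shape[-1] ** 0.5, dim=-1)
context = attention_weights @ values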
0 votes · 0 answers · 87 views
Training and validation losses do not reduce when fine-tuning ViTPose from huggingface
I am trying to fine-tune a transformer/encoder-based pose estimation model available here:
https://huggingface.co/docs/transformers/en/model_doc/vitpose
When passing the "labels" attribute to ...
1 vote · 0 answers · 45 views
Why is day_size set to 32 in temporal embedding code?
I am trying to understand the code for temporal embedding inside the Autoformer implementation in PyTorch.
https://github.com/thuml/Autoformer/blob/main/layers/Embed.py
class TemporalEmbedding(nn....
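A hedged reading of that constant: the embedding indexes day-of-month values directly (1 through 31), so the lookup table needs 32 rows for index 31 to be valid, and row 0 simply goes unused. A minimal sketch of the same idea:
import torch
import torch.nn as nn

d_model = 16
day_size = 32                        # days of month are 1..31; row 0 is unused padding
day_embed = nn.Embedding(day_size, d_model)

days = torch.tensor([1, 15, 31])     # raw day-of-month values used as indices
print(day_embed(days).shape)         # torch.Size([3, 16]); index 31 would fail with day_size=31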
2 votes · 1 answer · 79 views
Logits Don't Change in a Custom Reimplementation of a CLIP model [PyTorch]
The problem
The similarity scores are almost the same for texts that describe both a photo of a cat and a dog (the photo is of a cat).
Cat similarity: tensor([[-3.5724]], grad_fn=<MulBackward0>)
...
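A hedged sketch of two details that often explain near-identical scores in CLIP reimplementations (both are assumptions about the custom code, which is not shown): the image and text embeddings need to be L2-normalized before the dot product, and the product is scaled by a learned temperature.
import torch
import torch.nn.functional as F

def clip_logits(image_emb: torch.Tensor, text_emb: torch.Tensor, logit_scale: torch.Tensor):
    # Without normalization the dot products mostly reflect embedding norms,
    # so different captions can end up with nearly identical scores.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return logit_scale.exp() * image_emb @ text_emb.T   # (num_images, num_texts)

# Illustrative usage with random tensors standing in for encoder outputs:
img = torch.randn(1, 512)
txt = torch.randn(2, 512)            # e.g. "a photo of a cat", "a photo of a dog"
scale = torch.tensor(2.6593)         # log(1/0.07), CLIP's usual initial temperature
print(clip_logits(img, txt, scale))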
2 votes · 1 answer · 168 views
I keep getting this error even though CUDA is available: 'RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu'
I'm training a transformer model using RLlib's PPO algorithm, but I encounter a device mismatch error:
RuntimeError: Expected all tensors to be on the same device, but found
at least two devices, ...
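The generic fix, sketched without any RLlib specifics (the PPO config is not shown in the excerpt): every tensor the model touches has to be moved to the same device as the model's parameters before the forward pass.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(8, 2).to(device)   # stand-in for the transformer model

batch = torch.randn(4, 8)                  # e.g. an observation batch arriving on CPU
batch = batch.to(device)                   # move inputs to the model's device
out = model(batch)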
0 votes · 0 answers · 68 views
SageMaker Real-Time Endpoint Timeout Issues with Lambda for Parallel Data Processing
I’m new to AWS and struggling with an architecture involving AWS Lambda and a SageMaker real-time endpoint. I’m trying to process large batches of data rows efficiently, but I’m running into timeout ...
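A minimal sketch of the invocation side, assuming a boto3 call from the Lambda handler and a hypothetical endpoint name; real-time endpoints enforce a per-request timeout of about 60 seconds, so large batches are usually split into smaller payloads (or moved to asynchronous/batch inference) rather than sent in one call.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def invoke_chunk(rows):
    # Hypothetical endpoint name; each chunk stays small enough to finish
    # well inside the real-time endpoint's per-request timeout.
    response = runtime.invoke_endpoint(
        EndpointName="my-realtime-endpoint",
        ContentType="application/json",
        Body=json.dumps({"rows": rows}),
    )
    return json.loads(response["Body"].read())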