1,110 questions
0 votes · 0 answers · 28 views
TypeError: Transformer.transform() missing 1 required positional argument: 'yy'
I am working on a school finder and I am just trying to convert eastings and northings to latitudes and longitudes. The code above is to check the nearest school to the user. The problem is that apparently I ...
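A minimal sketch of the usual pyproj usage, assuming the question is about pyproj's Transformer (the missing 'yy' argument in the error matches its transform(xx, yy) signature) and British National Grid coordinates; the example values are illustrative, not from the question:
from pyproj import Transformer
# transform() takes the two coordinates as separate positional arguments (xx, yy);
# passing a single (easting, northing) tuple raises the "missing 'yy'" TypeError.
transformer = Transformer.from_crs("EPSG:27700", "EPSG:4326", always_xy=True)
easting, northing = 530000, 180000  # example coordinates
lon, lat = transformer.transform(easting, northing)
print(lat, lon)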
0 votes · 1 answer · 608 views
AttributeError: 'DynamicCache' object has no attribute 'seen_tokens'
I'm following the Hands-On Large Language Models book to learn more about LLMs. I'm trying to generate text using the "microsoft/Phi-3-mini-4k-instruct" model which is used in the book. ...
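One common cause (an assumption here, since the full traceback is not shown) is a version mismatch: the model's remote code references DynamicCache.seen_tokens, which newer transformers releases removed. A minimal generation sketch that relies on the library's built-in Phi-3 implementation instead of the remote code:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
model_id = "microsoft/Phi-3-mini-4k-instruct"
# Omitting trust_remote_code=True makes transformers use its own Phi3 classes,
# which do not depend on the removed DynamicCache.seen_tokens attribute.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
generator = pipeline("text-generation", model=model, tokenizer=tokenizer)
print(generator("Explain attention in one sentence.", max_new_tokens=50)[0]["generated_text"])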
0 votes · 0 answers · 150 views
Why does my Transformer model not work well when dealing with single-cell multi-omic data?
The complete code and data are available at: Google Disk
I'm working on a high-dimensional regression problem and have built a Transformer-based model in PyTorch. While the model trains, I'm observing ...
0 votes · 0 answers · 169 views
Cannot import `QwenForCausalLM` after installing `v4.51.3-Qwen2.5-Omni-preview` tag; pip installs 4.52.0.dev0 instead
Description:
I am trying to install the Hugging Face Transformers version that supports the Qwen2.5-Omni model. According to the official docs, the correct tag to install is v4.51.3-Qwen2.5-Omni-...
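A sketch of one way to install directly from the Git tag (the tag name is taken from the question; whether the tag's own version string reads exactly 4.51.3 is not verified here):
# Installing the exact release tag, rather than a branch, pins the revision pip builds from:
#   pip install "git+https://github.com/huggingface/transformers.git@v4.51.3-Qwen2.5-Omni-preview"
import transformers
print(transformers.__version__)  # check which build actually got installed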
1 vote · 1 answer · 91 views
Can I use a custom attention layer while still leveraging a pre-trained BERT model?
In the paper “Using Prior Knowledge to Guide BERT’s Attention in Semantic Textual Matching Tasks”, they multiply a similarity matrix with the attention scores inside the attention layer. I want to ...
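A rough sketch of one way to do this, assuming a hypothetical CustomBertSelfAttention subclass that injects the similarity matrix; the pre-trained weights are preserved because only the self-attention modules are swapped after loading:
import torch.nn as nn
from transformers import BertModel
from transformers.models.bert.modeling_bert import BertSelfAttention

class CustomBertSelfAttention(BertSelfAttention):
    # Hypothetical subclass: override forward() to multiply the prior-knowledge
    # similarity matrix into the attention scores (details omitted).
    pass

model = BertModel.from_pretrained("bert-base-uncased")
for layer in model.encoder.layer:
    custom = CustomBertSelfAttention(model.config)
    custom.load_state_dict(layer.attention.self.state_dict())  # keep the pre-trained weights
    layer.attention.self = custom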
1 vote · 1 answer · 162 views
(NVIDIA/nv-embed-v2) ImportError: cannot import name 'MISTRAL_INPUTS_DOCSTRING' from 'transformers.models.mistral.modeling_mistral'
My code:
from transformers import AutoTokenizer, AutoModel
model_name = "NVIDIA/nv-embed-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(...
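A hedged guess at the cause, plus a sketch: the model's remote code imports MISTRAL_INPUTS_DOCSTRING, a private constant that recent transformers releases removed, so the custom modeling file fails to import. Under that assumption, pinning an older transformers release (the exact version is not verified here; check the model card) avoids the error:
#   pip install "transformers==<version listed on the model card>"
from transformers import AutoTokenizer, AutoModel

model_name = "NVIDIA/nv-embed-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# trust_remote_code=True is needed because the model ships its own modeling code.
model = AutoModel.from_pretrained(model_name, trust_remote_code=True)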
0 votes · 1 answer · 33 views
Is Multi-Head Self-Attention in Transformers permutation-invariant or equivariant, and how can I see it in practice?
I read that a function f is equivariant if f(P(x)) = P(f(x)), where P is a permutation.
So to check what equivariant and permutation-invariant mean, I wrote the following code
import torch
import torch....
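A small self-contained check, assuming a single nn.MultiheadAttention layer with batch_first=True and no positional encoding: permuting the input tokens should permute the output rows the same way (equivariance), whereas an invariant function would return identical outputs.
import torch
import torch.nn as nn

torch.manual_seed(0)
attn = nn.MultiheadAttention(embed_dim=8, num_heads=2, batch_first=True)
x = torch.randn(1, 5, 8)                                  # (batch, tokens, dim)
perm = torch.randperm(5)

out, _ = attn(x, x, x)                                    # self-attention on the original order
out_perm, _ = attn(x[:, perm], x[:, perm], x[:, perm])    # self-attention on permuted tokens

# Equivariant: permuting the input permutes the output rows identically.
print(torch.allclose(out[:, perm], out_perm, atol=1e-6))  # True
# Invariant would instead require out == out_perm, which is generally False here.
print(torch.allclose(out, out_perm, atol=1e-6))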
0 votes · 0 answers · 36 views
Does Temporal Fusion Transformer Learn Global Trends Across the Entire Time Series?
I'm using the Temporal Fusion Transformer (TFT) to train on time series data, aiming to make real-time forecasts for a specific process unit at any point in time during operation.
However, for ...
0 votes · 0 answers · 61 views
Why does adding token and positional embeddings in transformers work?
In transformer models, I've noticed that token embeddings and positional embeddings are added together before being passed into the attention layers:
import torch
import torch.nn as nn
class ...
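A compact sketch of the pattern being asked about, with made-up sizes (not taken from the question): both embeddings live in the same d_model-dimensional space, so the sum gives every token a token-dependent and a position-dependent component that the attention layers can disentangle.
import torch
import torch.nn as nn

vocab_size, max_len, d_model = 1000, 128, 64               # illustrative sizes
tok_emb = nn.Embedding(vocab_size, d_model)
pos_emb = nn.Embedding(max_len, d_model)

ids = torch.randint(0, vocab_size, (2, 10))                # (batch, seq_len)
positions = torch.arange(ids.size(1)).unsqueeze(0)         # (1, seq_len), broadcast over the batch
x = tok_emb(ids) + pos_emb(positions)                      # (2, 10, 64), fed to the attention layers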
0 votes · 0 answers · 63 views
Why is attention scaled by sqrt(d_k) in Transformer architectures?
I have this code in a transformer model:
keys = x @ W_key
queries = x @ W_query
values = x @ W_value
attention_scores = queries @ keys.T
# keys.shape[-1]**0.5: used to scale the attention scores before ...
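A completed version of that snippet, assuming small illustrative shapes: each raw score is a sum of d_k products, so its variance grows with d_k; dividing by sqrt(d_k) keeps the softmax inputs in a range where the softmax does not saturate.
import torch

torch.manual_seed(0)
d_in, d_k = 16, 8
x = torch.randn(5, d_in)                       # 5 tokens
W_query = torch.randn(d_in, d_k)
W_key = torch.randn(d_in, d_k)
W_value = torch.randn(d_in, d_k)

queries = x @ W_query
keys = x @ W_key
values = x @ W_value

attention_scores = queries @ keys.T
# keys.shape[-1]**0.5 == sqrt(d_k): scale the scores before the softmax
attention_weights = torch.softmax(attention_scores / keys.shape[-1] ** 0.5, dim=-1)
context = attention_weights @ values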
0 votes · 0 answers · 87 views
Training and validation losses do not reduce when fine-tuning ViTPose from huggingface
I am trying to fine-tune a transformer/encoder-based pose estimation model available here:
https://huggingface.co/docs/transformers/en/model_doc/vitpose
When passing the "labels" attribute to ...
1 vote · 0 answers · 45 views
Why is day_size set to 32 in temporal embedding code?
I am trying to understand the code for temporal embedding inside the Autoformer implementation in PyTorch.
https://github.com/thuml/Autoformer/blob/main/layers/Embed.py
class TemporalEmbedding(nn....
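A hedged reading of that constant: the embedding indexes day-of-month values directly (1 through 31), so the lookup table needs 32 rows for index 31 to be valid, and row 0 simply goes unused. A minimal sketch of the same idea:
import torch
import torch.nn as nn

d_model = 16
day_size = 32                        # days of month are 1..31; row 0 is unused padding
day_embed = nn.Embedding(day_size, d_model)

days = torch.tensor([1, 15, 31])     # raw day-of-month values used as indices
print(day_embed(days).shape)         # torch.Size([3, 16]); index 31 would fail with day_size=31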
2 votes · 1 answer · 79 views
Logits Don't Change in a Custom Reimplementation of a CLIP model [PyTorch]
The problem
The similarity scores are almost the same for texts that describe both a photo of a cat and a dog (the photo is of a cat).
Cat similarity: tensor([[-3.5724]], grad_fn=<MulBackward0>)
...
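A hedged sketch of two details that often explain near-identical scores in CLIP reimplementations (both are assumptions about the custom code, which is not shown): the image and text embeddings need to be L2-normalized before the dot product, and the product is scaled by a learned temperature.
import torch
import torch.nn.functional as F

def clip_logits(image_emb: torch.Tensor, text_emb: torch.Tensor, logit_scale: torch.Tensor):
    # Without normalization the dot products mostly reflect embedding norms,
    # so different captions can end up with nearly identical scores.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    return logit_scale.exp() * image_emb @ text_emb.T   # (num_images, num_texts)

# Illustrative usage with random tensors standing in for encoder outputs:
img = torch.randn(1, 512)
txt = torch.randn(2, 512)            # e.g. "a photo of a cat", "a photo of a dog"
scale = torch.tensor(2.6593)         # log(1/0.07), CLIP's usual initial temperature
print(clip_logits(img, txt, scale))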
2 votes · 1 answer · 168 views
I keep getting this error even though CUDA is available: 'RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu'
I'm training a transformer model using RLlib's PPO algorithm, but I encounter a device mismatch error:
RuntimeError: Expected all tensors to be on the same device, but found
at least two devices, ...
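The generic fix, sketched without any RLlib specifics (the PPO config is not shown in the excerpt): every tensor the model touches has to be moved to the same device as the model's parameters before the forward pass.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = torch.nn.Linear(8, 2).to(device)   # stand-in for the transformer model

batch = torch.randn(4, 8)                  # e.g. an observation batch arriving on CPU
batch = batch.to(device)                   # move inputs to the model's device
out = model(batch)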
0 votes · 0 answers · 68 views
SageMaker Real-Time Endpoint Timeout Issues with Lambda for Parallel Data Processing
I’m new to AWS and struggling with an architecture involving AWS Lambda and a SageMaker real-time endpoint. I’m trying to process large batches of data rows efficiently, but I’m running into timeout ...
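A minimal sketch of the invocation side, assuming a boto3 call from the Lambda handler and a hypothetical endpoint name; real-time endpoints enforce a per-request timeout of about 60 seconds, so large batches are usually split into smaller payloads (or moved to asynchronous/batch inference) rather than sent in one call.
import json
import boto3

runtime = boto3.client("sagemaker-runtime")

def invoke_chunk(rows):
    # Hypothetical endpoint name; each chunk stays small enough to finish
    # well inside the real-time endpoint's per-request timeout.
    response = runtime.invoke_endpoint(
        EndpointName="my-realtime-endpoint",
        ContentType="application/json",
        Body=json.dumps({"rows": rows}),
    )
    return json.loads(response["Body"].read())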