
NLP Collective

A collective focused on NLP (natural language processing), the transformation or extraction of useful information from natural language data.
38.4k questions (+6), 11.3k members (+41)

Pinned content


NLP admins have deemed these posts noteworthy.

Pinned collection: 9 votes, 2k views

Natural Language Processing FAQ

Frequently asked questions relating to NLP. Many of these are questions that get asked over and over; duplicates will likely be closed in favor of these. Add the best answer (using the ...
Berthold (101)

Can you answer these questions?


These questions still don't have an answer

0 votes, 0 answers, 15 views

Transformer model outputs degrade after ONNX export — what could be causing this?

I’ve exported a fine-tuned BERT-based QA model to ONNX for faster inference, but I’m noticing that the predictions from the ONNX model are consistently less accurate than those from the original ...
0 votes, 0 answers, 16 views

LangChain HuggingFace ChatHuggingFace raises StopIteration with any model

I’m trying to use LangChain’s Hugging Face integration to chat with the model TinyLlama/TinyLlama-1.1B-Chat-v1.0 for the very first time, but I’m getting a StopIteration error when calling .invoke(). ...
0 votes, 0 answers, 20 views

ONNX Runtime Helsinki-NLP in Java

Has anyone managed to translate something using Helsinki-NLP and ONNX Runtime in Java? Using a Python script, I generated these files:
├── encoder_model.onnx
├── decoder_model.onnx
├── ...
0 votes, 0 answers, 43 views

Training with spaCy from command line, don't know why gpu-id not recognized

I am having the hardest time getting my training session to use GPU 0, which by every measure is present and correctly set up with CUDA 12.2. When I try to do python -m spacy train base_config....
1 vote, 0 answers, 60 views

How to make Microsoft Presidio detect and mask Indian names and unusual text patterns in banking data?

I’m working on anonymizing PII in banking text using Microsoft Presidio. The built-in PERSON recognizer (which uses spaCy under the hood) works for some Western names and when the sentence is clear ...

Looking for an extra challenge?


These questions have a bounty on them

2 votes, 3 answers, 127 views, +150 bounty

Huggingface Model initialization for inference

I am working with the OmniEmbed model (https://huggingface.co/Tevatron/OmniEmbed-v0.1), which is built on Qwen2.5 7B. My goal is to get a multimodal embedding for images and videos. I have the following ...