language-model

Feature request

We currently have ViLT in the library, which, among other tasks, is capable of performing visual question answering (VQA).

It would be great to have a pipeline for this task, with the following API:

from transformers import pipeline

pipe = pipeline("vqa")
pipe("cats.png", "how many cats are there?")

The paper states:

Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my
dog is hairy it chooses hairy.

This means that exactly 15% of the tokens are certain to be chosen.

In https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, however,
each token independently has a 15% chance of going through the follow-up procedure.
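
To make the difference between the two readings concrete, here is a minimal sketch in plain Python. The function names and the 20-token toy input are made up for illustration and are not taken from the paper or from the linked repository. The first function always selects a fixed number of positions, while the second flips an independent coin per token, so its count varies from call to call.

```python
import random


def select_exact_fraction(tokens, fraction=0.15, rng=random):
    """Reading 1: choose a fixed number of positions (at least one)."""
    k = max(1, round(len(tokens) * fraction))
    return sorted(rng.sample(range(len(tokens)), k))


def select_per_token(tokens, prob=0.15, rng=random):
    """Reading 2: each position is chosen independently with probability prob."""
    return [i for i in range(len(tokens)) if rng.random() < prob]


if __name__ == "__main__":
    rng = random.Random(0)
    tokens = [f"tok{i}" for i in range(20)]  # hypothetical toy sentence
    for _ in range(5):
        exact = select_exact_fraction(tokens, rng=rng)
        bernoulli = select_per_token(tokens, rng=rng)
        # The first count is constant; the second may differ on every draw.
        print(len(exact), len(bernoulli))
```

Over a large corpus both approaches select about 15% of tokens on average; they differ only in how much the count fluctuates for an individual sequence.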

When users run our tutorial notebooks, they see quite a lot of confusing log messages.

For example, there are log messages about apex and others:

"INFO - haystack.document_stores.base -  Numba not found, replacing njit() with no-op implementation. Enable it with 'pip install numba'.\n",
"INFO - haystack.modeling.model.optimization -  apex not found, won't use it. See https://nvidia.g

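One possible way to quiet such messages from the notebook side is to raise the level of the parent logger with Python's standard `logging` module. This is only a sketch: the helper name `quiet_logger` is made up, and it assumes the library does not reset logger levels itself after the call. The logger names are the ones visible in the messages above.

```python
import logging


def quiet_logger(name="haystack", level=logging.WARNING):
    """Raise the threshold of a parent logger.

    Child loggers such as "haystack.document_stores.base" inherit this
    effective level unless they set a level of their own.
    """
    logging.getLogger(name).setLevel(level)


class ListHandler(logging.Handler):
    """Collects formatted messages in a list so the effect can be inspected."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(record.getMessage())


if __name__ == "__main__":
    handler = ListHandler()
    logging.getLogger().addHandler(handler)

    quiet_logger("haystack")
    child = logging.getLogger("haystack.document_stores.base")
    child.info("this INFO message is dropped")
    child.warning("this WARNING message still gets through")
    print(handler.messages)
```

Setting the level on the parent keeps warnings and errors visible, which is usually preferable to disabling the logger outright.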
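Polyphonic characters are currently handled with pypinyin or g2pM, whose accuracy is limited. We would like to build a polyphone prediction model based on BERT (or ERNIE). Put simply, suppose a language has 100 polyphonic characters and each has at most 3 pronunciations: we can then attach 100 three-way classifiers (a simple fc layer each is enough) after BERT and, at prediction time, look up the classifier for the character in question and classify with it.
Reference paper:
tencent_polyphone.pdf

The data provided by https://github.com/kakaobrain/g2pM can be used.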
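
The per-character head lookup described above can be sketched roughly as follows. This is a plain-Python stand-in, not a real implementation: the hidden size, the three example characters, the random weights and the random "encoder output" are all assumptions made for illustration, and an actual model would use BERT or ERNIE hidden states with trained layers.

```python
import random

HIDDEN = 8        # toy hidden size; a real encoder would be much larger
NUM_CLASSES = 3   # at most 3 pronunciations per polyphonic character


def make_head(rng):
    """One fully connected layer: NUM_CLASSES weight rows plus a bias each."""
    return {
        "w": [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(NUM_CLASSES)],
        "b": [0.0] * NUM_CLASSES,
    }


def classify(head, hidden):
    """Return the index of the highest-scoring pronunciation class."""
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + bias
        for row, bias in zip(head["w"], head["b"])
    ]
    return max(range(NUM_CLASSES), key=logits.__getitem__)


rng = random.Random(0)
polyphones = ["行", "重", "长"]  # illustrative subset, not the full inventory
heads = {ch: make_head(rng) for ch in polyphones}


def predict(char, hidden):
    """Look up the head for this character; None means it is not a polyphone."""
    head = heads.get(char)
    return None if head is None else classify(head, hidden)


if __name__ == "__main__":
    # Stand-in for the encoder's hidden state at the character's position.
    hidden_state = [rng.uniform(-1, 1) for _ in range(HIDDEN)]
    print(predict("行", hidden_state), predict("我", hidden_state))
```

Keeping one small head per character means that only the head for the character being predicted is evaluated, and characters without a head are skipped.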

Advanced: multi-task BERT
![image](https://user-images.githubusercontent.com/24568452

Describe the bug
Setting "text-gen-type": "interactive" results in an IndexError: shape mismatch: indexing tensors could not be broadcast together with shapes [4], [3]. Other generation types work.

To Reproduce
Steps to reproduce the behavior:

1. Install, adapt 20B to local environment, add "text-gen-type": "interactive" config
2. Run inference
3. Enter arbitrary prompt when

Issue to track tutorial requests:

Deep Learning with PyTorch: A 60 Minute Blitz - #69
Sentence Classification - #79

I've been chatting with some others interested in training CLIP for different domain tasks. They expressed interest in a simple way to use a pre-trained text transformer.

Some basic support for Hugging Face or generic classes of transformers shouldn't be too crazy of an extension to what is already fleshed out.


Here are 904 public repositories matching this topic...

huggingface / transformers

brightmart / nlp_chinese_corpus

EleutherAI / gpt-neo

huggingface / tokenizers

codertimo / BERT-pytorch

deepset-ai / haystack

NVIDIA / NeMo

speechbrain / speechbrain

PaddlePaddle / PaddleSpeech

CLUEbenchmark / CLUE

tensorflow / lingvo

CyberZHG / keras-bert

zzw922cn / awesome-speech-recognition-speech-synthesis-papers

EleutherAI / gpt-neox

chiphuyen / lazynlp

Separius / awesome-sentence-embedding

salesforce / awd-lstm-lm

NVIDIA / OpenSeq2Seq

huggingface / pytorch-openai-transformer-lm

prabhuomkar / pytorch-cpp

nlpodyssey / spago

mlfoundations / open_clip

explosion / spacy-transformers

ymcui / Chinese-ELECTRA

mihail911 / nlp-library

brightmart / bert_language_understanding

microsoft / DeBERTa

SKTBrain / KoBERT

pykaldi / pykaldi

LiyuanLucasLiu / LM-LSTM-CRF
