language-model
Here are 904 public repositories matching this topic...
- Updated Oct 22, 2020
- Updated Feb 25, 2022 - Python
- Updated May 10, 2022 - Rust
Chooses 15% of tokens
The paper says:
Instead, the training data generator chooses 15% of tokens at random, e.g., in the sentence my
dog is hairy it chooses hairy.
This implies that 15% of the tokens are chosen for certain.
In https://github.com/codertimo/BERT-pytorch/blob/master/bert_pytorch/dataset/dataset.py#L68, however,
every single token independently has a 15% chance of going through the follow-up procedure.
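The difference between the two readings can be made concrete with a small sketch. It is not the code from the linked repository or from the paper; the function names `select_per_token` and `select_fixed_fraction` are hypothetical, and the rounding rule in the second function is an assumption made only for illustration.

```python
import random


def select_per_token(tokens, prob=0.15, rng=None):
    """Each token is picked independently with probability `prob`.

    The number of picked tokens varies from call to call and can be zero.
    """
    if rng is None:
        rng = random.Random()
    return [i for i in range(len(tokens)) if rng.random() < prob]


def select_fixed_fraction(tokens, fraction=0.15, rng=None):
    """A fixed share of the positions is picked, so the count never varies.

    Assumption: the count is rounded to the nearest integer, with at least
    one token picked for any non-empty sequence.
    """
    if rng is None:
        rng = random.Random()
    n = len(tokens)
    k = min(n, max(1, round(n * fraction)))
    return sorted(rng.sample(range(n), k))
```

With the first function a short sentence may end up with no selected token at all, while the second always selects the same number of positions for a given length. Which behaviour the paper intends is the question raised in the issue; the sketch only shows that the two are not equivalent.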
PositionalEmbedding
When users run our tutorial notebooks, they see a lot of cluttered log messages.
For example, there are log messages about apex and other optional dependencies:
"INFO - haystack.document_stores.base - Numba not found, replacing njit() with no-op implementation. Enable it with 'pip install numba'.\n",
"INFO - haystack.modeling.model.optimization - apex not found, won't use it. See https://nvidia.g
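One possible way to hide such messages in a notebook is to raise the level of the parent logger with Python's standard `logging` module. This is only a sketch: the logger name `haystack` is inferred from the names in the quoted messages, the helper `quiet_library_logs` is hypothetical, and it assumes the library does not set explicit levels on its child loggers.

```python
import logging


def quiet_library_logs(prefix="haystack", level=logging.WARNING):
    """Raise the level of the logger named `prefix`.

    Child loggers such as "haystack.document_stores.base" inherit this
    level as long as no level was set on them explicitly.
    """
    logger = logging.getLogger(prefix)
    logger.setLevel(level)
    return logger


# INFO messages like the ones quoted above are now filtered out,
# while warnings and errors still get through.
quiet_library_logs()
```

Whether this should happen inside the tutorials or be left to the user is a design decision for the maintainers.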
- Updated May 14, 2022 - Jupyter Notebook
- Updated May 13, 2022 - Python
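Polyphonic characters are currently handled with pypinyin or g2pM, whose accuracy is limited. We would like to build a polyphone prediction model based on BERT (or ERNIE). Put simply, suppose a language has 100 polyphonic characters, each with at most 3 pronunciations. Then 100 three-way classifiers (simple fc layers are enough) can be attached after BERT, and at prediction time the classifier that corresponds to the character is looked up and used for classification.
Reference paper:
tencent_polyphone.pdf
For data, the dataset provided by https://github.com/kakaobrain/g2pM can be used.
Advanced: multi-task BERT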
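The idea of one small classifier per polyphonic character can be sketched as follows. This is a toy illustration in plain Python, not an implementation for any particular framework: the encoder is left out entirely, `hidden` stands in for the BERT output vector at the character's position, and the names `make_head`, `classify` and `predict_reading` as well as the tiny `HIDDEN_SIZE` are assumptions made for the example.

```python
import random

HIDDEN_SIZE = 8    # placeholder; a real encoder has a much larger hidden size
MAX_READINGS = 3   # at most 3 pronunciations per character, as in the proposal


def make_head(rng, hidden_size=HIDDEN_SIZE, num_classes=MAX_READINGS):
    """Create one fully connected layer: a weight matrix and a bias vector."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_size)]
               for _ in range(num_classes)]
    bias = [0.0] * num_classes
    return weights, bias


def classify(head, hidden):
    """Apply the layer to one hidden vector and return the argmax class."""
    weights, bias = head
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(weights, bias)]
    return max(range(len(logits)), key=logits.__getitem__)


def predict_reading(heads, char, hidden):
    """Look up the head for `char`; return None if it is not a polyphone."""
    head = heads.get(char)
    if head is None:
        return None
    return classify(head, hidden)
```

In a real model the heads would be trained jointly with the encoder, and only the head that matches the character at a given position would contribute to the loss.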


Feature request
We currently have ViLT in the library, which, among other tasks, is capable of performing visual question answering (VQA).
It would be great to have a pipeline for this task, with the following API: