
language-model

Here are 820 public repositories matching this topic...

transformers
ikergarcia1996 commented Dec 10, 2021

🚀 Feature request

Fast Tokenizer for DeBERTa-V3 and mDeBERTa-V3

Motivation

DeBERTa V3 is an improved version of DeBERTa. With V3, the authors also released a multilingual model, "mDeBERTa-base", that outperforms XLM-R-base. However, DeBERTa V3 currently lacks a FastTokenizer implementation, which makes it impossible to use with some of the example scripts (They require a Fa

haystack
maxupp commented Nov 12, 2021

`_handle_duplicate_documents` and `_drop_duplicate_documents` in the Elasticsearch document store always report `self.index` as the index with the conflict, which is incorrect when the conflict occurred in a different index.

Edit: Upon further investigation, this is actually a lot worse. Using multiple indices with the ElasticSearch DocumentStore is completely broken because this is used in `_handle_duplicate_do

yt605155624 commented Jan 6, 2022

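Polyphonic characters are currently handled with pypinyin or g2pM, whose accuracy is limited. We would like to build a polyphone prediction model based on BERT (or ERNIE). Put simply, suppose a language has 100 polyphonic characters, each with at most 3 pronunciations. We can then attach 100 three-way classifiers (simple fc layers are enough) on top of BERT and, at prediction time, look up the classifier for the character in question and use it to classify.
Reference paper:
tencent_polyphone.pdf

For data, the dataset provided by https://github.com/kakaobrain/g2pM can be used.

Advanced: multi-task BERT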
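
The per-character classifier idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the proposed implementation: the character inventory `POLYPHONES`, the sizes, and the helper names are all hypothetical, the weights are random rather than trained, and a random vector stands in for the contextual embedding that BERT or ERNIE would produce for the target character.

```python
import math
import random

# Hypothetical inventory: each polyphonic character maps to its candidate readings.
POLYPHONES = {
    "行": ["xing2", "hang2"],
    "长": ["chang2", "zhang3"],
    "和": ["he2", "he4", "huo2"],
}
HIDDEN_SIZE = 8    # stand-in for the encoder hidden size (assumption)
MAX_READINGS = 3   # every head is a 3-way classifier, as in the proposal


def make_head(rng):
    """One fully connected layer: MAX_READINGS weight rows plus biases."""
    weights = [
        [rng.uniform(-0.1, 0.1) for _ in range(HIDDEN_SIZE)]
        for _ in range(MAX_READINGS)
    ]
    biases = [0.0] * MAX_READINGS
    return weights, biases


def softmax(logits):
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_reading(char, hidden, heads):
    """Look up the head for `char` and classify the encoder vector at its position."""
    weights, biases = heads[char]
    logits = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(weights, biases)
    ]
    # Characters with fewer than MAX_READINGS readings ignore the unused outputs.
    readings = POLYPHONES[char]
    probs = softmax(logits[: len(readings)])
    best = max(range(len(readings)), key=probs.__getitem__)
    return readings[best], probs


rng = random.Random(0)
heads = {char: make_head(rng) for char in POLYPHONES}
# Placeholder for the contextual vector the encoder would output (assumption).
hidden = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN_SIZE)]
reading, probs = predict_reading("行", hidden, heads)
```

Keeping one small head per character means only the head for the character being predicted is evaluated, and characters with fewer than three readings simply leave the extra outputs unused.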
![image](https://user-images.githubusercontent.com/24568452

Automatic Speech Recognition (ASR), Speaker Verification, Speech Synthesis, Text-to-Speech (TTS), Language Modelling, Singing Voice Synthesis (SVS), Voice Conversion (VC)

  • Updated Jan 11, 2022
