Pull requests: huggingface/tokenizers
Add spaces_between_special_tokens and cleanup tokenization spaces
#1095 opened Nov 3, 2022 by ArthurZucker
[WIP] Unigram trainer seems odd, ignoring some suffixes entirely
#1081 opened Oct 6, 2022 by Narsil
Add wasm32 emscripten target support for python binding
#1021 opened Jul 1, 2022 by messense
Use sentencepiece's protobuf module instead of the local protobuf file
#992 opened Apr 29, 2022 by tma15
Add normalization option to Chinese characters (using OpenCC) and separate symbols from merging
#473 opened Oct 19, 2020 by ecchochan
remove use of parallel iterators except in batch methods
#308 opened Jun 17, 2020 by epwalsh

