Focused crawls are collections of frequently-updated webcrawl data from narrow (as opposed to broad or wide) web crawls, often focused on a single domain or subdomain.
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
🤖 Dev tools to reliably understand text and automate conversations. Built-in NLU. Connect & deploy on any messaging channel (Slack, MS Teams, website, Telegram, etc).
While using tokenizers.create with the model and vocab file for a custom corpus, the code throws an error and is not able to generate the BERT vocab file
Error Message
ValueError: Mismatch vocabulary! All special tokens specified must be control tokens in the sentencepiece vocabulary.
To Reproduce
from gluonnlp.data import tokenizers
tokenizers.create('spm', model_p
A practical and feature-rich paraphrasing framework to augment human intents in text form to build robust NLU models for conversational engines. Created by Prithiviraj Damodaran. Open to pull requests and other forms of collaboration.
Description
While using tokenizers.create with the model and vocab file for a custom corpus, the code throws an error and is not able to generate the BERT vocab file
Error Message
ValueError: Mismatch vocabulary! All special tokens specified must be control tokens in the sentencepiece vocabulary.
To Reproduce
from gluonnlp.data import tokenizers
tokenizers.create('spm', model_p