Bugfix: custom UNK token conversion error #707
@@ -34,7 +34,8 @@ class Vocab(object):
     UNK = '<unk>'

     def __init__(self, counter, max_size=None, min_freq=1, specials=['<unk>', '<pad>'],
-                 vectors=None, unk_init=None, vectors_cache=None, specials_first=True):
+                 vectors=None, unk_init=None, vectors_cache=None, specials_first=True,
+                 unk_token=None):
zhangguanheng66
Mar 13, 2020
Collaborator
I'm not sure we want to add an extra argument to the API. Ideally, all special tokens should go in specials, and stoi should return the '<unk>' id if the token is not found.
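The lookup behaviour described in this comment can be sketched roughly as follows. This is a standalone illustration, not the torchtext implementation: `build_stoi` and its parameters are hypothetical names, and the only assumption is that unseen tokens should map to the index of whichever special token is designated as the unk token.

```python
from collections import defaultdict


def build_stoi(tokens, specials=('<unk>', '<pad>'), unk_token='<unk>'):
    """Hypothetical helper: build a token -> index mapping with an unk fallback."""
    # Specials come first; dict.fromkeys removes duplicates while keeping order.
    itos = list(dict.fromkeys(list(specials) + list(tokens)))
    # Raises ValueError if unk_token is not among the known tokens.
    unk_index = itos.index(unk_token)
    # Any token that was never seen falls back to the unk index.
    stoi = defaultdict(lambda: unk_index)
    stoi.update({tok: idx for idx, tok in enumerate(itos)})
    return stoi, itos


stoi, itos = build_stoi(['hello', 'world'])
print(stoi['hello'])       # 2
print(stoi['never_seen'])  # 0, the index of '<unk>'
```

Note that the fallback index is derived from `unk_token` rather than from a fixed string, which is the point the rest of this thread turns on.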
bentrevett
Mar 13, 2020
Contributor
How will we know which of the special tokens is the unk token without explicitly passing it as an argument?
zhangguanheng66
Mar 13, 2020
Collaborator
I'm not against hard-coding the unk token and assigning it to the Vocab class. I just don't like the idea of making the API longer and longer.
ohke
Mar 14, 2020
Author
Thanks for your comments.
I would prefer not to add more arguments to maintain, but the Field class accepts a custom unk_token.
The Field class adds unk_token to specials in its build_vocab method, so there is no problem when the vocab is accessed through a Field object.
We can add an argument check in the Vocab initializer that verifies unk_token exists in specials.
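A check along these lines might look like the sketch below. The function name `resolve_unk_token`, the `default_unk` parameter, and the error message are all hypothetical and are not taken from the PR. The sketch assumes that validation only applies when a custom unk_token is passed explicitly, and that the default token is used otherwise.

```python
def resolve_unk_token(specials, unk_token=None, default_unk='<unk>'):
    """Hypothetical helper: pick the unk token and validate a custom one."""
    if unk_token is None:
        # No custom token given: keep the default for backward compatibility.
        return default_unk
    if unk_token not in specials:
        # A custom unk token must also be listed among the special tokens.
        raise ValueError(
            "unk_token {!r} must be included in specials {!r}".format(
                unk_token, list(specials)))
    return unk_token


# A custom token that is listed in specials is accepted as-is.
print(resolve_unk_token(['<UNK>', '<pad>'], unk_token='<UNK>'))

# A custom token missing from specials is rejected early.
try:
    resolve_unk_token(['<unk>', '<pad>'], unk_token='<UNK>')
except ValueError as err:
    print("rejected:", err)
```

Failing at construction time keeps a mismatch between unk_token and specials from surfacing later as a confusing lookup error.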
ohke
Mar 16, 2020
Author
I added an argument validation and a test for it.

Bugfix: #618, #706
This change adds a new unk_token argument, which Field sets in its build_vocab method. For backward compatibility, this PR keeps Vocab.UNK as the default token.