The Wayback Machine - https://web.archive.org/web/20210119225543/https://github.com/microsoft/nlp-recipes/pull/586
update utils and examples #586

Merged
merged 12 commits into from May 13, 2020

Conversation


@saidbleik saidbleik commented May 5, 2020

Description

  • Updated the NER example to improve usability, based on feedback.
  • Switched to the AutoModel classes in transformers to support a wider range of pre-trained models.

Checklist:

  • My code follows the code style of this project, as detailed in our contribution guidelines.
  • I have added tests.
  • I have updated the documentation accordingly.

@review-notebook-app review-notebook-app bot commented May 5, 2020

Check out this pull request on ReviewNB

Review Jupyter notebook visual diffs & provide feedback on notebooks.


@saidbleik saidbleik requested a review from sharatsc May 5, 2020
@saidbleik saidbleik requested a review from daden-ms May 5, 2020
@@ -77,7 +87,7 @@ def get_inputs(batch, device, model_name, train_mode=True):
     Labels are only returned when train_mode is True.
     """
     batch = tuple(t.to(device) for t in batch)
-    if model_name.split("-")[0] in ["bert", "distilbert"]:
+    if model_name in list(TC_MODEL_CLASS):

@daden-ms

daden-ms May 6, 2020
Collaborator

model_name is not optional in this function; the documentation needs to be updated accordingly.

@daden-ms

daden-ms May 6, 2020
Collaborator

Can do_lower_case be handled within the function, instead of being passed as an argument, to avoid a mismatch?
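The suggestion could look roughly like the sketch below, which infers casing from the pre-trained checkpoint name inside a helper rather than accepting it as an argument. The helper name and the name-based heuristic are assumptions for illustration, not the project's actual code.

```python
def infer_do_lower_case(model_name: str) -> bool:
    """Hypothetical sketch: derive do_lower_case from the checkpoint name
    so callers cannot pass a value that mismatches the model.

    The substring heuristic below is an assumption based on common
    checkpoint naming (e.g. bert-base-uncased), not the repo's logic.
    """
    # Check "uncased" first, since "cased" is a substring of "uncased".
    if "uncased" in model_name:
        return True
    if "cased" in model_name:
        return False
    # Default for checkpoints that do not encode casing in their name.
    return False
```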

        "Text after tokenization with length {} has been truncated".format(
            len(new_tokens)
        )
    )
    new_tokens = new_tokens[:max_len]
    new_labels = new_labels[:max_len]

@daden-ms

daden-ms May 6, 2020
Collaborator

The functions fit_to_block_size and build_mask from abstractive_summarization_bertsum.py (originally from Hugging Face's transformers) can be reused here and are probably much easier to comprehend.
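For context, helpers in that style typically truncate or right-pad a token sequence to a fixed block size and build a matching attention mask. The sketch below is a rough illustrative stand-in (the function names, signatures, and pad id are assumptions), not the actual fit_to_block_size/build_mask implementations.

```python
def fit_to_block(token_ids, block_size, pad_token_id=0):
    """Truncate or right-pad a list of token ids to exactly block_size.

    Illustrative stand-in for a fit_to_block_size-style helper; the
    pad_token_id default is an assumption.
    """
    if len(token_ids) >= block_size:
        return token_ids[:block_size]
    return token_ids + [pad_token_id] * (block_size - len(token_ids))


def build_attention_mask(token_ids, pad_token_id=0):
    """Return 1 for real tokens and 0 for padding positions.

    Assumes pad_token_id never appears as a real token id.
    """
    return [0 if t == pad_token_id else 1 for t in token_ids]
```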

@daden-ms

daden-ms May 6, 2020
Collaborator

I feel processing the dataset one example at a time is a much cleaner way to see what preprocessing is needed. Also, if you choose to process the dataset one example at a time, there is no need to return a TensorDataset; a list of dictionaries suffices, and in your get_inputs function you can address each field by name instead of using indices 0 and 1 to track what the inputs require. It's up to you to decide whether you want to make this change or not.
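The reviewer's suggestion could be sketched as below: preprocess one example at a time into a dict, so a get_inputs-style function selects fields by name rather than tuple index. All names, signatures, and pad ids here are illustrative stand-ins, not the repo's actual API.

```python
def preprocess_example(token_ids, label_ids, max_len, pad_id=0):
    """Turn one example into a dict of fixed-length fields.

    Illustrative sketch: truncates/pads to max_len and builds the
    attention mask in one place. pad_id is an assumed default.
    """
    n_real = min(len(token_ids), max_len)
    pad = max_len - n_real
    return {
        "input_ids": token_ids[:max_len] + [pad_id] * pad,
        "attention_mask": [1] * n_real + [0] * pad,
        "labels": label_ids[:max_len] + [pad_id] * pad,
    }


def get_inputs(example, train_mode=True):
    """Select model inputs by field name instead of by tuple index.

    Stand-in for the repo's get_inputs, per the reviewer's suggestion.
    """
    inputs = {
        "input_ids": example["input_ids"],
        "attention_mask": example["attention_mask"],
    }
    # Labels are only returned when train_mode is True.
    if train_mode:
        inputs["labels"] = example["labels"]
    return inputs
```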

@@ -398,7 +405,9 @@ def fit(

    # init scheduler
    scheduler = Transformer.get_default_scheduler(
-        optimizer=self.optimizer, warmup_steps=warmup_steps, num_training_steps=max_steps
+        optimizer=self.optimizer,

@daden-ms

daden-ms May 6, 2020
Collaborator

Should we use a dataset instead of a dataloader, to make this consistent with the other transformer models?
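A dataset-first API could look roughly like this pure-Python sketch, where fit batches the dataset internally instead of taking a prebuilt dataloader. The batching helper, batch_size default, and return value are assumptions for illustration (a real implementation would use a torch DataLoader and run the training step per batch).

```python
def iter_batches(dataset, batch_size):
    """Yield consecutive slices of the dataset (illustrative stand-in
    for building a DataLoader inside fit)."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


def fit(dataset, batch_size=32):
    """Sketch: accept a dataset, build batches internally.

    Returns the number of batches processed, just so the sketch has an
    observable result; the forward/backward pass is elided.
    """
    n_batches = 0
    for batch in iter_batches(dataset, batch_size):
        # ... forward/backward pass on `batch` would go here ...
        n_batches += 1
    return n_batches
```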

@daden-ms

daden-ms May 6, 2020
Collaborator

Same for the predict function.

@daden-ms

daden-ms May 6, 2020
Collaborator

Can the predict function take the label_map and provide the entity labels directly, without requiring users to call get_predicted_token_labels?
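Folding the mapping into predict could be sketched as below. The helper name and the assumed label_map shape (class id to entity label string) are illustrative, not the repo's actual get_predicted_token_labels API.

```python
def map_predictions_to_labels(pred_ids, label_map):
    """Convert batches of predicted class ids to entity label strings.

    Illustrative sketch of what predict could do internally when given
    label_map, so callers get labels like "B-PER" directly.
    pred_ids: list of sequences of int class ids.
    label_map: dict mapping class id -> label string (assumed shape).
    """
    return [[label_map[i] for i in seq] for seq in pred_ids]
```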

Collaborator

@daden-ms daden-ms left a comment

Nice work. I put in some comments that might require significant changes. It's up to you to decide whether you want to make those changes or not; the PR looks great by itself.

@saidbleik saidbleik changed the title update NER example update utils and examples May 8, 2020
saidbleik added 8 commits May 8, 2020
Collaborator

@daden-ms daden-ms left a comment

Looks good to me. Just one last comment: do we want to check in the output of the cells in the Jupyter notebooks?

Collaborator Author

@saidbleik saidbleik commented May 11, 2020

Looks good to me. Just one last comment: do we want to check in the output of the cells in the Jupyter notebooks?

In general, yes. The NER notebook now includes the cell outputs.

@saidbleik saidbleik merged commit e02e3b5 into staging May 13, 2020
4 of 5 checks passed

  • gpu_unit_tests_linux #20200512.1 failed
  • cpu_unit_tests_linux #20200512.1 succeeded
  • license/cla: All CLA requirements met.
  • notebooks_cpu_unit_tests_linux Build #20200512.1 succeeded
  • notebooks_gpu_unit_tests_linux Build #20200512.1 succeeded