2,212 contributions in the last year
Activity overview
Contributed to huggingface/datasets, huggingface/transformers, huggingface/huggingface_hub and 5 other repositories
Contribution activity
July 2021
Created 47 commits in 5 repositories
Created a pull request in huggingface/datasets that received 10 comments
Add skip and take
As discussed in #2375 (comment), I added the IterableDataset.skip and IterableDataset.take methods, which allow basic splitting of iterable dat…
+187 −8 • 10 comments
Opened 23 other pull requests in 4 repositories
huggingface/datasets 18 merged, 1 open
- Fix pick default config name message
- Fix OSCAR Esperanto
- Fix bad config ids that name cache directories
- Fix c4 expected files
- Increase json reader block_size automatically
- Parallelize ETag requests
- Load Dataset from the Hub (NO DATASET SCRIPT)
- Allow dataset config kwargs to be None
- Streaming for the Json loader
- Streaming for the Pandas loader
- Streaming for the CSV loader
- Fix missing EOL issue in to_json for old versions of pandas
- Convert numpy scalar to python float in Pearsonr output
- Make any ClientError trigger retry in streaming mode (e.g. ClientOSError)
- Support pandas 1.3.0 read_csv
- Add c4.noclean infos
- Add mC4
- Add C4
- Add streaming in load a dataset docs
huggingface/huggingface_hub 2 merged
huggingface/datasets-viewer 1 open
huggingface/transformers 1 open
Reviewed 52 pull requests in 5 repositories
huggingface/datasets 46 pull requests
- Add Russian SuperGLUE
- Fix shuffle on IterableDataset that disables batching in case any functions were mapped
- Docs details
- Update PAN-X data URL in XTREME dataset
- Ignore empty batch when writing
- Add support for disable_progress_bar on Windows
- Update WikiANN data URL
- Enumerate all ner_tags values in WNUT 17 dataset
- Print absolute local paths in load_dataset error messages
- fix: 🐛 change string format to allow copy/paste to work in bash
- Minor documentation fix
- Fix Blog Authorship Corpus dataset
- feat: 🎸 add paperswithcode id for qasper dataset
- Use tqdm from tqdm_utils
- Delete extracted files when loading dataset
- Fix logging docstring
- Refactor patching to specific submodule
- More consistent naming
- add image-classification task template
- [Metrics] added wiki_split metrics
- Add speech processing tasks
- Faster search_batch for ElasticsearchIndex due to threading
- Use ETag of remote data files
- Minor fix tests with Windows paths
- Support remote data files
Some pull request reviews not shown.
huggingface/datasets-viewer 2 pull requests
huggingface/transformers 2 pull requests
huggingface/blog 1 pull request
huggingface/datasets-tagging 1 pull request
Created an issue in pandas-dev/pandas that received 6 comments
BUG: read_csv raises an error when both prefix and names are set to None
I have checked that this issue has not already been reported. I have confirmed this bug exists on the latest version of pandas. (optional)…
• 6 comments
Opened 4 other issues in 1 repository
huggingface/datasets 3 closed, 1 open
13 contributions in private repositories
Jul 7 – Jul 21

