Issues: huggingface/datasets
[2.6.1][2.7.0] Upgrade datasets to fix `TypeError: can only...
#5406
opened Jan 4, 2023 by
lhoestq
Open
11
Issues list
Loading column subset from parquet file produces error since version 2.13
#6039
opened Jul 16, 2023 by
kklemon
File "/home/zhizhou/anaconda3/envs/pytorch/lib/python3.10/site-packages/datasets/builder.py", line 992, in _download_and_prepare if str(split_generator.split_info.name).lower() == "all": AttributeError: 'str' object has no attribute 'split_info'. Did you mean: 'splitlines'?
#6038
opened Jul 15, 2023 by
BaiMeiyingxue
DownloadConfig.proxies not work when load_dataset_builder calling HfApi.dataset_info
#6032
opened Jul 14, 2023 by
codingl2k1
Switch to huggingface_hub's HfFileSystem
enhancement
New feature or request
#6017
opened Jul 11, 2023 by
lhoestq
[FR] map should reuse unchanged columns from the previous dataset to avoid disk usage
enhancement
New feature or request
good second issue
Issues a bit more difficult than "Good First" issues
#6013
opened Jul 10, 2023 by
NightMachinery
[FR] Transform Chaining, Lazy Mapping
enhancement
New feature or request
#6012
opened Jul 9, 2023 by
NightMachinery
Improve Dataset's string representation
enhancement
New feature or request
#6010
opened Jul 7, 2023 by
mariosasko
Get an error "OverflowError: Python int too large to convert to C long" when loading a large dataset
arrow
Related to Apache Arrow
#6007
opened Jul 5, 2023 by
silverriver
interleave_datasets & DataCollatorForLanguageModeling having a conflict ?
#6003
opened Jul 3, 2023 by
PonteIneptique
extend the map function so it can wrap around long text that does not fit in the context window
enhancement
New feature or request
#5997
opened Jun 29, 2023 by
siddhsql
Cannot reuse tokenizer object for dataset map
duplicate
This issue or pull request already exists
#5985
opened Jun 23, 2023 by
vikigenius
AutoSharding IterableDataset's when num_workers > 1
enhancement
New feature or request
#5984
opened Jun 23, 2023 by
mathephysicist
Only two cores are getting used in sagemaker with pytorch 3.10 kernel
#5981
opened Jun 22, 2023 by
mmr-crexi
Docs: make "repository structure" easier to find
documentation
Improvements or additions to documentation
#5971
opened Jun 21, 2023 by
severo
description disappearing from Info when Uploading a Dataset Created with `from_dict`
#5970
opened Jun 20, 2023 by
balisujohn
Issue with train_test_split maintaining the same underlying PyArrow Table
#5962
opened Jun 17, 2023 by
Oziel14

