Releases: deepspeedai/DeepSpeed
v0.17.1 Patch Release
What's Changed
- Update version.txt after v0.17.0 release by @loadams in #7326
- Ulysses Plus Docs by @stas00 in #7331
- UlyssesPlus Docs take 2 by @stas00 in #7332
- Improve Ulysses Plus Docs by @cynricfu in #7335
- Update config_utils.py by @qgallouedec in #7333
- Fix pytest version to 8.3.5 in hpu-gaudi actions by @raza-sikander in #7337
- Fix issue with symint input by @tohtana in #7243
- fp16 optimizer timers fix - TypeError: 'NoneType' object is not callable by @rraminen in #7330
- DeepNVMe update by @tjruwase in #7215
- Fix: modified the topkgating function and updated the test_moe file accordingly by @xiongjyu in #7163
- Fix LoRA arxiv reference by @emmanuel-ferdman in #7340
- Update folder name by @sfc-gh-truwase in #7343
- Improve overflow handling in ZeRO by @tjruwase in #6976
- Fix docs that are rendering Incorrectly by @felixgondwe in #7344
- Move pytest pinning from individual tests to requirements-dev.txt until fixed. by @loadams in #7327
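Several entries above (the fp16 optimizer timers fix, the ZeRO overflow-handling improvement) touch fp16 loss scaling. As a minimal illustration of the overflow-driven loss-scale update, a sketch with hypothetical function names, not DeepSpeed's actual implementation:

```python
import math

# Illustrative sketch of fp16 dynamic loss scaling as used in ZeRO-style
# training: if any gradient value is non-finite, the step is skipped and
# the loss scale is reduced. Names here are hypothetical, not DeepSpeed APIs.
def grads_overflowed(grads):
    """Return True if any gradient (here: lists of floats) contains inf/NaN."""
    return any(not math.isfinite(v) for g in grads for v in g)

def update_loss_scale(scale, overflow, factor=2.0):
    """Shrink the scale on overflow; otherwise leave it unchanged."""
    return scale / factor if overflow else scale
```

Real implementations also grow the scale again after a run of overflow-free steps; that part is omitted here.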
New Contributors
- @cynricfu made their first contribution in #7335
- @xiongjyu made their first contribution in #7163
- @sfc-gh-truwase made their first contribution in #7343
- @felixgondwe made their first contribution in #7344
Full Changelog: v0.17.0...v0.17.1
DeepSpeed v0.17.0
What's Changed
- Update next version in version.txt after 0.16.9 release. by @loadams in #7306
- Update COMMITTERS.md by @PKUWZP in #7305
- Fix AutoTP gathering replaced layer params when bias is not None by @HollowMan6 in #7257
- Fix the GPU memory usage of ZeRO-Offload (only update stage_1_and_2.py) by @arminzhu in #7309
- Fix: Update grad norm calculation for CPU offload by @therealnaveenkamal in #7302
- CI: prefer bf16 over fp16 by @stas00 in #7304
- tests/conftest.py: automatically add local deepspeed repo when running tests by @stas00 in #7317
- Update gaudi2 nightly,ci to latest 1.21.0 build by @raza-sikander in #7313
- anchor transformers version by @stas00 in #7316
- fix asymmetric in dequantize by @pencil-hub in #7283
- Ulysses SP for HF Integration by @stas00 in #7268
- Fix ci hang in torch2.7& improve ut by @inkcherry in #7321
- Bump to v0.17.0 by @sfc-gh-mwyatt in #7324
New Contributors
- @PKUWZP made their first contribution in #7305
- @arminzhu made their first contribution in #7309
- @therealnaveenkamal made their first contribution in #7302
- @pencil-hub made their first contribution in #7283
- @sfc-gh-mwyatt made their first contribution in #7324
Full Changelog: v0.16.9...v0.17.0
v0.16.9 Patch Release
What's Changed
- Update patch version after 0.16.8 release by @loadams in #7296
- Avoid graph break by removing another redundant requires grad false by @deepcharm in #7263
- Add qwen3 meta loading for AutoTP by @delock in #7293
- Modernize system executable detection across components by @emmanuel-ferdman in #7290
- Enable ZeRO set/get APIs for NVMe offload by @tjruwase in #7046
- Add qwen3moe meta loading for AutoTP by @ranzhejiang in #7297
- disable license check until the new license situation has been sorted… by @stas00 in #7301
- Fix extra_repr_str when weight is None / in zero-3 by @HollowMan6 in #7254
- [XPU] Support XCCL on deepspeed side by @ys950902 in #7299
New Contributors
- @emmanuel-ferdman made their first contribution in #7290
Full Changelog: v0.16.8...v0.16.9
v0.16.8 Patch Release
What's Changed
- Update version.txt after 0.16.7 release by @loadams in #7232
- Recommend using latest by @tohtana in #7233
- [NFC] Fix comment related to SP group by @c8ef in #7234
- Add cpu accelerator fp16 dtype support by @Yejing-Lai in #7207
- Update CPU torch version to 2.7 by @loadams in #7241
- Update README.md by @jizhang02 in #7246
- Fix compile error for nv_bloat162 by @loscrossos in #7248
- add Makefile to ease maintenance by @stas00 in #7267
- Fix fp8 gemm by @RezaYazdaniAminabadi in #7265
- [XPU] update xpu-max1100 CI workflow to torch 2.7 by @Liangliang-Ma in #7284
- Fix issues XPU tests hit with extra-index-url by @loadams in #7291
- Temporarily skip AIO tests due to an issue with runners by @loadams in #7288
- rollback #6726 by @delock in #7258
New Contributors
- @jizhang02 made their first contribution in #7246
- @loscrossos made their first contribution in #7248
Full Changelog: v0.16.7...v0.16.8
v0.16.7 Patch Release
What's Changed
- Update version.txt after 0.16.6 release by @loadams in #7218
- Fix release links by @tjruwase in #7219
- Fix pass for z3 and profiler by @tohtana in #7222
- Fix build on AMD GPUs (related to DeepCompile) by @HollowMan6 in #7224
- Add defence for DeepCompile w/o optimizer by @HollowMan6 in #7225
- Pass with_cuda arg for jit_load in OpBuilder by @HollowMan6 in #7226
- Make sure it's not None before offloading contiguous_grad_buffer by @HollowMan6 in #7227
Full Changelog: v0.16.6...v0.16.7
v0.16.6 Patch Release
What's Changed
- Update version.txt after 0.16.5 release by @loadams in #7180
- Cross layer overlapping for domino by @hwchen2017 in #7178
- async tp allreduce by @inkcherry in #7115
- Fix issue #5242 grad_norm and loss is nan by @Glaceon-Hyy in #7171
- Add qwen3 autotp support by @Yejing-Lai in #7187
- Update to new torch grad hook API: BF16Optimizer and Stage2 by @deepcharm in #7189
- Reland perf fix for nan inf check by @nelyahu in #7184
- Update to fix pydantic warning by @loadams in #7193
- update dependencies version info by @inkcherry in #7206
- HPU accelerator memory mapping is broken because of torch fill uninit memory by @oelayan7 in #7209
- Support complicated use cases with TiedLayerSpec by @limjcst in #7208
- Add defence for offload_states and reload_states w/o optimizer by @HollowMan6 in #7211
- DeepCompile for enhanced compiler integration by @tohtana in #7154
New Contributors
- @Glaceon-Hyy made their first contribution in #7171
- @limjcst made their first contribution in #7208
Full Changelog: v0.16.5...v0.16.6
v0.16.5 Patch Release
What's Changed
- Update version.txt after 0.16.4 release by @loadams in #7063
- fix an outdated doc wrt CUDA_VISIBLE_DEVICES by @stas00 in #7058
- Tecorigin sdaa accelerator by @siqi654321 in #6903
- Handle special case of libuv for Windows by @loadams in #7064
- Bug Fix for offload_states API by @U-rara in #7050
- Update README with info on newest accelerator by @loadams in #7065
- Fix TOCTOU issues, switch to fstat by @loadams in #7067
- config torch to avoid graph breaks caused by logger by @ShellyNR in #6999
- Fix meta load tensor incompatible issue by @Yejing-Lai in #7073
- Replace calls to python setup.py sdist with python -m build --sdist by @loadams in #7069
- Revert "Handle special case of libuv for Windows (#7064)" by @loadams in #7076
- Add DeepseekV3 AutoTP. by @Yejing-Lai in #7045
- Improve inference tutorial docs by @loadams in #7083
- Pin transformers version on tests that use latest. by @loadams in #7085
- Update README.md with ICS '23 MoE paper link by @siddharth9820 in #7087
- Update parallelism for nv-torch-latest/nightly tests due to more GPUs/runner by @loadams in #7086
- Remove workflows for very old torch versions by @loadams in #7090
- Use new dlpack api; Formatting fixes by @tjruwase in #7101
- Avoid graph breaks by disabling sourceless calls in instrument_w_nvtx by @deepcharm in #7081
- Avoid graph breaks in torch.compile caused by inner classes in the backward hooks by @deepcharm in #7062
- Only run pre-commit on the changes by @hwchen2017 in #7106
- Avoid graph break due to unsupported frozenset by @deepcharm in #7105
- Fix fused_qkv print model ValueError by @Yejing-Lai in #7109
- Update references to new X/Twitter handle by @loadams in #7110
- Update gaudi2 nightly,ci to latest 1.20.0 build by @raza-sikander in #7093
- fix keep_module_on_host by @inkcherry in #7112
- Add sequential pytest mark to TestNVMeCheckpointing to resolve pytest forked hangs by @loadams in #7131
- Training multiple models by @tjruwase in #7018
- Update CONTRIBUTING.md to reflect changes from CLA to DCO by @loadams in #7135
- Avoid missing attr error by @tjruwase in #7133
- Add conditional expression by @A-transformer in #7119
- Unpin transformers version for most workflows by @loadams in #7139
- Conditionally quote env vars by @saurabhkoshatwar in #7071
- Correct the BACKWARD_PREFETCH_SUBMIT mismatch by @A-transformer in #7120
- Enhance Gaudi2 CI/Nightly Coverage with Model Parallelism and Linear Tests by @raza-sikander in #7146
- Update container version that runs on A6000 tests. by @loadams in #7153
- hf tp+zero training doc. by @inkcherry in #7151
- Avoid graph break by removing redundant requires_grad attr change by @deepcharm in #7158
- Add destroy to tests to free memory by @tohtana in #7160
- [NFC] Typo fix in SP layer. by @c8ef in #7152
- Link AutoTP blog in the front page by @hwchen2017 in #7167
- fix seq_parallel_communication_data_type constant. by @stas00 in #7175
- Fix typos in GDS blog by @loadams in #7177
- Variable batch size and LR scheduler by @bm-synth in #7104
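The "Conditionally quote env vars" entry (#7071) concerns passing environment variables safely on a launcher command line. The stdlib shlex.quote already quotes conditionally, leaving safe values untouched; a hypothetical helper (not the actual DeepSpeed launcher code) could look like:

```python
import shlex

# Hypothetical helper (not DeepSpeed's launcher code): format NAME=VALUE
# pairs for a remote shell command line. shlex.quote only adds quotes when
# the value contains characters the shell would otherwise interpret.
def format_env(name, value):
    return f"{name}={shlex.quote(value)}"

format_env("PATH", "/usr/bin")  # safe value passes through unquoted
format_env("MSG", "a b")        # value with a space gets single-quoted
```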
New Contributors
- @siqi654321 made their first contribution in #6903
- @A-transformer made their first contribution in #7119
- @saurabhkoshatwar made their first contribution in #7071
- @c8ef made their first contribution in #7152
Full Changelog: v0.16.4...v0.16.5
v0.16.4 Patch Release
What's Changed
- Update version.txt after 0.16.3 release by @loadams in #6965
- Precisely track nvme optimizer offload by @tjruwase in #6963
- Update build_win.bat script to exclude GDS op as it lacks Windows support. by @loadams in #6971
- Add CUDA 12.8 support and comment on CUDA 12.7 by @loadams in #6975
- Update cpu torch latest to use torch 2.6 by @loadams in #6977
- generalize deepspeed linear and implement it for non cuda systems by @oelayan7 in #6932
- Update recommended Windows whl building versions by @loadams in #6983
- Title: Fix setup_env_ranks to Properly Set Environment Variables Instead of Raising Error by @fabiosanger in #6979
- Specify torchvision in nv-ds-chat workflow (prevents errors with torch 2.6) by @loadams in #6982
- Remove assumption that padding only occurs on last rank by @xylian86 in #6974
- Use ds-specific module id to avoid conflicts by @tjruwase in #6847
- Update A6000 workflows to use newer docker container - 24.09 vs 24.03 by @loadams in #6967
- Allow NVIDIA Blackwell by @fabiendupont in #6991
- Update GH org references by @tjruwase in #6998
- [XPU] max1100 workflow update for docker and softwares by @Liangliang-Ma in #7003
- autotp training(fix dco) by @inkcherry in #7004
- import triton files when triton is supported and installed by @oelayan7 in #6989
- Update A6000 tests transformers version by @loadams in #7016
- Fix ds-chat CI regression by @tjruwase in #7015
- [Ulysses tutorial] typos by @stas00 in #7024
- fix hostname -I for macOS #6497 by @fitzjalen in #6990
- Update workflows to cuda 12.4 by @loadams in #7000
- [ROCm] Enable fp_quantizer on ROCm by @rraminen in #7027
- add gds chinese blog by @GuanhuaWang in #7034
- Add chinese blog for deepspeed windows, and fix format by @hwchen2017 in #7035
- AIO on ROCM by @jomayeri in #7023
- Control trace cache warnings by @tjruwase in #7039
- Update CUDA compute capability to support Blackwell by @hwchen2017 in #7047
- Update setup.py handling of ROCm cupy by @loadams in #7051
- nv-ds-chat breaks with latest transformers by @loadams in #7052
- Rename aio_thread_count to intra_op_parallelism by @tjruwase in #7056
- add autoTP training zero2 tests by @inkcherry in #7049
- Fix, bf16 optimizer remove dup loop by @wukong1992 in #7054
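The aio_thread_count to intra_op_parallelism rename (#7056) shows up in the aio section of a DeepSpeed config. A sketch with the new key name; the surrounding keys follow the pre-existing aio options and the numeric values are illustrative only:

```python
# Sketch of a DeepSpeed config's aio section after the rename in #7056.
# Values are illustrative; the point is the "intra_op_parallelism" key,
# which replaces the old thread-count setting.
aio_config = {
    "aio": {
        "block_size": 1048576,       # bytes per async I/O request
        "queue_depth": 8,            # outstanding requests in flight
        "intra_op_parallelism": 1,   # renamed from the old thread-count knob
        "single_submit": False,
        "overlap_events": True,
    }
}
```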
New Contributors
- @fabiosanger made their first contribution in #6979
- @fabiendupont made their first contribution in #6991
- @fitzjalen made their first contribution in #6990
- @wukong1992 made their first contribution in #7054
Full Changelog: v0.16.3...v0.16.4
v0.16.3 Patch Release
What's Changed
- Update version.txt after 0.16.2 release by @loadams in #6893
- Allow to compile collective for PT>2.3 by @NirSonnenschein in #6899
- Zero2: avoid graph breaks in torch.compile by using param_idx by @nelyahu in #6803
- hpu_accelerator: use torch.use_deterministic_algorithms by @nelyahu in #6897
- Fix error caused by all_reduce call in domino by @hwchen2017 in #6880
- Update Gaudi2 jobs to latest 1.19 build by @raza-sikander in #6905
- Change compile for pipeline module torch.compile by @NirSonnenschein in #6478
- Stage3: Use new torch grad accumulation hooks API by @deepcharm in #6773
- Cleanup ops/transformer/inference tests by @loadams in #6830
- Fix checkpointable_layers Logic by @Quentin-Anthony in #6881
- [BUG FIX]: fix get torch.version.cuda error when cuda is None in rocm by @hj-wei in #6909
- Add fp8_gemm fallback for non-triton systems by @oelayan7 in #6916
- Reduce the device bubble introduced by heavy loop synchronization in coalesced fetch/release(z3_leaf_module) by @inkcherry in #6694
- Cleanup ops/transformer/inference tests by @loadams in #6925
- Check transformers version in BLOOM for inference v1 by @lekurile in #6766
- inference: remove unused _validate_args function by @nelyahu in #5505
- Use torch.log1p by @kit1980 in #6930
- Update python version classifiers by @loadams in #6933
- Fix building on Windows with presence of Triton by @woct0rdho in #6749
- Fix windows blog examples by @loadams in #6934
- Add deepseek autotp by @Yejing-Lai in #6937
- Add position_ids arg to OPTEmbedding forward function by @lekurile in #6939
- Add information on security expectations with this software by @loadams in #6941
- Support pure meta model lm_head tp by @Yejing-Lai in #6812
- Remove op compilation flags due to perf issue by @NirSonnenschein in #6944
- Pin nv-a6000 workflow by @loadams in #6938
- [inf] Add config var to enable keeping module on host by @oelayan7 in #6846
- warn to warning by @qgallouedec in #6952
- Add extra_repr to Linear classes for debugging purpose by @Xia-Weiwen in #6954
- Update import for torchvision.transforms by @loadams in #6958
- Remove Duplicate Declaration of pandas in Dockerfile by @Zerohertz in #6959
- Add the missing view operations from sequence parallel (async). by @inkcherry in #6750
- Update torch.norm to torch.linalg.norm and torch.linalg.vector_norm by @loadams in #6931
- Using explicit GPU upcast for ZeRO-Offload by @xylian86 in #6962
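The torch.log1p change (#6930) is a standard numerical-accuracy fix: log(1 + x) loses all precision once x is small enough that 1 + x rounds to 1.0, while log1p(x) does not. The stdlib equivalent shows the effect (illustrative, not DeepSpeed code):

```python
import math

# For tiny x, 1 + x rounds to exactly 1.0 in double precision, so the naive
# form returns 0.0; log1p evaluates log(1 + x) without ever forming 1 + x.
x = 1e-17
naive = math.log(1 + x)    # 1 + 1e-17 == 1.0, so this is exactly 0.0
accurate = math.log1p(x)   # ~1e-17, correct to full precision
```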
New Contributors
- @hj-wei made their first contribution in #6909
- @kit1980 made their first contribution in #6930
- @woct0rdho made their first contribution in #6749
- @Xia-Weiwen made their first contribution in #6954
- @Zerohertz made their first contribution in #6959
Full Changelog: v0.16.2...v0.16.3
v0.16.2 Patch Release
What's Changed
- Update pre-commit version by @loadams in #6821
- Update version.txt after 0.16.1 release by @loadams in #6826
- Pin HPU tests by @loadams in #6831
- Flops profiler support einops.einsum by @lvhoaa in #6755
- Pin pytest-subtests version for accelerate tests by @loadams in #6842
- Inference UTs check for triton support from accelerator by @raza-sikander in #6782
- Unpin pytest-subtests now that 0.14.1 is released by @loadams in #6844
- Merge LoCo with Zero++ by @XingyuXie in #6730
- Fix type error in ZeROOrderedDict by @oraluben in #6794
- Fix uneven head sequence parallelism bug (#6774) by @Eugene29 in #6797
- Fix nv-torch-nightly test by pinning transformers by @loadams in #6849
- Remove broken links to non-active site by @kaiksi-bb in #6854
- Avoid poisoning process with CUDA calls as soon as importing by @HollowMan6 in #6810
- Fix xpu tests workflow failure by changing pip index url by @Liangliang-Ma in #6864
- Domino updates by @GuanhuaWang in #6861
- add domino navigation by @GuanhuaWang in #6866
- Update TSC by @tjruwase in #6867
- Remove warnings from autodoc and sphinx by @loadams in #6788
- Update real_accelerator.py by @keiwoo in #6845
- Fix assertion for offloading states by @tohtana in #6855
- Remove pin from transformers version and fix Processing/Threading issues in tests by @loadams in #6822
- Add MLP/lm_head tp grain size setting. by @Yejing-Lai in #6828
- Fix --enable_each_rank_log when used with PDSH multi-node runner by @akeshet in #6863
- Update transformers ops unit tests to use required_torch_version by @loadams in #6884
- Don't error out when cpu accelerator doesn't have torch (as default for whl building) by @loadams in #6886
- Add arctic model support by adding w2 to all_reduce by @pi314ever in #6856
- Update code owners by @tjruwase in #6890
New Contributors
- @lvhoaa made their first contribution in #6755
- @XingyuXie made their first contribution in #6730
- @Eugene29 made their first contribution in #6797
- @kaiksi-bb made their first contribution in #6854
- @HollowMan6 made their first contribution in #6810
- @keiwoo made their first contribution in #6845
- @akeshet made their first contribution in #6863
- @pi314ever made their first contribution in #6856
Full Changelog: v0.16.1...v0.16.2