Pulse · modelscope/ms-swift · GitHub

June 20, 2025 – June 27, 2025

Overview

28 Active pull requests

175 Active issues

2 Releases published by 1 person

v3.5.2 Patch release v3.5.2
published Jun 20, 2025
v3.5.3 Patch release v3.5.3
published Jun 27, 2025

24 Pull requests merged by 7 people

[grpo]Tool rl: add reward func for ToolRL
#4694 merged Jun 27, 2025
compat transformers==4.52 (vlm)
#4738 merged Jun 26, 2025
[grpo] check liger & sp
#4734 merged Jun 26, 2025
[grpo] fix max_step for dataloader when applying sequence parallel
#4731 merged Jun 26, 2025
[quant] Support fp8
#4729 merged Jun 26, 2025
support Kimi-VL-A3B-Thinking-2506 & Kimi-Dev-72B
#4719 merged Jun 25, 2025
[doc] simplify environment variables & update best practices documentation
#4715 merged Jun 25, 2025
[grpo] fix colocate seed
#4712 merged Jun 25, 2025
[megatron] support rednote-hilab/dots.llm1.inst
#4707 merged Jun 25, 2025
[megatron] support DeepseekV2ForCausalLM and DeepseekV3ForCausalLM
#4659 merged Jun 25, 2025
fix links
#4690 merged Jun 24, 2025
[feat] support fine-tuning of reranker models
#4671 merged Jun 24, 2025
[grpo] fix grpo pt
#4683 merged Jun 24, 2025
[rollout] fix dp args
#4678 merged Jun 23, 2025
[doc] fix doc
#4675 merged Jun 23, 2025
[doc] fix image link
#4674 merged Jun 23, 2025
docs: correct typo "resonse" to "response"
#4672 merged Jun 23, 2025
[channel loss]support packing & padding free
#4666 merged Jun 23, 2025
[docs] update docs
#4665 merged Jun 23, 2025
[dataset] fix grounding_dataset
#4664 merged Jun 23, 2025
[grpo] refactor multi turn & support async engine & refactor grpo docs
#4380 merged Jun 23, 2025
[template] optimize remove_unused_columns
#4661 merged Jun 22, 2025
[gkd] support use_logits_to_keep/padding_free/packing & update gkd shell
#4658 merged Jun 21, 2025
[docs] update gkd
#4657 merged Jun 20, 2025

4 Pull requests opened by 3 people

solve the default 'template_backend' bug in llm.template.base.Template._encode
#4669 opened Jun 23, 2025
Refactor Web-UI
#4687 opened Jun 24, 2025
[megatron] support fp8
#4730 opened Jun 26, 2025
support Tencent-Hunyuan/Hunyuan-A13B-Instruct
#4745 opened Jun 27, 2025

123 Issues closed by 12 people

Megatron does not support GRPO training
#4744 closed Jun 27, 2025
After full-parameter DPO fine-tuning, the Qwen3-4B model no longer outputs think
#4701 closed Jun 27, 2025
How to customize the format reward in GRPO
#4667 closed Jun 26, 2025
[grpo] loading BERT model in reward
#4580 closed Jun 26, 2025
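Loss and grad_norm stay at 0 during GRPO training
#4570 closed Jun 26, 2025
When will GRPO support multi-node megatron training
#4558 closed Jun 26, 2025
The reward std is always 0 in GRPO training
#4512 closed Jun 26, 2025
Multi-node training with --vllm_mode server hangs and cannot run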
#4532 closed Jun 26, 2025
GRPO Qwen3 32B training torch issue
#4491 closed Jun 26, 2025
qwen3 reinforcement training: communication error raised after grpo training finishes
#4170 closed Jun 26, 2025
The expanded size of the tensor (8) must match the existing size (5) at non-singleton dimension 0.
#4056 closed Jun 26, 2025
Error at the end of training: /data/chatglm/retrieval_agent_new/ms_swift_train/ms-swift/swift/cli/rlhf.py FAILED
#4302 closed Jun 26, 2025
DAPO hangs at UserWarning: None of the inputs have requires_grad=True. Gradients will be None, until timeout
#4050 closed Jun 26, 2025
Training qwen2.5-7b-instruct with grpo outputs !!!!
#4060 closed Jun 26, 2025
Training runs normally, but eval raises an assert error
#4081 closed Jun 26, 2025
Batch size in GRPO.
#4341 closed Jun 26, 2025
Reward function registration fails in grpo training
#4351 closed Jun 26, 2025
GRPO data passing fails
#4362 closed Jun 26, 2025
Qwen-Omni full-parameter grpo fine-tuning raises ValueError: `max_new_tokens` must be greater than 0, but is -16384
#4392 closed Jun 26, 2025
Error in GRPO multimodal fine-tuning
#4470 closed Jun 26, 2025
Will GRPO fine-tuning of Qwen2.5-VL-3B on two A6000 GPUs run OOM?
#4477 closed Jun 26, 2025
Running sft-rlhf-grpo fine-tuning on RTX3090 raises: torch.distributed.DistBackendError: [3] is setting up NCCL communicator and retrieving ncclUniqueId from [0] via c10d key-value store by key '0', but store->get('0') got error: wait timeout after 1800000ms,
#3612 closed Jun 26, 2025
Any plans to support megatron for GRPO training?
#3760 closed Jun 26, 2025
GRPO cannot run with LLava
#3928 closed Jun 26, 2025
QWQ: GRPO training fails with "RuntimeError: ACL stream synchronize failed, error code:107020"
#3932 closed Jun 26, 2025
While training GRPO, I noticed that my model crashes. Its loss is 0, its grad_norm and kl are both NaN, and it completes as "!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!"
#3930 closed Jun 26, 2025
GRPO training raises an error partway through
#3771 closed Jun 26, 2025
grpo training hangs and keeps showing the following message
#3794 closed Jun 26, 2025
GRPO training error
#3769 closed Jun 26, 2025
Various traceback error during GRPO training
#3836 closed Jun 26, 2025
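Contributing a dockerfile: tested with multimodal grpo training, it can largely reproduce the results in the examples
#3812 closed Jun 26, 2025
In GRPO, if reward_model is set instead of --reward_funcs, the reward model and the model are both loaded onto the same GPU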
#3843 closed Jun 26, 2025
Meet GPU OutOfMemory in GRPO training
#3848 closed Jun 26, 2025
OOM when training a 32b model with grpo
#3871 closed Jun 26, 2025
Performance drops sharply after 100 steps of GRPO training; what is the cause?
#3876 closed Jun 26, 2025
Bug! Checkpoint resume failure - deepspeed different DP size. Is there a quick checkpoint converter anywhere?
#3989 closed Jun 26, 2025
Bug! Help! MS-SWIFT GRPO + LoRA training hung/stuck after training 1 step from full merged model merged from lora adapter
#3990 closed Jun 26, 2025
if sleep_level > 0, gradient_accumulation_steps will be forced to 1
#3943 closed Jun 26, 2025
The GRPO training process hangs for multi-node training.
#3934 closed Jun 26, 2025
Training speed issue in the NPU environment
#3331 closed Jun 26, 2025
Looking for a script that can run GRPO on the Qwen2.5 72B model with 8 A100 GPUs
#3416 closed Jun 26, 2025
GRPO training raises an error when using 2 nodes with --num_infer_workers 2
#3393 closed Jun 26, 2025
grpo training based on qwenvl-7b-instruct runs OOM during eval
#3541 closed Jun 26, 2025
Running the lora_vllm script on 4*v100 raises: Assertion `!(srcMmaLayout && dstMmaLayout && !srcMmaLayout.isAmpere()) && "mma -> mma layout conversion is only supported on Ampere"' failed.
#3549 closed Jun 26, 2025
Single-node multi-GPU grpo raises an error after several steps
#3576 closed Jun 26, 2025
Loss goes to 0, Gibberish Outputs
#3582 closed Jun 26, 2025
How to add fields from the training data to the logs
#3591 closed Jun 26, 2025
Multi-node multi-GPU GRPO: assert self.cpu_group is not None
#3583 closed Jun 26, 2025
Setting NPROC_PER_NODE immediately raises: failed (exitcode: -11) local_rank: 1
#3611 closed Jun 26, 2025
GPU memory surges in the later stages of GRPO training
#3600 closed Jun 26, 2025
grpo with a fixed seed: results are still not reproducible
#3607 closed Jun 26, 2025
Bug when using grpo with vllm on gemma3
#3660 closed Jun 26, 2025
[bug] Failed to open local file in cache
#3667 closed Jun 26, 2025
[Bug]: RuntimeError: setup failed!
#3662 closed Jun 26, 2025
Error during eval when training llava-1.5 and qwen2-vl with GRPO using vllm inference
#3666 closed Jun 26, 2025
Is there a training script and environment setup that can run GRPO on 4*V100?
#3671 closed Jun 26, 2025
ValueError: RLHF do not support sequence parallel
#3673 closed Jun 26, 2025
Hanging after tqdm starts [COLOCATE MODE]
#3702 closed Jun 26, 2025
GRPO max_grad_norm doesn't seem to work
#3713 closed Jun 26, 2025
It is recommended to use a dedicated device for vLLM
#3719 closed Jun 26, 2025
GRPO training on NPU with vllm: the official script fails to start, while other scripts work
#3726 closed Jun 26, 2025
GRPO training: bug in data format parsing
#3728 closed Jun 26, 2025
TypeError: embedding(): argument 'indices' (position 2) must be Tensor, not NoneType
#3730 closed Jun 26, 2025
Support Ulysses in Swift
#3731 closed Jun 26, 2025
GRPO tutorial bug: world_size (8) is not equal to tensor_model_parallel_size (4) x pipeline_model_parallel_size (1)
#3739 closed Jun 26, 2025
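Error in grpo experiment with multimodal qwen2.5-vl-3B
#3398 closed Jun 26, 2025
grpo fine-tuning of deepseek v2 hangs at the eval stage and then training stops
#3528 closed Jun 26, 2025
How to configure a custom dataset path in grpo and convert the data format?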
#3525 closed Jun 26, 2025
2workers_async_iterations2_vllm help
#3522 closed Jun 26, 2025
Bug in GRPO best practices document!
#3501 closed Jun 26, 2025
unhashable type: 'list'
#3490 closed Jun 26, 2025
Request for support for using multiple cards in the vLLM inference backend during GRPO training 🙏
#3477 closed Jun 26, 2025
GRPO training of Qwen2.5-vl-7B-Instruct fails: multi-GPU training does not work; the model loads on only 1 GPU and runs OOM
#3404 closed Jun 26, 2025
Feature suggestions for GRPO training
#3415 closed Jun 26, 2025
Abnormal loss and reward in GRPO training
#3372 closed Jun 26, 2025
Timeout in multi-node multi-GPU grpo training
#3343 closed Jun 26, 2025
CUDA Error when training LLAVA with GRPO
#3264 closed Jun 26, 2025
GRPO LLava training error: multi-GPU training fails, single GPU works
#3228 closed Jun 26, 2025
Bug in GRPO training on 4 A100 GPUs
#3223 closed Jun 26, 2025
How to do sft and grpo fine-tuning on deepseek r1
#3211 closed Jun 26, 2025
Problem loading my already-trained LLava model with GRPO
#3195 closed Jun 26, 2025
Error training InternVL2d5 with GRPO, deepspeed, and lmdeploy
#3151 closed Jun 26, 2025
Using Unsloth in conjunction with GRPO to train a model for OOM
#3183 closed Jun 26, 2025
How to set vllm_device to use multiple GPUs in grpo training
#3098 closed Jun 26, 2025
Does ms-swift support tensor(model)-parallel GRPO training?
#3068 closed Jun 26, 2025
ValueError: Image features and image tokens do not match: tokens: 5589, features 5805
#2460 closed Jun 26, 2025
grad_norm nan
#2280 closed Jun 26, 2025
Request for RLHF to support sequence parallelism (sequence_parallel)
#1958 closed Jun 26, 2025
Is there a bug in the old_per_token_logps computation in GRPO training?
#4727 closed Jun 26, 2025
rerank data loading error
#4728 closed Jun 26, 2025
Issue with Multi-GPU Training
#4718 closed Jun 26, 2025
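Qwen3 full SFT with predict_with_generate=true raises KeyError "messages"; with false, training finishes normally
#4695 closed Jun 26, 2025
Support moonshotai/Kimi-VL-A3B-Thinking-2506
#4708 closed Jun 25, 2025
Error training qwen2.5-vl with grpo
#4364 closed Jun 25, 2025
Full-parameter grpo fine-tuning: with the same number of samples, ms-swift results are much worse than unsloth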
#4393 closed Jun 25, 2025
GRPO OOM USE resume_from_checkpoint
#4406 closed Jun 25, 2025
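GRPO training raises: AssertionError: Forward context is not set. Please use `set_forward_context` to set the forward context.
#4418 closed Jun 25, 2025
Does the supported DeepSeek-R1 training refer to the 671B model or the distilled models?
#3132 closed Jun 25, 2025
In seq_cls training, metrics with flash_attn enabled are much lower than with it disabled
#4384 closed Jun 25, 2025
Multi-regression task: inference issue
#4705 closed Jun 25, 2025
Why does using zero2 vs. zero3 cause max_steps to differ by a factor of eight?
#4616 closed Jun 23, 2025
Request to add a Notebook for self-cognition training of Qwen3-8B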
#4034 closed Jun 23, 2025
raise KeyError(f"Column {key} not in the dataset. Current columns in the dataset: {columns}") [rank1]: KeyError: 'Column length not in the dataset. Current columns in the dataset: []'
#4058 closed Jun 23, 2025
Slow dataset preprocessing in InternVL3-9B LoRA fine-tuning (about 7h)
#4076 closed Jun 23, 2025
Locating an object's position with a single coordinate point
#4292 closed Jun 23, 2025
data_load
#4288 closed Jun 23, 2025
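Question about Seq CLS Infer
#4325 closed Jun 23, 2025
UI-TARS inference with frozen parameters cannot distribute GPU memory evenly, causing OOM
#4359 closed Jun 23, 2025
Fine-tuning Qwen3 runs OOM when zero2/3 is added to the default script
#4371 closed Jun 23, 2025
Question about VLLM Engine batch inference
#4386 closed Jun 23, 2025
How are commands like swift infer converted to Python for execution? Internal mechanism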
#4555 closed Jun 23, 2025
Failing to preprocess hf dataset
#4564 closed Jun 23, 2025
How-to use on Apple Mac?
#4572 closed Jun 23, 2025
Multimodal finetune llava1.6-mistral bug: RuntimeError: Tensors must have same number of dimensions
#4578 closed Jun 23, 2025
On the differences between the ms-swift 3.x and 2.x templates
#4602 closed Jun 23, 2025
ovis2 fine-tuning fails: loss computation raises ValueError: Expected input batch_size (1384) to match target batch_size (16384)
#4611 closed Jun 23, 2025
Question about installing with pip install -e '.[all]' and installing evalscope
#4605 closed Jun 23, 2025
loss_scale hermes does not work
#4607 closed Jun 23, 2025
LoRA on qwen2.5vl with Huawei 910B raises: AssertionError: Torch not compiled with CUDA enabled
#4619 closed Jun 23, 2025
OOM converting HF weights of full-size R1/Qwen3-235B-30A to megatron
#4648 closed Jun 23, 2025
The 10-minute LLM self-cognition tutorial raises: 'Qwen2_5VLTemplate' object has no attribute 'model'
#4662 closed Jun 23, 2025
DPO training hits StopIteration ERROR during training at step 100
#4644 closed Jun 23, 2025
Multi-GPU multi-process orpo hangs, triggering watchdog caught collective operation timeout.
#3564 closed Jun 20, 2025

52 Issues opened by 43 people

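Multi-GPU parallel training with a locally loaded dataset stops at Init COMPLETE... and cannot enter the train stage
#4743 opened Jun 27, 2025
Numbering of multiple input images
#4742 opened Jun 27, 2025
How to add early stopping in ms swift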
#4741 opened Jun 27, 2025
[WARNING:swift] Please install the package: pip install "decord" -U
#4740 opened Jun 27, 2025
Qwen2.5-omni GRPO training runs out of memory (OOM)
#4739 opened Jun 27, 2025
Fine-tuning a DeepSeek model raises: AssertionError: noaux_tc not supported for training
#4737 opened Jun 26, 2025
Does the packing feature block attention score between different samples?
#4736 opened Jun 26, 2025
a question for rl
#4735 opened Jun 26, 2025
Please open Security Advisories for vulnerability reporting
#4733 opened Jun 26, 2025
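In SFT training that learns all turns, the end token of intermediate turns is not learned, so the trained model cannot stop
#4732 opened Jun 26, 2025
swift inference accuracy discrepancy
#4726 opened Jun 26, 2025
After LoRA training of qwen2.5vl3b with the LoRA unmerged, deploying with the pt backend gives results inconsistent with vllm
#4725 opened Jun 26, 2025
GKD code hangs when loading the model
#4724 opened Jun 26, 2025
Continuing sft from a lora checkpoint in Swift: trainable parameters are 0% after loading the model and checkpoint
#4723 opened Jun 26, 2025
Question about freeze_vit in qwen2.5vl lora sft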
#4722 opened Jun 26, 2025
[rank4]: AssertionError: Expected multimodal embeddings to be a list/tuple of 2D tensors, or a single 3D tensor, but got <class 'NoneType'> instead.
#4721 opened Jun 25, 2025
qwen3 embedding fine-tuning raises during evaluation: 'NoneType' object has no attribute 'get'
#4720 opened Jun 25, 2025
Add Python example code
#4717 opened Jun 25, 2025
How to pass in a custom causal_attention_mask
#4716 opened Jun 25, 2025
How to set the local_repo_path parameter in a Python script
#4714 opened Jun 25, 2025
Converting hf-format model files to megatron raises: CUDA error: operation not supported
#4713 opened Jun 25, 2025
LoRA fine-tuning Ovis2-34B: loss=0.0
#4711 opened Jun 25, 2025
Cannot set batch_size to 1 in Deepspeed zero3 multi-GPU training
#4710 opened Jun 25, 2025
[Bug]: [WARNING:swift] Please install the package: pip install "decord" -U
#4709 opened Jun 25, 2025
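Multi-regression task output issue
#4706 opened Jun 25, 2025
Can sequence classification tasks be trained on multiple GPUs?
#4704 opened Jun 25, 2025
Swift rollout hangs
#4703 opened Jun 25, 2025
GRPO training on the NuminaMath-TIR dataset does not work
#4702 opened Jun 25, 2025
Fine-tuning QwQ-32B on a custom function-call dataset with the msswift framework gives poor results for unknown reasons
#4700 opened Jun 25, 2025
How to write a template when a custom dataset contains keys other than 'messages', 'rejected_response', 'label', 'images', 'videos', 'audios', 'tools', and 'objects'?
#4699 opened Jun 24, 2025
LoRA fine-tuning of the qwen3 embedding model shows a find_unused_parameters warning
#4698 opened Jun 24, 2025
Error merging lora for Qwen2-VL
#4697 opened Jun 24, 2025
Question: training cost of Qwen 235B A22 compared with Qwen 32B dense
#4696 opened Jun 24, 2025
When running inference from the CLI, is there a way to keep extra fields from the input dataset in the inference results?
#4693 opened Jun 24, 2025
Question about VLLM Engine
#4692 opened Jun 24, 2025
When loading a large dataset across multiple nodes, the machines load it serially one after another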
#4691 opened Jun 24, 2025
UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
#4689 opened Jun 24, 2025
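I want to set up two reward models and two value models for PPO and compute the loss from a weighted combination of their values and rewards; how should I do this?
#4688 opened Jun 24, 2025
Is continued pre-training supported for multimodal models such as QWenVL?
#4686 opened Jun 24, 2025
Does changing IMAGE_FACTOR mean the vision part needs to be retrained?
#4685 opened Jun 24, 2025
How to disable automatic model parallelism?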
#4684 opened Jun 24, 2025
Help: Multi Turn SFT
#4681 opened Jun 24, 2025
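mllm training: the job hangs after one epoch with GPU utilization at 100% and cannot save a checkpoint
#4680 opened Jun 23, 2025
GRPO training fails; the model seems to have difficulty learning
#4679 opened Jun 23, 2025
The reward function keeps oscillating without rising; it seems nothing is being learned
#4677 opened Jun 23, 2025
After SFT training on a regression task, loading the model for vllm-accelerated inference raises an error; is there a fix?
#4676 opened Jun 23, 2025
Keeps printing 'Trainer.tokenizer is now deprecated. You should use Trainer.processing_class instead.' without raising an error; what is the cause?
#4673 opened Jun 23, 2025
About preprocessing of rlhf data
#4670 opened Jun 23, 2025
Fine-tuning MiniCPM-o-2_6 raises: assert media_type in {'image', 'video'}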
#4668 opened Jun 23, 2025
Applying sequence parallelism causes the training to finish early, even though it hasn't reached the specified max_steps
#4663 opened Jun 23, 2025
Any way to run evaluation before training starts?
#4660 opened Jun 22, 2025
'weight' must be 2-D
#4656 opened Jun 20, 2025

23 Unresolved conversations

Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.

pretrain error: abnormal progress
#2692 commented on Jun 20, 2025 • 0 new comments
How should I get the history after the update?
#4645 commented on Jun 23, 2025 • 0 new comments
Loss suddenly spikes during Qwen2.5-vl pre-training
#4634 commented on Jun 23, 2025 • 0 new comments
Fatal Python error: none_dealloc: deallocating None
#4353 commented on Jun 23, 2025 • 0 new comments
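After LoRA fine-tuning and merging, lmdeploy inference takes twice as long as with Qwen2.5-VL-7B-Instruct; why?
#4609 commented on Jun 23, 2025 • 0 new comments
Can MiniCPM-o 2.6 audio modality training be supported?
#2961 commented on Jun 23, 2025 • 0 new comments
Results from testing the qwen2.5-omni model with swift infer are inconsistent with the official testing method
#4595 commented on Jun 23, 2025 • 0 new comments
qwen3-32B full-parameter ppo training raises an error after one step
#4599 commented on Jun 23, 2025 • 0 new comments
Training hangs when training Omni
#4651 commented on Jun 23, 2025 • 0 new comments
qwen2.5-7B GRPO training hangs without showing any error
#4603 commented on Jun 23, 2025 • 0 new comments
Why is there no loss?
#4652 commented on Jun 24, 2025 • 0 new comments
Can an expert-parallel parameter be added for moe model training?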
#1631 commented on Jun 24, 2025 • 0 new comments
Any example on training llama on function calling dataset?
#4604 commented on Jun 24, 2025 • 0 new comments
LoRA fine-tuning ovis2-34B: loss=0.0 grad_norm=nan
#3494 commented on Jun 25, 2025 • 0 new comments
Error occurred when saving checkpoints during Qwen3 multi-GPU SFT
#4411 commented on Jun 25, 2025 • 0 new comments
How to save the last-step checkpoints in GRPO
#4574 commented on Jun 25, 2025 • 0 new comments
🍭[Roadmap] ms-swift3.6
#4561 commented on Jun 25, 2025 • 0 new comments
Error when saving a checkpoint during training, although the corresponding files exist locally.
#3420 commented on Jun 25, 2025 • 0 new comments
🚀 Best Practices for Training Qwen3/Qwen3-MoE
#4030 commented on Jun 25, 2025 • 0 new comments
Is GME fine-tuning supported?
#3019 commented on Jun 25, 2025 • 0 new comments
How to add system or instructions in embedding training?
#4638 commented on Jun 26, 2025 • 0 new comments
swift infer with temperature and top_p set still produces the same result every time
#4627 commented on Jun 26, 2025 • 0 new comments
Support deploying trained RM models with the sglang/vllm inference engines
#3610 commented on Jun 26, 2025 • 0 new comments