
model-parallelism

Here are 19 public repositories matching this topic...

rsn870 commented Aug 21, 2020

Hi,

I have tried out both loss.backward() and model_engine.backward(loss) in my code. There are several subtle differences I have observed; for one, retain_graph=True does not work with model_engine.backward(loss). This is creating a problem, since for some reason buffers are not being retained each time I run the code.

Please look into this if you could.

enhancement good first issue
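For context, here is a minimal sketch of the two call patterns the comment compares. The model_engine.backward(loss) form matches DeepSpeed's engine API; the model, input, and initialization shown here are illustrative assumptions, not code from the issue.

```python
# Minimal sketch (assumed setup): plain PyTorch backward vs. an engine-style
# backward. In plain PyTorch, retain_graph=True keeps the autograd graph
# alive so backward() can be called over it a second time.
import torch
import torch.nn as nn

model = nn.Linear(8, 1)
x = torch.randn(4, 8)
loss = model(x).sum()

# Plain PyTorch: the graph survives the first backward pass.
loss.backward(retain_graph=True)
loss.backward()  # works because the graph was retained

# DeepSpeed-style engine (hypothetical config, shown as comments so the
# sketch runs without DeepSpeed installed):
#   model_engine, optimizer, _, _ = deepspeed.initialize(
#       model=model, model_parameters=model.parameters(), config=ds_config)
#   model_engine.backward(loss)  # per the report, no retain_graph pass-through,
#                                # so a second backward over the same graph fails
```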
ColossalAI
SMesForoush commented Mar 12, 2022

Dear Colossal-AI team,
There are a few features I have in mind that I think would be helpful to the project, and I wanted to ask which of them might be the most useful, so I could start implementing it.
Loki with Promtail is a tool for monitoring distributed logs in Grafana. Connecting the Distributed Logger to it and extracting labels from the log structure would be a user-friendly sys…

good first issue
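As a rough illustration of the idea (not ColossalAI code), a distributed logger can emit one JSON object per line; Promtail's JSON pipeline stage can then lift fields such as the rank into Loki labels for filtering in Grafana. The RANK environment variable and logger name below are assumptions.

```python
# Hedged sketch: structured JSON logging that a Promtail/Loki pipeline
# could parse. Each log record becomes a single JSON line with a "rank"
# field taken from an assumed RANK environment variable.
import json
import logging
import os

class JsonLineFormatter(logging.Formatter):
    def format(self, record):
        return json.dumps({
            "ts": self.formatTime(record),
            "level": record.levelname,
            "rank": int(os.environ.get("RANK", 0)),  # assumed env var
            "msg": record.getMessage(),
        })

handler = logging.StreamHandler()
handler.setFormatter(JsonLineFormatter())
logger = logging.getLogger("distributed")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.info("epoch 3 finished")  # -> {"ts": "...", "level": "INFO", "rank": 0, ...}
```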

Paddle distributed training examples (飞桨 = PaddlePaddle): ResNet, BERT, GPT, MoE, DataParallel, ModelParallel, PipelineParallel, HybridParallel, AutoParallel, ZeRO sharding, Recompute, GradientMerge, Offload, AMP, DGC, LocalSGD, Wide&Deep

  • Updated Mar 14, 2022
  • Shell
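As a minimal illustration of the topic itself (not code from the Paddle repo), model parallelism places different layers of one network on different devices and moves activations between them during the forward pass. The layer sizes and device names below are assumptions, and the sketch needs two visible GPUs.

```python
# Hedged sketch of layer-wise model parallelism in PyTorch: stage1 lives on
# GPU 0, stage2 on GPU 1, and activations are transferred between devices.
import torch
import torch.nn as nn

class TwoStageModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Linear(1024, 4096).to("cuda:0")  # first half on GPU 0
        self.stage2 = nn.Linear(4096, 10).to("cuda:1")    # second half on GPU 1

    def forward(self, x):
        x = torch.relu(self.stage1(x.to("cuda:0")))
        return self.stage2(x.to("cuda:1"))  # move activations to GPU 1

model = TwoStageModel()
out = model(torch.randn(32, 1024))  # output tensor lives on cuda:1
```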

Improve this page

Add a description, image, and links to the model-parallelism topic page so that developers can more easily learn about it.


Add this topic to your repo

To associate your repository with the model-parallelism topic, visit your repo's landing page and select "manage topics."
