
dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019
Preview meta tags from the dev-discuss.pytorch.org website.
Linked Hostnames
11 hostnames (8 shown):
- 21 links to dev-discuss.pytorch.org
- 13 links to github.com
- 5 links to canada1.discourse-cdn.com
- 3 links to docs.nvidia.com
- 2 links to arxiv.org
- 2 links to en.wikipedia.org
- 2 links to pytorch.org
- 1 link to developer.nvidia.com
Thumbnail
(page preview thumbnail; image not included in this text capture — see the og:image URL below)
Search Engine Appearance
The same title, URL, and description render across search engines, including Bing and DuckDuckGo:

Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles
https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019
Given some interest, I am sharing a note (first written internally) on the PyTorch Fully Sharded Data Parallel (FSDP) design. This covers much but not all of it (e.g. it excludes autograd and CUDA caching allocator inter…
General Meta Tags
8 tags (5 shown):
- title: Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles - distributed - PyTorch Developer Mailing List
- charset: utf-8
- description: Given some interest, I am sharing a note (first written internally) on the PyTorch Fully Sharded Data Parallel (FSDP) design. This covers much but not all of it (e.g. it excludes autograd and CUDA caching allocator inter…
- generator: Discourse 3.5.0.beta9-dev - https://github.com/discourse/discourse version 7121cfd4ab44fd7971ce4c27a3a5841f1e81b7be
- theme-color: #111111
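
In the page source these would sit in the <head> roughly as follows — a sketch reconstructed only from the values listed above (the 3 tags truncated from this report are omitted, and the description keeps the report's "inter…" truncation):

```html
<!-- Sketch of the general <head> tags, reconstructed from the 5 values
     listed above. Note: the report lists "title" as a meta tag, but in
     HTML it is the <title> element. -->
<meta charset="utf-8">
<title>Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles - distributed - PyTorch Developer Mailing List</title>
<meta name="description" content="Given some interest, I am sharing a note (first written internally) on the PyTorch Fully Sharded Data Parallel (FSDP) design. This covers much but not all of it (e.g. it excludes autograd and CUDA caching allocator inter…">
<meta name="generator" content="Discourse 3.5.0.beta9-dev - https://github.com/discourse/discourse version 7121cfd4ab44fd7971ce4c27a3a5841f1e81b7be">
<meta name="theme-color" content="#111111">
```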
Open Graph Meta Tags
9 tags (5 shown):
- og:site_name: PyTorch Developer Mailing List
- og:type: website
- og:image: https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/7155d58f5b0c98d4a13490dc47d0aea098d991b7.jpeg
- og:url: https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019
- og:title: Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles
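
Open Graph data is carried in <meta property="og:*"> tags; a sketch reconstructed from the five values above (the remaining 4 of the 9 tags are not listed in this report, so they are not shown):

```html
<!-- Open Graph tags as they would appear in <head>, reconstructed from
     the report; the og:image here is the thread's link-preview image. -->
<meta property="og:site_name" content="PyTorch Developer Mailing List">
<meta property="og:type" content="website">
<meta property="og:image" content="https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/7155d58f5b0c98d4a13490dc47d0aea098d991b7.jpeg">
<meta property="og:url" content="https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019">
<meta property="og:title" content="Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles">
```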
Twitter Meta Tags
9 tags (5 shown):
- twitter:card: summary
- twitter:image: https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/7155d58f5b0c98d4a13490dc47d0aea098d991b7.jpeg
- twitter:url: https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019
- twitter:title: Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles
- twitter:description: Given some interest, I am sharing a note (first written internally) on the PyTorch Fully Sharded Data Parallel (FSDP) design. This covers much but not all of it (e.g. it excludes autograd and CUDA caching allocator interaction). I can share more details if there is further interest. TL;DR We rethought the PyTorch FSDP design from first principles to uncover a new one that takes a first step toward improving composability and flexibility. This includes an experimental fully_shard API that is p...
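
Twitter card data is read from <meta name="twitter:*"> tags; sketched below from the five values above, keeping the report's "...is p..." truncation of the description (the remaining 4 tags are not listed here):

```html
<!-- Twitter card tags reconstructed from the report. card="summary"
     selects the small-thumbnail card layout. -->
<meta name="twitter:card" content="summary">
<meta name="twitter:image" content="https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/7155d58f5b0c98d4a13490dc47d0aea098d991b7.jpeg">
<meta name="twitter:url" content="https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019">
<meta name="twitter:title" content="Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles">
<meta name="twitter:description" content="Given some interest, I am sharing a note (first written internally) on the PyTorch Fully Sharded Data Parallel (FSDP) design. This covers much but not all of it (e.g. it excludes autograd and CUDA caching allocator interaction). I can share more details if there is further interest. TL;DR We rethought the PyTorch FSDP design from first principles to uncover a new one that takes a first step toward improving composability and flexibility. This includes an experimental fully_shard API that is p...">
```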
Item Prop Meta Tags
73 tags (5 shown):
- position: 1
- headline: Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles
- datePublished: 2023-01-31T22:38:19Z
- articleSection: distributed
- keywords:
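
These are schema.org microdata attributes. The sketch below renders them as <meta itemprop> tags for illustration only — an assumption, since the page may attach itemprop to visible elements instead, and this report lists only 5 of the 73 present:

```html
<!-- Microdata (itemprop) values for the first post in the thread.
     Rendering them as <meta itemprop> is illustrative; the real markup
     may place the attribute on visible elements. keywords is listed
     with no value in the report, so it is left empty here. -->
<meta itemprop="position" content="1">
<meta itemprop="headline" content="Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles">
<meta itemprop="datePublished" content="2023-01-31T22:38:19Z">
<meta itemprop="articleSection" content="distributed">
<meta itemprop="keywords" content="">
```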
Link Tags
27 tags (5 shown):
- alternate nofollow: https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019.rss
- apple-touch-icon: https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/optimized/1X/6cd7da56682d360e2c6006ff3e31eb250c5a8675_2_180x180.png
- canonical: https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019
- icon: https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/optimized/1X/6cd7da56682d360e2c6006ff3e31eb250c5a8675_2_32x32.png
- search: https://dev-discuss.pytorch.org/opensearch.xml
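
As <link> elements, sketched from the five rel/href pairs above; the remaining 22 tags and any type/sizes attributes are omitted because the report does not list them:

```html
<!-- <link> tags reconstructed from the report. rel="alternate nofollow"
     (the thread's RSS feed) is copied verbatim from the listing above. -->
<link rel="alternate nofollow" href="https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019.rss">
<link rel="apple-touch-icon" href="https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/optimized/1X/6cd7da56682d360e2c6006ff3e31eb250c5a8675_2_180x180.png">
<link rel="canonical" href="https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019">
<link rel="icon" href="https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/optimized/1X/6cd7da56682d360e2c6006ff3e31eb250c5a8675_2_32x32.png">
<link rel="search" href="https://dev-discuss.pytorch.org/opensearch.xml">
```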
Links
52 links (5 shown):
- https://arxiv.org/abs/2203.11014
- https://arxiv.org/pdf/2002.09018.pdf
- https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/444664d0029efb8dc638a3cb64c3c47dac5b7eb5.jpeg
- https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/48f61e699ce0abff22f72bde3fe700ebfbbb637a.jpeg
- https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/7155d58f5b0c98d4a13490dc47d0aea098d991b7.jpeg