dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019

Preview meta tags from the dev-discuss.pytorch.org website.

Linked Hostnames: 11

Thumbnail: https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/7155d58f5b0c98d4a13490dc47d0aea098d991b7.jpeg

Search Engine Appearance

Google

https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019

Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles

Given some interest, I am sharing a note (first written internally) on the PyTorch Fully Sharded Data Parallel (FSDP) design. This covers much but not all of it (e.g. it excludes autograd and CUDA caching allocator inter…



Bing

Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles

https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019

Given some interest, I am sharing a note (first written internally) on the PyTorch Fully Sharded Data Parallel (FSDP) design. This covers much but not all of it (e.g. it excludes autograd and CUDA caching allocator inter…



DuckDuckGo

https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019

Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles

Given some interest, I am sharing a note (first written internally) on the PyTorch Fully Sharded Data Parallel (FSDP) design. This covers much but not all of it (e.g. it excludes autograd and CUDA caching allocator inter…

  • General Meta Tags (8) (reconstructed in the HTML sketch after this list)
    • title
      Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles - distributed - PyTorch Developer Mailing List
    • charset
      utf-8
    • description
      Given some interest, I am sharing a note (first written internally) on the PyTorch Fully Sharded Data Parallel (FSDP) design. This covers much but not all of it (e.g. it excludes autograd and CUDA caching allocator inter…
    • generator
      Discourse 3.5.0.beta9-dev - https://github.com/discourse/discourse version 7121cfd4ab44fd7971ce4c27a3a5841f1e81b7be
    • theme-color
      #111111
  • Open Graph Meta Tags (9)
    • og:site_name
      PyTorch Developer Mailing List
    • og:type
      website
    • og:image
      https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/7155d58f5b0c98d4a13490dc47d0aea098d991b7.jpeg
    • og:url
      https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019
    • og:title
      Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles
  • Twitter Meta Tags (9)
    • twitter:card
      summary
    • twitter:image
      https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/7155d58f5b0c98d4a13490dc47d0aea098d991b7.jpeg
    • twitter:url
      https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019
    • twitter:title
      Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles
    • twitter:description
      Given some interest, I am sharing a note (first written internally) on the PyTorch Fully Sharded Data Parallel (FSDP) design. This covers much but not all of it (e.g. it excludes autograd and CUDA caching allocator interaction). I can share more details if there is further interest. TL;DR We rethought the PyTorch FSDP design from first principles to uncover a new one that takes a first step toward improving composability and flexibility. This includes an experimental fully_shard API that is p...
  • Item Prop Meta Tags (73) (see the microdata sketch after this list)
    • position
      1
    • headline
      Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles
    • datePublished
      2023-01-31T22:38:19Z
    • articleSection
      distributed
    • keywords
  • Link Tags (27)
    • alternate nofollow
      https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019.rss
    • apple-touch-icon
      https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/optimized/1X/6cd7da56682d360e2c6006ff3e31eb250c5a8675_2_180x180.png
    • canonical
      https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019
    • icon
      https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/optimized/1X/6cd7da56682d360e2c6006ff3e31eb250c5a8675_2_32x32.png
    • search
      https://dev-discuss.pytorch.org/opensearch.xml
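
For reference, a minimal sketch of how the general, Open Graph, Twitter, and link tags listed above would appear in the page's `<head>`. Attribute order, the `type` attributes on the `<link>` elements, and any tag not in the listing are assumptions; truncated values are kept as reported.

```html
<!-- Sketch reconstructed from the listing above; not a verbatim copy of the page source -->
<meta charset="utf-8">
<!-- The "title" general tag typically corresponds to the <title> element, not a <meta> tag -->
<title>Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles - distributed - PyTorch Developer Mailing List</title>
<meta name="description" content="Given some interest, I am sharing a note (first written internally) on the PyTorch Fully Sharded Data Parallel (FSDP) design. This covers much but not all of it (e.g. it excludes autograd and CUDA caching allocator inter…">
<meta name="generator" content="Discourse 3.5.0.beta9-dev - https://github.com/discourse/discourse version 7121cfd4ab44fd7971ce4c27a3a5841f1e81b7be">
<meta name="theme-color" content="#111111">

<!-- Open Graph tags use the property attribute -->
<meta property="og:site_name" content="PyTorch Developer Mailing List">
<meta property="og:type" content="website">
<meta property="og:image" content="https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/7155d58f5b0c98d4a13490dc47d0aea098d991b7.jpeg">
<meta property="og:url" content="https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019">
<meta property="og:title" content="Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles">

<!-- Twitter card tags use the name attribute; "summary" selects the small card layout -->
<meta name="twitter:card" content="summary">
<meta name="twitter:image" content="https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/original/1X/7155d58f5b0c98d4a13490dc47d0aea098d991b7.jpeg">
<meta name="twitter:url" content="https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019">
<meta name="twitter:title" content="Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles">
<!-- twitter:description follows the same pattern; its value is truncated in the listing above -->

<!-- Link tags; "alternate nofollow" is a space-separated multi-valued rel attribute -->
<link rel="alternate nofollow" type="application/rss+xml" href="https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019.rss">
<link rel="apple-touch-icon" href="https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/optimized/1X/6cd7da56682d360e2c6006ff3e31eb250c5a8675_2_180x180.png">
<link rel="canonical" href="https://dev-discuss.pytorch.org/t/rethinking-pytorch-fully-sharded-data-parallel-fsdp-from-first-principles/1019">
<link rel="icon" href="https://canada1.discourse-cdn.com/flex036/uploads/pytorch1/optimized/1X/6cd7da56682d360e2c6006ff3e31eb250c5a8675_2_32x32.png">
<link rel="search" type="application/opensearchdescription+xml" href="https://dev-discuss.pytorch.org/opensearch.xml">
```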
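The Item Prop entries are schema.org microdata, which Discourse emits in the page body rather than the head. A minimal sketch, assuming a DiscussionForumPosting item scope; the enclosing element and the use of `<meta>` elements for each value are assumptions:

```html
<!-- Hypothetical container; the listing above reports 73 itemprop tags, of which 5 are shown -->
<div itemscope itemtype="http://schema.org/DiscussionForumPosting">
  <meta itemprop="position" content="1">
  <meta itemprop="headline" content="Rethinking PyTorch Fully Sharded Data Parallel (FSDP) from First Principles">
  <meta itemprop="datePublished" content="2023-01-31T22:38:19Z">
  <meta itemprop="articleSection" content="distributed">
  <meta itemprop="keywords" content=""> <!-- keywords appears in the listing above without a value -->
</div>
```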

Links: 52