arvados.org/2021/05/11/a_look_at_deduplication_in_Keep

https://arvados.org/2021/05/11/a_look_at_deduplication_in_Keep

A Look At Deduplication In Keep

What is Keep?

Keep is the Arvados storage layer. All data in Arvados is stored in Keep. Keep can be mounted as a regular filesystem on GNU/Linux via FUSE. It can also be accessed via HTTP, WebDAV, an S3-compatible interface, CLI tools, and programmatically via the Arvados SDKs.

Keep is implemented as a content-addressed distributed storage system, backed by object storage (e.g. an S3-compatible object store) or POSIX file systems. Content addressing means that data in Keep is referenced by a cryptographic hash of its contents. Content addressing provides a number of benefits, which are described in more detail in the Introduction to Keep.

Keep is a system with two layers: collections refer to a group of files and/or directories, and blocks contain the contents of those files and directories. In Arvados, users and programs interface only with collections; blocks are how those collections store their contents, but they are not directly accessible to the Arvados user.

Each collection has a manifest, which can be thought of as an index that describes the content of the collection: the list of files and directories it contains. The manifest is implemented as a list of paths, each of which is accompanied by a list of hashes of blocks stored in Keep.

Collections and their manifests are stored in the Arvados API. Each collection has a UUID and a portable data hash, which is the hash of its manifest. Keep blocks are stored by the keepstore daemons. The Keep architecture is described in more detail in the documentation.

Keep benefits: deduplication

Keep relies on content addressing to provide automatic, built-in data deduplication at the block level. This section shows how the deduplication works through an example. It also illustrates how the effectiveness of the deduplication can be measured.
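The idea that content addressing yields deduplication can be illustrated with a minimal in-memory sketch. This is not Keep's implementation: the `BlockStore` class, its method names, and the choice of SHA-256 are all assumptions made for illustration (the text above only says "a cryptographic hash"). The sketch stores each block under the hash of its bytes, so writing the same bytes twice occupies storage once, and it reports a simple ratio of bytes written to bytes actually held as one possible way to gauge deduplication.

```python
import hashlib


class BlockStore:
    """Toy content-addressed store (hypothetical, not the Keep API)."""

    def __init__(self):
        self.blocks = {}        # hash of contents -> block bytes
        self.logical_bytes = 0  # total bytes callers asked to store

    def put(self, data: bytes) -> str:
        # The address of a block is derived only from its contents.
        # SHA-256 is an arbitrary choice for this sketch.
        digest = hashlib.sha256(data).hexdigest()
        self.logical_bytes += len(data)
        # Identical contents map to the same key, so a repeat write
        # does not consume additional space.
        self.blocks.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blocks[digest]

    def stored_bytes(self) -> int:
        return sum(len(block) for block in self.blocks.values())

    def dedup_ratio(self) -> float:
        # Bytes written divided by bytes held; 1.0 means no duplicates.
        stored = self.stored_bytes()
        return self.logical_bytes / stored if stored else 1.0


store = BlockStore()
first = store.put(b"hello\n")
second = store.put(b"world!\n")
repeat = store.put(b"hello\n")  # same bytes as the first write

print(len(store.blocks), store.logical_bytes, store.stored_bytes())
```

In this toy model the third write returns the same address as the first and adds no new block, which is the block-level behavior the section describes.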
As an experiment, we start by creating a new collection that contains two text files:

    $ ls -lF
    -rw-r--r-- 1 ward ward 6 Jan 12 11:11 hello.txt
    -rw-r--r-- 1 ward ward 7 Jan 12 11:11 world.txt
    $ cat *txt
    hello
    world!
    $ arv-put *txt
    0M / 0M 100.0%
    2021-01-12 11:18:19 arvados.arv_put[1225347] INFO:
    2021-01-12 11:18:19 arvados.arv_put[1225347] INFO: Collection saved as 'Saved at 2021-01-12 16:18:18 UTC by ward@countzero'
    pirca-4zz18-46zcwbae8mkesob
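To make the manifest idea concrete for these two files, here is a rough sketch of a manifest as "a list of paths, each with a list of block hashes", together with a hash of that manifest. Several things are assumptions for illustration only: the file contents (`hello\n` and `world!\n`, inferred from the 6 and 7 byte sizes and the `cat` output), the one-block-per-file layout, the text serialization, the helper names, and the use of SHA-256. The real Keep manifest format and hash function may differ.

```python
import hashlib


def block_hash(data: bytes) -> str:
    # Hypothetical block address: a hash of the block contents.
    return hashlib.sha256(data).hexdigest()


def build_manifest(files: dict) -> dict:
    # Map each path to the list of block hashes holding its contents.
    # Simplification: every file fits in exactly one block.
    return {path: [block_hash(data)] for path, data in sorted(files.items())}


def manifest_hash(manifest: dict) -> str:
    # Serialize as "path hash hash ..." lines, then hash the text.
    # This toy serialization is not the Arvados manifest format.
    text = "".join(
        f"{path} {' '.join(hashes)}\n" for path, hashes in manifest.items()
    )
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Contents inferred from the terminal session (an assumption).
files = {"hello.txt": b"hello\n", "world.txt": b"world!\n"}
manifest = build_manifest(files)

# Same files supplied in a different order give the same manifest hash.
reordered = build_manifest({"world.txt": b"world!\n", "hello.txt": b"hello\n"})

# Renaming a file changes the manifest, but not the block it points to.
renamed = build_manifest({"hi.txt": b"hello\n", "world.txt": b"world!\n"})

print(manifest_hash(manifest) == manifest_hash(reordered))
print(manifest["hello.txt"] == renamed["hi.txt"])
```

The second comparison is the part relevant to deduplication: two collections with different manifests can still reference the same underlying block.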


  • General Meta Tags

    12
    • title
      A Look At Deduplication In Keep | Arvados
    • charset
      utf-8
    • X-UA-Compatible
      IE=edge
    • viewport
      width=device-width, initial-scale=1
    • generator
      Jekyll v4.2.2
  • Open Graph Meta Tags

    7
    • og:title
      A Look At Deduplication In Keep
    • og:locale
      en_US
    • og:site_name
      Arvados
    • og:type
      article
  • Twitter Meta Tags

    1
    • twitter:card
      summary
  • Link Tags

    14
    • alternate
      /feed.xml
    • apple-touch-icon
      /favicon/apple-touch-icon.png
    • icon
      /images/favicon-32x32.svg
    • icon
      /favicon/favicon-32x32.png
    • icon
      /favicon/favicon-16x16.png