arvados.org/2021/05/11/a_look_at_deduplication_in_Keep

Preview meta tags from the arvados.org website.

Search Engine Appearance

Google

https://arvados.org/2021/05/11/a_look_at_deduplication_in_Keep

A Look At Deduplication In Keep

What is Keep?

Keep is the Arvados storage layer. All data in Arvados is stored in Keep. Keep can be mounted as a regular filesystem on GNU/Linux via FUSE. It can also be accessed via HTTP, WebDAV, an S3-compatible interface, CLI tools, and programmatically via the Arvados SDKs.

Keep is implemented as a content-addressed distributed storage system, backed by object storage (e.g. an S3-compatible object store) or POSIX file systems. Content addressing means that the data in Keep is referenced by a cryptographic hash of its contents. Content addressing provides a number of benefits, which are described in more detail in the Introduction to Keep.

Keep is a system with two layers: collections refer to a group of files and/or directories, and blocks contain the contents of those files and directories. In Arvados, users and programs interface only with collections; blocks are how those collections store their contents, but they are not directly accessible to the Arvados user.

Each collection has a manifest, which can be thought of as an index that describes the content of the collection: the list of files and directories it contains. The manifest is implemented as a list of paths, each of which is accompanied by a list of hashes of blocks stored in Keep.

Collections and their manifests are stored in the Arvados API. Each collection has a UUID and a portable data hash, which is the hash of its manifest. Keep blocks are stored by the keepstore daemons. The Keep architecture is described in more detail in the documentation.

Keep benefits: deduplication

Keep relies on content addressing to provide automatic, built-in data deduplication at the block level. This section shows how the deduplication works through an example. It also illustrates how the effectiveness of the deduplication can be measured.
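To make the link between content addressing and deduplication concrete, here is a minimal sketch of a content-addressed block store. It is a toy under stated assumptions and not the Keep implementation: `BlockStore`, `put`, and `get` are hypothetical names, the store lives in memory, and SHA-256 is an arbitrary choice standing in for "a cryptographic hash".

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressed store (hypothetical, not the Keep API)."""

    def __init__(self):
        self.blocks = {}  # hash (hex string) -> block contents

    def put(self, data: bytes) -> str:
        # The address is derived purely from the contents.
        # SHA-256 is an illustrative choice of cryptographic hash.
        locator = hashlib.sha256(data).hexdigest()
        # Identical contents map to the same key, so a repeat write stores nothing new.
        self.blocks.setdefault(locator, data)
        return locator

    def get(self, locator: str) -> bytes:
        return self.blocks[locator]


store = BlockStore()
first = store.put(b"hello\n")
second = store.put(b"hello\n")   # same contents, written again
third = store.put(b"world!\n")   # different contents

print(first == second)    # True: same contents, same address
print(len(store.blocks))  # 2: the duplicate write added no block
```

Because the key is computed from the data itself, the store never needs to compare blocks against each other to find duplicates; a repeated write simply lands on an existing key.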
As an experiment, we start by creating a new collection that contains 2 text files:

    $ ls -lF
    -rw-r--r-- 1 ward ward 6 Jan 12 11:11 hello.txt
    -rw-r--r-- 1 ward ward 7 Jan 12 11:11 world.txt
    $ cat *txt
    hello
    world!
    $ arv-put *txt
    0M / 0M 100.0%
    2021-01-12 11:18:19 arvados.arv_put[1225347] INFO:
    2021-01-12 11:18:19 arvados.arv_put[1225347] INFO: Collection saved as 'Saved at 2021-01-12 16:18:18 UTC by ward@countzero'
    pirca-4zz18-46zcwbae8mkesob
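The same experiment can be mimicked offline to show one way deduplication effectiveness might be measured. The sketch below is an assumption-laden toy, not the Arvados SDK or the real manifest format: `make_manifest`, `manifest_hash`, and `dedup_ratio` are hypothetical helpers, each file is stored as a single block, and SHA-256 again stands in for the hash. It uploads the two files from the listing above twice and compares the bytes referenced by the manifests with the bytes actually stored.

```python
import hashlib


def block_hash(data: bytes) -> str:
    # SHA-256 is an illustrative stand-in for "a cryptographic hash".
    return hashlib.sha256(data).hexdigest()


def make_manifest(files: dict, blocks: dict) -> dict:
    """Store each file as a single block; return {path: [block hashes]}."""
    manifest = {}
    for path, data in sorted(files.items()):
        h = block_hash(data)
        blocks.setdefault(h, data)  # a block that already exists is not stored again
        manifest[path] = [h]
    return manifest


def manifest_hash(manifest: dict) -> str:
    """Hash of a simple text rendering of the manifest (toy format)."""
    text = "".join(
        f"{path} {' '.join(hashes)}\n" for path, hashes in sorted(manifest.items())
    )
    return block_hash(text.encode())


def dedup_ratio(manifests: list, blocks: dict) -> float:
    """Bytes referenced by all manifests divided by bytes actually stored."""
    referenced = sum(
        len(blocks[h]) for m in manifests for hashes in m.values() for h in hashes
    )
    stored = sum(len(data) for data in blocks.values())
    return referenced / stored


blocks = {}
files = {"hello.txt": b"hello\n", "world.txt": b"world!\n"}  # 6 and 7 bytes
first = make_manifest(files, blocks)
copy = make_manifest(files, blocks)  # a second collection with the same contents

print(manifest_hash(first) == manifest_hash(copy))  # True
print(dedup_ratio([first, copy], blocks))           # 2.0
```

Two collections reference 26 bytes in total while only 13 bytes are stored, so the ratio is 2.0. The matching manifest hashes echo the idea of a portable data hash: identical content yields an identical hash even when the collections are distinct objects.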

Bing

A Look At Deduplication In Keep

https://arvados.org/2021/05/11/a_look_at_deduplication_in_Keep

DuckDuckGo

https://arvados.org/2021/05/11/a_look_at_deduplication_in_Keep

A Look At Deduplication In Keep

General Meta Tags
12
- title
  A Look At Deduplication In Keep | Arvados
- charset
  utf-8
- X-UA-Compatible
  IE=edge
- viewport
  width=device-width, initial-scale=1
- generator
  Jekyll v4.2.2
Open Graph Meta Tags
7
- og:title
  A Look At Deduplication In Keep
- og:locale
  en_US
- og:site_name
  Arvados
- og:type
  article
Twitter Meta Tags
1
- twitter:card
  summary
Link Tags
14
- alternate
  /feed.xml
- apple-touch-icon
  /favicon/apple-touch-icon.png
- icon
  /images/favicon-32x32.svg
- icon
  /favicon/favicon-32x32.png
- icon
  /favicon/favicon-16x16.png
