arvados.org/2021/05/11/a_look_at_deduplication_in_Keep
Preview meta tags from the arvados.org website.
Linked Hostnames (7)
- 26 links to arvados.org
- 4 links to doc.arvados.org
- 1 link to curii.com
- 1 link to dev.arvados.org
- 1 link to github.com
- 1 link to playground.arvados.org
- 1 link to twitter.com
Search Engine Appearance
A Look At Deduplication In Keep
What is Keep?

Keep is the Arvados storage layer. All data in Arvados is stored in Keep. Keep can be mounted as a regular filesystem on GNU/Linux via FUSE. It can also be accessed via HTTP, WebDAV, an S3-compatible interface, CLI tools, and programmatically via the Arvados SDKs.

Keep is implemented as a content-addressed distributed storage system, backed by object storage (e.g. an S3-compatible object store) or POSIX file systems. Content addressing means that the data in Keep is referenced by a cryptographic hash of its contents. Content addressing provides a number of benefits, which are described in more detail in the Introduction to Keep.

Keep is a system with two layers: collections refer to a group of files and/or directories, and blocks contain the contents of those files and directories. In Arvados, users and programs interface only with collections; blocks are how those collections store their contents, but they are not directly accessible to the Arvados user.

Each collection has a manifest, which can be thought of as an index that describes the content of the collection: the list of files and directories it contains. The manifest is implemented as a list of paths, each of which is accompanied by a list of hashes of blocks stored in Keep.

Collections and their manifests are stored in the Arvados API. Each collection has a UUID and a portable data hash, which is the hash of its manifest. Keep blocks are stored by the keepstore daemons. The Keep architecture is described in more detail in the documentation.

Keep benefits: deduplication

Keep relies on content addressing to provide automatic, built-in data deduplication at the block level. This section shows how the deduplication works through an example. It also illustrates how the effectiveness of the deduplication can be measured.
As an experiment, we start by creating a new collection that contains 2 text files:

    $ ls -lF
    -rw-r--r-- 1 ward ward 6 Jan 12 11:11 hello.txt
    -rw-r--r-- 1 ward ward 7 Jan 12 11:11 world.txt
    $ cat *txt
    hello
    world!
    $ arv-put *txt
    0M / 0M 100.0%
    2021-01-12 11:18:19 arvados.arv_put[1225347] INFO:
    2021-01-12 11:18:19 arvados.arv_put[1225347] INFO: Collection saved as 'Saved at 2021-01-12 16:18:18 UTC by ward@countzero'
    pirca-4zz18-46zcwbae8mkesob
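
The two-layer design described above (manifests that list block hashes, and blocks stored under the hash of their contents) can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how Keep is implemented: the names BlockStore, make_manifest and read_file are hypothetical, SHA-256 stands in for "a cryptographic hash", the manifest is a plain dictionary rather than Keep's manifest format, and the tiny block size exists purely to make the effect visible.

```python
import hashlib


class BlockStore:
    """Toy in-memory content-addressed block store (hypothetical, not Keep)."""

    def __init__(self):
        self.blocks = {}        # content hash -> block bytes
        self.bytes_written = 0  # total bytes offered to the store

    def put(self, data: bytes) -> str:
        # The address is derived from the content, so identical data
        # always maps to the same key and is stored only once.
        digest = hashlib.sha256(data).hexdigest()
        self.bytes_written += len(data)
        self.blocks.setdefault(digest, data)
        return digest

    def get(self, digest: str) -> bytes:
        return self.blocks[digest]

    def stored_bytes(self) -> int:
        return sum(len(b) for b in self.blocks.values())


def make_manifest(store, files, block_size=4):
    """Split each file into fixed-size blocks; map path -> list of block hashes."""
    manifest = {}
    for path, content in files.items():
        manifest[path] = [
            store.put(content[i:i + block_size])
            for i in range(0, len(content), block_size)
        ]
    return manifest


def read_file(store, manifest, path):
    """Reassemble a file by fetching its blocks in manifest order."""
    return b"".join(store.get(digest) for digest in manifest[path])


if __name__ == "__main__":
    store = BlockStore()
    first = make_manifest(store, {"a.txt": b"hello world!"})
    second = make_manifest(store, {"b.txt": b"hello world!"})
    # Both manifests reference the same blocks, so the data is stored once.
    print("bytes written:", store.bytes_written)
    print("bytes stored: ", store.stored_bytes())
    print("dedup ratio:  ", store.bytes_written / store.stored_bytes())
```

In this model, deduplication effectiveness is simply the ratio of bytes written to bytes actually stored; two collections with identical content share every block and differ only in their manifests.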
General Meta Tags (12)
- title: A Look At Deduplication In Keep | Arvados
- charset: utf-8
- X-UA-Compatible: IE=edge
- viewport: width=device-width, initial-scale=1
- generator: Jekyll v4.2.2
Open Graph Meta Tags (7)
- og:title: A Look At Deduplication In Keep
- og:locale: en_US
- og:description: identical to the description shown under Search Engine Appearance
- og:site_name: Arvados
- og:type: article
Twitter Meta Tags (1)
- twitter:card: summary
Link Tags (14)
- alternate: /feed.xml
- apple-touch-icon: /favicon/apple-touch-icon.png
- icon: /images/favicon-32x32.svg
- icon: /favicon/favicon-32x32.png
- icon: /favicon/favicon-16x16.png
Emails (1)
Links (35)
- https://arvados.org
- https://arvados.org/2021/05
- https://arvados.org/2021/05/11/a_look_at_deduplication_in_Keep
- https://arvados.org/2021/12
- https://arvados.org/2021/12/07/debugging-cwl-in-arvados