web.archive.org/web/20210222003608/https:/arxiv.org/abs/2102.08686

Preview meta tags from the web.archive.org website.

Linked Hostnames

49 links to
web.archive.org

Search Engine Appearance

Google

https://web.archive.org/web/20210222003608/https:/arxiv.org/abs/2102.08686

Fully General Online Imitation Learning

In imitation learning, imitators and demonstrators are policies for picking actions given past interactions with the environment. If we run an imitator, we probably want events to unfold similarly to the way they would have if the demonstrator had been acting the whole time. No existing work provides formal guidance in how this might be accomplished, instead restricting focus to environments that restart, making learning unusually easy, and conveniently limiting the significance of any mistake. We address a fully general setting, in which the (stochastic) environment and demonstrator never reset, not even for training purposes. Our new conservative Bayesian imitation learner underestimates the probabilities of each available action, and queries for more data with the remaining probability. Our main result: if an event would have been unlikely had the demonstrator acted the whole time, that event's likelihood can be bounded above when running the (initially totally ignorant) imitator instead. Meanwhile, queries to the demonstrator rapidly diminish in frequency.
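A minimal sketch of the mechanism the abstract describes, assuming a finite set of candidate demonstrator models with posterior weights. The abstract only says that the imitator underestimates each action's probability and queries with the remainder. The specific rule used here (keep models whose posterior weight is within a factor `alpha` of the largest, then take the minimum action probability across them) is an illustrative assumption, as are all names (`conservative_probs`, `bayes_update`, `sample_step`, `alpha`, `QUERY`).

```python
import random

QUERY = "QUERY"  # hypothetical sentinel: ask the demonstrator to act instead


def conservative_probs(models, weights, alpha=0.5):
    """Underestimate each action's probability; the rest is the query probability.

    models  -- list of dicts mapping action -> probability (candidate demonstrators)
    weights -- posterior weight of each model (same order, summing to 1)
    alpha   -- assumed threshold: keep models with weight >= alpha * max weight
    """
    top = max(weights)
    kept = [m for m, w in zip(models, weights) if w >= alpha * top]
    # The minimum over plausible models never exceeds any kept model's probability.
    under = {a: min(m[a] for m in kept) for a in models[0]}
    query_prob = 1.0 - sum(under.values())
    return under, query_prob


def bayes_update(models, weights, observed_action):
    """Posterior over models after seeing the demonstrator take observed_action."""
    unnormalized = [w * m[observed_action] for m, w in zip(models, weights)]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def sample_step(models, weights, rng, alpha=0.5):
    """Draw an action from the underestimates, or QUERY with the leftover mass."""
    under, _ = conservative_probs(models, weights, alpha)
    r = rng.random()
    cumulative = 0.0
    for action, p in under.items():
        cumulative += p
        if r < cumulative:
            return action
    return QUERY


if __name__ == "__main__":
    models = [{"left": 0.9, "right": 0.1}, {"left": 0.5, "right": 0.5}]
    weights = [0.5, 0.5]
    rng = random.Random(0)
    for _ in range(3):
        under, q = conservative_probs(models, weights)
        print(under, "query probability:", round(q, 3))
        # Pretend the demonstrator was queried and chose "left".
        weights = bayes_update(models, weights, "left")
    print(sample_step(models, weights, rng))
```

In this toy run the two models disagree at first, so a large share of probability goes to querying. As demonstrator data concentrates the posterior on one model, the underestimates approach that model's probabilities and the query probability shrinks, which mirrors the abstract's claim that queries diminish in frequency.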

Bing

Fully General Online Imitation Learning

https://web.archive.org/web/20210222003608/https:/arxiv.org/abs/2102.08686

DuckDuckGo

https://web.archive.org/web/20210222003608/https:/arxiv.org/abs/2102.08686

Fully General Online Imitation Learning

General Meta Tags
16
- title
  [2102.08686] Fully General Online Imitation Learning
- title
  open search
- title
  open navigation menu
- title
  contact arXiv
- title
  subscribe to arXiv mailings
Open Graph Meta Tags
4
- og:site_name
  arXiv.org
- og:title
  Fully General Online Imitation Learning
- og:url
  https://web.archive.org/web/20210222082645/https://arxiv.org/abs/2102.08686v1
- og:description
  In imitation learning, imitators and demonstrators are policies for picking actions given past interactions with the environment. If we run an imitator, we probably want events to unfold similarly to the way they would have if the demonstrator had been acting the whole time. No existing work provides formal guidance in how this might be accomplished, instead restricting focus to environments that restart, making learning unusually easy, and conveniently limiting the significance of any mistake. We address a fully general setting, in which the (stochastic) environment and demonstrator never reset, not even for training purposes. Our new conservative Bayesian imitation learner underestimates the probabilities of each available action, and queries for more data with the remaining probability. Our main result: if an event would have been unlikely had the demonstrator acted the whole time, that event's likelihood can be bounded above when running the (initially totally ignorant) imitator instead. Meanwhile, queries to the demonstrator rapidly diminish in frequency.
Twitter Meta Tags
1
- twitter:site
  @arxiv
Link Tags
8
- shortcut icon
  https://web.archive.org/web/20210222082645im_/https://static.arxiv.org/static/browse/0.3.2.6/images/icons/favicon.ico
- stylesheet
  https://web-static.archive.org/_static/css/banner-styles.css?v=p7PEIJWi
- stylesheet
  https://web-static.archive.org/_static/css/iconochive.css?v=3PDvdIFv
- stylesheet
  https://web.archive.org/web/20210222082645cs_/https://static.arxiv.org/static/browse/0.3.2.6/css/arXiv.css?v=20200727
- stylesheet
  https://web.archive.org/web/20210222082645cs_/https://static.arxiv.org/static/browse/0.3.2.6/css/arXiv-print.css?v=20200611
