web.archive.org/web/20210222003608/https:/arxiv.org/abs/2102.08686

Preview meta tags from the web.archive.org website.

Search Engine Appearance

Google

https://web.archive.org/web/20210222003608/https:/arxiv.org/abs/2102.08686

Fully General Online Imitation Learning

In imitation learning, imitators and demonstrators are policies for picking actions given past interactions with the environment. If we run an imitator, we probably want events to unfold similarly to the way they would have if the demonstrator had been acting the whole time. No existing work provides formal guidance in how this might be accomplished, instead restricting focus to environments that restart, making learning unusually easy, and conveniently limiting the significance of any mistake. We address a fully general setting, in which the (stochastic) environment and demonstrator never reset, not even for training purposes. Our new conservative Bayesian imitation learner underestimates the probabilities of each available action, and queries for more data with the remaining probability. Our main result: if an event would have been unlikely had the demonstrator acted the whole time, that event's likelihood can be bounded above when running the (initially totally ignorant) imitator instead. Meanwhile, queries to the demonstrator rapidly diminish in frequency.
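
The abstract's conservative rule (underestimate each action's probability, then query the demonstrator with the leftover mass) can be illustrated with a small sketch. This is not the paper's construction. The finite model class, the dictionary form of each model's action distribution for the current history, and the threshold `alpha` are all assumptions made for illustration. Under that threshold rule, only models whose posterior weight is at least `alpha` times the largest weight count as plausible. The function names `conservative_policy`, `act`, `query_probability`, and `bayes_update` are likewise hypothetical.

```python
import math
import random
from typing import Dict, List, Optional

# Hypothetical representation (not from the paper): each candidate demonstrator
# model is summarized by its action distribution for the current history.
ActionDist = Dict[str, float]


def conservative_policy(models: List[ActionDist], weights: List[float],
                        alpha: float) -> ActionDist:
    """Underestimate each action's probability (illustrative assumption).

    Keeps the models whose posterior weight is at least alpha * (largest
    weight) and gives each action the minimum probability any of them
    assigns it, so the result sums to at most 1.
    """
    top = max(weights)
    plausible = [m for m, w in zip(models, weights) if w >= alpha * top]
    actions = sorted(set().union(*plausible))  # union of the models' action keys
    return {a: min(m.get(a, 0.0) for m in plausible) for a in actions}


def query_probability(models: List[ActionDist], weights: List[float],
                      alpha: float) -> float:
    """Leftover mass 1 - sum(underestimates): the chance of querying."""
    probs = conservative_policy(models, weights, alpha)
    return max(0.0, 1.0 - math.fsum(probs.values()))


def act(models: List[ActionDist], weights: List[float], alpha: float,
        rng: random.Random) -> Optional[str]:
    """Sample an action with its underestimated probability, or return
    None (meaning: ask the demonstrator) with the leftover probability."""
    probs = conservative_policy(models, weights, alpha)
    r = rng.random()
    for action, p in probs.items():
        if r < p:
            return action
        r -= p
    return None  # r landed in the leftover mass: query the demonstrator


def bayes_update(models: List[ActionDist], weights: List[float],
                 observed_action: str) -> List[float]:
    """Posterior reweighting after a query reveals the demonstrator's action."""
    unnormalized = [w * m.get(observed_action, 0.0)
                    for m, w in zip(models, weights)]
    total = math.fsum(unnormalized)
    if total == 0.0:
        raise ValueError("no model in the class explains the observed action")
    return [u / total for u in unnormalized]
```

For example, with models `{"left": 0.7, "right": 0.3}` and `{"left": 0.5, "right": 0.5}` at weights 0.6 and 0.4 and `alpha = 0.5`, both models stay plausible. The sketch takes "left" with probability 0.5 and "right" with probability 0.3, and it queries with the remaining 0.2. As `bayes_update` concentrates weight on models that agree with the demonstrator, fewer models pass the threshold, the minima rise, and the query probability falls. That is only an informal picture of the abstract's claim that queries diminish in frequency.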

  • General Meta Tags
    • title
      [2102.08686] Fully General Online Imitation Learning
  • Open Graph Meta Tags
    • og:site_name
      arXiv.org
    • og:title
      Fully General Online Imitation Learning
    • og:url
      https://web.archive.org/web/20210222082645/https://arxiv.org/abs/2102.08686v1
    • og:description
      (same text as the abstract above)
  • Twitter Meta Tags
    • twitter:site
      @arxiv
  • Link Tags
    • shortcut icon
      https://web.archive.org/web/20210222082645im_/https://static.arxiv.org/static/browse/0.3.2.6/images/icons/favicon.ico
    • stylesheet
      https://web-static.archive.org/_static/css/banner-styles.css?v=p7PEIJWi
    • stylesheet
      https://web-static.archive.org/_static/css/iconochive.css?v=3PDvdIFv
    • stylesheet
      https://web.archive.org/web/20210222082645cs_/https://static.arxiv.org/static/browse/0.3.2.6/css/arXiv.css?v=20200727
    • stylesheet
      https://web.archive.org/web/20210222082645cs_/https://static.arxiv.org/static/browse/0.3.2.6/css/arXiv-print.css?v=20200611