web.archive.org/web/20210222003608/https:/arxiv.org/abs/2102.08686
Preview meta tags from the web.archive.org website.
Linked Hostnames (1)
Search Engine Appearance
Fully General Online Imitation Learning
In imitation learning, imitators and demonstrators are policies for picking actions given past interactions with the environment. If we run an imitator, we probably want events to unfold similarly to the way they would have if the demonstrator had been acting the whole time. No existing work provides formal guidance in how this might be accomplished, instead restricting focus to environments that restart, making learning unusually easy, and conveniently limiting the significance of any mistake. We address a fully general setting, in which the (stochastic) environment and demonstrator never reset, not even for training purposes. Our new conservative Bayesian imitation learner underestimates the probabilities of each available action, and queries for more data with the remaining probability. Our main result: if an event would have been unlikely had the demonstrator acted the whole time, that event's likelihood can be bounded above when running the (initially totally ignorant) imitator instead. Meanwhile, queries to the demonstrator rapidly diminish in frequency.
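The abstract's description of the conservative imitator (underestimate each action's probability, query with the remaining probability) can be illustrated with a short sketch. This is not the paper's construction, which the abstract does not spell out. It assumes a finite list of candidate demonstrator models, a posterior weight for each, and a hypothetical threshold `alpha` that decides which models count as plausible. The function names are illustrative only.

```python
import random


def conservative_action_probs(posterior, models, history, actions, alpha=0.5):
    """Return (underestimated action probabilities, query probability).

    posterior: list of posterior weights, one per candidate model (assumed given).
    models:    list of callables, model(history) -> dict mapping action -> probability.
    alpha:     hypothetical plausibility threshold, not taken from the source.
    """
    top_weight = max(posterior)
    # Keep every model whose weight is within a factor alpha of the best one.
    plausible = [m for w, m in zip(posterior, models) if w >= alpha * top_weight]
    predictions = [m(history) for m in plausible]
    # Underestimate: take the smallest probability any plausible model assigns.
    probs = {a: min(p.get(a, 0.0) for p in predictions) for a in actions}
    # Whatever probability mass is left over is spent on asking the demonstrator.
    query_prob = max(0.0, 1.0 - sum(probs.values()))
    return probs, query_prob


def act_or_query(probs, rng=random):
    """Sample an action, or return None to signal a query to the demonstrator."""
    r = rng.random()
    cumulative = 0.0
    for action, p in probs.items():
        cumulative += p
        if r < cumulative:
            return action
    return None  # r fell in the leftover mass: query instead of acting
```

Under these assumptions, the query probability is large while plausible models disagree and shrinks toward zero as the posterior concentrates on one model, which is the qualitative behaviour the abstract describes.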
General Meta Tags (16)
- title: [2102.08686] Fully General Online Imitation Learning
- title: open search
- title: open navigation menu
- title: contact arXiv
- title: subscribe to arXiv mailings
Open Graph Meta Tags (4)
- og:site_name: arXiv.org
- og:title: Fully General Online Imitation Learning
- og:url: https://web.archive.org/web/20210222082645/https://arxiv.org/abs/2102.08686v1
Twitter Meta Tags (1)
- twitter:site: @arxiv
Link Tags (8)
- shortcut icon: https://web.archive.org/web/20210222082645im_/https://static.arxiv.org/static/browse/0.3.2.6/images/icons/favicon.ico
- stylesheet: https://web-static.archive.org/_static/css/banner-styles.css?v=p7PEIJWi
- stylesheet: https://web-static.archive.org/_static/css/iconochive.css?v=3PDvdIFv
- stylesheet: https://web.archive.org/web/20210222082645cs_/https://static.arxiv.org/static/browse/0.3.2.6/css/arXiv.css?v=20200727
- stylesheet: https://web.archive.org/web/20210222082645cs_/https://static.arxiv.org/static/browse/0.3.2.6/css/arXiv-print.css?v=20200611
Links (49)
- https://web.archive.org/web/20210222082645/http://creativecommons.org/licenses/by/4.0
- https://web.archive.org/web/20210222082645/https://api.semanticscholar.org/arXiv:2102.08686
- https://web.archive.org/web/20210222082645/https://arxiv.org
- https://web.archive.org/web/20210222082645/https://arxiv.org/about
- https://web.archive.org/web/20210222082645/https://arxiv.org/about/ourmembers