
web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis
Preview meta tags from the web.archive.org website.
Linked Hostnames (1)

Thumbnail

Search Engine Appearance
Updating the Lottery Ticket Hypothesis - AI Alignment Forum
Epistemic status: not confident enough to bet against someone who’s likely to understand this stuff. The lottery ticket hypothesis of neural network learning (as aptly described by Daniel Kokotajlo) roughly says: When the network is randomly initialized, there is a sub-network that is already decent at the task. Then, when training happens, that sub-network is reinforced and all other sub-networks are dampened so as to not interfere. This is a very simple, intuitive, and useful picture to have in mind, and the original paper presents interesting evidence for at least some form of the hypothesis. Unfortunately, the strongest forms of the hypothesis do not seem plausible - e.g. I doubt that today’s neural networks already contain dog-recognizing subcircuits at initialization. Modern neural networks are big, but not that big. Meanwhile, a cluster of research has shown that large neural networks approximate certain Bayesian models, involving phrases like “neural tangent kernel (NTK)” or “Gaussian process (GP)”. Mingard et al. show that these models explain the large majority of the good performance we see from large neural networks in practice. This view also implies a version of the lottery ticket hypothesis, but it has different implications for what the “lottery tickets” are. They’re not subcircuits of the initial net, but rather subcircuits of the parameter tangent space of the initial net. This post will sketch out what that means. Let’s start with the jargon: what’s the “parameter tangent space” of a neural net? Think of the network as a function f with two kinds of inputs: parameters θ, and data inputs x. During training, we try to adjust the parameters so that the function sends each data input x^(n) to the corresponding data output y^(n) - i.e. find θ for which y^(n) = f(x^(n), θ), for all n. Each data point gives an equation which θ must satisfy, in order for that data input to be exactly mapped to its target output. If our initial parameters θ_0 happen to be close enough to
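The description above breaks off mid-sentence while introducing the “parameter tangent space”. For context, in the NTK literature this is the family of first-order approximations of the network around its initial parameters; a minimal sketch of that linearization follows, with notation assumed here rather than taken verbatim from the original post:

\[
f(x,\theta) \;\approx\; f(x,\theta_0) + \nabla_\theta f(x,\theta_0)\cdot(\theta-\theta_0)
\]
\[
y^{(n)} \;\approx\; f(x^{(n)},\theta_0) + \nabla_\theta f(x^{(n)},\theta_0)\cdot(\theta-\theta_0) \quad \text{for all } n.
\]

On this reading, each training pair (x^(n), y^(n)) contributes one equation that is linear in θ, and the “subcircuits of the parameter tangent space” mentioned above are components of this linearized model rather than sub-networks of the initial weights.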
General Meta Tags (5)
- title: Updating the Lottery Ticket Hypothesis - AI Alignment Forum
- Accept-CH: DPR, Viewport-Width, Width
- charset: utf-8
- description: same text as the description shown under Search Engine Appearance
- viewport: width=device-width, initial-scale=1
Open Graph Meta Tags (5)
- og:image: https://web.archive.org/web/20210419010001im_/https://res.cloudinary.com/lesswrong-2-0/image/upload/v1503704344/sequencesgrid/h6vrwdypijqgsop7xwa0.jpg
- og:title: Updating the Lottery Ticket Hypothesis - AI Alignment Forum
- og:type: article
- og:url: https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis
- og:description: same text as the description shown under Search Engine Appearance
Twitter Meta Tags (3)
- twitter:image:src: https://res.cloudinary.com/lesswrong-2-0/image/upload/v1503704344/sequencesgrid/h6vrwdypijqgsop7xwa0.jpg
- twitter:card: summary
- twitter:description: same text as the description shown under Search Engine Appearance
Link Tags (10)
- alternate: https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/feed.xml
- canonical: https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis
- shortcut icon: https://web.archive.org/web/20210419010001im_/https://res.cloudinary.com/dq3pms5lt/image/upload/v1531267596/alignmentForum_favicon_o9bjnl.png
- stylesheet: https://web-static.archive.org/_static/css/banner-styles.css?v=p7PEIJWi
- stylesheet: https://web-static.archive.org/_static/css/iconochive.css?v=3PDvdIFv
Links (11)
- https://web.archive.org/web/20210419010001/https://arxiv.org/abs/1803.03635
- https://web.archive.org/web/20210419010001/https://towardsdatascience.com/neural-networks-are-fundamentally-bayesian-bee9a172fad8
- https://web.archive.org/web/20210419010001/https://www.alignmentforum.org
- https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis?commentId=3Z7z2C7bypfHTyEgQ
- https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis?commentId=icGMBokEpCKiLaLdx