web.archive.org/web/20210419010001/https:/www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis

Preview meta tags from the web.archive.org website.

Linked Hostnames

1


Search Engine Appearance

Google

https://web.archive.org/web/20210419010001/https:/www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis

Updating the Lottery Ticket Hypothesis - AI Alignment Forum

Epistemic status: not confident enough to bet against someone who’s likely to understand this stuff. The lottery ticket hypothesis of neural network learning (as aptly described by Daniel Kokotajlo) roughly says: When the network is randomly initialized, there is a sub-network that is already decent at the task. Then, when training happens, that sub-network is reinforced and all other sub-networks are dampened so as to not interfere. This is a very simple, intuitive, and useful picture to have in mind, and the original paper presents interesting evidence for at least some form of the hypothesis. Unfortunately, the strongest forms of the hypothesis do not seem plausible - e.g. I doubt that today’s neural networks already contain dog-recognizing subcircuits at initialization. Modern neural networks are big, but not that big. Meanwhile, a cluster of research has shown that large neural networks approximate certain Bayesian models, involving phrases like “neural tangent kernel (NTK)” or “Gaussian process (GP)”. Mingard et al. show that these models explain the large majority of the good performance we see from large neural networks in practice. This view also implies a version of the lottery ticket hypothesis, but it has different implications for what the “lottery tickets” are. They’re not subcircuits of the initial net, but rather subcircuits of the parameter tangent space of the initial net. This post will sketch out what that means. Let’s start with the jargon: what’s the “parameter tangent space” of a neural net? Think of the network as a function f with two kinds of inputs: parameters θ, and data inputs x. During training, we try to adjust the parameters so that the function sends each data input x^(n) to the corresponding data output y^(n) - i.e. find θ for which y^(n) = f(x^(n), θ), for all n. Each data point gives an equation which θ must satisfy, in order for that data input to be exactly mapped to its target output. If our initial parameters θ_0 happen to be close enough to
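
For context on where the truncated excerpt is headed: in the NTK literature, the “parameter tangent space” is standardly obtained by a first-order expansion of f in its parameters around the initialization θ_0. A minimal sketch of that linearization, using the excerpt’s own symbols (not quoted from the archived page):

    f(x, θ) ≈ f(x, θ_0) + ∇_θ f(x, θ_0) · (θ - θ_0)

Under this approximation, each training equation y^(n) = f(x^(n), θ) becomes a linear constraint on the parameter change Δθ = θ - θ_0, which is the setting in which the NTK/GP results mentioned in the excerpt apply.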



Bing

Updating the Lottery Ticket Hypothesis - AI Alignment Forum

https://web.archive.org/web/20210419010001/https:/www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis

Epistemic status: not confident enough to bet against someone who’s likely to understand this stuff. The lottery ticket hypothesis of neural network learning (as aptly described by Daniel Kokotajlo) roughly says: When the network is randomly initialized, there is a sub-network that is already decent at the task. Then, when training happens, that sub-network is reinforced and all other sub-networks are dampened so as to not interfere. This is a very simple, intuitive, and useful picture to have in mind, and the original paper presents interesting evidence for at least some form of the hypothesis. Unfortunately, the strongest forms of the hypothesis do not seem plausible - e.g. I doubt that today’s neural networks already contain dog-recognizing subcircuits at initialization. Modern neural networks are big, but not that big. Meanwhile, a cluster of research has shown that large neural networks approximate certain Bayesian models, involving phrases like “neural tangent kernel (NTK)” or “Gaussian process (GP)”. Mingard et al. show that these models explain the large majority of the good performance we see from large neural networks in practice. This view also implies a version of the lottery ticket hypothesis, but it has different implications for what the “lottery tickets” are. They’re not subcircuits of the initial net, but rather subcircuits of the parameter tangent space of the initial net. This post will sketch out what that means. Let’s start with the jargon: what’s the “parameter tangent space” of a neural net? Think of the network as a function f with two kinds of inputs: parameters θ, and data inputs x. During training, we try to adjust the parameters so that the function sends each data input x^(n) to the corresponding data output y^(n) - i.e. find θ for which y^(n) = f(x^(n), θ), for all n. Each data point gives an equation which θ must satisfy, in order for that data input to be exactly mapped to its target output. If our initial parameters θ_0 happen to be close enough to



DuckDuckGo

https://web.archive.org/web/20210419010001/https:/www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis

Updating the Lottery Ticket Hypothesis - AI Alignment Forum

Epistemic status: not confident enough to bet against someone who’s likely to understand this stuff. The lottery ticket hypothesis of neural network learning (as aptly described by Daniel Kokotajlo) roughly says: When the network is randomly initialized, there is a sub-network that is already decent at the task. Then, when training happens, that sub-network is reinforced and all other sub-networks are dampened so as to not interfere. This is a very simple, intuitive, and useful picture to have in mind, and the original paper presents interesting evidence for at least some form of the hypothesis. Unfortunately, the strongest forms of the hypothesis do not seem plausible - e.g. I doubt that today’s neural networks already contain dog-recognizing subcircuits at initialization. Modern neural networks are big, but not that big. Meanwhile, a cluster of research has shown that large neural networks approximate certain Bayesian models, involving phrases like “neural tangent kernel (NTK)” or “Gaussian process (GP)”. Mingard et al. show that these models explain the large majority of the good performance we see from large neural networks in practice. This view also implies a version of the lottery ticket hypothesis, but it has different implications for what the “lottery tickets” are. They’re not subcircuits of the initial net, but rather subcircuits of the parameter tangent space of the initial net. This post will sketch out what that means. Let’s start with the jargon: what’s the “parameter tangent space” of a neural net? Think of the network as a function f with two kinds of inputs: parameters θ, and data inputs x. During training, we try to adjust the parameters so that the function sends each data input x^(n) to the corresponding data output y^(n) - i.e. find θ for which y^(n) = f(x^(n), θ), for all n. Each data point gives an equation which θ must satisfy, in order for that data input to be exactly mapped to its target output. If our initial parameters θ_0 happen to be close enough to

  • General Meta Tags

    5
    • title
      Updating the Lottery Ticket Hypothesis - AI Alignment Forum
    • Accept-CH
      DPR, Viewport-Width, Width
    • charset
      utf-8
    • description
      Epistemic status: not confident enough to bet against someone who’s likely to understand this stuff. The lottery ticket hypothesis of neural network learning (as aptly described by Daniel Kokotajlo) roughly says: When the network is randomly initialized, there is a sub-network that is already decent at the task. Then, when training happens, that sub-network is reinforced and all other sub-networks are dampened so as to not interfere. This is a very simple, intuitive, and useful picture to have in mind, and the original paper presents interesting evidence for at least some form of the hypothesis. Unfortunately, the strongest forms of the hypothesis do not seem plausible - e.g. I doubt that today’s neural networks already contain dog-recognizing subcircuits at initialization. Modern neural networks are big, but not that big. Meanwhile, a cluster of research has shown that large neural networks approximate certain Bayesian models, involving phrases like “neural tangent kernel (NTK)” or “Gaussian process (GP)”. Mingard et al. show that these models explain the large majority of the good performance we see from large neural networks in practice. This view also implies a version of the lottery ticket hypothesis, but it has different implications for what the “lottery tickets” are. They’re not subcircuits of the initial net, but rather subcircuits of the parameter tangent space of the initial net. This post will sketch out what that means. Let’s start with the jargon: what’s the “parameter tangent space” of a neural net? Think of the network as a function f with two kinds of inputs: parameters θ, and data inputs x. During training, we try to adjust the parameters so that the function sends each data input x^(n) to the corresponding data output y^(n) - i.e. find θ for which y^(n) = f(x^(n), θ), for all n. Each data point gives an equation which θ must satisfy, in order for that data input to be exactly mapped to its target output. If our initial parameters θ_0 happen to be close enough to
    • viewport
      width=device-width, initial-scale=1
  • Open Graph Meta Tags

    5
    • og:image
      https://web.archive.org/web/20210419010001im_/https://res.cloudinary.com/lesswrong-2-0/image/upload/v1503704344/sequencesgrid/h6vrwdypijqgsop7xwa0.jpg
    • og:title
      Updating the Lottery Ticket Hypothesis - AI Alignment Forum
    • og:type
      article
    • og:url
      https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis
    • og:description
      Epistemic status: not confident enough to bet against someone who’s likely to understand this stuff. The lottery ticket hypothesis of neural network learning (as aptly described by Daniel Kokotajlo) roughly says: When the network is randomly initialized, there is a sub-network that is already decent at the task. Then, when training happens, that sub-network is reinforced and all other sub-networks are dampened so as to not interfere. This is a very simple, intuitive, and useful picture to have in mind, and the original paper presents interesting evidence for at least some form of the hypothesis. Unfortunately, the strongest forms of the hypothesis do not seem plausible - e.g. I doubt that today’s neural networks already contain dog-recognizing subcircuits at initialization. Modern neural networks are big, but not that big. Meanwhile, a cluster of research has shown that large neural networks approximate certain Bayesian models, involving phrases like “neural tangent kernel (NTK)” or “Gaussian process (GP)”. Mingard et al. show that these models explain the large majority of the good performance we see from large neural networks in practice. This view also implies a version of the lottery ticket hypothesis, but it has different implications for what the “lottery tickets” are. They’re not subcircuits of the initial net, but rather subcircuits of the parameter tangent space of the initial net. This post will sketch out what that means. Let’s start with the jargon: what’s the “parameter tangent space” of a neural net? Think of the network as a function f with two kinds of inputs: parameters θ, and data inputs x. During training, we try to adjust the parameters so that the function sends each data input x^(n) to the corresponding data output y^(n) - i.e. find θ for which y^(n) = f(x^(n), θ), for all n. Each data point gives an equation which θ must satisfy, in order for that data input to be exactly mapped to its target output. If our initial parameters θ_0 happen to be close enough to
  • Twitter Meta Tags

    3
    • twitter:image:src
      https://res.cloudinary.com/lesswrong-2-0/image/upload/v1503704344/sequencesgrid/h6vrwdypijqgsop7xwa0.jpg
    • twitter:card
      summary
    • twitter:description
      Epistemic status: not confident enough to bet against someone who’s likely to understand this stuff. The lottery ticket hypothesis of neural network learning (as aptly described by Daniel Kokotajlo) roughly says: When the network is randomly initialized, there is a sub-network that is already decent at the task. Then, when training happens, that sub-network is reinforced and all other sub-networks are dampened so as to not interfere. This is a very simple, intuitive, and useful picture to have in mind, and the original paper presents interesting evidence for at least some form of the hypothesis. Unfortunately, the strongest forms of the hypothesis do not seem plausible - e.g. I doubt that today’s neural networks already contain dog-recognizing subcircuits at initialization. Modern neural networks are big, but not that big. Meanwhile, a cluster of research has shown that large neural networks approximate certain Bayesian models, involving phrases like “neural tangent kernel (NTK)” or “Gaussian process (GP)”. Mingard et al. show that these models explain the large majority of the good performance we see from large neural networks in practice. This view also implies a version of the lottery ticket hypothesis, but it has different implications for what the “lottery tickets” are. They’re not subcircuits of the initial net, but rather subcircuits of the parameter tangent space of the initial net. This post will sketch out what that means. Let’s start with the jargon: what’s the “parameter tangent space” of a neural net? Think of the network as a function f with two kinds of inputs: parameters θ, and data inputs x. During training, we try to adjust the parameters so that the function sends each data input x^(n) to the corresponding data output y^(n) - i.e. find θ for which y^(n) = f(x^(n), θ), for all n. Each data point gives an equation which θ must satisfy, in order for that data input to be exactly mapped to its target output. If our initial parameters θ_0 happen to be close enough to
  • Link Tags

    10
    • alternate
      https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/feed.xml
    • canonical
      https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis
    • shortcut icon
      https://web.archive.org/web/20210419010001im_/https://res.cloudinary.com/dq3pms5lt/image/upload/v1531267596/alignmentForum_favicon_o9bjnl.png
    • stylesheet
      https://web-static.archive.org/_static/css/banner-styles.css?v=p7PEIJWi
    • stylesheet
      https://web-static.archive.org/_static/css/iconochive.css?v=3PDvdIFv

Links

11