
web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis
Preview meta tags from the web.archive.org website.
Linked Hostnames (1)

Thumbnail

Search Engine Appearance
Updating the Lottery Ticket Hypothesis - AI Alignment Forum
Epistemic status: not confident enough to bet against someone who’s likely to understand this stuff. The lottery ticket hypothesis of neural network learning (as aptly described by Daniel Kokotajlo) roughly says: When the network is randomly initialized, there is a sub-network that is already decent at the task. Then, when training happens, that sub-network is reinforced and all other sub-networks are dampened so as to not interfere. This is a very simple, intuitive, and useful picture to have in mind, and the original paper presents interesting evidence for at least some form of the hypothesis. Unfortunately, the strongest forms of the hypothesis do not seem plausible - e.g. I doubt that today’s neural networks already contain dog-recognizing subcircuits at initialization. Modern neural networks are big, but not that big. Meanwhile, a cluster of research has shown that large neural networks approximate certain Bayesian models, involving phrases like “neural tangent kernel (NTK)” or “Gaussian process (GP)”. Mingard et al. show that these models explain the large majority of the good performance we see from large neural networks in practice. This view also implies a version of the lottery ticket hypothesis, but it has different implications for what the “lottery tickets” are. They’re not subcircuits of the initial net, but rather subcircuits of the parameter tangent space of the initial net. This post will sketch out what that means. Let’s start with the jargon: what’s the “parameter tangent space” of a neural net? Think of the network as a function f with two kinds of inputs: parameters θ, and data inputs x. During training, we try to adjust the parameters so that the function sends each data input x^(n) to the corresponding data output y^(n) - i.e. find θ for which y^(n) = f(x^(n), θ), for all n. Each data point gives an equation which θ must satisfy, in order for that data input to be exactly mapped to its target output. If our initial parameters θ_0 happen to be close enough to
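The description above breaks off mid-sentence while introducing the “parameter tangent space”. For context, in the NTK literature this is the family of first-order approximations of the network around its initial parameters; a minimal sketch of that linearization follows, with notation assumed here rather than taken verbatim from the original post:

\[
f(x,\theta) \;\approx\; f(x,\theta_0) + \nabla_\theta f(x,\theta_0)\cdot(\theta-\theta_0)
\]
\[
y^{(n)} \;\approx\; f(x^{(n)},\theta_0) + \nabla_\theta f(x^{(n)},\theta_0)\cdot(\theta-\theta_0) \quad \text{for all } n.
\]

On this reading, each training pair (x^(n), y^(n)) contributes one equation that is linear in θ, and the “subcircuits of the parameter tangent space” mentioned above are components of this linearized model rather than sub-networks of the initial weights.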
General Meta Tags (5)
- title: Updating the Lottery Ticket Hypothesis - AI Alignment Forum
- Accept-CH: DPR, Viewport-Width, Width
- charset: utf-8
- description: same text as the description shown under Search Engine Appearance
- viewport: width=device-width, initial-scale=1
Open Graph Meta Tags (5)
- og:image: https://web.archive.org/web/20210419010001im_/https://res.cloudinary.com/lesswrong-2-0/image/upload/v1503704344/sequencesgrid/h6vrwdypijqgsop7xwa0.jpg
- og:title: Updating the Lottery Ticket Hypothesis - AI Alignment Forum
- og:type: article
- og:url: https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis
- og:description: same text as the description shown under Search Engine Appearance
Twitter Meta Tags (3)
- twitter:image:src: https://res.cloudinary.com/lesswrong-2-0/image/upload/v1503704344/sequencesgrid/h6vrwdypijqgsop7xwa0.jpg
- twitter:card: summary
- twitter:description: same text as the description shown under Search Engine Appearance
Link Tags (10)
- alternate: https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/feed.xml
- canonical: https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis
- shortcut icon: https://web.archive.org/web/20210419010001im_/https://res.cloudinary.com/dq3pms5lt/image/upload/v1531267596/alignmentForum_favicon_o9bjnl.png
- stylesheet: https://web-static.archive.org/_static/css/banner-styles.css?v=p7PEIJWi
- stylesheet: https://web-static.archive.org/_static/css/iconochive.css?v=3PDvdIFv
Links (11)
- https://web.archive.org/web/20210419010001/https://arxiv.org/abs/1803.03635
- https://web.archive.org/web/20210419010001/https://towardsdatascience.com/neural-networks-are-fundamentally-bayesian-bee9a172fad8
- https://web.archive.org/web/20210419010001/https://www.alignmentforum.org
- https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis?commentId=3Z7z2C7bypfHTyEgQ
- https://web.archive.org/web/20210419010001/https://www.alignmentforum.org/posts/i9p5KWNWcthccsxqm/updating-the-lottery-ticket-hypothesis?commentId=icGMBokEpCKiLaLdx