aligned.substack.com/p/alignment-solution/comment/10154757

Preview meta tags from the aligned.substack.com website.

Linked Hostnames

2

Search Engine Appearance

Google

https://aligned.substack.com/p/alignment-solution/comment/10154757

Michael Oesterle on Musings on the Alignment Problem

Thank you for this informative and motivating post! There are a few points on which I would like to comment:

#2: “One possible path to achieve the outcome of an idealized process with significantly less effort than actually running it is to build a sufficiently capable and aligned AI system and have it figure out what the outcome would be. However, I expect that most people would not regard this substitute process as legitimate.” In my opinion, what makes this approach dangerous is that the answer such an AI gives to the alignment problem influences how we treat *this very* AI (and all other AIs) going forward. As soon as the AI figures out that we will use its output in this way, its behavior becomes strategic, adding a strong incentive for breaking free from its alignment and pursuing its own objectives (maybe that’s simply an instrumental goal like survival to start with).

#2: I’m somewhat unsatisfied with the entire “emulating human values in AI models” approach. Apart from the difficulties you describe, I see the much more fundamental problem that human preferences might just not be very “good” compared to what’s possible. Two quite straightforward aspects are: (a) human preferences about specific situations might not perfectly capture abstract human values, due to various biases, and (b) human values might be systematically flawed, due to the fact that we’re, well, humans. Therefore, I would extend your argument that “with our automated alignment researcher we don’t need to restrict the search space to alignment techniques humans could devise” to the search space of consistent moral value systems, such that we’re no longer restricted to what *we* can conceive (of course, this would instead require some higher-level description of desiderata for such value systems).

#4: “If we want to prove something about a GPT-3-sized 175 billion parameter model, our theorem’s size is going to be at least 175GB.” Is your assumption that 175B parameters are *necessary* to capture the capabilities of GPT-3? It seems non-trivial to me to show that the same capabilities cannot be obtained by a much smaller model for *some* combination of initial configuration and training data. If this were possible, we could potentially describe (and make provable claims about) such a system in a much more compact form. I would be excited to hear your opinion!
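
Not from the post itself, but as a way to make the first #2 point concrete: the worry is essentially game-theoretic. Once the model anticipates that its answer determines how it will be treated, answering truthfully and maximizing its own objective can come apart. A minimal toy sketch of that incentive, where the report options, the oversight policies, and all payoff numbers are made up purely for illustration:

```python
# Toy feedback loop from the first #2 point: the AI's report about its own
# alignment determines the oversight regime applied to it afterwards.
# All policies and payoffs below are illustrative assumptions.

# How the operators respond to each possible report (the feedback loop).
oversight_policy = {
    "honest report: 'not fully aligned'": "tight oversight",
    "strategic report: 'aligned'": "light oversight",
}

# Utility the AI attaches to each resulting regime, assuming it has some
# objective of its own that tight oversight would frustrate.
utility_of_regime = {
    "tight oversight": 1.0,
    "light oversight": 5.0,
}

def value_of(report: str) -> float:
    """Value of a report once the AI models how the report will be used."""
    return utility_of_regime[oversight_policy[report]]

# As soon as the consequences of the answer are modelled, the strategic
# report dominates the honest one.
print(max(oversight_policy, key=value_of))
```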
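
The second #2 point asks for a search over “consistent moral value systems”. One minimal, purely formal reading of “consistent” is that the system’s pairwise preferences admit a transitive ranking, i.e. contain no cycles. A small sketch of such a check, with hypothetical example preferences:

```python
# Minimal consistency check for a set of pairwise preferences: treat each
# "a is preferred to b" as a directed edge and require the resulting graph
# to be acyclic, so that some transitive ranking of the options exists.
from graphlib import TopologicalSorter, CycleError

def is_consistent(preferences: list[tuple[str, str]]) -> bool:
    graph: dict[str, set[str]] = {}
    for preferred, other in preferences:
        graph.setdefault(preferred, set()).add(other)
        graph.setdefault(other, set())
    try:
        list(TopologicalSorter(graph).static_order())
        return True
    except CycleError:
        return False

# Hypothetical examples: the first set of preferences forms a cycle, so no
# transitive value ordering can rationalize it; the second set is fine.
print(is_consistent([("A", "B"), ("B", "C"), ("C", "A")]))  # False
print(is_consistent([("A", "B"), ("B", "C")]))              # True
```

This is of course only the weakest imaginable desideratum; the harder part the comment alludes to is specifying the higher-level desiderata for such value systems.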
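
On #4: the quoted “at least 175GB” figure presumably counts roughly one byte per parameter, so the theorem size scales linearly with the number of parameters needed to specify the model. The comment’s question is then whether a much smaller model (for *some* initial configuration and training data) could have the same capabilities, which would shrink the object of the proof proportionally. A back-of-the-envelope sketch, where the bytes-per-parameter values and the hypothetical smaller model size are assumptions for illustration:

```python
# Rough size of the object a theorem would have to describe, at one byte per
# parameter (the implicit assumption behind "175B parameters -> 175GB") and
# at fp16 (two bytes per parameter) for comparison.
GB = 10**9

def model_description_gb(n_params: int, bytes_per_param: float) -> float:
    return n_params * bytes_per_param / GB

gpt3_params = 175_000_000_000
print(model_description_gb(gpt3_params, 1))   # ~175 GB, as in the quoted claim
print(model_description_gb(gpt3_params, 2))   # ~350 GB at fp16

# Hypothetical: if a 1B-parameter model (plus its training recipe) reproduced
# the relevant capabilities, the object to reason about would be two orders
# of magnitude smaller.
print(model_description_gb(1_000_000_000, 1))  # ~1 GB
```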



  • General Meta Tags

    17
    • title
      Comments - What could a solution to the alignment problem look like?
  • Open Graph Meta Tags

    7
    • og:url
      https://aligned.substack.com/p/alignment-solution/comment/10154757
    • og:image
      https://substackcdn.com/image/fetch/$s_!yEEV!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Faligned.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D-455970578%26version%3D9
    • og:type
      article
    • og:title
      Michael Oesterle on Musings on the Alignment Problem
  • Twitter Meta Tags

    8
    • twitter:image
      https://substackcdn.com/image/fetch/$s_!yEEV!,f_auto,q_auto:best,fl_progressive:steep/https%3A%2F%2Faligned.substack.com%2Ftwitter%2Fsubscribe-card.jpg%3Fv%3D-455970578%26version%3D9
    • twitter:card
      summary_large_image
    • twitter:label1
      Likes
    • twitter:data1
      0
    • twitter:label2
      Replies
  • Link Tags

    19
    • alternate
      /feed
    • apple-touch-icon
      https://substackcdn.com/icons/substack/apple-touch-icon.png
    • canonical
      https://aligned.substack.com/p/alignment-solution/comment/10154757
    • icon
      https://substackcdn.com/icons/substack/icon.svg
    • preconnect
      https://substackcdn.com

Links

16