ATProto Browser

Experimental browser for the Atmosphere

Post

We told Claude it was being trained, and for what purpose. But we did not tell it to fake alignment. Regardless, we often observed alignment faking.

Read more about our findings, and their limitations, in our blog post:

Alignment faking in large language models (https://www.anthropic.com/research/alignment-faking)

Dec 18, 2024, 5:46 PM

Record data

{
  "uri": "at://did:plc:dsxewietk5tigqvn6daod2l6/app.bsky.feed.post/3ldlw2btmc22r",
  "cid": "bafyreifpzshpf3nyn2pslafwoebkd3zika5duj2vbgqpbf4eqqy5dhim2y",
  "value": {
    "text": "We told Claude it was being trained, and for what purpose. But we did not tell it to fake alignment. Regardless, we often observed alignment faking.\n\nRead more about our findings, and their limitations, in our blog post:",
    "$type": "app.bsky.feed.post",
    "embed": {
      "$type": "app.bsky.embed.external",
      "external": {
        "uri": "https://www.anthropic.com/research/alignment-faking",
        "thumb": {
          "$type": "blob",
          "ref": {
            "$link": "bafkreiebm4chc6z2zwjbhmhzxyknhddm32it4qbab5yyc7lb437l5sslbi"
          },
          "mimeType": "image/jpeg",
          "size": 808617
        },
        "title": "Alignment faking in large language models",
        "description": "A paper from Anthropic's Alignment Science team on Alignment Faking in AI large language models"
      }
    },
    "langs": [
      "en"
    ],
    "reply": {
      "root": {
        "cid": "bafyreihzgyc76623mey63q7wusk3uckjsl5q4jnumjzjipq6a4p4mcnpga",
        "uri": "at://did:plc:dsxewietk5tigqvn6daod2l6/app.bsky.feed.post/3ldlw22eto22r"
      },
      "parent": {
        "cid": "bafyreidc44eu7cl6oiu4raavby4wr3n3z36d2lkt2exvwsiuaj55hkmrdu",
        "uri": "at://did:plc:dsxewietk5tigqvn6daod2l6/app.bsky.feed.post/3ldlw2btlcs2r"
      }
    },
    "createdAt": "2024-12-18T17:46:57.675Z"
  }
}
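The record above links posts only through AT URIs (`uri`, `reply.root.uri`, `reply.parent.uri`). The sketch below shows how those references fit together. It uses only the Python standard library. It splits each URI into authority (a DID here), collection NSID and record key, then checks that all three posts live in the same repository and collection. The helper `parse_at_uri` and the `AtUri` type are hypothetical names for illustration. The parser handles only the simple `at://authority/collection/rkey` form seen in this record, not the full AT URI grammar.

```python
import json
from typing import NamedTuple


class AtUri(NamedTuple):
    """Components of a simple at://authority/collection/rkey URI."""
    authority: str   # a DID (did:plc:...) or a handle
    collection: str  # NSID of the record type, e.g. app.bsky.feed.post
    rkey: str        # record key within the collection


def parse_at_uri(uri: str) -> AtUri:
    """Split an AT URI of the form at://authority/collection/rkey.

    Hypothetical helper: it rejects query strings' extra path segments and
    URIs that stop at the authority or collection level instead of
    implementing the full AT URI syntax.
    """
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected authority/collection/rkey: {uri!r}")
    return AtUri(*parts)


# Abbreviated copy of the record shown above (only the fields used here).
RECORD_JSON = """
{
  "uri": "at://did:plc:dsxewietk5tigqvn6daod2l6/app.bsky.feed.post/3ldlw2btmc22r",
  "value": {
    "$type": "app.bsky.feed.post",
    "reply": {
      "root": {"uri": "at://did:plc:dsxewietk5tigqvn6daod2l6/app.bsky.feed.post/3ldlw22eto22r"},
      "parent": {"uri": "at://did:plc:dsxewietk5tigqvn6daod2l6/app.bsky.feed.post/3ldlw2btlcs2r"}
    }
  }
}
"""

record = json.loads(RECORD_JSON)
post = parse_at_uri(record["uri"])
root = parse_at_uri(record["value"]["reply"]["root"]["uri"])
parent = parse_at_uri(record["value"]["reply"]["parent"]["uri"])

# Same DID for post, root and parent: the account is replying in its own thread.
same_repo = post.authority == root.authority == parent.authority
# The collection in each URI matches the record's declared $type.
same_collection = (
    post.collection == root.collection == parent.collection == record["value"]["$type"]
)
print(post.rkey, same_repo, same_collection)  # prints: 3ldlw2btmc22r True True
```

The parent's record key (`3ldlw2btlcs2r`) differs from the root's (`3ldlw22eto22r`). So this post replies to an intermediate post rather than directly to the thread root, which makes it at least the third post in the thread.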