We told Claude it was being trained, and for what purpose. But we did not tell it to fake alignment. Regardless, we often observed alignment faking. Read more about our findings, and their limitations, in our blog post:
Dec 18, 2024, 5:46 PM
{
  "uri": "at://did:plc:dsxewietk5tigqvn6daod2l6/app.bsky.feed.post/3ldlw2btmc22r",
  "cid": "bafyreifpzshpf3nyn2pslafwoebkd3zika5duj2vbgqpbf4eqqy5dhim2y",
  "value": {
    "text": "We told Claude it was being trained, and for what purpose. But we did not tell it to fake alignment. Regardless, we often observed alignment faking.\n\nRead more about our findings, and their limitations, in our blog post:",
    "$type": "app.bsky.feed.post",
    "embed": {
      "$type": "app.bsky.embed.external",
      "external": {
        "uri": "https://www.anthropic.com/research/alignment-faking",
        "thumb": {
          "$type": "blob",
          "ref": {
            "$link": "bafkreiebm4chc6z2zwjbhmhzxyknhddm32it4qbab5yyc7lb437l5sslbi"
          },
          "mimeType": "image/jpeg",
          "size": 808617
        },
        "title": "Alignment faking in large language models",
        "description": "A paper from Anthropic's Alignment Science team on Alignment Faking in AI large language models"
      }
    },
    "langs": [
      "en"
    ],
    "reply": {
      "root": {
        "cid": "bafyreihzgyc76623mey63q7wusk3uckjsl5q4jnumjzjipq6a4p4mcnpga",
        "uri": "at://did:plc:dsxewietk5tigqvn6daod2l6/app.bsky.feed.post/3ldlw22eto22r"
      },
      "parent": {
        "cid": "bafyreidc44eu7cl6oiu4raavby4wr3n3z36d2lkt2exvwsiuaj55hkmrdu",
        "uri": "at://did:plc:dsxewietk5tigqvn6daod2l6/app.bsky.feed.post/3ldlw2btlcs2r"
      }
    },
    "createdAt": "2024-12-18T17:46:57.675Z"
  }
}