Experimental browser for the Atmosphere
{ "uri": "at://did:plc:evvussoazdkvsld475dfbuci/app.bsky.feed.like/3ljojdohawm2p", "cid": "bafyreide4ar5p6jtejnjxs5yaqyaknxkmhnqnu5pmpx7yfcyrsj7dtza44", "value": { "$type": "app.bsky.feed.like", "subject": { "cid": "bafyreiayktrh3ay3u4nqjzgfn4ke4d6ap5uf52e6npfdlhbazf4z4fq4zu", "uri": "at://did:plc:aevciz6kmhv2gpzy6lgemnpy/app.bsky.feed.post/3ljogaslqes2n" }, "createdAt": "2025-03-06T03:08:50.867Z" } }
Every AI industry lab should have an internal “Inspector General” that challenges internal evals/results. “Are you sure we beat this benchmark or is the training set contaminated? Is this benchmark even useful? Etc.” Might help find mismatches between benchmarks & customer experiences/vibe checks.
Mar 6, 2025, 2:13 AM