Experimental browser for the Atmosphere
{ "uri": "at://did:plc:evvussoazdkvsld475dfbuci/app.bsky.feed.like/3ljojdohawm2p", "cid": "bafyreide4ar5p6jtejnjxs5yaqyaknxkmhnqnu5pmpx7yfcyrsj7dtza44", "value": { "$type": "app.bsky.feed.like", "subject": { "cid": "bafyreiayktrh3ay3u4nqjzgfn4ke4d6ap5uf52e6npfdlhbazf4z4fq4zu", "uri": "at://did:plc:aevciz6kmhv2gpzy6lgemnpy/app.bsky.feed.post/3ljogaslqes2n" }, "createdAt": "2025-03-06T03:08:50.867Z" } }
Every AI industry lab should have an internal “Inspector General” that challenges internal evals/results. “Are you sure we beat this benchmark or is the training set contaminated? Is this benchmark even useful? Etc.” Might help find mismatches between benchmarks & customer experiences/vibe checks.
Mar 6, 2025, 2:13 AM