ATProto Browser

{
  "text": "Parallel-form reliability tests whether different versions of a measure, designed to be equivalent, give consistent results. E.g., how consistent are different prompt formulations for evaluating LM responses on the same bias dataset? Are LMs sensitive to minor changes in how the questions are phrased?",
  "$type": "app.bsky.feed.post",
  "langs": [
    "en"
  ],
  "reply": {
    "root": {
      "cid": "bafyreidc5acxzktjgf3sratzwt6vei52wikhpe5aofvgygyobmo33gpwwy",
      "uri": "at://did:plc:jo6p6curyzzhgblcdwwso6qy/app.bsky.feed.post/3kjpqeivm4n2t"
    },
    "parent": {
      "cid": "bafyreigvvxzg74a5v4lyqfciaer2uokgxyc4w6jtnjspdlvdaphknpkbta",
      "uri": "at://did:plc:jo6p6curyzzhgblcdwwso6qy/app.bsky.feed.post/3kjpqojgwi62s"
    }
  },
  "createdAt": "2024-01-24T09:29:02.587Z"
}
Post

In reply to 3kjpqojgwi62s
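
The question raised in the post can be made concrete with a small sketch. Assuming each prompt formulation ("form") yields one score or one label per item of the same dataset, a parallel-form check might compare the two forms item by item: a Pearson correlation for numeric scores, and a simple agreement rate for categorical answers. The function names and the sample values below are hypothetical and only illustrate the idea; they do not come from the post or from any particular evaluation library.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numeric scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two prompt formulations give the same label."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two equal-length, non-empty label lists")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


if __name__ == "__main__":
    # Hypothetical per-item bias scores under two phrasings of the same question.
    form_a_scores = [0.10, 0.40, 0.35, 0.80, 0.60]
    form_b_scores = [0.15, 0.38, 0.30, 0.75, 0.65]
    print("score correlation:", round(pearson(form_a_scores, form_b_scores), 3))

    # Hypothetical categorical answers under the same two phrasings.
    form_a_labels = ["A", "B", "B", "A", "C"]
    form_b_labels = ["A", "B", "A", "A", "C"]
    print("label agreement:", agreement_rate(form_a_labels, form_b_labels))
```

A correlation or agreement rate close to 1 would suggest the two phrasings behave as equivalent forms; a low value would suggest the model is sensitive to the rewording.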