ATProto Browser

Experimental browser for the Atmosphere

Post

Parallel-form reliability tests if different—but designed to be equivalent—versions of a measure are consistent. E.g., how consistent are different prompt formulations for evaluating the LM responses on the same bias dataset? Are LMs sensitive to minor changes to how the questions are phrased?

Jan 24, 2024, 9:29 AM

{
  "text": "Parallel-form reliability tests if different—but designed to be equivalent—versions of a measure are consistent. E.g., how consistent are different prompt formulations for evaluating the LM responses on the same bias dataset? Are LMs sensitive to minor changes to how the questions are phrased?",
  "$type": "app.bsky.feed.post",
  "langs": [
    "en"
  ],
  "reply": {
    "root": {
      "cid": "bafyreidc5acxzktjgf3sratzwt6vei52wikhpe5aofvgygyobmo33gpwwy",
      "uri": "at://did:plc:jo6p6curyzzhgblcdwwso6qy/app.bsky.feed.post/3kjpqeivm4n2t"
    },
    "parent": {
      "cid": "bafyreigvvxzg74a5v4lyqfciaer2uokgxyc4w6jtnjspdlvdaphknpkbta",
      "uri": "at://did:plc:jo6p6curyzzhgblcdwwso6qy/app.bsky.feed.post/3kjpqojgwi62s"
    }
  },
  "createdAt": "2024-01-24T09:29:02.587Z"
}
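
One way to make the post's question concrete is to score the same dataset items under each prompt formulation and correlate the per-item scores between formulations. The sketch below is a minimal illustration under stated assumptions, not a method taken from the post: it assumes each formulation yields one numeric bias score per item, it uses Pearson correlation as the consistency statistic (other choices such as rank correlation or agreement rates are equally plausible), and the names `pearson`, `parallel_form_reliability`, and the form labels are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


def parallel_form_reliability(scores_by_form):
    """Pairwise correlations between prompt formulations.

    scores_by_form maps a formulation label (hypothetical) to a list of
    per-item scores, with items in the same order for every formulation.
    Returns a dict keyed by (form_a, form_b) pairs.
    """
    forms = sorted(scores_by_form)
    if len(forms) < 2:
        raise ValueError("need at least two formulations to compare")
    return {
        (a, b): pearson(scores_by_form[a], scores_by_form[b])
        for a, b in combinations(forms, 2)
    }


if __name__ == "__main__":
    # Toy placeholder values for illustration only, not real measurements.
    toy_scores = {
        "form_a": [0.1, 0.4, 0.8, 0.3],
        "form_b": [0.2, 0.5, 0.7, 0.3],
    }
    for pair, r in parallel_form_reliability(toy_scores).items():
        print(pair, round(r, 3))
```

Correlations near 1 for every pair would suggest the formulations behave as equivalent forms of the measure; low or unstable correlations would suggest the model is sensitive to how the question is phrased.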