ATProto Browser

Experimental browser for the Atmosphere

Post

Result #1: For an "ideal" N, BoN actually achieves optimal performance if the base model obeys certain (stringent) notions of coverage.

However, we show that BoN provably suffers from reward hacking when N is large, and fails to achieve optimal performance under realistic coverage conditions.

4/11

May 3, 2025, 5:40 PM
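
The effect described in the post can be illustrated with a small toy simulation. This is only a sketch under stated assumptions, not the construction from the thread: BoN is read here as Best-of-N sampling, and the base model, `proxy_reward`, `true_reward`, and the 0.9 threshold are all hypothetical choices made for illustration. The proxy reward agrees with the true reward except on a rare tail of responses, where it overestimates.

```python
import random

def proxy_reward(x):
    # Hypothetical imperfect reward model: scores a response by its value.
    return x

def true_reward(x):
    # Hypothetical ground truth: matches the proxy except on a rare tail
    # (x > 0.9), where the proxy overestimates and real quality is zero.
    return x if x <= 0.9 else 0.0

def base_model(rng):
    # Toy base model: a "response" is a uniform draw from [0, 1).
    return rng.random()

def best_of_n(sample, reward, n, rng):
    """Draw n responses from the base model and keep the highest-scoring one."""
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=reward)

def average_true_reward(n, trials=2000, seed=0):
    """Average true reward of the Best-of-N pick over repeated trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        pick = best_of_n(base_model, proxy_reward, n, rng)
        total += true_reward(pick)
    return total / trials

if __name__ == "__main__":
    for n in (1, 4, 16, 64, 256):
        print(n, round(average_true_reward(n), 3))
```

In this toy setup, a moderate N tends to improve the true reward of the selected response, while a large N almost always selects a response from the tail where the proxy is wrong, so the true reward collapses. That mirrors the reward-hacking behavior the post attributes to large N, though only qualitatively.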

Record data

{
  "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3lobv4fewek2d",
  "cid": "bafyreierovwjkxd2nyk53bqgvtxb6jgvnlbhz2uyvmv5r3mh5zehbbwkpu",
  "value": {
    "text": "Result #1: For an \"ideal\" N, BoN actually achieves optimal performance if the base model obeys certain (stringent) notions of coverage.\n\nHowever, we show that BoN provably suffers from reward hacking when N is large, and fails to achieve optimal performance under realistic coverage conditions.\n\n4/11",
    "$type": "app.bsky.feed.post",
    "langs": [
      "en"
    ],
    "reply": {
      "root": {
        "cid": "bafyreih6mvnxmoz4bgad7vcbv2fl63qhwwggtht5lpckbzj7yexbnz2qie",
        "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3lobv4byuec2d"
      },
      "parent": {
        "cid": "bafyreiacwrqw7tnp5gu6s2udd65hohyyn6hp2lshzhrfy22ivxktbtohdy",
        "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3lobv4fevfc2d"
      }
    },
    "createdAt": "2025-05-03T17:40:49.562Z"
  }
}