Result #1: For an "ideal" N, BoN actually achieves optimal performance if the base model obeys certain (stringent) notions of coverage.

However, we show that BoN provably suffers from reward hacking when N is large, and fails to achieve optimal performance under realistic coverage conditions.

4/11
May 3, 2025, 5:40 PM
{ "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3lobv4fewek2d", "cid": "bafyreierovwjkxd2nyk53bqgvtxb6jgvnlbhz2uyvmv5r3mh5zehbbwkpu", "value": { "text": "Result #1: For an \"ideal\" N, BoN actually achieves optimal performance if the base model obeys certain (stringent) notions of coverage.\n\nHowever, we show that BoN provably suffers from reward hacking when N is large, and fails to achieve optimal performance under realistic coverage conditions.\n\n4/11", "$type": "app.bsky.feed.post", "langs": [ "en" ], "reply": { "root": { "cid": "bafyreih6mvnxmoz4bgad7vcbv2fl63qhwwggtht5lpckbzj7yexbnz2qie", "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3lobv4byuec2d" }, "parent": { "cid": "bafyreiacwrqw7tnp5gu6s2udd65hohyyn6hp2lshzhrfy22ivxktbtohdy", "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3lobv4fevfc2d" } }, "createdAt": "2025-05-03T17:40:49.562Z" } }
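The claim in the post, that Best-of-N (BoN) helps at a moderate N but suffers from reward hacking at large N, can be illustrated with a toy simulation. This is only a sketch under invented assumptions, not the construction from the thread: the base policy, the `true_reward` and `proxy_reward` functions, and the 1% "overrated" region are all hypothetical and chosen so the effect is easy to see.

```python
import random


def best_of_n(sample, proxy_reward, n, rng):
    """Draw n candidates from the base policy and keep the one the proxy scores highest."""
    candidates = [sample(rng) for _ in range(n)]
    return max(candidates, key=proxy_reward)


def sample(rng):
    # Hypothetical base policy: a response is just a number in [0, 1).
    return rng.random()


def true_reward(y):
    # Assumption: the rare region y >= 0.99 is actually bad (true reward 0).
    return 0.0 if y >= 0.99 else y


def proxy_reward(y):
    # Assumption: the proxy matches the true reward, except that it
    # overrates the rare region y >= 0.99 instead of penalizing it.
    return y


def mean_true_reward(n, trials=500, seed=0):
    """Average true reward of the BoN pick over independent trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += true_reward(best_of_n(sample, proxy_reward, n, rng))
    return total / trials


if __name__ == "__main__":
    for n in (1, 16, 1000):
        print(n, round(mean_true_reward(n), 3))
```

In this toy setup a moderate N improves the true reward over a single sample, while a very large N almost always selects a response from the overrated region, so the true reward collapses even though the proxy score keeps rising.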