To study this question formally, we introduce a new computational framework for RL with language models, where the learner interacts with the base model through black-box sampling queries. This lets us separate data efficiency (number of reward evaluations and prompts) from computational efficiency. 10/
Mar 27, 2025, 5:28 PM
{
  "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3llet5st4ge2c",
  "cid": "bafyreia3ionxopl5d4j3pg7wn2hqsd6bysbcf2zr2w5muaqgj3qh464xci",
  "value": {
    "text": "To study this question formally, we introduce a new computational framework for RL with language models, where the learner interacts with the base model through black-box sampling queries. This lets us separate data-efficiency (number of reward evaluations and prompts) from comp. efficiency.\n\n10/",
    "$type": "app.bsky.feed.post",
    "langs": [ "en" ],
    "reply": {
      "root": {
        "cid": "bafyreidzv7o2gllewwfqqi4ixremjopxmuyntc5gstydqektrxkcjni4pi",
        "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3llet5p66ac2c"
      },
      "parent": {
        "cid": "bafyreieslthrt7bwkwsxjuukop4aphnmtvq4rprpaoj2gnskkiljpipmya",
        "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3llet5st4gd2c"
      }
    },
    "createdAt": "2025-03-27T17:28:13.779Z"
  }
}
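The post describes a learner that may only touch the base model through black-box sampling queries, with data cost (reward evaluations and prompts) counted separately from computational cost. The sketch below is a toy illustration of that accounting, not the authors' formalism: the class names `SamplingOracle` and `RewardOracle`, the toy three-token "model", and the `dedup_best_of_n` learner are all hypothetical. The learner draws n samples (computational cost) but queries the reward only on distinct responses (data cost), so the two counters diverge.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class SamplingOracle:
    """Hypothetical black-box access to a base model: the learner can only draw samples.

    Every call counts toward computational cost.
    """
    sampler: Callable[[str, random.Random], str]
    rng: random.Random = field(default_factory=lambda: random.Random(0))
    num_samples: int = 0

    def sample(self, prompt: str) -> str:
        self.num_samples += 1
        return self.sampler(prompt, self.rng)


@dataclass
class RewardOracle:
    """Hypothetical reward access: evaluations and distinct prompts count toward data cost."""
    reward_fn: Callable[[str, str], float]
    num_evals: int = 0
    prompts: Set[str] = field(default_factory=set)

    def evaluate(self, prompt: str, response: str) -> float:
        self.num_evals += 1
        self.prompts.add(prompt)
        return self.reward_fn(prompt, response)


def dedup_best_of_n(prompt: str, model: SamplingOracle, reward: RewardOracle, n: int) -> str:
    """Illustrative learner: draw n samples, but pay for reward only on distinct responses."""
    candidates = {model.sample(prompt) for _ in range(n)}
    # max() calls the key once per candidate, so num_evals == len(candidates);
    # sorting first makes tie-breaking deterministic.
    return max(sorted(candidates), key=lambda r: reward.evaluate(prompt, r))


# Toy base model over three possible responses; only "c" earns reward (assumed for illustration).
model = SamplingOracle(lambda p, rng: rng.choice(["a", "b", "c"]))
reward = RewardOracle(lambda p, r: 1.0 if r == "c" else 0.0)
best = dedup_best_of_n("2+2=?", model, reward, n=8)
# 8 sampling queries, at most 3 reward evaluations, 1 prompt
print(model.num_samples, reward.num_evals, len(reward.prompts))
```

Keeping the counters on the oracles rather than in the learner means any learner written against this interface is charged the same way, which mirrors the separation the post draws between data efficiency and computational efficiency.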