Reinforcement learning has led to amazing breakthroughs in reasoning (e.g., R1), but can it discover truly new behaviors not already present in the base model?

A new paper with Zak Mhammedi and Dhruv Rohatgi:
The Computational Role of the Base Model in Exploration

arxiv.org/abs/2503.07453
Mar 27, 2025, 5:28 PM
{ "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3llet5p66ac2c", "cid": "bafyreidzv7o2gllewwfqqi4ixremjopxmuyntc5gstydqektrxkcjni4pi", "value": { "text": "Reinforcement learning has led to amazing breakthroughs in reasoning (e.g., R1), but can it discover truly new behaviors not already present in the base model?\n\nA new paper with Zak Mhammedi and Dhruv Rohatgi: \nThe Computational Role of the Base Model in Exploration\n\narxiv.org/abs/2503.07453", "$type": "app.bsky.feed.post", "embed": { "$type": "app.bsky.embed.images", "images": [ { "alt": "", "image": { "$type": "blob", "ref": { "$link": "bafkreibciiq77vllwn24bodo4hkwfpqqfatxyigfzhwghvudkw4albestm" }, "mimeType": "image/jpeg", "size": 613733 }, "aspectRatio": { "width": 2000, "height": 1655 } } ] }, "langs": [ "en" ], "facets": [ { "index": { "byteEnd": 292, "byteStart": 268 }, "features": [ { "uri": "https://arxiv.org/abs/2503.07453", "$type": "app.bsky.richtext.facet#link" } ] } ], "createdAt": "2025-03-27T17:28:13.770Z" } }