ATProto Browser

Experimental browser for the Atmosphere

Post

Reinforcement learning has led to amazing breakthroughs in reasoning (e.g., R1), but can it discover truly new behaviors not already present in the base model?

A new paper with Zak Mhammedi and Dhruv Rohatgi:
The Computational Role of the Base Model in Exploration

arxiv.org/abs/2503.07453

Mar 27, 2025, 5:28 PM

Record data

{
  "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3llet5p66ac2c",
  "cid": "bafyreidzv7o2gllewwfqqi4ixremjopxmuyntc5gstydqektrxkcjni4pi",
  "value": {
    "text": "Reinforcement learning has led to amazing breakthroughs in reasoning (e.g., R1), but can it discover truly new behaviors not already present in the base model?\n\nA new paper with Zak Mhammedi and Dhruv Rohatgi: \nThe Computational Role of the Base Model in Exploration\n\narxiv.org/abs/2503.07453",
    "$type": "app.bsky.feed.post",
    "embed": {
      "$type": "app.bsky.embed.images",
      "images": [
        {
          "alt": "",
          "image": {
            "$type": "blob",
            "ref": {
              "$link": "bafkreibciiq77vllwn24bodo4hkwfpqqfatxyigfzhwghvudkw4albestm"
            },
            "mimeType": "image/jpeg",
            "size": 613733
          },
          "aspectRatio": {
            "width": 2000,
            "height": 1655
          }
        }
      ]
    },
    "langs": [
      "en"
    ],
    "facets": [
      {
        "index": {
          "byteEnd": 292,
          "byteStart": 268
        },
        "features": [
          {
            "uri": "https://arxiv.org/abs/2503.07453",
            "$type": "app.bsky.richtext.facet#link"
          }
        ]
      }
    ],
    "createdAt": "2025-03-27T17:28:13.770Z"
  }
}
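
The facet in the record above marks the link with byte offsets rather than character offsets. As a minimal sketch, assuming the offsets index the UTF-8 encoding of "text" with byteStart inclusive and byteEnd exclusive, the linked span can be recovered as follows. The helper name facet_slice and the constant POST_TEXT are hypothetical, introduced only for this illustration.

```python
def facet_slice(text: str, byte_start: int, byte_end: int) -> str:
    """Return the part of `text` covered by a facet's byte range."""
    # Facet indices count UTF-8 bytes, not Python string characters,
    # so slice the encoded bytes and decode the result.
    return text.encode("utf-8")[byte_start:byte_end].decode("utf-8")


# The "text" field of the record, copied from the JSON above.
POST_TEXT = (
    "Reinforcement learning has led to amazing breakthroughs in reasoning "
    "(e.g., R1), but can it discover truly new behaviors not already "
    "present in the base model?\n\n"
    "A new paper with Zak Mhammedi and Dhruv Rohatgi: \n"
    "The Computational Role of the Base Model in Exploration\n\n"
    "arxiv.org/abs/2503.07453"
)

# byteStart = 268 and byteEnd = 292 come from the record's "facets" entry.
print(facet_slice(POST_TEXT, 268, 292))
```

For this post the text is pure ASCII, so byte and character offsets coincide. They diverge as soon as the text contains multi-byte characters such as accented letters or emoji, which is why the slice is taken on the encoded bytes.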