The basic premise behind reinforcement learning techniques for language models is that by iteratively generating new proposals and refining them with verifiable feedback or rewards (e.g., from a formal proof checker), the language model can eventually discover useful behaviors and capabilities. 2/
Mar 27, 2025, 5:28 PM
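A toy sketch of the generate-and-verify loop described in the post, under loud assumptions: the "model" here is just a table of sampling weights over ten integer proposals, `verifier` is a hypothetical stand-in for a proof checker, and the update is a simple reward-weighted reweighting rather than any specific RL algorithm used for real language models.

```python
import random

def verifier(proposal: int) -> bool:
    """Hypothetical stand-in for a formal checker: accepts only a correct answer."""
    return proposal * proposal == 49

def train(candidates, steps=500, lr=0.5, seed=0):
    """Sample proposals, check them, and upweight the ones that verify."""
    rng = random.Random(seed)
    weights = {c: 1.0 for c in candidates}  # uniform starting "policy"
    for _ in range(steps):
        # Generate a proposal in proportion to the current weights.
        proposal = rng.choices(list(weights), weights=list(weights.values()))[0]
        # Verifiable reward: 1 if the checker accepts, 0 otherwise.
        reward = 1.0 if verifier(proposal) else 0.0
        # Reinforce the sampled proposal only when it was rewarded.
        weights[proposal] *= 1.0 + lr * reward
    total = sum(weights.values())
    return {c: w / total for c, w in weights.items()}

probs = train(range(10))
best = max(probs, key=probs.get)
print(best, round(probs[best], 3))
```

Because only verified proposals are ever reinforced, the sampling distribution drifts toward whatever the checker accepts, which is the "discovery" dynamic the post refers to, in miniature.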
{ "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3llet5qyih22c", "cid": "bafyreiajq5psz6xmuw2xbnjhjheii5nepctkj6b6c6a2bvh67keqswejiu", "value": { "text": "The basic idea premise behind language model reinforcement learning techniques is that by iteratively generating new proposals and refining them with verifiable feedback/rewards (e.g., from a formal proof checker), the language model can eventually discover useful behaviors and capabilities.\n\n2/", "$type": "app.bsky.feed.post", "embed": { "$type": "app.bsky.embed.images", "images": [ { "alt": "", "image": { "$type": "blob", "ref": { "$link": "bafkreidcrilmbj2pkkf64wah2a4xtipq7sq7bu5dsscaecuri3rbeb6wty" }, "mimeType": "image/jpeg", "size": 299410 }, "aspectRatio": { "width": 1338, "height": 740 } }, { "alt": "", "image": { "$type": "blob", "ref": { "$link": "bafkreieyy6ptt6vihx2kne64rynnoget7k5rfoy62ncsidltfe33gftvym" }, "mimeType": "image/jpeg", "size": 65226 }, "aspectRatio": { "width": 520, "height": 272 } } ] }, "langs": [ "en" ], "reply": { "root": { "cid": "bafyreidzv7o2gllewwfqqi4ixremjopxmuyntc5gstydqektrxkcjni4pi", "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3llet5p66ac2c" }, "parent": { "cid": "bafyreidzv7o2gllewwfqqi4ixremjopxmuyntc5gstydqektrxkcjni4pi", "uri": "at://did:plc:x2a3inabvfsn4wntrlbbndrv/app.bsky.feed.post/3llet5p66ac2c" } }, "createdAt": "2025-03-27T17:28:13.771Z" } }