Hah, well, the previous benchmark I created was evaluated on a curated vuln identification dataset, which made it trivial and allowed me to iterate on the input side, and that led toward agentic prompt selection. I have other ideas and would incorporate more variables for this task, ofc ;)
Mar 13, 2025, 9:30 AM
{
  "uri": "at://did:plc:tolbj73l7dmkeiqoruzzpj5h/app.bsky.feed.post/3lkarwateos2n",
  "cid": "bafyreidk7bmvfel7kmjrpr3z3yxw73zv2lujf25fwghl2zce25flzkp74y",
  "value": {
    "text": "Hah well the previous benchmark I created was evaluated on a curated vuln identification dataset which made it trivial and allowed me to iterate on the input side which led toward agentic prompt selection. I have other ideas and would incorporate more variables for this task ofc ;)",
    "$type": "app.bsky.feed.post",
    "langs": [
      "en"
    ],
    "reply": {
      "root": {
        "cid": "bafyreihyaqzr7osxukhwvhckgvzd4523lwliny6gkjjy3kbram6ob7qwma",
        "uri": "at://did:plc:tolbj73l7dmkeiqoruzzpj5h/app.bsky.feed.post/3lkaal72sv22h"
      },
      "parent": {
        "cid": "bafyreidyqn6yjryympo3dk2okkdydgjdj2sqidgeifh2kksmggmmdhy6aq",
        "uri": "at://did:plc:qrgahxlxtqxqxaga3xbexytp/app.bsky.feed.post/3lkaprkahbc2d"
      }
    },
    "createdAt": "2025-03-13T09:30:19.536Z"
  }
}