Indeed, the article is truly terrible, using "hallucinat*" 20x vs. "inaccurate" 1x, and making all kinds of quantitative comparisons of benchmark outcomes. (An LLM benchmark score is the percentage of correct answers on today's collection of tricky questions, providing no info on system accuracy in the wild.)
Apr 27, 2025, 9:31 AM
{
  "uri": "at://did:plc:iw4ngu7e6vevjog34kermab3/app.bsky.feed.post/3lnrwxwg52c2n",
  "cid": "bafyreih3fgxmbo3pb67nmpif4l56rc7f2ejoo34dfktxsui6s2udecpvg4",
  "value": {
    "text": "Indeed the article is truly terrible, using \"hallucinat* 20x vs \"inaccurate\" 1x, and making all kinds of quantitative comparisons of benchmark outcomes. (An LLM benchmark score is %age of correct answers on today's collection of tricky questions, providing no info on system accuracy in the wild.)",
    "$type": "app.bsky.feed.post",
    "langs": [ "en" ],
    "reply": {
      "root": {
        "cid": "bafyreibf7y27l7cdjuzobo5pqzmil7b2zczycatr2x74di7cq7ls25al5a",
        "uri": "at://did:plc:iw4ngu7e6vevjog34kermab3/app.bsky.feed.post/3lnrwgrckz22n"
      },
      "parent": {
        "cid": "bafyreibf7y27l7cdjuzobo5pqzmil7b2zczycatr2x74di7cq7ls25al5a",
        "uri": "at://did:plc:iw4ngu7e6vevjog34kermab3/app.bsky.feed.post/3lnrwgrckz22n"
      }
    },
    "createdAt": "2025-04-27T09:31:34.868Z"
  }
}