ATProto Browser

Experimental browser for the Atmosphere

Post

What can we do about the benchmark fatigue for #LLM? More people I speak to don’t take them seriously anymore, and I can’t blame them. I still hesitate, but I’m about to drop them as well. I think we need something new for #AI eval. Or is there something I’m not aware of?

Nov 27, 2024, 11:17 PM

{
  "text": "What can we do about the benchmark fatigue for #LLM? More people I speak to don’t take them seriously anymore, and I can’t blame them. I still hesitate, but I’m about to drop them as well. I think we need something new for #AI eval. Or is there something I’m not aware of?",
  "$type": "app.bsky.feed.post",
  "embed": {
    "$type": "app.bsky.embed.video",
    "video": {
      "$type": "blob",
      "ref": {
        "$link": "bafkreicsfwp6544kxqje7ntb7yjybkeznlluildbp4s5vtaevqr5b3vg2i"
      },
      "mimeType": "video/mp4",
      "size": 323919
    },
    "aspectRatio": {
      "width": 1200,
      "height": 1200
    }
  },
  "langs": [
    "en"
  ],
  "facets": [
    {
      "index": {
        "byteEnd": 51,
        "byteStart": 47
      },
      "features": [
        {
          "tag": "LLM",
          "$type": "app.bsky.richtext.facet#tag"
        }
      ]
    },
    {
      "index": {
        "byteEnd": 232,
        "byteStart": 229
      },
      "features": [
        {
          "tag": "AI",
          "$type": "app.bsky.richtext.facet#tag"
        }
      ]
    }
  ],
  "createdAt": "2024-11-27T23:17:44.187Z"
}
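
The `byteStart` and `byteEnd` values in the `facets` above are offsets into the UTF-8 encoding of `text`, not character positions. That matters for this post: three curly apostrophes (three bytes each) precede the `#AI` tag, so its character index is six lower than its byte index. Below is a minimal sketch of resolving the facets back to the substrings they cover. The helper name `facet_substrings` is hypothetical and not part of any ATProto SDK; the text and offsets are copied from the record.

```python
# Hypothetical helper (not from an ATProto SDK): resolve tag facets to the
# substrings they cover by slicing the UTF-8 bytes of the post text.
def facet_substrings(text, facets):
    data = text.encode("utf-8")  # facet offsets index into these bytes
    results = []
    for facet in facets:
        start = facet["index"]["byteStart"]
        end = facet["index"]["byteEnd"]
        covered = data[start:end].decode("utf-8")
        for feature in facet["features"]:
            if feature.get("$type") == "app.bsky.richtext.facet#tag":
                results.append((feature["tag"], covered))
    return results


# Post text from the record; \u2019 is the curly apostrophe.
POST_TEXT = (
    "What can we do about the benchmark fatigue for #LLM? "
    "More people I speak to don\u2019t take them seriously anymore, "
    "and I can\u2019t blame them. I still hesitate, but I\u2019m about "
    "to drop them as well. I think we need something new for #AI eval. "
    "Or is there something I\u2019m not aware of?"
)

# Facets from the record, reduced to the fields used here.
FACETS = [
    {
        "index": {"byteStart": 47, "byteEnd": 51},
        "features": [{"$type": "app.bsky.richtext.facet#tag", "tag": "LLM"}],
    },
    {
        "index": {"byteStart": 229, "byteEnd": 232},
        "features": [{"$type": "app.bsky.richtext.facet#tag", "tag": "AI"}],
    },
]

print(facet_substrings(POST_TEXT, FACETS))
```

Slicing the Python string directly with the same numbers (`POST_TEXT[229:232]`) would not return `#AI`, which is why the sketch encodes to bytes first.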