What can we do about the benchmark fatigue for #LLM? More people I speak to don’t take them seriously anymore, and I can’t blame them. I still hesitate, but I’m about to drop them as well. I think we need something new for #AI eval. Or is there something I’m not aware of?
Nov 27, 2024, 11:17 PM
{
  "text": "What can we do about the benchmark fatigue for #LLM? More people I speak to don’t take them seriously anymore, and I can’t blame them. I still hesitate, but I’m about to drop them as well. I think we need something new for #AI eval. Or is there something I’m not aware of?",
  "$type": "app.bsky.feed.post",
  "embed": {
    "$type": "app.bsky.embed.video",
    "video": {
      "$type": "blob",
      "ref": {
        "$link": "bafkreicsfwp6544kxqje7ntb7yjybkeznlluildbp4s5vtaevqr5b3vg2i"
      },
      "mimeType": "video/mp4",
      "size": 323919
    },
    "aspectRatio": {
      "width": 1200,
      "height": 1200
    }
  },
  "langs": [
    "en"
  ],
  "facets": [
    {
      "index": {
        "byteEnd": 51,
        "byteStart": 47
      },
      "features": [
        {
          "tag": "LLM",
          "$type": "app.bsky.richtext.facet#tag"
        }
      ]
    },
    {
      "index": {
        "byteEnd": 232,
        "byteStart": 229
      },
      "features": [
        {
          "tag": "AI",
          "$type": "app.bsky.richtext.facet#tag"
        }
      ]
    }
  ],
  "createdAt": "2024-11-27T23:17:44.187Z"
}
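
The byteStart and byteEnd values in the facets above appear to index into the UTF-8 bytes of the post text rather than its characters, which matters here because the text contains multi-byte curly apostrophes. A minimal sketch of that reading follows; the helper name facet_slice and the tuple layout are illustrative assumptions, not part of any library.

```python
# Post text and facet byte ranges copied from the record above.
text = (
    "What can we do about the benchmark fatigue for #LLM? More people I speak "
    "to don’t take them seriously anymore, and I can’t blame them. I still "
    "hesitate, but I’m about to drop them as well. I think we need something "
    "new for #AI eval. Or is there something I’m not aware of?"
)

# (byteStart, byteEnd, tag) for each facet in the record.
facets = [(47, 51, "LLM"), (229, 232, "AI")]


def facet_slice(text: str, byte_start: int, byte_end: int) -> str:
    """Hypothetical helper: return the substring covered by a byte range
    over the UTF-8 encoding of text (end index exclusive)."""
    return text.encode("utf-8")[byte_start:byte_end].decode("utf-8")


for start, end, tag in facets:
    # Each slice should be the hashtag as written in the post, e.g. "#LLM".
    print(tag, repr(facet_slice(text, start, end)))
```

Slicing the Python string directly with the same numbers (text[229:232]) would miss the second hashtag, because each curly apostrophe before it takes three bytes but only one character.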