ATProto Browser

Experimental browser for the Atmosphere

Post

Testing models on 18 commonly used static evaluation benchmarks, we find that none produce the same rankings as our interactive user evaluation. This suggests common interaction areas might be missing in existing static benchmarks used for Large Audio Models! (3/5)

Dec 10, 2024, 12:01 AM

{
  "text": "Testing models on 18 commonly used static evaluation benchmarks, we find that none produce the same rankings as our interactive user evaluation. \n\nThis suggests common interaction areas might be missing in existing static benchmarks used for Large Audio Models! (3/5)",
  "$type": "app.bsky.feed.post",
  "embed": {
    "$type": "app.bsky.embed.images",
    "images": [
      {
        "alt": "Visualization of the following raw CSV\n\nconst csvData = `Model,Urfunny (Humor Detection),Mustard (Sarcasm Detection),SLURP (Intent Detection),IEMOCAP (Emotion Recognition),MELD (Emotion Recognition),Public_SG_Speech (Speech QA),CN_College_Listen (Speech QA),Librispeech (Speech Grounding),SLURP (Entity Recognition),Callhome (Relation Classification),Commonvoice (Gender Classification),FairSpeech (Gender Classification),Commonvoice (Age Classification),FairSpeech (Age Classification),Commonvoice (Accent Classification),Covost2 (Language Classification),Openhermes (Instruction Following),Alpaca (Instruction Following),Overall\nNextGPT,26.6,16.9,12.7,11.5,5.7,55.3,20.5,8.7,12.2,27.4,17.9,30.2,7,9.9,6.8,26.4,7,5.9,17.1\nPandaGPT,42.6,33.4,13.9,16.4,5.8,53.6,25.3,8.7,17.6,44.2,26.4,58.5,11.5,11.9,4,33.5,27.3,24,25.5\nSpeechGPT,29.5,27.2,18.4,16.6,6.1,56,19.6,7.5,13.9,17.3,22.1,29.4,11,11.4,1.8,30.8,51.5,50,23.3\nSALMONN,39.2,34.6,35.5,22.7,9.7,69.4,32.9,18,28.3,31.7,12.8,20.8,2.9,8.3,3.3,20.3,43.9,32.2,25.9\nQwen-audio,39.9,30.8,69.1,21.2,11.6,75.7,44.9,5,38.7,30.9,48,43,4.2,12.5,5,58.1,50.3,40.8,35.0\nDiva,46.2,38.3,61.5,26.4,23.9,64.2,36.9,17.3,18.8,34.9,31.1,29.9,7.3,13.6,13,46.5,66.2,67,35.7\nQwen2-audio,34.9,41.5,81.1,26.7,19.6,68.8,55.7,10,43.7,17.3,79.8,58.3,10.3,14.3,5.4,66.5,64,61.3,42.2\nGemini,35.7,36,91.4,27.5,26.9,62.3,66.1,25.9,23.6,35.9,38.3,49.5,5.6,10.1,24.5,68.8,56,62.3,41.5\nGPT4o,44.6,53.6,89.2,31.5,26.6,64.4,65.9,22.2,35.8,59.7,18,9.1,9.1,15.4,35.3,73.3,63.7,64.2,43.4\nWhisper+llama3,37.8,32.8,64.8,25.2,22.8,50.3,62.6,20.4,16.5,22.8,30.1,32.6,9.7,12.9,13.9,50.4,45.9,44.9,33.1\nTyphoon,44.6,48.8,45.3,25,18.1,62.3,42.8,22.1,38.5,44.2,74.4,36.3,5,18.1,7.9,36.4,69.4,67.1,39.2`",
        "image": {
          "$type": "blob",
          "ref": {
            "$link": "bafkreieqmjyq6jhgkbogyyswea4papffs5uypwhw5pg3zajcvvbi67brei"
          },
          "mimeType": "image/jpeg",
          "size": 118945
        },
        "aspectRatio": {
          "width": 767,
          "height": 564
        }
      }
    ]
  },
  "langs": [
    "en"
  ],
  "reply": {
    "root": {
      "cid": "bafyreidefh3jrhtu6xfxyo5nh7ttbnzvx5yi4z2iqxbktmdk4rx74lxc44",
      "uri": "at://did:plc:exj4ro3xzv7id2hstt5ys5j5/app.bsky.feed.post/3lcvwrmffzc2p"
    },
    "parent": {
      "cid": "bafyreidqogh2o2puxqxm2q4ek5efkoqzqly3zhhe5xkuymqzaadcjeii5i",
      "uri": "at://did:plc:exj4ro3xzv7id2hstt5ys5j5/app.bsky.feed.post/3lcvwro3ods2p"
    }
  },
  "createdAt": "2024-12-10T00:01:34.082Z"
}
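
The post's claim is about rankings, so one way to read the CSV in the image alt text is to turn a score column into a model ranking and compare two rankings. The sketch below is illustrative only: it copies two columns from that CSV ("Alpaca (Instruction Following)" and "Overall") and compares them with a Spearman rank correlation. The interactive user evaluation scores are not part of this record, so they are not included; the helper names `rank_models` and `spearman` are hypothetical, and the simplified formula assumes no tied scores.

```python
# Two columns copied from the CSV in the alt text: model -> score.
alpaca = {
    "NextGPT": 5.9, "PandaGPT": 24, "SpeechGPT": 50, "SALMONN": 32.2,
    "Qwen-audio": 40.8, "Diva": 67, "Qwen2-audio": 61.3, "Gemini": 62.3,
    "GPT4o": 64.2, "Whisper+llama3": 44.9, "Typhoon": 67.1,
}
overall = {
    "NextGPT": 17.1, "PandaGPT": 25.5, "SpeechGPT": 23.3, "SALMONN": 25.9,
    "Qwen-audio": 35.0, "Diva": 35.7, "Qwen2-audio": 42.2, "Gemini": 41.5,
    "GPT4o": 43.4, "Whisper+llama3": 33.1, "Typhoon": 39.2,
}


def rank_models(scores):
    """Return model names ordered from highest to lowest score."""
    return sorted(scores, key=scores.get, reverse=True)


def spearman(scores_a, scores_b):
    """Spearman rank correlation between two score tables.

    Uses the simplified formula 1 - 6 * sum(d^2) / (n * (n^2 - 1)),
    which is only valid when neither table contains tied scores.
    """
    rank_a = {m: i for i, m in enumerate(rank_models(scores_a), start=1)}
    rank_b = {m: i for i, m in enumerate(rank_models(scores_b), start=1)}
    n = len(rank_a)
    d_squared = sum((rank_a[m] - rank_b[m]) ** 2 for m in rank_a)
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


print("By Alpaca: ", rank_models(alpaca))
print("By Overall:", rank_models(overall))
print("Spearman rho:", round(spearman(alpaca, overall), 3))
```

A rho of 1.0 would mean the two columns order the models identically. Substituting the interactive evaluation scores for one of the tables would give the comparison the post describes.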