ATProto Browser

Experimental browser for the Atmosphere

Post

I wanted to post a few of my favorite #EMNLP2024 papers, starting with a couple in tokenization. Fishing for Magikarp explores the problem of undertrained "glitch" tokens, and how they can be identified from their embedding vectors. aclanthology.org/2024.emnlp-m...

Dec 20, 2024, 6:45 PM

{
  "text": "I wanted to post of a few of my favorite #EMNLP2024 papers, starting with a couple in tokenization. Fishing For Magicarp explores the problem of undertrained \"glitch\" tokens, and how they can be identified from their embedding vectors.  aclanthology.org/2024.emnlp-m...",
  "$type": "app.bsky.feed.post",
  "embed": {
    "$type": "app.bsky.embed.external",
    "external": {
      "uri": "https://aclanthology.org/2024.emnlp-main.649/",
      "thumb": {
        "$type": "blob",
        "ref": {
          "$link": "bafkreieokywoejc62j3nlmeyoymje6s5lzug4qq7xnlc47nplt5nx4mdfq"
        },
        "mimeType": "image/jpeg",
        "size": 843740
      },
      "title": "Fishing for Magikarp: Automatically Detecting Under-trained Tokens in Large Language Models",
      "description": "Sander Land, Max Bartolo. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing. 2024."
    }
  },
  "langs": [
    "en"
  ],
  "facets": [
    {
      "index": {
        "byteEnd": 51,
        "byteStart": 41
      },
      "features": [
        {
          "tag": "EMNLP2024",
          "$type": "app.bsky.richtext.facet#tag"
        }
      ]
    },
    {
      "index": {
        "byteEnd": 269,
        "byteStart": 237
      },
      "features": [
        {
          "uri": "https://aclanthology.org/2024.emnlp-main.649/",
          "$type": "app.bsky.richtext.facet#link"
        }
      ]
    }
  ],
  "createdAt": "2024-12-20T18:45:49.349Z"
}
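
The `facets` in the record above mark the hashtag and the link by `byteStart`/`byteEnd` offsets, which index into the UTF-8 encoding of `text` rather than into its characters. The following is a minimal sketch of how those ranges could be resolved back to substrings. The helper name `facet_text` is hypothetical and not part of any ATProto library, and the text is copied verbatim from the record, including its original spelling and the double space before the link, since changing either would shift the offsets.

```python
def facet_text(text: str, byte_start: int, byte_end: int) -> str:
    """Return the part of `text` covered by a facet's byte range.

    Hypothetical helper: the offsets are applied to the UTF-8 bytes
    of the string, not to its Python character indices.
    """
    return text.encode("utf-8")[byte_start:byte_end].decode("utf-8")


# The "text" field of the record, unmodified.
text = (
    "I wanted to post of a few of my favorite #EMNLP2024 papers, "
    "starting with a couple in tokenization. Fishing For Magicarp "
    "explores the problem of undertrained \"glitch\" tokens, and how "
    "they can be identified from their embedding vectors.  "
    "aclanthology.org/2024.emnlp-m..."
)

# (byteStart, byteEnd) pairs taken from the record's two facets.
facet_ranges = [(41, 51), (237, 269)]

spans = [facet_text(text, start, end) for start, end in facet_ranges]
for span in spans:
    print(span)
```

Because this particular post is pure ASCII, byte offsets and character offsets happen to coincide. For text containing multi-byte characters they would differ, which is why the sketch slices the encoded bytes instead of the string itself.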