ATProto Browser

Experimental browser for the Atmosphere

Post

By June 2023, the data team had compiled Dolma, a dataset of 3 trillion tokens, ready to train a language model. Dolma was formed from a diverse mix of web content, academic publications, code, books, and encyclopedic materials, all acquired through a transparent process.

May 6, 2025, 8:55 PM

Record data

{
  "uri": "at://did:plc:i4kytxgsu3yfsrt2ml3o7tgq/app.bsky.feed.post/3lojrfdvhc32c",
  "cid": "bafyreidydaqrymrm2x7scevrdattsmysbe27ogbzn45fcgbnlfdx7uhuua",
  "value": {
    "text": "By June 2023, the data team had compiled Dolma, a dataset of 3 trillion tokens, ready to train a language model. Dolma was formed from a diverse mix of web content, academic publications, code, books, and encyclopedic materials, all acquired through a transparent process.",
    "$type": "app.bsky.feed.post",
    "langs": [
      "en"
    ],
    "reply": {
      "root": {
        "cid": "bafyreifmvesjgu7dbxpy7x6q72n6wmwkgvvj6ae6awysunnjhi6vbdpk3y",
        "uri": "at://did:plc:i4kytxgsu3yfsrt2ml3o7tgq/app.bsky.feed.post/3lojrfdvahc2c"
      },
      "parent": {
        "cid": "bafyreicxozwbtkc4av56ew5y2nstzxx4h62wnwadeaxtj5xjmw4kgolrzq",
        "uri": "at://did:plc:i4kytxgsu3yfsrt2ml3o7tgq/app.bsky.feed.post/3lojrfdvhc22c"
      }
    },
    "createdAt": "2025-05-06T20:55:36.476Z"
  }
}
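The record's own `uri` and the two references under `reply` share one layout: `at://<authority>/<collection>/<rkey>`. In an `app.bsky.feed.post` record, `reply.root` points at the first post of the thread, and `reply.parent` points at the post being answered directly. The Python sketch below assumes only that three-segment form. The `AtUri` tuple and the `parse_at_uri` helper are hypothetical illustrations, not the API of any atproto SDK. The sketch splits each URI into its parts so the thread structure is easier to read.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    """Hypothetical container for at://<authority>/<collection>/<rkey>."""
    authority: str   # a DID (as in this record) or a handle
    collection: str  # NSID of the record type, e.g. app.bsky.feed.post
    rkey: str        # record key within that collection


def parse_at_uri(uri: str) -> AtUri:
    # Minimal sketch: accepts only the three-segment record form used above.
    # It does not validate DID/NSID syntax or handle query strings/fragments.
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an at:// URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://authority/collection/rkey, got {uri!r}")
    return AtUri(*parts)


# Trimmed copy of the record above: only the fields this sketch reads.
record = {
    "uri": "at://did:plc:i4kytxgsu3yfsrt2ml3o7tgq/app.bsky.feed.post/3lojrfdvhc32c",
    "value": {
        "reply": {
            "root": {"uri": "at://did:plc:i4kytxgsu3yfsrt2ml3o7tgq/app.bsky.feed.post/3lojrfdvahc2c"},
            "parent": {"uri": "at://did:plc:i4kytxgsu3yfsrt2ml3o7tgq/app.bsky.feed.post/3lojrfdvhc22c"},
        }
    },
}

post = parse_at_uri(record["uri"])
root = parse_at_uri(record["value"]["reply"]["root"]["uri"])
parent = parse_at_uri(record["value"]["reply"]["parent"]["uri"])

# Same authority on all three means the author is replying to their own thread.
print(post.rkey, parent.rkey, root.rkey)
print(post.authority == parent.authority == root.authority)
```

Run against this record, the sketch shows that the post, its parent, and the thread root all come from the same DID. The post is therefore one entry in a self-authored thread.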