ATProto Browser


Experimental browser for the Atmosphere

Post

I wonder what the limit difference between CSV and Parquet would be under real conditions, where most queries only need a tiny subset of large datasets. You could probably handle >petabyte datasets on that EC2 machine with good partitioning of Parquet or using Iceberg.
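The pruning idea behind this question can be sketched in a few lines. This is a minimal illustration that assumes a Hive-style directory layout (`date=YYYY-MM-DD` folders) and hypothetical file sizes. The names `DataFile`, `partition_values`, and `prune` are hypothetical and do not belong to any query engine's API. Directory pruning works for CSV files too. Parquet adds two further savings: readers can skip columns a query does not need, and they can use per-row-group min/max statistics to skip data inside a file. This sketch does not model either of those.

```python
from dataclasses import dataclass


@dataclass
class DataFile:
    path: str        # hypothetical Hive-style path, e.g. "events/date=2025-04-22/part-0.parquet"
    size_bytes: int  # hypothetical file size


def partition_values(path: str) -> dict[str, str]:
    """Extract key=value partition segments from a Hive-style path."""
    values: dict[str, str] = {}
    for segment in path.split("/"):
        if "=" in segment:
            key, _, value = segment.partition("=")
            values[key] = value
    return values


def prune(files: list[DataFile], **predicate: str) -> list[DataFile]:
    """Keep only files whose partition values match every equality predicate."""
    return [
        f for f in files
        if all(partition_values(f.path).get(k) == v for k, v in predicate.items())
    ]


# Hypothetical dataset: 30 daily partitions of 10 GiB each.
files = [
    DataFile(f"events/date=2025-04-{day:02d}/part-0.parquet", 10 * 2**30)
    for day in range(1, 31)
]
selected = prune(files, date="2025-04-22")
scanned = sum(f.size_bytes for f in selected)
total = sum(f.size_bytes for f in files)
print(f"{scanned / total:.1%} of bytes scanned")  # prints "3.3% of bytes scanned"
```

Because the engine only has to open files that match the predicate, the amount of data read depends on the query's selectivity rather than on the total size of the dataset. That is the effect the post is speculating about.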

Apr 22, 2025, 10:37 PM

Record data

{
  "uri": "at://did:plc:7vr6lyprarcq5mryv2flot6h/app.bsky.feed.post/3lngqkdbg7s27",
  "cid": "bafyreiav4ymstkxxjnight6arulhi7wiqijxa7ltpzpnaqu3cuuf2wdf5m",
  "value": {
    "text": "I wonder what the limit difference between CSV and Parquet would be under real conditions, where most queries only need a tiny subset of large datasets. You could probably handle >petabyte datasets on that EC2 machine with good partitioning of Parquet or using Iceberg.",
    "$type": "app.bsky.feed.post",
    "langs": [
      "en"
    ],
    "reply": {
      "root": {
        "cid": "bafyreifotgidsvg4xngw7clarqbtkgovwyvchxct3tavanfkndlkb7jx64",
        "uri": "at://did:plc:id67xmpji7oysb7vitsodr4v/app.bsky.feed.post/3lngfssuik227"
      },
      "parent": {
        "cid": "bafyreifotgidsvg4xngw7clarqbtkgovwyvchxct3tavanfkndlkb7jx64",
        "uri": "at://did:plc:id67xmpji7oysb7vitsodr4v/app.bsky.feed.post/3lngfssuik227"
      }
    },
    "createdAt": "2025-04-22T22:37:19.011Z"
  }
}
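The `uri` fields in the record follow the AT URI form `at://<authority>/<collection>/<rkey>`. Here the authority is the author's DID, the collection is the record type's NSID (`app.bsky.feed.post`), and the rkey is the record key. The following is a minimal parsing sketch that assumes only this three-segment record form. The full AT URI syntax also permits shorter URIs, handles as the authority, and query or fragment parts, and this sketch handles none of those. `parse_at_uri` and `is_direct_reply_to_root` are hypothetical helper names, not part of an official SDK.

```python
from typing import NamedTuple


class AtUri(NamedTuple):
    authority: str   # repo owner's DID, e.g. did:plc:...
    collection: str  # NSID of the record type, e.g. app.bsky.feed.post
    rkey: str        # record key within the collection


def parse_at_uri(uri: str) -> AtUri:
    """Split an at://authority/collection/rkey URI into its three parts.

    Sketch only: no handle resolution, query, or fragment support.
    """
    prefix = "at://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an AT URI: {uri!r}")
    parts = uri[len(prefix):].split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected at://authority/collection/rkey, got {uri!r}")
    return AtUri(*parts)


def is_direct_reply_to_root(record: dict) -> bool:
    """True when a post's reply parent is the thread root itself."""
    reply = record.get("reply")
    return reply is not None and reply["parent"]["uri"] == reply["root"]["uri"]


post = parse_at_uri(
    "at://did:plc:7vr6lyprarcq5mryv2flot6h/app.bsky.feed.post/3lngqkdbg7s27"
)
print(post.authority, post.collection, post.rkey)
```

In this record, `reply.root` and `reply.parent` point to the same URI and CID. The post is therefore a direct reply to the thread's root post rather than a reply further down the thread.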