I wonder what the limit difference between CSV and Parquet would be under real conditions, where most queries only need a tiny subset of large datasets. You could probably handle >petabyte datasets on that EC2 machine with good partitioning of Parquet or using Iceberg.
Apr 22, 2025, 10:37 PM
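To make the partitioning point concrete, here is a minimal sketch of Hive-style partition pruning, the mechanism that lets a query skip most files before reading any bytes. The `events/year=.../month=...` layout, the file names, and the helpers `partition_values` and `prune` are hypothetical and chosen only for illustration. Real engines typically combine this with Parquet's column projection and row-group statistics, which this sketch does not model.

```python
from pathlib import PurePosixPath


def partition_values(path: str) -> dict[str, str]:
    """Extract key=value pairs from Hive-style directory names in a path."""
    return dict(
        part.split("=", 1)
        for part in PurePosixPath(path).parts
        if "=" in part
    )


def prune(paths: list[str], **wanted: str) -> list[str]:
    """Keep only files whose partition values match every requested key."""
    return [
        p for p in paths
        if all(partition_values(p).get(k) == v for k, v in wanted.items())
    ]


# Hypothetical dataset: three years of monthly partitions, one file each.
paths = [
    f"events/year={year}/month={month:02d}/part-0.parquet"
    for year in (2023, 2024, 2025)
    for month in range(1, 13)
]

# A query for April 2025 only needs to open one of the 36 files.
selected = prune(paths, year="2025", month="04")
print(len(paths), len(selected))  # prints "36 1"
```

With a row-oriented, unpartitioned CSV the same query would have to scan everything, which is why the practical limit depends far more on layout than on raw dataset size.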
{ "uri": "at://did:plc:7vr6lyprarcq5mryv2flot6h/app.bsky.feed.post/3lngqkdbg7s27", "cid": "bafyreiav4ymstkxxjnight6arulhi7wiqijxa7ltpzpnaqu3cuuf2wdf5m", "value": { "text": "I wonder what the limit difference between CSV and Parquet would be under real conditions, where most queries only need a tiny subset of large datasets. You could probably handle >petabyte datasets on that EC2 machine with good partitioning of Parquet or using Iceberg.", "$type": "app.bsky.feed.post", "langs": [ "en" ], "reply": { "root": { "cid": "bafyreifotgidsvg4xngw7clarqbtkgovwyvchxct3tavanfkndlkb7jx64", "uri": "at://did:plc:id67xmpji7oysb7vitsodr4v/app.bsky.feed.post/3lngfssuik227" }, "parent": { "cid": "bafyreifotgidsvg4xngw7clarqbtkgovwyvchxct3tavanfkndlkb7jx64", "uri": "at://did:plc:id67xmpji7oysb7vitsodr4v/app.bsky.feed.post/3lngfssuik227" } }, "createdAt": "2025-04-22T22:37:19.011Z" } }