ATProto Browser

Record data

{
  "uri": "at://did:plc:7mnpet2pvof2llhpcwattscf/beauty.piss.blog.entry/3lned7a6njk23",
  "cid": "bafyreibjcqpulpkpjopwvamp7j4we3tu4wjbxp3j2lsltg5fi6eb4zcipy",
  "value": {
    "tags": [
      "rust",
      "parsing",
      "dev"
    ],
    "$type": "beauty.piss.blog.entry",
    "title": "Set Member Parsing",
    "content": "# Set Member Parsing\n\nA fairly common scenario in writing web servers is the need to check that some parameter value in a user request is a member of a set - a list of datasets, a machine learning model name, etc.\nWe do this because some operation further inside the server needs to operate on the assumption that the values it's using won't cause bad behavior. \n\nlet's look at some ways of doing this, and try to arrive at a better one, following the principle \"[parse, don't validate](https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/)\". We'll use a real example from my work: there are unsigned 64-bit integer keys known as [chain IDs](https://chainlist.org/), which uniquely identify a given blockchain. \nWe use these as parameters in some HTTP routes, and we only support a subset of them, so we must return an error to the requester if an unsupported one is requested.\n\ncompilable code for this post is available [on my GitHub](https://github.com/stella3d/set-parsing-example/tree/main/src).\n\n\n## easy mode: compile-time sets\n\nIf we know the full set of possible valid values at compile time, implementing this behavior efficiently is mostly obvious: use an enum. We'll also use the `TryFromPrimitive` derive from the `num_enum` crate to provide us with an easy way to convert from a raw `u64` value.\n\n```rust\nuse num_enum::TryFromPrimitive;\n\n#[derive(Debug, TryFromPrimitive)]\n#[repr(u64)]\nenum SupportedChainIdEnum {\n    Ethereum = 1,\n    Base = 8453,\n}\n```\n\nthis is simplest with integers, since we can use `#[repr(u64)]` to make sure our enum is represented as a `u64`, but it's possible with strings as well, using a crate like [strum](https://crates.io/crates/strum).\n\nhow should we use this enum in our route? a common strategy might be: \n\n_we have an integer input in the query string, which we will validate as being in the enum at the beginning of the route_\n\nand implementing that looks like this. 
note: examples use [`poem_openapi`](https://docs.rs/poem-openapi/latest/poem_openapi/index.html)'s way of declaring route handlers. \nhere, the `Query(chain_id): Query<u64>` means that we expect a `u64` value for the `chain_id` query parameter, as in `/supported?chain_id=1`.\n\n```rust\n#[oai(path = \"/supported\", method = \"get\")]\nasync fn chain_supported(\n    &self,\n    Query(chain_id): Query<u64>,\n) -> poem::Result<()> {\n    let id = SupportedChainIdEnum::try_from(chain_id)\n        .map_err(|_| poem::Error::from_string(\n            format!(\"unsupported chain ID: {}\", chain_id),\n            StatusCode::BAD_REQUEST,\n        ))?;\n\n    inner_handler(id) // do your actual route handler\n}\n```\n\nthis is analogous to how you would commonly do it if you are writing in a language like TypeScript:\n\n```typescript\nimport type { Request, Response } from 'express';\n\nenum SupportedChainId {\n  Ethereum = 1,\n  Base = 8453,\n}\n\nconst validChainIds = new Set<number>(\n  Object.values(SupportedChainId).filter((v) => typeof v === 'number') as number[]\n);\n\nasync function chainSupported(\n  req: Request<{}, any, any, { chain_id: string }>,\n  res: Response\n): Promise<Response> {\n  const chainId = Number(req.query.chain_id);\n\n  if (!validChainIds.has(chainId)) {\n    return res.status(400).json({ error: `unsupported chain ID: ${chainId}` });\n  }\n\n  return await innerHandler(res, chainId);\n}\n```\n\n\nthis is pretty decent - our validation of the input parameter is just one statement of boilerplate at the start of the route, we get a solid type out of it, hard to complain.\nhowever, it's annoying to write that statement over and over if we have a lot of routes using it. can we improve this?\n\nwhat if our enum could be understood directly in the route? 
it can be!\n\n```rust\nasync fn chain_supported(\n    &self,\n    // if a request passes an unsupported u64, it's now an error parsing this enum,\n    // and the HTTP framework will return an appropriate error to the requester\n    Query(chain_id): Query<SupportedChainId>,\n) -> poem::Result<()> {\n    inner_handler(chain_id) // do your actual route handler\n}\n```\n\nNow _this_ is nice, right? It's impossible for any of our route handling code to get an invalid value for `chain_id`, bad requests get appropriate errors returned, and there's no validation boilerplate in the routes. \n\nUnfortunately, there are a lot of sets whose membership we'd like to check but whose contents can't be known at compile time, so this enum approach can't be extended to those scenarios. How can we make a type that will provide us the same safety and ergonomics, but work with runtime-only sets? We'll come back to this in a moment.\n\nUsing a stronger type directly in the route declaration is possible thanks to our HTTP framework implementing support for parsing common data types by default. 
How does the framework know how to parse a given type from an HTTP request anyway?\n\n## parameter parsing\n\nAll values parsed from an HTTP request must be parsed from a string representing some part of the request.\n\nSo, [`poem_openapi`](https://docs.rs/poem-openapi/latest/poem_openapi/index.html) provides a trait [`ParseFromParameter`](https://docs.rs/poem-openapi/5.1.13/poem_openapi/types/trait.ParseFromParameter.html) with a single required method that looks exactly like you might expect: take in `&str`, and fallibly return `Self`.\n\n```rust\npub trait ParseFromParameter: Sized + Type {\n    fn parse_from_parameter(value: &str) -> ParseResult<Self>;\n}\n```\n\n`ParseResult` is a result type from poem; when parsing returns its error variant, the framework responds to the requester with a 400 error.\n\nwe can implement this trait for our own types; the data types supported by default (like our `u64` enum) aren't special.\n\ncould a custom type get us the goals i mentioned above for runtime sets? probably!\n\n## runtime sets\n\nlet's say that instead of our `SupportedChainId` enum, our server loads a config file, `chain_ids.json`, at startup. this file provides similar information to our enum, with the following contents:\n\n```json\n[1, 10, 100, 8453, 42161]\n```\n\nwhat does that setup step look like in Rust? \n\n### setup\n\nwe'll use a [`LazyLock`](https://doc.rust-lang.org/std/sync/struct.LazyLock.html), which was recently stabilized in Rust 1.80, to initialize a global value. we rely on `serde_json` to parse the JSON array into either a `HashSet` or a `BTreeSet`, depending on your use case. 
You can easily extend this approach to further use cases by using a map (`HashMap` or `BTreeMap`) instead of a set.\n\n```rust\nuse std::{collections::BTreeSet, fs::File, path::Path, sync::LazyLock};\n\n// BTreeSet is often faster than HashSet for a small number of integer keys\npub static SUPPORTED_CHAIN_IDS: LazyLock<BTreeSet<u64>> =\n    LazyLock::new(|| {\n        let path = Path::new(\"./chain_ids.json\");\n        serde_json::from_reader(\n            File::open(path).expect(\"can't open ./chain_ids.json\"),\n        )\n        .expect(\"failed to parse chain IDs file\")\n    });\n```\n\nthere's also the slightly more complex case of a server that needs to re-load the configuration every so often without restarting itself. we're going to ignore this case, as multithreaded synchronization practices aren't the point of this post.\n\nnow that we're done with setup, we can return to our original problem.\n\n### in-route comparison\n\nbefore we get to writing the nice version of this, let's look at what the most obvious version might be, one that parallels the first implementation we did with an enum.\n\n```rust\nasync fn chain_supported(\n    &self,\n    Query(chain_id): Query<u64>,\n) -> poem::Result<()> {\n    match SUPPORTED_CHAIN_IDS.contains(&chain_id) {\n        true => {\n            // call actual route handler here\n            inner_handler(SupportedChainId(chain_id))\n        }\n        false => Err(poem::Error::from_string(\n            format!(\"unsupported chain ID: {}\", chain_id),\n            StatusCode::BAD_REQUEST,\n        )),\n    }\n}\n```\n\nthe route declares the primitive type, then we _validate_ the value. let's see how we could _parse_ the value instead.\n\n### encoding support in a type\n\nlet's use everything we just discussed to solve our problem. 
first, our struct definition.\n\n```rust\n#[derive(\n    Serialize, Deserialize, PartialEq, Eq, poem_openapi::NewType,\n)]\n#[oai(from_parameter = false)]\n#[repr(transparent)]\n/// a currently supported chain ID\npub struct SupportedChainId(pub u64);\n```\n\n_**note**_: this type definition isn't strict about preventing the type from being constructed with arbitrary numbers by consuming code, so it's not 100% airtight. this sort of strictness isn't the point of the post, but it's something to think about if you use this sort of code in production.\n\nthe `NewType` derive tells poem to get the type/parsing implementation for `SupportedChainId` by looking at `u64`, which it is a newtype of.\n\nthe `#[oai(from_parameter = false)]` prevents the `NewType` derive from providing `u64`'s implementation of `ParseFromParameter`, because customizing that impl is the key to this whole thing. let's do that now.\n\n```rust\nuse poem_openapi::types::{ParseError, ParseFromParameter, ParseResult};\n\nimpl ParseFromParameter for SupportedChainId {\n    fn parse_from_parameter(value: &str) -> ParseResult<Self> {\n        // first, parse to plain u64, propagating any error\n        let number = value.parse::<u64>()\n            .map_err(|_| ParseError::custom(\"must be a uint64\"))?;\n\n        // then, use the static set we set up earlier to check support\n        match SUPPORTED_CHAIN_IDS.contains(&number) {\n            // wrapping in our type encodes that we've checked the number\n            true => Ok(SupportedChainId(number)),\n            // represent lack of support as a parsing error\n            false => Err(ParseError::custom(\n                format!(\"unsupported chain ID: {}\", number)\n            )),\n        }\n    }\n}\n```\n\nyou may notice that the parsing implementation also includes the errors that should be returned to the requester in case parsing goes wrong.\n\nWhat does our route definition look like if we use this new type? 
_Exactly the same_ as when using an enum - only the definition of `SupportedChainId` has changed.\n\n```rust\nasync fn chain_supported(\n    &self,\n    Query(chain_id): Query<SupportedChainId>,\n) -> poem::Result<()> {\n    inner_handler(chain_id) // do your actual route handler\n}\n```\n\nnow, we've arrived at our goals:\n\n* our route code doesn't validate the ID: we parse it as part of a type, directly in the route definition\n* it works with sets defined at runtime in the same way as at compile time\n* it's efficient enough to use frequently\n\n## _ciao!_\n\ni intend to write more on the topic of strong types at serialization boundaries, since i think it's a very practical way to embrace \"parse, don't validate\". thanks for reading.",
    "createdAt": "2025-04-21T23:32:56.366Z"
  }
}