ATProto Browser

Experimental browser for the Atmosphere

Record data

{
  "uri": "at://did:plc:7mnpet2pvof2llhpcwattscf/beauty.piss.blog.entry/3lmkzod2nl52l",
  "cid": "bafyreihfefnzcpdvy4zg342a3sdl26jugsewnyum5w2a3azw6y6ud2dkiq",
  "value": {
    "tags": [
      "parsing",
      "rust",
      "dev"
    ],
    "$type": "beauty.piss.blog.entry",
    "title": "Deserializing Stringified Data",
    "content": "# deserializing stringified data\n\nsometimes, upstream data you rely on might have fields with structured data that's been stringified, which is less than ideal. \n\nin my experience, this usually occurs as a result of trying to glue disparate systems together: at some point in that process, someone decides to squeeze structured data through an interface that only supports strings, or that couldn't handle variations in the JSON otherwise, and then you, the downstream consumer, have to live with it. \n\nFor example, for some reason it's not uncommon to encounter [Ethereum contract ABI json](https://docs.soliditylang.org/en/latest/abi-spec.html#json) as a string.\n\n## doing it the low-friction way\n\nsay we have this incoming JSON:\n\n```json\n{\"data\": \"{\\\"key\\\": 1, \\\"enabled\\\": false}\"}\n```\n\nas you can see, the `data` field's value is a stringified JSON object, which in its un-stringified form would be described by this Rust struct:\n\n```rust\n#[derive(Deserialize)]\nstruct Metadata {\n    key: i32,\n    enabled: bool,\n}\n```\n\nthe easiest way to work around this if it comes up in day to day development is just to accept a secondary parsing step, where `String` is the initial type of the stringified field, as shown in `deser_two_step` here:\n\n```rust\n#[derive(Deserialize)]\n#[serde(bound(deserialize = \"T: serde::de::Deserialize<'de>\"))]\npub(crate) struct Outer<T> {\n    data: T,\n}\n\nfn deser_two_step<T: DeserializeOwned>(json: &str) -> Result<Outer<T>, serde_json::Error> {\n    let outer: Outer<String> = serde_json::from_str(json)?;\n    let data = serde_json::from_str(&outer.data)?;\n    Ok(Outer { data })\n}\n\nlet json_str = r#\"{\"data\": \"{\\\"key\\\": 1, \\\"enabled\\\": false}\"}\"#;\n\nlet parsed: Outer<Metadata> = deser_two_step::<Metadata>(json_str).unwrap();\nassert_eq!(parsed.data.key, 1);\nassert_eq!(parsed.data.enabled, false);\n```\n\nthis is a lot like the approach discussed in the previous [filter parsing 
JSON](filter-parsing-json) post - parse first to something infallible, then do your secondary parse. it works OK, but you allocate & parse twice for that field - once to `String`, then to your `T`. It also is less than ideal in terms of developer experience.\n\nwe could do better!\n\n## custom deserializers\n\nby implementing a custom [serde deserializer](https://serde.rs/impl-deserializer.html), we can customize the deserialization of a given field. we can use this to move the parsing logic out of our consuming code!\n\n\n### first draft\nwhat would be the simplest custom `deserialize_with` function we could implement that works for stringified JSON?\n\n```rust\npub fn deser_stringified_json<'de, D, T>(deserializer: D) -> Result<T, D::Error>\nwhere\n    D: Deserializer<'de>,\n    T: DeserializeOwned,\n{\n    let s: String = Deserialize::deserialize(deserializer)?;\n    serde_json::from_str(&s).map_err(serde::de::Error::custom)\n}\n```\n\nwhich can then be used like this, by annotating the `data` field:\n```rust\nlet json = r#\"{\"data\": \"{\\\"key\\\": 1, \\\"enabled\\\": false}\"}\"#;\n\n#[derive(Deserialize)]\nstruct Example<T: DeserializeOwned> {\n    #[serde(deserialize_with = \"deser_stringified_json\")]\n    data: T,\n}\n\nlet parsed: Example<Metadata> = serde_json::from_str(json).unwrap();\nassert_eq!(parsed.data.key, 1);\nassert_eq!(parsed.data.enabled, false);\n```\n\nOK, that's pretty simple, and it works! check out [this playground](https://play.rust-lang.org/?version=stable&mode=release&edition=2024&gist=13c80d91911a25ddc08730cafdfbc770) to run the code.\n\nthis has improved our developer experience, as now we don't have to directly deal with the stringified form in our consuming code, and the parsing is one step. 
great, right?\n\nbut we still have one issue remaining, the same as we had with the naive approach of 2-step parsing: *double allocation*.\n\n### customizing the `Visitor`\n\nI asked if there was a better way of doing this, and got a great answer: write a custom `Visitor` instead of deserializing to a `String` first!\n\nhow does that change our `deser_stringified_json` function?\n\n```rust\nuse std::{fmt, marker::PhantomData};\n\nuse serde::de::{DeserializeOwned, Deserializer, Visitor};\n\npub fn deser_stringified_json<'de, D, T>(deserializer: D) -> Result<T, D::Error>\nwhere\n    D: Deserializer<'de>,\n    T: DeserializeOwned,\n{\n    // zero-sized visitor; `PhantomData` only carries the target type `T`\n    struct StringifiedJsonVisitor<T>(PhantomData<T>);\n\n    impl<'de, T> Visitor<'de> for StringifiedJsonVisitor<T>\n    where\n        T: DeserializeOwned,\n    {\n        type Value = T;\n\n        fn expecting(&self, formatter: &mut fmt::Formatter) -> fmt::Result {\n            formatter.write_str(\"a string containing JSON data\")\n        }\n\n        // parse the inner JSON straight from the `&str` we're handed\n        fn visit_str<E>(self, value: &str) -> Result<Self::Value, E>\n        where\n            E: serde::de::Error,\n        {\n            serde_json::from_str(value).map_err(E::custom)\n        }\n\n        fn visit_borrowed_str<E>(self, value: &'de str) -> Result<Self::Value, E>\n        where\n            E: serde::de::Error,\n        {\n            serde_json::from_str(value).map_err(E::custom)\n        }\n    }\n\n    deserializer.deserialize_str(StringifiedJsonVisitor(PhantomData))\n}\n```\n\nthis implementation of [serde's `Visitor<'de>`](https://serde.rs/impl-deserialize.html#the-visitor-trait) makes it so that JSON is parsed from the stringified form _without allocating an intermediate `String`_: it parses directly from the `&str` that the outer deserializer hands to the visitor.\n\nit's used exactly the same as the previous implementation, but now skips that extra `String` allocation. check out the [playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2024&gist=115e73f3e4bc501982bbbe285b121ea3) to run the code.\n\nso, **mission accomplished**? 
almost!\n\nonce the pattern is established for one format, it's pretty straightforward to make it work across formats.\n\n## making it generic: `deser-stringified`\n\n[`deser-stringified`](https://crates.io/crates/deser-stringified) is a new crate that implements this utility in a way that's generic across data formats. initially, it supports JSON, YAML, and TOML. hopefully, it means you won't need to hand-roll the techniques discussed above.\n\nthis is a personal itch i wanted to scratch, as i've run into this stringified parsing problem multiple times. please see the repo / crate for more information if you're interested.\n\n#### ciao!",
    "createdAt": "2025-04-11T22:24:03.558Z"
  }
}