Concept 04

Serialization & Deserialization

The process of converting complex data structures into a common format for network transmission and storage, and reconstructing them back.

01

Definition & Importance

Serialization is the process of converting in-memory data structures or object state (such as a JavaScript object or a Python dictionary) into a structured sequence of bytes or text that can be stored or transmitted across a network.

Deserialization is the exact inverse: taking that transmitted data format and reconstructing it back into usable native objects in the destination environment.
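As a minimal sketch of the round trip described above, the snippet below serializes an object to JSON text and rebuilds it. The `user` object and its field names are hypothetical, chosen only for illustration.

```javascript
// Hypothetical in-memory object; the field names are illustrative only.
const user = { id: 1042, username: "dev_architect", roles: ["admin", "editor"] };

// Serialize: native object -> JSON text that can be stored or sent.
const wire = JSON.stringify(user);

// Deserialize: JSON text -> a new native object on the receiving side.
const restored = JSON.parse(wire);

console.log(typeof wire);       // "string"
console.log(restored.username); // "dev_architect"
console.log(restored === user); // false: an equivalent copy, not the same object
```

Note that the result of deserialization is a reconstructed copy with the same content, not the original object itself.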

🌐 The Importance of Language-Agnostic Communication

Modern applications are highly distributed. A React (JavaScript) frontend might need to communicate with a Rust or Go backend. Since these languages cannot inherently understand each other's native memory structures, serialization acts as the universal translator. By converting data into a shared format before transmission, any language can correctly reconstruct the data upon receiving it.

02

OSI Model Context

While we write software at the top of the networking stack, understanding the full lifecycle of data transmission is a critical mental model for backend engineers.

  • Serialization primarily happens at the Application / Presentation Layer. We convert an object into JSON or Protobuf.
  • As the data travels down the OSI stack, it gets wrapped in HTTP headers, TCP segments, and IP packets.
  • At the very bottom (the Physical Layer), the bits of the serialized data (1s and 0s) are transmitted as electrical signals, light pulses, or radio waves.
1. JS Object (App Layer)
↓ Serialize ↓
2. JSON String
↓ HTTP / TCP / IP ↓
3. Packets / Frames
↓ Encode ↓
4. Physical Bits (010010...)
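The first steps of this flow can be sketched in a few lines. This is only an illustration of the application-side part: the `payload` object is hypothetical, and the HTTP/TCP/IP wrapping is done by the runtime and operating system, not by this code.

```javascript
// Hypothetical payload used only for illustration.
const payload = { id: 1042, active: true };

// Steps 1 -> 2: serialize the object to a JSON string.
const jsonText = JSON.stringify(payload);

// The string is encoded as UTF-8 bytes before lower layers wrap it.
const bytes = new TextEncoder().encode(jsonText);

// Render the first two bytes as bits, the form the physical layer ultimately carries.
const firstBits = Array.from(bytes.slice(0, 2), (b) =>
  b.toString(2).padStart(8, "0")
).join(" ");

console.log(jsonText);     // {"id":1042,"active":true}
console.log(bytes.length); // 25
console.log(firstBits);    // 01111011 00100010  (the characters { and ")
```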
03

Serialization Standards

There are dozens of serialization formats, but they generally fall into two major categories: Text-Based and Binary-Based.

📄 Text-Based Formats

  • JSON: The most widely used format on the web. Lightweight, human-readable, and derived from JavaScript object syntax. The default choice for HTTP APIs.
  • XML: Older, heavily tag-based format (looks like HTML). Verbose and strict, primarily used in legacy enterprise systems (SOAP APIs).
  • YAML: Extremely clean, indentation-based format. Often preferred for configuration files rather than dynamic web APIs due to parsing complexity.

Binary Formats

Binary formats encode data into compact byte arrays. They are not human-readable without tooling, but they are typically smaller on the wire and faster to parse than text formats.

  • Protocol Buffers (protobuf): Created by Google. Requires a predefined "schema" (a .proto file). Widely used in performance-sensitive microservice architectures, typically through gRPC.
  • MessagePack / BSON: Binary encodings of JSON-like data that keep dynamic typing and need no strict schema. They are often faster to parse than JSON, although the output is not always smaller (BSON in particular can be larger than the equivalent JSON).
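To give a feel for why a schema-based binary encoding can be more compact than text, here is a hand-rolled sketch using `DataView`. The record and its 5-byte layout are assumptions made for this example; this is not the Protobuf or MessagePack wire format.

```javascript
// Hypothetical record and hand-rolled layout; this is NOT the Protobuf or
// MessagePack wire format, only an illustration of fixed-layout binary encoding.
const record = { id: 1042, isActive: true };

// Assumed schema: 4-byte unsigned id (big-endian) followed by a 1-byte flag.
function encodeRecord(rec) {
  const view = new DataView(new ArrayBuffer(5));
  view.setUint32(0, rec.id);              // bytes 0..3
  view.setUint8(4, rec.isActive ? 1 : 0); // byte 4
  return new Uint8Array(view.buffer);
}

// The decoder must know the same layout; nothing in the bytes describes it.
function decodeRecord(bytes) {
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { id: view.getUint32(0), isActive: view.getUint8(4) === 1 };
}

const binary = encodeRecord(record);
const text = new TextEncoder().encode(JSON.stringify(record));

console.log(binary.length);        // 5
console.log(text.length);          // 27
console.log(decodeRecord(binary)); // { id: 1042, isActive: true }
```

The saving comes from leaving field names and punctuation out of the payload, which is also why both sides must agree on the schema in advance.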
04

Deep Dive into JSON

user.json
{
  "id": 1042,
  "username": "dev_architect",
  "isActive": true,
  "roles": ["admin", "editor"],
  "preferences": {
    "theme": "dark",
    "notifications": false
  },
  "lastLogin": null
}

JSON (JavaScript Object Notation) is the backbone of modern web APIs. Despite the name, it is entirely language-independent and extremely lightweight.

📜 Standard Rules & Support
  • Strict Quotes: All keys must be wrapped in double quotes ("key", never 'key' or unquoted).
  • Data Types: Supports strings, numbers, booleans (true/false), and null as values, with objects and arrays as containers.
  • No Functions: JSON cannot store executable code or functions, only raw declarative data.
  • Nesting: Objects ({...}) and arrays ([...]) can be nested to arbitrary depth to represent complex structures.

// In native JavaScript:
JSON.stringify(obj); // Serialize (object to string)
JSON.parse(str);     // Deserialize (string to object)
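The rules above can be checked directly with the built-in `JSON` object. The sample objects below are hypothetical and exist only to trigger each rule.

```javascript
// Single-quoted keys are not valid JSON, so parsing throws a SyntaxError.
let quoteError = null;
try {
  JSON.parse("{'id': 1}");
} catch (err) {
  quoteError = err;
}
console.log(quoteError instanceof SyntaxError); // true

// Functions and undefined values are dropped from objects during serialization.
const withFunction = { name: "a", greet() { return "hi"; }, missing: undefined };
const withoutFunction = JSON.stringify(withFunction);
console.log(withoutFunction); // {"name":"a"}

// Dates are not a JSON type: they are written as ISO 8601 strings and
// come back as strings unless the receiver converts them.
const roundTripped = JSON.parse(JSON.stringify({ at: new Date(0) }));
console.log(typeof roundTripped.at); // "string"
console.log(roundTripped.at);        // 1970-01-01T00:00:00.000Z
```

In practice this means a round trip through JSON preserves plain data but not behavior or richer types, so the receiving side has to rebuild those explicitly.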