pydantic-cereal
Advanced serialization for Pydantic models
Pydantic is the most widely used data validation library for Python. It uses type hints/type annotations to define data models and has quite a nice "feel" to it. Pydantic V2 was released in June 2023 and brings many changes and improvements, including a new Rust-based engine for serializing and validating data.
pydantic-cereal is a small extension package that enables users to serialize Pydantic
models with "arbitrary" (non-JSON-friendly) types to "arbitrary" file-system-like locations.
It uses fsspec to support generic file systems.
Writing a custom writer (serializer) and reader (loader) with fsspec URIs is quite straightforward.
You can also use universal-pathlib's
UPath with pydantic-cereal.
Usage Examples
See the minimal pure-Python example to learn how to wrap your own type.
For wrapping 3rd-party libraries, see the Pandas dataframe example.
Key Concepts
The Cereal class is the main interface, handling type registration and I/O.
You should typically define one global object of this type, e.g. cereal = Cereal(), within your modules.
You can save or load Pydantic models via cereal to any fsspec URI, universal-pathlib UPath or -
for more complicated cases - to a path within a pre-made fsspec file system.
The serialization format is a directory under the given path, containing the model.json (main fields
and metadata), model.schema.json (JSON schema for your Pydantic model, as a nice-to-have) and
the serialized objects that are not representable in JSON.
In order for cereal and Pydantic to know how to serialize your non-JSON-compatible type, you must:
- wrap your type with cereal.wrap_type(type, reader, writer);
- specify arbitrary_types_allowed in your Pydantic model configuration;
- add the wrapped type as fields in your Pydantic model;
- use cereal.write_model(mdl, path) to write your model object to the given fsspec path.
Note
You can easily wrap types from 3rd-party classes. Since you don't need to re-define the type, it will "just work" outside of the Pydantic models.
You can even specify different serialization mechanisms for different fields that have the same type; just wrap your types multiple times with different readers/writers.
How It Works
Under the hood, pydantic-cereal uses the functional serializers introduced in Pydantic V2,
which are attached to types via typing.Annotated (or typing_extensions.Annotated).
Note
When you use wrap_type, the only "wrapping" done is adding metadata via
typing.Annotated.
The class itself isn't changed, and nothing will change in terms of your code, IDE or type checkers!
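To see why annotating a type leaves the class untouched, here is a small standard-library sketch. The Matrix class and the StorageHint marker are hypothetical stand-ins; the metadata that pydantic-cereal actually attaches is not shown here.

```python
from typing import Annotated, get_args


class Matrix:
    """Hypothetical user type that should stay unchanged."""

    def __init__(self, rows: list) -> None:
        self.rows = rows


class StorageHint:
    """Hypothetical metadata marker, standing in for real serializer metadata."""

    def __init__(self, fmt: str) -> None:
        self.fmt = fmt


# "Wrapping" only attaches metadata next to the original type
WrappedMatrix = Annotated[Matrix, StorageHint("csv")]

# The original class and the metadata can both be recovered from the alias
base_type, *metadata = get_args(WrappedMatrix)

# Instances are still plain Matrix objects; no subclass or proxy is involved
matrix = Matrix([[1, 2], [3, 4]])
still_plain = isinstance(matrix, Matrix) and base_type is Matrix
```

Type checkers treat Annotated[Matrix, ...] as Matrix, which is why existing code and IDE support keep working.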
The Cereal class uses a context to convert the objects-to-write into
JSON-compatible metadata, then calls the respective writer
(CerealWriter-compatible) functions.
When reading, the Cereal object imports your Pydantic model class and any reader
(CerealReader-compatible) functions for your wrapped types,
then reads the objects from the fsspec URIs, and plugs them into your model.
Limitations
- Your cereal object doesn't necessarily have to be a global, but the same instance must be used to both register your model type and write your object (and ideally read too).
- You can't define your types or Pydantic model inside a function, because pydantic-cereal relies on importing your type by dotted name (package.module.MyType), and we can't import local variables. Instead, define it in a top-level module, then import it.
- When running from the REPL or in a Jupyter notebook, the module name for your class definitions will be __main__. This means your saved objects will only be loadable in the same kind of session. We strongly recommend you move your code to a Python package structure ASAP to avoid issues.
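The dotted-name restriction can be illustrated with a short standard-library sketch. The helper functions below are hypothetical and only mimic the general technique of importing by dotted name; they are not pydantic-cereal's own code.

```python
import importlib
from fractions import Fraction
from typing import Any


def dotted_name(cls: type) -> str:
    """Build a 'package.module.Name' style string for a class."""
    return f"{cls.__module__}.{cls.__qualname__}"


def import_by_dotted_name(dotted: str) -> Any:
    """Import the module part, then look up the final attribute."""
    module_name, _, attr = dotted.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


def make_local_class() -> type:
    class Local:  # defined inside a function, so not reachable by import
        pass

    return Local


# A class defined at module level round-trips through its dotted name
top_level_name = dotted_name(Fraction)
round_tripped = import_by_dotted_name(top_level_name)

# A class defined inside a function has "<locals>" in its qualified name
local_name = dotted_name(make_local_class())
try:
    import_by_dotted_name(local_name)
    local_importable = True
except (ImportError, AttributeError):
    local_importable = False
```

The same lookup explains the __main__ caveat: a name such as __main__.MyModel only resolves in a session whose main module defines that class.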