pydantic-cereal
Advanced serialization for Pydantic models
Pydantic is the most widely used data validation library for Python. It uses type hints/type annotations to define data models and has quite a nice "feel" to it. Pydantic V2 was released in June 2023 and brings many changes and improvements, including a new Rust-based engine for serializing and validating data.
This package, pydantic-cereal, is a small extension that lets you serialize Pydantic models with "arbitrary" (non-JSON-friendly) types to "arbitrary" file-system-like locations.
It uses fsspec to support generic file systems.
Writing a custom writer (serializer) and reader (loader) with fsspec URIs is quite straightforward.
You can also use universal-pathlib's UPath with pydantic-cereal.
Usage Examples
See the minimal pure-Python example to learn how to wrap your own type.
For wrapping 3rd-party libraries, see the Pandas dataframe example.
Key Concepts
The Cereal class is the main interface, handling type registration and I/O.
You should define one global object of this type, e.g. cereal = Cereal(), within your modules.
You can save or load Pydantic models via cereal to any fsspec URI, a universal-pathlib UPath or, for more complicated cases, a path within a pre-made fsspec file system.
The serialization format is a directory under the given path, containing model.json (main fields and metadata), model.schema.json (JSON schema for your Pydantic model, as a nice-to-have) and the serialized objects that are not representable in JSON.
In order for cereal and Pydantic to know how to serialize your non-JSON-compatible type, you must:
- wrap your type with cereal.wrap_type(type, reader, writer);
- specify arbitrary_types_allowed in your Pydantic model configuration;
- add the wrapped type as fields in your Pydantic model;
- use cereal.write_model(mdl, path) to write your model object to the given fsspec path.
Note
You can easily wrap types from 3rd-party libraries. Since you don't need to redefine the type, it will "just work" outside of your Pydantic models.
You can even specify different serialization mechanisms for different fields that have the same type; just wrap your types multiple times with different readers/writers.
How It Works
Under the hood, pydantic-cereal uses the new functional serializers that are available in Pydantic V2 and are attached via typing.Annotated (or typing_extensions.Annotated).
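For readers unfamiliar with functional serializers, here is a small sketch of the general Pydantic V2 mechanism. It is not pydantic-cereal's internal code: MyType, to_json_value and SerializableMyType are hypothetical names, and the serializer simply returns a string instead of writing anything to a file system.

```python
from typing import Annotated

from pydantic import BaseModel, ConfigDict, PlainSerializer


class MyType:
    """Hypothetical type that Pydantic has no built-in serializer for."""

    def __init__(self, value: str) -> None:
        self.value = value


def to_json_value(obj: MyType) -> str:
    """Turn a MyType into something JSON can represent."""
    return obj.value


# The serializer is attached as Annotated metadata; MyType itself is untouched.
SerializableMyType = Annotated[MyType, PlainSerializer(to_json_value, return_type=str)]


class ExampleModel(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    value: SerializableMyType


mdl = ExampleModel(value=MyType("hello"))
dumped = mdl.model_dump()  # the field is passed through to_json_value
```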
Note
When you use wrap_type, the only "wrapping" done is adding metadata via typing.Annotated.
The class itself isn't changed, and nothing will change in terms of your code, IDE or type checkers!
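The point about typing.Annotated can be checked with the standard library alone. In this sketch, StorageHint is a hypothetical stand-in for whatever metadata object pydantic-cereal actually attaches; the example only shows that annotating a type leaves the underlying class as it was.

```python
from typing import Annotated, get_args, get_type_hints


class MyType:
    def __init__(self, value: str) -> None:
        self.value = value


class StorageHint:
    """Hypothetical metadata marker, used here only for illustration."""

    def __init__(self, label: str) -> None:
        self.label = label


MyWrappedType = Annotated[MyType, StorageHint("text-file")]


class Holder:
    field: MyWrappedType


# The first argument of Annotated is the original class; the rest is metadata.
base, *metadata = get_args(MyWrappedType)

# Tools that ignore metadata see the plain class...
hints_plain = get_type_hints(Holder)
# ...while tools that ask for it (as Pydantic does) see the Annotated form.
hints_extras = get_type_hints(Holder, include_extras=True)
```

Because base is the very same class object as MyType, instances created anywhere in your code are unaffected by the annotation.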
The Cereal class uses a context to convert the objects-to-write into JSON-compatible metadata, then calls the respective writer (CerealWriter-compatible) functions.
When reading, the Cereal object imports your Pydantic model class and any reader (CerealReader-compatible) functions for your wrapped types, then reads the objects from the fsspec URIs and plugs them into your model.
Limitations
- Your cereal object doesn't necessarily have to be a global, but the same instance must be used to both register your model type and write your object (and ideally read too).
- You can't define your types or Pydantic model inside a function, because pydantic-cereal relies on importing your type by dotted name (package.module.MyType), and we can't import local variables. Instead, define it in a top-level module, then import it.
- When running from the REPL or in a Jupyter notebook, the module name for your class definitions will be __main__. This means your saved objects will only be loadable in the same kind of session. We strongly recommend you move your code to a Python package structure ASAP to avoid issues.
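The dotted-name limitation can be illustrated with the standard library. The helpers below (dotted_name, import_by_dotted_name) are hypothetical and only approximate the idea of "import a type by its dotted name"; they are not pydantic-cereal's implementation.

```python
import importlib
from fractions import Fraction


def dotted_name(cls: type) -> str:
    """Build 'package.module.Name' for a class."""
    return f"{cls.__module__}.{cls.__qualname__}"


def import_by_dotted_name(dotted: str):
    """Resolve 'package.module.Name' back to the object it names."""
    module_name, _, attr = dotted.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# A class defined at the top level of a module round-trips cleanly.
top_level = import_by_dotted_name(dotted_name(Fraction))


def make_local_class() -> type:
    class Local:  # defined inside a function, so it is not a module attribute
        pass

    return Local


# The qualified name of a local class contains '<locals>', which is not importable.
local_name = dotted_name(make_local_class())
try:
    resolved = import_by_dotted_name(local_name)
except (ImportError, AttributeError):
    resolved = None
```

The same reasoning explains the __main__ caveat: a class defined in a script or notebook reports __main__ as its module, and that name only resolves to the same code while that session is the running program.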