Autodocumenting Pydantic Models
First, a shout-out to pydantic. pydantic is a wonderful tool that I find myself using quite often! Honestly, I probably use it too much… but who cares! I love pydantic!
What do I mean by autodocumenting?
By autodocumenting pydantic models, I mean automatically setting the `__doc__` string of pydantic models.
The idea originated from this GitHub issue.
Essentially, the idea is this: since a pydantic model already knows a lot about itself (its field types, descriptions, etc.), why can’t pydantic create the model’s `__doc__` string automatically?
Sounds like a reasonable thing, right?
However, there are good reasons why it wouldn’t be wise to integrate this functionality into pydantic itself.
First, generating a docstring could be expensive: it would run every time a model class is defined, and every user would pay that cost whether or not they ever read the docstrings.
Second, there is little consensus on how `__doc__` strings should be formatted; common styles include NumPy style, Google style, and reStructuredText (Sphinx) style.
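To make the formatting disagreement concrete, here is a small sketch that renders the same field information as an attributes section in each of the three styles. The `FIELDS` list and the helper names are hypothetical, purely for illustration, and each layout is a simplified version of that style’s usual convention.

```python
# Hypothetical field metadata: (name, type name, description).
FIELDS = [("name", "str", "Name of a person."), ("age", "int", "Age of a person.")]


def numpy_style(fields):
    """Render an 'Attributes' section in NumPy docstring style."""
    lines = ["Attributes", "----------"]
    for name, type_name, description in fields:
        lines.append(f"{name} : {type_name}")
        lines.append(f"    {description}")
    return "\n".join(lines)


def google_style(fields):
    """Render an 'Attributes' section in Google docstring style."""
    lines = ["Attributes:"]
    for name, type_name, description in fields:
        lines.append(f"    {name} ({type_name}): {description}")
    return "\n".join(lines)


def rst_style(fields):
    """Render the same information as Sphinx (reST) info fields."""
    lines = []
    for name, type_name, description in fields:
        lines.append(f":ivar {name}: {description}")
        lines.append(f":vartype {name}: {type_name}")
    return "\n".join(lines)


for render in (numpy_style, google_style, rst_style):
    print(render(FIELDS))
    print()
```

Same information, three incompatible layouts: a library that picked one of them would inevitably annoy users of the others.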
Some existing solutions
This tutorial wouldn’t be complete without first acknowledging that there are existing tools that can generate documentation for pydantic models.
Most notable of all is autodoc_pydantic.
This tool won’t modify a pydantic model’s `__doc__` string, but it arguably does more: it generates really beautiful Sphinx documentation for your pydantic classes!
Warning
So, if you are already using Sphinx for documentation, then please consider using autodoc_pydantic instead of following the advice in this tutorial.
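For reference, enabling autodoc_pydantic is mostly a matter of registering it in Sphinx’s `conf.py`. The minimal sketch below assumes the extension module name `sphinxcontrib.autodoc_pydantic` and the `autopydantic_model` directive described in the autodoc_pydantic documentation; please double-check them against the version you install. The import path `mypackage.models.Person` is hypothetical.

```python
# conf.py (Sphinx configuration): a minimal sketch for enabling autodoc_pydantic.
extensions = [
    "sphinx.ext.autodoc",
    "sphinxcontrib.autodoc_pydantic",
]

# A model is then documented from an .rst page with the extension's directive:
#
#   .. autopydantic_model:: mypackage.models.Person
#
# (`mypackage.models.Person` is a hypothetical import path.)
```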
How autodocumentation will work
Since we want this behaviour of auto-generating `__doc__` strings to affect all of our models, we will follow the advice given by pydantic on how to change the behaviour of models globally: we define our own `BaseModel` that provides the `__doc__` string functionality, and all of our models will be subclasses of this `BaseModel`.
We will also follow the advice from this GitHub issue comment (which was also mentioned above) for how to implement this `__doc__` string functionality.
The idea is that, since all of our models will be subclasses of a custom `BaseModel`, we can take advantage of `__init_subclass__` to set the `__doc__` string of every subclass.
(In pydantic v2, the counterpart that runs after a model’s fields have been collected is `__pydantic_init_subclass__`.)
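The mechanism itself is plain Python, so it can be sketched without pydantic at all. In the sketch below (class names are illustrative), `__init_subclass__` fires once for every class derived from the base, including grandchildren, while the base class keeps its hand-written docstring. It reads fields from type annotations rather than from pydantic; for pydantic v2 models the analogous hook is `__pydantic_init_subclass__`, because `__init_subclass__` runs before pydantic has collected the subclass’s fields.

```python
import typing


class AutoDoc:
    """Hand-written docstring; the base class itself is left alone."""

    def __init_subclass__(cls, **kwargs) -> None:
        # Let other classes in the MRO run their own hooks first.
        super().__init_subclass__(**kwargs)
        # get_type_hints walks the MRO, so inherited annotations are included.
        fields = ", ".join(typing.get_type_hints(cls))
        cls.__doc__ = f"{cls.__name__} (fields: {fields})"


class Person(AutoDoc):
    name: str
    age: int


class Employee(Person):
    company: str


print(Person.__doc__)    # Person (fields: name, age)
print(Employee.__doc__)  # Employee (fields: name, age, company)
print(AutoDoc.__doc__)   # Hand-written docstring; the base class itself is left alone.
```

Note that `Employee` never mentions `AutoDoc` directly; inheriting from `Person` is enough for the hook to run.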
Pydantic autodoc template
This is the template of what our solution will look like. There are many ways to solve this problem, but this template will serve as our starting point.
```python
from typing import Type

import pydantic as pd
from rich.markdown import Markdown


def generate_docs(cls: Type[pd.BaseModel]) -> str:
    doc = "Auto-generated docs!\n"
    doc += f"Model name : {cls.__name__}\n"
    # `model_fields` replaces `__fields__`, which is deprecated in pydantic v2.
    doc += f"Fields: {', '.join(cls.model_fields)}\n"
    return doc


class BaseModel(pd.BaseModel):
    # In pydantic v2, a plain `__init_subclass__` runs *before* the subclass's
    # fields are collected, so `model_fields` would still be empty there.
    # `__pydantic_init_subclass__` is called once the model is fully built.
    @classmethod
    def __pydantic_init_subclass__(cls, **kwargs) -> None:
        super().__pydantic_init_subclass__(**kwargs)
        cls.__doc__ = generate_docs(cls)


class Person(BaseModel):
    name: str
    age: int


print(Person.__doc__)
```

```
Auto-generated docs!
Model name : Person
Fields: name, age
```
You’ll see that there is a free function, `generate_docs`, which takes a model class and generates the docstring for that class.
We define a custom `BaseModel` for our project, and its subclass hook sets the `__doc__` string.
All of our models inherit from this `BaseModel` and thereby have their docstrings set automatically!
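The payoff of setting `__doc__` is that standard tooling picks it up. As a quick stdlib-only check (plain classes instead of pydantic models, and a one-line stand-in for `generate_docs`; all names here are illustrative), both `inspect.getdoc` and the `pydoc` renderer behind `help()` return the generated text:

```python
import inspect
import pydoc


class AutoDocumented:
    def __init_subclass__(cls, **kwargs) -> None:
        super().__init_subclass__(**kwargs)
        # Stand-in for generate_docs(): any function of the class would do.
        cls.__doc__ = f"Auto-generated docs!\nModel name : {cls.__name__}"


class Person(AutoDocumented):
    pass


# inspect.getdoc() returns the (cleaned-up) generated docstring.
print(inspect.getdoc(Person))
# help(Person) is rendered by pydoc, which reads the same __doc__ attribute.
text = pydoc.render_doc(Person, renderer=pydoc.plaintext)
print("Model name : Person" in text)  # True
```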
One More Example
Let’s go a little crazy and do some more interesting stuff!
```python
class AutoDocBase(pd.BaseModel):
    # Every concrete model is expected to override this with real text.
    class Doc:
        short_description: str
        long_description: str


# JSON Schema keywords that are not constraints (or are already shown above).
NON_CONSTRAINT_KEYS = {"title", "description", "type"}


def generate_docs_markdown(cls: Type[AutoDocBase]) -> str:
    doc = ""
    doc += f"# {cls.__name__}\n\n"
    doc += f"{cls.Doc.short_description}\n\n"
    doc += f"{cls.Doc.long_description}\n\n"
    doc += "## Fields\n\n"
    # pydantic v2 dropped `FieldInfo.get_constraints()`, so the constraints are
    # read back from the model's JSON schema (gt -> exclusiveMinimum, etc.).
    properties = cls.model_json_schema()["properties"]
    for name, field_info in cls.model_fields.items():
        doc += f"### {name}\n\n"
        doc += f"{field_info.description}\n\n"
        for constraint, value in sorted(properties[name].items()):
            if constraint not in NON_CONSTRAINT_KEYS:
                doc += f"* constraint : `{constraint} = {value}`\n\n"
    return doc


class BaseModel(AutoDocBase):
    # Runs after pydantic has collected the subclass's fields (unlike __init_subclass__).
    @classmethod
    def __pydantic_init_subclass__(cls, **kwargs) -> None:
        super().__pydantic_init_subclass__(**kwargs)
        cls.__doc__ = generate_docs_markdown(cls)


class Person(BaseModel):
    class Doc:
        short_description = "Short description of a Person."
        long_description = "Long description of a person."

    name: str = pd.Field(
        ..., description="Name of a person.", pattern=r"[A-Z][a-zA-Z\s]+"
    )
    age: int = pd.Field(..., description="Age of a person.", gt=0, lt=150)


print(Person.__doc__)
```

```
# Person

Short description of a Person.

Long description of a person.

## Fields

### name

Name of a person.

* constraint : `pattern = [A-Z][a-zA-Z\s]+`

### age

Age of a person.

* constraint : `exclusiveMaximum = 150`

* constraint : `exclusiveMinimum = 0`
```

```python
# In Jupyter, rich's Markdown object renders the docstring as formatted Markdown.
Markdown(Person.__doc__)
```
Conclusion
So, I hope you’ve seen that it’s pretty easy to customize the look and feel of the `__doc__` string for your pydantic models.
But please, use autodoc_pydantic instead, since it integrates really nicely with Sphinx documentation.
Besides, do people really look at the `__doc__` string anymore? They are much more likely to browse the Sphinx documentation.