How to Flatten a Python dict#
We’ve all had that moment.
We are trying to analyze a nested dict.
We start by peeling away the layers one at a time, hoping to find the values.
Sometimes this approach works… and sometimes you need to flatten it!
What you’ll learn#
How to flatten a JSON-compatible dict in Python
How to customize the flattening function to work with other objects
What do I mean by flattening?#
There are many ways to flatten a dict, but what I want is this:
I want to start with a JSON object: a nested dict whose keys are all strings, or a list
I want all of the values
I also want all of the paths to those values
I want the paths to be valid Python code
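To make the last requirement concrete: a path counts as "valid Python code" if evaluating it as an expression returns the value it points to. The snippet below is a minimal sketch, assuming a small made-up `config` dict (it is not the roster example that follows). It uses `eval` purely for illustration, on trusted, hand-written strings.

```python
# Hypothetical nested object, used only to illustrate what a "path" is.
config = {"servers": [{"host": "a.example", "port": 80}], "debug": False}

# Each path is a string of Python source that indexes into `config`.
expected = {
    'config["servers"][0]["host"]': "a.example",
    'config["servers"][0]["port"]': 80,
    'config["debug"]': False,
}

# Evaluating a path as an expression should return the value it points to.
# eval is acceptable here only because the strings are trusted and hand-written.
for path, value in expected.items():
    assert eval(path) == value
```

Because the paths are plain Python, they can be pasted straight into a REPL or a script to pull out a single value.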
Example#
Imagine a classroom with students. If I wanted to store the roster in a dict, it could look like this:
from pathlib import Path
from typing import Any, Dict, List, Tuple, Union
import pandas as pd
roster = {
    "students": [
        {
            "age": 25,
            "name": "John",
        },
        {
            "age": 30,
            "name": "Jane",
        },
    ],
    "class": {
        "title": "Philosophy 101",
        "id": 12345,
    },
}
Here are the paths and values I want, stored in a pandas.DataFrame because pandas rocks!
roster_flattened_expected = pd.DataFrame(
    {
        "path": [
            'roster["students"][0]["age"]',
            'roster["students"][0]["name"]',
            'roster["students"][1]["age"]',
            'roster["students"][1]["name"]',
            'roster["class"]["title"]',
            'roster["class"]["id"]',
        ],
        "value": [
            25,
            "John",
            30,
            "Jane",
            "Philosophy 101",
            12345,
        ],
    }
)
roster_flattened_expected
| | path | value |
|---|---|---|
| 0 | roster["students"][0]["age"] | 25 |
| 1 | roster["students"][0]["name"] | John |
| 2 | roster["students"][1]["age"] | 30 |
| 3 | roster["students"][1]["name"] | Jane |
| 4 | roster["class"]["title"] | Philosophy 101 |
| 5 | roster["class"]["id"] | 12345 |
Let’s Flatten Something!#
Let’s create our function that does the actual flattening. I’ll define it here, and explain parts of it later.
JsonObject = Union[Dict, List]
Paths = List[str]
Value = Any
Values = List[Value]


def flatten_json(*, obj: JsonObject, name: str) -> Tuple[Paths, Values]:
    def do_flattening(
        *,
        obj: JsonObject,
        path: str,  # a string such as 'roster["students"][0]', not a pathlib.Path
        paths: Paths,
        values: Values,
    ) -> None:
        if isinstance(obj, dict):
            # Descend into each key, extending the path with ["key"].
            for key, value in obj.items():
                new_path = f'{path}["{key}"]'
                do_flattening(
                    obj=value,
                    path=new_path,
                    paths=paths,
                    values=values,
                )
        elif isinstance(obj, list):
            # Descend into each item, extending the path with [index].
            for i, item in enumerate(obj):
                new_path = f"{path}[{i}]"
                do_flattening(
                    obj=item,
                    path=new_path,
                    paths=paths,
                    values=values,
                )
        else:
            # Terminal value: record where we are and what we found.
            paths.append(path)
            values.append(obj)

    paths: Paths = []
    values: Values = []
    do_flattening(
        obj=obj,
        path=name,
        paths=paths,
        values=values,
    )
    return paths, values


def flatten_json_to_df(**kwargs) -> pd.DataFrame:
    paths, values = flatten_json(**kwargs)
    return pd.DataFrame({"path": paths, "value": values})
About our Function#
We name it flatten_json to make explicit that it is only expected to flatten JSON-compatible objects: dictionaries whose keys are strings, and lists.
You’ll see that all the logic is contained in a nested function, do_flattening.
Values can be one of:
another JSON object (dict)
a list
a value
We check the type of the value. If it is a dict or a list, then we must dive deeper into the object. If it is anything else, then we have reached a terminal value. As we traverse the object we keep extending path until we reach a terminal value, at which point we append our path and value to the lists of paths and values, respectively.
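The traversal described above can also be sketched as a generator that yields one `(path, value)` pair per terminal value instead of appending to shared lists. This is an alternative formulation, not the function defined earlier. The name `iter_flattened` is hypothetical, and quoting keys with `json.dumps` is an assumption made here so that keys containing double quotes or backslashes still produce valid Python paths.

```python
import json
from typing import Any, Iterator, Tuple


def iter_flattened(obj: Any, path: str) -> Iterator[Tuple[str, Any]]:
    """Yield (path, value) for every terminal value found under obj."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            # json.dumps wraps the key in double quotes and escapes any
            # quotes or backslashes it contains (an assumption of this sketch).
            quoted = json.dumps(key, ensure_ascii=False)
            yield from iter_flattened(value, f"{path}[{quoted}]")
    elif isinstance(obj, list):
        for i, item in enumerate(obj):
            yield from iter_flattened(item, f"{path}[{i}]")
    else:
        # Anything that is not a dict or a list is a terminal value.
        yield path, obj


# Made-up data, including a key with embedded quotes and an empty list.
data = {"a": [1, {"b": None}], 'say "hi"': 2, "empty": []}
pairs = list(iter_flattened(data, "data"))
for p, v in pairs:
    print(p, "->", v)
```

Note that the empty list under `"empty"` contributes no rows, because only terminal values are emitted; whether that is desirable depends on what you plan to do with the flattened result.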
Testing our Function#
Let’s see if our function can correctly flatten our roster.
roster_flattened = flatten_json_to_df(obj=roster, name="roster")
roster_flattened
| | path | value |
|---|---|---|
| 0 | roster["students"][0]["age"] | 25 |
| 1 | roster["students"][0]["name"] | John |
| 2 | roster["students"][1]["age"] | 30 |
| 3 | roster["students"][1]["name"] | Jane |
| 4 | roster["class"]["title"] | Philosophy 101 |
| 5 | roster["class"]["id"] | 12345 |
assert roster_flattened.equals(roster_flattened_expected)
Yay! Looks like it works.
Customizing the Flattening Function#
You may be thinking, “Great, but I need a slightly different function”. The good news is that you can use the flattening function as a template and adapt it to your own needs. The logic of the function should remain largely unchanged regardless of the type of object.
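One way to picture the template idea is to pull the two type-specific decisions (is this a container, and what are its children?) out into a callback. The sketch below is an illustration only: `flatten_tree`, `json_children` and `tuple_children` are hypothetical names, not part of the function defined earlier.

```python
from typing import Any, Callable, Iterable, List, Optional, Tuple

# A callback returns (path_suffix, child) pairs for a container,
# or None when the object is a terminal value.
GetChildren = Callable[[Any], Optional[Iterable[Tuple[str, Any]]]]


def flatten_tree(obj: Any, name: str, get_children: GetChildren) -> List[Tuple[str, Any]]:
    """Collect (path, value) pairs using a caller-supplied notion of 'children'."""
    results: List[Tuple[str, Any]] = []

    def do_flattening(obj: Any, path: str) -> None:
        children = get_children(obj)
        if children is None:
            results.append((path, obj))
        else:
            for suffix, child in children:
                do_flattening(child, path + suffix)

    do_flattening(obj, name)
    return results


def json_children(obj: Any):
    # Same rules as the JSON flattener: dicts and lists are containers.
    if isinstance(obj, dict):
        return [(f'["{key}"]', value) for key, value in obj.items()]
    if isinstance(obj, list):
        return [(f"[{i}]", item) for i, item in enumerate(obj)]
    return None


def tuple_children(obj: Any):
    # A different rule set: only tuples are containers.
    if isinstance(obj, tuple):
        return [(f"[{i}]", item) for i, item in enumerate(obj)]
    return None


print(flatten_tree({"a": [1, 2]}, "obj", json_children))
print(flatten_tree((1, (2, 3)), "t", tuple_children))
```

The recursive skeleton stays the same in both calls; only the callback that decides what counts as a container changes.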
Flattening Paths#
For example, imagine if we just wanted to get all of the files residing in a specific folder:
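def flatten_path(*, path: Path) -> List[Path]:
    def do_flattening(
        *,
        path: Path,
        paths: List[Path],
    ) -> None:
        if path.is_dir():
            # A directory: recurse into every entry it contains.
            for new_path in path.iterdir():
                do_flattening(
                    path=new_path,
                    paths=paths,
                )
        else:
            # Not a directory: treat it as a terminal value.
            paths.append(path)

    paths: List[Path] = []
    do_flattening(
        path=path,
        paths=paths,
    )
    return paths


flatten_path(path=Path("./"))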
flatten_path(path=Path("./"))
[PosixPath('pydantic_autodoc.py'),
PosixPath('sudoku.py'),
PosixPath('maze.py'),
PosixPath('wordle.py'),
PosixPath('function_dispatch.py'),
PosixPath('images.py'),
PosixPath('image_boundaries.py'),
PosixPath('good_documentation.py'),
PosixPath('res/astronaut.png'),
PosixPath('res/maze.png'),
PosixPath('music.py'),
PosixPath('xarray_computations.py'),
PosixPath('how_to_flatten_a_python_dict.py')]
You’ll see that we can express our logic as:
if our path is a directory
    get all the entries in the directory and flatten those
if our path is a file (i.e. a terminal value)
    append our file to the list of paths
The nice thing about Path objects is that we don’t need any extra bookkeeping to update the path as we traverse the directories, since each entry returned by iterdir() already includes the path of its parent directory.
Hooray for making things easy!
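For this particular case the standard library already offers a recursive walk. As a point of comparison, and assuming you only need the non-directory entries, `Path.rglob` gives a similar listing. The ordering, the handling of symlinked directories, and the behaviour when the starting path is itself a file may differ from the hand-written recursion, so treat this as a sketch rather than a drop-in replacement. The helper name `list_files` is hypothetical.

```python
import tempfile
from pathlib import Path
from typing import List


def list_files(root: Path) -> List[Path]:
    """Return every non-directory entry below root, found recursively."""
    return sorted(p for p in root.rglob("*") if not p.is_dir())


# Build a small throwaway tree so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "res").mkdir()
    (root / "maze.py").write_text("")
    (root / "res" / "maze.png").write_bytes(b"")
    found = [p.relative_to(root).as_posix() for p in list_files(root)]

print(found)
```

Writing the recursion by hand is still worthwhile when you want to control what counts as a terminal value, which is the point of the template above.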
Conclusion#
I hope you’ve learnt a great deal about flattening nested dicts in Python, and I also hope you can take that knowledge with you in your quest to flatten other objects.
Thanks for reading!