How can I parse a YAML file in Python?
10 Answers
The easiest method without relying on C headers is PyYaml (documentation), which can be installed via pip install pyyaml
:
import yaml
with open("example.yaml") as stream:
try:
print(yaml.safe_load(stream))
except yaml.YAMLError as exc:
print(exc)
yaml.load()
also exists, but yaml.safe_load()
should always be preferred to avoid introducing the possibility for arbitrary code execution. So unless you explicitly need the arbitrary object serialization/deserialization use safe_load
.
The PyYaml project supports versions up through the YAML 1.1 specification. If YAML 1.2 specification support is needed, see ruamel.yaml as noted in this answer.
Also, you could also use a drop in replacement for pyyaml, that keeps your yaml file ordered the same way you had it, called oyaml. View snyk of oyaml here
14 Comments
yaml.safe_load
as it cannot execute arbitrary code from the YAML file.pip install pyyaml
, see this post for more options stackoverflow.com/questions/14261614/…Read & Write YAML files with Python 2+3 (and unicode)
# -*- coding: utf-8 -*-
import yaml
import io
# Define data
data = {
'a list': [
1,
42,
3.141,
1337,
'help',
u'€'
],
'a string': 'bla',
'another dict': {
'foo': 'bar',
'key': 'value',
'the answer': 42
}
}
# Write YAML file
with io.open('data.yaml', 'w', encoding='utf8') as outfile:
yaml.dump(data, outfile, default_flow_style=False, allow_unicode=True)
# Read YAML file
with open("data.yaml", 'r') as stream:
data_loaded = yaml.safe_load(stream)
print(data == data_loaded)
Created YAML file
a list:
- 1
- 42
- 3.141
- 1337
- help
- €
a string: bla
another dict:
foo: bar
key: value
the answer: 42
Common file endings
.yml
and .yaml
Alternatives
- CSV: Super simple format (read & write)
- JSON: Nice for writing human-readable data; VERY commonly used (read & write)
- YAML: YAML is a superset of JSON, but easier to read (read & write, comparison of JSON and YAML)
- pickle: A Python serialization format (read & write) ⚠️ Using pickle with files from 3rd parties poses an uncontrollable arbitrary code execution risk.
- MessagePack (Python package): More compact representation (read & write)
- HDF5 (Python package): Nice for matrices (read & write)
- XML: exists too *sigh* (read & write)
For your application, the following might be important:
- Support by other programming languages
- Reading / writing performance
- Compactness (file size)
See also: Comparison of data serialization formats
In case you are rather looking for a way to make configuration files, you might want to read my short article Configuration files in Python
10 Comments
io.open(doc_name, 'r', encoding='utf8')
to read the special character. YAML version 0.1.7open(doc_name, ..., encodung='utf8')
for read and write, without importing io
.import yaml
, but that isn't a built-in module, and you don't specify which package it is. Running import yaml
on a fresh Python3 install results in ModuleNotFoundError: No module named 'yaml'
pip install pyyaml
for import yaml
to workIf you have YAML that conforms to the YAML 1.2 specification (released 2009) then you should use ruamel.yaml (disclaimer: I am the author of that package). It is essentially a superset of PyYAML, which supports most of YAML 1.1 (from 2005).
If you want to be able to preserve your comments when round-tripping, you certainly should use ruamel.yaml.
Upgrading @Jon's example is easy:
import ruamel.yaml as yaml
with open("example.yaml") as stream:
try:
print(yaml.safe_load(stream))
except yaml.YAMLError as exc:
print(exc)
Use safe_load()
unless you really have full control over the input, need it (seldom the case) and know what you are doing.
If you are using pathlib Path
for manipulating files, you are better of using the new API ruamel.yaml provides:
from ruamel.yaml import YAML
from pathlib import Path
path = Path('example.yaml')
yaml = YAML(typ='safe')
data = yaml.load(path)
1 Comment
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe7 in position 926: ordinal not in range(128)
). I've tried to set yaml.encoding to utf-8 but didn't work as the load method in YAML still uses the ascii_decode. Is this a bug?First install pyyaml using pip3.
Then import yaml module and load the file into a dictionary called 'my_dict':
import yaml
with open('filename.yaml') as f:
my_dict = yaml.safe_load(f)
That's all you need. Now the entire yaml file is in 'my_dict' dictionary.
3 Comments
!!python
) it can also be unsafe (as in complete harddisc wiped clean) to use yaml.load()
. As that is clearly documented you should have repeated that warning here (in almost all cases yaml.safe_load()
can be used).import yaml
, but that isn't a built-in module, and you don't specify which package it is. Running import yaml
on a fresh Python3 install results in ModuleNotFoundError: No module named 'yaml'
import yaml; from munch import munchify; f = munchify(yaml.load(…)); print(fo.d.try)
To access any element of a list in a YAML file like this:
global:
registry:
url: dtr-:5000/
repoPath:
dbConnectionString: jdbc:oracle:thin:@x.x.x.x:1521:abcd
You can use following python script:
import yaml
with open("/some/path/to/yaml.file", 'r') as f:
valuesYaml = yaml.load(f, Loader=yaml.FullLoader)
print(valuesYaml['global']['dbConnectionString'])
Comments
Example:
defaults.yaml
url: https://www.google.com
environment.py
from ruamel import yaml
data = yaml.safe_load(open('defaults.yaml'))
data['url']
3 Comments
I use ruamel.yaml. Details & debate here.
from ruamel import yaml
with open(filename, 'r') as fp:
read_data = yaml.load(fp)
Usage of ruamel.yaml is compatible (with some simple solvable problems) with old usages of PyYAML and as it is stated in link I provided, use
from ruamel import yaml
instead of
import yaml
and it will fix most of your problems.
EDIT: PyYAML is not dead as it turns out, it's just maintained in a different place.
4 Comments
I would suggest to use the pyyaml
library, together with the in-built pathlib
library.
You will need to build a pathlib.Path
object first:
from pathlib import Path
import yaml
path: Path = Path("/tmp/file.yaml")
# Make sure the path exists
assert path.exists()
# Read file and parse with pyyaml
dictionnaire = yaml.safe_load(path.read_text())
This works fine with modern versions of Python 3.
You do not need with use the open
file method this way. The code is a slightly more concise and you can easily check if the file exists, before you actually try to parse the YAML file.
Comments
I made my own script for this. Feel free to use it, as long as you keep the attribution. The script can parse yaml from a file (function load
), parse yaml from a string (function loads
) and convert a dictionary into yaml (function dumps
). It respects all variable types.
# © didlly AGPL-3.0 License - github.com/didlly
def is_float(string: str) -> bool:
try:
float(string)
return True
except ValueError:
return False
def is_integer(string: str) -> bool:
try:
int(string)
return True
except ValueError:
return False
def load(path: str) -> dict:
with open(path, "r") as yaml:
levels = []
data = {}
indentation_str = ""
for line in yaml.readlines():
if line.replace(line.lstrip(), "") != "" and indentation_str == "":
indentation_str = line.replace(line.lstrip(), "").rstrip("\n")
if line.strip() == "":
continue
elif line.rstrip()[-1] == ":":
key = line.strip()[:-1]
quoteless = (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
)
if len(line.replace(line.strip(), "")) // 2 < len(levels):
if quoteless:
levels[len(line.replace(line.strip(), "")) // 2] = f"[{key}]"
else:
levels[len(line.replace(line.strip(), "")) // 2] = f"['{key}']"
else:
if quoteless:
levels.append(f"[{line.strip()[:-1]}]")
else:
levels.append(f"['{line.strip()[:-1]}']")
if quoteless:
exec(
f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}]"
+ " = {}"
)
else:
exec(
f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}']"
+ " = {}"
)
continue
key = line.split(":")[0].strip()
value = ":".join(line.split(":")[1:]).strip()
if (
is_float(value)
or is_integer(value)
or value == "True"
or value == "False"
or ("[" in value and "]" in value)
):
if (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
):
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = {value}"
)
else:
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = {value}"
)
else:
if (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
):
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = '{value}'"
)
else:
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = '{value}'"
)
return data
def loads(yaml: str) -> dict:
levels = []
data = {}
indentation_str = ""
for line in yaml.split("\n"):
if line.replace(line.lstrip(), "") != "" and indentation_str == "":
indentation_str = line.replace(line.lstrip(), "")
if line.strip() == "":
continue
elif line.rstrip()[-1] == ":":
key = line.strip()[:-1]
quoteless = (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
)
if len(line.replace(line.strip(), "")) // 2 < len(levels):
if quoteless:
levels[len(line.replace(line.strip(), "")) // 2] = f"[{key}]"
else:
levels[len(line.replace(line.strip(), "")) // 2] = f"['{key}']"
else:
if quoteless:
levels.append(f"[{line.strip()[:-1]}]")
else:
levels.append(f"['{line.strip()[:-1]}']")
if quoteless:
exec(
f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}]"
+ " = {}"
)
else:
exec(
f"data{''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}']"
+ " = {}"
)
continue
key = line.split(":")[0].strip()
value = ":".join(line.split(":")[1:]).strip()
if (
is_float(value)
or is_integer(value)
or value == "True"
or value == "False"
or ("[" in value and "]" in value)
):
if (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
):
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = {value}"
)
else:
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = {value}"
)
else:
if (
is_float(key)
or is_integer(key)
or key == "True"
or key == "False"
or ("[" in key and "]" in key)
):
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}[{key}] = '{value}'"
)
else:
exec(
f"data{'' if line == line.strip() else ''.join(str(i) for i in levels[:line.replace(line.lstrip(), '').count(indentation_str) if indentation_str != '' else 0])}['{key}'] = '{value}'"
)
return data
def dumps(yaml: dict, indent="") -> str:
"""A procedure which converts the dictionary passed to the procedure into it's yaml equivalent.
Args:
yaml (dict): The dictionary to be converted.
Returns:
data (str): The dictionary in yaml form.
"""
data = ""
for key in yaml.keys():
if type(yaml[key]) == dict:
data += f"\n{indent}{key}:\n"
data += dumps(yaml[key], f"{indent} ")
else:
data += f"{indent}{key}: {yaml[key]}\n"
return data
print(load("config.yml"))
Example
config.yml
level 0 value: 0
level 1:
level 1 value: 1
level 2:
level 2 value: 2
level 1 2:
level 1 2 value: 1 2
level 2 2:
level 2 2 value: 2 2
Output
{'level 0 value': 0, 'level 1': {'level 1 value': 1, 'level 2': {'level 2 value': 2}}, 'level 1 2': {'level 1 2 value': '1 2', 'level 2 2': {'level 2 2 value': 2 2}}}
3 Comments
one:\n - two\n - three
IO is simpler with dummio, a package I created. Just pip install dummio
. Then
import dummio
# read
data = dummio.yaml.load(filepath)
# write
dummio.yaml.save(data, filepath=filepath)
Note that this works even if filepath is a cloud path (s3, gcs, azure). The package supports many other data types and file formats, not only dict/yaml.