🧩 The problem
Continuing from the previous post, we'll now turn the shell script into a Python one. We want a Python integration that pulls feeds from YouTube directly, so we must first translate the original command-line options into Python API calls. This lets yt-dlp download only the videos we need instead of going through the whole channel every time, which might result in IP throttling or temporary bans.
yt-dlp is also capable of working with text files in two ways:
- to know which files have already been downloaded by saving their ids
- to store the options instead of manually passing them via the shell or API
We're not going to use these two features today.
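Just for reference, the first feature is the download archive (`--download-archive` on the command line, `download_archive` in the API). As far as I can tell the file holds one entry per line, made of the extractor name and the video id separated by a space; treat that layout as an assumption. A minimal sketch of reading such a file, where `load_archive` and `already_downloaded` are hypothetical helper names and not part of yt-dlp:

```python
import pathlib


def load_archive(path: pathlib.Path) -> set[tuple[str, str]]:
    """Return the (extractor, video_id) pairs recorded in an archive file."""
    entries: set[tuple[str, str]] = set()
    if not path.exists():
        return entries
    for line in path.read_text(encoding='utf-8').splitlines():
        parts = line.split()
        # Assumed layout: "<extractor> <video id>", one entry per line.
        # Anything else (blank or malformed lines) is skipped.
        if len(parts) == 2:
            entries.add((parts[0], parts[1]))
    return entries


def already_downloaded(path: pathlib.Path, extractor: str, video_id: str) -> bool:
    """Tell whether a video id is already recorded in the archive."""
    return (extractor, video_id) in load_archive(path)
```

In practice yt-dlp does this check by itself when the option is set, so a helper like this is only useful to inspect the file from other scripts.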
⚠️ Warning
⚠️⚠️ Before continuing, please only mirror content you have permission to... ⚠️⚠️
The solution
Vibe coding
I'm lazy, so I asked GPT o4-mini (via Duck.ai) how to do it. I trimmed some of the output so it's more straightforward to read.
Me
original shell command
yt-dlp "${URL}" \
    --verbose \
    --fixup detect_or_warn \
    --prefer-ffmpeg \
    --sub-langs "en,it" \
    --write-subs \
    --embed-subs \
    --write-auto-sub \
    --prefer-free-formats \
    --no-call-home \
    --no-overwrites \
    --recode webm \
    --add-metadata \
    --write-thumbnail \
    --convert-thumbnails png \
    --exec "echo \"%(title)s\" > ${DST_DIR}/%(id)s/title.txt" \
    --exec "echo \"%(description)s\" > ${DST_DIR}/%(id)s/description.txt" \
    --output "${DST_DIR}/%(id)s/%(id)s.%(ext)s"
Starting from this command, and setting URL='my youtube channel url', please transform the shell command to the Python calls
~
o4-mini
Here's how you can do the same download in Python using yt-dlp's API. You define essentially the same options in a dict, register a small progress hook to emit your "exec" behavior, and then call .download().
import os
from yt_dlp import YoutubeDL

# ========== configuration ==========
URL = 'https://www.youtube.com/c/YourChannelNameHere'
DST_DIR = '/path/to/your/download/folder'

# Make sure the DST_DIR exists
os.makedirs(DST_DIR, exist_ok=True)

# ========== progress hook to write title & description ==========
def write_meta_hook(d):
    """
    This hook is called multiple times; we care about when download
    of a single video is finished so we can write title/description.
    """
    if d['status'] == 'finished':
        info = d.get('info_dict', {})
        video_id = info.get('id')
        title = info.get('title', '')
        description = info.get('description', '')
        # Create a per-video subfolder, mirror --output structure
        vid_dir = os.path.join(DST_DIR, video_id)
        os.makedirs(vid_dir, exist_ok=True)
        # write out title.txt and description.txt
        with open(os.path.join(vid_dir, 'title.txt'), 'w', encoding='utf-8') as f:
            f.write(title)
        with open(os.path.join(vid_dir, 'description.txt'), 'w', encoding='utf-8') as f:
            f.write(description)

# ========== yt-dlp options ==========
ydl_opts = {
    # general
    'verbose': True,
    'no_overwrites': True,
    'nocallhome': True,
    'add_metadata': True,
    'fixup': 'detect_or_warn',
    'prefer_ffmpeg': True,
    'recode_video': 'webm',
    # subtitles
    'sub_langs': ['en', 'it'],
    'writesubtitles': True,  # --write-subs
    'writeautomaticsub': True,  # --write-auto-sub
    'embedsubtitles': True,
    # formats
    'prefer_free_formats': True,
    # thumbnails
    'writethumbnail': True,
    'convert_thumbnails': 'png',
    # output template
    'outtmpl': os.path.join(DST_DIR, '%(id)s', '%(id)s.%(ext)s'),
    # post hooks
    'progress_hooks': [write_meta_hook],
}

# ========== run the download ==========
with YoutubeDL(ydl_opts) as ydl:
    ydl.download([URL])
Clues
I then double-checked whether all the options in the generated code actually exist. Surprise! They do not.
By opening the yt-dlp source code, specifically the YoutubeDL class, you'll see all the supported API options.
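This kind of check can also be done mechanically. The sketch below compares the keys of an options dict against a set of names copied by hand from the YoutubeDL class docstring. `KNOWN_OPTIONS` and `unknown_options` are hypothetical names of mine, and the set is deliberately a small, non-exhaustive subset, so a name missing from it is only a hint to go and read the source:

```python
# Hand-copied subset of the option names documented in the YoutubeDL
# class docstring. This is NOT exhaustive: extend it from the source.
KNOWN_OPTIONS = frozenset({
    'verbose', 'fixup', 'outtmpl', 'postprocessors', 'progress_hooks',
    'prefer_free_formats', 'subtitleslangs', 'writeautomaticsub',
    'writesubtitles', 'writethumbnail',
})


def unknown_options(opts: dict) -> list[str]:
    """Return the keys of opts that are missing from KNOWN_OPTIONS, sorted."""
    return sorted(key for key in opts if key not in KNOWN_OPTIONS)


# A few of the options suggested by the bot.
candidate = {
    'verbose': True,
    'writesubtitles': True,
    'recode_video': 'webm',
    'embedsubtitles': True,
    'convert_thumbnails': 'png',
}
print(unknown_options(candidate))
# prints ['convert_thumbnails', 'embedsubtitles', 'recode_video']
```

As far as I have seen, yt-dlp does not complain about option names it does not know, which is why a typo like this can go unnoticed.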
Trying to help the bot help me
Me
i'll tell you the missing options from the yt-dlp code. you need to fix them:
'recode_video', 'embedsubtitles', 'convert_thumbnails',
~
o4-mini
The three fixed option names are:
- recode_video: your --recode webm (remux into WebM)
- embed_subtitles: your --embed-subs
- convert_thumbnails: your --convert-thumbnails png
Turns out these fixes are wrong.
Script that saved the day
There's an interesting utility in the yt-dlp repository called cli_to_api.py, which translates shell options into Python ones. One of the maintainers suggests using that script for this use case. Here's how to run it:
- clone the repository
git clone https://github.com/yt-dlp/yt-dlp
- install locally
cd yt-dlp
python3 -m venv .venv
. .venv/bin/activate
pip install .
- run the script
cd devscripts
python -m cli_to_api "--verbose"
This is the output for the --verbose option:
The arguments passed translate to:
[debug] Override config: ['--verbose']
{'verbose': True}
Combining these with the CLI defaults gives:
[debug] Override config: ['--verbose']
{'extract_flat': 'discard_in_playlist',
 'fragment_retries': 10,
 'ignoreerrors': 'only_download',
 'postprocessors': [{'key': 'FFmpegConcat',
                     'only_multi_video': True,
                     'when': 'playlist'}],
 'retries': 10,
 'verbose': True}
You can of course pass multiple options, each one between quotes:
python -m cli_to_api "--output" "DST_DIR/%(id)s/%(id)s.%(ext)s"
Which results in:
The arguments passed translate to:
{'outtmpl': {'default': 'DST_DIR/%(id)s/%(id)s.%(ext)s'}}
Combining these with the CLI defaults gives:
{'extract_flat': 'discard_in_playlist',
 'fragment_retries': 10,
 'ignoreerrors': 'only_download',
 'outtmpl': {'default': 'DST_DIR/%(id)s/%(id)s.%(ext)s'},
 'postprocessors': [{'key': 'FFmpegConcat',
                     'only_multi_video': True,
                     'when': 'playlist'}],
 'retries': 10}
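With a long command, typing every option between quotes by hand gets tedious. One possible shortcut, which is my own idea and not part of the yt-dlp tooling, is to let Python's shlex module split the original option string into an argument list and build the cli_to_api invocation from it. `build_cli_to_api_command` is a hypothetical helper, and the command is only printed here, not executed:

```python
import shlex

# A few options taken from the original shell command.
SHELL_OPTIONS = '--sub-langs "en,it" --write-subs --recode webm --convert-thumbnails png'


def build_cli_to_api_command(options: str) -> list[str]:
    """Split a shell option string and prepend the cli_to_api invocation."""
    return ['python', '-m', 'cli_to_api', *shlex.split(options)]


command = build_cli_to_api_command(SHELL_OPTIONS)
# shlex.join quotes each argument where needed, so the resulting line
# can be pasted into a shell from the devscripts directory.
print(shlex.join(command))
```

The list form can also be handed to `subprocess.run` as is, which avoids a second round of shell quoting.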
Result
The write_meta_hook provided by GPT seems correct enough. However, to keep things simple I decided to translate the original shell exec options verbatim. I also prefer pathlib to os for managing paths.
Here's the complete result:
import pathlib
import sys

from yt_dlp import YoutubeDL

DST_DIR = '/srv/http/videos'

ydl_opts: dict = {
    'verbose': True,
    'no_overwrites': True,
    'call_home': False,
    'add_metadata': True,
    'fixup': 'detect_or_warn',
    'prefer_ffmpeg': True,
    'subtitleslangs': ['en', 'it'],
    'writesubtitles': True,
    'writeautomaticsub': True,
    'prefer_free_formats': True,
    'writethumbnail': True,
    'final_ext': 'webm',
    'outtmpl': {
        'default': str(pathlib.Path(DST_DIR, '%(id)s', '%(id)s.%(ext)s'))
    },
    'postprocessors': [
        {
            # --convert-thumbnails png
            'format': 'png',
            'key': 'FFmpegThumbnailsConvertor',
            'when': 'before_dl'
        },
        {
            # --recode webm
            'key': 'FFmpegVideoConvertor',
            'preferedformat': 'webm'
        },
        {
            # --exec: write the title to DST_DIR/<id>/title.txt
            'exec_cmd': ["cat > " + str(pathlib.Path(DST_DIR, '%(id)s', 'title.txt')) + " << 'EOF'\n"
                         '%(title)s\n'
                         'EOF'],
            'key': 'Exec',
            'when': 'after_move'
        },
        {
            # --exec: write the description to DST_DIR/<id>/description.txt
            'exec_cmd': ["cat > " + str(pathlib.Path(DST_DIR, '%(id)s', 'description.txt')) + " << 'EOF'\n"
                         '%(description)s\n'
                         'EOF'],
            'key': 'Exec',
            'when': 'after_move'
        },
    ],
}

if __name__ == '__main__':
    # RSS feed code here: TODO
    # This will replace the single URL passed to the `download` method.
    url: str = sys.argv[1]
    with YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])
The script works exactly the same as the one using the shell:
python -m mirror_yt "video or channel URL"
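If shelling out through Exec feels fragile, for instance when a description happens to contain a line that reads EOF, the same two files could be written from Python instead. What follows is only a sketch of that alternative using pathlib, not what the script above does. `write_meta` is a hypothetical helper, and wiring it into yt-dlp (for example from a hook that receives the info dict) is left out:

```python
import pathlib


def write_meta(dst_dir: str, info: dict) -> pathlib.Path:
    """Write title.txt and description.txt into dst_dir/<id>/ and return that directory."""
    vid_dir = pathlib.Path(dst_dir, info['id'])
    vid_dir.mkdir(parents=True, exist_ok=True)
    # Missing or None fields fall back to an empty string.
    (vid_dir / 'title.txt').write_text(info.get('title') or '', encoding='utf-8')
    (vid_dir / 'description.txt').write_text(info.get('description') or '', encoding='utf-8')
    return vid_dir
```

Called as `write_meta('/srv/http/videos', {'id': 'abc', 'title': 'Hello'})`, it would create `/srv/http/videos/abc/title.txt` with the title and an empty `description.txt` next to it.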
Conclusion
As you've seen, this is yet more evidence that vibe coding does not always work.
Next time we'll integrate the YouTube RSS feeds into the script like I did in the first post of this series.
You can comment here and check my YouTube channel.