Solve Computer Science

YouTube channel mirror on Jekyll - part 2

🧩 The problem

Continuing from the previous post, we'll now turn the shell script into a Python one. The goal is to pull the YouTube feeds directly from Python, so the first step is to translate the original command-line options into yt-dlp Python API calls. This lets yt-dlp download only the videos we need instead of going through the whole channel every time, which might result in IP throttling or temporary bans.

yt-dlp is also capable of working with text files in two ways:

  • to know which files have already been downloaded by saving their ids
  • to store the options instead of manually passing them via the shell or API

We're not going to use these two features today.
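
For reference, the first of these two features is also reachable from the Python API through the `download_archive` option. What follows is only a minimal sketch and not part of the script built in this post: the file name `downloaded.txt` and the function names `build_archive_opts` and `mirror` are made up for illustration.

```python
from pathlib import Path


def build_archive_opts(dst_dir: str, archive_name: str = "downloaded.txt") -> dict:
    """Return yt-dlp options that record finished downloads in a text file."""
    dst = Path(dst_dir)
    return {
        # API counterpart of the CLI flag --download-archive FILE:
        # video ids listed in this file are skipped on later runs.
        "download_archive": str(dst / archive_name),
        "outtmpl": {"default": str(dst / "%(id)s" / "%(id)s.%(ext)s")},
    }


def mirror(url: str, dst_dir: str) -> None:
    """Download url into dst_dir, skipping ids already in the archive file."""
    # Imported lazily so the options can be inspected without yt-dlp installed.
    from yt_dlp import YoutubeDL

    with YoutubeDL(build_archive_opts(dst_dir)) as ydl:
        ydl.download([url])
```

Calling `mirror("video or channel URL", "/srv/http/videos")` twice in a row should, under these assumptions, skip on the second run whatever the first one finished.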

Previous post

⚠️ Warning

⚠️⚠️ Before continuing, please only mirror content you have permission to... ⚠️⚠️

✅ The solution

🤖 Vibe coding

I'm lazy, so I asked GPT o4-mini (via Duck.ai) how to do it. I trimmed some of the output to make it more straightforward to read.

Me

original shell command
yt-dlp "${URL}" \
    --verbose \
    --fixup detect_or_warn \
    --prefer-ffmpeg \
    --sub-langs "en,it" \
    --write-subs \
    --embed-subs \
    --write-auto-sub \
    --prefer-free-formats \
    --no-call-home \
    --no-overwrites \
    --recode webm \
    --add-metadata \
    --write-thumbnail \
    --convert-thumbnails png \
    --exec "echo \"%(title)s\" > ${DST_DIR}/%(id)s/title.txt" \
    --exec "echo \"%(description)s\" > ${DST_DIR}/%(id)s/description.txt" \
    --output "${DST_DIR}/%(id)s/%(id)s.%(ext)s"

Starting from this command, and setting URL='my youtube channel url', please transform the shell command to the Python calls

~

o4-mini

Here’s how you can do the same download in Python using yt-dlp’s API. You define essentially the same options in a dict, register a small progress hook to emit your “exec” behavior, and then call .download().

import os
from yt_dlp import YoutubeDL

# ========== configuration ==========

URL     = 'https://www.youtube.com/c/YourChannelNameHere'
DST_DIR = '/path/to/your/download/folder'

# Make sure the DST_DIR exists
os.makedirs(DST_DIR, exist_ok=True)

# ========== progress hook to write title & description ==========
def write_meta_hook(d):
    """
    This hook is called multiple times; we care about when download
    of a single video is finished so we can write title/description.
    """
    if d['status'] == 'finished':
        info = d.get('info_dict', {})
        video_id    = info.get('id')
        title       = info.get('title', '')
        description = info.get('description', '')

        # Create a per-video subfolder, mirror --output structure
        vid_dir = os.path.join(DST_DIR, video_id)
        os.makedirs(vid_dir, exist_ok=True)

        # write out title.txt and description.txt
        with open(os.path.join(vid_dir, 'title.txt'), 'w', encoding='utf-8') as f:
            f.write(title)
        with open(os.path.join(vid_dir, 'description.txt'), 'w', encoding='utf-8') as f:
            f.write(description)

# ========== yt-dlp options ==========
ydl_opts = {
    # general
    'verbose':             True,
    'no_overwrites':       True,
    'nocallhome':          True,
    'add_metadata':        True,
    'fixup':               'detect_or_warn',
    'prefer_ffmpeg':       True,
    'recode_video':        'webm',

    # subtitles
    'sub_langs':           ['en', 'it'],
    'writesubtitles':      True,   # --write-subs
    'writeautomaticsub':   True,   # --write-auto-sub
    'embedsubtitles':      True,

    # formats
    'prefer_free_formats': True,

    # thumbnails
    'writethumbnail':      True,
    'convert_thumbnails':  'png',

    # output template
    'outtmpl': os.path.join(DST_DIR, '%(id)s', '%(id)s.%(ext)s'),

    # post hooks
    'progress_hooks': [write_meta_hook],
}

# ========== run the download ==========
with YoutubeDL(ydl_opts) as ydl:
    ydl.download([URL])

🔎 Clues

I then double-checked whether all the options in the generated code actually exist. Surprise! They do not.

By opening the yt-dlp source code, specifically the YoutubeDL class, you'll see all the supported API options.
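
The same check can be roughly automated by searching the class docstring, which is where those options are documented. This is a heuristic sketch, not an official validation mechanism: it assumes each option appears in the docstring on a line starting with `name:`, and the helper names `undocumented_keys` and `check_against_yt_dlp` are hypothetical.

```python
import re


def undocumented_keys(opts: dict, docstring: str) -> list[str]:
    """Return the keys of opts that have no 'key:' entry in docstring."""
    missing = []
    for key in opts:
        # Look for a documentation line that starts with "<key>:".
        pattern = rf"^\s*{re.escape(key)}:"
        if not re.search(pattern, docstring, flags=re.MULTILINE):
            missing.append(key)
    return missing


def check_against_yt_dlp(opts: dict) -> list[str]:
    """Run the heuristic against the installed yt-dlp (must be importable)."""
    # Imported lazily so undocumented_keys stays usable without yt-dlp.
    from yt_dlp import YoutubeDL

    return undocumented_keys(opts, YoutubeDL.__doc__ or "")
```

Passing the generated `ydl_opts` dict to `check_against_yt_dlp` lists the keys worth a second look. A key that is reported may still be handled somewhere else in the code, so treat the result as a hint rather than a verdict.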

🆘 Trying to help the bot help me

Me

i'll tell you the missing options from the yt-dlp code. you need to fix them:

'recode_video', 'embedsubtitles', 'convert_thumbnails',

~

o4-mini

The three fixed option names are:

  • recode_video: your --recode webm remux into WebM
  • embed_subtitles: your --embed-subs
  • convert_thumbnails: your --convert-thumbnails png

Turns out these fixes are wrong.

😌 Script that saved the day

There's a handy utility in the yt-dlp repository called cli_to_api.py that translates command-line options into their Python API equivalents. One of the maintainers suggests using that script for this use case. Here's how to run it:

  • clone the repository
  git clone https://github.com/yt-dlp/yt-dlp
  • install locally
  cd yt-dlp
  python3 -m venv .venv
  . .venv/bin/activate
  pip install .
  • run the script
  cd devscripts
  python -m cli_to_api "--verbose"

This is the output for the --verbose option:

The arguments passed translate to:

[debug] Override config: ['--verbose']
{'verbose': True}

Combining these with the CLI defaults gives:

[debug] Override config: ['--verbose']
{'extract_flat': 'discard_in_playlist',
 'fragment_retries': 10,
 'ignoreerrors': 'only_download',
 'postprocessors': [{'key': 'FFmpegConcat',
                     'only_multi_video': True,
                     'when': 'playlist'}],
 'retries': 10,
 'verbose': True}

You can, of course, pass several arguments, each one quoted separately:

python -m cli_to_api "--output" "DST_DIR/%(id)s/%(id)s.%(ext)s"

Which results in:

The arguments passed translate to:

{'outtmpl': {'default': 'DST_DIR/%(id)s/%(id)s.%(ext)s'}}

Combining these with the CLI defaults gives:

{'extract_flat': 'discard_in_playlist',
 'fragment_retries': 10,
 'ignoreerrors': 'only_download',
 'outtmpl': {'default': 'DST_DIR/%(id)s/%(id)s.%(ext)s'},
 'postprocessors': [{'key': 'FFmpegConcat',
                     'only_multi_video': True,
                     'when': 'playlist'}],
 'retries': 10}
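
When options are translated one at a time like this, the resulting fragments still have to be combined into a single dict for `YoutubeDL`. The sketch below shows one way this could be done. `merge_opts` is a hypothetical helper, not something shipped with yt-dlp, and the post-processor entries are taken from the finished script later in this post.

```python
def merge_opts(*fragments: dict) -> dict:
    """Merge option dicts; 'postprocessors' lists are concatenated in order."""
    merged: dict = {}
    for fragment in fragments:
        for key, value in fragment.items():
            if key == "postprocessors":
                # Post-processors accumulate instead of replacing each other.
                merged.setdefault(key, []).extend(value)
            elif isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Nested dicts such as 'outtmpl' are merged key by key.
                merged[key] = {**merged[key], **value}
            else:
                merged[key] = value
    return merged


ydl_opts = merge_opts(
    {"verbose": True},
    {"outtmpl": {"default": "DST_DIR/%(id)s/%(id)s.%(ext)s"}},
    {"postprocessors": [{"key": "FFmpegVideoConvertor", "preferedformat": "webm"}]},
    {"postprocessors": [{"key": "FFmpegThumbnailsConvertor", "format": "png",
                         "when": "before_dl"}]},
)
```

Concatenating the lists keeps the post-processors in the order in which the fragments were passed. Passing the whole command line to cli_to_api in one go avoids the merge entirely.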

🎯 Result

The write_meta_hook provided by GPT seems correct enough. However, to keep things simple, I decided to translate the original shell exec options verbatim. I also prefer pathlib to os for managing paths.

Here's the complete result:

import pathlib
import sys
from yt_dlp import YoutubeDL

DST_DIR = '/srv/http/videos'

ydl_opts: dict = {
    'verbose':             True,
    'overwrites':          False,  # --no-overwrites
    'call_home':           False,
    'add_metadata':        True,
    'fixup':               'detect_or_warn',
    'prefer_ffmpeg':       True,
    'subtitleslangs':      ['en', 'it'],
    'writesubtitles':      True,
    'writeautomaticsub':   True,
    'prefer_free_formats': True,
    'writethumbnail':      True,
    'final_ext':           'webm',
    'outtmpl': {
        'default': str(pathlib.Path(DST_DIR,'%(id)s','%(id)s.%(ext)s'))
    },
    'postprocessors': [
        {
            # --convert-thumbnails png
            'format': 'png',
            'key': 'FFmpegThumbnailsConvertor',
            'when': 'before_dl'
        },
        {
            # --recode webm
            'key': 'FFmpegVideoConvertor',
            'preferedformat': 'webm'
        },
        {
            'exec_cmd': ["cat > " + str(pathlib.Path(DST_DIR, '%(id)s', 'title.txt')) + " << 'EOF'\n"
                         '%(title)s\n'
                         'EOF'],
            'key': 'Exec',
            'when': 'after_move'
        },
        {
            'exec_cmd': ["cat > " + str(pathlib.Path(DST_DIR, '%(id)s', 'description.txt')) + " << 'EOF'\n"
                         '%(description)s\n'
                         'EOF'],
            'key': 'Exec',
            'when': 'after_move'
        },
    ],
}

if __name__ == '__main__':
    # RSS feed code here: TODO
    # This will replace the single URL passed to the `download` method.

    url: str = sys.argv[1]

    with YoutubeDL(ydl_opts) as ydl:
        ydl.download([url])

The script works exactly the same as the one using the shell:

python -m mirror_yt "video or channel URL"
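
For comparison, the hook-based approach suggested by GPT can also be written with pathlib, so that the two text files are produced in Python rather than through a shell heredoc. This is an alternative sketch, not what the script above does: `write_meta` and `make_hook` are hypothetical names, and the hook relies on the `info_dict` entry that yt-dlp passes to progress hooks.

```python
from pathlib import Path


def write_meta(info: dict, dst_dir: str) -> Path:
    """Write title.txt and description.txt under dst_dir/<video id>/."""
    vid_dir = Path(dst_dir) / info["id"]
    vid_dir.mkdir(parents=True, exist_ok=True)
    # `or ""` covers fields that are present but set to None.
    (vid_dir / "title.txt").write_text(info.get("title") or "", encoding="utf-8")
    (vid_dir / "description.txt").write_text(
        info.get("description") or "", encoding="utf-8"
    )
    return vid_dir


def make_hook(dst_dir: str):
    """Build a function suitable for the 'progress_hooks' option."""

    def hook(d: dict) -> None:
        # The hook fires many times per file; act only on finished downloads.
        if d.get("status") == "finished" and d.get("info_dict", {}).get("id"):
            write_meta(d["info_dict"], dst_dir)

    return hook
```

To try it, the two Exec entries would be dropped from `postprocessors` and `'progress_hooks': [make_hook(DST_DIR)]` added to `ydl_opts`. Unlike the heredoc version, the files are written without a trailing newline, and they may be rewritten more than once when a video is downloaded as several files.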

🎉 Conclusion

As you've seen, this is yet more evidence that vibe coding does not always work 100%.

Next time we'll integrate the YouTube RSS feeds into the script like I did in the first post of this series.

You can comment here and check my YouTube channel.
