11
\$\begingroup\$

I have a utility that updates, transcodes and renames audio files. Previously I used a function that simply yielded every file that was a good match (based on file extension).

As some tasks can take a while, I wanted a progress bar to show where it was currently working in the library. I got an acceptable display with rich:

progress bar image

but my method for doing so really complicated what was previously a very single-purpose function (not much more than a yield from Path.walk() and some filtering). Now it's more of a progress-bar setup/maintenance function.

Is there a way to better separate the work for creating/updating the progress bars and the actual path retrieval (or any improvement in the layout/maintainability)?

from collections.abc import Generator
from pathlib import Path
from time import sleep

from rich.console import Console
from rich.progress import BarColumn, MofNCompleteColumn, Progress, TextColumn

#  Walks a given directory for Artist/Album/Track files
#    Yields any file with an audio format extension
#    Updates progress bar to show progress


def set_progress_columns() -> Progress:
    """Returns progress context"""
    console = Console(force_terminal=True)
    return Progress(
        TextColumn("[progress.description]{task.description:>40.40s}"),
        BarColumn(complete_style="bar.finished"),
        MofNCompleteColumn(),
        console=console,
    )


def get_audio_files(source: Path) -> Generator[Path]:
    """
    Return audio files found in given directory.

    Given a path to a folder with an Artist/Album/Track audiofile
    layout, each path to audio files inside is yielded.
    """
    VALID_TRACK_EXTENSIONS = (".flac", ".mp3", ".m4a", ".aac")
    artists = sorted(
        (x for x in source.iterdir() if not x.name.startswith(".")),
        key=lambda x: x.stem.casefold(),
    )
    # with Progress(console=console) as progress:
    with set_progress_columns() as progress:
        artist_bar = progress.add_task("", total=len(artists))
        # Per Artist (top-level)
        for artist_path in artists:
            progress.update(
                artist_bar,
                advance=1,
                description=artist_path.name,
            )
            if not artist_path.is_dir():
                continue

            # Per Album
            for album_path in artist_path.iterdir():
                if not album_path.is_dir():
                    continue
                tracks = sorted(
                    x
                    for x in album_path.iterdir()
                    if x.suffix.casefold() in VALID_TRACK_EXTENSIONS
                )
                album_bar = progress.add_task(album_path.name, total=len(tracks))

                # Per Track
                for track in tracks:
                    progress.update(
                        album_bar,
                        advance=1,
                        description=f"{album_path.name}/{track.stem}",
                    )
                    yield track
                progress.remove_task(album_bar)


def main() -> None:
    source = Path("source")

    for audio_file in get_audio_files(source):
        # analyze/transcode/update files
        sleep(0.05)


if __name__ == "__main__":
    main()
\$\endgroup\$
2
  • \$\begingroup\$ Unless I am missing something (a real possibility), you will only be yielding audio files that exists two subdirectories below "source", e.g. "source/a/b/xyz.flac". Is this what you want? \$\endgroup\$ Commented Oct 24 at 11:07
  • 2
    \$\begingroup\$ That is correct. For the most part, nothing else in this utility cares about the specific paths. But for the progress to make sense, it should have the A and B to display. The eventual music players that consume the data would probably not be happy if there were files in other locations. \$\endgroup\$ Commented Oct 24 at 17:05

5 Answers 5

8
\$\begingroup\$

I will address your immediate concern of separating creating and updating progress bars from the main task of searching and yielding audio files.

My approach would be to use a variation of the Observer pattern. Typically you have a class that does some processing and one or more "observers" that wish to be notified of state changes or events that occurs during that processing. The processing class would typically implement methods add_observer and remove_observer maintaining a list of observers to be notified. There would be a well-defined interface that observers must implement and as processing proceeds and the state of processing changes or some event occurs, each observer that is registered would be notified by invoking a well-defined method defined by the observer interface.

Since your main processing is done by a function, get_audio_files, rather than a class method, it will be more convenient to have this function take an optional argument that is an instance of interface DirEventObserverInterface:

from abc import ABC, abstractmethod

class DirEventObserverInterface(ABC):
    """Interface for an "observer" that can be passed to get_audio_files
    that will be called at various stages in processing."""

    @abstractmethod
    def start_iteration(self, level: int, num_entries: int) -> None:
        """Called when a new level of subsdirectory is to be iterated."""

    @abstractmethod
    def update(self, level: int, description: str) -> None:
        """Called when a description for a subdirectory level is to be updated."""

    @abstractmethod
    def end_iteration(self, level: int) -> None:
        """Called when an iteration of a subdirectory has completed."""

And the signature for get_audio_files becomes:

def get_audio_files(source: Path, listener: DirEventObserverInterface | None=None) -> Generator[Path]:
    ...

In the following code I have made a couple of additional changes:

  1. You computed the artists list to be all the directory entries for the passed path to get_audio_files and then in processing these entries you skip those entries that are not directories. I have changed this so that artists only contains directory entries. This provides a more accurate count of entries that need processing.
  2. When checking whether a directory entry is one of the types of audio files you are interested in I have added the additional check to make sure the entry in question is an actual file just to handle the case where you might have a directory whose name ends with, for ecample, ".flac".
from collections.abc import Generator
from pathlib import Path
from time import sleep
from abc import ABC, abstractmethod

from rich.console import Console
from rich.progress import BarColumn, MofNCompleteColumn, Progress, TextColumn

"""
Walks a given directory for Artist/Album/Track files
yielding any file with an audio format extension.
"""

class DirEventObserverInterface(ABC):
    """Interface for an "observer" that can be passed to get_audio_files
    that will be called at various stages in processing."""

    @abstractmethod
    def start_iteration(self, level: int, num_entries: int) -> None:
        """Called when a new level of subsdirectory is to be iterated."""

    @abstractmethod
    def update(self, level: int, description: str) -> None:
        """Called when a description for a subdirectory level is to be updated."""

    @abstractmethod
    def end_iteration(self, level: int) -> None:
        """Called when an iteration of a subdirectory has completed."""

class DirEventObserver(DirEventObserverInterface):
    """An implementation of DirEventObserverInterface that uses the rich
    package for displaying progress bars."""

    def __init__(self):
        console = Console(force_terminal=True)
        self._progress = Progress(
            TextColumn("[progress.description]{task.description:>40.40s}"),
            BarColumn(complete_style="bar.finished"),
            MofNCompleteColumn(),
            console=console,
        )
        self._bars = {}

    def start_iteration(self, level: int, num_entries: int, description: str) -> None:
        """Called when a new progress bar is to be created for a subdirectory
        at a certain level."""
        if not self._bars:
            self._progress.start()
        self._bars[level] = self._progress.add_task(description, total=num_entries)

    def update(self, level: int, description: str) -> None:
        """Called when a progress bar's description is to be updated."""
        self._progress.update(
            self._bars[level],
            advance=1,
            description=description
        )

    def end_iteration(self, level: int) -> None:
        """Called when a progress bar is to be removed."""
        self._progress.remove_task(self._bars[level])
        del self._bars[level]
        if not self._bars:
            self._progress.stop()

def get_audio_files(source: Path, listener: DirEventObserverInterface | None=None) -> Generator[Path]:
    """
    Return audio files found in given directory.

    Given a path to a folder with an Artist/Album/Track audiofile
    layout, each path to audio files inside is yielded.
    """
    VALID_TRACK_EXTENSIONS = (".flac", ".mp3", ".m4a", ".aac")
    artists = sorted(
        (x for x in source.iterdir() if x.is_dir() and not x.name.startswith(".")),
        key=lambda x: x.stem.casefold(),
    )
    if listener:
        listener.start_iteration(level=1, num_entries=len(artists), description="")
    # Per Artist (top-level)
    for artist_path in artists:
        if listener:
            listener.update(level=1, description=artist_path.name)

        # Per Album
        for album_path in artist_path.iterdir():
            if not album_path.is_dir():
                continue
            tracks = sorted(
                x
                for x in album_path.iterdir()
                if x.is_file() and x.suffix.casefold() in VALID_TRACK_EXTENSIONS
            )
            if listener:
                listener.start_iteration(level=2, num_entries=len(tracks), description=album_path.name)

            # Per Track
            for track in tracks:
                if listener:
                    listener.update(level=2, description=f"{album_path.name}/{track.stem}")
                yield track

            if listener:
                listener.end_iteration(level=2)

    if listener:
        listener.end_iteration(level=1)

def main() -> None:
    source = Path("source")

    for audio_file in get_audio_files(source, DirEventObserver()):
        # analyze/transcode/update files
        sleep(.05)

if __name__ == "__main__":
    main()

If you wanted now to use the tqdm package for displaying progress, you only need to change the DirEventObserver class. But to simplify the code further, get_audio_files always requires a DirEventObserverInterface instance but by default one that does nothing is used:

from collections.abc import Generator
from pathlib import Path
from time import sleep
from abc import ABC, abstractmethod

from tqdm import tqdm

"""
Walks a given directory for Artist/Album/Track files
yielding any file with an audio format extension.
"""

class DirEventObserverInterface(ABC):
    """Interface for an "observer" that can be passed to get_audio_files
    that will be called at various stages in processing."""

    @abstractmethod
    def start_iteration(self, level: int, num_entries: int) -> None:
        """Called when a new level of subsdirectory is to be iterated."""

    @abstractmethod
    def update(self, level: int, description: str) -> None:
        """Called when a description for a subdirectory level is to be updated."""

    @abstractmethod
    def end_iteration(self, level: int) -> None:
        """Called when an iteration of a subdirectory has completed."""

class DoNothingDirEventObserver(DirEventObserverInterface):
    """Interface for an "observer" that ignores all events."""

    def start_iteration(self, level: int, num_entries: int) -> None:
        pass

    def update(self, level: int, description: str) -> None:
        pass

    def end_iteration(self, level: int) -> None:
        pass


class DirEventObserver(DirEventObserverInterface):
    """An implementation of DirEventObserverInterface that uses the tqdm
    package for displaying progress bars."""

    def __init__(self):
        self._bars = {}

    def start_iteration(self, level: int, num_entries: int, description: str) -> None:
        """Called when a new progress bar is to be created for a subdirectory
        at a certain level."""
        bar = tqdm(total=num_entries, position=level, desc=description, leave=False)
        self._bars[level] = bar

    def update(self, level: int, description: str) -> None:
        """Called when a progress bar's description is to be updated."""
        bar = self._bars[level]
        bar.update()
        bar.set_description(description)
        bar.refresh()

    def end_iteration(self, level: int) -> None:
        """Called when a progress bar is to be removed."""
        pass

def get_audio_files(source: Path, listener: DirEventObserverInterface=DoNothingDirEventObserver()) -> Generator[Path]:
    """
    Return audio files found in given directory.

    Given a path to a folder with an Artist/Album/Track audiofile
    layout, each path to audio files inside is yielded.
    """
    VALID_TRACK_EXTENSIONS = (".flac", ".mp3", ".m4a", ".aac")
    artists = sorted(
        (x for x in source.iterdir() if x.is_dir() and not x.name.startswith(".")),
        key=lambda x: x.stem.casefold(),
    )
    listener.start_iteration(level=1, num_entries=len(artists), description="")
    # Per Artist (top-level)
    for artist_path in artists:
        listener.update(level=1, description=artist_path.name)

        # Per Album
        for album_path in artist_path.iterdir():
            if not album_path.is_dir():
                continue
            tracks = sorted(
                x
                for x in album_path.iterdir()
                if x.is_file() and x.suffix.casefold() in VALID_TRACK_EXTENSIONS
            )
            listener.start_iteration(level=2, num_entries=len(tracks), description=album_path.name)

            # Per Track
            for track in tracks:
                listener.update(level=2, description=f"{album_path.name}/{track.stem}")
                yield track

            listener.end_iteration(level=2)

    listener.end_iteration(level=1)

def main() -> None:
    source = Path("source")

    for audio_file in get_audio_files(source, DirEventObserver()):
        # analyze/transcode/update files
        sleep(.05)

if __name__ == "__main__":
    main()
\$\endgroup\$
6
\$\begingroup\$

I notice you are using uv to invoke the script:

$ uv run audio_convert.py

Instead of doing this, make your script a proper package and install it as a tool. Just four commands should suffice (working directory is the script's parent directory):

$ uv init --package --app . --name audio-convert
$ mv audio_convert.py src/audio_convert/__init__.py
$ uv add rich
$ uv tool install .

Note that you will also need to modify audio_convert.py itself so that it accepts CLI arguments. Currently it just looks for a sibling source directory, which can and should be parameterized.

Eventually, you should be able to run the executable using either:

$ uvx audio-convert
$ audio-convert

Your audio_convert.py is what many would called a standalone script. As such, it would benefit greatly from PEP 723. This is what the commands will look like once #7242 is resolved:

$ uv init --script audio_convert.py
$ uv add --script audio_convert.py rich
$ uv tool install --script audio_convert.py
\$\endgroup\$
2
  • \$\begingroup\$ Very interesting. I've tended to think of uv tools as items that might benefit from being available to other environments. As this is mostly a standalone beast (which yes, should take CLI arguments), what are the benefits to having it be a tool, rather than just a normal uv project? \$\endgroup\$ Commented Oct 24 at 23:00
  • \$\begingroup\$ @it_serf I'm not sure what you mean by "normal uv project" (one managed and built by uv?), but the primary benefit is that tools can be invoked anywhere and not limited to the project directory. \$\endgroup\$ Commented Oct 25 at 0:49
4
\$\begingroup\$

Overview

The code layout is really good, and you used meaningful names for functions and variables. I especially like the docstrings and type hints.

I agree with your desire to separate the progress-bar logic from the file retrieval. However, I don't have any experience with rich to offer any specific advice there. Here are some minor, general coding style suggestions.

Documentation

You used docstrings for your functions, as recommended by the PEP 8 style guide. You might as well do so for the header comments:

"""
Walks a given directory for Artist/Album/Track files
  Yields any file with an audio format extension
  Updates progress bar to show progress
"""

Comments

Remove commented-out code to reduce clutter:

# with Progress(console=console) as progress:

The other comments are good.

Perhaps you could add more documentation (docstrings or comments) regarding the structure of both the directory and audio files. Sometimes the act of writing documentation can trigger ideas for improving the code.

\$\endgroup\$
4
\$\begingroup\$

To decouple the UI (here: a console) from the background processes - this could be easier than you think with event-oriented programming.

For example, I use the blinker lib in Flask but also outside of Flask to emit signals (events) that other routines can subscribe to. So the idea is to send signals every time you are done doing something, but let an other routine refresh the interface. Blinker is very easy to use, just check out the link, but you are free to look for alternatives.

get_audio_files is meant to be a generator that returns audio files, so it should not be doing anything more than that.

There should be a dedicated function that performs file transcoding and broadcasts events upon completion eg 'file_transcoded', with relevant arguments (file name, artist name etc). Then a small routine acting as the "subscriber" shall listen to those events and react accordingly to refresh the console output.

\$\endgroup\$
1
  • \$\begingroup\$ While I agree in principle, I'm unsure on how to feed the progress bar info from outside the generator. It was the only location that knows the total size of the iterators for the progress bar to get set up. Do you have any other specifics on how that communication would be better done? \$\endgroup\$ Commented Oct 24 at 23:09
4
\$\begingroup\$

progress library

    for audio_file in get_audio_files(source): ...

got an acceptable display with rich

Ok, sure, that's one possibility.

I commend the tqdm package to you. The usual idiom is:

from tqdm import tqdm

    for audio_file in tqdm(get_audio_files(source)): ...

and that will show elapsed time for each file (each "iteration").

If you can afford to first gather all filenames into a container, then the idiom would be:

    for audio_file in tqdm(list(get_audio_files(source))): ...

That results in a more informative progress bar, because tqdm() knows that we're currently on e.g. file 3 of 20. It also lets tqdm() display an estimate of when all files will be finished.

The docs explain that there's a bunch of optional parameters, plus you can call .update() to display details of each audio track as we're crunching through it. It can also accommodate fancy needs like two bars, one for artist and another for the artist's tracks.

The benefit of tqdm() is that it is trivially easy to add to an app, without "really complicating what was previously a very single-purpose function".


pathlib

            if not artist_path.is_dir():

I like it! Thank you for preferring Path over the ancient os.path module.

Also, kudos on using {mypy, pyright} to check your type annotation. Your code offers informative signatures which are easy to read.

tuple

                    yield track

Consider using yield artist, track, as that plays nicely with sorting, and would expose more info for tqdm() to display.

accurate progress

Let's assume that most tracks are about three minutes, with no giant outliers. And we expect that some artists are "prolific", while another might be a "one hit wonder".

If work per artist has high variance, much higher than work per track, then you might prefer to operate on a flat list of (artist, track) tuples. Otherwise we might wind up with the final artist representing much more or much less work than average, which makes it challenging for the bar's "countdown" to arrive at zero just as we're finishing up all the work.

If you're a stickler for accuracy then you might stat() each track file to learn its length, and generate a 3-tuple of (artist, track, num_bytes). You would have to sum() all the lengths before starting the top-level for loop, and then .update() the progress bar with each completed track length.

\$\endgroup\$
1
  • \$\begingroup\$ I've seen many references to tqdm, but had not yet attempted to use it. I'll see if I can apply it here in a way that makes more sense. \$\endgroup\$ Commented Oct 24 at 23:05

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.