0

I need to call md5sum from Python through a pipe to calculate checksums for a bunch of .mp3 files. Is there an option that makes the md5sum program cope with whitespace in filenames given on its command line?

For example:

import os
def index(directory):
    stack = [directory]
    files = []
    while stack:
        directory = stack.pop()
        for file in os.listdir(directory):
            fullname = os.path.join(directory, file)
            if fullname.endswith('mp3'):
                files.append(fullname)
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)
    return files

def check(directory):
    files = index(directory)
    hvalues = []
    for x in files:
        cmd = 'md5sum' + ' ' + x
        fp = os.popen(cmd)
        res = fp.readline()
        hvalues.append(res)
        stat = fp.close() # What to do with stat?
    return hvalues

The command cmd = 'md5sum' + ' ' + x doesn't work as it should on files whose names include whitespace or special characters, apparently because the md5sum tool cannot properly handle (hash) files with whitespace in their names.

2
  • Could you clarify your question with an example? How are you passing the filenames? Commented Oct 21, 2013 at 8:43
  • I've added a code example. Commented Oct 21, 2013 at 10:04

2 Answers

3

As @binfalse points out, the problem is not in the md5sum program, but the way you invoked it. Your code is actually bad on several levels:

  1. You assembled a shell command without escaping. In the worst case, that could lead to the execution of a completely unintended command if one of the filenames happens to be cleverly crafted. That would be horrible, unless you are writing a single-use throwaway script.

  2. The os.popen() function was deprecated in Python 2.6 (in Python 3 it remains available, but only as a thin wrapper around subprocess.Popen). The recommended replacement is subprocess.Popen(). Be sure to pass a list as the args parameter, not a concatenated string, to avoid the shell-escaping problem mentioned above.

    import subprocess

    def check(directory):
        files = index(directory)
        hvalues = []
        for f in files:
            # A list of arguments bypasses the shell, so no escaping is needed.
            cmd = ['md5sum', f]
            proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
            out, _ = proc.communicate()  # read the output and wait for exit
            hvalues.append(out)
        return hvalues
    
  3. Better yet, use Python's hashlib to calculate the hashes.
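A minimal sketch of the hashlib approach, assuming the files may be large and are therefore read in chunks. The helper name md5_of_file and the default chunk size are illustrative choices, not something from the question:

```python
import hashlib

def md5_of_file(path, chunk_size=65536):
    """Return the hex MD5 digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.md5()
    with open(path, 'rb') as fh:
        # Read fixed-size chunks so a large .mp3 is never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()
```

In check(), each hash would then be md5_of_file(f). No external process is started, so the filename never passes through a shell and spaces in it need no special treatment.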

5
  • Thank you for your advice! I'm currently learning Python, so this is a throwaway pipe-learning script, but in any case your advice is invaluable to me! Commented Oct 21, 2013 at 15:11
  • By 'escaping', do you mean adding a try/except clause? Commented Oct 21, 2013 at 15:51
  • @Reloader No: “escaping” means using quotes in the shell fragment, so that the shell knows which spaces separate arguments and which spaces are part of an argument. The spaces and other special characters that are part of a file name need to be escaped, i.e. quoted in some fashion. Commented Oct 21, 2013 at 15:55
  • Can you help me in replacing os.popen() with subprocess.Popen()? Commented Oct 22, 2013 at 18:45
  • I'm getting proc = subprocess.Popen(cmd, stdout=subprocess.PIPE) NameError: global name 'subprocess' is not defined although I declared from subprocess import Popen, PIPE in the first line... Commented Oct 23, 2013 at 9:22
1

That's not a lack of ability in the md5sum tool but a general command-line restriction: arguments are separated by spaces. So if you pass a file name containing spaces to md5sum unquoted, the shell splits it and md5sum treats each token as a separate file. You can get around that by surrounding the file name with quotation marks. That said, try replacing the line

cmd = 'md5sum' + ' ' + x

with

cmd = 'md5sum' + ' "' + x + '"'

and your command line call will look like

md5sum "file name with spaces.mp3"

Thus, md5sum will calculate the hash without complaining.
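Plain double quotes cover spaces, but a name that itself contains a double quote, a `$`, or a backtick would still be misread by the shell. If the command has to stay a single string, one option is the standard library's shlex.quote (Python 3.3 and later; Python 2 offers pipes.quote). A small sketch, where build_cmd is a hypothetical helper name:

```python
import shlex

def build_cmd(filename):
    """Build an md5sum shell command with the filename quoted (hypothetical helper)."""
    # shlex.quote wraps the name so a POSIX shell sees it as one literal argument.
    return 'md5sum ' + shlex.quote(filename)

print(build_cmd('file name with spaces.mp3'))
print(build_cmd('odd "name" $HOME.mp3'))
```

Splitting the result back with shlex.split() should give exactly two arguments, md5sum and the original name, which is a quick way to check the quoting.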
