1

Below are two chunks of code that are identical except for one line.

for id in ids_list:
    id_dir = os.path.join(dir, id)
    os.chdir(id_dir)
    for path in glob('*' + file_extention):
        with open(path) as file:
            # count number of non-empty lines in file
            names[path] = sum(1 for line in file if line.strip())

for id in ids_list:
    id_dir = os.path.join(dir, id)
    os.chdir(id_dir)
    for path in glob('*' + file_extention):
        with open(path) as file:
            # get file content
            content = file.read()

I was wondering whether there is a way to create a function that takes ids_list, file_extention and, of course, the statement (either counting the number of lines or getting the content) as arguments. I am struggling with how to pass the statement. Any help, especially illustrated with some example code since I am a Python newbie, would be great.

5
  • You could use a lambda: my_function(ids_list, file_extension, lambda: len(ids_list)). The lambda could then be called like a function inside your function definition. Commented Apr 6, 2016 at 16:29
  • Without using exec, no. Commented Apr 6, 2016 at 16:29
  • @chepner Is Amit Gold's answer not a correct way to do it without using exec? Is it not appropriate? Commented Apr 6, 2016 at 16:45
  • Well, your question was how to pass a statement, not a function that encapsulates the statement. The definition of that function also needs to be designed to communicate its result back to the calling environment, which Amit alludes to but doesn't provide an example of. Commented Apr 6, 2016 at 16:48
  • @chepner I see what you mean. I did not state my question clearly because, as I see now, I was missing some important concepts. But I do not understand what you mean by "communicating its result back to the calling environment" in this case. If you could provide an answer (on top of Amit's or separately) so I can see exactly what you mean, I would greatly appreciate it. Commented Apr 6, 2016 at 16:51

4 Answers

1

Time to use a callback function

Your situation is a case where a callback function can help.

Typically, a callback is a function with an agreed set of parameters and sometimes an agreed return value. It is passed as an argument to another function, which calls it with the agreed arguments and leaves the processing to it.
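
As a minimal sketch of the idea, independent of files: the names apply_to_each and shout below are made up for illustration and are not part of the question's code.

```python
def apply_to_each(items, callback):
    """Call callback(item) for every item and collect the return values."""
    return [callback(item) for item in items]


def shout(word):
    # One possible callback: takes one argument and returns a value.
    return word.upper() + "!"


# The function object itself is passed, without parentheses.
print(apply_to_each(["a", "b"], shout))  # ['A!', 'B!']
print(apply_to_each(["a", "bb"], len))   # [1, 2]
```

The caller decides what happens to each item simply by choosing which function to pass in.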

To make your code work, I had to modify it a bit. All the code goes into one file, e.g. one named "et.py".

To explain it, I will show it piece by piece.

Imports

import os
from glob import glob

Callback for processing content read from the file

Your example reads each file into the variable content, overwriting it on every iteration, so in the end you would have only the last file's content.

I modified the code by adding a global variable GLOB_CONTENT, to which the content of each file is appended one by one.

GLOB_CONTENT = []


def read_file_content(path):
    global GLOB_CONTENT
    with open(path) as f:
        # get file content
        content = f.read()
        # do some content processing here
        GLOB_CONTENT.append(content)

Using global variables is often viewed with suspicion, but it is one way of keeping global state.
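
If you would rather avoid the global, one possible alternative is a closure that keeps the list private to the callback. This is only a sketch: make_content_collector and collect are hypothetical names, and the temporary file exists purely for the demo.

```python
import os
import tempfile


def make_content_collector():
    """Return (callback, contents); the callback appends each file's text to contents."""
    contents = []

    def collect(path):
        with open(path) as f:
            contents.append(f.read())

    return collect, contents


collect, contents = make_content_collector()

# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w") as f:
        f.write("hello\n")
    collect(path)

print(contents)  # ['hello\n']
```

Here collect takes a single path argument, so it can be passed anywhere a callback with that signature is expected.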

Callback for counting lines - with "memory"

Any callable can be used as a callback, provided it follows the expected signature. One such case is a method of a class instance. The class below is derived from dict so that it can remember values under a key, and it adds a method count_file_lines, which takes the path of a file as its argument:

class FilesLineCounter(dict):
    def count_file_lines(self, path):
        with open(path) as file:
            self[path] = sum(1 for line in file if line.strip())

It counts the non-empty lines in the file and stores the count in itself, keyed by the path.

Function processing the files

The loop can be generalized into a function:

def process_ids(dir_path, ids_list, file_extension, callback):
    for itm_id in ids_list:
        id_dir = os.path.join(dir_path, itm_id)
        for path in glob(id_dir + '/*' + file_extension):
            callback(path)

As you can see, it takes all the arguments needed to find the right files, plus a callback function that is used to process each file found.
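
Note that process_ids ignores whatever the callback returns. If you want results handed back to the caller rather than stored in a global or an object, one option is a variant that collects return values. The sketch below is an assumption of how that could look: process_ids_collect, count_nonempty_lines and read_content are hypothetical names, and the temporary directory only serves the demo.

```python
import os
import tempfile
from glob import glob


def process_ids_collect(dir_path, ids_list, file_extension, callback):
    """Like process_ids, but return {path: callback(path)}."""
    results = {}
    for itm_id in ids_list:
        id_dir = os.path.join(dir_path, itm_id)
        for path in glob(os.path.join(id_dir, '*' + file_extension)):
            results[path] = callback(path)
    return results


def count_nonempty_lines(path):
    with open(path) as f:
        return sum(1 for line in f if line.strip())


def read_content(path):
    with open(path) as f:
        return f.read()


# Demo on a throwaway tree: <tmp>/1/a.txt with two non-empty lines.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "1"))
    with open(os.path.join(tmp, "1", "a.txt"), "w") as f:
        f.write("x\n\ny\n")
    counts = process_ids_collect(tmp, ["1"], ".txt", count_nonempty_lines)
    contents = process_ids_collect(tmp, ["1"], ".txt", read_content)

print(list(counts.values()))    # [2]
print(list(contents.values()))  # ['x\n\ny\n']
```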

Finally: put it all together

Here is the final part of the code:

if __name__ == "__main__":
    dir_path = "subdir"
    ids_list = ["1", "2"]
    file_extension = ".txt"

    cntr = FilesLineCounter()
    # going to use the callback magic
    process_ids(dir_path, ids_list, file_extension, cntr.count_file_lines)
    process_ids(dir_path, ids_list, file_extension, read_file_content)

    # time to show our results
    for path, numoflines in cntr.items():
        print("File {} has {} lines".format(path, numoflines))

    for i, content in enumerate(GLOB_CONTENT):
        print("File # {} last 3 bytes are {}".format(i, content[-3:]))

The line cntr = FilesLineCounter() creates our extended dictionary: cntr is an empty dictionary with the added method count_file_lines. Since a bound method can be used like a function, we pass cntr.count_file_lines as the callback.

After process_ids has run, cntr holds one key per processed file, each mapped to the number of non-empty lines in that file.

The content is read in a similar way.

Running $ python et.py, I get the following output:

File subdir/1/one-plus.txt has 1 lines
File subdir/2/empty.txt has 0 lines
File subdir/1/one.txt has 8 lines
File subdir/2/long.txt has 42 lines
File # 0 last 3 bytes are fa

File # 1 last 3 bytes are hi

File # 2 last 3 bytes are fa

File # 3 last 3 bytes are

2 Comments

That is a great answer; learnt so much by looking at it. Thanks. The other approach Amit Gold suggested above, how does it compare with yours? I mean, besides being less object-oriented, is there any other shortcoming or difference?
@nik-fford You are welcome. Amit Gold is using the callback approach too: last_line is the callback function and do_last_line corresponds to my process_ids.
1

Inverted solution - generator

Instead of using a callback, as described in my other answer, the solution can be inverted.

Instead of looping over the paths and calling a function on each one inside the loop, we can write a generator that yields the paths and let the calling code do whatever needs to be done with them.

import os
from glob import glob


def files_to_process(dir_path, ids_list, file_extension):
    for itm_id in ids_list:
        id_dir = os.path.join(dir_path, itm_id)
        for path in glob(id_dir + '/*' + file_extension):
            yield path

if __name__ == "__main__":

    dir_path = "subdir"
    ids_list = ["1", "2"]
    file_extension = ".txt"

    names = {}
    # using the generator first time
    for path in files_to_process(dir_path, ids_list, file_extension):
        with open(path) as f:
            names[path] = sum(1 for line in f if line.strip())

    glob_content = []
    # using the generator the second time
    for path in files_to_process(dir_path, ids_list, file_extension):
        with open(path) as f:
            glob_content.append(f.read())


    for path, numoflines in names.items():
        print("File {} has {} lines".format(path, numoflines))

    for i, content in enumerate(glob_content):
        print("File # {} last 3 bytes are {}".format(i, content[-3:]))

The function files_to_process is a generator function. Calling files_to_process(dir_path, ids_list, file_extension) returns a generator object. When you iterate over it, it yields the paths it finds to the loop, one at a time.

Warning: generators can be exhausted. Each value is yielded only once, and when there is nothing more to yield, further iteration over the same generator gives no values.

To get the values again, you have to create the generator again.
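
A small illustration of exhaustion, using a toy generator (numbers is a made-up name, not part of the code above):

```python
def numbers():
    yield 1
    yield 2


gen = numbers()
print(list(gen))  # [1, 2]
print(list(gen))  # [] because the same generator object is now exhausted

print(list(numbers()))  # [1, 2] because a fresh call creates a new generator
```

This is why the code above calls files_to_process(...) separately for each loop instead of reusing one generator object.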

To me, the code with the generator seems more readable.


0

This can be done by passing one function as a parameter to the other function.

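import os
from glob import glob

def last_line(file):
    # the callback receives the open file and returns its content
    return file.read()

def do_last_line(func):
    # ids_list, dir and file_extention are the variables from the question
    results = []
    for id in ids_list:
        id_dir = os.path.join(dir, id)
        os.chdir(id_dir)
        for path in glob('*' + file_extention):
            with open(path) as file:
                results.append(func(file))
    return results

contents = do_last_line(last_line)

That should do it. A variable assigned inside the callback, such as content, is not available outside of it, so the callback returns the value and do_last_line collects the results.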

Another way would be using exec() or eval(), but that is generally considered bad practice.


0

Surely you could have a single function, e.g. read_file(ids_list, ext, type), that branches on the last argument: if type == 'get', run the code that gets the content; otherwise, run the other code.
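
A rough sketch of that flag-based idea for a single file follows. The function name process_file and the mode values "get" and "count" are assumptions chosen for illustration (mode is used instead of type to avoid shadowing the built-in), and the temporary file is only there for the demo.

```python
import os
import tempfile


def process_file(path, mode):
    """Return the content if mode is 'get', or the non-empty line count if 'count'."""
    with open(path) as f:
        if mode == "get":
            return f.read()
        if mode == "count":
            return sum(1 for line in f if line.strip())
    raise ValueError("unknown mode: {!r}".format(mode))


# Demo on a throwaway file.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.txt")
    with open(path, "w") as f:
        f.write("a\n\nb\n")
    content = process_file(path, "get")
    count = process_file(path, "count")

print(repr(content))  # 'a\n\nb\n'
print(count)          # 2
```

The trade-off is that every new operation needs another branch inside the function, whereas with a callback the caller can supply new behaviour without touching it.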
