I am writing a Python script that navigates to a folder on my desktop, reads the files in it (using glob patterns, since I will be adding a new file every day or so), and copies their content into a single separate .txt file.

I wrote the below script:

#!/usr/bin/env python3
import glob

with open('../python_diary.txt', 'w') as outfile:
    for filename in glob.glob('../Desktop/diary/*-2020.txt'):
        with open(filename) as infile:
            for line in infile:
                outfile.write(line)

The script generally works fine, but my files are named in dd-mm-yyyy format, and when I run the script their contents appear in the destination file in the following order (up to today): 19-06-2020, 17-06-2020, 16-06-2020, 18-06-2020.

Any idea how I can make the concatenated files appear from oldest to newest?

Thanks,

  • What do you mean by appear? In the output of ls? If you want alpha-numeric sorting to work on dates, you should use ISO 8601 date format (YYYY-MM-DD). Commented Jun 19, 2020 at 18:19
  • You mean for line in reversed(infile.readlines()): outfile.write(line)? Commented Jun 19, 2020 at 18:22
  • Yes sorry for my lack of vocabulary. The output is in order as described in my post. So I should "simply" change the date format on my system (new to linux here) and it should work? Or should I add this in my script? Commented Jun 19, 2020 at 18:26
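To illustrate the point in the first comment, here is a small sketch comparing plain alphabetical sorting on the two naming schemes. The filenames are made up for the example, and the 01-07-2020 entry is added only to show where dd-mm-yyyy ordering breaks.

```python
# Hypothetical filenames, used only to illustrate the sorting behaviour.
dmy_names = ['19-06-2020.txt', '17-06-2020.txt', '01-07-2020.txt']
iso_names = ['2020-06-19.txt', '2020-06-17.txt', '2020-07-01.txt']

# Alphabetical sorting compares the day first for dd-mm-yyyy names,
# so 1 July ends up before 17 June.
dmy_sorted = sorted(dmy_names)

# With ISO 8601 names (YYYY-MM-DD), alphabetical order is chronological order.
iso_sorted = sorted(iso_names)

print(dmy_sorted)
print(iso_sorted)
```

Renaming the files is therefore one option; sorting with a key inside the script, without renaming anything, is the other.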

1 Answer

You can sort the glob results with a small trick that extracts the date from each filename. Assuming the dates in your filenames all use zero-padded days and months and a 4-digit year, this will work for you:

import glob
import os

# Grab the filenames matching this glob
filenames = glob.glob('../Desktop/diary/*-2020.txt')
# Sort the filenames by ascending date
def filename_to_isodate(filename):
    date = os.path.basename(filename).rsplit('.', 1)[0][-10:]
    return date[-4:] + date[3:5] + date[:2]

filenames = sorted(filenames, key=filename_to_isodate)
for filename in filenames:
    ...  # Your stuff here...

Explanation: os.path.basename gives us the name of the file without its directory, e.g., '../Desktop/diary/01-01-2020.txt' becomes '01-01-2020.txt'.

rsplit('.', 1)[0][-10:] splits the basename at the last period, effectively stripping the extension and keeping only what comes before it. The [-10:] then keeps just the last 10 characters, which make up the date: 4 for the year + 2 for the month + 2 for the day + 2 dashes = 10 characters.

Finally, filename_to_isodate rearranges those characters into year, month, day order (e.g., '19-06-2020' becomes '20200619'), and sorted uses that string as its key, so the filenames come out oldest first.
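Putting the sort together with the loop from the question, a minimal end-to-end sketch could look like the following. The function name concatenate_by_date is made up for this example, and the same assumption applies: every matched filename ends in a zero-padded dd-mm-yyyy date before the extension.

```python
import glob
import os


def filename_to_isodate(filename):
    """Turn a path such as '.../19-06-2020.txt' into the sortable string '20200619'."""
    date = os.path.basename(filename).rsplit('.', 1)[0][-10:]
    return date[-4:] + date[3:5] + date[:2]


def concatenate_by_date(pattern, destination):
    """Copy every file matching pattern into destination, oldest first.

    Hypothetical helper name; assumes each matched filename ends in dd-mm-yyyy.
    Returns the filenames in the order they were written.
    """
    filenames = sorted(glob.glob(pattern), key=filename_to_isodate)
    with open(destination, 'w') as outfile:
        for filename in filenames:
            with open(filename) as infile:
                for line in infile:
                    outfile.write(line)
    return filenames
```

With the paths from the question, the call would be concatenate_by_date('../Desktop/diary/*-2020.txt', '../python_diary.txt'). The source files and their names are left untouched; only the order in which they are read changes.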


Edit: following input from @Daniel F, the strptime call from the datetime module has been replaced by sorting on the date rearranged into an ISO-style string, which is faster. The original method used in this answer is described below.

The built-in datetime module can be used to parse the date with a given format, in this case %d-%m-%Y. strptime returns a datetime object that can be compared and therefore sorted. It was called with the same 10-character slice, os.path.basename(s).rsplit('.', 1)[0][-10:], and the format '%d-%m-%Y'.
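As a rough sketch of that datetime-based approach (the function name filename_to_datetime and the sample filenames are made up for illustration), the key function could look like this:

```python
import os
from datetime import datetime


def filename_to_datetime(filename):
    """Parse the trailing dd-mm-yyyy part of a filename into a datetime."""
    stem = os.path.basename(filename).rsplit('.', 1)[0]
    # strptime raises ValueError if the last 10 characters are not a valid date.
    return datetime.strptime(stem[-10:], '%d-%m-%Y')


# Sample names in the order reported in the question.
names = ['19-06-2020.txt', '17-06-2020.txt', '16-06-2020.txt', '18-06-2020.txt']
ordered = sorted(names, key=filename_to_datetime)
print(ordered)
```

Compared with the string-based key, this version also validates the date, at the cost of parsing every filename.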

7 Comments

While this is good, I think that just reordering the filename so that it forms something akin to an ISO format instead of using datetime.strptime would be much faster. sortname = name[-4:] + name[3:5] + name[:2] (or something like that)
You can further optimize this if you store the length of the directory path of the first globbed item in a variable outside of the filename_to_isodate function, and then use it as a slice index to remove the directory from the filename. And assuming that .txt is always the extension, the last four characters can just be stripped away as well. Something like date = filename[dirlen:-extlen][-10:]. Actually, since he is globbing for *-2020.txt, the .txt extension is guaranteed.
Thanks, I'll try this today. So, for my understanding, the whole piece of code you wrote puts the file name into an ISO date format so that my machine can read it as such, right? Then I'll put my code for finding, reading and writing the files where you said.
So the code does not modify any of your files nor the filenames, rather, it uses the date in ISO format to sort the original filenames. You can think of it like assigning a number for sorting to each filename, then the filenames are sorted based on that number. In short, you can just add your code in the loop and you are good to go.
@BaptisteVanlitsenburgh that sounds like you have filenames in that directory that are not valid dates - I would assume given that error you have a file called 1.txt? If that's the case, you can modify the solution to just ignore files that aren't length 10 (plus ".txt"), or an alternative
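One possible way to ignore files whose names do not end in a date, sketched here with a regular expression rather than a length check. The names DATE_NAME, isodate_or_none and sort_dated are invented for this example, and the pattern only checks for digits in dd-mm-yyyy positions, not for a valid calendar date.

```python
import os
import re

# Matches a basename ending in dd-mm-yyyy.txt (digits only, not validated as a date).
DATE_NAME = re.compile(r'(\d{2})-(\d{2})-(\d{4})\.txt$')


def isodate_or_none(filename):
    """Return 'yyyymmdd' for a dated filename, or None if it does not match."""
    match = DATE_NAME.search(os.path.basename(filename))
    if match is None:
        return None
    day, month, year = match.groups()
    return year + month + day


def sort_dated(filenames):
    """Drop filenames without a trailing date and sort the rest oldest first."""
    dated = [f for f in filenames if isodate_or_none(f) is not None]
    return sorted(dated, key=isodate_or_none)
```

For example, sort_dated(glob.glob('../Desktop/diary/*.txt')) would skip a stray 1.txt instead of failing on it, provided glob is imported.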