Bugs:
Here, instead of promt it should be prompt:
password: str = getpass(promt="Enter password: ")
If a PDF file is not in the current working directory, this will raise an error, because filename is only the bare file name:
if decrypt(os.path.join(folder_name, filename), password):
    os.remove(filename)
It should be os.remove(os.path.join(folder_name, filename)) instead.
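To illustrate the point, here is a minimal sketch that runs against a throwaway temporary directory. The helper remove_pdfs and the names sub and report.pdf are made up for this example and are not from your code. os.walk yields file names relative to folder_name, so they have to be joined with it before removal:

```python
import os
import tempfile


def remove_pdfs(root: str) -> list:
    """Remove every .pdf under root and return the paths that were removed."""
    removed = []
    for folder_name, _, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith('.pdf'):
                # filename is relative to folder_name, not to the current
                # working directory, so join the two before removing.
                full_path = os.path.join(folder_name, filename)
                os.remove(full_path)
                removed.append(full_path)
    return removed


with tempfile.TemporaryDirectory() as root:
    sub = os.path.join(root, 'sub')
    os.mkdir(sub)
    target = os.path.join(sub, 'report.pdf')
    open(target, 'wb').close()  # empty placeholder file

    removed = remove_pdfs(root)
    still_exists = os.path.exists(target)
```

With os.remove(filename) instead, the call would look for report.pdf in the current working directory and fail with FileNotFoundError.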
Docstring:
"""
Decrypts all files in a folder and sub folder with provided password
"""
I think it would be better to specify that only PDF files will be decrypted. It should also mention that the original files will be removed. How about:
"""
Decrypts all pdf files in a specified folder
and all its sub-folders with provided password,
saves decrypted copy and removes original encrypted files.
"""
Imports should be grouped in the following order:
- Standard library imports.
- Related third party imports.
- Local application/library specific imports.
You should put a blank line between each group of imports.
Also, it is advised to sort imports alphabetically.
So, instead of:
import os
import argparse
from typing import Tuple
from getpass import getpass
from pathlib import Path
import PyPDF2
You will have:
import argparse
import os
from getpass import getpass
from pathlib import Path
from typing import Tuple

import PyPDF2
Also, before the first function there should be two blank lines. Apart from that, well done! You follow the style guide pretty well!
Instead of using argparse, I'd suggest taking a look at the third-party library Click. For the motivation, see the "Why Click?" page of its documentation.
Example of usage:
Here is how you could wrap the main function pdf_decrypter:
import click
...
@click.command()
@click.option('--path', '-p',
              default='.',
              show_default=True,
              type=click.Path(),
              help='Folder path to look for PDFs to decrypt '
                   '(absolute or relative).')
@click.password_option(confirmation_prompt=False)
def pdf_decrypter(path: str,
                  password: str):
    ...
Running python3 pdf_decrypter.py --help from the command line will print:
Usage: pdf_decrypter.py [OPTIONS]

  Main routine: Giving a password and path all PDFs which are encrypted get
  decrypted in the supplied path. If the password is wrong a message is
  displayed.

Options:
  -p, --path PATH  Folder path to look for PDFs to decrypt (absolute or
                   relative).  [default: .]
  --password TEXT
  --help           Show this message and exit.
Confirmation messages:
Further down in the body of the function you can also add a confirmation prompt, for example:
...
if not click.confirm(f'All PDF files will be decrypted in {path}\n'
                     'Do you want to continue?'):
    return
...
This will look like:
python3 pdf_decrypter.py
Password:
All PDF files will be decrypted in /path/to/pdf_files
Do you want to continue? [y/N]:
Putting encryption and decryption together:
Later you will probably want to put both the encryption from your previous post and the decryption in the same module, as they have a lot in common. In that case you can use Click's nested commands, for example like this:
@click.group()
def cli():
    pass


@cli.command()
@click.option('--path', '-p',
              default='.',
              show_default=True,
              type=click.Path(),
              help='Folder path to look for PDFs to decrypt '
                   '(absolute or relative).')
@click.password_option(confirmation_prompt=False)
def decrypt(path: str,
            password: str) -> None:
    ...


@cli.command()
@click.option('--path', '-p',
              default='.',
              show_default=True,
              type=click.Path(),
              help='Folder path to look for PDFs to encrypt '
                   '(absolute or relative).')
@click.password_option(confirmation_prompt=False)
def encrypt(path: str,
            password: str) -> None:
    ...
And you would call your functions like python pdf_utils.py encrypt or python pdf_utils.py decrypt, etc.
click.Path:
You can specify additional checks, such as that the path exists and is not a file, and have Click resolve the path automatically by setting resolve_path to True, so you wouldn't have to build the absolute path yourself later in the code:
@click.option('--path', '-p',
              default='.',
              show_default=True,
              type=click.Path(exists=True,
                              file_okay=False,
                              resolve_path=True),
              help='Folder path to look for PDFs to decrypt '
                   '(absolute or relative).')
Iterating over files:
Right now you have two nested loops and two checks, one for the extension and one for whether the file is encrypted:
for folder_name, _, filenames in os.walk(path):
    for filename in filenames:
        if not filename.endswith('.pdf'):
            continue
        if not is_encrypted(os.path.join(folder_name, filename)):
            continue
        # decryption logic
This looks pretty cumbersome, and I personally don't like code that is nested two or more levels deep. We can use the pathlib module to simplify this logic:
pdf_paths = Path(path).rglob('*.pdf')
encrypted_pdfs = filter(is_encrypted, pdf_paths)
for pdf in encrypted_pdfs:
    # decryption logic
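Here is a small, self-contained sketch of that pattern that you can run without PyPDF2. The predicate looks_encrypted is a hypothetical stand-in for is_encrypted (a crude byte check, not a real PDF parse), and the file names are invented for the demonstration:

```python
import tempfile
from pathlib import Path


def looks_encrypted(filepath: Path) -> bool:
    """Hypothetical stand-in for is_encrypted: a crude byte check only."""
    return b'/Encrypt' in filepath.read_bytes()


with tempfile.TemporaryDirectory() as root:
    base = Path(root)
    (base / 'nested').mkdir()
    (base / 'plain.pdf').write_bytes(b'%PDF-1.4 plain')
    (base / 'nested' / 'secret.pdf').write_bytes(b'%PDF-1.4 /Encrypt')
    (base / 'notes.txt').write_bytes(b'/Encrypt')  # wrong extension, skipped

    pdf_paths = base.rglob('*.pdf')                       # recursive and lazy
    encrypted_pdfs = filter(looks_encrypted, pdf_paths)   # also lazy
    found = sorted(pdf.name for pdf in encrypted_pdfs)
```

Both rglob and filter are lazy, so nothing is read from disk until the loop (here, the generator expression) consumes the paths.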
Catching errors:
Right now you catch PyPDF2.utils.PdfReadError at the deepest level, in the decrypt function. I suggest moving that handling out to the main function and not returning True or False to indicate success or failure. See, for example, "When and how should I use exceptions?"
So, in pdf_decrypter, you will have:
for pdf in encrypted_pdfs:
    try:
        decrypt(pdf, password)
        pdf.unlink()
    except PyPDF2.utils.PdfReadError:
        print(f'{pdf} could not be decrypted (wrong password)')
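To show the idea in isolation, here is a hedged sketch that does not need PyPDF2. WrongPasswordError is a hypothetical stand-in for PyPDF2.utils.PdfReadError, and this decrypt only pretends to do the work. The low-level function raises, and the caller decides how to report the failure:

```python
class WrongPasswordError(Exception):
    """Hypothetical stand-in for PyPDF2.utils.PdfReadError."""


def decrypt(name: str, password: str) -> None:
    """Pretend to decrypt; raise instead of returning False on failure."""
    if password != 'secret':
        raise WrongPasswordError(name)


def decrypt_all(names, password):
    """Return (decrypted, failed) lists; reporting is left to the caller."""
    decrypted, failed = [], []
    for name in names:
        try:
            decrypt(name, password)
        except WrongPasswordError:
            failed.append(name)
        else:
            # Runs only when decrypt did not raise; follow-up work such as
            # removing the original file would go here.
            decrypted.append(name)
    return decrypted, failed


ok, bad = decrypt_all(['a.pdf', 'b.pdf'], 'secret')
ok_wrong, bad_wrong = decrypt_all(['a.pdf'], 'guess')
```

Using an else branch is one possible design choice: it keeps the try body down to the single call that can raise, so an unrelated error in the follow-up work is not mistaken for a wrong password.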
pathlib over os:
You barely use the advantages of pathlib. I already included some of them in the previous code examples, but here I will sum everything up.
Opening file:
This:
with open(filename, 'rb') as pdf_file:
will become:
with filepath.open('rb') as pdf_file:
Note that filepath would be a better name here.
Changing filename:
filename_decrypted = filename.strip('.pdf') + "_decrypted.pdf"
could be rewritten as:
decrypted_filename = f'{filepath.stem}_decrypted{filepath.suffix}' # new name
decrypted_filepath = filepath.parent / decrypted_filename # complete path
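There is also a correctness reason to prefer stem and suffix here: str.strip('.pdf') does not remove the extension, it strips any of the characters '.', 'p', 'd' and 'f' from both ends of the string. A short sketch, with draft.pdf and docs as made-up example names:

```python
from pathlib import Path

filename = 'draft.pdf'

# strip() treats its argument as a set of characters, so the leading 'd'
# of 'draft' is removed together with the extension.
stripped = filename.strip('.pdf') + '_decrypted.pdf'

# The pathlib version only touches the part after the last dot.
filepath = Path('docs') / filename
decrypted_filename = f'{filepath.stem}_decrypted{filepath.suffix}'
decrypted_filepath = filepath.parent / decrypted_filename

print(stripped)                 # raft_decrypted.pdf
print(decrypted_filepath.name)  # draft_decrypted.pdf
```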
Removing file:
os.remove(path) can be replaced by path.unlink().
At this point we have removed all uses of the os module, so we can drop that import.
Putting it all together:
"""
Decrypts or encrypts all pdf files in a specified folder
and all its sub-folders with provided password,
saves processed copies and removes original files.
"""

from itertools import filterfalse
from pathlib import Path

import click
import PyPDF2


def is_encrypted(filepath: Path) -> bool:
    """Checks if file is encrypted"""
    with filepath.open('rb') as pdf_file:
        pdf_reader = PyPDF2.PdfFileReader(pdf_file)
        return pdf_reader.isEncrypted


def write_copy(filepath: Path,
               password: str,
               *,
               mode: str) -> None:
    """
    Writes encrypted or decrypted copy of the file based on the mode
    :param filepath: path of the PDF file to be processed
    :param password: password to decrypt/encrypt PDF with
    :param mode: one of 'encrypt' or 'decrypt'
    """
    # The source file stays open until the copy is written,
    # because the writer reads page data from it lazily.
    with filepath.open('rb') as pdf_file:
        pdf_reader = PyPDF2.PdfFileReader(pdf_file)
        pdf_writer = PyPDF2.PdfFileWriter()
        if mode == 'decrypt':
            pdf_reader.decrypt(password)
        for page_number in range(pdf_reader.numPages):
            pdf_writer.addPage(pdf_reader.getPage(page_number))
        if mode == 'encrypt':
            pdf_writer.encrypt(password)
        suffix = {'decrypt': '_decrypted',
                  'encrypt': '_encrypted'}
        processed_filename = f'{filepath.stem}{suffix[mode]}{filepath.suffix}'
        processed_filepath = filepath.parent / processed_filename
        with processed_filepath.open('wb') as pdf_file_processed:
            pdf_writer.write(pdf_file_processed)


@click.group()
def cli():
    pass


@cli.command()
@click.option('--path', '-p',
              default='.',
              show_default=True,
              type=click.Path(exists=True,
                              file_okay=False,
                              resolve_path=True),
              help='Folder path to look for PDFs to decrypt '
                   '(absolute or relative).')
@click.password_option(confirmation_prompt=False)
def decrypt(path: str,
            password: str) -> None:
    """
    Decrypts all encrypted PDFs in the supplied path.
    If the password is wrong a message is displayed.
    Saves processed copies and removes original files.
    :param path: folder path to look for PDFs to decrypt
    :param password: password to decrypt PDFs with
    """
    pdf_paths = Path(path).rglob('*.pdf')
    encrypted_pdfs = filter(is_encrypted, pdf_paths)
    for pdf in encrypted_pdfs:
        try:
            write_copy(pdf, password, mode='decrypt')
            pdf.unlink()
            click.echo(f'Decrypted successfully: {pdf}')
        except PyPDF2.utils.PdfReadError:
            click.echo(f'{pdf} could not be decrypted (wrong password)')


@cli.command()
@click.option('--path', '-p',
              default='.',
              show_default=True,
              type=click.Path(exists=True,
                              file_okay=False,
                              resolve_path=True),
              help='Folder path to look for PDFs to encrypt '
                   '(absolute or relative).')
@click.password_option(confirmation_prompt=False)
def encrypt(path: str,
            password: str) -> None:
    """
    Encrypts all non-encrypted PDFs in the supplied path.
    Saves processed copies and removes original files.
    :param path: folder path to look for PDFs to encrypt
    :param password: password to encrypt PDFs with
    """
    if not click.confirm(f'All PDF files will be encrypted in {path}\n'
                         'Do you want to continue?'):
        return
    pdf_paths = Path(path).rglob('*.pdf')
    not_encrypted_pdfs = filterfalse(is_encrypted, pdf_paths)
    for pdf in not_encrypted_pdfs:
        write_copy(pdf, password, mode='encrypt')
        pdf.unlink()
        click.echo(f'Encrypted successfully: {pdf}')


if __name__ == '__main__':
    cli()
The encrypt and decrypt commands could probably be merged into one function with a mode parameter, as I did in write_copy, but see for yourself whether that would make the code clearer.
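If you want to try that, one possible first step is to share only the file selection. This is a rough sketch under my own assumptions: select_pdfs and fake_is_encrypted are hypothetical names, and the predicate is passed in as a parameter so the example runs without PyPDF2:

```python
import tempfile
from itertools import filterfalse
from pathlib import Path
from typing import Callable, Iterator


def select_pdfs(path: str,
                mode: str,
                is_encrypted: Callable[[Path], bool]) -> Iterator[Path]:
    """Return encrypted PDFs for 'decrypt', unencrypted ones for 'encrypt'."""
    if mode not in ('decrypt', 'encrypt'):
        raise ValueError(f'unknown mode: {mode!r}')
    selector = filter if mode == 'decrypt' else filterfalse
    return selector(is_encrypted, Path(path).rglob('*.pdf'))


def fake_is_encrypted(filepath: Path) -> bool:
    """Stand-in predicate for the demo: decides by file name only."""
    return filepath.name.startswith('locked_')


with tempfile.TemporaryDirectory() as root:
    for name in ('locked_a.pdf', 'open_b.pdf'):
        (Path(root) / name).touch()
    to_decrypt = [p.name for p in select_pdfs(root, 'decrypt', fake_is_encrypted)]
    to_encrypt = [p.name for p in select_pdfs(root, 'encrypt', fake_is_encrypted)]
```

The two commands would then differ only in the confirmation prompt and the error handling, which may or may not be enough of a difference to keep them separate.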