How to print the file name together with its score in Python

Question

Here is a program for CV screening. It assigns a score to each CV so that the CVs can be ranked. At the moment the final output contains only the scores; I want it to show the file name that each score belongs to as well. The program is:

import re
import fitz
import os

# Create a list "zs" that stores the score values
zs = []

# call the Resume files by calling the folder name
for filename in os.listdir('resume/'):

    # Select only PDF files
    if filename.endswith('.pdf'):
        print(filename)
        os.chdir('C:/Users/M. Abrar Hussain/Desktop/cv/resume')
        pdfFileObj = open(filename, 'rb')

        # Extract the text Data from resume files
        with fitz.open(pdfFileObj) as doc:
            text = ""
            for page in doc:
                text += page.getText()
            print(text)

            # Splitting the Resume Data into many indexes of Array
            p = doc.loadPage(0)
            p_text = p.getText()
            p_lines = p_text.splitlines()

            # Split the information and the data
            split_lst = [i.split(': ', 1) for i in p_lines]
            d = {a: b.rstrip() for a, b in split_lst}

            # Naming the information part
            f = d["Name"]
            g = d["GPA"]
            h = d["Skills"]
            i = d["Experience"]
            p = re.findall(r"[-+]?\d*\.\d+|\d+", i)

            # search the keywords with the data that extract from resume
            search_keywords = ['Laravel', 'Java', 'Python']
            search_exp = ['1', '1.5', '2', '2.5', '3']
            search_gpa = ['2.5', '2.6', '2.7', '2.8', '2.9', '3.0', '3.1', '3.2', '3.3', '3.4', '3.5', '3.6', '3.7',
                          '3.8', '3.9', '4.0']

            # Comparing GPA data with the keywords
            lst = []
            for gpa in search_gpa:
                if gpa in g:
                    lst.append(gpa)

            # Comparing Skills data with keywords
            lst1 = []
            for word in search_keywords:
                if word in h:
                    lst1.append(word)

            # Comparing Experience data with keywords
            lst2 = []
            for exp in search_exp:
                if exp in p:
                    lst2.append(exp)

            # Scoring the Extracted data to see the best resume
            score = 0
            w1 = []

            # Scoring the GPA
            for w1 in lst:
                if '3.0' <= w1 < '3.5':
                    score += 1
                if '3.5' <= w1 <= '4':
                    score += 2

            # Scoring the Skills
            for w1 in lst1:
                if w1 == 'Laravel':
                    score += 2
                if w1 == 'Python':
                    score += 2
                if w1 == 'Java':
                    score += 1

            # Scoring the Experience
            for w1 in lst2:
                if '2.5' <= w1 < '3':
                    score += 0.5
                if '3' <= w1 < '3.5':
                    score += 1
                if '3.5' <= w1:
                    score += 2

            # store score values in an array
            tt = zs.append(score)

            print("%s has Scored %s" % (f, score))
            print('\n')

            pdfFileObj.close()

# Rank the CV's on the basis of Scoring
zs.sort(reverse=True)
print(zs)

The Output of the program is:

cv2.pdf
Name: Danish Ejaz  
GPA: 3.7  
Skills: Python, Java  
Experience: 2.5 years 

Danish Ejaz has Scored 5.5


cv3.pdf
Name: Abdullah  
GPA: 3.2  
Skills: Laravel, Java  
Experience: 2 years 

Abdullah has Scored 4


cv5.pdf
Name: M. Abrar Hussain  
GPA: 3.5  
Skills: Python, Laravel  
Experience: 3 years 

M. Abrar Hussain has Scored 7


[7, 5.5, 4]

Process finished with exit code 0

The second-to-last line is the result after scoring. It contains only the scores. Can I include the file name in that result as well? If so, kindly help me complete this project.

@RomanPavelka Yes, the scoring is based on experience, skills, and GPA.
– Abrar Hussain, Commented Aug 20, 2021 at 9:56
@ignoring_gravity [7, 5.5, 4] is the output of the program, and I want the filename alongside each score, like this: [7 cv5.pdf, 5.5 cv2.pdf, 4 cv3.pdf]
– Abrar Hussain, Commented Aug 20, 2021 at 9:58

Roman Pavelka · Accepted Answer · 2021-08-20 11:11:40Z

The proper way, IMHO, is to define a class. A minimal variant would be:

class Candidate:
    def __init__(self, name, score, filename):
        self.name = name
        self.score = score
        self.filename = filename

    def __gt__(self, other):
        return self.score > other.score

    def __str__(self):
        return f'Candidate{self.name, self.filename, self.score}'

    def __repr__(self):
        return self.__str__()

Put this before your main for loop. Then, instead of

            tt = zs.append(score)

put

            tt = zs.append(Candidate(f, score, filename))

Otherwise it should work the same. Here is a short example of how the class is used:

class Candidate:
    def __init__(self, name, score, filename):
        self.name = name
        self.score = score
        self.filename = filename

    def __gt__(self, other):
        return self.score > other.score

    def __str__(self):
        return f'Candidate{self.name, self.filename, self.score}'

    def __repr__(self):
        return self.__str__()

# __init__ allows this
a = Candidate("Arnold", 10, "arnold.pdf")
b = Candidate("Betty", 11, "betty.pdf")

# __gt__ allows this
print(a < b)
print(a > b)

# __str__ allows this
print(a)

# __repr__ makes a list of candidates print in human-readable form
print([a, b])

# __gt__ allows sorting, and __repr__ makes the sorted list human-readable
print(sorted([b, a]))

This would print

True
False
Candidate('Arnold', 'arnold.pdf', 10)
[Candidate('Arnold', 'arnold.pdf', 10), Candidate('Betty', 'betty.pdf', 11)]
[Candidate('Arnold', 'arnold.pdf', 10), Candidate('Betty', 'betty.pdf', 11)]
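
Applied to the question's data, the same class could produce the ranking in the requested form. The following is only a minimal sketch: the names, filenames, and scores are hard-coded from the question's output instead of being parsed from PDFs, and the `ranking` list is a name introduced here for illustration. Passing `reverse=True` puts the highest score first, and float scores such as 5.5 are kept as they are.

```python
class Candidate:
    def __init__(self, name, score, filename):
        self.name = name
        self.score = score
        self.filename = filename

    def __gt__(self, other):
        return self.score > other.score

    def __repr__(self):
        return f'Candidate{self.name, self.filename, self.score}'


# Sample values copied from the question's output (hard-coded for illustration)
zs = [
    Candidate("Danish Ejaz", 5.5, "cv2.pdf"),
    Candidate("Abdullah", 4, "cv3.pdf"),
    Candidate("M. Abrar Hussain", 7, "cv5.pdf"),
]

# Sorting falls back to __gt__ here; reverse=True ranks the best CV first
zs.sort(reverse=True)

# Build "score filename" strings, the format asked for in the comments
ranking = [f"{c.score} {c.filename}" for c in zs]
print(ranking)
```

Because each score travels with its filename inside one object, sorting can never separate the two.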

The output is: " [<__main__.Candidate object at 0x000001990DA62D30>, <__main__.Candidate object at 0x000001990F5AACA0>, <__main__.Candidate object at 0x000001990DA62DF0>] "
@AbrarHussain Yes, right, we also need a __repr__(self) method. I have fixed my post now.
Yes, that's right, but one thing: the score is not an integer value, it's a float value. This output just shows an integer.

Rafael-WO · Accepted Answer · 2021-08-20 12:23:38Z

1

You just have to store your filenames and print them at the end together with the scores:

import os

# Create a dictionary with the filenames as keys; every value starts as None
# Note that this might be an issue if you have irrelevant files in your directory
file_scores = dict.fromkeys(os.listdir('resume/'))

# call the Resume files by calling the folder name
for filename in file_scores:
    # Your scoring logic
    (...)

    # store score values in the dictionary
    file_scores[filename] = score

    (...)


# Remove files that were never scored (their value is still None)
file_scores = {k: v for k, v in file_scores.items() if v is not None}

# Sort the dictionary by score, descending
file_scores = dict(sorted(file_scores.items(), key=lambda x: x[1], reverse=True))

# Print the file and the score together
for filename, score in file_scores.items():
    print(f"File {filename}: Score = {score}")
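
To try the filtering and sorting steps without any PDF files, a self-contained sketch along these lines may help. The scores are hard-coded from the question's output, and `notes.txt` is a hypothetical stand-in for a file in the folder that never received a score.

```python
# Hypothetical scores keyed by filename, copied from the question's output;
# notes.txt stands in for a non-PDF file that was never scored.
file_scores = {
    "cv2.pdf": 5.5,
    "cv3.pdf": 4,
    "cv5.pdf": 7,
    "notes.txt": None,
}

# Drop files that never received a score. Comparing against None (rather
# than testing truthiness) keeps a legitimate score of 0.
scored = {name: score for name, score in file_scores.items() if score is not None}

# Dictionaries keep insertion order (Python 3.7+), so a dict built from
# the sorted pairs stays ranked from highest to lowest score.
ranked = dict(sorted(scored.items(), key=lambda item: item[1], reverse=True))

for filename, score in ranked.items():
    print(f"File {filename}: Score = {score}")
```

Filtering out the `None` values before sorting is what avoids the `TypeError` between `int` and `NoneType` reported in the comments.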

edited Aug 20, 2021 at 12:23

answered Aug 20, 2021 at 9:55

5 Comments

C Hecht Over a year ago

This won't work as it breaks the link between file name and score. zs is in a different order than the files.

Rafael-WO Over a year ago

@CHecht Ah you are totally right! I will fix it.

Rafael-WO Over a year ago

@CHecht I guess I fixed it :)

Abrar Hussain Over a year ago

@Rafael-WO it gives me an error: " Traceback (most recent call last): File "C:\Users\M. Abrar Hussain\Desktop\cv\difflib.py", line 108, in <module> file_scores = {k: v for k, v in sorted(file_scores.items(), key=lambda x: x[1], reverse=True)} TypeError: '<' not supported between instances of 'int' and 'NoneType' "

Rafael-WO Over a year ago

@AbrarHussain I added an additional line which fixes this error. You basically have to remove all items with value None first.

C Hecht · Accepted Answer · 2021-08-20 09:55:55Z

0

The easiest way to achieve what you want is to use a table structure. Given that you have similar fields in each result, you could, for instance, create a pd.DataFrame, fill it with all your values, and then call sort_values on the score column.

There are of course other alternatives where you use np.argsort or similar, but a DataFrame is probably the easiest way to achieve what you are after.
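
A minimal sketch of the DataFrame idea, assuming pandas is installed, could look like this. The rows are hard-coded from the question's output, and the column names `name`, `filename`, and `score` are chosen here for illustration.

```python
import pandas as pd

# Sample rows copied from the question's output (hard-coded for illustration)
rows = [
    {"name": "Danish Ejaz", "filename": "cv2.pdf", "score": 5.5},
    {"name": "Abdullah", "filename": "cv3.pdf", "score": 4},
    {"name": "M. Abrar Hussain", "filename": "cv5.pdf", "score": 7},
]

df = pd.DataFrame(rows)

# Rank by score, highest first; each row keeps its filename and name
ranked = df.sort_values("score", ascending=False)

# Show only the columns needed for the ranking
print(ranked[["filename", "score"]].to_string(index=False))
```

In the real program, one such dictionary would be appended to `rows` per CV inside the loop, in place of appending the bare score.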

answered Aug 20, 2021 at 9:55
