
Here is a CV-screening program. It scores each CV so that the CVs can be ranked. Right now the output only gives me the scores; I want it to give me the file names as well. The program is:

import re
import fitz
import os

# Create a list "zs" that stores the score values
zs = []

# Loop over the files in the resume folder
for filename in os.listdir('resume/'):

    # Select only PDF files
    if filename.endswith('.pdf'):
        print(filename)
        # Build the path from the folder name instead of calling os.chdir() with a hard-coded path
        pdfFileObj = open(os.path.join('resume', filename), 'rb')

        # Extract the text data from the resume file
        with fitz.open(stream=pdfFileObj.read(), filetype='pdf') as doc:
            text = ""
            for page in doc:
                text += page.get_text()
            print(text)

            # Split the text of the first page into lines
            p = doc.load_page(0)
            p_text = p.get_text()
            p_lines = p_text.splitlines()

            # Turn each "Key: value" line into a dictionary entry
            split_lst = [i.split(': ', 1) for i in p_lines]
            d = {a: b.rstrip() for a, b in split_lst}

            # Pull out the individual fields
            f = d["Name"]
            g = d["GPA"]
            h = d["Skills"]
            i = d["Experience"]
            # Numbers found in the experience text, e.g. "2.5 years" -> ['2.5']
            p = re.findall(r"[-+]?\d*\.\d+|\d+", i)

            # Keywords to look for in the data extracted from the resume
            search_keywords = ['Laravel', 'Java', 'Python']
            search_exp = ['1', '1.5', '2', '2.5', '3']
            search_gpa = ['2.5', '2.6', '2.7', '2.8', '2.9', '3.0', '3.1', '3.2', '3.3', '3.4', '3.5', '3.6', '3.7',
                          '3.8', '3.9', '4.0']

            # Comparing GPA data with the keywords
            lst = []
            for gpa in search_gpa:
                if gpa in g:
                    lst.append(gpa)

            # Comparing Skills data with keywords
            lst1 = []
            for word in search_keywords:
                if word in h:
                    lst1.append(word)

            # Comparing Experience data with keywords
            lst2 = []
            for exp in search_exp:
                if exp in p:
                    lst2.append(exp)

            # Scoring the Extracted data to see the best resume
            score = 0
            w1 = []

            # Scoring the GPA
            for w1 in lst:
                if '3.0' <= w1 < '3.5':
                    score += 1
                if '3.5' <= w1 <= '4':
                    score += 2

            # Score the skills
            for w1 in lst1:
                if w1 == 'Laravel':
                    score += 2
                if w1 == 'Python':
                    score += 2
                if w1 == 'Java':
                    score += 1

            # Score the experience (again compared as numbers, not strings)
            for w1 in lst2:
                years = float(w1)
                if 2.5 <= years < 3:
                    score += 0.5
                if 3 <= years < 3.5:
                    score += 1
                if 3.5 <= years:
                    score += 2

            # store score values in an array
            tt = zs.append(score)

            print("%s has Scored %s" % (f, score))
            print('\n')

            pdfFileObj.close()

# Rank the CVs by score
zs.sort(reverse=True)
print(zs)

The output of the program is:

cv2.pdf
Name: Danish Ejaz  
GPA: 3.7  
Skills: Python, Java  
Experience: 2.5 years 

Danish Ejaz has Scored 5.5


cv3.pdf
Name: Abdullah  
GPA: 3.2  
Skills: Laravel, Java  
Experience: 2 years 

Abdullah has Scored 4


cv5.pdf
Name: M. Abrar Hussain  
GPA: 3.5  
Skills: Python, Laravel  
Experience: 3 years 

M. Abrar Hussain has Scored 7


[7, 5.5, 4]

Process finished with exit code 0

The second-to-last line is the result after scoring. It only gives the score numbers. Can I include the file name in the result as well? If so, please help me complete this project.

  • Would you like to open the best PDF file in some viewer? Commented Aug 20, 2021 at 9:51
  • Can you post exactly what you expect as output? Commented Aug 20, 2021 at 9:51
  • @RomanPavelka Yes, the scoring is based on experience, skills, and GPA. Commented Aug 20, 2021 at 9:56
  • @ignoring_gravity [7, 5.5, 4] is the output of the program, and I want the file name next to each score, like this: [7 cv5.pdf, 5.5 cv2.pdf, 4 cv3.pdf] Commented Aug 20, 2021 at 9:58
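A minimal sketch of the output format asked for in the last comment, assuming the loop appends a (score, filename) tuple instead of the bare score, i.e. zs.append((score, filename)). The three entries below are stand-in data copied from the program output above; no PDFs are read.

```python
# Stand-in data copied from the program output; in the real loop you would
# append (score, filename) right after computing the score.
results = []
results.append((5.5, "cv2.pdf"))
results.append((4, "cv3.pdf"))
results.append((7, "cv5.pdf"))

# Tuples compare on their first element (the score), so this ranks by score,
# highest first; ties would fall back to comparing the file names.
results.sort(reverse=True)

# Build the "[7 cv5.pdf, 5.5 cv2.pdf, 4 cv3.pdf]" style string
formatted = "[" + ", ".join(f"{score} {name}" for score, name in results) + "]"
print(formatted)
```

Because the score is the first element of each tuple, the existing zs.sort(reverse=True) line keeps working unchanged.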

3 Answers


The proper way, IMHO, is to define a class. A minimal variant would be:

class Candidate:
    def __init__(self, name, score, filename):
        self.name = name
        self.score = score
        self.filename = filename

    def __gt__(self, other):
        return self.score > other.score

    def __str__(self):
        return f'Candidate{self.name, self.filename, self.score}'

    def __repr__(self):
        return self.__str__()

Put this before your main for loop. Then, instead of

            tt = zs.append(score)

put

            tt = zs.append(Candidate(f, score, filename))

Everything else should work the same. Here is a short usage example:

class Candidate:
    def __init__(self, name, score, filename):
        self.name = name
        self.score = score
        self.filename = filename

    def __gt__(self, other):
        return self.score > other.score

    def __str__(self):
        return f'Candidate{self.name, self.filename, self.score}'

    def __repr__(self):
        return self.__str__()

# __init__ allows this
a = Candidate("Arnold", 10, "arnold.pdf")
b = Candidate("Betty", 11, "betty.pdf")

# __gt__ allows this (a < b falls back to b.__gt__(a))
print(a < b)
print(a > b)

# __str__ allows this
print(a)

# __repr__ makes this human-readable
print([a, b])

# __repr__ and __gt__ make this sortable and human-readable
print(sorted([b, a]))

This prints:

True
False
Candidate('Arnold', 'arnold.pdf', 10)
[Candidate('Arnold', 'arnold.pdf', 10), Candidate('Betty', 'betty.pdf', 11)]
[Candidate('Arnold', 'arnold.pdf', 10), Candidate('Betty', 'betty.pdf', 11)]
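A self-contained sketch of the class plugged into the question's flow, using the names, scores and file names from the question's output as stand-in data (no PDFs are read). It shows that float scores such as 5.5 sort correctly and that zs.sort(reverse=True) works with only __gt__ defined, because Python falls back to the reflected comparison when __lt__ is missing. __str__ is left out here, since print falls back to __repr__.

```python
class Candidate:
    """Minimal version of the Candidate class from this answer."""

    def __init__(self, name, score, filename):
        self.name = name
        self.score = score
        self.filename = filename

    def __gt__(self, other):
        return self.score > other.score

    def __repr__(self):
        return f'Candidate{self.name, self.filename, self.score}'


# Stand-ins for what the scoring loop would append (values from the question's output)
zs = []
zs.append(Candidate("Danish Ejaz", 5.5, "cv2.pdf"))
zs.append(Candidate("Abdullah", 4, "cv3.pdf"))
zs.append(Candidate("M. Abrar Hussain", 7, "cv5.pdf"))

# list.sort only asks "a < b"; without __lt__, Python tries b.__gt__(a)
zs.sort(reverse=True)
print(zs)

# Print score and file name for each candidate, best first
for candidate in zs:
    print(f"{candidate.score} {candidate.filename}")
```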

7 Comments

The Output is: " [<__main__.Candidate object at 0x000001990DA62D30>, <__main__.Candidate object at 0x000001990F5AACA0>, <__main__.Candidate object at 0x000001990DA62DF0>] "
@AbrarHussain Right, we also need a __repr__(self) method; I have fixed my post now.
Yeah, that's right, but the score is not an integer, it's a float. This output only shows an integer.
All right, now it should work with any comparable score.
You are welcome! Well, I have learned something as well :)

You just have to store your filenames and print them at the end together with the scores:

import os

# Create a dictionary with the file names as keys; the scores become the values
# Note that this might be an issue if you have irrelevant files in your directory
file_scores = dict.fromkeys(os.listdir('resume/'))

# Loop over the files in the resume folder
for filename in file_scores:
    # Select only PDF files; other entries keep the value None
    if filename.endswith('.pdf'):
        # Your scoring logic
        ...

        # Store the score in the dictionary
        file_scores[filename] = score


# Remove items without a value (files that were never scored);
# "is not None" keeps a legitimate score of 0
file_scores = {k: v for k, v in file_scores.items() if v is not None}

# Sort the dictionary by score, descending
file_scores = {k: v for k, v in sorted(file_scores.items(), key=lambda x: x[1], reverse=True)}

# Print each file together with its score
for filename, score in file_scores.items():
    print(f"File {filename}: Score = {score}")
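A runnable sketch of the same idea with the scoring replaced by hard-coded values; the file names, including the non-PDF notes.txt and the zero-scored cv4.pdf, are hypothetical. It filters with "is not None" rather than a plain truthiness test, so a resume that legitimately scores 0 stays in the ranking.

```python
# Hypothetical folder contents; None marks files that were never scored.
file_scores = dict.fromkeys(["cv2.pdf", "cv3.pdf", "cv4.pdf", "cv5.pdf", "notes.txt"])
file_scores["cv2.pdf"] = 5.5
file_scores["cv3.pdf"] = 4
file_scores["cv4.pdf"] = 0   # a real score of zero, which must not be dropped
file_scores["cv5.pdf"] = 7

# Drop only the unscored entries
file_scores = {k: v for k, v in file_scores.items() if v is not None}

# Sort by score, highest first
ranking = sorted(file_scores.items(), key=lambda item: item[1], reverse=True)

for filename, score in ranking:
    print(f"File {filename}: Score = {score}")
```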

5 Comments

This won't work, as it breaks the link between file name and score: zs is in a different order than the files.
@CHecht Ah you are totally right! I will fix it.
@CHecht I guess I fixed it :)
@Rafael-WO it gives me an error: " Traceback (most recent call last): File "C:\Users\M. Abrar Hussain\Desktop\cv\difflib.py", line 108, in <module> file_scores = {k: v for k, v in sorted(file_scores.items(), key=lambda x: x[1], reverse=True)} TypeError: '<' not supported between instances of 'int' and 'NoneType' "
@AbrarHussain I added a line that fixes this error: you have to remove all items whose value is None before sorting.

The easiest way to achieve what you want is to use a table structure. Since each result has the same fields, you could, for instance, create a pd.DataFrame, fill it with all your values, and afterwards call sort_values on the score column.

There are of course other alternatives where you use np.argsort or similar, but a DataFrame is probably the easiest way to achieve what you are after.
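A minimal sketch of the DataFrame approach, assuming pandas is installed and that each iteration of the scoring loop appends a dict with the file name, the candidate's name and the score to a list. The rows below are stand-in data copied from the question's output, and the column names are illustrative.

```python
import pandas as pd

# One row per resume; in the real program, append a dict like these inside the loop.
rows = [
    {"filename": "cv2.pdf", "name": "Danish Ejaz", "score": 5.5},
    {"filename": "cv3.pdf", "name": "Abdullah", "score": 4},
    {"filename": "cv5.pdf", "name": "M. Abrar Hussain", "score": 7},
]

df = pd.DataFrame(rows)

# Rank by score, highest first, keeping only the score and file name columns
ranking = df.sort_values("score", ascending=False)[["score", "filename"]]
print(ranking.to_string(index=False))
```

Keeping every field in one row means the file name can never drift away from its score, which is the failure mode discussed under the dictionary answer.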
