I am new to Python and I have a tokenization assignment. The input is a .txt file with sentences and the output is a .txt file with tokens. By token I mean a simple word or one of these punctuation marks: ',' , '!' , '?' , '.' , '"'
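To make the expected result concrete, here is a minimal sketch of the input/output behaviour described above. It is not the function from this question: the `tokenize` helper, the regular expression, and the sample sentence are assumptions made only for illustration, and it treats a token as either a run of word characters or one of the listed punctuation marks.

```python
import re

# Hypothetical pattern: a run of word characters, or one of , ! ? . "
TOKEN_PATTERN = re.compile(r'\w+|[,!?."]')

def tokenize(text):
    """Return the tokens found in text, in order of appearance."""
    return TOKEN_PATTERN.findall(text)

# Each token would end up on its own line of the output file.
for token in tokenize('She said "Hi, Jobs!"'):
    print(token)
```

With this sketch, a word such as Jobs" would be expected to produce two tokens: Jobs and ".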
I have this function. Its inputs: Elemnt is a word with or without punctuation, for example Hi, said: or said"; StrForCheck is an array of the punctuation marks that I want to separate from the words; TokenFile is my output file.
def CheckIfSEmanExist(Elemnt,StrForCheck, TokenFile):
    FirstOrLastIsSeman = 0
    for seman in StrForCheck:
        WordSplitOnSeman = Elemnt.split(seman)
        if len(WordSplitOnSeman) > 1:
            if Elemnt[len(Elemnt)-1] == seman:
                FirstOrLastIsSeman = len(Elemnt)-1
            elif Elemnt[0] == seman:
                FirstOrLastIsSeman = 1
    if FirstOrLastIsSeman == 1:
        TokenFile.write(Elemnt[0])
        TokenFile.write('\n')
        TokenFile.write(Elemnt[1:-1])
        TokenFile.write('\n')
    elif FirstOrLastIsSeman == len(Elemnt)-1:
        TokenFile.write(Elemnt[0:-1])
        TokenFile.write('\n')
        TokenFile.write(Elemnt[len(Elemnt)-1])
        TokenFile.write('\n')
    elif FirstOrLastIsSeman == 0:
        TokenFile.write(Elemnt)
        TokenFile.write('\n')
The code loops over the punctuation array. If it finds one of the marks in the word, I check whether that mark is the first or the last character of the word, and then write the word and the punctuation mark to my output file, each on its own line.
My problem is that it works fine on the whole text except for these words: Jobs" , created" , public" , police"
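One thing that might help narrow this down is checking exactly which characters those words contain, since a curly quote or a hidden trailing character looks the same as a plain " when printed. This is only a diagnostic sketch, not a fix: the `describe_chars` helper and the two sample words are hypothetical, and I do not know yet whether this is the actual cause.

```python
import unicodedata

def describe_chars(word):
    """Return (char, code point, Unicode name) for every non-alphanumeric character."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
        for ch in word
        if not ch.isalnum()
    ]

# Sample words (assumed): one with a plain ASCII quote, one with a curly quote.
for word in ['Jobs"', 'Jobs\u201d']:
    print(repr(word), describe_chars(word))
```

Running the same helper on the failing words read from the input file would show whether their last character is really the " that appears in StrForCheck.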
