I am trying to insert only certain values from a string into a table (i.e. excluding common words) after tokenization in a Python script.
The incoming string might look like "this is a string I want to parse because it mentions IOT". Out of those individual tokens/words, I want to exclude things like "this", "is", "a", "I", "want", etc., but less common tokens like "string", "parse", etc. should be kept.
Currently, I plan to have a table of common words I can reference.
While I could do something like INSERT $term$ WHERE NOT IN(SELECT * FROM excludedterm), it seems like there should be a simpler method than building a query per term (and, therefore, a separate check to the db on every term).
Is there a Pythonic way to do an equivalent of the NOT IN() that SQL supports? Maybe reading the excludes table into a list, then comparing tokens against it in some kind of NOT IN($list$) format?
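
To make the idea concrete, here is a rough sketch of what I have in mind. It is only an illustration under assumptions: an in-memory SQLite database stands in for my real one, the column name term and the destination table keptterm are made up for the example, and the tokenization is a plain whitespace split.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database; the
# column name "term" and the table "keptterm" are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE excludedterm (term TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE keptterm (term TEXT)")
conn.executemany(
    "INSERT INTO excludedterm (term) VALUES (?)",
    [(w,) for w in ("this", "is", "a", "i", "want", "to", "because", "it")],
)

# Read the exclusion table once, into a set for fast membership tests.
excluded = {row[0] for row in conn.execute("SELECT term FROM excludedterm")}

text = "this is a string I want to parse because it mentions IOT"
tokens = text.split()  # naive whitespace tokenization

# Python's "not in" plays the role of SQL's NOT IN here; tokens are
# lowercased only for the comparison, so the original casing is kept.
kept = [tok for tok in tokens if tok.lower() not in excluded]

# One batched insert instead of one query per term.
conn.executemany("INSERT INTO keptterm (term) VALUES (?)", [(t,) for t in kept])
conn.commit()

print(kept)
```

A set rather than a list seems like the natural container, since each membership test is then a hash lookup instead of a scan, and the database is only hit twice: once to load the excludes and once for the batched insert.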