This uses codeop.compile_command to attempt to compile the code. This is the same logic that the code module does to determine whether to ask for another line or immediately fail with a syntax error.
import codeop
def is_valid_code(line):
try:
codeop.compile_command(line)
except SyntaxError:
return False
else:
return True
It can be used as follows:
>>> is_valid_code('for i in range(10):')
True
>>> is_valid_code('')
True
>>> is_valid_code('x = 1')
True
>>> is_valid_code('for j in range(10 in range(10(')
True
>>> is_valid_code('x = ++-+ 1+-')
False
I'm sure at this point, you're saying "what gives? for j in range(10 in range(10( was supposed to be invalid!" The problem with this line is that 10() is technically syntactically valid, at least according to the Python interpreter. In the REPL, you get this:
>>> 10()
Traceback (most recent call last):
File "<pyshell#22>", line 1, in <module>
10()
TypeError: 'int' object is not callable
Notice how this is a TypeError, not a SyntaxError. ast.parse says it is valid as well, and just treats it as a call with the function being an ast.Num.
These kinds of things can't easily be caught until they actually run.
If some kind of monster managed to modify the value of the cached 10 value (which would technically be possible), you might be able to do 10(). It's still allowed by the syntax.
What about the unbalanced parentheses? This fits the same bill as for i in range(10):. This line is invalid on its own, but may be the first line in a multi-line expression. For example, see the following:
>>> is_valid_code('if x ==')
False
>>> is_valid_code('if (x ==')
True
The second line is True because the expression could continue like this:
if (x ==
3):
print('x is 3!')
and the expression would be complete. In fact, codeop.compile_command distinguishes between these different situations by returning a code object if it's a valid self-contained line, None if the line is expected to continue for a full expression, and throwing a SyntaxError on an invalid line.
However, you can also get into a much more complicated problem than initially stated. For example, consider the line ). If it's the start of the module, or the previous line is {, then it's invalid. However, if the previous line is (1,2,, it's completely valid.
The solution given here will work if you only work forward, and append previous lines as context, which is what the code module does for an interactive session. Creating something that can always accurately identify whether a single line could possibly exist in a Python file without considering surrounding lines is going to be extremely difficult, as the Python grammar interacts with newlines in non-trivial ways. This answer responds with whether a given line could be at the beginning of a module and continue on to the next line without failing.
It would be better to identify what the purpose of recognizing single lines is and solve that problem in a different way than trying to solve this for every case.
x =+ 1is syntactically valid. It assigns+1tox.for j in range(10also possibly syntatically valid)for j in range(10is also valid if the next line continues with something like):, andif x < 3could be part of a multi-line expression as well. Almost anything could be part of a multi-line string, too.foris still syntactically valid. The assignment isn't quite valid any more unless, say, it's part of a triple-quoted string or a line-continued comment. I don't think you quite understand what you're trying to do.