17

It is very similar to this:

How to tell if a string contains valid Python code

The only difference is that, instead of the entire program being given at once, I am interested in a single line of code at a time.

Formally, we say a line of python is "syntactically valid" if there exists any syntactically valid python program that uses that particular line.

For instance, I would like to identify these as syntactically valid lines:

for i in range(10):

x = 1

Because one can use these lines in some syntactically valid python programs.

I would like to identify these lines as syntactically invalid lines:

for j in range(10 in range(10(

x =++-+ 1+-

Because no syntactically correct Python program could ever use these lines.

The check does not need to be too strict; it just needs to be good enough to filter out obviously bogus statements (like the ones shown above). The line is given as a string, of course.

16
  • 2
    FYI, x =+ 1 is syntactically valid. It assigns +1 to x. Commented May 3, 2016 at 19:39
  • 2
    What about implicit line concatenation (which would make for j in range(10 also possibly syntactically valid)? Commented May 3, 2016 at 19:39
  • 3
    for j in range(10 is also valid if the next line continues with something like ):, and if x < 3 could be part of a multi-line expression as well. Almost anything could be part of a multi-line string, too. Commented May 3, 2016 at 19:40
  • 3
    I think the question that you need to answer is why you need/want to do this Commented May 3, 2016 at 19:41
  • 2
    The for is still syntactically valid. The assignment isn't quite valid any more unless, say, it's part of a triple-quoted string or a line-continued comment. I don't think you quite understand what you're trying to do. Commented May 3, 2016 at 19:47

2 Answers

17

This uses codeop.compile_command to attempt to compile the code. This is the same logic that the code module uses to determine whether to ask for another line or immediately fail with a syntax error.

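import codeop


def is_valid_code(line):
    """Return True if line is valid on its own or could be continued."""
    try:
        codeop.compile_command(line)
    except (SyntaxError, OverflowError, ValueError):
        # compile_command raises OverflowError or ValueError for invalid literals
        return False
    else:
        return True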

It can be used as follows:

>>> is_valid_code('for i in range(10):')
True
>>> is_valid_code('')
True
>>> is_valid_code('x = 1')
True
>>> is_valid_code('for j in range(10 in range(10(')
True
>>> is_valid_code('x = ++-+ 1+-')
False

I'm sure at this point, you're saying "what gives? for j in range(10 in range(10( was supposed to be invalid!" The problem with this line is that 10() is technically syntactically valid, at least according to the Python interpreter. In the REPL, you get this:

>>> 10()
Traceback (most recent call last):
  File "<pyshell#22>", line 1, in <module>
    10()
TypeError: 'int' object is not callable

Notice how this is a TypeError, not a SyntaxError. ast.parse says it is valid as well, and just treats it as a call with the function being an ast.Num (an ast.Constant in Python 3.8 and later, where ast.Num is deprecated).
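As a quick check of that claim, here is a minimal sketch that parses 10() and reports the node types involved. The helper name describe_call is made up for this illustration. It assumes Python 3, where the literal shows up as ast.Constant on 3.8 and later and as ast.Num on older versions.

```python
import ast


def describe_call(source):
    """Parse a single expression and return (node type, callee type) names.

    Hypothetical helper for illustration; nothing is executed, only parsed.
    """
    node = ast.parse(source, mode="eval").body
    callee = getattr(node, "func", None)
    return type(node).__name__, type(callee).__name__


# The parser accepts the call; only running it would raise TypeError.
print(describe_call("10()"))
```

The first element is the call node itself, the second is whatever sits in the function position, which here is the integer literal.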

These kinds of things can't easily be caught until they actually run. If some kind of monster managed to modify the value of the cached 10 value (which would technically be possible), you might be able to do 10(). It's still allowed by the syntax.

What about the unbalanced parentheses? This fits the same bill as for i in range(10):. This line is invalid on its own, but may be the first line in a multi-line expression. For example, see the following:

>>> is_valid_code('if x ==')
False
>>> is_valid_code('if (x ==')
True

The second line is True because the expression could continue like this:

if (x ==
    3):
    print('x is 3!')

and the expression would be complete. In fact, codeop.compile_command distinguishes between these situations: it returns a code object if the line is valid and self-contained, returns None if the line is expected to continue into a full expression, and raises a SyntaxError on an invalid line.
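The three outcomes can be made explicit with a small wrapper. This is only a sketch: the function name classify_line and the three labels are invented here, and OverflowError and ValueError are grouped with SyntaxError because compile_command is documented to raise them for invalid literals.

```python
import codeop


def classify_line(source):
    """Label a line as 'complete', 'incomplete', or 'invalid'.

    Hypothetical helper: the labels are not part of codeop itself.
    """
    try:
        code = codeop.compile_command(source)
    except (SyntaxError, OverflowError, ValueError):
        return "invalid"
    # None means "more input is needed", a code object means "done"
    return "incomplete" if code is None else "complete"


for line in ("x = 1", "if (x ==", "if x =="):
    print(repr(line), classify_line(line))
```

A stricter filter than is_valid_code could reject "incomplete" lines as well, at the cost of also rejecting block headers such as for i in range(10):.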

However, you can also get into a much more complicated problem than initially stated. For example, consider the line ). If it's the start of the module, or the previous line is {, then it's invalid. However, if the previous line is (1,2,, it's completely valid.

The solution given here will work if you only work forward, and append previous lines as context, which is what the code module does for an interactive session. Creating something that can always accurately identify whether a single line could possibly exist in a Python file without considering surrounding lines is going to be extremely difficult, as the Python grammar interacts with newlines in non-trivial ways. This answer responds with whether a given line could be at the beginning of a module and continue on to the next line without failing.
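A rough sketch of that forward-only approach follows. It keeps a buffer of pending lines and clears it once a statement completes or fails, which is modeled on how an interactive console behaves. The function name check_lines_in_order is invented for this example, and it inherits the interactive-session rule that an indented block must be closed by a blank line before the next top-level statement.

```python
import codeop


def check_lines_in_order(lines):
    """Check each line with the unfinished lines before it as context.

    Hypothetical helper; returns one boolean per input line.
    """
    pending = []  # lines of a statement that is not finished yet
    results = []
    for line in lines:
        pending.append(line)
        try:
            code = codeop.compile_command("\n".join(pending))
        except (SyntaxError, OverflowError, ValueError):
            results.append(False)
            pending = []  # drop the broken statement and start over
        else:
            results.append(True)
            if code is not None:
                pending = []  # statement complete, context no longer needed
    return results


print(check_lines_in_order(["(1,2,", ")"]))
print(check_lines_in_order(["{", ")"]))
```

With this, the lone ) from the example above is accepted after (1,2, and rejected after { or at the start of the input.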

It would be better to identify what the purpose of recognizing single lines is and solve that problem in a different way than trying to solve this for every case.


8 Comments

I agree with your logic. But shouldn't "for j in range(10 in range(10(" be invalid because it has unmatched parentheses (a syntax-only error)?
No, since that line may continue onto the next line. I'll explain in the answer.
Sounds reasonable enough! Seems safer than adding the "pass" and trying to compile to an AST (the other answer). I will use this.
@EvanPu More important would be to know the reasoning for why this needs to be done, and try solving that problem instead.
if you google "sk_p" I believe you'll find my paper on this !
-1

This is only a suggestion, and I am not sure it will work, but maybe something with exec and try-except?

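# code_line holds the line to check, e.g. code_line = "x = 1"
code_line = code_line.rstrip()
code_line += "\n" + ("\t" if code_line.endswith(":") else "") + "pass"
try:
    exec(code_line)  # warning: this runs the line, it does not just parse it
except SyntaxError:
    print("Oops! Wrong syntax...")
except Exception:
    # Any other exception means the line parsed and only failed at run time
    print("Syntax all right")
else:
    print("Syntax all right")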

Simple lines should produce the appropriate answer.
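A variant of the same idea that avoids running anything is to hand the padded line to the built-in compile() instead of exec. This is a sketch rather than a tested solution: the function name has_valid_syntax is made up, and appending " pass" after a trailing colon is a simplification of the padding above.

```python
def has_valid_syntax(code_line):
    """Return True if the line compiles as a tiny standalone program.

    Hypothetical helper: compile() only parses and compiles, so the
    line itself is never executed.
    """
    stripped = code_line.rstrip()
    if stripped.endswith(":"):
        stripped += " pass"  # give block headers a minimal body
    try:
        compile(stripped, "<line>", "exec")
    except (SyntaxError, ValueError, OverflowError):
        return False
    return True


print(has_valid_syntax("while True:"))   # compiled only, so no infinite loop
print(has_valid_syntax("x = ++-+ 1+-"))
```

Unlike a check that allows for continuation lines, this treats a line with an unclosed parenthesis as invalid, because the line has to compile entirely on its own.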

6 Comments

I was just about to suggest the += "pass" approach. You might want to .rstrip the line though. Also, you don't need the new line and the indentation.
Executing lines is opening Pandora's box. Let's see if while True: is syntactically valid. How about import os; os.system('rm -rf /')?
@JohnKugelman Right, but this is going to need to be sand-boxed anyways. If OP is randomly generating programs, some of them may not halt, and some of them affect the environment.
@JohnKugelman You're right... But I can't think of any way to do it without essentially writing a Python interpreter... Does someone know of a way to execute Python code without really executing it? Stupid question, I know, but it would help a lot with this.
this is similar to what my labmate and I discussed just now. I will try this approach and report back in an hour. thanks!