5
\$\begingroup\$

I'm building a very simple Lisp interpreter, similar to the the one from here. Here is what I have so far for the parsing part, that is, everything from "text is passed to the the program" to "AST is built". How does the below look? What can be improved? The hardest part for me was the recursive read_from_tokens function.

import re
Symbol = str

def pops(L, d=None):
    "Pop from the left of the list, or return the default."
    return L.pop(0) if L else d

def parse(program):
    "Read a Scheme expression from a string and return tokens in AST."
    program = preprocess(program)
    assert program.count(')') == program.count('('), "Mismatched parens"
    return read_from_tokens(tokenize(program))

def preprocess(s):
    "Replace comments with a single space."
    return re.sub(r';.+', ' ', s)

def tokenize(s):
    "Convert a string into a list of tokens."
    return s.replace('(', ' ( ').replace(')', ' ) ').split()

def atom(token):
    "Return a number (only accepted literal) or a symbol."
    try:
        return int(token) if token.isdigit() else float(token)
    except ValueError:
        return Symbol(token)

def read_from_tokens(tokens):
    "Read one or more expression from a sequence of tokens. Returns a list of lists."
    L = []
    while (token := pops(tokens, ')')) != ')':
        L.append(atom(token) if token!='(' else read_from_tokens(tokens))
    return L

if __name__ == '__main__':
    print (parse("(define (+ 2 2) 10) (* pi (* r r))"))
    print (parse("(define r 10)"))
    print (parse("(* pi (* (+ 1 1) r))"))
\$\endgroup\$
3
  • \$\begingroup\$ No string literals? \$\endgroup\$ Commented Mar 31, 2021 at 5:24
  • \$\begingroup\$ @vnp nope, not yet. \$\endgroup\$ Commented Mar 31, 2021 at 7:34
  • 3
    \$\begingroup\$ A lot can be learned from Peter Norvig's Python examples, which usually combine elegance and practicality. If you want to pursue this topic, I recommend Ruslan Spivak's Let's build a simple interpreter. I have referred to them several times while working on an unusual parsing project over the past few months. \$\endgroup\$ Commented Mar 31, 2021 at 17:26

2 Answers 2

2
\$\begingroup\$

Let's avoid some repetition by defining a list of sample cases and then looping over that.

if __name__ == '__main__':
    print (parse("(define (+ 2 2) 10) (* pi (* r r))"))
    print (parse("(define r 10)"))
    print (parse("(* pi (* (+ 1 1) r))"))

Becomes the following, which makes it trivial to add more samples.

if __name__ == '__main__':
    samples = [
        "(define (+ 2 2) 10) (* pi (* r r))",
        "(define r 10)",
        "(* pi (* (+ 1 1) r))"
    ]

    for sample in samples:
        print(parse(sample))

This is also probably a good place to assert that these test cases yield the expected values. If we're not going to use a full-blown unit testing library, then at the very least, we may want to output the raw string and the parsed result.

    for sample in samples:
        print(f"{sample} => {parse(sample)}")
\$\endgroup\$
1
\$\begingroup\$

Overview

The code layout is good, and you used meaningful names for many of the functions and variables. Well done. Here are some minor coding style suggestions.

Documentation

It's great that your functions have docstrings as recommended by the PEP 8 style guide. However, the guide recommends using triple quotes for docstrings (instead of single quotes). This:

"Pop from the left of the list, or return the default."

would be:

"""Pop from the left of the list, or return the default."""

For the read_from_tokens function, it would be helpful if the docstring explicitly mentioned that the function is recursive.

The guide also recommends adding a docstring at the top of the code to summarize its purpose, such as:

"""
Very simple Lisp interpreter

Describe typical input here
Describe output here
"""

Naming

The variable names in the following function are not very descriptive:

def pops(L, d=None):

L might be better as something like tokens, and d might be better as default.

The same applies to L ans s elsewhere in the code.

Layout

Typically, you would omit the space between a function name and the opening left paren:

print(parse("(define (+ 2 2) 10) (* pi (* r r))"))
\$\endgroup\$

You must log in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.