Infix expression evaluation

Question

I'm looking for some feedback on my OCaml implementation of some methods to translate infix expressions to postfix and then evaluate them.

I'm very new to OCaml, coming from C#/Java and JavaScript so pretty much any advice is appreciated, in particular how to make it more idiomatic or "functional".

It seems like I have to reverse my stack at some point because I want to operate on both sides. Is this unavoidable? Could I be using better exceptions for my helper functions? Is it good practice to declare helper functions like I did? apply_and_store and str_to_fn could be inside eval_postfix; is it better to have them encapsulated or to keep eval_postfix cleaner?

let explode_to_strings (s : string) : string list =
    Str.split (Str.regexp "[ \t]+") s;;

(* Method from pleac.sourceforge.net *)
let is_Integer (s : string) : bool =
    try ignore (int_of_string s); true with Failure _ -> false;;

let str_to_fn (s : string) : (int -> int -> int) =
    match s with 
    | "+" -> (+)
    | "-" -> (-)
    | "*" -> ( * )
    | "/" -> (/)
    | _ -> raise (Invalid_argument "Not an operator");;

let apply_and_store (lst : 'a list) (fn : 'a -> 'a -> 'a) : 'a list =
    match lst with 
    | one :: two :: tl -> fn two one :: tl
    | _ -> raise (Invalid_argument "List too short");;

let prec (s : string) : int =
    match s with
    | "+" | "-" -> 1
    | "*" | "/" -> 2 
    | _ -> raise (Invalid_argument "Not in operator table");;

let infix_to_postfix (lst : string list) : string list =
    let rec 
        push (elem : string) (output : string list) (ops : string list) : (string list * string list) =
            match ops with
            | [] -> (output, elem :: ops)
            | hd :: tl ->   if prec elem <= prec hd 
                            then push elem (hd :: output) tl
                            else (output, elem :: ops)
    in
    let rec 
        aux (l : string list) (output : string list) (ops : string list) : string list =
            match l with 
            | [] -> ops @ output
            | hd :: tl ->   if is_Integer hd 
                            then aux tl (hd :: output) ops 
                            else let ret = push hd output ops in 
                                aux tl (fst ret) (snd ret)
    in
    aux lst [] [];;

let eval_postfix (lst : string list) : int =
    let rec aux (l : string list) (stack : int list) : int =
        match l with 
        | [] -> List.nth stack 0
        | hd :: tl ->   if is_Integer hd 
                        then aux tl ((int_of_string hd) :: stack)
                        else aux tl (apply_and_store stack (str_to_fn hd)) in
    aux (List.rev lst) [];;

let eval_infix (s : string) : int =
    eval_postfix (infix_to_postfix (explode_to_strings s));;

If you need to invert your input, doesn't it mean that you actually produced prefix expressions?
– didierc, Commented Oct 24, 2014 at 14:24

Michaël Le Barbier · Accepted Answer · 2015-10-30 15:28:29Z

TL;DR:

Do not write explicit types! Symbolic processing is easy in OCaml; use it to your advantage instead of mimicking cryptic algorithms that are only useful in languages where symbolic processing is a pain, such as C, C++ or Java.

In OCaml, we do not write types explicitly the way one does in C++ or Java. There is usually no point in doing so, since the compiler computes the types for you. If you want to see the inferred types, you can use the -i option of the compiler, or better, use merlin to see the types interactively.

When we write larger programs, we write signature files where we specify the signatures of public functions and abstract types, but this is more of a software engineering technique that you should not bother much with while you take your very first steps.

If you do not have to handle the typing information yourself, your code is much easier to edit. Take advantage of this freedom instead of binding yourself again with your old ties!
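As a small illustration, here is the apply_and_store helper from the question with its annotations removed. This is only a sketch; the signature in the comment is what I would expect the compiler's -i option to report for it.

```ocaml
(* Same helper as in the question, with the type annotations dropped. *)
let apply_and_store lst fn =
  match lst with
  | one :: two :: tl -> fn two one :: tl
  | _ -> invalid_arg "List too short"

(* Expected inferred signature:
   val apply_and_store : 'a list -> ('a -> 'a -> 'a) -> 'a list *)
```

The function is exactly as polymorphic as the annotated version, without any of the typing effort.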

I got your code to run after a few minor modifications and found that it did not work as expected. Reading your code, I have difficulty understanding what happens; it is probably a bit too clever, and broken.

There is an important object of the problem that does not appear in your code: the abstract syntax tree (AST) associated with your expression. If you can generate it from infix, it is trivial to generate the prefix or postfix expression out of it. In languages like C, C++ or Java, working with ASTs is painful, but ML languages were designed to make this easy.

Here is how we define the type of arithmetic expressions:

  type t =
    | Integer of int
    | Binary of binary * t * t
  and binary =
    | Add
    | Sub
    | Mult
    | Div

It is very easy to handle, say if you want to evaluate an expression

  let rec eval = function
  | Integer(k) -> k
  | Binary(op, a, b) ->
     (match op with
      | Add -> ( + )
      | Sub -> ( - ) 
      | Mult -> ( * )
      | Div -> ( / )) (eval a) (eval b)

or implement algebraic expansion

  let rec expand = function
  | (Integer(_) | Binary((Add | Sub), _, _)) as a -> a
  | Binary(Mult, Binary(Add, a, b), c) -> …

I do not write the full function, but I hope you get the idea.
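To make the evaluation concrete, here is a small self-contained sketch that repeats the type and the eval function and applies them to a tree built by hand. The name example is mine and only serves the illustration.

```ocaml
type t =
  | Integer of int
  | Binary of binary * t * t
and binary =
  | Add
  | Sub
  | Mult
  | Div

let rec eval = function
  | Integer k -> k
  | Binary (op, a, b) ->
    (* Pick the integer operation matching the constructor, then apply it
       to the values of both subtrees. *)
    let f =
      match op with
      | Add -> ( + )
      | Sub -> ( - )
      | Mult -> ( * )
      | Div -> ( / )
    in
    f (eval a) (eval b)

(* The tree for 1 + 41 * 2, written out by hand. *)
let example = Binary (Add, Integer 1, Binary (Mult, Integer 41, Integer 2))

let () = assert (eval example = 83)
```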

Now, the best way to solve our problem is to write a parser converting infix expressions from their concrete syntax to an AST. In practical situations, we should use ocamllex and ocamlyacc to generate the lexer and the parser for arithmetic expressions, but let us write everything manually.

The first step is lexing, which you got pretty right, but let me rewrite it this way, defining a type for tokens and using a more capable regular expression:

  type token =
    | INTEGER of int
    | ADD
    | SUB
    | MULT
    | DIV

  let lexer s =
    let open Str in
    split (regexp " +\\|\\b *") s
    |> List.map
      (function
        | "+" -> ADD
        | "-" -> SUB
        | "*" -> MULT
        | "/" -> DIV
        | n -> (try INTEGER(int_of_string n)
                with Failure _ -> Printf.ksprintf failwith "lexer: Invalid token: %s" n))

You can see the \b code for word boundary; it allows text with fewer spaces to be processed correctly:

# lexer "1+41*2-5/ 7";;
- : token list =
[INTEGER 1; ADD; INTEGER 41; MULT; INTEGER 2; SUB; INTEGER 5; DIV; INTEGER 7]

Once we have converted the string to a token list, it is very easy to analyse its structure using mutually recursive functions written in continuation-passing style, which is much easier than the pedantic designation suggests. :)

We first need a helper function:

  let _binop op a b =
    Binary(op, a, b)

Now the parser itself looks like this:

  let rec parser_entry cont =
    function
    | [] -> failwith "Unexpected end of input"
    | INTEGER(k) :: ADD :: tl -> cont(parser_entry (_binop Add (Integer(k))) tl)
    | INTEGER(k) :: SUB :: tl -> cont(parser_entry (_binop Sub (Integer(k))) tl)
    | INTEGER(k) :: MULT :: tl -> cont(parser_mult (_binop Mult (Integer(k))) tl)
    | INTEGER(k) :: DIV :: tl -> cont(parser_mult (_binop Div (Integer(k))) tl)
    | INTEGER(k) :: [] -> cont (Integer(k))
    | _ -> failwith "Syntax error"
  and parser_mult cont =
    function
    | [] ->  failwith "Unexpected end of input"
    | INTEGER(k) :: ADD :: tl ->
        (_binop Add (cont (Integer(k)))) (parser_entry (fun x -> x) tl)
    | INTEGER(k) :: SUB :: tl ->
        (_binop Sub (cont (Integer(k)))) (parser_entry (fun x -> x) tl)
    | INTEGER(k) :: MULT :: tl ->
        cont (parser_mult (_binop Mult (Integer(k))) tl)
    | INTEGER(k) :: DIV :: tl ->
        cont (parser_mult (_binop Div (Integer(k))) tl)
    | INTEGER(k) :: [] -> cont (Integer(k))
    | _ -> failwith "Syntax error"
  and parser lst =
    parser_entry (fun x -> x) lst

The parser_entry function is the entry point for the parser, and parser_mult reads multiplications and divisions. The functions have two parameters: cont, the famous continuation, and the list of tokens they need to process. The continuation encodes the current state of the parser; it is reminiscent of the stack found in LALR parsers. As you see, when parsing in the entry point, the continuation is the outermost function called, but when parsing operators with a priority, the continuation is deeply nested, which is how the priority is represented in the program. Let's try our parser:

# parser(lexer("1"));;
- : t = Integer 1
# parser(lexer("1 +2 - 3"));;
- : t =
Binary (Add, Integer 1, Binary (Sub, Integer 2, Integer 3))
# parser(lexer "1+41*2-5/ 7");;
- : t =
Binary (Add, Integer 1,
 Binary (Sub, Binary (Mult, Integer 41, Integer 2),
  Binary (Div, Integer 5, Integer 7)))

Now it is trivial to convert an abstract expression to prefix notation:

  let rec to_prefix_string = function
    | Integer(k) -> string_of_int k
    | Binary(op, a, b) ->
        String.concat " " [
          (match op with
           | Add -> "+"
           | Sub -> "-"
           | Mult -> "*"
           | Div -> "/");
          to_prefix_string a;
          to_prefix_string b
        ]

The advantage of symbolic processing is that the code is very explicit! Now we write a convenience function:

  let infix_to_prefix s =
    to_prefix_string(parser(lexer s))

and try it!

# infix_to_prefix "1+41*2-5/ 7";;
- : string = "+ 1 - * 41 2 / 5 7"
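Since the question asked for postfix, the same idea carries over by emitting the operator after its operands. The following is a minimal sketch; the names symbol, to_postfix_string and ast are mine, and the tree is copied by hand from the parser output shown earlier so that the block does not depend on the Str library.

```ocaml
type t =
  | Integer of int
  | Binary of binary * t * t
and binary =
  | Add
  | Sub
  | Mult
  | Div

(* Concrete symbol for each operator constructor. *)
let symbol = function
  | Add -> "+"
  | Sub -> "-"
  | Mult -> "*"
  | Div -> "/"

(* Postfix: both operands first, then the operator. *)
let rec to_postfix_string = function
  | Integer k -> string_of_int k
  | Binary (op, a, b) ->
    String.concat " " [to_postfix_string a; to_postfix_string b; symbol op]

(* The tree reported by the parser for "1+41*2-5/ 7". *)
let ast =
  Binary (Add, Integer 1,
    Binary (Sub, Binary (Mult, Integer 41, Integer 2),
      Binary (Div, Integer 5, Integer 7)))

let () = assert (to_postfix_string ast = "1 41 2 * 5 7 / - +")
```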

Cool answer! Also interesting to see that in OCaml explicit types are discouraged while they are encouraged in Haskell.
– Caridorc, Commented Oct 30, 2015 at 15:30
OCaml has two types of files: implementation files (for code) and signature files (for signatures). While it is perfectly possible to write signatures in implementation files, the common practice is to do this only in signature files, where you also put the documentation. In Haskell, there is no such distinction, so you usually see signatures and code interleaved. In exploratory programming, we usually skip types. :)
– Michaël Le Barbier, Commented Oct 30, 2015 at 15:34

Chris · Accepted Answer · 2025-10-28 00:24:29Z

Notes

You've used the ;; token extensively in your code. A well-formed OCaml program has no need for this, as it is composed only of top-level bindings/definitions, rather than expressions. Your code already meets this standard, so the ;; tokens can be removed.
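Should you later add a top-level expression, such as printing a result, the usual way to stay free of ;; is to wrap it in a unit binding. A small sketch, with a made-up square function standing in for your own code:

```ocaml
(* A hypothetical definition, standing in for the functions in your file. *)
let square x = x * x

(* A side effect at the top level, written as a binding so no ;; is needed. *)
let () = Printf.printf "%d\n" (square 7)
```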

In your prec function you raise Invalid_argument. There is a shorthand for this using the invalid_arg function.

let prec (s : string) : int =
    match s with
    | "+" | "-" -> 1
    | "*" | "/" -> 2 
    | _ -> raise (Invalid_argument "Not in operator table")

let prec (s : string) : int =
    match s with
    | "+" | "-" -> 1
    | "*" | "/" -> 2 
    | _ -> invalid_arg "Not in operator table"

Your is_Integer function doesn't conform to idiomatic naming conventions in OCaml, where it would be more appropriately is_integer, but it also seems a bit unnecessary. It'd be more idiomatic to handle that exception in the functions that are calling is_Integer.

Also, function application in OCaml has higher precedence than most other operators, so you have some unnecessary parens.

let eval_postfix (lst : string list) : int =
    let rec aux (l : string list) (stack : int list) : int =
        match l with 
        | [] -> List.nth stack 0
        | hd :: tl ->   if is_Integer hd 
                        then aux tl ((int_of_string hd) :: stack)
                        else aux tl (apply_and_store stack (str_to_fn hd)) in
    aux (List.rev lst) []

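let eval_postfix (lst : string list) : int =
  let rec aux (l : string list) (stack : int list) : int =
    match l with 
    | [] -> List.nth stack 0
    | hd::tl ->
      (* Only the conversion is guarded, so a Failure raised deeper in the
         recursion is not mistaken for a non-integer token. *)
      match int_of_string hd with
      | n -> aux tl (n :: stack)
      | exception Failure _ -> 
        aux tl (apply_and_store stack (str_to_fn hd)) 
  in
  aux (List.rev lst) []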

Note also that List.nth stack 0 is equivalent to List.hd stack, and both potentially raise a Failure exception. However, you do nothing to handle this exception. Now, you may have written your code so that stack is never an empty list when this gets called, but it's not going to be a very informative exception if an exceptional situation arises and it gets raised. At the very least you may wish to handle it by raising a more informative exception.
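One way to do that is to pattern-match on the final stack instead of calling List.nth. The helper below is a sketch with a name and messages I made up; note that it is stricter than your code, since it also rejects a stack with leftover operands, which is an assumption about what you want.

```ocaml
(* Hypothetical helper: extract the result from the final evaluation stack. *)
let result_of_stack = function
  | [result] -> result
  | [] -> invalid_arg "eval_postfix: empty stack, nothing to return"
  | _ :: _ :: _ -> invalid_arg "eval_postfix: leftover operands on the stack"
```

The base case of aux could then read `| [] -> result_of_stack stack`.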

In an expression like the following, encountered in infix_to_postfix, you're better off using pattern-matching rather than explicitly calling fst or snd.

let ret = push hd output ops in 
aux tl (fst ret) (snd ret)

let (f, s) = push hd output ops in 
aux tl f s

Revisiting your prec function, let's use a basic association list, rather than having the function hard-code the precedence levels.

let prec (s : string) : int =
    match s with
    | "+" | "-" -> 1
    | "*" | "/" -> 2 
    | _ -> raise (Invalid_argument "Not in operator table")

let prec_levels = [("+", 1); ("-", 1); ("*", 2); ("/", 2)]

let prec (s : string) : int =
  try
    List.assoc s prec_levels
  with Not_found -> 
    invalid_arg "Not in operator table"

With a much larger precedence table we might want to use something that scales better, like a map or a hash table, but for this small dataset an association list is sufficient.
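For comparison, here is roughly what the hash table variant could look like with the standard Hashtbl module. It is a sketch only: prec_table is a name I chose, and Hashtbl.find_opt assumes OCaml 4.05 or later.

```ocaml
(* Hypothetical table name; the initial size is just a hint. *)
let prec_table = Hashtbl.create 16

(* Fill the table once, when the module is loaded. *)
let () =
  List.iter
    (fun (op, level) -> Hashtbl.replace prec_table op level)
    [("+", 1); ("-", 1); ("*", 2); ("/", 2)]

let prec s =
  match Hashtbl.find_opt prec_table s with
  | Some level -> level
  | None -> invalid_arg "Not in operator table"
```

Callers see the same behaviour as with the association list, including the Invalid_argument exception for unknown operators.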
