TL;DR:
Do not write explicit types! Symbolic processing is easy in OCaml, use it to your advantage instead of mimicking cryptic algorithms which are only useful in languages where symbolic processing is a pain – as C, C++ or Java.
In OCaml, we never write explicitly write types, as they do in C++ or Java. There is usually no point to do that, since the compiler computes the types for you. If you want to see the types inferred, you can use the -i option of the compiler, or better, use merlin to see the types interactively.
When we write larger programs, we write signature files where we specify the signatures of public functions and abstract types, but this more a software engineering technique you should not bother much while you do your very first steps.
If you do not have to handle the typing information by yourself, your code is much easier to edit, take advantage of this freedom instead of binding yourself again with your old ties!
I could let your code run after a few minor modifications and found out it did not work as expected. When I read your code, I have difficulties to understand what happens, it is probably a bit too clever – and broken.
There is an important object of the problem that does not appear in your code, it is the abstract syntactic tree (AST) associated to your expression. If you can generate it from infix, it is trivial to generate the prefix or postfix expression out of it. In languages like C, C++ or Java, working with ASTs is painful, but ML languages were designed to make this easy.
Here is how we define the type of arithmetic expressions:
type t =
| Integer of int
| Binary of binary * t * t
and binary =
| Add
| Sub
| Mult
| Div
It is very easy to handle, say if you want to evaluate an expression
let rec eval = function
| Integer(k) -> k
| Binary(op, a, b) ->
(match op with
| Add -> ( + )
| Sub -> ( - )
| Mult -> ( * )
| Div -> ( / )) (eval a) (eval b)
or implement algebraic expansion
let rec expand = function
| (Integer(_) | Binary((Add | Sub), _, _)) as a -> a
| Binary(Mult, Binary(Add, a, b), c) -> …
I do not write the full function, but I hope you get the idea.
Now, the best way to solve our problem is to write a parser converting infix expression from their concrete syntax to an AST. In practical situations, we should use ocamllex and ocamlyacc to generate the lexer and the parser for arithmetic expressions, but let us write everything manually.
The first step is lexing, which you got pretty right, but let me rewrite it this way, defining a type for tokens and using a more capable regular expression:
type token =
| INTEGER of int
| ADD
| SUB
| MULT
| DIV
let lexer s =
let open Str in
split (regexp " +\\|\\b *") s
|> List.map
(function
| "+" -> ADD
| "-" -> SUB
| "*" -> MULT
| "/" -> DIV
| n -> (try INTEGER(int_of_string n)
with _ -> ksprintf failwith "lexer: Invalid token: %s" n))
You can see the \b code for word boundary, it allows to process correctly text with fewer spaces:
# lexer "1+41*2-5/ 7";;
- : token list =
[INTEGER 1; ADD; INTEGER 41; MULT; INTEGER 2; SUB; INTEGER 5; DIV; INTEGER 7]
Once we have converted the string to a token list, it is very easy to analyse its structure, using a continuation-passing-style-mutually-recursive-functions which is much easier as the pedantic designation I wrote can suggest it. :)
We first need a help function
let _binop op a b =
Binary(op, a, b)
Now the parser itself looks like this:
let rec parser_entry cont =
function
| [] -> failwith "Unexpected end of input"
| INTEGER(k) :: ADD :: tl -> cont(parser_entry (_binop Add (Integer(k))) tl)
| INTEGER(k) :: SUB :: tl -> cont(parser_entry (_binop Sub (Integer(k))) tl)
| INTEGER(k) :: MULT :: tl -> cont(parser_mult (_binop Mult (Integer(k))) tl)
| INTEGER(k) :: DIV :: tl -> cont(parser_mult (_binop Div (Integer(k))) tl)
| INTEGER(k) :: [] -> cont (Integer(k))
| _ -> failwith "Syntax error"
and parser_mult cont =
function
| [] -> failwith "Unexpected end of input"
| INTEGER(k) :: ADD :: tl ->
(_binop Add (cont (Integer(k)))) (parser_entry (fun x -> x) tl)
| INTEGER(k) :: SUB :: tl ->
(_binop Sub (cont (Integer(k)))) (parser_entry (fun x -> x) tl)
| INTEGER(k) :: MULT :: tl ->
cont (parser_mult (_binop Mult (Integer(k))) tl)
| INTEGER(k) :: DIV :: tl ->
cont (parser_mult (_binop Div (Integer(k))) tl)
| INTEGER(k) :: [] -> cont (Integer(k))
| _ -> failwith "Syntax error"
and parser lst =
parser_entry (fun x -> x) lst
The parser_entry function is the entry point for the parser and parser_mult reads multiplications and divisions. The functions have two parameters cont the famous continuation and the list of tokens they need to process. The continuation encodes the current state of the parser, it is reminiscent of the stack found in LALR parsers. As you see, when parsing in the entry point, the continuation is the outermost function called but in when parsing operators with a priority, the continuation is deeply nested, which is how the priority is represented in the program. Let's try our parser:
# parser(lexer("1"));;
- : t = Integer 1
# parser(lexer("1 +2 - 3"));;
- : t =
Binary (Add, Integer 1, Binary (Sub, Integer 2, Integer 3))
# parser(lexer "1+41*2-5/ 7");;
- : t =
Binary (Add, Integer 1,
Binary (Sub, Binary (Mult, Integer 41, Integer 2),
Binary (Div, Integer 5, Integer 7)))
Now it is is trivial to convert an abstract expression to prefix notation:
let rec to_prefix_string = function
| Integer(k) -> string_of_int k
| Binary(op, a, b) ->
String.concat " " [
(match op with
| Add -> "+"
| Sub -> "-"
| Mult -> "*"
| Div -> "/");
to_prefix_string a;
to_prefix_string b
]
The advantage of symbolic processing is that the code is very explicit! Now we write a convenience function:
let infix_to_prefix s =
to_prefix_string(parser(lexer s))
and try it!
# infix_to_prefix "1+41*2-5/ 7";;
- : string = "+ 1 - * 41 2 / 5 7"