In my previous post I’ve talked about writing micro compiler in OCaml under 300 lines of source code. There are number of ways to make our work easier and number of source code lines significantly smaller.
Let’s rewrite our micro compiler using tools called lexer and parser generators. We’ll be using tools called ocamllex and ocamlyacc which are distributed with OCaml compiler and are modeled after famous lex and yacc tools for Unix operating systems. Those tools actually have better modern analogues called flex and bison which are described in detail in Flex & Bison: Text Processing Tools book. Nowadays however if you are writing a professional compiler in OCaml I strongly suggest you consider using sedlex and menhir instead of ocamllex and ocamlyacc as both tools are quite outdated and lack some significant features that their modern analogues have such as unicode support for lexing, parameterized parser generation and built-in grammar interpreter. So what are lexer and parser generators? To put it simply ocamllex and ocamlyacc take special .mll and .mly definition files of lexer and parser semantics mixed with OCaml source code and generate an .ml source code files that do the actual token generation and parsing for you. Pretty neat indeed, and it’s actually easier to use than it sounds so let’s rewrite our micro compiler using those tools. We’ll be using original source code of micro compiler as reference only as entire code base needs to be changed. You can see the end result of our rewrite in micro git repository branch called simple. For the actual description of the micro language see my previous post. So let’s get started!