Outline Overview Lexing ocamllex Activity
CS421 Lecture 8: Lexing and ocamllex
Mark Hills
University of Illinois at Urbana-Champaign
June 16, 2008
Based on slides by Mattox Beckman, as updated by Vikram Adve, Gul
Agha, and Elsa Gunter
1 Overview
2 Lexing
3 ocamllex
4 Activity
Lexing and Parsing
Strings are converted into ASTs in two phases:
Lexing: Convert strings (streams of characters) into lists (or streams) of tokens, representing words in the language
Parsing: Convert lists of tokens into abstract syntax trees
Overview Strategy Options
Lexing
With lexing, we break sequences of characters into different
syntactic categories, called tokens. As an example, we could break
this:
asd 123 jkl 3.14
into this:
[String "asd"; Int 123; String "jkl"; Float 3.14]
Lexing: Multiple Tokens
To extract more than one token from a string, we will modify the behavior of the DFA:
if we find a character where there is no transition from the
current state, stop processing the string
if we are in an accepting state, return the token corresponding
to what we found as well as the remainder of the string
now, use iteration or recursion to keep pulling out more tokens
if we were not in an accepting state, fail – invalid syntax
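The modified DFA behavior described above can be sketched in plain OCaml. This is only an illustration under several assumptions: the type name token, the state names, and the functions step, accept, next_token, and tokens are all hypothetical; the "remainder of the string" is represented as a position rather than a substring; and skipping spaces between tokens is an addition the slides do not specify. Like the description above, it fails as soon as it stops in a non-accepting state and does not back up to an earlier accepting state.

```ocaml
(* Token type mirroring the constructors used in these notes;
   the name [token] is an assumption made for this sketch. *)
type token =
  | Int of int
  | Float of float
  | String of string

(* DFA states. Start and AfterDot (digits followed by '.') are not
   accepting; InInt, InFloat and InWord are accepting. *)
type state = Start | InInt | AfterDot | InFloat | InWord

let is_digit c = '0' <= c && c <= '9'
let is_lower c = 'a' <= c && c <= 'z'

(* Transition function: None means "no transition from this state". *)
let step state c =
  match state with
  | Start when is_digit c -> Some InInt
  | Start when is_lower c -> Some InWord
  | InInt when is_digit c -> Some InInt
  | InInt when c = '.' -> Some AfterDot
  | AfterDot when is_digit c -> Some InFloat
  | InFloat when is_digit c -> Some InFloat
  | InWord when is_lower c -> Some InWord
  | _ -> None

(* Turn the consumed text into a token if the state is accepting. *)
let accept state text =
  match state with
  | InInt -> Some (Int (int_of_string text))
  | InFloat -> Some (Float (float_of_string text))
  | InWord -> Some (String text)
  | Start | AfterDot -> None

(* Run the DFA from [pos] until a character has no transition (or the
   input ends), then return the token and the position where the
   remainder of the string begins. *)
let next_token s pos =
  let n = String.length s in
  let rec run state i =
    if i < n then
      match step state s.[i] with
      | Some state' -> run state' (i + 1)
      | None -> (state, i)
    else (state, i)
  in
  let state, stop = run Start pos in
  match accept state (String.sub s pos (stop - pos)) with
  | Some tok -> (tok, stop)
  | None -> failwith "invalid syntax"

(* Recursion keeps pulling out tokens until the input is exhausted.
   Skipping spaces is an assumption of this sketch. *)
let rec tokens s pos =
  if pos >= String.length s then []
  else if s.[pos] = ' ' then tokens s (pos + 1)
  else
    let tok, rest = next_token s pos in
    tok :: tokens s rest
```

Under these assumptions, tokens "asd 123 jkl 3.14" 0 produces the token list from the earlier example, while an input such as "12." raises Failure because the DFA stops in the non-accepting AfterDot state.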
Lexing Options
We could write a lexer by writing regular expressions, and then
translating these by hand into a DFA. That sounds tedious and
repetitive – perfect for a computer! Can we write a program that
takes regular expressions and generates automata for us?
Someone already did – Lex!
OCaml version of this is ocamllex
Getting Started Lexer Input Regular Expressions Example 1 Example 2 Scanning Comments
Mechanics of Using ocamllex
Lexer definitions using ocamllex are written in a file with a
.mll extension. The file includes the regular expressions in a
table, with associated actions for each.
OCaml code for the lexer is generated with
ocamllex file.mll
This generates the code for the lexer in file file.ml
Sample Lexer
rule main = parse
  | ['0'-'9']+ '.' ['0'-'9']+ { print_string "Float\n" }
  | ['0'-'9']+                { print_string "Int\n" }
  | ['a'-'z']+                { print_string "String\n" }
  | _                         { main lexbuf }

{
  (* trailer: ordinary OCaml code that runs the lexer on standard input *)
  let newlexbuf = (Lexing.from_channel stdin) in
  print_string "Ready to lex.\n";
  main newlexbuf
}
ocamllex Input
header and footer contain arbitrary OCaml code to insert into
generated .ml file
shorthands for regular expressions can be introduced with
let ident = regexp
multiple entry points turn into multiple functions in the .ml
file, with the given arguments and an additional argument for
the lexing buffer
Regular Expressions in ocamllex
The regular expression format is similar to what we’ve seen so far,
but still slightly different.
Single quoted characters for letters: 'a'
Underscore matches any character: _
End-of-file marker: eof
Concatenation of most expressions same as before
Concatenation of character sequence shown as a string: "while"
Choice: instead of e1 + e2, it is e1 | e2
For more information...
The page for the ocamllex tool is at
http://caml.inria.fr/pub/docs/manual-ocaml/manual026.html
Example
{
  type result = Int of int
              | Float of float
              | String of string
}

let digit = ['0'-'9']
let digits = digit+
let lower_case = ['a'-'z']
let upper_case = ['A'-'Z']
let letter = upper_case | lower_case
let letters = letter+
Example
# #use "test.ml";;
val main : Lexing.lexbuf -> result = <fun>
val __ocaml_lex_main_rec :
  Lexing.lexbuf -> int -> result = <fun>
Ready to lex.
hi there 234 5.
- : result = String "hi"
What happened to the rest?
What went wrong?
How do we get the lexer to look at more than one token?
The action has to tell it to look for more – we need recursion
Side benefit – we can add “state” into the calls, since we can
pass information as parameters
Note: we are already doing this with the _ case
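One way the recursion and the extra "state" parameter could look in an ocamllex file is sketched below. This is an illustrative guess, not the lecture's own solution: the parameter name acc, the choice to accumulate tokens in a list, and the eof rule are all assumptions. The file is ocamllex input (a .mll file), so it has to be run through ocamllex before it can be compiled as OCaml.

```ocaml
(* Sketch of an ocamllex specification; acc and the eof rule are assumptions. *)
{
  type result = Int of int
              | Float of float
              | String of string
}

let digits = ['0'-'9']+
let letters = ['a'-'z' 'A'-'Z']+

(* acc is the state: the tokens found so far, most recent first.
   Each action calls main again so lexing continues past one token. *)
rule main acc = parse
  | digits '.' digits as f { main (Float (float_of_string f) :: acc) lexbuf }
  | digits as n            { main (Int (int_of_string n) :: acc) lexbuf }
  | letters as s           { main (String s :: acc) lexbuf }
  | eof                    { List.rev acc }
  | _                      { main acc lexbuf }
```

Because main now takes an argument, the generated function takes that argument followed by the lexing buffer, so a call would look like main [] (Lexing.from_string "hi there 234"). With this design the lexer should return the whole list of tokens rather than stopping after the first one.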