CS421 Lecture 8: Lexing and ocamllex

A set of lecture notes from the University of Illinois at Urbana-Champaign's CS421 course on lexing and the ocamllex tool. The notes give an overview of lexing and parsing, describe the strategy of using regular expressions and finite automata to recognize tokens, walk through getting started with ocamllex, and explain regular expressions in ocamllex with examples.

Mark Hills
University of Illinois at Urbana-Champaign
June 16, 2008
Based on slides by Mattox Beckman, as updated by Vikram Adve, Gul Agha, and Elsa Gunter

Outline

1 Overview
2 Lexing
3 ocamllex
4 Activity


Lexing and Parsing

Strings are converted into ASTs in two phases:

Lexing: Convert strings (streams of characters) into lists (or streams) of tokens, representing words in the language.

Parsing: Convert lists of tokens into abstract syntax trees.


Lexing

With lexing, we break sequences of characters into different syntactic categories, called tokens. As an example, we could break this:

asd 123 jkl 3.14

into this:

[String "asd"; Int 123; String "jkl"; Float 3.14]


Lexing: Multiple Tokens

To solve this, we will modify the behavior of the DFA.

If we find a character where there is no transition from the current state, stop processing the string.

If we are in an accepting state, return the token corresponding to what we found, as well as the remainder of the string.

Now use iteration or recursion to keep pulling out more tokens.

If we were not in an accepting state, fail: invalid syntax.
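
The steps above can be sketched in plain OCaml. This is only an illustration under stated assumptions, not the lecture's own code: the state names, the `step` and `accept` helpers, and the choice to skip blanks between tokens are all made up for the example. The token constructors follow the earlier `asd 123 jkl 3.14` example.

```ocaml
(* Token type following the example in these notes. *)
type token = Int of int | Float of float | String of string

let is_digit c = '0' <= c && c <= '9'
let is_lower c = 'a' <= c && c <= 'z'

(* States of a small hand-written DFA (names are illustrative). *)
type state = Start | InInt | SawDot | InFloat | InWord

(* Transition function: None means "no transition from this state". *)
let step state c =
  match state with
  | Start when is_digit c -> Some InInt
  | Start when is_lower c -> Some InWord
  | InInt when is_digit c -> Some InInt
  | InInt when c = '.' -> Some SawDot
  | SawDot when is_digit c -> Some InFloat
  | InFloat when is_digit c -> Some InFloat
  | InWord when is_lower c -> Some InWord
  | _ -> None

(* Build the token for an accepting state; None for non-accepting states. *)
let accept state lexeme =
  match state with
  | InInt -> Some (Int (int_of_string lexeme))
  | InFloat -> Some (Float (float_of_string lexeme))
  | InWord -> Some (String lexeme)
  | Start | SawDot -> None

(* Run the DFA from [pos] and stop at the first character with no
   transition. Return the token and the position where the rest begins,
   or fail if the stopping state is not accepting. *)
let next_token s pos =
  let n = String.length s in
  let rec run state i =
    if i >= n then (state, i)
    else
      match step state s.[i] with
      | Some state' -> run state' (i + 1)
      | None -> (state, i)
  in
  let state, stop = run Start pos in
  match accept state (String.sub s pos (stop - pos)) with
  | Some tok -> (tok, stop)
  | None -> failwith "invalid syntax"

(* Recursion keeps pulling out tokens until the input is exhausted.
   Skipping blanks between tokens is an addition of this sketch. *)
let rec tokenize_from s pos =
  if pos >= String.length s then []
  else if s.[pos] = ' ' then tokenize_from s (pos + 1)
  else
    let tok, rest = next_token s pos in
    tok :: tokenize_from s rest

let tokenize s = tokenize_from s 0
```

With these definitions, `tokenize "asd 123 jkl 3.14"` evaluates to `[String "asd"; Int 123; String "jkl"; Float 3.14]`. The sketch follows the simple rule literally, so an input such as `"3."` fails because the DFA stops in a non-accepting state.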


Lexing Options

We could write a lexer by writing regular expressions and then translating them by hand into a DFA. That sounds tedious and repetitive, which makes it perfect for a computer! Can we write a program that takes regular expressions and generates automata for us?

Someone already did: Lex!

The OCaml version of this is ocamllex.


Mechanics of Using ocamllex

Lexer definitions using ocamllex are written in a file with a .mll extension. The file includes the regular expressions in a table, with an associated action for each.

OCaml code for the lexer is generated with

ocamllex file.mll

This generates the code for the lexer in the file file.ml.


Sample Lexer

rule main = parse
  | ['0'-'9']+ '.' ['0'-'9']+ { print_string "Float\n" }
  | ['0'-'9']+                { print_string "Int\n" }
  | ['a'-'z']+                { print_string "String\n" }
  | _                         { main lexbuf }  (* skip any other character and keep lexing *)

{
  (* footer: runs when the generated file is loaded *)
  let newlexbuf = Lexing.from_channel stdin in
  print_string "Ready to lex.\n";
  main newlexbuf
}


ocamllex Input

The header and footer contain arbitrary OCaml code to insert into the generated .ml file.

Shorthands for regular expressions can be introduced with

let ident = regexp

Multiple entry points turn into multiple functions in the .ml file, with the given arguments and an additional argument for the lexing buffer.
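
As a rough illustration of this layout, here is a sketch of a small .mll file. Every name in it (`describe`, `token`, `skip_to`, `stop`) is made up for the example and does not come from the lecture. It has to be passed through ocamllex before the OCaml compiler or toplevel can use it.

```ocaml
{
  (* Header: ordinary OCaml, copied to the top of the generated .ml file. *)
  let describe n = Printf.sprintf "number %d" n
}

(* Shorthand for a regular expression. *)
let digit = ['0'-'9']

(* First entry point: becomes  token : Lexing.lexbuf -> int option  *)
rule token = parse
  | digit+ as n { Some (int_of_string n) }
  | eof         { None }
  | _           { token lexbuf }

(* Second entry point with one extra argument:
   becomes  skip_to : char -> Lexing.lexbuf -> unit  *)
and skip_to stop = parse
  | _ as c { if c = stop then () else skip_to stop lexbuf }
  | eof    { () }

{
  (* Footer: ordinary OCaml, copied to the end of the generated .ml file. *)
  let () =
    match token (Lexing.from_string "x7") with
    | Some n -> print_endline (describe n)
    | None -> print_endline "no number found"
}
```

Note how the extra argument `stop` comes before the lexing buffer in the generated function, which is why the recursive call is written `skip_to stop lexbuf`.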


Regular Expressions in ocamllex

The regular expression format is similar to what we've seen so far, but still slightly different.

Single-quoted characters for letters: 'a'

Underscore matches any single character: _

End-of-file marker: eof

Concatenation of most expressions is the same as before

Concatenation of a character sequence is written as a string: "while"

Choice: instead of e1 + e2, it is e1 | e2


For more information...

The page for the ocamllex tool is at

http://caml.inria.fr/pub/docs/manual-ocaml/manual026.html


Example

{
  type result = Int of int
              | Float of float
              | String of string
}

let digit = ['0'-'9']
let digits = digit +
let lower_case = ['a'-'z']
let upper_case = ['A'-'Z']
let letter = upper_case | lower_case
let letters = letter +


Example

# #use "test.ml";;
val main : Lexing.lexbuf -> result = <fun>
val __ocaml_lex_main_rec :
  Lexing.lexbuf -> int -> result = <fun>
Ready to lex.
hi there 234 5.
- : result = String "hi"

What happened to the rest?


What went wrong?

How do we get the lexer to look at more than one token?

The action has to tell it to look for more: we need recursion.

Side benefit: we can add "state" to the calls, since we can pass information as parameters.

Note: we are already doing this with the case
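
One way the recursive version might look is sketched below as an .mll file. It reuses the `result` type from the example above; the entry point name `tokens` and its accumulator parameter `acc` are assumptions made for this illustration, not the lecture's solution. Each action calls the entry point again on `lexbuf`, and `acc` is the "state" carried along as a parameter.

```ocaml
{
  type result = Int of int | Float of float | String of string
}

let digit = ['0'-'9']
let digits = digit +
let letter = ['a'-'z' 'A'-'Z']
let letters = letter +

(* acc holds the tokens seen so far, newest first. *)
rule tokens acc = parse
  | (digits '.' digits) as f { tokens (Float (float_of_string f) :: acc) lexbuf }
  | digits as n              { tokens (Int (int_of_string n) :: acc) lexbuf }
  | letters as s             { tokens (String s :: acc) lexbuf }
  | eof                      { List.rev acc }      (* done: return tokens in input order *)
  | _                        { tokens acc lexbuf } (* skip anything else, e.g. blanks *)
```

After running ocamllex on this file, a call such as `tokens [] (Lexing.from_string "hi there 234")` should walk the whole input and yield `[String "hi"; String "there"; Int 234]` rather than stopping after the first token.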