CS421 Lecture 12: Lexing and ocamllex - Prof. Elsa Gunter, Papers of Computer Science

A set of lecture notes from the University of Illinois at Urbana-Champaign's CS421 course, covering lexing and the use of ocamllex for lexical analysis. The notes discuss how strings of characters are turned into abstract syntax trees through lexing and parsing, and how regular expressions and finite automata are used to recognize tokens.

Uploaded on 03/16/2009

Outline
Overview
Lexing
ocamllex
Activity
CS421 Lecture 12: Lexing and ocamllex¹
Mark Hills
University of Illinois at Urbana-Champaign
June 27, 2006
¹ Based on slides by Mattox Beckman, as updated by Vikram Adve, Gul Agha, and Elsa Gunter
Mark Hills CS421 Lecture 12: Lexing and ocamllex

Overview

Lexing

ocamllex

Activity

Lexing and Parsing

Strings are converted into ASTs in two phases:

◮ Lexing: convert strings (streams of characters) into lists (or streams) of tokens, representing the words of the language

◮ Parsing: convert lists of tokens into abstract syntax trees

Overview Strategy Options

Lexing

With lexing, we break sequences of characters into different syntactic categories, called tokens. As an example, we could break this:

asd 123 jkl 3.14

into this:

[String "asd"; Int 123; String "jkl"; Float 3.14]
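To make the example concrete, here is a minimal hand-written sketch in OCaml. The token constructors follow the slide; the helper names (is_letter, is_digit, span, lex) are hypothetical, and the sketch assumes tokens are separated by spaces. It only illustrates the input and output of lexing, not how ocamllex works internally.

```ocaml
(* Tokens for the example above; constructor names follow the slide. *)
type token =
  | String of string
  | Int of int
  | Float of float

let is_letter c = ('a' <= c && c <= 'z') || ('A' <= c && c <= 'Z')
let is_digit c = '0' <= c && c <= '9'

(* Number of consecutive characters satisfying p, starting at index i. *)
let span p s i =
  let n = String.length s in
  let rec go j = if j < n && p s.[j] then go (j + 1) else j in
  go i - i

(* Split s into tokens, skipping spaces; fails on any other character. *)
let lex (s : string) : token list =
  let n = String.length s in
  let rec loop i acc =
    if i >= n then List.rev acc
    else if s.[i] = ' ' then loop (i + 1) acc
    else if is_letter s.[i] then
      let len = span is_letter s i in
      loop (i + len) (String (String.sub s i len) :: acc)
    else if is_digit s.[i] then
      let int_len = span is_digit s i in
      let j = i + int_len in
      (* A float needs digits, a '.', and at least one more digit. *)
      if j + 1 < n && s.[j] = '.' && is_digit s.[j + 1] then
        let frac_len = span is_digit s (j + 1) in
        let text = String.sub s i (int_len + 1 + frac_len) in
        loop (i + int_len + 1 + frac_len) (Float (float_of_string text) :: acc)
      else loop j (Int (int_of_string (String.sub s i int_len)) :: acc)
    else failwith (Printf.sprintf "unexpected character %C" s.[i])
  in
  loop 0 []
```

With this sketch, lex "asd 123 jkl 3.14" evaluates to the list shown on the slide.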

Lexing: Multiple Tokens

A DFA on its own only accepts or rejects a whole string. To pull several tokens out of one string, we modify the behavior of the DFA:

◮ if we reach a character for which there is no transition from the current state, stop processing the string

◮ if we are in an accepting state, return the token corresponding to what we found, together with the remainder of the string

◮ then use iteration or recursion to keep pulling out more tokens

◮ if we were not in an accepting state, fail: invalid syntax
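The strategy above can be sketched in OCaml with a small hypothetical DFA (not the one in the slide's figure) that recognizes lowercase identifiers and unsigned integers. The state and token types and the functions step, accept, next_token, and tokenize are illustrative names. Like the slide, the sketch stops at the first character with no transition and does not back up to an earlier accepting state; ocamllex, by contrast, remembers the last accepting position so it can return the longest match.

```ocaml
(* States of a small DFA: Start, InIdent ([a-z]+), InNumber ([0-9]+). *)
type state = Start | InIdent | InNumber

type token = Ident of string | Number of int

(* Transition function; None means there is no transition from this state. *)
let step (st : state) (c : char) : state option =
  match st, c with
  | (Start | InIdent), 'a' .. 'z' -> Some InIdent
  | (Start | InNumber), '0' .. '9' -> Some InNumber
  | _ -> None

(* Token for the text matched when stopping in st; None if st is not accepting. *)
let accept (st : state) (text : string) : token option =
  match st with
  | InIdent -> Some (Ident text)
  | InNumber -> Some (Number (int_of_string text))
  | Start -> None

(* Run the DFA from index i until no transition exists or the input ends.
   Returns the token and the index where the remainder of the string starts. *)
let next_token (s : string) (i : int) : token * int =
  let n = String.length s in
  let rec run st j =
    if j >= n then (st, j)
    else
      match step st s.[j] with
      | Some st' -> run st' (j + 1)
      | None -> (st, j)
  in
  let st, j = run Start i in
  match accept st (String.sub s i (j - i)) with
  | Some tok -> (tok, j)
  | None -> failwith (Printf.sprintf "invalid syntax at position %d" i)

(* Keep pulling tokens out of the remainder until the input is exhausted. *)
let tokenize (s : string) : token list =
  let rec loop i acc =
    if i >= String.length s then List.rev acc
    else
      let tok, j = next_token s i in
      loop j (tok :: acc)
  in
  loop 0 []
```

For example, tokenize "abc123def" yields [Ident "abc"; Number 123; Ident "def"], while tokenize "ab!" fails because '!' has no transition from Start.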

Example

[Figure: DFA with states s0, s1, s2, s3; transitions labeled a-z]

How does it work?

We need a few core items to get this working:

◮ Some way to identify the input string: we'll call this the lexing buffer

◮ A set of regular expressions that correspond to tokens in our language

◮ A corresponding set of actions to take when tokens are matched

The lexer can then use the regular expressions to build state machines, which are then used to process the lexing buffer. If we reach an accept state and can take no further transitions, we can apply the actions.
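A hedged sketch of these three pieces in plain OCaml: a hypothetical lexbuf record stands in for the lexing buffer, and each rule pairs a matcher (standing in for a compiled regular expression) with an action. The selection policy mimics the one documented for ocamllex: the longest match wins, and on a tie the rule listed first wins. All names here are illustrative and are not ocamllex's generated API.

```ocaml
(* Hypothetical lexing buffer: the input plus the current position. *)
type lexbuf = { text : string; mutable pos : int }

(* A rule pairs a matcher (length of the longest match at a position, if any)
   with an action applied to the matched text. *)
type 'a rule = { matcher : string -> int -> int option; action : string -> 'a }

(* Matcher for one or more characters satisfying p. *)
let plus p s i =
  let n = String.length s in
  let rec go j = if j < n && p s.[j] then go (j + 1) else j in
  let len = go i - i in
  if len > 0 then Some len else None

(* Apply the rule with the longest match; on a tie the earlier rule wins.
   Returns None at end of input. *)
let next (rules : 'a rule list) (buf : lexbuf) : 'a option =
  if buf.pos >= String.length buf.text then None
  else begin
    let best =
      List.fold_left
        (fun best r ->
          match r.matcher buf.text buf.pos, best with
          | Some len, Some (blen, _) when len <= blen -> best
          | Some len, _ -> Some (len, r)
          | None, _ -> best)
        None rules
    in
    match best with
    | None -> failwith "no rule matches"
    | Some (len, r) ->
        let lexeme = String.sub buf.text buf.pos len in
        buf.pos <- buf.pos + len;
        Some (r.action lexeme)
  end

type token = Word of string | Num of int | Space

let is_digit c = '0' <= c && c <= '9'
let is_lower c = 'a' <= c && c <= 'z'

(* Rule order matters only for ties: "123" matches both of the first two rules. *)
let rules = [
  { matcher = plus is_digit; action = (fun s -> Num (int_of_string s)) };
  { matcher = plus (fun c -> is_lower c || is_digit c); action = (fun s -> Word s) };
  { matcher = plus (fun c -> c = ' '); action = (fun _ -> Space) };
]

(* Drive the lexer over a whole string, dropping Space tokens. *)
let tokenize s =
  let buf = { text = s; pos = 0 } in
  let rec loop acc =
    match next rules buf with
    | None -> List.rev acc
    | Some Space -> loop acc
    | Some tok -> loop (tok :: acc)
  in
  loop []
```

With these rules, tokenize "123 a1b 12ab" gives [Num 123; Word "a1b"; Word "12ab"]: "123" ties between the first two rules and goes to Num, while "12ab" is a longer match under the second rule.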

Getting Started Lexer Input Regular Expressions Example 1 Example 2 Scanning Comments

Mechanics of Using ocamllex

◮ Lexer definitions using ocamllex are written in a file with a .mll extension. The file contains a table of regular expressions, each with an associated action.

◮ OCaml code for the lexer is generated with

    ocamllex file.mll

◮ This generates the code for the lexer in the file file.ml

General Lexer Format

{ header }
let ident = regexp ...
rule entrypoint [arg1 ... argn] = parse
  | regexp { action }
  | ...
  | regexp { action }
and entrypoint [arg1 ... argn] = parse
  ...
and ...
{ trailer }

ocamllex Input

◮ header and trailer contain arbitrary OCaml code, which is copied to the beginning and the end of the generated .ml file, respectively

◮ shorthands for regular expressions can be introduced with

    let ident = regexp

◮ multiple entry points turn into multiple functions in the .ml file, with the given arguments and an additional argument for the lexing buffer

Regular Expressions, cont.

◮ Character ranges: pick any character in the range, based on character codes: ['c1' - 'c2']

◮ Negative character ranges: any character not in the range: [^ 'c1' - 'c2']

◮ e* has the same meaning as we've already seen

◮ e+ means one or more, same as e e*

◮ e? means zero or one, same as e | ε

◮ e1 # e2 means the characters in e1 but not in e2 (both must be character sets)

◮ ident: shorthand for an earlier definition of a regular expression using let

◮ e1 as id: binds the matched string to id
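The identities above (e+ as e e*, e? as e | ε, and # as set difference) can be checked with a small backtracking matcher. This is a hypothetical teaching sketch, not how ocamllex is implemented (ocamllex compiles rules into an automaton); the type re and the functions range, not_range, plus, opt, diff, match_at, and matches are illustrative names.

```ocaml
(* Core regular expressions over single characters. *)
type re =
  | Eps                      (* the empty string, epsilon on the slide *)
  | Chars of (char -> bool)  (* any one character in a set *)
  | Seq of re * re           (* e1 e2 *)
  | Alt of re * re           (* e1 | e2 *)
  | Star of re               (* e* *)

(* Derived forms, defined from the core ones as on the slide. *)
let range c1 c2 = Chars (fun c -> c1 <= c && c <= c2)    (* [c1-c2] *)
let not_range c1 c2 = Chars (fun c -> c < c1 || c > c2)  (* [^c1-c2] *)
let plus e = Seq (e, Star e)                              (* e+ = e e* *)
let opt e = Alt (e, Eps)                                  (* e? = e | eps *)

(* e1 # e2: characters in set s1 but not in set s2 (both are character sets). *)
let diff s1 s2 = Chars (fun c -> s1 c && not (s2 c))

(* Backtracking matcher: try to match e at index i of s, then pass the end
   index to the continuation k. *)
let rec match_at e s i k =
  match e with
  | Eps -> k i
  | Chars p -> i < String.length s && p s.[i] && k (i + 1)
  | Seq (e1, e2) -> match_at e1 s i (fun j -> match_at e2 s j k)
  | Alt (e1, e2) -> match_at e1 s i k || match_at e2 s i k
  | Star e1 ->
      (* Require progress (j > i) so an e1 that matches the empty string cannot loop. *)
      k i || match_at e1 s i (fun j -> j > i && match_at (Star e1) s j k)

(* Does e match the whole of s? *)
let matches e s = match_at e s 0 (fun j -> j = String.length s)
```

For instance, plus (range '0' '9') matches "2006" but not the empty string, and opt (range 'a' 'z') matches the empty string.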

For more information...

The page for the ocamllex tool is at

http://caml.inria.fr/pub/docs/manual-ocaml/manual026.html

Example

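{ type result = Int of int | Float of float | String of string }
(* digits and letters are assumed definitions; the slide omits them *)
let digits = ['0'-'9']+
let letters = ['a'-'z' 'A'-'Z']+
rule main = parse
    digits '.' digits as f { Float (float_of_string f) }
  | digits as n { Int (int_of_string n) }
  | letters as s { String s }
  | _ { main lexbuf }
{ let newlexbuf = Lexing.from_channel stdin in
  print_string "Ready to lex.";
  print_newline ();
  main newlexbuf }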

Example

# #use "test.ml";;
...
val main : Lexing.lexbuf -> result = <fun>
val __ocaml_lex_main_rec :
  Lexing.lexbuf -> int -> result = <fun>
Ready to lex.
hi there 234 5.
- : result = String "hi"

What happened to the rest?
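Each call to main returns as soon as an action produces a value. Only the _ rule calls main lexbuf again, so one call yields a single token and the rest of the input stays in the buffer. One common way to get all the tokens is to call the entry point repeatedly from a driver. The sketch below assumes the lexer is extended with a rule such as | eof { raise End_of_file }, which the slide's lexer does not have; collect_tokens is a hypothetical name.

```ocaml
(* Call an ocamllex-style entry point repeatedly, collecting one token per call.
   Assumes the entry point raises End_of_file at the end of input, for example
   via an added rule  | eof { raise End_of_file }  (not on the original slide). *)
let collect_tokens (next : Lexing.lexbuf -> 'tok) (lexbuf : Lexing.lexbuf) : 'tok list =
  let rec loop acc =
    match next lexbuf with
    | tok -> loop (tok :: acc)            (* tail call: outside the handler *)
    | exception End_of_file -> List.rev acc
  in
  loop []
```

For the slide's lexer this would be called as collect_tokens main (Lexing.from_channel stdin). An alternative design is to have the actions build the list themselves, e.g. { String s :: main lexbuf } together with an eof rule returning [], which changes the type of main to return a result list.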