Lexical Analysis
Lecture 3
Outline
- Informal sketch of lexical analysis – Identifies tokens in input string
- Issues in lexical analysis – Lookahead
- Specifying lexers – Regular expressions
- Examples of regular expressions
What’s a Token?
- A syntactic category – In English: noun, verb, adjective, …
- In a programming language: Identifier, Integer, Keyword, Whitespace, …
Tokens
- Tokens correspond to sets of strings.
- Identifier: strings of letters or digits, starting with a letter
- Integer: a non-empty string of digits
- Keyword: “else” or “if” or “begin” or …
- Whitespace: a non-empty sequence of blanks, newlines, and tabs
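Each token class above can be written as a pattern that denotes its set of strings. The following is a minimal sketch, assuming Python's `re` module; the names `TOKEN_PATTERNS` and `classes_of` are hypothetical, and the keyword list covers only the three keywords named on the slide.

```python
import re

# Hypothetical patterns, one per token class; each denotes a set of strings.
TOKEN_PATTERNS = {
    "Identifier": r"[A-Za-z][A-Za-z0-9]*",  # letters or digits, starting with a letter
    "Integer": r"[0-9]+",                   # non-empty string of digits
    "Keyword": r"else|if|begin",            # only the keywords listed on the slide
    "Whitespace": r"[ \n\t]+",              # non-empty run of blanks, newlines, tabs
}

def classes_of(s):
    """Return every token class whose set of strings contains s."""
    return [name for name, pattern in TOKEN_PATTERNS.items()
            if re.fullmatch(pattern, s)]
```

Note that `classes_of("if")` reports both Identifier and Keyword: the sets overlap, so a lexer built this way needs a rule for choosing between them.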
Designing a Lexical Analyzer: Step 1
- Define a finite set of tokens
- Tokens describe all items of interest
- Choice of tokens depends on language, design of parser
Example
- Recall \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
- Useful tokens for this expression: Integer, Keyword, Relation, Identifier, Whitespace, (, ), =, ;
- N.B., (, ), =, ; are tokens, not characters, here
Lexical Analyzer: Implementation
- An implementation must do two things:
- Recognize substrings corresponding to tokens
- Return the value or lexeme of the token – The lexeme is the substring
Example
- Recall: \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
- Token-lexeme groupings:
- Identifier: i, j, z
- Keyword: if, else
- Relation: ==
- Integer: 0, 1
- (, ), =, ;: the single character of the same name
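The groupings above can be reproduced with a small rule-based lexer. This is a sketch under stated assumptions, not the course's implementation: `TOKEN_SPEC`, `COMPILED`, and `tokenize` are hypothetical names, rules are tried in list order, and Keyword is placed before Identifier so that `if` and `else` are not reported as identifiers.

```python
import re

# Hypothetical rule list; the first rule that matches at the current position wins.
TOKEN_SPEC = [
    ("Whitespace", r"[ \n\t]+"),
    ("Keyword", r"(?:if|else)\b"),          # \b stops "ifx" from matching as a keyword
    ("Identifier", r"[A-Za-z][A-Za-z0-9]*"),
    ("Integer", r"[0-9]+"),
    ("Relation", r"=="),                    # must come before "=" below
    ("(", r"\("),
    (")", r"\)"),
    ("=", r"="),
    (";", r";"),
]
COMPILED = [(name, re.compile(pattern)) for name, pattern in TOKEN_SPEC]

def tokenize(text):
    """Partition text into (token, lexeme) pairs, reading left to right."""
    tokens = []
    pos = 0
    while pos < len(text):
        for name, pattern in COMPILED:
            match = pattern.match(text, pos)
            if match:
                tokens.append((name, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
    return tokens

# The slide's example input.
SOURCE = "\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;"
```

Calling `tokenize(SOURCE)` returns every token including Whitespace; concatenating the lexemes gives back the input, which is what "partitioning the string" means here.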
True Crimes of Lexical Analysis
- Is it as easy as it sounds?
- Not quite!
- Look at some history...
Lexical Analysis in FORTRAN
- FORTRAN rule: Whitespace is insignificant
- E.g., VAR1 is the same as VA R1
- A terrible design!
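One way to see the consequence of this rule: if blanks carry no meaning, a lexer cannot use them to find token boundaries. The sketch below is a deliberate simplification (it ignores string literals and FORTRAN's column layout), and `fortran_normalize` and `first_identifier` are hypothetical helper names.

```python
import re

def fortran_normalize(line):
    """Drop blanks, which the rule treats as insignificant (simplified)."""
    return line.replace(" ", "")

def first_identifier(line):
    """Return the identifier at the start of line after blanks are dropped."""
    match = re.match(r"[A-Z][A-Z0-9]*", fortran_normalize(line))
    return match.group() if match else None
```

Under this model `first_identifier("VAR1")` and `first_identifier("VA R1")` return the same lexeme, so spacing tells the lexer nothing about where a token ends.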
Lexical Analysis in FORTRAN (Cont.)
- Two important points:
1. The goal is to partition the string. This is implemented by reading left-to-right, recognizing one token at a time
2. “Lookahead” may be required to decide where one token ends and the next token begins
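Both points can be seen in a hand-written scanner. The sketch below is illustrative only: `scan` is a hypothetical function, the "Word" class lumps identifiers, keywords, and integers together for brevity, and the single character of lookahead is used to decide whether `=` ends a token or begins `==`.

```python
def scan(text):
    """Partition text left to right, peeking one character after '='."""
    tokens = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                      # skip whitespace between tokens
        elif ch == "=":
            # Lookahead: where this token ends depends on the next character.
            if i + 1 < len(text) and text[i + 1] == "=":
                tokens.append(("Relation", "=="))
                i += 2
            else:
                tokens.append(("=", "="))
                i += 1
        elif ch.isalnum():
            j = i
            while j < len(text) and text[j].isalnum():
                j += 1                  # extend the token while characters fit
            tokens.append(("Word", text[i:j]))
            i = j
        else:
            tokens.append((ch, ch))     # single-character token named after itself
            i += 1
    return tokens
```

For example, `scan("i == j")` yields one Relation token, while `scan("z = 0;")` yields the single-character `=` token; the scanner reads the same first character in both cases.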
Lookahead
- Even our simple example has lookahead issues – i vs. if
- Footnote: FORTRAN whitespace rule motivated by inaccuracy of punch card operators
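The i vs. if issue is commonly handled by reading the longest possible run of letters and digits and only then deciding the token class. A minimal sketch, assuming a hypothetical `KEYWORDS` set and helper `scan_word`, and assuming the character at `start` is a letter:

```python
# Hypothetical keyword set; a real language would list all of its keywords.
KEYWORDS = {"if", "else", "begin"}

def scan_word(text, start):
    """Scan the longest run of letters/digits from start, then classify it.

    Returns (token, lexeme, end), where end is the index just past the lexeme.
    """
    end = start
    while end < len(text) and text[end].isalnum():
        end += 1                        # keep looking ahead while the word continues
    lexeme = text[start:end]
    token = "Keyword" if lexeme in KEYWORDS else "Identifier"
    return token, lexeme, end
```

After reading `i`, the scanner cannot commit to Identifier until it has seen the next character: `scan_word("i == j", 0)` gives the identifier `i`, `scan_word("if (i", 0)` gives the keyword `if`, and `scan_word("iffy = 1", 0)` gives the identifier `iffy`.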
Lexical Analysis in PL/I (Cont.)
- PL/I Declarations: DECLARE (ARG1, ..., ARGN)
- Can’t tell whether DECLARE is a keyword or array reference until after the ).
- Requires arbitrary lookahead!
- More on PL/I’s quirks later in the course...
Lexical Analysis in C++
- Unfortunately, the problems continue today
- C++ template syntax: Foo<Bar>
- C++ stream syntax: cin >> var;
- But there is a conflict with nested templates: Foo<Bar<Bazz>>
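The conflict comes from longest-match lexing: two adjacent `>` characters are read as the single stream operator `>>`. The sketch below is a toy lexer, not a C++ front end; `TOKEN_RE` and `lex` are hypothetical names, and the nested-template spelling `Foo<Bar<Bazz>>` is used purely as an illustrative input.

```python
import re

# ">>" and "<<" are listed before the single-character class so the longer match wins.
TOKEN_RE = re.compile(r"\s+|[A-Za-z_][A-Za-z0-9_]*|>>|<<|[<>;]")

def lex(text):
    """Return the non-whitespace lexemes of text, preferring longer operators."""
    lexemes = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if not match.group().isspace():
            lexemes.append(match.group())
        pos = match.end()
    return lexemes
```

Here `lex("cin >> var;")` produces `>>` as intended, but `lex("Foo<Bar<Bazz>>")` also ends in a single `>>` lexeme rather than two closing brackets. Writing a space between the brackets, as in `Foo<Bar<Bazz> >`, makes this lexer emit two separate `>` lexemes.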