Lexical Analysis - Compilers - Slides | ECS 142, Study notes of Computer Science

Material Type: Notes; Class: Compilers; Subject: Engineering Computer Science; University: University of California - Davis; Term: Unknown 1989;

Uploaded on 07/30/2009

Prof. Su ECS 142 Lecture 3 1

Lexical Analysis

Lecture 3

Outline

  • Informal sketch of lexical analysis
    • Identifies tokens in input string
  • Issues in lexical analysis
    • Lookahead
    • Ambiguities
  • Specifying lexers
    • Regular expressions
    • Examples of regular expressions

What’s a Token?

  • A syntactic category
    • In English: noun, verb, adjective, …
    • In a programming language: Identifier, Integer, Keyword, Whitespace, …

Tokens

  • Tokens correspond to sets of strings.
  • Identifier: strings of letters or digits, starting with a letter
  • Integer: a non-empty string of digits
  • Keyword: “else” or “if” or “begin” or …
  • Whitespace: a non-empty sequence of blanks, newlines, and tabs
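
Since each token corresponds to a set of strings, one way to make the definitions above concrete is to write each set as a regular expression. The following is a minimal sketch in Python, not part of the original slides; the dictionary `TOKEN_CLASSES` and the helper `classes_of` are hypothetical names, and the keyword list is cut off at the three keywords the slide names.

```python
import re

# Hypothetical token classes, each written as a Python regular expression.
# The keyword list is truncated to the three keywords named on the slide.
TOKEN_CLASSES = {
    "Keyword": r"else|if|begin",
    "Identifier": r"[A-Za-z][A-Za-z0-9]*",  # letters or digits, starting with a letter
    "Integer": r"[0-9]+",                   # non-empty string of digits
    "Whitespace": r"[ \n\t]+",              # non-empty run of blanks, newlines, tabs
}

def classes_of(s):
    """Return every token class whose set of strings contains s."""
    return [name for name, pattern in TOKEN_CLASSES.items()
            if re.fullmatch(pattern, s)]

print(classes_of("x1"))  # ['Identifier']
print(classes_of("42"))  # ['Integer']
print(classes_of("if"))  # ['Keyword', 'Identifier']
```

The last call shows that the sets can overlap: "if" is in both the Keyword and the Identifier set, so a lexer needs some rule for resolving the ambiguity.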

Designing a Lexical Analyzer: Step 1

  • Define a finite set of tokens
    • Tokens describe all items of interest
    • Choice of tokens depends on the language and on the design of the parser

Example

  • Recall \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
  • Useful tokens for this expression: Integer, Keyword, Relation, Identifier, Whitespace, (, ), =, ;
  • N.B., (, ), =, ; are tokens, not characters, here

Lexical Analyzer: Implementation

  • An implementation must do two things:
    1. Recognize substrings corresponding to tokens
    2. Return the value or lexeme of the token
       • The lexeme is the substring

Example

  • Recall: \tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;
  • Token-lexeme groupings:
    • Identifier: i, j, z
    • Keyword: if, else
    • Relation: ==
    • Integer: 0, 1
    • (, ), =, ; single character of the same name
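
The groupings above can be reproduced with a small tokenizer. This is a sketch under stated assumptions, not the implementation used in the course: `TOKEN_SPEC` and `tokenize` are hypothetical names, the patterns are illustrative, and ambiguity is resolved by simple priority (the first pattern in the list that matches wins).

```python
import re

# Assumed token specification: class names follow the slide, patterns are illustrative.
# Order matters: the first pattern that matches at the current position wins.
TOKEN_SPEC = [
    ("Whitespace", re.compile(r"[ \t\n]+")),
    ("Keyword",    re.compile(r"(?:if|else)\b")),  # \b keeps "iffy" from matching
    ("Identifier", re.compile(r"[A-Za-z][A-Za-z0-9]*")),
    ("Integer",    re.compile(r"[0-9]+")),
    ("Relation",   re.compile(r"==")),             # listed before "=" on purpose
    ("(",          re.compile(r"\(")),
    (")",          re.compile(r"\)")),
    ("=",          re.compile(r"=")),
    (";",          re.compile(r";")),
]

def tokenize(text):
    """Return a list of (token, lexeme) pairs covering the whole input."""
    pairs = []
    pos = 0
    while pos < len(text):
        for token, pattern in TOKEN_SPEC:
            match = pattern.match(text, pos)
            if match:
                pairs.append((token, match.group()))
                pos = match.end()
                break
        else:
            raise ValueError(f"no token matches at position {pos}")
    return pairs

SOURCE = "\tif (i == j)\n\t\tz = 0;\n\telse\n\t\tz = 1;"
for token, lexeme in tokenize(SOURCE):
    if token != "Whitespace":  # recognized, then dropped before printing
        print(f"{token}: {lexeme}")
```

Both requirements of an implementation appear here: the loop recognizes the substring for each token, and each returned pair carries the lexeme alongside the token.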

True Crimes of Lexical Analysis

  • Is it as easy as it sounds?
  • Not quite!
  • Look at some history...

Lexical Analysis in FORTRAN

  • FORTRAN rule: Whitespace is insignificant
  • E.g., VAR1 is the same as VA R1
  • A terrible design!
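
To see why the rule is troublesome, consider a lexer that drops blanks before tokenizing. The sketch below is illustrative only; `squeeze` and `looks_like_do_loop` are hypothetical helpers, and the `DO 5 I = 1,25` versus `DO 5 I = 1.25` pair is a classic FORTRAN illustration brought in here as an assumption, not something stated in the text above.

```python
def squeeze(stmt):
    """Drop blanks, as a lexer for a whitespace-insensitive language would."""
    return stmt.replace(" ", "")

def looks_like_do_loop(stmt):
    """Rough guess: a DO loop has a comma somewhere after the '='."""
    s = squeeze(stmt)
    if not s.startswith("DO") or "=" not in s:
        return False
    # The decision depends on a character far to the right of 'DO'.
    return "," in s[s.index("=") + 1:]

print(squeeze("VA R1") == squeeze("VAR1"))  # True
print(looks_like_do_loop("DO 5 I = 1,25"))  # True: loop header
print(looks_like_do_loop("DO 5 I = 1.25"))  # False: assignment to DO5I
```

With blanks removed, the two DO statements differ only in one character near the end, so the lexer cannot classify the leading DO without reading ahead.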

Lexical Analysis in FORTRAN (Cont.)

  • Two important points:
    1. The goal is to partition the string. This is implemented by reading left-to-right, recognizing one token at a time
    2. “Lookahead” may be required to decide where one token ends and the next token begins

Lookahead

  • Even our simple example has lookahead issues
    • i vs. if
    • = vs. ==
  • Footnote: FORTRAN whitespace rule motivated by inaccuracy of punch card operators
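
Both lookahead issues in the simple example can be handled by peeking at the next character before committing to a token. The following hand-written scanner is a minimal sketch; `next_token` and `KEYWORDS` are hypothetical names, and only the two cases from the list above are covered.

```python
KEYWORDS = {"if", "else"}  # assumed keyword set for the running example

def next_token(text, pos):
    """Return (token, lexeme, new_pos) for the token that starts at pos."""
    ch = text[pos]
    if ch == "=":
        # One character of lookahead separates '=' from '=='.
        if pos + 1 < len(text) and text[pos + 1] == "=":
            return ("Relation", "==", pos + 2)
        return ("=", "=", pos + 1)
    if ch.isalpha():
        # Keep reading while the lookahead character can extend the lexeme,
        # and only then decide between Keyword and Identifier.
        end = pos + 1
        while end < len(text) and text[end].isalnum():
            end += 1
        lexeme = text[pos:end]
        token = "Keyword" if lexeme in KEYWORDS else "Identifier"
        return (token, lexeme, end)
    raise ValueError(f"case not covered by this sketch: {ch!r}")

print(next_token("i == j", 0))  # ('Identifier', 'i', 1)
print(next_token("if (i", 0))   # ('Keyword', 'if', 2)
print(next_token("== j", 0))    # ('Relation', '==', 2)
print(next_token("= 0", 0))     # ('=', '=', 1)
```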

Lexical Analysis in PL/I (Cont.)

  • PL/I Declarations: DECLARE (ARG1, ..., ARGN)
  • Can’t tell whether DECLARE is a keyword or array reference until after the ).
    • Requires arbitrary lookahead!
  • More on PL/I’s quirks later in the course...
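
The unbounded lookahead can be sketched as follows. This is an illustration, not real PL/I parsing: it assumes, for the sake of the example, that an `=` right after the closing parenthesis marks an array reference on the left of an assignment, and that anything else marks a declaration. `classify_declare` is a hypothetical helper.

```python
def classify_declare(stmt):
    """Classify the leading DECLARE; return (kind, characters scanned past it)."""
    if not stmt.startswith("DECLARE"):
        raise ValueError("statement must start with DECLARE")
    start = len("DECLARE")
    pos = start
    depth = 0
    # Scan past the whole parenthesized list, however long it is.
    while pos < len(stmt):
        ch = stmt[pos]
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                pos += 1
                break
        pos += 1
    # Skip blanks, then inspect the first character after the ')'.
    while pos < len(stmt) and stmt[pos] == " ":
        pos += 1
    # Assumption for illustration: '=' here means assignment to an array element.
    kind = "array reference" if stmt[pos:pos + 1] == "=" else "keyword"
    return kind, pos - start

print(classify_declare("DECLARE (A, B) FIXED;")[0])  # keyword
print(classify_declare("DECLARE (A, B) = 1;")[0])    # array reference
```

The second element of the result grows with the length of the argument list, which is the sense in which the lookahead is arbitrary: no fixed number of characters is enough.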

Lexical Analysis in C++

  • Unfortunately, the problems continue today
  • C++ template syntax: Foo<Bar>
  • C++ stream syntax: cin >> var;
  • But there is a conflict with nested templates: Foo<Bar<Bazz>>
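
The conflict shows up in any lexer that always takes the longest match. Below is a minimal sketch covering only a handful of C++ tokens; `TOKEN` and `lex` are hypothetical names, and the type names `Foo`, `Bar`, `Bazz` are placeholders.

```python
import re

# '>>' is listed before '>' so that the longer operator wins (longest match).
TOKEN = re.compile(r"\s*(>>|<|>|;|[A-Za-z_][A-Za-z0-9_]*)")

def lex(text):
    """Split text into lexemes, always preferring '>>' over '>'."""
    text = text.rstrip()
    lexemes = []
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"no token matches at position {pos}")
        lexemes.append(match.group(1))
        pos = match.end()
    return lexemes

print(lex("cin >> var;"))      # ['cin', '>>', 'var', ';']
print(lex("Foo<Bar<Bazz>>"))   # ['Foo', '<', 'Bar', '<', 'Bazz', '>>']
print(lex("Foo<Bar<Bazz> >"))  # ['Foo', '<', 'Bar', '<', 'Bazz', '>', '>']
```

In the second call the two closing angle brackets come out as a single stream operator, which is not what the template syntax needs. The third call shows the traditional workaround of separating them with a blank.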