Cunning Plan - Programming Languages - Slides | CS 4610, Study notes of Programming Languages

Material Type: Notes; Professor: Weimer; Class: Programming Languages; Subject: Computer Science; University: University of Virginia; Term: Spring 2008;

Lexical Analysis

Finite Automata

(Part 1 of 2)

Cool Demo?

Cunning Plan

  • Informal Sketch of Lexical Analysis
    • LA identifies tokens from input string
    • lexer : (char list) → (token list)
  • Issues in Lexical Analysis
    • Lookahead
    • Ambiguity
  • Specifying Lexers
    • Regular Expressions
    • Examples

Fold Batter Lightly ...

  • fold_left f a [1;...;n] == f (... (f (f a 1) 2)) n
    • fold_left (fun a e -> e :: a) [] [1;2;3]
      • = [3;2;1]
    • fold_left (fun a e -> a @ [e]) [] [1;2;3]
      • = [1;2;3]
  • fold_right f [1;...;n] b == f 1 (f 2 (... (f n b)))
    • fold_right (fun e a -> e :: a) [1;2;3] []
      • = [1;2;3]
    • fold_right (fun e a -> a @ [e]) [1;2;3] []
      • = [3;2;1]
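The four fold results above can be checked directly. This is a minimal sketch using the standard `List.fold_left` and `List.fold_right`; the binding names (`rev_left`, `same_left`, `same_right`, `rev_right`) are made up for illustration.

```ocaml
(* fold_left hands the function the accumulator first, then the element. *)
let rev_left  = List.fold_left (fun a e -> e :: a) [] [1; 2; 3]
let same_left = List.fold_left (fun a e -> a @ [e]) [] [1; 2; 3]

(* fold_right hands the function the element first, then the accumulator. *)
let same_right = List.fold_right (fun e a -> e :: a) [1; 2; 3] []
let rev_right  = List.fold_right (fun e a -> a @ [e]) [1; 2; 3] []

let () =
  assert (rev_left = [3; 2; 1]);
  assert (same_left = [1; 2; 3]);
  assert (same_right = [1; 2; 3]);
  assert (rev_right = [3; 2; 1])
```

Note the argument order: swapping `e` and `a` in a `fold_right` callback that uses `::` is a type error, since the list and the element would trade places.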

Structure of an Interpreter

[Diagram: Source → Lexical Analysis → List of Tokens → Parsing → Abstract Syntax Tree → Optimization, then either Run It! (Interpreter) or Code Generation → Machine Code (Compiler)]

What's a Token?

  • Output of lexical analysis is a list of tokens
  • A token is a syntactic category
    • In English:
      • noun, verb, adjective, ...
    • In a programming language:
      • Identifier, Integer, Keyword, Whitespace, ...
  • Parser relies on token distinctions:
    • e.g., identifiers are treated differently than keywords

Tokens

  • Tokens correspond to sets of strings.
  • Identifier : strings of letters or digits, starting with a letter
  • Integer : a non-empty string of digits
  • Keyword : “else” or “if” or “begin” or ...
  • Whitespace : a non-empty sequence of blanks, newlines, and/or tabs
  • OpenPar : a left-parenthesis
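Since each token names a set of strings, deciding which token a string belongs to is a membership test. A rough sketch, with some assumptions: the `token` type, the helper names, and `classify` are hypothetical; the keyword list holds only the three words the slide spells out; and checking keywords before identifiers is a choice made here, not something the slide states.

```ocaml
(* Hypothetical token type mirroring the categories listed above. *)
type token = Identifier | Integer | Keyword | Whitespace | OpenPar

let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'
let is_blank c = c = ' ' || c = '\n' || c = '\t'

(* Assumption: only the keywords the slide names; the rest are elided. *)
let keywords = ["else"; "if"; "begin"]

(* True when s is non-empty and every character satisfies p. *)
let non_empty_all p s =
  let n = String.length s in
  let rec go i = i >= n || (p s.[i] && go (i + 1)) in
  n > 0 && go 0

(* Return the token whose set contains s, if any. *)
let classify (s : string) : token option =
  if List.mem s keywords then Some Keyword
  else if s = "(" then Some OpenPar
  else if non_empty_all is_digit s then Some Integer
  else if non_empty_all is_blank s then Some Whitespace
  else if s <> "" && is_letter s.[0]
          && non_empty_all (fun c -> is_letter c || is_digit c) s
  then Some Identifier
  else None
```

For example, `classify "x1"` gives `Some Identifier`, while `classify "9lives"` gives `None` because it starts with a digit but is not all digits.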

Example

  • Recall:
    • if (i == j)\n\tz = 0;\nelse\n\tz = 1;
  • Token-lexeme pairs returned by the lexer:
    • <Keyword, “if”>
    • <Whitespace, “ ”>
    • <OpenPar, “(”>
    • <Identifier, “i”>
    • <Whitespace, “ ”>
    • <Relation, “==”>
    • <Whitespace, “ ”>
    • ...
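A small hand-written lexer can reproduce the token-lexeme pairs listed above. This is only a sketch under stated assumptions, not the course's lexer: `Other` is a hypothetical catch-all for characters whose token names the slide elides, the keyword list is the one from the Tokens slide, and `span`, `lex`, and `example` are illustrative names.

```ocaml
type token =
  | Keyword | Whitespace | OpenPar | Identifier | Integer | Relation
  | Other  (* hypothetical catch-all; the slide elides the remaining names *)

let is_letter c = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
let is_digit c = c >= '0' && c <= '9'
let is_blank c = c = ' ' || c = '\n' || c = '\t'

(* Assumption: only the keywords named on the Tokens slide. *)
let keywords = ["if"; "else"; "begin"]

(* Index just past the longest run of characters satisfying p, from i. *)
let rec span p s i =
  if i < String.length s && p s.[i] then span p s (i + 1) else i

let lex (s : string) : (token * string) list =
  let n = String.length s in
  let rec go i acc =
    if i >= n then List.rev acc
    else begin
      let c = s.[i] in
      (* Pick a token and the index j where its lexeme ends. *)
      let tok, j =
        if is_blank c then (Whitespace, span is_blank s i)
        else if is_letter c then begin
          let j = span (fun x -> is_letter x || is_digit x) s i in
          let word = String.sub s i (j - i) in
          ((if List.mem word keywords then Keyword else Identifier), j)
        end
        else if is_digit c then (Integer, span is_digit s i)
        else if c = '(' then (OpenPar, i + 1)
        else if c = '=' && i + 1 < n && s.[i + 1] = '=' then (Relation, i + 2)
        else (Other, i + 1)
      in
      go j ((tok, String.sub s i (j - i)) :: acc)
    end
  in
  go 0 []

let example = lex "if (i == j)\n\tz = 0;\nelse\n\tz = 1;"
```

The first seven elements of `example` match the pairs on the slide, and concatenating all the lexemes gives back the original input.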

Lexical Analyzer: Implementation

  • The lexer usually discards “uninteresting” tokens that don't contribute to parsing.
  • Examples: Whitespace, Comments
    • Exception: which language cares about whitespace?
  • Question: What happens if we remove all whitespace and comments prior to lexing?
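One way to picture the difference between discarding tokens after lexing and stripping characters before lexing is the sketch below. The `token` type, `drop_uninteresting`, and `strip_blanks` are hypothetical names, and the token list is written out by hand rather than produced by a real lexer.

```ocaml
type token = Keyword | Identifier | Whitespace | Comment

(* After lexing: drop the token classes the parser does not need. *)
let drop_uninteresting toks =
  List.filter (fun (t, _) -> t <> Whitespace && t <> Comment) toks

(* Hand-written lexer output for the fragment "else\n\tz". *)
let lexed = [ (Keyword, "else"); (Whitespace, "\n\t"); (Identifier, "z") ]
let for_parser = drop_uninteresting lexed

(* Before lexing: delete blanks, newlines, and tabs from the raw text. *)
let strip_blanks s =
  String.to_seq s
  |> Seq.filter (fun c -> c <> ' ' && c <> '\n' && c <> '\t')
  |> String.of_seq

let merged = strip_blanks "else\n\tz"
```

Here `for_parser` still holds two separate tokens, whereas `merged` is the single string `elsez`, so the boundary between the keyword and the identifier is gone before any lexer sees it.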

Still Needed

  • A way to describe the lexemes of each token
    • Recall: lexeme = “the substring corresponding to the token”
  • A way to resolve ambiguities
    • Is “if” two variables, i and f?
    • Is “==” two equal signs, “=” “=”?

Languages

  • Definition. Let Σ be a set of characters. A language over Σ is a set of strings of characters drawn from Σ. Σ is called the alphabet.
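To make the definition concrete, a language can be modeled as a membership predicate on strings. This is a sketch only: the alphabet `sigma`, the helper `over_alphabet`, and the example language `even_length` (binary strings of even length) are invented for illustration and do not come from the slides.

```ocaml
(* A hypothetical alphabet with two characters. *)
let sigma = ['0'; '1']

(* True when every character of s is drawn from the alphabet. *)
let over_alphabet alphabet s =
  List.for_all (fun c -> List.mem c alphabet) (List.of_seq (String.to_seq s))

(* A hypothetical language over sigma: strings of even length. *)
let even_length s = over_alphabet sigma s && String.length s mod 2 = 0
```

With these definitions, `even_length "01"` holds, `even_length "0"` does not (odd length), and `even_length "02"` does not (the character `2` is outside the alphabet).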

Notation

  • Languages are sets of strings
  • We need some notation for specifying which sets we want
    • that is, which strings are in the set
  • For lexical analysis we care about regular languages, which can be described using regular expressions.

Regular Expressions

  • Each regular expression is a notation for a regular language (a set of words)
  • You'll see the exact notation in a minute!
  • If A is a regular expression then we write L(A) to refer to the language denoted by A

Compound Regular Expressions

  • Union
    • L(A | B) = { s | s ∈ L(A) or s ∈ L(B) }
  • Examples:
    • L('if' | 'then' | 'else') = { “if”, “then”, “else” }
    • L('0'|'1'|'2'|'3'|'4'|'5'|'6'|'7'|'8'|'9') = what?
  • Fun Example:
    • L( ('0'|'1') ('0'|'1') ) = { “00”, “01”, “10”, “11” }
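For finite languages, these examples can be computed by treating a language as a sorted, duplicate-free list of strings. A minimal sketch: `union` and `concat` are hypothetical helper names, and `concat` models the juxtaposition used in the Fun Example (every string of the first language followed by every string of the second).

```ocaml
(* Finite languages as sorted, duplicate-free string lists. *)
let union a b = List.sort_uniq compare (a @ b)

(* Every string of a followed by every string of b. *)
let concat a b =
  List.sort_uniq compare
    (List.concat (List.map (fun x -> List.map (fun y -> x ^ y) b) a))

let keywords = union ["if"] (union ["then"] ["else"])
let bit = union ["0"] ["1"]
let two_bits = concat bit bit

let () =
  assert (keywords = ["else"; "if"; "then"]);
  assert (two_bits = ["00"; "01"; "10"; "11"])
```

The lists come out in sorted order because `List.sort_uniq` is used to keep the set representation canonical.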

Starz!

  • So far we have only finite languages
  • Iteration: A*
    • L(A*) = { “” } ∪ L(A) ∪ L(AA) ∪ L(AAA) ∪ ...
  • Examples:
    • L('0'*) = { “”, “0”, “00”, “000”, “0000”, ... }
    • L('1''0'*) = { “1”, “10”, “100”, “1000”, ... }
  • Empty: ε
    • L(ε) = { “” }
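Because L(A*) is infinite, it cannot be listed, but membership can still be tested. The sketch below uses a simple backtracking matcher with a continuation; that is one possible technique chosen here for brevity, not the finite-automaton approach this lecture builds toward. The `regex` type and the names `accepts` and `in_language` are hypothetical.

```ocaml
type regex =
  | Eps                     (* ε: matches only the empty string *)
  | Chr of char             (* 'c': matches the one-character string *)
  | Alt of regex * regex    (* A | B *)
  | Seq of regex * regex    (* A B *)
  | Star of regex           (* A* *)

(* accepts r cs k: can r match a prefix of cs so that k accepts the rest? *)
let rec accepts r cs k =
  match r with
  | Eps -> k cs
  | Chr c -> (match cs with x :: rest when x = c -> k rest | _ -> false)
  | Alt (a, b) -> accepts a cs k || accepts b cs k
  | Seq (a, b) -> accepts a cs (fun rest -> accepts b rest k)
  | Star a ->
      (* Zero copies, or one copy that consumes input followed by more. *)
      k cs
      || accepts a cs (fun rest ->
             List.length rest < List.length cs && accepts r rest k)

(* s is in L(r) when r can consume all of s. *)
let in_language r s =
  accepts r (List.of_seq (String.to_seq s)) (fun rest -> rest = [])

let zeros = Star (Chr '0')                      (* '0'* *)
let one_zeros = Seq (Chr '1', Star (Chr '0'))   (* '1''0'* *)
```

For instance, `in_language zeros ""` and `in_language one_zeros "1000"` both hold, while `in_language one_zeros "0"` does not. The length check in the `Star` case keeps the matcher from looping when the starred expression can match the empty string.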