Structured and Semistructured Data, XML Basics/Document Type Descriptors | CMPS 180, Study notes of Database Management Systems (DBMS)

Material Type: Notes; Class: Database Systems I; Subject: Computer Science; University: University of California-Santa Cruz; Term: Unknown 2003;


Uploaded on 09/17/2009


Winter 2003

Today’s Lecture

  • Background: documents (SGML/HTML) and databases (structured and semistructured data)
  • XML Basics and Document Type Descriptors
  • These slides were adapted from slides developed at the University of Pennsylvania (by Peter Buneman and Susan Davidson)

Part I: Background

What’s the difference between documents and databases? 80% of the world’s data does NOT reside in a database!


HTML

  • Lingua franca for publishing hypertext on the World Wide Web
  • Designed to describe how a Web browser should arrange text, images and push-buttons on a page.
  • Easy to learn, but does not convey structure.
  • Fixed tag set.

[HTML example; the slide text is not recoverable from the extraction]


Documents vs Databases

Document world

  • plenty of small documents
  • usually static
  • implicit structure: section, paragraph, toc
  • tagging
  • human friendly
  • content: form/layout, annotation
  • paradigms: “Save as”, wysiwyg
  • meta-data: author name, date, subject

Database world

  • fewer large databases
  • usually dynamic
  • explicit structure (schema)
  • records
  • machine friendly
  • content: schema, data, methods
  • paradigms: Atomicity, Concurrency, Isolation, Durability
  • meta-data: schema description


Personal address book over 20 years

[Four dated address-book entries, the first from 1977 and the last from 1997; the entry text is not recoverable from the extraction]


Swissprot

[Sample Swissprot entry; the slide text is not recoverable from the extraction]


Swissprot (cont’d)

[Continuation of the sample Swissprot entry; the slide text is not recoverable from the extraction]


II. XML Basics and Document Type Descriptors


XML structure

Nesting tags can be used to express various structures.

E.g. a tuple (record):

<person>
  <name>Malcolm Atchison</name>
  <tel>(215) 898 4321</tel>
  <email>mp@dcs.gla.ac.sc</email>
</person>
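A minimal Python sketch of reading such a tuple, assuming the element names person, name, tel and email from the slide's record and the standard-library xml.etree.ElementTree parser. The function and variable names are illustrative and not part of the lecture.

```python
import xml.etree.ElementTree as ET

# The person record from the slide, written out as nested tags.
PERSON_XML = """
<person>
  <name>Malcolm Atchison</name>
  <tel>(215) 898 4321</tel>
  <email>mp@dcs.gla.ac.sc</email>
</person>
"""

def element_to_record(xml_text):
    """Turn a flat element into a dict: child tag -> child text."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

record = element_to_record(PERSON_XML)
print(record["name"], record["tel"], record["email"])
```

A dict only works here because each tag occurs once; a record with repeated tags (two tel elements, say) would need a list of pairs instead.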


XML structure (cont.)

  • We can represent a list by using the same tag repeatedly:

<addresses>
  <person> ... </person>
  <person> ... </person>
  <person> ... </person>
</addresses>
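As a rough illustration, the repeated tag can be read back as a list. The sketch assumes an addresses element wrapping person elements, as on the slide; the names A, B and C are made-up stand-ins for the contents the slide elides.

```python
import xml.etree.ElementTree as ET

# Hypothetical contents: the slide only shows "..." inside each person.
ADDRESSES_XML = """
<addresses>
  <person><name>A</name></person>
  <person><name>B</name></person>
  <person><name>C</name></person>
</addresses>
"""

root = ET.fromstring(ADDRESSES_XML)
people = root.findall("person")               # every direct child tagged person
names = [p.findtext("name") for p in people]  # document order is preserved
print(len(people), names)
```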


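Terminology

The segment of an XML document between an opening tag and its corresponding closing tag is called an element.

<person>
  <name>Malcolm Atchison</name>
  <tel>(215) 898 4321</tel>
  <tel>(215) 898 4321</tel>
  <email>mp@dcs.gla.ac.sc</email>
</person>

Annotations: the whole <person> ... </person> segment is an element; each <name>, <tel> or <email> segment is an element, a sub-element of <person>; a segment that does not run from an opening tag to its corresponding closing tag is not an element.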


XML is tree-like

person
  name: Malcolm Atchison
  tel: (215) 898 4321
  tel: (215) 898 4321
  email: mp@dcs.gla.ac.sc

Semistructured data models typically put the labels on the edges.
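The edge-labelled view can be sketched in Python as follows. This is one possible encoding, not the lecture's definition: a subtree is either a text leaf or a list of (edge label, subtree) pairs, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

def to_edge_labelled(elem):
    """Return the subtree below elem: a text leaf, or a list of (edge label, subtree)."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return [(child.tag, to_edge_labelled(child)) for child in children]

doc = ET.fromstring(
    "<person><name>Malcolm Atchison</name>"
    "<tel>(215) 898 4321</tel><tel>(215) 898 4321</tel>"
    "<email>mp@dcs.gla.ac.sc</email></person>"
)
# The root tag becomes the label of the edge entering the tree.
tree = ("person", to_edge_labelled(doc))
print(tree)
```

Keeping the children in a list rather than a dict preserves both document order and the repeated tel edge.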


Two ways of representing a DB

projects: title, budget, managedBy

employees: name, ssn, age


Project and Employee relations in XML

<db>
  <project>
    <title> Pattern recognition </title>
    <budget> 10000 </budget>
    <managedBy> Joe </managedBy>
  </project>
  <employee>
    <name> Joe </name>
    <ssn> 344556 </ssn>
    <age> 34 </age>
  </employee>
  <employee>
    <name> Sandra </name>
    <ssn> 2234 </ssn>
    <age> 35 </age>
  </employee>
  <project>
    <title> Auto guided vehicle </title>
    <budget> 70000 </budget>
    <managedBy> Sandra </managedBy>
  </project>
  ...
</db>

Projects and employees are intermixed.


Project and Employee relations in XML (cont’d)

<db>
  <projects>
    <project>
      <title> Pattern recognition </title>
      <budget> 10000 </budget>
      <managedBy> Joe </managedBy>
    </project>
    <project>
      <title> Auto guided vehicles </title>
      <budget> 70000 </budget>
      <managedBy> Sandra </managedBy>
    </project>
    ...
  </projects>
  <employees>
    <employee>
      <name> Joe </name>
      <ssn> 344556 </ssn>
      <age> 34 </age>
    </employee>
    <employee>
      <name> Sandra </name>
      <ssn> 2234 </ssn>
      <age> 35 </age>
    </employee>
    ...
  </employees>
</db>

Employees follow projects.


Project and Employee relations in XML (cont’d)

<db>
  <projects>
    <title> Pattern recognition </title>
    <budget> 10000 </budget>
    <managedBy> Joe </managedBy>
    <title> Auto guided vehicles </title>
    <budget> 70000 </budget>
    <managedBy> Sandra </managedBy>
    ...
  </projects>
  <employees>
    <name> Joe </name>
    <ssn> 344556 </ssn>
    <age> 34 </age>
    <name> Sandra </name>
    <ssn> 2234 </ssn>
    <age> 35 </age>
    ...
  </employees>
</db>

Or without “separator” tags …
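To make the grouped representation concrete, here is a small Python sketch that emits the "employees follow projects" layout from two lists of tuples. The sample rows and the helper name rows_to_xml are illustrative assumptions; only the tag names come from the slides.

```python
import xml.etree.ElementTree as ET

# Illustrative rows for the two relations (values are sample data).
projects = [
    {"title": "Pattern recognition", "budget": "10000", "managedBy": "Joe"},
    {"title": "Auto guided vehicles", "budget": "70000", "managedBy": "Sandra"},
]
employees = [
    {"name": "Joe", "ssn": "344556", "age": "34"},
    {"name": "Sandra", "ssn": "2234", "age": "35"},
]

def rows_to_xml(wrapper, row_tag, rows, parent):
    """Append <wrapper> to parent, holding one <row_tag> element per tuple."""
    group = ET.SubElement(parent, wrapper)
    for row in rows:
        row_elem = ET.SubElement(group, row_tag)
        for column, value in row.items():
            ET.SubElement(row_elem, column).text = value
    return group

db = ET.Element("db")
rows_to_xml("projects", "project", projects, db)
rows_to_xml("employees", "employee", employees, db)
xml_text = ET.tostring(db, encoding="unicode")
print(xml_text)
```

Dropping the row_tag level would give the variant without separator tags, at the cost of having to know that every three children form one tuple.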


When to use attributes

It’s not always clear when to use attributes.

<person ssn="123 45 6789">
  <name> F. MacNiel </name>
  <email> fmacn@dcs.barra.ac.sc </email>
  ...
</person>

<person>
  <ssn> 123 45 6789 </ssn>
  <name> F. MacNiel </name>
  <email> fmacn@dcs.barra.ac.sc </email>
  ...
</person>
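The practical consequence is that code reading the data has to know which encoding was chosen. A hedged Python sketch, assuming an ssn that may appear either as an attribute or as a sub-element of person; the sample values and the helper get_ssn are illustrative.

```python
import xml.etree.ElementTree as ET

# The same fact encoded two ways (sample values for illustration).
AS_ATTRIBUTE = '<person ssn="123 45 6789"><name>F. MacNiel</name></person>'
AS_ELEMENT = '<person><ssn>123 45 6789</ssn><name>F. MacNiel</name></person>'

def get_ssn(person):
    """Look for ssn as an attribute first, then as a sub-element."""
    if "ssn" in person.attrib:
        return person.attrib["ssn"]
    return person.findtext("ssn")

ssn_a = get_ssn(ET.fromstring(AS_ATTRIBUTE))
ssn_b = get_ssn(ET.fromstring(AS_ELEMENT))
print(ssn_a == ssn_b)
```

An attribute can occur at most once per element and holds only text, while a sub-element can repeat and can carry nested structure.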


Using IDs

<family>
  <person id="jane" mother="mary" father="john">
    <name> Jane Doe </name>
  </person>
  <person id="john" children="jane jack">
    <name> John Doe </name> <mother/>
  </person>
  <person id="mary" children="jane jack">
    <name> Mary Doe </name>
  </person>
  <person id="jack" mother="mary" father="john">
    <name> Jack Doe </name>
  </person>
</family>
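A short Python sketch of following such references, assuming person elements carry an id attribute plus mother, father and children attributes holding one or more space-separated IDs. The index by_id and the helper resolve are illustrative names, and the parser here does no ID validation of its own.

```python
import xml.etree.ElementTree as ET

FAMILY_XML = """
<family>
  <person id="jane" mother="mary" father="john"><name>Jane Doe</name></person>
  <person id="john" children="jane jack"><name>John Doe</name></person>
  <person id="mary" children="jane jack"><name>Mary Doe</name></person>
  <person id="jack" mother="mary" father="john"><name>Jack Doe</name></person>
</family>
"""

family = ET.fromstring(FAMILY_XML)
# Index every person by its id so references can be followed in one lookup.
by_id = {p.get("id"): p for p in family.iter("person")}

def resolve(person, attribute):
    """Follow the space-separated list of IDs stored in the given attribute."""
    refs = person.get(attribute, "").split()
    return [by_id[ref].findtext("name") for ref in refs]

mothers = resolve(by_id["jane"], "mother")
children = resolve(by_id["john"], "children")
print(mothers, children)
```

A reference to an ID that no element declares raises KeyError in this sketch; a validating parser would report the same problem as a dangling reference.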


ODL schema

class Movie
  ( extent Movies, key title )
{
  attribute string title;
  attribute string director;
  relationship set<Actor> casts
    inverse Actor::acted_In;
  attribute int budget;
};

class Actor
  ( extent Actors, key name )
{
  attribute string name;
  relationship set<Movie> acted_In
    inverse Movie::casts;
  attribute int age;
  attribute set<string> directed;
};


An example

[XML example: movie and actor elements that reference each other through id and idrefs attributes; the element text is not recoverable from the extraction]


Specifying the structure

The structure of a person entry can be specified by

name, greet?, addr*, (tel | fax)*, email*

This is known as a regular expression. Why is it important?


Regular Expressions

Each regular expression determines a corresponding finite state automaton. Let’s start with a simpler example:

name, addr*, email

This suggests a simple parsing program.

[Automaton diagram with transitions labelled name, addr and email]
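Such a parsing program might look like the following Python sketch: a hand-written three-state automaton for name, addr*, email that reads the sequence of child tags one at a time. The state names and the function name are illustrative, not taken from the slides.

```python
def matches_name_addr_email(tags):
    """Accept tag sequences of the form: name, addr*, email."""
    state = "start"
    for tag in tags:
        if state == "start" and tag == "name":
            state = "after_name"
        elif state == "after_name" and tag == "addr":
            state = "after_name"      # loop: any number of addr elements
        elif state == "after_name" and tag == "email":
            state = "done"
        else:
            return False              # no transition for this tag
    return state == "done"            # "done" is the only accepting state

print(matches_name_addr_email(["name", "addr", "addr", "email"]))
```

The check runs in a single left-to-right pass with no backtracking, which is what makes it suitable for validating a document while it is being read.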


Another example

name, address*, (tel | fax)*, email*

[Automaton diagram with transitions labelled name, address, tel, fax and email]

Adding in the optional greet further complicates things.


A DTD for the address book

<!DOCTYPE addressbook [
  <!ELEMENT addressbook (project*)>
  <!ELEMENT project (name, greet?, address*, (fax | tel)*, email*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT greet (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT fax (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>


Recursive DTDs

<!DOCTYPE genealogy [
  <!ELEMENT genealogy (person*)>
  <!ELEMENT person (
      name,
      dateOfBirth,
      person,     -- mother
      person )>   -- father
  ...
]>

What is the problem with this?
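One way to explore the question is to check mechanically which element types can be completed by a finite document. The Python sketch below is an illustration under simplifying assumptions: each content model is reduced to a list of (child, required) pairs, and the genealogy DTD is read as requiring name, dateOfBirth and two person children inside every person. The function name completable is hypothetical.

```python
def completable(dtd):
    """Return the element types for which some finite document exists.

    dtd maps an element type to a list of (child, required) pairs;
    an empty list stands for text-only content such as #PCDATA.
    """
    done = set()
    changed = True
    while changed:                      # iterate until no new type is added
        changed = False
        for elem, children in dtd.items():
            if elem in done:
                continue
            # An element is completable once all its required children are.
            if all(child in done for child, required in children if required):
                done.add(elem)
                changed = True
    return done

strict = {
    "genealogy": [("person", False)],                  # person*
    "name": [],
    "dateOfBirth": [],
    "person": [("name", True), ("dateOfBirth", True),
               ("person", True), ("person", True)],    # mother, father
}
print(sorted(completable(strict)))
```

Under this reading, person never enters the result: every person would need two more person elements inside it, without end, so the only valid genealogy is an empty one.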


Recursive DTDs (cont’d)

<!DOCTYPE genealogy [
  <!ELEMENT genealogy (person*)>
  <!ELEMENT person (
      name,
      dateOfBirth,
      person?,     -- mother
      person? )>   -- father
  ...
]>

What is now the problem with this?


Some things are hard to specify

Each employee element is to contain name, age and ssn elements in some order.

<!ELEMENT employee
  ( (name, age, ssn)
  | (age, ssn, name)
  | (ssn, name, age)
  | ...
  )>

Suppose there were many more fields!
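The growth can be made concrete with a short Python sketch that writes out the "any order" content model by listing every permutation as its own alternative. The helper any_order_model is an illustrative name, and the output is only the parenthesised model, not a full element declaration.

```python
from itertools import permutations
from math import factorial

def any_order_model(fields):
    """Content model listing every ordering of fields as a separate alternative."""
    alternatives = ["(" + ", ".join(p) + ")" for p in permutations(fields)]
    return "(" + " | ".join(alternatives) + ")"

model = any_order_model(["name", "age", "ssn"])
print(model)

# The number of alternatives is n!, so it grows very quickly with n.
for n in (3, 6, 10):
    print(n, "fields:", factorial(n), "alternatives")
```

Three fields need 6 alternatives, six fields need 720, and ten fields need 3,628,800.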


Summary of XML regular expressions

  • A         The tag A occurs
  • e1,e2     The expression e1 followed by e2
  • e*        0 or more occurrences of e
  • e?        Optional -- 0 or 1 occurrences of e
  • e+        1 or more occurrences of e
  • e1 | e2   Either e1 or e2
  • (e)       Grouping