Revised Pseudo Code for Non-probabilistic CKY Algorithm & Its Application to a Sentence, Lab Reports of Linguistics

This lab handout gives revised pseudo code for the non-probabilistic CKY algorithm, a parsing algorithm used in natural language processing to identify the constituents of a sentence. It also applies the algorithm to the sentence 'Snow in Oslo snores' using the provided grammar. The chart built during the algorithm's execution is shown step by step, illustrating how new constituents are built from existing ones.

Uploaded on 03/10/2009

Ling 472 Lab, November 5, 2004

Revised pseudo code for the (non-probabilistic) CKY algorithm:

Create and clear chart[#words, #words]
for i ← 1 to #words
    chart[i, i] ← {α | α → input_i}
for span ← 2 to #words
    for begin ← 1 to #words - span + 1
        end ← begin + span - 1
        for m ← begin to end - 1
            if (α → β1 β2 ∈ P ∧ β1 ∈ chart[begin, m] ∧ β2 ∈ chart[m + 1, end]) then
                chart[begin, end] = chart[begin, end] ∪ {α}
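The pseudo code can be sketched in Python roughly as follows. This is only one possible rendering, not the lab's own code: the function name `cky_chart`, the split of the rules into a `lexical` dictionary and a `binary` list, and the use of a dictionary keyed by 1-based `(begin, end)` pairs are all assumptions made for illustration.

```python
from collections import defaultdict

def cky_chart(words, lexical, binary):
    """Fill a CKY chart whose cells are keyed by 1-based (begin, end) positions.

    lexical: dict mapping a word to the set of pre-terminals that expand to it.
    binary:  list of (parent, left_daughter, right_daughter) rules.
    """
    n = len(words)
    chart = defaultdict(set)  # "create and clear": every cell starts empty

    # First loop: put the pre-terminals for each word on the diagonal.
    for i in range(1, n + 1):
        chart[i, i] = set(lexical.get(words[i - 1], ()))

    # Nested loops: build longer constituents from two adjacent shorter ones.
    for span in range(2, n + 1):
        for begin in range(1, n - span + 2):
            end = begin + span - 1
            for m in range(begin, end):  # the left daughter ends at m
                for parent, left, right in binary:
                    if left in chart[begin, m] and right in chart[m + 1, end]:
                        chart[begin, end].add(parent)
    return chart

# Hypothetical two-word check using three rules from the lab's grammar.
lexical = {"Kim": {"NP"}, "snores": {"VP"}}
binary = [("S", "NP", "VP")]
chart = cky_chart(["Kim", "snores"], lexical, binary)
print(sorted(chart[1, 2]))  # ['S']
```

Keying the chart by `(begin, end)` keeps the indices identical to those in the pseudo code, at the cost of the usual 0-based Python conventions.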

Step through the (non-probabilistic) CKY algorithm, using this grammar:

S → NP VP
S → Aux S
VP → V S
VP → V NP
VP → VP PP
NP → Det N
NP → NP PP
NP → Waikiki
NP → Oslo
NP → Kim
NP → snow
PP → P NP
PP → P S
V → adores
VP → snores
Aux → does
Aux → can
Aux → is
P → in
P → on
P → before
Det → this
Det → these
Det → the

Use this sentence:

Snow  in  Oslo  snores
1     2   3     4

First, start out with an empty chart with the appropriate cells. Each cell [begin, end] corresponds to the substring of the input string that runs from word begin through word end.

The first loop:

for i ← 1 to #words
    chart[i, i] ← {α | α → input_i}

This fills in the chart with pre-terminals.

So we go through i = 1 to i = 4; for each i, we add to chart[i, i] every pre-terminal that expands to the i-th input word. We end up with a chart that looks like this:

begin \ end   1     2     3     4
1             NP
2                   P
3                         NP
4                               VP
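As a rough illustration of this first loop, the sketch below fills the diagonal for the example sentence. The `LEXICAL` dictionary is just the grammar's word rules rewritten as Python data, and lowercasing the words so that "Snow" matches the rule for "snow" is an assumption of this sketch, not something the handout specifies.

```python
# Lexical (pre-terminal) rules from the grammar, keyed by lowercased word.
# Lowercasing is an assumption of this sketch so that "Snow" matches "snow".
LEXICAL = {
    "waikiki": {"NP"}, "oslo": {"NP"}, "kim": {"NP"}, "snow": {"NP"},
    "adores": {"V"}, "snores": {"VP"},
    "does": {"Aux"}, "can": {"Aux"}, "is": {"Aux"},
    "in": {"P"}, "on": {"P"}, "before": {"P"},
    "this": {"Det"}, "these": {"Det"}, "the": {"Det"},
}

words = "Snow in Oslo snores".split()

# chart[i, i] receives every pre-terminal that expands to word i (1-based).
chart = {}
for i in range(1, len(words) + 1):
    chart[i, i] = set(LEXICAL.get(words[i - 1].lower(), ()))

for i in range(1, len(words) + 1):
    print(i, words[i - 1], sorted(chart[i, i]))
# 1 Snow ['NP']
# 2 in ['P']
# 3 Oslo ['NP']
# 4 snores ['VP']
```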

In the next set of nested loops, we build new constituents out of existing ones. Each time we execute the innermost loop, we are looking at two potential daughters and seeing if they form a constituent. If they do, we add that constituent to the appropriate place in the chart. The loops have these variables:

In the final iteration, we're building constituents of length 4, so span will be 4. begin can just be 1. end can only be 4. m can range from 1 to 3.

m = 1: we look at 1,1 (NP) and 2,4 (PP) and add NP to 1,4
m = 2: we look at 1,2 and don't find anything
m = 3: we look at 1,3 (NP) and 4,4 (VP) and add S to 1,4
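The final iteration can be replayed in isolation with a small sketch. It assumes the cells for spans 1 to 3 already hold the values shown in the walkthrough's chart; the `BINARY` list is the grammar's two-daughter rules written as hypothetical `(parent, left, right)` tuples.

```python
# Two-daughter rules from the grammar as (parent, left, right) tuples.
BINARY = [
    ("S", "NP", "VP"), ("S", "Aux", "S"),
    ("VP", "V", "S"), ("VP", "V", "NP"), ("VP", "VP", "PP"),
    ("NP", "Det", "N"), ("NP", "NP", "PP"),
    ("PP", "P", "NP"), ("PP", "P", "S"),
]

# Chart contents after the spans of length 1 to 3, copied from the walkthrough.
chart = {
    (1, 1): {"NP"}, (2, 2): {"P"}, (3, 3): {"NP"}, (4, 4): {"VP"},
    (1, 2): set(), (2, 3): {"PP"}, (3, 4): {"S"},
    (1, 3): {"NP"}, (2, 4): {"PP"},
}

# span = 4, so begin = 1 and end = 4; m is the last word of the left daughter.
begin, end = 1, 4
chart[begin, end] = set()
for m in range(begin, end):
    found = sorted(parent for parent, left, right in BINARY
                   if left in chart[begin, m] and right in chart[m + 1, end])
    chart[begin, end].update(found)
    print(f"m={m}: [{begin},{m}] + [{m + 1},{end}] -> {found}")
# m=1: [1,1] + [2,4] -> ['NP']
# m=2: [1,2] + [3,4] -> []
# m=3: [1,3] + [4,4] -> ['S']

print(sorted(chart[1, 4]))  # ['NP', 'S']
```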

We end up with this table:

begin \ end   1     2     3     4
1             NP          NP    NP, S
2                   P     PP    PP
3                         NP    S
4                               VP
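To check the table end to end, here is a self-contained sketch that reads the grammar from text, runs the chart-filling loops on the example sentence, and prints one row per begin position. The `->` rule notation, the lowercasing of words, and the `-` printed for an empty cell are conventions of this sketch rather than of the handout.

```python
from collections import defaultdict

GRAMMAR = """
S -> NP VP
S -> Aux S
VP -> V S
VP -> V NP
VP -> VP PP
NP -> Det N
NP -> NP PP
NP -> Waikiki
NP -> Oslo
NP -> Kim
NP -> snow
PP -> P NP
PP -> P S
V -> adores
VP -> snores
Aux -> does
Aux -> can
Aux -> is
P -> in
P -> on
P -> before
Det -> this
Det -> these
Det -> the
"""

lexical = defaultdict(set)  # lowercased word -> pre-terminals
binary = []                 # (parent, left daughter, right daughter)
for line in GRAMMAR.strip().splitlines():
    parent, rhs = (part.strip() for part in line.split("->"))
    daughters = rhs.split()
    if len(daughters) == 2:
        binary.append((parent, daughters[0], daughters[1]))
    else:
        lexical[daughters[0].lower()].add(parent)

words = "Snow in Oslo snores".split()
n = len(words)
chart = defaultdict(set)
for i in range(1, n + 1):
    chart[i, i] = set(lexical[words[i - 1].lower()])
for span in range(2, n + 1):
    for begin in range(1, n - span + 2):
        end = begin + span - 1
        for m in range(begin, end):
            for parent, left, right in binary:
                if left in chart[begin, m] and right in chart[m + 1, end]:
                    chart[begin, end].add(parent)

# One row per begin position, cells listed from end = begin up to end = n.
for begin in range(1, n + 1):
    cells = [", ".join(sorted(chart[begin, end])) or "-"
             for end in range(begin, n + 1)]
    print(begin, *cells)
```

An S in cell [1, 4] means the whole sentence is recognized as a sentence by this grammar.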