Classifiers for parsing, Study Guides, Projects, Research of Natural Language Processing (NLP)

The University of Manchester Natural Language Processing (NLP)

There is a large number of classifiers that can be used for generating a parse model; i.e., as an oracle for guiding data-driven parsers when parsing natural languages. In this paper we present a general and simple approach for generating a parse model. Additionally, we present a large number of experiments on various classifiers. We also present the effect of various parse models that are generated from different classifiers on a data-driven parser to see the way each model contributes

Typology: Study Guides, Projects, Research

2014/2015

Uploaded on 06/05/2015

unknown user 🇬🇧


The Selection of a Classifier for a Data-driven Parser

Author One¹, Author2²

¹Institution, Department, Address, City, Country

²Institution, Department, Address, City, Country

Abstract. A large number of classifiers can be used for generating a parse
model, i.e., an oracle that guides a data-driven parser when parsing natural
language. In this paper we present a general and simple approach for generating
a parse model. Additionally, we report a large number of experiments on various
classifiers. We also show the effect that parse models generated from different
classifiers have on a data-driven parser, to see the way each model contributes
to parsing performance.

1 Introduction

The objective of this study is to present an approach for generating different
parse models, which are used for guiding parsers during natural language
parsing, from different machine learning classifiers. Various classification
algorithms can be used for this purpose. However, each classifier may learn
from a set of data in a different way, which means that each may affect parsing
performance differently. In Section 3 we present a data-driven parser that we
have used for evaluating the parse models generated from different classifiers.
In Section 4 we show a simple approach for generating a parse model from the
J48 classifier, while in Section 6.1 we show the accuracy of a large number of
classifiers. Section 6.2 covers the effect of each parse model on parsing
performance. Finally, in Section 7 we compare our parser with the arc-standard
algorithm of MaltParser.


2 Dataset

We have used the Penn Arabic Treebank (PATB) (Maamouri and Bies, 2004) part 1
version 3 for evaluating various classifiers. We have also used this dataset
for training and testing our data-driven parser, which is a re-implementation
of the arc-standard version of MaltParser (Nivre et al., 2010; Kuhlmann and
Nivre, 2010; Nivre et al., 2006), and the arc-standard algorithm of MaltParser
itself.

We have converted the phrase structure trees of the PATB to dependency trees
using the standard conversion algorithm described by Xia and Palmer (2001).

In order to perform 5-fold cross-validation, we have systematically generated
five sets of testing data and five sets of training data from the treebank,
where the testing data of a fold is never part of its training data. The
training data contains approximately 4853 sentences, the average sentence
length is 29 words, and each fold contains about 970 testing sentences.
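The fold construction described above can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say which systematic rule was used, so the modulo assignment, the function name `five_fold_splits`, and the toy sentence list are all hypothetical.

```python
from typing import List, Sequence, Tuple


def five_fold_splits(sentences: Sequence[str], k: int = 5) -> List[Tuple[List[str], List[str]]]:
    """Return k (train, test) pairs; every sentence is tested exactly once."""
    folds = []
    for i in range(k):
        # Assumed rule: sentence j belongs to the test set of fold i when j % k == i.
        test = [s for j, s in enumerate(sentences) if j % k == i]
        train = [s for j, s in enumerate(sentences) if j % k != i]
        folds.append((train, test))
    return folds


# Toy stand-in for the treebank: ten dummy sentence identifiers.
toy_treebank = [f"sentence_{n}" for n in range(10)]
splits = five_fold_splits(toy_treebank)
```

Because the test set of each fold is removed from that fold's training set, no test sentence is ever seen during training, which is the property the text requires.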

3 A Basic Shift-Reduce Parser

Our parser is based on the arc-standard algorithm of MaltParser (Kuhlmann and
Nivre, 2010). This algorithm deterministically generates dependency trees using
two data structures: a queue of input words, and a stack of items that have
been looked at by the parser. Three parse actions are applied to the queue and
the stack: SHIFT, LEFT-ARC and RIGHT-ARC (we will write LA and RA for LEFT-ARC
and RIGHT-ARC respectively to save space). SHIFT moves the head of the queue
onto the top of the stack. LA makes the head of the queue the parent of the
topmost item on the stack and pops this item from the stack. RA makes the
topmost item on the stack the parent of the head of the queue, removes the head
of the queue, and moves the topmost item on the stack back to the queue.
MaltParser uses a support vector machine classifier for generating a parse
model from a set of parsed trees; this model is used for predicting the next
parse action given the current state of the parser.
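As a minimal sketch of the three actions just described (not the authors' implementation), the functions below operate on plain Python lists, with index 0 standing for the head of the queue and the top of the stack. The function names and the `run` helper are hypothetical; the action sequence is the one listed in Fig. 2 for 'the cat sat on the mat'.

```python
from typing import List, Tuple

Arc = Tuple[str, str]  # (parent, daughter)


def shift(queue: List[str], stack: List[str], arcs: List[Arc]):
    """Move the head of the queue onto the top of the stack."""
    return queue[1:], [queue[0]] + stack, arcs


def left_arc(queue: List[str], stack: List[str], arcs: List[Arc]):
    """Head of the queue becomes parent of the stack top, which is popped."""
    return queue, stack[1:], arcs + [(queue[0], stack[0])]


def right_arc(queue: List[str], stack: List[str], arcs: List[Arc]):
    """Stack top becomes parent of the queue head; the queue head is removed
    and the stack top moves back to the front of the queue."""
    return [stack[0]] + queue[1:], stack[1:], arcs + [(stack[0], queue[0])]


ACTIONS = {"SHIFT": shift, "LA": left_arc, "RA": right_arc}


def run(words: List[str], actions: List[str]):
    """Apply a sequence of action names to the initial state."""
    state = (list(words), [], [])
    for name in actions:
        state = ACTIONS[name](*state)
    return state


queue, stack, arcs = run(
    "the cat sat on the mat".split(),
    ["SHIFT", "LA", "SHIFT", "LA", "SHIFT", "SHIFT", "SHIFT", "LA", "RA", "RA", "SHIFT"],
)
```

Each function returns a new state rather than mutating its arguments, which makes it straightforward to keep several candidate states alive at once when the parser is run non-deterministically.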

We re-implement the arc-standard algorithm so that we can train it with
different classifiers and also run it non-deterministically. We will call our
parser NDParser. At each parse step, we generate a new state for each of LA,
RA, and SHIFT, and assign a score to each of these states. The score of a newly
generated state is the sum of two components: (i) a score based on the
recommendation made by the parse model: for example, a SHIFT state receives a
score of 1 if the model recommends a SHIFT operation and 0 otherwise (and the
same applies to LA and RA); and (ii) the score of the given state, i.e., the
state that the new parse state is generated from.
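The scoring rule can be illustrated with a short sketch. The `ScoredState` class and the `expand` function are hypothetical names, and the state is reduced to the action that produced it plus its cumulative score; a real parser state would also carry the queue, stack, and arcs.

```python
from dataclasses import dataclass
from typing import List

ACTIONS = ("SHIFT", "LA", "RA")


@dataclass
class ScoredState:
    action: str  # action that produced this state
    score: int   # cumulative score of the state


def expand(parent_score: int, recommended: str) -> List[ScoredState]:
    """Generate one successor per action and score it as
    (i) 1 if the model recommends the action, else 0, plus
    (ii) the score of the state it was generated from."""
    return [
        ScoredState(action, parent_score + (1 if action == recommended else 0))
        for action in ACTIONS
    ]


successors = expand(parent_score=3, recommended="LA")
# Rank the successors so the highest-scoring state comes first.
ranked = sorted(successors, key=lambda s: s.score, reverse=True)
```

With a parent score of 3 and LA recommended, the LA successor scores 4 and the other two keep the parent's score of 3, so LA is ranked first while the alternatives remain available.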

The advantage of assigning a score to each parse state is that we can rank a
collection of parse states by using their scores and then process the state with

situation; for instance, in a situation like that of step 6 in Fig. 2, the
parser should use SHIFT instead of RA for the reason explained above.

Dependency relations: (sat>cat) (sat>on) (cat>the) (on>mat) (mat>the)

Steps  Action  Queue                     Stack         Arcs
1      ∅       [the,cat,sat,on,the,mat]  []            ∅
2      SHIFT   [cat,sat,on,the,mat]      [the]         ∅
3      LA      [cat,sat,on,the,mat]      []            A1=(cat>the)
4      SHIFT   [sat,on,the,mat]          [cat]         A1
5      LA      [sat,on,the,mat]          []            A2=A1∪(sat>cat)
6      SHIFT   [on,the,mat]              [sat]         A2
7      SHIFT   [the,mat]                 [on,sat]      A2
8      SHIFT   [mat]                     [the,on,sat]  A2
9      LA      [mat]                     [on,sat]      A3=A2∪(mat>the)
10     RA      [on]                      [sat]         A4=A3∪(on>mat)
11     RA      [sat]                     []            A5=A4∪(sat>on)
12     SHIFT   []                        [sat]         A5
13     ∅       []                        [sat]         A5

Fig. 2: Action sequence for parsing the sentence ‘the cat sat on the mat’.

Given a set of such data points, it is possible to extract and record the parse
states and train a classifier for building a parse model, which can be used for
predicting parse operations; i.e., it can be used for guiding the parser. The
task here is to classify intermediate states of the parser into three groups:
cases where SHIFT should be performed, cases where LA should be performed, and
cases where RA should be performed.

Step 3. Preparing recorded parse states for classification: from the set of
parse states that we obtain in step 2, we populate an .arff file in the data
format that is accepted by WEKA. An example of data in this format is shown in
Fig. 3, which is based on the parse states shown in Fig. 2. Here we have
extracted the word forms as the feature for learning, but it is possible to use
a number of different features (such as POS tags, word position etc.) as values
for the queue and the stack attributes.

@relation states
@attribute queue_word_pos_1 {'the','cat','sat','on','mat','-'}
@attribute queue_word_pos_2 {'cat','sat','on','the','mat','-'}
@attribute stack_word_pos_1 {'-','the','cat','sat','on'}
@attribute stack_word_pos_2 {'-','sat','on'}
@attribute parse_action {'SHIFT','LEFT-ARC','RIGHT-ARC'}
@data
'the','cat','-','-','SHIFT'
'cat','sat','the','-','LEFT-ARC'
'cat','sat','-','-','SHIFT'
'sat','on','cat','-','LEFT-ARC'
'sat','on','-','-','SHIFT'
'on','the','sat','-','SHIFT'
'the','mat','on','sat','SHIFT'
'mat','-','the','on','LEFT-ARC'
'mat','-','on','sat','RIGHT-ARC'
'on','-','sat','-','RIGHT-ARC'
'sat','-','-','-','SHIFT'

Fig. 3: An example of data for a .arff file.

Additionally, one can use many different window sizes for the queue and the
stack when selecting the data that serve as instances for the classification
algorithms to learn from. In Fig. 3, we use a window size of two items for the
queue and two items for the stack; the dash mark (‘-’) represents an empty item
where the queue or the stack did not contain an item in the given position.
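The windowing and padding just described can be sketched as below. This is an illustrative helper, not the authors' tool: the names `window` and `to_arff` are hypothetical, the attribute names follow the pattern used in Fig. 3, and the nominal value lists are simply collected in order of first appearance.

```python
from typing import List, Sequence, Tuple

EMPTY = "-"  # placeholder for a missing queue or stack position


def window(items: Sequence[str], size: int) -> List[str]:
    """First `size` items, padded with the empty marker."""
    padded = list(items[:size])
    return padded + [EMPTY] * (size - len(padded))


def to_arff(states: List[Tuple[List[str], List[str], str]], size: int = 2) -> str:
    """Render (queue, stack, action) data points as ARFF text."""
    rows = [window(q, size) + window(s, size) + [a] for q, s, a in states]
    names = (
        [f"queue_word_pos_{i + 1}" for i in range(size)]
        + [f"stack_word_pos_{i + 1}" for i in range(size)]
        + ["parse_action"]
    )
    lines = ["@relation states"]
    for col, name in enumerate(names):
        # Nominal values of this column, in order of first appearance.
        values = list(dict.fromkeys(row[col] for row in rows))
        lines.append("@attribute %s {%s}" % (name, ",".join(f"'{v}'" for v in values)))
    lines.append("@data")
    lines += [",".join(f"'{v}'" for v in row) for row in rows]
    return "\n".join(lines)


# Three of the parse states from Fig. 2 (index 0 is the queue head / stack top).
states = [
    (["the", "cat", "sat"], [], "SHIFT"),
    (["cat", "sat", "on"], ["the"], "LEFT-ARC"),
    (["mat"], ["on", "sat"], "RIGHT-ARC"),
]
arff_text = to_arff(states)
```

Changing `size` is all that is needed to produce data for a different window; other features such as POS tags would be added as further columns in the same way.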

Step 4. Training a classifier using the .arff file: we supply WEKA with the
data prepared in step 3 (i.e., the .arff file) and then select a classification
algorithm for learning. Fig. 4 is a screenshot of the output of the J48
classification algorithm in WEKA. The classification rules inferred by a
decision tree classifier take the form of a question, such as "Is the POS tag
on the top of the stack ABBREV?", together with a possible answer, where the
answer is either a further question, such as "Is the POS tag at the head of the
queue ABBREV?", or a classification, such as "This is the kind of situation
where you should carry out a RIGHT-ARC."

Fig. 4: A screenshot of the J48 classifier output using WEKA.

Step 5. Generating a parse model from the classification output: finally, we
convert the output produced by the classification algorithm to an appropriate
question-answer model that we can use for guiding the parser to parse new
sentences. Fig. 5 is a sample of some questions and answers we have extracted
from the output of the J48 (Quinlan, 1992) classifier.
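One possible shape for such a question-answer model is a nested dictionary that is walked until a parse action is reached. The sketch below is an assumption made for illustration: the tree contents, the attribute names `stack_pos_1` and `queue_pos_1`, and the `default` fallback are hypothetical and are not taken from Fig. 5 or from the J48 output.

```python
from typing import Dict, Union

# A node is either a final parse action (str) or a question with one
# sub-node per possible answer.
Node = Union[str, Dict]

# Hypothetical tree; attribute names and values are illustrative only.
MODEL: Node = {
    "ask": "stack_pos_1",
    "answers": {
        "ABBREV": {
            "ask": "queue_pos_1",
            "answers": {"ABBREV": "RIGHT-ARC"},
            "default": "SHIFT",
        },
    },
    "default": "SHIFT",  # assumed fallback for answers the tree never saw
}


def recommend(node: Node, state: Dict[str, str]) -> str:
    """Follow questions until a classification (a parse action) is reached."""
    while not isinstance(node, str):
        answer = state.get(node["ask"], "-")
        node = node["answers"].get(answer, node["default"])
    return node


action = recommend(MODEL, {"stack_pos_1": "ABBREV", "queue_pos_1": "ABBREV"})
```

The parser would call `recommend` once per parse step with the features of the current state and use the returned action as the model's recommendation.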

5 Labelled Attachment Score

In this section we show the way we obtain labelled attachment scores, which
differs considerably from the way this is implemented in the standard
implementation of MaltParser. As in the arc-standard algorithm, for each
dependency relation between two words, a semantic label is attached to indicate
the semantic role of the daughter item with respect to its parent. However, we
assign labels to dependency relations during parsing by using patterns that we
extract from the training data during the training phase. This contrasts with
the approach used in MaltParser whereby labels are predicted with the LA and RA

We consider a classifier appropriate for producing a parse model if it meets
two requirements: (i) it produces good classification accuracy. Although the
accuracy of the classifiers presented in Table 1 may not directly reflect the
accuracy of a parser that uses their recommendations, a classifier with a high
level of accuracy is more likely to help a parser make informed parse decisions
at each parse step than a classifier with a low level of accuracy; and (ii) its
output can be used for generating a parse model that makes recommendations to a
data-driven parser, for example, what action (SHIFT, LA, or RA) the parser
should take in a specific situation.

We have used various features for training different classification algorithms.
These features include POS tags, word forms, word locations in sentences, and
word spans (i.e., their start and end positions in sentences). Additionally, we
have used combinations of these features, such as word forms with POS tags,
word forms with word locations or word spans, and similar combinations of POS
tags with other features. Various window sizes are also used for the queue and
the stack, ranging from two to four items. The features used for training each
classifier, along with the resulting classification accuracy, are presented in
Table 1.

During the evaluation of the classifiers, some widely used classifiers did not
yield encouraging results. For example, the LIBSVM classifier (Chang and Lin,
2001), which is used in MaltParser, did not perform well with the set of
features that we supplied. It only managed to learn successfully from one
feature (POS tags), and even then its accuracy was well below that of some of
the other classifiers. The entries for LIBSVM in Table 1 are incomplete because
training takes so long (3 days per case) that further experiments seemed
infeasible. However, the fact that it produces no better classification than
the J48 classifier in the cases that we have looked at suggests that it is
unlikely to substantially outperform it in the remaining cases.

Of the classifiers we have experimented with, we select those that produced a
high classification accuracy and use them to train and evaluate NDParser in the
following section.

6.2 Evaluating NDParser with Various Classifiers

As presented in Table 1, the classification accuracy varies because each
classifier learns differently from the set of training data. In this section,
we investigate the effect of different classifiers on parsing. Our objective is
to identify the algorithms that help the parser perform best in terms of
accuracy and speed (we measure speed in seconds per dependency relation).
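The speed measure mentioned above amounts to dividing the elapsed parsing time by the number of dependency relations produced. The sketch below shows one way this could be computed; `seconds_per_relation` and `dummy_parse` are hypothetical names, and the dummy parser is only a placeholder for a real one.

```python
import time
from typing import Callable, List, Sequence, Tuple

Arc = Tuple[str, str]  # (parent, daughter)


def seconds_per_relation(parse: Callable[[Sequence[str]], List[Arc]],
                         sentences: Sequence[Sequence[str]]) -> float:
    """Elapsed wall-clock time divided by the number of relations produced."""
    start = time.perf_counter()
    relations = sum(len(parse(sentence)) for sentence in sentences)
    elapsed = time.perf_counter() - start
    return elapsed / relations if relations else float("nan")


def dummy_parse(words: Sequence[str]) -> List[Arc]:
    """Placeholder parser: attaches every word to the first one."""
    return [(words[0], w) for w in words[1:]]


speed = seconds_per_relation(dummy_parse, [["the", "cat", "sat"], ["on", "the", "mat"]])
```

Normalising by the number of relations rather than by the number of sentences keeps the figure comparable across test sets with different sentence lengths.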

These experiments also highlight whether generating different parse models by
using different classifiers contributes in different ways to parsing
performance. The optimal classification of intermediate states may not necessarily

Table 1: Classification accuracy with various features and settings.

J48
Items on Queue
Items on Stack
Word (%) 68.08 68.12 68.24 67.81 68.29 68.37 68.53 68.10 68.56 68.67 68.
Word + location (%) 72.08 71.79 71.92 72.03 72.23 71.88 71.67 72.10 72.41 72.11 71.
Word + location + span (%) 72.65 72.08 71.73 72.50 72.76 72.25 72.00 72.49 72.87 72.45 72.
Word + span (%) 70.67 70.46 70.17 70.66 70.81 70.64 70.43 70.68 70.83 70.79 70.
POS (%) 84.76 84.88 84.94 84.48 85.63 85.77 85.80 84.75 85.89 86.05 86.
POS + location (%) 85.27 85.27 85.27 85.04 85.89 85.91 85.92 85.25 86.96 86.08 86.
POS + location + span (%) 85.12 85.11 85.23 85.10 85.81 85.84 85.92 85.28 85.95 85.97 85.
POS + span (%) 84.98 85.00 85.00 84.93 85.71 85.69 85.67 85.09 85.88 85.88 85.
Word + POS (%) 85.25 85.25 85.28 85.05 86.23 86.24 86.24 85.25 86.46 86.47 86.
Word + POS + location (%) 85.93 85.93 85.83 85.77 86.57 86.53 86.45 85.84 86.63 86.57 86.
Word + POS + location + span (%) 85.90 85.84 85.83 85.89 86.48 86.54 86.47 85.94 86.49 86.55 86.
Word + POS + span (%) 85.87 85.76 85.79 85.89 86.50 86.45 86.43 85.90 86.53 86.49 86.
LIBSVM
Items on Queue
Items on Stack
Word (%) - - - - - - - - - - -
Word + location (%) - - - - - - - - - - -
Word + location + span (%) - - - - - - - - - - -
Word + span (%) - - - - - - - - - - -
POS (%) 74.40 74.73 74.62 74.32 75.41 75.43 75.39 74.58 75.62 75.63 75.
POS + location (%) - - - - - - - - - - -
POS + location + span (%) - - - - - - - - - - -
POS + span (%) - - - - - - - - - - -
Word + POS (%) - - - - - - - - - - -
Word + POS + location (%) - - - - - - - - - - -
Word + POS + location + span (%) - - - - - - - - - - -
Word + POS + span (%) - - - - - - - - - - -
Id
Items on Queue
Items on Stack
Word (%) 67.85 67.71 67.65 67.64 67.94 67.77 67.68 67.74 67.97 67.79 67.
Word + location (%) 68.70 63.63 62.37 67.83 65.22 63.04 61.89 66.92 64.41 62.55 61.
Word + location + span (%) 68.61 65.65 63.69 67.77 64.85 63.18 62.26 66.89 64.23 62.79 62.
Word + span (%) 67.08 64.46 62.42 61.21 66.14 63.60 61.84 65.24 62.81 61.25 60.
POS (%) 83.15 83.04 81.64 83.74 83.41 81.78 80.54 82.31 81.47 79.70 78.
POS + location (%) 77.55 75.79 74.80 76.25 75.48 75.04 74.95 74.94 74.83 74.65 74.
POS + location + span (%) 77.57 75.71 74.51 76.31 75.42 74.92 74.83 75.13 74.85 74.63 74.
POS + span (%) 77.42 75.51 74.49 74.28 76.10 75.17 74.59 74.92 74.64 74.41 74.
Word + POS (%) 83.62 82.25 81.02 83.34 82.89 81.29 80.36 81.76 81.04 79.67 79.
Word + POS + location (%) 77.36 75.84 75.13 76.31 75.73 75.39 75.33 75.21 75.29 75.07 75.
Word + POS + location + span (%) 77.38 75.72 74.92 76.39 75.64 75.22 75.15 75.36 75.21 75.00 74.
Word + POS + span (%) 77.23 75.54 74.68 74.55 76.17 75.45 75.04 75.18 75.04 74.81 74.
RandomTree
Items on Queue
Items on Stack
Word (%) 67.72 68.04 68.00 68.00 67.82 68.27 68.26 68.32 68.01 68.47 68.
Word + location (%) 71.36 70.67 69.48 70.25 71.32 70.64 69.35 68.67 71.05 70.19 69.
Word + location + span (%) 72.02 70.55 - - 71.83 70.27 - - 71.50 69.937 -
Word + span (%) 69.97 69.27 68.23 67.46 69.80 68.99 68.25 67.39 69.63 68.67 67.
POS (%) 83.67 84.82 84.50 83.71 84.47 85.28 84.71 84.26 84.26 84.78 84.
POS + location (%) 83.04 82.93 80.76 79.18 82.77 81.41 80.15 78.84 81.71 80.09 78.
POS + location + span (%) 82.54 81.00 79.17 76.32 81.78 79.41 78.02 76.26 80.27 78.79 77.
POS + span (%) 83.15 81.47 79.89 79.19 81.50 80.89 78.69 77.78 80.60 79.93 77.
Word + POS (%) 84.08 85.02 84.34 83.62 84.80 85.37 84.34 83.57 84.36 84.46 83.
Word + POS + location (%) 83.33 82.06 81.23 80.10 82.02 81.84 80.62 79.17 81.24 80.27 79.
Word + POS + location + span (%) 82.91 80.68 78.92 77.35 81.75 79.59 77.50 76.33 80.48 78.09 76.
Word + POS + span (%) 82.89 81.75 79.46 78.42 81.71 80.58 77.87 77.34 80.37 78.95 76.
NaiveBayes
Items on Queue
Items on Stack
Word (%) 66.99 61.83 60.13 65.70 65.95 65.48 63.37 64.57 65.06 64.68 64.
Word + location (%) 60.72 57.56 57.02 64.45 64.12 57.03 55.68 62.10 62.21 54.67 52.
Word + location + span (%) 50.95 47.39 49.98 55.98 47.29 44.51 47.60 51.12 45.28 42.79 45.
Word + span (%) 56.69 51.92 53.47 61.93 55.32 48.75 51.09 58.75 52.30 46.46 48.
POS (%) 76.58 74.79 70.42 76.95 76.78 76.19 74.02 75.40 76.01 75.38 74.
POS + location (%) 73.49 66.60 64.05 74.76 74.70 71.13 67.13 71.35 72.39 70.68 67.
POS + location + span (%) 65.37 59.14 58.43 69.06 65.72 58.22 57.11 63.75 61.27 55.49 54.
POS + span (%) 69.93 62.92 61.15 71.68 70.93 65.13 61.31 67.19 67.62 63.37 59.
Word + POS (%) 76.72 71.35 66.00 73.84 74.67 73.66 71.04 70.76 72.12 72.21 71.
Word + POS + location (%) 73.52 64.61 62.25 71.93 72.50 69.20 63.57 68.08 69.13 67.89 62.
Word + POS + location + span (%) 65.40 58.09 57.62 67.91 64.58 56.72 55.55 62.92 59.78 53.95 52.
Word + POS + span (%) 69.66 61.49 59.98 69.87 69.69 62.84 59.08 65.67 65.33 60.50 56.
DecisionStump
Items on Queue
Items on Stack
Word (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
Word + location (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
Word + location + span (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
Word + span (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
POS (%) 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.
POS + location (%) 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.
POS + location + span (%) 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.
POS + span (%) 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.
Word + POS (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
Word + POS + location (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
Word + POS + location + span (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
Word + POS + span (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
