Classifiers for parsing, Study Guides, Projects, Research of Natural Language Processing (NLP)

There is a large number of classifiers that can be used for generating a parse model; i.e., as an oracle for guiding data- driven parsers when parsing natural languages. In this paper we present a general and simple approach for generating a parse model. Additionally, we present a large number of experiments on various classifiers. We also present the effect of various parse models that are generated from different classifiers on a data-driven parser to see they way each model contributes

Typology: Study Guides, Projects, Research

2014/2015

Uploaded on 06/05/2015

unknown user
unknown user 🇬🇧

1 / 12

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
The Selection of a Classifier for a
Data-driven Parser
Author One1, Author22
1Institution, Department, Address, City, Country
2Institution, Department, Address, City, Country
Abstract. There is a large number of classifiers that can be used
for generating a parse model; i.e., as an oracle for guiding data-
driven parsers when parsing natural languages. In this paper we
present a general and simple approach for generating a parse model.
Additionally, we present a large number of experiments on various
classifiers. We also present the effect of various parse models that
are generated from different classifiers on a data-driven parser to see
they way each model contributes to parsing performance.
1 Introduction
The objective of this study is to present an approach for generating dif-
ferent parse models, which are used for guiding parsers during natural language
parsing, from different machine learning classifiers. There are various classifica-
tion algorithms that can be used for this purpose. However, each classifier may
learn from a set of data in different ways, which means that they may affect
parsing performance in different ways. In Section 3 we present a data-driven
parser that we have used for evaluating different parse models that are generated
from different classifiers. In Section 4 we show a simple approach for generating
a parse model from the J48 classifier while in Section 6.1 we show the accuracy
of a large number of classifiers. Section 6.2 covers the effect of each parse model
on parsing performance. Finally, in Section 7 we compare our parser with the
arc-standard algorithm of MaltParser.
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Classifiers for parsing and more Study Guides, Projects, Research Natural Language Processing (NLP) in PDF only on Docsity!

The Selection of a Classifier for a

Data-driven Parser

Author One

, Author

Institution, Department, Address, City, Country

Institution, Department, Address, City, Country

Abstract. There is a large number of classifiers that can be used

for generating a parse model; i.e., as an oracle for guiding data-

driven parsers when parsing natural languages. In this paper we

present a general and simple approach for generating a parse model.

Additionally, we present a large number of experiments on various

classifiers. We also present the effect of various parse models that

are generated from different classifiers on a data-driven parser to see

they way each model contributes to parsing performance.

1 Introduction

The objective of this study is to present an approach for generating dif-

ferent parse models, which are used for guiding parsers during natural language

parsing, from different machine learning classifiers. There are various classifica-

tion algorithms that can be used for this purpose. However, each classifier may

learn from a set of data in different ways, which means that they may affect

parsing performance in different ways. In Section 3 we present a data-driven

parser that we have used for evaluating different parse models that are generated

from different classifiers. In Section 4 we show a simple approach for generating

a parse model from the J48 classifier while in Section 6.1 we show the accuracy

of a large number of classifiers. Section 6.2 covers the effect of each parse model

on parsing performance. Finally, in Section 7 we compare our parser with the

arc-standard algorithm of MaltParser.

2 Dataset

We have used the Penn Arabic Treebank (PATB) (Maamouri and Bies,

2004) part 1 version 3 for evaluating various classifiers. We have also used

this dataset for training and testing our data-driven parser, which is a re-

implementation of the arc-standard version of MaltParser (Nivre et al., 2010;

Kuhlmann and Nivre, 2010; Nivre et al., 2006), and the arc-standard algorithm.

We have converted the phrase structure trees of the PATB to dependency

structure trees using the standard conversion algorithm for transforming phrase

structure trees to dependency trees, as described by Xia and Palmer (2001).

In order to perform a 5-fold validation, we have systematically generated

five sets of testing data and five sets of training data from the treebank, where

the testing data is not part of the training data. The training data contains

approximately 4853 sentences. The average length of sentences is 29 words and

the total number of testing sentences in each fold is about 970 sentences.

3 A Basic Shift-Reduce Parser

Our parser is based on the arc-standard algorithm of MaltParser (Kuhl-

mann and Nivre, 2010). This algorithm deterministically generates dependency

trees using two data-structures: a queue of input words, and a stack of items

that have been looked at by the parser. Three parse actions are applied to the

queue and the stack: SHIFT, LEFT-ARC and RIGHT-ARC (we will write LA

and RA for LEFT-ARC and RIGHT-ARC respectively to save space). SHIFT

moves the head of the queue onto the top of the stack, LA makes the head of

the queue a parent of the topmost item on the stack and pops this item from the

stack, and RA makes the topmost item on the stack a parent of the head of the

queue; RA removes the head of the queue and moves the topmost item on the

stack back to the queue. MaltParser uses a support vector machine classifier for

generating a parse model from a set of parsed trees, which is used for predicting

the next parse action given the current state of the parser.

We re-implement the arc-standard algorithm so that we can train it on

different classifiers and also run it non-deterministically. We will call our parser

NDParser. At each parse step, we generate a state for LA, RA, and SHIFT, and

we will assign different scores to each state. A score is computed for each newly

generated state by computing two different scores: (i) a score that is based on

the recommendation made by a parse model. For example, for a SHIFT state

the parser gives a score of 1 if a SHIFT operation is recommended by the model.

Otherwise a score of 0 is given (and the same applies to LA and RA). (ii) the

score of the given state (which is the state that the new parse state is generated

from). The sum of these two scores is assigned to the newly generated state.

The advantage of assigning a score to each parse state is that we can rank

a collection of parse states by using their scores and then process the state with

situation – for instance, in a situation like in step 6 in Fig. 2 the parser should

use SHIFT instead of RA for the reason explained above.

Dependency relations: (sat>cat) (sat>on) (cat>the) (on>mat) (mat>the)

Steps Action Queue Stack Arcs

1 θ [the,cat,sat,on,the,mat] [] θ 2 SHIFT [cat,sat,on,the,mat] [the] θ 3 LA [cat,sat,on,the,mat] [] A1=(cat>the) 4 SHIFT [sat,on,the,mat] [cat] A 5 LA [sat,on,the,mat] [] A2=A1∪(sat>cat) 6 SHIFT [on,the,mat] [sat] A 7 SHIFT [the,mat] [on,sat] A 8 SHIFT [mat] [the,on,sat] A 9 LA [mat] [on,mat] A3=A2∪(mat>the) 10 RA [on] [sat] A4=A3∪(on mat) 11 RA [sat] [] A5=A4∪(sat>on) 12 SHIFT [] [sat] A 13 θ [] [sat] A

Fig. 2: Action sequence for parsing the sentence ‘the cat sat on the mat’.

Given a set of such data-points, it is possible to extract and record the

parse states and train a classifier for building a parse model, which can be used

for predicting parse operation; i.e., it can be used for guiding the parser. The

task here is to classify intermediate states of the parser into three groups: cases

where SHIFT should be performed, cases where LA should be performed, and

cases where RA should be performed.

Step 3. Preparing recorded parse states for classification: from the set of

parse states that we obtain in step 2, we populate an .arff file with the correct

data format, i.e., the format that is accepted by WEKA. An example of a set of

WEKA-style data format is shown in Fig. 3, which is based on the parse states

shown in Fig. 2. Here we have extracted the word forms as a feature for learning

but it is possible to use a number of different features (such as POS tags, word

position etc.) as values for the queue and the stack attribute parameters.

@relation states @attribute queue_word_pos_1{‘the’,‘cat’,‘sat’,‘on’,‘mat’,‘-’} @attribute queue_word_pos_2{‘cat’,‘sat’,‘on’,‘the’,‘mat’, ‘-’} @attribute stack_word_pos_1{‘-’, ‘the’,‘cat’,‘sat’,‘on’} @attribute stack_word_pos_2{‘-’,‘sat’,‘on’} @attribute parse_action{‘SHIFT’, ‘LEFT-ARC’, ‘RIGHT-ARC’} @data ‘the’, ‘cat’, ‘-’, ‘-’, ‘SHIFT’ ‘cat’, ‘sat’, ‘the’, ’-’, ‘LEFT-ARC’ ‘cat’, ‘sat’, ‘-’, ‘-’, ‘SHIFT’ ‘sat’, ‘on’, ‘cat’, ‘-’, ‘LEFT-ARC’ ‘sat’, ‘on’, ‘-’, ‘-’, ‘SHIFT’ ‘on’, ‘the’, ‘sat’, ‘-’, ‘SHIFT’ ‘the’, ‘mat’, ‘on’, ‘sat’, ‘SHIFT’ ‘mat’, ‘-’, ‘the’, ‘on’, ‘LEFT-ARC’ ‘mat’, ‘-’, ‘on’, ‘sat’, ‘RIGHT-ARC’ ‘on’, ‘-’, ‘sat’, ‘-’, ‘RIGHT-ARC’ ‘sat’, ‘-’, ‘-’, ‘-’, ‘SHIFT’

Fig. 3: An example of data for a .arff file.

Additionally, one can use many different window sizes for the queue and the

stack in the data selection as instances for the classification algorithms to learn

from. In Table 3, we use the window sizes of two items for the queue and two

items for the stack, while the dash mark (‘-’) represents an empty item where

the queue or the stack did not contain an item in the given position.

Step 4. Training a classifier using the .arff file: we supply WEKA with

the data prepared in step 3 (i.e., the .arff file) and then we select a classification

algorithm for learning. Fig. 4 is a screenshot of the J48 classification algorithm

output from WEKA. The classification rules inferred by a decision tree classifier

take the form of questions, such as Is the POS tag on the top of the stack (AB-

BREV)?, and a possible answer where the possible answer is a further question

such as (Is POS tag in the head of queue (ABBREV)?) or a classification such

as (This is the kind of situation where you should carry out a RIGHT-ARC.)

Fig. 4: A screenshot of the J48 classifier output using WEKA.

Step 5. Generating a parse model from the classification output: finally, we

convert the output produced by the classification algorithm to an appropriate

question-answer model that we can use for guiding the parser to parse new

sentences. Fig. 5 is a sample of some questions and answers we have extracted

from the J48 (Quinlan, 1992) classifier’s output.

5 Labelled Attachment Score

In this section we show the way we obtain labelled attachment scores,

which is largely different from the way this is implemented in the standard

implementation of MaltParser. As in the arc-standard algorithm, for each de-

pendency relation between two words, a semantic label is attached to indicate

the semantic role of the daughter item with its parent. However, the way we

assign labels to dependency relations during parsing is that we extract patterns

from the training data during the training phase. This contrasts with the ap-

proach used in MaltParser whereby labels are predicted with the LA and RA

We consider a classifier appropriate for producing a parse model if it meets

two requirements: (i) it produces good classification accuracy. Although the

accuracy of the classifiers that are presented in Table 1 may not directly reflect

the accuracy of a parser that uses its recommendations, a classifier that produces

a high level of accuracy is more likely to assist a parser to make more informed

parse decisions at each parse step than a classifier that produces a low level of

accuracy; and (ii) its output can be used for generating a parse model which

can be used for making recommendations to a data-driven parser, for example,

what action (SHIFT, LA, or RA) the parser should take in a specific situation.

We have used various features for training different classification algo-

rithms. These features included POS tags, word forms, word locations in sen-

tences, their spans (i.e., their start and end positions in sentences). Additionally,

we have used a combination of these features such as word forms with POS tags,

word forms with word location or word spans, and similar combination of POS

tags with other features. Also various window sizes are used for the queue and

the stack, ranging between two items to four items. The use of these features

for training each classifier along with the classification accuracy is presented in

Table 1.

During the evaluation of the classifiers, some widely used classifiers did

not yield encouraging results. For example, the LiBSVM classifier (Chang and

Lin, 2001) which is used in MaltParser did not perform well with the set of

features that we have supplied. It only managed to learn successfully from one

feature (POS tags), while the accuracy was well below the accuracy of some

of the other classifiers. The entries for LiBSVM in Table 1 are incomplete

because training takes so long (3 days per case) that future experiments seemed

infeasible. However, the fact that it produces no better classification than the

J48 classifier in the cases that we have looked at suggests that it is unlikely to

substantially outperform it in the remaining cases.

From the large number of experiments we have conducted on several classi-

fiers, we will evaluate NDParser by training it using the classification algorithms

that produced a high classification accuracy in the following section.

6.2 Evaluating NDParser with Various Classi-

fiers

As presented in Table 1 the classification accuracy varies because each

classifier learns differently from the set of training data. In this section, we in-

vestigate the effect of different classifiers on parsing. Our objective is to identify

the algorithms that help the parser perform best in terms of accuracy and speed

(We measure speed as second per dependency relation).

These experiments also highlight whether generating different parsing mod-

els by using different classifiers contribute in different ways to parsing perfor-

mance. The optimal classification of intermediate states may not necessarily

Table 1: Classification accuracy with various features and settings.

Word (%) - - - - - - - - - - - Word + location (%) - - - - - - - - - - - Word + location + span (%) - - - - - - - - - - - POS + location (%) - - - - - - - - - - - POS + location + span (%) - - - - - - - - - - - POS + span (%) - - - - - - - - - - - Word + POS (%) - - - - - - - - - - - Word + POS + location (%) - - - - - - - - - - - Word + POS + location + span (%) - - - - - - - - - - -

  • J
  • Items on Queue
  • Items on Stack
  • Word (%) 68.08 68.12 68.24 67.81 68.29 68.37 68.53 68.10 68.56 68.67 68.
  • Word + location (%) 72.08 71.79 71.92 72.03 72.23 71.88 71.67 72.10 72.41 72.11 71.
  • Word + location + span (%) 72.65 72.08 71.73 72.50 72.76 72.25 72.00 72.49 72.87 72.45 72.
  • Word + span (%) 70.67 70.46 70.17 70.66 70.81 70.64 70.43 70.68 70.83 70.79 70.
  • POS (%) 84.76 84.88 84.94 84.48 85.63 85.77 85.80 84.75 85.89 86.05 86.
  • POS + location (%) 85.27 85.27 85.27 85.04 85.89 85.91 85.92 85.25 86.96 86.08 86.
  • POS + location + span (%) 85.12 85.11 85.23 85.10 85.81 85.84 85.92 85.28 85.95 85.97 85.
  • POS + span (%) 84.98 85.00 85.00 84.93 85.71 85.69 85.67 85.09 85.88 85.88 85.
  • Word + POS (%) 85.25 85.25 85.28 85.05 86.23 86.24 86.24 85.25 86.46 86.47 86.
  • Word + POS + location (%) 85.93 85.93 85.83 85.77 86.57 86.53 86.45 85.84 86.63 86.57 86.
  • Word + POS + location + span (%) 85.90 85.84 85.83 85.89 86.48 86.54 86.47 85.94 86.49 86.55 86.
  • Word + POS + span (%) 85.87 85.76 85.79 85.89 86.50 86.45 86.43 85.90 86.53 86.49 86.
  • Items on Queue LIBSVM
  • Items on Stack
  • POS (%) 74.40 74.73 74.62 74.32 75.41 75.43 75.39 74.58 75.62 75.63 75. Word + span (%) - - - - - - - - - - - - Id Word + POS + span (%) - - - - - - - - - - -
  • Items on Queue
  • Items on Stack
  • Word (%) 67.85 67.71 67.65 67.64 67.94 67.77 67.68 67.74 67.97 67.79 67.
  • Word + location (%) 68.70 63.63 62.37 67.83 65.22 63.04 61.89 66.92 64.41 62.55 61.
  • Word + location + span (%) 68.61 65.65 63.69 67.77 64.85 63.18 62.26 66.89 64.23 62.79 62.
  • Word + span (%) 67.08 64.46 62.42 61.21 66.14 63.60 61.84 65.24 62.81 61.25 60.
  • POS (%) 83.15 83.04 81.64 83.74 83.41 81.78 80.54 82.31 81.47 79.70 78.
  • POS + location (%) 77.55 75.79 74.80 76.25 75.48 75.04 74.95 74.94 74.83 74.65 74.
  • POS + location + span (%) 77.57 75.71 74.51 76.31 75.42 74.92 74.83 75.13 74.85 74.63 74.
  • POS + span (%) 77.42 75.51 74.49 74.28 76.10 75.17 74.59 74.92 74.64 74.41 74.
  • Word + POS (%) 83.62 82.25 81.02 83.34 82.89 81.29 80.36 81.76 81.04 79.67 79.
  • Word + POS + location (%) 77.36 75.84 75.13 76.31 75.73 75.39 75.33 75.21 75.29 75.07 75.
  • Word + POS + location + span (%) 77.38 75.72 74.92 76.39 75.64 75.22 75.15 75.36 75.21 75.00 74.
  • Word + POS + span (%) 77.23 75.54 74.68 74.55 76.17 75.45 75.04 75.18 75.04 74.81 74.
  • Items on Queue RandomTree
  • Items on Stack
  • Word (%) 67.72 68.04 68.00 68.00 67.82 68.27 68.26 68.32 68.01 68.47 68.
  • Word + location (%) 71.36 70.67 69.48 70.25 71.32 70.64 69.35 68.67 71.05 70.19 69.
  • Word + span (%) 69.97 69.27 68.23 67.46 69.80 68.99 68.25 67.39 69.63 68.67 67. Word + location + span (%) 72.02 70.55 - - 71.83 70.27 - - 71.50 69.937 -
  • POS (%) 83.67 84.82 84.50 83.71 84.47 85.28 84.71 84.26 84.26 84.78 84.
  • POS + location (%) 83.04 82.93 80.76 79.18 82.77 81.41 80.15 78.84 81.71 80.09 78.
  • POS + location + span (%) 82.54 81.00 79.17 76.32 81.78 79.41 78.02 76.26 80.27 78.79 77.
  • POS + span (%) 83.15 81.47 79.89 79.19 81.50 80.89 78.69 77.78 80.60 79.93 77.
  • Word + POS (%) 84.08 85.02 84.34 83.62 84.80 85.37 84.34 83.57 84.36 84.46 83.
  • Word + POS + location (%) 83.33 82.06 81.23 80.10 82.02 81.84 80.62 79.17 81.24 80.27 79.
  • Word + POS + location + span (%) 82.91 80.68 78.92 77.35 81.75 79.59 77.50 76.33 80.48 78.09 76.
  • Word + POS + span (%) 82.89 81.75 79.46 78.42 81.71 80.58 77.87 77.34 80.37 78.95 76.
  • Items on Queue NaiveBayes
  • Items on Stack
  • Word (%) 66.99 61.83 60.13 65.70 65.95 65.48 63.37 64.57 65.06 64.68 64.
  • Word + location (%) 60.72 57.56 57.02 64.45 64.12 57.03 55.68 62.10 62.21 54.67 52.
  • Word + location + span (%) 50.95 47.39 49.98 55.98 47.29 44.51 47.60 51.12 45.28 42.79 45.
  • Word + span (%) 56.69 51.92 53.47 61.93 55.32 48.75 51.09 58.75 52.30 46.46 48.
  • POS (%) 76.58 74.79 70.42 76.95 76.78 76.19 74.02 75.40 76.01 75.38 74.
  • POS + location (%) 73.49 66.60 64.05 74.76 74.70 71.13 67.13 71.35 72.39 70.68 67.
  • POS + location + span (%) 65.37 59.14 58.43 69.06 65.72 58.22 57.11 63.75 61.27 55.49 54.
  • POS + span (%) 69.93 62.92 61.15 71.68 70.93 65.13 61.31 67.19 67.62 63.37 59.
  • Word + POS (%) 76.72 71.35 66.00 73.84 74.67 73.66 71.04 70.76 72.12 72.21 71.
  • Word + POS + location (%) 73.52 64.61 62.25 71.93 72.50 69.20 63.57 68.08 69.13 67.89 62.
  • Word + POS + location + span (%) 65.40 58.09 57.62 67.91 64.58 56.72 55.55 62.92 59.78 53.95 52.
  • Word + POS + span (%) 69.66 61.49 59.98 69.87 69.69 62.84 59.08 65.67 65.33 60.50 56.
  • Items on Queue DecisionStump
  • Items on Stack
  • Word (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
  • Word + location (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
  • Word + location + span (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
  • Word + span (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
  • POS (%) 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.
  • POS + location (%) 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.
  • POS + location + span (%) 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.
  • POS + span (%) 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.58 60.
  • Word + POS (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
  • Word + POS + location (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
  • Word + POS + location + span (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.
  • Word + POS + span (%) 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.77 63.

a training feature is lower than when using POS tags and their locations as

training features. Using only POS tags as a training feature yielded 86.04%

classification accuracy while using POS tags and locations as training features

yielded 86.96% accuracy for classification.

However, although using only POS tags as a training feature yielded lower

classification accuracy than using POS tags and location as training features,

when we look at parsing accuracy the reverse is true. The second and fourth rows

of Table 2 show that using only POS tags as a training feature yielded higher

parsing accuracy than using POS tags and location as training features. Using

only POS tags as a training feature for data classification with the J48 classifier

improved the parsing accuracy by 4.2% for unlabelled attachment score, 4% for

labelled attachment score and 0.3% for labelled accuracy. Additionally, using

only POS tags as a training feature for the same classifier makes the parser 44%

quicker than using POS tags and location as training features.

Furthermore, using a smaller window size for the queue and the stack

improved the performance of our parser. Training our parser with the same

classifier (the J48 classifier) and with the same training feature (POS tags only)

but with a smaller window size (four items on the queue and three items on

the stack) improved the parsing performance, 0.04% for unlabelled attachment

score, 0.5% for labelled attachment score and 5.8% for its speed, over using a

large window size (four items on the queue and four items on the stack).

The reason for the improved parsing speed when training the parser on

smaller sets of feature is that the question set that the parser searches through

is smaller and hence it will find an answer more quickly compared to situations

where there are several features for searching through. The reason for the

improved parsing accuracy when training on smaller set of features is likely

because the parser learns consistently from a smaller set of feature than from

several features where the training is not large enough to learn more effectively.

7 A Comparison with the State-of-the-

Art Parser

In this section we compare our parser with MaltParser, as shown in Ta-

ble 3, where we have conducted a 5-fold cross validation on both parsers. We

have trained our parser using the J48 classifier with POS tags as features and

a window size of four items on the queue and three items on the stack. We

can note that our parser is 40.3% more efficient than MaltParser. Although

the unlabelled attachment accuracy of our parser is slightly lower than that of

MaltParser (0.7%), the labelled attachment score and the labelled score is more

accurate than MaltParser by 1% and 1.4% respectively. We believe that this

improved accuracy of labelled attachment score and labelled score is because our

parser have information about intermediate items between parent and daugh-

ter, which are collected during training (see Section 5 for more details) where

such information is not available to MaltParser. MaltParser learns models that

contain information about parent and daughter relations and their labels dur-

ing training phase where the information about the intermediate items between

parents and daughters that we use is not recorded.

Table 3: Parser performance of MaltParser and NDParser.

Parser UAS(%) LAS(%) LA(%) Speed

MaltParser 75.2 70.0 92.2 0.

NDParser 74.5 71.0 93.6 0.

The results for MaltParser in Table 3 were obtained by training and testing

it on the dependency trees that we extracted from the PATB. The structure

of these trees depends on the head-percolation table (Magerman, 1995) that is

used during the conversion process. It is likely that this underlies the differences

between our results from MaltParser and the results published by Nivre (2008)

(77.76% for UAS, 65.79% for LAS, and 79.30% for LA), where Niver’s results

for UAS are slightly better than the ones we obtained in this study, for LAS

and LA they are slightly worse.

8 Conclusion

In this paper we have presented a simple approach for evaluating and gen-

erating parse models from various machine learning classifiers. We have shown

that generating a parse model, which is used for guiding data-driven parsers,

from different classifiers affects parsing performance in different ways. We have

discovered that generating a parse model using a small set of features and set-

tings improves parsing accuracy and speed compared with using large features

and settings. We have presented a basic shift-reduce parser, which is based on

the arc-standard algorithm of MaltParser, and we have evaluated our parser

with a various parse models that were generated from different classification

algorithms, features and settings.

References

Chang, C.-c. and Lin, C.-J. (2001), ‘Libsvm: A Library for Support Vector

Machines’.

Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P. and Written,

I. H. (2009), ‘The weka data mining software: An update’, SIGKDD Explo-

rations 11 (1).