Machine Learning for Natural Language Processing: Kernel Functions and Tree Kernels, Study Guides, Projects, Research of Linguistics

San Diego State University (SDSU), Linguistics

An overview of machine learning techniques, specifically kernel functions and tree kernels, for natural language processing. Topics include the use of timbl for classification, string and tree kernels, and the benefits of kernel methods in linguistics. The document also discusses the challenges of floating point arithmetic and the importance of designing appropriate kernel functions for linguistic problems.

Typology: Study Guides, Projects, Research

Pre 2010

Uploaded on 03/28/2010

koofers-user-37i 🇺🇸


Project

• Presentations next week: Hannah, Lara, (Guoyan), Lucien, (John), Rebecca

• Final project due: Wednesday, May 13 @ 5:00pm

• Turn in a paper explaining what you did, how well it worked, why it worked as well as it did but not better, etc., plus any programs you wrote

• The project grade will be based on the paper, which should look like a conference paper (∼8 pages, references)


Project

• Are chunks within the same clause part of the same argument?

• Feature vector:

    NP,NP,Britain,NNP,’s,POS,yes.
    NP,VP,industry,NN,is,VBZ,no.

• Use Timbl for classification

• Best result so far: 81.27% accuracy

• Largest source of error is PPs
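To make the setup concrete, here is a minimal sketch of memory-based (nearest-neighbour) classification over feature vectors in the comma-separated format shown above. It is not Timbl's own interface: the function names, the plain overlap distance, the single-neighbour rule, and the query instance are all assumptions chosen for illustration.

```python
# Toy memory-based classifier over comma-separated instances.
# Hypothetical helper names; this is not the Timbl API.

def parse_instance(line):
    """Split 'f1,f2,...,label.' into (features, label)."""
    *features, label = line.strip().rstrip(".").split(",")
    return features, label

def overlap_distance(a, b):
    """Count the feature positions where two instances disagree."""
    return sum(x != y for x, y in zip(a, b))

def classify(train, features):
    """Return the label of the stored instance closest to `features`."""
    _, nearest_label = min(
        train, key=lambda inst: overlap_distance(inst[0], features)
    )
    return nearest_label

# The two example instances from the slide.
TRAIN = [
    parse_instance("NP,NP,Britain,NNP,’s,POS,yes."),
    parse_instance("NP,VP,industry,NN,is,VBZ,no."),
]

# An invented query instance that differs from the second
# training instance in one feature only.
query = ["NP", "VP", "economy", "NN", "is", "VBZ"]
prediction = classify(TRAIN, query)
print(prediction)
```

With only two stored instances this is a toy; a real experiment would store the whole training set and would probably weight features rather than count mismatches equally.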

Kernel functions

• Much of the power of SVMs comes from the use of kernel functions and derived feature spaces

• Linear kernels allow efficient processing of the very large feature vectors that come with a bag of words model

• Polynomial kernels capture dependencies between features

• Special purpose kernels reflect the structure of a particular problem

• Combinations of kernels are also kernels
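A small sketch may help show what "derived feature space" means here. Under the assumption of a homogeneous degree-2 polynomial kernel (the slides do not say which polynomial kernel was used), the kernel value equals an ordinary dot product over all pairwise feature products, which is how it captures dependencies between features. All names below are illustrative.

```python
from itertools import product

def linear_kernel(x, z):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, z))

def poly_kernel(x, z, degree=2):
    """Homogeneous polynomial kernel (x . z) ** degree."""
    return linear_kernel(x, z) ** degree

def pairwise_features(x):
    """Explicit degree-2 feature map: every ordered product x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]

def sum_kernel(k1, k2):
    """The sum of two kernels, which is again a kernel."""
    return lambda x, z: k1(x, z) + k2(x, z)

x = [1, 0, 2]
z = [3, 1, 1]

implicit = poly_kernel(x, z)
explicit = linear_kernel(pairwise_features(x), pairwise_features(z))
combined = sum_kernel(linear_kernel, poly_kernel)(x, z)
print(implicit, explicit, combined)
```

The implicit computation touches 3 numbers per vector and the explicit one touches 9; the gap grows quadratically with vocabulary size, which is the practical reason for working with the kernel directly.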


String kernels

• String subsequence kernels (SSKs) represent a string as a bag of (possibly discontinuous) n-grams

• The feature set is very large, but dot products can be computed efficiently (dynamic programming and suffix trees)

• For text classification, SSKs give a small improvement over n-gram kernels for small training sets
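As a rough illustration of what the feature space looks like, the sketch below enumerates the discontinuous n-grams of a short string explicitly and takes the dot product. Weighting each occurrence by a decay factor raised to the length of the span it covers is an assumption taken from the usual formulation of the subsequence kernel, not something stated on the slide. The enumeration is exponential in n, so this is only usable on tiny inputs; the dynamic programming version avoids building the features at all.

```python
from collections import defaultdict
from itertools import combinations

def subsequence_features(s, n, decay):
    """Map each length-n subsequence of s to its total decayed weight."""
    feats = defaultdict(float)
    for idx in combinations(range(len(s)), n):
        span = idx[-1] - idx[0] + 1          # stretch of s the match covers
        feats["".join(s[i] for i in idx)] += decay ** span
    return feats

def subsequence_kernel(s, t, n=2, decay=0.5):
    """Dot product of the two explicit subsequence feature vectors."""
    fs = subsequence_features(s, n, decay)
    ft = subsequence_features(t, n, decay)
    return sum(weight * ft[u] for u, weight in fs.items() if u in ft)

k_cat_car = subsequence_kernel("cat", "car")
k_cat_cat = subsequence_kernel("cat", "cat")
print(k_cat_car, k_cat_cat)
```

For "cat" and "car" with n = 2, the only shared subsequence is "ca", so the kernel value is the square of its weight.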

Tree kernels

• We can use similar tricks to compare trees by comparing common subtrees

• Given trees $T_1$ and $T_2$, with nodes $N_1$ and $N_2$, define:

$$I_i(n) = \begin{cases} 1 & \text{if subtree } i \text{ is rooted at } n \\ 0 & \text{otherwise} \end{cases}$$

• The kernel function is:

$$K(T_1, T_2) = h(T_1) \cdot h(T_2)$$

where $h_i(T_1) = \sum_{n_1 \in N_1} I_i(n_1)$, or the number of times subtree $i$ occurs in tree $T_1$

Tree kernels

• The feature vector $h(T_1)$ will have as many dimensions as there are possible subtrees (which will be astronomical)

• But the dot product $h(T_1) \cdot h(T_2)$ can only depend on dimensions for subtrees which occur in both $T_1$ and $T_2$

• Let $C(n_1, n_2)$ be the number of common subtrees rooted at $n_1$ and $n_2$

• The kernel function is:

$$K(T_1, T_2) = h(T_1) \cdot h(T_2) = \sum_{n_1 \in N_1} \sum_{n_2 \in N_2} \sum_i I_i(n_1)\, I_i(n_2) = \sum_{n_1 \in N_1} \sum_{n_2 \in N_2} C(n_1, n_2)$$

Tree kernels

• We can efficiently compute $C(n_1, n_2)$ by recursion

• If the rules applied at $n_1$ and $n_2$ are different, then there are no common subtrees and $C(n_1, n_2) = 0$

• If the rules are the same and $n_1$ and $n_2$ are preterminals, then $C(n_1, n_2) = 1$

• Otherwise:

$$C(n_1, n_2) = \prod_{i=1}^{nc(n_1)} \bigl(1 + C(ch(n_1, i), ch(n_2, i))\bigr)$$

where $nc(n_1)$ is the number of children of $n_1$ and $ch(n, i)$ is the $i$-th child of $n$

• Worst case, $K$ can be computed in $O(|N_1| |N_2|)$ time, but in practice $C(n_1, n_2) = 0$ for most $n_1, n_2$ and the computation is much cheaper
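The recursion can be sketched in a few lines. The tree encoding below (nested tuples whose first element is the label, with words as plain strings) and all function names are assumptions made for illustration, and the sketch assumes every node has either only word children or only node children. It uses the usual form of the recursive case, in which each child contributes a factor of one plus its own count.

```python
def production(node):
    """The rule applied at a node: its label plus the labels of its children."""
    label, *children = node
    return (label, tuple(c if isinstance(c, str) else c[0] for c in children))

def is_preterminal(node):
    """True when every child of the node is a word."""
    return all(isinstance(c, str) for c in node[1:])

def nodes(tree):
    """Yield every non-word node of the tree."""
    yield tree
    for child in tree[1:]:
        if not isinstance(child, str):
            yield from nodes(child)

def common_subtrees(n1, n2):
    """C(n1, n2): the number of common subtrees rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0
    if is_preterminal(n1):
        return 1
    result = 1
    for c1, c2 in zip(n1[1:], n2[1:]):
        result *= 1 + common_subtrees(c1, c2)
    return result

def tree_kernel(t1, t2):
    """K(T1, T2): sum of C(n1, n2) over all pairs of nodes."""
    return sum(common_subtrees(a, b) for a in nodes(t1) for b in nodes(t2))

t1 = ("S", ("NP", ("D", "the"), ("N", "dog")), ("VP", ("V", "barks")))
t2 = ("S", ("NP", ("D", "the"), ("N", "cat")), ("VP", ("V", "barks")))
print(tree_kernel(t1, t1), tree_kernel(t1, t2))
```

The double loop in `tree_kernel` is the worst-case quadratic cost; most node pairs fail the first test in `common_subtrees` immediately, which is why the computation is cheap in practice. Caching `common_subtrees` results would avoid repeated work on larger trees.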

Floating point arithmetic

• Digital computers can’t represent most real numbers exactly:

    bulba% python
    Python 2.2.1 (#1, Aug 30 2002, 12:15:30)
    [GCC 3.2 20020822 (Red Hat Linux Rawhide 3.2-4)] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 3.3
    3.2999999999999998
    >>>

• Financial calculations use integers

• Scientific calculations use approximations, which vary in their accuracy

• Standard for floating point calculations: IEEE 754
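The session above is from Python 2.2. Current Python versions print the shortest string that reads back as the same double, so the same value displays as 3.3, but the stored number has not changed. The sketch below, with illustrative variable names and an invented price, shows the stored value and the integer approach to money.

```python
from decimal import Decimal

# The double nearest to 3.3, shown with 17 significant digits.
seventeen_digits = format(3.3, ".17g")

# Decimal(float) converts the stored binary value exactly,
# so it differs from the decimal literal 3.3.
stored = Decimal(3.3)
is_exact = stored == Decimal("3.3")

# Money as integer cents: every step is exact.
price_cents = 1999
total_cents = 3 * price_cents

# The same idea in floating point picks up rounding error.
float_sum_matches = (0.1 + 0.2 == 0.3)

print(seventeen_digits, is_exact, total_cents, float_sum_matches)
```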

Floating point arithmetic

• Floating point numbers are stored as a mantissa and an exponent

• IEEE floating point formats:

    precision   min               max              eps               digits
    single      1.18 × 10^-38     3.40 × 10^38     1.19 × 10^-7      ≈ 7
    double      2.23 × 10^-308    1.80 × 10^308    2.22 × 10^-16     ≈ 16

• Just because you can represent 10^300 doesn’t mean you get 300 significant digits!

• Default in Python and Perl is double precision

• Don’t use single precision (float) unless you have a good reason
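The limits for double precision can be read from Python's `sys.float_info`, and single precision can be imitated by round-tripping a value through a 4-byte `struct` field. The helper name `to_single` and the test values are assumptions for illustration.

```python
import struct
import sys

info = sys.float_info
print(info.min, info.max, info.epsilon)

def to_single(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

# eps is the gap between 1.0 and the next double above it.
next_above_one = 1.0 + info.epsilon
half_step = 1.0 + info.epsilon / 2     # rounds back down to 1.0

# A huge number still carries only about 16 significant digits:
# adding something 20 orders of magnitude smaller changes nothing.
big = 1e300
absorbed = (big + 1e280 == big)

# Single precision keeps far fewer digits of 0.1 than double does.
single_error = abs(to_single(0.1) - 0.1)
print(next_above_one > 1.0, half_step == 1.0, absorbed, single_error)
```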

Floating point arithmetic

• It’s easy to lose precision

• Things to watch out for:

    – subtractions of numbers that are nearly equal

    – additions of numbers whose magnitudes are nearly equal, but whose signs are opposite

    – additions and subtractions of numbers that differ greatly in magnitude

• Exact comparisons between floating point numbers can be misleading

• The same operations performed in a different order or on different hardware may give different results
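Each of these pitfalls can be reproduced in a few lines of Python. The particular numbers below are chosen for illustration; `math.isclose` and `math.fsum` are standard-library helpers for tolerant comparison and accurate summation.

```python
import math

# Numbers that differ greatly in magnitude: the 1.0 is absorbed,
# and the later subtraction cannot bring it back.
lost = (1e16 + 1.0) - 1e16

# Exact comparison is misleading; compare with a tolerance instead.
exact_equal = (0.1 + 0.2 == 0.3)
close_enough = math.isclose(0.1 + 0.2, 0.3)

# The same additions in a different order give different results.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Naive summation accumulates rounding error; fsum tracks it.
naive_total = sum([0.1] * 10)
careful_total = math.fsum([0.1] * 10)

print(lost, exact_equal, close_enough)
print(left_to_right == right_to_left, naive_total == 1.0, careful_total == 1.0)
```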

A look back

• We’ve come a long way, from flipping coins to Support Vector Machines

• Non-parametric methods:

    – decision trees

    – instance-based learning

    – transformation-based learning

    – perceptron

    – support vector machines

• Parametric methods:

    – naive Bayes

    – maximum entropy

A look back

• One theme that runs through machine learning research is the way we characterize generalization:

    – curse of dimensionality

    – bias vs. variance

    – overtraining

    – simplicity

    – capacity

A look ahead

• Some current directions in machine learning for NLP:

    – getting at ‘deep’ structures

    – task-specific representations (remember, there’s no free lunch!)

    – scaling methods to deal with huge datasets

• Data mining uses machine learning to find patterns in unstructured data collections...

• ... which we’ll be looking at in more detail in the fall
