Probabilistic Language Modeling: N-grams, Smoothing, and Evaluation, Lecture notes of Natural Language Processing (NLP)

Sessions 9-12 of the Natural Language Processing lecture course by Derrick Higgins

Typology: Lecture notes

2018/2019

Uploaded on 11/18/2019

Language Models
CS-585
Natural Language Processing
Derrick Higgins

Probabilistic Language Models

  • Today’s goal: assign a probability to a sentence
    • Machine Translation:
      » P(“high winds tonite”) > P(“large winds tonite”)
    • Spell Correction
      » The office is about fifteen minuets from my house
      » P(about fifteen minutes from) > P(about fifteen minuets from)
    • Speech Recognition
      » P(“I saw a van”) >> P(“eyes awe of an”)
    • Summarization, question-answering, etc., etc.

How to compute P(W)

  • How to compute this joint probability:
    • P(its, water, is, so, transparent, that)
  • Intuition: let’s rely on the Chain Rule of Probability

Reminder: The Chain Rule

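  • Recall the definition of conditional probabilities:

$P(B \mid A) = \frac{P(A, B)}{P(A)}$

Rewriting:

$P(A, B) = P(A)\,P(B \mid A)$

  • More variables:

$P(A, B, C, D) = P(A)\,P(B \mid A)\,P(C \mid A, B)\,P(D \mid A, B, C)$

  • The Chain Rule in General:

$P(x_1, x_2, x_3, \ldots, x_n) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1, x_2) \cdots P(x_n \mid x_1, \ldots, x_{n-1})$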
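As a worked instance, the general chain rule can be applied to the first five words of the example sentence from the previous slide. Each word is conditioned on everything that precedes it:

```latex
\begin{align*}
P(\text{its water is so transparent})
  ={}& P(\text{its}) \\
     & \times P(\text{water} \mid \text{its}) \\
     & \times P(\text{is} \mid \text{its water}) \\
     & \times P(\text{so} \mid \text{its water is}) \\
     & \times P(\text{transparent} \mid \text{its water is so})
\end{align*}
```

Note that the conditioning history grows by one word with every factor, which is what makes the individual terms hard to estimate.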

How to estimate these probabilities

  • Could we just count and divide?

$P(\text{the} \mid \text{its water is so transparent that}) = \frac{\mathrm{Count}(\text{its water is so transparent that the})}{\mathrm{Count}(\text{its water is so transparent that})}$

  • No! Too many possible sentences!
  • We’ll never see enough data for estimating these
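The sparsity problem can be made concrete with a minimal sketch. The three-sentence `corpus` and the helper names `ngram_counts` and `count_and_divide` are hypothetical, chosen only for illustration; a real corpus would be far larger but would show the same effect for long histories.

```python
from collections import Counter

# Hypothetical toy corpus (an assumption for illustration, not course data).
corpus = [
    "its water is so transparent that the fish are visible",
    "the water is so cold that the lake froze",
    "its water is so clear",
]

def ngram_counts(sentences, n):
    """Count every n-word sequence in the whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

def count_and_divide(history, word, sentences):
    """Estimate P(word | history) as Count(history + word) / Count(history)."""
    h = tuple(history.split())
    numerator = ngram_counts(sentences, len(h) + 1)[h + (word,)]
    denominator = ngram_counts(sentences, len(h))[h]
    # None signals that the history was never observed, so no estimate exists.
    return numerator / denominator if denominator else None

long_estimate = count_and_divide("its water is so transparent that", "the", corpus)
unseen_estimate = count_and_divide("its water is so cold that", "the", corpus)
```

Here `long_estimate` rests on a single occurrence of the six-word history, and `unseen_estimate` is `None` because that exact history never appears, even though a very similar one does.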

Markov Assumption

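  • Simplifying assumption:

$P(\text{the} \mid \text{its water is so transparent that}) \approx P(\text{the} \mid \text{that})$

  • Or maybe:

$P(\text{the} \mid \text{its water is so transparent that}) \approx P(\text{the} \mid \text{transparent that})$

The assumption is named after Andrei Markov.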
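In code, the Markov assumption amounts to truncating the history before looking anything up. The sketch below assumes whitespace tokenization; the function name `markov_history` and the parameter `k` (number of history words kept) are hypothetical.

```python
def markov_history(history, k):
    """Keep only the last k words of the history.

    k = 1 gives the context used by a bigram model,
    k = 2 the context used by a trigram model.
    """
    words = history.split()
    # Guard k == 0 explicitly: words[-0:] would return the whole list.
    return tuple(words[-k:]) if k > 0 else ()

history = "its water is so transparent that"
bigram_context = markov_history(history, 1)
trigram_context = markov_history(history, 2)
```

With this history, `bigram_context` is `('that',)` and `trigram_context` is `('transparent', 'that')`, matching the two approximations above.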

Simplest case: Unigram model

$P(w_1 w_2 \ldots w_n) \approx \prod_i P(w_i)$

Some automatically generated sentences from a unigram model:

fifth, an, of, futures, the, an, incorporated, a, a, the, inflation, most, dollars, quarter, in, is, mass

thrift, did, eighty, said, hard, 'm, july, bullish

that, or, limited, the
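A minimal sketch of a unigram model follows, assuming relative-frequency estimates from a hypothetical nine-token toy corpus. The names `unigram_sentence_prob` and `sample_sentence` are illustrative, not from the lecture. It requires Python 3.8 or later for `math.prod`.

```python
import math
import random
from collections import Counter

# Hypothetical toy corpus, whitespace-tokenized (an assumption for illustration).
tokens = "the cat sat on the mat the dog sat".split()

counts = Counter(tokens)
total = sum(counts.values())
unigram = {w: c / total for w, c in counts.items()}  # relative frequencies

def unigram_sentence_prob(words):
    """Approximate P(w_1 ... w_n) as the product of the individual P(w_i)."""
    return math.prod(unigram.get(w, 0.0) for w in words)

def sample_sentence(length, seed=0):
    """Draw each word independently of all the others."""
    rng = random.Random(seed)
    vocab = list(unigram)
    weights = [unigram[w] for w in vocab]
    return rng.choices(vocab, weights=weights, k=length)

p_a = unigram_sentence_prob(["the", "cat", "sat"])
p_b = unigram_sentence_prob(["sat", "cat", "the"])
```

Up to floating-point rounding, `p_a` and `p_b` are equal: a unigram model ignores word order entirely, which is why its generated sentences read as word salad.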

Bigram model

  • Condition on the previous word:

$P(w_i \mid w_1 w_2 \ldots w_{i-1}) \approx P(w_i \mid w_{i-1})$

Some automatically generated sentences from a bigram model:

texaco, rose, one, in, this, issue, is, pursuing, growth, in, a, boiler, house, said, mr., gurria, mexico, 's, motion, control, proposal, without, permission, from, five, hundred, fifty, five, yen

outside, new, car, parking, lot, of, the, agreement, reached

this, would, be, a, record, november
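Sentences like these can be produced by repeatedly sampling the next word given only the previous one. The sketch below assumes a hypothetical three-sentence corpus with `<s>` and `</s>` as sentence-boundary markers; the names `successors` and `generate` are illustrative.

```python
import random
from collections import Counter, defaultdict

# Hypothetical toy corpus; <s> and </s> mark sentence boundaries.
sentences = [
    "<s> i want chinese food </s>",
    "<s> i want to eat </s>",
    "<s> tell me about chinese food </s>",
]

# successors[prev] counts the words observed directly after prev.
successors = defaultdict(Counter)
for sentence in sentences:
    words = sentence.split()
    for prev, word in zip(words, words[1:]):
        successors[prev][word] += 1

def generate(seed=0, max_len=20):
    """Sample each word conditioned only on the previous word."""
    rng = random.Random(seed)
    prev, output = "<s>", []
    for _ in range(max_len):
        options = successors[prev]
        # Weight each candidate by how often it followed prev in the corpus.
        word = rng.choices(list(options), weights=list(options.values()))[0]
        if word == "</s>":
            break
        output.append(word)
        prev = word
    return output

sample = generate()
```

Every adjacent pair in the output is a bigram seen in training, so the result is locally plausible, but nothing constrains the sequence beyond one word of context.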

ESTIMATING N-GRAM PROBABILITIES

Estimating bigram probabilities

  • The Maximum Likelihood Estimate

$P(w_i \mid w_{i-1}) = \frac{\mathrm{count}(w_{i-1}, w_i)}{\mathrm{count}(w_{i-1})}$

or, abbreviating count as c:

$P(w_i \mid w_{i-1}) = \frac{c(w_{i-1}, w_i)}{c(w_{i-1})}$
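The maximum likelihood estimate can be sketched directly from the formula. The three-sentence corpus below is a hypothetical toy example, and the function name `p_mle` is illustrative; `<s>` and `</s>` are assumed sentence-boundary markers.

```python
from collections import Counter

# Hypothetical three-sentence toy corpus; <s> and </s> mark sentence boundaries.
corpus = [
    "<s> I am Sam </s>",
    "<s> Sam I am </s>",
    "<s> I do not like green eggs and ham </s>",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    words = sentence.split()
    unigram_counts.update(words)
    bigram_counts.update(zip(words, words[1:]))

def p_mle(word, prev):
    """P(word | prev) = c(prev, word) / c(prev).

    Raises ZeroDivisionError if prev never occurs in the corpus.
    """
    return bigram_counts[(prev, word)] / unigram_counts[prev]

p_i_given_start = p_mle("I", "<s>")     # c(<s>, I) = 2, c(<s>) = 3
p_end_given_sam = p_mle("</s>", "Sam")  # c(Sam, </s>) = 1, c(Sam) = 2
```

For a fixed previous word, the estimates over all following words sum to one: here "I" is followed by "am" twice and "do" once, giving 2/3 and 1/3.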

More examples: Berkeley Restaurant Project sentences

  • can you tell me about any good cantonese restaurants close by
  • mid priced thai food is what i’m looking for
  • tell me about chez panisse
  • can you give me a listing of the kinds of food that are available
  • i’m looking for a good place to eat breakfast
  • when is caffe venezia open during the day

Raw bigram counts

  • Out of 9222 sentences

Bigram estimates of sentence probabilities

P(<s> I want english food </s>) =

P(I | <s>)

× P(want | I)

× P(english | want)

× P(food | english)

× P(</s> | food)

=.
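The same product can be sketched in code. Because the Berkeley Restaurant Project counts are not reproduced here, the sketch trains on a hypothetical three-sentence toy corpus instead; the function name `sentence_prob` and the corpus are assumptions for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy corpus (not the Berkeley Restaurant Project data).
corpus = [
    "I am Sam",
    "Sam I am",
    "I do not like green eggs and ham",
]

unigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    words = ["<s>"] + sentence.split() + ["</s>"]  # add boundary markers
    unigram_counts.update(words)
    bigram_counts.update(zip(words, words[1:]))

def sentence_prob(sentence):
    """Multiply the MLE bigram probabilities along the padded sentence."""
    words = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(words, words[1:]):
        if unigram_counts[prev] == 0:
            return 0.0  # previous word never seen: no estimate available
        prob *= bigram_counts[(prev, word)] / unigram_counts[prev]
    return prob

p_seen = sentence_prob("I am Sam")    # (2/3) * (2/3) * (1/2) * (1/2)
p_unseen = sentence_prob("Sam am I")  # contains the unseen bigram (Sam, am)
```

A single unseen bigram drives the whole product to zero, which is one reason unsmoothed maximum likelihood estimates are fragile on new text.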

What kinds of knowledge?

  • P(english | want) =.
  • P(chinese | want) =.
  • P(to | want) =.
  • P(eat | to) =.
  • P(food | to) =
  • P(want | spend) =
  • P(i | <s>) =.