Machine Learning Feature Selection: Info Gain, Rel. Entropy, Markov Blanket, Slides of Fundamentals of E-Commerce

Various feature selection techniques used in machine learning, including information gain, average relative entropy, markov blanket, and approximate markology blanket. Information gain measures the information provided by each term about the class, while average relative entropy determines the relevance of a set of features using the average relative entropy. Markov blanket identifies features that make a term conditionally independent of all other features, and approximate markov blanket uses correlation factors and average cross entropy to select the most relevant features. The document also covers measures of performance, such as the confusion matrix, precision, recall, and precision-recall curve.

Typology: Slides

2012/2013

Uploaded on 07/29/2013

masti
masti 🇮🇳

4.5

(10)

121 documents

1 / 10

Toggle sidebar

This page cannot be seen from the preview

Don't miss anything!

bg1
Information Gain Method
Information Gain, G – Measure of information about the class that is
provided by the observation of each term
Also defined as
mutual information l(C, Wj) between the class C and the term Wj
For feature selection
compute the information gain for each unique term
remove terms whose information gain is less than some predefined threshold
Limitations
relevance assessment of each term is done separately
effect of term co-occurrences is not considered
= =
=
K
cj
j
jj
jPcP
cP
cPWG
1
1
0)()(
),(
log),()(
ω
ω
ω
ω
Docsity.com
pf3
pf4
pf5
pf8
pf9
pfa

Partial preview of the text

Download Machine Learning Feature Selection: Info Gain, Rel. Entropy, Markov Blanket and more Slides Fundamentals of E-Commerce in PDF only on Docsity!

Information Gain Method • Information Gain,^ G^ – Measure of information about the class that isprovided by the observation of each term • Also defined as – mutual information

l(C, Wj) between the class

C^ and the term

Wj

-^ For feature selection^ –^ compute the information gain for each unique term^ –^ remove terms whose information gain is less than some predefined threshold •^ Limitations^ –^ relevance assessment of each term is done separately^ –^ effect of term co-occurrences is not considered

K = ∑ ∑=^ = c^

j j j j^ j^

cPc PcP 1 PWG 10

),( log) )()( ,() (^ ω

ω ω^ ω

Average Relative Entropy Method • Whole sets of features are tested forrelevance about the class (Koller and Sahami,1996) • For feature selection^ –^ determine relevance of a selected set using theaverage relative entropy

Markov Blanket Method • M is a Markov Blanket for term

Wj

-^ If^ Wj^ is conditionally independent of all features in

V –

M - {Wj}, given

M^ ⊂^ V,^ Wj

∉M

-^ class^ C^ is conditionally independent of

Wj, given^ M

-^ Feature selection is performed by^ –^ removing features for which the Markov blanket isfound

Approximate Markov Blanket • For each term

Wj^ in^ G,

-^ compute the co-relation factor of

Wj^ with^ Wi

-^ obtain a set

M^ of^ k^ terms, that have highest co- relation with

Wj

-^ find the average cross entropy

∆(Wj, Mj)

-^ select the term for which the average relativeentropy is minimum • Repeat steps until a predefined number ofterms are eliminated from the set

G

Confusion Matrix

-^ TN - irrelevant values not retrieved •^ TP - relevant values retrieved •^ FP - irrelevant values retrieved •^ FN - relevant values not retrieved PredictedActual CategoryCategory-^ •^ Total retrieved terms = TP + FP •^ Total relevant terms = TP + FN

-^ TN

FN +^ FP

TP

Measures of Performance • For balanced domains – accuracy characterizes performance

A = (TP+TN) / |D|

-^ classification error,

E = 1 - A

-^ For unbalanced domain^ –^ precision and recall characterize performance

TP = π^ FPTP^ +

TP = ρ FNTP^ +

Precision-Recall Averages • Microaveraging^ ∑ • Macroaveraging

  • k TP ∑ c μ= c^1 = π k + FPTP ( cc ) = c
  • k TP ∑ c μ= c^1 = ρ k + FNTP )( ∑ cc = c
    • K 1 M = ππ ∑ c K = c
      • K 1 M = ρρ∑ c K = c