Association Rules - Database Design - Lecture Slides, Slides of Database Management Systems (DBMS)

These lecture slides help build a concept of the foundations of computers and Database Design. The key points in these slides are: Association Rules, Data Mining, Apriori Algorithm, Knowledge-Discovery in Databases, Searching Large Volumes, Extraction of Implicit, Information Visualization, Neural Networks, Data Mining Techniques

Typology: Slides

2012/2013

Uploaded on 04/27/2013

arunima

DATA MINING
-Association Rules-

Outline

1. Data Mining (DM) ~ KDD [Definition]

2. DM Technique

-> Association rules [support & confidence]

3. Example

(4. Apriori Algorithm)

1. Data Mining ~ KDD [Definition]

Data Mining techniques

  • Information Visualization
  • k-nearest neighbor
  • decision trees
  • neural networks
  • association rules

2. Association rules

Support

Every association rule has a support and a confidence. “The support is the percentage of transactions that demonstrate the rule,” i.e. the transactions that contain all of the rule's items; it can also be given as an absolute count.

Example: Database with transactions (customer_# : item_a1, item_a2, …)

  1: 1, 3, 5
  2: 1, 8, 14, 17, 12
  3: 4, 6, 8, 12, 9, 104
  4: 2, 1, 8

  • support({8, 12}) = 2 (or 50%: 2 of 4 customers)
  • support({1, 5}) = 1 (or 25%: 1 of 4 customers)
  • support({1}) = 3 (or 75%: 3 of 4 customers)
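As a rough illustration, the support counts for the four customer transactions can be reproduced in a few lines of Python. This is a minimal sketch, assuming each transaction is represented as a Python set of item numbers; `support_count` and `support_fraction` are hypothetical helper names, not part of the slides.

```python
# Hypothetical helpers for this sketch; transactions are Python sets of items.
def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def support_fraction(itemset, transactions):
    """Support as a share of all transactions (0.0 to 1.0)."""
    return support_count(itemset, transactions) / len(transactions)


# The four customer transactions from the slide (customer 1 first).
transactions = [
    {1, 3, 5},
    {1, 8, 14, 17, 12},
    {4, 6, 8, 12, 9, 104},
    {2, 1, 8},
]

for itemset in ({8, 12}, {1, 5}, {1}):
    count = support_count(itemset, transactions)
    share = support_fraction(itemset, transactions)
    print(sorted(itemset), count, f"{share:.0%}")
```

The loop prints the same three values as the slide: 2 (50%), 1 (25%) and 3 (75%).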

2. Association rules

Confidence

Every association rule has a support and a confidence. An association rule is of the form: X => Y

  • X => Y: if someone buys X, they also buy Y

The confidence is the conditional probability that Y is present in a transaction, given that X is present in it.

Confidence measure, by definition:

confidence(X => Y) = support(X ∪ Y) / support(X)
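The definition translates directly into code. The sketch below is only an illustration, assuming transactions are Python sets and reusing the four-customer database from the Support slide; the function names are hypothetical. It raises an error when X never occurs, since the conditional probability is then undefined.

```python
def support_count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items <= set(t))


def confidence(x, y, transactions):
    """conf(X => Y) = support(X ∪ Y) / support(X)."""
    supp_x = support_count(x, transactions)
    if supp_x == 0:
        # X never occurs, so the conditional probability is undefined.
        raise ValueError("support(X) is 0; confidence is undefined")
    return support_count(set(x) | set(y), transactions) / supp_x


transactions = [{1, 3, 5}, {1, 8, 14, 17, 12}, {4, 6, 8, 12, 9, 104}, {2, 1, 8}]
print(confidence({1}, {8}, transactions))   # 2 of the 3 buyers of item 1 also buy 8
print(confidence({12}, {8}, transactions))  # both buyers of item 12 also buy 8
```

Note that confidence is not symmetric: conf({1} => {8}) and conf({8} => {1}) use different denominators.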

2. Association rules

Confidence

We should only consider rules derived from itemsets with high support that also have high confidence.

“A rule with low confidence is not meaningful.”

Rules do not explain anything; they only point out hard facts in large volumes of data.
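Keeping only rules with both high support and high confidence is a simple filter. The following is a sketch under stated assumptions: the slides give no concrete thresholds, so `min_support = 0.5` and `min_confidence = 0.7` are chosen purely for illustration, the candidate rules are hand-picked from the four-customer database, and `strong_rules` is a hypothetical name.

```python
def count(itemset, transactions):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def strong_rules(rules, transactions, min_support, min_confidence):
    """Keep rules X => Y whose support and confidence both meet the thresholds."""
    n = len(transactions)
    kept = []
    for x, y in rules:
        x, y = frozenset(x), frozenset(y)
        supp_x = count(x, transactions)
        if supp_x == 0:
            continue  # confidence is undefined if X never occurs
        supp_xy = count(x | y, transactions)
        support, conf = supp_xy / n, supp_xy / supp_x
        if support >= min_support and conf >= min_confidence:
            kept.append((x, y, support, conf))
    return kept


transactions = [{1, 3, 5}, {1, 8, 14, 17, 12}, {4, 6, 8, 12, 9, 104}, {2, 1, 8}]
candidate_rules = [({1}, {8}), ({8}, {12}), ({12}, {8}), ({1}, {5})]

# ASSUMED thresholds: the slides do not fix concrete values.
for x, y, supp, conf in strong_rules(candidate_rules, transactions, 0.5, 0.7):
    print(sorted(x), "=>", sorted(y), f"support {supp:.0%}, confidence {conf:.0%}")
```

With these thresholds only {12} => {8} survives: {1} => {5} fails the support test, while {1} => {8} and {8} => {12} have enough support but only 67% confidence.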

3. Example

Example: Database with transactions (customer_# : item_a1, item_a2, …)

  1: 3, 5, 8
  2: 2, 6, 8
  3: 1, 4, 7, 10
  4: 3, 8, 10
  5: 2, 5, 8
  6: 1, 5, 6
  7: 4, 5, 6, 8
  8: 2, 3, 4
  9: 1, 5, 7, 8
  10: 3, 8, 9, 10

supp({5}) = 5, supp({8}) = 7, supp({5, 8}) = 4

Conf({5} => {8}) = 4/5 = 0.80, or 80%. Done.

Conf({8} => {5}) = 4/7 ≈ 0.57, or 57%.

3. Example


Conf({5} => {8}) = 80%. Done.

Conf({8} => {5}) = 57%. Done.

Rule {5} => {8} is more meaningful than rule {8} => {5}: both rules have the same support (supp({5, 8}) = 4 of 10), but the first has the higher confidence.

3. Example


Conf({9} => {3}) = 100%. Done.

Notice: high confidence, low support. Item 9 appears in only one transaction (customer 10), so supp({3, 9}) = 1, or 10%.

-> Rule {9} => {3} is not meaningful.
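The three rules from the Example slides can be checked side by side. This is a minimal sketch, assuming the ten transactions are stored as Python sets in customer order; `count` and `rule_stats` are hypothetical helper names.

```python
# The ten customer transactions from the Example slides (customer 1 first).
TRANSACTIONS = [
    {3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
    {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10},
]


def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in TRANSACTIONS if itemset <= t)


def rule_stats(x, y):
    """Return (support, confidence) of the rule X => Y as fractions."""
    both = count(x | y)
    return both / len(TRANSACTIONS), both / count(x)


for x, y in [({5}, {8}), ({8}, {5}), ({9}, {3})]:
    supp, conf = rule_stats(x, y)
    print(f"{sorted(x)} => {sorted(y)}: support {supp:.0%}, confidence {conf:.0%}")
```

This prints support 40% and confidence 80% for {5} => {8}, support 40% and confidence 57% for {8} => {5}, and support 10% with confidence 100% for {9} => {3}, which makes the "high confidence, low support" case visible at a glance.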

4. APRIORI ALGORITHM

Apriori is an efficient algorithm for finding association rules (or, more precisely, frequent itemsets). The Apriori technique is used for “generating large itemsets”: out of all frequent (large) k-itemsets, generate all candidate (k+1)-itemsets.
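The candidate-generation step can be sketched with the prefix-based join commonly used in Apriori: two frequent k-itemsets, kept as sorted tuples, are merged when they agree on their first k-1 items. The input below is an assumption for illustration only: it lists the frequent 2-itemsets of the 10-transaction Example database under an assumed minimum support count of 2 (the slides state no threshold), and `join_step` is a hypothetical name.

```python
from itertools import combinations


def join_step(frequent_k):
    """Candidate (k+1)-itemsets from frequent k-itemsets (sorted tuples).

    Two frequent k-itemsets are joined when they share their first k-1
    items; the result is their union, again as a sorted tuple.
    """
    frequent = sorted(set(frequent_k))
    candidates = []
    for a, b in combinations(frequent, 2):
        # combinations() keeps the sorted order, so a < b; with equal
        # prefixes this means a[-1] < b[-1] and the union stays sorted.
        if a[:-1] == b[:-1]:
            candidates.append(a + (b[-1],))
    return candidates


# Frequent 2-itemsets of the Example database, ASSUMING min support count 2.
frequent_2 = [(1, 5), (1, 7), (2, 8), (3, 8), (3, 10), (5, 6), (5, 8), (6, 8), (8, 10)]
print(join_step(frequent_2))
```

Here the join yields three candidate 3-itemsets: (1, 5, 7), (3, 8, 10) and (5, 6, 8). Joining only on a shared prefix avoids producing the same candidate several times.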

(Also: out of one k-itemset, we can produce 2^k - 2 rules X => Y, one for each non-empty proper subset X of the itemset, with Y the remaining items.)
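The 2^k - 2 count follows from choosing the left-hand side X as any subset other than the empty set and the whole itemset. A small sketch (the function name `rules_from_itemset` is hypothetical) enumerates these rules and confirms the count for the 4-itemset {3, 4, 5, 7} mentioned in the slides.

```python
from itertools import combinations


def rules_from_itemset(itemset):
    """All rules X => Y where X is a non-empty proper subset of the
    itemset and Y holds the remaining items."""
    items = sorted(itemset)
    rules = []
    for r in range(1, len(items)):  # size of X runs from 1 to k-1
        for lhs in combinations(items, r):
            rhs = tuple(i for i in items if i not in lhs)
            rules.append((lhs, rhs))
    return rules


rules = rules_from_itemset({3, 4, 5, 7})
print(len(rules), 2 ** 4 - 2)  # both counts agree
```

This prints `14 14`. Each generated rule still has to pass the confidence threshold; the enumeration only bounds how many rules one itemset can yield.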

4. APRIORI ALGORITHM

Example (CONTINUED):

Delete (prune) all candidate itemsets that have a non-frequent subset. For example, {3,5,6,8} can never be frequent, since its subset {5,6,8} is not frequent.

Here, in fact, only one candidate remains: {3,4,5,7}.

Finally, after pruning, determine the support of the remaining itemsets and check whether they meet the minimum-support threshold.
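The data behind this continued example is not part of the preview, so the sketch below applies the same two steps (prune, then count and threshold) to the 10-transaction database from the Example slides instead. The minimum support count of 2, the frequent 2-itemsets and the candidate 3-itemsets are assumptions computed for illustration under that threshold; `prune_and_count` is a hypothetical name.

```python
from itertools import combinations


def prune_and_count(candidates, frequent_k, transactions, min_support):
    """Apriori prune step followed by support counting.

    candidates -- candidate (k+1)-itemsets as sorted tuples
    frequent_k -- frequent k-itemsets as sorted tuples
    Returns (kept, pruned): kept maps each surviving frequent itemset to
    its support count; pruned lists candidates with a non-frequent subset.
    """
    frequent = set(frequent_k)
    kept, pruned = {}, []
    for cand in candidates:
        # A candidate with any non-frequent k-subset can never be frequent.
        if any(sub not in frequent for sub in combinations(cand, len(cand) - 1)):
            pruned.append(cand)
            continue
        count = sum(1 for t in transactions if set(cand) <= t)
        if count >= min_support:  # threshold check
            kept[cand] = count
    return kept, pruned


TRANSACTIONS = [
    {3, 5, 8}, {2, 6, 8}, {1, 4, 7, 10}, {3, 8, 10}, {2, 5, 8},
    {1, 5, 6}, {4, 5, 6, 8}, {2, 3, 4}, {1, 5, 7, 8}, {3, 8, 9, 10},
]
# ASSUMED inputs: frequent 2-itemsets and joined candidates for min support 2.
frequent_2 = [(1, 5), (1, 7), (2, 8), (3, 8), (3, 10), (5, 6), (5, 8), (6, 8), (8, 10)]
candidates_3 = [(1, 5, 7), (3, 8, 10), (5, 6, 8)]

kept, pruned = prune_and_count(candidates_3, frequent_2, TRANSACTIONS, min_support=2)
print(kept, pruned)
```

In this run (1, 5, 7) is pruned because its subset (5, 7) is not frequent, (5, 6, 8) survives pruning but occurs in only one transaction and fails the threshold, and (3, 8, 10) is kept with a support count of 2.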