

















Similarity, Uncertainty and Case-Based Reasoning in PATDEX
Michael M. Richter, Stefan Wess
University of Kaiserslautern, Dept. of Computer Science, P.O. Box 3049, D-6750 Kaiserslautern
Abstract
Patdex is an expert system which carries out case-based reasoning for the fault diagnosis of complex machines. It is integrated in the Moltke workbench for technical diagnosis, which was developed at the University of Kaiserslautern over the past years. Moltke contains other parts as well, in particular a model-based approach; Patdex is where the heuristic features are essentially located. The use of cases also plays an important role for knowledge acquisition. In this paper we describe Patdex from a principal point of view and embed its main concepts into a theoretical framework.
1 General Considerations
Patdex^1 is an expert system which carries out case-based reasoning for the fault diagnosis of complex machines. It is integrated in the Moltke workbench^2 for technical diagnosis, which was developed at the University of Kaiserslautern over the past years (cf. e.g. [4, 5, 23]). Moltke contains other parts as well (cf. e.g. [16]), in particular a model-based approach (cf. [21, 22]); Patdex [3] is where the heuristic features are essentially located. The use of cases also plays an important role for knowledge acquisition. In this paper we describe Patdex from a principal point of view and embed its main concepts into a theoretical framework.
This research has a number of mainly indirect connections to the work of Woody Bledsoe. We mention his interest in analogy, his early connectionist work and his influence in merging mathematics and artificial intelligence. For the first author the main point was that Woody Bledsoe brought him in contact with AI at an early stage. More than twenty years ago we started a lively discussion which still goes on and will hopefully last for many more years.
(*) also: SEKI-Report SR-91-01, Universitaet Kaiserslautern, Fachbereich Informatik, and Festschrift for Woodrow W. Bledsoe
(†) The work presented herein was partially supported by the Deutsche Forschungsgemeinschaft, SFB 314: Artificial Intelligence - Knowledge-Based Systems, projects X6 and X9.
(^1) PATtern Directed EXpert Systems
(^2) MOdels, Learning and Temporal Knowledge in Expert Systems for Technical Domains
1.1 Similarity
Similarity and uncertainty have in common that both can be described by measures with values e.g. in the real interval $[0, 1]$. At first glance it seems that here the analogy between these concepts comes to an end; we will, however, discuss some more connections later on. A similarity measure $sim(x, y)$ can be defined on arbitrary objects of interest such as physical objects, situations, problems or formulae; let $U$ be the (finite) universe of these objects.
The basic axioms for sim are:
The dual notion is that of a distance measure $d(x, y)$ which may attain arbitrary nonnegative values. In the corresponding axioms reflexivity reads as $d(x, x) = 0$. One does not require, however, the triangle inequality and allows $d(x, y) = 0$ for $x \neq y$, which means that $d$ is neither a metric nor even a pseudo-metric. One says that $d$ and $sim$ correspond to each other iff there is an order reversing one-one mapping
$$f : range(d) \rightarrow range(sim)$$
such that $f(0) = 1$ and $sim(x, y) = f(d(x, y))$; we denote this by $d \simeq_f sim$.
Popular candidates for $f$ are $f(z) = 1 - \frac{z}{1 + z}$ for unbounded $range(d)$, or $f(z) = 1 - \frac{z}{max}$ if $range(d)$ has a greatest element $max$.
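The two candidate mappings can be tried out numerically. The following is a minimal Python sketch; the function names and the sample distances are illustrative assumptions and do not come from Patdex.

```python
def sim_from_unbounded(z: float) -> float:
    """f(z) = 1 - z / (1 + z): for distance ranges without an upper bound."""
    if z < 0:
        raise ValueError("distances must be nonnegative")
    return 1.0 - z / (1.0 + z)


def sim_from_bounded(z: float, max_dist: float) -> float:
    """f(z) = 1 - z / max: for distance ranges with a greatest element max."""
    if not 0 <= z <= max_dist:
        raise ValueError("distance outside [0, max]")
    return 1.0 - z / max_dist


# Hypothetical sample distances, in increasing order.
distances = [0.0, 0.5, 1.0, 3.0]
unbounded = [sim_from_unbounded(z) for z in distances]
bounded = [sim_from_bounded(z, max_dist=3.0) for z in distances]
```

Both lists start at 1 (distance 0 maps to similarity 1) and decrease as the distance grows, which is the order-reversing property required of f.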
Usually individual values of $sim$ or $d$ are not so much of interest as certain relations between them. For the use in analogical reasoning the following relations are basic. If $d$ is a distance measure and $sim$ a similarity measure then we define
$$R_d(x, y, u, v) :\Longleftrightarrow d(x, y) \le d(u, v) \tag{1}$$
$$R_{sim}(x, y, u, v) :\Longleftrightarrow sim(x, y) \ge sim(u, v) \tag{2}$$
and
$$S_d(x, y, z) :\Longleftrightarrow R_d(x, y, x, z) \tag{3}$$
$$S_{sim}(x, y, z) :\Longleftrightarrow R_{sim}(x, y, x, z) \tag{4}$$
It is very often easier to determine the relation $S_d(x, y, z)$ than the distance measure $d$ itself, and it is also sufficient for many applications. We say that $d$ and $sim$ are compatible iff
$$R_d(x, y, u, v) \Longleftrightarrow R_{sim}(x, y, u, v) \tag{5}$$
Compatibility is ensured by $d \simeq_f sim$ for some $f$.
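Compatibility can be checked exhaustively on a small finite universe. The sketch below assumes a hypothetical universe of four real numbers, the absolute difference as distance, and the mapping f(z) = 1 - z/(1 + z); none of these choices is taken from Patdex.

```python
from itertools import product


def f(z: float) -> float:
    """Order-reversing mapping with f(0) = 1."""
    return 1.0 - z / (1.0 + z)


# Hypothetical finite universe U and distance measure d.
points = [0.0, 1.0, 2.5, 4.0]


def d(x: float, y: float) -> float:
    return abs(x - y)


def sim(x: float, y: float) -> float:
    return f(d(x, y))


def R_d(x, y, u, v) -> bool:
    """x is at least as close to y as u is to v."""
    return d(x, y) <= d(u, v)


def R_sim(x, y, u, v) -> bool:
    """x is at least as similar to y as u is to v."""
    return sim(x, y) >= sim(u, v)


def S_d(x, y, z) -> bool:
    """x is at least as close to y as to z."""
    return R_d(x, y, x, z)


# d and sim are compatible iff both relations agree on every quadruple.
compatible = all(R_d(*q) == R_sim(*q) for q in product(points, repeat=4))
```

Because sim is obtained from d through an order-reversing f, the check succeeds for every quadruple of this universe.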
For some set $M \subseteq U$, some $y \in M$ is called most similar to $x$ with respect to $M$ iff
$$\mu_\sim(M) = \frac{card(M_l)}{card(M_u)};$$
clearly $0 \le \mu_\sim(M) \le 1$ holds.
Rough sets occur in various ways. There are essentially two different types which are both connected with diagnostic problems:
The indiscernibility relation is transitive
The indiscernibility relation is not transitive
In the first case $\sim$ defines a partition of $U$ into blocks of indistinguishable elements. The typical example for this arises as above when the objects in $U$ are described by attributes which may take on certain values. Each set $A$ of attributes defines an indiscernibility relation $\sim_A$, where $x \sim_A y$ holds iff the values of all attributes in $A$ for $x$ and $y$ are identical. Because of the finiteness of the universe $U$, the upper and lower approximations of sets as well as the correlated accuracy measure can in principle be computed. In [17] it is shown how this can be expressed in terms of rules and how one can apply it to classification problems; an extension to a decision logic is presented in [18]. For larger numbers of objects and attributes these computations become very time-consuming, however.
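For the transitive case, blocks, approximations and accuracy can be computed directly on a finite universe. The following Python sketch makes several assumptions: the lower approximation is the union of blocks contained in the target set, the upper approximation is the union of blocks that intersect it, accuracy is their cardinality ratio, and an empty upper approximation yields accuracy 1. The machine names and attribute values are invented for illustration.

```python
from collections import defaultdict


def blocks(universe, attributes):
    """Partition the objects by their values on the chosen attributes."""
    groups = defaultdict(set)
    for name, values in universe.items():
        key = tuple(values[a] for a in attributes)
        groups[key].add(name)
    return list(groups.values())


def approximations(universe, attributes, target):
    """Return (lower, upper) approximation of the target set."""
    lower, upper = set(), set()
    for block in blocks(universe, attributes):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


def accuracy(universe, attributes, target):
    """card(lower) / card(upper); 1.0 for an empty upper approximation (assumed convention)."""
    lower, upper = approximations(universe, attributes, target)
    return len(lower) / len(upper) if upper else 1.0


# Hypothetical universe of four machines described by two attributes.
universe = {
    "m1": {"temp": "high", "noise": "yes"},
    "m2": {"temp": "high", "noise": "yes"},
    "m3": {"temp": "high", "noise": "no"},
    "m4": {"temp": "low", "noise": "no"},
}
faulty = {"m1", "m3"}

lower, upper = approximations(universe, ["temp", "noise"], faulty)
acc_both = accuracy(universe, ["temp", "noise"], faulty)
acc_temp = accuracy(universe, ["temp"], faulty)
```

Using both attributes separates m3 from the rest, so the accuracy is 1/3; using only the temperature attribute lumps m1, m2 and m3 into one block and the accuracy drops to 0.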
A prototype for the second case arises from a distance measure $d$. For each $\varepsilon > 0$ there is an indiscernibility relation $\sim_{d,\varepsilon}$ defined by
$$x \sim_{d,\varepsilon} y \text{ iff } y \in V_{d,\varepsilon}(x)$$
In case $d$ is a metric, $\sim_{d,\varepsilon}$ is transitive and the blocks are of the form $V_{d,0}(x)$. The intention is, of course, that elements in these blocks are in some sense absolutely indistinguishable, i.e. there is no further information available to separate these elements.
1.3 Diagnosis
Our area of interest is fault diagnosis and we need to introduce the basic notions. We assume a fixed number $N$ of symptoms $S_1, \dots, S_N$. With each symptom $S_i$ a range $R_i$ is associated; in principle symptoms are nothing other than attributes. Typically $R_i$ is either a real interval $[a, b]$ or the boolean domain $\{0, 1\}$ or some other finite set. Symptoms may take on values in their range and these values are assumed to be the only source of information. Values of symptoms are obtained by carrying out a test. A test can be an observation, a measurement or simply the answer to a question. In some situations certain tests may not be allowed.
The information at some stage of the diagnostic process is usually incomplete and is expressed in the form of an information vector or a situation. A situation is a vector $Sit = (a_{i_1}, \dots, a_{i_k})$, $1 \le i_j \le N$, such that each $a_{i_j} \in R_{i_j}$. The components of $Sit$ are the known symptom values whereas the values of the remaining symptoms are unknown. A situation is complete if every symptom has a value. Situations are arranged in the information graph. Its nodes are labelled with situations, and an edge goes from $Sit_1$ to $Sit_2$ if $Sit_2$ has one more component than $Sit_1$ and there is a test $t_j$ available which can provide the value necessary to extend $Sit_1$ to $Sit_2$; in this case $t_j$ is a label of the edge.
A diagnosis (or fault description) is a formula of the first order predicate calculus using constants and relations over the ranges $R_i$; the precise form of these formulae is not of interest here. To avoid technical difficulties we always assume a single fault. This means that the set of complete situations is partitioned into sets representing these faults; a special set is "no fault" and, if wanted, another one is "unknown fault". The applicability of this approach relies on the fact that at least the "interesting" faults can be fully described. In the diagnosis of even complex machines this assumption is usually satisfied; in medical diagnostics it may sometimes be doubtful.
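Situations and the edges of the information graph can be represented with plain dictionaries. The sketch below is only an illustration under stated assumptions: every symptom has a finite range, there is exactly one test per symptom (labelled by the symptom name), and the symptom names and ranges are invented.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

# A situation maps the known symptoms to their values.
Situation = Dict[str, Hashable]


def is_complete(sit: Situation, ranges: Dict[str, list]) -> bool:
    """A situation is complete if every symptom has a value."""
    return set(sit) == set(ranges)


def successors(
    sit: Situation, ranges: Dict[str, list], allowed_tests: Iterable[str]
) -> List[Tuple[str, Situation]]:
    """Edges leaving sit: (test label, situation with one more component)."""
    edges = []
    for symptom in allowed_tests:
        if symptom in sit:
            continue  # value already known, no test needed
        for value in ranges[symptom]:
            extended = dict(sit)
            extended[symptom] = value
            edges.append((symptom, extended))
    return edges


# Hypothetical symptoms with finite ranges.
ranges = {
    "valve": [0, 1],
    "pressure": ["low", "ok", "high"],
    "relay": [0, 1],
}
sit = {"valve": 1}
edges = successors(sit, ranges, allowed_tests=["pressure", "relay"])
```

Each returned situation has exactly one component more than `sit`, and the edge label records which test supplied it.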
In a diagnostic problem some complete situation has occurred but is only partially known, i.e. one is confronted with some incomplete situation $Sit$. The task is to determine the diagnosis of the unknown complete situation (at least with some certainty). At first glance this seems to be a pure classification problem. For each set $A$ of symptoms and each fault diagnosis $\delta$ we can compute the accuracy introduced above as
$$\mu_\sim(\delta) = \mu_\sim(\{Sit \mid Sit \text{ a complete situation that satisfies } \delta\})$$
where $\sim \, = \, \sim_A$ as introduced in section 1.2. If $\mu_\sim(\delta) = 1$, then $\delta$ is decided by the symptoms in $A$ and the symptoms not in $A$ are redundant. To recognize redundant symptoms some of the computations mentioned in 1.2 are useful; in Patdex they are complemented using the case base. With equal right one can say, however, that the real problem is to find an optimal way to complete incomplete situations sufficiently so that a diagnosis with a high degree of certainty can be established. This task has been attacked less successfully in the literature.
Hence, when given a situation $Sit$ one may proceed in two ways:
In Patdex (as well as in the general Moltke approach) these two steps play the central role. The second step contains mainly a classification problem. The first step is more complex and can be discussed from different points of view. Above we said that we want an optimal way to complete the information; the term optimal is not clearly defined here, however. The main point is that we want an optimization with respect to an unknown target, namely the true diagnosis. We believe that purely information-theoretic approaches like the Top-Down-Inductive-Decision-Trees (ID3, cf. [20]) are insufficient for our purposes, so Patdex chooses another way.
1.4 Case-Based Reasoning
In case-based reasoning (cf. e.g. [14, 13, 25, 8]) one has a base of cases where a case is an ordered pair c = (problem, solution). The cases are stored in a case base. Instead of solving a new problem directly, the case base is employed in order to use solutions from earlier problems.
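The basic retrieve-and-reuse idea can be sketched in a few lines of Python. This is not the Patdex algorithm: the Jaccard coefficient stands in for the similarity measure purely as an illustrative assumption, and the cases are invented.

```python
from typing import Callable, FrozenSet, List, Tuple

Problem = FrozenSet[str]
Case = Tuple[Problem, str]  # (problem, solution)


def jaccard(a: Problem, b: Problem) -> float:
    """Stand-in similarity: shared observations over all observations."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def retrieve(case_base: List[Case], problem: Problem,
             sim: Callable[[Problem, Problem], float]) -> Case:
    """Return the stored case whose problem is most similar to the new one."""
    return max(case_base, key=lambda case: sim(problem, case[0]))


def solve(case_base: List[Case], problem: Problem,
          sim: Callable[[Problem, Problem], float]) -> str:
    """Reuse the solution of the most similar earlier problem."""
    _, solution = retrieve(case_base, problem, sim)
    return solution


# Hypothetical case base.
case_base = [
    (frozenset({"no_power", "fuse_blown"}), "replace fuse"),
    (frozenset({"overheating", "fan_stopped"}), "replace fan"),
]
proposal = solve(case_base, frozenset({"overheating"}), jaccard)
```

Here the new problem shares one observation with the second case and none with the first, so the second solution is proposed.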
2 The PATDEX-System
2.1 Motivation and Overview
Patdex^4 is a part of the Moltke system which was developed in the past years at the University of Kaiserslautern. The starting point of Patdex are the above considerations. Patdex employs the techniques developed in these contexts both explicitly and implicitly; we will not discuss these details here. Instead we are interested in some shortcomings of these approaches in real world applications, which we will point out next.
The problem in the use of rough sets based on the relations $\sim_A$ for sets $A$ of symptoms consists mainly in the fact that the diagnosis problem is considered only as a classification problem; in particular the indiscernibility relations $\sim_A$ for different $A$ are totally unrelated to each other. If $A \subseteq B$ then $\sim_B$ is finer and therefore more informative than $\sim_A$; this should be reflected in the system. Also, one would like to get hints which $B$ for a given $A$ provides the most suitable new information.
The difficulty with the similarity measure is that its quality is related to the final success of the whole reasoning procedure; this is an a posteriori criterion. A priori it is not clear what the criteria for similarity of objects should be; they depend not only on the objects themselves but also on the pragmatics of reasoning. In case-based reasoning it is usually clear whether a solution for a given problem is correct, but it is far from clear what it means that two problems are similar enough that the solution for one problem also works for the other one. An even more serious difficulty arises when the world of problems is continuously changing.
All this suggests that the similarity should not be defined in some fixed way but should instead be the result of an adaptive learning process. This will be carried out later on.
2.2 The PATDEX/1 prototype
The first version of Patdex is Patdex/1. This prototype contains the basic structures which have been extended later on as described in section 2.3. In this section we will briefly describe this prototype. As basic techniques, Patdex/1 applies learning by memory adaptation and analogical reasoning. The system has capabilities to memorize and utilize both its individual experiences and its statistical information. The reasoning process that uses this experience knowledge is combined with another one that focuses on similarities. The overall process of diagnosis is based on the analogical problem solving algorithm (APS) proposed by [11]. The process is started by the user giving some observed symptom values as input to the system.
(^4) Actually, there are two systems, Patdex/1 and Patdex/2. By Patdex (or the Patdex approach) we denote all the information which relates to both systems.
The top-level algorithm of Patdex reads as follows:
Input: the actual situation $Sit$
Output: diagnosis or failure
Here we need an external teacher who says whether a diagnosis is correct or not. We also have to explain "minimally similar" and "sufficiently similar". For this we need a partition of the case base which is given after the introduction of the similarity measure. Finally, we have to describe the selection of the next test. Therefore Patdex has two main features, similarity and the experience net. Both make use of the case base but are independent and could work in parallel. For situations Patdex uses as a first proposal the similarity measure from equation 6 with parameters $\alpha = 1$, $\beta = 2$, $\gamma = \eta = 1/2$. It should be remarked that this measure is normalized to $[-2, 1]$ and reads as
$$sim(Sit_1, Sit_2) = \frac{card(E) - 2 \cdot card(C) - \frac{1}{2}\,(card(U_1) + card(U_2))}{card(E \cup C \cup U_1 \cup U_2)}$$
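The measure can be evaluated on small examples as in the following Python sketch. The reading of the four sets is an assumption, since their definition is not reproduced here: E holds symptoms with equal values in both situations, C those with conflicting values, U_1 those known only in the first situation and U_2 those known only in the second. The symptom names are invented.

```python
def partition(sit1: dict, sit2: dict):
    """Split the symptoms into E, C, U1, U2 (assumed reading of the sets)."""
    both = set(sit1) & set(sit2)
    E = {s for s in both if sit1[s] == sit2[s]}   # equal values
    C = both - E                                  # conflicting values
    U1 = set(sit1) - set(sit2)                    # known only in sit1
    U2 = set(sit2) - set(sit1)                    # known only in sit2
    return E, C, U1, U2


def sim_patdex1(sit1: dict, sit2: dict) -> float:
    """(card(E) - 2 card(C) - 1/2 (card(U1) + card(U2))) / card(E u C u U1 u U2)."""
    E, C, U1, U2 = partition(sit1, sit2)
    total = len(E | C | U1 | U2)
    if total == 0:
        raise ValueError("both situations are empty")
    return (len(E) - 2 * len(C) - 0.5 * (len(U1) + len(U2))) / total


# Hypothetical observed situation and stored case.
observed = {"relay": 1, "voltage": "high", "valve": 0}
stored = {"relay": 1, "voltage": "ok", "pump": 1}

mixed = sim_patdex1(observed, stored)        # one equal, one conflict, two unknown
identical = sim_patdex1(observed, observed)  # upper end of the range
opposite = sim_patdex1({"relay": 1}, {"relay": 0})  # lower end of the range
```

The two extreme examples reproduce the ends of the interval [-2, 1]: full agreement gives 1, and a situation consisting only of conflicts gives -2.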
This special choice of the parameters is at the moment mainly motivated by experimental results. It has a defensive, pessimistic character. A high negative contribution to the measure is given for conflicting symptom values, i.e. we strongly wish to avoid false diagnoses.
If the value assigned to a given case by the similarity measure exceeds a lower bound (hypothesis-threshold), this case is said to be qualified for further processing. If the value exceeds an upper bound it is even qualified as diagnosis (diagnosis-threshold). Both thresholds are locally defined for each case of the case base. If, for a given case, the similarity value equals 1, this case is said to be proven.
A case becomes disqualified for further use in a particular diagnosis session as soon as all symptoms contained in that case do not hold, given a situation encountered during diagnosis, or if there are no unknown symptom values any more and the specified case does not exceed the diagnosis-threshold. Another reason for disqualification is given if the case the system chooses as its hypothesis is refused by the user.
For the use of cases in the top-level algorithm we will define different similarity classes. For this we choose two real numbers strictly between 0 and 1, the first smaller than the second, and define:
as a guideline. Important features of this approach are the combination of similarity and experience for the diagnosis of technical systems and the differentiation between classification and test selection. This has to be seen as the fulfillment of a requirement of the underlying real world application. Particularly derivational analogy [10] can be elegantly applied to the field of technical diagnosis. Compared with a human engineer Patdex/1 came off very well, in particular with respect to the similarity measure which has been defined in equation 7.
Shortcomings of Patdex/1 are the difficulty of generalizing the similarity measure and the fact that the case-focussing test selection is not necessarily globally optimal. Usually the handling of the experience graph has super-exponential complexity in space and time, because in the worst case all sequences of symptom values have to be represented. Furthermore, Patdex/1 takes no advantage of causal or functional background knowledge. This increases the possibility of faulty diagnoses when too many redundant symptom values are presented or some relevant ones are missing.
Patdex/2 (cf. [6, 28]) is an integral part of the Moltke workbench, which allows the utilization of all its qualities (proposed in [26]). Therefore it is possible to switch between case-based reasoning and the interpretation of a Moltke knowledge base during problem solving. The use of causal knowledge enables Patdex/2 to identify pathological symptom values. Thus, redundant information can be filtered off and can no longer be the cause of a false diagnosis. By the exploitation of functional background knowledge additional symptom values can be derived from the known ones. In this manner the selection of the respective most similar case is considerably sped up. The overall case-based reasoning approach which is used by Patdex/2 is comparable to the memory-based reasoning approach proposed by Stanfill and Waltz in [25].
An important aspect of our Patdex/2 approach is to view the relevances of certain symptom values for special situations as a part of the empirical knowledge which shall be learned. In Patdex/2 we combine the case-based reasoning approach for diagnosis with a connectionist approach for learning this empirical knowledge. These relevances $w_{ij} \in [0, 1]$ are represented by means of a relevance matrix $R = [w_{ij}]$ where the symptoms $S_i$ and diagnoses $\delta_j$ occur as inscriptions of the rows and columns, respectively. In the course of time the weights of the symptoms, i.e. the elements of the relevance matrix, are learned by Patdex/2. The strategy for learning the entries of the relevance matrix is similar to the competitive learning mechanism proposed in [24]. The matrix $R$ reads as follows:
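$$
\begin{array}{c|cccc}
 & \delta_1 & \delta_2 & \cdots & \delta_m \\ \hline
S_1 & w_{11} & w_{12} & \cdots & w_{1m} \\
S_2 & w_{21} & w_{22} & \cdots & w_{2m} \\
\vdots & \vdots & \vdots & & \vdots \\
S_n & w_{n1} & w_{n2} & \cdots & w_{nm}
\end{array}
$$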
For the degree of relevance of a certain symptom value it is important whether it is a consequence of the normal functioning of the technical system or of a fault. E.g. "relais 21K switched" is of the first kind while "voltage 214 too high" is a pathological symptom value. To identify certain pathological symptom values Patdex/2 can use the functional background knowledge which is represented in the Moltke workbench.
Since Patdex/1 uses its similarity measure $sim$ only for the comparison of two cases, it is not necessary to define relations between symptom values. Patdex/2 extends this view of similarity by the additional use of local similarity measures $\omega_i(a_{ik}, a_{il})$ which determine the similarity between possible symptom values $a_{ik}, a_{il} \in R_i$ of a symptom $S_i$. If one of the symptom values is unknown then the similarity $\omega_i$ evaluates to zero. The introduction of $R$ and $\omega_i$ leads to the definition of a new similarity measure which is normalized to $[0, 1]$ and matches the definition made in section 1.1:
$sim(Sit_1, Sit_2) =$
The new attribute sets (based on the sets defined in section 1.4) are defined by multiplying the relevance of a specific symptom $S_i$ (represented by the relevance matrix $R$) with the similarity of the observed symptom value $a_{ik}$ in $Sit_1$ and the defined symptom value $a_{il}$ in the actual case $c = (Sit_2, \delta_j)$:
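$$E: \quad \sum_{S_i \in E} w_{ij}\,\omega_i(a_{ik}, a_{il}) \tag{10}$$
$$C: \quad \sum_{S_i \in C} w_{ij}\,(1 - \omega_i(a_{ik}, a_{il})) \tag{11}$$
$$U_1: \quad \sum_{S_i \in U_1} v_{ij}\,(1 - \omega_i(a_{ik}, a_{il})) = card(U_1) \tag{12}$$
$$U_2: \quad \sum_{S_i \in U_2} w_{ij}\,(1 - \omega_i(a_{ik}, a_{il})) = \sum_{S_i \in U_2} w_{ij} \tag{13}$$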
We point out here that $\omega_i$ is zero for symptoms $S_i$ which belong to one of the attribute sets $U_1$ or $U_2$, because the corresponding symptom values are unknown (cf. section 1.4). Additionally, we restrict the representation of redundant symptoms (i.e. $S_i \in U_1$) to pathological ones. Thus, observed redundant symptom values representing the normal behavior of the underlying technical system can no longer decrease the value of $sim$. Since Patdex/2 focusses on the learning of symptom relevances only for the respective diagnosis, no entries for redundant symptoms $S_i$ can be created. Here we need an alternative weighting $v_{ij}$. In Patdex/2 we define $\forall i, j: v_{ij} = 1$, which is motivated by the above mentioned restriction of $U_1$. By the use of these definitions we get a similarity measure $sim$ which depends on the values represented in the relevance matrix. After each erroneous diagnosis the weights of the relevance matrix are changed. Thus, the similarity measure $sim$ is the result of an adaptive learning process.
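The four weighted terms (10) to (13) can be computed as in the Python sketch below. It stops at the four terms and does not combine them into a final similarity value. Several points are assumptions made for illustration: a symptom known in both situations counts towards E when the values are equal and towards C otherwise, the relevance matrix is stored as a dictionary keyed by (symptom, diagnosis), and the local similarity function, weights and symptom names are invented.

```python
def weighted_terms(observed, case_sit, diagnosis, w, omega, pathological):
    """Return the weighted terms for E, C, U1, U2 as in (10)-(13).

    observed, case_sit: dicts symptom -> value (Sit_1 and Sit_2)
    w: relevance weights keyed by (symptom, diagnosis), values in [0, 1]
    omega: local similarity omega(symptom, value1, value2) in [0, 1]
    pathological: observed symptoms whose value indicates a fault
    """
    e = c = u1 = u2 = 0.0
    for symptom in set(observed) | set(case_sit):
        in_obs, in_case = symptom in observed, symptom in case_sit
        if in_obs and in_case:
            local = omega(symptom, observed[symptom], case_sit[symptom])
            weight = w[(symptom, diagnosis)]
            if observed[symptom] == case_sit[symptom]:  # assumed membership in E
                e += weight * local                     # (10)
            else:                                       # assumed membership in C
                c += weight * (1.0 - local)             # (11)
        elif in_obs:
            # (12): v_ij = 1 and omega = 0; only pathological values count.
            if symptom in pathological:
                u1 += 1.0
        else:
            # (13): omega = 0, so the term reduces to the weight itself.
            u2 += w[(symptom, diagnosis)]
    return e, c, u1, u2


def omega(symptom, a, b):
    """Hypothetical local similarity: equal -> 1, voltage levels -> 0.5, else 0."""
    if a == b:
        return 1.0
    return 0.5 if symptom == "voltage" else 0.0


w = {("relay", "d1"): 0.2, ("voltage", "d1"): 0.9, ("pump", "d1"): 0.5}
observed = {"relay": 1, "voltage": "high", "valve": 0, "lamp": 1}
case_sit = {"relay": 1, "voltage": "ok", "pump": 1}

e, c, u1, u2 = weighted_terms(observed, case_sit, "d1", w, omega, {"valve"})
```

In this example the normal observation `lamp` is ignored, which mirrors the restriction of redundant symptoms to pathological ones.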
2.3.1 Test Selection
As opposed to other known case-based systems which concentrate on the aspect of classification, Patdex/2 uses case-based mechanisms for classification as well as for test selection. In Patdex/2 the case-focussing test selection procedure is extended by a case-based one^5.
(^5) This subcomponent of Patdex/2 is a case-based reasoning system of its own where strategy cases are used which can be automatically generated out of the known diagnostic cases. As it is an improvement of the experience graph and, beyond that, the cost estimation procedure can be viewed as a kind of graph interpretation, we maintain the denotation experience graph for Patdex/2 for reasons of simplicity.
References
[1] Aamodt, A.: A Computational Model of Knowledge-Intensive Learning and Problem Solving, in: Proc. EKAW-90, pp. 1-
[2] Althoff, K.-D.: Lernen aus Fallbeispielen zur Diagnose technischer Systeme, doctoral dissertation, University of Kaiserslautern, 1991 (forthcoming)
[3] Althoff, K.-D., De la Ossa, A., Maurer, F., Stadler, M., Wess, S.: Adaptive Learning in the Domain of Technical Diagnosis, in: Proc. of Workshop on Adaptive Learning, FAW Ulm, July 1989
[4] Althoff, K.-D., Faupel, B., Kockskämper, S., Traphöner, R., Wernicke, W.: Knowledge Acquisition in the Domain of CNC Machining Centers: the Moltke Approach, in: Proc. of EKAW-89, pp. 180-
[5] Althoff, K.-D., Maurer, F., Rehbold, R.: Multiple Knowledge Acquisition Strategies in Moltke, in: Proc. EKAW-90, pp. 21-
[6] Althoff, K.-D., Wess, S.: Patdex/2: Case-Based Knowledge Acquisition in Moltke, Technical Report, University of Kaiserslautern, 1991 (forthcoming)
[7] Bareiss, R., Branting, K., Porter, B.: The role of explanation in exemplar-based classification and learning, in: Proc. Case-Based Reasoning, AAAI, 1988
[8] Bareiss, R.: Exemplar-Based Knowledge Acquisition, Academic Press Inc., 1989
[9] Carbonell, J.G.: Learning by analogy: formulating and generalizing plans from experience, in: Michalski, R.S., Carbonell, J.G., Mitchell, T.M. (Eds.): Machine Learning, Tioga Publishing Co., Palo Alto, 1983
[10] Carbonell, J.G.: Derivational Analogy in Problem Solving and knowledge acquisition, in: Michalski, R.S., Carbonell, J.G., Mitchell, T.M. (Eds.): Machine Learning, Vol. II, M. Kaufmann, 1986
[11] Gick, M. L., Holyoak, K. J.: Analogical Problem Solving, Cognitive Psychology, Vol. 12, 1980, pp. 306-
[12] Hall, R.: Computational Approaches to Analogical Reasoning: A Comparative Analysis, in: Artificial Intelligence 39, 1989
[13] Hammond, K.: Case-Based Planning, Academic Press Inc., 1989
[14] Kolodner, J.: Maintaining organization in a dynamic long-term memory, Cognitive Science, Vol. 7, pp. 243-280, 1983
[15] Koton, P.: Reasoning about evidence in causal explanations, Proc. AAAI-88, pp. 256-
[16] Nökel, K.: Temporal Matching: Recognizing Dynamic Situations from Discrete Measurements, in: Proc. IJCAI-
[17] Pawlak, Z.: Rough Classification. Int. J. of Man-Machine Studies 20 (1984), pp. 469-483
[18] Pawlak, Z.: Decision Logic. Preprint, Warszawa 1990
[19] Porter, B.: PROTOS; an experiment in knowledge acquisition for heuristic classification tasks, in: Proc. First Intern. Meeting on Advances in Learning, Les Arcs, France, 1986
[20] Quinlan, J.R.: Induction of Decision Trees. Machine Learning 1 (1986), pp. 81-106
[21] Rehbold, R.: Model-Based knowledge acquisition from Structure Descriptions in a Technical Diagnosis Domain, in: Proc. Avignon-89
[22] Rehbold, R.: Integration modellbasierten Wissens in technische Diagnostik-Expertensysteme, dissertation, University of Kaiserslautern, 1991 (forthcoming)
[23] Richter, M. M., Pfeifer, T., Althoff, K.-D., Faupel, B., Nökel, K., Rehbold, R.: Final report project X6, SFB 314 "AI - Knowledge-Based Systems", Kaiserslautern, 1990
[24] Rumelhart, D.E., Zipser, D.: Feature Discovery by Competitive Learning, Cognitive Science 9, pp. 75-112, 1985
[25] Stanfill, C., Waltz, D.: The memory-based reasoning paradigm, Proc. DARPA Workshop on Case-Based Reasoning, Morgan Kaufmann, 1988
[26] van Someren, M. W., Zheng, L. L., Post, W.: Cases, Models or Compiled Knowledge; a Comparative Analysis and Proposed Integration, Proc. EKAW-
[27] Wolstencroft, J.: Restructuring, Reminding and Repair: What's missing from Models of analogy, in: AI Communications, Vol. 2, No. 2, 1989
[28] Wess, S.: Patdex/2: Ein System zum adaptiven, fallfokussierenden Lernen in technischen Diagnosesituationen. SEKI-Working-Paper, SWP-1-91, Kaiserslautern 1991 (in German)