















Automated knowledge base management
















Submitted on 22 Jun 2018
HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.
Jorge Martinez-Gil. Automated knowledge base management: A survey. Computer Science Review, Elsevier, 2015, 18, pp.1-9. 10.1016/j.cosrev.2015.09.001. hal-01820946
Jorge Martinez-Gil Software Competence Center Hagenberg (Austria) email: [email protected], phone number: 43 7236 3343 838
Keywords: Information Systems, Knowledge Management, Knowledge-based Technology
A fundamental challenge at the intersection of Artificial Intelligence and Databases consists of developing methods to automatically manage Knowledge Bases (KBs), which can serve as a knowledge source for computer systems trying to replicate the decision-making ability of human experts. Although most of the tasks involved in the building, exploitation and maintenance of KBs are far from trivial, significant progress has been made in recent years. However, a number of challenges remain open. In fact, some issues still need to be addressed in order to empirically prove that the technology for systems of this kind is mature and reliable.
Knowledge may be a critical and strategic asset and the key to competitiveness and success in highly dynamic environments, as it facilitates capacities essential for solving problems. For instance, expert systems, i.e. systems exploiting knowledge for automation of complex or tedious tasks, have been proven to be very successful when analyzing a set of one or more complex and interacting goals in order to determine a set of actions to achieve those goals, and provide a detailed temporal ordering of those actions, taking into account personnel, materiel, and other constraints [9].
However, the ever-increasing demand for more intelligent systems means that knowledge has to be captured, processed, reused, and communicated in order to complete even more difficult tasks. Nevertheless, achieving these new goals has proven to be a formidable challenge since knowledge itself is difficult to
Knowledge Creation                   Knowledge Exploitation   Knowledge Maintenance
Knowledge Acquisition                Knowledge Reasoning      Knowledge Meta-Modeling
Knowledge Representation             Knowledge Retrieval      Knowledge Integration
Knowledge Storage and Manipulation   Knowledge Sharing        Knowledge Validation

Table 1: Summary of concepts in the Knowledge Management field

Concerning the automatic creation of KBs (a.k.a. knowledge learning, knowledge extraction or knowledge generation), there are three major steps that should be fulfilled: automatic acquisition of the knowledge, appropriate representation of that knowledge, and storage and manipulation of the knowledge into the KB. These major steps are summarized below:
the inherent inferential capability given by KBs, each KB is also a database in the sense that there is a schema, i.e. the concepts and roles, and a set of instances. Therefore, adopting database technology as a key method to address this issue is an idea shared by most of the solutions.
The automatic exploitation of KBs (a.k.a. knowledge exploitation or knowledge application) can be divided into two subgroups: knowledge utilization and knowledge transfer. In turn, knowledge utilization covers knowledge reasoning and knowledge retrieval (in the way the Question and Answering (Q & A) systems work [44]). Meanwhile, knowledge sharing (a.k.a. knowledge exchange) is the process through which explicit or tacit knowledge is communicated to others.
accommodate the new information, and how the new information should be modified in light of the existing knowledge [32]. These techniques can go beyond the literal lexical match of words and operate at the conceptual level, so that comparing specific labels for concepts (e.g., Finance) also yields matches on related terms (e.g., Economics, Economic Affairs, Financial Affairs, etc.). As another example, in the healthcare field, an expert on the treatment of cancer could also be considered an expert on oncology, lymphoma or tumor treatment, etc.
Concerning explanation delivery, the purpose is that expert systems should be able to give the user clear explanations of what they are doing and what they have deduced. The most sophisticated expert systems are able to detect contradictions [3] in user information or in the knowledge and can explain them clearly, revealing at the same time the expert’s knowledge and way of thinking, which makes the process much more interpretable.
From the state of the art, we can deduce that a lot of successful work has been done in the field of automated knowledge-base management in recent years. However, despite these great advancements, some problems remain open. These problems should be addressed to support more effective and efficient knowledge-base management. The gist of these problems is to support the complete life cycle for large KBs so that computer systems can exploit them to reflect the way human experts make decisions in their domains of expertise. These tasks are often pervasive because large KBs must be developed incrementally, which means that segments of knowledge are added separately to a growing body of knowledge [6]. Satisfactory results in this field can have a great impact on the advancement of many important and heterogeneous disciplines and fields of application. However, a number of challenging questions should be successfully addressed in advance. These problems are summarized as follows:
One possible way to evaluate these criteria could consist of treating the KB as a set of assertions and using set-oriented measures such as precision and recall to determine the accuracy of the recently built KB. Treating each assertion as atomic avoids the need to perform alignment between the expert system output and the ground truth. Comparing the expert system output and the ground truth KB requires encoding the assertions in compatible or mappable ontologies. Identifying the differences should take into account the logical dependencies between assertions, so as not to over-penalize an expert system for missing assertions from which many others are derivable [13]. Evaluation of temporal qualification can be partially handled by treating the KB as a sequence of fixed sets of assertions over time. Augmentation can also be examined by performing ablation studies over the assertions in the KB.
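As a minimal sketch of the set-oriented evaluation described above, the following Python fragment treats each assertion as an atomic triple and computes precision and recall of a built KB against a ground truth KB. The triple representation, the function name precision_recall and the sample assertions are assumptions made for illustration only; they do not come from the survey or from any benchmark.

```python
from typing import Set, Tuple

# An assertion is modeled as a (subject, predicate, object) triple.
# This representation is an assumption made for the example.
Assertion = Tuple[str, str, str]


def precision_recall(built: Set[Assertion], truth: Set[Assertion]) -> Tuple[float, float]:
    """Return (precision, recall) of a built KB against a ground truth KB."""
    correct = built & truth  # assertions present in both KBs
    precision = len(correct) / len(built) if built else 0.0
    recall = len(correct) / len(truth) if truth else 0.0
    return precision, recall


# Invented toy data, for illustration only.
ground_truth = {
    ("Hagenberg", "located_in", "Austria"),
    ("Vienna", "capital_of", "Austria"),
    ("Austria", "member_of", "EU"),
    ("Linz", "located_in", "Austria"),
}
built_kb = {
    ("Hagenberg", "located_in", "Austria"),
    ("Vienna", "capital_of", "Austria"),
    ("Linz", "capital_of", "Austria"),  # incorrect assertion
}

p, r = precision_recall(built_kb, ground_truth)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.67 recall=0.50
```

Because assertions are compared by exact equality, this sketch presupposes that both KBs are already encoded in the same vocabulary, and it ignores logical dependencies between assertions; a missing assertion that is derivable from others would still be counted as a miss.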
The TAC KBP 2013 Cold Start Track (http://www.nist.gov/tac/2013/KBP/ColdStart) could serve as a base for this research. The idea behind this workshop is to test the ability of proposed methods to extract specific knowledge from text and other sources and place it into a KB. The schema for the target KB is specified a priori, but the KB is otherwise empty at the start. Expert systems should be able to process some sources, extracting information about entities mentioned in the collection, adding the information to a new KB, and indicating how entry point entities mentioned in the collection correspond to nodes in the KB [33]. The sources consist of tens of thousands of news and web documents that contain entities that are not included in existing well-known KBs.
The second challenge lies in the development of strategies for improving the efficiency of tasks exploiting the KB. Moreover, these strategies should not alter the capability of current methods to produce the desired results when compared with task requirements. These methods are those concerning knowledge reasoning, knowledge retrieval, and knowledge sharing. It is necessary to focus on many different aspects and requirements brought by these exploitation methods. Some of them concern efficiency, e.g., time and space complexity of the algorithms developed, and the rest concern effectiveness in relation to efficiency, e.g., correctness, completeness, and so on. Therefore, the problem needs to be addressed from a point of view involving multiple decision criteria.
The ultimate goal is to measure and improve the extent to which time, effort or cost is well used for the intended KB exploitation methods. According to the literature, efficiency issues are currently tackled through a number of computational strategies. These strategies could be organized as follows:
To the best of our knowledge, the first two items above remain largely unaddressed so far. Maybe the reason is that researchers thought that more computing power does not necessarily improve the effectiveness of exploitation methods. However, it is possible to think that, at least at the beginning, it would
from two different perspectives: using semantic similarity measures and semantic relatedness measures. Fortunately, recent works have clearly defined the scope of each of them [39]. Firstly, semantic similarity is used when determining the taxonomic proximity between objects. For example, automobile and car are similar because the relation between both terms can be defined by means of a taxonomic relation. Secondly, the more general concept of semantic relatedness considers taxonomic and relational proximity. For example, blood and hospital are not completely similar, but it is still possible to define a naive relation between them because both belong to the world of healthcare.
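The distinction can be illustrated with a toy sketch in Python. Here similarity is computed over taxonomic (is-a) links only, whereas relatedness may also traverse non-taxonomic links. The small graph, the 1 / (1 + d) scoring on the shortest path length d, and all function names are assumptions chosen for illustration; they are not the measures defined in [39].

```python
from collections import deque
from typing import Dict, Iterable, Optional, Set, Tuple

Edge = Tuple[str, str]

# Hypothetical is-a (taxonomic) links, invented for this example.
IS_A: Set[Edge] = {
    ("car", "motor_vehicle"),
    ("automobile", "motor_vehicle"),
    ("motor_vehicle", "artifact"),
    ("hospital", "building"),
    ("building", "artifact"),
    ("artifact", "entity"),
    ("blood", "body_fluid"),
    ("body_fluid", "substance"),
    ("substance", "entity"),
}
# Hypothetical non-taxonomic links, also invented.
RELATED: Set[Edge] = {("blood", "healthcare"), ("hospital", "healthcare")}


def shortest_path(edges: Iterable[Edge], start: str, goal: str) -> Optional[int]:
    """Length of the shortest undirected path, or None if unreachable."""
    graph: Dict[str, Set[str]] = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def score(edges: Iterable[Edge], a: str, b: str) -> float:
    """Map a path length d to 1 / (1 + d); unreachable pairs score 0."""
    d = shortest_path(edges, a, b)
    return 0.0 if d is None else 1.0 / (1.0 + d)


def similarity(a: str, b: str) -> float:
    """Taxonomic proximity: only is-a links are traversed."""
    return score(IS_A, a, b)


def relatedness(a: str, b: str) -> float:
    """Taxonomic and relational proximity: all links are traversed."""
    return score(IS_A | RELATED, a, b)


print(similarity("car", "automobile"))
print(similarity("blood", "hospital"))
print(relatedness("blood", "hospital"))
```

In this toy graph, car and automobile are two is-a steps apart, so they score well on similarity. Blood and hospital are far apart taxonomically, yet the shared healthcare link brings them close under relatedness.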
In most cases, the problem to face is more complex since it does not involve the matching of two individual entities only, but two complete KBs. This can be achieved by computing a set of semantic correspondences between individual entities belonging to each of the two KBs. A set of semantic correspondences between entities is often called an alignment. It is possible to formally define an alignment A as a set of tuples in the form {(id, μ1, μ2, r, s)}, where id is a unique identifier for the correspondence, μ1 and μ2 are the entities to be compared, r is the kind of relation between them, and s is the score in the range [0, 1] stating the degree of correspondence for the relation r.
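The tuple form of an alignment maps naturally onto a small data structure. The following Python sketch is one possible encoding; the class name Correspondence, the helper prune, the relation labels and the sample entities are all hypothetical and serve only to illustrate the definition above.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class Correspondence:
    """One tuple (id, μ1, μ2, r, s) of an alignment."""
    id: str         # unique identifier of the correspondence
    entity1: str    # μ1, an entity from the first KB
    entity2: str    # μ2, an entity from the second KB
    relation: str   # r, the kind of relation between the entities
    score: float    # s, degree of correspondence in [0, 1]

    def __post_init__(self) -> None:
        if not 0.0 <= self.score <= 1.0:
            raise ValueError(f"score must lie in [0, 1], got {self.score}")


Alignment = FrozenSet[Correspondence]


def prune(alignment: Alignment, threshold: float) -> Alignment:
    """Keep only the correspondences whose score reaches the threshold."""
    return frozenset(c for c in alignment if c.score >= threshold)


# Invented sample alignment between two hypothetical KBs.
alignment: Alignment = frozenset({
    Correspondence("c1", "kb1:Finance", "kb2:Economics", "related", 0.7),
    Correspondence("c2", "kb1:Car", "kb2:Automobile", "equivalent", 0.95),
    Correspondence("c3", "kb1:Blood", "kb2:Hospital", "related", 0.3),
})

confident = prune(alignment, 0.5)
print(sorted(c.id for c in confident))  # ['c1', 'c2']
```

Pruning by a score threshold is a common post-processing step rather than part of the definition; it is included only to show how the score s might be used.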
Therefore, when matching two KBs, the challenge that scientists try to address consists of finding an appropriate semantic matching function leading to a high-quality alignment between these two KBs. Quality here is measured by means of a function A × A_ideal → R × R that associates an alignment A and an ideal alignment A_ideal with two real numbers in [0, 1] stating the precision and recall of A in relation to A_ideal.
Precision represents the notion of accuracy, that is to say, it states the fraction of retrieved correspondences that are relevant for the matching task (0 stands for no relevant correspondences, and 1 means that all retrieved correspondences are relevant). Meanwhile, Recall represents the notion of completeness, that is, the fraction of relevant correspondences that were retrieved (0 means that no relevant correspondences were retrieved, and 1 means that all relevant correspondences were retrieved).
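Under the usual set-based reading, and assuming that correspondences are compared as whole tuples (ignoring their scores) and that both A and A_ideal are non-empty, the two measures can be written as follows. The notation is a standard formulation, not one taken verbatim from the survey.

```latex
\[
\mathrm{Precision}(A, A_{\mathrm{ideal}}) = \frac{|A \cap A_{\mathrm{ideal}}|}{|A|},
\qquad
\mathrm{Recall}(A, A_{\mathrm{ideal}}) = \frac{|A \cap A_{\mathrm{ideal}}|}{|A_{\mathrm{ideal}}|}
\]
```

For instance, if a matcher returns 4 correspondences, the ideal alignment contains 5, and 3 are shared, then precision is 3/4 = 0.75 and recall is 3/5 = 0.6.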
The fourth challenge is to find a way to provide explanations in a simple, clear and precise way to the users or software applications in order to facilitate informed decision making. In particular, most of the techniques used by expert systems do not yield simple or symbolic explanations. It is necessary to take into account that different types of explanations may be needed. For example, if negotiating agents trust each other's information sources, explanations should focus on the manipulations. If, on the other hand, the sources may be suspect, explanations should focus on meta-information about sources. If a user wants an explanation of the reasoning engine used by the expert system, a more complex explanation may be required.
There are some preliminary works that try to address the problem, i.e., laying the foundations of how an expert system should deliver explanations. According to the literature, there is a set of requirements intended to act as criteria for the evaluation of explanations given by expert systems. For instance, Moore [34] states that explanations given by an expert system should have the characteristics listed below:
The spectrum of potential application domains that could benefit from these advances is really wide. Let us summarize some application fields that can benefit from satisfactorily addressing the aforementioned research challenges:
Financial decision support. The financial services industry has been a traditional user of expert systems [8]. Some systems have been created to assist bankers in determining whether to make loans to businesses and individuals, insurance firms have used expert systems to assess the risk presented by a given customer, and software applications have been built for foreign exchange trading. Therefore, advances in this field could be beneficial for improving the traditional systems, by aggregating new knowledge sources, improving the real-time performance, explaining the rationale behind financial decisions, and so on.
Manufacturing industry. Configuration, whereby a solution to a problem is synthesized from a given set of elements related by a set of constraints, is one of the most important expert system applications [19]. Configuration applications were pioneered by computer companies as a means of facilitating the manufacture of semi-custom minicomputers. Nowadays, expert systems have found their way into a wide range of industries, from the textile industry, where fabrics must be optimally cut, to failure detection in factories, which consists of deducing faults and suggesting corrective actions for malfunctioning devices or processes [26].
Question & Answering systems. Expert systems in this field are able to deliver knowledge that is relevant to the user’s problem, in the context of the user’s problem [35]. If new improvements are proposed, very interesting Q & A systems could be built. For example, a computational assistant that gives hints to a user on appropriate grammatical usage in a text, or a tax advisor that accompanies a tax preparation program and advises the user on individual tax policy. Therefore, advances in this field could help popularize this kind of system in many additional fields like education, eTourism, personal finance, and so on.
Scientific research. Scientists need to be able to easily gain access to all information about chemical compounds, biological systems, diseases, and the interactions between these kinds of entities. This requires data to be effectively integrated in order to provide a higher-level view to the user, for instance, a complete view of biological activity [21]. Therefore, advances in the automatic building, exploitation and maintenance of large KBs will certainly help scientists to work more easily with all the knowledge of interest to them. More specifically, the benefits include the aggregation of heterogeneous sources using explicit semantics, and the expression of rich and well-defined models for working with knowledge.
In this work, we have presented the current state-of-the-art, problems that are still open and future re- search challenges for automated knowledge-base management. Our aim is to overview the past, present and future of this discipline so that complex expert systems exploiting knowledge from knowledge bases can be automatically developed and practically used.
[1] P. Ardimento, M. T. Baldassarre, M. Cimitile, and G. Visaggio. Empirical validation of knowledge packages as facilitators for knowledge transfer. JIKM, 8(3):229–240, 2009.
[2] M. Arevalillo-Herráez, D. Arnau, and L. Marco-Giménez. Domain-specific knowledge representation and inference engine for an intelligent tutoring system. Knowl.-Based Syst., 49:97–105,
[3] N. Arman. Fault detection in dynamic rule bases using spanning trees and disjoint sets. Int. Arab J. Inf. Technol., 4(1):67–72, 2007.
[4] R. Balch, S. Schrader, and T. Ruan. Collection, storage and application of human knowledge in expert system development. Expert Systems, 24(5):346–355, 2007.
[5] R. Balzer. Automated enhancement of knowledge representations. In IJCAI, pages 203–207, 1985.
[6] R. Bareiss, B. W. Porter, and K. S. Murray. Supporting start-to-finish development of knowledge bases. Machine Learning, 4:259–283, 1989.
[7] K. Barker, J. Blythe, G. C. Borchardt, V. K. Chaudhri, P. Clark, P. R. Cohen, J. Fitzgerald, K. D. Forbus, Y. Gil, B. Katz, J. Kim, G. W. King, S. Mishra, C. T. Morrison, K. S. Murray, C. Otstott, B. W. Porter, R. Schrag, T. E. Uribe, J. M. Usher, and P. Z. Yeh. A knowledge acquisition tool for course of action analysis. In IAAI, pages 43–50, 2003.
[8] O. Ben-Assuli. Assessing the perception of information components in financial decision support systems. Decision Support Systems, 54(1):795–802, 2012.
[9] Y. Chen, L.-J. Zhang, and Q. Wang. Intelligent scheduling algorithm and application in modernizing manufacturing services. In IEEE SCC, pages 568–575, 2011.
[10] P. Cimiano, A. Hotho, and S. Staab. Comparing conceptual, divisive and agglomerative clustering for learning taxonomies from text. In ECAI, pages 435–439, 2004.
[11] J. de Bruijn, D. Pearce, A. Polleres, and A. Valverde. A semantical framework for hybrid knowledge bases. Knowl. Inf. Syst., 25(1):81–104, 2010.
[12] R. Q. Dividino, S. Schenk, S. Sizov, and S. Staab. Provenance, trust, explanations - and all that other meta knowledge. KI, 23(2):24–30, 2009.
[13] M. Dredze, P. McNamee, D. Rao, A. Gerber, and T. Finin. Entity disambiguation for knowledge base population. In COLING, pages 277–285, 2010.
[14] A. Felfernig and F. Wotawa. Intelligent engineering techniques for knowledge bases. AI Commun., 26(1):1–2, 2013.
[15] I. Filali, F. Bongiovanni, F. Huet, and F. Baude. A survey of structured p2p systems for rdf data storage and retrieval. T. Large-Scale Data- and Knowledge-Centered Systems, 3:20–55, 2011.
[16] M. L. Ginsberg. Knowledge interchange format: the kif of death. AI Magazine, 12(3):57–63, 1991.
[17] F. Gomez and C. Segami. Semantic interpretation and knowledge extraction. Knowl.-Based Syst., 20(1):51–60, 2007.
[18] H.-F. Hung, H.-P. Kao, and Y.-Y. Chu. An empirical study on knowledge integration, technology innovation and experimental practice. Expert Syst. Appl., 35(1-2):177–186, 2008.
[19] A. Jones, R. H. Weston, B. Grabot, and B. Hon. Decision making in support of manufacturing enterprise transformation. ADS, 2013, 2013.
[20] E. S. Jr. and H. T. Dinh. On automatic knowledge validation for bayesian knowledge bases. Data Knowl. Eng., 64(1):218–241, 2008.
[21] P. D. Karp. Development of large scientific knowledge bases. In ICAART (1), page 23, 2010.
[22] J. L. Kenney and S. P. Gudergan. Knowledge integration in organizations: an empirical assessment. J. Knowledge Management, 10(4):43–58, 2006.