Ontology-Driven KDD Process Composition

Claudia Diamantini; Domenico Potena; Emanuele Storti

doi:10.1007/978-3-642-03915-7_25


Abstract

One of the most interesting challenges in the Knowledge Discovery in Databases (KDD) field is supporting users in composing tools into a valid and useful KDD process. This activity requires users both to choose tools suitable for their knowledge discovery problem and to compose them into a KDD process. To this end, they need expertise and knowledge about the functionalities and properties of all KDD algorithms implemented in the available tools. To support users in this demanding activity, in this paper we introduce a goal-driven procedure for automatically composing algorithms. The proposed procedure exploits KDDONTO, an ontology formalizing the domain of KDD algorithms, which allows us to generate valid and non-trivial processes. http://boole.diiga.univpm.it/paper/ida09.pdf

Claudia Diamantini, Domenico Potena and Emanuele Storti
Dipartimento di Ingegneria Informatica, Gestionale e dell'Automazione M. Panti, Università Politecnica delle Marche, via Brecce Bianche, 60131 Ancona, Italy
{diamantini,potena,storti}@diiga.univpm.it

1 Introduction

Knowledge discovery in databases (KDD) has been defined as the non-trivial extraction of implicit, previously unknown, and potentially useful information from databases [1]. A KDD process is a highly complex, iterative and interactive process, with a goal-driven and domain-dependent nature. Given the huge number of tools for data manipulation, with their varied characteristics and different performances, users need a wide range of skills and expertise in order to manage all of them. In designing a KDD process they have to choose, set up, compose and execute the tools most suitable to their problems. For these reasons, one of the most interesting challenges in the KDD field is the possibility of supporting users both in tool discovery and in process composition.
We refer to the former as the activity of searching for tools on the basis of the KDD goal to achieve, the characteristics of the dataset at hand, and the functional and non-functional properties of the implemented algorithm. Process composition is the activity of linking suitable tools in order to build valid and useful knowledge discovery processes.

We are working on these issues within the Knowledge Discovery in Databases Virtual Mart (KDDVM) project [2], which aims at realizing an open and extensible environment where users can look for implementations, suggestions, evaluations and examples of use of tools implemented as services. In KDDVM each KDD service is represented by three logical layers, with growing degrees of abstraction. The algorithm level is the most abstract one, whereas different implementations of the same algorithm are described at the tool level. Finally, different instances of the same tool can be made available on the net as services through several providers. At present, among the available services, we have at our disposal a broker service that supports the discovery of suitable KDD services. This service is based on KDDONTO, a domain ontology describing KDD algorithms and their interfaces [3].

In this work we introduce a goal-driven procedure aimed at automatically composing KDD processes; in order to guide the whole composition procedure, algorithm matching functions are defined on the basis of the ontological information contained in KDDONTO. The outcome of the procedure is a subset of all possible workflows of algorithms, each of which achieves the goal requested by the user and satisfies a set of constraints. The generated processes are then ranked according to both user-defined and built-in criteria, allowing users to choose the most suitable processes w.r.t. their requests and to try more than a single solution.
Our composition procedure works at the algorithm level and therefore produces abstract KDD processes, which are general and reusable, since each algorithm instance can be replaced with any of the services implementing it. Furthermore, such a generated process can itself be considered useful, valid and previously unknown knowledge.

In the rest of this section, relevant literature is discussed. Section 2 then presents KDDONTO and its main concepts and relations. Section 3 introduces the algorithm matching functions, which are the basic step of the process composition procedure described in detail in Section 4. Finally, Section 5 concludes the paper.

1.1 Related Works

In recent years researchers in the Data Mining and KDD fields have shown increasing interest in techniques that support the design of knowledge discovery processes. To this end several ontologies have been defined, although they focus only on tools and algorithms for Data Mining, which is just one phase of the wider and more complex KDD field [4, 1]. The first ontology of this kind is DAMON (DAta Mining ONtology) [5], built to simplify the development of distributed KDD applications on the Grid by offering domain experts a taxonomy for discovering tasks, methods and software suitable for a given goal. In [6], the ontology is exploited for selecting algorithms on the basis of the specific application domain they are used for. Finally, OntoDM [7] is a general-purpose top-level ontology aimed at describing the whole Data Mining domain.

Some research works have also been proposed for supporting process composition [8-12]. An early work of this kind is [9], which proposes a framework for guiding users in breaking up a complex KDD task into a sequence of manageable subtasks, which are then mapped to appropriate Data Mining techniques.
Such an approach was exploited in [11], where the user is supported in iteratively refining a KDD skeleton process until executable techniques are available to solve low-level tasks. To this end, algorithms and data are modeled in an object-oriented schema. In [10] a system is described that focuses on setting up and reusing chains of preprocessing algorithms, which are represented in a relational meta-model.

Although these works help users in choosing the most suitable tools for each KDD phase, no automatic composition procedure is defined. More recent works [8, 12] have addressed this issue in order to provide effective support for process composition. In detail, in [8] the authors define a simple ontology (actually not much more than a taxonomy) of KDD algorithms, which is exploited for designing KDD processes for cost-sensitive classification problems. A forward composition, from dataset characteristics towards the goal, is achieved through a systematic enumeration of valid processes, which are ranked on the basis of the accuracy achieved on the processed dataset and of process speed. [12] introduces a KDD ontology representing concrete implementations of algorithms and every piece of knowledge involved in a KDD process (dataset and model), which is exploited for guiding a forward state-space search planning algorithm in the design of a KDD workflow. Such an ontology describes algorithms with very few classes and a poor set of relationships, resulting in a flat knowledge base.

In both [12] and [8], the ontologies are not rich enough to be extensively used either for deducing hidden relations among algorithms or for supporting relaxed matches among algorithms and complex pruning strategies during the planning procedure. In order to overcome the limits of the cited works, in our proposal we define and exploit a formal KDD ontology expressly conceived for supporting composition.
Such an ontology is exploited by a backward composition procedure, which composes algorithms not only by exact matches but also by evaluating the similarity between their interfaces, in order to extract unknown and non-trivial processes. Moreover, our approach aims at a higher level of generality by producing abstract and reusable KDD processes.

In this work we use the term composition instead of planning in order to emphasize the difference from traditional AI planning, in which execution stops when a single proper solution is found, and also because we ultimately refer to service composition.

2 The KDD ONTOlogy

KDDONTO is an ontology describing the domain of KDD algorithms, conceived for supporting the discovery of KDD algorithms and their composition. Among the many methodologies proposed in the literature for ontology building, we chose a formal approach based on the goal-oriented step-wise strategy described in [13]; moreover, the quality requirements and formal criteria defined in [14] are taken into account, with the aim of making meaning explicit and unambiguous.

The key concept of KDDONTO is algorithm, because it is the basic component of each process. Other fundamental domain concepts, from which any other concept can be derived, are the following:

- method: a methodology or technique used by an algorithm to extract knowledge from input data;
- phase: a phase of a KDD process;
- task: the goal pursued by whoever executes a KDD process;
- model: a set of constructs and rules for representing knowledge;
- dataset: a set of data in a proper format;
- parameter: any information required in input or produced in output by an algorithm;
- precondition/postcondition: specific features that an input (or output) must have in order to be used by a method or an algorithm. Such conditions concern format (normalized dataset), type (numeric or literal values), or quality (missing values, balanced dataset) properties of an input/output datum;
- performance: an index and a value describing the way an algorithm works;
- optimization function: the function that an algorithm or a method optimizes with the aim of obtaining the best predictive/descriptive model.

Starting from these concepts, the top-level classes are identified, namely Algorithm, Method, Phase, Task, Data (which contains Model, Dataset and Parameter as subclasses), DataFeature (corresponding to precondition/postcondition), PerformanceIndex and PerformanceClass (for describing performance indexes and performance values), and ScoreFunction (corresponding to optimization function).

The main relations among the classes are:

- specifies_phase, between Task and Phase;
- specifies_task, between Method and Task;
- uses, between Algorithm and Method;
- has_input/has_output, an n-ary relation with domain Algorithm, Method or Task and codomain Data, and optionally DataFeature. For each instance of DataFeature involved in has_input, a value expressing the precondition strength is also provided: a value equal to 1.0 corresponds to a mandatory precondition, whereas lower values correspond to optional ones. The inverse properties input_for/output_for are also introduced;
- has_performance, an n-ary relation with domain Algorithm, Method or Task and codomain PerformanceIndex and PerformanceClass.

Subclasses are defined by means of existential restrictions on the main classes, which can be considered the fundamental building blocks of the ontology. At first some Phase instances are introduced, namely PREPROCESSING, MODELING and POSTPROCESSING.
They represent the main phases in a KDD process and are used to start the subclassing as follows:

- Task specializes into subclasses according to the argument of specifies_phase, e.g.:
  $\mathit{ModelingTask} \sqsubseteq \mathit{Task} \sqcap \exists\,\mathit{specifies\_phase}.\{\mathit{MODELING}\}$
- Method is detailed in subclasses according to the tasks that each method specifies by means of the specifies_task relation, e.g.:
  $\mathit{ClassificationMethod} \sqsubseteq \mathit{Method} \sqcap \exists\,\mathit{specifies\_task}.\{\mathit{CLASSIFICATION}\}$
- Algorithm specializes into subclasses according to the uses and has_output relations. For example:
  $\mathit{ClassificationAlgorithm} \sqsubseteq \mathit{Algorithm} \sqcap \exists\,\mathit{uses}.\mathit{ClassificationMethod} \sqcap \exists\,\mathit{has\_output}.\mathit{ClassificationModel}$
- Model is further detailed in subclasses on the basis of the task which the models are used for, e.g.:
  $\mathit{ClassificationModel} \sqsubseteq \mathit{Model} \sqcap \exists\,\mathit{output\_for}.\{\mathit{CLASSIFICATION}\}$

A top-level view of the described classes and relations is shown in Figure 1.

Fig. 1. KDDONTO: main classes and relations

Many other relations are introduced in order to represent information useful to support KDD process composition. Among the most interesting:

- not_with links two instances of Method that cannot be used in the same process;
- not_before links two instances of Method such that the first cannot be used in a process before the second;
- in_module/out_module connect an instance of Algorithm to others, which can be executed respectively before or after it. These relations provide suggestions about process composition, representing in an explicit fashion KDD experts' experience about process building;
- part_of (and its inverse has_part, see footnote 1), between an instance of Model and one of its components (a generic Data instance), makes it possible to describe a model in terms of the subcomponents it is made of. These relations are useful for identifying algorithms working on similar models, that is, models having common substructures, as discussed in the next section.

Footnote 1: We use inverse rather than reciprocal because both part_of and has_part are instance-level relations.
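As an illustration of the relations introduced in this section, the following Python sketch models one has_input tuple with its precondition strengths, together with a has_part table. It is only a reading aid under stated assumptions, not the KDDONTO implementation: the names `InputSpec`, `HAS_PART`, `components` and `part_of`, the feature labels, and the choice to follow has_part transitively are all hypothetical. The LVQ/VQ entry anticipates the example given in Section 3.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InputSpec:
    """One has_input tuple: a datum plus (DataFeature, strength) pairs."""
    data: str
    preconditions: tuple = ()  # hypothetical encoding of precondition strength

    def mandatory(self):
        # A strength of 1.0 marks a mandatory precondition.
        return [name for name, strength in self.preconditions if strength == 1.0]

    def optional(self):
        # Lower strengths mark optional preconditions.
        return [name for name, strength in self.preconditions if strength < 1.0]


# Hypothetical has_part table: whole -> set of direct components.
HAS_PART = {"LVQ": {"VQ", "LabelingFunction"}}


def components(model, has_part=HAS_PART):
    """All direct and indirect parts of a model (transitivity is assumed here)."""
    seen, stack = set(), [model]
    while stack:
        for part in has_part.get(stack.pop(), ()):
            if part not in seen:
                seen.add(part)
                stack.append(part)
    return seen


def part_of(part, whole, has_part=HAS_PART):
    """Inverse view of has_part."""
    return part in components(whole, has_part)


# Feature labels below are illustrative, not taken from the ontology.
spec = InputSpec("Dataset", (("NORMALIZED", 1.0), ("NO_MISSING_VALUES", 0.5)))
```

Keeping the strength next to each feature makes it straightforward to distinguish the preconditions that must hold from those that may be relaxed during matching.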
At present, KDDONTO is represented in OWL-DL, whose logical model is based on Description Logics and is decidable; it is a sublanguage of OWL [15], the de facto standard language for building ontologies. An implementation of KDDONTO has been obtained after some refinements, whose details are not reported here, and is available at the KDDVM project site (footnote 2).

3 Algorithm Matching

For the purposes of this work, we define a KDD process as a workflow of algorithms that achieves the goal requested by the user. The basic issue in composition is to define algorithm matching, that is, to specify under which conditions two or more algorithms (footnote 3) can be executed in sequence. Each algorithm takes data with certain features in input, performs some operations and returns data in output, which are then used as input for the next algorithm in the process. Therefore, two algorithms can be matched if the output of the first is compatible with the input of the second.

An exact match between a set of algorithms $\{A_1,\dots,A_n\}$ and an algorithm $B$ is defined as:

$$\mathit{match}_E(\{A_1,\dots,A_n\}, B) \leftrightarrow \forall\, \mathit{in}_B^i \;\exists A_k \;\exists\, \mathit{out}_{A_k}^j : \mathit{out}_{A_k}^j \equiv_o \mathit{in}_B^i$$

where $\mathit{in}_B^i$ is the $i$-th input of the algorithm $B$ and $\mathit{out}_{A_k}^j$ is the $j$-th output of the algorithm $A_k$. The symbol $\equiv_o$ represents conceptual equivalence and is defined as follows: let $a$ and $b$ be two parameters; then $a \equiv_o b$ if $C_a \sqsubseteq C_b$, i.e. if $a$ and $b$ refer to the concepts $C_a$ and $C_b$ such that $C_a$ is subsumed by $C_b$ (they are the same concept or the former is a subconcept of the latter). In such cases the whole set of algorithms $\{A_1,\dots,A_n\}$ provides the required data for $B$, realizing the piece of workflow shown in Figure 2a.

Furthermore, an exact match is complete if all the required inputs for an algorithm are provided by a single algorithm, as represented in Figure 2b.
More formally, an exact complete match between two algorithms $A$ and $B$ is defined as:

$$\mathit{match}_{Ec}(A, B) \leftrightarrow \forall\, \mathit{in}_B^i \;\exists\, \mathit{out}_A^j : \mathit{out}_A^j \equiv_o \mathit{in}_B^i$$

By exploiting the properties of algorithms described in the previous section, it is possible to define a match based not only on exact criteria, but also on similarity among data. Two algorithms can be considered compatible even if their interfaces are not perfectly equivalent: relaxing the constraints results in a wider set of possible matches. Hence, an approximate match between a set of algorithms $\{A_1,\dots,A_n\}$ and an algorithm $B$ is defined as:

$$\mathit{match}_A(\{A_1,\dots,A_n\}, B) \leftrightarrow \forall\, \mathit{in}_B^i \;\exists A_k \;\exists\, \mathit{out}_{A_k}^j : \mathit{out}_{A_k}^j \equiv_o \mathit{in}_B^i \,\vee\, \mathit{similar}(\mathit{out}_{A_k}^j, \mathit{in}_B^i)$$

where the similarity predicate $\mathit{similar}(x, y)$ is satisfied if $x$ and $y$ are similar concepts, i.e. if there is a path in the ontology graph that links them together. An approximate match is useful not only when an exact match cannot be performed, but also for extracting unknown and non-trivial processes.

Footnote 2: http://boole.diiga.univpm.it/kddontology.owl
Footnote 3: Hereafter we use class and concept as synonyms, and use algorithm to refer to the Algorithm class.

Fig. 2. (a) Exact and (b) exact complete matches (dashed lines represent the $\equiv_o$ relation)

The similarity between concepts can be evaluated on the basis of various KDDONTO relations. The simplest similarity relation is at the hierarchic level: a specific datum is similar to its siblings, because they share, through an is-a relation, membership of the same class. Similarity also holds at the compositional level: a datum can be made of simpler data, according to the part_of/has_part relationships described in Section 2. As shown in Figure 3, a compound datum (e.g. $d$) can be used in place of one of its components (e.g. $\mathit{in}_B^1$), because the former is a superset of the latter: it contains all the needed information, plus other information that can be discarded.
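The match predicates above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the toy taxonomy `IS_A`, the `HAS_PART` table, the dict-based algorithm descriptions and every function and algorithm name are assumptions introduced here. In particular, `similar` covers only the two cases named in the text (siblings and compound data), not arbitrary paths in the ontology graph.

```python
# Hypothetical toy taxonomy (child -> parent via is-a) and part table.
IS_A = {
    "LVQ": "ClassificationModel",
    "DecisionTree": "ClassificationModel",
    "ClassificationModel": "Model",
    "VQ": "ClusteringModel",
    "ClusteringModel": "Model",
}
HAS_PART = {"LVQ": {"VQ", "LabelingFunction"}}


def ancestors(concept):
    """The concept itself plus all of its superconcepts."""
    chain = [concept]
    while concept in IS_A:
        concept = IS_A[concept]
        chain.append(concept)
    return chain


def equivalent(out, inp):
    """Conceptual equivalence: the output concept is subsumed by the input concept."""
    return inp in ancestors(out)


def similar(out, inp):
    """Siblings under the same direct parent, or out is a compound containing inp."""
    parent = IS_A.get(out)
    siblings = out != inp and parent is not None and parent == IS_A.get(inp)
    compound = inp in HAS_PART.get(out, ())
    return siblings or compound


def match(providers, consumer, approximate=False):
    """Exact (default) or approximate match of a set of providers with a consumer."""
    outputs = [o for p in providers for o in p["out"]]

    def covered(inp):
        return any(equivalent(o, inp) or (approximate and similar(o, inp))
                   for o in outputs)

    return all(covered(inp) for inp in consumer["in"])


def match_exact_complete(a, b):
    """Exact complete match: a single algorithm provides every input of b."""
    return match([a], b)


# Hypothetical algorithm descriptions.
TRAIN_LVQ = {"in": ["Dataset"], "out": ["LVQ"]}
USE_VQ = {"in": ["VQ"], "out": ["Dataset"]}
EVALUATE = {"in": ["ClassificationModel"], "out": ["Report"]}
```

With these definitions, `match([TRAIN_LVQ], USE_VQ)` fails because LVQ is not subsumed by VQ, while the same call with `approximate=True` succeeds because the compound LVQ datum contains a VQ part.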
To give a practical example, a Labeled Vector Quantization (LVQ) model has_part a VQ model and a Labeling function: if an algorithm requires a VQ model in input, an LVQ model can be provided in place of it.

Given two similar concepts, we define the ontological distance as the number of is-a or part_of relations that are needed to link them in the ontological graph; as the only exception, the ontological distance from a concept to its subconcepts is considered null. In an approximate match, the higher the ontological distance between two concepts, the less similar they are. This makes it possible to assign a score to each match and to define a ranking among the generated processes, as described in Subsection 4.3.

Fig. 3. Approximate match (dashed line represents the $\equiv_o$ relation)

In process composition, whatever match is used, the satisfaction of preconditions and postconditions must be checked: the postconditions of the first algorithm must not be in contrast with the preconditions of the second one, as regards the same data.

4 Process Composition Procedure

Based on algorithm matching, this section describes a goal-driven procedure for composing KDD processes. Our approach aims at generating all potentially useful, valid and unknown processes satisfying the user requests; this allows the user to choose among processes with different characteristics and to experiment with more than a single solution. We use Jena (footnote 4) as a framework for querying the ontology through the SPARQL language [16], which is a W3C Recommendation, whereas Pellet (footnote 5) is used as a reasoner for inferring non-explicit facts.

The proposed process composition procedure consists of the following phases: (I) dataset and goal definition, (II) process building, (III) process ranking.

4.1 Dataset and goal definition

Any KDD process is built for achieving a specific KDD goal by processing a given dataset. Hence, the first step of our procedure is the description of both the dataset and the goal.
In our framework, the former is described by a set of characteristics (e.g. representation model, size, feature type), which are instances of the DataFeature class. The latter is expressed as an instance of the Task class, leaving it to the user to move from complex domain-dependent business goals to one or more well-defined and domain-independent KDD tasks.

The description of both dataset and goal allows us to guide the composition procedure, bounding the number and type of algorithms that can be used at the beginning and at the end of each process. Moreover, some process constraints are provided in this phase to help balance procedure execution speed against composition accuracy. Some of these constraints can be defined by the user; among others: the kind of match (only exact or also approximate), the maximum ontological distance for each match in a process, the maximum number of algorithms in a process, and the maximum computational complexity of a process.

Other constraints are predefined and built into the procedure to ensure that valid KDD processes are produced. Some examples are the following:

- two algorithms whose methods are linked through the not_with property cannot coexist in the same process;
- two algorithms whose methods are linked through the not_before property can coexist in the same process only if the first follows the second;
- more than one FeatureExtraction algorithm cannot coexist in the same process.

Footnote 4: http://jena.sourceforge.net/
Footnote 5: http://clarkparsia.com/pellet

4.2 Process Building

Process building is an iterative phase, which starts from the given task and goes backwards, adding one or more algorithms to each process at each iteration. Such algorithms are chosen on the basis of the algorithm matching functionalities defined in the previous section.
The procedure goes on until the first algorithm of each process is compatible with the given dataset, and stops if one of the following conditions holds: no process can be further expanded because no compatible algorithms exist, or one of the process constraints is violated.

The main steps of the process building phase are described in Table 1. A process $P_i = \langle V_i, E_i \rangle$ is represented as a directed acyclic graph, where $V_i$ is the set of nodes, namely algorithms, and $E_i$ is the set of directed edges linking algorithms together.

At first, the algorithms $A_i$ that return as output a model $x$ used for performing the given task $T$ are found; then, for each of them a process $P_i$ is created. Each $P_i$ is added to the set $P$, which contains all the processes that are going to be evaluated in the next step.

As long as there is a process $P_i$ in $P$, the algorithms compatible with the one(s) at the head of $P_i$ are extracted. If the process constraints are satisfied, these extracted algorithms are used for forming a new valid process, which is added to the set $P$. At last, the process $P_i$ is deleted from the set $P$ because its expansion has ended, and the procedure is iterated. At the beginning of each iteration, $P_i$ is checked against the characteristics of the dataset at hand: if they are compatible, $P_i$ is moved to the set $F$ of final processes. Note that the process constraints are used as pruning criteria, which ensure that useful and valid processes are produced while keeping the complexity of the whole procedure under control.

During the procedure, it may happen that a single algorithm or a set of algorithms is executed more than once inside a process. To avoid any possible endless loop, we fix the maximum number of algorithms in a process.
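The backward expansion described above can be sketched in Python as follows. This is a simplified illustration under explicit assumptions, not the procedure as implemented by the authors: processes are restricted to linear chains (each expansion adds one provider through an exact complete match on concept names), only the not_with and maximum-length constraints are checked, and dataset compatibility is decided by regressing preconditions backwards through the chain. All names, including `compose`, the dictionary layout and the example algorithms, are hypothetical.

```python
from collections import deque


def compose(task, dataset, algorithms, output_for, not_with=(), max_len=5):
    """Enumerate linear processes (lists of algorithm names) backwards from the task."""

    def constraints_ok(chain):
        # Built-in and user constraints used as pruning criteria.
        methods = {algorithms[name]["method"] for name in chain}
        clash = any(m in methods and n in methods for m, n in not_with)
        return len(chain) <= max_len and not clash

    def open_requirements(chain):
        # Features the input dataset must have; None signals a contradiction
        # between a postcondition and a later precondition.
        required = {}
        for name in reversed(chain):
            alg = algorithms[name]
            for feature, value in alg["post"].items():
                if required.get(feature, value) != value:
                    return None
                required.pop(feature, None)
            for feature, value in alg["pre"].items():
                if required.get(feature, value) != value:
                    return None
                required[feature] = value
        return required

    # Seed processes: algorithms whose output model is used for the task.
    seeds = [[name] for name in sorted(algorithms)
             if any(output_for.get(x) == task for x in algorithms[name]["out"])]
    pending = deque(chain for chain in seeds if constraints_ok(chain))
    final = []
    while pending:
        chain = pending.popleft()
        required = open_requirements(chain)
        if required is None:
            continue  # invalid link, prune this process
        head = algorithms[chain[0]]
        if set(head["in"]) <= {"Dataset"} and all(
                dataset.get(f) == v for f, v in required.items()):
            final.append(chain)  # compatible with the dataset at hand
        for name in sorted(algorithms):
            # Exact complete match on concept names (a simplification).
            if set(head["in"]) <= set(algorithms[name]["out"]):
                candidate = [name] + chain
                if constraints_ok(candidate):
                    pending.append(candidate)
    return final


# Hypothetical two-algorithm catalogue.
ALGORITHMS = {
    "RemoveMissingValues": {"method": "Cleaning", "in": ["Dataset"],
                            "out": ["Dataset"], "pre": {},
                            "post": {"missing_values": False}},
    "TrainClassifier": {"method": "Classification", "in": ["Dataset"],
                        "out": ["ClassificationModel"],
                        "pre": {"missing_values": False}, "post": {}},
}
OUTPUT_FOR = {"ClassificationModel": "CLASSIFICATION"}
processes = compose("CLASSIFICATION", {"missing_values": True},
                    ALGORITHMS, OUTPUT_FOR, max_len=3)
```

In this toy run the classifier alone is rejected because the dataset has missing values, while chains that prepend the cleaning step are accepted. The repeated cleaning step in the longer chain shows why a bound on the number of algorithms is needed to guarantee termination.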
Let $P$ be the set of processes at each iteration, and $P_i$ be the $i$-th process in $P$, described by the pair $\langle V_i, E_i \rangle$, where $V_i$ is the set of algorithms in $P_i$ and $E_i$ is the set of directed edges $(A_p, A_q)$ which connect algorithm $A_p$ to algorithm $A_q$. Let $F$ be the final list of valid generated processes, $T$ be the task, $D$ be the set of dataset characteristics, and $\mathit{match}_D(D, P_i)$ be a predicate which is true if the preconditions of the algorithms at the head of $P_i$ are compatible with $D$.

$P \leftarrow \emptyset$, $F \leftarrow \emptyset$;
Find the set $\Gamma = \{A_i : \mathit{has\_output}(A_i, x) \sqcap \mathit{output\_for}(x, T)\}$;
foreach $A_i \in \Gamma$ do
    initialize $P_i = \langle A_i, \emptyset \rangle$;
    if ($\mathit{process\_constraints}(A_i, P_i)$) then $P \leftarrow P_i$;
foreach $P_i \in P$ do
    if ($\mathit{match}_D(D, P_i)$) then $F \leftarrow P_i$;
    Define the set $\Delta = \{A_k \in V_i : \nexists\, (x, A_k) \in E_i\}$;
    foreach $A_k \in \Delta$ do
        Find the set $\Phi = \{\Phi_1, \dots, \Phi_m\}$, where $\Phi_j$ is the set of algorithms $\{B_1, \dots, B_{m_j}\}$ such that $\mathit{match}_E(\Phi_j, A_k) \sqcup \mathit{match}_A(\Phi_j, A_k)$;
        foreach $\Phi_j \in \Phi$ do
            if ($\mathit{process\_constraints}(\Phi_j, P_i)$) then
                define $P' = \langle V_i \leftarrow \Phi_j,\; E_i \leftarrow \{(B_1, A_k), \dots, (B_{m_j}, A_k)\} \rangle$;
                $P \leftarrow P'$;
    $P = P - \{P_i\}$.

Table 1. The composition algorithm.

4.3 Process Ranking

In order to support the user in choosing among the generated processes, we define some criteria for ranking them:

- similarity measurement: an exact match is more accurate than an approximate one, thus a process can be ranked on the basis of the sum of the ontological distances of its matches. The higher the value of the sum, the lower the rank of the process;
- precondition relaxation: in algorithm matching, preconditions on some data can be relaxed if they have a condition_strenght value lower than 1, i.e. if they are non-mandatory preconditions. Relaxing preconditions reduces the process rank, because algorithm execution can lead to lower quality outcomes;
- use of link modules: the score of a process in which there are algorithms linked through the in_module and out_module properties is increased, because these relations state that a specific connection among algorithms has proved to be effective;
- performance evaluation: algorithm performances are used to assign a global score to a process. For example, in the case of a computational complexity index, the complexity of the whole process can be determined as the highest complexity among the algorithms in the process.

4.4 Applicative example

At present the KDDONTO implementation consists of 88 classes, 31 relations and more than 150 instances; it describes 15 algorithms of the preprocessing, modeling and postprocessing phases, in particular for the Feature Extraction, Classification, Clustering, Evaluation and Interpretation tasks.

On this basis, the effectiveness of the composition procedure has been evaluated through a prototype implementation. The following scenario has been assumed: a user wants to perform a classification task on a normalized dataset with 2 balanced classes, missing values and both literal and numeric values. The constraints she sets are the following: both exact and approximate matches are allowed, and the maximum number of algorithms in a process is 5.

The evaluation has been performed by comparing our proposal with two other solutions. As the first solution we defined a procedure using a database for representing information about algorithms, in which no inference is possible. In the other solution, we exploited a combinatorial approach for composing algorithms, where a Postprocessing algorithm cannot precede Preprocessing or Modeling ones, and a Modeling algorithm cannot precede a Preprocessing algorithm. The resulting processes have then been evaluated by a KDD expert in order to identify the valid ones, i.e. processes in which the algorithm sequence is both semantically correct w.r.t. all input/output matches and consistent w.r.t. the user goal and requests.

Using the non-ontological (database) approach, we are able to extract 37 processes, all of which the expert assesses to be valid. The number of processes increases considerably when the combinatorial approach is exploited, but most of them are invalid and often meaningless, and need to be manually filtered. Finally, our procedure generates a set of 70 processes, which consists of the valid processes extracted through the non-ontological approach plus 33 other valid and non-explicit processes, composed by using inference and approximate match. Hence, our procedure is able to produce a high number of alternatives without introducing spurious and semantically incorrect processes.

5 Conclusion

The main contribution of this work is the introduction of a goal-oriented procedure aimed at the automatic composition of algorithms forming valid KDD processes. The proposed procedure is based on the exploitation of KDDONTO, which formalizes knowledge about KDD algorithms. The use of such an ontology leads to manifold advantages. Firstly, the resulting processes are valid and semantically correct. Secondly, unlike other works in the literature, we are able to generate not only explicit processes formed by directly linkable algorithms, but also implicit, interesting and non-trivial processes in which algorithms share similar interfaces. Thirdly, KDDONTO is able to support complex pruning strategies during the composition procedure, also making use of inferential mechanisms. Finally, processes can be ranked according to both ontological and non-ontological criteria.

Compared with planning algorithms [12], such an approach gives users a wider choice of processes suited to their requirements. Moreover, the generated processes can themselves be considered useful, valid and previously unknown knowledge, valuable for both novice and expert users.
At present we are working on the development of a support service implementing the described process composition procedure, in order to actually integrate it into the KDDVM project. Since abstract KDD processes cannot be directly executed, each of them needs to be substituted with a workflow of services, in which every algorithm is replaced with a service implementing it. As future extensions, we are also working on increasing the number of instances described in KDDONTO and on performing more comprehensive tests. Furthermore, we are studying several heuristics to provide an actual ranking of the generated processes.

References

1. Fayyad, U.M., Piatetsky-Shapiro, G. and Smyth, P.: From data mining to knowledge discovery: an overview. American Association for Artificial Intelligence, Menlo Park, CA, USA (1996) 1-34
2. KDDVM project site. http://boole.diiga.univpm.it
3. Diamantini, C. and Potena, D.: Semantic Annotation and Services For KDD Tools Sharing and Reuse. In: Proc. of the 8th IEEE International Conference on Data Mining Workshops. 1st Int. Workshop on Semantic Aspects in Data Mining, Pisa, Italy (Dec 19 2008) 761-770
4. CRISP-DM site. http://www.crisp-dm.org
5. Cannataro, M. and Comito, C.: A data mining ontology for grid programming. In: Proc. 1st Int. Workshop on Semantics in Peer-to-Peer and Grid Computing, in conjunction with WWW2003, Budapest, Hungary (2003) 113-134
6. Yu-hua, L., Zheng-ding, L., Xiao-lin, S., Kun-mei, W. and Rui-xuan, L.: Data mining ontology development for high user usability. Wuhan University Journal of Natural Sciences 11(1) (2006) 51-56
7. Panov, P., Džeroski, S. and Soldatova, L.: OntoDM: An Ontology of Data Mining. In: Data Mining Workshops, International Conference on, Los Alamitos, CA, USA, IEEE Computer Society (2008) 752-760
8. Bernstein, A., Provost, F. and Hill, S.: Towards Intelligent Assistance for a Data Mining Process: An Ontology Based Approach for Cost-Sensitive Classification. IEEE Transactions on Knowledge and Data Engineering 17(4) (2005) 503-518
9. Engels, E.: Planning tasks for knowledge discovery in databases; performing task-oriented user-guidance. In: Proceedings of the 2nd International Conference on Knowledge Discovery in Databases (KDD'96), Portland, Oregon (August 1996)
10. Morik, K. and Scholz, M.: The MiningMart Approach to Knowledge Discovery in Databases. In Zhong, N. and Liu, J., ed.: Intelligent Technologies for Information Analysis. Springer (2004) 47-65
11. Wirth, R., Shearer, C., Grimmer, U., Reinartz, T., Schlösser, J.J., Breitner, C., Engels, R. and Lindner, G.: Towards Process-Oriented Tool Support for Knowledge Discovery in Databases. In: PKDD '97: Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery, London, UK, Springer-Verlag (1997) 243-253
12. Žáková, M., Křemen, P., Železný, F. and Lavrač, N.: Using Ontological Reasoning and Planning for Data Mining Workflow Composition. In: SoKD: ECML/PKDD 2008 workshop on Third Generation Data Mining: Towards Service-oriented Knowledge Discovery, Antwerp, Belgium (2008)
13. Noy, N. and McGuinness, D.L.: Ontology Development 101: A Guide to Creating Your First Ontology. Stanford University (2002)
14. Gruber, T.: Toward principles for the design of ontologies used for knowledge sharing. Int. J. Hum.-Comput. Stud. 43(5-6) (1995) 907-928
15. Smith, M.K., Welty, C. and McGuinness, D.L.: OWL Web Ontology Language Guide, W3C Recommendation. http://www.w3.org/TR/owl-guide/ (2004)
16. Prud'hommeaux, E. and Seaborne, A.: SPARQL Query Language for RDF, W3C Recommendation. http://www.w3.org/TR/rdf-sparql-query/ (2008)
