Ontology-driven KDD Process Composition
Claudia Diamantini, Domenico Potena and Emanuele Storti
Dipartimento di Ingegneria Informatica, Gestionale e dell'Automazione M. Panti,
Università Politecnica delle Marche - via Brecce Bianche, 60131 Ancona, Italy
{diamantini,potena,storti}@diiga.univpm.it
Abstract. One of the most interesting challenges in the Knowledge Discovery in Databases (KDD) field is supporting users in composing tools into a valid and useful KDD process. This activity requires users both to choose tools suitable for their knowledge discovery problem and to compose them into a KDD process. To this end, they need expertise and knowledge about the functionalities and properties of all the KDD algorithms implemented in the available tools. In order to support users in this demanding activity, in this paper we introduce a goal-driven procedure for automatically composing algorithms. The proposed procedure is based on the exploitation of KDDONTO, an ontology formalizing the domain of KDD algorithms, which allows us to generate valid and non-trivial processes.
1 Introduction
Knowledge discovery in databases (KDD) has been defined as the non-trivial extraction of implicit, previously unknown, and potentially useful information from databases [1]. A KDD process is a highly complex, iterative and interactive process, with a goal-driven and domain-dependent nature. Given the huge number of tools for data manipulation, with their various characteristics and different performances, users need diverse skills and expertise to manage all of them. Indeed, in designing a KDD process they have to choose, set up, compose and execute the tools most suitable to their problems.
For these reasons, one of the most interesting challenges in the KDD field is supporting users both in tool discovery and in process composition. We refer to the former as the activity of searching for tools on the basis of the KDD goal to achieve, the characteristics of the dataset at hand, and the functional and non-functional properties of the implemented algorithm. Process composition is the activity of linking suitable tools in order to build valid and useful knowledge discovery processes.
We are working on these issues within the Knowledge Discovery in Databases Virtual Mart (KDDVM) project [2], which aims at realizing an open and extensible environment where users can look for implementations, suggestions, evaluations and examples of use of tools implemented as services. In KDDVM each KDD service is represented by three logical layers with increasing degrees of abstraction. The algorithm level is the most abstract one, whereas different implementations of the same algorithm are described at the tool level. Finally, different instances of the same tool can be made available on the net as services through several providers. Among the services currently available is a broker service that supports the discovery of suitable KDD services. This service is based on KDDONTO, a domain ontology describing KDD algorithms and their interfaces [3].
In this work we introduce a goal-driven procedure aimed at automatically composing KDD processes; in order to guide the whole composition procedure, algorithm matching functions are defined on the basis of the ontological information contained in KDDONTO. The outcome of the procedure is a subset of all possible workflows of algorithms, namely those that achieve the goal requested by the user and satisfy a set of constraints. The generated processes are then ranked according to both user-defined and built-in criteria, allowing users to choose the processes most suitable to their requests and to try more than one solution. Since our composition procedure works at the algorithm level, it produces abstract KDD processes, which are general and reusable, because each algorithm instance can be replaced with any of the services implementing it. Furthermore, each generated process can itself be considered useful, valid and previously unknown knowledge. In the rest of this section, relevant literature is discussed. Then, Section 2 presents KDDONTO and its main concepts and relations. Section 3 introduces the algorithm matching functions, which are the basic step of the process composition procedure described in detail in Section 4. Finally, Section 5 concludes the paper.
1.1 Related Works
In recent years researchers in the Data Mining and KDD fields have shown increasing interest in techniques for supporting the design of knowledge discovery processes. To this end several ontologies have been defined, although they focus only on tools and algorithms for Data Mining, which is one of the phases of the wider and more complex KDD field [4, 1]. The first ontology of this kind is DAMON (DAta Mining ONtology) [5], which was built to simplify the development of distributed KDD applications on the Grid, offering domain experts a taxonomy for discovering tasks, methods and software suitable for a given goal. In [6], the ontology is exploited for selecting algorithms on the basis of the specific application domain they are used for. Finally, OntoDM [7] is a general-purpose top-level ontology aimed at describing the whole Data Mining domain.
Some research works have also been proposed for supporting process composition [8–12]. An early work of this kind is [9], where the author suggests a framework for guiding users in breaking up a complex KDD task into a sequence of manageable subtasks, which are then mapped to appropriate Data Mining techniques. Such an approach was exploited in [11], where the user is supported in iteratively refining a KDD skeleton process until executable techniques are available to solve low-level tasks. To this end, algorithms and data are modeled in an object-oriented schema. In [10] a system is described that focuses on setting up and reusing chains of preprocessing algorithms, which are represented in a relational meta-model.
Although these works help users choose the most suitable tools for each KDD phase, they define no automatic composition procedure. Recent works [8, 12] have dealt with this issue in order to provide effective support for process composition. Specifically, in [8] the authors define a simple ontology (actually not much more than a taxonomy) of KDD algorithms, which is exploited for designing KDD processes that address cost-sensitive classification problems. A forward composition, from the dataset characteristics towards the goal, is achieved through a systematic enumeration of valid processes, which are ranked on the basis of the accuracy achieved on the processed dataset and of process speed. [12] introduces a KDD ontology representing concrete implementations of algorithms and any piece of knowledge involved in a KDD process (dataset and model), which is exploited for guiding a forward state-space search planning algorithm in the design of a KDD workflow. This ontology describes algorithms with very few classes and a poor set of relationships, resulting in a flat knowledge base.
In both [12] and [8], the ontologies are not rich enough to be extensively used either for deducing hidden relations among algorithms or for supporting relaxed matches among algorithms or complex pruning strategies during the planning procedure. In order to overcome the limits of the cited works, in our proposal we define and exploit a formal KDD ontology expressly conceived for supporting composition. This ontology is exploited by a backward composition procedure, which composes algorithms not only by exact matches but also by evaluating the similarity between their interfaces, in order to extract unknown and non-trivial processes. Our approach, moreover, aims at a higher level of generality, by producing abstract and reusable KDD processes. In this work, we use the term composition instead of planning in order to emphasize the difference from traditional AI planning, in which execution stops when a single proper solution is found, and also because we ultimately refer to service composition.
2 The KDD ONTOlogy
KDDONTO is an ontology describing the domain of KDD algorithms, conceived
for supporting the discovery of KDD algorithms and their composition.
In order to build a KDD ontology, among the many methodologies for ontology building proposed in the literature we chose a formal approach based on the goal-oriented step-wise strategy described in [13]; moreover, the quality requirements and formal criteria defined in [14] are taken into account, with the aim of making meaning explicit and unambiguous.
The key concept of KDDONTO is algorithm, because it is the basic compo-
nent of each process. Other fundamental domain concepts, from which any other
concept can be derived, are the following:
- method: a methodology, a technique used by an algorithm to extract knowledge from input data;
- phase: a phase of a KDD process;
- task: the goal pursued by whoever executes a KDD process;
- model: a set of constructs and rules for representing knowledge;
- dataset: a set of data in a proper format;
- parameter: any information required as input or produced as output by an algorithm;
- precondition/postcondition: specific features that an input (or output) must have in order to be used by a method or an algorithm. Such conditions concern format (e.g. normalized dataset), type (e.g. numeric or literal values), or quality (e.g. missing values, balanced dataset) properties of an input/output datum;
- performance: an index and a value describing the way an algorithm works;
- optimization function: the function that an algorithm or a method optimizes with the aim of obtaining the best predictive/descriptive model.
Starting from these concepts, the top-level classes are identified, namely Algorithm, Method, Phase, Task, Data (which has Model, Dataset and Parameter as subclasses), DataFeature (corresponding to precondition/postcondition), PerformanceIndex and PerformanceClass (for describing performance indexes and performance values), and ScoreFunction (corresponding to optimization function).
The main relations among the classes are:
- specifies_phase, between Task and Phase;
- specifies_task, between Method and Task;
- uses, between Algorithm and Method;
- has_input/has_output, an n-ary relation with domain Algorithm, Method or Task and codomain Data and, optionally, DataFeature. For each instance of DataFeature involved in has_input, a value expressing the strength of the precondition is also provided: a value equal to 1.0 corresponds to a mandatory precondition, whereas lower values correspond to optional ones. The inverse properties input_for/output_for are also introduced;
- has_performance, an n-ary relation with domain Algorithm, Method or Task and codomain PerformanceIndex and PerformanceClass.
Subclasses are defined by means of existential restrictions on the main classes, which can be considered the fundamental bricks for building the ontology. First, some Phase instances are introduced, namely PREPROCESSING, MODELING and POSTPROCESSING. They represent the main phases of a KDD process and are used to start the subclassing as follows:
- Task is specialized into subclasses according to the argument of specifies_phase, e.g.:
  $$\mathit{ModelingTask} \sqsubseteq \mathit{Task} \sqcap \exists\,\mathit{specifies\_phase}.\{\mathit{MODELING}\}$$
- Method is detailed in subclasses according to the task that each method specifies by means of the specifies_task relation, e.g.:
  $$\mathit{ClassificationMethod} \sqsubseteq \mathit{Method} \sqcap \exists\,\mathit{specifies\_task}.\{\mathit{CLASSIFICATION}\}$$
- Algorithm is specialized into subclasses according to the uses and has_output relations, e.g.:
  $$\mathit{ClassificationAlgorithm} \sqsubseteq \mathit{Algorithm} \sqcap \exists\,\mathit{uses}.\mathit{ClassificationMethod} \sqcap \exists\,\mathit{has\_output}.\mathit{ClassificationModel}$$
- Model is further detailed in subclasses on the basis of the task the models are used for, e.g.:
  $$\mathit{ClassificationModel} \sqsubseteq \mathit{Model} \sqcap \exists\,\mathit{output\_for}.\{\mathit{CLASSIFICATION}\}$$
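The existential restrictions above can be read as membership conditions. As a minimal sketch, assuming a toy dictionary encoding of a few facts (the instance names C4.5, KMeans, DecisionTreeInduction, PrototypeClustering, DecisionTree and ClusterPrototypes are hypothetical, not taken from KDDONTO), the Python function below tests whether an algorithm instance satisfies the restrictions of an axiom like the ClassificationAlgorithm one. Since the axioms are stated with $\sqsubseteq$, these conditions are necessary rather than sufficient: the function only checks them and does not perform DL classification, which in practice is left to a reasoner.

```python
# Hypothetical toy facts; names are illustrative and not taken from KDDONTO.
USES = {"C4.5": {"DecisionTreeInduction"}, "KMeans": {"PrototypeClustering"}}
SPECIFIES_TASK = {"DecisionTreeInduction": "CLASSIFICATION",
                  "PrototypeClustering": "CLUSTERING"}
HAS_OUTPUT = {"C4.5": {"DecisionTree"}, "KMeans": {"ClusterPrototypes"}}
OUTPUT_FOR = {"DecisionTree": {"CLASSIFICATION"},
              "ClusterPrototypes": {"CLUSTERING"}}


def is_task_method(method: str, task: str) -> bool:
    """Method ⊓ ∃specifies_task.{task}"""
    return SPECIFIES_TASK.get(method) == task


def is_task_model(model: str, task: str) -> bool:
    """Model ⊓ ∃output_for.{task}"""
    return task in OUTPUT_FOR.get(model, set())


def meets_task_algorithm_restrictions(alg: str, task: str) -> bool:
    """Algorithm ⊓ ∃uses.<Task>Method ⊓ ∃has_output.<Task>Model"""
    uses_task_method = any(is_task_method(m, task) for m in USES.get(alg, set()))
    outputs_task_model = any(is_task_model(x, task)
                             for x in HAS_OUTPUT.get(alg, set()))
    return uses_task_method and outputs_task_model
```

With these facts, C4.5 satisfies the restrictions for CLASSIFICATION, while KMeans satisfies them only for CLUSTERING.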
A top-level view of the described classes and relations is shown in Figure 1.
Fig. 1. KDDONTO: main classes and relations
Many other relations are introduced in order to represent information useful for supporting KDD process composition. Among the most interesting:
- not_with links two instances of Method that cannot be used in the same process;
- not_before links two instances of Method such that the first cannot be used in a process before the second;
- in_module/out_module connect an instance of Algorithm to other algorithms that can be executed before or after it, respectively. These relations provide suggestions about process composition, explicitly representing the experience of KDD experts in process building;
- part_of (and its inverse¹ has_part), between an instance of Model and one of its components (a generic Data instance), describes a model in terms of the subcomponents it is made of. These relations are useful for identifying algorithms working on similar models, that is, models having common substructures, as discussed in the next section.
¹ We use inverse rather than reciprocal because both part_of and has_part are instance-level relations.
At present, KDDONTO is represented in OWL-DL, a decidable sublanguage of OWL [15] whose logical model is based on Description Logics; OWL is the de-facto standard language for building ontologies. An implementation of KDDONTO, obtained after some refinements whose details are not reported here, is available at the KDDVM project site².
3 Algorithm Matching
For the purposes of this work, we define a KDD process as a workflow of algorithms that achieves the goal requested by the user. The basic issue in composition is defining algorithm matching, that is, specifying under which conditions two or more algorithms³ can be executed in sequence. Each algorithm takes as input data with certain features, performs some operations and returns output data, which are then used as input by the next algorithm in the process. Therefore, two algorithms can be matched if the output of the first is compatible with the input of the second.
An exact match between a set of algorithms $\{A_1, \dots, A_n\}$ and an algorithm $B$ is defined as:
$$\mathit{match}_E(\{A_1, \dots, A_n\}, B) \leftrightarrow \forall\, in_B^i\ \exists A_k\ \exists\, out_{A_k}^j : out_{A_k}^j \equiv_o in_B^i$$
where $in_B^i$ is the $i$-th input of algorithm $B$ and $out_{A_k}^j$ is the $j$-th output of algorithm $A_k$. $\equiv_o$ denotes conceptual equivalence, defined as follows: given two parameters $a$ and $b$, $a \equiv_o b$ if $C_a \sqsubseteq C_b$, i.e. if $a$ and $b$ refer to concepts $C_a$ and $C_b$ such that $C_a$ is subsumed by $C_b$ (they are the same concept, or the former is a subconcept of the latter). In such cases the set of algorithms $\{A_1, \dots, A_n\}$ as a whole provides the data required by $B$, realizing the piece of workflow shown in Figure 2a.
Furthermore, an exact match is complete if all the inputs required by an algorithm are provided by a single algorithm, as represented in Figure 2b. More formally, an exact complete match between two algorithms $A$ and $B$ is defined as:
$$\mathit{match}_{Ec}(A, B) \leftrightarrow \forall\, in_B^i\ \exists\, out_A^j : out_A^j \equiv_o in_B^i$$
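Both exact matches can be illustrated with a small Python sketch. It assumes a hypothetical single-inheritance is-a hierarchy stored as a child-to-parent dictionary and algorithm interfaces given as plain concept names; $\equiv_o$ is implemented as the subsumption test $C_a \sqsubseteq C_b$. All concept and algorithm names are illustrative only.

```python
from typing import Dict, List

# Hypothetical is-a hierarchy: concept -> direct superconcept.
IS_A = {
    "NormalizedDataset": "Dataset",
    "Dataset": "Data",
    "VQModel": "Model",
    "Model": "Data",
}


def subsumed(ca, cb):
    """C_a ⊑ C_b: same concept, or ca is a (transitive) subconcept of cb."""
    while ca is not None:
        if ca == cb:
            return True
        ca = IS_A.get(ca)
    return False


def equiv_o(out: str, inp: str) -> bool:
    """out ≡o inp holds if the output concept is subsumed by the input concept."""
    return subsumed(out, inp)


def match_exact(algs: List[str], b_inputs: List[str],
                outputs: Dict[str, List[str]]) -> bool:
    """match_E({A_1..A_n}, B): each input of B is covered by an output of some A_k."""
    return all(any(equiv_o(o, i) for a in algs for o in outputs[a])
               for i in b_inputs)


def match_exact_complete(a: str, b_inputs: List[str],
                         outputs: Dict[str, List[str]]) -> bool:
    """match_Ec(A, B): a single algorithm A covers every input of B."""
    return match_exact([a], b_inputs, outputs)
```

For instance, with outputs = {"Normalizer": ["NormalizedDataset"], "VQ": ["VQModel"]}, an algorithm requiring a Dataset and a VQModel is exactly matched by the pair {Normalizer, VQ}, but not completely matched by VQ alone.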
By exploiting the properties of algorithms described in the previous section, it is possible to define a match based not only on exact criteria but also on similarity among data. Two algorithms can be considered compatible even if their interfaces are not perfectly equivalent: relaxing the constraints yields a wider set of possible matches. Hence, an approximate match between a set of algorithms $\{A_1, \dots, A_n\}$ and an algorithm $B$ is defined as:
$$\mathit{match}_A(\{A_1, \dots, A_n\}, B) \leftrightarrow \forall\, in_B^i\ \exists A_k\ \exists\, out_{A_k}^j : out_{A_k}^j \equiv_o in_B^i \lor \mathit{similar}(out_{A_k}^j, in_B^i)$$
where the similarity predicate $\mathit{similar}(x, y)$ is satisfied if $x$ and $y$ are similar concepts, i.e. if there is a path in the ontology graph that links them. An approximate match is useful not only when an exact match cannot be performed, but also for extracting unknown and non-trivial processes.

Fig. 2. (a) Exact and (b) exact complete matches (dashed lines represent the $\equiv_o$ relation)

² http://boole.diiga.univpm.it/kddontology.owl
³ Hereafter we use class and concept as synonyms, and refer to algorithm as the Algorithm class.
The similarity between concepts can be evaluated on the basis of various KDDONTO relations. The simplest similarity relation is at the hierarchic level: a specific datum is similar to its siblings, because through an is-a relation they share membership in the same class. Similarity also holds at the compositional level: a datum can be made of simpler data, according to the part_of/has_part relationships described in Section 2. As shown in Figure 3, a compound datum (e.g. $d$) can be used in place of one of its components (e.g. $in_B^1$), because the former is a superset of the latter: it contains all the needed information, plus other information that can be discarded. To give a practical example, a Labeled Vector Quantization (LVQ) model has_part a VQ model and a labeling function: if an algorithm requires a VQ model as input, an LVQ model can be provided in its place.
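The two kinds of similarity can be sketched in Python by extending the subsumption test with hierarchic (sibling) and compositional (part_of) similarity. This sketch rests on assumptions: the general definition only requires a path in the ontology graph, whereas here similarity is restricted to siblings and to a compound datum replacing one of its components, and the hierarchy is a hypothetical fragment built around the LVQ example.

```python
from typing import Dict, List

# Hypothetical ontology fragment (not the actual KDDONTO hierarchy).
IS_A = {  # concept -> direct superconcept
    "VQModel": "ClusteringModel",
    "SOMModel": "ClusteringModel",
    "LVQModel": "ClassificationModel",
    "ClusteringModel": "Model",
    "ClassificationModel": "Model",
}
HAS_PART = {"LVQModel": {"VQModel", "LabelingFunction"}}  # compound -> components


def subsumed(ca, cb):
    """C_a ⊑ C_b (the ≡o test used by exact matches)."""
    while ca is not None:
        if ca == cb:
            return True
        ca = IS_A.get(ca)
    return False


def hierarchic_similar(x: str, y: str) -> bool:
    """Siblings: distinct concepts sharing the same direct superconcept."""
    parent = IS_A.get(x)
    return x != y and parent is not None and parent == IS_A.get(y)


def compositional_similar(compound: str, component: str) -> bool:
    """A compound datum may replace one of its components (not vice versa)."""
    return component in HAS_PART.get(compound, set())


def similar(out: str, inp: str) -> bool:
    return hierarchic_similar(out, inp) or compositional_similar(out, inp)


def match_approx(algs: List[str], b_inputs: List[str],
                 outputs: Dict[str, List[str]]) -> bool:
    """match_A: every input of B is covered exactly or by a similar output."""
    return all(any(subsumed(o, i) or similar(o, i)
                   for a in algs for o in outputs[a])
               for i in b_inputs)
```

In this fragment an algorithm producing an LVQModel approximately matches one requiring a VQModel through part_of, even though LVQModel is not subsumed by VQModel.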
Given two similar concepts, we define their ontological distance as the number of is-a or part_of relations needed to link them in the ontological graph; the only exception is that the ontological distance from a concept to its subconcepts is considered null. In approximate matches, the higher the ontological distance between two concepts, the less similar they are. This allows us to assign a score to each match and to define a ranking among the generated processes, as described in Subsection 4.3.
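Ontological distance can be computed as a shortest path. The Python sketch below runs a breadth-first search over is-a and part_of edges, treated as undirected, and returns 0 when the second concept is a subconcept of the first, so a call with (required input, provided output) scores an exact match as 0. The graph is a hypothetical fragment, and returning None for unconnected concepts is our own convention for concepts that are not similar.

```python
from collections import deque
from typing import Dict, Optional, Set

# Hypothetical ontology fragment (not the actual KDDONTO hierarchy).
IS_A = {"VQModel": "ClusteringModel", "SOMModel": "ClusteringModel",
        "LVQModel": "ClassificationModel", "ClusteringModel": "Model",
        "ClassificationModel": "Model"}
HAS_PART = {"LVQModel": {"VQModel", "LabelingFunction"}}


def subsumed(ca, cb):
    while ca is not None:
        if ca == cb:
            return True
        ca = IS_A.get(ca)
    return False


def build_adjacency() -> Dict[str, Set[str]]:
    """Undirected graph whose edges are is-a and part_of links."""
    adj: Dict[str, Set[str]] = {}

    def link(a: str, b: str) -> None:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for child, parent in IS_A.items():
        link(child, parent)
    for whole, parts in HAS_PART.items():
        for part in parts:
            link(whole, part)
    return adj


ADJ = build_adjacency()


def ontological_distance(concept: str, other: str) -> Optional[int]:
    """Number of is-a/part_of links between the two concepts."""
    if subsumed(other, concept):
        return 0  # distance from a concept to its subconcepts is null
    seen = {concept}
    queue = deque([(concept, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in ADJ.get(node, set()):
            if nxt == other:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # no path: the concepts are not similar
```

Here an LVQModel supplied for a VQModel input is at distance 1 (one part_of link), while a sibling SOMModel is at distance 2 (up to ClusteringModel and back down).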
Fig. 3. Approximate match (dashed line represents the $\equiv_o$ relation)
In process composition, whichever match is used, preconditions and postconditions must be checked: the postconditions of the first algorithm must not contradict the preconditions of the second one on the same data.
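The pre/postcondition check can be sketched in Python as below, assuming that conditions are encoded as feature/value pairs and that each precondition carries the strength value introduced in Section 2 (1.0 for mandatory). The encoding and the feature names are hypothetical. A mandatory conflict rejects the match, while conflicts on optional preconditions are collected as relaxed, so that they can later lower the process rank.

```python
from typing import Dict, List, NamedTuple, Tuple


class Precondition(NamedTuple):
    feature: str     # hypothetical DataFeature name, e.g. "missing_values"
    value: object    # value the input datum must have for that feature
    strength: float  # 1.0 = mandatory, lower values = optional


def check_conditions(postconditions: Dict[str, object],
                     preconditions: List[Precondition]
                     ) -> Tuple[bool, List[Precondition]]:
    """Return (compatible, relaxed) for one datum passed from A to B.

    A precondition is contradicted when A's postconditions give the same
    feature a different value; features A says nothing about are not
    treated as contradictions.
    """
    relaxed = []
    for pre in preconditions:
        stated = pre.feature in postconditions
        if stated and postconditions[pre.feature] != pre.value:
            if pre.strength >= 1.0:
                return False, []   # mandatory precondition violated
            relaxed.append(pre)    # optional precondition relaxed
    return True, relaxed
```

For example, a normalized output that still contains missing values satisfies a mandatory "normalized" precondition but relaxes an optional "no missing values" one.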
4 Process Composition Procedure
Based on algorithm matching, this section describes a goal-driven procedure for composing KDD processes. Our approach aims at generating all potentially useful, valid and unknown processes satisfying the user requests; this allows the user to choose among processes with different characteristics and to try more than one solution. We use Jena⁴ as a framework for querying the ontology through the SPARQL language [16], a W3C Recommendation, whereas Pellet⁵ is used as the reasoner for inferring non-explicit facts. The proposed process composition procedure consists of the following phases: (I) dataset and goal definition, (II) process building, (III) process ranking.
4.1 Dataset and goal definition
Any KDD process is built to achieve a specific KDD goal by processing a given dataset. Hence, the first step of our procedure is the description of both the dataset and the goal.
In our framework, the former is described by a set of characteristics (e.g. representation model, size, feature type), which are instances of the DataFeature class. The latter is expressed as an instance of the Task class; it is left to the user to translate complex, domain-dependent business goals into one or more well-defined, domain-independent KDD tasks.
The description of both dataset and goal allows us to guide the composition
procedure, bounding the number and type of algorithms that can be used at the
beginning and at the end of each process.
Moreover, some process constraints are provided in this phase in order to balance procedure execution speed against composition accuracy. Some of these constraints can be defined by the user, among others: kind of match (only exact, or also approximate), maximum ontological distance for each match in a process, maximum number of algorithms in a process, and maximum computational complexity of a process.
Other constraints are predefined and built into the procedure to ensure that valid KDD processes are produced. Some examples are the following:
- two algorithms whose methods are linked through the not_with property cannot coexist in the same process;
- two algorithms whose methods are linked through the not_before property can coexist in the same process only if the first follows the second;
- no more than one FeatureExtraction algorithm can appear in the same process.

⁴ http://jena.sourceforge.net/
⁵ http://clarkparsia.com/pellet
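A possible encoding of the user-defined and built-in constraints is sketched below in Python. It is a simplified sketch under explicit assumptions: a process is given as a linear sequence of algorithms in execution order (rather than a general DAG), each match carries its ontological distance (0 for exact matches), the method facts and the not_with/not_before pairs are hypothetical and chosen only to exercise each rule, and the maximum computational complexity constraint is omitted.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical ontology facts, chosen only to exercise each rule.
METHOD_OF = {"Normalizer": "MinMaxScaling", "PCA": "PCAMethod",
             "ICA": "ICAMethod", "C4.5": "DecisionTreeInduction"}
FEATURE_EXTRACTION = {"PCA", "ICA"}
NOT_WITH = {frozenset({"MinMaxScaling", "ICAMethod"})}
NOT_BEFORE = {("DecisionTreeInduction", "MinMaxScaling")}  # first must not precede second


@dataclass
class UserConstraints:
    allow_approximate: bool = True  # kind of match
    max_distance: int = 2           # max ontological distance per match
    max_algorithms: int = 5         # max number of algorithms in a process


def process_constraints(sequence: List[str], distances: List[int],
                        uc: UserConstraints) -> bool:
    """sequence: algorithms in execution order; distances: one per match."""
    # User-defined constraints.
    if len(sequence) > uc.max_algorithms:
        return False
    if not uc.allow_approximate and any(d > 0 for d in distances):
        return False
    if any(d > uc.max_distance for d in distances):
        return False
    # Built-in constraints.
    methods = [METHOD_OF[a] for a in sequence]
    for i, first in enumerate(methods):
        for later in methods[i + 1:]:
            if frozenset({first, later}) in NOT_WITH:
                return False
            if (first, later) in NOT_BEFORE:
                return False
    return sum(a in FEATURE_EXTRACTION for a in sequence) <= 1
```

For instance, ["Normalizer", "PCA", "C4.5"] with exact matches passes, whereas a sequence with both PCA and ICA fails the FeatureExtraction rule.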
4.2 Process Building
Process building is an iterative phase, which starts from the given task and goes backwards, adding one or more algorithms to each process at each iteration. These algorithms are chosen on the basis of the algorithm matching functions defined in the previous section.
The procedure goes on until the first algorithm of each process is compatible with the given dataset, and stops when either of the following conditions holds: no process can be further expanded because no compatible algorithms exist, or one of the process constraints is violated.
The main steps of the process building phase are described in Table 1. A process $P_i = \langle V_i, E_i \rangle$ is represented as a directed acyclic graph, where $V_i$ is the set of nodes, namely algorithms, and $E_i$ is the set of directed edges linking algorithms together. First, the algorithms $A_i$ that return as output a model $x$ used for performing the given task $T$ are found; then, a process $P_i$ is created for each of them. Each such $P_i$ is added to the set $P$, which contains all the processes to be evaluated in the next step.
As long as there is a process $P_i$ in $P$, the algorithms compatible with the one(s) at the head of $P_i$ are extracted. If the process constraints are satisfied, these extracted algorithms are used to form a new valid process, which is added to the set $P$. Finally, the process $P_i$ is deleted from the set $P$ because its expansion has ended, and the procedure is iterated. At the beginning of each iteration, $P_i$ is checked against the characteristics of the dataset at hand: if they are compatible, $P_i$ is moved to the set $F$ of final processes. Note that the process constraints are used as pruning criteria, which ensure that useful and valid processes are produced while keeping the complexity of the whole procedure under control.
During the procedure, a single algorithm or a set of algorithms may be executed more than once inside a process. To avoid endless loops, we fix the maximum number of algorithms in a process.
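The loop of Table 1 can be sketched in Python as follows. This is a simplified, hypothetical rendering, not the authors' implementation: it uses exact matches only, encodes three made-up algorithm interfaces as concept names, reduces match_D to checking the dataset concept against the inputs of the head algorithms, and enforces only the maximum-length constraint. Nodes get unique ids so that the same algorithm may occur more than once in a process.

```python
from collections import deque
from itertools import count, product

# Hypothetical interfaces: algorithm -> (input concepts, output concepts).
ALGORITHMS = {
    "Normalizer": (("Dataset",), ("NormalizedDataset",)),
    "PCA": (("NormalizedDataset",), ("ReducedDataset",)),
    "C4.5": (("Dataset",), ("DecisionTree",)),
}
IS_A = {"NormalizedDataset": "Dataset", "ReducedDataset": "Dataset"}
OUTPUT_FOR = {"DecisionTree": "CLASSIFICATION"}  # model -> task it serves


def subsumed(ca, cb):
    while ca is not None:
        if ca == cb:
            return True
        ca = IS_A.get(ca)
    return False


def providers(alg):
    """Candidate sets Φ_j whose outputs exactly match every input of alg."""
    options = []
    for inp in ALGORITHMS[alg][0]:
        cands = [b for b, (_, outs) in ALGORITHMS.items()
                 if any(subsumed(o, inp) for o in outs)]
        if not cands:
            return []
        options.append(cands)
    return [tuple(sorted(set(combo))) for combo in product(*options)]


def accepts_dataset(alg, dataset):
    """Simplified match_D: the dataset concept satisfies every input of alg."""
    return all(subsumed(dataset, inp) for inp in ALGORITHMS[alg][0])


def compose(task, dataset, max_algorithms=5):
    uid = count()       # unique node ids
    pending = deque()   # the set P of Table 1
    final = []          # the set F of Table 1
    for alg, (_, outs) in ALGORITHMS.items():
        if any(OUTPUT_FOR.get(x) == task for x in outs):
            pending.append(({next(uid): alg}, frozenset()))
    while pending:
        nodes, edges = pending.popleft()
        heads = [n for n in nodes if all(dst != n for _, dst in edges)]
        if all(accepts_dataset(nodes[h], dataset) for h in heads):
            # Providers get larger ids than their consumers, so sorting ids
            # in descending order yields a valid execution order.
            final.append(tuple(nodes[n] for n in sorted(nodes, reverse=True)))
        for h in heads:
            for phi in providers(nodes[h]):
                if len(nodes) + len(phi) > max_algorithms:
                    continue  # length bound prevents endless expansion
                new_nodes, new_edges = dict(nodes), set(edges)
                for b in phi:
                    u = next(uid)
                    new_nodes[u] = b
                    new_edges.add((u, h))
                pending.append((new_nodes, frozenset(new_edges)))
    return final
```

Calling compose("CLASSIFICATION", "Dataset", max_algorithms=3) yields four processes, including ("Normalizer", "PCA", "C4.5"), while ("PCA", "C4.5") is discarded because PCA requires a normalized dataset. Since only the length bound is enforced, a process applying Normalizer twice is also returned; the remaining process constraints would be checked at the same point as the length bound.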
4.3 Process Ranking
In order to support the user in choosing among the generated processes, we define some criteria for ranking them:
- similarity measurement: an exact match is more accurate than an approximate one, so a process can be ranked on the basis of the sum of the ontological distances of its matches: the higher the sum, the lower the rank of the process;
- precondition relaxation: in algorithm matching, preconditions on some data can be relaxed if they have a condition_strenght value lower than 1, i.e. if they are non-mandatory. Relaxing preconditions lowers the process rank, because executing the algorithm may lead to lower-quality outcomes;
- use of link modules: the score of a process containing algorithms linked through the in_module and out_module properties is increased, because these relations state that a specific connection among algorithms has proved to be effective;
- performance evaluation: algorithm performances are used to assign a global score to a process. For example, in the case of a computational complexity index, the complexity of the whole process can be determined as the highest complexity among its algorithms.

Table 1. The composition algorithm.

Let P be the set of processes at each iteration and P_i the i-th process in P, described by the pair ⟨V_i, E_i⟩, where V_i is the set of algorithms in P_i and E_i is the set of directed edges (A_p, A_q) connecting algorithm A_p to algorithm A_q.
Let F be the final list of valid generated processes, T the task, D the set of dataset characteristics, and match_D(D, P_i) a predicate that is true if the preconditions of the algorithms at the head of P_i are compatible with D.

  P ← ∅, F ← ∅;
  Find the set Γ = {A_i : has_output(A_i, x) ∧ output_for(x, T)};
  foreach A_i ∈ Γ do
      initialize P_i = ⟨{A_i}, ∅⟩;
      if (process_constraints(A_i, P_i)) then P ← P ∪ {P_i};
  foreach P_i ∈ P do
      if (match_D(D, P_i)) then F ← F ∪ {P_i};
      Define the set Δ = {A_k ∈ V_i : ∄ (x, A_k) ∈ E_i};
      foreach A_k ∈ Δ do
          Find the set Φ = {Φ_1, ..., Φ_m}, where Φ_j is a set of algorithms {B_1, ..., B_{m_j}}
              such that match_E(Φ_j, A_k) ∨ match_A(Φ_j, A_k);
          foreach Φ_j ∈ Φ do
              if (process_constraints(Φ_j, P_i)) then
                  define P' = ⟨V_i ∪ Φ_j, E_i ∪ {(B_1, A_k), ..., (B_{m_j}, A_k)}⟩;
                  P ← P ∪ {P'};
      P ← P − {P_i}.
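The ranking criteria listed above do not prescribe how they are combined. As a hypothetical sketch, the Python code below orders processes lexicographically: first by the sum of ontological distances, then by the number of relaxed preconditions, then by the number of connections backed by in_module/out_module (more is better), and finally by process complexity, taken as the highest complexity among its algorithms. Both the lexicographic order and the integer complexity degrees (e.g. 2 for a quadratic algorithm) are assumptions.

```python
from typing import List, NamedTuple


class RankedProcess(NamedTuple):
    name: str
    match_distances: List[int]  # ontological distance of each match (0 = exact)
    relaxed_preconditions: int  # optional preconditions relaxed during matching
    link_module_edges: int      # connections backed by in_module/out_module
    complexities: List[int]     # hypothetical complexity degree per algorithm


def rank_key(p: RankedProcess):
    return (
        sum(p.match_distances),   # lower total distance ranks higher
        p.relaxed_preconditions,  # fewer relaxations rank higher
        -p.link_module_edges,     # more expert-validated links rank higher
        max(p.complexities),      # process complexity = costliest algorithm
    )


def rank(processes: List[RankedProcess]) -> List[RankedProcess]:
    return sorted(processes, key=rank_key)
```

A weighted sum of the same four terms would be an equally plausible design; a lexicographic key simply avoids having to choose weights.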
4.4 Applicative example
At present the KDDONTO implementation consists of 88 classes, 31 relations and more than 150 instances; it describes 15 algorithms of the preprocessing, modeling and postprocessing phases, in particular for the Feature Extraction, Classification, Clustering, Evaluation and Interpretation tasks.
On this basis, the effectiveness of the composition procedure has been evaluated through a prototype implementation. The following scenario was assumed: a user wants to perform a classification task on a normalized dataset with 2 balanced classes, missing values, and both literal and numeric values. She sets the following constraints: both exact and approximate matches are allowed, and the maximum number of algorithms in a process is 5.
The evaluation was performed by comparing our proposal with two other solutions. In the first solution we defined a procedure that uses a database to represent information about algorithms, in which no inference is possible. In the second solution, we exploited a combinatorial approach for composing algorithms, where a Postprocessing algorithm cannot precede Preprocessing or Modeling ones, and a Modeling algorithm cannot precede a Preprocessing algorithm. The resulting processes were then evaluated by a KDD expert, in order to identify the valid ones, i.e. processes in which the algorithm sequence is both semantically correct w.r.t. all input/output matches and consistent w.r.t. the user goal and requests.
Using the non-ontological approach, we are able to extract 37 processes, all of which the expert assessed as valid. The number of processes increases considerably when the combinatorial approach is exploited, but most of them are invalid and often meaningless, and need to be filtered manually. Finally, our procedure generates a set of 70 processes, which consists of the valid processes extracted through the non-ontological approach plus 33 additional valid, non-explicit processes composed by using inference and approximate matches. Hence, our procedure is able to produce a high number of alternatives without introducing spurious and semantically incorrect processes.
5 Conclusion
The main contribution of this work is the introduction of a goal-oriented procedure aimed at the automatic composition of algorithms forming valid KDD processes. The proposed procedure is based on the exploitation of KDDONTO, which formalizes knowledge about KDD algorithms. The use of such an ontology brings several advantages. Firstly, the resulting processes are valid and semantically correct. Secondly, unlike other works in the literature, we are able to generate not only explicit processes formed by directly linkable algorithms, but also implicit, interesting and non-trivial processes where algorithms share similar interfaces. Thirdly, KDDONTO is able to support complex pruning strategies during the composition procedure, also making use of inference mechanisms. Finally, processes can be ranked according to both ontological and non-ontological criteria.
Compared with planning algorithms [12], our approach lets users choose among more processes suitable w.r.t. their requirements. Moreover, the generated processes can themselves be considered useful, valid and unknown knowledge, valuable for both novice and expert users.
At present we are working on the development of a support service implementing the described process composition procedure, in order to actually integrate it into the KDDVM project. Since abstract KDD processes cannot be directly executed, each of them needs to be turned into a workflow of services, in which every algorithm is replaced with a service implementing it. As future extensions, we are also working on increasing the number of instances described in KDDONTO and on performing more comprehensive tests. Furthermore, we are studying several heuristics to provide an actual ranking of the generated processes.
References
1. Fayyad, U.M., Piatetsky-Shapiro, G. and Smyth, P.: From data mining to knowledge discovery: an overview. American Association for Artificial Intelligence, Menlo Park, CA, USA (1996) 1–34
2. KDDVM project site. http://boole.diiga.univpm.it
3. Diamantini, C. and Potena, D.: Semantic Annotation and Services For KDD Tools Sharing and Reuse. In: Proc. of the 8th IEEE International Conference on Data Mining Workshops. 1st Int. Workshop on Semantic Aspects in Data Mining, Pisa, Italy (Dec 19 2008) 761–770
4. CRISP-DM site. http://www.crisp-dm.org
5. Cannataro, M. and Comito, C.: A data mining ontology for grid programming. In: Proc. 1st Int. Workshop on Semantics in Peer-to-Peer and Grid Computing, in conjunction with WWW2003, Budapest, Hungary (2003) 113–134
6. Yu-hua, L., Zheng-ding, L., Xiao-lin, S., Kun-mei, W. and Rui-xuan, L.: Data mining ontology development for high user usability. Wuhan University Journal of Natural Sciences 11(1) (2006) 51–56
7. Panov, P., Džeroski, S. and Soldatova, L.: OntoDM: An Ontology of Data Mining. In: IEEE International Conference on Data Mining Workshops, Los Alamitos, CA, USA, IEEE Computer Society (2008) 752–760
8. Bernstein, A., Provost, F. and Hill, S.: Towards Intelligent Assistance for a Data Mining Process: An Ontology Based Approach for Cost-Sensitive Classification. IEEE Transactions on Knowledge and Data Engineering 17(4) (2005) 503–518
9. Engels, R.: Planning tasks for knowledge discovery in databases; performing task-oriented user-guidance. In: Proceedings of the 2nd International Conference on Knowledge Discovery in Databases (KDD'96), Portland, Oregon (August 1996)
10. Morik, K. and Scholz, M.: The MiningMart Approach to Knowledge Discovery in Databases. In: Zhong, N. and Liu, J., eds.: Intelligent Technologies for Information Analysis. Springer (2004) 47–65
11. Wirth, R., Shearer, C., Grimmer, U., Reinartz, T., Schlösser, J.J., Breitner, C., Engels, R. and Lindner, G.: Towards Process-Oriented Tool Support for Knowledge Discovery in Databases. In: PKDD '97: Proceedings of the First European Symposium on Principles of Data Mining and Knowledge Discovery, London, UK, Springer-Verlag (1997) 243–253
12. Žáková, M., Křemen, P., Železný, F. and Lavrač, N.: Using Ontological Reasoning and Planning for Data Mining Workflow Composition. In: SoKD: ECML/PKDD 2008 workshop on Third Generation Data Mining: Towards Service-oriented Knowledge Discovery, Antwerp, Belgium (2008)
13. Noy, N. and McGuinness, D.L.: Ontology Development 101: A Guide to Creating Your First Ontology. Stanford University (2002)
14. Gruber, T.: Toward principles for the design of ontologies used for knowledge sharing. Int. J. Hum.-Comput. Stud. 43(5-6) (1995) 907–928
15. Smith, M.K., Welty, C. and McGuinness, D.L.: OWL Web Ontology Language Guide, W3C Recommendation. http://www.w3.org/TR/owl-guide/ (2004)
16. Prud'hommeaux, E. and Seaborne, A.: SPARQL Query Language for RDF, W3C Recommendation. http://www.w3.org/TR/rdf-sparql-query/ (2008)