Query Based Access Control for Linked Data
arXiv:2007.00461v2 [cs.DB] 31 Dec 2020
Sabrina Kirrane1 , Alessandra Mileo2 , Axel Polleres1 , and Stefan
Decker3
1
Vienna University of Economics and Business, Austria
2
Dublin City University, Ireland
3
Fraunhofer FIT, RWTH Aachen University, Germany
Abstract
In recent years we have seen significant advances in the technology
used to both publish and consume Linked Data. However, in order to
support the next generation of ebusiness applications on top of interlinked
machine readable data suitable forms of access control need to be put
in place. Although a number of access control models and frameworks
have been put forward, very little research has been conducted into the
security implications associated with granting access to partial data or
the correctness of the proposed access control mechanisms. Therefore the
contributions of this paper are two fold: we propose a query rewriting
algorithm which can be used to partially restrict access to SPARQL 1.1
queries and updates; and we demonstrate how a set of criteria, which was
originally used to verify that an access control policy holds over different
database states, can be adapted to verify the correctness of access control
via query rewriting.
1
Introduction
The term Linked Data Web (LDW) is used to describe a World Wide Web
where data is directly linked with other relevant data using machine-accessible
formats [9, 6]. Although the LDW has seen considerable growth in recent years,
the focus continues to be on linking public data. This can partially be attributed
to the fact that no formal recommendation exists for allowing partial access to
Linked Data based on predefined access control policies.
Several researchers have proposed access control strategies for the Resource
Description Framework (RDF), which could be applied to Linked Data. Broadly
speaking, these frameworks enforce access control either at the data layer [4, 8],
the query layer [1, 5, 2, 10, 11] or a combination of both [7]. Enforcement of
access control at the data layer is concerned with removing unauthorised data
from a dataset. Whereas, enforcement at the query layer relies on using query
rewriting techniques to ensure that only authorised query results are returned.
Given we need to cater for many users, with many different authorisations
a filtering approach will not scale well, therefore in this paper we adopt a query
rewriting approach. To date researchers have proposed query rewriting strategies that involve adding contextual information pertaining to the requester to
1
the query using path expressions and bindings [1, 5, 11], or adding bindings
for authorised/unauthorised classes, properties or instances using filters [2, 10].
Our query rewriting strategy builds on the latter approach by demonstrating
how quad patterns can be used to grant or deny access to RDF data at multiple levels of granularity (triple, named graph, classes, properties, instances).
In addition, we demonstrate how our query rewriting strategy can be used not
only to enforce access control over basic graph patterns but also in conjunction
with SPARQL 1.1 queries that include subqueries and negation, graph update
operations and graph management operations. Although in this paper we discuss how access can be restricted to a single quad pattern as we use FILTER
NOT EXISTS to deny access to unauthorised data our approach can easily be
extended to work with graph patterns. In order to verify the effectiveness of
our query rewriting strategy, we define a set of criteria that can be used to
compare the results obtained when a query is executed over a dataset where all
unauthorised data has been removed, and the results obtained when a query
is rewritten and executed over a dataset, which contains both authorised and
unauthorised data. The contributions of the paper can be summarised as follows: we (i) formally define a set of criteria which can be used to verify the
correctness of access control via query rewriting; (ii) propose query rewriting
strategies that can be used to partially restrict access to SPARQL 1.1 queries
and updates; and (iii) demonstrate how the proposed correctness criteria can
be used to verify existing query rewriting strategies.
The remainder of the paper is structured as follows: Section 2 describes the
different data filtering and query rewriting strategies that are used to restrict
access to RDF data. Section 3 presents a set of correctness criteria, which allows
for access control via query rewriting to be compared to access control via data
filtering. Section 4 proposes query rewriting strategies for both SPARQL 1.1
queries and updates. In Section 5 we use the proposed correctness criteria to
evaluate both an existing query rewriting strategy and the alternative strategy
presented in this paper. Finally, we present conclusions and directions for future
work in Section 6.
2
Related Work
Both Dietzold and Auer [4] and Muhleisen et al. [8] adopt a filtering approach
to access control over RDF data. Dietzold and Auer [4] propose access control
policy specification at multiple levels of granularity (triples, classes and properties). Authorisations are used to associate filters (SPARQL CONSTRUCT queries)
with users and resources. When a requester submits a query, a virtual model
is created based on the matched authorisations. The query is subsequently executed against the virtual model, which only contains data that the requester
is authorised to access. Whereas, Muhleisen et al. [8] allow access control policies to be specified for triple patterns, resources or instances using SWRL rules.
When a requester submits a query, the system uses the access control rules to
generate a temporary named graph containing only authorised data. The requesters query is subsequently executed against the temporary named graph
and the results are returned to the user.
A number of researchers Abel et al. [1], Franzoni et al. [5], Oulmakhzoune
et al. [11] have proposed strategies that involve adding policy information to the
2
query (for example, the requester can only see information relating to papers
that they have authored). Abel et al. [1] specify authorisations in terms of
sets of contextual predicates, path expressions and boolean expressions that are
used to restrict access to instance data. For positive authorisations, the path
expressions and the bindings are appended to the query WHERE clause. For
negative authorisations, the path expressions and the bindings are added to
a MINUS clause, which in turn is appended to the query. Franzoni et al. [5]
also propose a query rewriting strategy which is used to grant/deny access to
ontology instances, based on contextual information pertaining to the user or the
environment. Like Abel et al. [1] bindings are generated for path expressions
specified in the access control policy and both the path expressions and the
bindings are added to the query WHERE clause. Oulmakhzoune et al. [11] extend
previous work by demonstrating how the SERVICE operator can be used to return
bindings based on an external privacy preferences policy.
Both Chen and Stuckenschmidt [2] and Oulmakhzoune et al. [10] use simple
filters to bind/unbind query solutions based on access control policies that are
specified for specific classes, properties or individuals. According to the rewriting strategy proposed by Oulmakhzoune et al. [10], when access is prohibited
to predicates or objects, the relevant triple patterns are made OPTIONAL. Although both Chen and Stuckenschmidt [2] and Oulmakhzoune et al. [10] propose
query rewriting strategies for SPARQL queries, the authors focus specifically on
SELECT queries composed of basic graph patterns and no special consideration
is given to complex SPARQL queries that include subqueries or negation.
Costabello et al. [3] also use contextual data to restrict access to RDF. However, the proposed query rewriting strategy restricts access to named graphs as
opposed to specific classes, properties and instances. Li and Cheung [7] propose a query rewriting strategy for views generated from ontological relations.
The proposed query rewriting strategy involves expanding the view concepts
to include implicit concepts, retrieving both explicit and implicit access control
policies and adding range and instance restrictions, associated with the matched
authorisation, to the view query.
When it comes to the evaluations of the aforementioned access control strategies only Oulmakhzoune et al. [11] examine the correctness of their access control strategy. The authors present a set of criteria which is used by Wang et al.
[12] to verify that a given access control policy holds over different relational
database states and discuss how their algorithm satisfies the criteria. However,
no formal evaluation is performed.
In contrast to other approaches, our query rewriting strategy can be used not
only to enforce access control over basic graph patterns but also in conjunction
with SPARQL 1.1 queries that include subqueries and negation, graph update
operations and graph management operations. In addition, we demonstrate how
a set of correctness criteria, that was originally used to verify that a given access
control policy holds over different database states, can be adapted to verify the
correctness of access control via query rewriting, by performing a comparison
with access control via data filtering. As the formal correctness criteria we
propose in this paper is not specific to RDF, it can be used in conjunction with
any non-monotonic query language.
3
3
Access Control Correctness Criteria
According to Wang et al. [12] a query processing algorithm should be secure,
sound and maximum. An algorithm is secure if it does not return information
which has not been authorised. An algorithm is sound if it does not return
invalid results. Furthermore, an algorithm is maximum if it returns as much information as possible without violating the secure and sound constraints. In
its current form, the formal correctness criteria presented in Wang et al. [12]
is unsuitable for the verification of access control over RDF data. Although it
is possible to use their formalism to verify that a secure query processing algorithm holds over different database states, it cannot be used to verify that the
algorithm is in fact secure, nor can it be used in conjunction with non monotonic queries such as MINUS and FILTER NOT EXISTS. Therefore, in this paper
we redefine each of the criteria to cater for a comparison between the results
obtained when: (i) a query is executed against a dataset which is generated
by removing the unauthorised data (henceforth referred to as filtering); and
(ii) when the same query is rewritten so that unauthorised data will not be returned and subsequently executed over the unmodified dataset (which we refer
to as rewriting). In the definitions that follow the term dataset is used to
denote a collection of RDF graphs, which can include a default graph and one
or more named graphs. In this paper we assume that the dataset does not contain blank nodes. Before formally defining the correctness criteria we provide
definitions for an authorised dataset and an unauthorised dataset:
Definition 1 (Authorised Dataset and Unauthorised Dataset). Let D denote a
dataset and P an access control policy. Given D and P , let DG denote the set
of quads in D where access is granted by P and DD the set of quads in D where
access is denied by P . Assuming that DG is disjoint from DD, then DG is the
RDF subgraph of D which is authorised by P , and DD is the RDF subgraph of
D which is unauthorised by P .
3.1
Correctness Criteria for Non-Monotonic Queries
When access control via query rewriting is compared to access control via data
filtering, a query rewriting algorithm is deemed secure if each of the resources
returned by the algorithm are present in the authorised dataset. The algorithm
is sound if all of the results returned by the algorithm are also present in the
result set which is generated when the original query is executed over the authorised dataset. The algorithm is maximum if the data returned by the algorithm is
equivalent to the data returned when the query is executed over the authorised
dataset. We formally redefine the correctness criteria proposed by Wang et al.
[12] as follows:
Definition 2 (Query correctness criteria). Given our definition of an authorised dataset DG, let S denote a query processing algorithm without access
control. It follows that when query Q is executed on DG the result set returned
by S(DG, Q) only contains authorised data.
Let A(D, P, Q) represent a query processing algorithm with access control, which
returns the result set authorised by P when query Q is executed on D. A query
4
processing algorithm is:
Secure if and only if ∀P ∀Q ∀D ∀DG ∀r [r ∈ A(D, P, Q) =⇒ r ∈ DG].
Sound if and only if ∀P ∀Q ∀D ∀DG [A(D, P, Q) ⊑ S(DG, Q)].
Maximum if and only if ∀P ∀Q ∀D ∀DG [A(D, P, Q) ≡ S(DG, Q)].
3.2
Correctness Criteria for Updates
Although Wang et al. [12] focused specifically on queries, in this paper we are
also interested in updates. As SPARQL 1.1 updates change the state of the
dataset, it is necessary to examine the resulting datasets as opposed to comparing the query results. Here we use the term rewritten dataset to refer
to the dataset which is generated when a rewritten query is executed against a
given dataset, and the term merged filtered dataset to refer to the merge
of the unauthorised dataset with the authorised dataset after the update query
has been executed. A delete query processing algorithm is deemed secure, if
every resource that is in the merged filtered dataset, is in the rewritten dataset.
The algorithm is sound, if the rewritten dataset does not contain less resources
than the merged filtered dataset. The algorithm is maximum, if the rewritten
dataset is equivalent to the merged filtered dataset. An insert query processing
algorithm is deemed secure, if all of the data contained in the rewritten dataset
is also in the merged filtered dataset. The algorithm is sound, if the rewritten
dataset does not contain more resources than the merged filtered dataset. The
algorithm is maximum, if all of the rewritten dataset is equivalent to the merged
filtered dataset.
Definition 3 (Update correctness criteria). Given our definitions for an authorised dataset DG and an unauthorised dataset DD, let U denote an update
query processing algorithm without access control. It follows that when query Q
is executed on DG the dataset generated by U (DG, Q) only contains authorised
data.
5
Let U D denote a delete processing algorithm with access control, where U D(D, P, Q)
produces a new dataset when query Q is executed on the subset of D which is
authorised by policy P . A delete query processing algorithm is:
Secure if and only if
∀P ∀Q ∀D ∀DG ∀r [r ∈ (U (DG, Q) ∪ DD) =⇒ r ∈ U D(D, P, Q)].
Sound if and only if
∀P ∀Q ∀D ∀DG ∀DD [U D(D, P, Q) ⊒ (U (DG, Q) ∪ DD)].
Maximum if and only if
∀P ∀Q ∀D ∀DG ∀DD [U D(D, P, Q) ≡ (U (DG, Q) ∪ DD)].
Let U I denote an insert processing algorithm with access control and U I(D, P, Q)
produces a new dataset when query Q is executed on the subset of D which is
authorised by policy P . An insert query processing algorithm is:
Secure if and only if
∀P ∀Q ∀D ∀DG ∀DD ∀r [r ∈ U I(D, P, Q) =⇒ r ∈ (U (DG, Q) ∪ DD)].
Sound if and only if
∀P ∀Q ∀D ∀DG ∀DD [U I(D, P, Q) ⊑ (U (DG, Q) ∪ DD)].
Maximum if and only if
∀P ∀Q ∀D ∀DG ∀DD [U I(D, P, Q) ≡ (U (DG, Q) ∪ DD)].
4
Query Rewriting for SPARQL Queries and
Updates
An RDF triple is used to asserts a binary relationship between two pieces of
information. A triple is represented as a tuple hS, P, Oi ∈ (U ∪ B) × (I) ×
(I ∪ B ∪ L), where S is called the subject, P the predicate, and O the object and
I, B and L, are used to represent Internationalized Resource Identifier (IRIs),
blank nodes and literals respectively. A triple pattern is a triple that can potentially contain variables in the subject, predicate and object positions. A
quad pattern is a an extension of a triple pattern where the fourth element is
used to denote the named graph, which can take the form of either an IRI
or a variable. In this paper, we focus specifically on rewriting the query to
filter out the data which has been restricted, as such we assume that an authorisation framework has already determined the unauthorised quad patterns
that are relevant for a given query. Although we demonstrate how access can
be restricted to a quad pattern, the approach can easily be extended to work
with graph patterns. We use the foaf prefix to denote the FOAF Vocabulary http://xmlns.com/foaf/0.1/ 1 and the entx prefix to refer to our sample enterprise vocabulary http://example.org/enterprisex#. The query results presented throughout this paper are based on the sample data presented
in Figure 1 and the authorisations presented in Figure 2. The quad pattern,
entx:MRyan entx:salary ?o ?g, denies access to May Ryan’s salary. Whereas,
entx:MRyan entx:worksFor ?o ?g, restricts access to information pertaining
to the people that May Ryan works for.
1 FOAF
vocabulary Specification, http://xmlns.com/foaf/spec/.
6
1
2
3
4
5
6
7
8
9
10
11
12
13
entx : EmployeeDeta il s {
entx : JBloggs rdf : type foaf : Person .
entx : JBloggs foaf : name " Joe Bloggs " .
entx : JBloggs entx : salary 60000 .
entx : MRyan rdf : type foaf : Person .
entx : MRyan foaf : name " May Ryan " .
entx : MRyan entx : salary 33000 .
entx : JSmyth rdf : type foaf : Person .
entx : JSmyth foaf : name " John Smyth " .
entx : JSmyth entx : salary 33000 . }
entx : OrgStructure {
entx : MRyan entx : worksFor entx : JBloggs .
entx : JSmyth entx : worksFor entx : MRyan . }
Figure 1: Enterprise Dataset
1
2
hentx:MRyan, entx:salary, ?o, ?gi
hentx:MRyan, entx:worksFor, ?o, ?gi
Figure 2: Unauthorised quad patterns
4.1
SPARQL Queries
The SPARQL query language supports four distinct query types (SELECT, ASK,
CONSTRUCT and DESCRIBE). In each case, SPARQL graph pattern matching is
used in order to determine the output of the query. Although the queries presented in this section are limited to SELECT queries, the proposed query rewriting
strategy is identical for each of the query types.
Basic Graph Patterns and Aggregate. Basic Graph Patterns (BGPs)
are sets of triple patterns and aggregates are functions that are applied to
groups of solutions, for example COUNT, SUM, MIN, MAX, AVG, GROUP CONCAT
and SAMPLE. When we execute Query 1 without any access restrictions the identifiers, names and the salaries of all persons are returned.
Query 1 (Basic graph pattern). Given the following query:
SELECT ? id ? name ? salary WHERE { GRAPH entx : EmployeeDeta il s
{
? id foaf : name ? name . ? id entx : salary ? salary } }
The output is as follows:
?id
?name
entx:JBloggs
entx:MRyan
entx:JSmyth
?salary
"Joe Bloggs"
"May Ryan"
"John Smyth"
60000
33000
33000
However, if the requester is denied access to the salary pertaining to entx:MRyan,
by authorisation 1 in Figure 2, we need to filter out the restricted data. This
can be achieved by using a FILTER NOT EXISTS to filter out unauthorised data
and adding it to the relevant graph pattern group. Query 2 limits the result set
to the identities, names and salaries of authorised persons.
7
Query 2 (Basic graph pattern restricted using binding). Given the following
query:
SELECT ? id ? name ? salary WHERE { GRAPH entx : EmployeeDeta il s
{
? id foaf : name ? name . ? id entx : salary ? salary
FILTER NOT EXISTS { GRAPH entx : EmployeeDeta il s {
? id foaf : name ? name . ? id entx : salary ? salary
FILTER ( ? id = entx : MRyan ) } } } }
The output is as follows:
?id
?name
entx:JBloggs
entx:JSmyth
?salary
"Joe Bloggs"
"John Smyth"
60000
33000
Subqueries and Filters.
In SPARQL 1.1, negation can be achieved by
filtering query results using FILTER EXISTS, FILTER NOT EXISTS or MINUS expressions. Although subqueries are not classified under negation, such queries
are used to limit the result set based on the results of an embedded query.
The following queries are constructed using an inner SELECT query, however the
query rewriting strategy is the same for queries that contain FILTER EXISTS,
FILTER NOT EXISTS and MINUS expressions. Query 3 returns the names of all
people and the names of the people that they work for.
Query 3 (Subqueries). Given the following query:
SELECT DISTINCT ? employee ? manager WHERE { GRAPH ? g {
? x foaf : name ? employee . ? y foaf : name ? manager {
SELECT ? x ? y WHERE { GRAPH ? g { ? x entx : worksFor ? y } } } }
}
The output is as follows:
?employee
?manager
"John Smyth"
"May Ryan"
"May Ryan"
"Joe Bloggs"
As authorisation 2 in Figure 2 matches a quad in the inner query we add
a FILTER NOT EXISTS to the relevant graph pattern group in the subquery.
Query 4 results in the filtering of May Ryan and her manager from the result
set.
Query 4 (Subqueries restricted with binding). Given the following query:
SELECT DISTINCT ? employee ? manager WHERE { GRAPH ? g {
? x foaf : name ? employee . ? y foaf : name ? manager
{ SELECT ? x ? y WHERE { GRAPH ? g { ? x entx : worksFor ? y
FILTER NOT EXISTS { GRAPH ? g { ? x entx : worksFor ? y
FILTER ( ? x = entx : MRyan ) } } } } } } }
The output is as follows:
?employee
?manager
"John Smyth"
"May Ryan"
8
4.2
Query Rewriting Algorithm
Based on the query rewriting strategies presented in the previous section, we
propose a query rewriting algorithm, which ensures that only authorised data is
returned by SPARQL 1.1 queries. The algorithm takes as input a query, and a
set of quad patterns that need to be filtered out of the query results, and checks
each SPARQL graph pattern recursively:
(i) If any of the graph patterns in the outer query match any of the unauthorised quad patterns, a FILTER NOT EXISTS element is generated. If
the named graph in the query is a variable and the named graph in the
authorisation is a constant, then a new graph pattern group is constructed
using the named graph from the authorisation and the graph pattern from
the query. Otherwise the unchanged graph pattern group from the query
is added to the filter. In addition, the constants in the subject, predicate
and object positions of the authorisation are bound to the variables in the
query using FILTER = expression. If multiple bindings exists the FILTER
is generated using the conjunction of the bindings.
(ii) If any of the graph patterns in an inner SELECT, EXISTS FILTER, NOT
EXISTS FILTER or a MINUS match any of the unauthorised quad patterns,
a FILTER NOT EXISTS element is generated as described above and added
to the relevant graph pattern group in the subquery or the filter expression.
4.3
SPARQL Updates
SPARQL 1.1 update caters for a number of update operations (CLEAR, LOAD,
INSERT DATA, DELETE DATA and DELETE/INSERT) and a number of graph management operations (CREATE, DROP, MOVE, COPY and ADD). In the case of update
queries there are two possible options: (i) the system should inform the requester
that the query cannot be completed and provide a list of the triples that cannot
be deleted, inserted etc.; or (ii) the system should behave as if the unauthorised
data is not present. If we adopt the first option, the requester will be aware that
data exists which they do not have access to, and could potentially infer unauthorised information by issuing one or more additional queries. As a result we
adopt the second option and behave as per access control via data filtering. For
SPARQL update queries, three distinct query rewriting strategies are required.
DELETE, INSERT, DELETE/INSERT. As the DELETE/INSERT operation uses graph patterns in order to determine the data to be inserted, deleted
or updated, the query rewriting strategy proposed in Section 4.2 is used to filter
out unauthorised data.
DELETE DATA and INSERT DATA. Given, that the DELETE DATA and
the INSERT DATA operations are used to delete and insert specific data, any
quads matching an unauthorised quad pattern need to be removed from the
query. The query presented in Query 5 is used to delete all data relating to Joe
Bloggs and May Ryan from the entx:EmployeeDetails graph.
Query 5 (DELETE authorised data ). Given the following query:
9
DELETE WHERE { GRAPH entx : EmployeeDet ai ls {
entx : JBloggs rdf : type foaf : Person .
entx : JBloggs foaf : name " Joe Bloggs " .
entx : JBloggs entx : salary 60000 .
entx : MRyan rdf : type foaf : Person .
entx : MRyan foaf : name " May Ryan " .
entx : MRyan entx : salary 33000 . } }
If authorisation 1 in Figure 2 is used to prohibit the requester from deleting May
Ryan’s salary, the query is rewritten as follows:
DELETE WHERE { GRAPH entx : EmployeeDet ai ls {
entx : JBloggs rdf : type foaf : Person .
entx : JBloggs foaf : name " Joe Bloggs " .
entx : JBloggs entx : salary 60000 .
entx : MRyan rdf : type foaf : Person .
entx : MRyan foaf : name " May Ryan " . } }
After the rewritten query is executed over the dataset presented in Figure 1, the
new state of the dataset is as follows:
1
2
3
4
5
6
7
8
entx : EmployeeDeta il s {
entx : JSmyth rdf : type foaf : Person .
entx : JSmyth foaf : name " John Smyth " .
entx : JSmyth entx : salary 33000 .
entx : MRyan entx : salary 33000 . }
entx : OrgStructure {
entx : MRyan entx : worksFor entx : JBloggs .
entx : JSmyth entx : worksFor entx : MRyan . }
Graph based update operations. As the CLEAR, DROP, ADD, LOAD, COPY
and the MOVE operations work at the graph level, when the requester does not
have access to the entire graph these queries need to be rewritten so that they
operate at the triple level. For example the CLEAR operation removes all of
the data from a target graph. When the requester does not have access to
the entire graph, the DELETE operation can be used to only remove authorised
data. Query 6 demonstrates how the CLEAR operation can be represented using
a DELETE operation.
Query 6 (CLEAR authorised data ). Given the following query:
CLEAR GRAPH entx : EmployeeDeta il s
If authorisation 1 in Figure 2 is used to prohibit the requester from clearing
salary information pertaining to entx:MRyan, the query is rewritten as follows:
DELETE { GRAPH entx : EmployeeDeta il s { ? s ? p ? o } }
WHERE { GRAPH entx : EmployeeDeta il s { ? s ? p ? o
FILTER NOT EXISTS { GRAPH entx : EmployeeDeta il s {? s ? p ? o
FILTER (? s = entx : MRyan && ? p = entx : salary ) } } } }
After the rewritten query is executed over the dataset presented in Figure 1, the
new state of the dataset is as follows:
10
1
2
3
4
5
entx : EmployeeDeta il s {
entx : MRyan entx : salary 33000 . }
entx : OrgStructure {
entx : MRyan entx : worksFor entx : JBloggs .
entx : JSmyth entx : worksFor entx : MRyan . }
4.4
Update Rewriting Algorithm
Based on the query rewriting strategies presented in the previous section, we
propose an update query rewriting algorithm, which ensures that only authorised data is inserted and deleted. The algorithm takes as input a query, and a
set of quads that need to be filtered out of the query results. In the case of:
(i) DELETE/INSERT. The query rewriting algorithm presented in Section 4.2
is used to filter out unauthorised quad patterns.
(ii) DELETE DATA and INSERT DATA. If any of the quads in the query match
an unauthorised quad pattern these quads are removed from the query.
(iii) CLEAR and DROP. Negative authorisations pertaining to the specified graph
are added as filters to a DELETE query, which is used to ensure that only
authorised data is removed from the graph.
(iv) ADD and LOAD. Negative authorisations relating to the source and the
destination graphs are added as filters to an INSERT query, which is used
to add/load only authorised data to the destination graph.
(v) COPY. A DELETE query which is constructed using negative authorisations
matching the destination graph, is used to remove all data from the destination graph. While negative authorisations matching the source and the
destination graphs are added as filters to an INSERT query, which is used
to copy only authorised data into the destination graph.
(vi) MOVE. The rewriting strategy for the MOVE operation is the same as the
COPY operation with an additional DELETE query, constructed using negative authorisations matching the source graph, which is subsequently used
to remove authorised data from the source graph.
5
Query Rewriting Evaluation
In Section 3 we formally defined a set of correctness criteria that can be used
to compare access control via query rewriting against access control via data
filtering. In this section, we demonstrate how the proposed correctness criteria
can be used to verify the effectiveness of alternative query rewriting proposals.
Given that SPARQL query results are dependent on pattern matching and filtering, in order to prove the correctness of a query rewriting strategy we only
have to show it works for all 24 possible combinations of quad patterns for each
of the SPARQL 1.1 query types presented in Section 4.1 and Section 4.3. Both
the size of the dataset and the data itself are irrelevant. The entire system (test
data generator, query rewriting algorithm and model checking algorithm) is implemented in Java, and the query evaluation is performed over an in memory
11
store using Jena. The Berlin SPARQL Benchmark (BSBM) dataset generator
is used to generate a dataset containing 1194 quads. The dataset, queries and
authorisations used in the experiments described in this paper can be found at
http://correctness.sabrinakirrane.com/.
5.1
Evaluation of SPARQL Query Rewriting
In order to evaluate the different query rewriting strategies we systematically
generate authorisations and queries from our auto generated dataset.The following algorithm is used to evaluate each of the auto generated queries:
(i) Firstly, the unauthorised quad pattern is used to remove unauthorised
data, and the query is executed against the resulting authorised dataset.
(ii) Secondly, the unauthorised quad pattern is used to rewrite the query based
on the query rewriting algorithm and this rewritten query is executed over
the dataset which contains both authorised and unauthorised data.
(iii) Finally, the results of both approaches are compared using the criteria
presented in Section 3.1.
Existing query rewriting algorithms. Both Chen and Stuckenschmidt [2]
and Oulmakhzoune et al. [10] use filters to bind/unbind query solutions based on
access control policies that are associated with classes, properties or individuals.
Although the authors propose query rewriting strategies for both positive and
negative authorisations our evaluation focuses on negative authorisations.
• Chen and Stuckenschmidt [2] use a FILTER != expressions to remove
bindings for unauthorised individuals. Whereas OPTIONAL {?s ?p ?o.
FILTER (?p=R)} is used to filter matches for a named relation R and
OPTIONAL {?s rdf:type ?o. FILTER(?o=C)} FILTER(!BOUND (?o)) is
used to filter out matches for a named class C.
• Oulmakhzoune et al. [10] also use FILTER != expressions to remove bindings for specific individuals. However, according to their rewriting strategy
if access is restricted to the entire triple pattern then the triple pattern is
removed. Whereas, if access is partially restricted then the triple pattern
is converted to an optional OPTIONAL pattern and the FILTER expression
is added to the optional pattern.
As both query rewriting strategies are based on triples as opposed to quads, in
order to evaluate we systematically generate authorisations from all 23 possible
combinations (of constants and variables) for each quad in the auto generated
dataset (the graph is always a variable). As the authors focus on BGPs, we auto
generated queries, composed of either one, two or three RDF quad patterns that
are randomly generated from the dataset, for each authorisation.
As the query rewriting strategies proposed by Chen and Stuckenschmidt
[2] relies on binding/unbinding constants in the authorisation to variables in
the query, when the restricted class, property or individual appears as a constant in the query, it is not possible to generate a binding. In such instances
their respective query rewriting strategies fail to satisfy the secure, sound and
maximum criteria. Whereas, according to the rewriting strategy proposed by
12
Oulmakhzoune et al. [10], when access is prohibited to predicates or objects,
the relevant triple patterns are made OPTIONAL, which changes the semantics of
the query. Although the query does not return any unauthorised data both the
sound and maximum criteria are violated.
Our query rewriting algorithm. For each RDF quad in the BSBM dataset,
we generate 24 authorisations, resulting in a total of 19104 authorisations. As
SAMPLE returns different data each time it is executed it is not possible to compare query rewriting to results filtering. Therefore, for each of the authorisations
eleven queries are generated as follows:
• Basic Graph Patterns. A query is generated, which contains either
one, two or three RDF quad patterns, that are randomly generated from
data selected from the entire dataset.
• Aggregates. For COUNT and GROUP CONCAT operations, queries are also
generated from up to three RDF quad patterns that randomly generated
from the entire dataset. Given SUM, MIN, MAX and AVG operations are dependent on numeric data, these queries are generated from a quad pattern
which matches all offers (?s rdf:type bsbm:offer ?g) together with a
pattern that matches the associated delivery days (?s bsbm:deliveryDays
?o ?g).
• Subqueries and Filters. A pattern with all variables is added to the
outer query, and the inner SELECT, MINUS, FILTER EXISTS and FILTER
NOT EXISTS are generated from either one, two or three RDF quad patterns, which are randomly selected from the entire dataset.
In the case of basic graph patterns, aggregates and negation, each of the queries
generated from the BSBM dataset were deemed secure, sound and maximum.
However, in the case of property paths, given a FILTER NOT EQUALS does not
restrict access to the path data, in some instances the algorithm failed to satisfy
all three criteria. If we remove the FILTER = binding, the query rewriting
strategy is both secure and sound, however it is not maximum. As such, further
analysis is required in order to determine a rewriting strategy for property paths.
5.2
Evaluation of SPARQL Update Rewriting
For SPARQL updates,the following algorithm is used to evaluate each of the
auto generated queries:
(i) Firstly, the unauthorised quad pattern is used to create a dataset which
only contains authorised data and a dataset which only contains unauthorised data. The query is subsequently executed against the authorised
dataset and both the unauthorised dataset and the updated authorised
dataset are merged to form a new merged filtered dataset. In the case
of INSERT DATA unauthorised triples need to be removed from the query
before it is executed over the authorised dataset. In such instances the
filtering approach is quite similar to the rewriting approach.
(ii) Secondly, the quad pattern is used to rewrite the query and this rewritten
query is executed over the original dataset.
13
(iii) Finally, the results of both approaches are compared using the criteria
presented in Section 3.2.
Our update rewriting algorithm. The query rewriting strategy for LOAD
is identical to that for the ADD operation and no rewriting strategy is required
for CREATE as access is either granted or denied. Therefore, for each of the
authorisations we generate ten different queries as follows:
• Delete Data and Insert Data. For DELETE DATA and INSERT DATA,
queries are generated from one, two or three RDF quads, that are randomly selected from the entire dataset.
• Delete, Insert and Delete/Insert. As per basic graph patterns, DELETE,
INSERT and DELETE/INSERT queries are generated from graph patterns
that are randomly selected from the entire dataset.
• Graph Update Operations. CLEAR, DROP, ADD, COPY and MOVE queries
are generated for each graph appearing in the dataset.
As each of the update queries, that were generated from the BSBM dataset,
were deemed secure, sound and maximum, we can conclude that the update
query processing algorithm is secure, sound and maximum.
6
Conclusion and Future Directions
Although the technology to link web data with other relevant data using machinereadable formats has been in existence for a number of years, in order to support
the next generation of eBusiness applications on top of Linked Data appropriate
security and privacy mechanisms need to be put in place. When it comes to
access control via query rewriting, existing proposals do not consider complex
queries that include negation and subqueries, and to the best of our knowledge
there is currently no query rewriting strategy for update queries. Also, to date
researchers have focused on performance evaluations, as opposed to verifying
the correctness of the proposed access control mechanisms.
In this paper, we proposed a query rewriting strategy for both the SPARQL
1.1 queries and updates. We redefined a set of criteria, which was originally used
to verify that an access control policy holds over different database states, to
allow for access control via query rewriting to be compared against access control
via results filtering. We subsequently used the adapted correctness criteria to
evaluate the proposed query rewriting algorithms. In future work we plan to
devise an appropriate query rewriting strategy for property paths. Based on our
initial performance evaluation, which is not presented in the paper, it is evident
that FILTER NOT EXISTS can be expensive. As such, we plan to investigate
query optimisation techniques for the different SPARQL 1.1 query types and
to publish a benchmark that can be used to assess alternative access control
strategies.
References
1. Fabian Abel, JuriLuca De Coi, Nicola Henze, ArneWolf Koesling, Daniel Krause,
and Daniel Olmedilla. Enabling advanced and context-dependent access control
14
in rdf stores. In The Semantic Web, volume 4825 of Lecture Notes in Computer
Science, pages 1–14. Springer Berlin Heidelberg, 2007.
2. W. Chen and H. Stuckenschmidt. A model-driven approach to enable access
control for ontologies. Business Services: Konzepte, Technologien, Anwendungen:
Wirtschaftsinformatik 2009, pages 663–672, 2010.
3. Luca Costabello, Serena Villata, Nicolas Delaforge, and Fabien Gandon. Linked
data access goes mobile: Context-aware authorization for graph stores. In LDOW
- 5th WWW Workshop on Linked Data on the Web, 2012.
4. Sebastian Dietzold and Soren Auer. Access control on rdf triple stores from a
semantic wiki perspective. In Proceedings of the ESWC’06 Workshop on Scripting
for the Semantic Web, 2006.
5. S. Franzoni, P. Mazzoleni, S. Valtolina, and E. Bertino. Towards a fine-grained
access control model and mechanisms for semantic databases. In Web Services,
2007. ICWS 2007. IEEE International Conference on, pages 993–1000, July 2007.
6. Tom Heath and Christian Bizer. Linked data: Evolving the web into a global data
space, volume 1. Morgan & Claypool Publishers, 2011.
7. Jian Li and WilliamK. Cheung. Query rewriting for access control on semantic
web. In Willem Jonker and Milan Petković, editors, Secure Data Management,
volume 5159 of Lecture Notes in Computer Science, pages 151–168. Springer Berlin
Heidelberg, 2008. ISBN 978-3-540-85258-2.
8. Hannes Muhleisen, Martin Kost, and Johann-Christoph Freytag. SWRL-based
Access Policies for Linked Data. In Proceedings of the Second Workshop on Trust
and Privacy on the Social and Semantic Web (SPOT2010), 2010.
9. Kieron O’Hara, Noshir S Contractor, Wendy Hall, James A Hendler, and Nigel
Shadbolt. Web science: understanding the emergence of macro-level features on
the world wide web. Foundations and Trends in Web Science, 4(2-3), 2013.
10. Said Oulmakhzoune, Nora Cuppens-Boulahia, Frédéric Cuppens, and Stephane
Morucci. fquery: Sparql query rewriting to enforce data confidentiality. In
Sara Foresti and Sushil Jajodia, editors, Data and Applications Security and Privacy XXIV, volume 6166 of Lecture Notes in Computer Science, pages 146–161.
Springer Berlin Heidelberg, 2010.
11. Said Oulmakhzoune, Nora Cuppens-Boulahia, Frederic Cuppens, and Stephane
Morucci. Privacy policy preferences enforced by sparql query rewriting. In Availability, Reliability and Security (ARES), 2012 Seventh International Conference
on, pages 335–342. IEEE, 2012.
12. Qihua Wang, Ting Yu, Ninghui Li, Jorge Lobo, Elisa Bertino, Keith Irwin, and JiWon Byun. On the correctness criteria of fine-grained access control in relational
databases. In Proceedings of the 33rd International Conference on Very Large
Data Bases, VLDB ’07, pages 555–566. VLDB Endowment, 2007.
15