Amorphous Program Slicing
Mark Harman
Brunel University, Uxbridge, Middlesex, UB8 3PH, UK.
David Binkley
Loyola College in Maryland, Baltimore, Maryland 21210, USA.
Sebastian Danicic
Goldsmiths College, University of London, New Cross, London SE14 6NW, UK.
Abstract
Traditional, syntax-preserving program slicing simplifies a program by deleting com-
ponents (e.g., statements and predicates) that do not affect a computation of in-
terest. Amorphous slicing removes the limitation to component deletion as the only
means of simplification, while retaining the semantic property that a slice preserves
the selected behaviour of interest from the original program. This leads to slices
which are often considerably smaller than their syntax-preserving counterparts.
A formal framework is introduced to define and compare amorphous and tradi-
tional program slicing. After this definition, an algorithm for computing amorphous
slices, based on the System Dependence Graph, is presented. An implementation of
this algorithm is used to demonstrate the utility of amorphous slicing with respect
to code-level analysis of array access safety. The resulting empirical study indicates
that programmers’ comprehension of array safety is improved by amorphous slicing.
Key words: Program Slicing, Program Dependence Graph, Program
Transformation, Program Comprehension
1 Introduction
This paper concerns a variation of traditional program slicing called amor-
phous slicing, which can produce smaller slices by abandoning the traditional
restriction to syntax-preservation. Traditional, syntax-preserving program slic-
ing simplifies a program by removing program components (i.e., statements
and predicates) that do no affect a computation of interest. The resulting
Preprint submitted to Elsevier Preprint 3 November 2002
1 scanf("%d",&n); 1 scanf("%d",&n);
2 sum=0; 2 sum=0;
3 product=1; 3
4 while (n>0) 4 while (n>0)
5 { 5 {
6 sum=sum+n; 6 sum=sum+n;
7 product=product*n; 7
8 n=n-1; 8 n=n-1;
9 } 9 }
′
p: Original Program p : Slice of p w.r.t. sum at line 9
Fig. 1. A program and one of its syntax-preserving slices
(smaller) program, called a slice, captures a projection of the semantics of the
original program. Figure 1 illustrates syntax-preserving program slicing with
a simple example of a program, p, and one of its slices, p′ .
Slicing is useful in many aspects of software engineering because it preserves
a projection of the semantics of the original program while reducing the size
of the program. Originally slicing was applied to debugging [?], [?], [?] and
(more recently) to algorithmic debugging [?,?]. Many other applications have
also been investigated, for example,
• Cohesion Measurement
[?], [?], [?]
• Re-engineering
[?], [?], [?]
• Component Re-use
[?], [?]
• Program Comprehension
[?], [?]
• Maintenance
[?]
• Testing
[?], [?], [?], [?], [?]
• Program Integration
[?], [?]
• Software Conformance Certification
[?]
Tip [?], Binkley and Gallagher [?], Harman and Hierons [?] and De Lucia [?]
survey paradigms, applications and algorithms for program slicing.
In addition to capturing a projection of a program’s semantics, the original
application of slicing required a syntactic property: a slice must preserve a
subset of the original program’s syntax. Syntax preservation allows statements
2
such as the following to be made about slices
“if the bug is an ‘error of commission’, and it manifests itself as a wrong
value in variable x at statement s, then the slice with respect to x at s
contains the bug.”
This syntactic requirement is also important when slicing is applied to cohesion
measurement, algorithmic debugging and program integration. However, for
applications such as re-engineering, program comprehension, and testing it is
primarily the semantic property of a slice that is of interest.
This paper describes a variation on the slicing theme, termed amorphous pro-
gram slicing, in which the syntactic requirement is dropped while the semantic
requirement is retained. For a slice taken with respect to x at statement s,
amorphous slicing preserves the effect of the original program on the values
of variable x at s, but places no restriction upon the syntax of the slice.
Example. Consider the following program fragment:
for(i=0,sum=a[0],biggest=sum;i<19;sum=a[++i])
if (a[i+1] > biggest)
biggest = a[i+1];
average = sum/20;
The fragment was written with the intention that the variable biggest would
be assigned the largest value in the 20-element array a, and that the variable
average would be assigned the average of the elements of a. However, the
fragment contains a bug which affects the variables sum and average, but not
the variable biggest.
To illustrate amorphous slicing, the variables biggest and average will be
analyzed using both traditional syntax-preserving slicing and amorphous slic-
ing. The syntax-preserving slice on the final value of biggest consists of all
but the last statement. This does not help to make clear the correctness of
the computation of biggest. However, the amorphous slice, constructed using
transformations such as a loop unrolling (which has changed the loop bounds)
is the following:
for(i=1, biggest=a[0]; i<20; ++i)
if (a[i] > biggest)
biggest = a[i];
¿From the amorphous slice it is a little clearer that the computation on
biggest is correct.
3
The syntax-preserving slice on the final value of average contains all but the
if statement from the original program. Once again, this simplification is
a relatively weak aid to comprehension. However, the amorphous slice goes
much further:
average=a[19]/20;
This amorphous slice reveals, rather more starkly, that there is a fault in the
program affecting its computation with respect to average (though its effect
on biggest is not influenced by the fault).
In this paper, slicing is used to refer to static backward slicing [?], [?]. Thus, the
two kinds of slicing studied herein are (traditional) syntax-preserving static
backward slicing (hereafter syntax-preserving slicing) and amorphous static
backward slicing (hereafter amorphous slicing). Slices are termed ‘backward’
because dependences are followed backward to their sources and they are
termed ‘static’ because input values are not known. In contrast, in a forward
slice, dependences are followed forward to their targets; thus capturing the
components affected by some other particular component. A dynamic slice
considers one particular input to the program.
The rest of this paper is organized as follows.
• Section 2: Theoretical Foundations.
This section describes a theory of program projection and uses it to define
and compare syntax-preserving slicing and amorphous slicing.
• Section 3: Safety Slicing.
This section describes the application of an implementation of amorphous
slicing to array access safety analysis.
• Section 4: Empirical Study.
This section describes the results of an empirical study aimed at assessing
the suitability of amorphous slicing as an aid to program comprehension.
• Section 5: Related Work.
This section describes previous work which involved some level of ‘amor-
phousness’ in slice construction or a combination of slicing and transforma-
tion.
• Section 7: Summary.
This section provides a briefly summary.
2 Theoretical Foundations
In order to formalize amorphous slicing and to compare it with syntax-preserving
slicing, a framework of program projection is introduced. Syntax-preserving
4
slicing and amorphous slicing are then cast as instances within this framework.
2.1 A Framework for Program Projection
The projection framework is in essence a generalization of program slicing. It
is defined with respect to two relations on programs: a syntactic ordering and
a semantic equivalence. The syntactic ordering is simply an ordering relation
on programs. It is used to capture the syntactic property that slicing seeks to
optimize. Programs which are lower according to the ordering are considered to
be ‘better’. The semantic relation is an equivalence relation which captures the
semantic property which remains invariant during slicing and transformation.
Definition 1 (Syntactic Ordering)
A syntactic ordering, denoted by <
∼
, is any computable transitive reflexive
relation on programs.
Definition 2 (Semantic Equivalence)
A semantic equivalence, denoted by ≈, is an equivalence relation on a projec-
tion of program semantics.
Definition 3 ((<∼
, ≈) Projection)
Given syntactic ordering <
∼
and semantic equivalence ≈,
Program p is a (<
∼
, ≈) projection of program q ⇔ p<
∼
q ∧ p≈q
That is, in a projection, the syntax can only improve while the semantics of
interest must remain unchanged.
2.2 Syntax Preserving Slicing in the Framework
The following definition formalizes the oft-quoted remark:
“a slice is a subset of the program from which it is constructed.”
It defines the syntactic ordering for syntax-preserving slicing. Note that for
ease of presentation, it is assumed that each program component occupies a
unique line. Thus, a line number can be used to uniquely identify a particular
program component.
Definition 4 (Syntax-Preserving Syntactic Ordering)
Let F be a function which takes a program and returns a partial function
5
from line–numbers to statements, such that the function F (p) maps l to c iff
program p contains the statement c at line number l. The syntax-preserving
syntactic ordering, denoted by <
∼ SP
, is defined as follows:
p<
∼ SP
q ⇔ F (p) ⊆ F (q).
Example. In Figure 1, p′ <
∼ SP
p.
The second part of the projection framework is the semantic equivalence. The
semantic property that syntax-preserving slicing respects is based upon the
concept of a state trajectory: The following definitions of state trajectory, state
restriction, P roj, and P roj ′ are extracted from Weiser’s definition of a slice [?].
Definition 5 (State Trajectory)
A state trajectory is a finite sequence of line–number × state pairs:
(l1 , σ1 )(l2 , σ2 ) . . . (lk , σk )
where entry i is (li , σi ) if after i statement executions the state is σi , and the
next component to be executed is at line number li .
Definition 6 (State Restriction)
Given a state, σ and a set of variables V , σ | V restricts σ so that it is defined
only for variables in V :
σ x if x ∈ V
(σ | V )x =
⊥
otherwise
Definition 7 (P roj ′ )
For slicing criterion (V, i), line number l, and state σ,
′
λ
if l 6= i
P roj(V,i) ((l, σ)) =
(l, σ | V ) if l = i.
where λ denotes the empty string.
Definition 8 (P roj)
For slicing criterion (V, i) and state trajectory T = (t1 , t2 , . . . , tk ),
′ ′
P roj(V,i) (T ) = P roj(V,i) (t1 ) . . . P roj(V,i) (tk ).
6
For the slicing criterion (V, i), which specifies a slice taken with respect to
a set of variables V at line number i, P roj extracts from a state trajectory
the values of the variables in V at statement i. Using P roj, the semantic
equivalence for static slicing can now be defined:
Definition 9 (Static Semantic Equivalence)
Given two programs p and q, and slicing criterion (V, i), p is syntax-preserving
(V,i)
semantically equivalent to q, written p ≈SP q, iff for all states σ, when the
execution of p in σ gives rise to a trajectory Tp and the execution of q in σ
gives rise to a trajectory Tq , then P roj(V,i) (Tp ) = P roj(V,i) (Tq ).
({sum},9)
Example. In Figure 1, p ≈SP p′ and p′ ≈SP
({sum},9)
p.
Weiser deliberately left unspecified the behaviour of slices in states where the
original program fails to terminate [?]. Subsequent work by Cartwright and
Felleisen [?] clarified the behavior of slicing and related dependence-based
analyses, showing that they respect a lazy (or demand driven) semantics.
In this paper, the semantic equivalence preserved during slicing is re-cast as a
parameter to the definition of a slice. This allows for different interpretations
and definitions of the way in which slicing is to behave. The framework is
sufficiently general to allow the definition of dynamic [?] and conditioned slices
[?].
Because the syntax-preserving semantic equivalence relation is parameterized
by V and i, Definition 9 describes a class of relations based upon the choice
of V and i. This reflects the fact that each slicing criterion yields a slice which
respects a different projection of the semantics of the program from which it
is constructed.
Instantiating Definitions 4 and 9 into Definition 3, yields the following:
Definition 10 (Syntax Preserving Slicing)
Program q is a syntax-preserving slice of program p with respect to the slicing
(V,i)
criterion (V, i) iff q is a (<
∼ SP
, ≈SP ) projection of p.
Example. Program p′ shown in Figure 1 is a (<
∼ SP
, ≈({sum},9)
SP ) projection of p.
2.3 Amorphous Slicing in the Framework
To define amorphous slicing, only the syntactic ordering needs to be altered;
the semantic equivalence remains the same. For amorphous slicing, a natural
choice for a syntactic relation is that p is less than q if it contains fewer
statements. Such a definition of the syntactic ordering naturally depends upon
7
how statements are counted. The approach adopted in this paper, in keeping
with the spirit of slicing, is to count the nodes in the program’s control-flow
graph(CFG) [?], [?].
Definition 11 (Amorphous Syntactic Ordering)
Let N (p) be the number of nodes in the CFG of program p. The amorphous
syntactic relation, denoted by <
∼ AS
, is defined as follows:
p<
∼ AS
q ⇔ N (p) ≤ N (q)
Example. In Figure 2, p1 < p , p1 <
∼ AS 2
p , and p2 <
∼ AS 3
p.
∼ AS 3
Definition 12 (Amorphous Semantic Equivalence)
≈AS = ≈SP
Definition 13 (Amorphous Slicing)
Program q is an amorphous slice of program p with respect to the slicing
(V,i)
criterion (V, i) iff q is a (<
∼ AS
, ≈AS ) projection of p.
Example. Consider the program fragments in Figure 2. Figure 3 shows the
trajectories (for an initial state {}), t1 , t2 , t3 of the three corresponding pro-
gram fragments, p1 , p2 , p3 in Figure 2. Projecting each onto the final state and
projecting each final state onto the mapping for the variables of interest ({x})
gives the same result ({x 7→ 2}) in each case, indicating that each is ≈({x},end)
AS
equivalent to the others. However, none is a syntax-preserving slice of the oth-
ers, as none is a syntactic subset of the others. In contrast, since p1 contains
fewer statements than p2 (which contains fewer statements than p3 ), p1 is an
amorphous slice of p2 and p3 , and p2 is an amorphous slice of p3 .
Observe that if two programs are equivalent, then at least one must be an
amorphous slice of the other. It is even possible that two equivalent programs
are both amorphous slices of each other, Neither of these two observations is
true for syntax-preserving slicing.
In order to determine whether two programs are equivalent according to ≈(V,i) AS ,
one final issue is worth discussing: It is important to identify the slice point in
both the slice and the original program. For syntax-preserving slicing, this is
not normally an issue because it is usually obvious which point in the sliced
program corresponds to the slice point in the original; lines are numbered and
the numbering is preserved as a part of the syntax during slicing. However,
for amorphous slicing, the slicing algorithm will need to return, not just the
amorphous slice, but also a pointer to where the slice point is located in the
slice, because this will not generally be obvious from the context.
8
1 x=2; y=10; z=-1;
2 x=y-8; x=z+3;
3 y=10;
end
p1 p2 p3
Fig. 2. Examples of amorphous slices for variable x and slice point end
p1 < (1, {}), (end, {x 7→ 2}) >
p2 < (1, {}), (2, {y 7→ 10}), (end, {y 7→ 10, x 7→ 2}) >
p3 < (1, {}), (2, {z 7→ −1}), (3, {z 7→ −1, x 7→ 2}), (end, {z 7→ −1, x 7→ 2, y 7→ 10}) >
Fig. 3. Trajectories for the Examples in Figure 2
Amorphous slicing subsumes syntax-preserving slicing because < ∼ SP
⊆ <∼ AS
and ≈AS = ≈SP . To see that the converse is not true, observe that p1 in
Figure 2 is an amorphous slice of p2 , but not a syntax-preserving slice. As a
consequence of this, any implementation of syntax-preserving slicing is also an
implementation of amorphous slicing, albeit a rather sub-optimal implemen-
tation. Therefore, there will always be an amorphous slice which is no larger
than the syntax-preserving slice constructed for the same slicing criterion. Of
course, there will often be an amorphous slice which contains less of the pro-
gram than the corresponding syntax-preserving slice. This makes amorphous
slicing attractive when preservation of syntax is unimportant.
3 Safety Slicing: An Application of Amorphous Slicing
When trying to understand a large program, it is beneficial to have a tool that
can answer questions about the program. These questions represent specula-
tive hypotheses about the supposed behavior of the program. It would be
attractive to have a fully automated formal reasoning tool, which would be
sufficiently powerful to decide all propositions. Unfortunately, because all but
the most trivial propositions will concern properties of the system which are
undecidable, such a quixotic tool is impossible.
Rather that attempt to prove such propositions, the technique used in this sec-
tion advocates the production of a simplified program from the question. Thus,
this technique approximates answers to undecidable propositions using pro-
gram simplification [?]. Recasting the problem as one of simplification circum-
vents the undecidability problem because the simplification process consists of
approximating the ideal simplest program. This provides partial automation as
9
a tool can make progress in deciding the truth or otherwise of the proposition,
in cases where it is impossible to fully decide it.
A large class of hypotheses concern aspects of a program which are not ex-
plicitly represented in the program’s state (recall that a state maps identifiers
to values). By its very definition, such ‘implicit state’ information is beyond
the reach of program slicing, as a slice is computed with respect to a set of
variables.
This problem can be circumvented by making part of the implicit state ex-
plicit [?]. In order to do this, the program is augmented with assignments to a
pseudo variable. These assignments capture the effect of the original program
on the (implicit) semantic property of interest, thereby making it explicit and
thus, sliceable.
This section describes an example application of amorphous slicing to a prob-
lem that involves implicit state. The particular problem studied is that of array
bounds safety: For each array reference a[i], the analysis seeks to determine if i
can be guaranteed to be within the bounds of the array. After introducing the
steps used to make array bound state information explicit, an algorithm for
computing the safety slices (the amorphous slice of the augmented program)
is presented. This is followed by an extended example and a discussion of the
algorithm’s complexity. The next section presents an empirical study of safety
slicing as an aid to program comprehension.
In the case of safety slicing, to make the necessary implicit state explicit, a
program is augmented to include assignments to a new variable, safe, such
that safe has the value true in all executions which do not cause an array
access error and false in all those that do [?]. In more detail, augmentation
adds, at the beginning of the program, the assignment “safe = true” (at
the start of execution there have been no array bounds errors). At the end of
the program the statement “print safe” is added. Finally, before each array
reference a[i] the assignment statement:
safe = safe && i>=0 && i<upper_bound(a);
is added (this assumes that arrays start at index 0, as they do in C, the target
language of the tool).
After augmentation, array safety becomes part of the explicit state. An amor-
phous slice taken with respect to the statement “print safe” captures all
the computations relevant to the array safety question.
10
3.1 Computing Safety Slices
A prototype implementation based on the System Dependence Graph (SDG),
was built to study safety slices. An SDG represents a program as a collec-
tion of vertices and edges [?]. The vertices represent the components of the
program (e.g., assignment statements and predicates) and the edges represent
dependences between them. For example, the SDG for the first while loop
and the initialization code which precedes it in Column 1 of Figure 8 is shown
in Figure 9.
There are two kinds of edges in an SDG: control dependence edges and data
dependence edges. A control dependence edge is labeled either true or false.
It connects a predicate vertex v to a vertex u if during execution, whenever the
predicate represented by v is evaluated and its value matches the edge’s label,
then the program component represented by u will eventually be executed
(provided the program terminates normally). A control dependence edge, with
label L is denoted by →Lc (the label is omitted when clear from context).
The only data-dependence edges used in the implementation are flow depen-
dence edges. A flow dependence edge, denoted by →f , runs from a vertex that
represents an assignment of a variable to a vertex that represents a use of the
variable reached by that assignment. A flow dependence edge is loop-carried,
denoted by →lc , if the value is passed from one loop iteration to another and
loop-independent, denoted by →li , otherwise.
The implementation follows the algorithm shown in Figure 4. It first builds
the program’s SDG before calling Function ConstructSlice, which repeatedly
takes a syntax-preserving slice and then applies a collection of transformation
rules. This function returns when additional slicing and transformations have
no effect on the graph. A simple inductive argument shows that the recursion
always terminates and that the function returns a program no larger than the
syntax-preserving slice of its input.
Function ‘AmorphousSlice’ then produces a program from the resulting graph.
To do this requires that the SDG is feasible (a feasible SDG is the SDG of some
program). Based on the kind of transformation applied to the SDG, this step
is a simplification of existing algorithms [?], [?], [?]. Finally, the feasible graph
is reconstituted into a program by producing a linear order for the vertices in
the graph that respects the ordering constraints of the dependence edges.
The function ‘Transform’ captures the part of the algorithm which makes the
slicing process amorphous. Many transformations are essentially graph based
compiler optimizations [?], [?]. For example, SDG based constant propagation
can use a data-flow model where nodes “fire” updating the graph by, for ex-
11
function AmorphousSlice(P : Program, S: a set of Statements)
returns a Program
declare
G: an SDG
SlicedP : a program
begin
G := BuildSDG(P )
G := ConstructSlice(G, V erticesF or(S))
G := M akeF easible(G)
SlicedP := ReconstituteP rogram(G)
return SlicedP
end
function ConstructSlice(G: SDG, S: Vertex Set) returns a Program
declare
Gsliced , Gtransf ormed : SDGs
begin
Gsliced := T raditionalSlice(G, S)
Gtransf ormed := T ransf orm(Gsliced )
if (G = Gtransf ormed )
return Gsliced
else
return smaller( ConstructSlice(Gtransf ormed , S), Gsliced )
end
Fig. 4. Construction algorithm for amorphous slicing
→+x denotes a path containing one or more edges of kind x.
u[a/b] denotes the replacement of each b in u with a.
dup(n) denotes a duplicate of node n that has a copy of the
and all incoming and outgoing edges of n. n′ is used to
denote the duplicate of n.
copy(n) denotes the duplicate of n if one has been created
and n otherwise.
n:L denotes node n labeled with text L.
sourceSubset(u, v) in(u) − {v →f u} ⊆ in(v)
allowable(a, v, u) | {w | w →f u ∧ a ∈ DEF(w)} |= 1
∧ (a ∈
/ DEF(u) ∨ sourceSubset(u, v))
Fig. 5. Auxiliary Definitions
ample, placing values “on” flow-dependence edges [?]. Other examples include
procedure in-lining and procedure specialization.
Figures 5 and 6 present example SDG transformation rules used for safety
slicing. The figures highlight transformations that are not standard compiler
optimizations, but that are useful for amorphous slicing. Figure 5 contains
auxiliary definitions, used by the transformation rules of Figure 6. Most of
these auxiliary functions are straightforward. The exception is the function
allowable, which is explained in the description of the Flow-Edge Removal
Rule (1). Note that in Figure 6 the removal of node n includes the removal of
all edges that involve n (i.e., x → n and n → x).
12
Rule 1 (Flow Edge Removal)
v →f u ∧ v : “a = e” ∧ allowable(a, v, u)
RHS(label(u))[e/a]
SDG ← SDG − {v →f u} ∪ {x →f u | x →f v}
Rule 2 (Loop Anti-Fusion)
n : “while...”, B, the loop body, can be partitioned into B1, B2, and {inc}
init ∈
/ B ∧ init →f n ∧ inc ∈ B ∧ inc →lc n
¬∃x → y | (x ∈ B1 ∧ y ∈ B2) ∨ (x ∈ B2 ∧ y ∈ B1)
¬∃x →f inc | x ∈ (B1 ∪ B2)
n′ = dup(n), inc′ = dup(inc), init′ = dup(init)
let A = {n, init, inc} and A′ = {n′ , init′ , inc′ }
SDG ← SDG ∪ A′ ∪ {copy(a) → copy(b) | a → b}
−{x → y | x ∈ A ∧ y ∈ B2 ∪ A′ } − {x → y | x ∈ A′ ∧ y ∈ B1 ∪ A}
Rule 3 (Dependent Assignment Removal)
v →f u ∧ v : “saf e = saf e && P 1” ∧ u : “saf e = saf e && P 2”
+
P 1 ⇒ P 2 ∧ sourceSubset(u, v) ∧ ∃n | n →L L
c v ∧ n →c ‘if ’ →c u
SDG ← SDG − {u} ∪ {v →f x | u →f x}
Rule 4 (Induction Variable Elimination)
n : “while · · · i · · · ” ∧ init : “i = · · · ” ∧ inc : “i = · · · ” ∧ u : “saf e = · · · ”
init →f n ∧ inc →lc n ∧ B, the body of n, is {inc, u}
i is an induction variable, where if is the symbolic expression of its final value
uf = copy(u)[if / i], RHS(label(inc))[if / i]
SDG ← SDG − {n} − {u →f x} − {x →f uf | saf e ∈ DEF (x) ∨ i ∈ DEF (x)}
∪ {u →f uf } ∪ {x →f inc | x →f n} ∪ {x →f uf | x →f n ∨ x →f inc}
∪ {x →c u, x →c uf, x →c inc | x →c n}
Fig. 6. Example SDG Transformation Rules
Function Transform contains two kinds of transformations: general purpose
transformations and domain specific transformations. The first two rules are
examples of general purpose rules. The second two are examples of domain
specific rules that are explicitly designed to work with safety slicing. Sometimes
this distinction is blurred as even general rules can be specialized. For example,
the Loop Anti-Fusion Rule (Rule 2) can be specialized to favor separating out
assignments to safe.
The transformation rules from Figure 6 are now considered in some detail.
First, the Flow-Edge Removal Rule (1) states how an expression e can be
propagated along a flow edge from an assignment vertex v labeled “a = e” to
a vertex u. For this rewriting to be allowable, u must be the target of exactly
one flow-dependence edge for variable a. Furthermore, if u represents a vertex
that also defines a, then the rewriting rule requires the incoming flow edges of
u be a subset of the incoming flow edges of v (excluding the edge v →f u). This
ensures that the rewritten graph can be transformed back into a sequential
program. For example, it is unsafe to remove the flow edge from the second
to the fourth assignment in the sequence
i=1; i=i+1; a=i; i=i+1;
13
If the rewriting is allowed, the SDG is updated by copying the incoming flow
dependence edges of v to u, rewriting the label of u by substituting e for a,
and removing the flow dependence edge v →f u.
The Loop Anti-Fusion Rule (Rule 2) splits a single loop into two loops. Refer-
ring to Figure 7, init represents the loop initialization, inc the loop increment,
and n the loop predicate. Recall that copy(a) denotes a unless a has been du-
plicated in which case it refers to the duplicate. Thus, the set of edges
{copy(a) → copy(b) | a → b}
includes mostly edges already present in the SDG, but also necessary new
edges required to connect the new vertices. Unwanted edges, which link the
two loops, are excluded by the final line of the rule.
In the present implementation, the Loop Anti-Fusion Rule requires that the
loop initialization and increment are achieved using one node each. Further-
more, it assumes that the right-hand sides of these expressions are side effect
free. It is possible to provide a more general rule [?], which can handle side
effects and multi-node initialization and increment. Alternatively, additional
transformations can be used to enforce side-effect freedom [?]. One such trans-
formation would replace an initialization
i = f(x);
with the sequence
tmp = f(x); i = tmp;
Where the second of these plays the role of the initialization.
The Dependent Assignment Removal Rule (Rule 3) is a specialized form of
“loop”-independent code removal [?]. It works because assignments to safe
are idempotent; thus, when the pre-conditions hold, it is possible to remove
the second of the two assignments. The rule could potentially be applied to
control-structures other than if statements, but the present implementation
does not attempt this.
Finally, the Induction Variable Elimination Rule replaces qualifying loops with
two assignments that capture the requirements on array safety. Informally, it
replaces the graph equivalent of
14
Fig. 7. The anti-fusion transformation
15
i= k0 ;
while (i < e )
{
safe = safe && f(i);
i = i + k1 ;
}
with the graph equivalent of
safe = safe && f(k0 );
safe = safe && f(kf inal );
where kf inal is the value of i in the final iteration of the loop. Note that this
transformation is safe because of the structure of f(). That is, f() is created
specifically to check array bounds and so is of the form “ · · · i >= 0 && i <
MAX · · · ”.
While this rule may seem to have limited applicability, it works well after slic-
ing or loop-anti fusion has produced suitable candidates for induction variable
elimination. In this rule, the updated vertex u and its copy uf represent the
safety test performed by u on the first and final iterations of the loop. The
control edges added cause a decrease in the nesting level of u, uf, and inc (inc’s
role is changed to ensuring that the “downward” exposed definition of i has
the correct value). Because u and uf form “straight-line code” the flow edges
out of u and into uf are simplified: u’s sole flow successor is uf. The other flow
edges added to the SDG cover variables used in the rewritten expression of uf
and inc.
The implementation presently includes a simplified version of general induc-
tion variable elimination [?]. Three simplifications include the assumption that
the rule is only applied to structured while loops. Second, it is assumed that
the final value of the induction variable, if , can be determined from the ex-
pressions labeling init, inc, and n. The resulting expression may be symbolic.
If the implementation cannot determine if then the rule is not applied. Fi-
nally, the transformation is only applied if it can be determined that the loop
executes at least once.
There are other places where the transformations might better exploit avail-
able information. For example, the SDG encodes significant information about
pointers (including function pointers) in its dependence edges. The present
points-to analysis, which is user selectable, is based on either Andersen’s [?]
or Steensgaard’s [?] algorithm. Points-to information is directly available to
16
the transformation phase of the SDG-based algorithm through a CodeSurfer
API. However, it is not fully exploited by the current implementation.
3.2 Safety Slicing Example
This section traces the computation of a safety slice using the program shown
in Column 1 of Figure 8. This program, which computes the most frequently
occurring lower case character in its input, has already been augmented with
assignments to safe. Italicized statements in Column 1 are removed by the first
slice (note how much of the program remains after syntax-preserving slicing).
Column 2 shows the program after this slice and the first rewriting. Italicized
statement in Column 2 are removed by the second slice. The results of the
second rewriting are shown in the Column 3. Again, italicized statements in
Column 3 are removed by a third slice. Column 4 shows the results of a third
rewriting and an expression simplification, which performs constant folding
and other similar simplifications. The italicized statement in Column 4 is
removed by a fourth rewriting.
The bulk of the rewriting concerns the three while loops and is now dis-
cussed in greater detail. To begin with, consider the transformation of the
first while-loop and the statements immediately preceding it. As illustrated
in Figure 9(a), after slicing, the while loop subgraph is a candidate for anti-
fusion because the assignment to safe and the assignment to result are not
data-dependent on each other. Figure 9(b) shows the result of the transforma-
tion, which isolates the computation of safe (the boxed region in Figure 9(b))
from other computations of the loop. Induction variable elimination replaces
this loop with a pair of assignment vertices, which, followed by flow edge
removal, reduces to a single assignment vertex.
Figure 10 illustrates the removal of flow edges from the middle loop of the
program. The initial loop, shown in Figure 10(a), has many flow edges that
are candidates for flow-edge removal. For example, Figure 10(b) shows the
result of rewriting the edges labeled f low 1 and f low4 . Rewriting f low 1 adds
the edge labeled f low 3 as shown in Figure 10(b). This edge is a copy of the flow
dependence edge from the first assignment of safe to the second assignment
of safe.
The rewriting of f low 4 in essence moves the final assignment of safe to before
the increment of i. Part of the change in the graph is subtle: the edge from
the increment of i is now a loop carried edge (indicating that the increment
must now follow the assignment to safe in the loop body). In addition, the
target of f low 4 is now the target of an edge from the initialization of i (this
is a copy of the edge from the initialization of i to the increment of i). These
17
Post first slice Post first rewrite Post second rewrite Post simplify and third rewrite
int safe; int safe;
char source [MAX]; char source [MAX]; int safe; int safe;
int result [128]; int result [128]; char source [MAX]; char source [MAX];
int i, bc, x; int i, bc, x;
int i; int i;
safe = 1; /* starts out safe */ safe = 1; /* starts out safe */
printf(”Enter a string”);
scanf("%s",source); scanf("%s",source); scanf("%s",source);
i = 0; i = 0; scanf("%s",source);
while (i<128) safe = 1 && 0>=0 && 0<128
while (i < 128) safe = 1;
{ && 1 && 127>=0
{ && 127<128
safe = safe && i>=0 safe = safe && i>=0
&& i<128; i = 0;
&& i<128;
result[i] = 0; i=i+1;
i=i+1; }
} i = 0;
safe = safe && 0>=0 && 0<MAX;
while (i < 128)
{
result[i] = 0;
i=i+1;
}
i = 0; safe = safe && 0>=0
safe = safe && i>=0 && 0<MAX;
&& i<MAX; i = 0; i = 0; i = 0;
while (source[i] != ’\0’) while (source[i] != ’\0’)
while (source[i] != ’\0’) while (source[i] != ’\0’)
{ {
{ {
safe = safe && i>=0 safe = safe && i>=0
safe = safe && i>=0 safe = safe && i>=0
&& i<MAX && source [i]>=0
&& i<MAX; && i<MAX
&& source [i]>=0 && source [i]<128
x = source[i]; safe = safe && source [i]>=0
&& source [i]<128 && i+1<MAX;
i = i+1; && source [i]<128
safe = safe && x>=0 && source [i]>=0 && source [i]>=0
&& x<128; && source [i]<128; && source [i]<128
safe = safe && x>=0 && i+1>=0 && i+1>=0
&& x<128; && i+1<MAX && i+1<MAX;
result[x] = result[x]+1; x = source[i]; i = i+1;
safe = safe && i>=0 i = i+1; i = i+1;
&& i<MAX; result[x] = result[x]+1; }
} } }
b = 0;
bc = 0; bc = 0;
i = ’a’; i = ’a’; i = ’a’; safe = safe;
while (i < ’z’) while (i < ’z’) safe = safe && ’a’>=0
{ { && ’a’<128
safe = safe && i>=0 safe = safe && i>=0 && safe && ’z’>=0
&& i<128; && i<128; && ’z’<128;
if (result[i] > bc) if (result[i] > bc)
{ {
b = i
safe = safe && i>=0
&& i<128;
bc = result[i]; bc = result[i];
} }
i = i+1; i = i+1; }
} }
printf(”Most frequent: %c ”,b);
printf(”occurs %d times\n”,bc); printf("safe = %d",safe); printf("safe = %d",safe);
printf("safe = %d",safe); printf("safe = %d",safe);
Fig. 8. The computation of a safety slice
rewritings update the expression labeling the targets of the two edges. For
example, the target of f low 1 is rewritten to
safe = safe && x>=0 && x<128 && x>=0 && x<128;
Figure 10(c) shows the result of removing the edges labeled f low 2 and then
f low3 . In each removal, if the source vertex has no other outgoing edges it is
removed from the figure. The reconstruction of program text from the graph
shown in Figure 10(c) produces the code for the loop in Column 3 of Figure 8.
The implementation then simplifies the resulting expression producing the
statement shown in Column 4.
Finally, the advantage of iterating slicing and transformation is illustrated
18
Fig. 9. The anti-fusion transformation of the first while loop in the example program.
19
Fig. 10. The middle while loop of the example program. The vertex labeled
“saf e = 1” is not from the original program. It is the result of rewriting the
first loop. Anti-fusion has removed the vertex labeled “result[x] = result[x] + 1”
from the loop.
by the third loop. The initial slice removes sufficient parts of the loop body
to allow the application of the Dependent Assignment Removal Rule. This
results in the removal of the inner assignment to safe along with its control
dependence edge on the if statement. Consequently, the second slice does
not include the if statement. The results appear in columns two and three of
Figure 8.
Slicing and transformation work together in other parts of the program as
well. For example, anti-fusion separates the two computations of the first loop
into two loops. The slice shown in Column 2 excludes the loop unrelated to
safe.
As shown in Column 4 of Figure 8, the final program is
20
scanf("%s", source);
safe = 1;
i = 0;
while(source[i] != ’\0’)
{
safe = safe && i>=0 && i+1 < MAX
&& source[i] >= 0
&& source[i] < 128;
i = i+1;
}
printf("safe = %d", safe);
The location of this code immediately tells an engineer that the array ref-
erences in the first and third loops are safe. The code itself indicates that
the middle loop may cause array subscript errors. In particular, the program
stores false in safe if the following condition is ever violated:
i >= 0 && i+1 < MAX && source[i] >= 0 && source[i] < 128
In other words, (since i is never less than 0) the input string is too long
(i.e., i+1 is greater than or equal to MAX), or an input character (held in
source) is outside the range 0, . . . , 127. The present implementation does not
recognize that i will always be greater than or equal to 0 (although there
is no technical reason why it could not). The remaining conditions highlight
two invalid assumptions made by a previous programmer: first, the string read
into source is assumed to be no longer than MAX-1 characters, and second, the
input characters are assumed to be ASCII characters (which are within the
range 0, . . . , 127). After discovery, these assumptions are easily enforced.
3.3 Algorithmic Complexity
This section considers at the complexity of the algorithm. The following vari-
ables are used in the complexity discussion: N is the number of nodes in the
SDG, E is the number of edges, D is the maximum in or out degree of a vertex
(in theory this can be E, but in practice it is both small (< 10) and indepen-
dent of program size). Finally, B is the largest number nodes controlled by
a predicate, which is bounded by N -1 (but like D, is in practice both small
(< 100) and independent of program size). Sets are represented as bit vectors
which are assumed to be manipulated in constant time.
First, consider the complexity of the four rules given in Figure 6, which will
then be combined into the overall complexity of the algorithm. To begin with,
the flow edge removal requires testing the transformation is allowable: A pre-
21
processing step computes, for each assignment, those variables that are defined
at a single source. This step considers all flow-dependence edges and thus has
a complexity of O(E). The rule then considers each flow dependence edge.
Identifying that v has the correct form and that the application is safe, each
require constant time (sourceSubset requires a constant number of bit vector
operations). Thus, the cost of flow-edge removal is bounded by O(E).
Loop anti-fusion considers each node as a candidate n. For each such node,
determining if suitable init and inc nodes exist requires O(D) time. A greedy
partitioning algorithm requires O(BD) time to partition the nodes into B1 and
B2 (if more than two partitions are found they can be combined arbitrarily).
The final test involving inc and the duplicate creation each require O(D) time.
The remaining modifications of the SDG also require O(D) time assuming the
set {copy(a) → copy(b) | a → b} is only materialized when a or b is in A, which
has size 3. This leaves a complexity of O(N BD).
The Dependent Assignment Removal Rule (Rule 3) starts with each possible
n. ¿From n, it takes O(D) time to attempt to find v (whether or not it exists).
Node v has at most D flow successors of which each is a potential u, and it
takes O(E) time to test if a path exists from n to u. Thus, finding n, v, and u
requires O(DE) time. (This is rather conservative. The union find algorithm,
for example, could reduce this complexity, but in practice the out-degree of
v is small and the additional algorithmic complexity is not warranted.) The
remaining conditions on the first line of the rule require constant time to verify.
In general, testing the implication F ⇒ F ′ is unsolvable. However, a simple
(safe) system can compare each term in F against each term in F ′ (recall that
F and F ′ are essentially in conjunctive normal form) in O(T log(T )) time,
where T is the number of terms in F and F ′ . Finally, sourceSubset requires
a constant number of bit-vector operations to test. Combining these terms
yields O(N (DE + T log(T ))).
Like the Dependent Assignment Removal Rule, the Induction Variable Elimi-
nation Rule starts with each candidate n. Finding and verifying init, inc, and
u each require O(D) time. The symbolic computation of if depends on the
expressions at n, init, and inc. Its complexity is represented by O(S). The
updates in this rule require O(1) or O(D) time, so the total time for the rule
is O(N (D + S)).
The complexity of syntax-preserving slicing is O(E); thus, one recursive call
has worst case complexity of O(N (DE + T log(T ) + S)). Each rule reduces
some aspect of the graph. For example, anti-fusion reduces the size of a loop
body, while flow-edge removal shortens the length of flow-edge chains. This
ensures that the recursion will terminate. In the purely pathologic case, the
anti-fusion rule could divide a single loop program in “half” O(N ) times.
22
Similar cases exist for the other rules; thus, the resulting complexity would be
O(N (N (DE + T log(T ) + S)))
However, this is somewhat misleading as most transformations act locally.
Larger program simply have more locales, but do not require more recursive
iterations. Furthermore, as it is unlikely that either T log(T ) or S will dominate
DE; thus, a more informative statement of the complexity is O(N DE).
4 Empirical Study of Safety Slicing
The claim that the application of this algorithm improves comprehension is
evaluated in this section, which presents a series of experiments designed to
assess the effectiveness of safety slicing. The experiments use both qualitative
and quantitative methods to gauge the effectiveness of safety slicing in helping
a programmer identify array subscript violations. This section begins with a
discussion of the experimental setup and design. It then considers the quan-
titative results followed by subjective results, which provide a context for the
numerical data.
4.1 Experimental Setup and Design
The experiment is designed to test the general hypothesis that a safety slice
assists in program comprehension. Specifically, two hypotheses are under test
in this experiment:
(1) Safety slices make human debugging of array safety more accurate, and
(2) Safety slices make human debugging of array safety more efficient.
Seventy six subjects participated in three controlled experiments. The groups
included students half way through the first, second, and third years of study
at Loyola College. Students from each year were divided into a control group
and an experimental group.
For the first-year students, the two groups were created by simply using two
separate sections of the same course (taught by the same instructor). It was
believed that this would provide sufficient random distribution of talent. Un-
fortunately this assumption was incorrect. This was obvious from the students’
demeanor and confirmed by the instructor. The second and third-year students
were divided by class standing; thus, providing more uniform distribution of
talent.
23
To test the two hypotheses, all groups were given the definition of an array
bounds violation, a worked example, and a brief 15 minute lecture on array
bounds violations. The same person gave the lecture to all six groups. For
the first-year students, this was in back to back classes. For the second and
third-year students it was given simultaneously to both groups. In all cases,
the same set of notes was followed to avoid biasing any particular group. In
order to train the subjects in the use of safety slices, the lecture included an
explanation of safety slicing and how to use the safety slice in searching for
array violations.
All groups were then given several programs containing array subscript viola-
tions. In addition, the experimental groups received the safety slice for each
program. The programs included training exercises designed to determine how
well the subjects understood their task and the program used to generate the
results presented in this section. Following the experiment with the programs,
the students completed a questionnaire, which asked participants subjectively
to rate their performance, confidence, and the difficulty of the problems. Each
group was then given the remainder of the lecture period to find array sub-
script violations.
It is interesting to note that the syntax-preserving slice of all the programs
used in the study was essentially the entire program and that, in all cases,
the amorphous slice was smaller than the syntax-preserving slice. When com-
bined with the results of the experiment, this suggests that amorphous slicing
provides additional comprehension benefit over syntax-preserving slicing.
4.2 Quantitative Results
Overall, there is statistical evidence that safety slicing assists in program com-
prehension. For the most part, it improved subjects’ ability to identify array
access violations. However, for experiments involving novice programmers, the
first null hypothesis could not be rejected, and it was not possible to conclude
that amorphous slicing assisted such programmers. (It should be noted that
these subjects were two weeks into their study of arrays and thus using them
as subjects was probably premature.) In contrast, for two more experienced
groups, the experimental subjects (who were able to consult the safety slice)
significantly outperformed the control group. While subjectively, safety slicing
also improved efficiency, the result is not statistically significant.
The data related to Hypothesis 1 for all six groups is summarized in Figure 11.
The row labeled “First-Year Experimental Y” represents a subset of the first-
year experimental group who answered “yes” they understood what a safety
24
Results Summary
Group x s n
First-Year Control 4.64 1.92 22
First-Year Experimental 4.39 2.02 18
First-Year Experimental Y 5.00 1.85 8
Second-Year Control 5.27 1.35 11
Second-Year Experimental 6.33 0.87 9
Third-Year Control 6.00 0.89 11
Third-Year Experimental 6.50 0.93 8
Fig. 11. Empirical Study Results
Fig. 12. The mean and 95% confidence intervals for the seven groups
slice represented. With the exception of the first-year experimental group,
mean scores increase with experience level and with access to the safety slice.
The graph in Figure 12 shows the mean and 95% confidence interval for each
group. In the graph, short vertical bars represent means and longer horizontal
bars represent confidence intervals. The figure should not be used for direct
comparison of groups; this requires a test of the difference of two means.
However, it does show, with 95% confidence, the expected mean. Furthermore,
the trend in the length of the lines illustrates a decrease in the standard
deviation with experience.
The data related to Hypothesis 2 are only meaningful for the second-year
students. For different reasons the first and third-year students didn’t produce
meaningful timing numbers [?]. Subjectively, the experimental groups finished
faster or completed more of the problems in the time alloted. This result was
not statistically significant.
In comparing various groups from the study, a one-sided t-test involving the
difference of two means is used (t-tests are used because some of the groups
are small). The null hypothesis in each case is µ1 − µ2 = 0 and the alternative
hypothesis is µ1 − µ2 < 0 (i.e., the experimental group has a higher mean
25
score). In each case the p-value, the common standard deviation, and the
degrees of freedom (d.f.) are given. The p-value can be interpreted as follows:
values less than 0.01 are highly significant, values less than 0.05 are significant,
and values less than 0.10 are weakly significant.
The analysis assumes that the parent populations have normal deviations; t-
tests are known to be robust against non-normality of the data. The present
analysis assumes that the standard deviations s1 and s2 for the two populations
are unequal (heteroscedastic). Repeating the analysis assuming they are equal
(homoscedastic), does not affect any of the conclusions. Numerically the p-
values change (some up and some down) by no more than 0.003.
Taking all subjects together, safety slicing does not significant improve the
identifying of array bounds violations (p = 0.28, s = 0.76, d.f. = 7). Replacing
the first-year students with the subset of first-year students who said they
understood what safety slicing was, produces a dramatic turnaround. Here
safety slicing clearly helps (p = 0.016, s = 0.57, d.f. = 7). More importantly,
considering only the more advanced second and third year students safety
slicing is quite effective (p = 0.008, s = 0.04, d.f. = 5). The study lends
empirical support to the assertion that amorphous slicing assists program
comprehension.
4.3 Subjective Results
Third year subjects were asked to rank their confidence in their answers from 0
(I guessed) to 4 (I’m sure). While not all subjects filled in the confidence rank-
ing, of those who did from the experimental group, only one subject marked
anything other than a 4 for an array reference that was identified as safe by
the safety slice. This subject marked a 0 (a guess) on two such references and
got them wrong. The experimental group’s other confidence rankings–for ar-
ray references where the safety slice provided only guidance–were mostly 3’s
with some 2’s. In contrast, the confidence rankings of the control group were
slightly lower and more evenly distributed over all possible responses. This
indicates that the control group is working where they need not.
The experiment included qualitative questions designed to provide a subjective
context for the numerical data. In general, the responses indicated that most
subjects who used the safety slice technique liked it. Individual responses to
the question “did you find the safety slice helpful in finding errors?” included
“yes and no, it helped me find some, but it confused me too” (this answer,
by a first-year student, points to the need to better training). Another first-
year student responded “[the safety slice was] helpful, especially in the harder
programs.” A second-year student responded “somewhat, it gave me an idea
26
if p>q if not(p>q)
then y:=1; then x:=1;
else x:=1;
Fig. 13. A Slightly Amorphous Slice
as to where the errors might have been,” (which is exactly what a safety slice is
designed to do). A third-year student responded “it is nice to see the possible
problems with your code displayed.”
5 Related Work
Several authors have indicated that the syntactic subset requirement of syntax-
preserving slicing has been a hindrance to the computation of small slices.
These authors were implicitly seeking to construct ‘amorphous’ slices. How-
ever, they did not abandon the syntactic requirement altogether. Rather they
sought to remain as faithful to the original syntax as possible, while producing
reasonably small slices. Previous work on these ‘slightly amorphous slices’ is
briefly reviewed, followed by a discussion of techniques the combine slicing
and transformation.
5.1 Slightly Amorphous Slicing
As envisaged by Weiser [?], a slice was to be a syntactic subset of the program
from which it was constructed. However, in his thesis, Weiser immediately
recognized and acknowledged ([?], page 6) that it would not always be possible
for a slice to be constructed as a purely faithful subset of the original program’s
syntax. For example, this happens when the program’s syntax is insufficiently
rich, for all statement deletions to be considered syntactically valid programs.
Consider slicing the program in Figure 13. Suppose that the programming lan-
guage does not allow empty then clauses. Ideally, the slice constructed with
respect to the final value of x would consist of deleting the then part of the
conditional, but this leads to a syntactically invalid program. Therefore, in or-
der to maintain a syntactic subset of the original program, the slice is forced
to include all of the original program. This seems awkward. One possible al-
ternative slice is shown on the right-hand section of Figure 13. To produce
this slice, a minor syntactic re-write is required. This rewrite does not appear
to take the slice too far away from the syntax of the original. Strictly speak-
ing, the ‘slice’ in the right-hand section of the figure is, of course, a slightly
amorphous slice.
27
int x,y,f; int x,f; int x,f;
int* z; int* z; int* z;
1 scanf("%d",&f); scanf("%d",&f); scanf("%d",&f);
2 y=1;
3 x=2; x=2; x=2;
4 if (f==0) if (f==0) if (f==0)
5 z=&x; z=&x; z=&x;
6 else z=&y;
7 *z=3; *z=3; if (f==0) *z=3;
Original Program Non-Executable Slice on x ‘Amorphous’ slice on x
Fig. 14. Pointer Problems for Syntax Preservation
Other authors [?], [?] have also found it necessary to create ‘slightly amor-
phous slices’ in order to avoid the restrictions imposed upon them by the
purely syntax preserving straight-jacket. These authors were implicitly using
‘amorphous slices’, indicating that the idea of removing the syntactic restric-
tion of syntax-preserving slicing is not wholly new and has previously been
found useful.
In 1993, while considering problems associated with slicing programs with
pointers, Lyle and Binkley [?] showed that unnecessary statements (those for
which computation could not affect the values stored in variables in the slicing
criterion) would need to be included in order to ensure that a slice of a program
containing pointers would be executable. The problem is illustrated by the
example in Figure 14. Consider the slice of this program with respect to the
final value of the variable x. One possible syntax-preserving slice is depicted
in the central section of Figure 14. Unfortunately, if the value entered for the
variable f is non-zero then the pointer variable z remains uninitialized and so
the execution may abort.
The pure syntax preservation requirement would dictate that the assignment
of z to the address of y at line 6 and all associated computation on y be
included. However, this could lead to a dramatic increase in the size of the
slice. One possible remedy considered by Lyle and Binkley was the inclusion,
in the slice, of additional new ‘guards’. This approach would yield the slightly
amorphous slice depicted in the rightmost section of Figure 14.
In 1994, Choi and Ferrante faced a similar problem when attempting to slice
unstructured programs. In the presence of goto statements the original PDG-
based slicing algorithm [?], [?] does not produce correct slices. This is because
a goto statement is neither the source of a transitive control dependence
(because it is not a predicate) nor a transitive data dependence (because it is
not an assignment). One solution, adopted by Choi and Ferrante [?] was to
consider the goto to be a pseudo predicate, the true edge of which is the target
28
of the original goto and the false edge of which is the ‘fall through node’
to which control would pass were the goto to be deleted. Ball and Horwitz
[?], and Agrawal [?] independently developed algorithms that produce very
similar results.
Choi and Ferrante observed that the reformulation of a goto as a predicate
created spurious control dependences which could lead to dramatic increases
in slice size. To overcome this, they considered an alternative approach in
which new goto statements were added to the slice. These new goto state-
ments essentially ‘rewire’ the nodes from the slice identified by the PDG-based
algorithm. They ensure that, where there exists a path through the original
program containing nodes which are retained in the slice, there exists an as-
sociated path in the slice containing these nodes in the same order.
As an alternative, Choi and Ferrante’s second algorithm [?] introduces a new
goto statement. The introduction of new goto statements causes the algo-
rithm to deviate from the syntax-preserving requirement for syntax preserva-
tion.
Were Choi and Ferrante to completely drop the syntactic requirement, the
problem of slicing unstructured programs would have become less interesting:
A goto removal algorithm (such as Ashcroft and Manna’s algorithm [?] or
the algorithm of Peterson et al. [?]) could have been applied to the original
program, followed by conventional slicing of the resulting structured program.
5.2 Combining Transformation and Slicing
In 1994, Ernst considered, inter alia, the possibility of slicing optimized code.
Ernst comes close to suggesting full amorphous slicing when he says:
“Compiler transformations (such as constant folding, dead code elimination,
and elimination of partial redundancies) update the program structure to re-
flect knowledge gleaned from analyses. Analysis and transformation remove
and simplify dependence relations, making them a better approximation
to the true dependences of the underlying computations, so slicing after
optimization produces more precise and informative slices.” [?]
Similar transformations to those Ernst describes are used in the implementa-
tions described in Section 3.1. However, although Ernst considers using trans-
formation to inform slicing, he draws back from amorphous slice construction.
Rather, the information gleaned is to be used to sharpen the precision of
syntax-preserving slicing.
29
b = a + 11;
c = b;
d = c + 11;
e = d - 22;
Fig. 15. Slicing Optimized Code
As an example, Ernst considers the problem of slicing the program fragment
reproduced in Figure 15. The slice constructed with respect to the final value
of the variable e in this fragment consists of the entire program. However,
after compiler optimization the result is “e = a;”, which is also the result
produced by amorphous slicing.
In 1995, Tip [?] considered an example, in which constant propagation is
used to statically determine the value of predicates in if statements, thereby
eliminating the predicates, one of the controlled branches, and the code upon
which both predicates depend. Tip [?] also suggested the computation of slices
using a mixture of slicing and transformation, using an algorithm consisting
of the following components:
• Translation of the program to a suitable intermediate representation (IR).
• Transformation and optimization of the IR.
• Maintaining a mapping between the source text, the original IR and the
optimized IR.
• Extraction of slices from the IR.
This sequence of steps is similar to the algorithms considered in the present pa-
per, the AST or SDG being the IR in Tip’s algorithmic schema. The essential
difference is that the algorithms presented in the present paper include itera-
tion and domain specific transformations, which improves their simplification
power.
As an example, Tip considered the problem of slicing the program fragment in
Column (a) of Figure 16 with respect to the final value of the variable y. Tip’s
goal was to produce the slice in Column (c) of the figure. Syntax-preserving
slicing algorithms would not produce this slice, because they assume all paths
to be feasible. Tip pointed out that a slicing algorithm which is capable of
merging the two conditionals to produce the optimized program in Column
(b), ought to be capable of producing the slice in Column (c).
Tip’s goal was not to drop the restriction to syntax preservation. The slice in
Column (c) is syntax preserving. However, it is only a small step from this
syntax-preserving slice of an optimized program to the amorphous slice in
Section (d). Such an amorphous slice can be computed by syntax-preserving
slicing of the optimized version in Column (b).
30
scanf("%d",&p) ; scanf("%d",&p) ; scanf("%d",&p) ; scanf("%d",&p) ;
scanf("%d",&q) ; scanf("%d",&q) ; scanf("%d",&q) ; scanf("%d",&q) ;
if (p==q) if (p==q) { if (p==q) if (p==q)
x = 18; x=18; ; y=2;
else x=17; y=2; } else x=17; else y=17;
if (p!=q) else { if (p!=q)
y=x; x=17; y=x;
else y=2; y=x; } else y=2;
printf("%d",y); printf("%d",y);
(a) Original (b) Optimized Version (c) Syntax-preserving Slice (d) Amorphous Slice
Fig. 16. Tip’s Optimization Example and a Corresponding Amorphous Slice
In 1995 Harman and Danicic [?] showed how slicing could be used to support
testing, by focusing upon the value of a pseudo variable. This was an early form
of what is termed safety slicing in the present paper. It used a combination
of slicing and ah-hoc transformations, but the term amorphous slice was not
coined until 1997 [?] and detailed algorithms did not appear until 1999 [?].
In 1997, Gerber and Hong [?] described an approach to scheduling which relies
upon slicing to identify the observable parts of a real time system (those that
interact with the environment through I/O operations) and the unobservable
components (those which do not perform I/O). The unobservable computa-
tions need not fit within the time frame allocated to the task and are moved
to the end of the task’s computation, increasing its schedulability. The system
mixes slicing and the single transformation code motion, but does it does so
simply to re-order computation and does not delete computations. Therefore,
the systems does not produce amorphous slices.
In 1998, Norris and Pollock [?] considered register allocation using the PDG
as an intermediate representation. Their algorithm consists of three phases.
The initial phase consists of a bottom up register allocation pass. This pass
uses a graph colouring algorithm which exploits the PDG’s hierarchical struc-
ture to allocate registers. The second and third phase implement specialized
code motion and redundant computation elimination transformations. Code
motion consists of moving register store and retrieve operations out of loop
bodies where possible. The redundant code removal phase simply eliminates
unnecessary store and retrieve operations created by the allocation and motion
phases. The system does not perform any slicing (though being PDG-based it
clearly could with little additional effort). The transformations in the second
and third passes could be considered to be examples of what are termed ‘do-
main specific’ in this paper, as they are tailored to the special goal of optimized
register allocation.
In 1998, Das [?] introduced a technique for partial evaluation using the System
Representation Graph (SRG). The main goal of this work was to show that
the SRG was a suitable representation for computing the Binding Time Anal-
ysis phase of partial evaluation. However, Das also considered the problem of
constructing a residual program using an intraprocedural variant of the SRG,
called the Program Representation Graph (PRG). The rule for propagation
31
of expressions used in the present paper is a generalization of the constant
propagation rule introduced by Das on page 182 of his thesis [?].
In 1999, Field and Ramalingam [?] described experience with a Cobol restruc-
turing tool, as well as the theory underlying it. The tool is used to rewrite the
unstructured control flow embodied by the PERFORM/PARAGRAPH ‘procedural’
abstraction of Cobol. The restructured program is equivalent in the sense that
it contains all valid paths in the original and, in ‘well-behaved’ cases, contains
no additional paths. The restructuring phase is used as a pre-processing phase
prior to other analyses, including slicing. Slices of the restructured program
could be considered to be amorphous slices of the original, because they are
not syntactic subsets of the original program.
In 1999, Liao at al. described the SUIF tool [?]; a parallelization tool that
contains transformations for automated and semi-automated parallelization.
SUIF also contains interprocedural slicing, which is used as an analysis phase
to detect independent (and hence parallelizable) array accesses and to de-
termine which previously computed loop-dependent computations can affect
subsequent iterations. Though the tool contains the ability to mix slicing and
transformation, this is not the purpose of SUIF and such a combination is
neither supported nor explored.
In 2000, Corbett et al. [?] describe Bandera, a tool for extracting finite state
models from Java. The models are used in verification by model checkers. The
tool includes a syntax-preserving slicer and allows user-specified transforma-
tions to be added and so could be adapted to implement amorphous slicing,
though it does not currently possess this ability.
In 2001, Harman et al. [?] introduced a general purpose amorphous slicing
system based upon the transformation system FermaT developed by Ward
[?]. This system produces similar amorphous slices to the SDG-based approach
described in the present paper, but does not produce slices tailored to address
the array access safety problem, and uses the Abstract Syntax Tree (AST) as
an internal data structure rather than the SDG.
In 2002, Ward [?] introduced an approach to slicing WSL programs, which,
it is hoped, can be easily generalised to produce amorphous static and condi-
tioned slices by mixing the new primitive transformation of slicing with the
existing FermaT WSL transformations available in the FermaT transformation
workbench. Ward also introduces the notion of a ‘semantic slice’, a generali-
sation of amorphous slicing, which allows abstraction to a specification level
in addition to non syntax-preserving slicing.
Slicing ideas have also been applied to declarative languages [?], [?], [?], [?],
[?], [?], where transformation is more common because of the comparatively
straightforward semantics of the languages concerned. Such slices are often
32
automatically ‘amorphous slices’, though this arises more as a consequence
of linguistic formulation, where ‘statement deletion’ is clearly a meaningless
concept.
6 Acknowledgements
Mark Harman is supported, in part, by Engineering and Physical Sciences
Research Council grants GR/R98938, GR/M58719 and GR/M78083) and two
development grants from DaimlerChrysler AG.
David Binkley is supported by National Science Foundation grant CCR-9803665.
Sebastian Danicic, in part, is supported by Engineering and Physical Sciences
Research Council grant GR/R98938 and GR/M58719.
Preliminary versions of some of the material presented here have appeared in
conferences [?,?,?].
7 Summary
In traditional approaches to slicing, the only simplifying transformation used
to create slices is component deletion. This choice was motivated by the orig-
inal application of slicing to debugging, where the syntax-preserving nature
of a slice was important. However, the restriction to statement deletion is
not necessary for many of the applications of slicing that have emerged since
slicing was originally introduced. As greater simplification power is always
possible when the restriction to statement deletion is lifted, it is clearly ad-
vantageous to lift this restriction in situations where syntax preservation is
relatively unimportant compared to slice size.
This paper has defined amorphous slicing and compared it to traditional
syntax-preserving slicing. The paper also presented an algorithm for comput-
ing amorphous (safety) slices and the results of an empirical study performed
using an implementation of the algorithm. The study showed that amorphous
slicing assists programmers in understanding array safety access issues.
The study also illustrated the utility of the philosophy of ‘programs as answers
to undecidable propositions’, in which hypotheses concerning a subject pro-
gram are expressed as slicing criteria, with the answers being returned in the
form of smaller programs. In particular, the answer (whether the hypothesis
holds) is approximated using amorphous program slicing. This approximate
33
nature of the approach avoids the well-known decidability problems associated
with an attempt to provide a definite answer.
References
34