Figure 7.1 3. Vertex-growing strategy.

Subgraph Merging Procedure via Vertex Growing

An adjacency matrix 141G) i” merged with another matrix llit\z) il the submatrices

obtained by removing the last row and last column of MG) and MQ) are identical

to each other. The resulting matrix is the matrix ry'(l), appended with the Iast

row and last column of matrix M(2). T]he remaining entries of the new matrix are

either zero or replaced by all valid edge labels connecting the pair of vertices’

The resulting graph contains one or two edges more than the original

graphs. In Figure 7.13, both G1 and G2 contain four vertices and four edges’

Aft”, -“rging, the resulting graph G3 has flve vertices. The number of edges

in G3 depends on whether the vertices d and â‚¬ are connected. If d and e

are disconnected, then G3 has five edges and the corresponding matrix entry

for (d, e) \s zero. Otherwise, G3 has six edges and the matrix entry for (d, e)

corresponds to the label for the newly created edge. since the edge label is

unknown, we need to consid.er all possible edge labels fot (d,e), thus increasing

the number of candidate subgraphs substantially’

Candidate Generation via Edge Growing

Edge growing inserts a new edge to an existing frequent subgraph during

candidate generation. unlike vertex growing, the resulting subgraph does not

45O Chapter 7 Association Analysis: Advanced Concepts

G3 = merge (Gl , G2)

G4 = mslgs (G1′ G2)

Figure 7.14. Edge-growing strategy.

necessarily increase the number of vertices in the original graphs. Figure 7.14 shows two possible candidate subgraphs obtained by merging G1 and G2 via the edge-growing strategy. The first candidate subgraph, G3, has one extra vertex, while the second candidate subgraph, G4, has the same number of vertices as the original graphs. The core for the graphs is indicated by dashed lines in the diagram.

The procedure for generating candidate subgraphs via edge growing can be summarized as follows.

Subgraph Merging Procedure via Edge Growing

A frequent subgraph 9(1) is merged with another frequent subgraph g(2) only if the subgraph obtained by removing an edge from 9(r) is topologically equivalent to the subgraph obtained by removing an edge from gQ). After merging, the resulting candidate is the subgraph 9(r), appended with the extra edge from gQ)

The graphs to be merged may contain several vertices that are topolog- ically equivalent to each other. To illustrate the concept of topologically equivalent vertices, consider the graphs shown in Figure 7.15. The graph G1 contains four vertices with identical vertex labels, “a.” lf a new edge is at-

,/

7.5 Subgraph Patterns 451

G1 G2 G3

Figure 7.15. lllustration of topologically equivalent vertices.

tached to any one of the four vertices, the resulting graph will look the same. The vertices in G1 are therefore topologically equivalent to each other.

The graph G2 has two pairs of topologically equivalent vertices, u1 with

u4 and uz with u3, even though the vertex and edge labels are identical. It is easy to see that u1 is not topologically equivalent to u2 because the number of edges incident on the vertices is different. Therefore, attaching a new edge to u1 results in a different graph than attaching the same edge to u2. Meanwhile, the graph G3 does not have any topologically equivalent vertices. While a1

andu4 have the same vertex labels and number of incident edges, attaching a new edge to u1 results in a different graph than attaching the same edge to u4.

The notion of topologically equivalent vertices can help us understand why

multiple candidate subgraphs can be generated during edge growing. Consider the (k – 1)-subgraphs G1 and G2 shown in Figure 7.16. To simplify the notation, their core, which contains k – 2 common edges between the two graphs, is drawn as a rectangular box. The remaining edge in Gl that is not included in the core is shown as a dangling edge connecting the vertices a and b. Similarly, the remaining edge in G2 that is not part of the core is shown as

a dangling edge connecting vertices c and d. Although the cores for G1 and

G2 are identical, a and c may or may not be topologically equivalent to each

Figure 7.16. General approach for merging a pair of subgraphs via edge growing.

452 Chapter 7

G3 = Merge (c1, c2)

Association Analysis: Advanced Concepts

G3 = Merge (G1, G2) G3 = Merge (G1, c2)

( a ) a + c a n d b t d

G3 = Merge (G1, G2)

( b ) a = c a n d b * d

G3 = Merge (G1, G2)

( c ) a + c a n d b = d

G3 = Merge (c1, c2)

( d ) a = c a n d b = d

Figure 7.17. Candidate subgraphs generated via edge growing.

other. If a and c are topologically equivalent, we denote them as a : c. For vertices outside the core, we denote them as b: dif their labels are identical.

The following rule of thumb can be used to determine the candidate sub- graphs obtained during candidate generation:

I. If a I c and b + d, then there is only one possible resulting subgraph, as shown in Figure 7.17(a).

2. If a: c but b + d, then there are two possible resulting subgraphs, as shown in Figure 7.17(b).

G3 = Merge (G1, G2) G3 = Merge (G1, G2)

7.5 Subgraph Patterns 453

Figure 7.18. Multiplicity of candidates during candidate generation.

3. If a I c btft b : d, then there are two possible resulting subgraphs, as shown in Figure 7.17(c).

4. If a: c and b: d, then there are three possible resulting subgraphs, as

shown in Figure 7.L7(d).

Multiple candidate subgraphs can also be generated when there is more than one core associated with the pair of (k – 1)-subgraphs, as shown in Figure 7.18. The shaded vertices correspond to those vertices whose edges form a

core during the merging operation. Each core may lead to a different set of candidate subgraphs. In principle, if a pair of frequent (k – l)-subgraphs is

merged, there can be at most k-2 cores, each of which is obtained by removing

an edge from one of the merged graphs. Although the edge-growing procedure

can produce multiple candidate subgraphs, the number of candidate subgraphs tends to be smaller than those produced by the vertex-growing strategy.

7.5.5 Candidate Pruning

After the candidate k-subgraphs are generated, the candidates whose (k –

1)-subgraphs are infrequent need to be pruned. The pruning step can be performed by successively removing an edge from the candidate k-subgraph and checking whether the corresponding (k – l)-subgraph is connected and frequent. If not, the candidate k-subgraph can be discarded.

To check whether the (k – l)-subgraph is frequent, it should be matched against other frequent (k – 1)-subgraphs. Determining whether two graphs are

topologically equivalent (or isomorphic) is known as the graph isomorphism problem. To illustrate the difficulty of solving the graph isomorphism problem,

454 Chapter 7 Association Analysis: Advanced Concepts

Figure 7.19. Graph isomorphism

consider the two graphs shown in Figure 7.19. Even though both graphs look different, they are actually isomorphic to each other because there is a one-to- one mapping between vertices in both graphs.

Handling Graph Isomorphism

A standard approach for handling the graph isomorphism problem is to map each graph into a unique string representation known as its code or canonical label. A canonical label has the property that if two graphs are isomorphic, then their codes must be the same. This property allows us to test for graph isomorphism by comparing the canonical labels of the graphs.

The first step toward constructing the canonical label of a graph is to find an adjacency matrix representation for the graph. Figure 7.20 shows an

M=

0p

p0

pr q0

pq

r0

00

00

Figure 7.20. Adjacency matrix representation of a graph.

7.5 Subgraph Patterns 455

example of such a matrix for the given graph. In principle, a graph can have more than one adjacency matrix representation because there are multiple ways to order the vertices in the adjacency matrix. In the example shown in Figure 7.20, lhe first row and column correspond to the vertex a that has 3 edges, the second row and column correspond to another vertex a that has 2 edges, and so on. To derive all the adjacency matrix representations for a graph, we need to consider all possible permutations of rows (and their corresponding columns) of the matrix.

Mathematically, each permutation corresponds to a multiplication of the initial adjacency matrix with a corresponding permutation matrix, as illus- trated in the following example.

Example 7.3. Consider the following matrix:

M_

23 67 10 11 L4 15

The following permutation matrix can be used to exchange the first row (and column) with the third row (and column\ of M:

Prs:

where Prs is obtained by swapping the first and third row of the identity matrix. To exchange the first and third rows (and columns), the permutation matrix is multiplied wilh M:

M’ : P{sx M x