Scale-space module detection for random fields observed on a graph non embedded in a metric space

Bernard Chalmond ∗†

October 2014

Abstract

In the spirit of Lindeberg's approach to image analysis on regular lattices, we adapt, from a statistical viewpoint, the blob detection procedure to graphs that are not embedded in a metric space. We treat data observed on such a graph with the goal of detecting salient modules. This task consists in seeking subgraphs whose activity is strong or weak compared with that of their neighbors. This is performed by analyzing node activity at multiple scales. To do so, the data are seen as a realization of a univariate random field, for which we propose a multi-scale graphical model. In the framework of diffusion processes, the covariance matrix of the random field is decomposed into a weighted sum of graph Laplacians at different scales. Under a Gaussian assumption, the weights are estimated by maximum likelihood, which provides a set of relevant scales. As a result, we obtain a multi-scale decomposition of the random field on which the module detection is based. The method is analyzed experimentally on simulated data and on biological networks.

Keywords. Blob Detection, Module Detection, Network Activity, Graphical Modeling, Scale-space Random Field, Graph Laplacian, Diffusion Kernel, Multiscale Decomposition, Scale Selection

∗ CMLA, UMR CNRS 8536, ENS Cachan, France
† Cergy-Pontoise University, France

1 Introduction

This paper addresses the following general problem: given a connected undirected graph G = (V, E) that is not embedded in a metric space, and an observation x of a univariate real random field X indexed by the nodes V of this graph, one seeks subgraphs $\{M_k\}$ in V whose respective observations $\{x_{M_k}\}$ appear as salient profiles in comparison with their surroundings. Such a subgraph together with its profile, $(M_k, x_{M_k})$, is called a module. In summary, we have the following schema:

\[ [\{x_i\}_{i \in V},\ G = (V, E)] \rightsquigarrow \{(M_k, x_{M_k})\}, \tag{1} \]

where $x_i \in \mathbb{R}$. The concept of module depends on the context and on what we seek; we use the term in a broad sense. In the following, however, we focus on a particular kind of module, called blob or spot depending on the context.

Concretely, the problem is as follows. Consider Fig. 3, which shows an image $\{x_i, i \in V\}$ where V denotes the nodes of a sampling grid $\mathcal{L}$ included in $\mathbb{R}^2$. This image shows a multitude of dark spots of various sizes that a scale-space algorithm has detected. This detection uses the graph $\mathcal{L} = (V, E)$ where E is the set of nearest-neighbor connections. Here, our visual perception clearly distinguishes the spots. While keeping the values $\{x_i\}$, suppose now that we replace $\mathcal{L}$ by a graph G whose nodes are not in a metric space. Displaying x requires representing the graph in the plane, which implies choosing a particular layout. This is illustrated in Fig. 7 and Fig. 8(c), which display an "image" on such a graph with three different layouts. This image contains five distinct spots that are difficult to recognize, although these figures display the same image x. Similarly, in the case of Fig. 3, our perception of the spots would be greatly disturbed, and the detection algorithm would not work because it requires a metric space. Dealing with this problem is the subject of our article. To our knowledge, it has not yet been treated.

Although different, this problem is reminiscent of another one, which needs to be presented in order to avoid confusion. Consider a set of points $\{\xi_i, i \in V\}$ in $\mathbb{R}^p$ with $p > 1$, from which a similarity matrix W between these points can be defined, e.g. from their correlations or distances. This matrix is then used to infer a connectivity structure E, typically by connecting highly correlated or spatially close nodes. The detection of modules is then performed on the graph G = (V, E), for example by using the concept of betweenness [19]. Here, the objective is the determination of $\{M_k\}$. In summary, we have the following two-step schema:

\[ \text{(i)}\ \ \{\xi_i\}_{i \in V} \rightsquigarrow W \rightsquigarrow E, \qquad \text{(ii)}\ \ G = (V, E) \rightsquigarrow \{M_k\}. \tag{2} \]

This second schema is further discussed in Section 1.1. In a nutshell, (2) seeks sub-networks in G whereas (1) seeks sub-networks that are active with respect to x. The issue addressed here originates from molecular biology, for which a vast literature exists. With respect to our concerns, a few references are [17, 9, 35, 27, 11, 22]. This issue also arises in other fields, such as social networks, where modules refer to communities [1, 26]. One crucial step when studying the structure and dynamics of these networks is to identify modules or communities. However, these studies are mainly devoted to schema (2) for biological networks and to (2-ii) for social networks, whereas we are interested in schema (1) because of the nature of our data, which come from a univariate random field.

The connectivity of G is summarized in the graph Laplacian matrix L, which plays a central role in our context:

\[ L_{i,j} = \begin{cases} -1 & \text{if } i \sim j, \\ d_i & \text{if } i = j, \\ 0 & \text{otherwise}, \end{cases} \tag{3} \]

where $i \sim j$ denotes the edge $(i, j) \in E$, and $d_i = \sum_j 1_{j \sim i}$ is the degree of node i, i.e. the number of edges connected to i. L is a symmetric positive semi-definite matrix, which can be written as

\[ L = D - A, \]

where $D = \mathrm{diag}\{d_i\}$ and A is the binary adjacency matrix, with $a_{i,j} = 1$ if $i \sim j$ and 0 otherwise. Note that the graph Laplacian appears when one considers the local variation energy, called bending energy:

\[ U(x) = \sum_{i \sim j} (x_i - x_j)^2 = x^\top L x, \tag{4} \]

where the sum is over the edges $(i, j) \in E$. In (3), each edge $i \sim j$ carries the value $a_{i,j} = 1$. This definition can be extended to the weighted case, where the nonnegative weights are not necessarily all equal to 1. In both cases, we have $d_i = \sum_j a_{i,j}$.

We continue this introduction by positioning our contribution with respect to previous works. The model and the methodology are then presented in Section 2. The multi-scale decomposition and the module detection are tested on simulated random fields and on real data in Section 3, where diagnostic tools are introduced. We invite the reader to have a look at Fig. 8, which illustrates and summarizes the method.
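As a concrete illustration of definitions (3) and (4), the following minimal sketch builds L = D − A for a small unweighted graph and checks numerically that the edge-wise sum of squared differences equals the quadratic form. The toy graph, the profile x and the function names are hypothetical and chosen only for illustration.

```python
import numpy as np

def graph_laplacian(n_nodes, edges):
    """Return (L, A) for an undirected, unweighted graph given as an edge list."""
    A = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0          # binary adjacency: a_ij = 1 if i ~ j
    D = np.diag(A.sum(axis=1))           # degrees d_i on the diagonal
    return D - A, A

def bending_energy(x, edges):
    """Sum of squared differences over the edges, left-hand side of (4)."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

# Hypothetical toy graph: the path 0-1-2-3 plus the chord 1-3
edges = [(0, 1), (1, 2), (2, 3), (1, 3)]
L, A = graph_laplacian(4, edges)
x = np.array([0.5, -1.0, 2.0, 0.0])

energy_edges = bending_energy(x, edges)   # sum over edges
energy_quadratic = x @ L @ x              # x^T L x
```

Both quantities coincide, which is why the Laplacian can be read as a discrete measure of the roughness of x on the graph.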


1.1 Graph partitioning

Although module detection is not a partitioning task, some aspects of the related problem of spectral partitioning could lead to confusion. There is a large literature on spectral clustering for graph partitioning [36, 30, 28, 10, 3], among many others. In spectral clustering [30, 36], given a graph as in (2-ii), we compute the eigenvectors $u_1, ..., u_m$ associated with the m smallest eigenvalues $\mu_1, ..., \mu_m$ of L, and assign to every node i the vector $\{u_k(i)\}_{k=1}^m$ in $\mathbb{R}^m$. The graph partition is then the outcome of a vector space clustering algorithm, such as k-means, applied to the resulting vectors. Underlying this procedure is an important property: if the graph is composed of c connected components, then the first c eigenvalues of L are zero, and the corresponding eigenvectors are the indicator vectors of the connected components. Fig. 5 illustrates such a graph with 4 components.
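The property on the zero eigenvalues can be checked numerically. The sketch below uses a hypothetical graph with two connected components; note that a numerical eigensolver returns an arbitrary orthonormal basis of the null space rather than the indicator vectors themselves, but every vector of this basis is still constant on each component, which is what the node embedding exploits.

```python
import numpy as np

def laplacian_from_edges(n_nodes, edges):
    A = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

# Hypothetical graph with c = 2 components: a triangle {0,1,2} and a path {3,4,5,6}
n = 7
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (5, 6)]
L = laplacian_from_edges(n, edges)

mu, U = np.linalg.eigh(L)              # eigenvalues in ascending order
n_zero = int(np.sum(mu < 1e-10))       # multiplicity of the eigenvalue 0

# Spectral embedding restricted to the null space: one row per node
embedding = U[:, :n_zero]

# Two nodes share the same row exactly when they lie in the same component
same = np.array([[np.allclose(embedding[i], embedding[j], atol=1e-8)
                  for j in range(n)] for i in range(n)])
```

In a full spectral clustering pipeline, a clustering algorithm such as k-means would be applied to the rows of the embedding; here the rows are already identical within each component.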


This approach also works when one replaces the adjacent/nonadjacent coefficients $a_{i,j}$ by a similarity or closeness measure: $a_{i,j} = w(i, j)$. The multiplicity of the eigenvalue 0 is the number of connected components of the underlying graph, where nodes i and j are adjacent when $w(i, j) > 0$. Two examples illustrate the closeness measure for schema (2). When the graph Laplacian represents a 3D discrete surface (mesh), every node $i \in V$ is associated with a 3D coordinate point $\xi_i$ in $\mathbb{R}^3$, also denoted $v_i$ [31]. The weight of an edge $i \sim j$ is defined by the Gaussian function $w(i, j) = \exp(-\|v_i - v_j\|^2/\sigma_v^2)$. Hence, the geometric structure of the mesh is encoded in the weights. The second example concerns graph-based image segmentation [28]. The image is $\{x_i, i \in V\}$ where V is the set of nodes of a 2D regular grid embedded in $\mathbb{R}^2$. Every node $i \in V$ is associated with a 3D vector $\xi_i = (v_i, x_i)$ where $v_i$ is a 2D coordinate point. The Gaussian weight function is rewritten $w(i, j) = \exp(-(\|v_i - v_j\|^2/\sigma_v^2 + |x_i - x_j|^2/\sigma_x^2))$. Other weighting functions have been proposed in the literature. Two pixels are connected if they are within distance $\delta$: $w(i, j) \neq 0$ if $\|v_i - v_j\| < \delta$. But how should the graph connection radius $\delta$ be chosen? In [10], from heuristic considerations, the graph weights are separated into different scales:

\[ W = W_1 + W_2 + \dots + W_r, \tag{5} \]

where $W_s$ corresponds to a specific spatial separation range: $w_s(i, j) \neq 0$ if $\delta_{s-1} < \|v_i - v_j\| < \delta_s$.

In our case, the graph G is not embedded in a Euclidean space, unlike the mesh in the examples above. Although non-uniform weights $a_{i,j}$ can be chosen, these weights are not necessarily associated with a distance. To perform blob extraction, we use the diffusion property based on the graph Laplacian, which does not require an explicit closeness measure. Since diffusion is a multi-scale process, we take advantage of this property to define a decomposition of the affinity matrix. This decomposition is related to generalized additive models, which provide a theoretical basis [37, 14].
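For comparison with what follows, the hard separation (5) can be sketched as below for points that do have coordinates. The point cloud, the bandwidth and the radii are hypothetical, and half-open distance bands are used (an assumption) so that every pair of nodes falls into exactly one band.

```python
import numpy as np

rng = np.random.default_rng(0)
v = rng.uniform(0.0, 10.0, size=(30, 2))    # hypothetical 2D node coordinates
sigma_v = 3.0                               # hypothetical Gaussian bandwidth

# Pairwise distances and Gaussian weights w(i,j) = exp(-||v_i - v_j||^2 / sigma_v^2)
dist = np.linalg.norm(v[:, None, :] - v[None, :, :], axis=-1)
W = np.exp(-dist**2 / sigma_v**2)
np.fill_diagonal(W, 0.0)                    # no self-loops

# Hard split (5): W_s keeps the weights whose distance lies in (delta_{s-1}, delta_s]
deltas = [0.0, 2.0, 5.0, np.inf]            # hypothetical separation radii
W_bands = [np.where((dist > lo) & (dist <= hi), W, 0.0)
           for lo, hi in zip(deltas[:-1], deltas[1:])]
```

The bands have disjoint supports and sum back to W. This construction needs the distances explicitly, which is exactly what is unavailable for a graph without coordinates.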

1.2 Blob detection

On a regular mesh

In the image analysis domain, when G is simply a regular grid embedded in $\mathbb{R}^2$, the problem of salient area detection has received much attention, in particular for blob detection [24], as illustrated in Fig. 3. In this figure, the detected blobs are localized by squares whose size (scale) is adapted to the width of the blobs. A blob is regarded as a spot, and a simple model is given by the Gaussian profile [29]. In this case, we have the following result, on which scale-space blob detection is based. Consider the simple image $x = \{x_i, i \in V\}$ representing a Gaussian spot characterized by a width parameter $\lambda_0$ and centered at a point $v_0$ on the grid: $x_i \propto \exp(-\|v_i - v_0\|^2/2\lambda_0)$ for every $i \in V$. Consider a smoothed version $\tilde{x}^{(\lambda)}$ of the image obtained by convolution with a Gaussian kernel $G_\lambda$:

\[ \tilde{x}^{(\lambda)}_i = \sum_{i'} x_{i'}\, G_\lambda(i, i') = G_\lambda(i, \cdot)\, x, \qquad G_\lambda(i, i') \propto \exp(-\|v_i - v_{i'}\|^2/2\lambda). \tag{6} \]

[24] gives the following property, which the spot center $i_0$ satisfies:

\[ \left. \frac{d}{d\lambda}\, \lambda\, [\Delta \tilde{x}^{(\lambda)}]_{i_0} \right|_{\lambda = \lambda_0} = 0, \tag{7} \]

where $\Delta$ denotes the discretized Laplacian operator on the grid. In the image processing literature, $\Delta G_\lambda$, called the Laplacian of Gaussian, is used for multi-resolution representations [33]. Laplacians of Gaussian have mathematical properties that have been widely studied in the scale-space community. Equation (7) tells us that the derivative of $\lambda \Delta G_\lambda$ is able to select the width $\lambda_0$ of the Gaussian spot. Essentially, $\lambda \Delta$ quantifies, in some sense, a curvature of the smoothed spot, and this curvature is optimal when $\lambda = \lambda_0$. This property is used to detect blobs in images: the detected blob centers are the local extrema of the discretized scale-space volume

\[ \{\lambda\, [\Delta \tilde{x}^{(\lambda)}]_i,\ i \in V,\ \lambda \in \Lambda\}, \tag{8} \]

where $\Lambda$ is a finite set of scales corresponding to an increasingly coarse sub-sampling of the regular grid. For every detected blob, the optimization of (8) returns a scale, which characterizes the width of the blob.

From mesh to graph non embedded in a metric space

In this paper, module refers to the extension of the blob concept to graphs. As a first step, this extension is straightforward since (6) is the solution of the heat equation on $\mathbb{Z}^2$, whose extension to graphs is well known [20]. However, extending blob detection to non-geometric graphs requires some modifications with respect to scale and space. While for a mesh it is natural to choose $\Lambda$ from a sub-sampling of the grid ([12], Chap. 10), for a non-geometric graph this choice is much less trivial, since the relevant scales are irregularly spread in $\mathbb{R}^+$ and the scale has no explicit dimension. To this end, the multi-scale representation $\{\tilde{x}^{(\lambda)}, \lambda \in \Lambda\}$ must be revisited in order to get a sparse representation, denoted $\{x^{(\lambda)}, \lambda \in \Lambda\}$, yielding a non-redundant decomposition of x in terms of reconstruction: $\sum_\lambda x^{(\lambda)} = x$, a property that $\tilde{x}^{(\lambda)}$ does not satisfy. This non-redundancy property is necessary for the identification of the right scales.
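The grid-based selection rule (7)-(8), which the rest of the paper transposes to graphs, can be checked numerically on a synthetic Gaussian spot. The sketch below relies on `scipy.ndimage.gaussian_filter` and `scipy.ndimage.laplace`; the grid size, the spot width and the candidate scale set are hypothetical choices made for illustration.

```python
import numpy as np
from scipy import ndimage

size, center = 129, 64
lambda0 = 16.0                                   # true width (variance) of the spot
yy, xx = np.mgrid[0:size, 0:size]
r2 = (yy - center) ** 2 + (xx - center) ** 2
x = np.exp(-r2 / (2.0 * lambda0))                # Gaussian spot centred on the grid

scales = [2.0, 4.0, 8.0, 16.0, 32.0, 64.0]       # hypothetical candidate set Lambda
response = []
for lam in scales:
    # Gaussian smoothing with variance lam, i.e. standard deviation sqrt(lam)
    smoothed = ndimage.gaussian_filter(x, sigma=np.sqrt(lam))
    lap = ndimage.laplace(smoothed)              # discrete 5-point Laplacian
    response.append(lam * lap[center, center])   # scale-normalised value at the centre

# The extremum over the scales of the normalised response selects the spot width
best_scale = scales[int(np.argmax(np.abs(response)))]
```

For a continuous Gaussian spot, the normalised response at the centre is proportional to $-\lambda/(\lambda_0 + \lambda)^2$, whose extremum is reached at $\lambda = \lambda_0$; the discrete computation recovers the same scale among the candidates.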

1.3 Module, semantic module and related works

In image analysis, module detection is used first for extracting areas of interest without using any strong prior information. These areas are then interpreted with greater precision or extended using high-level information in order to obtain semantic modules, as in object recognition [8]. This remark also holds in systems biology, where semantic modules correspond to biological modules (see [17, 9] among many others). The definition of biological modules does not rely solely on areas and profiles, but also uses complex biological knowledge. In several papers, the detection of biological modules operates in two stages: first, detection of module seeds, or more simply modules, and second, refinement of the detected modules to finally obtain meaningful biological modules [35].

We comment on some of the main approaches to module detection introduced in the bioinformatics literature. Given a scoring function that measures the importance of every sub-network, finding the maximal-scoring connected subgraph is an NP-hard problem. In the seminal work [17], the main limitation is that node scores are treated independently, since the sub-network score is calculated as a sum of the node scores. To overcome this limitation, [9] proposes an inverse problem approach in which the node scores are modeled by a hidden Markov random field under a regularity constraint expressed by a bending energy as in (4). Two major well-known drawbacks are inherent to this approach [5]: the data-driven determination of the regularity scale (the trade-off parameter), and the energy minimization, which requires stochastic optimization, a difficult computational task already encountered in [17]. But conceptually, the main limitation of the Markovian model is that it is mono-scale, which is not suitable when the size of the modules varies.

Instead of using the bending energy at a single scale, we propose to use it within a multi-scale formulation in order to adapt the scale to the module sizes. Technically, the advantage of this approach is twofold. First, the set of relevant scales can be estimated efficiently from the data. Second, we avoid the huge computational burden of stochastic optimization. The computation is limited to scanning a multi-scale representation of type (6) and searching for the differential local extrema, as is done for blob detection on a grid $\mathcal{L}$.

2 Models and Method

2.1 Random Field and Diffusion Process

This section summarizes a set of fundamental results on graph Laplacians and diffusion kernels. Consider a random field $X = (X_1, ..., X_n)$ observed on an undirected graph G = (V, E), where V denotes the node set and E the edges connecting them. The dependency structure between the random variables $\{X_i\}$ depends on the topological structure given by E. This dependency structure is here limited to a covariance structure modeled by a diffusion kernel [25], a choice explored in many domains, especially pattern recognition, biological network analysis and image processing [2, 32, 39]. We seek to represent X by a random field model on G, denoted $Y(\lambda)$, whose covariance structure depends on a scale parameter $\lambda > 0$. Essentially, this model is obtained by equating the variations due to a change of scale with the spatial variations, as follows:

\[ Y_i(\lambda + d\lambda) - Y_i(\lambda) \doteq d\lambda \sum_{j \in V:\ j \sim i} (Y_j(\lambda) - Y_i(\lambda)), \tag{9} \]

and in vector form:

\[ Y(\lambda + d\lambda) - Y(\lambda) = -d\lambda\, L\, Y(\lambda), \qquad L = D - A, \tag{10} \]

where the graph Laplacian L is defined in (3). Equation (10) is the discretized version on G of the classical heat differential equation¹:

\[ \frac{d}{d\lambda} Y(\lambda) = -L\, Y(\lambda), \qquad Y(0) = X, \tag{11} \]

whose solution is

\[ Y(\lambda) = K_\lambda X, \tag{12} \]

\[ K_\lambda = e^{-\lambda L}. \tag{13} \]

$K_\lambda$ is a matrix exponential, defined by $e^M = \sum_{i=0}^{\infty} M^i / i!$. For every node, one has

\[ Y_i(\lambda) = \sum_{j \in V} K_\lambda(i, j) X_j = K_\lambda(i, \cdot)\, X, \tag{14} \]

which is the generalization of (6). Since the exponential of a symmetric matrix is a positive semi-definite matrix, the matrix $K_\lambda$, called the diffusion kernel, can be used as a covariance matrix for modeling the covariance between the random variables $\{X_i\}$. The larger $\lambda$ is, the larger the off-diagonal effects in $K_\lambda$. $\lambda$ is interpreted as a scale parameter and $Y_i(\lambda)$ as a scale-space random field on $V \times \mathbb{R}^+$. By nature, the diffusion kernel has a multi-scale property that is well identified, especially for dimensionality reduction applications [23]. However, the choice of its scale parameter $\lambda$ remains a difficulty [11]. For small $\lambda$, $K_\lambda(i, i)$ reflects local properties of G around node i, while for large $\lambda$ it captures some global structures. For instance, in the geometry processing field, the diagonal term $K_\lambda(i, i)$ has been used as a shape descriptor [31] by considering that, for every $\lambda$, the local spatial extrema of this function provide a feature-based scale-space representation of shapes, useful for shape matching.
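The relation between the incremental form (10) and the closed-form solution (12)-(13) can be illustrated as follows. This is a minimal sketch on a hypothetical ring graph, using `scipy.linalg.expm` for the matrix exponential and a plain explicit Euler iteration for (10); the step count is an arbitrary choice.

```python
import numpy as np
from scipy.linalg import expm

def diffusion_kernel(L, lam):
    """K_lambda = exp(-lambda L), equation (13)."""
    return expm(-lam * L)

# Hypothetical toy graph: a ring of 6 nodes
n = 6
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

x = np.array([3.0, 0.0, 0.0, 0.0, 0.0, 0.0])   # all the activity on node 0
lam = 1.5
K = diffusion_kernel(L, lam)
y = K @ x                                      # Y(lambda) = K_lambda X, equation (12)

# Explicit Euler iteration of (10) with a small step d_lambda
steps = 5000
d_lam = lam / steps
y_euler = x.copy()
for _ in range(steps):
    y_euler = y_euler - d_lam * (L @ y_euler)
```

The kernel is symmetric with strictly positive eigenvalues, its rows sum to one (so the total activity is preserved), and the Euler iterate approaches `K @ x` as the step shrinks.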


2.2 Graphical Modeling

The outstanding issue at the end of the previous modeling step is the choice of $\lambda$. In other words, which scale $\lambda$ is the most representative of the observed profile x? In fact, several scales may explain this profile. Therefore, a natural approach consists of decomposing X into r independent random fields according to a discrete set of relevant scales $\Lambda = \{\lambda_1 < ... < \lambda_r\}$:

\[ X = \sum_{j=1}^{r} X^{(j)} + X^{(0)}, \tag{15} \]

where $X^{(j)}$ denotes the random field at scale $\lambda_j$ and $X^{(0)}$ a residual [16, 34]. Following the idea of Fourier decomposition, for every profile x, the $\{x^{(j)}\}_{j=1}^{r}$ can be seen as frequency components of x, from high to low frequencies. The decomposition (15) is related to additive spline models, whose theoretical foundation can be traced back to [37], Chap. 10 (see also [14]), and which were later reintroduced under the name of multiple kernel in the machine learning community [21]. Note that $X^{(j)}$ does not match $Y(\lambda_j)$ in (12), since the sum $\sum_j Y(\lambda_j)$ over a given set of scales does not reconstruct X.

In our approach, the covariance matrix $\mathrm{Cov}(X^{(j)})$ of every component is modeled from the diffusion kernel (13). So we use r kernels $\{K_{\lambda_1}, ..., K_{\lambda_r}\}$, denoted $\{K_1, ..., K_r\}$, such that $\mathrm{Cov}(X^{(j)}) = K_j = \sigma_j^2 \kappa_j$, where $\kappa_j$ is given by (13) at scale $\lambda_j$. Due to the independence of the components, the covariance matrix $\mathrm{Cov}(X)$ is the following multi-scale diffusion kernel:

\[ \bar{K}_{\sigma,\Lambda} = \sum_{j=0}^{r} K_j = \sum_{j=0}^{r} \sigma_j^2 \kappa_j = \sigma_0^2 I_n + \sum_{j=1}^{r} \sigma_j^2 e^{-\lambda_j L}. \tag{16} \]

Each kernel $\kappa_j$ is weighted by a positive parameter $\sigma_j^2$, which is larger the more the scale $\lambda_j$ contributes to the random field X. The covariance matrix $K_0$ is that of a white noise. As said above, the larger $\lambda_j$ is, the larger the off-diagonal effects in $K_j$. In other words, when $\lambda_j$ increases, the components $\{X^{(j)}\}$ are increasingly smooth. The passage from $X^{(j)}$ to $X^{(j+1)}$ implies that some details in $X^{(j)}$ are attenuated. If we assume that the dependency structure of the random variables $\{X_i\}$ is uniquely described by its kernel, then it is legitimate to consider that X is distributed according to the Gaussian law

\[ X \sim N(0, \bar{K}_{\sigma,\Lambda}). \tag{17} \]

The scales $\{\lambda_j\}_{j=0}^{r}$ and their associated weights $\sigma \doteq \{\sigma_j^2\}_{j=0}^{r}$ are unknown parameters that are estimated using the maximum likelihood principle². Although the theoretical mean of X is zero, the empirical mean of each observed subprofile $x_{M_k}$ is not necessarily zero, as for instance in Fig. 9. This is due to high scales, which create long-range correlations or, in other words, low frequencies. Understanding the diffusion kernel is not a trivial task; it requires graph spectral theory [25]. Note also that the choice of the diffusion kernel as covariance matrix arises as a necessity because we have only one observation of X. If many observations were available, the covariance matrix could be estimated from them. In comparison with the heuristic decomposition (5), which uses a hard multi-scale separation of the weights, the multi-scale representation (16) appears as a soft decomposition based on the overall structure of the graph via L, and moreover allows statistical estimation of the contribution of each component.

¹ In the classic case of diffusion in $\mathbb{R}^2$, $\lambda$ is a time parameter.
² For notational convenience, we introduce $\lambda_0 = 0$, which is associated with $K_0$.
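To fix ideas, the multi-scale kernel (16) can be assembled and sampled as in the sketch below. The path graph, the scales and the weights are hypothetical; drawing the sample through the eigendecomposition of the covariance is one standard way of simulating a centered Gaussian vector and is used here as an illustrative choice.

```python
import numpy as np
from scipy.linalg import expm

def multiscale_kernel(L, scales, sigma2, sigma2_noise):
    """Equation (16): sigma_0^2 I_n + sum_j sigma_j^2 exp(-lambda_j L)."""
    n = L.shape[0]
    K_bar = sigma2_noise * np.eye(n)
    for lam, s2 in zip(scales, sigma2):
        K_bar = K_bar + s2 * expm(-lam * L)
    return K_bar

def sample_field(K_bar, rng):
    """Draw x ~ N(0, K_bar) using the eigendecomposition K_bar = V D V^T."""
    d, V = np.linalg.eigh(K_bar)
    d = np.clip(d, 0.0, None)                  # guard against tiny negative round-off
    alpha = rng.standard_normal(K_bar.shape[0])
    return V @ (np.sqrt(d) * alpha)

# Hypothetical graph: a path of 20 nodes
n = 20
A = np.diag(np.ones(n - 1), 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A

K_bar = multiscale_kernel(L, scales=[1.0, 5.0, 20.0],
                          sigma2=[1.0, 1.0, 1.0], sigma2_noise=0.01)
x = sample_field(K_bar, np.random.default_rng(0))
```

Since every diffusion kernel is positive semi-definite, the smallest eigenvalue of the sum is bounded below by the noise variance, which keeps the covariance invertible.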

• Weight estimation. For a given $\Lambda$, let $\ell(\sigma|\Lambda) = \log(p_{\sigma,\Lambda}(x))$ be the log-likelihood of $\sigma$, where $p_{\sigma,\Lambda}$ denotes the probability density of x. Given an observation x and the Gaussian law $N(0, \bar{K}_{\sigma,\Lambda})$, the log-likelihood is, up to a factor 1/2 that does not change the maximizer,

\[ \ell(\sigma|\Lambda) = -\log|\bar{K}_{\sigma,\Lambda}| - x^\top \bar{K}_{\sigma,\Lambda}^{-1} x + C^{te}, \tag{18} \]

where $C^{te}$ denotes a constant term. The maximum likelihood estimate is computed under the constraint of positivity of the parameters $\sigma$:

\[ \hat{\sigma}(\Lambda) = \arg\max_{\sigma}\ \ell(\sigma|\Lambda) \quad \text{under the constraint } \sigma > 0. \tag{19} \]

For moderate sizes of n, non-linear programming algorithms using gradient descent techniques are operational. For larger dimensions, the computation of the determinant $|\bar{K}_{\sigma,\Lambda}|$ and of the inverse $\bar{K}_{\sigma,\Lambda}^{-1}$ becomes more difficult [18]. To reduce the amount of computation, one might also wonder whether it would be possible to remove the term $\log|\bar{K}_{\sigma,\Lambda}|$ from the likelihood, in order to work only with the generalized least-squares term $x^\top \bar{K}_{\sigma,\Lambda}^{-1} x$. Theoretically, we know that this estimate is not statistically consistent [13]. Our experiments have confirmed this defect by showing severe aberrations in the multi-scale decompositions (cf. Section 3.1).

• Scale estimation. A procedure for selecting the set $\Lambda$ is now required. Given a uniform discretization $\Lambda^0$ of the scale domain in $\mathbb{R}$, the scale selection procedure estimates a subset $\Lambda$ of scales irregularly distributed in $\Lambda^0$, which explains the profile of x according to a given criterion:

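\[ \Lambda^0 = \{\lambda^0_j = j\delta,\ j = 1, ..., r^0\} \rightsquigarrow \Lambda = \{\lambda^0_{j_1}, ..., \lambda^0_{j_r}\}, \]

where $\delta$ is the discretization stepsize. First, the estimate $\hat{\sigma}(\Lambda^0)$ is computed according to (19). To determine r, we perform a diagonalization of the covariance matrix

\[ \tilde{K} = \sum_{j=1}^{r^0} \hat{\sigma}_j^2(\Lambda^0)\, e^{-\lambda^0_j L}, \tag{20} \]

of which we retain only the r largest eigenvalues $\nu_1 \geq ... \geq \nu_r$ according to the criterion

\[ \frac{\sum_{i=1}^{r} \nu_i}{\sum_{i=1}^{r^0} \nu_i} = 1 - \epsilon, \tag{21} \]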

where $\epsilon$ is a positive parameter chosen close to 0, typically $\epsilon = 0.01$ or 0.025. This criterion is related to that used in Principal Component Analysis [15]. It means that the dispersion of X can be approximately represented by r linearly independent components with an information loss determined by $\epsilon$. Finally, from the estimated r, we can select the relevant scales $\Lambda$. These scales are associated with the r largest $\hat{\sigma}_j^2(\Lambda^0)$, i.e. the scales whose components are the most involved in the dispersion of X. These scales are denoted $\{\lambda^0_{j_1}, ..., \lambda^0_{j_r}\}$ or more simply $\{\lambda_1, ..., \lambda_r\}$. As a consequence of (21), the estimates $\hat{\sigma}_j^2$ associated with the scales in $\Lambda^0 \setminus \Lambda$ are much lower than those in $\Lambda$, and even close to 0. This selection achieves a pruning of non-significant scales.

• Statistical Multi-scale Decomposition. This task concerns the estimation of the r components $X^{(j)}$ of the multi-scale decomposition of X. Component estimation is closely linked to the scale selection problem. Denote by $\tilde{K} U = U D_\nu$ the spectral equation of the previous diagonalization, where $D_\nu$ is the diagonal matrix of the r largest eigenvalues of $\tilde{K}$. Because of (21), assume that the eigenvalues $\Lambda^0 \setminus \Lambda$ are approximately equal to zero. This is especially true when $\epsilon$ is very small. Beyond $r^0$, the eigenvalues $\nu_{r^0+1} \geq ... \geq \nu_n$ are smaller and we can consider that they are all close to 0. In this case, one can write $\tilde{X} = U B$, where B are the coordinates of $\tilde{X}$ on the eigenvectors U. To estimate the scale components, we take into account the importance of each eigenvalue by using a Bayesian estimation with a prior distribution related to these eigenvalues.

Proposition 1 Given an observation x and the prior distribution $B \sim N(0, D_\nu)$, the Bayesian estimation provides the scale components:

\[ \hat{x}^{(j)} = K_j\, U D_\nu^{-1}\, \hat{b}, \qquad \forall j = 1, ..., r, \tag{22} \]

\[ \hat{b} = (\sigma_0^2 D_\nu^{-1} + I_r)^{-1}\, U^\top x. \tag{23} \]

The proof is given in Appendix 5.1. In this proof, if we replace the spectral equation relative to $\Lambda$ by the equation relative to $\Lambda^0$, $\tilde{K} U^0 = U^0 D^0_\nu$, then Proposition 1 is still valid. This is useful when we do not assume that all eigenvalues $\Lambda^0 \setminus \Lambda$ are negligible. In this case, the Bayesian estimation is more justified because of the large difference between the values of $\Lambda$ and $\Lambda^0 \setminus \Lambda$.
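The three steps above (weight estimation, scale selection and decomposition) can be chained as in the following sketch. It is an illustration under stated assumptions, not the author's implementation: the bounded L-BFGS-B search of `scipy.optimize.minimize` with a small positive floor stands in for the constrained problem (19), the rank r is taken as the smallest value reaching the fraction $1 - \epsilon$ in (21), the spectral equation is taken relative to the selected scales, and the graph, the scales and all function names are hypothetical.

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import minimize

def neg_log_lik(sigma2, kernels, x):
    """Negative of (18) for K_bar = sigma2[0] I + sum_j sigma2[j] kappa_j."""
    K_bar = sigma2[0] * np.eye(len(x))
    for s2, kappa in zip(sigma2[1:], kernels):
        K_bar = K_bar + s2 * kappa
    _, logdet = np.linalg.slogdet(K_bar)
    return logdet + x @ np.linalg.solve(K_bar, x)

def estimate_weights(kernels, x, floor=1e-6):
    """Bounded quasi-Newton search standing in for the constrained problem (19)."""
    start = np.full(len(kernels) + 1, x.var() / (len(kernels) + 1))
    res = minimize(neg_log_lik, start, args=(kernels, x),
                   method="L-BFGS-B", bounds=[(floor, None)] * len(start))
    return res.x

def select_rank(K_tilde, r0, eps):
    """Smallest r whose leading eigenvalues reach the fraction 1 - eps in (21)."""
    nu = np.sort(np.linalg.eigvalsh(K_tilde))[::-1][:r0]
    ratio = np.cumsum(nu) / nu.sum()
    return int(np.searchsorted(ratio, 1.0 - eps) + 1)

def scale_components(K_list, sigma2_noise, x, r):
    """Equations (22)-(23) for the kernels K_j = sigma_j^2 kappa_j of the kept scales."""
    nu, U = np.linalg.eigh(sum(K_list))
    nu, U = nu[::-1][:r], U[:, ::-1][:, :r]          # r largest eigenpairs
    b_hat = (U.T @ x) / (sigma2_noise / nu + 1.0)    # (23) is a diagonal system
    return [K_j @ U @ (b_hat / nu) for K_j in K_list], U @ b_hat

# Hypothetical set-up: path graph, candidate scales Lambda^0 = {2, 4, ..., 12}
n = 25
A = np.diag(np.ones(n - 1), 1)
A = A + A.T
L = np.diag(A.sum(axis=1)) - A
scales0 = [2.0 * j for j in range(1, 7)]
kernels0 = [expm(-lam * L) for lam in scales0]

rng = np.random.default_rng(1)
x = rng.multivariate_normal(np.zeros(n), 0.05 * np.eye(n) + kernels0[1] + kernels0[4])

sigma2_hat = estimate_weights(kernels0, x)                         # [noise, scale weights]
K_tilde = sum(s2 * k for s2, k in zip(sigma2_hat[1:], kernels0))   # equation (20)
r = select_rank(K_tilde, r0=len(scales0), eps=0.01)
kept = np.sort(np.argsort(sigma2_hat[1:])[::-1][:r])               # r largest weights
components, signal = scale_components(
    [sigma2_hat[1 + j] * kernels0[j] for j in kept], sigma2_hat[0], x, r)
```

By construction, the estimated components add up to the projected signal $U \hat{b}$, which is the non-redundancy property required of the decomposition. For large n, the determinant and the linear solve in `neg_log_lik` dominate the cost, in line with the remark following (19).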

2.3 Module Detection

Given an undirected graph G and an observation x of the random field, we first compute the estimated scales $\{\lambda_j, j = 1, ..., r\}$ and the associated decomposition (22), $\{x^{(j)}, j = 1, ..., r\}$, as presented in the previous section. Rather than considering the components directly, we consider their spatial variations with respect to the graph Laplacian L:

\[ (L x^{(j)})_v = \sum_{i:\ i \sim v} (x_v^{(j)} - x_i^{(j)}) = d_v\, x_v^{(j)} - \sum_{i:\ i \sim v} x_i^{(j)}, \qquad \forall v \in V. \tag{24} \]

This specifies the regularity of each component. $(L x^{(j)})_v$ is all the larger positively (resp. negatively) as the expression $x_v^{(j)}$ of node v is strongly increasing (resp. decreasing) with respect to its neighbors. Therefore, we look for the nodes that are most differentially expressed with respect to L, by examining the expression of the components at different scales. Since the amplitude of the variations of $L x^{(j)}$ decreases when the scale increases, a specific normalization is required. As in the case of blob detection (8) on a lattice $\mathcal{L}$, an efficient normalization is $\lambda_j L x^{(j)}$. A scale $\lambda_j$ for which $\lambda_j (L x^{(j)})_v$ is a local extremum with respect to scale and space is seen as reflecting a module at position v and scale $\lambda_j$. This implies the following procedure. For any node $v \in V$, we denote by $N_v^k \subset V$ the relative nodes of v of order k ($k = 1, ..., \kappa$): k = 1 means the nearest neighbors (NN), k = 2 means the nearest neighbors of v to which their NN are added, etc. The module detection consists in searching for local optima of the components with respect to the neighborhoods, as follows³:

\[ \forall v \in V:\ \ (j(v), k(v), v'(v)) = \operatorname*{arg\,opt}_{j,\ k,\ v' \in N_v^k} \lambda_j (L x^{(j)})_{v'}; \quad \text{if } v'(v) = v, \text{ then } v \text{ is a module center at scale } \lambda_{j(v)}. \tag{25} \]

When a module center is detected at v, its area $M_v$ is defined by the subgraph $\{v, N_v^{k(v)}\}$. In the next section, we denote by $\mathring{V}$ the nodes corresponding to the detected module centers, and therefore the set of detected areas is written as

\[ \{M_v,\ v \in \mathring{V}\}. \tag{26} \]
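One possible reading of rule (25) is sketched below. Several points are assumptions made for this illustration: the optimum is taken as the largest absolute normalised response, a node must strictly dominate its neighbourhood to be a center, and k(v) is taken as the largest order (up to $\kappa$) for which the node still dominates. The ring graph, the two components and the function names are hypothetical.

```python
import numpy as np

def neighbourhood(A, v, k):
    """Nodes within k hops of v (v included), i.e. {v} together with N_v^k."""
    reached = np.zeros(A.shape[0], dtype=bool)
    reached[v] = True
    for _ in range(k):
        reached = reached | (A[reached].sum(axis=0) > 0)
    return np.flatnonzero(reached)

def detect_modules(A, components, scales, kappa):
    """Return {v: (scale, k, area)} for the nodes that are scale-space extrema."""
    L = np.diag(A.sum(axis=1)) - A
    # Row j holds the normalised response lambda_j * (L x^(j)) at every node
    R = np.array([lam * (L @ xj) for lam, xj in zip(scales, components)])
    modules = {}
    for v in range(A.shape[0]):
        j_best = int(np.argmax(np.abs(R[:, v])))     # best scale at node v
        k_best = 0
        for k in range(1, kappa + 1):
            area = neighbourhood(A, v, k)
            others = area[area != v]
            if np.abs(R[j_best, v]) <= np.abs(R[:, others]).max():
                break                                # v does not dominate this area
            k_best = k
        if k_best > 0:
            modules[v] = (scales[j_best], k_best, neighbourhood(A, v, k_best))
    return modules

# Hypothetical ring of 10 nodes with a sharp and a broad component around node 4
n = 10
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
x_fine = np.array([0, 0, 0, 1, 4, 1, 0, 0, 0, 0], dtype=float)
x_coarse = np.array([0, 0, 1, 2, 2.5, 2, 1, 0, 0, 0], dtype=float)

modules = detect_modules(A, [x_fine, x_coarse], scales=[1.0, 3.0], kappa=2)
scale_4, order_4, area_4 = modules[4]
```

On this example, node 4 is detected at the fine scale with an area covering its second-order neighbourhood, while its immediate neighbours are rejected because node 4 dominates them. Nodes on the flanks of the broad component can also be reported, as negative extrema of the response.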

3 Experiments

Recall that a module is an active subgraph denoted $x_M$, where M is a subgraph of G. Regular lattices $\mathcal{L}$ do not show particular structures such as stars or clusters, unlike irregular graphs G. Let us give an example of a graph showing a particular structure. The graph is organized around known subgraphs $\{R_k = (V_k, E_k)\}$ called regulons (or hubs). A regulon is a set of nodes $V_k \subset V$ connected to one or several common nodes, called regulators. A regulon can be connected to several regulators, and a regulator can be connected to several regulons. Fig. 5 shows a graph with four regulons and four regulators. In practice, such regulons can be used a posteriori for interpreting the detected modules or inferring semantic modules. Depending on the profile of x, the area M of a module can be simply a regulon, a sub-regulon or the union of several regulons. Fig. 6 shows a short time series {x(1), x(2), x(3)} of a random field X observed on the graph of Fig. 5. The colors depict the output of the scale-space module detection performed on every x(t). Successively, 3, 3 and 2 modules were detected, while the graph is composed of 4 regulons.

3.1 Evaluation on Simulated Data

3.1.1 Simulation Procedure

For phenomena of high complexity, simulated data are an important preliminary support for modeling when we do not have data with sufficient knowledge of the "ground truth". The simulation of the random field X requires specifying the ground truth, consisting of a graph G = (V, E) and the parameters $(\Lambda, \sigma)$. In our procedure, G is organized in regulons: $G = \bigoplus_{k=1}^{m} R_k$. Here, we assume for simplicity that each regulon $R_k$ is associated with only one regulator $r_k$. The symbol + in $\bigoplus$ indicates that the regulons are mutually connected. This high-level connection is equivalent to a graph $G_B$ between the regulators: $G_B = (\{r_k\}, E_r)$. The simulation is done in two steps. First, a graph G consisting of m regulons is simulated as described in Appendix 5.2. Second, a sample x of $X \sim N(0, \bar{K}_{\Lambda,\sigma})$ is drawn. To do that, we simply simulate $\alpha \sim N(0, I_n)$, since the diagonalization $\bar{K}_{\Lambda,\sigma} = V D V^\top$ implies $V D^{1/2} \alpha \sim N(0, \bar{K}_{\Lambda,\sigma})$.

³ $v'(v)$ also depends on k, which is omitted to simplify the notation.

3.1.2 Simulated Data

Fig. 8(a) shows the inter-regulon graph $G_B$ and Fig. 8(b) the graph G. Each regulon has its own color. Fig. 8(c) displays an observation x of the random field X on G, and Fig. 9 shows its 1-D profile. In this experiment, each regulon is a potential module since the simulation procedure is based on a regulon structure. x was simulated using the multi-scale kernel (16) with 3 scales: $\lambda_1 = 8$, $\lambda_2 = 14$, $\lambda_3 = 24$ and $\sigma_1 = \sigma_2 = \sigma_3 = 1$. Although the theoretical mean of X is zero, the mean of each observed regulon in Fig. 9 is not zero. However, due to the correlation between regulators, two regulons may have similar mean levels. This situation is favorable to the concept of module, and is consistent with Fig. 6, in which 3 modules are detected for 4 regulons. Note that in the absence of an observation x, spectral partitioning, as recalled in the Introduction, detects 4 modules corresponding to the 4 regulons.

3.1.3 Data Analysis

The estimation and detection tasks are illustrated in Figs. 8, 9, 10 and 11. The maximum likelihood estimation (19) was performed using the scale domain $\Lambda_0 = \{2k,\ k = 0, 1, \dots, 15\}$. Fig. 10 displays the statistical multi-scale decomposition. The continuous black line connecting the data points is the sum $\sum_{k=1}^{15} \hat{x}^{(2k)}$ of all the components except the noise component $\hat{x}^{(0)}$. Since this line interpolates the data points, the estimated noise component is very low. The selected $\Lambda$ is computed from this decomposition using (21) with its parameter set to 0.1. The three main components, associated with $\{\lambda_1 = 8, \lambda_2 = 10, \lambda_3 = 30\}$, are shown in Fig. 9. It is interesting to compare this statistical decomposition with the ordinary scale-space decomposition (12) shown in Fig. 11. The statistical decomposition focuses more clearly on the spectral content of $x$: $\lambda_3 = 30$ reflects low frequencies whereas $\lambda_2 = 10$ contributes to high frequencies. But above all, it remedies the redundancy of the ordinary scale-space representation, and therefore favors the identification of the right scales. The detected modules shown in Fig. 8(d) correspond to the rule (25). In Fig. 9, the locations of the detected extrema are indicated by red circles. There is exactly one detected module per regulon.

This procedure is statistically assessed by Monte Carlo simulation. The random field $X$ is simulated 200 times under the same conditions as above. From the obtained samples $\{x(\ell),\ \ell = 1, \dots, 200\}$, the probabilities of the number of selected scales $\{P(|\Lambda| = k)\}_{k=1}^{8}$ were estimated with the selection parameter set to 0.01, as well as the probabilities of the number of detected module centers $\{P(|\mathring{V}| = k)\}_{k=1}^{5}$, as shown in Fig. 1. The number of detected modules is random, with a main mode at $|\mathring{V}| = 5$. In fact, as noted above, the mean levels of two regulons may be substantially close and therefore be recognized as belonging to the same module if they are connected.

Figure 1: (a) Histogram of the empirical probabilities $\{P(|\Lambda| = k)\}_{k=1}^{8}$ of the number of selected scales. (b) Histogram of the empirical probabilities $\{P(|\mathring{V}| = k)\}_{k=1}^{5}$ of the number of detected modules.

Above, we have mentioned the prominent role of the term $\log|\bar{K}_{\sigma,\Lambda}|$ in the likelihood. This is confirmed experimentally. Without this term, the estimation-detection procedure was repeated on the same data as previously. The results are shown in Fig. 12. The multi-scale decomposition is then quite inaccurate. All weights are very high and the scale components are close to zero.
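The effect of dropping the log-determinant term can be reproduced on a deliberately simplified model. The sketch below is not the paper's estimator: it assumes a single weight `sigma2` scaling a hypothetical covariance `K0` (both names are illustrative), and compares the full Gaussian criterion with the quadratic term alone. Without the log-determinant, the criterion decreases monotonically in the weight, so the "best" weight runs to the upper end of the search grid.

```python
import numpy as np

def criterion(sigma2, x, K0, with_logdet=True):
    """-2 * Gaussian log-likelihood (up to a constant) for x ~ N(0, sigma2 * K0)."""
    quad = x @ np.linalg.solve(K0, x) / sigma2
    if not with_logdet:
        return quad
    _, logdet_K0 = np.linalg.slogdet(K0)
    # log|sigma2 * K0| = n * log(sigma2) + log|K0|
    return x.size * np.log(sigma2) + logdet_K0 + quad

rng = np.random.default_rng(0)
n = 50
A = rng.standard_normal((n, n))
K0 = A @ A.T / n + np.eye(n)            # hypothetical SPD base covariance
true_sigma2 = 2.0
x = np.linalg.cholesky(true_sigma2 * K0) @ rng.standard_normal(n)

grid = np.logspace(-2, 4, 200)          # candidate weights
full = np.array([criterion(s, x, K0) for s in grid])
quad_only = np.array([criterion(s, x, K0, with_logdet=False) for s in grid])

best_full = grid[np.argmin(full)]       # finite minimizer
best_quad = grid[np.argmin(quad_only)]  # pushed to the largest weight on the grid
closed_form = x @ np.linalg.solve(K0, x) / n  # analytic minimizer of the full criterion

print(f"full criterion: {best_full:.3g} (closed form {closed_form:.3g})")
print(f"quadratic term only: {best_quad:.3g} (grid maximum {grid[-1]:.3g})")
```

In this toy setting the log-determinant is the only term that penalizes large weights, which is in line with the very high weights reported above when it is omitted.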

3.2 Bacillus Subtilis Data

Fig. 4 illustrates the multi-scale decomposition of a field $x$ that represents gene expressions of Bacillus subtilis. The underlying graph $G = (V, E)$ comes from the regulatory network of the bacterium. $V$ denotes genes, $E$ connections between genes and $x$ gene expressions on $V$ (Fig. 4(a); see footnote 4). The entire graph contains 1607 genes, 2345 edges and 132 regulons. Fig. 5 displays four connected regulons extracted from this network. In steady state, gene expressions are assumed to be governed by the model (17). In Fig. 4 we see the structuring effects of the method in terms of gene grouping, as had already been shown for other regulatory networks [11].

In many applications, one is interested in studying the change of modules across different conditions [22]. In our example, the expression of the regulons depends on the nutritional environment of the bacteria over time: some of them are over-expressed and others are under-expressed. With module detection, we seek to identify regions of the graph that are particularly expressed through time. Fig. 6 illustrates this detection on a short time series of the random field observed at 3 time points. The detection has been done at every time $t$, independently of the others. Every detected module is composed of one or several regulons. For instance, at time $t = 1$, there are 3 detected modules, which are respectively depicted in green, yellow and pink. The green module is composed of two regulons, which can help the semantic interpretation of the module from properties of the regulons.

Footnote 4: The biological network has been simplified by removing the protein level network, and therefore in $G$ the regulatory protein-encoding genes and their proteins are not distinguished. Furthermore, the edge directions in $E$ have been deleted. Consequently, we cannot speak strictly of regulation in the sense of regulatory networks.

Figure 2: A toy example with three active regulons: two over-expressed and one under-expressed. (a) $\hat{\mu}$ in blue. Confidence band: $\{\mu^+_{R_k}\}_{k=1}^{3}$ in green, $\{\mu^-_{R_k}\}_{k=1}^{3}$ in red. (b) A particular configuration $x'(\ell) = \{\mu^+_{R_1}, \mu^-_{R_2}, \mu^+_{R_3}\}$.

The validation is primarily based on biological aspects. In the considered experiment, one examines nutrient change effects: an experimental population of cells grows first in glucose and then malate is injected at time $t_0$ such that $1 \le t_0 \le 11$. The detected modules should reflect this change. A further biological analysis is beyond the scope of this article; see [4, 7]. However, a diagnostic tool is now proposed to help with this analysis. The idea is to generate configurations of $X$ using a confidence band around the obtained decomposition

$$\hat{\mu} = \sum_{j \in \Lambda} \hat{x}^{(j)},$$

in order to quantify the stability of the detected modules. We start with the bootstrap confidence interval described in [38], which we recall. As a result of the decomposition, the estimated residuals are $\hat{x}^{(0)} = x - \hat{\mu}$, whose empirical variance is $\hat{\sigma}_0^2 = \|\hat{x}^{(0)}\|^2 / n$. Derived from (15), a generative model based on $\hat{\mu}$ is written as

$$X = \hat{\mu} + X^{(0)},$$

where the $X_i^{(0)}$ are independent Gaussian random variables $LG(0, \hat{\sigma}_0^2)$. Generating $X^{(0)}$ allows us to simulate $X$, which is now distributed according to $N(\hat{\mu}, \hat{\sigma}_0^2 I_n)$. Pretending that $\hat{\mu}$ is the "true" $\mu$, generate $N$ bootstrap samples $\{x(\ell)\}_{\ell=1}^{N}$, and compute their respective smooth profiles $\{\hat{\mu}(\ell)\}_{\ell=1}^{N}$. Using these samples, for every node $i \in V$ and for a given confidence $\gamma$ close to one, a confidence interval of $\mu_i$ denoted $[\mu_i^-, \mu_i^+]$ is estimated, which provides a confidence band $[\mu^-, \mu^+]$ as detailed in [38].

Our validation uses this confidence band to generate configurations. Denote by $\{\mu^-_{R_k}\}$ and $\{\mu^+_{R_k}\}$ the two confidence profiles viewed from the regulons. To generate a new configuration $x'$, we draw randomly for each regulon $R_k$ between $\mu^-_{R_k}$ and $\mu^+_{R_k}$, as illustrated in Fig. 2. Repeating this process $N'$ times, we obtain new samples $\{x'(\ell)\}_{\ell=1}^{N'}$, on which module detection is performed. Finally, among the $N'$ detected fields $\{\{M_v(\ell), v \in \mathring{V}(\ell)\},\ \ell = 1, \dots, N'\}$, we compute the proportion of fields that fit with the field $\{M_v, v \in \mathring{V}\}$ obtained on the original $x$, cf. (26). This proportion is associated with the confidence $\gamma$, providing a quantitative diagnostic tool.
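The diagnostic described above can be sketched in a few lines, under several stated assumptions. The smoother `smooth` is a stand-in (a Laplacian-regularized solve), not the multi-scale decomposition of this paper. The band is taken as pointwise percentiles of the bootstrap profiles, which is only one of the constructions discussed in [38]. The function `detect` is a hypothetical detector supplied by the user, and each regulon is assigned either its lower or its upper profile with probability 1/2.

```python
import numpy as np

def smooth(x, L, alpha=1.0):
    """Stand-in smoother (I + alpha * L)^{-1} x; NOT the paper's decomposition."""
    return np.linalg.solve(np.eye(len(x)) + alpha * L, x)

def bootstrap_band(x, L, n_boot=200, gamma=0.95, alpha=1.0, rng=None):
    """Parametric bootstrap band around the smooth profile mu_hat."""
    rng = np.random.default_rng() if rng is None else rng
    mu_hat = smooth(x, L, alpha)
    sigma0 = np.sqrt(np.sum((x - mu_hat) ** 2) / len(x))  # residual std
    boots = np.array([
        smooth(mu_hat + sigma0 * rng.standard_normal(len(x)), L, alpha)
        for _ in range(n_boot)
    ])
    lo = np.quantile(boots, (1 - gamma) / 2, axis=0)
    hi = np.quantile(boots, (1 + gamma) / 2, axis=0)
    return mu_hat, lo, hi

def draw_configuration(lo, hi, regulons, rng):
    """Give each regulon its lower or upper confidence profile at random."""
    x_new = np.empty_like(lo)
    for nodes in regulons:  # regulons are assumed to cover all nodes
        x_new[nodes] = hi[nodes] if rng.random() < 0.5 else lo[nodes]
    return x_new

def stability(x, lo, hi, regulons, detect, n_conf=100, rng=None):
    """Proportion of configurations whose detection matches that of x."""
    rng = np.random.default_rng() if rng is None else rng
    ref = detect(x)
    hits = sum(detect(draw_configuration(lo, hi, regulons, rng)) == ref
               for _ in range(n_conf))
    return hits / n_conf

# Toy data: a path graph with three regulons of ten nodes each.
n = 30
regulons = [np.arange(0, 10), np.arange(10, 20), np.arange(20, 30)]
A = np.zeros((n, n))
idx = np.arange(n - 1)
A[idx, idx + 1] = A[idx + 1, idx] = 1.0
L = np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(0)
x = np.concatenate([np.full(10, 2.0), np.full(10, -2.0), np.full(10, 2.0)])
x = x + 0.3 * rng.standard_normal(n)

def detect(z):
    """Hypothetical detector: sign of the mean activity of each regulon."""
    return tuple(int(np.sign(z[r].mean())) for r in regulons)

mu_hat, lo, hi = bootstrap_band(x, L, rng=rng)
score = stability(x, lo, hi, regulons, detect, rng=rng)
print(f"proportion of matching detections: {score:.2f}")
```

A proportion close to one suggests that the detected modules are insensitive to perturbations of the size allowed by the band; replacing `smooth` and `detect` with the actual decomposition and the rule (25) would be needed to reproduce the diagnostic of this section.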

4 Conclusion

The experiments show that module detection brings to light the activated modules and therefore provides a means to study dynamic random fields. However, module detection on time series has been performed without taking time dependence into account: at every time $t$, the observation of the random field has been treated independently of the others. Nevertheless, it is well known that Markovian dependence can improve the sensitivity of the detection of isolated weak signals. In the related paper [6], we present a Markovian spatio-temporal modeling that generalizes the present model. With such a model, the probability $\hat{P}(|\mathring{V}| = 5)$ in Fig. 1 should be higher.

This paper proposes and implements a multi-scale graphical modeling for univariate random vectors observed on an undirected graph. The result is a multi-scale decomposition of the random field, which provides an analysis tool for specific treatments because it allows relevant scales to be selected. This tool is used in particular for module detection. With hindsight, this detector seems relatively simple. However, emphasis has been put on a coherent modeling without heuristics and with very few tunable parameters.

5 Appendix

5.1 Proof of Proposition 1
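As a numerical companion to the proof in this subsection, the sketch below checks two of its steps on random data. The kernels `Ks` are hypothetical positive semi-definite matrices standing in for the scale kernels $K_j$, and the closed form used for `b_hat` is simply what one obtains by setting the gradient of the criterion (29) to zero; it is given as an illustration, not as the paper's expression (23).

```python
import numpy as np

rng = np.random.default_rng(1)
n, n_scales = 20, 3

# Hypothetical PSD kernels K_j (stand-ins for the scale kernels).
Ks = []
for _ in range(n_scales):
    A = rng.standard_normal((n, n))
    Ks.append(A @ A.T / n)
K_tilde = sum(Ks)

# Spectral equation: K_tilde U = U D_nu, with orthonormal columns in U.
nu, U = np.linalg.eigh(K_tilde)
D_inv = np.diag(1.0 / nu)

sigma0_sq = 0.5
x = rng.standard_normal(n)

# Stationary point of -(1/sigma0^2) ||x - U b||^2 - b' D_nu^{-1} b:
# (U'U + sigma0^2 D_nu^{-1}) b = U'x, a ridge-type linear system.
b_hat = np.linalg.solve(U.T @ U + sigma0_sq * D_inv, U.T @ x)

# Components X^(j) = K_j U D_nu^{-1} B evaluated at b_hat, as in (27).
components = [K @ U @ D_inv @ b_hat for K in Ks]

# Their sum recovers U b_hat because K_tilde U D_nu^{-1} = U.
recon_error = np.max(np.abs(sum(components) - U @ b_hat))

# U b_hat also equals K_tilde (K_tilde + sigma0^2 I)^{-1} x.
smoother = K_tilde @ np.linalg.solve(K_tilde + sigma0_sq * np.eye(n), x)
smoother_error = np.max(np.abs(U @ b_hat - smoother))

print(recon_error, smoother_error)
```

Both errors are at the level of floating-point round-off, which illustrates that the scale components add up to the smooth part of the field and that the estimate is a ridge-type shrinkage of $x$.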

Let $\tilde{X} = X - X^{(0)} = \sum_{j=1}^{r_0} X^{(j)}$, and recall the spectral equation $\tilde{K} U = U D_\nu$ where $\tilde{K} = \sum_{j=1}^{r_0} K_j$. First, since the columns of $U$ are independent, we can write $\tilde{X} = UB$ where $B$ is an $r$-random vector. Then, the spectral equation allows us to rewrite

$$\tilde{X} = UB = \tilde{K} U D_\nu^{-1} B = \sum_{j=1}^{r_0} K_j U D_\nu^{-1} B = \sum_{j=1}^{r_0} X^{(j)}, \quad \text{where } X^{(j)} = K_j U D_\nu^{-1} B, \tag{27}$$

which provides the components (22). Second, $\tilde{X} = UB$ implies the covariance matrix

$$\mathrm{Cov}(B) = U^{\top} \tilde{K} U = D_\nu. \tag{28}$$

For a given observation $x$, the Bayesian estimation of the occurrence of $B$ consists in maximizing the log-posterior density $\log p(b \mid x) = \log p(x \mid b) + \log p(b) + \mathrm{Cte}$. Given the Gaussian laws $B \sim N(0, D_\nu)$ and $X^{(0)} \sim N(0, \sigma_0^2 I_n)$, this amounts to computing

$$\hat{b} = \arg\max_{b} \left\{ -\frac{1}{\sigma_0^2} \|x - Ub\|^2 - b^{\top} D_\nu^{-1} b \right\}, \tag{29}$$

which provides the expression of $\hat{b}$ in (23). Note that (29) is similar to the criterion of ridge regression [15].

5.2 Graph simulation

The simulation of a graph $G$ consisting of $m$ regulons, $G = \bigcup_{k=1}^{m} R_k$, is done in three steps.

1. For each set of nodes $V_k$ making up a regulon, a regulon graph $R_k = (V_k, E_k)$ is simulated.

2. At a larger scale, the $m$ regulons are considered as $m$ nodes of a graph, and thus an inter-regulon graph $G_B$ is simulated.

3. The global graph $G$ is obtained on the basis of these $m + 1$ graphs, as follows. For each regulon $R_k$, a regulator $r_k$ is drawn uniformly at random in this regulon. This regulator regulates the regulon(s) $R_{k'}$ such that $k \sim k'$ in $G_B$ (see footnote 5). The weights of the connections between $r_k$ and the nodes $v$ in $R_{k'}$ are given by the probabilities of the Binomial law $B(|R_{k'}|, p)$ where $0 < p < 1$. When a weight is below a threshold $\tau$, for example 0.05, it is set to zero, and the probability distribution is then renormalized. By tuning $p$ and $\tau$, one can modulate the number of edges between the regulator and the regulated unit. In this case, $r_k$ regulates a subset of nodes in $R_{k'}$.
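The weighting rule of step 3 can be sketched as follows. The text does not specify how the Binomial probabilities are assigned to individual nodes, so this sketch assumes that the $i$-th node of the regulated regulon ($i = 1, \dots, n$) receives $P(Y = i)$ for $Y \sim B(n, p)$. The function names `regulator_weights` and `connect_regulator` are illustrative.

```python
from math import comb
import random

def regulator_weights(n_nodes, p, tau=0.05):
    """Thresholded, renormalized Binomial(n_nodes, p) weights for nodes 1..n_nodes."""
    # Assumption: node i receives P(Y = i), Y ~ Binomial(n_nodes, p).
    pmf = [comb(n_nodes, i) * p**i * (1 - p) ** (n_nodes - i)
           for i in range(1, n_nodes + 1)]
    kept = [w if w >= tau else 0.0 for w in pmf]  # weights below tau are zeroed
    total = sum(kept)
    if total == 0.0:
        raise ValueError("all weights fall below tau; lower tau or change p")
    return [w / total for w in kept]              # renormalize to sum to one

def connect_regulator(regulon_nodes, target_nodes, p, tau=0.05, rng=random):
    """Draw a regulator uniformly in its regulon and weight its edges to a target regulon."""
    regulator = rng.choice(regulon_nodes)
    weights = regulator_weights(len(target_nodes), p, tau)
    edges = {(regulator, v): w for v, w in zip(target_nodes, weights) if w > 0.0}
    return regulator, edges

weights = regulator_weights(10, 0.5)
n_edges = sum(w > 0 for w in weights)
regulator, edges = connect_regulator(list(range(10)), list(range(10, 20)),
                                     p=0.5, rng=random.Random(0))
print(n_edges, regulator, sorted(edges))
```

With $n = 10$, $p = 0.5$ and $\tau = 0.05$, only the five central probabilities survive the threshold, so the regulator is connected to a subset of the regulated regulon; raising $\tau$ or moving $p$ away from 0.5 changes how many edges are kept.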

Acknowledgments

The referees are gratefully thanked; their comments have improved the manuscript. The author is grateful to Alain Trouvé and Yong Yu for the experience we shared on the multiscale decomposition of images, which has been an inspiration. The author warmly thanks Benno Schwikowski for valuable discussions about the adaptation of Bacillus subtilis to nutritional environments, and Xiaoyi Chen for her help in carrying out experiments on Bacillus subtilis data.

Footnote 5: A node $v \in V$ is regulated by another node $v'$ if $X_v$ is significantly correlated to $X_{v'}$.

References

[1] Yong-Yeol Ahn, James P. Bagrow, and Sune Lehmann. Link communities reveal multiscale complexity in networks. Nature, 466(5):761–764, 2010.

[2] Mikhail Belkin and Partha Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56:209–239, 2004.

[3] Andries E. Brouwer and Willem H. Haemers. Spectra of graphs. Springer, 2011.

[4] Joerg Martin Buescher et al. Global network reorganization during dynamic adaptations of Bacillus subtilis metabolism. Science, 335(6072):1099–1103, 2012.

[5] Bernard Chalmond. Modeling and Inverse Problems in Image Analysis. Springer-Verlag, 2003.

[6] Bernard Chalmond. Spatio-temporal graphical modeling with innovations based on multi-scale diffusion kernel. Spatial Statistics, 7:40–61, 2014.

[7] Bernard Chalmond and Xiaoyi Chen. A graphical modeling to scan network activity at modular level. Technical report, Institut Pasteur / Cergy-Pontoise University, 2012.

[8] Bernard Chalmond, Benjamin Francesconi, and Stephane Herbin. Using hidden scale for salient object detection. IEEE Trans. on Image Processing, 15(9):2644–2656, 2006.

[9] Li Chen, Jianhua Xuan, Rebecca B. Riggins, Yue Wang, and Robert Clarke. Identifying protein interaction subnetworks by a bagging Markov random field-based method. Nucleic Acids Research, 41(2), 2012.

[10] Timothé Cour, Florence Bénézite, and Jianbo Shi. Spectral segmentation with multiscale graph decomposition. In CVPR, 2005.

[11] Guro Dorum, Lars Snipen, Margrete Solheim, and Solve Saebo. Smoothing gene expression data with network information improves consistency of regulated genes. Statistical Applications in Genetics and Molecular Biology, 10(1), 2011.

[12] Marco A.R. Ferreira and Herbert K.H. Lee. Multiscale Modeling: A Bayesian Perspective. Springer, 2007.

[13] Carlo Gaetan and Xavier Guyon. Spatial Statistics and Modeling. Springer-Verlag, 2009.

[14] T.J. Hastie and R.J. Tibshirani. Generalized Additive Models. Chapman and Hall/CRC, 1999.

[15] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning. Springer, 2009.

[16] Lasse Holmstrom, Leena Pasanen, Reinhard Furrer, and Stephan R. Sain. Scale space multiresolution analysis of random signals. Computational Statistics and Data Analysis, 55:2840–2855, 2011.

[17] Trey Ideker, Owen Ozier, Benno Schwikowski, and Andrew F. Siegel. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics, 18(1):S233–S240, 2002.

[18] Harri Kiiveri and Frank de Hoog. Fitting very large sparse Gaussian graphical models. Computational Statistics and Data Analysis, 56:2626–2636, 2012.

[19] Eric D. Kolaczyk. Statistical Analysis of Network Data: Methods and Models. Springer, 2009.

[20] Risi Imre Kondor and John Lafferty. Diffusion kernels on graphs and other discrete input spaces. In International Conference on Machine Learning, pages 315–322. Morgan Kaufmann, 2002.

[21] Gert R. G. Lanckriet, Tijl De Bie, Nello Cristianini, Michael I. Jordan, and William Stafford Noble. A statistical framework for genomic data fusion. Bioinformatics, 20(16):2626–2635, 2004.

[22] Peter Langfelder, Rui Luo, Michael C. Oldham, and Steve Horvath. Is my network module preserved and reproducible? PLoS Computational Biology, 7(1), 2011.

[23] Ann B. Lee and Larry Wasserman. Spectral connectivity analysis. Journal of the American Statistical Association, 105, 2010.

[24] Tony Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):77–116, 1998.

[25] Bojan Mohar. Some applications of Laplace eigenvalues of graphs. In G. Hahn and G. Sabidussi, editors, Graph Symmetry: Algebraic Methods and Applications, volume Ser. C 497, pages 225–275. Kluwer, 1997.

[26] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review E, 74, 2006.

[27] Noa Novershtern, Aviv Regev, and Nir Friedman. Physical module networks: an integrative approach for reconstructing transcription regulation. Bioinformatics, 2011.

[28] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.

[29] Ihor Smal, Marco Loog, Wiro Niessen, and Erik Meijering. Quantitative comparison of spot detection methods in fluorescence microscopy. IEEE Trans. on Medical Imaging, 29(2):282–301, 2010.

[30] Daniel A. Spielman and Shang-Hua Teng. Spectral partitioning works: planar graphs and finite element meshes. In 37th Symposium on Foundations of Computer Science, pages 96–105. IEEE, 1996.

[31] Jian Sun, Maks Ovsjanikov, and Leonidas Guibas. A concise and provably informative multi-scale signature based on heat diffusion. In Eurographics Symposium on Geometry Processing, volume 28. Blackwell Publishing, 2009.

[32] Liang Sun, Shuiwang Ji, and Jieping Ye. Adaptive diffusion kernel learning from biological networks for protein function prediction. BMC Bioinformatics, 9(162), 2008.

[33] Richard Szeliski. Computer Vision: Algorithms and Applications. Springer, 2010.

[34] Kevin Thon, Havard Rue, Stein Olav Skrovseth, and Fred Godtliebsen. Bayesian multiscale analysis of images modeled as Gaussian Markov random fields. Computational Statistics and Data Analysis, 56:49–61, 2012.

[35] Igor Ulitsky and Ron Shamir. Identification of functional modules using network topology and high-throughput data. BMC Systems Biology, 1(8), 2007.

[36] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4), 2007.

[37] Grace Wahba. Spline models for observational data. SIAM, 1990.

[38] Yuedong Wang and Grace Wahba. Bootstrap confidence intervals for smoothing splines and their comparison to Bayesian 'confidence intervals'. J. Statist. Comput. Simulation, 51:263–279, 1994.

[39] Fan Zhang and Edwin R. Hancock. Graph spectral image smoothing using the heat kernel. Pattern Recognition, 41:3328–3342, 2008.

Figure 3: Scale-space blob detection on an image observed on a regular grid.

[Figure: drawings of the Bacillus subtilis regulatory network with nodes labeled by gene names, in panels (a), (b) and (c). Recoverable panel titles: "data X" and "component at scale 5".]

rocA

spoVAF

purB

cotX

yybK

spo0A

yscB

gerBB ypzA

ylaJ

purA pbuG

cotB

cgeE cgeC cotV ftsY cgeByxeE

yybN yybM

ybdO

codY

cwlH

cotW

yjcN

skfE spoIIGA

cotY

sspG

sigL

yqxJ

yxbC yxbD

sinI

yxnB

−2

yorB

yocD pnbA

skfC

skfF

asnH dppA

bacF sigA

gerBC

1200

0

ydiPyneB

yobB

slrR skfB

yxbB

dppE dppCkapB yxaM

dppD

bacB

yxkC

yhaM yozL ruvB aprX

sipV

ydgH

sdpA yoqM yfmI

kinB bacC bacA bacE bacD

ymaE

scoC flgM

parE yqjX tagC yhaZ ligA

lytR

yjdG

racA

phrA dppB

yvyF yvyG yuiC

xkdA

gerE

ykuL

xynD

yrpD

sdpB

ansZ

flgK

sigG

spoVAB

exoA sspM

600 800 1000 component at scale4

dnaE

yndF

csgA sspI

yvdQ sleB

400

2

ynzC

levR

acsA

ywbF ydeJ

yomJ

ydgG

ydiO

yneA

levG

yydI

ydjG

yusN

yteA

uvrC pcrA dinB

yhjE

ackA hutH yphF hutI hutG

lysA

spoVAA sspF yozQ sspB yqfU

200

yhjD

sda

ykvR yorC

levF

hutM

hutU

yitG

sspC

uvrB

uvrX uvrA yerH

parC

lexA

acoBacoL acoA

levE sacC levD

yolC ptkA

hutP

spoVT pbpG

yhcQ ypeB

1600

hbs

yydG yqxI yolB

rok

yflB

ydjH

rapC nprE rapE

licR

bdbA

bdbB

yxaL epsK

yydH

yhjC

oppD

yhcN

gerAB gerAA

−2

deoR

licA

albD

albE albA albB sboX sboA

yolJ

sspH

epsI epsC epsB

oppA oppB oppC gpr

ycbG

licB

nrdE

albF albG

sunT

sunA

epsJ yvmC aprE spoIIAA spoIIAB

ywbD

1400

nupC

yorD

acoC

rapG

dacF

oppF

1200

licH

licC

tasA

epsO

ycbJ

arsB yqhG yqcK

600 800 1000 component at scale3

0

yopL malP

glpT

glpP

glpK

yfmG

sinR

epsH arsR

yabT yqhH

400

xylR

pdp

ymaB

glpD

glpF

dltD

dltC

ilvH

sipW

ydjI

yvmB

yclK

leuD leuC

epsA

leuB

ybaJ

yokL

yjbA

yhfW

xynB

treA

gmuA

spoIIQ spoIIR

yphA

seaA

ilvC

leuA epsG

yheJ yokK

yyaC nasB

xylB msmX

cimH

dltAdltB

phy

recA

abrB

bdhA

epsF

epsN yqzG

phrE

spoVG epsE

ilvB

epsM

rsfA

ksgA

ypfB

phrC

yocH

yrzI

yphE ylbC ytfI

yodF

yveA

200

2

licT

bglH

xynPyxiE

treR

kdgA

malA

gmuGgmuB gntK gmuD

epsD phrK

ctaO ywcE

yvcA

glnM

gltB

iolB

treP sucD glpQ

gmuR

gmuC yvnA rbsR amyE gmuE

srfAAcomS srfAB

xylA

kdgK

nrdI

gntR

rbsC

yuaB

wapA

bpr levB

gmuF

rbsD

csn

sacB

sacX

−0.1

iolT

bglS

odhA

cydC nrdF

gntZ

skfA

spo0E

yweA

yvrO

tnrA

lipC sipT glnP ywdK

yokI xynC

yokJ

yvrP

glnR kipA

rbsK

ctaC

ctaF ctaD ctaG

srfAC srfAD

yqaP

ppsB

degU

ykzB

sacY

ctaE

cydB

rbsB gntP

cwlS

yttP

abh

rapK

comA

acuC

iolD

dctP

odhB

bglP

cccA qcrA

cwlD

ylaE

ppsE

ywrD

ssbA

comEC

gltC

ccpA

iolR

iolG

iolI kdgR

yngC

resD

ppsC

degQ

comK glnA

iolJ acuA yvcI ywdA

iolC iolF

yxkF

ywlF alsT

ycsG yqzE glnH

araL

iolH

kduD cotD

kduI

sdpI

yobO qcrC

iolS

citM araB

sacA acuB

nagA

iolE cydA

phrG

ycdA

ydeH yerI

pel

yttA

glnQ ywdJ

noc dprA sbcD yisB

sacP araA araQ pta

araE

rapF

yncM

yydF

yxeD antE

yolA

sacT

ypiF

araD lcfB

etfA

cotC

yhaR kdgT

nasD

ftsZ

ftsA

phrF

ywoF

nin comEB comGF

comN

yflN

abfA araN

fadN

cydD

sdpR

vpr

lytE phrI

spoVS ywhH dacC

1600

araR

sucC

araP fadB lcfA

fadE

yjdB stoA

citB

dhbC

ycnK

guaC

bofC yhcM csbX

yycB

citZ

ctpB

nasF

dhbE yqxD besA dhbB dhbA

tuaF tuaG

ycxA yuzA

pucK pucL pucM

fadA

icd mdh

ccpC fabD

katX

pucJ

sbcC

citT

etfB

resA resE resC

yflA

ykoL plsX fabG

pucR addB

cstA

araM

resB pucI

yxxG

1400

0

galT

ylbP

maf comGD

1200

fadM uxaC

yqhB hmp

rpsR rpsF comFC comGE yhjB ybdK

600 800 1000 component at scale2

acdA

sigX

comGB addAglcR rsmG comFB comGC comC

abnA rpoE

yvgN

ssbB nucA

400

0.1

fadH

minD

pucG

comGA

200

fadF

fadG

minC

feuA

arfM

ahpF

hemB

spoIIIC spoIVCB

perR

yknY

ybbA

comGG comEA

−5

narG mrgA

katA

hemC

fnr

exuR

yoaW

yxzE

yknZ

feuB pucA pucFpucH pucE

zosA

ahpC

scoA

ypuD

rsiW ybfO

pspA

ymzB yvgO yxaB

radC

yqhQ

hemA

yjmC exuT yxjC mmgA mmgC

yjmD uxuA

phoP

ytxG yvyD ytxH

yxjJ

fapR ycnJ

mmgB

ybfM

mcsA

feuC

fabI

hemL

uxaA scoB

rsbV rsbW rsbX

ydjM tagD yjeA

fur

fabHA fabHB

pucD pucC pucB

uxaB

pbpX

tagB ykvT cwlO iseA

acpA

ywjB

narI

hemD

guaD fhuD

clpC

yfmC yclP

yjbC

yfhA

fhuC

yfhC

fhuG

1600

sigM

phoD

fhuB yfmD

yclO yxeB

ykuN

yfiY ykuP yhfQ ycgT yclQ yclN ykuO

yrhH

tuaC

tuaB

tatCD

yfmF yfiZ

yfmE

csoR

1400

0

spoIIID

divIC

ywbO

csbB pstS

t

yotD

copA

1200

bltD

ylxX

tatAD

pstA pstC

pstBB

600 800 1000 component at scale1

5

ymfH ymfF ydfK ymfD

ylxW

murBsbp metA

pstBA copZ

400

yndA

mta

spoIID spoIIIAG

divIB

200

yqfC

coxA spoIIIAD

spoIIIAH bofA spoVE spoIIIAE

yesK

ycgR ycgQ

murF

bcrC

tuaE tuaD

yokU

mreD

ytpB

ytpA

fabF

spsK spoIVCA murGspoVD cotJA spoIIIAA spoIIIAC spoIIIAF yesJ

sigE ypbG

ydaH

phoB ydhF

yfkM

ykvI

yhaX spoIIIAB yhaL coaX

yhdK

ywtF

ywaC

yqjL yceC

disA

ctsR

yodT

ysnD

kamA cotO yhdL

ypuA

ftsH

sigW yceE

ispDyceH

sigI ykuT

radA

ysxC

plsC

spsJcwlJ ytvI

yjcA

bmrR hprT

secDF bmr

yceD yceG yceF yacL

sigB

rnr

ywmF trxA

spoVK yyaD yobW

yodP

yjfA

ddl

tilS

ysdB

gsiB

spoVB

yngI

yfhL

rsbRD

yflH

ysnF

ylxP

yoaA

clpP

clpXlonA

purF

pbuO

glgA

ysxE prkA

yodR

yfkT

yfhD

cypC

yfkS

cotE

dacB

yngE

mbl glgD

yabQ

yxbG yxnA

csbA

ypuB ydeC ctc yhdF yocB

ydaT

ypqA ispG

glgP

spmB

ykgA

yxiS mgsR clpE

gerM

spoVID

yabR

spoIIM safA spoIVFA

ispF

nadE

ydaS

yhxD

ygxB

purE

nusB

purR

spoVR

yqxA

yngF glgB

yjaV asnO

5 0 −5

yngJ

yteV nucB

yodS

yocL spoVMspoIVA

yeaA

ytkL

csbC

yxzF ywiE

yhdN

yraA

aag folD

xpt

yabJ

purN

purH

yybI yhbH sqhC ylbJ spoIVFBykvUglgC yhxC

yuzC

yqeZ

yqfB

yoxB

nhaX purB

purC purL

purS pbuX glyA purM

yitD

ytxC

ydbT

yteJ

yozO pbpE

racX

opuE yaaI

yfkH

aldY

katE

ykgB

ywlB

ydaD

ohrB

purQ

purA pbuG

purDpurK

yoaG yqfA sppA xpaC yuaI

yvrE

ykzI

yjgC ywsB

ydcA

mreBH rsgI ybfQ

ydbS

yycD

ydaG

yflT

yckC

yvlB ythQ yvlD

yvaK

ywjC

yfkD

ywzA

yfkI

sodA

yoxC ydhK

yfhE

ydaP

yfhK

era

yhcO

yvlA

yfhF yjgB

ycbP

yobJ yjoB

yvlC ythP

ydaE

yugU

dps

yjgD

gspA

yjzE

yfkJ yxkO gtaB

ytaB

yitT yocK ybyB ywmE

yuaF yaaNydjO ywrE yoaF fosB

gabD

ycdF csbD yerD ydaF

ycdG

ydbD

yjcP hemAT mcpA

motB

cheV degR

pgdS

flhO tlpB mcpBlytD

tlpC fliT

argF carB argC

argG

argJ

carA

argB argH

(d)

Figure 4: Multi-scale decomposition of the gene expression network of Bacillus subtilis (from [7]). (a) Original data. (b) Multi-scale decomposition profiles (1-D display). (c-d) Scale components for λ = 2 and λ = 16, respectively. The decomposition has a structuring effect in terms of gene grouping. Despite the use of false colors, it is difficult to distinguish modules, unlike the case of an ordinary image as in Fig. 3.


[Figure 5 graphic: nodes are identified by index; four nodes carry gene-name labels: 36 (codY), 37 (comK), 176 (degU) and 199 (rok).]

Figure 5: Graph partitioning of G: this graph shows four regulons and four regulators.


Figure 6: Module detection at three time points. The three time points were processed independently, without taking time dependence into account. Module detection yields 3, 3 and 2 modules for x(t), t = 1, 2, 3, respectively.


Figure 7: Same configuration {x, G} as in Fig. 8(c), but with two different displays: (a) "Edge weighted spring embedded" layout, (b) hierarchical layout.


[Figure 8 graphic: four panels (a) to (d); nodes are numbered 1 to 60; module labels M1 to M5 appear in the graphic.]

Figure 8: Experiment on simulated data. (a) The regulator graph GB (between regulons). (b) The entire regulon graph G. (c) The observed random field x (displayed with "Force directed" layout). (d) Module detection outcome: there is one detected module per regulon. Given the knowledge of the regulons, we can associate a regulon with each detected module.


[Figure 9 plot: legend λ = 2, 4, ..., 30; displayed weights w(8) = 1.38, w(10) = 4.39, w(30) = 1.45; horizontal axis ticks 0 to 60; vertical axis ticks −2 to 1.5.]

Figure 9: Module detection based on the multi-scale decomposition in Fig. 10 restricted to Λ. The pink crosses show the 1-D profile of x. The colored curves display the three main components, for scales {8, 10, 30}. The red circles mark the locations of the detected module centers. The colored segments at the bottom of the figure locate the regulons; their colors are identical to those in Fig. 8(a-b-d).


[Figure 10 plot: legend λ = 2, 4, ..., 30; horizontal axis ticks 0 to 60; vertical axis ticks −2 to 1.5. Estimated weights: w(2) = 0.01, w(4) = 0.01, w(6) = 0.63, w(8) = 1.38, w(10) = 4.39, w(12) = 0.72, w(14) = 0.37, w(16) = 0.01, w(18) = 0.27, w(20) = 0.01, w(22) = 0.01, w(24) = 1.31, w(26) = 0.01, w(28) = 0.39, w(30) = 1.45.]

Figure 10: Statistical multi-scale decomposition for Λ0 = {2, 4, ..., 28, 30}. Scale selection with the selection parameter equal to 0.1 selects Λ = {8, 10, 24, 30}. The black curve is the sum of all Λ0-components.
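The caption of Figure 10 does not spell out the selection rule. As an illustration only, the sketch below assumes a simple rule: keep every scale whose share of the total estimated weight exceeds 0.1. The function name `select_scales` and the normalization by the total weight are assumptions, not taken from the paper; the weights are those printed with Figure 10. Under this assumed rule the selected set coincides with the Λ reported in the caption.

```python
# Weights w(lambda) as printed with Figure 10.
weights = {
    2: 0.01, 4: 0.01, 6: 0.63, 8: 1.38, 10: 4.39,
    12: 0.72, 14: 0.37, 16: 0.01, 18: 0.27, 20: 0.01,
    22: 0.01, 24: 1.31, 26: 0.01, 28: 0.39, 30: 1.45,
}


def select_scales(weights, threshold):
    """Keep scales whose share of the total weight exceeds `threshold`.

    Assumed rule for illustration; the paper's own criterion may differ.
    """
    total = sum(weights.values())
    return sorted(lam for lam, w in weights.items() if w / total > threshold)


selected = select_scales(weights, 0.1)
print(selected)  # [8, 10, 24, 30]
```

The dominant scale λ = 10 alone carries about 40% of the total weight, so a much stricter threshold such as 0.5 would select no scale at all under this rule.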


[Figure 11 plot: legend λ = 2, 4, ..., 30.]

Figure 11: Ordinary scale-space decomposition.


[Figure 12 plot: legend λ = 2, 4, ..., 30; horizontal axis ticks 0 to 60; vertical axis ticks −2 to 1.5. Estimated weights: w(2) = 1080, w(4) = 930, w(6) = 825, w(8) = 750, w(10) = 695, w(12) = 653, w(14) = 621, w(16) = 596, w(18) = 576, w(20) = 559, w(22) = 545, w(24) = 534, w(26) = 523, w(28) = 515, w(30) = 507.]

Figure 12: Adverse effect of an estimation made without $\log |\bar{K}_{\sigma,\Lambda}|$. All scales make nearly equal contributions and carry no informative value.