Scale-space module detection for random fields observed on a graph non embedded in a metric space

Bernard Chalmond ∗†
Abstract
October 2014
In the spirit of Lindeberg's approach to image analysis on regular lattices, we adapt, from a statistical viewpoint, the blob detection procedure to graphs that are not embedded in a metric space. We treat data observed on such a graph with the goal of detecting salient modules. This task consists in seeking subgraphs whose activity is strong or weak compared to that of their neighbors. This is performed by analyzing the activity of the nodes at multiple scales. To do so, the data are seen as a realization of a univariate random field, for which we propose a multi-scale graphical model. In the framework of diffusion processes, the covariance matrix of the random field is decomposed into a weighted sum of graph Laplacians at different scales. Under a Gaussian assumption, maximum likelihood estimation of the weights provides a set of relevant scales. As a result, we obtain a multi-scale decomposition of the random field on which the module detection is based. This method is experimentally analyzed on simulated data and biological networks.
Keywords. Blob Detection, Module Detection, Network Activity, Graphical Modeling, Scale-space Random Field, Graph Laplacian, Diffusion Kernel, Multiscale Decomposition, Scale Selection
∗ CMLA, UMR CNRS 8536, ENS Cachan, France
† Cergy-Pontoise University, France
Scale-space module detection for random fields
1 Introduction

This paper addresses the following general problem: given a connected undirected graph G = (V, E) that is not embedded in a metric space, and an observation x of a univariate real random field X indexed by the nodes V of this graph, one seeks subgraphs $\{M_k\}$ in V for which the respective observations $\{x_{M_k}\}$ appear as salient profiles in comparison with their surroundings. Such a subgraph together with its profile, $(M_k, x_{M_k})$, is called a module. In summary, we have the following schema:

\[ [\{x_i\}_{i \in V},\ G = (V, E)] \ \leadsto\ \{(M_k, x_{M_k})\} , \tag{1} \]

where $x_i \in \mathbb{R}$. This concept depends on the context and on what we seek. We use the term module in a broad sense. However, we focus in the following on a particular module called a blob or a spot, depending on the context.
Concretely, the problem is as follows. Consider Fig. 3, which shows an image $\{x_i, i \in V\}$ where V denotes the nodes of a sampling grid L included in $\mathbb{R}^2$. This image shows a multitude of dark spots of various sizes that a scale-space algorithm has detected. This detection uses the graph L = (V, E), where E is the set of nearest-neighbor connections. Here, our visual perception clearly distinguishes the spots. While keeping the values $\{x_i\}$, suppose now that we replace L by a graph G whose nodes are not in a metric space. Displaying x requires representing the graph in the plane, which implies choosing a particular layout. This is illustrated in Fig. 7 and Fig. 8(c), which display an "image" on such a graph with three different layouts. This image contains five distinct spots that are difficult to recognize, although these figures display the same image x. Similarly, in the case of Fig. 3, our perception of the spots would be greatly disturbed, and the detection algorithm would not work because it requires a metric space. Dealing with this problem is the subject of our article. To our knowledge, it has not yet been treated.
Although different, this problem is reminiscent of another one, which needs to be presented in order to avoid confusion. Consider a set of points $\{\xi_i, i \in V\}$ in $\mathbb{R}^p$ with p > 1, from which a similarity matrix W between these points can be defined, e.g. from their correlations or distances. This matrix is then used to infer a connectivity structure E, typically by connecting highly correlated or spatially close nodes. The detection of modules is then performed on the graph G = (V, E), for example by using the concept of betweenness [19]. Here, the objective is the determination of $\{M_k\}$. In summary, we have the following two-step schema:

\[ \text{(i)}\ \ \{\xi_i\}_{i \in V} \ \leadsto\ W \ \leadsto\ E , \qquad \text{(ii)}\ \ G = (V, E) \ \leadsto\ \{M_k\} . \tag{2} \]
This second schema is further discussed in Section 1.1. In a nutshell, (2) seeks sub-networks in G whereas (1) seeks sub-networks that are active with respect to x.

The issue tackled here arises in molecular biology, for which a vast literature exists. With respect to our concerns, a few references are [17, 9, 35, 27, 11, 22]. This issue also arises in other fields, such as social networks, where modules refer to communities [1, 26]. One crucial step when studying the structure and dynamics of these networks is to identify modules/communities. However, these studies are mainly devoted to schema (2) for biological networks and (2-ii) for social networks, whereas we are interested in schema (1) due to the nature of our data, which come from a univariate random field.
The connectivity of G is summarized in the graph Laplacian matrix L, which plays a central role in our context:

\[ L_{i,j} = \begin{cases} -1 & \text{if } i \sim j \\ d_i & \text{if } i = j \\ 0 & \text{otherwise,} \end{cases} \tag{3} \]

where i ∼ j denotes the edge (i, j) ∈ E, and $d_i = \sum_j 1_{j \sim i}$ is the degree of node i, i.e. the number of edges connected to i. L is a symmetric positive semi-definite matrix, which can be written as

\[ L = D - A \]

where $D = \mathrm{diag}\{d_i\}$ and A is the binary adjacency matrix, $a_{i,j} = 1$ if i ∼ j. Note that the graph Laplacian appears when one considers the local variation energy, called bending energy:

\[ U(x) = \sum_{i \sim j} (x_i - x_j)^2 = x^\top L x , \tag{4} \]

where the sum is over the edges (i, j) ∈ E. In (3) each edge i ∼ j carries the value $a_{i,j} = 1$. This definition can be extended to the weighted case, where the nonnegative weights are not necessarily all equal to 1. In both cases, we have $d_i = \sum_j a_{i,j}$.

We continue this introduction by positioning our contribution with respect to previous works. Then, the model and the methodology are presented in Section 2. The multi-scale decomposition and the module detection are tested on simulated random fields and on real data in Section 3, where diagnostic tools are introduced. We invite the reader to have a look at Fig. 8, which illustrates and summarizes the method.
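The definitions (3) and (4) are easy to check numerically. The following Python sketch assumes NumPy and a small hypothetical 4-node graph; the edge list, the values of x and the helper names `graph_laplacian` and `bending_energy` are illustrative and do not come from the paper. It builds L = D − A and compares the quadratic form x'Lx with the sum of squared differences over the edges.

```python
import numpy as np

def graph_laplacian(n_nodes, edges):
    """Return L = D - A for an undirected, unweighted graph given by its edge list."""
    A = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0          # binary adjacency, a_ij = 1 if i ~ j
    D = np.diag(A.sum(axis=1))           # degrees d_i on the diagonal
    return D - A

def bending_energy(x, edges):
    """Sum of squared differences over the edges, left-hand side of (4)."""
    return sum((x[i] - x[j]) ** 2 for i, j in edges)

# Hypothetical toy graph and profile (not data from the paper)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
L = graph_laplacian(4, edges)
x = np.array([0.5, -1.0, 2.0, 0.0])

quadratic_form = x @ L @ x
energy = bending_energy(x, edges)
print(quadratic_form, energy)            # the two values coincide
```

Since every row of L sums to zero, constant profiles have zero bending energy, which is why L is only positive semi-definite.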
1.1 Graph partitioning
Although module detection is not a partitioning task, some aspects of the related problem of spectral partitioning could lead to confusion. There is a large literature on spectral clustering for graph partitioning ([36, 30, 28, 10, 3] among many others). In spectral clustering [30, 36], given a graph as in (2-ii), we compute the eigenvectors $u_1, ..., u_m$ associated with the m smallest eigenvalues $\mu_1, ..., \mu_m$ of L, and assign to every node i the vector $\{u_k(i)\}_{k=1}^{m}$ in $\mathbb{R}^m$. Then, graph partitioning is the outcome of a vector space clustering algorithm, such as k-means, applied to the resulting vectors. Behind this procedure, there is an important property. If the graph is composed of c connected components, then the first c eigenvalues of L are zero, and the corresponding eigenvectors are the indicator vectors of the connected components. Fig. 5 illustrates such a graph with 4 components.
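The property on the zero eigenvalues can be illustrated with a few lines of Python. This is a minimal sketch assuming NumPy and a hypothetical 7-node graph made of two disconnected pieces; the helper name `laplacian_from_edges` and the tolerance `1e-9` are choices made here for illustration.

```python
import numpy as np

def laplacian_from_edges(n_nodes, edges):
    """Unweighted graph Laplacian L = D - A."""
    A = np.zeros((n_nodes, n_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

# Hypothetical graph: a triangle {0, 1, 2} and a path {3, 4, 5, 6}, not linked
blocks = ([0, 1, 2], [3, 4, 5, 6])
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (5, 6)]
L = laplacian_from_edges(7, edges)

mu, U = np.linalg.eigh(L)                      # eigenvalues in ascending order
n_components = int(np.sum(np.abs(mu) < 1e-9))  # multiplicity of the eigenvalue 0

# Any basis of the null space is constant on each connected component,
# i.e. it spans the indicator vectors of the components.
null_space = U[:, :n_components]
constant_on_blocks = all(
    np.allclose(null_space[block], null_space[block][0]) for block in blocks
)
print(n_components, constant_on_blocks)
```

A numerical eigensolver returns an arbitrary orthonormal basis of the null space rather than the indicator vectors themselves, which is one reason why a clustering step such as k-means is applied to the rows of the eigenvector matrix.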
This approach also works when one replaces the adjacent/non-adjacent coefficients $a_{i,j}$ by a similarity or closeness measure: $a_{i,j} = w(i, j)$. The multiplicity of the eigenvalue 0 is the number of connected components of the underlying graph, where nodes i, j are adjacent when w(i, j) > 0. Two examples illustrate the closeness measure for schema (2). When the graph Laplacian represents a 3D discrete surface (mesh), every node i ∈ V is associated with a 3D coordinate point $\xi_i$ in $\mathbb{R}^3$, also denoted $v_i$ [31]. The weight of an edge i ∼ j is defined by the Gaussian function $w(i, j) = \exp(-\|v_i - v_j\|^2 / \sigma_v^2)$. Hence, the geometric structure of the mesh is encoded in the weights. The second example concerns graph-based image segmentation [28]. The image is $\{x_i, i \in V\}$ where V are the nodes of a 2D regular grid embedded in $\mathbb{R}^2$. Every node i ∈ V is associated with a 3D vector $\xi_i = (v_i, x_i)$ where $v_i$ is a 2D coordinate point. The Gaussian weight function is rewritten $w(i, j) = \exp(-(\|v_i - v_j\|^2 / \sigma_v^2 + |x_i - x_j|^2 / \sigma_x^2))$. Other weighting functions have been proposed in the literature. Two pixels are connected if they are within distance δ: $w(i, j) \neq 0$ if $\|v_i - v_j\| < \delta$. But how should the graph connection radius δ be chosen? In [10], from heuristic considerations, the graph weights are segmented into different scales:

\[ W = W_1 + W_2 + \dots + W_r , \tag{5} \]

where $W_s$ corresponds to a specific spatial separation range: $w_s(i, j) \neq 0$ if $\delta_{s-1} < \|v_i - v_j\| < \delta_s$.

In our case, the graph G is not embedded in a Euclidean space, unlike the meshes in the examples above. Although non-uniform weights $a_{i,j}$ can be chosen, these weights are not necessarily associated with a distance. To perform blob extraction, we use the diffusion property based on the graph Laplacian, which does not require an explicit closeness measure. Since diffusion is a multi-scale process, we take advantage of this property to define a decomposition of the affinity matrix. This decomposition is related to generalized additive models, which provide a theoretical basis [37, 14].
1.2 Blob detection

On a regular mesh
In the image analysis domain, when G is simply a regular grid embedded in $\mathbb{R}^2$, the problem of salient area detection has received much attention, in particular for blob detection [24], as illustrated in Fig. 3. In this figure, the detected blobs are localized by squares whose size (scale) is adapted to the width of the blobs. A blob is regarded as a spot, and a simple model is given by the Gaussian profile [29]. In this case, we have the following result, on which scale-space blob detection is based. Consider the simple image $x = \{x_i, i \in V\}$ representing a Gaussian spot characterized by a width parameter $\lambda_0$ and centered at a point $v_0$ on the grid: $x_i \propto \exp(-\|v_i - v_0\|^2 / 2\lambda_0)$, for every i ∈ V. Consider a smoothed version $\tilde{x}^{(\lambda)}$ of the image obtained by convolution with the Gaussian kernel $G_\lambda$:

\[ \tilde{x}_i^{(\lambda)} = \sum_{i'} x_{i'}\, G_\lambda(i, i') = G_\lambda(i, \cdot)\, x , \qquad G_\lambda(i, i') \propto \exp(-\|v_i - v_{i'}\|^2 / 2\lambda) . \tag{6} \]

[24] gives the following property, which is satisfied at the spot center $i_0$:

\[ \frac{d}{d\lambda}\, \lambda\, [\Delta \tilde{x}^{(\lambda)}]_{i_0} \Big|_{\lambda = \lambda_0} = 0 , \tag{7} \]
where Δ denotes the discretized Laplacian operator on the grid. In the image processing literature, $\Delta G_\lambda$, called the Laplacian of Gaussian, is used for multi-resolution representations [33]. Laplacians of Gaussian have mathematical properties that have been widely studied in the scale-space community. (7) tells us that the derivative of $\lambda \Delta G_\lambda$ is able to select the width $\lambda_0$ of the Gaussian spot. Essentially, λΔ quantifies in some sense a curvature of the smoothed spot, and this curvature is optimal when $\lambda = \lambda_0$. This property is used to detect blobs in images: the detected blob centers are the local extrema of the discretized scale-space volume

\[ \{ \lambda\, [\Delta \tilde{x}^{(\lambda)}]_i ,\ i \in V,\ \lambda \in \Lambda \} , \tag{8} \]

where Λ is a finite set of scales corresponding to an increasingly coarse sub-sampling of the regular grid. For every detected blob, the optimization of (8) returns a scale, which characterizes the width of the blob.

From mesh to graph non embedded in a metric space
In this paper, module refers to the extension of the blob concept to graphs. As a first step, this extension is straightforward since (6) is the solution of the heat equation on $\mathbb{Z}^2$, whose extension to graphs is well known [20]. However, extending blob detection to non-geometric graphs requires some modifications with respect to scale and space. While for a mesh it is natural to choose Λ from a sub-sampling of the grid ([12], Chap. 10), for a non-geometric graph this choice is much less trivial, since the relevant scales are irregularly spread in $\mathbb{R}^+$ and the scale has no explicit dimension. To this end, the multi-scale representation $\{\tilde{x}^{(\lambda)}, \lambda \in \Lambda\}$ must be revisited in order to get a sparse representation, denoted $\{x^{(\lambda)}, \lambda \in \Lambda\}$, yielding a non-redundant decomposition of x in terms of reconstruction: $\sum_\lambda x^{(\lambda)} = x$, a property that $\tilde{x}^{(\lambda)}$ does not satisfy. This property of non-redundancy is necessary for the identification of the right scales.
1.3 Module, semantic module and related works

In image analysis, module detection is used first for extracting areas of interest without using any strong prior information. These areas are then interpreted with greater precision or extended using high-level information in order to obtain semantic modules, as for object recognition [8]. This remark also holds in systems biology, where semantic modules correspond to biological modules (see [17, 9] among many others). The definition of biological modules does not rely solely on areas and profiles, but also uses complex biological knowledge. In several papers, the detection of biological modules operates in two stages: first, detection of module seeds, or more simply modules, and second, refinement of the detected modules to finally obtain meaningful biological modules [35].

We comment on some of the main approaches to module detection introduced in the bioinformatics literature. Given a scoring function that assigns an importance to every sub-network, finding the maximal-scoring connected subgraph is an NP-hard problem. In the seminal work [17], the main limitation is that node scores are treated independently, since the sub-network score is calculated as a sum of the node scores. To overcome this limitation, [9] proposes an inverse problem approach in which the node scores are modeled by a hidden Markov random field under a regularity constraint expressed by a bending energy such as (4). Two major well-known drawbacks are inherent to this approach [5]: the data-driven determination of the regularity scale (the trade-off parameter), and the energy minimization, which requires stochastic optimization, a difficult computational task already encountered in [17]. But conceptually, the main limitation of the Markovian model is that it is mono-scale, which is not suitable when the size of the modules varies. Instead of using the bending energy at a single scale, we propose to use it with a multi-scale formulation in order to adapt the scale to the module sizes. Technically, the advantage of this approach is twofold. First, the set of relevant scales can be estimated efficiently from the data. Second, we avoid the huge computational burden of stochastic optimization. The computation is limited to scanning a multi-scale representation of type (6) and searching for the differential local extrema, as is done for blob detection on a grid L.
2 Models and Method
2.1 Random Field and Diffusion Process
This section summarizes a set of fundamental results on the graph Laplacian and diffusion kernels. Consider a random field $X = (X_1, ..., X_n)$ observed on an undirected graph G = (V, E). V denotes the node set and E the edges connecting them. The dependency structure between the random variables $\{X_i\}$ depends on the topological structure given by E. This dependency structure is here limited to a covariance structure modeled by a diffusion kernel [25], a choice explored in many domains, especially in pattern recognition, biological network analysis and image processing [2, 32, 39]. We seek to represent X by a random field model on G, denoted Y(λ), whose covariance structure depends on a scale parameter λ > 0. Essentially, this model is obtained by equating the variations due to a change of scale with the spatial variations, as follows:

\[ Y_i(\lambda + d\lambda) - Y_i(\lambda) \doteq d\lambda \sum_{j \in V:\, j \sim i} (Y_j(\lambda) - Y_i(\lambda)) , \tag{9} \]

and in vector form:

\[ Y(\lambda + d\lambda) - Y(\lambda) = -d\lambda\, L\, Y(\lambda) , \qquad L = D - A , \tag{10} \]

where the graph Laplacian L is defined in (3). Equation (10) is the discretized version on G of the classical heat differential equation¹:

\[ \frac{d}{d\lambda} Y(\lambda) = -L\, Y(\lambda) , \qquad Y(0) = X , \tag{11} \]

whose solution is

\[ Y(\lambda) = K_\lambda X , \tag{12} \]

\[ K_\lambda = e^{-\lambda L} . \tag{13} \]

$K_\lambda$ is a matrix exponential, defined by $e^{M} = \sum_{i=0}^{\infty} M^i / i!$. For every node, one has:

\[ Y_i(\lambda) = \sum_{j \in V} K_\lambda(i, j)\, X_j = K_\lambda(i, \cdot)\, X , \tag{14} \]

which is the generalization of (6). Since the exponential of a symmetric matrix is positive semi-definite, the matrix $K_\lambda$, which is called the diffusion kernel, can be used as a covariance matrix for modeling the covariance between the random variables $\{X_i\}$. The larger λ is, the stronger the off-diagonal effects in $K_\lambda$. λ is interpreted as a scale parameter and $Y_i(\lambda)$ as a scale-space random field on $V \times \mathbb{R}^+$. By nature, the diffusion kernel has a multi-scale property that is well identified, especially for dimensionality reduction applications [23]. However, the choice of its scale parameter λ remains a difficulty [11]. For small λ, $K_\lambda(i, i)$ reflects local properties of G around node i, while for large λ it captures some global structures. For instance, in the geometry processing field, the diagonal term $K_\lambda(i, i)$ has been used as a shape descriptor [31], by considering that for every λ the local spatial extrema of this function provide a feature-based scale-space representation of shapes, useful for shape matching.
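As an illustration of (12) and (13), the sketch below computes the diffusion kernel through the eigendecomposition of L and diffuses a unit impulse at several scales. It assumes NumPy; the 6-node path graph, the impulse and the three values of λ are hypothetical choices made for this example only.

```python
import numpy as np

def diffusion_kernel(L, lam):
    """Return K_lam = exp(-lam * L) for a symmetric Laplacian L.

    With L = U diag(mu) U', the matrix exponential is U diag(exp(-lam * mu)) U'.
    """
    mu, U = np.linalg.eigh(L)
    return (U * np.exp(-lam * mu)) @ U.T   # scales column k of U by exp(-lam * mu_k)

# Hypothetical example: path graph on 6 nodes
n = 6
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

X = np.zeros(n)
X[2] = 1.0                                  # unit impulse at node 2, playing the role of Y(0) = X

for lam in (0.1, 1.0, 10.0):
    K = diffusion_kernel(L, lam)
    Y = K @ X                               # equation (12)
    print(f"lambda={lam:5.1f}  Y={np.round(Y, 3)}  K(2,2)={K[2, 2]:.3f}")
```

Because L has zero row sums, each row of $K_\lambda$ sums to one, so the diffusion spreads the impulse over the graph without changing its total mass. The diagonal term K(2,2) decreases as λ grows, in line with the local-to-global interpretation given above.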
2.2 Graphical Modeling
The outstanding issue at the end of the previous modeling step is the choice of λ. In other words, which scale λ is the most representative of the observed profile x? In fact, several scales may explain this profile. Therefore, a natural approach consists of decomposing X into r independent random fields according to a discrete set of relevant scales $\Lambda = \{\lambda_1 < ... < \lambda_r\}$:

\[ X = \sum_{j=1}^{r} X^{(j)} + X^{(0)} , \tag{15} \]

where $X^{(j)}$ denotes the random field at scale $\lambda_j$ and $X^{(0)}$ a residual [16, 34]. Following the idea of Fourier decomposition, for every profile x, the $\{x^{(j)}\}_{j=1}^{r}$ can be seen as frequency components of x, from high to low frequencies. The decomposition (15) is related to additive spline models, whose theoretical foundation can be traced back to [37], Chap. 10 (see also [14]), and which were later reintroduced under the name of multiple kernels in the machine-learning community [21]. Note that $X^{(j)}$ does not match $Y(\lambda_j)$ in (12), since the sum $\sum_j Y(\lambda_j)$ over a given set of scales does not reconstruct X. In our approach, the covariance matrix $\mathrm{Cov}(X^{(j)})$ of every component is modeled from the diffusion kernel (13). So we use r kernels $\{K_{\lambda_1}, ..., K_{\lambda_r}\}$, denoted $\{K_1, ..., K_r\}$, such that $\mathrm{Cov}(X^{(j)}) = K_j = \sigma_j^2 \kappa_j$, where $\kappa_j$ is given by (13) at scale $\lambda_j$. Due to the independence of the components, the covariance matrix Cov(X) is the following multiscale diffusion kernel:

\[ \bar{K}_{\sigma,\Lambda} = \sum_{j=0}^{r} K_j = \sum_{j=0}^{r} \sigma_j^2 \kappa_j = \sigma_0^2 I_n + \sum_{j=1}^{r} \sigma_j^2 e^{-\lambda_j L} . \tag{16} \]

¹ In the classic case of diffusion in $\mathbb{R}^2$, λ is a time parameter.
Each kernel $\kappa_j$ is weighted by a positive parameter $\sigma_j^2$, which is all the larger as the scale $\lambda_j$ contributes significantly to the random field X. The covariance matrix $K_0$ is that of a white noise. As stated above, the larger $\lambda_j$ is, the stronger the off-diagonal effects in $K_j$. In other words, as $\lambda_j$ increases, the components $\{X^{(j)}\}$ are increasingly smooth. The passage from $X^{(j)}$ to $X^{(j+1)}$ implies that some details in $X^{(j)}$ are attenuated. If we assume that the dependency structure of the random variables $\{X_i\}$ is entirely described by its kernel, then it is legitimate to consider that X is distributed according to the Gaussian law

\[ X \sim N(0, \bar{K}_{\sigma,\Lambda}) . \tag{17} \]
The scales $\{\lambda_j\}_{j=0}^{r}$ and their associated weights $\sigma \doteq \{\sigma_j^2\}_{j=0}^{r}$ are unknown parameters that are estimated using the maximum likelihood principle². Although the theoretical mean of X is zero, the empirical mean of each observed subprofile $x_{M_k}$ is not necessarily zero, as for instance in Fig. 9. This is due to high scales that create long-range correlations, or in other words low frequencies. Understanding the diffusion kernel is not a trivial task; it requires graph spectral theory [25]. Note also that the choice of the diffusion kernel as covariance matrix arises as a necessity because we have only one observation of X. If we had many observations, then the covariance matrix could be estimated. In comparison with the heuristic decomposition (5), which uses a hard multi-scale separation of the weights, the multi-scale representation (16) appears as a soft decomposition based on the overall structure of the graph via L, and moreover allows statistical estimation of the contribution of each component.

² For notational convenience, we introduce $\lambda_0 = 0$, which is associated with $K_0$.
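To make the model (16)-(17) concrete, the following sketch builds a multi-scale kernel and draws one realization of the corresponding Gaussian field. It assumes NumPy; the ring graph, the scales, the weights and the function names are hypothetical. The sample is obtained as $V D^{1/2} \alpha$ with $\alpha \sim N(0, I_n)$, where $\bar{K} = V D V^\top$ is the diagonalization of the kernel.

```python
import numpy as np

def diffusion_kernel(L, lam):
    """exp(-lam * L) through the eigendecomposition of the symmetric matrix L."""
    mu, U = np.linalg.eigh(L)
    return (U * np.exp(-lam * mu)) @ U.T

def multiscale_kernel(L, scales, sigma2, sigma2_noise):
    """K_bar = sigma_0^2 I_n + sum_j sigma_j^2 exp(-lambda_j L), as in (16)."""
    K = sigma2_noise * np.eye(L.shape[0])
    for lam, s2 in zip(scales, sigma2):
        K = K + s2 * diffusion_kernel(L, lam)
    return K

def sample_field(K, rng):
    """Draw x ~ N(0, K) as V D^{1/2} alpha with alpha ~ N(0, I_n)."""
    d, V = np.linalg.eigh(K)
    alpha = rng.standard_normal(K.shape[0])
    # clip guards against tiny negative eigenvalues caused by round-off
    return V @ (np.sqrt(np.clip(d, 0.0, None)) * alpha)

# Hypothetical setting: ring graph on 12 nodes, two scales plus white noise
n = 12
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

K_bar = multiscale_kernel(L, scales=(1.0, 8.0), sigma2=(1.0, 1.0), sigma2_noise=0.01)
x = sample_field(K_bar, np.random.default_rng(0))
print(np.round(x, 2))
```

The white-noise term $\sigma_0^2 I_n$ bounds the smallest eigenvalue of the kernel from below, which keeps the covariance matrix invertible whatever the scales.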
• Weight estimation. For a given Λ, let $\ell(\sigma|\Lambda) = \log(p_{\sigma,\Lambda}(x))$ be the log-likelihood of σ, where $p_{\sigma,\Lambda}$ denotes the probability density of x. Given an observation x and the Gaussian law $N(0, \bar{K}_{\sigma,\Lambda})$, the log-likelihood is

\[ \ell(\sigma|\Lambda) = -\log|\bar{K}_{\sigma,\Lambda}| - x^\top \bar{K}_{\sigma,\Lambda}^{-1} x + \mathrm{Cte} , \tag{18} \]

where Cte denotes a constant term. The maximum likelihood estimate is computed under the constraint of positivity of the parameters σ:

\[ \hat{\sigma}(\Lambda) = \arg\max_{\sigma}\ \ell(\sigma|\Lambda) \quad \text{under the constraint } \sigma > 0 . \tag{19} \]
For moderate sizes of n, non-linear programming algorithms using gradient descent techniques are operational. For larger dimensions, the computation of the determinant $|\bar{K}_{\sigma,\Lambda}|$ and of the inverse $\bar{K}_{\sigma,\Lambda}^{-1}$ becomes more difficult [18]. To reduce the amount of computation, one might also wonder whether it would be possible to remove the term $\log|\bar{K}_{\sigma,\Lambda}|$ from the likelihood, in order to work only with the generalized least-squares term $x^\top \bar{K}_{\sigma,\Lambda}^{-1} x$. Theoretically, we know that this estimate is not statistically consistent [13]. Our experiments have confirmed this defect, by showing severe aberrations in the multi-scale decompositions (cf. Section 3.1).

• Scale estimation. A procedure for selecting the set Λ is now required. Given a uniform discretization $\Lambda^0$ of the scale domain in $\mathbb{R}$, the scale selection procedure estimates a subset Λ of scales irregularly distributed in $\Lambda^0$, which explains the profile of x according to a given criterion:

\[ \Lambda^0 = \{\lambda_j^0 = j\delta,\ j = 1, ..., r^0\} \ \leadsto\ \Lambda = \{\lambda^0_{j_1}, ..., \lambda^0_{j_r}\} , \]

where δ is the discretization stepsize. First, the estimate $\hat{\sigma}(\Lambda^0)$ is computed according to (19). To determine r, we perform a diagonalization of the covariance matrix

\[ \widehat{K}^0 = \sum_{j=1}^{r^0} \hat{\sigma}_j^2(\Lambda^0)\, e^{-\lambda_j^0 L} , \tag{20} \]

of which we retain only the r largest eigenvalues $\nu_1 \geq ... \geq \nu_r$ according to the criterion

\[ \frac{\sum_{i=1}^{r} \nu_i}{\sum_{i=1}^{r^0} \nu_i} = 1 - \epsilon , \tag{21} \]

where ε is a positive parameter chosen close to 0, typically ε = 0.01 or 0.025. This criterion is related to that used in Principal Component Analysis [15]. It means that the dispersion of X can be approximately represented by r linearly independent components, with an information loss determined by ε. Finally, from the estimated r, we can select the relevant scales Λ. These scales are associated with the r largest $\hat{\sigma}_j^2(\Lambda^0)$, i.e. the scales whose components are the most involved in the dispersion of X. These scales are denoted $\{\lambda^0_{j_1}, ..., \lambda^0_{j_r}\}$ or more simply $\{\lambda_1, ..., \lambda_r\}$. As a consequence of (21), the estimates $\hat{\sigma}_j^2$ associated with the scales in $\Lambda^0 \setminus \Lambda$ are much lower than those in Λ, and even close to 0. This selection achieves a pruning of non-significant scales.

• Statistical Multi-scale Decomposition. This task concerns the estimation of the r components $X^{(j)}$ of the multi-scale decomposition of X. Component estimation is closely linked to the scale selection problem. Denote by $\widehat{K} U = U D_\nu$ the spectral equation of the previous diagonalization, where $D_\nu$ is the diagonal matrix of the r largest eigenvalues of $\widehat{K}$. Because of (21), assume that the eigenvalues associated with $\Lambda^0 \setminus \Lambda$ are approximately equal to zero. This is especially true when ε is very small. Beyond $r^0$, the eigenvalues $\nu_{r^0+1} \geq ... \geq \nu_n$ are smaller and we can consider that they are all close to 0. In this case, one can write X = U B, where B are the coordinates of X on the eigenvectors U. To estimate the scale components, we take into account the importance of each eigenvalue by using a Bayesian estimation with a prior distribution related to these eigenvalues.
Proposition 1. Given an observation x and the prior distribution $B \sim N(0, D_\nu)$, the Bayesian estimation provides the scale components:

\[ \hat{x}^{(j)} = K_j\, U D_\nu^{-1}\, \hat{b} , \quad \forall j = 1, ..., r , \tag{22} \]

\[ \hat{b} = (\sigma_0^2 D_\nu^{-1} + I_r)^{-1}\, U^\top x . \tag{23} \]

The proof is given in Appendix 5.1. In this proof, if we replace the spectral equation relative to Λ by the equation relative to $\Lambda^0$, $\widehat{K}^0 U^0 = U^0 D_\nu^0$, then Proposition 1 is still valid. This is useful when we do not assume that all the eigenvalues associated with $\Lambda^0 \setminus \Lambda$ are negligible. In this case, the Bayesian estimation is more justified because of the large difference between the values of Λ and $\Lambda^0 \setminus \Lambda$.
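The weight estimation (19) and the eigenvalue criterion (21) can be sketched as follows. This is only an illustration under stated assumptions, not the author's implementation: it relies on NumPy and on `scipy.optimize.minimize` with the L-BFGS-B method and a numerical gradient, it enforces positivity by optimizing over $\log \sigma_j^2$, and it uses the full trace in the denominator of (21). The path graph, the scales and the true weights are hypothetical.

```python
import numpy as np
from scipy.optimize import minimize

def diffusion_kernels(L, scales):
    """kappa_j = exp(-lambda_j * L) for each scale (lambda = 0 gives the identity)."""
    mu, U = np.linalg.eigh(L)
    return [(U * np.exp(-lam * mu)) @ U.T for lam in scales]

def objective(log_sigma2, kernels, x):
    """log|K| + x' K^{-1} x, i.e. minus the expression (18) without its constant."""
    K = sum(np.exp(t) * kappa for t, kappa in zip(log_sigma2, kernels))
    _, logdet = np.linalg.slogdet(K)
    return logdet + x @ np.linalg.solve(K, x)

def estimate_weights(kernels, x):
    """Maximize (18) under sigma > 0; positivity comes from the log parametrization."""
    start = np.zeros(len(kernels))
    bounds = [(-15.0, 15.0)] * len(kernels)   # assumption: keeps K well conditioned
    res = minimize(objective, start, args=(kernels, x),
                   method="L-BFGS-B", bounds=bounds)
    return np.exp(res.x)

def n_retained(K, eps):
    """Smallest r whose r largest eigenvalues carry at least 1 - eps of the total."""
    nu = np.sort(np.linalg.eigvalsh(K))[::-1]
    share = np.cumsum(nu) / nu.sum()
    return min(int(np.searchsorted(share, 1.0 - eps)) + 1, len(nu))

# Hypothetical toy setting: path graph on 20 nodes, lambda_0 = 0 is the noise scale
rng = np.random.default_rng(0)
n = 20
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

scales = [0.0, 1.0, 5.0]
kernels = diffusion_kernels(L, scales)
true_sigma2 = np.array([0.1, 1.0, 1.0])
K_true = sum(s2 * kappa for s2, kappa in zip(true_sigma2, kernels))
x = np.linalg.cholesky(K_true) @ rng.standard_normal(n)

sigma2_hat = estimate_weights(kernels, x)
# Covariance (20): weighted kernels without the noise term
K_hat = sum(s2 * kappa for s2, kappa in zip(sigma2_hat[1:], kernels[1:]))
r = n_retained(K_hat, eps=0.01)
print(np.round(sigma2_hat, 3), r)
```

With a single observation of X the estimated weights can be far from the values used for the simulation, so the sketch should be read as a description of the computation rather than as evidence about its accuracy.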
2.3 Module Detection
Given an undirected graph G and an observation x of the random field, we first compute the estimated scales $\{\lambda_j, j = 1, ..., r\}$ and the associated decomposition (22) $\{x^{(j)}, j = 1, ..., r\}$ as presented in the previous section. Rather than considering the components directly, we consider their spatial variations with respect to the graph Laplacian L:

\[ (L x^{(j)})_v = \sum_{i:\, i \sim v} (x_v^{(j)} - x_i^{(j)}) = d_v\, x_v^{(j)} - \sum_{i:\, i \sim v} x_i^{(j)} , \quad \forall v \in V . \tag{24} \]

This specifies the regularity of each component. $(L x^{(j)})_v$ is all the more positive (resp. negative) as the expression $x_v^{(j)}$ of node v is strongly higher (resp. lower) than that of its neighbors. Therefore we look for the nodes that are most differentially expressed with respect to L, by examining the expression of the components at different scales. Since the amplitude of the variations of $L x^{(j)}$ decreases when the scale increases, a specific normalization is required. As in the case of blob detection (8) on a lattice L, an efficient normalization is $\lambda_j L x^{(j)}$. A scale $\lambda_j$ for which $\lambda_j (L x^{(j)})_v$ is a local extremum with respect to scale and space is seen as reflecting a module at position v and scale $\lambda_j$. This implies the following procedure. For any node v ∈ V, we denote by $N_v^k \subset V$ the relatives of v of order k (k = 1, ..., κ): k = 1 means the nearest neighbors (NN), k = 2 means the nearest neighbors of v to which their own NN are added, etc. The module detection consists in searching for local optima of the components with respect to the neighborhoods as follows³:

\[ \forall v \in V :\ (j(v), k(v), v'(v)) = \underset{j,\, k,\, v' \in N_v^k}{\arg\mathrm{opt}}\ \lambda_j (L x^{(j)})_{v'} ; \quad \text{if } v'(v) = v, \text{ then } v \text{ is a module center at scale } \lambda_{j(v)} . \tag{25} \]

When a module center is detected at v, its area $M_v$ is defined by the subgraph $\{v, N_v^{k(v)}\}$. In the next section, we denote by $\overset{\circ}{V}$ the nodes corresponding to the detected module centers, and therefore the set of detected areas is written as:

\[ \{ M_v ,\ v \in \overset{\circ}{V} \} . \tag{26} \]
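A simplified reading of rule (25) is sketched below in Python with NumPy. Several choices are assumptions made for this illustration and are not specified in the text: the neighborhood order k is fixed by the caller instead of being optimized, "opt" is interpreted as the largest absolute value, and a node is declared a center only if it strictly dominates the other nodes of its neighborhood. The path graph, the two components and the function names are hypothetical.

```python
import numpy as np

def k_hop_neighbors(A, v, k):
    """Indices of the nodes within k edges of v (v itself included)."""
    reached = np.zeros(A.shape[0], dtype=bool)
    reached[v] = True
    for _ in range(k):
        reached = reached | (A[reached].sum(axis=0) > 0)
    return np.flatnonzero(reached)

def detect_module_centers(A, components, scales, k=1):
    """Return {center: (scale, area)} from the normalized responses lambda_j (L x^(j))_v."""
    L = np.diag(A.sum(axis=1)) - A
    response = np.array([lam * (L @ xj) for lam, xj in zip(scales, components)])
    strength = np.abs(response)               # assumption: extremum = largest magnitude
    centers = {}
    for v in range(A.shape[0]):
        area = k_hop_neighbors(A, v, k)
        others = area[area != v]
        j_best = int(np.argmax(strength[:, v]))
        best = strength[j_best, v]
        # strict dominance over the rest of the neighborhood, at every scale
        if best > 0 and (others.size == 0 or best > strength[:, others].max()):
            centers[v] = (scales[j_best], area)
    return centers

# Hypothetical example: path graph on 9 nodes with two scale components
n = 9
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
components = [
    np.array([0, 0, 0, 1, 3, 1, 0, 0, 0], dtype=float),      # sharp bump, fine scale
    np.array([0, 0, 0.5, 1, 1.5, 1, 0.5, 0, 0], dtype=float) # smoother bump, coarse scale
]
scales = [2.0, 4.0]

centers = detect_module_centers(A, components, scales, k=1)
for v, (lam, area) in centers.items():
    print(f"center {v}, scale {lam}, area {area.tolist()}")
```

In this toy case the only detected center is the top of the bump, and it is attributed to the fine scale because the normalized response $\lambda_j (L x^{(j)})_v$ is larger there than at the coarse scale.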
3 Experiments
Recall that a module is an active subgraph denoted $x_M$, where M is a subgraph of G. A regular lattice L does not show any particular structure such as stars or clusters, unlike irregular graphs G. Let us give an example of a graph showing a particular structure. The graph structure is organized around known subgraphs $\{R_k = (V_k, E_k)\}$ called regulons (or hubs). A regulon is a set of nodes $V_k \subset V$ connected to one or several common nodes, called regulators. A regulon can be connected to several regulators and a regulator can be connected to several regulons. Fig. 5 shows a graph with four regulons and four regulators. In practice, such regulons can be used a posteriori for interpreting the detected modules or inferring semantic modules. Depending on the profile of x, the area M of a module can be simply a regulon, a subregulon or the union of several regulons. Fig. 6 shows a short time series {x(1), x(2), x(3)} of a random field X observed on the graph of Fig. 5. The colors depict the output of the scale-space module detection performed on every x(t). Successively, 3, 3 and 2 modules were detected, while the graph is composed of 4 regulons.
3.1 Evaluation on Simulated Data

3.1.1 Simulation Procedure
For phenomena of high complexity, simulated data are an important preliminary support for modeling when we do not have data with sufficient knowledge of the "ground truth". The simulation of the random field X requires specifying the ground truth, consisting of a graph G = (V, E) and the parameters (Λ, σ). In our procedure, G is organized in regulons: $G = \biguplus_{k=1}^{m} R_k$. Here, we assume for simplicity that each regulon $R_k$ is associated with only one regulator $r_k$. The symbol + in $\biguplus$ indicates that the regulons are mutually connected. This high-level connection is equivalent to a graph $G_B$ between the regulators: $G_B = (\{r_k\}, E_r)$.

The simulation is done in two steps. First, a graph G consisting of m regulons is simulated as described in Appendix 5.2. Second, a sample x of $X \sim N(0, \bar{K}_{\Lambda,\sigma})$ is drawn. To do that, we simply simulate $\alpha \sim N(0, I_n)$, since the diagonalization $\bar{K}_{\Lambda,\sigma} = V D V^\top$ implies $V D^{1/2} \alpha \sim N(0, \bar{K}_{\Lambda,\sigma})$.

³ $v'(v)$ also depends on k, which is omitted to simplify the notation.

3.1.2 Simulated Data
Fig. 8(a) shows the inter-regulon graph $G_B$ and Fig. 8(b) the graph G. Each regulon has its own color. Fig. 8(c) displays an observation x of the random field X on G, and Fig. 9 shows its 1-D profile. In this experiment, each regulon is a potential module since the simulation procedure is based on a regulon structure. x was simulated using the multi-scale kernel (16) with 3 scales: $\lambda_1 = 8$, $\lambda_2 = 14$, $\lambda_3 = 24$ and $\sigma_1 = \sigma_2 = \sigma_3 = 1$. Although the theoretical mean of X is zero, the mean of each observed regulon in Fig. 9 is not zero. However, due to the correlation between regulators, two regulons may have similar mean levels. This situation is favorable to the concept of module. This is consistent with Fig. 6, in which 3 modules are detected for 4 regulons. Note that in the absence of an observation x, spectral partitioning, as recalled in the Introduction, detects 4 modules corresponding to the 4 regulons.

3.1.3 Data Analysis
The estimation and detection tasks are illustrated in Figures 8, 9, 10 and 11. The maximum likelihood estimation (19) was performed using the scale domain $\Lambda_0 = \{2k, k = 0, 1, ..., 15\}$. Fig.10 displays the statistical multi-scale decomposition. The continuous black line connecting the data points is the sum $\sum_{k=1}^{15} \hat{x}^{(2k)}$ of all the components except the noise component $\hat{x}^{(0)}$. Since this line interpolates the data points, the estimated noise component is very low. The selected $\Lambda$ is computed from this decomposition using (21) with its tuning parameter set to 0.1. The three main components, associated with $\{\lambda_1 = 8, \lambda_2 = 10, \lambda_3 = 30\}$, are shown in Fig.9. It is interesting to compare this statistical decomposition with the ordinary scale-space decomposition (12) shown in Fig.11. The statistical decomposition focuses more clearly on the spectral content of x: $\lambda_3 = 30$ reflects low frequencies whereas $\lambda_2 = 10$ contributes to high frequencies. Above all, it remedies the redundancy of the ordinary scale-space representation, and therefore favors the identification of the right scales. The detected modules shown in Fig.8(d) correspond to the rule (25). In Fig.9, the locations of the detected extrema are indicated by red circles. There is exactly one detected module per regulon.

This procedure is statistically assessed by Monte Carlo simulation. The random field X is simulated 200 times under the same conditions as above. From the obtained samples $\{x(\ell), \ell = 1, ..., 200\}$, the probabilities of the number of selected scales $\{P(|\Lambda| = k)\}_{k=1}^{8}$ were estimated with the tuning parameter of (21) set to 0.01, as well as the probabilities of the number of detected module centers $\{P(|\mathring{V}| = k)\}_{k=1}^{5}$, as shown in Fig. 1. The number of detected modules is random, with a main mode at $|\mathring{V}| = 5$. In fact, as noted above, the mean levels of two regulons may be quite close, and these regulons may therefore be recognized as belonging to the same module if they are connected.

Figure 1: (a) Histogram of the empirical probabilities $\{P(|\Lambda| = k)\}_{k=1}^{8}$ of the number of selected scales, (b) Histogram of the empirical probabilities $\{P(|\mathring{V}| = k)\}_{k=1}^{5}$ of the number of detected modules.

Above, we have mentioned the prominent role of the term $\log |\bar{K}_{\sigma,\Lambda}|$ in the likelihood. This is confirmed experimentally. The estimation-detection procedure was repeated without this term on the same data as previously. The results are shown in Fig.12. The multiscale decomposition is then quite inaccurate: all weights are very high and the scale components are close to zero.
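The Monte Carlo assessment described above reduces, once the detection has been run on each simulated field, to tabulating how often each count occurs. The following is a minimal sketch of that tabulation only. The function name `empirical_distribution` and the list `n_modules` are illustrative assumptions, and the counts are made-up values, not results from the paper.

```python
from collections import Counter


def empirical_distribution(counts, k_max):
    """Empirical probabilities P(count = k) for k = 1..k_max.

    `counts` holds one integer per Monte Carlo run, for example the
    number of detected modules in that run.
    """
    tally = Counter(counts)
    n_runs = len(counts)
    return {k: tally.get(k, 0) / n_runs for k in range(1, k_max + 1)}


# Made-up counts standing in for the outcome of 8 simulated runs.
n_modules = [5, 5, 4, 5, 3, 5, 4, 5]
probs = empirical_distribution(n_modules, k_max=5)

# The main mode is the count with the highest empirical probability.
mode = max(probs, key=probs.get)
```

The same function applies to the number of selected scales by passing the per-run values of $|\Lambda|$ and `k_max=8`.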
3.2 Bacillus Subtilis Data
Fig.4 illustrates the multi-scale decomposition of a field x that represents gene expressions of Bacillus subtilis. The underlying graph G = (V, E) comes from the regulatory network of the bacterium: V denotes the genes, E the connections between genes, and x the gene expressions on V (Fig.4-a; see footnote 4). The entire graph contains 1607 genes, 2345 edges and 132 regulons. Fig.5 displays four connected regulons extracted from this network. In steady state, gene expressions are assumed to be governed by the model (17). In Fig.4 we see the structuring effect of the method in terms of gene grouping, as had already been shown for other regulatory networks [11].

In many applications, one is interested in studying the change of modules across different conditions [22]. In our example, the expression of the regulons depends on the nutritional environment of the bacteria over time: some of them are over-expressed and others are under-expressed. With module detection, we seek to identify regions of the graph that are particularly expressed through time. Fig.6 illustrates this detection on a short time series of the random field observed at 3 time points. The detection has been done at every time t, independently of the others. Every detected module is composed of one or several regulons. For instance, at time t = 1, there are 3 detected modules, which are respectively depicted in green, yellow and pink. The green module is composed of two regulons, which can help the semantic interpretation of the module from the properties of its regulons. The validation is primarily based on biological aspects. In the considered experiment, one examines the effects of a nutrient change: an experimental population of cells grows first in Glucose and then Malate is injected at time t0 such that 1 ≤ t0 ≤ 11. The detected modules should reflect this change. A further biological analysis is beyond the scope of this article; see [4, 7].

4 The biological network has been simplified by removing the protein level network; therefore, in G, the regulatory protein-encoding genes and their proteins are not distinguished. Furthermore, the edge directions in E have been deleted. Consequently, we cannot speak strictly of regulation in the sense of regulatory networks.

Figure 2: A toy example with three active regulons: two over-expressed and one under-expressed. (a) $\hat{\mu}$ in blue. Confidence band: $\{\mu^{+}_{R_k}\}_{k=1}^{3}$ in green, $\{\mu^{-}_{R_k}\}_{k=1}^{3}$ in red. (b) A particular configuration $\tilde{x}(\ell) = \{\mu^{+}_{R_1}, \mu^{-}_{R_2}, \mu^{+}_{R_3}\}$.

However, a diagnostic tool is now proposed to help with this analysis. The idea is to generate configurations of X using a confidence band around the obtained decomposition
$$\hat{\mu} = \sum_{j \in \Lambda} \hat{x}^{(j)},$$
in order to quantify the stability of the detected modules. We start with the bootstrap confidence interval described in [38], which we recall. As a result of the decomposition, the estimated residuals are $\hat{x}^{(0)} = x - \hat{\mu}$, whose empirical variance is $\hat{\sigma}_0^2 = \|\hat{x}^{(0)}\|^2 / n$. Derived from (15), a generative model based on $\hat{\mu}$ is written as:

$$X = \hat{\mu} + X^{(0)},$$

where the $X^{(0)}_i$ are independent Gaussian random variables $LG(0, \hat{\sigma}_0^2)$. Generating $X^{(0)}$ allows X to be simulated, which is now distributed according to $N(\hat{\mu}, \hat{\sigma}_0^2 I_n)$. Pretending that $\hat{\mu}$ is the "true" $\mu$, generate N bootstrap samples $\{x(\ell)\}_{\ell=1}^{N}$, and compute their respective smooth profiles $\{\hat{\mu}(\ell)\}_{\ell=1}^{N}$. Using these samples, for every node i ∈ V, and for a given confidence γ close to one, a confidence interval of $\mu_i$ denoted $[\mu_i^{-}, \mu_i^{+}]$ is estimated, which provides a confidence band $[\mu^{-}, \mu^{+}]$ as detailed in [38].

Our validation uses this confidence band to generate configurations. Denote by $\{\mu^{-}_{R_k}\}$ and $\{\mu^{+}_{R_k}\}$ the two confidence profiles viewed from the regulons. To generate a new configuration $\tilde{x}$, we draw randomly, for each regulon $R_k$, between $\mu^{-}_{R_k}$ and $\mu^{+}_{R_k}$, as illustrated in Fig. 2. Repeating this process N times, we obtain new samples $\{\tilde{x}(\ell)\}_{\ell=1}^{N}$, on which module detection is performed. Finally, among the N detected fields $\{\{M_v(\ell), v \in \mathring{V}(\ell)\}, \ell = 1, ..., N\}$, we compute the proportion of fields that fit with the field $\{M_v, v \in \mathring{V}\}$ obtained on the original x, cf. (26). This proportion is associated with the confidence γ, providing a quantitative diagnostic tool.
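The diagnostic described above can be sketched in three small functions: a bootstrap band, a random configuration drawn from that band, and the proportion of configurations whose detection agrees with the reference. This is a sketch under stated assumptions, not the author's implementation. The callables `smooth` (multi-scale smoothing returning a profile) and `detect` (module detection) are hypothetical placeholders. The percentile interval is one possible choice among those discussed in [38]. Drawing either the lower or the upper profile per regulon follows the reading suggested by Fig. 2(b). Plain equality stands in for the matching criterion (26), which is not reproduced here.

```python
import random


def bootstrap_band(mu_hat, sigma0, smooth, n_boot=200, gamma=0.95, seed=0):
    """Percentile confidence band [mu_minus, mu_plus] around mu_hat."""
    rng = random.Random(seed)
    profiles = []
    for _ in range(n_boot):
        # Bootstrap sample x(l) = mu_hat + Gaussian noise, then its smooth profile.
        x_l = [m + rng.gauss(0.0, sigma0) for m in mu_hat]
        profiles.append(smooth(x_l))
    lo_idx = int((1.0 - gamma) / 2.0 * (n_boot - 1))
    hi_idx = (n_boot - 1) - lo_idx
    mu_minus, mu_plus = [], []
    for i in range(len(mu_hat)):
        values = sorted(profile[i] for profile in profiles)
        mu_minus.append(values[lo_idx])
        mu_plus.append(values[hi_idx])
    return mu_minus, mu_plus


def draw_configuration(mu_hat, mu_minus, mu_plus, regulons, rng):
    """For each regulon, pick the lower or the upper profile at random."""
    x_new = list(mu_hat)  # nodes outside every regulon keep mu_hat
    for nodes in regulons:
        side = rng.choice((mu_minus, mu_plus))
        for i in nodes:
            x_new[i] = side[i]
    return x_new


def stability(reference, mu_hat, band, regulons, detect, n_draws=200, seed=1):
    """Proportion of drawn configurations whose detection equals `reference`."""
    rng = random.Random(seed)
    mu_minus, mu_plus = band
    hits = 0
    for _ in range(n_draws):
        x_new = draw_configuration(mu_hat, mu_minus, mu_plus, regulons, rng)
        hits += detect(x_new) == reference
    return hits / n_draws
```

A typical call would compute `band = bootstrap_band(mu_hat, sigma0, smooth)` once and then `stability(detect(x), mu_hat, band, regulons, detect)`, where `regulons` is a list of node-index lists.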
4 Conclusion
The experiments show that module detection highlights the activated modules and therefore provides a means to study dynamic random fields. However, module detection on time series has been performed without taking time dependence into account: at every time t, the observation of the random field has been treated independently of the others. Nevertheless, it is well known that Markovian dependence can improve the sensitivity of the detection of isolated weak signals. In the related paper [6], we present a Markovian spatio-temporal modeling that generalizes the present model. With such a model, the probability $\hat{P}(|\mathring{V}| = 5)$ in Fig.1 should be higher.
This paper proposes and implements a multi-scale graphical modeling for univariate random vectors observed on an undirected graph. The result is a multi-scale decomposition of the random field, which provides an analysis tool for specific treatments because it allows relevant scales to be selected. This tool is used here for module detection. With hindsight, this detector seems relatively simple. However, emphasis has been put on a coherent modeling without heuristics and with very few tunable parameters.
5 Appendix
5.1 Proof of Proposition 1
Let $X^0 = X - X^{(0)} = \sum_{j=1}^{r} X^{(j)}$, and recall the spectral equation $K^0 U = U D_\nu$ where $K^0 = \sum_{j=1}^{r} K_j$. First, since the columns of $U$ are independent, we can write $X^0 = UB$ where $B$ is an $r$-random vector. Then, the spectral equation allows us to rewrite

$$X^0 = UB = K^0 U D_\nu^{-1} B = \sum_{j=1}^{r} K_j U D_\nu^{-1} B = \sum_{j=1}^{r} X^{(j)}, \qquad (27)$$

where $X^{(j)} = K_j U D_\nu^{-1} B$, which provides the components (22). Second, $X^0 = UB$ implies the covariance matrix

$$\mathrm{Cov}(B) = U^\top K^0 U = D_\nu. \qquad (28)$$

For a given observation x, the Bayesian estimation of the occurrence of B consists in maximizing the log-posterior $\log p(b \mid x) = \log p(x \mid b) + \log p(b) + \mathrm{Cte}$. Given the Gaussian laws $B \sim N(0, D_\nu)$ and $X^{(0)} \sim N(0, \sigma_0^2 I_n)$, this amounts to computing

$$\hat{b} = \arg\max_{b} \left\{ -\frac{1}{\sigma_0^2} \|x - Ub\|^2 - b^\top D_\nu^{-1} b \right\}. \qquad (29)$$
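Criterion (29) is a concave quadratic in $b$, so its maximizer has a closed form. The sketch below assumes that the columns of $U$ are orthonormal ($U^\top U = I$), which is the situation in which (28) holds. Setting the gradient to zero then gives $\hat{b}_k = \frac{\nu_k}{\nu_k + \sigma_0^2} (U^\top x)_k$. The function names and the small numerical example are illustrative assumptions, not taken from the paper.

```python
import math


def ridge_coefficients(x, U, nu, sigma0_sq):
    """Maximizer of -(1/sigma0^2) * ||x - U b||^2 - b' D_nu^{-1} b, assuming U'U = I.

    x: list of n observations; U: list of n rows of length r;
    nu: the r positive diagonal entries of D_nu.
    """
    r = len(nu)
    # Project the observation on the columns of U: z = U' x.
    z = [sum(U[i][k] * x[i] for i in range(len(x))) for k in range(r)]
    # Each coordinate is shrunk toward zero by the factor nu_k / (nu_k + sigma0^2).
    return [nu[k] / (nu[k] + sigma0_sq) * z[k] for k in range(r)]


def criterion(b, x, U, nu, sigma0_sq):
    """Value of the criterion maximized in (29), evaluated at b."""
    residual = [x[i] - sum(U[i][k] * b[k] for k in range(len(b)))
                for i in range(len(x))]
    fit = sum(e * e for e in residual) / sigma0_sq
    penalty = sum(b[k] ** 2 / nu[k] for k in range(len(b)))
    return -fit - penalty


# Illustrative example: two orthonormal columns in R^3.
s = 1.0 / math.sqrt(2.0)
U = [[s, s], [s, -s], [0.0, 0.0]]
x = [1.0, 3.0, 0.5]
nu = [4.0, 1.0]
b_hat = ridge_coefficients(x, U, nu, sigma0_sq=1.0)
```

Large $\nu_k$ relative to $\sigma_0^2$ leave the corresponding coordinate almost untouched, while small $\nu_k$ shrink it strongly, which is the usual behavior of a ridge penalty.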
This provides the expression of $\hat{b}$ in (23). Note that (29) is similar to the criterion of ridge regression [15].

5.2 Graph simulation
The simulation of a graph G consisting of m regulons, $G = \bigcup_{k=1}^{m} R_k$, is done in three steps.

1. For each set of nodes $V_k$ making up a regulon, a regulon graph $R_k = (V_k, E_k)$ is simulated.

2. At a larger scale, the m regulons are considered as m nodes of a graph, and thus an inter-regulon graph $G_B$ is simulated.

3. The global graph G is obtained on the basis of these m + 1 graphs, as follows. For each regulon $R_k$, a regulator $r_k$ is drawn uniformly at random in this regulon. This regulator regulates the regulon(s) $R_{k'}$ such that $k' \sim k$ in $G_B$ (see footnote 5). The weights of the connections between $r_k$ and the nodes v in $R_{k'}$ are given by the probabilities of the Binomial law $B(|R_{k'}|, p)$ where 0 < p < 1. When a weight is below a threshold τ, for example 0.05, it is set to zero, and the probability distribution is then renormalized. By tuning p and τ, one can modulate the number of edges between the regulator and the regulated unit. In this case, $r_k$ regulates a subset of nodes in $R_{k'}$.
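The three steps above can be sketched as follows. Several choices here are assumptions made only for illustration, since the text does not fix them: regulon graphs and the inter-regulon graph are drawn with independent edges of probability `q_in` and `q_between`, intra-regulon edges get weight 1, and the Binomial probabilities are taken at the values 1, ..., $|R_{k'}|$ so that there is one weight per node. All function and parameter names are hypothetical.

```python
import math
import random


def binomial_weights(size, p, tau):
    """Binomial(size, p) probabilities at 1..size, thresholded at tau and renormalized."""
    pmf = [math.comb(size, j) * p ** j * (1 - p) ** (size - j)
           for j in range(1, size + 1)]
    kept = [w if w >= tau else 0.0 for w in pmf]
    total = sum(kept)
    return [w / total for w in kept] if total > 0 else kept


def random_edges(nodes, q, rng):
    """Each pair of nodes is linked independently with probability q (assumed model)."""
    return [(u, v) for i, u in enumerate(nodes)
            for v in nodes[i + 1:] if rng.random() < q]


def simulate_graph(sizes, q_in, q_between, p, tau, seed=0):
    rng = random.Random(seed)
    regulons, weights, start = [], {}, 0
    # Step 1: one random graph per regulon.
    for size in sizes:
        nodes = list(range(start, start + size))
        start += size
        regulons.append(nodes)
        for edge in random_edges(nodes, q_in, rng):
            weights[edge] = 1.0
    # Step 2: inter-regulon graph whose nodes are the regulon indices.
    inter = random_edges(list(range(len(sizes))), q_between, rng)
    # Step 3: one regulator per regulon, linked to the neighboring regulons.
    regulators = [rng.choice(nodes) for nodes in regulons]
    for k, k2 in inter:
        for src, dst in ((k, k2), (k2, k)):
            node_weights = binomial_weights(len(regulons[dst]), p, tau)
            for v, w in zip(regulons[dst], node_weights):
                if w > 0:  # zero weight means no edge
                    weights[(regulators[src], v)] = w
    return regulons, regulators, inter, weights
```

For example, `binomial_weights(10, 0.5, 0.05)` keeps only the five central probabilities, so a regulator would be connected to five of the ten nodes; lowering τ or changing p widens or shifts that subset.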
Acknowledgments
The author gratefully thanks the referees, whose comments have improved the manuscript. The author is grateful to Alain Trouvé and Yong Yu for the experience we shared on the multiscale decomposition of images, which has been an inspiration. The author warmly thanks Benno Schwikowski for valuable discussions about the adaptation of Bacillus subtilis to nutritional environments, and Xiaoyi Chen for her help in carrying out experiments on Bacillus subtilis data.

5 A node v ∈ V is regulated by another node v′ if $X_v$ is significantly correlated to $X_{v'}$.
References

[1] Yong-Yeol Ahn, James P. Bagrow, and Sune Lehmann. Link communities reveal multiscale complexity in networks. Nature, 466(5):761–764, 2010.
[2] Mikhail Belkin and Partha Niyogi. Semi-supervised learning on Riemannian manifolds. Machine Learning, 56:209–239, 2004.
[3] Andries E. Brouwer and Willem H. Haemers. Spectra of graphs. Springer, 2011.
[4] Joerg Martin Buescher et al. Global network reorganization during dynamic adaptations of Bacillus subtilis metabolism. Science, 335(6072):1099–1103, 2012.
[5] Bernard Chalmond. Modeling and Inverse Problems in Image Analysis. Springer-Verlag, 2003.
[6] Bernard Chalmond. Spatio-temporal graphical modeling with innovations based on multi-scale diffusion kernel. Spatial Statistics, 7:40–61, 2014.
[7] Bernard Chalmond and Xiaoyi Chen. A graphical modeling to scan network activity at modular level. Technical report, Institut Pasteur / Cergy-Pontoise University, 2012.
[8] Bernard Chalmond, Benjamin Francesconi, and Stephane Herbin. Using hidden scale for salient object detection. IEEE Trans. on Image Processing, 15(9):2644–2656, 2006.
[9] Li Chen, Jianhua Xuan, Rebecca B. Riggins, Yue Wang, and Robert Clarke. Identifying protein interaction subnetworks by a bagging Markov random field-based method. Nucleic Acids Research, 41(2), 2012.
[10] Timothé Cour, Florence Bénézite, and Jianbo Shi. Spectral segmentation with multiscale graph decomposition. In CVPR, 2005.
[11] Guro Dorum, Lars Snipen, Margrete Solheim, and Solve Saebo. Smoothing gene expression data with network information improves consistency of regulated genes. Statistical Applications in Genetics and Molecular Biology, 10(1), 2011.
[12] Marco A.R. Ferreira and Herbert K.H. Lee. Multiscale Modeling: A Bayesian Perspective. Springer, 2007.
[13] Carlo Gaetan and Xavier Guyon. Spatial Statistics and Modeling. Springer-Verlag, 2009.
[14] T.J. Hastie and R.J. Tibshirani. Generalized Additive Models. Chapman and Hall/CRC, 1999.
[15] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The elements of statistical learning. Springer, 2009.
[16] Lasse Holmstrom, Leena Pasanen, Reinhard Furrer, and Stephan R. Sain. Scale space multiresolution analysis of random signals. Computational Statistics and Data Analysis, 55:2840–2855, 2011.
[17] Trey Ideker, Owen Ozier, Benno Schwikowski, and Andrew F. Siegel. Discovering regulatory and signalling circuits in molecular interaction networks. Bioinformatics, 18(1):S233–S240, 2002.
[18] Harri Kiiveri and Frank de Hoog. Fitting very large sparse Gaussian graphical models. Computational Statistics and Data Analysis, 56:2626–2636, 2012.
[19] Eric D. Kolaczyk. Statistical Analysis of Network Data: Methods and Models. Springer, 2009.
[20] Risi Imre Kondor and John Lafferty. Diffusion kernels on graphs and other discrete input spaces. In Morgan Kaufmann, editor, International Conference on Machine Learning, pages 315–322, 2002.
[21] Gert R. G. Lanckriet, Tijl De Bie, Nello Cristianini, Michael I. Jordan, and William Stafford Noble. A statistical framework for genomic data fusion. Bioinformatics, 20(16):2626–2635, 2004.
[22] Peter Langfelder, Rui Luo, Michael C. Oldham, and Steve Horvath. Is my network module preserved and reproducible? PLoS Computational Biology, 7(1), 2011.
[23] Ann B. Lee and Larry Wasserman. Spectral connectivity analysis. Journal of the American Statistical Association, 105, 2010.
[24] Tony Lindeberg. Feature detection with automatic scale selection. International Journal of Computer Vision, 30(2):77–116, 1998.
[25] Bojan Mohar. Some applications of Laplace eigenvalues of graphs. In G. Hahn and G. Sabidussi, editors, Graph Symmetry: Algebraic Methods and Applications, volume Ser. C 497, pages 225–275. Kluwer, 1997.
[26] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices. Physical Review, E(74), 2006.
[27] Noa Novershtern, Aviv Regev, and Nir Friedman. Physical module networks: an integrative approach for reconstructing transcription regulation. Bioinformatics, 2011.
[28] Jianbo Shi and Jitendra Malik. Normalized cuts and image segmentation. IEEE Trans. on Pattern Analysis and Machine Intelligence, 22(8):888–905, 2000.
[29] Ihor Smal, Marco Loog, Wiro Niessen, and Erik Meijering. Quantitative comparison of spot detection methods in fluorescence microscopy. IEEE Trans. on Medical Imaging, 29(2):282–301, 2010.
[30] Daniel A. Spielman and Shang-Hua Teng. Spectral partitioning works: planar graphs and finite element meshes. In IEEE 1996, editor, 37th Symposium on Foundations of Computer Science, pages 96–105, 1996.
[31] Jian Sun, Maks Ovsjanikov, and Leonidas Guibas. A concise and provably informative multi-scale signature based on heat diffusion. In Eurographics Symposium on Geometry Processing, volume 28. Blackwell Publishing, 2009.
[32] Liang Sun, Shuiwang Ji, and Jieping Ye. Adaptive diffusion kernel learning from biological networks for protein function prediction. BMC Bioinformatics, 9(162), 2008.
[33] Richard Szeliski. Computer Vision: Algorithms and Applications. Springer, 2010.
[34] Kevin Thon, Havard Rue, Stein Olav Skrovseth, and Fred Godtliebsen. Bayesian multiscale analysis of images modeled as Gaussian Markov random fields. Computational Statistics and Data Analysis, 56:49–61, 2012.
[35] Igor Ulitsky and Ron Shamir. Identification of functional modules using network topology and high-throughput data. BMC Systems Biology, 1(8), 2007.
[36] Ulrike von Luxburg. A tutorial on spectral clustering. Statistics and Computing, 17(4), 2007.
[37] Grace Wahba. Spline models for observational data. SIAM, 1990.
[38] Yuedong Wang and Grace Wahba. Bootstrap confidence intervals for smoothing splines and their comparison to Bayesian 'confidence intervals'. J. Statist. Comput. Simulation, 51:263–279, 1994.
[39] Fan Zhang and Edwin R. Hancock. Graph spectral image smoothing using the heat kernel. Pattern Recognition, 41:3328–3342, 2008.

Figure 3: Scale-space blob detection on an image observed on a regular grid.
[Figure residue: network drawings with gene-name node labels, panels (a), (b) and (c); plots titled "data X", "component at scale 2", "component at scale 3", "component at scale 4" and "component at scale 5".]
perR
yknY
ybbA
comGG comEA
−5
narG mrgA
katA
hemC
fnr
exuR
yoaW
yxzE
yknZ
feuB pucA pucFpucH pucE
zosA
ahpC
scoA
ypuD
rsiW ybfO
pspA
ymzB yvgO yxaB
radC
yqhQ
hemA
yjmC exuT yxjC mmgA mmgC
yjmD uxuA
phoP
ytxG yvyD ytxH
yxjJ
fapR ycnJ
mmgB
ybfM
mcsA
feuC
fabI
hemL
uxaA scoB
rsbV rsbW rsbX
ydjM tagD yjeA
fur
fabHA fabHB
pucD pucC pucB
uxaB
pbpX
tagB ykvT cwlO iseA
acpA
ywjB
narI
hemD
guaD fhuD
clpC
yfmC yclP
yjbC
yfhA
fhuC
yfhC
fhuG
1600
sigM
phoD
fhuB yfmD
yclO yxeB
ykuN
yfiY ykuP yhfQ ycgT yclQ yclN ykuO
yrhH
tuaC
tuaB
tatCD
yfmF yfiZ
yfmE
csoR
1400
0
spoIIID
divIC
ywbO
csbB pstS
t
yotD
copA
1200
bltD
ylxX
tatAD
pstA pstC
pstBB
600 800 1000 component at scale1
5
ymfH ymfF ydfK ymfD
ylxW
murBsbp metA
pstBA copZ
400
yndA
mta
spoIID spoIIIAG
divIB
200
yqfC
coxA spoIIIAD
spoIIIAH bofA spoVE spoIIIAE
yesK
ycgR ycgQ
murF
bcrC
tuaE tuaD
yokU
mreD
ytpB
ytpA
fabF
spsK spoIVCA murGspoVD cotJA spoIIIAA spoIIIAC spoIIIAF yesJ
sigE ypbG
ydaH
phoB ydhF
yfkM
ykvI
yhaX spoIIIAB yhaL coaX
yhdK
ywtF
ywaC
yqjL yceC
disA
ctsR
yodT
ysnD
kamA cotO yhdL
ypuA
ftsH
sigW yceE
ispDyceH
sigI ykuT
radA
ysxC
plsC
spsJcwlJ ytvI
yjcA
bmrR hprT
secDF bmr
yceD yceG yceF yacL
sigB
rnr
ywmF trxA
spoVK yyaD yobW
yodP
yjfA
ddl
tilS
ysdB
gsiB
spoVB
yngI
yfhL
rsbRD
yflH
ysnF
ylxP
yoaA
clpP
clpXlonA
purF
pbuO
glgA
ysxE prkA
yodR
yfkT
yfhD
cypC
yfkS
cotE
dacB
yngE
mbl glgD
yabQ
yxbG yxnA
csbA
ypuB ydeC ctc yhdF yocB
ydaT
ypqA ispG
glgP
spmB
ykgA
yxiS mgsR clpE
gerM
spoVID
yabR
spoIIM safA spoIVFA
ispF
nadE
ydaS
yhxD
ygxB
purE
nusB
purR
spoVR
yqxA
yngF glgB
yjaV asnO
5 0 −5
yngJ
yteV nucB
yodS
yocL spoVMspoIVA
yeaA
ytkL
csbC
yxzF ywiE
yhdN
yraA
aag folD
xpt
yabJ
purN
purH
yybI yhbH sqhC ylbJ spoIVFBykvUglgC yhxC
yuzC
yqeZ
yqfB
yoxB
nhaX purB
purC purL
purS pbuX glyA purM
yitD
ytxC
ydbT
yteJ
yozO pbpE
racX
opuE yaaI
yfkH
aldY
katE
ykgB
ywlB
ydaD
ohrB
purQ
purA pbuG
purDpurK
yoaG yqfA sppA xpaC yuaI
yvrE
ykzI
yjgC ywsB
ydcA
mreBH rsgI ybfQ
ydbS
yycD
ydaG
yflT
yckC
yvlB ythQ yvlD
yvaK
ywjC
yfkD
ywzA
yfkI
sodA
yoxC ydhK
yfhE
ydaP
yfhK
era
yhcO
yvlA
yfhF yjgB
ycbP
yobJ yjoB
yvlC ythP
ydaE
yugU
dps
yjgD
gspA
yjzE
yfkJ yxkO gtaB
ytaB
yitT yocK ybyB ywmE
yuaF yaaNydjO ywrE yoaF fosB
gabD
ycdF csbD yerD ydaF
ycdG
ydbD
yjcP hemAT mcpA
motB
cheV degR
pgdS
flhO tlpB mcpBlytD
tlpC fliT
argF carB argC
argG
argJ
carA
argB argH
(d)
Figure 4: Multi-scale decomposition of the gene expression network of Bacillus subtilis (from [7]). (a) Original data. (b) Multi-scale decomposition profiles (1-D display). (c-d) Scale components for λ = 2 and λ = 16, respectively. The decomposition has a structuring effect in terms of gene grouping. Despite the use of false colors, it is difficult to distinguish modules, unlike the case of an ordinary image as in Fig. 3.
[Figure 5 (image not reproduced): graph display with nodes labelled by index; four nodes also carry gene names: 36 (codY), 37 (comK), 176 (degU), 199 (rok).]
Figure 5: Graph partitioning of G: this graph shows four regulons and four regulators.
[Figure 6 (image not reproduced): graph displays with nodes labelled by index.]
Figure 6: Module detection at three time points, performed without taking time dependence into account. The detection yields 3, 3 and 2 modules for x(t), t = 1, 2, 3, respectively.
[Figure 7 (image not reproduced): two graph displays, panels (a) and (b), with nodes labelled by index.]
Figure 7: Same configuration {x, G} as in Fig. 8(c), shown with two different displays: (a) "Edge weighted spring embedded" layout, (b) hierarchical layout.
[Figure 8 (image not reproduced): four graph displays, panels (a) to (d), with nodes labelled by index and group labels M1 to M5.]
Figure 8: Experiment on simulated data. (a) The regulator graph GB (between regulons). (b) The entire regulon graph G. (c) The observed random field x (displayed with "Force directed" layout). (d) Module detection outcome: there is one detected module per regulon. Given the knowledge of the regulons, we can associate a regulon to each detected module.
[Figure 9 (image not reproduced): 1-D plot, horizontal axis from 0 to 60, vertical axis from −2 to 1.5. Legend: lambda = 2, 4, ..., 30, with displayed weights w(8) = 1.38, w(10) = 4.39, w(30) = 1.45.]
Figure 9: Module detection based on the multi-scale decomposition of Fig. 10 restricted to Λ. The pink crosses show the 1-D profile of x. The colored curves display the three main components, for scales {8, 10, 30}. The red circles mark the locations of the detected module centers. The colored segments at the bottom of the figure locate the regulons; their colors are identical to those in Fig. 8(a-b-d).
[Figure 10 (image not reproduced): 1-D plot, horizontal axis from 0 to 60, vertical axis from −2 to 1.5. Legend: lambda = 2, 4, ..., 30, with weights w(2) = 0.01, w(4) = 0.01, w(6) = 0.63, w(8) = 1.38, w(10) = 4.39, w(12) = 0.72, w(14) = 0.37, w(16) = 0.01, w(18) = 0.27, w(20) = 0.01, w(22) = 0.01, w(24) = 1.31, w(26) = 0.01, w(28) = 0.39, w(30) = 1.45.]
Figure 10: Statistical multi-scale decomposition for Λ0 = {2, 4, ..., 28, 30}. Scale selection with parameter value 0.1 selects Λ = {8, 10, 24, 30}. The black curve is the sum of all Λ0-components.
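The selection rule itself is not stated in this caption. As an illustration only, the sketch below assumes a rule that keeps the scales whose normalized weight (weight divided by the sum of all weights) exceeds 0.1. The function name `select_scales` and this normalization are assumptions, not the paper's definition; the weights are those shown in the legend of Fig. 10. Under this assumed rule, the selected set coincides with the Λ reported in the caption.

```python
# Weights w(lambda) as displayed in the legend of Fig. 10, keyed by scale.
weights = {
    2: 0.01, 4: 0.01, 6: 0.63, 8: 1.38, 10: 4.39,
    12: 0.72, 14: 0.37, 16: 0.01, 18: 0.27, 20: 0.01,
    22: 0.01, 24: 1.31, 26: 0.01, 28: 0.39, 30: 1.45,
}


def select_scales(weights, threshold):
    """Keep scales whose normalized weight exceeds `threshold`.

    Hypothetical rule: the normalization by the total weight is an
    assumption made for this sketch, not taken from the paper.
    """
    total = sum(weights.values())
    return sorted(lam for lam, w in weights.items() if w / total > threshold)


selected = select_scales(weights, threshold=0.1)
print(selected)  # [8, 10, 24, 30]
```

With a threshold applied to the raw weights instead, scales such as 6 and 12 would also be kept, which is why the normalized form is used here.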
[Figure 11 (image not reproduced): 1-D plot, horizontal axis from 0 to 60, vertical axis from −2 to 1. Legend: lambda = 2, 4, ..., 30.]
Figure 11: Ordinary scale-space decomposition.
[Figure 12 (image not reproduced): 1-D plot, horizontal axis from 0 to 60, vertical axis from −2 to 1.5. Legend: lambda = 2, 4, ..., 30, with weights w(2) = 1080, w(4) = 930, w(6) = 825, w(8) = 750, w(10) = 695, w(12) = 653, w(14) = 621, w(16) = 596, w(18) = 576, w(20) = 559, w(22) = 545, w(24) = 534, w(26) = 523, w(28) = 515, w(30) = 507.]
Figure 12: Adverse effect of an estimation made without $\log |\bar{K}_{\sigma,\Lambda}|$. All scales make nearly equal contributions and carry no informative value.
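One general reason the log-determinant matters in a Gaussian likelihood can be checked numerically. Up to constants, the negative log-likelihood of a zero-mean Gaussian with covariance K is x^T K^{-1} x + log |K|. The quadratic term alone decreases whenever K is scaled up, so it cannot constrain the overall size of the weights; the log-determinant provides the counterweight. The sketch below uses a generic random covariance matrix, not the paper's kernel, and the helper name `objective` is hypothetical. Whether this is the mechanism behind Fig. 12 is an assumption.

```python
import numpy as np


def objective(x, K, with_logdet=True):
    """Return x^T K^{-1} x, plus log|K| when with_logdet is True."""
    quad = float(x @ np.linalg.solve(K, x))
    if not with_logdet:
        return quad
    _, logdet = np.linalg.slogdet(K)
    return quad + float(logdet)


rng = np.random.default_rng(0)
n = 6
A = rng.standard_normal((n, n))
K = A @ A.T + n * np.eye(n)   # generic symmetric positive definite matrix
x = rng.standard_normal(n)    # stand-in for an observed field

# Scale the covariance by c and compare the two objectives.
scales = [1.0, 10.0, 100.0, 1000.0]
quad_only = [objective(x, c * K, with_logdet=False) for c in scales]
full = [objective(x, c * K) for c in scales]

# For the full objective, the best scaling has the closed form c* = q / n,
# where q = x^T K^{-1} x (set the derivative of q/c + n log c to zero).
c_star = objective(x, K, with_logdet=False) / n

for c, q, f in zip(scales, quad_only, full):
    print(f"c = {c:7.1f}   quadratic only = {q:.5f}   with log|K| = {f:.3f}")
print(f"finite minimizer of the full objective: c* = {c_star:.4f}")
```

The quadratic-only column shrinks by a factor of 10 at each step and has no minimizer, whereas the full objective is minimized at the finite value c*.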