IEEE TRANSACTIONS ON GEOSCIENCE AND REMOTE SENSING, VOL. 36, NO. 5, SEPTEMBER 1998


Spatial Information Retrieval from Remote-Sensing Images—Part I: Information Theoretical Perspective

Mihai Datcu, Klaus Seidel, and Marc Walessa

Abstract—Automatic interpretation of remote-sensing (RS) images and the growing interest in query by image content from large remote-sensing image archives rely on the ability and robustness of information extraction from observed data. In Parts I and II of this article, we turn the attention to the modern Bayesian way of thinking and introduce a pragmatic approach to extract structural information from RS images by selecting, from a library of a priori models, those which best explain the structures within an image. Part I introduces the Bayesian approach and defines information extraction as a two-level procedure: 1) model fitting, which is the incertitude alleviation over the model parameters, and 2) model selection, which is the incertitude alleviation over the class of models. The superiority of the Bayesian results is commented on from an information theoretical perspective. The theoretical assay concludes with the proposal of a new systematic method for scene understanding from RS images: search for the scene that best explains the observed data. The method is demonstrated for high-accuracy restoration of synthetic aperture radar (SAR) images, with emphasis on new optimization algorithms for simultaneous model selection and parameter estimation. Examples are given for three families of Gibbs random fields (GRF's) used as prior model libraries. Part II expands in detail on the information extraction using GRF's at one and at multiple scales. Based on the Bayesian approach, a new method for optimal joint scale and model selection is demonstrated. Examples are given using a nested family of GRF's utilized as prior models for information extraction, with applications both to SAR and optical images.

Index Terms—Bayesian inference, Gibbs random fields, hierarchical modeling, model selection, scene understanding, synthetic aperture radar (SAR).

Manuscript received December 4, 1997; revised May 9, 1998. This work was supported by the ETH project "Advanced Query and Retrieval Techniques for Remote Sensing Image Databases." M. Datcu and M. Walessa are with the German Remote Sensing Data Center DFD, German Aerospace Center (DLR), D-82234 Oberpfaffenhofen, Germany (e-mail: [email protected]). K. Seidel is with the Swiss Federal Institute of Technology, CH-8092 ETH Zürich, Switzerland (e-mail: [email protected]). Publisher Item Identifier S 0196-2892(98)06834-X.

I. INTRODUCTION

THE restoration of degraded images, image segmentation, feature extraction, and clustering are extensively used for the interpretation of remote-sensing (RS) data [27]. Most of these methods have information theoretical foundations in, e.g., signal or parameter estimation theory or decision theory. In the information theoretical perspective, we organize the observations (image pixel intensities, spectral components, or other image features) in a measurement space [31]. A measurement is understood as a realization of a random

variable and carries information in the form of incertitude. Image information extraction requires us to recognize which scene structures can be identified as being responsible for the generation of the observed measurements. Thus, a hypothesis testing rule is introduced as a partitioning of the measurement space into disjoint sets. Each set explains a certain scene structure. To derive the decision rule, we have to learn probability distributions from the observed data, i.e., estimate the likelihood over the space partition in the assumed hypotheses by measuring frequencies of measurements under known assumptions. Consequently, the interpretation of high-complexity RS images is often hindered using traditional image analysis methods, i.e., the classification is based on an insufficient amount of training data; therefore, significant uncertainties remain, and any signature analysis, structural interpretation, and classification are affected by a variety of deficiencies.

The Bayesian approach consists of interpreting probabilities based on a system of axioms describing incomplete information [5] rather than randomness, allowing an inference process that, for a given data set and a hypothesis space, assigns probabilities to the hypotheses. Thus, the Bayesian approach enables a self-consistent inference process, permitting the specification of information retrieval problems directly using models in the form of probability distributions [14]. Based on these premises, we approach the information extraction from RS images from the Bayesian perspective, selecting from a predefined library of models [22] those that best explain the image. The Bayesian information retrieval from RS images is essentially an inverse problem: search for the scene that best explains the observed data. We give an information theoretical assay of the Bayesian paradigm, and we show how the traditional decision and estimation theories are recognized in the Bayesian model fitting and model selection concepts [9], [23]. For applications, we put emphasis on and demonstrate the Bayesian concepts for structural information extraction as texture features using families of Gibbs random fields (GRF's) [10], [11], [18], [22], [24].

The article has the following organization. Section II reviews the results of classical estimation theory in two circumstances: the ability to include prior models, e.g., the Maximum A Posteriori (MAP) estimator, and the quality of an estimator. In Section III, the Bayesian paradigm is presented with emphasis on the axiomatic definition of the probability calculus and its implications for stochastic inference. These results allow us to encapsulate propositions and prior information in the inference process, resulting in an elegant formalism for information



retrieval: model fitting and model selection, i.e., the inference in the parameter and in the model space. Section IV presents the basic methods for stochastic modeling and contains an example of model selection commented from the point of view of information theory. The information theoretical assay of the Bayesian paradigm is concluded in Section V with systematic concepts: a two-level method for information retrieval from RS images and the scene understanding approach for data inversion. In Section VI, the theoretical concepts introduced in Sections III–V are exemplified. Three families of GRF models are defined. Scene understanding as a Bayesian paradigm is demonstrated for synthetic aperture radar (SAR) image reconstruction. A model-fitting algorithm illustrates the extraction of different information according to the assumed prior model. Further, a new iterative optimization algorithm demonstrates the automatic selection, from a family of models, of the one that best explains the observation. The algorithm enables precise reconstruction of textures and details in SAR images. Section VII comments on the accuracy of the information extraction depending on the relative size of the analyzing window versus the extent of the GRF neighborhood. In addition, the application of the Bayesian paradigm for information extraction using a nested family of GRF models at multiple scales is recommended; this is exemplified in Part II [30]. Section VIII exposes the conclusions, and the Appendix presents the stochastic relaxation algorithm used for the demonstrations in Section VI.

II. CLASSICAL APPROACH OF ESTIMATION THEORY

To introduce systematic methods for the interpretation of RS images, and to give the information theoretical background for the understanding of the modern Bayesian approach, we present a short overview of the classical information retrieval approaches: parameter estimation and decision theory.

Fig. 1. (a) Quadratic cost function that leads to the MMSE estimator and (b) uniform cost function leading to the MAP estimator.

A. Parameter Estimation

We consider the following model for our observations $y$:

$$y = f(x) + n \qquad (1)$$

where $f$ represents a deterministic model of the image formation process (the forward model), $x$ is the physical parameter vector we want to extract from the observations, and $n$ is the measurement noise. The physical parameter vector is a realization of a stochastic process. The classical parameter estimation is formulated as the minimization of the Bayes risk defined over the signal space. The risk is the expectation value of a cost function $C$ defined over the joint space of observations ($y$) and model parameters ($x$)

$$R = E\big[C(\hat{x}, x)\big] = \int\!\!\int C\big(\hat{x}(y), x\big)\, p(y, x)\, dy\, dx \qquad (2)$$

The cost function is a measure of goodness of the estimated parameter $\hat{x}$, being defined as a distance between the estimate and the actual, but unknown, value of $x$. The most frequently used cost functions are the quadratic and the uniform (Fig. 1).

Minimizing the risk using a quadratic cost function, we obtain the minimum mean square error (MMSE) estimator

$$\hat{x}_{\mathrm{MMSE}} = E[x \mid y] = \int x\, p(x \mid y)\, dx \qquad (3)$$

which is the conditional mean of the desired parameter. Minimizing the risk using a uniform cost function, at the limit of vanishing width, we obtain the Maximum A Posteriori (MAP) estimator

$$\hat{x}_{\mathrm{MAP}} = \arg\max_{x}\, p(x \mid y) \qquad (4)$$

A special case is the estimation of a deterministic parameter $x_0$; its probability distribution function is a Dirac distribution

$$p(x) = \delta(x - x_0) \qquad (5)$$

In this case, the MMSE degenerates into the originally assumed value $x_0$ and cannot extract the information about $x$ from the observations

$$\hat{x}_{\mathrm{MMSE}} = \int x\, \delta(x - x_0)\, dx = x_0 \qquad (6)$$

The estimation problem is then reformulated by defining the likelihood function $p(y \mid x)$ and searching for its maximum. The maximum likelihood (ML) estimator is defined as

$$\hat{x}_{\mathrm{ML}} = \arg\max_{x}\, p(y \mid x) \qquad (7)$$

We observe that, in classical estimation theory, using a cost function is nothing else but describing a type of prior information. This fact is observed in the role played by the posterior distribution in the basic estimators mentioned above

$$p(x \mid y) = \frac{p(y \mid x)\, p(x)}{p(y)} \qquad (8)$$

The likelihood factor, in its turn, can be rewritten for the observation model in (1)

$$p(y \mid x) = p_n\big(y - f(x)\big) \qquad (9)$$

where $p_n$ is the probability distribution function (pdf) of the noise. Now the expression for the posterior encapsulates the deterministic prior knowledge represented by the forward model. In addition, the knowledge about the observation noise and the a priori information about the desired parameter are also included. We conclude that MAP is a complete frame for model-based approaches in information extraction.

Now we turn our attention from the cost-based formulation of the estimation problem to the analysis of the posterior distribution.
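As a concrete illustration of the estimators (3), (4), and (7), the following minimal Python sketch (our addition, not part of the paper; the Gamma prior, the noise level, and all variable names are illustrative assumptions) evaluates the three estimates on a discretized one-dimensional instance of the model (1) with $f(x) = x$.

```python
# Illustrative sketch (not from the paper): MMSE (3), MAP (4), and ML (7)
# estimators on a discretized 1-D problem. Assumed model: y = x + n with
# Gaussian noise and a skewed Gamma prior, so that mean and mode differ.
import numpy as np
from scipy import stats

y_obs = 2.0                                  # a single observation
x = np.linspace(0.01, 12.0, 4000)            # grid over the parameter space
prior = stats.gamma.pdf(x, a=2.0, scale=1.5)          # p(x), assumed prior
likelihood = stats.norm.pdf(y_obs, loc=x, scale=1.0)  # p(y|x) = p_n(y - f(x)), cf. (9)

posterior = prior * likelihood
posterior /= np.trapz(posterior, x)          # normalize by the evidence p(y)

x_mmse = np.trapz(x * posterior, x)          # (3): conditional mean
x_map = x[np.argmax(posterior)]              # (4): posterior mode
x_ml = x[np.argmax(likelihood)]              # (7): likelihood mode

print(f"MMSE = {x_mmse:.3f}  MAP = {x_map:.3f}  ML = {x_ml:.3f}")
```

With a flat prior, the MAP estimate coincides with the ML estimate, which is the degeneracy summarized in Table I.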


TABLE I SUMMARY OF CLASSICAL PARAMETER ESTIMATION METHODS AS PARTICULAR CASES OF THE MAP APPROACH


In Table I, we summarize the basic results of classical estimation theory as particular cases resulting from the MAP approach using different model qualities. The table summarizes the basic parameter estimation methods from the point of view of the capability to model a problem. MAP is a complete approach, encapsulating both stochastic models and deterministic prior knowledge. The least-squares (LS) estimation is a weak approach, handling no prior knowledge about the information retrieval problem. We conclude with the observation that MAP itself is sufficient to define the other estimators as special cases by using different noise and prior models. It will be the Bayesian paradigm (Section III) that demonstrates the consistency of this observation.

Remarks:

1) MAP uses a cost function, which makes the estimation very sensitive to how accurately the prior models the data. Alternative solutions are generalizations of the cost function such that it can capture information in the mass of the prior distribution.

2) ML is a least-committed estimator, in the sense that it chooses the parameters from the observation space where the noise is zero with maximum probability. Due to this property, ML is in many situations preferred to other methods.

3) The standard regularization of the solution of ill-posed problems leads to objective functionals similar to the conditional least squares (CLS). The difference lies in the physical meaning of the regularization operator, which in the deterministic approach is understood as the degree of smoothness of a reconstructed signal.

B. Quality of the Estimated Parameters

The estimated parameters are themselves random variables because they are functions of the observations, which are random variables. The quality of the estimated parameters [2] is evaluated by their conditional mean relative to $x$

$$E[\hat{x} \mid x] \qquad (10)$$

If the conditional mean equals the true value of the parameter, the estimator is unbiased. A stronger quality criterion for the estimated parameters is the conditional variance relative to $x$

$$\sigma_{\hat{x}|x}^{2} = E\big[(\hat{x} - E[\hat{x} \mid x])^{2} \mid x\big] \qquad (11)$$

A good estimator must have a variance as small as possible. The limit of the quality of an estimator is given by the Cramer–Rao bound

$$\sigma_{\hat{x}|x}^{2} \ge \left\{ E\left[\left(\frac{\partial \ln p(y \mid x)}{\partial x}\right)^{2}\right] \right\}^{-1} \qquad (12)$$

The Cramer–Rao bound can be expressed using the Kullback–Leibler (K–L) divergence

$$E\left[\left(\frac{\partial \ln p(y \mid x)}{\partial x}\right)^{2}\right] = \left.\frac{\partial^{2}}{\partial x'^{2}}\, D\big(p(y \mid x)\, \big\|\, p(y \mid x')\big)\right|_{x'=x} \qquad (13)$$

where

$$D(p_1 \| p_2) = \int p_1(y)\, \ln \frac{p_1(y)}{p_2(y)}\, dy \qquad (14)$$

is the K–L divergence.

If the parameter is a vector, the quality of the estimator is measured by its covariance matrix

$$\mathrm{Cov}[\hat{\mathbf{x}} \mid \mathbf{x}] = E\big[(\hat{\mathbf{x}} - E[\hat{\mathbf{x}} \mid \mathbf{x}])(\hat{\mathbf{x}} - E[\hat{\mathbf{x}} \mid \mathbf{x}])^{T} \mid \mathbf{x}\big] \qquad (15)$$

The inferior bound of the quality is given by

$$\mathrm{Cov}[\hat{\mathbf{x}} \mid \mathbf{x}] \ge \mathbf{F}^{-1} \qquad (16)$$

where $\mathbf{F}$ is the Fisher information matrix with its elements

$$F_{ij} = E\left[\frac{\partial \ln p(y \mid \mathbf{x})}{\partial x_i}\, \frac{\partial \ln p(y \mid \mathbf{x})}{\partial x_j}\right] \qquad (17)$$

The elements of the Fisher information matrix can also be expressed using the K–L discrimination

$$F_{ij} = \left.\frac{\partial^{2}}{\partial x_i' \partial x_j'}\, D\big(p(y \mid \mathbf{x})\, \big\|\, p(y \mid \mathbf{x}')\big)\right|_{\mathbf{x}'=\mathbf{x}} \qquad (18)$$

The K–L divergence shows that the bounds of the estimator quality are given by the fit of the likelihood to a desired true likelihood. This will later be understood as a comparison of the likelihood distributions allowing a decision about the identity of the data, and it will be the base for the explanation of the information content of a stochastic model.
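The link between (13), (14), (17), and (18) can be checked numerically. The sketch below is our illustration (the Gaussian likelihood is an assumed toy model): the curvature of the K–L divergence at the true parameter equals the Fisher information, here $1/\sigma^2$.

```python
# Numerical check (our illustration): the curvature of the K-L divergence
# (14) at the true parameter equals the Fisher information of (17), as
# stated by (13) and (18). Toy model: p(y|x) = N(y; x, sigma^2).
import numpy as np

sigma = 2.0
y = np.linspace(-25.0, 25.0, 20001)

def lik(x):
    return np.exp(-0.5 * ((y - x) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def kl(p1, p2):
    return np.trapz(p1 * np.log(p1 / p2), y)   # K-L divergence (14)

x0, h = 0.0, 1e-3
# second difference of D(p(y|x0) || p(y|x')) around x' = x0; D vanishes at x0
curvature = (kl(lik(x0), lik(x0 + h)) + kl(lik(x0), lik(x0 - h))) / h**2
print(f"K-L curvature = {curvature:.6f}, Fisher information = {1 / sigma**2:.6f}")
# The Cramer-Rao bound (12) then gives Var >= 1/curvature = sigma^2.
```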


III. BAYESIAN PARADIGM

The methods and results we discuss within this section arise from the fundamental approach of considering probability theory as a set of normative rules for conducting inference. This was first clearly outlined by Cox in 1946 [5]. Supposing that degrees of plausibility are to be represented by real numbers on a scale, he found the conditions under which such a calculus is consistent in the sense that, if two different methods of calculation are permitted by the rules, they should yield the same result. These consistency requirements took the form of two functional equations whose solutions uniquely determine the fundamental probability calculus rules (19), up to a change of variables that can alter their form, but not their content

$$p(A, B \mid C) = p(A \mid B, C)\, p(B \mid C), \qquad p(A \mid C) + p(\bar{A} \mid C) = 1 \qquad (19)$$

We can recognize (19) as the basic product and summation rules of probability theory, from which all its results can be derived. For example, if $AB$ and $BA$ represent the same proposition, we can exchange $A$ and $B$ on the right-hand side of (19), and if $p(B \mid C) \neq 0$, we obtain the Bayes theorem

$$p(A \mid B, C) = \frac{p(B \mid A, C)\, p(A \mid C)}{p(B \mid C)} \qquad (20)$$

So, the fundamental equations of probability theory are uniquely determined as rules of inference. Jaynes [14] noticed that Cox's results are based only on some very elementary requirements for consistency that make no reference to random experiments, so it is legitimate to assign probabilities to any clearly stated proposition. Thus, the conventional interpretation of probability as frequency in a random experiment is included as a special case of probabilistic inference for certain kinds of propositions or prior information, which make explicit statements about the outcomes of an experiment, whatever it may be. Furthermore, the frequentist approach does not lead to the inferential approach to probability, simply because the rationale behind the frequentist approach is merely deductive.

If probability theory is a set of normative rules for conducting inference, the Bayes theorem is to be seen as a method of translating information into a probability assignment. Thus, the Bayesian approach consists in interpreting probabilities based on a system of axioms describing incomplete information rather than randomness, enabling an inference process which, for a given data set and a hypothesis space, assigns probabilities to the hypotheses. The Bayesian and frequentist approaches are similar to a certain extent, but the self-consistent formulation of the Bayesian interpretation gives an alternative understanding of the information hidden in the measured data. Classical decision and estimation theory relies on the availability of some knowledge expressed as pdf's and on the definition of a cost function. As already observed from Table I, the specification of the cost seems to be discretionary. It is the Bayesian approach that enables the self-consistent inference process, thus making possible the specification of information retrieval problems directly using models in the form of probability distributions.

From the perspective of information extraction, the set of hypotheses is defined over the class of prior models. We adopt the following Bayesian two-level information extraction procedure: 1) incertitude alleviation over the model parameter values and 2) incertitude alleviation over the model classes. This means that two levels of inference can be introduced: model fitting and model comparison [23].

A. Level I: Model Fitting

The first level of inference assumes that the models to be inverted are true. The task consists in fitting the model to the data. The results are often expressed as the most probable parameter values and an error measure. The inference requires knowledge of the prior model $p(x \mid H_i)$ and the data prediction $p(y \mid x, H_i)$. The inference follows Bayes' rule and is applied separately for each of the models $H_i$ attached to the hypotheses

$$p(x \mid y, H_i) = \frac{p(y \mid x, H_i)\, p(x \mid H_i)}{p(y \mid H_i)} \qquad (21)$$

This equation is similar to the MAP estimator

$$\mathrm{Posterior} = \frac{\mathrm{Likelihood} \times \mathrm{Prior}}{\mathrm{Evidence}} \qquad (22)$$

The differences between them are the explicit form of the hypotheses $H_i$ and the class of models assumed, and the fact that the equation is obtained as the result of a consistent probabilistic inference with no allusion to the minimization of a risk function. The evidence term can generally be neglected in model fitting, a fact that was already observed in the case of MAP estimation. The evidence term is important in the second level of inference, and here lies the novelty of the Bayesian approach in data inversion.

B. Level II: Model Comparison

The task of the second level of the Bayesian inference is to find the most plausible model explaining the data

$$p(H_i \mid y) \propto p(y \mid H_i)\, p(H_i) \qquad (23)$$

The inference relies on the evidence of $H_i$, carried by $p(y \mid H_i)$, and the subjective prior $p(H_i)$ over the assumed hypothesis space. The probability $p(H_i)$ shows how plausible we thought the alternative models were before the data arrived. One of the difficulties arising here is the modeling of the hypothesis space. A possible solution is using the principle of maximum entropy (ME) to assign $p(H_i)$. The reasons for a certain model (hypothesis) will be expressed as constraints.

The second level of the Bayesian inference encapsulates Occam's razor (Fig. 2), stating that unnecessarily complex models are not preferred to simpler ones [23], [29]. Complicated, overparametrized models, trying to generalize too much ($H_2$), are penalized under Bayes' rule. Simpler ones ($H_1$) are preferred; they are more evident. A better data prediction is accomplished within the data validity interval $D$.
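The effect described by (23) and Fig. 2 can be reproduced in a few lines. In the hedged sketch below (entirely our illustration; the two hypothetical models, the uniform priors, and the noise level are assumptions), the evidences of a one-parameter model and of a two-parameter model are computed by brute-force marginalization for data that a constant already explains; the Occam penalty makes the simpler hypothesis more evident.

```python
# Hedged sketch of the second level of inference (23): evidence-based
# comparison of two hypothetical models. H1 fits a constant, H2 a straight
# line; both explain data generated by a constant, but the extra parameter
# of H2 dilutes its evidence p(y|H) -- the Occam's razor effect of Fig. 2.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 20)
y = 1.0 + rng.normal(0.0, 0.2, t.size)        # data truly constant

def log_lik(residual, sigma=0.2):
    return (-0.5 * np.sum((residual / sigma) ** 2)
            - residual.size * np.log(sigma * np.sqrt(2.0 * np.pi)))

grid = np.linspace(-3.0, 3.0, 301)            # uniform prior over [-3, 3]
prior = 1.0 / 6.0

# H1: y = a           -> evidence is a 1-D integral over a
ev1 = np.trapz([np.exp(log_lik(y - a)) * prior for a in grid], grid)

# H2: y = a + b*t     -> evidence is a 2-D integral over (a, b)
inner = [np.trapz([np.exp(log_lik(y - a - b * t)) * prior**2 for b in grid],
                  grid) for a in grid]
ev2 = np.trapz(inner, grid)

print(f"p(y|H1) = {ev1:.3e}  p(y|H2) = {ev2:.3e}  ratio = {ev1 / ev2:.1f}")
```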


The model comparison and, as a consequence, the model selection can be done only using relative measures among the existing library of models. There is no algorithmic approach to build such a library that guarantees obtaining all correct models for a diverse real data set. We have to be aware that the most plausible model can only be an approximation, and we have to conscientiously design classes of models. Such methods are presented below.

Fig. 2. Illustration of Occam's razor concept. In the hypothesis $H_1$, the data are explained using a simple model, i.e., able to predict only a limited interval $D$. A complex model, in the hypothesis $H_2$, explains a larger diversity of data structures but does not predict as strongly as $H_1$ in the interval $D$. The simpler hypothesis is the one with fewer equally likely alternatives.

For a quantitative understanding of Occam's razor, we have to invert its definition: the more plausible hypothesis tends to be the simpler one. However, because explaining plausibility through the concept of simplicity is weak, further analysis is necessary. Simplicity can be better related to the model complexity, i.e., the number of parameters. Intuitively, a simpler model will occupy a smaller volume in the parameter space. Thus, for model selection, a correction should be applied, taking into account the prior information.

C. Bayes Factor

The Bayes factor [25] is defined as the likelihood ratio in the model space

$$B = \frac{p(y \mid H_1)}{p(y \mid H_2)} \qquad (24)$$

The Bayes factor emphasizes the role of the observations, since it partially eliminates the influence of the prior. Consequently, the Bayes factor is considered an objective measure for model comparison. The consistency of the Bayesian approach allows us to compare probabilities, i.e., to make decisions. For example, comparing the probabilities of the presence of a model conditioned by all available evidence given the observed data

$$\frac{p(H_1 \mid y)}{p(H_2 \mid y)} = B\, \frac{p(H_1)}{p(H_2)} \qquad (25)$$

Explicitly, the Bayes factor can be written as

$$B = \frac{\int_{\Delta x_1} p(y \mid x, H_1)\, p(x \mid H_1)\, dx}{\int_{\Delta x_2} p(y \mid x, H_2)\, p(x \mid H_2)\, dx} \qquad (26)$$

where $H_i$ is the assumed model in the hypothesis $i$ and $\Delta x_i$ is the validity interval of the model $H_i$. An alternative method for model comparison is the ratio of likelihoods evaluated for the ML estimates

$$B_{\mathrm{ML}} = \frac{p(y \mid \hat{x}_{\mathrm{ML},1}, H_1)}{p(y \mid \hat{x}_{\mathrm{ML},2}, H_2)} \qquad (27)$$

which can be interpreted as a special case of the Bayes factor for $\delta$ prior distributions. The same applies to the reasoning for the estimation of deterministic parameters.

IV. BAYESIAN MODEL INFERENCE

In this section, we first introduce definitions of Bayesian models and then expand on information theoretical aspects of model selection.

A. Stochastic Modeling

According to the Bayesian paradigm, a statistical model is defined as a parametric stochastic model $p(y \mid x)$ and a prior distribution $p(x)$ defined on the parameter space [29]. The Bayes theorem (20) shows how the information on $x$ is updated by the information contained in the data $y$. The complexity of the observed scenes is usually very high, and often the causality of the observed phenomena allows the decomposition of the models into simpler hierarchical layers. A hierarchical Bayes model is a Bayesian stochastic model $p(y \mid x)$, $p(x)$, in which the prior is decomposed into conditional distributions $p(x \mid \theta_1)$, $p(\theta_1 \mid \theta_2)$, $\ldots$, where $p(\theta_n)$ is the marginal distribution. This leads to

$$p(x) = \int p(x \mid \theta_1)\, p(\theta_1 \mid \theta_2) \cdots p(\theta_{n-1} \mid \theta_n)\, p(\theta_n)\, d\theta_1 \cdots d\theta_n \qquad (28)$$

where the integral is evaluated over the multidimensional parameter space of $\theta_1, \ldots, \theta_n$. A hierarchical model with too many layers could contradict the Occam principle of simplicity. In such a situation, part of the parameters will be given as hyperparameters defining a family of models

$$\{\, p(x \mid \theta),\ \theta \in \Theta \,\} \qquad (29)$$

where $\Theta$ is the hyperparameter space. If the model has intrinsically high dimensionality, but it is not hierarchic in its nature, marginalization can be used to define a family of simpler models. The Bayesian paradigm for information extraction allows the elimination of parameters, including nuisance parameters, by reducing the posterior dimensionality

$$p(x_1 \mid y) = \int p(x_1, x_2 \mid y)\, dx_2 \qquad (30)$$

with $x = (x_1, x_2)$.

The heteroclite nature of a model faced with the actual data can be captured by defining neighborhood classes of models around a model $p_0$. Such an approach is the $\epsilon$-contamination class $\Gamma$, defined by

$$\Gamma = \big\{\, p(x) = (1 - \epsilon)\, p_0(x) + \epsilon\, q(x),\ q \in Q \,\big\} \qquad (31)$$

where $Q$ is a class of distributions modeling the prior's aberrations.
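As a small worked example of (30) (our own; the Gaussian model and all numbers are assumptions for illustration), the following sketch eliminates a nuisance parameter by numerical integration of the joint posterior.

```python
# Minimal sketch (our illustration) of nuisance-parameter elimination by
# marginalization, eq. (30): the joint posterior over (x1, x2) is
# integrated over x2 to obtain the reduced posterior p(x1|y).
import numpy as np

x1 = np.linspace(-5.0, 5.0, 401)
x2 = np.linspace(-5.0, 5.0, 401)
X1, X2 = np.meshgrid(x1, x2, indexing="ij")

y = 1.5                                            # assumed model: y = x1 + x2 + n
log_post = -0.5 * ((y - X1 - X2) / 0.5) ** 2       # Gaussian likelihood
log_post += -0.5 * X1**2 - 0.5 * (X2 / 0.3) ** 2   # independent Gaussian priors

post = np.exp(log_post - log_post.max())
marginal = np.trapz(post, x2, axis=1)              # (30): integrate out x2
marginal /= np.trapz(marginal, x1)                 # renormalize p(x1|y)

print("posterior mean of x1:", np.trapz(x1 * marginal, x1))
```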


Note that another possible solution to define neighborhood classes is based on the K–L divergence used as a functional distance.

In the case of hierarchical Bayesian models, or models describing the structure of the hypothesis space, we can be faced with the impossibility of modeling the last level of parameters, since a prior is no longer available. However, it is still possible to use the Bayesian paradigm and to define a noninformative prior. The simplest choice is the uniform distribution, as already shown in Table I. In this situation, the MAP estimator degenerates into an ML estimator. This is the Bayesian interpretation of ML, contrary to the frequentist approach, which states the ML estimation problem for deterministic parameters. Jeffreys [16] introduced a noninformative prior, the Jeffreys prior $p_J$, based on the Fisher information. In the case of scalar parameters, it is defined as

$$p_J(x) \propto \sqrt{F(x)} \qquad (32)$$

This definition is based on the property of the Fisher information to describe the information content of the model. Here we can recall (18) and interpret it in the light of Bayesian model comparison (27). Another noninformative prior is the ME prior [19]. If no a priori distribution is specified, but some prior knowledge exists in the form of linear combinations, e.g., moments, the principle of ME can be used to define a noninformative prior distribution (Table I).

B. Information Theoretical Aspects in Model Selection

In the previous section, we already noticed the similarity of the Bayesian model comparison to the classical definition of the K–L divergence. This will be justified in this section. The previous deductions were made in the frequentist way of thinking. From the Bayesian point of view, we can write the K–L divergence with the meaning of the difference between the expected loss and its minimum value. Such an approach can be used in the inference to describe the information content of certain models in simpler terms. A simple example is the approximation of models with some standard distribution families.

An example of the information theoretical approach is given for the model selection from a family of parametric models (29). In this case, the model selection results in choosing the hyperparameter $\theta$. The evidence of the family of models is

$$m(y \mid \theta) = \int p(y \mid x)\, p(x \mid \theta)\, dx \qquad (33)$$

which can be interpreted as, and in the information theoretical approach it is, the likelihood function of $\theta$ for the observed data

$$\mathcal{L}(\theta) = m(y \mid \theta) \qquad (34)$$

Consequently, the model selection is the ML estimation of $\theta$, evaluated for the observed data. To find the maximum of the evidence with respect to the hyperparameter, we can apply an iterative algorithm for optimization of the functional [7]

$$Q(\theta \mid \hat{\theta}) = \int p(x \mid y, \hat{\theta})\, \ln \frac{p(x \mid y, \hat{\theta})}{p(x, y \mid \theta)}\, dx \qquad (35)$$

The estimation algorithm has two steps:

I: for a given $\hat{\theta}$, evaluate

$$p(x \mid y, \hat{\theta}) \qquad (36)$$

II: find the $\theta$ which minimizes

$$Q(\theta \mid \hat{\theta}) \qquad (37)$$

The algorithm iterates, replacing $\hat{\theta}$ with the new guess of $\theta$. The algorithm is based on the observation that

$$\ln m(y \mid \theta) = -Q(\theta \mid \hat{\theta}) + D\big(p(x \mid y, \hat{\theta})\, \big\|\, p(x \mid y, \theta)\big) \qquad (38)$$

and

$$D\big(p(x \mid y, \hat{\theta})\, \big\|\, p(x \mid y, \theta)\big) \ge 0 \qquad (39)$$

showing that the algorithm for hyperparameter estimation is a descent optimization of the log-evidence. We remark that, de facto, the model selection can be interpreted in the information theoretical frame as the minimization of the K–L divergence at the limit of convergence of the parameter $\theta$. Similarly, in the frame of the minimum description length (MDL) analysis, Rissanen [28] demonstrates the importance of the Fisher information in determining an optimal quantization space. Qian and Künsch [26] analyzed the model selection computing the stochastic complexity and demonstrated again the role of the Fisher information. In the Appendix, this algorithm will be demonstrated for reconstruction and segmentation of SAR images.
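The two-step iteration (36)–(37) can be made concrete on a toy model where the evidence maximum is known in closed form. The sketch below is our illustration, not the paper's implementation: we assume $x_i \sim N(0, \theta)$, $y_i = x_i + n_i$, $n_i \sim N(0, \sigma^2)$, with the prior variance $\theta$ playing the role of the hyperparameter.

```python
# Illustrative EM-type maximization of the evidence m(y|theta) over a
# hyperparameter, following (36)-(37). Assumed toy model (ours):
# x_i ~ N(0, theta), y_i = x_i + n_i, n_i ~ N(0, sigma2).
import numpy as np

rng = np.random.default_rng(1)
sigma2, theta_true, N = 0.5, 2.0, 5000
y = (rng.normal(0.0, np.sqrt(theta_true), N)
     + rng.normal(0.0, np.sqrt(sigma2), N))

theta = 0.1                                    # initial guess
for _ in range(50):
    # Step I (36): the posterior p(x|y, theta_hat) is Gaussian with
    v = theta * sigma2 / (theta + sigma2)      # posterior variance
    m = theta / (theta + sigma2) * y           # posterior means
    # Step II (37): the minimizer of the functional (35) is E[x^2 | y]
    theta = np.mean(m**2 + v)

print(f"EM estimate: {theta:.3f}  "
      f"closed-form evidence maximum: {max(np.mean(y**2) - sigma2, 0.0):.3f}")
```

Each pass does not decrease the log-evidence, in agreement with (38) and (39).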

V. BAYESIAN INFORMATION RETRIEVAL

In this section, we summarize our assay of the information theoretical perspective of the Bayesian paradigm into a pragmatic approach for information extraction. One of the main difficulties in the interpretation of RS images is their intrinsic high complexity, both radiometric and structural. In real applications, we very seldom have sufficient training data sets and ground truth available to be able to derive models for scattering and structures from the observed scenes. The curse of dimensionality strongly limits the possibility of learning from the available, too small, training data sets. Therefore, the interpretation of RS images using computer vision methods is often a weak approach.

Alternatively, in these papers, the scene understanding approach is proposed [6]. Scene understanding is the attempt to extract information about the physical characterization, structure, and geometry of three-dimensional (3-D) scenes from two-dimensional (2-D) RS images. The task of the scene understanding process is to find the scene that best explains the observed data. Scene understanding is an ill-posed inverse problem. The solutions we propose are model based and formulated in the general frame of Bayesian inference.

Inspired by (9), we present an algorithm for the solution of the scene understanding problem (Fig. 3). Trying to interpret an RS image, we know, or assume we know, some elements in advance. The assumed, known information (prior models) can be used to simulate the scene, e.g., a landscape, and observe it virtually with a simulated sensor (forward model). The resulting simulated image can now be compared to the actual RS image. The error will depend on the goodness of the guess of the parameters describing the scene. A set of fixed and variable parameters controlling the image synthesis characterizes the scene model.

Fig. 3. Parameter estimation paradigm. The MAP formalism, or the model fitting in the Bayesian approach, inspires the idea of an iterative algorithm for information extraction from observations (9). We have to compare the observations y with the simulated data based on a guess of the parameters. The parameter x is restricted to take values only according to a certain prior model.

It is unrealistic to believe that one universal model could describe all aspects of a real scene. A pragmatic approach is to consider a library of rather simple models, each one describing a part of the reality. The next task is to select, based on the analysis of the observed data, the models that can best explain different parts of the scene. The way this selection is done will be discussed in the following sections. For this discussion, the derivation of a goodness measure is important; it depends on the information we have about the image formation process and on some a priori information we have about the data and the parameters that describe it. This goodness measure (evidence) allows the selection of appropriate models.

Fig. 4. Model selection paradigm. The diagram refers to the shaded area in Fig. 3. Structures in RS images are explained based on a priori knowledge stored in a library of models. We extract information about the identity of the scene by selecting the model that best explains it. Using (23), we choose the model having the highest evidence in the class.

The diagram in Fig. 4 illustrates the model selection process. Once a model has been selected as being evident for explaining certain aspects of the scene, parameter estimation follows as the next step. We have to search for the values of the parameters in such a way that the error between the actually observed and the simulated image is minimized. These values represent a guess of the scene parameters.

The Bayesian inference method consists of interpreting probabilities as a measure of incomplete information rather than randomness and gives an alternative understanding of the information hidden in the measured data. The solution of ill-posed inverse problems is generally very sensitive to the assumed model. This is the criticism of inverse problem methods: they can introduce unrealistic behavior in the results. The importance of the second level of the Bayesian inference lies in the possibility to select the best result, the one that corresponds to the most plausible model for the available data.

VI. SAR SCENE UNDERSTANDING: A BAYESIAN APPROACH

In Sections III–V, we described the Bayesian paradigm and considered some information theoretical and algorithmic issues. In this section, we demonstrate the potential of the Bayesian approach for information extraction from SAR images.

A. GRF's as Prior Models

In the examples presented in our article, we limit ourselves to modeling the image structures with a limited family of GRF's. Extensive literature presents in detail the theory and applications of GRF's [5], [6], [8], [22]. In this section, we introduce only the basic concepts. In Part II, we give a detailed tutorial on information extraction using GRF's at one and multiple scales [30].

We consider and define the GRF as modeling the structural behavior of images understood as 2-D stochastic processes defined on rectangular lattices. The definition of a GRF has two aspects: topological and stochastic. Topologically, a GRF is defined on a rectangular lattice $S$ and is described by a neighborhood system $N$ of points and a system of cliques $C$. We give graphical definitions in Table II. The neighborhood size defines the maximum limits of the structural complexity of the GRF model. In Table II, the first- and second-order neighborhoods are exemplified. The clique system specifies the possible topology of the random structures in different realizations of the stochastic process.

TABLE II. GRAPHICAL DEFINITION OF GRF'S

The GRF stochastic model is given in the form of an exponential distribution

$$p(x) = \frac{1}{Z}\, \exp\left(-\frac{E(x)}{T}\right) \qquad (40)$$

where $Z$ is the partition function, a normalization factor of the distribution, $T$ is the temperature, and $E(x)$ is an energy function, defined as the sum of the potential functions $V_c$ attached to each clique $c$

$$E(x) = \sum_{c \in C} V_c(x) \qquad (41)$$

The potential functions give a parametric description of the type of interaction between the samples of the random field inside the clique domains. $\theta$ represents the model parameter vector. Using the analogy with statistical mechanics, the temperature $T$ can be interpreted as a scale hyperparameter. At high temperatures, the realizations of a GRF show short-range correlations and are dominated by small-scale (micro)structures. At low temperatures, a GRF generates realizations with long-range correlations characterized by large-scale (macro)structures [13], [33]. This behavior becomes important for the regularization of inverse problems, i.e., image restoration, where GRF's are used as priors. But the main reason for the extensive application of GRF's as prior models for structures appearing in images is the demonstration of their equivalence with Markov random fields (MRF's) (Table II). Given a neighborhood system $N$, the random field $x$ is an MRF with respect to $N$ if, and only if, $p(x)$ is a Gibbs distribution on $N$. Actually, in spite of the fact that MRF models are characterized by local statistics, they are also able to capture long-range interactions. The multiple scale behavior of the GRF–MRF will be addressed in the Appendix.

Now we consider three examples of GRF's that will be used in the next sections to demonstrate qualitatively and quantitatively the importance of the second level of Bayesian inference for the interpretation of RS images. Because we need to understand an information source to be able to interpret it, we emphasize mainly the qualitative aspects of the models.

The first model is the Gauss–Markov random field (GMRF) [17], [20]. The model is defined by

$$p\big(x_s \mid x_{s+r},\, r \in N\big) = \frac{1}{\sqrt{2\pi\sigma^{2}}}\, \exp\left[-\frac{1}{2\sigma^{2}}\left(x_s - \sum_{r \in N} \theta_r\, x_{s+r}\right)^{2}\right] \qquad (42)$$

where $x_s$ is the center pixel of the neighborhood and $\theta_r$ are the components of the model parameter vector, of size equal to the number of elements in the neighborhood $N$. The GMRF is restricted to be stationary ($\theta_r = \theta_{-r}$) and has a positive definite covariance matrix. In fact, the GMRF is a stationary noncausal autoregressive (AR) process. The linearity of the model makes it analytically tractable, but it also limits the complexity of image structures (textures) that it is able to explain.

The second model used is a GRF defined as [4]

$$p(x) = \frac{1}{Z}\, \exp\left[-\sum_{c \in C} \theta_c\, \phi\!\left(\frac{\Delta_c x}{\delta}\right)\right] \qquad (43)$$

where $\theta_c$ are the components of the model parameter vector, $\Delta_c x$ is the local gray-level difference on clique $c$, $\phi$ is an edge-preserving potential function, and $\delta$ is a hyperparameter describing the capacity of the model to explain sharper or smoother structures.

The third model used is the ME prior [32]

$$p(x) = \frac{1}{Z}\, \exp\left(-\lambda \sum_{s \in S} x_s \ln x_s\right) \qquad (44)$$

The principle of ME states that from all distributions that satisfy certain constraints, the one with the largest entropy should be chosen [15]. Thus, inferring the data that satisfy a set of constraints can be seen as solving an ill-posed inverse problem, with the entropy as regularizing functional [3], [12]. The condition of ME is now understood as a regularity condition and is expressed in the form of a GRF. We have to note that in this case the random field $x$ must be normalized to unit sum over the domain of definition $S$. The ME regularization enhances the reconstruction of localized strong image structures (targets).
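To give an operational feel for the prior (42), the following sketch (our illustration; the lattice size, the parameter values, and the toroidal boundary are assumptions) draws a realization of a first-order GMRF by Gibbs sampling, i.e., by repeatedly sampling each site from its Gaussian conditional.

```python
# Sketch (ours) of sampling the GMRF prior (42): a Gibbs sampler that
# visits every lattice site and draws it from the Gaussian conditional
# given its first-order neighborhood (toroidal boundary assumed).
import numpy as np

rng = np.random.default_rng(2)
n, theta, sigma = 32, 0.24, 1.0       # theta < 0.25 keeps the field valid
x = rng.normal(0.0, 1.0, (n, n))

for sweep in range(100):
    for i in range(n):
        for j in range(n):
            s = (x[(i - 1) % n, j] + x[(i + 1) % n, j]
                 + x[i, (j - 1) % n] + x[i, (j + 1) % n])
            x[i, j] = rng.normal(theta * s, sigma)   # conditional of (42)

print("sample mean = %.3f, sample variance = %.3f" % (x.mean(), x.var()))
```

Pushing $\theta$ toward its stability limit plays the role of lowering the temperature and produces the long-range, large-scale structures described above.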


Fig. 5. Qualitative example of model selection. We demonstrate that, by assuming different prior models, different information can be extracted from the same data. Image (a) shows an original three-look ERS-1 SAR image. Images (b)–(d) are the results of cross-section estimation assuming a GMRF, GRF, and ME prior model, respectively. With (b) the GMRF, the result has equally good quality for all structures in the image; with (c) the GRF, speckle is drastically reduced at the cost of some image detail; and with (d) the ME model, the image is improved with very good preservation of details.

B. Parameter Estimation

The GRF models introduced above are used to demonstrate the application of the Bayesian scene-understanding paradigm for SAR image enhancement and reconstruction. Image enhancement aims at transforming images in such a way that their quality is better suited for visual or automatic interpretation. One of the basic problems in image enhancement is cleaning the noise, and its difficulty consists in the separation of noise from image detail. The quality of noise rejection will depend on the ability to define what is noise and what is image: the accuracy of modeling [10], [11], [17], [18].

The radar cross section $x$ is the desired noiseless image that we want to estimate from the observed intensity SAR image $y$. The microwave propagation and the SAR signal processing (focusing) are modeled as an invariant linear system (the forward model). In the approximation we use, $x$ is convolved with the incoherent system point spread function $|h|^2$, where $h$ is the point spread function of the coherent system model. The speckle effect is modeled as a multiplicative noise $n$ that is Gamma distributed and correlated by the SAR end-to-end system (the likelihood) [1]. For the cross section $x$, we adopted different models for demonstration. Hence, the degraded image model may be written as

$$y = \big(x * |h|^{2}\big)\, n \qquad (45)$$

The problem to be solved is the reconstruction of the original radar cross section $x$ out of $y$, which is known as an inverse problem. Unlike conventional techniques that only remove speckle noise, thus estimating the noiseless intensity $x * |h|^2$, the inverse approach takes into account the SAR image formation for image reconstruction. The problem is formulated as a Bayesian model fitting (21) and solved by using the scene-understanding paradigm (Fig. 3). The MAP estimation algorithm used is presented in detail in the Appendix. A solution is computed for three hypotheses corresponding to the GRF's defined in (42)–(44), used as prior models for the SAR cross section $x$. The examples demonstrating the importance of appropriate modeling are shown in Fig. 5. The inverted image based on the GMRF has good quality for the structures in the scene, the GRF model smooths preferentially the uniform regions while preserving sharp edges, and the ME model strongly prefers (explains) the targets in the image.

The presented examples show the different quality of the restored images for different assumed priors. Each model explains preferentially certain structures; thus, we conclude that model selection is important for applications and suggest using a library of models. In the following paragraphs, we give an example of automated model selection.

C. Optimal Iterative Model Selection and Parameter Estimation

Following the demonstration of the importance of the model for the quality of SAR image reconstruction, we illustrate the second level of Bayesian inference (23): model selection


based on evidence calculation. In this example, we consider an approximation of the evidence. Based on the definition in (29), we induce over the class of GMRF's a partition into a family of models centered on fixed values of the parameter vector $\theta$. A variant of the algorithm defined by (36) and (37) is used for the estimation of parameters from textured images in the presence of noise. Herein, the models are chosen locally according to their evidence, i.e., their capability to explain the information contained in the image. In the general case, an arbitrary number of models, characterized by their potential functions and possibly having different numbers of parameters, is considered. However, for simplicity of notation, we denote the used models by their parameter vector $\theta$. To select the best model, taking into account the noise, we have to maximize the evidence

$$m(y \mid \theta) = \int p(y \mid x)\, p(x \mid \theta)\, dx \qquad (46)$$

as a function of $\theta$. This is done by using an iterative algorithm that compares the evidences of different values of the parameter vector $\theta$. The algorithm is depicted in Fig. 6 and is presented in more detail in the following steps, together with the illustrations shown in Fig. 7.

Fig. 6. Block diagram of the simultaneous parameter estimation and despeckling algorithm for one class of models.

1) Choose an initial guess for $\theta$, i.e., cut out a submodel from the parametric family of models [Fig. 7(a)].
2) Calculate the MAP estimate of $x$ using the likelihood function [Fig. 7(b)] describing the noise and the submodel (prior) defined by $\theta$ [Fig. 7(c)].
3) Calculate the evidence of this submodel described by $\theta$ according to (46) and Fig. 7(c).
4) Change $\theta$ depending on the evidence, obtaining a refined estimate [Fig. 7(d)], and repeat, starting from Step 2), until convergence to the highest evidence is reached.

In the given example, the GMRF model is used together with a $\delta$-model (degenerated GMRF for $\theta = 0$) for areas of uniform backscatter. Based on the definition in (29), we partition both classes of models into several submodels, each characterized by a parameter vector $\theta$ whose dimension depends on the model class (GMRF or $\delta$). Applying the presented algorithm, parameter estimates for each model are calculated, and the model with the highest evidence is chosen (Fig. 8) to further serve as a prior to obtain a local MAP estimate of the cross section.

In Fig. 9, the result of SAR cross-section MAP estimation, according to the model that locally best explains the data, is presented. The airborne image was chosen to demonstrate the model selection process on textured and untextured regions in the image. In contrast to the examples presented before, the reconstructed image is no longer dominated by the properties of one particular model. Instead, using several models, a larger variety of image features can be explained and reconstructed satisfactorily.
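The selection loop of Fig. 6 can be sketched compactly. The code below is a hedged stand-in, not the authors' implementation: the GMRF and $\delta$ families are replaced by two simple parametric predictors, the speckle likelihood by Gaussian noise, and the evidence (46) is obtained by marginalizing the parameter on a grid; only the structure of steps 1)–4) is retained.

```python
# Hedged sketch of evidence-based local model selection (steps 1-4,
# Fig. 6). Stand-in models replace the GMRF and delta families: for one
# image window we marginalize each model's parameter on a grid to get the
# evidence (46) and keep the more evident model.
import numpy as np

rng = np.random.default_rng(3)
window = rng.normal(1.0, 0.1, (8, 8))            # one 8x8 image window

def log_evidence(win, textured, sigma=0.1):
    mu = np.linspace(0.0, 2.0, 201)              # parameter grid, uniform prior
    if textured:                                 # "textured" stand-in: row gradient
        ramp = np.linspace(0.5, 1.5, 8)[None, :, None]
        pred = mu[:, None, None] * ramp
    else:                                        # "uniform backscatter" stand-in
        pred = mu[:, None, None] * np.ones((1, 8, 8))
    ll = -0.5 * np.sum((win[None] - pred) ** 2, axis=(1, 2)) / sigma**2
    ll -= win.size * np.log(sigma * np.sqrt(2.0 * np.pi))
    # evidence (46): integral of likelihood x prior over the parameter
    return np.log(np.trapz(np.exp(ll - ll.max()), mu) / (mu[-1] - mu[0])) + ll.max()

ev_uniform = log_evidence(window, textured=False)
ev_textured = log_evidence(window, textured=True)
print("selected model:", "textured" if ev_textured > ev_uniform else "uniform")
```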

Fig. 7. Illustrative example for the model selection method. The diagram depicts (a) the parametric family of models $p(x \mid \theta)$, showing two of them, $p_k(x)$ and $p_{k+1}(x)$; (b) the likelihood distribution $p(y \mid x)$, showing one of its realizations corresponding to the estimated value in the assumption of model $k$; (c) the integration interval $\Delta x_k$ for evidence computation, $m(y \mid \theta_k)$, in the hypothesis consistent with model $k$; and (d) the same as (c) using the MAP reestimation of $x$ for the newly inferred likelihood in the assumption of model $k+1$. Dotted lines represent the posterior distributions. Note: all actual distributions are multidimensional.

Fig. 8. Image showing the locally selected model depending on the model evidence (white: GMRF model used in textured regions; black: $\delta$ model for homogeneous areas).

VII. DISCUSSION OF MODELING ACCURACY

From the point of view of the first level of Bayesian inference (21), the quality of the estimated parameters depends on the noise description (the likelihood), and its upper bound is given by the Fisher information matrix (17) [or the Cramer–Rao bound (12) for scalar parameters]. In the perspective of the second level of Bayesian inference (23), the quality of the estimation depends on the accuracy of the used model and can only be evaluated relative to the evidence of the other available models. In practice, the two aspects, noise and model goodness, can seldom be clearly separated. In this section, we concentrate on the accuracy of the estimated parameters, relating it to the size of the analyzing window and the model complexity at different scales. Further, for the aim of this section, we only use GMRF's (42) for exemplification.

GMRF parameter estimation requires the analysis of the image statistics in a limited window. Intuitively, knowing the extent of the GMRF neighborhood, we can consider its size the lower limit of the analyzing window. Again intuitively, we can guess that a larger extent of the analyzing window will lead to a higher accuracy of the estimator, as a consequence of using more data for the computation of our sample expectations. This reasoning is valid under the assumption of homogeneous images. To give a quantitative measure for the quality of the estimated parameters, we recall the Fisher information matrix (17), which, for the case of GMRF's, results in the following inferior bound for the covariance matrix of the estimated parameters:

$$\mathrm{Cov}[\hat{\theta}] \ge \frac{1}{M}\, \Sigma^{-1} \qquad (47)$$

where $M$ is the size of the analyzing window and $\Sigma$ is the sample covariance matrix. As expected, the variance of the estimated parameters is smaller for larger analyzing windows.

In real RS images, the probability of large uniform areas, uniform in the sense that they are explained by the same model, is rather small. Thus, large estimation windows result in contamination of the model in the analyzing window (31). The effect is a large variance or inconsistency of the estimated parameters. This also affects the capability to capture high structural complexity in the images, i.e., high-complexity models are defined on large neighborhoods, which implies the use of large analyzing windows. To overcome this drawback, multiresolution-multiscale analysis has been proposed [20], [24].

In the previous section, we addressed the model complexity using the topological part of the GRF definition. The structural complexity of the realizations of a stochastic process is determined by the number of interactions (cliques) and their type (potential function). The GMRF's, being equivalent to stationary AR processes, allow a spectral representation, as follows:

$$S(\omega_1, \omega_2) = \frac{\sigma^{2}}{1 - \sum_{r \in N} \theta_r \cos(\omega_1 r_1 + \omega_2 r_2)} \qquad (48)$$

where $S(\omega_1, \omega_2)$ is the power spectrum, $\omega_1$ and $\omega_2$ are the spatial frequencies, and $\theta_r$ are the components of the model parameter vector. The power spectrum is defined over the lattice $[-\pi, \pi] \times [-\pi, \pi]$. The complexity of the model is reflected by the complexity of the power spectrum, i.e., the degree of the trigonometric polynomial at the denominator. For exemplification, two images and their spectra are presented in Fig. 10(a), (a′) and (b), (b′). Image (a) has higher structural complexity.

The unsatisfactory use of small analyzing windows due to data nonhomogeneity led to the study of multiresolution analysis of data and model characteristics. Scaling a GMRF with a factor $\alpha$ changes its power spectrum into

$$S^{(\alpha)}(\omega_1, \omega_2) = \frac{1}{\alpha^{2}} \sum_{k=0}^{\alpha-1} \sum_{l=0}^{\alpha-1} S\!\left(\frac{\omega_1 + 2\pi k}{\alpha},\, \frac{\omega_2 + 2\pi l}{\alpha}\right) \qquad (49)$$

In [21], a detailed multiresolution analysis of GMRF's is studied, and it is concluded that, in general, GMRF's lose Markovianity after downscaling.
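The spectral statements (48) and (49) are easy to check numerically. The sketch below (our illustration; the first-order parameters and the decimation factor are assumptions) evaluates the GMRF power spectrum and its aliased version after decimation by a factor of two.

```python
# Sketch (ours) of the spectral view (48)-(49): the power spectrum of a
# first-order GMRF and the folded (aliased) spectrum after decimation.
import numpy as np

n = 256
theta_h, theta_v = 0.24, 0.24            # 2*(th + tv) < 1 keeps S positive
w = 2.0 * np.pi * np.fft.fftfreq(n)
W1, W2 = np.meshgrid(w, w, indexing="ij")

def spectrum(w1, w2):
    # (48) for the first-order neighborhood: each offset and its mirror
    # contribute, giving 1 - 2*th*cos(w1) - 2*tv*cos(w2) in the denominator
    return 1.0 / (1.0 - 2 * theta_h * np.cos(w1) - 2 * theta_v * np.cos(w2))

S = spectrum(W1, W2)
alpha = 2                                 # decimation factor
# (49): the decimated spectrum folds alpha^2 shifted copies of S
S_dec = sum(spectrum((W1 + 2 * np.pi * k) / alpha, (W2 + 2 * np.pi * l) / alpha)
            for k in range(alpha) for l in range(alpha)) / alpha**2

print("peak/mean before: %.2f  after: %.2f" % (S.max() / S.mean(),
                                               S_dec.max() / S_dec.mean()))
```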


Fig. 9. Example of SAR image reconstruction using the full Bayesian approach: model fitting and model selection. The original airborne SAR image (a) was acquired over a high-complexity scene, including different textures (forest, parking lot), point targets (cars, houses), and uniform, sharply delimited backscatter areas (fields, runway). The estimated cross-section image (b) was obtained by selecting locally the most evident model from a parametric family of GMRF's and a family of δ models and taking into account the exact likelihood: the statistics of the speckle process. The GMRF models are skilled in explaining and reconstructing the textures, while the δ models better explain and reconstruct uniform regions.

Fig. 10. Qualitative example of model behavior at different scales. A highly structured texture (a) with the spectrum presented in (b) was scaled with a factor of one-quarter. Image (c) shows a zoom of the scaled image to make evident the artifacts introduced by the aliasing, visible in (d). Scaling does not preserve the model. The images (a′)–(d′) show a similar example, but of a smooth texture, which after scaling is not affected by aliasing. The model is preserved: the galaxy-like spectrum has its support limited to a small domain of frequencies, and the change of scale does not introduce important aliasing. Images (a′) and (c′) are very similar. We could also note that a model able to explain the left image should have higher complexity than the model that explains the right image.

classes of GMRF’s that preserve their properties over several scales, or at least they can be well approximated by a GMRF [20]. In Fig. 10, two examples for the downscaling of a discrete process and its spectral interpretation are given. Fig. 10(a)–(d) show a high-complexity texture and its extremely structured spectrum spread over a large bandwidth. Subsampling the image introduces aliasing, drastically

changing the structure of the data. Fig. 10(a )–(d ) presents a texture with a narrow and smooth spectrum. The resampled data preserve in a good approximation the structure of the spectrum, and aliasing is negligible. Based on the previously presented assumptions and in the frame of the Bayesian paradigm, we introduce a systematic method for information extraction at multiple scales.


Fig. 11. Nested family of GRF's. Complexity increases from left to right. The most complex models include the simplest ones but, based on Occam's razor, we will select and use the simplest one that is able to explain the observations. The model selection is interpreted as metainformation extraction describing the structural complexity of the scene.

The graphical definition in Table II shows that a larger neighborhood model (the rightmost one) includes the model with a smaller neighborhood (the leftmost one). All cliques and associated parametric potential functions of the simple model are recognized in the more complex model. Based on (30) and knowing that the Bayesian paradigm includes Occam's razor (Fig. 2), we define a nested family of GRF's based on the size of the neighborhood (Fig. 11). Heuristically, we limit the complexity of the models to a neighborhood of order five. To capture higher structural complexity in the images, we use an analysis at multiple scales and apply the model selection paradigm (Fig. 4) using such a nested family of GRF's. Part II demonstrates the method in detail [30].

VIII. CONCLUSIONS

This paper presented an overview and assay of the classical and the Bayesian approaches to information extraction. We assessed a pragmatic approach for the information retrieval task, defined as a two-step procedure in a systematic and general manner. This approach is suitable for the interpretation of RS data, which, due to their complexity, require accurate modeling. The scene understanding paradigm was introduced based on Bayesian assumptions. Examples demonstrated the success of the proposed approach for the reconstruction of SAR images using three families of GRF's. An automatic algorithm for model selection was presented and demonstrated for accurate texture and detail reconstruction from SAR images. The very good quality of the restoration is obtained due to the Bayesian paradigm, which allows a complete model-based approach. All prior information at hand was used, i.e., the families of GRF's as prior models, the speckle model, and the end-to-end forward model of the SAR system as likelihood function. From an algorithmic point of view, the two levels of the Bayesian inference permitted the design of a computationally tractable algorithm for the estimation of the parameters for a problem with three hierarchical levels.

The proposed technique differs from traditional methods for image segmentation and clustering in that it uses a library of predefined models and does not require training data sets. As is known, due to the curse of dimensionality, such training data sets require large and expensive volumes of ground truth verification. The utilization of predefined libraries of models in the Bayesian paradigm enables the interpretation of unknown and unreachable scenes.

For the objectives of information systems, e.g., query by image content from large image archives, the Bayesian paradigm offers a two-level hierarchy of the retrieved information, i.e., 1) the model identity as a metafeature allowing a differentiated interpretation of 2) the estimated model parameter vector. We can summarize the achievements and trends of this research as follows: 1) utilization of more sophisticated models able to take into consideration all available a priori information and knowledge; 2) utilization of libraries of models and the selection of the model best explaining the observations; and 3) information extraction at two hierarchical levels.

APPENDIX
ALGORITHM FOR MODEL-BASED DESPECKLING OF SAR IMAGES

We describe the algorithm for scene understanding (Fig. 3) applied for the MAP SAR image reconstruction. The steps of the algorithm are shown in Fig. 12.

Fig. 12. Flow chart of the parameter estimation exemplified for SAR image inversion.

Step I) Modeling: Starting with an initial guess of the cross section, a small change to this estimate is made. This change is not made arbitrarily but according to the model properties, i.e., the new modified estimate is consistent with the statistics of the chosen model. The family of potential functions used is defined in (42)–(44).

Step II) Forward Model: The new guess $\hat{x}$ is passed through the forward model, i.e., in this example

$$\tilde{y} = \hat{x} * |h|^{2} \qquad (50)$$

is obtained. Statistical uncertainties caused, e.g., by noise are not yet considered but are embedded by the likelihood in Step III.

Step III) Data Comparison: To evaluate the new estimate and stay close to the data, the noiseless simulated datum $\tilde{y}$ is compared with the observation $y$. The likelihood function reflecting the noise characteristics is used for comparison, to obtain a measure of how likely the simulated data are.

In the case of $L$-look speckle images, we find

$$p(y \mid x) = \prod_{s} \frac{L^{L}\, y_s^{L-1}}{\Gamma(L)\, \tilde{y}_s^{L}}\, \exp\left(-L\, \frac{y_s}{\tilde{y}_s}\right) \qquad (51)$$

Step IV) Estimate Update: According to the likelihood of the simulated data compared to that of the previous estimate, the new guess of $\hat{x}$ is accepted or rejected. The acceptance rule can be either deterministic or stochastic, resulting in gradient approaches, only accepting simulated data with increased likelihood, or relaxation algorithms, which may also accept estimates with lower likelihood following a statistical decision rule, e.g., accept a new state with probability (for arguments below one)

$$P = \exp\left(\frac{\ln p(y \mid \hat{x}_{\mathrm{new}}) - \ln p(y \mid \hat{x}_{\mathrm{old}})}{T}\right) \qquad (52)$$

where the temperature $T$ controls the acceptance of less-probable data.

Step V) Model Parameter Estimation: Model parameters, e.g., a hyperparameter or texture parameters, are estimated from the actual estimate of $x$ to update the model. This step corresponds to the expectation–maximization algorithm applied to the joint probability of scene and model parameters given the observation

$$(\hat{x}, \hat{\theta}) = \arg\max_{x,\, \theta}\, p(x, \theta \mid y) \qquad (53)$$

Step VI) Iterative Estimation: Reiteration from Step I), with updated model parameters and the current scene estimate, is performed until convergence is reached. Herein, the determination of convergence is not a trivial problem. Applying this optimization scheme, we obtain the MAP estimate of the cross section (see the examples in Fig. 5).

Note that we have separated the posterior distribution into its likelihood and its prior model parts, i.e., 1) making minor changes to the estimate, determined by the model characteristics limiting the optimization space, e.g., by sampling new values for $x$ from the prior distribution, and 2) evaluating the resulting change in the likelihood term using a Metropolis-like algorithm, thus minimizing the noise-dependent norm. Thus, we do not need an expression of the posterior distribution in a closed form.

ACKNOWLEDGMENT

The authors would like to acknowledge H. Rehrauer, M. Schröder, G. Palubinskas, and G. Schwarz for fruitful discussions and comments. Furthermore, they want to thank ESA and Aerosensing GmbH for providing the SAR images used in the examples. The presented texture images have been taken from the well-known Brodatz album.

REFERENCES

[1] R. Bamler and B. Schättler, "SAR data acquisition and image formation," in SAR Geocoding: Data and Systems, G. Schreier, Ed. Karlsruhe, Germany: Wichmann-Verlag, 1993, pp. 53–102.
[2] R. E. Blahut, Principles and Practice of Information Theory. Reading, MA: Addison-Wesley, 1987.

[3] S. F. Burch, S. F. Gull, and J. Skilling, "Image restoration by a powerful maximum entropy method," Comput. Vision, Graph., Image Processing, vol. 23, pp. 113–128, 1983.
[4] P. Charbonnier, L. Blanc-Feraud, G. Aubert, and M. Barlaud, "Deterministic edge-preserving regularization in computed imaging," IEEE Trans. Image Processing, vol. 6, pp. 298–311, Feb. 1997.
[5] R. T. Cox, "Probability, frequency and reasonable expectation," Amer. J. Phys., vol. 17, pp. 1–13, 1946.
[6] M. Datcu, "Scene understanding from SAR images," in Proc. IEEE IGARSS'96, T. I. Stein, Ed., vol. 1, pp. 310–314.
[7] A. P. Dempster, N. M. Laird, and D. B. Rubin, "Maximum likelihood from incomplete data via the EM algorithm," J. R. Stat. Soc. (Ser. B), vol. 39, pp. 1–39, 1977.
[8] H. Derin and H. Elliott, "Modeling and segmentation of noisy and textured images using Gibbs random fields," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-9, pp. 39–55, Jan. 1987.
[9] R. T. Frankot and R. Chellappa, "Lognormal random-field models and their applications to radar image synthesis," IEEE Trans. Geosci. Remote Sensing, vol. GE-25, pp. 195–207, Jan. 1987.
[10] D. Geman, "Random fields and inverse problems in imaging," in Lecture Notes in Mathematics. Berlin, Germany: Springer-Verlag, 1988, pp. 117–193.
[11] S. Geman and D. Geman, "Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images," IEEE Trans. Pattern Anal. Machine Intell., vol. PAMI-6, pp. 721–741, Nov. 1984.
[12] S. F. Gull and J. Skilling, "Maximum entropy method in image processing," Proc. Inst. Elect. Eng., vol. 131F, pp. 646–659, 1984.
[13] K. Huang, Statistical Mechanics, 2nd ed. New York: Wiley, 1987.
[14] E. T. Jaynes, "Information theory and statistical mechanics," Phys. Rev., vol. 106, pp. 620–630, 1957.
[15] ——, "On the rationale of maximum entropy methods," Proc. IEEE, vol. 70, pp. 939–952, Sept. 1982.
[16] H. Jeffreys, Theory of Probability. Oxford, U.K.: Oxford Univ. Press, 1967.
[17] F. C. Jeng and J. W. Woods, "Compound Gauss–Markov random fields for image estimation," IEEE Trans. Signal Processing, vol. 39, pp. 683–697, Mar. 1991.
[18] P. A. Kelly, H. Derin, and K. D. Hartt, "Adaptive segmentation of speckled images using a hierarchical random field model," IEEE Trans. Acoust., Speech, Signal Processing, vol. 36, pp. 1628–1641, Oct. 1988.
[19] L. Knockaert, "Maximum a posteriori maximum entropy order determination," IEEE Trans. Signal Processing, vol. 45, pp. 1553–1559, June 1997.
[20] S. Krishnamachari and R. Chellappa, "Multiresolution Gauss–Markov random field models for texture segmentation," IEEE Trans. Image Processing, vol. 6, pp. 251–267, Feb. 1997.
[21] S. Lakshmanan and H. Derin, "Gaussian Markov random fields at multiple resolutions," in Markov Random Fields, R. Chellappa and A. Jain, Eds. New York: Academic, 1993, pp. 131–158.
[22] S. Z. Li, Markov Random Field Modeling in Computer Vision, Computer Science Workbench. Berlin, Germany: Springer-Verlag, 1995.
[23] D. J. C. MacKay, "Bayesian interpolation," in Maximum Entropy and Bayesian Methods in Inverse Problems. Norwell, MA: Kluwer, 1991, pp. 39–66.
[24] J. Mao and A. K. Jain, "Texture classification and segmentation using multiresolution simultaneous autoregressive models," Pattern Recognit., vol. 25, no. 2, pp. 173–188, 1992.
[25] A. O'Hagan, Bayesian Inference, Kendall's Library of Statistics, 1994.
[26] G. Qian and H. R. Künsch, "On model selection in robust linear regression," Seminar für Statistik, ETHZ, Zurich, Switzerland, Tech. Rep. 80, 1996.
[27] J. A. Richards, Remote Sensing Digital Image Analysis. Berlin, Germany: Springer-Verlag, 1986.
[28] J. Rissanen, "Modeling by shortest data description," Automatica, vol. 14, pp. 465–471, 1978.
[29] C. P. Robert, The Bayesian Choice, Springer Texts in Statistics. Berlin, Germany: Springer-Verlag, 1996.
[30] M. Schröder, H. Rehrauer, K. Seidel, and M. Datcu, "Spatial information retrieval from remote sensing images—Part II: Gibbs–Markov random fields," this issue, pp. 1446–1455.
[31] A. Spataru, Fondements de la théorie de la transmission de l'information. Lausanne, Switzerland: Presses Polytechniques Romandes, 1987.
[32] H. J. Trussell, "The relationship between image restoration by the maximum a posteriori method and a maximum entropy method," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 114–117, Jan. 1980.
[33] K. G. Wilson, "Problems in physics with many scales of length," Sci. Amer., vol. 241, pp. 158–177, 1979.


Mihai Datcu received the M.S. and Ph.D. degrees in electronics and telecommunications from the Polytechnical University of Bucharest (UPB), Romania, in 1978 and 1986, respectively. His doctoral dissertation was on high-complexity signal processing. He has held an image processing professorship with UPB since 1981. He held Visiting Professor appointments at the Department of Mathematics, University of Oviedo, Spain, from 1991 to 1992, and at the Swiss Federal Institute of Technology, ETH Zürich, from 1992 to 1993 and from 1996 to 1997. Currently, he is a Scientist with the German Remote Sensing Data Center (DFD), German Aerospace Center (DLR), Oberpfaffenhofen, Germany. His current interests are in multidimensional signal processing, model-based scene understanding, and multiresolution signal analysis, with applications in SAR remote sensing.

Klaus Seidel received the B.S. degree in experimental physics in 1965 and the Ph.D. degree in 1971, both from the Swiss Federal Institute of Technology (ETHZ), Zürich. He is with the Image Science Group at ETHZ and is Head of the Image Processing Group for remote-sensing applications. He serves in various organizations and committees: since 1985, he has been the National Point of Contact (NPOC) in Switzerland, in cooperation with the Swiss Federal Office of Topography. From 1987 to 1997, he was a Swiss delegate to the European Association of Remote Sensing Laboratories (EARSeL), and he chairs EARSeL's Special Interest Group on Land, Ice, and Snow. Since 1987, he has been a Swiss delegate and expert in ESA working groups, and since 1991, he has served as ESA's nominated center for Switzerland. He also teaches courses in digital processing of satellite images and has published several papers concerning snow cover monitoring, geometric correction, and multispectral analysis of satellite images.


Marc Walessa received a degree in electrical engineering and telecommunications from the RWTH, Aachen, Germany, in 1996. In 1995, he joined the Image Processing Department, ENST, Paris, France, for a six-month project on stereovision. Since winter 1996, he has been with the German Remote Sensing Data Center (DFD), German Aerospace Center (DLR), Oberpfaffenhofen, Germany, preparing his Ph.D. thesis on Bayesian SAR-data interpretation. His interests lie in image and speech processing and information theory.