Automated time-window selection based on machine learning for full-waveform inversion

Yangkang Chen¹, Judith C. Hill¹, Wenjie Lei², Matthieu Lefebvre², Ebru Bozdağ³, Dimitri Komatitsch⁴, and Jeroen Tromp²
¹Oak Ridge National Laboratory, ²Princeton University, ³Colorado School of Mines, ⁴LMA, CNRS UPR 7051

SUMMARY

Due to increased computational capabilities afforded by modern and future computer architectures, the seismology community is demanding a more comprehensive understanding of the full waveform information contained in recorded seismic data. Full-waveform inversion seeks to match observed seismic data with synthesized seismograms by iteratively updating subsurface model parameters. Synthetic data are generated by solving the seismic wave equation using an effective and efficient numerical algorithm. In order to ensure inversion accuracy and stability, both synthesized and observed seismograms must be carefully pre-processed. More specifically, when synthetic and observed data have a large waveform mismatch during the initial iterations, waveforms should be carefully selected for calculating the misfit gradient in order to avoid instability. We introduce a fully automated algorithm based on machine learning (ML) to intelligently select time windows for calculating the misfit between observed and synthetic seismograms. The training dataset can be prepared using time windows obtained with the FLEXWIN method, in which the selection parameters are finely tuned. Results show that the automatically selected time windows are of sufficiently high quality compared with the benchmark FLEXWIN method.

INTRODUCTION

Emerging 3D-3D tomographic methods, i.e., seismic tomography based on a 3D reference model and 3D numerical simulations of seismic wavefields, take advantage of full wavefield simulations and finite-frequency kernels, thereby reducing the data restrictions required when using approximate forward modeling and simplified descriptions of sensitivity (Hung et al., 2000; Dahlen et al., 2000; Maggi et al., 2009). In exploration seismology, tomographic methods using full-waveform forward modeling and adjoint-state-based inversion techniques are commonly referred to as "full-waveform inversion" (FWI). FWI is an iterative process that updates the model by minimizing, in the data domain, the least-squares misfit between recorded data and synthesized data predicted from the current model (Tarantola, 1984; Pratt et al., 1998; Pratt, 1999; Symes, 2008; Virieux and Operto, 2009; Morgan et al., 2013; Warner et al., 2013; Xue et al., 2016). Because of rapid developments in waveform-based inversion methods and strategies, and ever larger data volumes, increasing effort has been directed toward automated picking of seismic phases for misfit calculation. vanDecar and Crosson (1990) proposed a partially automated multi-channel cross-correlation method to determine teleseismic relative phase arrival times. This approach was extended to efficient methods for obtaining highly accurate traveltime (Sigloch and Nolet, 2006; Houser et al., 2008; Lee and Chen, 2013) and even attenuation measurements (Laurence and Shearer, 2006). Maggi et al. (2009) introduced an automated time-window selection algorithm called FLEXWIN to select time windows anywhere in entire seismic traces where observed seismograms and synthetic waveforms are sufficiently close. Maggi et al. (2009) designed a sophisticated five-stage workflow to construct a robust filter that passes high-quality data components.

In this abstract, we present an intelligent algorithm to select time windows from observed and synthetic seismograms based on machine learning. The window-selection problem can be formulated as a classification problem, i.e., for each candidate window the decision is to either select or reject it. A neural network can be trained using time windows selected by the FLEXWIN method (Maggi et al., 2009), and is subsequently applied to a large independent dataset.

THEORY

Adjoint state method

The goal of full-waveform inversion is to minimize an objective function that measures the misfit between observed and synthetic seismic data (Tromp et al., 2005; Chen et al., 2016), e.g., the least-squares misfit
© 2017 SEG SEG International Exposition and 87th Annual Meeting

\[
\chi(\mathbf{m}) = \frac{1}{2}\sum_{r=1}^{N}\int_{0}^{T}\left\|\mathbf{d}(\mathbf{x}_r,t)-\mathbf{s}(\mathbf{x}_r,t,\mathbf{m})\right\|_2^2\,\mathrm{d}t\,, \tag{1}
\]

where $\mathbf{d}(\mathbf{x}_r,t)$ denotes the three-component seismic data recorded at station $\mathbf{x}_r$, $N$ the number of stations, and $\mathbf{m}$ a given earth model. It is worth noting that any misfit function can be used in adjoint inversions. In fact, the selected windows (by FLEXWIN or any other algorithm) are also related to the chosen misfit function. In isotropic elastic media, the gradient of the misfit function (1) can be formulated as
\[
\delta\chi = \int_{V}\left(K_\kappa\,\delta\ln\kappa + K_\mu\,\delta\ln\mu + K_\rho\,\delta\ln\rho\right)\mathrm{d}^3\mathbf{x}\,, \tag{2}
\]

where κ, µ, and ρ denote the bulk modulus, shear modulus, and density, respectively. The sensitivity kernels $K_\kappa$, $K_\mu$, and $K_\rho$ are the Fréchet derivatives with respect to the bulk modulus, shear modulus, and density, respectively. Specifically,
\[
K_\kappa(\mathbf{x}) = -\int_{0}^{T}\kappa(\mathbf{x})\,[\nabla\cdot\mathbf{s}(\mathbf{x},t)]\,[\nabla\cdot\mathbf{s}^{\dagger}(\mathbf{x},T-t)]\,\mathrm{d}t\,, \tag{3}
\]
\[
K_\mu(\mathbf{x}) = -\int_{0}^{T}2\mu(\mathbf{x})\,\mathbf{D}(\mathbf{x},t):\mathbf{D}^{\dagger}(\mathbf{x},T-t)\,\mathrm{d}t\,, \tag{4}
\]
\[
K_\rho(\mathbf{x}) = -\int_{0}^{T}\rho(\mathbf{x})\,\partial_t\mathbf{s}(\mathbf{x},t)\cdot\partial_t\mathbf{s}^{\dagger}(\mathbf{x},T-t)\,\mathrm{d}t\,, \tag{5}
\]


where $\mathbf{s}^{\dagger}$ denotes the adjoint wavefield, and
\[
\mathbf{D} = \tfrac{1}{2}\left[\nabla\mathbf{s} + (\nabla\mathbf{s})^{T}\right] - \tfrac{1}{3}(\nabla\cdot\mathbf{s})\,\mathbf{I}
\]
and $\mathbf{D}^{\dagger}$ denote the traceless strain deviator and its adjoint (Tromp et al., 2005).

Time-window selection as a classification problem

The aim of pattern recognition is the classification of objects into a finite number of categories. A pattern-recognition system takes an object and a set of categories as input and decides to which category the object belongs. In general, it works in two stages. In the first stage, feature extraction (also known as the pre-processing or parameterization stage), a set of measures is extracted from the input object. In the second stage, classification, the object is associated with one of the categories based on these features.

The main steps of machine-learning-based window selection are learning and predicting. The learning process consists of three stages:

1. Gathering as many windows as possible, containing not only usable windows but also unusable ones. If only usable windows are considered, the machine cannot infer the criteria that define an unusable window.

2. Collecting the values of the five measurements (features) of all input windows as the input variables (introduced in detail later): the cross-correlation value between synthetic and observed seismograms, the cross-correlation time lag between synthetic and observed seismograms, the amplitude ratio between synthetic and observed seismograms, the window length, and the minimum short-term-average/long-term-average (STA/LTA) of the envelope of the synthetic seismogram.

3. Using a typical pattern-recognition neural network to set up the model. The neural network model is discussed below.

Once the neural network is trained, the next step is to use it to predict the selection mode (usable or unusable) of each input window from the measurements (features) of that window.
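As a concrete sketch of learning stages 1 and 2, the training set can be assembled as an N × 5 feature matrix with one binary label per candidate window. The feature values and labels below are illustrative placeholders, not measurements from the paper's dataset:

```python
import numpy as np

# Each row is one candidate window; columns follow the five features in the text:
# [CC value, CC time lag (s), amplitude ratio dlnA, window length (s), min STA/LTA]
X = np.array([
    [0.95,  0.2, 0.1, 12.0, 2.5],   # close waveform match -> usable
    [0.30,  4.0, 1.8,  3.0, 0.4],   # poor match, noisy    -> unusable
    [0.88, -0.5, 0.2,  8.0, 1.9],   # close waveform match -> usable
])
# Labels taken from the benchmark FLEXWIN selection: 1 = accepted, 0 = rejected
y = np.array([1, 0, 1])

print(X.shape, y.tolist())
```

Both usable and unusable windows appear in the training set, as required by stage 1.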
Classification via neural network training

In this section, we introduce the mathematical basics of the NN classification framework. Given input data $x$, the neural network makes predictions using forward propagation. Take a 3-layer neural network as an example. The algorithm is as follows:
\[
z_1 = x W_1 + b_1\,, \tag{6}
\]
\[
a_1 = \tanh(z_1)\,, \tag{7}
\]
\[
z_2 = a_1 W_2 + b_2\,, \tag{8}
\]
\[
a_2 = \hat{y} = \mathrm{softmax}(z_2)\,, \tag{9}
\]

where $z_i$ is the input of layer $i$ and $a_i$ the output of layer $i$ after applying the hyperbolic tangent activation function. $W_1$, $W_2$, $b_1$, and $b_2$ are the unknown parameters we solve for during the NN training process. $W_1$ and $W_2$ are called weight matrices and $b_1$ and $b_2$ are called bias vectors. Note that


all vectors in equations (6)–(9) are row vectors. The softmax function is defined as
\[
\mathrm{softmax}(z_j) = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}\,. \tag{10}
\]
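Equations (6)–(9) map directly onto a few lines of NumPy. The layer sizes below (5 input features, 3 hidden units, 2 output classes) are illustrative choices for this sketch, not values stated in the abstract:

```python
import numpy as np

def softmax(z):
    # Subtracting the row-wise max is a standard numerical-stability trick;
    # it leaves equation (10) mathematically unchanged.
    e = np.exp(z - np.max(z, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def forward(x, W1, b1, W2, b2):
    z1 = x @ W1 + b1          # equation (6)
    a1 = np.tanh(z1)          # equation (7)
    z2 = a1 @ W2 + b2         # equation (8)
    return softmax(z2)        # equation (9): class probabilities y_hat

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(3)   # 5 features -> 3 hidden units
W2, b2 = rng.normal(size=(3, 2)), np.zeros(2)   # 3 hidden units -> 2 classes
y_hat = forward(rng.normal(size=(1, 5)), W1, b1, W2, b2)
print(y_hat)  # one row of class probabilities; rows sum to 1
```

Because the inputs are row vectors, the weight matrices multiply from the right, exactly as written in equations (6) and (8).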

The neural-network learning process is equivalent to the following minimization problem:
\[
\min_{W_1, W_2, b_1, b_2} L(y, \hat{y}) = -\sum_{k=1}^{K} y_k \log \hat{y}_k\,, \tag{11}
\]

where $L(y,\hat{y})$ is called the loss function, or more specifically the categorical cross-entropy loss (also known as the negative log likelihood), and $K$ is the number of classes. Equation (11) sums over the training examples and adds to the loss whenever the predicted class is incorrect. The further apart the two probability distributions $y$ (the correct labels) and $\hat{y}$ (our predictions) are, the greater the loss. By finding parameters that minimize the loss, we maximize the likelihood that the trained network predicts correctly. We use the gradient descent method to find the minimum. Gradient descent requires the gradient of the loss function with respect to the parameters: $\partial L/\partial W_1$, $\partial L/\partial b_1$, $\partial L/\partial W_2$, $\partial L/\partial b_2$.

To calculate these gradients, we use the backpropagation algorithm, which conveniently calculates the gradients starting from the output.
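The full training loop (forward pass, backpropagation, gradient-descent update) can be sketched in NumPy. The learning rate, iteration count, hidden-layer size, and the toy labeling rule below are arbitrary choices for illustration, not the settings used in the paper:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

def train(X, Y, hidden=4, lr=0.5, steps=1500, seed=0):
    """Full-batch gradient descent on the cross-entropy loss of equation (11)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(scale=0.5, size=(X.shape[1], hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(scale=0.5, size=(hidden, Y.shape[1])); b2 = np.zeros(Y.shape[1])
    for _ in range(steps):
        # Forward pass, equations (6)-(9)
        a1 = np.tanh(X @ W1 + b1)
        y_hat = softmax(a1 @ W2 + b2)
        # Backpropagation: gradients flow backward from the output layer
        d2 = (y_hat - Y) / len(X)          # dL/dz2 for softmax + cross-entropy
        d1 = (d2 @ W2.T) * (1 - a1**2)     # chain rule through tanh
        W2 -= lr * (a1.T @ d2); b2 -= lr * d2.sum(axis=0)
        W1 -= lr * (X.T @ d1);  b1 -= lr * d1.sum(axis=0)
    return W1, b1, W2, b2

# Toy two-class problem: windows whose feature sum is positive are "usable"
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
labels = (X.sum(axis=1) > 0).astype(int)
Y = np.eye(2)[labels]                      # one-hot labels
W1, b1, W2, b2 = train(X, Y)
pred = softmax(np.tanh(X @ W1 + b1) @ W2 + b2).argmax(axis=1)
acc = (pred == labels).mean()
print(acc)
```

The compact gradient `y_hat - Y` arises because the derivative of the cross-entropy loss through the softmax simplifies to the difference between predicted probabilities and one-hot labels.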

Initial window selection

There are typically two ways to create initial windows. One is to collect all windows along each trace with a constant window length, and then feed all of these initial windows into the learned NN. The advantage of this strategy is that we will not miss any possible window along the entire trace. The disadvantage is that performance depends strongly on the search window length, which still requires human input and experience, and it is also sensitive to the signal-to-noise ratio of an input trace or window. Thus, this strategy makes the resulting window-selection algorithm not fully automated.

Another strategy is to use initial windows picked from the STA/LTA, as in the traditional FLEXWIN algorithm. An STA/LTA measure computed from the envelope of the synthetic data is used to detect triggers in the traces. Since the STA/LTA already represents an automated way of detecting arrivals, we can simply use it to create initial windows for subsequent mode prediction. The STA/LTA can also help reject a large fraction of noisy traces that are not suitable for misfit calculation. One problem arising in STA/LTA-based initial window selection is that it may create some overlap between selected windows. However, this is not an issue at the final stage, after window mode prediction, since we can merge windows with temporal overlap. In our algorithm, we use the second strategy.

Feature selection

Feature extraction is basically a transformation from the data space into a feature space in order to extract robust information from the waveform in compressed form. This step is critical for


the success of the classification task. Each datum (here, three-component seismograms associated with one event) is represented by a feature vector which is used to train and test machine-learning models. It is important to select features that are informative and predictive of the properties of each datum. Furthermore, the size of the feature set and the types of features (e.g., nominal, numeric, etc.) define the size of the learning problem. In other words, the larger the feature space, the more combinations of features need to be examined and learned by the machine-learning algorithm (Mousavi et al., 2016). We have selected five features for training the neural network, namely

• Normalized cross-correlation (CC) value between observed and synthetic seismograms,
\[
\mathrm{CC} = \max_t[\Gamma(t)]\,, \qquad
\Gamma(t) = \frac{\int \tilde{s}(t')\,\tilde{d}(t'-t)\,\mathrm{d}t'}{\left[\int \tilde{s}^2(t')\,\mathrm{d}t' \int \tilde{d}^2(t'-t)\,\mathrm{d}t'\right]^{1/2}}\,. \tag{12}
\]

• Cross-correlation time lag,
\[
\Delta\tau = \arg\max_t[\Gamma(t)]\,. \tag{13}
\]

• Amplitude ratio between observed and synthetic data,
\[
\Delta\ln A = \ln(A_{\mathrm{obs}}/A_{\mathrm{syn}}) = \frac{1}{2}\ln\left[\frac{\int \tilde{d}^2(t)\,\mathrm{d}t}{\int \tilde{s}^2(t)\,\mathrm{d}t}\right]\,. \tag{14}
\]

• Window length,
\[
w = t_{\mathrm{end}} - t_{\mathrm{start}}\,. \tag{15}
\]

• Minimum STA/LTA value,
\[
\mathrm{mstalta} = \min(\mathrm{STA/LTA})\,. \tag{16}
\]

Performance evaluation

In binary classification problems, such as the automated window selection addressed in this abstract, the goal is to categorize the outcome of an event into one of two categories: accepted window (1) or rejected window (0). This process can result in one of four possible outcomes, defined as follows:

1. True Positive (TP): Evaluated and actual results are 1 (Valid Detection).
2. False Positive (FP): Evaluated result is 1, but actual result is 0 (False Alarm).
3. False Negative (FN): Evaluated result is 0, but actual result is 1 (Missed Detection).
4. True Negative (TN): Evaluated and actual results are 0 (Valid Non-detection).

Let TP, FP, FN, and TN denote the number of instances that fall into the four categories of classification results. Then the classification accuracy (or success rate) can be expressed as
\[
\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}}\,. \tag{17}
\]
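Equation (17) is straightforward to compute from predicted labels and reference labels (e.g., from FLEXWIN); the label arrays below are made up for illustration:

```python
import numpy as np

def accuracy(predicted, actual):
    """Classification accuracy of equation (17) from binary label arrays."""
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    tp = np.sum((predicted == 1) & (actual == 1))   # valid detections
    tn = np.sum((predicted == 0) & (actual == 0))   # valid non-detections
    fp = np.sum((predicted == 1) & (actual == 0))   # false alarms
    fn = np.sum((predicted == 0) & (actual == 1))   # missed detections
    return (tp + tn) / (tp + tn + fp + fn)

# Example: 9 of 10 windows classified the same way as the reference selection
pred = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
ref  = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
print(accuracy(pred, ref))  # -> 0.9
```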

In this abstract, the actual classification value of each window is given by the FLEXWIN algorithm with finely tuned parameters. We select a benchmark dataset for training the neural network, and then apply the trained network to an independent waveform dataset and evaluate the classification accuracy in order to validate the algorithm. It is worth mentioning that here "accuracy" simply means the degree to which the NN-classified results resemble the results obtained with the FLEXWIN method. We can treat this "accuracy" as a quantitative reference when evaluating performance, and we can confirm the validity of the window selection by visual inspection and human experience. The biggest challenge for FLEXWIN, however, is to find a common set of parameters for data from different types of earthquakes. Thus we are generally conservative, eliminating bad selections but also eliminating good waveforms when using large datasets. We are currently using FLEXWIN selections to validate the proposed approach, but the ultimate aim is to go beyond FLEXWIN and maximize the selection of usable parts of the waveforms in seismograms.

EXAMPLES

We first use a synthetic example generated from the Marmousi velocity model (Bourgeois et al., 1991) to demonstrate the performance of the proposed automated window selection method. For this example, we simulated a shot record from the true velocity model shown in Figure 1a as the observed data, and a shot record from the smoothed velocity model shown in Figure 1b as the synthetic data. The shot is located at the surface at position 4,596 m; 767 receivers are evenly distributed along the survey at a depth of 60 m. The windows used for training are extracted from FLEXWIN selection results for 6 seismograms between positions 1,920 m and 1,980 m. We show a comparison between the semi-automated FLEXWIN method and the fully automated ML method for two stations.

The comparison for position 6,840 m is shown in Figure 2, where the blue rectangles denote the time windows selected for misfit calculation. The results from the two methods are exactly the same. Using the ML-based method, 2 of 21 initial windows are selected for these data, and the accuracy defined in (17) is 100%. Figure 3 shows a comparison for position 7,836 m; 12 of 30 initial windows are selected with the proposed method, and the accuracy is 90%. Notably, although the window selection results differ slightly, the merged waveform selections are exactly the same.
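The final merging of temporally overlapping windows mentioned above is a standard interval-merging step; a minimal sketch, with illustrative window bounds:

```python
def merge_windows(windows):
    """Merge overlapping (start, end) time windows into disjoint spans."""
    merged = []
    for start, end in sorted(windows):
        if merged and start <= merged[-1][1]:      # overlaps the previous span
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# Two slightly different selections can merge to the same overall coverage:
print(merge_windows([(1.0, 2.0), (1.8, 3.0), (4.0, 4.5)]))  # -> [(1.0, 3.0), (4.0, 4.5)]
```

This is why two slightly different per-window selections (as for the station at 7,836 m) can still yield identical merged waveform selections.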

Figure 1: (a) Marmousi velocity model, with the stations used for training and the stations used for validation indicated. (b) Smoothed velocity model.


[Figure 2 panels (station at 6,840 m): observed and synthetic seismograms (amplitude vs. time in s), each with 21 initial windows and 2 selected windows; success rate = 1.0000. Figure 5 panels: observed and synthetic seismograms, with 83,272 initial / 6 selected windows (success rate = 0.9998) in (a) and 41,144 initial / 1 selected window (success rate = 0.9999) in (b).]

Figure 2: Time window selection results for a station located at 6,840 m using (a) FLEXWIN method and (b) the ML method. Note that the results from the two methods are exactly the same.

Figure 5: Time window selection results for stations (a) AU.NFK and (b) AK.GAMB for the 2014 Mw 6.6 Panama earthquake.

[Figure 3 panels: observed and synthetic seismograms (amplitude vs. time in s) with 30 initial windows; 9 windows selected in panel (a) and 12 in panel (b), success rate = 0.90.]

Figure 3: Time window selection results for the station located at 7,836 m using (a) the FLEXWIN method and (b) the ML method. Note that although the window selection results are slightly different, the merged waveform selections are exactly the same.

A real-data example comes from the Mw 6.6 Panama earthquake of December 8, 2014 (event C201412080854A). The epicenter was 20 kilometers (12 miles) south of the Punta de Burica peninsula, on Panama's Pacific Ocean side, near the Costa Rican border. The source and station locations are shown in Figure 4, where the beachball indicates the source mechanism and location, and the green inverted triangles denote the stations used for demonstration in this abstract. Figures 5a and 5b show the time-window selection results for stations AU.NFK (99.98% accuracy) and AK.GAMB (99.99% accuracy). This example shows that the proposed algorithm can be effective for very complicated seismograms.

Figure 4: Source and station locations for event C201412080854A.

CONCLUSIONS

Selecting windows in which synthetic and observed seismograms are sufficiently close to each other plays an indispensable role in the practical implementation of full-waveform inversion, since it helps guarantee convergence of the inversion. While the traditional FLEXWIN algorithm can be "automated" to some extent, it still involves a large amount of labor requiring human input and prior experience, and thus cannot be considered fully automated. We have presented a fully automated way of selecting optimal misfit-calculation windows in order to avoid numerical instability during large-scale seismic inversion. A neural network can be trained on a small dataset and then applied to a large amount of data automatically. Synthetic experiments with the Marmousi model and a real earthquake data example demonstrate the performance of the proposed machine-learning-based algorithm. The next step of this project is to verify the reliability and robustness of the proposed method in improving full-waveform inversion results.

ACKNOWLEDGEMENTS

This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under contract DE-AC05-00OR22725. The spectral-element software package SPECFEM3D_GLOBE used for simulating the seismograms and the benchmark window-selection software package FLEXWIN used in this article are freely available via the Computational Infrastructure for Geodynamics (CIG; geodynamics.org).


REFERENCES

Bourgeois, A., M. Bourget, P. Lailly, M. Poulet, P. Ricarte, and R. Versteeg, 1991, Marmousi, model and data, in The Marmousi Experience: Proceedings of the 1990 EAEG workshop, 5–16, https://doi.org/10.3997/2214-4609.201411190.

Chen, Y., H. Chen, K. Xiang, and X. Chen, 2016, Geological structure guided well log interpolation for high-fidelity full waveform inversion: Geophysical Journal International, 207, 1313–1331, https://doi.org/10.1093/gji/ggw343.

Dahlen, F. A., G. Nolet, and S. H. Hung, 2000, Fréchet kernels for finite-frequency traveltimes, I: Theory: Geophysical Journal International, 141, 157–174, https://doi.org/10.1046/j.1365-246x.2000.00070.x.

Houser, C., G. Masters, and G. Laske, 2008, Shear and compressional velocity models of the mantle from cluster analysis of long-period waveforms: Geophysical Journal International, 174, 195–212, https://doi.org/10.1111/j.1365-246X.2008.03763.x.

Hung, S. H., F. A. Dahlen, and G. Nolet, 2000, Fréchet kernels for finite-frequency traveltimes, II: Examples: Geophysical Journal International, 141, 175–203, https://doi.org/10.1046/j.1365-246x.2000.00072.x.

Laurence, J. F., and P. M. Shearer, 2006, Imaging mantle transition zone thickness with SdS-SS finite-frequency sensitivity kernels: Geophysical Journal International, 174, 143–158, https://doi.org/10.1111/j.1365-246x.2007.03673.x.

Lee, E.-J., and P. Chen, 2013, Automating seismic waveform analysis for full 3-D waveform inversions: Geophysical Journal International, 194, 572–589, https://doi.org/10.1093/gji/ggt124.

Maggi, A., C. Tape, M. Chen, D. Chao, and J. Tromp, 2009, An automated time-window selection algorithm for seismic tomography: Geophysical Journal International, 178, 257–281, https://doi.org/10.1111/j.1365-246X.2009.04099.x.

Morgan, J., M. Warner, R. Bell, J. Ashley, D. Barnes, R. Little, K. Roele, and C. Jones, 2013, Next-generation seismic experiments: Wide-angle, multi-azimuth, three-dimensional, full-waveform inversion: Geophysical Journal International, 195, 1657–1678, https://doi.org/10.1093/gji/ggv513.

Mousavi, S. M., S. P. Horton, C. A. Langston, and B. Samei, 2016, Seismic features and automatic discrimination of deep and shallow induced-microearthquakes using neural network and logistic regression: Geophysical Journal International, 207, 29–46, https://doi.org/10.1093/gji/ggw258.

Pratt, G., 1999, Seismic waveform inversion in the frequency domain, Part 1: Theory and verification in a physical scale model: Geophysics, 64, 888–901, https://doi.org/10.1190/1.1444597.

Pratt, G., C. Shin, and G. Hick, 1998, Gauss-Newton and full Newton methods in frequency-space seismic waveform inversion: Geophysical Journal International, 133, 341–362, https://doi.org/10.1046/j.1365-246X.1998.00498.x.

Sigloch, K., and G. Nolet, 2006, Measuring finite-frequency body-wave amplitudes and traveltimes: Geophysical Journal International, 167, 271–287, https://doi.org/10.1111/j.1365-246X.2006.03116.x.

Symes, W. W., 2008, Migration velocity analysis and waveform inversion: Geophysical Prospecting, 56, 765–790, https://doi.org/10.1111/j.1365-2478.2008.00698.x.

Tarantola, A., 1984, Inversion of seismic reflection data in the acoustic approximation: Geophysics, 49, 1259–1266, https://doi.org/10.1190/1.1441754.



Tromp, J., C. Tape, and Q. Liu, 2005, Seismic tomography, adjoint methods, time reversal and banana-doughnut kernels: Geophysical Journal International, 160, 195–216, https://doi.org/10.1111/j.1365-246X.2004.02453.x.

van Decar, J. C., and R. S. Crosson, 1990, Determination of teleseismic relative phase arrival times using multi-channel cross-correlation and least squares: Bulletin of the Seismological Society of America, 80, 150–169.

Virieux, J., and S. Operto, 2009, An overview of full-waveform inversion in exploration geophysics: Geophysics, 74, WCC1–WCC26, https://doi.org/10.1190/1.3238367.

Warner, M., A. Ratcliffe, T. Nangoo, J. Morgan, A. Umpleby, N. Shah, V. Vinje, I. Stekl, L. Guasch, C. Win, G. Conroy, and A. Bertrand, 2013, Anisotropic 3D full-waveform inversion: Geophysics, 79, R59–R80, https://doi.org/10.1190/geo2012-0338.1.

Xue, Z., N. Alger, and S. Fomel, 2016, Full-waveform inversion using smoothing kernels: 86th Annual International Meeting, SEG, Expanded Abstracts, 1358–1363, https://doi.org/10.1190/segam2016-13948739.1.
