We explore the feasibility of surface-related multiple elimination by two-step separation, where primaries and multiples are separated in the latent space of a convolutional autoencoder. First, we train a convolutional autoencoder to produce orthogonal embeddings of primaries and multiples. Second, we train another network to classify the latent-space embedding of target data into the respective wave types and decode the predictions back to the data domain. Moreover, we propose an end-to-end workflow for generating realistic synthetic seismic data sufficient for knowledge transfer from training on synthetics to inference on field data. We evaluate the two-step separation approach in a synthetic setup and highlight the strengths and weaknesses of using masks in the encoder latent space for surface-related multiple elimination.
Surface-related multiple elimination through orthogonal encoding in the latent space of convolutional autoencoder
1. Surface-related multiple elimination through orthogonal encoding in the latent space of convolutional autoencoder
Oleg Ovcharenko¹, Anatoly Baumstein, and Erik Neumann²
ExxonMobil Upstream Research Company
Acknowledgements: Huseyin Denli, Joe Reilly and many other colleagues
¹presently at KAUST
²presently at ExxonMobil Production Deutschland GmbH
2. Outline
Introduction
• Problem
• Value
• Intuition behind the problem
New ML multiple attenuation approach
• Training data generation
• Architectures
Examples
• Jaktopia synthetic data
• NW Australia field data
9. What we aim to do
(Diagram: a neural network, or several, splits the recorded data D into primaries P and multiples M.)
10. Why do we need deep learning (DL)?
Why?
• Advanced methods are costly (human effort, time)
• 2D out-of-plane effects
• No fully accurate solution yet
What?
• Data-driven split of primaries and multiples based on previous experience
Value?
• Approximately correct at lower cost
• Possible workaround for out-of-plane effects
The goal is to explore other ways.
13. Intuition (physics) behind DL
Given … train to …
• Multiple shots → emulate Radon in latent space
• Individual shots → encode waveform statistics
• Individual CMPs after NMO → encode physics in the subsurface
14. What is training data?
• Very synthetic synthetic (SS)
• Data-based synthetic (DS)
• Processed field data from nearby (DN)
• Processed field data from far away (DF)
(Chart: training-data options plotted by realism versus effort: SS, DS, DN, DF.)
15. Data-based synthetic
• Unlimited amount
• Exact wavefield separation
• Not limited by conventional methods
• Proximity to the field data required
(Diagram: field and synthetic data D split into P and M; a closer look follows in the examples.)
16. Data-based synthetic
(Workflow diagram:)
• Field data + RMS velocities → interval velocities in depth
• PSTM or NMO stack → reflectivity → pseudo-density (in depth)
• Simulation with free-surface (FS) boundary condition and field-data geometry → data (D)
• Simulation with MIRROR boundary condition and field-data geometry → primaries (P)
• Matched filter → multiples (M)
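The last step of the diagram — a matched filter turning the two simulations into a multiples label — can be sketched in a few lines. This is a minimal sketch, assuming numpy, and it reduces the matched filter to a single least-squares scalar (practical implementations use a short convolutional filter); the function names are illustrative, not from the original workflow.

```python
import numpy as np

def matched_filter_scalar(d, p):
    """Least-squares scalar a minimizing ||d - a*p||^2
    (a one-coefficient stand-in for a short matched filter)."""
    return np.vdot(p, d).real / np.vdot(p, p).real

def separate_by_simulation(d_fs, p_mirror):
    """Data-based synthetic labels: the free-surface (FS) simulation
    gives the full data D, the MIRROR simulation gives primaries P,
    and the multiples label is the matched difference M = D - a*P."""
    a = matched_filter_scalar(d_fs, p_mirror)
    return a * p_mirror, d_fs - a * p_mirror
```

When the FS data contain no multiples (D is a pure scaled copy of P), the multiples label vanishes, which is the expected sanity check for the subtraction.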
17. Two approaches
1. Separation in data domain: D → (pP, pM). Analogy: split a picture of cats + dogs directly into cats and dogs.
2. Separation in encoded domain: D → zD → (zP, zM) → (pP, pM). Analogy: split cats + dogs in the encoded domain.
18. Separation in data domain. U-Net
Ronneberger, Olaf, Philipp Fischer, and Thomas Brox. "U-net: Convolutional networks for biomedical image
segmentation." International Conference on Medical image computing and computer-assisted intervention. Springer, Cham, 2015.
(Diagram: D → U-Net → (pP, pM); each output panel is trained with an L1 loss against its label, P or M.)
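The data-domain training objective above — two U-Net output panels, each penalized with an L1 loss against its label — can be written compactly. A minimal numpy sketch; the function names are hypothetical:

```python
import numpy as np

def l1(a, b):
    """Mean absolute error between two panels."""
    return np.mean(np.abs(a - b))

def data_domain_loss(pred_p, pred_m, true_p, true_m):
    """Data-domain separation loss: the network outputs two panels
    (pP, pM) and is trained with an L1 penalty on each against the
    labels P and M. The total loss is the sum of the two terms."""
    return l1(pred_p, true_p) + l1(pred_m, true_m)
```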
19. Separation in encoded domain
Two stages:
1. Orthogonal encoding
2. Separation in encoded space
Tzinis, Efthymios, et al. "Two-Step Sound Source Separation: Training on Learned Latent Targets." ICASSP 2020. IEEE, 2020.
21. Stage 1. Orthogonal encoding
(Diagram: D → ENCODER → encoded data → DECODER → D'.)
22. Stage 1. Orthogonal encoding
Latent space masks (P, M, N/A): mP + mM + mNA = 1
(Diagram: D → ENCODER → latent masked into P, M, and N/A regions → DECODER → D'.)
23. Stage 2. Separation in encoded space
A classifier predicts the masks mP and mM from zD; it is trained with a 3-class (P, M, N/A) cross-entropy loss.
Conv-TasNet: Surpassing Ideal Time-Frequency Magnitude Masking for Speech Separation. https://arxiv.org/abs/1809.07454
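The 3-class masking and its loss can be sketched as follows. This assumes numpy, a channel-first latent with the class axis first, and one-hot labels; the function names are illustrative. The softmax construction guarantees mP + mM + mNA = 1 at every latent position, matching the constraint on the previous slide.

```python
import numpy as np

def latent_masks(logits):
    """Three-class softmax over the leading (class) axis produces
    masks (mP, mM, mNA) that sum to one at every latent position."""
    e = np.exp(logits - logits.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def cross_entropy(masks, labels):
    """Per-position cross-entropy against one-hot {P, M, N/A} labels."""
    return -np.mean(np.sum(labels * np.log(masks + 1e-12), axis=0))
```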
24. Two-stage separation (TSS) inference bird view
24
D zD Classifier
E
D
mP’ = zP’
mM’ zD =
pP pM
zD
zM’
Stage 1
Stage 2
Complete workflow:
1. Select field data
2. Make DS for training
3. Train orthogonal encoder
4. Train latent space classifier
5. Run inference on field data
6. *Asub
7. *Stack
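Steps 3–5 above come together at inference time as latent masking followed by decoding. A minimal sketch assuming numpy; `decoder` stands in for the trained stage-1 decoder, and the mask channel ordering (P, M, N/A) is an assumption:

```python
import numpy as np

def tss_inference(z_d, masks, decoder):
    """Two-step separation inference: mask the latent zD with the
    classifier's masks (zP' = mP' * zD, zM' = mM' * zD), then run
    the stage-1 decoder on each masked latent to get pP and pM."""
    m_p, m_m = masks[0], masks[1]  # the mNA channel is discarded
    return decoder(m_p * z_d), decoder(m_m * z_d)
```

With an identity decoder and a mask that assigns everything to P, the primary estimate reproduces the latent and the multiple estimate is zero, which is the degenerate sanity check.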
25. Synthetic model
• The model was generated by combining stratigraphic information with a rock physics model (𝑉!"#$% and porosity) and includes several different classes of AVO.
• A gas cloud was introduced, creating imaging challenges underneath.
Model parameters: 𝑛!"# = 1000, 𝑑!"# = 25.0 m; 𝑛"$# = 480, 𝑑"$# = 12.5 m; 𝑑% = 6.25 m; source = band-limited spike.
(Figure: observed data.)
51. Summary
We developed a new data-driven, ML-aided multiple attenuation method that:
• Produces estimates of primaries and multiples
• Does not rely on conventional demultiple methods
• Is not limited by incomplete acquisition
• Delivers fast turnaround
> This is a proof-of-concept study