Optic Flow Estimation by Deep Learning
YU HUANG
SUNNYVALE, CALIFORNIA
YU.HUANG07@GMAIL.COM
Outline
• Optic Flow
• Brightness Constancy Constraints
• Aperture Problem
• Regularization and Smoothness Constraints
• Lucas-Kanade algorithm
• Focus of Expansion (FOE)
• Discrete Optimization for Optical Flow
• Large Displacement Optical Flow: Descriptor Matching
• EpicFlow: Edge-Preserving Interpolation of
Correspondences for Optical Flow
• Optical Flow with Piecewise Parametric Model
• SPM-BP: Sped-up PatchMatch Belief Propagation
• Coarse-to-Fine PatchMatch for Large Optical Flow
• Flow Fields: Correspondence Fields for Optical Flow
• Full Flow: Optical Flow Estimation by Global Optimization
over Regular Grids
• DeepFlow: Large displ. optical flow with deep matching
• FlowNet: Learning Optical Flow with ConvNets
• Deep Discrete Flow
• Optical Flow Estimation using a Spatial Pyramid Network
• A Large Dataset to Train ConvNets for Disparity, Optical
Flow, and Scene Flow Estimation
• Optical Flow via Direct Cost Volume Processing by CNN
• Appendix A: A Database and Evaluation for Optical Flow
• Appendix B: Secret of Optic Flow Estimation
• Appendix C: Deep Learning and optimization theory
Motion and perceptual organization
• Sometimes, motion is the only cue
Optic Flow
•Definition: optical flow is the
apparent motion of brightness
patterns in the image
•Ideally, optical flow would be the
same as the motion field
•Have to be careful: apparent
motion can be caused by lighting
changes without any actual motion
• Think of a uniform rotating sphere
under fixed lighting vs. a stationary
sphere under moving illumination
…
Estimating optical flow
• Given two subsequent frames, estimate the apparent motion field between them.
• Key assumptions
• Brightness constancy: projection of the same point looks the same in every frame
• Small motion: points do not move very far
• Spatial coherence: points move like their neighbors
I(x,y,t–1) I(x,y,t)
Brightness Constancy Constraint
Brightness Constancy Equation:
    I(x, y, t−1) = I(x + u(x,y), y + v(x,y), t)
Linearizing the right-hand side with a first-order Taylor expansion:
    I(x + u, y + v, t) ≈ I(x, y, t−1) + I_x·u(x,y) + I_y·v(x,y) + I_t
So the constraint can be written as:
    I_x·u + I_y·v + I_t ≈ 0
where u = dx/dt, v = dy/dt, I_x = ∂I/∂x, I_y = ∂I/∂y, I_t = ∂I/∂t.
This gives 1 equation in 2 unknowns (u, v).
The Aperture Problem
Viewed through a small aperture, only the flow component normal to the local edge can be recovered; the component along the edge is unobservable.
The Aperture Problem
Actual motion
The Aperture Problem
Perceived motion
Regularization & Smoothness Constraints
Additional smoothness constraint:
    e_s = ∫∫ ( (u_x² + u_y²) + (v_x² + v_y²) ) dx dy
besides the constraint-equation (data) term:
    e_c = ∫∫ ( I_x·u + I_y·v + I_t )² dx dy
Minimize e_s + e_c.
Temporal aliasing causes ambiguities in optical flow because
images can have many pixels with the same intensity.
i.e., how do we know which ‘correspondence’ is correct?
nearest match is correct (no aliasing)
nearest match is incorrect (aliasing)
actual shift
estimated shift
To overcome aliasing: coarse-to-fine strategy.
Horn & Schunck algorithm
Lucas-Kanade algorithm
Prob: we have more equations than unknowns
• The summations are over all pixels in the K x K window
• This technique was first proposed by Lucas & Kanade (1981)
Solution: solve least squares problem
• Minimum least squares solution given by solution (in d) of:
Lucas-Kanade algorithm
◦ Optimal (u, v) satisfies Lucas-Kanade equation
When is This Solvable?
• ATA should be invertible
• ATA should not be too small due to noise
– eigenvalues 1 and 2 of ATA should not be too small
• ATA should be well-conditioned
– 1/ 2 should not be too large (1 = larger eigenvalue)
ATA is solvable when there is no aperture problem
What are the potential causes of errors in this procedure?
◦ Suppose ATA is easily invertible
◦ Suppose there is not much noise in the image
When are the assumptions violated?
• Brightness constancy is not satisfied
• The motion is not small
• A point does not move like its neighbors
– window size is too large
– what is the ideal window size?
Lucas-Kanade algorithm
 Iterative Refinement in Lucas-Kanade
 Estimate velocity at each pixel by solving Lucas-Kanade equations
 Warp H towards I using the estimated flow field
 use image warping techniques
 Repeat until convergence
 Some Implementation Issues:
 Warping is not easy (ensure that errors in warping are smaller than the estimate
refinement)
 Warp one image, take derivatives of the other so you don’t need to re-compute
the gradient after each iteration.
 Often useful to low-pass filter the images before motion estimation (for better
derivative estimation, and linear approximations to image intensity)
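A minimal single-level Lucas-Kanade sketch of the above (NumPy/SciPy; the window size and eigenvalue threshold are illustrative choices, not values from the slides):
```python
import numpy as np
from scipy.ndimage import uniform_filter

def lucas_kanade(I0, I1, window=15, min_eig=1e-3):
    """Per-pixel flow from the 2x2 Lucas-Kanade system over a window x window patch."""
    I0 = I0.astype(np.float32)
    I1 = I1.astype(np.float32)
    Iy, Ix = np.gradient(I0)          # spatial derivatives
    It = I1 - I0                      # temporal derivative

    # Window-aggregated entries of A^T A and A^T b (uniform_filter averages,
    # which scales both sides equally and leaves the solution unchanged).
    Sxx = uniform_filter(Ix * Ix, window)
    Sxy = uniform_filter(Ix * Iy, window)
    Syy = uniform_filter(Iy * Iy, window)
    Sxt = uniform_filter(Ix * It, window)
    Syt = uniform_filter(Iy * It, window)

    det = Sxx * Syy - Sxy * Sxy
    trace = Sxx + Syy
    # Smaller eigenvalue of A^T A: reject ill-conditioned (aperture-problem) regions.
    lam_min = 0.5 * (trace - np.sqrt(np.maximum(trace ** 2 - 4 * det, 0)))
    valid = lam_min > min_eig
    safe_det = np.where(valid, det, 1.0)

    u = np.where(valid, (-Syy * Sxt + Sxy * Syt) / safe_det, 0.0)
    v = np.where(valid, ( Sxy * Sxt - Sxx * Syt) / safe_det, 0.0)
    return u, v
```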
Focus of Expansion (FOE)
• Motion of object = - (Motion of Sensor)
• For a given translatory motion and gaze direction, the world seems to
flow out of one point (FOE).
Scene point (x, y, z), starting at (x0, y0, z0); focal length f = 1; image point (x', y').
After time t, the scene point moves to:
    (x, y, z) = (x0 + u·t, y0 + v·t, z0 + w·t)
and projects to the image point:
    (x', y') = ( (x0 + u·t)/(z0 + w·t), (y0 + v·t)/(z0 + w·t) )
• As t varies, the image point moves along a straight line in the image
• Focus of Expansion: backtrack time (t → −∞), giving
    (x', y') = ( u/w, v/w )
Challenges
Occlusion
Large displacement
Varying illumination
Insufficient texture
Discrete Optimization for Optical Flow
Large-displacement optical flow from a discrete point of view: sub-pixel from pixel-accurate flow;
Formulate optical flow estimation as discrete inference in a CRF, followed by sub-pixel refinement.
3 different strategies, to reduce computation and memory demands by several orders of magnitude.
Combination of three strategies allow to estimate large-displacement optical flow.
Diverse Flow Proposals:
Efficient search structure, 300 nearest neighbors, 200 proposals from neighboring pixels
Block Coordinate Descent
Alternating optimization of image rows and columns, sub-problems solved optimally via DP
Truncated Pairwise Potential
Efficient Dynamic Programming
Discrete Optimization for Optical Flow
Strategies for Efficient Discrete Optical Flow. Left: a large set of diverse flow proposals per pixel by combining
NN in feature space from a set of grid cells with winner-takes-all solutions from neighboring pixels. Middle:
apply block coordinate descent, iteratively optimizing all image rows and columns conditioned on neighboring
blocks via dynamic programming. Right: Taking advantage of robust penalties, reduce pairwise computation
costs by pre-computing the set of non-truncated neighboring flow proposals for each flow vector.
Discrete Optimization for Optical Flow
Robust data term based on DAISY descriptors d
Similar flow vectors f are encouraged by
weighted by the edge strength
Naive Dynamic Programming
Efficient Dynamic Programming
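To make the row/column sub-problems concrete, here is a toy sketch (not the paper's code) of exact MAP inference on one chain by dynamic programming with a truncated linear pairwise penalty; the label set, λ and the truncation value τ are made-up illustrations:
```python
import numpy as np

def chain_dp(unary, labels, lam=1.0, tau=5.0):
    """MAP inference on a chain MRF by dynamic programming.
    unary: (N, L) data costs; labels: (L, 2) flow proposals per label.
    Pairwise cost: lam * min(||f_p - f_q||_1, tau)  (truncated penalty)."""
    N, L = unary.shape
    # Pairwise cost matrix between all label pairs (L x L).
    diff = np.abs(labels[:, None, :] - labels[None, :, :]).sum(-1)
    pairwise = lam * np.minimum(diff, tau)

    cost = unary[0].copy()
    back = np.zeros((N, L), dtype=int)
    for i in range(1, N):
        # total[j, k] = best cost ending at node i-1 with label j, then switching to k.
        total = cost[:, None] + pairwise
        back[i] = total.argmin(axis=0)
        cost = total.min(axis=0) + unary[i]

    # Backtrack the optimal labeling.
    path = np.empty(N, dtype=int)
    path[-1] = cost.argmin()
    for i in range(N - 1, 0, -1):
        path[i - 1] = back[i, path[i]]
    return path
```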
Large Displacement Optical Flow: Descriptor
Matching in Variational Motion Estimation
Integrating rich descriptors into variational optical flow setting vs coarse-to-fine warping schemes;
Estimate a dense optical flow field with the same high accuracy as from variational optical flow;
VARIATIONAL MODEL
SPM-BP: Sped-up PatchMatch Belief
Propagation for Continuous MRFs
Integrating key ideas from PatchMatch of
effective particle propagation and
resampling, PatchMatch belief propagation
(PMBP) has been demonstrated to have
good performance in addressing continuous
labeling problems and runs orders of
magnitude faster than Particle BP (PBP).
Sped-up PMBP (SPM-BP): unifying efficient
filter-based cost aggregation and message
passing with PatchMatch-based particle
generation in a highly effective way.
Two-layer graph structure used in SPM-BP: (b)(c) A superpixel-level
graph generates new particle proposals to be tested on the pixel-
level graph. (d) For the reference superpixel, the EAF is applied to
obtain the data cost. (e) The message passing algo. proceeds in the
inner loop, while outgoing messages on the boundary are fixed.
Efficient Coarse-to-Fine PatchMatch for
Large Displacement Optical Flow
CPM (Coarse-to-fine PatchMatch), blends an efficient random search strategy with the coarse-to-fine
scheme for optical flow problem, wrt the nearest neighbor field (NNF).
Propagation with constrained random search radius btw adjacent levels on the hierarchical architecture.
Construct the pyramids. On each level, the initial matching correspondences are propagated with random search
for a fixed number of iterations, and the result of each level is used as the initialization of the next lower level.
Efficient Coarse-to-Fine PatchMatch for
Large Displacement Optical Flow
A forward-backward consistency check is performed
to detect occlusions and remove outliers on
multiple levels of the pyramid.
Only the matching correspondences on the two
finest levels are validated.
The backward flow is linearly interpolated from the
matching correspondences, the error threshold of
the consistency check is set equal to the grid spacing,
and the coarser matches are upscaled to the finest
resolution before the consistency check.
Matches with displacement larger than 400 pixels are also removed.
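A minimal sketch of such a forward-backward consistency check on dense flow fields (OpenCV + NumPy; the threshold value here is arbitrary rather than tied to the grid spacing described above):
```python
import numpy as np
import cv2

def fb_consistency_mask(flow_fw, flow_bw, thresh=3.0):
    """Keep a forward match only if the backward flow at the matched
    location maps (approximately) back to the starting pixel.
    flow_fw, flow_bw: (H, W, 2) arrays of (dx, dy); thresh in pixels."""
    h, w = flow_fw.shape[:2]
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    # Target position of each pixel under the forward flow.
    map_x = (gx + flow_fw[..., 0]).astype(np.float32)
    map_y = (gy + flow_fw[..., 1]).astype(np.float32)
    # Backward flow sampled at the target position (bilinear).
    bw_at_target = cv2.remap(flow_bw.astype(np.float32), map_x, map_y, cv2.INTER_LINEAR)
    # The round trip should land near the origin; otherwise mark as occluded/outlier.
    err = np.linalg.norm(flow_fw + bw_at_target, axis=-1)
    return err < thresh
```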
EpicFlow: Edge-Preserving Interpolation of
Correspondences for Optical Flow
Optical flow estimation at large displacements with significant occlusions.
Edge-Preserving Interpolation of Correspondences (EpicFlow) is fast and robust.
2 steps: i) dense matching by edge-preserving interpolation from a sparse set of
matches; ii) variational energy minimization initialized with the dense matches.
 The sparse-to-dense interpolation relies on an appropriate choice of the distance,
namely an edge-aware geodesic distance.
Handle occlusions and motion boundaries – two issues for optical flow computation.
Approximation for geodesic distance to allow fast computation w/o performance loss.
Variational energy minimization on dense matches to obtain the final flow estimation.
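To illustrate the edge-aware geodesic distance at the core of the interpolation, here is a toy sketch using Dijkstra on a 4-connected grid whose step costs grow with the contour strength (EpicFlow's actual cost definition and its fast approximation differ; this only conveys the idea):
```python
import heapq
import numpy as np

def geodesic_distance(edge_map, seed, alpha=10.0):
    """Approximate geodesic distance from 'seed' (row, col) to all pixels.
    Crossing a strong edge (high edge_map value) is expensive, so the
    distance respects motion/contour boundaries."""
    h, w = edge_map.shape
    dist = np.full((h, w), np.inf)
    dist[seed] = 0.0
    heap = [(0.0, seed)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r, c]:
            continue  # stale heap entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w:
                step = 1.0 + alpha * edge_map[nr, nc]  # edge-aware step cost
                nd = d + step
                if nd < dist[nr, nc]:
                    dist[nr, nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist
```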
EpicFlow: Edge-Preserving Interpolation of
Correspondences for Optical Flow
Overview of EpicFlow. Given two images, compute matches using DeepFlow and the edges of the
first image using SED (Structured Edge Detector). Combine them to interpolate matches and obtain a
dense correspondence field, as initialization of a one-level energy minimization framework.
EpicFlow: Edge-Preserving Interpolation of
Correspondences for Optical Flow
(a-b) two consecutive frames; (c) contour response C from SED; (d) match positions from DeepMatching; (e-f)
geodesic distance from a pixel to all others. (g-h) 100 nearest matches using geodesic distance from the pixel.
Dense, Accurate Optical Flow Estimation with
Piecewise Parametric Model
Fit a flow field piecewise to a variety of parametric models, where the domain of each
piece (i.e., each piece’s shape, position and size) is determined adaptively, while at the
same time maintaining a global inter-piece flow continuity constraint.
A multi-model fitting scheme via energy minimization, taking into account both the
piecewise constant model assumption and the flow field continuity constraint, enabling
to effectively handle both homogeneous motions and complex motions.
Potts model term MDL term
Data term
Flow continuity (inter-piece compatibility) term
Dense, Accurate Optical Flow Estimation with
Piecewise Parametric Model
Flow Fields: Dense Correspondence Fields for
Accurate Large Displacement Optical Flow Estimation
 A dense correspondence field approach much better suited for optical flow estimation than
approximate nearest neighbor fields.
Do not require explicit regularization, smoothing or a new data term, but a data-based search strategy
that finds most inliers, together with enhancements for outlier filtering.
The pipeline of the Flow Field approach. For the basic approach, only consider the full resolution.
Flow Fields: Dense Correspondence Fields for
Accurate Large Displacement Optical Flow Estimation
Illustration of the hierarchical Flow Field approach. Flow offsets
saved in pixels are propagated in all arrow directions.
Full Flow: Optical Flow Estimation By Global
Optimization over Regular Grids
A global optimization approach to optical flow
estimation which optimizes a classical optical flow
objective over the full space of mappings between
discrete grids.
The regular structure of the space of mappings
enables optimizations that reduce the
computational complexity of the algorithm’s inner
loop and support efficient matching.
The approach treats the objective (data term and
regularization term) as a Markov random field and
uses discrete optimization techniques.
Optical flow over regular grids. Each pixel p in
I1 is spatially connected to its four neighbors
in I1 and temporally connected to (2ς + 1)² pixels
in I2. The flow field is a mapping Ω → [−ς, ς]².
Full Flow: Optical Flow Estimation By Global
Optimization over Regular Grids
This objective is a discrete Markov random field with
a two-dimensional label space;
To optimize the model, use TRW-S, which optimizes
the dual of a natural linear programming relaxation of
the problem;
To reduce wall-clock time, implement a parallelized
TRW-S solver;
Occlusion handling by FW-BW consistency checking;
Use EpicFlow interpolation scheme as postprocessing.
DeepFlow: Large displacement optical flow
with deep matching
 DeepFlow, blends a matching algorithm with a variational approach for optical flow.
A descriptor matching algorithm, tailored to the optical flow problem, that boosts
performance on fast motions.
 The matching algorithm builds upon a multi-stage architecture with 6 layers, interleaving
convolutions and max-pooling, a construction akin to deep convolutional nets.
Using dense sampling, it efficiently retrieves quasi-dense correspondences and enjoys a
built-in smoothing effect on descriptor matches, a valuable asset for integration into an energy
minimization framework for optical flow estimation.
 DeepFlow efficiently handles large displacements occurring in realistic videos, and shows
competitive performance on optical flow benchmarks.
DeepFlow: Large displacement optical flow
with deep matching
FlowNet: Learning Optical Flow with
Convolutional Networks
A generic architecture and another one including a layer that correlates feature vectors at
different image locations: FlowNetSimple and FlowNetCorr, being trained end-to-end.
A simple choice is to stack both input images together and feed them through a rather generic
network, letting the network decide itself how to process the image pair to extract the motion
information; this network, consisting only of convolutional layers, is called ‘FlowNetSimple’.
A straightforward alternative is to create two separate, yet identical processing streams for the two
images and to combine them at a later stage. A ‘correlation layer’ performs multiplicative patch
comparisons between the two feature maps; this network is called ‘FlowNetCorr’.
Given two multi-channel feature maps f1, f2: R² → R^c, with w, h, and c being their width, height and
number of channels, the correlation layer lets the network compare each patch from f1 with each
patch from f2.
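A compact PyTorch sketch of a correlation layer in this spirit — comparing each position of f1 with a (2d+1)×(2d+1) neighborhood of f2 via dot products (the actual FlowNetCorr layer additionally uses patch windows and striding):
```python
import torch
import torch.nn.functional as F

def correlation(f1, f2, max_disp=4):
    """f1, f2: (B, C, H, W) feature maps.
    Output: (B, (2d+1)^2, H, W) of channel-averaged dot products, one channel
    per displacement within the search neighborhood."""
    b, c, h, w = f1.shape
    f2 = F.pad(f2, [max_disp] * 4)          # pad left/right/top/bottom
    out = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            f2_shift = f2[:, :, dy:dy + h, dx:dx + w]
            out.append((f1 * f2_shift).sum(dim=1, keepdim=True) / c)
    return torch.cat(out, dim=1)
```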
Refinement: The main ingredients are ‘upconvolutional’ layers, consisting of unpooling (extending
the feature maps, as opposed to pooling) and a convolution.
FlowNet: Learning Optical Flow with
Convolutional Networks
FlowNet: Learning Optical Flow with
Convolutional Networks
Refinement of the coarse feature maps to the high resolution prediction
FlowNet 2.0
End-to-end learning of optical flow: a stacked architecture with warping of the 2nd image with
intermediate optical flow; small displacements by a sub-network specializing on small motions.
Evaluation of options when stacking two FlowNetS networks (Net1 and Net2)
FlowNet 2.0
Deep Discrete Flow
Investigate two types of networks: a local network with a small receptive field consisting of 3x3
convolutions followed by non-linearities, and a subsequent context network that aggregates information
over larger image regions using dilated convolutions;
Learning context-aware features for solving optical flow using discrete optimization;
Training a context network with a large receptive field size on top of a local network using dilated
convolutions on patches.
Feature matching by comparing each pixel in the ref image to every pixel in the target image;
The matching cost volume from the network's output forms the data term for discrete MAP inference
in a pairwise MRF.
Local Network: leverages 3x3 convolution kernels. The hyper-parameters of the network are the
number of layers and the number of feature maps in each layer as specified in evaluation.
Context Network: increases the size of the receptive field with only modest increase in complexity by
exploiting dilated convolutions, i.e. reading the input feature maps at locations with a spatial stride
larger than one, taking more contextual information into account.
Deep Discrete Flow
The input images are processed in forward order and backward order using local and context Siamese CNN, yielding per-
pixel descriptors. Then match points on a regular grid in the ref image to every pixel in the other image, yielding a large
tensor of forward matching costs (F1/F2) and backward matching costs (B1/B2). Matching costs are smoothed using
discrete MAP inference in a pairwise MRF. Finally, a forward-backward consistency check removes outliers and sub-pixel
accuracy is attained using the EpicFlow interpolator . Train the model in a piece-wise fashion via the loss functions.
Deep Discrete Flow
(a) Naive
(b) Fast
Dilated Convolution Implementations
The center of the patch is marked with a red * and each color corresponds to a convolution center for a specific
dilation factor, red for 4 dilations (shown in green), green for 2 dilations (shown in blue) and yellow for both.
Deep Discrete Flow
Fast Patch-based Training of Dilated Convolutional Networks. Left: A naive
implementation requires dilated convolution operations which are computationally
less efficient than highly optimized cudnn convolutions without dilations. Right:
The behavior of dilated convolutions can be replicated with regular convolutions by
first sub-sampling the feature map and then applying 1-dilated convolutions with
stride. Here ‘dilations’ denotes an array that specifies the dilation factor of the
dilated convolution in each convolutional layer.
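The equivalence exploited by the fast implementation can be checked numerically: with no padding, a 2-dilated 3×3 convolution equals plain 3×3 convolutions applied to the four phase-shifted sub-sampled maps (a small PyTorch sketch, not the authors' code):
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 32, 32)       # input feature map
w = torch.randn(16, 8, 3, 3)        # 3x3 filters

# Reference: 2-dilated convolution (no padding).
ref = F.conv2d(x, w, dilation=2)

# Fast equivalent: sub-sample into 4 phases, run plain 3x3 convolutions,
# and scatter the results back to the matching output phase.
out = torch.empty_like(ref)
for p in range(2):
    for q in range(2):
        sub = x[:, :, p::2, q::2]
        out[:, :, p::2, q::2] = F.conv2d(sub, w)

assert torch.allclose(ref, out, atol=1e-5)
```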
Optical Flow Estimation using a Spatial
Pyramid Network
Compute optical flow by combining a classical spatial-pyramid formulation with deep learning.
This estimates large motions in a coarse-to-fine approach by warping one image of a pair at each pyramid level by
the current flow estimate and computing an update to the flow.
Train one deep network per level to compute the flow update.
Do not need to deal with large motions; instead, these are handled by the pyramid.
 Spatial Pyramid Network (SPyNet) is much simpler and 96% smaller than FlowNet in terms of model parameters.
 Since the flow at each pyramid level is small (< 1 pixel), a convolutional approach applied to pairs of warped
images is appropriate.
The learned convolution filters appear similar to classical spatio-temporal filters, giving insight into the method
and how to improve it.
 Trained using Adam optimization with β1 = 0.9 and β2 = 0.999. A batch size of 32 across all networks with 4000
iterations per epoch. A learning rate of 1e-4 for the first 60 epochs, decreased to 1e-5 until convergence.
Optical Flow Estimation using a Spatial
Pyramid Network
Training network Gk requires the trained models {G0,…,Gk-1} to
obtain the initial flow u(Vk-1). Ground-truth residual flows ῠk are
obtained by subtracting u(Vk-1) from the downsampled ground-truth
flow Ṽk; the network Gk is then trained using the End Point Error
(EPE) loss.
Each level in the pyramid has a simplified task relative to the full optical flow
estimation problem; it only has to estimate a small-motion update to an existing
flow field. Consequently each network can be simple.
Optical Flow Estimation using a Spatial
Pyramid Network
Inference in a 3-Level Pyramid Network: The network G0 computes the residual flow v0 at the top level of the
pyramid (smallest image) using the low-resolution images {I1^0, I2^0}. At each pyramid level, the network Gk
computes a residual flow vk which propagates to each of the next lower levels of the pyramid in turn, to finally
obtain the flow V2 at the highest resolution.
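A sketch of this coarse-to-fine inference loop (OpenCV + NumPy; `predict_residual` is a hypothetical stand-in for the trained per-level network Gk):
```python
import cv2
import numpy as np

def warp(img, flow):
    """Backward-warp img by the flow field (bilinear)."""
    h, w = flow.shape[:2]
    gx, gy = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (gx + flow[..., 0]).astype(np.float32)
    map_y = (gy + flow[..., 1]).astype(np.float32)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR)

def pyramid_flow(I1, I2, predict_residual, levels=3):
    """Coarse-to-fine inference: at each level, upsample and double the current
    flow, warp the second image by it, and let the per-level network predict
    only the small residual motion."""
    pyr1, pyr2 = [I1], [I2]
    for _ in range(levels - 1):
        pyr1.insert(0, cv2.pyrDown(pyr1[0]))
        pyr2.insert(0, cv2.pyrDown(pyr2[0]))

    h0, w0 = pyr1[0].shape[:2]
    flow = np.zeros((h0, w0, 2), np.float32)
    for i1, i2 in zip(pyr1, pyr2):
        h, w = i1.shape[:2]
        if (h, w) != flow.shape[:2]:
            # Upsample the flow and double its magnitude for the finer level.
            flow = 2.0 * cv2.resize(flow, (w, h), interpolation=cv2.INTER_LINEAR)
        i2_warped = warp(i2, flow)
        flow = flow + predict_residual(i1, i2_warped, flow)  # residual update
    return flow
```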
A Large Dataset to Train ConvNets for Disparity,
Optical Flow, and Scene Flow Estimation
What is Scene Flow?
Scene flow describes the 3D motion of scene points, just like optical flow describes the 2D motion.
Disparity estimation: First only train early low res. losses, second enable higher res. and phase out low res.
losses, then repurpose the deeper layers when no longer constrained by directly attached losses;
Scene flow estimation: 1. Interleaving 3 pretrained networks (1x FlowNet and 2x DispNets); 2. Joint retraining
on optical flow, 2x disparity, and disparity change.
A Large Dataset to Train ConvNets for Disparity,
Optical Flow, and Scene Flow Estimation
Interleaving the weights of a FlowNet (green) and two DispNets (red and blue) to a SceneFlowNet. For
every layer, the filter masks are created by taking the weights of one network (left) and setting the
weights of the other networks to zero, respectively (middle). The outputs from each network are then
concatenated to yield one big network with three times the number of inputs and outputs (right).
A Large Dataset to Train ConvNets for Disparity,
Optical Flow, and Scene Flow Estimation
Accurate Optical Flow via Direct Cost Volume
Processing by CNN
Optical flow estimation operating on the full 4-d cost volume.
Share the structural benefits of leading stereo matching pipelines to yield high accuracy.
The full 4-d cost volume can be constructed in a fraction of a second due to its regularity.
Adapt semi-global matching to the 4-d setting, yielding a pipeline that achieves higher accuracy.
Learn a nonlinear feature embedding using a convolutional network.
Embed image patches into a compact and discriminative feature space that is robust to geometric and
radiometric distortions encountered in optical flow estimation.
Feature space embeddings as well as distances in this space can be computed extremely efficiently.
A small fully-convolutional network that embeds raw image patches into a compact Euclidean space.
SGM is a common stand-in for more costly MRF optimization in stereo processing pipelines; it is robust and parallelizable.
EpicFlow uses locally-weighted affine models to synthesize a dense flow field from semidense matches.
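A NumPy sketch of building such a regular 4-d cost volume from per-pixel feature embeddings, with matching cost defined as one minus the inner product of L2-normalized features; the search radius r and this particular cost definition are illustrative assumptions:
```python
import numpy as np

def cost_volume(feat1, feat2, r=4):
    """feat1, feat2: (H, W, C) L2-normalized feature embeddings.
    Returns cost of shape (H, W, 2r+1, 2r+1): one 2-d slice of matching
    costs per reference pixel, i.e. the full 4-d cost volume."""
    h, w, c = feat1.shape
    pad = np.pad(feat2, ((r, r), (r, r), (0, 0)), mode='edge')
    cost = np.empty((h, w, 2 * r + 1, 2 * r + 1), np.float32)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            shifted = pad[dy:dy + h, dx:dx + w]
            # Higher feature similarity -> lower matching cost.
            cost[:, :, dy, dx] = 1.0 - (feat1 * shifted).sum(-1)
    return cost
```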
Accurate Optical Flow via Direct Cost Volume
Processing by CNN
Qualitative results on three images from the KITTI 2015 training set.
Appendix A:
A Database and Evaluation Methodology
for Optical Flow
Limitations of Yosemite
Only sequence used for quantitative evaluation
Limitations:
•Very simple and synthetic
•Small, rigid motion
•Minimal motion discontinuities/occlusions
[Figure: Yosemite images 7 and 8, ground-truth flow, flow color coding]
Limitations of Yosemite
Only sequence used for quantitative evaluation
Current challenges:
•Non-rigid motion
•Real sensor noise
•Complex natural scenes
•Motion discontinuities
Need more challenging and more realistic benchmarks
[Figure: Yosemite images 7 and 8, ground-truth flow, flow color coding]
Realistic synthetic imagery
•Randomly generate scenes with “trees” and “rocks”
•Significant occlusions, motion, texture, and blur
•Rendered using Mental Ray and “lens shader” plugin
Rock, Grove
Modified stereo imagery
•Recrop and resample ground-truth stereo datasets to have appropriate motion for OF
Venus, Moebius
Dense flow with hidden texture
•Paint scene with textured fluorescent paint
•Take 2 images: One in visible light, one in UV light
•Move scene in very small steps using robot
•Generate ground-truth by tracking the UV images
[Figure: setup — visible and UV lighting, captured image, cropped region]
Conclusions
•Difficulty: Data substantially more challenging than Yosemite
•Diversity: Substantial variation in difficulty across the various datasets
•Motion GT vs Interpolation: Best algorithms for one are not the best for the other
•Comparison with Stereo: Performance of existing flow algorithms appears weak
Szeliski
Appendix B:
Secrets of Optical Flow Estimation and Their Principles
Classical Optical Flow Objective Function
u and v are the horizontal and vertical components of the optical flow field
to be estimated from images I1 and I2, λ is a regularization parameter, and
ρD and ρS are the data and spatial penalty functions.
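Written out from these definitions, the classical objective takes the following form (reconstructed here: a robustly penalized data term plus a pairwise spatial term on the flow components):
```latex
E(\mathbf{u},\mathbf{v}) = \sum_{i,j} \Big\{ \rho_D\big(I_1(i,j) - I_2(i+u_{i,j},\, j+v_{i,j})\big)
  + \lambda \big[ \rho_S(u_{i,j}-u_{i+1,j}) + \rho_S(u_{i,j}-u_{i,j+1})
  + \rho_S(v_{i,j}-v_{i+1,j}) + \rho_S(v_{i,j}-v_{i,j+1}) \big] \Big\}
```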
The penalty functions: (1) the quadratic HS penalty, (2) the Charbonnier
penalty, and (3) the Lorentzian.
Pre-processing
Optimize the regularization parameter λ for the training sequences;
Apply non-linear prefiltering of the images to reduce the influence of
illumination changes;
Use a standard brightness constancy model;
Gradient only imposes constancy of the gradient vector at each pixel (i.e. it
robustly penalizes Euclidean distance between image gradients);
Simple derivative constancy is as good as the more sophisticated texture
decomposition method.
Coarse-to-fine estimation and GNC
(Graduated Non-Convexity)
The GNC (graduated non-convexity) scheme: linearly combine a quadratic
objective with a robust objective in varying proportions, from fully quadratic
to fully robust;
The downsampling factor does not matter when using a convex penalty; a
standard factor of 0.5 is fine;
The GNC method is helpful even for the convex Charbonnier penalty
function due to the nonlinearity of the data term.
Interpolation method and derivatives
Bicubic interpolation is more accurate than bilinear;
Removing temporal averaging of the gradients, using Central difference
filters, or using a 7-point derivative filter all reduce accuracy compared to
the baseline, but not significantly.
The spline-based interpolation scheme is consistently better;
Temporal averaging of the derivatives is probably worthwhile for a small
computational expense.
Penalty functions
The convex Charbonnier penalty performs better than the more robust, non-
convex Lorentzian on both the training and test sets.
One reason is that non-convex functions are more difficult to optimize, causing
the optimization scheme to find a poor local optimum.
The less-robust Charbonnier is preferable to the Lorentzian and a slightly non-
convex penalty function is better still.
Median filtering
The baseline 5 × 5 median filter is better than both MF 3×3 and MF 7×7 but the
difference is not significant.
When we perform 5× 5 median filtering twice (2× MF) or five times (5× MF) per
warping step, the results are worse.
Finally, removing the median filtering step (w/o MF) makes the computed flow
significantly less accurate with larger outliers.
Appendix C:
Machine (Deep) Learning and Optimization
Graphical Models
• Graphical Models: Powerful framework for representing dependency
structure between random variables.
• The joint probability distribution over a set of random variables.
• The graph contains a set of nodes (vertices) that represent random variables, and a set
of links (edges) that represent dependencies between those random variables.
• The joint distribution over all random variables decomposes into a product of
factors, where each factor depends on a subset of the variables.
• Two type of graphical models:
• Directed (Bayesian networks)
• Undirected (Markov random fields, Boltzmann machines)
• Hybrid graphical models that combine directed and undirected models, such as Deep
Belief Networks, Hierarchical-Deep Models.
Generative Model: MRF
Random Field: F={F1,F2,…FM} a family of random variables on set S in which each Fi takes
value fi in a label set L.
Markov Random Field: F is said to be a MRF on S w.r.t. a neighborhood N if and only if it
satisfies Markov property.
◦ Generative model for joint probability p(x)
◦ allows no direct probabilistic interpretation
◦ define potential functions Ψ on maximal cliques A
◦ map joint assignment to non-negative real number
◦ requires normalization
MRFs are undirected graphical models
A flow network G(V, E) defined as a fully connected directed graph
where each edge (u,v) in E has a positive capacity c(u,v) >= 0;
The max-flow problem is to find the flow of maximum value on a
flow network G;
A s-t cut or simply cut of a flow network G is a partition of V into S
and T = V-S, such that s in S and t in T;
A minimum cut of a flow network is a cut whose capacity is the
least over all the s-t cuts of the network;
Methods for max-flow / min-cut:
◦ Ford Fulkerson method;
◦ "Push-Relabel" method.
Mostly labeling is solved as an energy minimization problem;
Two common energy models:
◦ Potts Interaction Energy Model;
◦ Linear Interaction Energy Model.
Graph G contain two kinds of vertices: p-vertices and i-vertices;
◦ all the edges in the neighborhood N, called n-links;
◦ edges between the p-vertices and the i-vertices called t-links.
In the multiple labeling case, the multi-way cut should leave each p-vertex connected to one i-vertex;
The minimum cost multi-way cut will minimize the energy function where the severed n-links would
correspond to the boundaries of the labeled vertices;
The approximation algorithms to find this multi-way cut:
◦ "alpha-expansion" algorithm;
◦ "alpha-beta swap" algorithm.
Deep Learning
Representation learning attempts to automatically learn good features or representations;
Deep learning algorithms attempt to learn multiple levels of representation of increasing
complexity/abstraction (intermediate and high level features);
Become effective via unsupervised pre-training + supervised fine tuning;
◦ Deep networks trained with back propagation (without unsupervised pre-training) perform worse than
shallow networks.
Deal with the curse of dimensionality (smoothing & sparsity) and over-fitting (unsupervised, regularizer);
Semi-supervised: structure of manifold assumption;
◦ labeled data is scarce and unlabeled data is abundant.
Why Deep Learning?
Supervised training of deep models (e.g. many-layered Nets) is too hard (optimization
problem);
◦ Learn prior from unlabeled data;
Shallow models are not for learning high-level abstractions;
◦ Ensembles or forests do not learn features first;
◦ Graphical models could be deep net, but mostly not.
Unsupervised learning could be “local-learning”;
◦ Resemble boosting with each layer being like a weak learner
Learning is weak in directed graphical models with many hidden variables;
◦ Sparsity and regularizer.
Traditional unsupervised learning methods cannot easily learn multiple levels of
representation.
◦ Layer-wise unsupervised learning is the solution.
Multi-task learning (transfer learning and self taught learning);
Other issues: scalability & parallelism with the burden from big data.
Multi Layer Neural Network
A neural network = running several logistic regressions at the same time;
◦ Neuron=logistic regression or…
Calculate error derivatives (gradients) to refine: back propagate the error derivative through model
(the chain rule)
◦ Online learning: stochastic/incremental gradient descent
◦ Batch learning: conjugate gradient descent
Problems in MLPs
Multi Layer Perceptrons (MLPs), one feed-forward neural network, were popularly used for decades.
Gradient is progressively getting more scattered
◦ Below the top few layers, the correction signal is minimal
Gets stuck in local minima
◦ Especially start out far from ‘good’ regions (i.e., random initialization)
In usual settings, use only labeled data
◦ Almost all data is unlabeled!
◦ Instead the human brain can learn from unlabeled data.
Convolutional Neural Networks
CNN is a special kind of multi-layer NNs applied to 2-d arrays (usually images), based on spatially localized
neural input;
◦ local receptive fields(shifted window), shared weights (weight averaging) across the hidden units, and often, spatial
or temporal sub-sampling;
◦ Related to generative MRF/discriminative CRF:
◦ CNN=Field of Experts MRF=ML inference in CRF;
◦ Generate ‘patterns of patterns’ for pattern recognition.
Each layer combines (merge, smooth) patches from previous layers
◦ Pooling /Sampling (e.g., max or average) filter: compress and smooth the data.
◦ Convolution filters: (translation invariance) unsupervised;
◦ Local contrast normalization: increase sparsity, improve optimization/invariance.
C layers convolutions,
S layers pool/sample
Convolutional Neural Networks
Convolutional Networks are trainable multistage architectures composed of multiple stages;
Input and output of each stage are sets of arrays called feature maps;
At output, each feature map represents a particular feature extracted at all locations on input;
Each stage is composed of: a filter bank layer, a non-linearity layer, and a feature pooling layer;
A ConvNet is composed of 1, 2 or 3 such 3-layer stages, followed by a classification module;
◦ A fully connected layer: softmax transfer function for posterior distribution.
Filter: A trainable filter (kernel) in filter bank connects input feature map to output feature map;
Nonlinearity: a pointwise sigmoid tanh() or a rectified sigmoid abs(gi•tanh()) function;
◦ In rectified function, gi is a trainable gain parameter, might be followed a contrast normalization N;
Feature pooling: treats each feature map separately -> a reduced-resolution output feature map;
Supervised training is performed using a form of SGD to minimize the prediction error;
◦ Gradients are computed with the back-propagation method.
Unsupervised pre-training: predictive sparse decomposition (PSD), then supervised fine-tuning.
* is discrete convolution operator
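A minimal PyTorch sketch of the stage structure described above — filter bank, pointwise non-linearity, feature pooling, then a classification module (all layer sizes are arbitrary examples):
```python
import torch.nn as nn

# One ConvNet stage: filter bank -> pointwise non-linearity -> feature pooling.
stage = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=5, padding=2),  # filter bank
    nn.Tanh(),                                                            # non-linearity
    nn.MaxPool2d(kernel_size=2, stride=2),                                # feature pooling
)

# A small ConvNet: two such stages followed by a classification module.
model = nn.Sequential(
    stage,
    nn.Sequential(
        nn.Conv2d(16, 32, kernel_size=5, padding=2),
        nn.Tanh(),
        nn.MaxPool2d(2, 2),
    ),
    nn.Flatten(),
    nn.Linear(32 * 8 * 8, 10),   # assumes 32x32 inputs -> 8x8 maps after 2 poolings
    nn.LogSoftmax(dim=1),        # posterior distribution over classes
)
```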
Belief Nets
Belief net is directed acyclic graph composed of stochastic var.
Can observe some of the variables and solve two problems:
◦ inference: Infer the states of the unobserved variables.
◦ learning: Adjust the interactions between variables to more likely generate the observed data.
[Figure: belief net — stochastic hidden causes (top) generating visible effects (bottom)]
Use nets composed of layers
of stochastic variables with
weighted connections.
Boltzmann Machines
Energy-based model associate a energy to each configuration of stochastic variables of interests (for
example, MRF, Nearest Neighbor);
◦ Learning means adjustment of the low energy function’s shape properties;
Boltzmann machine is a stochastic recurrent model with hidden variables;
◦ Monte Carlo Markov Chain, i.e. MCMC sampling (appendix);
Restricted Boltzmann machine is a special case:
◦ Only one layer of hidden units;
◦ factorization of each layer’s neurons/units (no connections in the same layer);
Contrastive divergence: approximation of gradient (appendix).
[Equations: model probability, energy function, learning rule]
Deep Belief Networks
A hybrid model: can be trained as generative or
discriminative model;
Deep architecture: multiple layers (learn features
layer by layer);
◦ Multi layer learning is difficult in sigmoid belief
networks.
◦ Top two layers are undirected connections, RBM;
◦ Lower layers get top down directed connections
from layers above;
Unsupervised or self-taught pre-learning provides
a good initialization;
◦ Greedy layer-wise unsupervised training for
RBM
Supervised fine-tuning
◦ Generative: wake-sleep algorithm (Up-down)
◦ Discriminative: back propagation (bottom-up)
Deep Boltzmann Machine
Learning internal representations that become increasingly complex;
High-level representations built from a large supply of unlabeled inputs;
Pre-training consists of learning a stack of modified RBMs, which are composed to create a deep Boltzmann
machine (undirected graph);
Generative fine-tuning: different from DBN
◦ Positive and negative phase (appendix)
Discriminative fine-tuning: the same to DBN
◦ Back propagation.
Denoising Auto-Encoder
Multilayer NNs with target output=input;
Reconstruction=decoder(encoder(input));
◦ Perturbs the input x to a corrupted version;
◦ Randomly sets some of the coordinates of input to zeros.
◦ Recover x from encoded perturbed data.
Learns a vector field towards higher probability regions;
Pre-trained with DBN or regularizer with perturbed training data;
Minimizes variational lower bound on a generative model;
◦ corresponds to regularized score matching on an RBM;
PCA=linear manifold=linear Auto Encoder;
Auto-encoder learns the salient variation like a nonlinear PCA.
Stacked Denoising Auto-Encoder
Stack many (may be sparse) auto-encoders in succession and train them using greedy layer-wise
unsupervised learning
◦ Drop the decode layer each time
◦ Performs better than stacking RBMs;
Supervised training on the last layer using final features;
(option) Supervised training on the entire network to fine- tune all weights of the neural net;
Empirically not quite as accurate as DBNs.
Belief Propagation
 BP propagates info. throughout a graphical model via a series
of messages between neighboring nodes iteratively; it is likely to converge to a consensus that
determines the marginal prob. of all the variables;
 messages estimate the cost (or energy) of a configuration of a clique given all other cliques;
then the messages are combined to compute a belief (marginal or maximum probability);
Two types of BP methods:
◦ max-product;
◦ sum-product.
BP provides exact solution when there are no loops in graph!
Equivalent to dynamic programming/Viterbi in these cases;
Loopy Belief Propagation: still provides approximate (but often good) solution;
Generalized BP for pairwise MRFs
◦ Hidden variables xi and xj are connected through a compatibility function;
◦ Hidden variables xi are connected to observable variables yi by the local “evidence” function;
The joint probability of {x} is given by
To improve inference by taking into account higher-order interactions among the
variables;
◦ An intuitive way is to define messages that propagate between groups of nodes rather than just single nodes;
◦ This is the intuition in Generalized Belief Propagation (GBP).
Stochastic Gradient Descent (SGD)
• The general class of estimators that arise as minimizers of sums are called M-
estimators;
• Where are stationary points of the likelihood function (or zeroes of its derivative, the score
function)?
• Online gradient descent samples a subset of summand functions at every step;
• The true gradient is approximated by a gradient at a single example;
• Shuffling of training set at each pass.
• There is a compromise between two forms, often called "mini-batches", where the
true gradient is approximated by a sum over a small number of training examples.
• SGD converges almost surely to a global minimum when the objective function
is convex or pseudo-convex, and otherwise converges almost surely to a local
minimum.
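A minimal mini-batch SGD sketch of the compromise described above (NumPy; `grad_fn` is a hypothetical per-example gradient function):
```python
import numpy as np

def sgd(grad_fn, w0, data, lr=0.01, batch_size=32, epochs=10):
    """Mini-batch SGD: approximate the true gradient by the average gradient
    over a small, reshuffled batch of training examples."""
    w = w0.copy()
    n = len(data)
    for _ in range(epochs):
        idx = np.random.permutation(n)              # shuffle each pass
        for start in range(0, n, batch_size):
            batch = [data[i] for i in idx[start:start + batch_size]]
            g = np.mean([grad_fn(w, x) for x in batch], axis=0)
            w -= lr * g                              # gradient step
    return w
```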
Back Propagation
Example loss for a training pair (x0, y0): E(f(x0,w), y0) = −log f(x0,w)[y0], the negative log-likelihood of the true label y0.
Variable Learning Rate
Too large learning rate
◦ cause oscillation in searching for the minimal point
Too small learning rate
◦ too slow convergence to the minimal point
Adaptive learning rate
◦ At the beginning, the learning rate can be large when the current point is far from the
optimal point;
◦ Gradually, the learning rate will decay as time goes by.
Should not be too large or too small:
◦ annealing rate 𝛼(𝑡)=𝛼(0)/(1+𝑡/𝑇)
◦ 𝛼(𝑡) will eventually go to zero, but at the beginning it is almost a constant.
Variable Momentum
AdaGrad/AdaDelta
Dropout and Maxout for Overfitting
Dropout: set the output of each hidden neuron to zero w.p. 0.5.
◦ Motivation: Combining many different models that share parameters succeeds in reducing test
errors by approximately averaging together the predictions, which resembles the bagging.
◦ The units which are “dropped out” in this way do not contribute to the forward pass and do not
participate in back propagation.
◦ So every time an input is presented, the NN samples a different architecture, but all these
architectures share weights.
◦ This technique reduces complex co-adaptations of units, since a neuron cannot rely on the presence
of particular other units.
◦ It is, therefore, forced to learn more robust features that are useful in conjunction with many
different random subsets of the other units.
◦ Without dropout, the network exhibits substantial overfitting.
◦ Dropout roughly doubles the number of iterations required to converge.
Maxout takes the maximum across multiple feature maps;
Weight Decay for Overfitting
Weight decay or L2 regularization adds a penalty term to the error function, a term called the
regularization term: the negative log prior in Bayesian justification,
◦ Weight decay works as rescaling weights in the learning rule, but bias learning still the same;
◦ Prefer to learn small weights, and large weights allowed if improving the original cost function;
◦ A way of compromising btw finding small weights and minimizing the original cost function;
In a linear model, weight decay is equivalent to ridge (Tikhonov) regression;
L1 regularization: the weights not really useful shrink by a constant amount toward zero;
◦ Act like a form of feature selection;
◦ Make the input filters cleaner and easier to interpret;
L2 regularization penalizes large values strongly, while L1 regularization penalizes all weights at the same rate, driving many of them exactly to zero;
Markov Chain Monte Carlo (MCMC): simulating a Markov chain whose equilibrium distr. is the
posterior distribution for weights & hyper-parameters;
Hybrid Monte Carlo: gradient and sampling.
Early Stopping for Overfitting
Steps in early stopping:
◦ Divide the available data into training and validation sets.
◦ Use a large number of hidden units.
◦ Use very small random initial values.
◦ Use a slow learning rate.
◦ Compute the validation error rate periodically during training.
◦ Stop training when the validation error rate "starts to go up".
Early stopping has several advantages:
◦ It is fast.
◦ It can be applied successfully to networks in which the number of weights far exceeds the sample size.
◦ It requires only one major decision by the user: what proportion of validation cases to use.
Practical issues in early stopping:
◦ How many cases do you assign to the training and validation sets?
◦ Do you split the data into training and validation sets randomly or by some systematic algorithm?
◦ How do you tell when the validation error rate "starts to go up"?
MCMC Sampling for Optimization
Markov Chain: a stochastic process in which future states are independent of past states given the
present state.
◦ Markov chain will typically converge to a stable distribution.
Monte Carlo Markov Chain: sampling using ‘local’ information
◦ Devise a Markov chain whose stationary distribution is the target.
◦ Ergodic MC must be aperiodic, irreducible, and positive recurrent.
◦ Monte Carlo Integration to get quantities of interest.
Metropolis-Hastings method: sampling from a target distribution
◦ Create a Markov chain whose transition matrix does not depend on the normalization term.
◦ Make sure the chain has a stationary distribution and it is equal to the target distribution (accept ratio).
◦ After sufficient number of iterations, the chain will converge the stationary distribution.
Gibbs sampling is a special case of M-H Sampling.
◦ The Hammersley-Clifford Theorem: get the joint distribution from the complete conditional distribution.
Hybrid Monte Carlo: gradient sub step for each Markov chain.
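A minimal random-walk Metropolis-Hastings sketch (NumPy): the acceptance ratio uses only unnormalized target values, so the normalization term is never needed:
```python
import numpy as np

def metropolis_hastings(log_target, x0, n_samples=10000, step=0.5):
    """Random-walk Metropolis sampling from an unnormalized target density."""
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        proposal = x + step * np.random.randn(*x.shape)   # symmetric proposal
        log_accept = log_target(proposal) - log_target(x)
        if np.log(np.random.rand()) < log_accept:         # accept ratio
            x = proposal
        samples.append(x.copy())
    return np.array(samples)

# Example: sample from a standard 2-d Gaussian (known only up to a constant).
draws = metropolis_hastings(lambda z: -0.5 * np.dot(z, z), x0=np.zeros(2))
```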
Mean Field for Optimization
Variational approximation modifies the optimization problem to be tractable, at the price of an
approximate solution;
Mean Field replaces M with a (simple) subset M(F), on which A* (μ) is a closed form (Note: F is
disconnected graph);
◦ Density becomes factorized product distribution in this sub-family.
◦ Objective: K-L divergence.
Mean field is a structured variational approximation approach:
◦ Coordinate ascent (deterministic);
Compared with stochastic approximation (sampling):
◦ Faster, but maybe not exact.
Contrastive Divergence for RBMs
Contrastive divergence (CD) is proposed for training PoE first, also being a quicker way to learn
RBMs;
◦ Contrastive divergence as the new objective;
◦ Taking gradients and ignoring a term which is usually very small.
Steps:
◦ Start with a training vector on the visible units.
◦ Then alternate between updating all the hidden units in parallel and updating all the visible units in parallel.
Can be applied using any MCMC algorithm to simulate the model (not limited to just Gibbs
sampling);
CD learning is biased: it does not behave as true gradient descent
Improved: Persistent CD explores more modes in the distribution
◦ Rather than from data samples, begin sampling from the mode samples, obtained from the last gradient
update.
◦ Still suffer from divergence of likelihood due to missing the modes.
Score matching: the score function does not depend on the normalization factor, so match the score
function of the model to that of the empirical density.
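A minimal NumPy sketch of one CD-1 update for a binary RBM, following the steps above (the learning rate and Bernoulli sampling details are standard choices, not taken from the slides):
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cd1_update(v0, W, b, c, lr=0.1):
    """One CD-1 step for a binary RBM with weights W, visible bias b, hidden bias c.
    Positive phase from the data, negative phase from a single Gibbs step."""
    # Positive phase: hidden probabilities and a sample given the data.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (np.random.rand(*ph0.shape) < ph0).astype(float)
    # Negative phase: reconstruct the visibles, then the hiddens again.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (np.random.rand(*pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    # Approximate gradient: <v h>_data - <v h>_reconstruction.
    W += lr * (v0.T @ ph0 - v1.T @ ph1) / v0.shape[0]
    b += lr * (v0 - v1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c
```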
“Wake-Sleep” Algorithm for DBN
Pre-trained DBN is a generative model;
Do a stochastic bottom-up pass (wake phase)
◦ Get samples from factorial distribution (visible first, then generate hidden);
◦ Adjust the top-down weights to be good at reconstructing the feature activities in the layer below.
Do a few iterations of sampling in the top level RBM
◦ Adjust the weights in the top-level RBM.
Do a stochastic top-down pass (sleep phase)
◦ Get visible and hidden samples generated by generative model using data coming from nowhere!
◦ Adjust the bottom-up weights to be good at reconstructing the feature activities in the layer above.
◦ Any guarantee for improvement? No!
The “Wake-Sleep” algorithm tries to make the representation economical (in the sense of Shannon’s coding
theory).
Greedy Layer-Wise Training
Deep networks tend to have more local minima problems than shallow networks during
supervised training
Train first layer using unlabeled data
◦ Supervised or semi-supervised: use more unlabeled data.
Freeze the first layer parameters and train the second layer
Repeat this for as many layers as desire
◦ Build more robust features
Use the outputs of the final layer to train the last supervised layer (leave early weights frozen)
Fine tune the full network with a supervised approach;
Avoid problems to train a deep net in a supervised fashion.
◦ Each layer gets full learning
◦ Help with ineffective early layer learning
◦ Help with deep network local minima
Why Greedy Layer-Wise Training Works?
Take advantage of the unlabeled data;
Regularization Hypothesis
◦ Pre-training is “constraining” parameters in a region relevant to unsupervised
dataset;
◦ Better generalization (representations that better describe unlabeled data are
more discriminative for labeled data) ;
Optimization Hypothesis
◦ Unsupervised training initializes lower level parameters near localities of better
minima than random initialization can.
Only need fine tuning in the supervised learning stage.
Two-Stage Pre-training in DBMs
Pre-training in one stage
◦ Positive phase: clamp observed, sample hidden, using variational approximation (mean-field)
◦ Negative phase: sample both observed and hidden, using persistent sampling (stochastic approximation:
MCMC)
Pre-training in two stages
◦ Approximating a posterior distribution over the states of hidden units (a simpler directed deep model as DBNs
or stacked DAE);
◦ Train an RBM by updating parameters to maximize the lower bound of the log-likelihood and the corresponding
posterior of hidden units.
◦ Options (CAST, contrastive divergence, stochastic approximation…).
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingYu Huang
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationYu Huang
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and PredictionYu Huang
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VYu Huang
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVYu Huang
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduYu Huang
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the HoodYu Huang
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)Yu Huang
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingYu Huang
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?Yu Huang
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingYu Huang
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgYu Huang
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learningYu Huang
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymoYu Huang
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningYu Huang
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingYu Huang
 

Mehr von Yu Huang (20)

Application of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous DrivingApplication of Foundation Model for Autonomous Driving
Application of Foundation Model for Autonomous Driving
 
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...The New Perception Framework  in Autonomous Driving: An Introduction of BEV N...
The New Perception Framework in Autonomous Driving: An Introduction of BEV N...
 
Data Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous DrivingData Closed Loop in Simulation Test of Autonomous Driving
Data Closed Loop in Simulation Test of Autonomous Driving
 
Techniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous DrivingTechniques and Challenges in Autonomous Driving
Techniques and Challenges in Autonomous Driving
 
BEV Joint Detection and Segmentation
BEV Joint Detection and SegmentationBEV Joint Detection and Segmentation
BEV Joint Detection and Segmentation
 
BEV Object Detection and Prediction
BEV Object Detection and PredictionBEV Object Detection and Prediction
BEV Object Detection and Prediction
 
Fisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VIFisheye based Perception for Autonomous Driving VI
Fisheye based Perception for Autonomous Driving VI
 
Fisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving VFisheye/Omnidirectional View in Autonomous Driving V
Fisheye/Omnidirectional View in Autonomous Driving V
 
Fisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IVFisheye/Omnidirectional View in Autonomous Driving IV
Fisheye/Omnidirectional View in Autonomous Driving IV
 
Prediction,Planninng & Control at Baidu
Prediction,Planninng & Control at BaiduPrediction,Planninng & Control at Baidu
Prediction,Planninng & Control at Baidu
 
Cruise AI under the Hood
Cruise AI under the HoodCruise AI under the Hood
Cruise AI under the Hood
 
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
LiDAR in the Adverse Weather: Dust, Snow, Rain and Fog (2)
 
Scenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous DrivingScenario-Based Development & Testing for Autonomous Driving
Scenario-Based Development & Testing for Autonomous Driving
 
How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?How to Build a Data Closed-loop Platform for Autonomous Driving?
How to Build a Data Closed-loop Platform for Autonomous Driving?
 
Annotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous DrivingAnnotation tools for ADAS & Autonomous Driving
Annotation tools for ADAS & Autonomous Driving
 
Simulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atgSimulation for autonomous driving at uber atg
Simulation for autonomous driving at uber atg
 
Multi sensor calibration by deep learning
Multi sensor calibration by deep learningMulti sensor calibration by deep learning
Multi sensor calibration by deep learning
 
Prediction and planning for self driving at waymo
Prediction and planning for self driving at waymoPrediction and planning for self driving at waymo
Prediction and planning for self driving at waymo
 
Jointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planningJointly mapping, localization, perception, prediction and planning
Jointly mapping, localization, perception, prediction and planning
 
Data pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous drivingData pipeline and data lake for autonomous driving
Data pipeline and data lake for autonomous driving
 

Kürzlich hochgeladen

TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catcherssdickerson1
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...Chandu841456
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerAnamika Sarkar
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...srsj9000
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleAlluxio, Inc.
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxPurva Nikam
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxKartikeyaDwivedi3
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx959SahilShah
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)Dr SOUNDIRARAJ N
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...asadnawaz62
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfAsst.prof M.Gokilavani
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniquesugginaramesh
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitterShivangiSharma879191
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile servicerehmti665
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncssuser2ae721
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxJoão Esperancinha
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024hassan khalil
 

Kürzlich hochgeladen (20)

TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor CatchersTechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
TechTAC® CFD Report Summary: A Comparison of Two Types of Tubing Anchor Catchers
 
An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...An experimental study in using natural admixture as an alternative for chemic...
An experimental study in using natural admixture as an alternative for chemic...
 
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube ExchangerStudy on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
Study on Air-Water & Water-Water Heat Exchange in a Finned Tube Exchanger
 
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
Gfe Mayur Vihar Call Girls Service WhatsApp -> 9999965857 Available 24x7 ^ De...
 
Correctly Loading Incremental Data at Scale
Correctly Loading Incremental Data at ScaleCorrectly Loading Incremental Data at Scale
Correctly Loading Incremental Data at Scale
 
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
young call girls in Rajiv Chowk🔝 9953056974 🔝 Delhi escort Service
 
An introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptxAn introduction to Semiconductor and its types.pptx
An introduction to Semiconductor and its types.pptx
 
Concrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptxConcrete Mix Design - IS 10262-2019 - .pptx
Concrete Mix Design - IS 10262-2019 - .pptx
 
Application of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptxApplication of Residue Theorem to evaluate real integrations.pptx
Application of Residue Theorem to evaluate real integrations.pptx
 
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
UNIT III ANALOG ELECTRONICS (BASIC ELECTRONICS)
 
complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...complete construction, environmental and economics information of biomass com...
complete construction, environmental and economics information of biomass com...
 
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdfCCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
CCS355 Neural Network & Deep Learning UNIT III notes and Question bank .pdf
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniques
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter8251 universal synchronous asynchronous receiver transmitter
8251 universal synchronous asynchronous receiver transmitter
 
Call Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile serviceCall Girls Delhi {Jodhpur} 9711199012 high profile service
Call Girls Delhi {Jodhpur} 9711199012 high profile service
 
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsyncWhy does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
Why does (not) Kafka need fsync: Eliminating tail latency spikes caused by fsync
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptxDecoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
Decoding Kotlin - Your guide to solving the mysterious in Kotlin.pptx
 
Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024Architect Hassan Khalil Portfolio for 2024
Architect Hassan Khalil Portfolio for 2024
 

Optic flow estimation with deep learning

  • 13. Lucas-Kanade algorithm ◦ The optimal (u, v) satisfies the Lucas-Kanade equation. When is this solvable? • ATA should be invertible • ATA should not be too small due to noise – eigenvalues λ1 and λ2 of ATA should not be too small • ATA should be well-conditioned – λ1/λ2 should not be too large (λ1 = larger eigenvalue). The system is solvable when there is no aperture problem. What are the potential causes of errors in this procedure? ◦ Suppose ATA is easily invertible ◦ Suppose there is not much noise in the image. Errors arise when the assumptions are violated: • brightness constancy is not satisfied • the motion is not small • a point does not move like its neighbors – the window size is too large – what is the ideal window size?
  • 14. Lucas-Kanade algorithm  Iterative refinement in Lucas-Kanade:  Estimate the velocity at each pixel by solving the Lucas-Kanade equations  Warp H towards I using the estimated flow field  use image warping techniques  Repeat until convergence.  Some implementation issues:  Warping is not easy (ensure that errors in warping are smaller than the estimate refinement)  Warp one image and take derivatives of the other, so you do not need to re-compute the gradient after each iteration  It is often useful to low-pass filter the images before motion estimation (for better derivative estimation and better linear approximations to image intensity).
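To make the iterative refinement concrete, here is a minimal single-window NumPy sketch of iterative Lucas-Kanade with warping; the bilinear_warp helper, the fixed iteration count, and the choice of taking gradients on the reference image are illustrative assumptions, not the original implementation.

```python
import numpy as np

def bilinear_warp(img, u, v):
    """Sample img at (x+u, y+v) for every pixel (x, y) with bilinear interpolation."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = np.clip(xs + u, 0, w - 1.001)
    y = np.clip(ys + v, 0, h - 1.001)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    dx, dy = x - x0, y - y0
    return ((1 - dx) * (1 - dy) * img[y0, x0] + dx * (1 - dy) * img[y0, x0 + 1] +
            (1 - dx) * dy * img[y0 + 1, x0] + dx * dy * img[y0 + 1, x0 + 1])

def iterative_lk(I0, I1, n_iters=10):
    """Estimate one global (u, v) for the whole window by repeated LK solves + warping."""
    u = v = 0.0
    Iy, Ix = np.gradient(I0)                      # spatial gradients of the reference image
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    AtA = A.T @ A                                 # 2x2 normal-equation matrix
    for _ in range(n_iters):
        I1w = bilinear_warp(I1, u, v)             # warp I1 towards I0 with the current flow
        It = (I1w - I0).ravel()                   # temporal difference after warping
        du, dv = np.linalg.solve(AtA, -A.T @ It)  # LK update from the normal equations
        u, v = u + du, v + dv
    return u, v
```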
  • 15. Focus of Expansion (FOE) • Motion of object = − (Motion of sensor) • For a given translatory motion and gaze direction, the world seems to flow out of one point (FOE). With focal length f = 1, a scene point (x0, y0, z0) translating with velocity (u, v, w) is, after time t, at (x0 + u·t, y0 + v·t, z0 + w·t), and its image is (x', y') = ( (x0 + u·t)/(z0 + w·t), (y0 + v·t)/(z0 + w·t) ). • As t varies, the image point moves along a straight line in the image. • Focus of Expansion: backtrack time (t → −∞) or let t → ∞; the image point converges to (x', y') = (u/w, v/w).
  • 17. Discrete Optimization for Optical Flow Large-displacement optical flow from a discrete point of view: recover sub-pixel flow from pixel-accurate flow. Formulate optical flow estimation as discrete inference in a CRF, followed by sub-pixel refinement. Three different strategies reduce computation and memory demands by several orders of magnitude; their combination allows estimating large-displacement optical flow: (1) Diverse flow proposals: an efficient search structure, 300 nearest neighbors, 200 proposals from neighboring pixels; (2) Block coordinate descent: alternating optimization of image rows and columns, with sub-problems solved optimally via dynamic programming (DP); (3) Truncated pairwise potentials: efficient dynamic programming.
  • 18. Discrete Optimization for Optical Flow Strategies for efficient discrete optical flow. Left: obtain a large set of diverse flow proposals per pixel by combining nearest neighbors (NN) in feature space from a set of grid cells with winner-takes-all solutions from neighboring pixels. Middle: apply block coordinate descent, iteratively optimizing all image rows and columns conditioned on neighboring blocks via dynamic programming. Right: taking advantage of robust penalties, reduce pairwise computation costs by pre-computing, for each flow vector, the set of non-truncated neighboring flow proposals.
  • 19. Discrete Optimization for Optical Flow Robust data term based on DAISY descriptors d; similar flow vectors f at neighboring pixels are encouraged, weighted by the edge strength. The pairwise terms can be evaluated with naive dynamic programming or with a more efficient dynamic programming scheme that exploits the truncated penalty.
  • 20. Large Displacement Optical Flow: Descriptor Matching in Variational Motion Estimation Integrating rich descriptors into the variational optical flow setting, rather than relying only on coarse-to-fine warping schemes; estimates a dense optical flow field with the same high accuracy as variational optical flow.
  • 21. SPM-BP: Sped-up PatchMatch Belief Propagation for Continuous MRFs Integrating key ideas from PatchMatch (effective particle propagation and resampling), PatchMatch belief propagation (PMBP) has been demonstrated to perform well on continuous labeling problems and runs orders of magnitude faster than particle BP (PBP). Sped-up PMBP (SPM-BP) unifies efficient filter-based cost aggregation and message passing with PatchMatch-based particle generation in a highly effective way. Two-layer graph structure used in SPM-BP: (b)(c) a superpixel-level graph generates new particle proposals to be tested on the pixel-level graph. (d) For the reference superpixel, the edge-aware filter (EAF) is applied to obtain the data cost. (e) The message-passing algorithm proceeds in the inner loop, while outgoing messages on the boundary are fixed.
  • 22. Efficient Coarse-to-Fine PatchMatch for Large Displacement Optical Flow CPM (Coarse-to-fine PatchMatch) blends an efficient random search strategy with a coarse-to-fine scheme for the optical flow problem, operating on the nearest neighbor field (NNF). Propagation uses a constrained random search radius between adjacent levels of the hierarchical architecture. Construct the pyramids; on each level, the initial matching correspondences are propagated with random search a fixed number of times, and the result of each level is used as the initialization of the next lower level.
  • 23. Efficient Coarse-to-Fine PatchMatch for Large Displacement Optical Flow A forward-backward consistency check is performed to detect occlusions and remove outliers on multiple levels of the pyramid; only the validity of matching correspondences on the two finest levels is checked. With the backward flow interpolated linearly from matching correspondences, the error threshold of the consistency check is set equal to the grid spacing, and coarser matches are all upscaled to the finest resolution before the check. Matches longer than 400 pixels are also removed.
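A forward-backward consistency check of this kind can be sketched in a few lines of NumPy; the nearest-neighbor sampling of the backward flow and the default threshold below are simplifying assumptions rather than CPM's exact procedure.

```python
import numpy as np

def fb_consistency_mask(flow_fwd, flow_bwd, thresh=1.0):
    """flow_fwd, flow_bwd: (H, W, 2) arrays of (u, v). Returns a boolean mask of
    pixels whose forward flow is consistent with the backward flow."""
    h, w, _ = flow_fwd.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Where does each pixel land in the second image under the forward flow?
    x2 = np.clip(np.rint(xs + flow_fwd[..., 0]).astype(int), 0, w - 1)
    y2 = np.clip(np.rint(ys + flow_fwd[..., 1]).astype(int), 0, h - 1)
    # Backward flow sampled at the landing position (nearest neighbor for simplicity).
    bwd = flow_bwd[y2, x2]
    # Consistent if the forward and backward flows roughly cancel out.
    err = np.linalg.norm(flow_fwd + bwd, axis=-1)
    return err < thresh
```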
  • 24. EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow Optical flow estimation at large displacements with significant occlusions. Edge-Preserving Interpolation of Correspondences (EpicFlow) is fast and robust. Two steps: i) dense matching by edge-preserving interpolation from a sparse set of matches; ii) variational energy minimization initialized with the dense matches. The sparse-to-dense interpolation relies on an appropriate choice of distance, namely an edge-aware geodesic distance, which handles occlusions and motion boundaries, two key issues for optical flow computation. An approximation of the geodesic distance allows fast computation without performance loss. Variational energy minimization on the dense matches yields the final flow estimate.
  • 25. EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow Overview of EpicFlow. Given two images, compute matches using DeepMatching and the edges of the first image using SED (Structured Edge Detector). Combine them to interpolate matches and obtain a dense correspondence field, which initializes a one-level energy minimization framework.
  • 26. EpicFlow: Edge-Preserving Interpolation of Correspondences for Optical Flow (a-b) two consecutive frames; (c) contour response C from SED; (d) match positions from DeepMatching; (e-f) geodesic distance from a pixel to all others. (g-h) 100 nearest matches using geodesic distance from the pixel.
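The sparse-to-dense step can be illustrated with a simplified Nadaraya-Watson-style interpolator; note that it uses plain Euclidean distance and an exponential kernel where EpicFlow uses an edge-aware geodesic distance (and also offers a locally-weighted affine variant), so this is only a rough sketch with an assumed bandwidth.

```python
import numpy as np

def interpolate_flow(matches, h, w, k=25, a=1.0):
    """matches: (N, 4) array of sparse correspondences (x1, y1, x2, y2).
    Returns a dense (h, w, 2) flow field."""
    pts = matches[:, :2]                      # match positions in the first image
    flows = matches[:, 2:] - matches[:, :2]   # their flow vectors
    dense = np.zeros((h, w, 2))
    for y in range(h):
        for x in range(w):
            d = np.linalg.norm(pts - np.array([x, y]), axis=1)
            idx = np.argsort(d)[:k]           # k nearest matches (geodesic in EpicFlow)
            wgt = np.exp(-a * d[idx])         # kernel weights decay with distance
            dense[y, x] = (wgt[:, None] * flows[idx]).sum(0) / wgt.sum()
    return dense
```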
  • 27. Dense, Accurate Optical Flow Estimation with Piecewise Parametric Model Fit a flow field piecewise to a variety of parametric models, where the domain of each piece (i.e., each piece's shape, position and size) is determined adaptively, while at the same time maintaining a global inter-piece flow continuity constraint. A multi-model fitting scheme via energy minimization takes into account both the piecewise constant model assumption and the flow field continuity constraint, enabling it to effectively handle both homogeneous and complex motions. The energy combines a data term, a Potts model term, an MDL term, and a flow continuity (inter-piece compatibility) term.
  • 28. Dense, Accurate Optical Flow Estimation with Piecewise Parametric Model
  • 29. Flow Fields: Dense Correspondence Fields for Accurate Large Displacement Optical Flow Estimation  A dense correspondence field approach much better suited for optical flow estimation than approximate nearest neighbor fields. It does not require explicit regularization, smoothing, or a new data term, but rather a data-based search strategy that finds most inliers, together with enhancements for outlier filtering. The pipeline of the Flow Fields approach; the basic approach considers only the full resolution.
  • 30. Flow Fields: Dense Correspondence Fields for Accurate Large Displacement Optical Flow Estimation Illustration of the hierarchical Flow Field approach. Flow offsets saved in pixels are propagated in all arrow directions.
  • 31. Full Flow: Optical Flow Estimation By Global Optimization over Regular Grids A global optimization approach to optical flow estimation which optimizes a classical optical flow objective over the full space of mappings between discrete grids. The regular structure of the space of mappings enables optimizations that reduce the computational complexity of the algorithm's inner loop and support efficient matching. The approach treats the objective (data term and regularization term) as a Markov random field and uses discrete optimization techniques. Optical flow over regular grids: each pixel p in I1 is spatially connected to its four neighbors in I1 and temporally connected to (2ς + 1)² pixels in I2; a flow field is a mapping Ω → [−ς, ς]².
  • 32. Full Flow: Optical Flow Estimation By Global Optimization over Regular Grids The objective is a discrete Markov random field with a two-dimensional label space. To optimize the model, use TRW-S, which optimizes the dual of a natural linear programming relaxation of the problem; to reduce wall-clock time, a parallelized TRW-S solver is implemented. Occlusions are handled by forward-backward consistency checking, and the EpicFlow interpolation scheme is used as post-processing.
  • 33. DeepFlow: Large displacement optical flow with deep matching  DeepFlow blends a matching algorithm with a variational approach for optical flow. A descriptor matching algorithm, tailored to the optical flow problem, allows boosting performance on fast motions.  The matching algorithm builds upon a multi-stage architecture with 6 layers, interleaving convolutions and max-pooling, a construction akin to deep convolutional nets. Using dense sampling, it efficiently retrieves quasi-dense correspondences and enjoys a built-in smoothing effect on descriptor matches, a valuable asset for integration into an energy minimization framework for optical flow estimation.  DeepFlow efficiently handles large displacements occurring in realistic videos and shows competitive performance on optical flow benchmarks.
  • 34. DeepFlow: Large displacement optical flow with deep matching
  • 35. FlowNet: Learning Optical Flow with Convolutional Networks A generic architecture and another one including a layer that correlates feature vectors at different image locations: FlowNetSimple and FlowNetCorr, both trained end-to-end. A simple choice is to stack both input images together and feed them through a rather generic network, allowing the network to decide itself how to process the image pair to extract the motion information; this network, consisting only of convolutional layers, is called 'FlowNetSimple'. A straightforward alternative is to create two separate yet identical processing streams for the two images and to combine them at a later stage. Design a 'correlation layer' that performs multiplicative patch comparisons between two feature maps; the resulting network is 'FlowNetCorr'. Given two multi-channel feature maps f1, f2: R² → Rᶜ, with w, h, and c being their width, height and number of channels, the correlation layer lets the network compare each patch from f1 with each patch from f2. Refinement: the main ingredient is 'upconvolutional' layers, consisting of unpooling (extending the feature maps, as opposed to pooling) and a convolution.
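A stripped-down version of such a correlation layer can be written directly in NumPy; using a patch size of one pixel and an unstrided displacement search (the real layer uses larger patches and strides) keeps the sketch short, so treat it as an illustration rather than the paper's layer.

```python
import numpy as np

def correlation_layer(f1, f2, max_disp=4):
    """f1, f2: (H, W, C) feature maps. Returns (H, W, (2*max_disp+1)**2) correlations:
    the dot product between the feature at each position in f1 and the features at all
    displacements within +/- max_disp in f2."""
    h, w, c = f1.shape
    d = max_disp
    f2p = np.pad(f2, ((d, d), (d, d), (0, 0)))      # zero-pad so every shift is valid
    out = np.zeros((h, w, (2 * d + 1) ** 2))
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            shifted = f2p[d + dy:d + dy + h, d + dx:d + dx + w]   # f2 shifted by (dx, dy)
            out[..., k] = (f1 * shifted).sum(-1) / c              # normalized dot product
            k += 1
    return out
```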
  • 36. FlowNet: Learning Optical Flow with Convolutional Networks
  • 37. FlowNet: Learning Optical Flow with Convolutional Networks Refinement of the coarse feature maps to the high resolution prediction
  • 38. FlowNet 2.0 End-to-end learning of optical flow: a stacked architecture with warping of the second image by the intermediate optical flow; small displacements are handled by a sub-network specializing in small motions. Evaluation of options when stacking two FlowNetS networks (Net1 and Net2).
  • 40. Deep Discrete Flow Investigate two types of networks: a local network with a small receptive field consisting of 3x3 convolutions followed by non-linearities, and a subsequent context network that aggregates information over larger image regions using dilated convolutions. Learning context-aware features for solving optical flow using discrete optimization: train a context network with a large receptive field on top of a local network, using dilated convolutions on patches. Feature matching compares each pixel in the reference image to every pixel in the target image; the matching cost volume from the network's output forms the data term for discrete MAP inference in a pairwise MRF. Local network: leverages 3x3 convolution kernels; the hyper-parameters are the number of layers and the number of feature maps in each layer, as specified in the evaluation. Context network: increases the size of the receptive field with only a modest increase in complexity by exploiting dilated convolutions, i.e., reading the input feature maps at locations with a spatial stride larger than one, taking more contextual information into account.
  • 41. Deep Discrete Flow The input images are processed in forward order and backward order using local and context Siamese CNNs, yielding per-pixel descriptors. Then points on a regular grid in the reference image are matched to every pixel in the other image, yielding a large tensor of forward matching costs (F1/F2) and backward matching costs (B1/B2). Matching costs are smoothed using discrete MAP inference in a pairwise MRF. Finally, a forward-backward consistency check removes outliers and sub-pixel accuracy is attained using the EpicFlow interpolator. The model is trained in a piece-wise fashion via the loss functions.
  • 42. Deep Discrete Flow (a) Naive (b) Fast Dilated Convolution Implementations The center of the patch is marked with a red * and each color corresponds to a convolution center for a specific dilation factor, red for 4 dilations (shown in green), green for 2 dilations (shown in blue) and yellow for both.
  • 43. Deep Discrete Flow Fast patch-based training of dilated convolutional networks. Left: a naive implementation requires dilated convolution operations, which are computationally less efficient than highly optimized cuDNN convolutions without dilations. Right: the behavior of dilated convolutions can be replicated with regular convolutions by first sub-sampling the feature map and then applying 1-dilated convolutions with stride. Here 'dilations' denotes an array that specifies the dilation factor of the dilated convolution in each convolutional layer.
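The equivalence can be checked on a tiny 1-D example: a dilated convolution over the full signal gives the same values as regular convolutions applied to the d sub-sampled phases and then interleaved. The signal and kernel below are arbitrary illustrations.

```python
import numpy as np

def dilated_conv1d(x, k, d):
    """Naive 1-D dilated convolution (correlation form): y[i] = sum_j k[j] * x[i + j*d]."""
    n, m = len(x), len(k)
    out_len = n - (m - 1) * d
    return np.array([sum(k[j] * x[i + j * d] for j in range(m)) for i in range(out_len)])

def dilated_conv1d_via_subsampling(x, k, d):
    """Same result computed with d regular (1-dilated) convolutions on sub-sampled phases."""
    n, m = len(x), len(k)
    out_len = n - (m - 1) * d
    y = np.empty(out_len)
    for p in range(d):                                    # phase p picks x[p], x[p+d], ...
        phase = x[p::d]
        yp = np.convolve(phase, k[::-1], mode='valid')    # regular convolution on the phase
        y[p::d] = yp[:len(y[p::d])]                       # interleave phase outputs back
    return y

x = np.arange(20, dtype=float)
k = np.array([1.0, -2.0, 1.0])
assert np.allclose(dilated_conv1d(x, k, 2), dilated_conv1d_via_subsampling(x, k, 2))
```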
  • 44. Optical Flow Estimation using a Spatial Pyramid Network Compute optical flow by combining a classical spatial-pyramid formulation with deep learning. This estimates large motions in a coarse-to-fine manner by warping one image of a pair at each pyramid level by the current flow estimate and computing an update to the flow. Train one deep network per level to compute the flow update. The networks do not need to deal with large motions; these are handled by the pyramid.  The Spatial Pyramid Network (SPyNet) is much simpler and 96% smaller than FlowNet in terms of model parameters.  Since the flow at each pyramid level is small (< 1 pixel), a convolutional approach applied to pairs of warped images is appropriate. The learned convolution filters appear similar to classical spatio-temporal filters, giving insight into the method and how to improve it.  Trained using Adam optimization with β1 = 0.9 and β2 = 0.999, a batch size of 32 across all networks with 4000 iterations per epoch, and a learning rate of 1e-4 for the first 60 epochs, decreased to 1e-5 until convergence.
  • 45. Optical Flow Estimation using a Spatial Pyramid Network Training network Gk requires the trained models {G0, …, Gk−1} to obtain the initial flow u(Vk−1). Ground-truth residual flows ῠk are obtained by subtracting the upsampled initial flow u(Vk−1) from the downsampled ground-truth flow Ṽk, and are used to train the network Gk with the End Point Error (EPE) loss. Each level in the pyramid has a simplified task relative to the full optical flow estimation problem: it only has to estimate a small-motion update to an existing flow field. Consequently each network can be simple.
  • 46. Optical Flow Estimation using a Spatial Pyramid Network Inference in a 3-level pyramid network: the network G0 computes the residual flow v0 at the highest level of the pyramid (smallest image) using the low-resolution images {I1^0, I2^0}. At each pyramid level, the network Gk computes a residual flow vk which propagates to each of the next lower levels of the pyramid in turn, to finally obtain the flow V2 at the highest resolution.
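Schematically, the coarse-to-fine inference loop looks as follows; downsample, upsample2x, warp, and the per-level networks G[k] are simple placeholders (nearest-neighbor versions here), not the authors' implementation.

```python
import numpy as np

def downsample(img, f):
    return img[::f, ::f]                                       # naive subsampling stand-in

def upsample2x(flow):
    return np.repeat(np.repeat(flow, 2, axis=0), 2, axis=1)    # nearest-neighbor upsampling

def warp(img, flow):
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return img[y, x]                                           # nearest-neighbor backward warp

def spynet_inference(I1, I2, G, num_levels=3):
    """G: list of per-level callables; each maps (image1, warped image2, current flow)
    to a residual flow of the same spatial size."""
    pyr1 = [downsample(I1, 2 ** (num_levels - 1 - k)) for k in range(num_levels)]
    pyr2 = [downsample(I2, 2 ** (num_levels - 1 - k)) for k in range(num_levels)]
    flow = np.zeros(pyr1[0].shape[:2] + (2,))                  # zero flow at the coarsest level
    for k in range(num_levels):
        if k > 0:
            flow = 2.0 * upsample2x(flow)                      # upsample and rescale the flow
        I2w = warp(pyr2[k], flow)                              # warp second image by current flow
        flow = flow + G[k](pyr1[k], I2w, flow)                 # add the predicted residual flow
    return flow
```

With, for example, G = [lambda a, b, f: np.zeros(f.shape)] * 3 and power-of-two image sizes, the loop runs end-to-end and returns a zero flow field, which makes it easy to test the plumbing before plugging in trained networks.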
  • 47. A Large Dataset to Train ConvNets for Disparity, Optical Flow, and Scene Flow Estimation What is scene flow? Scene flow describes the 3D motion of scene points, just as optical flow describes their 2D motion in the image. Disparity estimation: first train only the early low-resolution losses, then enable the higher-resolution losses and phase out the low-resolution ones, then repurpose the deeper layers when they are no longer constrained by directly attached losses. Scene flow estimation: 1. interleave three pretrained networks (1x FlowNet and 2x DispNet); 2. jointly retrain on optical flow, 2x disparity, and disparity change.
  • 48. A Large Dataset to Train ConvNets for Disparity, Optical Flow, and Scene Flow Estimation Interleaving the weights of a FlowNet (green) and two DispNets (red and blue) into a SceneFlowNet. For every layer, the filter masks are created by taking the weights of one network (left) and setting the weights of the other networks to zero, respectively (middle). The outputs from each network are then concatenated to yield one big network with three times the number of inputs and outputs (right).
  • 49. A Large Dataset to Train ConvNets for Disparity, Optical Flow, and Scene Flow Estimation
  • 50. Accurate Optical Flow via Direct Cost Volume Processing by CNN Optical flow estimation operating on the full 4-d cost volume, sharing the structural benefits of leading stereo matching pipelines to yield high accuracy. The full 4-d cost volume can be constructed in a fraction of a second due to its regularity. Semi-global matching (SGM) is adapted to the 4-d setting, yielding a pipeline that achieves higher accuracy. A nonlinear feature embedding is learned with a convolutional network: image patches are embedded into a compact and discriminative feature space that is robust to the geometric and radiometric distortions encountered in optical flow estimation. Feature-space embeddings, as well as distances in this space, can be computed extremely efficiently with a small fully-convolutional network that embeds raw image patches into a compact Euclidean space. SGM is a common stand-in for more costly MRF optimization in stereo processing pipelines and is robust and parallelizable. EpicFlow uses locally-weighted affine models to synthesize a dense flow field from semi-dense matches.
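The regular structure that makes the 4-d cost volume cheap to build can be seen in a small NumPy sketch; the plain negative inner product as matching cost and the search range below are illustrative assumptions, and the real pipeline regularizes the volume with SGM before reading out a flow.

```python
import numpy as np

def cost_volume(f1, f2, r=4):
    """f1, f2: (H, W, C) feature maps. Returns costs of shape (H, W, 2r+1, 2r+1),
    one entry per pixel and per integer displacement (dy, dx) in [-r, r]^2."""
    h, w, c = f1.shape
    f2p = np.pad(f2, ((r, r), (r, r), (0, 0)))
    cost = np.empty((h, w, 2 * r + 1, 2 * r + 1))
    for iy in range(2 * r + 1):
        for ix in range(2 * r + 1):
            shifted = f2p[iy:iy + h, ix:ix + w]
            cost[:, :, iy, ix] = -(f1 * shifted).sum(-1)   # lower cost = better match
    return cost

def wta_flow(cost, r=4):
    """Winner-takes-all read-out: pick the minimum-cost displacement per pixel."""
    h, w = cost.shape[:2]
    idx = cost.reshape(h, w, -1).argmin(-1)
    dy, dx = np.unravel_index(idx, (2 * r + 1, 2 * r + 1))
    return np.stack([dx - r, dy - r], axis=-1).astype(float)   # (u, v) per pixel
```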
  • 51. Accurate Optical Flow via Direct Cost Volume Processing by CNN Qualitative results on three images from the KITTI 2015 training set.
  • 52. Appendix A: A Database and Evaluation Methodology for Optical Flow
  • 53. Limitations of Yosemite Yosemite is the only sequence used for quantitative evaluation. Limitations: • very simple and synthetic • small, rigid motion • minimal motion discontinuities/occlusions. (Figure: frames 7 and 8 of Yosemite, the ground-truth flow, and the flow color coding.)
  • 54. Limitations of Yosemite Yosemite is the only sequence used for quantitative evaluation. Current challenges: • non-rigid motion • real sensor noise • complex natural scenes • motion discontinuities. We need more challenging and more realistic benchmarks. (Figure: frames 7 and 8 of Yosemite, the ground-truth flow, and the flow color coding.)
  • 55. Realistic synthetic imagery • Randomly generate scenes with “trees” and “rocks” • Significant occlusions, motion, texture, and blur • Rendered using Mental Ray and a “lens shader” plugin. (Example sequences: Rock and Grove.)
  • 56. Modified stereo imagery • Recrop and resample ground-truth stereo datasets to have motion appropriate for optical flow. (Example sequences: Venus and Moebius.)
  • 57. Dense flow with hidden texture • Paint the scene with textured fluorescent paint • Take 2 images: one in visible light, one in UV light • Move the scene in very small steps using a robot • Generate ground truth by tracking the UV images. (Figure: the capture setup with UV lights, and the visible, UV, and cropped images.)
  • 58. Conclusions • Difficulty: the data is substantially more challenging than Yosemite • Diversity: substantial variation in difficulty across the various datasets • Motion GT vs. interpolation: the best algorithms for one are not the best for the other • Comparison with stereo: performance of existing flow algorithms appears weak. (Szeliski)
  • 59. Appendix B: Secrets of Optical Flow Estimation and Their Principles
  • 60. Classical Optical Flow Objective Function u and v are the horizontal and vertical components of the optical flow field to be estimated from images I1 and I2, λ is a regularization parameter, and ρD and ρS are the data and spatial penalty functions. The penalty functions: (1) the quadratic HS penalty, (2) the Charbonnier penalty, and (3) the Lorentzian.
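A common way to write this objective, in the spirit of the slide's description (the exact spatial discretization below is an assumption based on the standard formulation, with the data term summed over pixels and the spatial term over horizontal and vertical neighbors):

$$E(u,v)=\sum_{i,j}\Big\{\rho_D\big(I_1(i,j)-I_2(i+u_{i,j},\,j+v_{i,j})\big)+\lambda\big[\rho_S(u_{i,j}-u_{i+1,j})+\rho_S(u_{i,j}-u_{i,j+1})+\rho_S(v_{i,j}-v_{i+1,j})+\rho_S(v_{i,j}-v_{i,j+1})\big]\Big\}$$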
  • 61. Pre-processing Optimize the regularization parameter λ for the training sequences; Apply non-linear prefiltering of the images to reduce the influence of illumination changes; Use a standard brightness constancy model; Gradient only imposes constancy of the gradient vector at each pixel (i.e. it robustly penalizes Euclidean distance between image gradients); Simple derivative constancy is as good as the more sophisticated texture decomposition method.
  • 62. Coarse-to-fine estimation and GNC (Graduated Non-Convexity) The GNC (graduated non-convexity) scheme: linearly combine a quadratic objective with a robust objective in varying proportions, from fully quadratic to fully robust; The downsampling factor does not matter when using a convex penalty; a standard factor of 0.5 is fine; The GNC method is helpful even for the convex Charbonnier penalty function due to the nonlinearity of the data term.
  • 63. Interpolation method and derivatives Bicubic interpolation is more accurate than bilinear; Removing temporal averaging of the gradients, using Central difference filters, or using a 7-point derivative filter all reduce accuracy compared to the baseline, but not significantly. The spline-based interpolation scheme is consistently better; Temporal averaging of the derivatives is probably worthwhile for a small computational expense.
  • 64. Penalty functions The convex Charbonnier penalty performs better than the more robust, non-convex Lorentzian on both the training and test sets. One reason is that non-convex functions are more difficult to optimize, causing the optimization scheme to find a poor local optimum. The less-robust Charbonnier is preferable to the Lorentzian, and a slightly non-convex penalty function is better still.
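For reference, the penalties being compared are commonly defined as below; the epsilon, sigma, and exponent values are illustrative defaults rather than the tuned settings from the study.

```python
import numpy as np

def quadratic(x):
    return x ** 2                                      # Horn-Schunck (HS) penalty

def charbonnier(x, eps=1e-3):
    return np.sqrt(x ** 2 + eps ** 2)                  # convex, differentiable |x| surrogate

def generalized_charbonnier(x, eps=1e-3, a=0.45):
    return (x ** 2 + eps ** 2) ** a                    # slightly non-convex for a < 0.5

def lorentzian(x, sigma=1.0):
    return np.log(1.0 + x ** 2 / (2.0 * sigma ** 2))   # robust, non-convex penalty
```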
  • 65. Median filtering The baseline 5×5 median filter is better than both MF 3×3 and MF 7×7, but the difference is not significant. When 5×5 median filtering is performed twice (2× MF) or five times (5× MF) per warping step, the results are worse. Finally, removing the median filtering step (w/o MF) makes the computed flow significantly less accurate, with larger outliers.
  • 66. Appendix C: Machine (Deep) Learning and Optimization
  • 67. Graphical Models • Graphical Models: Powerful framework for representing dependency structure between random variables. • The joint probability distribution over a set of random variables. • The graph contains a set of nodes (vertices) that represent random variables, and a set of links (edges) that represent dependencies between those random variables. • The joint distribution over all random variables decomposes into a product of factors, where each factor depends on a subset of the variables. • Two type of graphical models: • Directed (Bayesian networks) • Undirected (Markov random fields, Boltzmann machines) • Hybrid graphical models that combine directed and undirected models, such as Deep Belief Networks, Hierarchical-Deep Models.
  • 68. Generative Model: MRF Random field: F = {F1, F2, …, FM} is a family of random variables on a set S in which each Fi takes a value fi in a label set L. Markov random field: F is said to be an MRF on S w.r.t. a neighborhood N if and only if it satisfies the Markov property. ◦ Generative model for the joint probability p(x) ◦ allows no direct probabilistic interpretation ◦ define potential functions Ψ on maximal cliques A ◦ map a joint assignment to a non-negative real number ◦ requires normalization. MRFs are undirected graphical models.
  • 69. A flow network G(V, E) is defined as a directed graph where each edge (u,v) in E has a non-negative capacity c(u,v) ≥ 0. The max-flow problem is to find the flow of maximum value on a flow network G. An s-t cut, or simply cut, of a flow network G is a partition of V into S and T = V − S, such that s ∈ S and t ∈ T. A minimum cut of a flow network is a cut whose capacity is the least over all the s-t cuts of the network. Methods for max-flow/min-cut: ◦ the Ford-Fulkerson method ◦ the "push-relabel" method.
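As a concrete illustration of the Ford-Fulkerson method, here is a compact Edmonds-Karp (BFS-based augmenting path) max-flow routine on a toy graph; the graph itself is an arbitrary example, and by the max-flow/min-cut theorem the returned value equals the capacity of a minimum s-t cut.

```python
from collections import deque

def max_flow(capacity, s, t):
    """capacity: dict of dicts, capacity[u][v] = edge capacity. Returns the max s-t flow value."""
    nodes = set(capacity) | {v for u in capacity for v in capacity[u]}
    residual = {u: {} for u in nodes}
    for u in capacity:
        for v, c in capacity[u].items():
            residual[u][v] = residual[u].get(v, 0) + c
            residual[v].setdefault(u, 0)          # zero-capacity reverse edge
    flow = 0
    while True:
        parent = {s: None}                        # BFS for a shortest augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow                           # no augmenting path left: flow is maximal
        bottleneck, v = float('inf'), t           # bottleneck capacity along the path
        while parent[v] is not None:
            u = parent[v]
            bottleneck = min(bottleneck, residual[u][v])
            v = u
        v = t
        while parent[v] is not None:              # push flow and update residual capacities
            u = parent[v]
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
            v = u
        flow += bottleneck

graph = {'s': {'a': 3, 'b': 2}, 'a': {'b': 1, 't': 2}, 'b': {'t': 3}}
print(max_flow(graph, 's', 't'))                  # prints 5 for this toy network
```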
  • 70. Labeling is mostly solved as an energy minimization problem. Two common energy models: ◦ the Potts interaction energy model ◦ the linear interaction energy model. The graph G contains two kinds of vertices: p-vertices and i-vertices; ◦ all the edges in the neighborhood N are called n-links ◦ edges between the p-vertices and the i-vertices are called t-links. In the multiple-labeling case, the multi-way cut should leave each p-vertex connected to exactly one i-vertex. The minimum-cost multi-way cut will minimize the energy function, where the severed n-links correspond to the boundaries of the labeled regions. Approximation algorithms to find this multi-way cut: ◦ the "alpha-expansion" algorithm ◦ the "alpha-beta swap" algorithm.
  • 71. Deep Learning Representation learning attempts to automatically learn good features or representations; Deep learning algorithms attempt to learn multiple levels of representation of increasing complexity/abstraction (intermediate and high level features); Become effective via unsupervised pre-training + supervised fine tuning; ◦ Deep networks trained with back propagation (without unsupervised pre-training) perform worse than shallow networks. Deal with the curse of dimensionality (smoothing & sparsity) and over-fitting (unsupervised, regularizer); Semi-supervised: structure of manifold assumption; ◦ labeled data is scarce and unlabeled data is abundant.
  • 72. Why Deep Learning? Supervised training of deep models (e.g., many-layered nets) is too hard (an optimization problem); ◦ learn a prior from unlabeled data. Shallow models are not suited for learning high-level abstractions; ◦ ensembles or forests do not learn features first; ◦ graphical models could be deep nets, but mostly are not. Unsupervised learning could be "local learning"; ◦ it resembles boosting, with each layer being like a weak learner. Learning is weak in directed graphical models with many hidden variables; ◦ sparsity and regularizers. Traditional unsupervised learning methods are not designed to learn multiple levels of representation; ◦ layer-wise unsupervised learning is the solution. Multi-task learning (transfer learning and self-taught learning); other issues: scalability and parallelism under the burden of big data.
  • 73. Multi Layer Neural Network A neural network = running several logistic regressions at the same time; ◦ neuron = logistic regression or… Calculate error derivatives (gradients) to refine: back-propagate the error derivative through the model (the chain rule) ◦ Online learning: stochastic/incremental gradient descent ◦ Batch learning: conjugate gradient descent
  • 74. Problems in MLPs Multi Layer Perceptrons (MLPs), one feed-forward neural network, were popularly used for decades. Gradient is progressively getting more scattered ◦ Below the top few layers, the correction signal is minimal Gets stuck in local minima ◦ Especially start out far from ‘good’ regions (i.e., random initialization) In usual settings, use only labeled data ◦ Almost all data is unlabeled! ◦ Instead the human brain can learn from unlabeled data.
  • 75. Convolutional Neural Networks A CNN is a special kind of multi-layer NN applied to 2-d arrays (usually images), based on spatially localized neural input; ◦ local receptive fields (shifted windows), shared weights (weight averaging) across the hidden units, and often spatial or temporal sub-sampling; ◦ related to generative MRF / discriminative CRF: ◦ CNN = Field of Experts MRF = ML inference in a CRF; ◦ generates 'patterns of patterns' for pattern recognition. Each layer combines (merges, smooths) patches from previous layers ◦ Pooling/sampling (e.g., max or average) filters: compress and smooth the data ◦ Convolution filters: (translation invariance) unsupervised ◦ Local contrast normalization: increases sparsity, improves optimization/invariance. C layers perform convolutions, S layers pool/sample.
  • 76. Convolutional Neural Networks Convolutional networks are trainable multistage architectures composed of multiple stages; the input and output of each stage are sets of arrays called feature maps. At the output, each feature map represents a particular feature extracted at all locations of the input. Each stage is composed of a filter bank layer, a non-linearity layer, and a feature pooling layer; a ConvNet is composed of 1, 2 or 3 such 3-layer stages, followed by a classification module; ◦ a fully connected layer with a softmax transfer function gives the posterior distribution. Filter: a trainable filter (kernel) in the filter bank connects an input feature map to an output feature map (* denotes the discrete convolution operator). Nonlinearity: a pointwise sigmoid tanh() or a rectified sigmoid abs(gi·tanh()) function; ◦ in the rectified function, gi is a trainable gain parameter, possibly followed by a contrast normalization N. Feature pooling: treats each feature map separately and produces a reduced-resolution output feature map. Supervised training is performed using a form of SGD to minimize the prediction error; ◦ gradients are computed with the back-propagation method. Unsupervised pre-training: predictive sparse decomposition (PSD), followed by supervised fine-tuning.
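One such stage (filter bank, pointwise tanh, max pooling) can be sketched in plain NumPy; the filter values and sizes are arbitrary illustrations, not a trained filter bank.

```python
import numpy as np

def conv2d_valid(x, k):
    """Single-channel 'valid' 2-d convolution (correlation form) of image x with kernel k."""
    kh, kw = k.shape
    h, w = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = (x[i:i + kh, j:j + kw] * k).sum()
    return out

def max_pool(x, s=2):
    h, w = (x.shape[0] // s) * s, (x.shape[1] // s) * s
    return x[:h, :w].reshape(h // s, s, w // s, s).max(axis=(1, 3))

def convnet_stage(image, filters):
    """filters: list of 2-d kernels (the trainable filter bank). Returns one pooled
    feature map per filter: convolution -> tanh nonlinearity -> max pooling."""
    return [max_pool(np.tanh(conv2d_valid(image, k))) for k in filters]

img = np.random.rand(16, 16)
fmaps = convnet_stage(img, [np.random.randn(3, 3) for _ in range(4)])  # 4 feature maps, 7x7 each
```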
  • 77. Belief Nets A belief net is a directed acyclic graph composed of stochastic variables. We can observe some of the variables and solve two problems: ◦ inference: infer the states of the unobserved variables; ◦ learning: adjust the interactions between variables to make the network more likely to generate the observed data. Use nets composed of layers of stochastic variables with weighted connections. (Figure: stochastic hidden causes with directed links to visible effects.)
  • 78. Boltzmann Machines Energy-based models associate an energy to each configuration of the stochastic variables of interest (for example, MRF, nearest neighbor); ◦ learning means adjusting the shape of the energy function so that observed configurations have low energy. A Boltzmann machine is a stochastic recurrent model with hidden variables; ◦ trained with Markov chain Monte Carlo, i.e., MCMC sampling (appendix). A restricted Boltzmann machine is a special case: ◦ only one layer of hidden units; ◦ factorization within each layer's neurons/units (no connections in the same layer). Contrastive divergence: an approximation of the gradient (appendix).
  • 79. Deep Belief Networks A hybrid model: can be trained as generative or discriminative model; Deep architecture: multiple layers (learn features layer by layer); ◦ Multi layer learning is difficult in sigmoid belief networks. ◦ Top two layers are undirected connections, RBM; ◦ Lower layers get top down directed connections from layers above; Unsupervised or self-taught pre-learning provides a good initialization; ◦ Greedy layer-wise unsupervised training for RBM Supervised fine-tuning ◦ Generative: wake-sleep algorithm (Up-down) ◦ Discriminative: back propagation (bottom-up)
  • 80. Deep Boltzmann Machine Learns internal representations that become increasingly complex; High-level representations are built from a large supply of unlabeled inputs; Pre-training consists of learning a stack of modified RBMs, which are composed to create a deep Boltzmann machine (an undirected graph); Generative fine-tuning, different from the DBN: ◦ positive and negative phases (appendix); Discriminative fine-tuning, the same as for the DBN: ◦ back-propagation.
  • 81. Denoising Auto-Encoder A multilayer NN with target output = input; Reconstruction = decoder(encoder(input)); ◦ Perturb the input x into a corrupted version; ◦ e.g., randomly set some of the input coordinates to zero; ◦ Recover x from the encoding of the perturbed data. Learns a vector field pointing towards higher-probability regions; Pre-trained with a DBN, or regularized via the perturbed training data; Minimizes a variational lower bound on a generative model; ◦ Corresponds to regularized score matching on an RBM; PCA = linear manifold = linear auto-encoder; an auto-encoder learns the salient variations like a nonlinear PCA.
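A minimal NumPy sketch of the idea, assuming a one-hidden-layer tied-weight auto-encoder, random binary toy data, and zero-masking corruption; the sizes, corruption rate, and learning rate are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, lr = 20, 8, 0.1
W = rng.standard_normal((n_in, n_hid)) * 0.1     # tied weights: decoder uses W.T
b_enc, b_dec = np.zeros(n_hid), np.zeros(n_in)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
data = (rng.random((500, n_in)) > 0.5).astype(float)   # toy binary training data

for epoch in range(10):
    for x in data:
        x_tilde = x * (rng.random(n_in) > 0.3)          # corrupt: zero out ~30% of coords
        h = sigmoid(x_tilde @ W + b_enc)                # encode the corrupted input
        x_hat = sigmoid(h @ W.T + b_dec)                # decode / reconstruct
        d_out = (x_hat - x) * x_hat * (1 - x_hat)       # squared error w.r.t. the CLEAN x
        d_hid = (d_out @ W) * h * (1 - h)
        W -= lr * (np.outer(x_tilde, d_hid) + np.outer(d_out, h))  # tied-weight gradient
        b_dec -= lr * d_out
        b_enc -= lr * d_hid
```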
  • 82. Stacked Denoising Auto-Encoder Stack many (possibly sparse) auto-encoders in succession and train them using greedy layer-wise unsupervised learning; ◦ Drop the decoder layer each time; ◦ Performs better than stacking RBMs; Supervised training on the last layer using the final features; (optionally) supervised training on the entire network to fine-tune all weights of the neural net; Empirically not quite as accurate as DBNs.
  • 83. Belief Propagation (BP) A simplified Bayes-net inference method: it propagates information throughout a graphical model via a series of messages passed between neighboring nodes iteratively; it is likely to converge to a consensus that determines the marginal probabilities of all the variables; messages estimate the cost (or energy) of a configuration of a clique given all other cliques; the messages are then combined to compute a belief (marginal or maximum probability); Two types of BP methods: ◦ max-product; ◦ sum-product. BP provides the exact solution when there are no loops in the graph! It is equivalent to dynamic programming/Viterbi in these cases; Loopy belief propagation still provides an approximate (but often good) solution.
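A small worked example of sum-product BP on a loop-free model: a 3-node chain of binary variables with made-up evidence and compatibility tables, where the computed beliefs are exact marginals.

```python
import numpy as np

# Sum-product BP on a chain x1 - x2 - x3 of binary variables (no loops, so exact).
# phi[k] is the local evidence for node k; psi[k] couples node k and node k+1.
# All numbers are invented for illustration.
phi = [np.array([0.7, 0.3]), np.array([0.5, 0.5]), np.array([0.2, 0.8])]
psi = [np.array([[0.9, 0.1], [0.1, 0.9]]), np.array([[0.8, 0.2], [0.2, 0.8]])]

# Forward messages (from the left) and backward messages (from the right).
fwd = [np.ones(2) for _ in range(3)]
bwd = [np.ones(2) for _ in range(3)]
fwd[1] = psi[0].T @ (phi[0] * fwd[0])
fwd[2] = psi[1].T @ (phi[1] * fwd[1])
bwd[1] = psi[1] @ (phi[2] * bwd[2])
bwd[0] = psi[0] @ (phi[1] * bwd[1])

# Belief (marginal) at each node = local evidence times incoming messages, normalized.
for k in range(3):
    belief = phi[k] * fwd[k] * bwd[k]
    print("p(x%d) =" % (k + 1), belief / belief.sum())
```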
  • 84. Generalized BP for pairwise MRFs ◦ Hidden variables xi and xj are connected through a compatibility function ψij(xi, xj); ◦ Hidden variables xi are connected to observable variables yi by the local "evidence" function φi(xi, yi); The joint probability of {x} is given by P({x}, {y}) = (1/Z) Πij ψij(xi, xj) Πi φi(xi, yi). To improve inference, take into account higher-order interactions among the variables; ◦ An intuitive way is to define messages that propagate between groups of nodes rather than just single nodes; ◦ This is the intuition behind Generalized Belief Propagation (GBP).
  • 85. Stochastic Gradient Descent (SGD) • The general class of estimators that arise as minimizers of sums are called M-estimators; • The stationary points of the likelihood function are the zeroes of its derivative, the score function; • Online gradient descent samples a subset of the summand functions at every step; • The true gradient is approximated by the gradient at a single example; • The training set is shuffled at each pass. • There is a compromise between the two forms, often called "mini-batches", where the true gradient is approximated by a sum over a small number of training examples. • SGD converges almost surely to a global minimum when the objective function is convex or pseudo-convex, and otherwise converges almost surely to a local minimum.
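A minimal mini-batch SGD sketch on a toy least-squares problem (data, learning rate, and batch size are illustrative only), showing the per-pass shuffling and the mini-batch gradient approximation.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 1000, 5
X = rng.standard_normal((N, D))
w_true = rng.standard_normal(D)
y = X @ w_true + 0.01 * rng.standard_normal(N)

w = np.zeros(D)
lr, batch_size = 0.05, 32
for epoch in range(20):
    order = rng.permutation(N)                      # shuffle the training set each pass
    for start in range(0, N, batch_size):
        idx = order[start:start + batch_size]       # one mini-batch
        grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
        w -= lr * grad                              # noisy step along the negative gradient
print("parameter error:", np.linalg.norm(w - w_true))
```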
  • 86. Back Propagation The per-sample loss E(f(x0, w), y0) measures the mismatch between the network output f(x0, w) and the target y0, e.g., the negative log-likelihood E = -log P(y0 | x0, w); back-propagation computes its gradient with respect to the weights w layer by layer using the chain rule.
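A hedged sketch of one forward/backward pass on a tiny two-layer network with a softmax output and negative log-likelihood loss; the architecture, data, and learning rate are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 4, 8, 3
W1 = rng.standard_normal((n_in, n_hid)) * 0.1
b1 = np.zeros(n_hid)
W2 = rng.standard_normal((n_hid, n_out)) * 0.1
b2 = np.zeros(n_out)

def forward(x):
    h = np.tanh(x @ W1 + b1)                 # hidden layer activations
    z = h @ W2 + b2
    p = np.exp(z - z.max()); p /= p.sum()    # softmax output f(x, w)
    return h, p

x0 = rng.standard_normal(n_in)
y0 = 2                                        # target class index
h, p = forward(x0)
loss = -np.log(p[y0])                         # E = -log p(y0 | x0, w)

# Backward pass: propagate the error derivative from output to input, layer by layer.
dz = p.copy(); dz[y0] -= 1.0                  # dE/dz for softmax + negative log-likelihood
dW2 = np.outer(h, dz); db2 = dz
dh = W2 @ dz
da = dh * (1 - h ** 2)                        # tanh'(a) = 1 - tanh(a)^2
dW1 = np.outer(x0, da); db1 = da

lr = 0.1                                      # one gradient step on all parameters
W2 -= lr * dW2; b2 -= lr * db2; W1 -= lr * dW1; b1 -= lr * db1
print("loss before step:", loss)
```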
  • 87. Variable Learning Rate Too large a learning rate ◦ causes oscillation in the search for the minimum; Too small a learning rate ◦ gives too slow convergence to the minimum; Adaptive learning rate ◦ At the beginning, the learning rate can be large while the current point is far from the optimum; ◦ Gradually, the learning rate decays as time goes by. It should not be too large or too small: ◦ annealing rate 𝛼(𝑡)=𝛼(0)/(1+𝑡/𝑇) ◦ 𝛼(𝑡) eventually goes to zero, but at the beginning it is almost constant.
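The annealing schedule above, written out; alpha0 and T below are arbitrary example values, chosen only to show the "almost constant, then roughly 1/t" behavior.

```python
# Annealed learning rate: nearly constant for t << T, then decaying roughly like 1/t.
def annealed_lr(t, alpha0=0.1, T=1000.0):
    return alpha0 / (1.0 + t / T)

for t in [0, 100, 1000, 10000, 100000]:
    print(t, round(annealed_lr(t), 5))
```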
  • 90. Dropout and Maxout for Overfitting Dropout: set the output of each hidden neuron to zero with probability 0.5. ◦ Motivation: combining many different models that share parameters succeeds in reducing test errors by approximately averaging together the predictions, which resembles bagging. ◦ The units that are "dropped out" in this way do not contribute to the forward pass and do not participate in back-propagation. ◦ So every time an input is presented, the NN samples a different architecture, but all these architectures share weights. ◦ This technique reduces complex co-adaptations of units, since a neuron cannot rely on the presence of particular other units. ◦ It is therefore forced to learn more robust features that are useful in conjunction with many different random subsets of the other units. ◦ Without dropout, the network exhibits substantial overfitting. ◦ Dropout roughly doubles the number of iterations required to converge. Maxout takes the maximum across multiple feature maps.
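A minimal sketch of classic dropout (zero each hidden unit with probability 0.5 at training time, scale activations at test time so expectations match) and of a maxout activation; the array sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(h, p_drop=0.5, train=True):
    """Classic dropout: zero each unit with probability p_drop at training time;
    at test time keep all units but scale activations by (1 - p_drop)."""
    if train:
        mask = rng.random(h.shape) >= p_drop
        return h * mask
    return h * (1.0 - p_drop)

h = rng.standard_normal(10)
print(dropout_forward(h, train=True))    # a different random sub-network each pass
print(dropout_forward(h, train=False))   # deterministic, scaled activations at test time

# Maxout: the activation is the maximum across several candidate feature maps.
z = rng.standard_normal((3, 10))         # 3 linear feature maps (pieces)
print(z.max(axis=0))
```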
  • 91. Weight Decay for Overfitting Weight decay, or L2 regularization, adds a penalty term to the error function, called the regularization term: the negative log prior in the Bayesian justification; ◦ Weight decay works by rescaling the weights in the learning rule, while the bias learning stays the same; ◦ It prefers to learn small weights; large weights are allowed only if they considerably improve the original cost function; ◦ A way of compromising between finding small weights and minimizing the original cost function; In a linear model, weight decay is equivalent to ridge (Tikhonov) regression; L1 regularization: the weights that are not really useful shrink by a constant amount toward zero; ◦ Acts like a form of feature selection; ◦ Makes the input filters cleaner and easier to interpret; L2 regularization penalizes large values strongly, while L1 regularization drives many weights exactly to zero (sparsity); Markov chain Monte Carlo (MCMC): simulating a Markov chain whose equilibrium distribution is the posterior distribution over weights & hyper-parameters; Hybrid Monte Carlo: gradient plus sampling.
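A small sketch contrasting the L2 and L1 update rules (the learning rate and penalty strength are made-up values); setting the data gradient to zero isolates the shrinkage behavior of each penalty. Biases would typically be excluded from the penalty, as noted above.

```python
import numpy as np

def sgd_step_l2(w, grad, lr=0.1, lam=0.01):
    # L2 penalty adds lam*w to the gradient: a multiplicative shrinkage ("rescaling").
    return (1.0 - lr * lam) * w - lr * grad

def sgd_step_l1(w, grad, lr=0.1, lam=0.01):
    # L1 penalty subtracts a constant-magnitude amount toward zero each step.
    return w - lr * grad - lr * lam * np.sign(w)

w = np.array([2.0, -0.5, 0.001])
g = np.zeros_like(w)                 # zero data gradient, to show only the penalty's effect
print(sgd_step_l2(w, g))             # every weight shrinks proportionally
print(sgd_step_l1(w, g))             # small weights are pushed to exactly zero -> sparsity
```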
  • 92. Early Stopping for Overfitting Steps in early stopping: ◦ Divide the available data into training and validation sets. ◦ Use a large number of hidden units. ◦ Use very small random initial values. ◦ Use a slow learning rate. ◦ Compute the validation error rate periodically during training. ◦ Stop training when the validation error rate "starts to go up". Early stopping has several advantages: ◦ It is fast. ◦ It can be applied successfully to networks in which the number of weights far exceeds the sample size. ◦ It requires only one major decision by the user: what proportion of validation cases to use. Practical issues in early stopping: ◦ How many cases do you assign to the training and validation sets? ◦ Do you split the data into training and validation sets randomly or by some systematic algorithm? ◦ How do you tell when the validation error rate "starts to go up"?
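A skeleton of the procedure, assuming hypothetical train_step and val_error callables supplied by the user (they are not part of the slides); the toy demo at the bottom is purely illustrative.

```python
import numpy as np

def train_with_early_stopping(train_step, val_error, max_epochs=500, patience=10):
    """Stop when the validation error 'starts to go up', tolerating a few noisy increases."""
    best_err, best_params, wait = np.inf, None, 0
    for epoch in range(max_epochs):
        params = train_step()                 # one pass over the training set
        err = val_error(params)               # periodic validation error
        if err < best_err:
            best_err, best_params, wait = err, params, 0
        else:
            wait += 1                          # validation error went up again
            if wait >= patience:
                break
    return best_params, best_err

# Toy demo: "training" just shrinks a scalar; validation error is a noisy parabola.
rng = np.random.default_rng(0)
state = {"w": 5.0}
def train_step():
    state["w"] -= 0.2 * state["w"]            # pretend gradient step toward 0
    return state["w"]
def val_error(w):
    return (w - 1.0) ** 2 + 0.01 * rng.standard_normal()
print(train_with_early_stopping(train_step, val_error))
```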
  • 93. MCMC Sampling for Optimization Markov chain: a stochastic process in which future states are independent of past states given the present state. ◦ A Markov chain will typically converge to a stable (stationary) distribution. Markov chain Monte Carlo: sampling using 'local' information ◦ Devise a Markov chain whose stationary distribution is the target. ◦ An ergodic MC must be aperiodic, irreducible, and positive recurrent. ◦ Use Monte Carlo integration to get the quantities of interest. Metropolis-Hastings method: sampling from a target distribution ◦ Create a Markov chain whose transition matrix does not depend on the normalization term. ◦ Make sure the chain has a stationary distribution and that it is equal to the target distribution (acceptance ratio). ◦ After a sufficient number of iterations, the chain will converge to the stationary distribution. Gibbs sampling is a special case of M-H sampling. ◦ The Hammersley-Clifford theorem: get the joint distribution from the complete conditional distributions. Hybrid Monte Carlo: a gradient sub-step for each Markov chain step.
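A minimal Metropolis-Hastings sketch sampling from an unnormalized 1-D Gaussian target with a symmetric random-walk proposal (target, step size, and burn-in length are arbitrary choices); note that the acceptance ratio never needs the normalization constant.

```python
import numpy as np

rng = np.random.default_rng(0)
target = lambda x: np.exp(-0.5 * ((x - 3.0) / 1.5) ** 2)   # unnormalized N(3, 1.5^2)

x, samples = 0.0, []
for t in range(20000):
    x_prop = x + rng.normal(scale=1.0)              # propose a local move
    accept = min(1.0, target(x_prop) / target(x))   # M-H ratio (symmetric proposal)
    if rng.random() < accept:
        x = x_prop
    if t > 2000:                                    # discard burn-in before convergence
        samples.append(x)
print("sample mean (should be near 3):", np.mean(samples))
```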
  • 94. Mean Field for Optimization Variational approximation modifies the optimization problem to make it tractable, at the price of an approximate solution; Mean field replaces M with a (simple) subset M(F), on which A*(μ) has a closed form (note: F is the fully disconnected graph); ◦ The density becomes a factorized product distribution in this sub-family. ◦ Objective: K-L divergence. Mean field is a structured variational approximation approach: ◦ coordinate ascent (deterministic); Compared with stochastic approximation (sampling): ◦ faster, but possibly not exact.
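As one concrete instance (not from the slides), naive mean field for a small binary pairwise MRF: the joint is replaced by a fully factorized distribution and each factor's mean is updated by deterministic coordinate ascent until convergence; the couplings and fields are random toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 8
J = 0.3 * rng.standard_normal((n, n)); J = (J + J.T) / 2   # symmetric pairwise couplings
np.fill_diagonal(J, 0.0)
h = 0.5 * rng.standard_normal(n)                           # local fields

m = np.zeros(n)                      # mean of each +/-1 variable under the factorized q
for sweep in range(100):
    m_old = m.copy()
    for i in range(n):               # deterministic coordinate-ascent update
        m[i] = np.tanh(h[i] + J[i] @ m)
    if np.max(np.abs(m - m_old)) < 1e-6:
        break
print("mean-field marginals p(x_i = +1):", np.round((1 + m) / 2, 3))
```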
  • 95. Contrastive Divergence for RBMs Contrastive divergence (CD) was first proposed for training PoE (products of experts), and is also a quicker way to learn RBMs; ◦ Contrastive divergence is the new objective; ◦ Take gradients and ignore a term which is usually very small. Steps: ◦ Start with a training vector on the visible units. ◦ Then alternate between updating all the hidden units in parallel and updating all the visible units in parallel. It can be applied using any MCMC algorithm to simulate the model (not limited to just Gibbs sampling); CD learning is biased: it does not exactly follow the gradient of the likelihood; Improvement: persistent CD explores more modes in the distribution ◦ Rather than restarting from data samples, continue sampling from the chain samples obtained at the last gradient update. ◦ It still suffers from divergence of the likelihood due to missed modes. Score matching: the score function does not depend on the normalization factor, so match it between the model and the empirical density.
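A CD-1 sketch for a small binary RBM on toy data (sizes and learning rate are illustrative), following the steps above: start the chain at a training vector, do one alternating Gibbs update of hidden then visible units, and use the difference of correlations as the approximate gradient.

```python
import numpy as np

rng = np.random.default_rng(0)
n_vis, n_hid, lr = 12, 6, 0.05
W = 0.1 * rng.standard_normal((n_vis, n_hid))
a, b = np.zeros(n_vis), np.zeros(n_hid)                    # visible / hidden biases
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
data = (rng.random((200, n_vis)) > 0.5).astype(float)      # toy binary training data

for epoch in range(10):
    for v0 in data:
        ph0 = sigmoid(v0 @ W + b)                          # positive phase
        h0 = (ph0 > rng.random(n_hid)).astype(float)
        pv1 = sigmoid(h0 @ W.T + a)                        # reconstruct visibles
        v1 = (pv1 > rng.random(n_vis)).astype(float)
        ph1 = sigmoid(v1 @ W + b)                          # negative phase (1 Gibbs step)
        W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))  # approximate gradient update
        a += lr * (v0 - v1)
        b += lr * (ph0 - ph1)
```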
  • 96. "Wake-Sleep" Algorithm for DBN A pre-trained DBN is a generative model; Do a stochastic bottom-up pass (wake phase) ◦ Get samples from the factorial distribution (visible first, then generate hidden); ◦ Adjust the top-down weights to be good at reconstructing the feature activities in the layer below. Do a few iterations of sampling in the top-level RBM ◦ Adjust the weights in the top-level RBM. Do a stochastic top-down pass (sleep phase) ◦ Get visible and hidden samples generated by the generative model, using data coming from nowhere! ◦ Adjust the bottom-up weights to be good at reconstructing the feature activities in the layer above. ◦ Any guarantee of improvement? No! The "wake-sleep" algorithm tries to make the representation economical (in the sense of Shannon's coding theory).
  • 97. Greedy Layer-Wise Training Deep networks tend to have more local-minima problems than shallow networks during supervised training; Train the first layer using unlabeled data ◦ Supervised or semi-supervised: use more unlabeled data. Freeze the first-layer parameters and train the second layer; Repeat this for as many layers as desired ◦ Builds more robust features; Use the outputs of the final layer to train the last supervised layer (leave the early weights frozen); Fine-tune the full network with a supervised approach; Avoids the problems of training a deep net in a purely supervised fashion: ◦ each layer gets full learning; ◦ helps with ineffective early-layer learning; ◦ helps with deep-network local minima. (A sketch of the procedure follows.)
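A compact sketch of the procedure under simplifying assumptions: tied-weight sigmoid auto-encoders stand in for the unsupervised layer learners, the data and labels are toy values invented here, and the final full-network fine-tuning step is only noted in a comment.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hid, lr=0.1, epochs=20):
    """One tied-weight sigmoid auto-encoder trained with plain SGD; returns encoder params."""
    n_in = X.shape[1]
    W, b_e, b_d = 0.1 * rng.standard_normal((n_in, n_hid)), np.zeros(n_hid), np.zeros(n_in)
    for _ in range(epochs):
        for x in X:
            h = sigmoid(x @ W + b_e)
            x_hat = sigmoid(h @ W.T + b_d)
            d_o = (x_hat - x) * x_hat * (1 - x_hat)
            d_h = (d_o @ W) * h * (1 - h)
            W -= lr * (np.outer(x, d_h) + np.outer(d_o, h)); b_e -= lr * d_h; b_d -= lr * d_o
    return W, b_e

X = (rng.random((300, 16)) > 0.5).astype(float)            # unlabeled toy data
y = (X[:, 0] + X[:, 1] > 1).astype(int)                    # toy labels for the top layer

W1, b1 = train_autoencoder(X, 8)                           # layer 1: unsupervised, raw input
H1 = sigmoid(X @ W1 + b1)                                  # freeze layer 1
W2, b2 = train_autoencoder(H1, 4)                          # layer 2: trained on layer-1 codes
H2 = sigmoid(H1 @ W2 + b2)

w_top, b_top = np.zeros(4), 0.0                            # supervised logistic layer on top
for _ in range(100):
    p = sigmoid(H2 @ w_top + b_top)
    w_top -= 0.1 * H2.T @ (p - y) / len(y); b_top -= 0.1 * np.mean(p - y)
# (A full pipeline would now fine-tune all of W1, W2, w_top jointly by back-propagation.)
print("train accuracy:", np.mean((sigmoid(H2 @ w_top + b_top) > 0.5) == y))
```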
  • 98. Why Does Greedy Layer-Wise Training Work? It takes advantage of the unlabeled data; Regularization hypothesis ◦ Pre-training "constrains" the parameters to a region relevant to the unsupervised dataset; ◦ Better generalization (representations that better describe unlabeled data are more discriminative for labeled data); Optimization hypothesis ◦ Unsupervised training initializes the lower-level parameters near better local minima than random initialization does; Only fine-tuning is needed in the supervised learning stage.
  • 99. Two-Stage Pre-training in DBMs Pre-training in one stage ◦ Positive phase: clamp the observed units, sample the hidden units, using a variational approximation (mean field); ◦ Negative phase: sample both observed and hidden units, using persistent sampling (stochastic approximation: MCMC); Pre-training in two stages ◦ Approximate the posterior distribution over the states of the hidden units (with a simpler directed deep model such as a DBN or stacked DAE); ◦ Train an RBM by updating the parameters to maximize the lower bound of the log-likelihood and the corresponding posterior of the hidden units; ◦ Options (CAST, contrastive divergence, stochastic approximation, ...).