Emerging 3D Scanning Technologies for PropTech
Falling costs with rising quality via hardware innovations and deep learning
Outline of the presentation
Structure from Motion (SfM): low-cost passive sensing
360° imaging: omnidirectional immersive images and videos
Range sensing: structured light, Matterport, Kinect for example
Laser scanning: LiDARs from Velodyne for example
Data-driven processing: deep learning
3D datasets: with what to train your deep learning pipelines
Future prospects: short overview of future applications
The presentation is meant as a technical introduction to typical hardware and software
processing techniques used in real estate and construction site scanning.
Computer scientists new to proptech organizations and the real estate field in general might
find this presentation especially useful. The reader is assumed to be familiar with the basics
of deep learning.
Data structures for real estate scans
RGB+D: pixel grid representing color and depth
Example from Prof. Li
Mesh (Polygon): from voxel data (“3D pixels”)
Voxel grid meshing using marching cubes (StackExchange)
Point Cloud: typically unordered data (i.e. not on a grid but sparse)
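These representations are straightforward to move between in code. Below is a minimal sketch (assuming numpy and scikit-image are available; the intrinsic matrix K and the toy depth/voxel values are made up for illustration) that unprojects a depth grid into an unordered point cloud and meshes a voxel grid with marching cubes.

```python
# Minimal sketch of moving between the representations above.
# Assumptions: numpy + scikit-image; K, the depth map and the voxel grid are
# toy values chosen only for illustration.
import numpy as np
from skimage import measure

# RGB+D: a pixel grid of color and depth -> unprojected into a point cloud.
K = np.array([[525.0, 0, 319.5],
              [0, 525.0, 239.5],
              [0, 0, 1.0]])                      # assumed pinhole intrinsics
depth = np.full((480, 640), 2.0)                 # toy depth map, metres
v, u = np.indices(depth.shape)                   # pixel coordinates
z = depth.ravel()
x = (u.ravel() - K[0, 2]) * z / K[0, 0]
y = (v.ravel() - K[1, 2]) * z / K[1, 1]
point_cloud = np.column_stack([x, y, z])         # unordered N x 3 points

# Voxel grid ("3D pixels") -> polygon mesh via marching cubes.
voxels = np.zeros((32, 32, 32))
voxels[8:24, 8:24, 8:24] = 1.0                   # a solid cube as toy data
verts, faces, normals, values = measure.marching_cubes(voxels, level=0.5)
```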
PropTech resources for domain insights
https://www.inman.com/
Inman Hacker Connect is created by and for the real
estate technology community. Debate, discuss and
define the future of real estate’s most pressing tech
issues at Hacker Connect. Join more than 400
engineers, developers, designers, product managers,
database architects, webmasters, and technology
executives from across the real estate space. Build
partnerships, connect with peers, tackle thorny tech
issues, learn best practices, discover innovative
breakthroughs and collaborate during special
hands-on keyboard sessions at this day-long, tech-
first event.
WHY YOU SHOULD ATTEND Hear from industry
leaders on APIs, bots, data security, ownership, user
experience, blockchain and more. Take part in
collaborative hands-on-keyboard sessions and
come out with a new tool to apply to your job. Learn
how to better integrate data, workflows and be
competitive in your recruitment efforts
https://www.inman.com/event/hacker-17-sf/ http://www.moderneventures.com/accelerator/
https://gust.com/accelerators/moderne-accelerator
Pi Labs is Europe’s first venture capital platform
investing exclusively in early stage ventures in the
property tech vertical. London, United Kingdom.
http://pilabs.co.uk/
http://www.jamesdearsley.co.uk/
“The only PropTech site for the latest Property
Technology news and views”
#PropTech community across Europe. Join us for our next event in #Berlin
http://futureproptech.de/
Structure from Motion (SfM)
Low-cost passive sensing
Structure from Motion: basics
Structure-from-Motion (SfM). Instead of a
single stereo pair, the SfM technique requires
multiple, overlapping photographs as input to
feature extraction and 3-D reconstruction
algorithms. - Westoby et al
praehistorische-archaeologie.de - Florian Tubbesing
Structure from Motion can achieve good
accuracy compared to laser scanners.
James and Robson (2012)
Cited by 281 Articles, and see Related articles
This volcanic bomb (~10 cm across) from Soufrière Hills
volcano was scanned by an Arius3d laser scanner (
Stuart Robson, University College London) and also
reconstructed using the SfM-MVS technique, with the
results scaled by sfm_georef. Differences between cross
sections through the two models have RMS values of
~0.3 mm. Point cloud: low res (6 Mb)
http://www.lancaster.ac.uk/staff/jamesm/software/sfm_georef.htm
SfM method basically computes the relative camera
positions between all related photos. After every
relative camera position is found, the scheme uses
these matrices to reconstruct all feature points using
triangulation. Thus there are two main problems:
1) Image registration (e.g. SIFT, SURF, ORB, etc)
2) Pose Estimation (e.g. Perspective-n-Point with RANSAC)
By Dr Calle Olsson
https://www.youtube.com/watch?v=i7ierVkXYa8
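As a hedged illustration of those two steps (not the exact pipeline from the video), the OpenCV sketch below matches ORB features between two overlapping photos, estimates the relative pose from the essential matrix with RANSAC, and triangulates the inlier points; the file names and the intrinsic matrix K are assumptions.

```python
# Two-view SfM sketch with OpenCV: 1) image registration via feature matching,
# 2) pose estimation + triangulation. File names and K are assumed values.
import cv2
import numpy as np

K = np.array([[1000.0, 0, 960], [0, 1000.0, 540], [0, 0, 1]])  # assumed intrinsics

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

# 1) Image registration: detect and match local features (ORB here).
orb = cv2.ORB_create(4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 2) Pose estimation: essential matrix with RANSAC, relative pose recovery,
#    then triangulation of the inlier feature points.
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points3d = (pts4d[:3] / pts4d[3]).T              # sparse structure, up to scale
```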
Structure from Motion: literature references
https://doi.org/10.1016/j.geomorph.2012.08.021
Cited by 631 articles, and see Related articles
https://arxiv.org/abs/1701.08493
Structure-from-Motion’ (SfM) operates under
the same basic tenets as stereoscopic
photogrammetry, namely that 3-D structure
can be resolved from a series of overlapping,
offset images. However, it differs fundamentally
from conventional photogrammetry, in that the
geometry of the scene, camera positions and
orientation is solved automatically without the
need to specify a priori, a network of targets
which have known 3-D positions. Instead, these
are solved simultaneously using a highly
redundant, iterative bundle adjustment
procedure, based on a database of features
automatically extracted from a set of multiple
overlapping images (Snavely et al 2008).
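To make the "iterative bundle adjustment" concrete, here is a compact sketch on synthetic data (assumed intrinsics, two cameras, a handful of points) that jointly refines the second camera pose and the 3D points by minimizing reprojection error with a nonlinear least-squares solver; real SfM pipelines do the same over thousands of cameras and millions of points using sparse solvers.

```python
# Toy two-view bundle adjustment: refine camera pose + 3D points by minimizing
# reprojection error. Synthetic data; K and the scene are assumptions.
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])

def project(points, rvec, tvec):
    """Project world points into a camera with pose (rvec, tvec)."""
    R = Rotation.from_rotvec(rvec).as_matrix()
    cam = points @ R.T + tvec                     # world -> camera frame
    uv = cam @ K.T
    return uv[:, :2] / uv[:, 2:3]                 # perspective divide

# Synthetic scene: 20 points in front of two cameras.
rng = np.random.default_rng(0)
pts_true = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))
rvec_true, tvec_true = np.array([0.0, 0.1, 0.0]), np.array([0.5, 0.0, 0.0])
obs1 = project(pts_true, np.zeros(3), np.zeros(3))    # camera 1 at the origin
obs2 = project(pts_true, rvec_true, tvec_true)

def residuals(x):
    rvec, tvec, pts = x[:3], x[3:6], x[6:].reshape(-1, 3)
    return np.concatenate([
        (project(pts, np.zeros(3), np.zeros(3)) - obs1).ravel(),
        (project(pts, rvec, tvec) - obs2).ravel()])

# Start from a perturbed guess and jointly refine pose and structure.
x0 = np.concatenate([rvec_true + 0.05, tvec_true + 0.05,
                     (pts_true + rng.normal(0, 0.1, pts_true.shape)).ravel()])
result = least_squares(residuals, x0)
print("final RMS reprojection error:", np.sqrt(np.mean(result.fun ** 2)))
```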
Finally, even though there exist various theoretical works in the literature
that study fundamental problems in SfM and/or provide rigorous analysis of
stability and robustness of specific methods, we believe that the SfM
community would still highly benefit from rigorous results on fundamental
problems (e.g., what is the theoretically maximal amount of mismatched
features or level of noise in the images that can be tolerated for a stable
structure recovery, and can this be achieved efficiently?) and theoretical
analysis of stability, robustness and computational efficiency of existing
or new methods
SLAM: Simultaneous localization and mapping
SLAM, Visual Odometry, Structure from Motion, Multiple View Stereo
Yu Huang, Senior Architect, Autonomous Driving@Baidu USA
https://www.slideshare.net/yuhuang/visual-slam-structure-from-motion-multiple-view-stereo
Samsung R&D Institute
Necessary Skills / Attributes:
● 5+ years’ experience delivering computer vision based products using C++ or Python
(Masters or PhD study will be considered).
● Theoretical and practical understanding of multi-view geometry and 3D
reconstruction.
● Experience with machine learning techniques within a computer vision context.
● PhD/MS in Computer Vision, Artificial Intelligence or Machine Learning.
● Expertise with Deep Neural Networks using TensorFlow or Keras.
SLAM stands for Simultaneous Localization and Mapping and one way to understand
it is to imagine yourself entering an unfamiliar building for the first time. As you move about
the building, you don't completely forget where you have already been. Indeed, at any
moment you have a pretty good idea where you are within the current map that you have
so far constructed in your head, and unless you have a really bad sense of direction, you
could probably turn around and get back out of the building without too much trouble.
Finding your way around the building is a good example of simultaneously
constructing a map and localizing yourself within that map.
http://www.pirobot.org/blog/0015/
SLAM: traditional algorithm comparison
http://dx.doi.org/10.1186/s41074-017-0027-2
The framework is mainly composed of three modules as follows.
1) Initialization
2) Tracking
3) Mapping
Additional modules for stable and accurate vSLAM
+ Relocalization
+Global map optimization
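A schematic of how these modules typically fit together in a vSLAM loop (the class and method names below are hypothetical placeholders, purely to illustrate the structure described by Taketomi et al.):

```python
# Skeleton of a vSLAM system following the module list above.
# All names are hypothetical placeholders, not an existing library API.
class VisualSLAM:
    def __init__(self):
        self.map, self.keyframes, self.pose = None, [], None

    def initialize(self, frame):
        """1) Initialization: build an initial map from the first frames."""
        ...

    def track(self, frame):
        """2) Tracking: estimate the camera pose against the current map."""
        ...

    def update_map(self, frame):
        """3) Mapping: add new landmarks and keyframes to the map."""
        ...

    def relocalize(self, frame):
        """+ Relocalization: recover the pose after tracking is lost."""
        ...

    def optimize_global_map(self):
        """+ Global map optimization: e.g. pose-graph optimization or bundle
        adjustment after loop closure, to reduce accumulated drift."""
        ...
```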
“ From the technical point of views, there is no definitive difference between SLAM and real-time SfM.”
Even though visual SLAM algorithms have been developed since 2003, vSLAM is
still an active research field. Each algorithm has different characteristics. We need
to choose an appropriate algorithm by considering a purpose of an application.
Visual Odometry
Taketomi et al. (2017):
http://dx.doi.org/10.1186/s41074-017-0027-2
“Odometry is to estimate the sequential changes of
sensor positions over time using sensors such as
wheel encoder to acquire relative sensor movement.
Camera-based odometry called visual odometry
(VO) is also one of the active research fields in the
literature [16, 17].
From the technical point of views, vSLAM and VO
are highly relevant techniques because both
techniques basically estimate sensor positions.
According to the survey papers in robotics [18, 19],
the relationship between vSLAM and VO can be
represented as follows.
vSLAM = VO + global map optimization
The relationship between vSLAM and VO can also
be found from the papers [20, 21] and the papers [22,
23]. In the paper [20, 22], a technique on VO was first
proposed. Then, a technique on vSLAM was
proposed by adding the global optimization in VO [21,
23].”
Towards stable visual odometry & SLAM solutions
for autonomous vehicles
https://www.youtube.com/watch?v=T5Y6OPG-d08
NavStik Hackerspace | Projects at Hackerspace
Visual Odometry using Optic Flow
Software: open-source VisualSFM
VisualSFM: A Visual Structure from Motion System
Changchang Wu
Cited by 326 articles, and see Related articles
VisualSFM is a GUI application for 3D reconstruction using structure
from motion (SFM). The reconstruction system integrates several of my
previous projects: SIFT on GPU(SiftGPU), Multicore Bundle Adjustment,
and Towards Linear-time Incremental Structure from Motion
. VisualSFM runs fast by exploiting multicore parallelism for feature
detection, feature matching, and bundle adjustment.
Using VisualSFM and Meshlab as an offline alternative
to Autodesk's excellent 123D catch. I walk you through my
workflow for converting multiple images into a 3D model
suitable for use in Blender.
Tutorial for amateur photographers by Jamie Fuller.
https://www.youtube.com/watch?v=V4iBb_j6k_g
Open Source Photogrammetry with VisualSFM:
Ditching 123D Catch. July 12, 2013 by Jesse
Indoor Navigation from Multiple Images
By Jaan Tollander de Balsch, 2016, Aalto
https://jaantollander.github.io/SCI-C1000/prototype.html
What is the best method for 3D object
modelling and reconstruction from photos
or videos taken by flying robots or drones?
What is the accuracy of such reconstruction
methods with regards to the vibrations of the
flying drones, quality of camera and resolution?
Is it possible to improve the results by organizing
multiple flights and overlaying/accumulating the
data in the point cloud? Is there any free
software available?
Software: Python Photogrammetry Toolbox (PPT) GUI
Real photo x SfM with texture color x SfM with simple shader. Made
with Python Photogrammetry Toolbox GUI and rendered in Blender
with Cycles.
http://184.106.205.13/arcteam/ppt.php
https://github.com/archeos/ppt-gui/
Converting pictures into a 3D mesh with PPT, MeshLab and Blender
http://arc-team-open-research.blogspot.co.uk/2012/09/converting-pictures-into-3d-mesh-with.html
Blender camera tracking + Python Photogrammetry Toolbox
http://arc-team-open-research.blogspot.co.uk/2012/11/blender-camera-tracking-python.html
The video shows the skull reconstructed in 3D with Python Photogrammetry Toolkit GUI.
Smilodon, the 3D reconstruction of the saber-toothed cat
http://arc-team-open-research.blogspot.co.uk/2013/03/
Open-source libraries for SfM
OpenSfM is a Structure from Motion
library written in Python on top of
OpenCV. The library serves as a
processing pipeline for reconstructing
camera poses and 3D scenes from
multiple images.
https://github.com/mapillary/OpenSfM
656 stars
OpenSfM
OpenMVG (Multiple View Geometry)
"open Multiple View Geometry" is a
library for computer-vision scientists and
especially targeted to the Multiple View
Geometry community.
https://github.com/openMVG/openMVG
1,1856 stars
OpenMVG
https://doi.org/10.1007/978-3-319-56414-2_5
http://imagine.enpc.fr/~marletr/publi/RRPR-2016-Moulon-et-al.pdf
Sung and Lin (2017): “VisualSFM uses the pre-
emptive feature matching, the incremental
structure from motion and the re-triangulation
techniques. The incremental feature matching
can greatly speed up the process because
this kind of matching will first sort all feature
points and match only first h feature points for
each photo.”
Sung and Lin (2017): “OpenMVG also
contains incremental structure from
motion technique. Besides that, they
proposed a new iterative sampling
method called a contrario Random
Sample Consensus (AC-RANSAC) as a
substitution to the original RANSAC in
order to acquire higher precision and
better performance. The AC-RANSAC
using the “a contrario” methodology in
order to find a model that best fits the
data with a threshold T that adapts
automatically to the noise. Hence, it is
able to find a model and its associated
noise without a fixed threshold.”
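For contrast with AC-RANSAC's adaptive threshold, the sketch below shows a plain RANSAC loop for 2D line fitting with the fixed inlier threshold T that AC-RANSAC removes (an illustrative toy, not the OpenMVG implementation).

```python
# Plain RANSAC with a fixed inlier threshold T; AC-RANSAC instead adapts the
# threshold to the noise automatically. Toy 2D line-fitting example.
import numpy as np

def ransac_line(points, threshold, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(points), dtype=bool)
    for _ in range(iters):
        p1, p2 = points[rng.choice(len(points), 2, replace=False)]
        d = p2 - p1
        n = np.array([-d[1], d[0]]) / np.linalg.norm(d)    # line normal
        dist = np.abs((points - p1) @ n)                   # point-line distance
        inliers = dist < threshold                         # fixed threshold T
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers

xs = np.linspace(0, 10, 100)
pts = np.column_stack([xs, 2 * xs + np.random.normal(0, 0.1, 100)])
inliers = ransac_line(pts, threshold=0.3)
```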
Open-source libraries for SfM + SLAM
OpenChisel
https://github.com/personalrobotics/OpenChisel
An open-source version of the Chisel chunked TSDF
library. It contains two packages:
open_chisel
open_chisel is an implementation of a generic
truncated signed distance field (TSDF) 3D mapping
library; based on the Chisel mapping framework
developed originally for Google's Project Tango. It is
a complete re-write of the original mapping system
(which is proprietary). open_chisel is chunked and
spatially hashed, inspired by this work from
Niessner et al., making it more memory-efficient than
fixed-grid mapping approaches, and more performant
than octree-based approaches. A technical
description of how it works can be found in our
RSS 2015 paper.
http://ri.cmu.edu/pub_files/2015/7/ChiselPaper.pdf
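open_chisel itself is a chunked, spatially hashed C++ library; as a rough illustration of the underlying TSDF update it performs, here is a dense-grid Python sketch that fuses one depth frame into a voxel volume (the intrinsics, grid size and placement are made-up values).

```python
# Dense-grid TSDF fusion sketch (illustration only; open_chisel uses chunked,
# spatially hashed storage in C++). Intrinsics and grid placement are assumed.
import numpy as np

K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1]])
trunc = 0.05                                    # truncation band, metres
dim, voxel_size = 64, 0.05
tsdf = np.ones((dim, dim, dim), dtype=np.float32)
weight = np.zeros_like(tsdf)

def integrate(depth):
    """Fuse one depth image (H x W, metres, camera at origin) into the TSDF."""
    idx = np.indices((dim, dim, dim)).reshape(3, -1).T
    pts = (idx - dim / 2) * voxel_size + np.array([0, 0, dim * voxel_size / 2])
    uv = pts @ K.T                              # project voxel centres
    z = np.maximum(uv[:, 2], 1e-6)
    u = np.round(uv[:, 0] / z).astype(int)
    v = np.round(uv[:, 1] / z).astype(int)
    valid = (pts[:, 2] > 0) & (u >= 0) & (u < depth.shape[1]) \
            & (v >= 0) & (v < depth.shape[0])
    d = np.zeros(len(pts))
    d[valid] = depth[v[valid], u[valid]]
    sdf = d - pts[:, 2]                         # + in front of surface, - behind
    keep = valid & (d > 0) & (sdf > -trunc)
    new = np.clip(sdf / trunc, -1.0, 1.0)
    t, w = tsdf.reshape(-1), weight.reshape(-1)                # views into grids
    t[keep] = (t[keep] * w[keep] + new[keep]) / (w[keep] + 1)  # running average
    w[keep] += 1

integrate(np.full((480, 640), 2.0, dtype=np.float32))   # flat wall 2 m away
```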
Research-grade SfM: old-school mono video
http://dx.doi.org/10.1186/s13640-017-0168-3
Inspired by the structure from motion systems, we
propose a system that reconstructs sparse feature
points to a 3D point cloud using a mono video
sequence so as to achieve higher computation
efficiency. The system keeps tracking all detected
feature points and calculates both the amount of these
feature points and their moving distances. We only use
the key frames to estimate the current position of the
camera in order to reduce the computation load and
the noise interference on the system. Furthermore, for
the sake of avoiding duplicate 3D points, the system
reconstructs the 2D point only when the point shifts
out of the boundary of a camera. In our experiments,
we show that our system is able to be implemented on
tablets and can achieve state-of-the-art accuracy with
a denser point cloud with high speed.
Research-grade SfM: deep learning-based #1
Research-grade SfM: deep learning-based #2
https://arxiv.org/abs/1702.01381, 2 May 2017
We evaluated the performance of our proposal on the DTU dataset comparing it
with two traditional feature based methods, namely SURF (Cited by 8683
articles) and ORB ( Cited by 2739 articles).
The system is trained in an end-to-end manner utilising transfer
learning from a large scale classification dataset. In addition, a
variant of the proposed architecture containing a spatial pyramid
pooling (SPP) layer is evaluated and shown to further improve the
performance.
RegNet is able to correct even large decalibrations such as
depicted in the top image. The inputs for the deep neural
network are an RGB image and a projected depth map. RegNet
is able to establish correspondences between the two
modalities which enables it to estimate a 6 DOF extrinsic
calibration.
Additionally, with an iterative execution of multiple CNNs, that
are trained on different magnitudes of decalibration, our
approach compares favorably to state-of-the-art methods in
terms of a mean calibration error of 0.28º for the rotational and
6 cm for the translation components even for large
decalibrations up to 1.5 m and 20º.
https://arxiv.org/abs/1702.02295
Research-grade Pose/Structure: deep learning-based #1
Essentially the same technology for stereo matching and depth map generation as for SfM
https://arxiv.org/abs/1703.04309 https://arxiv.org/abs/1704.07813
Empirical evaluation on the KITTI dataset
demonstrates the effectiveness of our
approach: 1) monocular depth performs
comparably with supervised methods that
use either ground-truth pose or depth for
training, and 2) pose estimation performs
favorably compared to established SLAM
systems under comparable input settings.
Research-grade Pose/Structure: deep learning-based #2
GANs on everything, so here as well :) The usefulness of VisualSFM/ openSFM/ openMVG for defensible startup products?
Inversion is often ambiguous, e.g., many compositions of 3D shape and camera pose give rise to the same 2D projection. To
address this ambiguity, we impose priors on the predicted latent factors, through an adversarial discriminator network
trained to discriminate between predicted factors and ground-truth ones. Training adversarial inversion does not require
input-output paired annotations, but merely a collection of ground-truth factors, unrelated (unpaired) to the current input.
Our model can thus be self-supervised by unlabelled image data, by minimizing a joint reconstruction and adversarial
loss, complementing any direct supervision provided by paired annotations.
Applying adversarial inversion to super-resolution and inpainting results in automated “visual plastic surgery”
Structure-from-motion(SfM) results with and without adversarial priors. The results of the baseline (columns 5th and 8th)
are obtained from a model with depth smoothness prior, trained with early stopping at 40K iterations (before divergence).
SfM on Mobile Devices
https://arxiv.org/abs/1611.09498
https://doi.org/10.1109/ICCV.2013.15 | Cited by 141 articles, see Related articles
https://doi.org/10.1016/j.cviu.2016.09.007
After introducing the reconstruction algorithms at the base of our approach, we show how to build
applications able to generate 3D floor plans scaled to their real-world metric dimensions and
capable of managing scenes not necessarily limited by Manhattan World assumptions. Then, exploiting
the resulting structural and visual model, we propose a client-server interactive exploration system
implementing a low-DOF navigation interface, specifically developed for touch interaction on
smartphones and tablets.
https://doi.org/10.1145/2999508.2999526
SfM on Mobile Devices: case Dacuda
Magic Leap, the augmented reality
startup that has raised $1.4 billion in
funding but has yet to release a product,
has made an acquisition to expand its
work in computer vision and deep
learning, and to build out its operations
into Europe.
The company has acquired the 3D division
of Dacuda, a computer vision startup
based out of Zurich. One of
Dacuda’s focuses had been
developing algorithms for consumer-
grade cameras (and not just cameras, but
any device with a camera function) to
capture 2D and 3D imaging in real time,
“making 3D content as easy as taking a
video.”
https://techcrunch.com/2017/02/18/confirmed-magic-leap-acquires-3d-division-of-d
As you can see, no detail about what the two might be working on. The acquisition was first rumored
last week — after Dacuda posted a note on its blog about selling its 3D division, and then
some Dacuda employees updated their LinkedIn profiles as Magic Leap employees (one example
here). Tom’s Hardware then speculated it could signal Magic Leap using technology developed by
Dacuda to enable room-scale, six degrees of freedom tracking (essentially to improve its image
capturing sensors in 3D environments).
The ecosystem there is attracting other big-name M&A. Faceshift, a motion capture startup
acquired by Apple in 2015, was also founded in Zurich. Facebook’s Oculus VR in August 2016
also quietly acquired a startup called Zurich Eye, incubated at the University of Zurich and ETH,
the federal institute of technology. Zurich Eye became the basis of Oculus and Facebook’s office in
the city. Zurich Eye, ironically, was co-founded by three former software engineers from Dacuda
(they all now work for Oculus VR).
For example, in October the company had linked up with MindMaze, another virtual/augmented
reality startup out of Switzerland, to build a platform they were calling “MMI, the world’s first
multisensory computing platform for mobile-based, immersive and social virtual reality
applications,” MindMaze noted.
MindMaze said it planned to “deploy the technology for users globally to address a void left by
Google’s DayDream View for positional tracking and multiplayer interactions.” We have contacted
Magic Leap for comment and will update this post if and when we learn more.
Apple ARKit: technology
https://developer.apple.com/arkit/
Since the iPhone 6, iPhones have used what Apple calls “Focus Pixels”, which is its term for phase
detection AF. Fast Company reports that system will be replaced with laser autofocus possibly as soon
as the next iPhone, which is set to debut this fall. It is likely that Apple would use both AF technologies,
as Google does in its Pixel line of phones. The technology would serve a dual purpose, also allowing for
better depth perception with the inbuilt camera for augmented reality apps. ARKit rolls out with iOS 11
this fall, so it would make sense to also include the VCSEL laser system in the phone launching at the
same time.
https://petapixel.com/2017/07/20/apple-bring-3d-laser-autofocus-iphone-cameras-report-says/
https://www.theverge.com/2017/6/26/15872332/apple-arkit-ios-11-augmented-reality-developer-excitement
Apple ARKit: example applications
https://twitter.com/madewithARKit
Measuring kitchen dimensions
http://bit.ly/2tJ5KV8 app by @SmartPicture3D
Measure distances with your
iPhone. Clever little #ARKit app by
@BalestraPatrick http://bit.ly/2sFl8RB
Inter-dimensional iPhone
AR portals are closer than they
appear http://bit.ly/2sufO0d ARkit
demo by @nedd
Demo Shows How Augmented Reality Will
Make Advertising More Immersive. Mixed
reality producer Bilawal Singh Sidhu shows a peek of
what the world of advertising could be with the
ARKit. #adtech
https://mobile-ar.reality.news/news/apple-ar-demo-shows-augmented-reality-will-make-advertising-more-immersive-0178905/
Google’s response to ARKit: ARCore
DAVID JAGNEUX, UPLOADVR@UPLOADVR SEPTEMBER 2, 2017 6:00 AM “Earlier this week, Google
announced ARCore, a software-based solution for making more Android devices AR-capable without the need for depth
sensors and extra cameras. It will even work on the Google Pixel, Galaxy S8, and several other devices very soon and
supports Java, Unity, and Unreal from day one. In short, it’s kind of like Google’s answer to Apple’s ARKit.”
- https://venturebeat.com/2017/09/02/googles-first-arcore-goal-100-million-ar-capable-android-phones/
“Another example, which is especially relevant for
developers that build traditional smartphone apps in
Java, is that we want to make it easier than ever for
people to get into 3D modeling that haven’t done it
before,” Bavor says. “We know there are a lot of people
that want to get into 3D development and AR but
aren’t experts in Maya, or Unity, or anything. So Blocks
is an app we built with the intention of enabling
people that have never done a 3D model in their
life to feel comfortable building 3D assets. We even
made it easy to export right from Blocks and pull into
ARCore apps you’re developing.”
ARCore: too early to tell how it will do against the “Apple Cult”
Verge Adi Robertson
https://youtu.be/NhJydpMkpug
FusedVR https://youtu.be/dNXBvDKRg1M
https://venturebeat.com/2017/08/29/google-launches-arcore-sdk-in-preview-ar-on-android-phones-no-extra-hardware-required/
https://youtu.be/ttdPqly4OF8
Super Ventures Blog Matt Miesnieks
CEO 6D.ai, Partner @Super_Ventures, AR technology & cycling
https://medium.com/super-ventures-blog/how-is-arcore-better-than-arkit-5223e6b3e79d
● Isn’t ARCore just Tango-lite?
● The iPhone-8-keynote sized elephant in the room
● So should I build on ARCore now?
● Is ARCore better than ARKit?
Scottie Gardonio Aug 30
AR / VR enthusiast. Creative Manager. Passionate graphic designer.
https://medium.com/iotforall/arcore-vs-arkit-google-counters-apple-33483c08d3da
ARCore vs. ARKit: Google Counters Apple
Let the Dueling Begin
Google announcing inside-out 6-DOF tracking support for Daydream back at Google IO earlier this year.
Deep Learning on Mobile Devices
https://techcrunch.com/2017/05/17/googles-tensorflow-lite-brings-machine-learning-to-android-devices/
http://blog.stratospark.com/creating-a-deep-learning-ios-app-with-keras-and-tensorflow.html
● 3D Face Capture
● 3D Scene Reconstruction
● 2.5D Scene Reconstruction and Computational Photography
● SLAM and Object Tracking
● Augmented Reality
● Google Cardboard SDK for iOS
https://doi.org/10.1109/IPSN.2016.7460664 | Cited by 28 articles, see Related articles
Thursday 20 July 2017, Movidius USB stick
https://techcrunch.com/2017/07/20/movidius-launches-a-79-deep-learning-usb-stick/
Snapchat secretly acquires Seene, a computer vision
startup that lets ...
https://techcrunch.com/.../snapchat-secretly-acquires-seene-a-computer-vision-startup-... 3 Jun 2016
https://doi.org/10.1109/PDP.2017.98
https://arxiv.org/abs/1705.06224
360° imaging
360° (omnidirectional imaging): introduction
The Panoptic Camera platform developed
jointly by Microelectronic Systems
Laboratory (LSM) and Signal Processing
Laboratory (LTS2) of EPFL.*
http://lsm.epfl.ch/page-52820-en.html
Wikipedia: “360-degree videos, also known as immersive videos[1] or spherical videos ,[2] are video recordings where a view in every direction is recorded
at the same time, shot using an omnidirectional camera or a collection of cameras. During playback the viewer has control of the viewing direction like a
panorama.”
Consumer-level camera review
http://thewirecutter.com/reviews/best-360-degree-camera/
By DANIEL CULPANWednesday 12 August 2015
http://www.wired.co.uk/article/9-mind-blowing-360-degree-videos
Scuba Diving Short Film in 360° Green Island, Taiwan
https://youtu.be/2OzlksZBTiA
360° as part of “10 Breakthrough Technologies of 2017”
https://www.technologyreview.com/s/603496/10-breakthrough-technologies-2017-the-360-degree-selfie/
Seasonal changes to vegetation fascinate Koen Hufkens. So last fall Hufkens, an
ecological researcher at Harvard, devised a system to continuously broadcast
images from a Massachusetts forest to a website called VirtualForest.io. And
because he used a camera that creates 360° pictures, visitors can do more than
just watch the feed; they can use their mouse cursor (on a computer) or finger (on a
smartphone or tablet) to pan around the image in a circle or scroll up to view the
forest canopy and down to see the ground.
Journalists from the New York Times and Reuters are using $350
Samsung Gear 360 cameras to produce spherical photos and videos that
document anything from hurricane damage in Haiti to a refugee camp in Gaza.
One New York Times video that depicts people in Niger fleeing the militant group
Boko Haram puts you in the center of a crowd receiving food from aid groups.
Or consider the spherical videos of medical procedures that the Los Angeles
startup Giblib makes to teach students about surgery. The company films the
operations by attaching a $500 360fly 4K camera, which is the size of a baseball,
to surgical lights above the patient. The 360° view enables students to see not just
the surgeon and surgical site, but also the way the operating room is organized and
how the operating room staff interacts.
These applications are feasible because of the smartphone boom and
innovations in several technologies that combine images from multiple lenses and
sensors. For instance, 360° cameras require more horsepower than regular
cameras and generate more heat, but that is handled by the energy-efficient chips
that power smartphones. Both the 360fly and the $499 ALLie camera use
Qualcomm Snapdragon processors similar to those that run Samsung’s high-
end handsets.
Once people discover spherical videos, research suggests, they shift their
viewing behavior quickly. The company Humaneyes, which is developing an
$800 camera that can produce 3-D spherical images, says people need to watch
only about 10 hours of 360° content before they instinctively start trying to interact
with all videos. When you see 360° imagery that truly transports you somewhere
else, you want it more and more.
Low-cost end: Samsung Gear and Galaxy
Samsung Gear360, ~£250
Samsung GearVR, ~£100
Samsung Galaxy S6-8, smartphone, ~£200-£700
http://www.samsung.com/uk/wearables/gear-360-c200/
If you’re clamoring to shoot in 360 degrees, the Gear 360 balances
simple design with workable image quality — but you really need a
Samsung phone (and a Gear VR, and a good hunk of money) to get
the most out of it. And, for now, that's fine.
This version of the Gear 360 is more likely to be looked back on as a
relic anyway, a recognizable but eventually dismissible attempt at a
new idea, and the foundation for whatever Samsung does next.
Low-cost end #2: Ricoh Theta
Ricoh’s Theta V 4K camera sports 360-
degree video and wireless playback
RYAN WINTERHALTER, UPLOADVR@UPLOADVR
SEPTEMBER 02, 2017 07:03 PM
https://venturebeat.com/2017/09/02/ricohs-theta-v-4k-camera-sports-360-degree-video-and-wireless-playback/
Ricoh is unveiling its latest 360-degree camera this morning. Dubbed the Ricoh Theta V, the $430 4K camera
is the latest in the line which launched in 2013 with the Ricoh Theta.
Available for pre-order now, and shipping in mid-September, the Theta V features 3,820-by-1,920 resolution
video capture. That’s a massive improvement on the earlier Theta S, which offered a sub-1,080p 1,920-by-960,
and the Theta SC, which allowed for 1,920-by-1,080 recording.
Perhaps the biggest usability improvement to the Theta V is the inclusion of remote playback. Users can now
wirelessly stream their video to an external display directly from the camera. Previous devices in the Theta line
(except the developer-only Theta R) required users to export their raw footage into a computer to stitch the
image and create a useable video. That’s now all done on the device. Videographers can watch their footage
on any display, and move the POV by moving the camera itself.
The Theta V boosts sound quality as well. Four microphones capture data from their respective dimensions,
creating spatial audio that allows users to hear where the sound is coming from within the recording.
Ricoh Theta V hands-on
Published Aug 31, 2017 | Jeff Keller
Based on some quick tests of a non-final Theta V,
both stills and videos are noticeably better than
those from its predecessor. We're looking forward
to getting our hands on a production model in a few
weeks and putting it through its paces.
For higher quality audio
capture, Ricoh is offering
the TA-1 3D Microphone
($269). Developed by
Audio Technica, the mic
attaches via the tripod
mount and uses a
standard 3.5mm audio
jack.
Higher end: GoPro, Nokia Ozo, Facebook Surround, etc.
GoPro (NASDAQ:GPRO) recently unveiled the Omni, a six-camera rig
for filming interactive spherical videos that can be explored through a
smartphone's movements, a user's finger swipes, or a virtual reality
headset. The device is the smaller sibling of the 16-camera Odyssey
rig ($15,000), which hasn't been launched despite being announced
nearly a year ago. Let's take a look at four key things investors should
know about the Omni ($3,500), and how they might impact GoPro's
future.
https://www.fool.com/investing/general/2016/04/14/4-things-investors-need-to-know-about-gopro-incs-o.aspx
What's next for GoPro? GoPro investors don't have many catalysts
to look forward to this year. The Omni is too pricey relative to its
peers to gain any mainstream traction. The Karma drone, which is
due to arrive within the next two months, faces tough competition
from market leader DJI Innovations. By the time the Hero 5 cameras
arrive near the end of the year, the mainstream market could be
saturated with cheap VR and flying cameras.
Introducing Facebook Surround
360: An open, high-quality 3D-360
video capture system
Brian K Cabral, April 12, 2016
● Facebook has designed and built a durable, high-
quality 3D-360 video capture system.
● The system includes a design for camera hardware
and the accompanying stitching code, and we will
make both available on GitHub this summer. We're
open-sourcing the camera and the software to
accelerate the growth of the 3D-360 ecosystem —
developers can leverage the designs and code, and
content creators can use the camera in their
productions.
● The system exports 4K, 6K, and 8K video for each
eye. The 8K videos double industry standard output
and can be played on Gear VR with Facebook's
custom Dynamic Streaming technology.
https://code.facebook.com/posts/1755691291326688/introducing-facebook-surround-360-an-open-high-quality-3d-360-video-capture-system/
https://www.theverge.com/2016/4/25/11421992/disney-nokia-ozo-camera-virtual-reality-star-wars-marvel
Ever since Nokia announced its
360-degree Ozo virtual reality camera it has positioned the
system as a high-end option for Hollywood filmmakers, and
today the company is announcing a partnership with Disney
that should help deliver on that promise. As part of the deal,
Ozo cameras will be put into the hands of Disney filmmakers
and its marketing teams to create 360-degree, virtual reality
content across all of the studio’s various brands.
Lytro Immerge: the world's first professional Light Field solution for cinematic VR
roadtovr.com/lytros-immerge-360
https://www.lytro.com/immerge
Consequently, to create a virtual reality that even the human eye cannot distinguish from the real
world, we must achieve the perfect immersive viewing experience, such that human viewers feel
they can walk into the scene. This is known as the virtual walk-in effect, and it requires light-field
technology—3D imaging technology that emerged from the field of computational
imaging/photography to capture the light rays that people perceive from different locations and
directions. When combined with computer vision and deep learning, light- field technology
provides a viable path for producing low-cost, high-quality VR content, positioning this technology
to be the most profitable segment of the VR industry.
“Depth Lytro”: depth sensing with light field techniques
Refocusing in spite of foreground occlusions: (a) Scene containing a
monkey toy being partially occluded by a plant in the foreground, (b)
traditional synthetic aperture refocusing on light field is partially effective in
removing the effect of foreground plants, (c) synthetic aperture refocusing
of depth displays corruption due to occlusion, (d) histogram of depth
clearly shows two clusters corresponding to plant and monkey, (e) virtual
aperture refocusing after removal of plant pixels shows sharp depth image
of monkey, (f) Quantitative comparison of indicated scan line of the
monkey’s head for (c) and (e)
We use coding techniques from Tadano et al. (2015) to image beyond
backscattering nets. Notice how the corrupted depth maps are improved
using the codes. We show how digital refocusing can be performed on the
images without the scattering occluders by combining depth fields with
coded TOF.
https://arxiv.org/abs/1509.00816
Post-processing for 360° imaging
https://doi.org/10.1007/s00371-017-1368-7
Overall process. a Input image. b Lines detected and classified: red for
vertical lines and yellow for horizontal lines. c Great circles from the
classified lines. Green dots are vanishing points computed from
horizontal (yellow) lines. d Upright adjustment result
We implemented our method using C++ and the OpenCV library on a 64-bit Windows
PC with an Intel i7- 6700K 4.00GHz CPU and 32GB RAM. For an input image of size
5376 × 2688 px, it takes a few hundred milliseconds (less than one second) to
obtain the final rotation matrix R for upright adjustment.
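The line and vanishing-point reasoning above operates on the viewing sphere; its basis is the equirectangular pixel-to-ray mapping sketched below (a minimal illustration; the longitude/latitude axis conventions are one common choice, not necessarily those of the cited paper).

```python
# Equirectangular pixel -> unit ray on the viewing sphere. A detected line in
# the panorama maps to a great circle of such rays, and the upright rotation R
# aligns the vertical vanishing direction with the world up-axis.
import numpy as np

def pixel_to_ray(u, v, width, height):
    lon = (u / width - 0.5) * 2.0 * np.pi        # longitude in [-pi, pi]
    lat = (0.5 - v / height) * np.pi             # latitude in [-pi/2, pi/2]
    return np.array([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)])  # unit direction vector

ray = pixel_to_ray(2688, 672, 5376, 2688)         # a pixel of a 5376 x 2688 image
```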
https://arxiv.org/abs/1703.10798
http://vllab1.ucmerced.edu/~wlai24/360hyperlapse
Pipeline of the proposed algorithm. Given a 360° video, we first stabilize the sequence to smooth the relative rotation
between adjacent frames. We estimate the focus of expansion (i.e., the direction of forward motion) as a prior information for
our camera path planning. To extract the regions of interest, we compute the spatial-temporal saliency and semantic
segmentation. The detected regions of interest are used to guide the camera path planning. Finally, we use an adaptive 2D
video stabilization to render a smooth hyperlapse.
360° Deep Learning #1
http://dx.doi.org/10.3390/s17061341
https://arxiv.org/abs/1705.01759
Watching a 360º sports video
requires a viewer to
continuously select a viewing
angle, either through a
sequence of mouse clicks or
head movements. To relieve
the viewer from this “360
piloting” task, we propose
“deep 360 pilot” – a deep
learning-based agent for
piloting through 360º sports
videos automatically
Panel (a) overlaps three panoramic frames
sampled from a 360° skateboarding video
with two skateboarders. One skateboarder
is more active than the other in this
example. For each frame, the proposed
“deep 360 pilot” selects a view – a
viewing angle, where a Natural Field of View
(NFoV) (cyan box) is centered at. It first
extracts candidate objects (yellow boxes),
and then selects a main object (green dash
boxes) in order to determine a view (just like
a human agent). Panel (b) shows the NFoV
from a viewer’s perspective.
360° Deep Learning #2
Flat2Sphere: Learning Spherical Convolution for Fast Features from 360° Imagery
Yu-Chuan Su, Kristen Grauman (Submitted on 2 Aug 2017) https://arxiv.org/abs/1708.00919
We propose to learn a spherical
convolutional network that translates a
planar CNN to process 360° imagery
directly in its equirectangular projection.
Our approach learns to reproduce the flat
filter outputs on 360° data, sensitive to
the varying distortion effects across the
viewing sphere. The key benefits are
1) Efficient feature extraction for
360°images and video, and
2) The ability to leverage powerful pre-
trained networks researchers have
carefully honed (together with massive
labeled image training sets) for
perspective images.
We validate our approach compared to
several alternative methods in terms of
both raw CNN output accuracy as well as
applying a state-of-the-art "flat" object
detector to 360° data. Our method yields
the most accurate results while saving
orders of magnitude in computation
versus the existing exact reprojection
solution.
360°: the role in PropTech? #1a
Use for real estate agents: still a novelty/gimmick? (from 2014 until 2017)
MAY 26, 2014 By James Dearsley
http://www.jamesdearsley.co.uk/is-the-property-industry-interested-in-360-degree-hd-filming/
USES OF 360 DEGREE HD FILMING IN REAL ESTATE:
1. Sales and Marketing. Firstly, from a realtor or estate agent perspective there are several uses
here of 360 degree cameras, the first being obvious, that of sales and marketing. It will be simple
and efficient to take a quick film of each room, or just walk through the property with these devices
to record what you need
2. Property Management issues. We have also seen interest from companies looking to use these
bits of equipment for inventory taking. Seeing as they are of HD quality it means you can quickly
take photographs of properties which can later be looked at in more detail should problems arise in
letting disputes.
3. Virtual Reality. With Facebook recently buying Oculus Rift for $2 Billion, it is getting less far
fetched. Considering the price of an Oculus is relatively cheap (reckoned to be less than
$500/£360 when released next year) it would not be surprising if Facebook are hoping for a lot of
people to be purchasing these (Candy Crush Saga in Virtual Reality anyone?!). It isn’t just Facebook
though; Sony have a VR headset in production as does Samsung (it was recently announced) and so
this space is going to move quickly. By using these cameras you can put your clients into these
homes very quickly and easily – either in the office, if you get a set of these yourself, or, in time, in
their own home if Facebook get their way.
https://www.forbes.com/sites/forbesagencycouncil/2017/06/28/want-to-use-360-degree-photo-and-video-11-things-to-consider/#22fffa955002
1. I would recommend that marketers stay on the sidelines until the industry
matures. - Kristopher Jones, LSEO.com
4. Use A Strategic Approach The capabilities of 360-degree photo/video have
powerful applications in many industries, including real estate, retail and tourism. A
360-degree view has a better chance of selling a house than a static image. -
Brock Murray, seoplus+
7. Prepare For Tomorrow's Consumer Expectations Today, 360-degree photos
and videos are very helpful in industries such as the auto industry or real estate where
visualizing the product is essential. As VR continues to grow, 360-degree photos and
videos will likely become a standard. The consumers' expectations will likely adjust to
needing to learn more about the overall "360-degree" experience of the restaurant for
example, not just a picture of the dish. - Ahmad Kareh, Twistlab Marketing
11. Create An Emotional Connection 360-degree multimedia is a brilliant tool for
meaningful storytelling, as it allows the consumer to be transported to the experience
you want them to have, bringing the story to life. Companies should take advantage of
these tools to transform products into experiences, cultivating an immersive and
emotional connection with the brand. - Joey Hodges, Demonstrate PR
JUN 28, 2017 by Forbes Agency Council
360°: the role in PropTech? #1b
Use for real estate agents
A four-wheeled tripod outfitted with a computer, 360-
degree camera and sensors can roam properties,
producing highly choreographed, immersive videos that
would be difficult — if not impossible — to replicate with
a normal video camera.
VirtualAPT (Brooklyn, NYC) offers residential tour service at now $1/ft² (~10.8$/m²), and for commercial uses,
for a monthly fee per building or $0.50/ft² (~5.4$/m²) for separate units.
Generated by technology from companies such as Matterport, 3-D home tours allow users to jump between
360-degree photos — sometimes situated within a 3-D model.
● A rover can shoot 360-degree footage of
a home while moving along a pre-plotted
route.
● Made by VirtualAPT, the videos can
include on-camera presentations from
real estate agents.
● They're an alternative to 3-D homes tours
from companies such as Matterport.
https://www.youtube.com/watch?v=JhfQK-tDvGU
360°: the role in PropTech? #2a
Use for construction and as a tool for constructing 4D/5D/6D BIM (Building Information Model)
Construction site manager
manually taking photos of the
progress.
- Time-consuming to walk through
and take photos
- No full coverage of site
- Might forget some spots
- Nice initial 3D BIM not properly
maintained during construction site.
+ Ideally have a drone inspecting the
whole construction site with an on-
board 360 degree video and a
LIDAR / laser scanner.
+ One can go back in time and see
who of the subcontractors for
example are responsible for possible
problems
https://doi.org/10.1186/s40327-014-0016-9
360°: the role in PropTech? #2b
360 videos registered or not to 3D BIM model allows inspection of the progress (“4D BIM”) in the
construction site also retrospectively, and can possibly reduce legal battles when it is clearer who is
the one to be held responsible in case of discrepancies between as-built and as-planned data.
VISUAL ASSET MANAGEMENT Visual Asset Management (VAM) service digitizes industrial
and infrastructure assets using 360 degree images, 3D Models, and relative asset information.
3D MODELING We thrive on enabling 3D realistic visualization to projects while preserving the
minute details necessary to portray our world.
360 VIDEO 360 video enables viewers to be at the center of any medium, allowing for a unique
visual experience and situational awareness from any device.
VIRTUAL REALITY OcuTech’s virtual reality solutions stimulate creative thinking and enhanced
information sharing, allowing for a one-of-a-kind virtual experience.
Ocutech from Houston, Texas, USA is
already providing these types of
services
https://ocutech360.com/3d-architectural-visualization-solution/#3dvrvideo
360° imaging + SfM
360° into smartphones: how big will it be?
https://www.engadget.com/2017/07/10/future-of-smartphone-camera/
1) Augmented reality
2) Dual-lens cameras
3) Better lenses
4) 4K recording
5) Thermal imaging
6) Optical zoom
7) 360 video
“Several smartphone makers, including Samsung and Huawei, have already released add-on 360-
degree cameras for their handsets, but this is something that could eventually be integrated into the
phones themselves. Immersive 360-degree videos are gradually making their mark, with Facebook
among the big firms pushing the technology, while virtual reality companies are gradually introducing
more 360-VR content that can be viewed from mobile phones.”
https://techcrunch.com/2016/08/30/the-future-of-mobile-video-is-virtual-reality/
Are 360 cameras the future?
https://youtu.be/i8EUerX90-0 TechAltar
So will teens in big
numbers ever apply
Snapchat bunny ears to
immersive 360 degree
videos?
360° into smartphones: plenty of options coming #1
Acer’s new Holo 360 degree camera
is essentially a smartphone
Acer has announced its entry into the VR
video market with a device that’s half
360-degree camera, half smartphone.
http://www.trustedreviews.com/news/acer-s-new-holo-360-degree-camera-is-essentially-a-smartphone-2953609
Paul Monckton CONTRIBUTOR
I write about photography and related subjects
https://www.forbes.com/sites/paulmonckton/2016/05/31/worlds-first-live-smartphone-vr-camera/#9fea6921a8b0
Yesterday at this year’s Computex trade show in Taipei,
Quanta Computer and ImmerVision jointly announced what
is claimed to be the world’s first 360-degree live VR
streaming camera for smartphones, with demos starting from
today. The, as yet unnamed, camera fits in the palm of the
hand and is designed to attach magnetically to any
smartphone. It comes with a 360-degree by 187-degree lens
and uses a Sony Exmor-HDR imaging sensor to produce 16
megapixel panoramic images.
ImmerVision's Panamorph lens makes more efficient use of an image sensor
(Image credit: ImmerVision)
THIS ADD-ON CAMERA WILL TURN YOUR
SMARTPHONE INTO A 360 CAMERA. JULY 26, 2017
ION360 U 4K 360-Degree Smartphone Camera
is comprised of a 360 camera that goes on top of
Essential's 360 Camera Is the World's Smallest
360-Degree Personal Camera for a Smartphone
30 May 2017
http://gadgets.ndtv.com/mobiles/news/essentials-360-camera-is-the-worlds-smallest-360-degree-personal-camera-for-a-smartphone-1705826
After months of teasing, Android creator Andy Rubin has
finally unveiled the Essential Phone that features a near
bezel-less display that tries to outdo Samsung's Galaxy
S8. Essential's 360 camera, which weighs around 35
grams and is being called the world's smallest 360-
degree personal camera by the company, includes dual
12-megapixel fisheye sensors that can capture 4K 360
video at 30fps. The camera also features 4 microphones
to capture sound in 3D. The 360 camera can be bought
along with the Essential Phone for an additional $50, or
can be bought separately which will cost you $199.
@essential, Palo Alto, CA, essential.com
360° into smartphones: plenty of options coming #2
ProTruly’s Darling
https://www.theverge.com/2017/3/5/14809182/protruly-darling-360-degree-camera-smartphone
A company called HT Optical
that makes the cameras
found on ProTruly’s devices.
The company said that it is
working on a much smaller
360 camera module that will
actually fit into a 7.6 mm thick
smartphone and will be
capable of capturing 16 MP
photos and shoot 4K videos.
What’s even more interesting
is that the module will only
add an extra 1 mm to the
overall thickness of a device.
https://www.theverge.com/circuitbreaker/2017/2/22/14698026/huawei-360-degree-camera-honor-vr-smartphones
http://360rumors.com/
https://www.vrfocus.com/2017/07/360-degree-video-editing-app-for-smartphones/
V360 - 360 video editor, Avincel Group Inc
360-Degree Video Editing App For Smartphones: V360 editing suite already out for Android, with iOS version coming soon.
360° into smartphones: convergence with AI players of course
https://www.embedded-vision.com/news/movidius-low-power-vpu-technology-delivers-4k-vr-pixel-processing-performance-motorola%E2%80%99s-newest
Movidius’ Myriad 2 Vision Processing Unit (VPU) technology,
known for its image signal processing and computer vision
capabilities with high energy efficiency, was selected by
Motorola Mobility to power their newest Moto Mod: the 360
Camera. Moto Mods are unique modular accessories for
Motorola smartphones that bring advanced functionality
beyond traditional smartphone features. Motorola’s newest
Moto Mod brings users the ability to live stream 360° videos
while preserving battery life.
Say Hello to the moto z² Force Edition with moto mods
https://www.youtube.com/watch?v=0moMnChM6Ds
https://www.wsj.com/articles/intel-to-buy-semiconductor-startup-movidius-1473170441
https://www.altera.com/solutions/industry/automotive/applications/drive-assistance/surround-view-camera.html
http://www.nvidia.co.uk/object/drive-px-uk.html
360° Video SfM: obvious extension to combine both
Instead of manually rotating your camera, image all angles simultaneously while going through the
rooms in an apartment
https://uploadvr.com/adobe-algorithm-6dof-360-cam/
http://variety.com/2017/digital/news/adobe-6dof-vr-video-algorithms-1202394491/
Adobe Motion Parallax demo
https://youtu.be/37Z4f6p1HOY
https://www.roadtovr.com/adobes-new-research-aims-give-depth-monoscopic-360-video/: Other techniques to achieve 6-DoF VR video
usually require light-field cameras like HypeVR’s crazy 6k/60 FPS, LiDAR rig or Lytro’s giant Immerge camera. While these undoubtedly will
produce a higher quality 3D effect, they’re also custom-built and ungodly expensive.
6-DOF VR videos with a single 360-camera
Jingwei Huang ; Zhili Chen ; Duygu Ceylan ; Hailin Jin, Virtual Reality (VR), 2017 IEEE
http://dx.doi.org/10.1109/VR.2017.7892229, 18-22 March 2017
Given a 360-video captured by a single spherical panorama camera, in an offline pre-processing stage, we recover
the camera motion and the scene geometry first by performing structure-from-motion (SfM) followed by dense
reconstruction. Then, in real-time we playback the video in a VR headset where we track the 6-DOF motion of the
headset and synthesize new views by a novel warping algorithm.
360° Video SfM: Korea Advanced Institute of Science and Technology (KAIST)
Spherical panoramic cameras (Ricoh Theta S, Samsung
Gear 360 and LG 360)
Our sphere sweeping algorithm
enables to compute all-around
dense depth maps, minimizing the
loss of spatial resolution. With the
estimated all-around image and
depth map, we have shown
practical utilities by introducing
360° stereoscopic and anaglyph
images as VR contents.
European Conference on Computer Vision ECCV
2016: Computer Vision – ECCV 2016 pp 156-172
https://doi.org/10.1007/978-3-319-46487-9_10
All-Around Depth from Small Motion with a Spherical Panoramic Camera. Sunghoon Im, Hyowon Ha, François Rameau, Hae-Gon Jeon, Gyeongmin Choe, In So Kweon
Range Sensing
Structured Light and Time-of-Flight
Microsoft Kinect: democratizing structured light scanning
https://arxiv.org/abs/1505.05459
Structured light A sequence of known patterns is
sequentially projected onto an object, which gets
deformed by geometric shape of the object. The
object is then observed from a camera from a
different direction. By analyzing the distortion of
the observed pattern, i.e. the disparity from the
original projected pattern, depth information can
be extracted
The Time-of-Flight (ToF) technology is based on
measuring the time that light emitted by an illumination
unit requires to travel to an object and back to the sensor
array. The Kinect ToF camera applies this CW intensity
modulation approach. Due to the distance between the
camera and the object (sensor and illumination are
assumed to be at the same location), and the finite speed
of light c, a time shift φ [s] is caused in the optical signal
which is equivalent to a phase shift in the periodic signal.
This shift is detected in each sensor pixel by a so-called
mixing process. The time shift can be easily transformed
into the sensor-object distance as the light has to travel
the distance twice,
Cited by 65 articles - see Related articles
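As a back-of-the-envelope check of that relation (the 16 MHz modulation frequency below is just an example value, not a claim about the Kinect's actual setting): the measured phase shift gives a time shift, and halving the resulting time-of-flight distance accounts for the round trip.

```python
# Continuous-wave ToF: phase shift -> time shift -> distance (round trip).
import math

C = 299_792_458.0                     # speed of light, m/s

def tof_distance(phase_shift_rad, modulation_freq_hz):
    time_shift = phase_shift_rad / (2.0 * math.pi * modulation_freq_hz)
    return C * time_shift / 2.0       # divide by two: light travels out and back

print(tof_distance(math.pi / 2, 16e6))    # ~2.34 m for a quarter-cycle shift
```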
KinectFusion: scanning with Kinect
https://doi.org/10.1145/2047196.2047270 Cited by 1356 articles, see Related articles
https://arxiv.org/abs/1704.01047
https://arxiv.org/abs/1612.02859
The semantic cue from floorplan
(i.e., door detection) resolves
ambiguities. The figure shows the
best placement based on the unary
potential with or without the
semantic cue
We show qualitative results on ModelNet using the TSDF encoding (Curless and Levoy, 1996) and 4 views. The
same TSDF truncation threshold has been used for traditional fusion, our OctNetFusion approach and the ground
truth generation process. While the baseline approach is not able to resolve conflicting TSDF information from
different viewpoints, our approach learns to produce a smooth and accurate 3D model from highly noisy input.
By learning the structure of real world 3D objects and scenes, our approach is further able to
reconstruct occluded regions and to fill gaps in the reconstruction. We evaluate our approach
extensively on both synthetic and real-world datasets for volumetric fusion. Further, we apply
our approach to the problem of 3D shape completion from a single view where our approach
achieves state-of-the-art results.
Kinect tweaks: depth resolution improvements with polarization measurement?
http://news.mit.edu/2015/object-recognition-robots-0724
https://youtu.be/m6sStUk3UVk
http://news.mit.edu/2015/algorithms-boost-3-d-imaging-resolution-1000-times-1201
https://doi.org/10.1007/s11263-017-1025-7
https://doi.org/10.1364/OE.25.001173
Range Sensing: plenty of options
http://3dscanexpert.com/photogrammetry-benchmarks-remake-vs-photoscan-vs-realitycapture-vs-zephyr/
This post is just an example based on a single photoset from a single
object. That makes it zero percent scientific. Also, RealityCapture
might have won this Drag Race in terms of both speed with the Fast
preset and quality with the Normal preset, but an organic object like
this is very favorable to its algorithms. Read my Full RC Review to see
that it can’t always handle non-organic objects well.
COMMERCIAL SOFTWARE
http://3dscanexpert.com/
By Nick Lievendag Entrepreneur at the intersection of Creativity × Technology. Writes, Speaks and Consults about 3D
Capture (3D Scanning & Photogrammetry). Founder of 3D Scan Expert.
Matterport dominating Real Estate scanning
This $4,500 camera turns the real world into the virtual one. Today, Matterport
’s hardware is a hit with real estate agents. But fueled by the $30 million Series C
it just raised, Matterport’s software and partnership with Google’s Project Tango
could let you wave your phone around to create VR tours of anywhere you want.
https://techcrunch.com/2015/06/25/matterport/
https://www.crunchbase.com/organization/matterport#/entity
Matterport spawned out of the Xbox Kinect hacker scene in 2010. Founder
Matt Bell had been working for a gesture recognition company that relied on a
$50,000 camera and expert operators to produce a huge CAD file that could
only be accessed through a specialized application. Bell was flabbergasted by
the power of the $150 Kinect. He realized the potential for a relatively cheap
device with similar technology that could let anyone map out rooms to create
3D models accessible straight from the web.
https://youtu.be/HZX8RupfQls
Matterport: research on semantic indoor segmentation
We collected the data using the Matterport Camera, which combines 3
structured-light sensors to capture 18 RGB and depth images during a
360° rotation at each scan location. The output is the reconstructed 3D
textured meshes of the scanned area, the raw RGB-D images, and camera
metadata. We used this data as a basis to generate additional RGB-D data
and make point clouds by sampling the meshes. We semantically annotated
the data directly on the 3D point cloud, rather than images, and then
projected the per point labels on the 3D mesh and the image domains.
https://arxiv.org/abs/1702.01105 | Cited by 3 - Related articles
https://arxiv.org/abs/1702.07600
https://www.fastcompany.com/3059281/introducing-hover-an-ai-powered-indoor-safe-camera-drone
+
Indoor scanning with tripod-based Matterport
still requires a lot of manual work, and at some
point will be updated to autonomous AI-
powered indoor drone for better user
experience.
Matterport: technology patents
Capturing and aligning multiple 3-dimensional scenes
www.google.com/patents/US8879828 Grant - Filed Jun 29, 2012 - Issued Nov 4, 2014 - Matthew Bell - Matterport, Inc.
Multi-modal method for interacting with 3d models
www.google.com/patents/US20130342533 App. - Filed Jun 24, 2013 - Published Dec 26, 2013 - Matthew Bell - Matterport, Inc.
Identifying and filling holes across multiple aligned three-dimensional scenes
www.google.com/patents/US8861840 Grant - Filed Oct 14, 2013 - Issued Oct 14, 2014 - Matthew Bell - Matterport, Inc.
Building a three-dimensional composite scene
www.google.com/patents/US8861841 Grant - Filed Oct 14, 2013 - Issued Oct 14, 2014 - Matthew Bell - Matterport, Inc.
Processing and/or transmitting 3D data
www.google.com/patents/US9396586 Grant - Filed Mar 14, 2014 - Issued Jul 19, 2016 - Matthew Tschudy Bell - Matterport, Inc.
Semantic understanding of 3d data
www.google.com/patents/US20160055268 App. - Filed Jun 6, 2014 - Published Feb 25, 2016 - Matthew Tschudy Bell - Matterport, Inc.
Selecting two-dimensional imagery data for display within a three-dimensional model
www.google.com/patents/EP3120329A1?cl=en App. - Filed Mar 13, 2015 - Published Jan 25, 2017 - Matthew Tschudy Bell - Matterport, Inc.
Classifying, separating and displaying individual stories of a three-dimensional model of a multi-story structure based on captured image data of the multi-story structure
www.google.com/patents/US20160217225 App. - Filed Jan 28, 2016 - Published Jul 28, 2016 - Matthew Tschudy Bell - Matterport, Inc.
Semantic understanding of 3d data
US 20160055268 A1
ABSTRACT Systems and techniques for processing three-
dimensional (3D) data are presented. Captured three-
dimensional (3D) data associated with a 3D model of an
architectural environment is received and at least a portion of
the captured 3D data associated with a flat surface is
identified. Furthermore, missing data associated with the
portion of the captured 3D data is identified and additional 3D
data for the missing data is generated based on other data
associated with the portion of the captured 3D data.
REFERENCED BY
US9576184 Textura Planswift Corporation
Detection of a perimeter of a region of interest in a floor plan document
US20130328872 Tekla Corporation
Computer aided modeling
US20150227644 Pictometry International Corp.
Method and system for displaying room interiors on a floor plan
US20160063722 Textura Planswift Corporation
Detection of a perimeter of a region of interest in a floor plan document
US20160379405 Jim S Baca
Technologies for generating computer models, devices, systems, and
methods utilizing the same
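The abstract above boils down to finding flat surfaces in the capture and synthesizing points where the scan left gaps. As an illustration only (not the patented method), a minimal sketch that fits a plane to a patch of captured points by SVD and resamples the patch on a regular grid, which fills holes inside the patch as a side effect:

```python
import numpy as np

def fit_plane(points):
    """Least-squares plane through a point set: returns (centroid, unit normal)."""
    centroid = points.mean(axis=0)
    # Smallest singular vector of the centred points is the plane normal
    _, _, vt = np.linalg.svd(points - centroid)
    return centroid, vt[-1]

def fill_plane_hole(points, spacing=0.02):
    """Resample a (roughly planar) patch on a regular grid, covering gaps in the scan."""
    centroid, normal = fit_plane(points)
    # Build an orthonormal basis (u, v) spanning the plane
    u = np.cross(normal, [1.0, 0.0, 0.0])
    if np.linalg.norm(u) < 1e-6:           # normal was parallel to the x-axis
        u = np.cross(normal, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(normal, u)
    # Project the captured points into plane coordinates to get the patch extent
    rel = points - centroid
    pu, pv = rel @ u, rel @ v
    gu, gv = np.meshgrid(np.arange(pu.min(), pu.max(), spacing),
                         np.arange(pv.min(), pv.max(), spacing))
    # Lift the grid back to 3D: these points also cover holes inside the patch
    return centroid + gu.ravel()[:, None] * u + gv.ravel()[:, None] * v
```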
GoogleTangoTechnology
http://www.deccanchronicle.com/technology/gadgets/210717/i
s-google-tango-relevant-in-2017.html
https://arstechnica.co.uk/gadgets/2016/12/google-
tango-phab-2-pro-review/
A Project Tango device ‘sees’ the environment around it
through a combination of three core functions.
First up is motion tracking, which allows the device to
understand its position and orientation using a range of
sensors (including accelerometer and gyroscope).
Then there’s depth perception, which examines the
shape of the world around you. Intel provides a vital cog in
this respect with its RealSense 3D camera. With this
component on board, a device can gain accurate gesture
control and snappy 3D object rendering among other
things.
Finally, Project Tango incorporates area learning, which
means that it maps out and remembers the area around it.
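Depth perception in practice means unprojecting every depth pixel into a 3D point with the camera intrinsics. A minimal pinhole-camera sketch, with made-up intrinsics standing in for the values a Tango/RealSense device would report:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Unproject a depth image (metres) into camera-space 3D points (pinhole model)."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]          # drop pixels with no depth reading

# Example with made-up intrinsics for a 640x480 depth sensor:
# cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
```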
Point Cloud Framework for Rendering 3D
Models Using Google Tango
Maxen Chung, Santa Clara University
Julian Callin, Santa Clara University
http://scholarcommons.scu.edu/cseng_senior/84
https://doi.org/10.1007/s11227-016-1891-8
Project Tango Tablet Development Kit, recently introduced by
Google, Inc. Equipped with the most powerful processor available
to date on a consumer-level mobile platform (i.e., NVIDIA Tegra K1
whose 192 programmable CUDA-enabled GPU cores use the
same efficient Kepler architecture found in the world’s most
powerful supercomputers and workstations) along with several
sensors (motion tracking camera, 3D depth sensor,
accelerometer, ambient light sensor, barometer, compass, GPS,
gyroscope), this mobile device can readily utilize GPU computing
making it an ideal platform for developing real-time contextual
awareness applications for the visually impaired (VI). Moreover,
being compact, lightweight, potentially wearable, relatively
discreet and affordable render it aesthetically appealing, socially
acceptable and accessible for VI users
GoogleTangoExampleApplications#1
We broke the news yesterday that Google
was producing a prototype 3D sensing
smartphone called Project Tango. We also
broke down the capabilities of the vision
processor inside the device and talked
about what it means for the future of
phones.
Now, we’ve got an exclusive look in the
video below at a real 3D indoor map of a
room captured with one of the prototype
devices by Matterport.
https://techcrunch.com/2014/02/21/heres-an-actual-3d-indoor-map-of-a-room-captured-with-googles-project-tango-phone/
https://matterport.com/mobile-3d-capture/
https://developers.google.com/tango/apis/overview
Daydream is Google’s platform for virtual
reality. It consists of Daydream-ready phones,
Daydream-ready headsets and controllers, and
Daydream apps. Daydream View is the first
Daydream-ready headset and controller
designed and developed by Google. It also
comes with a touch-and-motion enabled
controller so you can easily interact with VR
apps.
With the Daydream View, you will be able to
explore new worlds through Google Street View
and Fantastic Beasts. Kick back in your
personal cinema with YouTube, Netflix, Hulu,
and HBO. Get in the game with Gunjack 2,
LEGO® BrickHeadz, and Need for Speed.
That’s just the beginning of the VR possibilities
with Daydream.
http://www.techphlie.com/
2017/07/what-is-google-ta
ngo-and-daydream.html
Google has notably been pushing AR/VR
technologies with its latest Android OS. The
most prominent introduction however, has
been the ASUS ZenFone AR launch that took
place at CES, 2017, earlier this year.
GoogleTangoExampleApplications#2
Google Tango SDK
examples: how to
make a floor plan in
50 seconds
Alexander Grau
Google Tango and
Revit
Leonardo Manzione
https://www.youtube.com/watch?v=A-4cuJ1kOQ4
“GoogleTango”withoutdepth sensors
I have always believed that bringing 3D to consumers could only work without the need for
dedicated depth sensors. This pure-software approach is already being embraced for
Augmented Reality with Apple’s upcoming ARKit and Google’s ARCore which was announced
last week. Both can give modern smartphones AR-capabilities by just using the regular camera(s),
instead of using dedicated sensors like Tango.
https://3dscanexpert.com/sony-3d-creator-brings-sensor-less-3d-scanning-consumers/
But yesterday, at IFA Berlin, Sony announced its
latest smartphone, the XZ1. Which has all the
bells and whistles you expect from a flagship
Android phone but also an app called 3D Creator
. It basically does exactly what Microsoft showed
last year, but is actually available — albeit
exclusive for the XZ1.
https://www.sonymobile.com/global-en/products/phones/xperia
-xz1/3d-creator/
AppleDepthSensing
The iPhone X’s notch is basically a Kinect
by Paul Miller @futurepaul, Sep 17, 2017, 10:00am EDT
https://www.theverge.com/circuitbreaker/2017/9/17/16315510/iphone-x-notch-kinect-apple-primesense-microsoft
And now, in late 2017, Apple is going to sell a phone with a front-facing depth camera. Unlike the original Kinect,
which was built to track motion in a whole living room, the sensor is primarily designed for scanning faces and
powers Apple’s Face ID feature. Apple’s “TrueDepth” camera blasts “more than 30,000 invisible dots” and can
create incredibly detailed scans of a human face. In fact, while Apple’s Animoji feature is impressive,
the developer API behind it is even wilder: Apple generates, in real time, a full animated 3D mesh of your face,
while also approximating your face’s lighting conditions to improve the realism of AR applications.
How Apple’s iPhone X
TrueDepth Camera Works
By David Cardinal on September 14, 2017
Beyond the Camera: Facial Motions and
Changing Features Getting a depth estimate for
portions of a scene is only the beginning of what’s
required for Apple’s implementation of secure facial
recognition and Animojis. For example, a mask could
be used to hack a facial recognition system that relied
solely on the shape of the face. So Apple is using
processing power to learn and recognize 50 different
facial motions that are much harder to forge. They also
provide the basis for making Animoji figures seem to
mimic the phone’s owner.
How Secure is Face ID? Given how willing Apple is
to commit to using Face ID for financial transactions,
I’m sure they have pushed the limits beyond either
simple 3D models or 2D motion. It is likely they are
relying on the phone’s ability to recognize minute facial
movements and feed them into a machine learning
system on the A11 Bionic chip that will add another
layer of security to the system. That piece will also be
key in helping the phone decide whether you’re the
same person when you put on a pair of glasses, a hat,
or grow a beard — all of which Apple claims Face ID
will handle.
Laserscanning
LIDARtechnology
LaserScanning LiDAR(LightDetection AndRanging)
http://dx.doi.org/10.1038/nphoton.2010.148
http://dx.doi.org/10.1080/19479832.2013.811124
3D building modeling
(BIM) using images and
LiDAR: a review
https://techcrunch.com/2017/07/12/nyu-releases-the-largest-lidar-
dataset-ever-to-help-urban-development/
http://ia.cr/2017/613
https://www.theregister.co.uk/2017/06/27/lidar_spoofed_bad_news_for_self_driving_cars/
Velodyne The most in the news due to autonomous driving
http://velodynelidar.com/
https://www.youtube.com/watch?v=8nTFjVm9sTQ https://www.youtube.com/watch?v=nXlqv_k4P8Q
http://spectrum.ieee.org/cars-that-think/transportation/se
nsors/velodyne-announces-a-solidstate-lidar
http://spectrum.ieee.org/cars-that-think/transportati
on/sensors/israeli-stealth-startup-innoviz-promises-1
00-solidstate-automotive-lidar-by-2018
http://spectrum.ieee.org/transportation/advanced-cars/cheap-lidar-the-k
ey-to-making-selfdriving-cars-affordable
RieglA rangeof differentlaserscanners
http://www.riegl.com/products/unmanned-scanning/
RIEGL VZ-400 Indoor Scanned Data
by Jamis Choi, Published on Apr 1, 2010
https://www.youtube.com/watch?v=hOf0hpCn92I
Scanning made simple with RiSOLVE - RIEGL's new 3D Scene Capture Software
Published on Oct 4, 2012 (feat. horrible lounge music)
https://www.youtube.com/watch?v=lbxvzMlTWyg
Rieglsystemin practice
https://doi.org/10.1109/IROS.2016.7759501
Namely, we propose a method for the automatic selection of feature coordinate
locations, and introduce the concept of localized automatic relevance
determination (LARD) to the Hilbert Maps framework, in which different
dimensions in the projected Hilbert space operate within independent length scale
values. The proposed technique was tested against other state-of-the-art 3D
scene reconstruction tools in three different datasets: a simulated indoors
environment, RIEGL laser scans and dense LSD-SLAM pointclouds. The results
testify to the proposed framework’s ability to model complex structures and
correctly interpolate over unobserved areas of the input space while achieving
real-time training and querying performances.
HandheldScanning GeoSLAMZEB-REVO
Handheld Laser Scanning -
ZEB-REVO
The ZEB-REVO is the latest, lightweight
revolving laser scanner from GeoSLAM.
Handheld, pole-mounted or attached to a
mobile platform, the ZEB-REVO can
record more than 40,000 measurement
points per second from the survey
environment.
NEW ZEB-CAM
The new ZEB-CAM is an optional upgrade
for standard ZEB-REVO systems. Simply
attach ZEB-CAM to the underside of a
standard REVO and begin scanning
immediately.
The ZEB-CAM captures live video footage
of the survey environment and adds
contextual video and imagery to scan data
to aid feature identification.
Optical flow technology is utilised to
accurately synchronise the video and scan
together in GeoSLAM's Desktop software.
http://www.3dlasermapping.com/zeb-revo-
handheld-laser-scanning/
https://youtu.be/k8q5xr_eLgk
GeoSlamvs.Leica Portablescanningquality
http://dx.doi.org/10.1117/12.2270761
The paper investigates the performances of two portable
mobile mapping systems (MMSs), the handheld GeoSLAM
ZEB-REVO and Leica Pegasus:Backpack, in two typical
user-case scenarios: an indoor two-floors building and an
outdoor open city square.
Note! This paper would have
been even nicer with a
‘gold standard’ giving the
“correct measurements”
instead of just comparing
two “good enough” scanners.
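Lacking a gold standard, the usual sanity check is a cloud-to-cloud comparison between the two registered scans. A minimal sketch of the nearest-neighbour RMS distance that such comparisons typically report, assuming both clouds are already in the same coordinate frame:

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_cloud_rms(reference, test):
    """RMS of nearest-neighbour distances from every test point to the reference cloud.
    Both clouds are (N, 3) arrays, assumed registered in the same coordinate frame."""
    dists, _ = cKDTree(reference).query(test, k=1)
    return float(np.sqrt(np.mean(dists ** 2)))
```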
ResearchScanners SensorFusion
The Indoor Multi-sensor Acquisition System
(IMAS) presented in this paper consists of a wheeled
platform equipped with two 2D laser heads, RGB
cameras, thermographic camera, thermohygrometer,
and luxmeter. One of the laser scanning sensors is
foreseen to obtain the building map and the navigation
information, and the other one to the 3D environment
reconstruction. The thermographic and optical
images, and the geometric and comfort data are
synchronized and automatically linked to trajectory
positions, so that they are georeferenced in the
building in terms of a relative positioning system.
Software interface for virtual immersive navigation and ex situ data analysis.
http://dx.doi.org/10.3390/s16060785
AppliedPointCloud Scans Accessibility
Point Clouds to Indoor/Outdoor Accessibility
Diagnosis
J. Balado, L. Díaz-Vilariño, P. Arias, I. Garrido
https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W4/287/2017/isprs-annals-IV-2-
W4-287-2017.pdf
This work presents an approach to automatically detect structural floor elements such as steps or ramps in
the immediate environment of buildings, elements that may affect the accessibility to buildings. The
methodology is based on Mobile Laser Scanner (MLS) point cloud and trajectory information. The
methodology is tested in a real case study, consisting of 100 m of an urban street. Ground elements are
correctly classified in an acceptable computation time. Steps and ramps also are exported to GIS software to
enrich building models from Open Street Map with information about accessible/inaccessible entrances and
their locations.
http://www.wired.co.uk/article/wayfindr-app
A project initiated by the Royal London Society for the
Blind's (RLSB) Youth Forum has led to the prototyping of
a new app called Wayfindr, which has been built especially
to help blind and partially sighted people use London's
transport network independently. The app relies on
smartphones and iBeacons and has been developed in
collaboration with global digital product design studio
ustwo
Our Open Standard gives you
the tools to create inclusive
and consistent experiences for
your vision impaired
customers. From transport
networks and shopping
centres, to hospitals and any
other indoor space - we can
help. Through our on-site trials
and consultancy we will work
together with you to
understand how digital
wayfinding can make your
estate accessible.
https://www.wayfindr.net/
Post-processing
Raw point clouds are massive and possibly contain a lot of
redundant data points
DataQuality compromise between file size, computational time and quality
3D model reconstruction from point cloud processed either with OpenSFM,
VisualSFM or Pix4D (top row) to mesh model (middle row) to final textured 3D
model (bottom row) across a series of downsampled Sky Ranger UAV imagery, including full
resolution (first column), half resolution (second column) and quarter resolution (last
column).
Bolick and Harguess (2016), http://dx.doi.org/10.1117/12.2224677
Garbage in, garbage out holds true as always. The
more high-quality images / points you have as input, the
higher the reconstruction quality will be.
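The standard compromise is to downsample before heavy processing. A minimal voxel-grid downsampling sketch in NumPy (the `voxel_size` value is a placeholder; pick whatever resolution the application tolerates):

```python
import numpy as np

def voxel_downsample(points, voxel_size=0.05):
    """Replace all points falling into the same voxel with their centroid."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    # Group points by voxel key and average each group
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    sums = np.zeros((len(counts), 3))
    np.add.at(sums, inverse, points)
    return sums / counts[:, None]
```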
Top-left: points sampled on a sphere and corrupted
with a lot of noise. Top-right: reconstructed surface
mesh. Bottom-left: smoothed point set. Bottom-
right: reconstructed surface mesh.
Reconstruction error (mm) against number of points
for the Bimba con Nastrino point set with 1.6M points
as well as for simplified versions.
CGAL 4.10 - Poisson Surface Reconstruction
The sensitivity of biological finite element models to the
resolution of surface geometry: a case study of
crocodilian crania: “Example of the simplified models. C.
moreletti models composed of 20k, 30k, 90k and 300k
surface (mesh) elements.”
https://doi.org/10.7717/peerj.988
point cloud & mesh processing
MAY 27 2017, posted by Taylor Wang
The final goal is to get a fully editable NURBS CAD
model so that it can be modified by any CAD
software to improve the design or reproduce the
product.
PointCloudLibrary (PCL) The most popular open-source library
http://unanancyowen.com/en/pcl-with-velodyne/
https://www.youtube.com/watch?v=7BUFxkyH1r0
https://doi.org/10.1109/MRA.2012.2206675
Cited by 186 articles - see Related articles
Otherlibraries CGALandresearchcode
Driftcorrection forproperimageregistration
https://doi.org/10.1109/ROBOT.2010.5509312
Correcting for drift (distortion) between different
scans or overlapping point clouds with added
velocity information for ICP (Iterative Closest Point)
algorithm.
(a) is a given environment. Blue points in (b) shows distortion of
the scan, and red points in (b) show compensated scan.
Transformation estimated using distorted data includes inevitable
errors (c). Transformation estimated from the rectified scan gives
us more accurate results (d).
Kaarta - Common point cloud registration issues
http://www.kaarta.com/cloud-registration-issues/
Published: 8 March 2017
http://dx.doi.org/10.3390/s17030539
Keywords: LiDAR; inertial measurement unit; iterative closest
point; iterated sigma point Kalman filter; time delay calibration
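For reference, the baseline these drift-corrected variants extend is plain point-to-point ICP: find nearest-neighbour correspondences, solve for the rigid transform, repeat. A minimal sketch (not the velocity-augmented method cited above):

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src onto dst (Kabsch/SVD)."""
    cs, cd = src.mean(axis=0), dst.mean(axis=0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:           # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(source, target, iterations=30):
    """Basic point-to-point ICP: returns the source cloud aligned to the target."""
    tree = cKDTree(target)
    current = source.copy()
    for _ in range(iterations):
        _, idx = tree.query(current, k=1)          # nearest-neighbour correspondences
        R, t = best_rigid_transform(current, target[idx])
        current = current @ R.T + t
    return current
```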
DataReduction andsimplificationfor storage
Imran Ashraf ; Soojung Hur ; Yongwan Park
https://doi.org/10.1109/ACCESS.2017.2699686
LIDAR produces a large point cloud, but, while generating
images for a limited field of view, data sparsity results in poor
quality images. Moreover, 3D to 2D data transformation also
involves data reduction, which further deteriorates the
quality of images.
http://dx.doi.org/10.1117/12.2270833
31 October 2016
https://doi.org/10.1109/TIP.2016.2623488
https://www.google.com/patents/US9582939
https://arxiv.org/abs/1609.00893
Keywords: Tensor networks, Function-related tensors, CP decomposition,
Tucker models, tensor train (TT) decompositions, matrix product states (MPS),
matrix product operators (MPO), basic tensor operations, multiway component
analysis, multilinear blind source separation, tensor completion,
linear/multilinear dimensionality reduction, large-scale optimization problems,
symmetric eigenvalue decomposition (EVD), PCA/SVD, huge systems of linear
equations, pseudo-inverse of very large matrices, Lasso and Canonical
Correlation Analysis (CCA)
https://doi.org/10.1016/j.isprsjprs.2016.06.012
In-base point cloud management pipeline in the point cloud server (PCS).
DataReduction Compressing Point Clouds
Dynamic polygon cloud compression
Eduardo Pavez ; Philip A. Chou (2017)
https://doi.org/10.1109/ICASSP.2017.7952694
We introduce a compressible representation of 3D
geometry (including its attributes, such as color texture)
intermediate between polygonal meshes and point clouds
called a polygon cloud. Polygon clouds, compared to
polygonal meshes, are more robust to live capture noise
and artifacts. Furthermore, dynamic polygon clouds,
compared to dynamic point clouds, are easier to
compress, if certain challenges are addressed. In this
paper, we propose methods for compressing dynamic
polygon clouds using transform coding of color and
motion residuals.
Real-time compression of point cloud
streams
Julius Kammerl ; Nico Blodow ; Radu Bogdan Rusu ;
Suat Gedikli ; Michael Beetz ; Eckehard Steinbach
(2012)
https://doi.org/10.1109/ICRA.2012.6224647
We present a novel lossy compression approach for point
cloud streams which exploits spatial and temporal
redundancy within the point data. Our proposed compression
framework can handle general point cloud streams of
arbitrary and varying size, point order and point density.
Furthermore, it allows for controlling coding complexity and
coding precision. To compress the point clouds, we perform
a spatial decomposition based on octree data structures.
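The essence of the octree approach is coordinate quantization plus de-duplication of occupied cells; the real codecs then entropy-code the occupancy bits. A lossy, greatly simplified sketch of just the quantization step (not the streaming codec described above):

```python
import numpy as np

def quantize_cloud(points, depth=10):
    """Lossy-compress a point cloud by snapping coordinates to a 2^depth grid
    (the leaf level of an octree) and dropping duplicate occupied cells."""
    lo, hi = points.min(axis=0), points.max(axis=0)
    span = np.maximum(hi - lo, 1e-12)                    # avoid division by zero
    cells = (1 << depth) - 1
    codes = np.round((points - lo) / span * cells).astype(np.uint32)
    codes = np.unique(codes, axis=0)                     # one entry per occupied cell
    decoded = codes / cells * span + lo                  # reconstruction for inspection
    return codes, decoded

# Three small integer indices per occupied cell instead of three float64 coordinates
# per raw point; a real codec would additionally entropy-code the octree occupancy.
```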
3D Reconstruction Framework for
Multiple Remote Robots on Cloud
System
Phuong Minh Chu, Seoungjae Cho, Simon Fong, Yong Woon
Park and Kyungeun Cho (2017)
http://dx.doi.org/10.3390/sym9040055
This paper proposes a cloud-based framework that
optimizes the three-dimensional (3D) reconstruction of multiple
types of sensor data captured from multiple remote robots. A
working environment using multiple remote robots requires
massive amounts of data processing in real-time, which cannot
be achieved using a single computer. In the proposed
framework, reconstruction is carried out in cloud-based servers
via distributed data processing.
Data-drivenprocessing
Like in all the fields of computer vision, real-time scanning, post-
processing and semantic understanding are improved with
recent deep learning and artificial intelligence techniques
DeepLearningbeyondnon-euclidean problems
Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, andPierre Vandergheynst
https://doi.org/10.1109/MSP.2017.2693418
https://arxiv.org/abs/1705.10819
DeepLearningPointclouds
https://arxiv.org/abs/1704.03847
https://arxiv.org/abs/1705.03428
DeepLearningPointNet++
PointNet++: Deep Hierarchical Feature Learning on
Point Sets in a Metric Space
Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas
Stanford University, (Submitted on 7 Jun 2017)
https://arxiv.org/abs/1706.02413
Illustration of our hierarchical feature learning architecture and its application for set segmentation and classification using points in 2D
Euclidean space as an example. Single scale point grouping is visualized here.
Left: Point cloud with random point
dropout.
Right: Curve showing advantage of
our density adaptive strategy in
dealing with non-uniform density.
DP means random input dropout
during training; otherwise training is
on uniformly dense points
Scannet labeling results. PointNet captures the
overall layout of the room correctly but fails to
discover the furniture. Our approach, in contrast,
is much better at segmenting objects besides
the room layout.
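The first operation in each PointNet++ set-abstraction level is farthest point sampling of the centroids. A minimal NumPy sketch of that sampling step alone (the grouping and per-group PointNets are not shown):

```python
import numpy as np

def farthest_point_sampling(points, n_samples, seed=0):
    """Iteratively pick the point farthest from the already-selected set,
    giving well-spread centroids for PointNet++-style set abstraction."""
    rng = np.random.default_rng(seed)
    selected = [int(rng.integers(len(points)))]
    dist = np.full(len(points), np.inf)
    for _ in range(n_samples - 1):
        dist = np.minimum(dist, np.linalg.norm(points - points[selected[-1]], axis=1))
        selected.append(int(np.argmax(dist)))
    return points[selected]
```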
DeepLearning2DFeatureDescriptors
Instead of using the old-school SIFT, SURF, ORB, etc., feature
description and matching can be done with a data-driven
deep learning network as well
Note This model was trained with SfM data, which does not have strong
rotation changes. Newer models work better in this case, which will be
released soon. In the meantime, you can also use the models in the
learn-orientation, benchmark-orientation.
https://github.com/cvlab-epfl/LIFT
https://arxiv.org/abs/1603.09114 | Cited by 23 Related articles
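Whichever descriptor is used, hand-crafted or learned, the matching step itself is usually nearest-neighbour search plus Lowe's ratio test. A minimal sketch assuming two hypothetical per-keypoint descriptor arrays `desc_a` and `desc_b`:

```python
import numpy as np
from scipy.spatial import cKDTree

def match_descriptors(desc_a, desc_b, ratio=0.8):
    """Nearest-neighbour matching with Lowe's ratio test.
    desc_a, desc_b: (N, D) arrays of descriptors (hand-crafted or learned)."""
    dists, idx = cKDTree(desc_b).query(desc_a, k=2)     # two nearest matches per query
    keep = dists[:, 0] < ratio * dists[:, 1]            # keep unambiguous matches only
    return np.column_stack([np.nonzero(keep)[0], idx[keep, 0]])
```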
DeepLearning3DFeatureDescriptors
https://arxiv.org/abs/1706.04496
We present a view-based convolutional network that produces local, point-based shape descriptors.
The network is trained such that geometrically and semantically similar points across different 3D
shapes are embedded close to each other in descriptor space (left). Our produced descriptors are
quite generic — they can be used in a variety of shape analysis applications, including dense
matching, prediction of human affordance regions, partial scan-to-shape matching, and shape
segmentation (right).
In contrast to findings in the image analysis community where learned 2D
descriptors are ubiquitous and general (e.g. LIFT), learned 3D descriptors have
not been as powerful as 2D counterparts because they (1) rely on limited training
data originating from small-scale shape databases, (2) are computed at low spatial
resolutions resulting in loss of detail sensitivity, and (3) are designed to operate on
specific shape classes, such as deformable shapes.
We generate training correspondences
automatically by leveraging highly structured
databases of consistently segmented shapes
with labeled parts. The largest such database
is the segmented ShapeNetCore dataset [
Yi et al. 2016, https://www.shapenet.org/] that
includes 17K man-made shapes distributed in
16 categories
Meshgenerativeshapeswith GAN
https://arxiv.org/abs/1705.02090
Our key insight is that 3D shapes are effectively
characterized by their hierarchical organization of parts,
which reflects fundamental intra-shape relationships such as
adjacency and symmetry. We develop a recursive neural net
(RvNN) based autoencoder to map a flat, unlabeled, arbitrary
part layout to a compact code. The code effectively captures
hierarchical structures of man-made 3D objects of varying
structural complexities despite being fixed-dimensional: an
associated decoder maps a code back to a full hierarchy. The
learned bidirectional mapping is further tuned using an
adversarial setup to yield a generative model of plausible
structures, from which novel structures can be sampled.
It would be interesting to thoroughly investigate the effect
of code length on structure encoding. Finally, it is worth
exploring recent developments in GANs, e.g. Wasserstein
GAN [Arjovsky et al. 2017], in our problem setting. It would
also be interesting to compare with plain VAE and other
generative adaptations.
PointCloud generativeGANsforpointclouds #1a
https://arxiv.org/abs/1707.02392
We build an end-to-end pipeline for 3D point clouds that uses an autoencoder (AE) to
create a latent representation, and a Generative Adversarial Networks (GAN) to generate
new samples in that latent space. Our AE is designed with a structural loss tailored to
unordered point clouds. Our learned latent space, while compact, has excellent class-
discriminative ability: per our classification results, it outperforms recent GAN-based
representations by 4.3%. In addition, the latent space allows for vector arithmetic, which
we apply in a number of shape editing scenarios, such as interpolation and structural
manipulation.
We argue that jointly learning the representation and training the GAN is unnecessary for
our modality. We propose a workflow that first learns a representation by training an AE
with a compact bottleneck layer, then trains a plain GAN in that fixed latent
representation. One benefit of this approach is that AEs are a mature technology: training
them is much easier and they are compatible with more architectures than GANs. We
point to theory that supports this idea, and verify it empirically: we show that GANs
trained in our learned AE-based latent space generate visibly improved results,
even with a generator and discriminator as shallow as a single hidden layer. Within a
handful of epochs, we generate geometries that are recognized in their right object class at
a rate close to that of ground truth data. Importantly, we report significantly better diversity
measures (10x divergence reduction) over the state of the art, establishing that we cover
more of the original data distribution. In summary, we contribute:
● An effective cross-category AE-based latent representation on point clouds.
● The first (monolithic) GAN architecture operating on 3D point clouds.
● A surprisingly simpler, state-of-the-art GAN working in the AE’s latent space.
1) Autoencoder
For fixed latent representation
Vector arithmetic
2) Generative Adversarial Network
Using the fixed latent representation
In our latent-space GAN, instead of operating on the raw point cloud input, we pass the data through
our pre-trained autoencoder, trained separately for each object class with the Earth Mover’s distance
(EMD) loss function. Both the generator and the discriminator of the GAN then operate on the 512-
dimensional bottleneck variable of the AE. Finally, once the GAN training is over, the output of the
generator is decoded to a point cloud via the AE decoder. We found that very shallow designs for both
the generator and discriminator (in our case, 1 hidden layer for the generator and 2 for the
discriminator) are sufficient to produce realistic results
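A minimal PyTorch sketch of the latent-space GAN idea under the assumptions stated above: a frozen, pre-trained autoencoder supplies 512-dimensional codes, the generator has a single hidden layer and the discriminator two. This only illustrates the architecture sizes described in the paper; it is not the authors' code, and the `encoder`/`decoder` referenced in the comments are assumed to exist.

```python
import torch
import torch.nn as nn

LATENT = 512   # bottleneck width of the (pre-trained, frozen) point cloud autoencoder
NOISE = 128

class Generator(nn.Module):            # one hidden layer, operating on latent codes
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(NOISE, 256), nn.ReLU(),
                                 nn.Linear(256, LATENT))
    def forward(self, z):
        return self.net(z)

class Discriminator(nn.Module):        # two hidden layers on latent codes
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(LATENT, 256), nn.LeakyReLU(0.2),
                                 nn.Linear(256, 128), nn.LeakyReLU(0.2),
                                 nn.Linear(128, 1))
    def forward(self, x):
        return self.net(x)

# Training loop outline (vanilla GAN loss):
#   real_codes = encoder(point_clouds)                 # encoder frozen, per object class
#   fake_codes = g(torch.randn(batch, NOISE))
#   ...optimize d on real_codes vs fake_codes.detach(), then g through d(fake_codes)...
# Sampling: new_cloud = decoder(g(torch.randn(1, NOISE)))   # decoder from the same AE
```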
PointCloud generativeGANsforpointclouds #1b
Interpolating between different point clouds, using our latent
space representation. Note the interpolation between
structurally and topologically different shapes.
Generative results using our latent-space GAN. Note the
variability and fidelity of the result.
For a recap on GANs, you could see for example:
https://arxiv.org/abs/1701.07875
Cited by 106 - Related articles
What do GANs for point clouds mean in practice?
Point-cloud super-resolution (e.g. Ledig et al. 2016 for natural images), to improve
model appearance (e.g. remove staircasing), and inpainting (e.g. Iizuka et al. 2017)
to handle occlusion and gaps from indoor scans (“shape completion”). “Visual
plastic surgery” in other words (Tung et al. 2017)
Sung et al. (2015)
Data-driven Structural Priors for Shape Completion
Mönch et al. (2010)
Staircase-Aware Smoothing of Medical Surface Meshes
HardwarePointCloud Super-resolution multiplescans
https://doi.org/10.2312/SPBG/SPBG06/009-015
Cited by 47 articles
On the left, one scan of the parrot
statue, with a sample spacing of
about 1mm. Center, we combine 100
nearly identical such scans to
produce the surface in the center,
produced on a grid with sample
spacing of about 0.3mm. Notice the
noise reduction and the improvement
in the detail, for instance in the face,
neck and wing feathers. On the right,
a photograph of the parrot statue.
Super-resolution reconstruction
using only 30 input scans at the left
and increasing to 140 at the right.
Noise is reduced dramatically at the
beginning but more slowly at the end.
Surfaces were reconstructed from
subsets which were pre-registered
using all 140 scans.
For absolute measurement accuracy (e.g. Biljecki et al. 2017), one can scan the same space multiple times
A thin strip of the super-resolved
surface, and the nearby sample
points from the input scans. The
input is very noisy, but the points are
densely and randomly distributed
near the surface with few outliers, so
the average gives an accurate
representation of the surface.
(a) One scan. (b) Final super-resolved surface from 100 scans. (c) Photo of
the object (a plaster cast of a subway token). The bottom row shows some
results of other kinds of processing, to evaluate the importance of the various
steps of the algorithm. (d) One scan, bilinearly interpolated onto the finer grid
and smoothed. Detail is missing. (e) The entire algorithm except for the final
bilateral filtering step. The noise removed by the filtering seems to be residual
registration error, which perhaps could be improved. (f) Just averaging 100
scans taken without moving the scanner, using the same Gaussian kernel. Noise
is decreased, but there is aliasing from the lower-resolution grid obscuring detail
visible in (b).
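The core of this approach is registering many noisy scans and averaging them onto a finer grid with a small smoothing kernel. A greatly simplified 2.5-D sketch (height-field case, plain per-cell averaging instead of the paper's Gaussian kernel), assuming the scans are already registered:

```python
import numpy as np

def average_scans_on_fine_grid(scans, cell=0.0003):
    """Fuse registered scans of a height field ((x, y, z) points) on a finer grid by
    averaging all samples that fall into each fine cell; a simplified stand-in for
    Gaussian-kernel averaging in super-resolution from repeated scans."""
    pts = np.vstack(scans)                               # (N, 3) combined samples
    lo = pts[:, :2].min(axis=0)
    ij = np.floor((pts[:, :2] - lo) / cell).astype(np.int64)
    shape = tuple(ij.max(axis=0) + 1)
    height = np.zeros(shape)
    count = np.zeros(shape)
    np.add.at(height, (ij[:, 0], ij[:, 1]), pts[:, 2])
    np.add.at(count, (ij[:, 0], ij[:, 1]), 1)
    with np.errstate(invalid="ignore"):
        return height / count                            # NaN where no scan covered a cell
```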
DeepLearningSuper-Resolution
Plenty of options for image/video/volume super-resolution
https://arxiv.org/abs/1706.03142
https://arxiv.org/abs/1704.02738
https://arxiv.org/abs/1704.02470 https://arxiv.org/abs/1612.00085
Novel texture enhancement framework
creates an HR style image that is rich in
details, which can be used to restore
high-frequency texture details back into
the initial HR image via the style transfer
algorithm.
Four examples of SR results for nearest
neighbor and cubic interpolation, the
best-performing sparse coding, 3D-
FSRCNN, and 3D-SRU-Net
configurations. Arrows indicate regions
in which at least one SR result mis-
interprets a cell boundary or an
ultrastructural feature. Scale bar 500
nm.
Our method includes a sub-pixel
motion compensation (SPMC) layer
that can better handle inter-frame
motion for this task. Our detail
fusion (DF) network that can
effectively fuse image details from
multiple images after SPMC
alignment
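The simplest member of this family is an SRCNN-style stack of three convolutions applied to a bicubically upsampled input. A minimal PyTorch sketch of that idea (not any of the specific architectures cited above):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySRCNN(nn.Module):
    """Three-layer super-resolution CNN: feature extraction, non-linear mapping,
    reconstruction. The low-res input is upsampled to the target size first."""
    def __init__(self, channels=1):
        super().__init__()
        self.extract = nn.Conv2d(channels, 64, kernel_size=9, padding=4)
        self.map = nn.Conv2d(64, 32, kernel_size=1)
        self.reconstruct = nn.Conv2d(32, channels, kernel_size=5, padding=2)

    def forward(self, low_res, scale=2):
        x = F.interpolate(low_res, scale_factor=scale, mode="bicubic",
                          align_corners=False)
        x = F.relu(self.extract(x))
        x = F.relu(self.map(x))
        return self.reconstruct(x)

# Trained with an L1/L2 loss against high-res targets; the same idea extends to 3D
# volumes (Conv3d) or to video once a motion-compensation step aligns the frames.
```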
Point-cloudsuper-resolution
Upsampling ‘on-the-fly’ to avoid “data explosion”?
Jason Schreier
4/17/17 12:05pm Horizon Zero Dawn, Kotaku
http://kotaku.com/horizon-zero-dawn-uses-all-sorts-
of-clever-tricks-to-lo-1794385026
Games like this don’t just look incredible because of ‘hyper-realism’
but because their engineers use all sorts of tricks [LOD’ing, or Level
of Detail; Mipmapping; frustum culling, etc.] to save memory.
The engine is designed to produce models in CityGML and does so in multiple
LODs. Besides the generation of multiple geometric LODs, we implement the
realisation of multiple levels of spatiosemantic coherence, geometric reference
variants, and indoor representations. The datasets produced by Random3Dcity
are suited for several applications, as we show in this paper with documented
uses. The developed engine is available under an open-source licence at Github
at http://github.com/tudelft3d/Random3Dcity
http://doi.org/10.5194/isprs-annals-IV-4-W1-51-2016
Filip Biljecki, Hugo Ledoux, Jantien Stoter
Level of detail texture filtering with dithering
and mipmaps US 5831624 A
Original Assignee 3Dfx Interactive Inc
https://www.google.com/patents/US5831624
Level-of-detail rendering: colors identify different
subdivision levels as stated in the top left corner.
Feature-Adaptive Rendering of Loop
Subdivision Surfaces on Modern GPUs
November 2014 DOI: 10.1007/s11390-014-1486-x
ManyLoDs: Parallel Many-View
Level-of-Detail Selection for Real-
Time Global Illumination
Matthias Hollander, Tobias Ritschel, Elmar Eisemann, Tamy Boubekeur
(2011) http://dx.doi.org/10.1111/j.1467-8659.2011.01982.x
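LOD'ing itself reduces to picking a coarser representation as the camera moves away, so that the geometric error stays below a screen-space budget. A minimal sketch with made-up camera parameters, assuming pre-simplified levels that double the point spacing at each step:

```python
import numpy as np

def select_lod(distance, base_spacing=0.005, screen_error_px=1.5,
               focal_px=1000.0, n_levels=5):
    """Pick a level of detail so that the point spacing of the chosen level projects
    to roughly `screen_error_px` pixels at the given viewing distance.
    Level 0 is the full-resolution cloud; each level doubles the point spacing."""
    # Spacing that would project to the error budget at this distance
    target_spacing = screen_error_px * distance / focal_px
    level = int(np.clip(np.floor(np.log2(max(target_spacing / base_spacing, 1.0))),
                        0, n_levels - 1))
    return level

# e.g. select_lod(2.0)  -> a fine level for a nearby wall,
#      select_lod(40.0) -> a coarse level for the far end of a large hall
```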
3DContentgeneration VolumetricCapture
Generate content by scanning real-life scenes and objects
Kul Wadhwa's and Roddy O'Hara's Uncorporeal
http://www.uncorporeal.com/
Uncorporeal: volumetric capture systems for VR & AR content
creation. The team includes a technical Oscar-winner and
engineering and product leadership from WETA, Google X, Lucas
ILM, and Wikimedia.
https://venturebeat.com/2016/10/13/pathbreaker-ventures-raises-12-milli
on-to-invest-in-emerging-tech-such-as-vr-ar-and-robotics/
Ryan Gembala, founder of Pathbreaker Ventures
believes connected homes and cars and
autonomous vehicles will create a lot of
opportunities in vertical applications for startups.
And he also thinks that space technologies such as
small satellites, analysis of space-captured data,
consumer transport, space mining, and others are
interesting.
REALITYVIRTUAL.CO - A NEW ZEALAND-BASED
CREATIVE TECHNOLOGIES RESEARCH &
DEVELOPMENT COLLECTIVE WITH AN ENTHUSIASM
TOWARDS THE VISUAL REALM:
● unique post production & signal processing techniques
including the development of deep learning image
enhancement & automation throughout our 3D pipeline
for PBR workflow
● strong emphasis on advanced robotics & autonomous
operations for large data acquisition of 3D
environments.
3D Scene Creation with Photogrammetry
3DContentgeneration Automaticphotorealism#1
Still can be quite labor-intensive to create realistic content
Get to know Rense de Boer, a technical art director from
Sweden, who is not only pushing the envelope of photo-real
CGI environments, but he’s doing it all in a real-time engine!
Art by Rens
https://news.developer.nvidia.com/artist-spotlight-creating-photorealistic-cgi-environments-in-real-time/
https://www.youtube.com/watch?v=bXouFfqSfxg
One Ph.D. position (supervision by Profs Niessner and Rüdiger
Westermann) is available at our chair in the area of photorealistic rendering
for deep learning and online reconstruction
Research in this project includes the development of photorealistic realtime rendering
algorithms that can be used in deep learning applications for scene understanding, and for
high-quality scalable rendering of point scans from depth sensors and RGB stereo image
reconstruction. If you are interested in applying, you should have a strong background in
computer science, i.e., efficient algorithms and data structures, and GPU programming,
have experience implementing C/C++ algorithms, and you should be excited to work on
state-of-the-art research in the 3D computer graphics.
https://wwwcg.in.tum.de/group/joboffers/phd-position-photorealistic-rendering-for-deep-le
arning-and-online-reconstruction.html
Ph.D. Position – Photorealistic Rendering for
Deep Learning and Online Reconstruction
3DContentgeneration Automaticphotorealism#2
Converting LiDAR scans to visually high-quality 3D content
Atom View is a new piece of software that allows content creators to
translate real-world scans into assets for virtual environments. Not only
does it aim to produce realistic results but also reduce the workflow for
content creation. The standalone app takes files captured from
volumetric cameras, offline graphics renderers, 360 lidar and more.
Volumetric capture is a promising area of development that could one day
allow content creators to skip over several of the more laborious steps of
traditional 3D content creation with better results. With Atom View, users can
even edit objects once they’ve been imported.
https://youtu.be/YxRI_3gKP8g
3DContentgeneration Styletransfer formaps
Neural Networks and The Future of 3D Procedural Content Generation
by Sam Snider-Held, Creative Technologist at MediaMonks, focusing on the intersection of AR, VR, AI, UX, and
Style transfer output on the left, real terrain on the right. Both are planes
whose vertices are being displaced by the height map texture.
Now was time to create my own style transfer light field and light field renderer. I
basically reimplemented Andrew Lowndes’ WebGl light field renderer in Unity.
What this post demonstrates is the idea that neural network could
radically change how we generate 3D content. I went with light fields
because currently my GPU is not fast enough to run style transfer or any
other generative network at 60 FPS. But if we do get to that point, it’s
entirely possible to see generative neural networks become an alternative
rendering pipeline to the standard rasterization approach. In this way,
neural networks could generate each frame of a game in real time,
based on realtime feedback from the user.
But it also potentially allows for a much more powerful creative approach, for
the creator and the end user. Imagine playing Gears of War, but then telling the
computer “Keep the gameplay, story, and 3d models, but make it look like
Zelda: Breath of the Wild.” This is how creating or playing a future gaming
experience could be, all because computers now know what things “look like”
and can make other things “look like” them too.
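The displaced planes in the figure above are just a flat vertex grid offset by the height map, which is a few lines of array math. A minimal NumPy sketch, assuming a hypothetical 2D `heightmap` array with values in [0, 1]:

```python
import numpy as np

def displace_plane(heightmap, size=100.0, height_scale=10.0):
    """Turn a 2D height map (values in [0, 1]) into a displaced vertex grid,
    the same operation a vertex shader would perform per frame."""
    h, w = heightmap.shape
    xs = np.linspace(0.0, size, w)
    zs = np.linspace(0.0, size, h)
    x, z = np.meshgrid(xs, zs)
    y = heightmap * height_scale                 # displacement along the up axis
    return np.stack([x, y, z], axis=-1)          # (h, w, 3) vertex positions

# A style-transferred height map (e.g. the network output rendered to a texture)
# can be fed through the same function to get the stylised terrain vertices.
```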
3DContentgeneration from Videoto3D
Production-Level Facial Performance Capture Using Deep
Convolutional Neural Networks In Proceedings of SCA'17, Los Angeles,
CA, USA, July 28-30, 2017
http://research.nvidia.com/publication/facial-performance-capture-deep
-neural-networks
Samuli Laine, Tero Karras, Timo Aila, Antti Herva (Remedy
Entertainment), Shunsuke Saito (Pinscreen, University of Southern
California), Ronald Yu (Pinscreen, University of Southern California), Hao
Li (USC Institute for Creative Technologies, University of Southern
California, Pinscreen), Jaakko Lehtinen (NVIDIA, Aalto University)
NVIDIA and game developer Remedy (Alan Wake, Quantum Break) showcased their
team-up solution to streamlining motion capture and animation using a deep learning
neural network, running on NVIDIA’s powerful DGX-1 server. After being “trained” with
information on previously produced animations, the network is able to generate
sophisticated 3D facial animation from videos of live actors, greatly alleviating the
time and labor burden of traditional mo-cap animation — it can even learn enough to
generate facial animation from just an audio clip. The companies believe this system
could eventually produce animation that’s just as good or better than traditionally
produced fare.
http://www.animationmagazine.net/events/siggraph-facial-animation-advances-fabri
c-engine-the-french-contingent/
“We present a real-time deep learning framework for video-based facial
performance capture -- the dense 3D tracking of an actor's face given a monocular
video. Our pipeline begins with accurately capturing a subject using a high-end
production facial capture pipeline based on multi-view stereo tracking and artist-
enhanced animations.
With 5-10 minutes of captured footage, we train a convolutional neural network to
produce high-quality output, including self-occluded regions, from a monocular
video sequence of that subject. Since this 3D facial performance capture is fully
automated, our system can drastically reduce the amount of labor involved in the
development of modern narrative-driven video games or films involving realistic
digital doubles of actors and potentially hours of animated dialogue per character. “
3DContentgeneration from Video(&Audio) toVideo
Face2Face: Real-time Face Capture and Reenactment of RGB Videos
Justus Thies (1), Michael Zollhöfer (2), Marc Stamminger (1), Christian Theobalt (2), Matthias Nießner (3)
(1) University of Erlangen-Nuremberg, (2) Max Planck Institute for Informatics, (3) Stanford University
http://www.graphics.stanford.edu/~niessner/thies2016face.html
https://doi.org/10.1109/CVPR.2016.262
Neural Face Editing
with Intrinsic Image
Disentangling
Zhixin Shu, Ersin Yumer,
Sunil Hadap, Kalyan Sunkavalli,
Eli Shechtman, Dimitris Samaras
(Submitted on 13 Apr 2017)
https://arxiv.org/abs/1704.04131
University of Washington researchers have developed new
algorithms that solve a thorny challenge in the field of computer
vision: turning audio clips into a realistic, lip-synced video of the
person speaking those words.
As detailed in a paper to be presented Aug. 2 at SIGGRAPH 2017,
the team successfully generated highly-realistic video of former
president Barack Obama talking about terrorism, fatherhood, job
creation and other topics using audio clips of those speeches and
existing weekly video addresses that were originally on a different
topic.
Synthesizing Obama: learning lip sync from audio
Supasorn Suwajanakorn, Steven M. Seitz, Ira Kemelmacher-Shlizerman
ACM Transactions on Graphics (TOG), Volume 36 Issue 4,
July 2017, https://doi.org/10.1145/3072959.3073640
http://www.washington.edu/news/2017/07
/11/lip-syncing-obama-new-tools-turn-a
udio-clips-into-realistic-video/
Weitere ähnliche Inhalte

Was ist angesagt?

Image Recognition Expert System based on deep learning
Image Recognition Expert System based on deep learningImage Recognition Expert System based on deep learning
Image Recognition Expert System based on deep learningPRATHAMESH REGE
 
Volker infra kennissessie relatics informatiemanagement tender - definitiev...
Volker infra kennissessie relatics   informatiemanagement tender - definitiev...Volker infra kennissessie relatics   informatiemanagement tender - definitiev...
Volker infra kennissessie relatics informatiemanagement tender - definitiev...Relatics
 
Image Restoration for 3D Computer Vision
Image Restoration for 3D Computer VisionImage Restoration for 3D Computer Vision
Image Restoration for 3D Computer VisionPetteriTeikariPhD
 
Qgis raster 3.16
Qgis raster 3.16Qgis raster 3.16
Qgis raster 3.16Jyun Tanaka
 
BeyondCorp - Google Security for Everyone Else
BeyondCorp  - Google Security for Everyone ElseBeyondCorp  - Google Security for Everyone Else
BeyondCorp - Google Security for Everyone ElseIvan Dwyer
 
[Paper Presentation] EMOTIONAL STRESS DETECTION USING DEEP LEARNING
[Paper Presentation] EMOTIONAL STRESS DETECTION USING DEEP LEARNING[Paper Presentation] EMOTIONAL STRESS DETECTION USING DEEP LEARNING
[Paper Presentation] EMOTIONAL STRESS DETECTION USING DEEP LEARNINGAnalytics India Magazine
 
Introduction to mago3D, an Open Source Based Digital Twin Platform
Introduction to mago3D, an Open Source Based Digital Twin PlatformIntroduction to mago3D, an Open Source Based Digital Twin Platform
Introduction to mago3D, an Open Source Based Digital Twin PlatformSANGHEE SHIN
 
Cognitive Digital Twin by Fariz Saračević
Cognitive Digital Twin by Fariz SaračevićCognitive Digital Twin by Fariz Saračević
Cognitive Digital Twin by Fariz SaračevićBosnia Agile
 
4차산업혁명과 드론의 역할
4차산업혁명과 드론의 역할4차산업혁명과 드론의 역할
4차산업혁명과 드론의 역할왕구 강
 
환경영향평가 의사결정지원 시공간 표출기술
환경영향평가 의사결정지원 시공간 표출기술 환경영향평가 의사결정지원 시공간 표출기술
환경영향평가 의사결정지원 시공간 표출기술 SANGHEE SHIN
 
SSII2022 [TS2] 自律移動ロボットのためのロボットビジョン〜 オープンソースの自動運転ソフトAutowareを解説 〜
SSII2022 [TS2] 自律移動ロボットのためのロボットビジョン〜 オープンソースの自動運転ソフトAutowareを解説 〜SSII2022 [TS2] 自律移動ロボットのためのロボットビジョン〜 オープンソースの自動運転ソフトAutowareを解説 〜
SSII2022 [TS2] 自律移動ロボットのためのロボットビジョン〜 オープンソースの自動運転ソフトAutowareを解説 〜SSII
 
大域マッチングコスト最小化とLiDAR-IMUタイトカップリングに基づく三次元地図生成
大域マッチングコスト最小化とLiDAR-IMUタイトカップリングに基づく三次元地図生成大域マッチングコスト最小化とLiDAR-IMUタイトカップリングに基づく三次元地図生成
大域マッチングコスト最小化とLiDAR-IMUタイトカップリングに基づく三次元地図生成MobileRoboticsResear
 
SSII2019企画: 画像および LiDAR を用いた自動走行に関する動向
SSII2019企画: 画像および LiDAR を用いた自動走行に関する動向SSII2019企画: 画像および LiDAR を用いた自動走行に関する動向
SSII2019企画: 画像および LiDAR を用いた自動走行に関する動向SSII
 
ORB SLAM Proposal for NTU GPU Programming Course 2016
ORB SLAM Proposal for NTU GPU Programming Course 2016ORB SLAM Proposal for NTU GPU Programming Course 2016
ORB SLAM Proposal for NTU GPU Programming Course 2016Mindos Cheng
 
"Scale Aware Face Detection"と"Finding Tiny Faces" (CVPR'17) の解説
"Scale Aware Face Detection"と"Finding Tiny Faces" (CVPR'17) の解説"Scale Aware Face Detection"と"Finding Tiny Faces" (CVPR'17) の解説
"Scale Aware Face Detection"と"Finding Tiny Faces" (CVPR'17) の解説Yusuke Uchida
 

Was ist angesagt? (20)

Image Recognition Expert System based on deep learning
Image Recognition Expert System based on deep learningImage Recognition Expert System based on deep learning
Image Recognition Expert System based on deep learning
 
Volker infra kennissessie relatics informatiemanagement tender - definitiev...
Volker infra kennissessie relatics   informatiemanagement tender - definitiev...Volker infra kennissessie relatics   informatiemanagement tender - definitiev...
Volker infra kennissessie relatics informatiemanagement tender - definitiev...
 
Image Restoration for 3D Computer Vision
Image Restoration for 3D Computer VisionImage Restoration for 3D Computer Vision
Image Restoration for 3D Computer Vision
 
Qgis raster 3.16
Qgis raster 3.16Qgis raster 3.16
Qgis raster 3.16
 
BeyondCorp - Google Security for Everyone Else
BeyondCorp  - Google Security for Everyone ElseBeyondCorp  - Google Security for Everyone Else
BeyondCorp - Google Security for Everyone Else
 
[Paper Presentation] EMOTIONAL STRESS DETECTION USING DEEP LEARNING
[Paper Presentation] EMOTIONAL STRESS DETECTION USING DEEP LEARNING[Paper Presentation] EMOTIONAL STRESS DETECTION USING DEEP LEARNING
[Paper Presentation] EMOTIONAL STRESS DETECTION USING DEEP LEARNING
 
Introduction to mago3D, an Open Source Based Digital Twin Platform
Introduction to mago3D, an Open Source Based Digital Twin PlatformIntroduction to mago3D, an Open Source Based Digital Twin Platform
Introduction to mago3D, an Open Source Based Digital Twin Platform
 
Cognitive Digital Twin by Fariz Saračević
Cognitive Digital Twin by Fariz SaračevićCognitive Digital Twin by Fariz Saračević
Cognitive Digital Twin by Fariz Saračević
 
Digital twins
Digital twinsDigital twins
Digital twins
 
PointNet
PointNetPointNet
PointNet
 
4차산업혁명과 드론의 역할
4차산업혁명과 드론의 역할4차산업혁명과 드론의 역할
4차산업혁명과 드론의 역할
 
환경영향평가 의사결정지원 시공간 표출기술
환경영향평가 의사결정지원 시공간 표출기술 환경영향평가 의사결정지원 시공간 표출기술
환경영향평가 의사결정지원 시공간 표출기술
 
SSII2022 [TS2] 自律移動ロボットのためのロボットビジョン〜 オープンソースの自動運転ソフトAutowareを解説 〜
SSII2022 [TS2] 自律移動ロボットのためのロボットビジョン〜 オープンソースの自動運転ソフトAutowareを解説 〜SSII2022 [TS2] 自律移動ロボットのためのロボットビジョン〜 オープンソースの自動運転ソフトAutowareを解説 〜
SSII2022 [TS2] 自律移動ロボットのためのロボットビジョン〜 オープンソースの自動運転ソフトAutowareを解説 〜
 
大域マッチングコスト最小化とLiDAR-IMUタイトカップリングに基づく三次元地図生成
大域マッチングコスト最小化とLiDAR-IMUタイトカップリングに基づく三次元地図生成大域マッチングコスト最小化とLiDAR-IMUタイトカップリングに基づく三次元地図生成
大域マッチングコスト最小化とLiDAR-IMUタイトカップリングに基づく三次元地図生成
 
SSII2019企画: 画像および LiDAR を用いた自動走行に関する動向
SSII2019企画: 画像および LiDAR を用いた自動走行に関する動向SSII2019企画: 画像および LiDAR を用いた自動走行に関する動向
SSII2019企画: 画像および LiDAR を用いた自動走行に関する動向
 
ORB SLAM Proposal for NTU GPU Programming Course 2016
ORB SLAM Proposal for NTU GPU Programming Course 2016ORB SLAM Proposal for NTU GPU Programming Course 2016
ORB SLAM Proposal for NTU GPU Programming Course 2016
 
LiDARとSensor Fusion
LiDARとSensor FusionLiDARとSensor Fusion
LiDARとSensor Fusion
 
"Scale Aware Face Detection"と"Finding Tiny Faces" (CVPR'17) の解説
"Scale Aware Face Detection"と"Finding Tiny Faces" (CVPR'17) の解説"Scale Aware Face Detection"と"Finding Tiny Faces" (CVPR'17) の解説
"Scale Aware Face Detection"と"Finding Tiny Faces" (CVPR'17) の解説
 
Big data and analytics
Big data and analyticsBig data and analytics
Big data and analytics
 
Computer vision
Computer visionComputer vision
Computer vision
 

Ähnlich wie Emerging 3D Scanning Technologies for PropTech

Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)PetteriTeikariPhD
 
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...
IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...IRJET Journal
 
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHMA ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHMcsandit
 
Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object TrackingIntegrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object Trackingijsrd.com
 
CV_sarah_frisken_05.15.2016
CV_sarah_frisken_05.15.2016CV_sarah_frisken_05.15.2016
CV_sarah_frisken_05.15.2016Sarah Frisken
 
An Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image ProcessingAn Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image Processingvivatechijri
 
Semantic Perception for Telemanipulation at SPME Workshop at ICRA 2013
Semantic Perception for Telemanipulation at SPME Workshop at ICRA 2013Semantic Perception for Telemanipulation at SPME Workshop at ICRA 2013
Semantic Perception for Telemanipulation at SPME Workshop at ICRA 2013Dariolakis
 
Dj31514517
Dj31514517Dj31514517
Dj31514517IJMER
 
Dj31514517
Dj31514517Dj31514517
Dj31514517IJMER
 
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSINGHOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSINGcscpconf
 
slam_research_paper
slam_research_paperslam_research_paper
slam_research_paperVinit Payal
 
Robust techniques for background subtraction in urban
Robust techniques for background subtraction in urbanRobust techniques for background subtraction in urban
Robust techniques for background subtraction in urbantaylor_1313
 
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro..."High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...Edge AI and Vision Alliance
 
IRJET- Criminal Recognization in CCTV Surveillance Video
IRJET-  	  Criminal Recognization in CCTV Surveillance VideoIRJET-  	  Criminal Recognization in CCTV Surveillance Video
IRJET- Criminal Recognization in CCTV Surveillance VideoIRJET Journal
 
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionSecure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionIJAEMSJORNAL
 
Effective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.IEffective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.IIJMTST Journal
 

Ähnlich wie Emerging 3D Scanning Technologies for PropTech (20)

Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)Deep Learning for Structure-from-Motion (SfM)
Deep Learning for Structure-from-Motion (SfM)
 
Introduction of slam
Introduction of slamIntroduction of slam
Introduction of slam
 
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...
IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...IRJET -  	  A Survey Paper on Efficient Object Detection and Matching using F...
IRJET - A Survey Paper on Efficient Object Detection and Matching using F...
 
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHMA ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
A ROS IMPLEMENTATION OF THE MONO-SLAM ALGORITHM
 
Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object TrackingIntegrated Hidden Markov Model and Kalman Filter for Online Object Tracking
Integrated Hidden Markov Model and Kalman Filter for Online Object Tracking
 
AR/SLAM for end-users
AR/SLAM for end-usersAR/SLAM for end-users
AR/SLAM for end-users
 
CV_sarah_frisken_05.15.2016
CV_sarah_frisken_05.15.2016CV_sarah_frisken_05.15.2016
CV_sarah_frisken_05.15.2016
 
An Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image ProcessingAn Analysis of Various Deep Learning Algorithms for Image Processing
An Analysis of Various Deep Learning Algorithms for Image Processing
 
Semantic Perception for Telemanipulation at SPME Workshop at ICRA 2013
Semantic Perception for Telemanipulation at SPME Workshop at ICRA 2013Semantic Perception for Telemanipulation at SPME Workshop at ICRA 2013
Semantic Perception for Telemanipulation at SPME Workshop at ICRA 2013
 
paper
paperpaper
paper
 
Dj31514517
Dj31514517Dj31514517
Dj31514517
 
Dj31514517
Dj31514517Dj31514517
Dj31514517
 
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSINGHOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
HOMOGENEOUS MULTISTAGE ARCHITECTURE FOR REAL-TIME IMAGE PROCESSING
 
slam_research_paper
slam_research_paperslam_research_paper
slam_research_paper
 
Robust techniques for background subtraction in urban
Robust techniques for background subtraction in urbanRobust techniques for background subtraction in urban
Robust techniques for background subtraction in urban
 
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro..."High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
"High-resolution 3D Reconstruction on a Mobile Processor," a Presentation fro...
 
IRJET- Criminal Recognization in CCTV Surveillance Video
IRJET-  	  Criminal Recognization in CCTV Surveillance VideoIRJET-  	  Criminal Recognization in CCTV Surveillance Video
IRJET- Criminal Recognization in CCTV Surveillance Video
 
X36141145
X36141145X36141145
X36141145
 
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image EncryptionSecure IoT Systems Monitor Framework using Probabilistic Image Encryption
Secure IoT Systems Monitor Framework using Probabilistic Image Encryption
 
Effective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.IEffective Object Detection and Background Subtraction by using M.O.I
Effective Object Detection and Background Subtraction by using M.O.I
 

Emerging 3D Scanning Technologies for PropTech

  • 7. Structure from Motion: Literature References. https://doi.org/10.1016/j.geomorph.2012.08.021 (cited by 631 articles, see related articles); https://arxiv.org/abs/1701.08493. Structure-from-Motion (SfM) operates under the same basic tenets as stereoscopic photogrammetry, namely that 3-D structure can be resolved from a series of overlapping, offset images. However, it differs fundamentally from conventional photogrammetry in that the geometry of the scene and the camera positions and orientations are solved automatically, without the need to specify a priori a network of targets with known 3-D positions. Instead, these are solved simultaneously using a highly redundant, iterative bundle adjustment procedure, based on a database of features automatically extracted from a set of multiple overlapping images (Snavely et al. 2008). Finally, even though various theoretical works in the literature study fundamental problems in SfM and/or provide rigorous analysis of the stability and robustness of specific methods, the SfM community would still benefit greatly from rigorous results on fundamental problems (e.g., what is the theoretically maximal amount of mismatched features or level of image noise that can be tolerated for stable structure recovery, and can this be achieved efficiently?) and from theoretical analysis of the stability, robustness, and computational efficiency of existing and new methods.
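To make the SfM recipe described above concrete (image registration, then pose estimation and triangulation), here is a minimal two-view sketch using OpenCV and NumPy. The image file names and the intrinsic matrix K are placeholders, and a real pipeline would add many more views plus bundle adjustment.

```python
import cv2
import numpy as np

# Hypothetical inputs: two overlapping photos and the camera intrinsics K.
img1 = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("img2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])

# 1) Image registration: detect and match ORB features.
orb = cv2.ORB_create(4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matches = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True).match(des1, des2)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 2) Pose estimation: essential matrix with RANSAC, then relative pose.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

# 3) Triangulate correspondences into a sparse point cloud
#    (up to scale; a full pipeline would refine this with bundle adjustment).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points3d = (pts4d[:3] / pts4d[3]).T
print(points3d.shape, "3D points reconstructed (up to scale)")
```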
  • 8. SLAM: Simultaneous Localization and Mapping. "SLAM, Visual Odometry, Structure from Motion, Multiple View Stereo", Yu Huang, Senior Architect, Autonomous Driving @ Baidu USA, https://www.slideshare.net/yuhuang/visual-slam-structure-from-motion-multiple-view-stereo. Samsung R&D Institute, necessary skills / attributes: ● 5+ years' experience delivering computer vision based products using C++ or Python (Masters or PhD study will be considered). ● Theoretical and practical understanding of multi-view geometry and 3D reconstruction. ● Experience with machine learning techniques within a computer vision context. ● PhD/MS in Computer Vision, Artificial Intelligence or Machine Learning. ● Expertise with Deep Neural Networks using TensorFlow or Keras. SLAM stands for Simultaneous Localization and Mapping, and one way to understand it is to imagine yourself entering an unfamiliar building for the first time. As you move about the building, you don't completely forget where you have already been. Indeed, at any moment you have a pretty good idea where you are within the current map that you have so far constructed in your head, and unless you have a really bad sense of direction, you could probably turn around and get back out of the building without too much trouble. Finding your way around the building is a good example of simultaneously constructing a map and localizing yourself within that map. http://www.pirobot.org/blog/0015/
  • 9. SLAM: Traditional algorithm comparison. http://dx.doi.org/10.1186/s41074-017-0027-2. The framework is mainly composed of three modules: 1) initialization, 2) tracking, and 3) mapping, plus additional modules for stable and accurate vSLAM: relocalization and global map optimization. "From the technical point of view, there is no definitive difference between SLAM and real-time SfM." Even though visual SLAM algorithms have been developed since 2003, vSLAM is still an active research field. Each algorithm has different characteristics; an appropriate algorithm needs to be chosen with the purpose of the application in mind.
  • 10. Visual Odometry. Taketomi et al. (2017): http://dx.doi.org/10.1186/s41074-017-0027-2. "Odometry is to estimate the sequential changes of sensor positions over time using sensors such as a wheel encoder to acquire relative sensor movement. Camera-based odometry, called visual odometry (VO), is also one of the active research fields in the literature [16, 17]. From the technical point of view, vSLAM and VO are highly relevant techniques because both techniques basically estimate sensor positions. According to the survey papers in robotics [18, 19], the relationship between vSLAM and VO can be represented as follows: vSLAM = VO + global map optimization. The relationship between vSLAM and VO can also be found in the papers [20, 21] and the papers [22, 23]. In the papers [20, 22], a technique on VO was first proposed. Then, a technique on vSLAM was proposed by adding the global optimization to VO [21, 23]." Towards stable visual odometry & SLAM solutions for autonomous vehicles: https://www.youtube.com/watch?v=T5Y6OPG-d08. NavStik Hackerspace | Projects at Hackerspace: Visual Odometry using Optic Flow.
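A toy monocular visual odometry loop in the spirit of the definition above (sequential relative pose from tracked features); a minimal sketch assuming OpenCV and NumPy, a list of grayscale frames, and known intrinsics K. The translation scale of a single camera is unobservable, so the trajectory is only defined up to scale.

```python
import cv2
import numpy as np

def monocular_vo(frames, K):
    """Accumulate relative camera poses over a grayscale image sequence.

    frames: iterable of grayscale numpy arrays; K: 3x3 intrinsic matrix.
    Returns a list of 4x4 camera-to-world poses (scale is arbitrary).
    """
    poses = [np.eye(4)]
    prev = None
    prev_pts = None
    for frame in frames:
        if prev is None:
            prev = frame
            prev_pts = cv2.goodFeaturesToTrack(prev, maxCorners=2000,
                                               qualityLevel=0.01, minDistance=7)
            continue
        # Track features with pyramidal Lucas-Kanade optical flow.
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, frame, prev_pts, None)
        good_prev = prev_pts[status.ravel() == 1]
        good_next = next_pts[status.ravel() == 1]
        # Relative pose from the essential matrix (translation up to scale).
        E, mask = cv2.findEssentialMat(good_prev, good_next, K,
                                       method=cv2.RANSAC, threshold=1.0)
        _, R, t, _ = cv2.recoverPose(E, good_prev, good_next, K, mask=mask)
        T = np.eye(4)
        T[:3, :3], T[:3, 3] = R, t.ravel()
        poses.append(poses[-1] @ np.linalg.inv(T))
        # Re-detect features so the tracker stays well populated.
        prev, prev_pts = frame, cv2.goodFeaturesToTrack(frame, 2000, 0.01, 7)
    return poses
```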
  • 11. Software: Open-source VisualSFM. "VisualSFM: A Visual Structure from Motion System", Changchang Wu (cited by 326 articles, see related articles). VisualSFM is a GUI application for 3D reconstruction using structure from motion (SfM). The reconstruction system integrates several of the author's previous projects: SIFT on GPU (SiftGPU), Multicore Bundle Adjustment, and Towards Linear-time Incremental Structure from Motion. VisualSFM runs fast by exploiting multicore parallelism for feature detection, feature matching, and bundle adjustment. "Using VisualSFM and MeshLab as an offline alternative to Autodesk's excellent 123D Catch. I walk you through my workflow for converting multiple images into a 3D model suitable for use in Blender." Tutorial for amateur photographers by Jamie Fuller: https://www.youtube.com/watch?v=V4iBb_j6k_g. "Open Source Photogrammetry with VisualSFM: Ditching 123D Catch", July 12, 2013, by Jesse. "Indoor Navigation from Multiple Images", by Jaan Tollander de Balsch, 2016, Aalto: https://jaantollander.github.io/SCI-C1000/prototype.html. What is the best method for 3D object modelling and reconstruction from photos or videos taken by flying robots or drones? What is the accuracy of such reconstruction methods with regard to the vibrations of the flying drones and the quality and resolution of the camera? Is it possible to improve the results by organizing multiple flights and overlaying/accumulating the data in the point cloud? Is there any free software available?
  • 12. Software: Python Photogrammetry Toolbox (PPT) GUI. Real photo vs. SfM with texture color vs. SfM with simple shader. Made with Python Photogrammetry Toolbox GUI and rendered in Blender with Cycles. http://184.106.205.13/arcteam/ppt.php, https://github.com/archeos/ppt-gui/. "Converting pictures into a 3D mesh with PPT, MeshLab and Blender": http://arc-team-open-research.blogspot.co.uk/2012/09/converting-pictures-into-3d-mesh-with.html. "Blender camera tracking + Python Photogrammetry Toolbox": http://arc-team-open-research.blogspot.co.uk/2012/11/blender-camera-tracking-python.html. The video shows the skull reconstructed in 3D with Python Photogrammetry Toolbox GUI. "Smilodon, the 3D reconstruction of the saber-toothed cat": http://arc-team-open-research.blogspot.co.uk/2013/03/
  • 13. Open-source libraries for SfM. OpenSfM: a Structure from Motion library written in Python on top of OpenCV. The library serves as a processing pipeline for reconstructing camera poses and 3D scenes from multiple images. https://github.com/mapillary/OpenSfM (656 stars). OpenMVG (Multiple View Geometry): "open Multiple View Geometry" is a library for computer-vision scientists, especially targeted at the Multiple View Geometry community. https://github.com/openMVG/openMVG (1,856 stars). https://doi.org/10.1007/978-3-319-56414-2_5, http://imagine.enpc.fr/~marletr/publi/RRPR-2016-Moulon-et-al.pdf. Sung and Lin (2017): "VisualSFM uses the pre-emptive feature matching, the incremental structure from motion and the re-triangulation techniques. The incremental feature matching can greatly speed up the process because this kind of matching will first sort all feature points and match only the first h feature points for each photo." Sung and Lin (2017): "OpenMVG also contains an incremental structure from motion technique. Besides that, they proposed a new iterative sampling method called a contrario Random Sample Consensus (AC-RANSAC) as a substitution for the original RANSAC in order to acquire higher precision and better performance. AC-RANSAC uses the 'a contrario' methodology in order to find a model that best fits the data with a threshold T that adapts automatically to the noise. Hence, it is able to find a model and its associated noise without a fixed threshold."
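For orientation only, a hedged sketch of driving OpenSfM from Python. It assumes the repository's documented bin/opensfm_run_all convenience script and the reconstruction.json output layout; check the current OpenSfM README before relying on either, since the interface may have changed.

```python
import json
import subprocess
from pathlib import Path

dataset = Path("data/my_house_scan")        # hypothetical project folder
assert (dataset / "images").is_dir(), "put your overlapping photos in images/"

# Run the full incremental SfM pipeline (feature extraction, matching,
# reconstruction); opensfm_run_all is the convenience wrapper shipped in bin/.
subprocess.run(["bin/opensfm_run_all", str(dataset)], check=True)

# The reconstruction is written as JSON: a list of models, each with
# per-camera poses ("shots") and a sparse point cloud ("points").
recon = json.loads((dataset / "reconstruction.json").read_text())
print(len(recon[0]["shots"]), "registered cameras,",
      len(recon[0]["points"]), "sparse points")
```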
  • 14. Open-source libraries for SfM + SLAM: OpenChisel. https://github.com/personalrobotics/OpenChisel. An open-source version of the Chisel chunked TSDF library. It contains two packages; open_chisel is an implementation of a generic truncated signed distance field (TSDF) 3D mapping library, based on the Chisel mapping framework developed originally for Google's Project Tango. It is a complete re-write of the original mapping system (which is proprietary). open_chisel is chunked and spatially hashed, inspired by this work from Niessner et al., making it more memory-efficient than fixed-grid mapping approaches and more performant than octree-based approaches. A technical description of how it works can be found in the RSS 2015 paper: http://ri.cmu.edu/pub_files/2015/7/ChiselPaper.pdf
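The core operation behind TSDF mapping libraries such as open_chisel is a per-voxel weighted running average of truncated signed distances along the camera ray. The following generic NumPy sketch (not OpenChisel's actual API) assumes a metric depth image, intrinsics K, and a camera-to-world pose.

```python
import numpy as np

def integrate_tsdf(tsdf, weights, voxel_origin, voxel_size, depth, K, cam_to_world,
                   trunc=0.05):
    """Fuse one depth frame into a dense TSDF volume (weighted running average).

    tsdf, weights: (X, Y, Z) float arrays updated in place.
    voxel_origin: world coordinates of voxel (0, 0, 0); voxel_size in metres.
    depth: (H, W) depth image in metres; K: 3x3 intrinsics; cam_to_world: 4x4 pose.
    """
    X, Y, Z = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(X), np.arange(Y), np.arange(Z), indexing="ij")
    world = voxel_origin + voxel_size * np.stack([ii, jj, kk], axis=-1)  # (X,Y,Z,3)

    # Transform voxel centres into the camera frame and project with K.
    world_to_cam = np.linalg.inv(cam_to_world)
    cam = world @ world_to_cam[:3, :3].T + world_to_cam[:3, 3]
    u = np.round(K[0, 0] * cam[..., 0] / cam[..., 2] + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * cam[..., 1] / cam[..., 2] + K[1, 2]).astype(int)

    H, W = depth.shape
    valid = (cam[..., 2] > 0) & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    d = np.zeros_like(cam[..., 2])
    d[valid] = depth[v[valid], u[valid]]
    valid &= d > 0

    # Truncated signed distance along the viewing ray, then running average.
    sdf = d - cam[..., 2]
    valid &= sdf >= -trunc
    tsdf_obs = np.clip(sdf / trunc, -1.0, 1.0)
    w_old = weights[valid]
    tsdf[valid] = (tsdf[valid] * w_old + tsdf_obs[valid]) / (w_old + 1.0)
    weights[valid] = w_old + 1.0
```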
  • 15. Research-grade SfM: old-school mono video. http://dx.doi.org/10.1186/s13640-017-0168-3. "Inspired by structure from motion systems, we propose a system that reconstructs sparse feature points to a 3D point cloud using a mono video sequence, so as to achieve higher computational efficiency. The system keeps tracking all detected feature points and calculates both the number of these feature points and their moving distances. We only use the key frames to estimate the current position of the camera, in order to reduce the computation load and the noise interference on the system. Furthermore, to avoid duplicate 3D points, the system reconstructs a 2D point only when the point shifts out of the camera's field of view. In our experiments, we show that our system can be implemented on tablets and can achieve state-of-the-art accuracy with a denser point cloud at high speed."
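The keyframe idea above (track all features, reconstruct only when enough of them have moved far enough) can be expressed as a simple selection rule; the thresholds below are hypothetical and the function is only an illustration, not the authors' exact criterion.

```python
import numpy as np

def is_keyframe(tracked_prev, tracked_curr, min_tracks=150, min_mean_shift_px=20.0):
    """Decide whether the current frame should become a keyframe.

    tracked_prev / tracked_curr: (N, 2) pixel positions of the same features
    in the last keyframe and in the current frame (lost tracks already removed).
    A frame is promoted either when tracking is starving (too few survivors)
    or when the surviving features have, on average, moved far enough to give
    a usable triangulation baseline.
    """
    if len(tracked_curr) < min_tracks:
        return True            # too few tracks left; force a new keyframe
    mean_shift = np.linalg.norm(tracked_curr - tracked_prev, axis=1).mean()
    return mean_shift > min_mean_shift_px
```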
  • 17. Research-grade SfM: deep learning-based #2. https://arxiv.org/abs/1702.01381, 2 May 2017. We evaluated the performance of our proposal on the DTU dataset, comparing it with two traditional feature-based methods, namely SURF (cited by 8683 articles) and ORB (cited by 2739 articles). The system is trained in an end-to-end manner, utilising transfer learning from a large-scale classification dataset. In addition, a variant of the proposed architecture containing a spatial pyramid pooling (SPP) layer is evaluated and shown to further improve the performance. RegNet is able to correct even large decalibrations such as depicted in the top image. The inputs for the deep neural network are an RGB image and a projected depth map. RegNet is able to establish correspondences between the two modalities, which enables it to estimate a 6 DOF extrinsic calibration. Additionally, with an iterative execution of multiple CNNs that are trained on different magnitudes of decalibration, our approach compares favorably to state-of-the-art methods, with a mean calibration error of 0.28° for the rotational and 6 cm for the translation components, even for large decalibrations up to 1.5 m and 20°. https://arxiv.org/abs/1702.02295
  • 18. Research-grade pose/structure: deep learning-based #1. Essentially the same technology is used for stereo matching and depth map generation as for SfM. https://arxiv.org/abs/1703.04309, https://arxiv.org/abs/1704.07813. Empirical evaluation on the KITTI dataset demonstrates the effectiveness of our approach: 1) monocular depth performs comparably with supervised methods that use either ground-truth pose or depth for training, and 2) pose estimation performs favorably compared to established SLAM systems under comparable input settings.
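The unsupervised training signal behind the KITTI work above is view synthesis: warp a source frame into the target view using the predicted depth and relative pose, then penalise the photometric difference. A simplified, non-differentiable NumPy sketch of that reprojection step, with nearest-neighbour sampling and hypothetical inputs:

```python
import numpy as np

def photometric_loss(target, source, depth, K, T_target_to_source):
    """L1 view-synthesis loss between a target frame and a warped source frame.

    target, source: (H, W, 3) images; depth: (H, W) predicted target depth;
    K: 3x3 intrinsics; T_target_to_source: 4x4 relative pose.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    ones = np.ones_like(u)

    # Back-project target pixels to 3D, move them into the source frame.
    pix = np.stack([u, v, ones], axis=-1).reshape(-1, 3).astype(float)
    cam = (np.linalg.inv(K) @ pix.T) * depth.reshape(1, -1)          # (3, HW)
    cam_h = np.vstack([cam, np.ones((1, cam.shape[1]))])
    src_cam = (T_target_to_source @ cam_h)[:3]

    # Project into the source image and sample (nearest neighbour).
    proj = K @ src_cam
    us = np.round(proj[0] / proj[2]).astype(int)
    vs = np.round(proj[1] / proj[2]).astype(int)
    valid = (proj[2] > 0) & (us >= 0) & (us < W) & (vs >= 0) & (vs < H)

    warped = np.zeros_like(target, dtype=float).reshape(-1, 3)
    warped[valid] = source[vs[valid], us[valid]]
    diff = np.abs(warped - target.reshape(-1, 3).astype(float))[valid]
    return diff.mean()
```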
  • 19. Research-grade pose/structure: deep learning-based #2. GANs on everything, so here as well :) The usefulness of VisualSFM / OpenSfM / OpenMVG for defensible startup products? Inversion is often ambiguous, e.g., many compositions of 3D shape and camera pose give rise to the same 2D projection. To address this ambiguity, we impose priors on the predicted latent factors through an adversarial discriminator network trained to discriminate between predicted factors and ground-truth ones. Training adversarial inversion does not require input-output paired annotations, but merely a collection of ground-truth factors, unrelated (unpaired) to the current input. Our model can thus be self-supervised by unlabelled image data, by minimizing a joint reconstruction and adversarial loss, complementing any direct supervision provided by paired annotations. Applying adversarial inversion to super-resolution and inpainting results in automated "visual plastic surgery". Structure-from-motion (SfM) results with and without adversarial priors: the results of the baseline (5th and 8th columns) are obtained from a model with a depth smoothness prior, trained with early stopping at 40K iterations (before divergence).
  • 20. SfM on Mobile Devices. https://arxiv.org/abs/1611.09498, https://doi.org/10.1109/ICCV.2013.15 (cited by 141 articles, see related articles), https://doi.org/10.1016/j.cviu.2016.09.007. After introducing the reconstruction algorithms at the base of our approach, we show how to build applications able to generate 3D floor plans scaled to their real-world metric dimensions and capable of managing scenes not necessarily limited by Manhattan World assumptions. Then, exploiting the resulting structural and visual model, we propose a client-server interactive exploration system implementing a low-DOF navigation interface, specifically developed for touch interaction on smartphones and tablets. https://doi.org/10.1145/2999508.2999526
  • 21. SfM on Mobile Devices: Case Dacuda. Magic Leap, the augmented reality startup that has raised $1.4 billion in funding but has yet to release a product, has made an acquisition to expand its work in computer vision and deep learning, and to build out its operations into Europe. The company has acquired the 3D division of Dacuda, a computer vision startup based out of Zurich. One of Dacuda's focuses had been developing algorithms for consumer-grade cameras (and not just cameras, but any device with a camera function) to capture 2D and 3D imaging in real time, "making 3D content as easy as taking a video." https://techcrunch.com/2017/02/18/confirmed-magic-leap-acquires-3d-division-of-d As you can see, no detail about what the two might be working on. The acquisition was first rumored last week, after Dacuda posted a note on its blog about selling its 3D division and some Dacuda employees updated their LinkedIn profiles as Magic Leap employees (one example here). Tom's Hardware then speculated it could signal Magic Leap using technology developed by Dacuda to enable room-scale, six-degrees-of-freedom tracking (essentially to improve its image-capturing sensors in 3D environments). The ecosystem there is attracting other big-name M&A. Faceshift, a motion capture startup acquired by Apple in 2015, was also founded in Zurich. Facebook's Oculus VR in August 2016 also quietly acquired a startup called Zurich Eye, incubated at the University of Zurich and ETH, the federal institute of technology. Zurich Eye became the basis of Oculus and Facebook's office in the city. Zurich Eye, ironically, was co-founded by three former software engineers from Dacuda (they all now work for Oculus VR). For example, in October the company had linked up with MindMaze, another virtual/augmented reality startup out of Switzerland, to build a platform they were calling "MMI, the world's first multisensory computing platform for mobile-based, immersive and social virtual reality applications," MindMaze noted. MindMaze said it planned to "deploy the technology for users globally to address a void left by Google's Daydream View for positional tracking and multiplayer interactions." We have contacted Magic Leap for comment and will update this post if and when we learn more.
  • 22. Apple ARKit: Technology. https://developer.apple.com/arkit/. Since the iPhone 6, iPhones have used what Apple calls "Focus Pixels", which is its term for phase-detection AF. Fast Company reports that this system will be replaced with laser autofocus, possibly as soon as the next iPhone, which is set to debut this fall. It is likely that Apple would use both AF technologies, as Google does in its Pixel line of phones. The technology would serve a dual purpose, also allowing for better depth perception with the built-in camera for augmented reality apps. ARKit rolls out with iOS 11 this fall, so it would make sense to also include the VCSEL laser system in the phone launching at the same time. https://petapixel.com/2017/07/20/apple-bring-3d-laser-autofocus-iphone-cameras-report-says/, https://www.theverge.com/2017/6/26/15872332/apple-arkit-ios-11-augmented-reality-developer-excitement
  • 23. Apple ARKit: Example Applications. https://twitter.com/madewithARKit. Measuring kitchen dimensions, http://bit.ly/2tJ5KV8, app by @SmartPicture3D. Measure distances with your iPhone: clever little #ARKit app by @BalestraPatrick, http://bit.ly/2sFl8RB. Inter-dimensional iPhone AR portals are closer than they appear, http://bit.ly/2sufO0d, ARKit demo by @nedd. Demo shows how augmented reality will make advertising more immersive: mixed reality producer Bilawal Singh Sidhu shows a peek of what the world of advertising could be with ARKit. #adtech https://mobile-ar.reality.news/news/apple-ar-demo-shows-augmented-reality-will-make-advertising-more-immersive-0178905/
  • 24. Google's response to ARKit: ARCore. DAVID JAGNEUX, UploadVR, SEPTEMBER 2, 2017, 6:00 AM. "Earlier this week, Google announced ARCore, a software-based solution for making more Android devices AR-capable without the need for depth sensors and extra cameras. It will even work on the Google Pixel, Galaxy S8, and several other devices very soon and supports Java, Unity, and Unreal from day one. In short, it's kind of like Google's answer to Apple's ARKit." https://venturebeat.com/2017/09/02/googles-first-arcore-goal-100-million-ar-capable-android-phones/ "Another example, which is especially relevant for developers that build traditional smartphone apps in Java, is that we want to make it easier than ever for people to get into 3D modeling that haven't done it before," Bavor says. "We know there are a lot of people that want to get into 3D development and AR but aren't experts in Maya, or Unity, or anything. So Blocks is an app we built with the intention of enabling people that have never done a 3D model in their life to feel comfortable building 3D assets. We even made it easy to export right from Blocks and pull into ARCore apps you're developing."
  • 25. ARCore: too early to tell how it will do against the "Apple cult". The Verge, Adi Robertson: https://youtu.be/NhJydpMkpug. FusedVR: https://youtu.be/dNXBvDKRg1M. https://venturebeat.com/2017/08/29/google-launches-arcore-sdk-in-preview-ar-on-android-phones-no-extra-hardware-required/, https://youtu.be/ttdPqly4OF8. Super Ventures Blog, Matt Miesnieks (CEO 6D.ai, Partner @Super_Ventures, AR technology & cycling): https://medium.com/super-ventures-blog/how-is-arcore-better-than-arkit-5223e6b3e79d ● Isn't ARCore just Tango-lite? ● The iPhone-8-keynote sized elephant in the room ● So should I build on ARCore now? ● Is ARCore better than ARKit? Scottie Gardonio, Aug 30 (AR/VR enthusiast, creative manager, passionate graphic designer): https://medium.com/iotforall/arcore-vs-arkit-google-counters-apple-33483c08d3da "ARCore vs. ARKit: Google Counters Apple. Let the Dueling Begin." Google announcing inside-out 6-DOF tracking support for Daydream back at Google I/O earlier this year.
  • 26. Deep Learning on Mobile Devices. https://techcrunch.com/2017/05/17/googles-tensorflow-lite-brings-machine-learning-to-android-devices/, http://blog.stratospark.com/creating-a-deep-learning-ios-app-with-keras-and-tensorflow.html ● 3D Face Capture ● 3D Scene Reconstruction ● 2.5D Scene Reconstruction and Computational Photography ● SLAM and Object Tracking ● Augmented Reality ● Google Cardboard SDK for iOS. https://doi.org/10.1109/IPSN.2016.7460664 (cited by 28 articles, see related articles). Thursday 20 July 2017, Movidius USB stick: https://techcrunch.com/2017/07/20/movidius-launches-a-79-deep-learning-usb-stick/. "Snapchat secretly acquires Seene, a computer vision startup that lets ..." https://techcrunch.com/.../snapchat-secretly-acquires-seene-a-computer-vision-startup-... 3 Jun 2016. https://doi.org/10.1109/PDP.2017.98, https://arxiv.org/abs/1705.06224
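TensorFlow Lite, mentioned above, is the usual route for shipping such models on-device; a minimal conversion sketch assuming the TensorFlow 2.x API (the 2017-era tf.contrib.lite interface differed), with a purely illustrative toy model standing in for a real scene-understanding network:

```python
import tensorflow as tf

# A tiny Keras model standing in for a network you might run on a phone
# (the architecture here is purely illustrative).
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(8, 3, activation="relu", input_shape=(128, 128, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])

# Convert to TensorFlow Lite with default optimizations (e.g. dynamic-range
# quantization) so the model is small and fast enough for mobile inference.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

with open("scene_model.tflite", "wb") as f:
    f.write(tflite_model)
```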
  • 28. 360° (omnidirectional) imaging: Introduction. The Panoptic Camera platform developed jointly by the Microelectronic Systems Laboratory (LSM) and the Signal Processing Laboratory (LTS2) of EPFL: http://lsm.epfl.ch/page-52820-en.html. Wikipedia: "360-degree videos, also known as immersive videos or spherical videos, are video recordings where a view in every direction is recorded at the same time, shot using an omnidirectional camera or a collection of cameras. During playback the viewer has control of the viewing direction like a panorama." Consumer-level camera review: http://thewirecutter.com/reviews/best-360-degree-camera/. By DANIEL CULPAN, Wednesday 12 August 2015: http://www.wired.co.uk/article/9-mind-blowing-360-degree-videos. Scuba Diving Short Film in 360°, Green Island, Taiwan: https://youtu.be/2OzlksZBTiA
  • 29. 360° as part of the "10 Breakthrough Technologies of 2017". https://www.technologyreview.com/s/603496/10-breakthrough-technologies-2017-the-360-degree-selfie/. Seasonal changes to vegetation fascinate Koen Hufkens. So last fall Hufkens, an ecological researcher at Harvard, devised a system to continuously broadcast images from a Massachusetts forest to a website called VirtualForest.io. And because he used a camera that creates 360° pictures, visitors can do more than just watch the feed; they can use their mouse cursor (on a computer) or finger (on a smartphone or tablet) to pan around the image in a circle or scroll up to view the forest canopy and down to see the ground. Journalists from the New York Times and Reuters are using $350 Samsung Gear 360 cameras to produce spherical photos and videos that document anything from hurricane damage in Haiti to a refugee camp in Gaza. One New York Times video that depicts people in Niger fleeing the militant group Boko Haram puts you in the center of a crowd receiving food from aid groups. Or consider the spherical videos of medical procedures that the Los Angeles startup Giblib makes to teach students about surgery. The company films the operations by attaching a $500 360fly 4K camera, which is the size of a baseball, to surgical lights above the patient. The 360° view enables students to see not just the surgeon and surgical site, but also the way the operating room is organized and how the operating room staff interacts. These applications are feasible because of the smartphone boom and innovations in several technologies that combine images from multiple lenses and sensors. For instance, 360° cameras require more horsepower than regular cameras and generate more heat, but that is handled by the energy-efficient chips that power smartphones. Both the 360fly and the $499 ALLie camera use Qualcomm Snapdragon processors similar to those that run Samsung's high-end handsets. Once people discover spherical videos, research suggests, they shift their viewing behavior quickly. The company Humaneyes, which is developing an $800 camera that can produce 3-D spherical images, says people need to watch only about 10 hours of 360° content before they instinctively start trying to interact with all videos. When you see 360° imagery that truly transports you somewhere else, you want it more and more.
  • 30. Low-cost end: Samsung Gear and Galaxy. Samsung Gear 360, ~£250; Samsung Gear VR, ~£100; Samsung Galaxy S6-S8 smartphone, ~£200-£700. http://www.samsung.com/uk/wearables/gear-360-c200/. If you're clamoring to shoot in 360 degrees, the Gear 360 balances simple design with workable image quality, but you really need a Samsung phone (and a Gear VR, and a good hunk of money) to get the most out of it. And, for now, that's fine. This version of the Gear 360 is more likely to be looked back on as a relic anyway, a recognizable but eventually dismissible attempt at a new idea, and the foundation for whatever Samsung does next.
  • 31. Low-cost end #2: Ricoh Theta. "Ricoh's Theta V 4K camera sports 360-degree video and wireless playback", RYAN WINTERHALTER, UploadVR, SEPTEMBER 02, 2017, 07:03 PM, https://venturebeat.com/2017/09/02/ricohs-theta-v-4k-camera-sports-360-degree-video-and-wireless-playback/. Ricoh is unveiling its latest 360-degree camera this morning. Dubbed the Ricoh Theta V, the $430 4K camera is the latest in the line, which launched in 2013 with the Ricoh Theta. Available for pre-order now, and shipping in mid-September, the Theta V features 3,820-by-1,920 resolution video capture. That's a massive improvement on the earlier Theta S, which offered a sub-1080p 1,920-by-960, and the Theta SC, which allowed for 1,920-by-1,080 recording. Perhaps the biggest usability improvement to the Theta V is the inclusion of remote playback. Users can now wirelessly stream their video to an external display directly from the camera. Previous devices in the Theta line (except the developer-only Theta R) required users to export their raw footage to a computer to stitch the image and create a usable video. That's now all done on the device. Videographers can watch their footage on any display, and move the POV by moving the camera itself. The Theta V boosts sound quality as well. Four microphones capture data from their respective dimensions, creating spatial audio that allows users to hear where the sound is coming from within the recording. Ricoh Theta V hands-on, published Aug 31, 2017, Jeff Keller. Based on some quick tests of a non-final Theta V, both stills and videos are noticeably better than those from its predecessor. We're looking forward to getting our hands on a production model in a few weeks and putting it through its paces. For higher quality audio capture, Ricoh is offering the TA-1 3D Microphone ($269). Developed by Audio-Technica, the mic attaches via the tripod mount and uses a standard 3.5mm audio jack.
  • 32. Higher end: GoPro, Nokia Ozo, Facebook Surround, etc. GoPro (NASDAQ: GPRO) recently unveiled the Omni, a six-camera rig for filming interactive spherical videos that can be explored through a smartphone's movements, a user's finger swipes, or a virtual reality headset. The device is the smaller sibling of the 16-camera Odyssey rig ($15,000), which hasn't been launched despite being announced nearly a year ago. Let's take a look at four key things investors should know about the Omni ($3,500), and how they might impact GoPro's future. https://www.fool.com/investing/general/2016/04/14/4-things-investors-need-to-know-about-gopro-incs-o.aspx. What's next for GoPro? GoPro investors don't have many catalysts to look forward to this year. The Omni is too pricey relative to its peers to gain any mainstream traction. The Karma drone, which is due to arrive within the next two months, faces tough competition from market leader DJI Innovations. By the time the Hero 5 cameras arrive near the end of the year, the mainstream market could be saturated with cheap VR and flying cameras. Introducing Facebook Surround 360: an open, high-quality 3D-360 video capture system. Brian K. Cabral, April 12, 2016. ● Facebook has designed and built a durable, high-quality 3D-360 video capture system. ● The system includes a design for camera hardware and the accompanying stitching code, and we will make both available on GitHub this summer. We're open-sourcing the camera and the software to accelerate the growth of the 3D-360 ecosystem: developers can leverage the designs and code, and content creators can use the camera in their productions. ● The system exports 4K, 6K, and 8K video for each eye. The 8K videos double industry standard output and can be played on Gear VR with Facebook's custom Dynamic Streaming technology. https://code.facebook.com/posts/1755691291326688/introducing-facebook-surround-360-an-open-high-quality-3d-360-video-capture-system/. https://www.theverge.com/2016/4/25/11421992/disney-nokia-ozo-camera-virtual-reality-star-wars-marvel. Ever since Nokia announced its 360-degree Ozo virtual reality camera, it has positioned the system as a high-end option for Hollywood filmmakers, and today the company is announcing a partnership with Disney that should help deliver on that promise. As part of the deal, Ozo cameras will be put into the hands of Disney filmmakers and its marketing teams to create 360-degree, virtual reality content across all of the studio's various brands.
  • 33. Lytro Immerge: the world's first professional light field solution for cinematic VR. roadtovr.com/lytros-immerge-360, https://www.lytro.com/immerge. Consequently, to create a virtual reality that even the human eye cannot distinguish from the real world, we must achieve the perfect immersive viewing experience, such that human viewers feel they can walk into the scene. This is known as the virtual walk-in effect, and it requires light-field technology: 3D imaging technology that emerged from the field of computational imaging/photography to capture the light rays that people perceive from different locations and directions. When combined with computer vision and deep learning, light-field technology provides a viable path for producing low-cost, high-quality VR content, positioning this technology to be the most profitable segment of the VR industry.
  • 34. "Depth Lytro": depth sensing with light field techniques. Refocusing in spite of foreground occlusions: (a) scene containing a monkey toy partially occluded by a plant in the foreground; (b) traditional synthetic aperture refocusing on the light field is partially effective in removing the effect of foreground plants; (c) synthetic aperture refocusing of depth displays corruption due to occlusion; (d) a histogram of depth clearly shows two clusters corresponding to plant and monkey; (e) virtual aperture refocusing after removal of plant pixels shows a sharp depth image of the monkey; (f) quantitative comparison of the indicated scan line of the monkey's head for (c) and (e). We use coding techniques from Tadano et al. (2015) to image beyond backscattering nets. Notice how the corrupted depth maps are improved using the codes. We show how digital refocusing can be performed on the images without the scattering occluders by combining depth fields with coded ToF. https://arxiv.org/abs/1509.00816
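Synthetic aperture refocusing, used above to see past foreground occluders, amounts to shifting each sub-aperture view according to its baseline and a chosen focal plane, then averaging; a minimal NumPy sketch with hypothetical inputs and integer shifts only:

```python
import numpy as np

def refocus(views, offsets, disparity):
    """Shift-and-average synthetic aperture refocusing.

    views: list of (H, W, 3) sub-aperture images from a light field or camera array.
    offsets: list of (dx, dy) baselines of each view relative to the reference view,
             in aperture units.
    disparity: pixels of shift per unit baseline; choosing it selects the focal
               plane (objects at that depth align and stay sharp, occluders blur out).
    """
    acc = np.zeros_like(views[0], dtype=float)
    for img, (dx, dy) in zip(views, offsets):
        shift_x = int(round(dx * disparity))
        shift_y = int(round(dy * disparity))
        # np.roll implements the integer shift; a real pipeline would interpolate.
        acc += np.roll(np.roll(img.astype(float), shift_y, axis=0), shift_x, axis=1)
    return (acc / len(views)).astype(views[0].dtype)
```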
  • 35. Post-processing for 360° imaging. https://doi.org/10.1007/s00371-017-1368-7. Overall process: (a) input image; (b) lines detected and classified, red for vertical lines and yellow for horizontal lines; (c) great circles from the classified lines, with green dots marking vanishing points computed from horizontal (yellow) lines; (d) upright adjustment result. We implemented our method using C++ and the OpenCV library on a 64-bit Windows PC with an Intel i7-6700K 4.00 GHz CPU and 32 GB RAM. For an input image of size 5376 × 2688 px, it takes a few hundred milliseconds (less than one second) to obtain the final rotation matrix R for upright adjustment. https://arxiv.org/abs/1703.10798, http://vllab1.ucmerced.edu/~wlai24/360hyperlapse. Pipeline of the proposed algorithm: given a 360° video, we first stabilize the sequence to smooth the relative rotation between adjacent frames. We estimate the focus of expansion (i.e., the direction of forward motion) as prior information for our camera path planning. To extract the regions of interest, we compute the spatial-temporal saliency and semantic segmentation. The detected regions of interest are used to guide the camera path planning. Finally, we use an adaptive 2D video stabilization to render a smooth hyperlapse.
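Once an upright rotation R has been estimated as described above, applying it to an equirectangular panorama is a per-pixel resampling on the sphere. A nearest-neighbour NumPy sketch, assuming R is given (e.g. from the line and vanishing-point step) and the usual longitude/latitude pixel mapping:

```python
import numpy as np

def rotate_equirectangular(pano, R):
    """Resample an equirectangular panorama under a 3x3 rotation matrix R.

    pano: (H, W, 3) image where x maps to longitude [-pi, pi) and
    y maps to latitude [pi/2, -pi/2]. Nearest-neighbour sampling for brevity.
    """
    H, W, _ = pano.shape
    x, y = np.meshgrid(np.arange(W), np.arange(H))
    lon = (x + 0.5) / W * 2 * np.pi - np.pi
    lat = np.pi / 2 - (y + 0.5) / H * np.pi

    # Output pixel -> unit ray; rotate by R^T to find where it came from.
    dirs = np.stack([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)], axis=-1)       # (H, W, 3)
    src = dirs @ R                                              # applies R^T row-wise
    src_lon = np.arctan2(src[..., 0], src[..., 2])
    src_lat = np.arcsin(np.clip(src[..., 1], -1.0, 1.0))

    sx = ((src_lon + np.pi) / (2 * np.pi) * W).astype(int) % W
    sy = ((np.pi / 2 - src_lat) / np.pi * H).astype(int).clip(0, H - 1)
    return pano[sy, sx]
```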
  • 36. 360° Deep Learning #1. http://dx.doi.org/10.3390/s17061341, https://arxiv.org/abs/1705.01759. Watching a 360° sports video requires a viewer to continuously select a viewing angle, either through a sequence of mouse clicks or head movements. To relieve the viewer from this "360 piloting" task, we propose "deep 360 pilot", a deep learning-based agent for piloting through 360° sports videos automatically. Panel (a) overlaps three panoramic frames sampled from a 360° skateboarding video with two skateboarders. One skateboarder is more active than the other in this example. For each frame, the proposed "deep 360 pilot" selects a view: a viewing angle at which a Natural Field of View (NFoV) (cyan box) is centered. It first extracts candidate objects (yellow boxes), and then selects a main object (green dashed boxes) in order to determine a view (just like a human agent). Panel (b) shows the NFoV from a viewer's perspective.
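Rendering the NFoV crop that a chosen viewing angle defines is a gnomonic (rectilinear) projection from the equirectangular frame; a nearest-neighbour NumPy sketch with hypothetical yaw, pitch, and field-of-view parameters:

```python
import numpy as np

def extract_nfov(pano, yaw, pitch, fov_deg=90.0, out_w=640, out_h=360):
    """Render a perspective (NFoV) view from an equirectangular panorama.

    pano: (H, W, 3) equirectangular image; yaw/pitch in radians select the
    viewing direction; fov_deg is the horizontal field of view.
    """
    H, W, _ = pano.shape
    f = 0.5 * out_w / np.tan(0.5 * np.radians(fov_deg))
    x, y = np.meshgrid(np.arange(out_w) - out_w / 2, np.arange(out_h) - out_h / 2)

    # Rays of the virtual pinhole camera, rotated by yaw (around y) and pitch.
    rays = np.stack([x, -y, np.full_like(x, f, dtype=float)], axis=-1)
    rays /= np.linalg.norm(rays, axis=-1, keepdims=True)
    Ry = np.array([[np.cos(yaw), 0, np.sin(yaw)],
                   [0, 1, 0],
                   [-np.sin(yaw), 0, np.cos(yaw)]])
    Rx = np.array([[1, 0, 0],
                   [0, np.cos(pitch), -np.sin(pitch)],
                   [0, np.sin(pitch), np.cos(pitch)]])
    d = rays @ (Ry @ Rx).T

    # Spherical coordinates of each ray -> source pixel in the panorama.
    lon = np.arctan2(d[..., 0], d[..., 2])
    lat = np.arcsin(np.clip(d[..., 1], -1.0, 1.0))
    sx = ((lon + np.pi) / (2 * np.pi) * W).astype(int) % W
    sy = ((np.pi / 2 - lat) / np.pi * H).astype(int).clip(0, H - 1)
    return pano[sy, sx]
```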
  • 37. 360° Deep Learning #2. "Flat2Sphere: Learning Spherical Convolution for Fast Features from 360° Imagery", Yu-Chuan Su, Kristen Grauman (submitted on 2 Aug 2017), https://arxiv.org/abs/1708.00919. We propose to learn a spherical convolutional network that translates a planar CNN to process 360° imagery directly in its equirectangular projection. Our approach learns to reproduce the flat filter outputs on 360° data, sensitive to the varying distortion effects across the viewing sphere. The key benefits are 1) efficient feature extraction for 360° images and video, and 2) the ability to leverage powerful pre-trained networks researchers have carefully honed (together with massive labeled image training sets) for perspective images. We validate our approach against several alternative methods in terms of both raw CNN output accuracy and applying a state-of-the-art "flat" object detector to 360° data. Our method yields the most accurate results while saving orders of magnitude in computation versus the existing exact reprojection solution.
  • 38. 360°: the role in PropTech? #1a. Use for real estate agents: still a novelty/gimmick? (from 2014 until 2017). MAY 26, 2014, by James Dearsley: http://www.jamesdearsley.co.uk/is-the-property-industry-interested-in-360-degree-hd-filming/. USES OF 360 DEGREE HD FILMING IN REAL ESTATE: 1. Sales and Marketing. Firstly, from a realtor or estate agent perspective there are several uses here of 360 degree cameras, the first being obvious, that of sales and marketing. It will be simple and efficient to take a quick film of each room, or just walk through the property with these devices to record what you need. 2. Property Management issues. We have also seen interest from companies looking to use these bits of equipment for inventory taking. Seeing as they are of HD quality, it means you can quickly take photographs of properties which can later be looked at in more detail should problems arise in letting disputes. 3. Virtual Reality. With Facebook recently buying Oculus Rift for $2 billion, it is getting less far-fetched. Considering the price of an Oculus is relatively cheap (reckoned to be less than $500/£360 when released next year), it would not be surprising if Facebook are hoping for a lot of people to be purchasing these (Candy Crush Saga in Virtual Reality, anyone?!). It isn't just Facebook though; Sony have a VR headset in production, as does Samsung (it was recently announced), and so this space is going to move quickly. By using these cameras you can put your clients into these homes very quickly and easily, either in the office, if you get a set of these yourself, or, in time, in their own home if Facebook get their way. https://www.forbes.com/sites/forbesagencycouncil/2017/06/28/want-to-use-360-degree-photo-and-video-11-things-to-consider/#22fffa955002 (JUN 28, 2017, by Forbes Agency Council): 1. I would recommend that marketers stay on the sidelines until the industry matures. - Kristopher Jones, LSEO.com. 4. Use a Strategic Approach. The capabilities of 360-degree photo/video have powerful applications in many industries, including real estate, retail and tourism. A 360-degree view has a better chance of selling a house than a static image. - Brock Murray, seoplus+. 7. Prepare for Tomorrow's Consumer Expectations. Today, 360-degree photos and videos are very helpful in industries such as the auto industry or real estate, where visualizing the product is essential. As VR continues to grow, 360-degree photos and videos will likely become a standard. Consumers' expectations will likely adjust to needing to learn more about the overall "360-degree" experience of the restaurant, for example, not just a picture of the dish. - Ahmad Kareh, Twistlab Marketing. 11. Create an Emotional Connection. 360-degree multimedia is a brilliant tool for meaningful storytelling, as it allows the consumer to be transported to the experience you want them to have, bringing the story to life. Companies should take advantage of these tools to transform products into experiences, cultivating an immersive and emotional connection with the brand. - Joey Hodges, Demonstrate PR.
  • 39. 360° The role in PropTech? #1b Use for real estate agents. A four-wheeled tripod outfitted with a computer, 360-degree camera and sensors can roam properties, producing highly choreographed, immersive videos that would be difficult — if not impossible — to replicate with a normal video camera. VirtualAPT (Brooklyn, NYC) offers a residential tour service now at $1/ft² (~$10.8/m²), and for commercial uses a monthly fee per building or $0.50/ft² (~$5.4/m²) for separate units. Generated by technology from companies such as Matterport, 3-D home tours allow users to jump between 360-degree photos — sometimes situated within a 3-D model. ● A rover can shoot 360-degree footage of a home while moving along a pre-plotted route. ● Made by VirtualAPT, the videos can include on-camera presentations from real estate agents. ● They're an alternative to 3-D home tours from companies such as Matterport. https://www.youtube.com/watch?v=JhfQK-tDvGU
  • 40. 360° The role in PropTech? #2a Use for construction and as a tool for constructing 4D/5D/6D BIM (Building Information Model). A construction site manager manually taking photos of the progress: - Time-consuming to walk through and take photos - No full coverage of the site - Might forget some spots - A nice initial 3D BIM is often not properly maintained during construction. + Ideally, have a drone inspecting the whole construction site with an on-board 360° video camera and a LiDAR / laser scanner. + One can go back in time and see which of the subcontractors, for example, are responsible for possible problems. https://doi.org/10.1186/s40327-014-0016-9
  • 41. 360° The role in PropTech? #2b 360° videos, whether or not registered to a 3D BIM model, allow inspection of the progress ("4D BIM") on the construction site also retrospectively, and can possibly reduce legal battles when it is clearer who is to be held responsible in case of discrepancies between as-built and as-planned data. VISUAL ASSET MANAGEMENT The Visual Asset Management (VAM) service digitizes industrial and infrastructure assets using 360-degree images, 3D models, and relative asset information. 3D MODELING We thrive on enabling realistic 3D visualization of projects while preserving the minute details necessary to portray our world. 360 VIDEO 360 video enables viewers to be at the center of any medium, allowing for a unique visual experience and situational awareness from any device. VIRTUAL REALITY OcuTech's virtual reality solutions stimulate creative thinking and enhanced information sharing, allowing for a one-of-a-kind virtual experience. OcuTech from Houston, Texas, USA is already providing this type of service https://ocutech360.com/3d-architectural-visualization-solution/#3dvrvideo
  • 43. 360° into smartphones: how big will it be? https://www.engadget.com/2017/07/10/future-of-smartphone-camera/ 1) Augmented reality 2) Dual-lens cameras 3) Better lenses 4) 4K recording 5) Thermal imaging 6) Optical zoom 7) 360 video "Several smartphone makers, including Samsung and Huawei, have already released add-on 360-degree cameras for their handsets, but this is something that could eventually be integrated into the phones themselves. Immersive 360-degree videos are gradually making their mark, with Facebook among the big firms pushing the technology, while virtual reality companies are gradually introducing more 360-VR content that can be viewed from mobile phones." https://techcrunch.com/2016/08/30/the-future-of-mobile-video-is-virtual-reality/ Are 360 cameras the future? https://youtu.be/i8EUerX90-0 TechAltar So will teens in big numbers ever apply Snapchat bunny ears to immersive 360-degree videos?
  • 44. 360°intosmartphones plentyofoptionscoming#1 Acer’s new Holo 360 degree camera is essentially a smartphone Acer has announced its entry into the VR video market with a device that’s half 360-degree camera, half smartphone. http://www.trustedreviews.com/news/acer-s-new-ho lo-360-degree-camera-is-essentially-a-smartphone -2953609 Paul Monckton CONTRIBUTOR I write about photography and related subjects https://www.forbes.com/sites/paulmonckton/2016/05/31/worlds-first-live-smartphone-vr-camera/#9 fea6921a8b0 Yesterday at this year’s Computex trade show in Taipei, Quanta Computer and ImmerVision jointly announced what is claimed to be the world’s first 360-degree live VR streaming camera for smartphones, with demos starting from today. The, as yet unnamed, camera fits in the palm of the hand and is designed to attach magnetically to any smartphone. It comes with a 360-degree by 187-degree lens and uses a Sony Exmor-HDR imaging sensor to produce 16 megapixel panoramic images. ImmerVision's Panamorph lens makes more efficient use of an image sensor (Image credit: ImmerVision) THIS ADD-ON CAMERA WILL TURN YOUR SMARTPHONE INTO A 360 CAMERAJULY 26, 2017 ION360 U 4K 360-Degree Smartphone Camera is comprised of a 360 camera that goes on top of Essential's 360 Camera Is the World's Smallest 360-Degree Personal Camera for a Smartphone 30 May 2017 http://gadgets.ndtv.com/mobiles/news/essentials-360-camera-is-the-worlds-sm allest-360-degree-personal-camera-for-a-smartphone-1705826 After months of teasing, Android creator Andy Rubin has finally unveiled the Essential Phone that features a near bezel-less display that tries to outdo Samsung's Galaxy S8. Essential's 360 camera, which weighs around 35 grams and is being called the world's smallest 360- degree personal camera by the company, includes a dual 12-megapixel fisheye sensors that can capture 4K 360 video at 30fps. The camera also features 4 microphones to capture sound in 3D. The 360 camera can be bought along with the Essential Phone for an additional $50, or can be bought separately which will cost you $199. @essential, Palo Alto, CA, essential.com
  • 45. 360° into smartphones: plenty of options coming #2 ProTruly's Darling https://www.theverge.com/2017/3/5/14809182/protruly-darling-360-degree-camera-smartphone The cameras found on ProTruly's devices are made by a company called HT Optical. The company said that it is working on a much smaller 360° camera module that will actually fit into a 7.6 mm thick smartphone and will be capable of capturing 16 MP photos and shooting 4K videos. What's even more interesting is that the module will only add an extra 1 mm to the overall thickness of a device. https://www.theverge.com/circuitbreaker/2017/2/22/14698026/huawei-360-degree-camera-honor-vr-smartphones http://360rumors.com/ https://www.vrfocus.com/2017/07/360-degree-video-editing-app-for-smartphones/ V360 - 360 video editor, Avincel Group Inc. 360-Degree Video Editing App For Smartphones: the V360 editing suite is already out for Android, with an iOS version coming soon.
  • 46. 360° into smartphones: convergence with AI players, of course https://www.embedded-vision.com/news/movidius-low-power-vpu-technology-delivers-4k-vr-pixel-processing-performance-motorola%E2%80%99s-newest Movidius' Myriad 2 Vision Processing Unit (VPU) technology, known for its image signal processing and computer vision capabilities with high energy efficiency, was selected by Motorola Mobility to power their newest Moto Mod: the 360 Camera. Moto Mods are unique modular accessories for Motorola smartphones that bring advanced functionality beyond traditional smartphone features. Motorola's newest Moto Mod brings users the ability to live stream 360° videos while preserving battery life. Say Hello to the moto z² Force Edition with moto mods https://www.youtube.com/watch?v=0moMnChM6Ds https://www.wsj.com/articles/intel-to-buy-semiconductor-startup-movidius-1473170441 https://www.altera.com/solutions/industry/automotive/applications/drive-assistance/surround-view-camera.html http://www.nvidia.co.uk/object/drive-px-uk.html
  • 47. 360° Video SfM: an obvious extension is to combine both. Instead of manually rotating your camera, image all angles simultaneously while going through the rooms in an apartment. https://uploadvr.com/adobe-algorithm-6dof-360-cam/ http://variety.com/2017/digital/news/adobe-6dof-vr-video-algorithms-1202394491/ Adobe Motion Parallax demo https://youtu.be/37Z4f6p1HOY https://www.roadtovr.com/adobes-new-research-aims-give-depth-monoscopic-360-video/: Other techniques to achieve 6-DoF VR video usually require light-field cameras like HypeVR's crazy 6K/60 FPS LiDAR rig or Lytro's giant Immerge camera. While these undoubtedly will produce a higher quality 3D effect, they're also custom-built and ungodly expensive. 6-DOF VR videos with a single 360-camera. Jingwei Huang; Zhili Chen; Duygu Ceylan; Hailin Jin, Virtual Reality (VR), 2017 IEEE http://dx.doi.org/10.1109/VR.2017.7892229, 18-22 March 2017 Given a 360-video captured by a single spherical panorama camera, in an offline pre-processing stage, we recover the camera motion and the scene geometry first by performing structure-from-motion (SfM) followed by dense reconstruction. Then, in real time we play back the video in a VR headset where we track the 6-DOF motion of the headset and synthesize new views by a novel warping algorithm.
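As a hedged geometric sketch of what lets standard SfM machinery run on such footage (this is not code from the cited papers): each equirectangular pixel is mapped to a bearing vector on the unit viewing sphere, and those bearings then play the role that normalized image coordinates play in an ordinary epipolar / SfM pipeline. The image size and sample pixels below are arbitrary.

```python
# Sketch: map equirectangular pixel coordinates to unit bearing vectors (rays on the
# viewing sphere). Feature matches expressed as bearings can feed standard two-view
# geometry and bundle adjustment, which is the basis of SfM on 360° video.
import numpy as np

def equirect_to_bearings(u, v, width, height):
    """u, v: arrays of pixel coordinates. Returns (N, 3) unit vectors in the camera frame."""
    lon = (u / width) * 2.0 * np.pi - np.pi        # longitude in [-pi, pi)
    lat = 0.5 * np.pi - (v / height) * np.pi       # latitude  in [-pi/2, pi/2]
    x = np.cos(lat) * np.sin(lon)
    y = np.sin(lat)
    z = np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1)

# e.g. two matched keypoints in a 4096x2048 panorama
b = equirect_to_bearings(np.array([100.0, 3000.0]), np.array([512.0, 1024.0]), 4096, 2048)
print(np.linalg.norm(b, axis=1))                   # both ~1.0
```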
  • 48. 360° Video SfM, Korea Advanced Institute of Science and Technology (KAIST). Spherical panoramic cameras (Ricoh Theta S, Samsung Gear 360 and LG 360). Our sphere sweeping algorithm enables computing all-around dense depth maps, minimizing the loss of spatial resolution. With the estimated all-around image and depth map, we have shown practical utilities by introducing 360° stereoscopic and anaglyph images as VR contents. European Conference on Computer Vision, ECCV 2016: Computer Vision – ECCV 2016, pp. 156-172 https://doi.org/10.1007/978-3-319-46487-9_10 All-Around Depth from Small Motion with a Spherical Panoramic Camera. Sunghoon Im, Hyowon Ha, François Rameau, Hae-Gon Jeon, Gyeongmin Choe, In So Kweon
  • 50. Microsoft Kinect: democratizing structured-light scanning https://arxiv.org/abs/1505.05459 Structured light: a sequence of known patterns is sequentially projected onto an object and gets deformed by the geometric shape of the object. The object is then observed by a camera from a different direction. By analyzing the distortion of the observed pattern, i.e. the disparity from the original projected pattern, depth information can be extracted. The Time-of-Flight (ToF) technology is based on measuring the time that light emitted by an illumination unit requires to travel to an object and back to the sensor array. The Kinect ToF camera applies this CW (continuous-wave) intensity modulation approach. Due to the distance between the camera and the object (sensor and illumination are assumed to be at the same location), and the finite speed of light c, a time shift φ [s] is caused in the optical signal, which is equivalent to a phase shift in the periodic signal. This shift is detected in each sensor pixel by a so-called mixing process. The time shift can be easily transformed into the sensor-object distance, as the light has to travel the distance twice. Cited by 65 articles - see Related articles
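A small numeric sketch of the continuous-wave ToF relation described above. The factor 4π comes from the round trip (the light travels the distance twice); the modulation frequency below is an illustrative assumption, and Kinect-class sensors combine several frequencies to extend the unambiguous range.

```python
# d = c * phase / (4 * pi * f_mod): phase shift of the modulated signal -> distance.
import math

c = 299_792_458.0        # speed of light, m/s
f_mod = 16e6             # assumed modulation frequency, Hz
phase = math.pi / 2      # example measured phase shift, radians

distance = c * phase / (4 * math.pi * f_mod)
ambiguity = c / (2 * f_mod)     # distances repeat every half modulation wavelength
print(f"distance ~ {distance:.3f} m, unambiguous range ~ {ambiguity:.3f} m")
# distance ~ 2.342 m, unambiguous range ~ 9.369 m
```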
  • 51. KinectFusion: scanning with Kinect https://doi.org/10.1145/2047196.2047270 Cited by 1356 articles, see Related articles https://arxiv.org/abs/1704.01047 https://arxiv.org/abs/1612.02859 The semantic cue from the floorplan (i.e., door detection) resolves ambiguities; the figure shows the best placement based on the unary potential with or without the semantic cue. We show qualitative results on ModelNet using the TSDF encoding (Curless and Levoy, 1996) and 4 views. The same TSDF truncation threshold has been used for traditional fusion, our OctNetFusion approach and the ground-truth generation process. While the baseline approach is not able to resolve conflicting TSDF information from different viewpoints, our approach learns to produce a smooth and accurate 3D model from highly noisy input. By learning the structure of real-world 3D objects and scenes, our approach is further able to reconstruct occluded regions and to fill gaps in the reconstruction. We evaluate our approach extensively on both synthetic and real-world datasets for volumetric fusion. Further, we apply our approach to the problem of 3D shape completion from a single view, where our approach achieves state-of-the-art results.
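Below is a minimal, hedged numpy sketch of the TSDF voxel integration that KinectFusion-style pipelines (and the OctNetFusion baseline above) build on: every voxel stores a truncated signed distance and a weight, and each new depth map is folded in with a running weighted average. The camera intrinsics, grid extent, voxel size and truncation distance are illustrative assumptions; a real system adds camera tracking, GPU kernels and raycasting.

```python
# Sketch of weighted TSDF fusion in the spirit of Curless & Levoy / KinectFusion.
import numpy as np

def integrate(tsdf, weight, depth, K, cam_T_world, voxel_size, trunc=0.05):
    nx, ny, nz = tsdf.shape
    ii, jj, kk = np.meshgrid(np.arange(nx), np.arange(ny), np.arange(nz), indexing="ij")
    pts_w = np.stack([ii, jj, kk], -1).reshape(-1, 3) * voxel_size          # world coords
    pts_c = (cam_T_world[:3, :3] @ pts_w.T + cam_T_world[:3, 3:4]).T        # camera coords
    z = pts_c[:, 2]
    valid = z > 1e-6
    uv = (K @ pts_c.T).T
    u = np.zeros_like(z, dtype=int)
    v = np.zeros_like(z, dtype=int)
    u[valid] = np.round(uv[valid, 0] / z[valid]).astype(int)
    v[valid] = np.round(uv[valid, 1] / z[valid]).astype(int)
    h, w = depth.shape
    valid &= (u >= 0) & (u < w) & (v >= 0) & (v < h)
    d_obs = np.zeros_like(z)
    d_obs[valid] = depth[v[valid], u[valid]]
    valid &= d_obs > 0
    sdf_raw = d_obs - z                      # positive in front of the observed surface
    valid &= sdf_raw > -trunc                # do not carve far behind the surface
    sdf = np.clip(sdf_raw / trunc, -1.0, 1.0)
    t, wgt = tsdf.reshape(-1), weight.reshape(-1)
    t[valid] = (t[valid] * wgt[valid] + sdf[valid]) / (wgt[valid] + 1.0)    # running average
    wgt[valid] += 1.0
    return t.reshape(tsdf.shape), wgt.reshape(weight.shape)

K = np.array([[525.0, 0, 320], [0, 525.0, 240], [0, 0, 1]])                 # assumed intrinsics
tsdf, w = np.ones((64, 64, 64)), np.zeros((64, 64, 64))
depth = np.full((480, 640), 1.5)             # stand-in depth map: flat wall 1.5 m away
tsdf, w = integrate(tsdf, w, depth, K, np.eye(4), voxel_size=0.05)
```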
  • 52. Kinect tweaks: depth resolution improvements with polarization measurement? http://news.mit.edu/2015/object-recognition-robots-0724 https://youtu.be/m6sStUk3UVk http://news.mit.edu/2015/algorithms-boost-3-d-imaging-resolution-1000-times-1201 https://doi.org/10.1007/s11263-017-1025-7 https://doi.org/10.1364/OE.25.001173
  • 53. RangeSensing PlentyofOptions http://3dscanexpert.com/photogrammetry-benchmarks-r emake-vs-photoscan-vs-realitycapture-vs-zephyr/ This post is just an example based on a single photoset from a single object. That makes it zero percent scientific. Also, RealityCapture might have won this Drag Race in terms of both speed with the Fast preset and quality with the Normal preset, but an organic object like this is very favorable to its algorithms. Read my Full RC Review to see that it can’t always handle non-organic objects well. COMMERCIAL SOFTWARE http://3dscanexpert.com/ By Nick Lievendag Entrepreneur at the intersection of Creativity × Technology. Writes, Speaks and Consults about 3D Capture (3D Scanning & Photogrammetry). Founder of 3D Scan Expert.
  • 54. Matterportdominating RealEstatescanning This $4,500 camera turns the real world into the virtual one. Today, Matterport ’s hardware is a hit with real estate agents. But fueled by the $30 million Series C it just raised, Matterport’s software and partnership with Google’s Project Tango could let you wave your phone around to create VR tours of anywhere you want. https://techcrunch.com/2015/06/25/matterport/ https://www.crunchbase.com/organization/matterport#/entity Matterport spawned out of the Xbox Kinect hacker scene in 2010. Founder Matt Bell had been working for a gesture recognition company that relied on a $50,000 camera and expert operators to produce a huge CAD file that could only be accessed through a specialized application. Bell was flabbergasted by the power of the $150 Kinect. He realized the potential for a relatively cheap device with similar technology that could let anyone map out rooms to create 3D models accessible straight from the web. https://youtu.be/HZX8RupfQls
  • 55. Matterport research on semantic indoor segmentation. We collected the data using the Matterport Camera, which combines 3 structured-light sensors to capture 18 RGB and depth images during a 360° rotation at each scan location. The output is the reconstructed 3D textured meshes of the scanned area, the raw RGB-D images, and camera metadata. We used this data as a basis to generate additional RGB-D data and make point clouds by sampling the meshes. We semantically annotated the data directly on the 3D point cloud, rather than images, and then projected the per-point labels on the 3D mesh and the image domains. https://arxiv.org/abs/1702.01105 | Cited by 3 - Related articles https://arxiv.org/abs/1702.07600 https://www.fastcompany.com/3059281/introducing-hover-an-ai-powered-indoor-safe-camera-drone + Indoor scanning with tripod-based Matterport still requires a lot of manual work, and will at some point be replaced by autonomous AI-powered indoor drones for a better user experience.
  • 56. MatterportTechnologypatents Capturing and aligning multiple 3-dimensional sceneswww.google.com/patents/US8879828Grant - Filed Jun 29, 2012 - Issued Nov 4, 2014 - Matthew Bell - Matterport, Inc. Multi-modal method for interacting with 3d models www.google.com/patents/US20130342533App. - Filed Jun 24, 2013 - Published Dec 26, 2013 - Matthew Bell - Matterport, Inc. Identifying and filling holes across multiple aligned three-dimensional scenes www.google.com/patents/US8861840Grant - Filed Oct 14, 2013 - Issued Oct 14, 2014 - Matthew Bell - Matterport, Inc. Building a three-dimensional composite scene www.google.com/patents/US8861841Grant - Filed Oct 14, 2013 - Issued Oct 14, 2014 - Matthew Bell - Matterport, Inc. Processing and/or transmitting 3D data www.google.com/patents/US9396586Grant - Filed Mar 14, 2014 - Issued Jul 19, 2016 - Matthew Tschudy Bell - Matterport, Inc. Semantic understanding of 3d data www.google.com/patents/US20160055268App. - Filed Jun 6, 2014 - Published Feb 25, 2016 - Matthew Tschudy Bell - Matterport, Inc. Selecting two-dimensional imagery data for display within a three-dimensional model www.google.com/patents/EP3120329A1?cl=enApp. - Filed Mar 13, 2015 - Published Jan 25, 2017 - Matthew Tschudy BELL - Matterport, Classifying, separating and displaying individual stories of a three-dimensional model of a multi-story structure based on captured image data of the multi-story structure www.google.com/patents/US20160217225App. - Filed Jan 28, 2016 - Published Jul 28, 2016 - Matthew Tschudy Bell - Matterport, Inc. Semantic understanding of 3d data US 20160055268 A1 ABSTRACT Systems and techniques for processing three- dimensional (3D) data are presented. Captured three- dimensional (3D) data associated with a 3D model of an architectural environment is received and at least a portion of the captured 3D data associated with a flat surface is identified. Furthermore, missing data associated with the portion of the captured 3D data is identified and additional 3D data for the missing data is generated based on other data associated with the portion of the captured 3D data. REFERENCED BY US9576184 Textura Planswift Corporation Detection of a perimeter of a region of interest in a floor plan document US20130328872 Tekla Corporation Computer aided modeling US20150227644 Pictometry International Corp. Method and system for displaying room interiors on a floor plan US20160063722 Textura Planswift Corporation Detection of a perimeter of a region of interest in a floor plan document US20160379405 Jim S Baca Technologies for generating computer models, devices, systems, and methods utilizing the same
  • 57. GoogleTangoTechnology http://www.deccanchronicle.com/technology/gadgets/210717/i s-google-tango-relevant-in-2017.html https://arstechnica.co.uk/gadgets/2016/12/google- tango-phab-2-pro-review/ A Project Tango device ‘sees’ the environment around it through a combination of three core functions. First up is motion tracking, which allows the device to understand its position and orientation using a range of sensors (including accelerometer and gyroscope). Then there’s depth perception, which examines the shape of the world around you. Intel provides a vital cog in this respect with its RealSense 3D camera. With this component on board, a device can gain accurate gesture control and snappy 3D object rendering among other things. Finally, Project Tango incorporates area learning, which means that it maps out and remembers the area around it. Point Cloud Framework for Rendering 3D Models Using Google Tango Maxen Chung, Santa Clara University Julian Callin, Santa Clara University http://scholarcommons.scu.edu/cseng_senior/84 https://doi.org/10.1007/s11227-016-1891-8 Project Tango Tablet Development Kit, recently introduced by Google, Inc. Equipped with the most powerful processor available to date on a consumer-level mobile platform (i.e., NVIDIA Tegra K1 whose 192 programmable CUDA-enabled GPU cores use the same efficient Kepler architecture found in the world’s most powerful supercomputers and workstations) along with several sensors (motion tracking camera, 3D depth sensor, accelerometer, ambient light sensor, barometer, compass, GPS, gyroscope), this mobile device can readily utilize GPU computing making it an ideal platform for developing real-time contextual awareness applications for the visually impaired (VI). Moreover, being compact, lightweight, potentially wearable, relatively discreet and affordable render it aesthetically appealing, socially acceptable and accessible for VI users
  • 58. GoogleTangoExampleApplications#1 We broke the news yesterday that Google was producing a prototype 3D sensing smartphone called Project Tango. We also broke down the capabilities of the vision processor inside the device and talked about what it means for the future of phones. Now, we’ve got an exclusive look in the video below at a real 3D indoor map of a room captured with one of the prototype devices by Matterport. https://techcrunch.com/2014/02/21/heres-an-actual-3d-indoor-map-of-a-room-captured-with-googles-project-tango-phone/ https://matterport.com/mobile-3d-capture/ https://developers.google.com/tango/apis/overview Daydream is Google’s platform for virtual reality. It consists of Daydream-ready phones, Daydream-ready headsets and controllers, and Daydream apps. Daydream View is the first Daydream-ready headset and controller designed and developed by Google. It also comes with a touch-and-motion enabled controller so you can easily interact with VR apps. With the Daydream View, you will be able to explore new worlds through Google Street View and Fantastic Beasts. Kick back in your personal cinema with YouTube, Netflix, Hulu, and HBO. Get in the game with Gunjack 2, LEGO® BrickHeadz, and Need for Speed. That’s just the beginning of the VR possibilities with Daydream. http://www.techphlie.com/ 2017/07/what-is-google-ta ngo-and-daydream.html Google has notably been pushing AR/VR technologies with its latest Android OS. The most prominent introduction however, has been the ASUS ZenFone AR launch that took place at CES, 2017, earlier this year.
  • 59. GoogleTangoExampleApplications#2 Google Tango SDK examples: how to make a floor plan in 50 seconds Alexander Grau Google Tango and Revit Leonardo Manzione https://www.youtube.com/watch?v=A-4cuJ1kOQ4
  • 60. “GoogleTango”withoutdepth sensors I have always believed that bringing 3D to consumers could only work without the need for dedicated depth sensors. This pure-software approach is already being embraced for Augmented Reality with Apple’s upcoming ARKit and Google’s ARCore which was announced last week. Both can give modern smartphones AR-capabilities by just using the regular camera(s), instead of using dedicated sensors like Tango. https://3dscanexpert.com/sony-3d-creator-brings-sensor-less-3d-scanning-consumers/ But yesterday, at IFA Berlin, Sony announced its latest smartphone, the XZ1. Which has all the bells and whistles you expect from a flagship Android phone but also an app called 3D Creator . It basically does exactly what Microsoft showed last year, but is actually available — albeit exclusive for the XZ1. https://www.sonymobile.com/global-en/products/phones/xperia -xz1/3d-creator/
  • 61. Apple depth sensing. The iPhone X's notch is basically a Kinect, by Paul Miller (@futurepaul), Sep 17, 2017, 10:00am EDT https://www.theverge.com/circuitbreaker/2017/9/17/16315510/iphone-x-notch-kinect-apple-primesense-microsoft And now, in late 2017, Apple is going to sell a phone with a front-facing depth camera. Unlike the original Kinect, which was built to track motion in a whole living room, the sensor is primarily designed for scanning faces and powers Apple's Face ID feature. Apple's "TrueDepth" camera blasts "more than 30,000 invisible dots" and can create incredibly detailed scans of a human face. In fact, while Apple's Animoji feature is impressive, the developer API behind it is even wilder: Apple generates, in real time, a full animated 3D mesh of your face, while also approximating your face's lighting conditions to improve the realism of AR applications. How Apple's iPhone X TrueDepth Camera Works, by David Cardinal on September 14, 2017. Beyond the Camera: Facial Motions and Changing Features. Getting a depth estimate for portions of a scene is only the beginning of what's required for Apple's implementation of secure facial recognition and Animojis. For example, a mask could be used to hack a facial recognition system that relied solely on the shape of the face. So Apple is using processing power to learn and recognize 50 different facial motions that are much harder to forge. They also provide the basis for making Animoji figures seem to mimic the phone's owner. How Secure is Face ID? Given how willing Apple is to commit to using Face ID for financial transactions, I'm sure they have pushed the limits beyond either simple 3D models or 2D motion. It is likely they are relying on the phone's ability to recognize minute facial movements and feed them into a machine learning system on the A11 Bionic chip that will add another layer of security to the system. That piece will also be key in helping the phone decide whether you're the same person when you put on a pair of glasses, a hat, or grow a beard — all of which Apple claims Face ID will handle.
  • 63. Laser scanning: LiDAR (Light Detection And Ranging) http://dx.doi.org/10.1038/nphoton.2010.148 http://dx.doi.org/10.1080/19479832.2013.811124 3D building modeling (BIM) using images and LiDAR: a review https://techcrunch.com/2017/07/12/nyu-releases-the-largest-lidar-dataset-ever-to-help-urban-development/ http://ia.cr/2017/613 https://www.theregister.co.uk/2017/06/27/lidar_spoofed_bad_news_for_self_driving_cars/
  • 65. Riegl: a range of different laser scanners http://www.riegl.com/products/unmanned-scanning/ RIEGL VZ-400 Indoor Scanned Data by Jamis Choi, published on Apr 1, 2010 https://www.youtube.com/watch?v=hOf0hpCn92I Scanning made simple with RiSOLVE - RIEGL's new 3D Scene Capture Software, published on Oct 4, 2012 (feat. horrible lounge music) https://www.youtube.com/watch?v=lbxvzMlTWyg
  • 66. Rieglsystemin practice https://doi.org/10.1109/IROS.2016.7759501 Namely, we propose a method for the automatic selection of feature coordinate locations, and introduce the concept of localized automatic relevance determination (LARD) to the Hilbert Maps framework, in which different dimensions in the projected Hilbert space operate within independent length scale values. The proposed technique was tested against other state-of-the-art 3D scene reconstruction tools in three different datasets: a simulated indoors environment, RIEGL laser scans and dense LSD-SLAM pointclouds. The results testify to the proposed framework’s ability to model complex structures and correctly interpolate over unobserved areas of the input space while achieving real-time training and querying performances.
  • 67. HandheldScanning GeoSLAMZEB-REVO Handheld Laser Scanning - ZEB-REVO The ZEB-REVO is the latest, lightweight revolving laser scanner from GeoSLAM. Handheld, pole-mounted or attached to a mobile platform, the ZEB-REVO can record more than 40,000 measurement points per second from the survey environment. NEW ZEB-CAM The new ZEB-CAM is an optional upgrade for standard ZEB-REVO systems. Simply attach ZEB-CAM to the underside of a standard REVO and begin scanning immediately. The ZEB-CAM captures live video footage of the survey environment and adds contextual video and imagery to scan data to aid feature identification. Optical flow technology is utilised to accurately synchronise the video and scan together in GeoSLAM's Desktop software. http://www.3dlasermapping.com/zeb-revo- handheld-laser-scanning/ https://youtu.be/k8q5xr_eLgk
  • 68. GeoSLAM vs. Leica: portable scanning quality http://dx.doi.org/10.1117/12.2270761 The paper investigates the performance of two portable mobile mapping systems (MMSs), the handheld GeoSLAM ZEB-REVO and the Leica Pegasus:Backpack, in two typical use-case scenarios: an indoor two-floor building and an outdoor open city square. Note! This paper would have been even nicer with a 'gold standard' giving the "correct measurements" instead of just comparing two "good enough" scanners.
  • 69. Research scanners: sensor fusion. The Indoor Multi-sensor Acquisition System (IMAS) presented in this paper consists of a wheeled platform equipped with two 2D laser heads, RGB cameras, a thermographic camera, a thermohygrometer, and a luxmeter. One of the laser scanning sensors is foreseen to obtain the building map and the navigation information, and the other one for the 3D environment reconstruction. The thermographic and optical images, and the geometric and comfort data, are synchronized and automatically linked to trajectory positions, so that they are georeferenced in the building in terms of a relative positioning system. Software interface for virtual immersive navigation and ex situ data analysis. http://dx.doi.org/10.3390/s16060785
  • 70. AppliedPointCloud Scans Accessibility Point Clouds to Indoor/Outdoor Accessibility Diagnosis J. Balado, L. Díaz-Vilariño, P. Arias, I. Garrido https://www.isprs-ann-photogramm-remote-sens-spatial-inf-sci.net/IV-2-W4/287/2017/isprs-annals-IV-2- W4-287-2017.pdf This work presents an approach to automatically detect structural floor elements such as steps or ramps in the immediate environment of buildings, elements that may affect the accessibility to buildings. The methodology is based on Mobile Laser Scanner (MLS) point cloud and trajectory information. The methodology is tested in a real case study, consisting of 100 m of an urban street. Ground elements are correctly classified in an acceptable computation time. Steps and ramps also are exported to GIS software to enrich building models from Open Street Map with information about accessible/inaccessible entrances and their locations. http://www.wired.co.uk/article/wayfindr-app A project initiated by the Royal London Society for the Blind's (RLSB) Youth Forum has led to the prototyping of a new app called Wayfindr, which has been built especially to help blind and partially sighted people use London's transport network independently. The app relies on smartphones and iBeacons and has been developed in collaboration with global digital product design studio ustwo Our Open Standard gives you the tools to create inclusive and consistent experiences for your vision impaired customers. From transport networks and shopping centres, to hospitals and any other indoor space - we can help. Through our on-site trials and consultancy we will work together with you to understand how digital wayfinding can make your estate accessible. https://www.wayfindr.net/
  • 72. Data quality: a compromise between file size, computational time and quality. 3D model reconstruction from point cloud processed either with OpenSfM, VisualSFM or Pix4D (top row) to mesh model (middle row) to final textured 3D model (bottom row) across a series of downsampled Sky Ranger UAV imagery, including full resolution (first column), half resolution (second column) and quarter resolution (last column). Bolick and Harguess (2016), http://dx.doi.org/10.1117/12.2224677 Garbage in, garbage out holds true as always: the more high-quality images / points you have as input, the higher the reconstruction quality will obviously be. Top-left: points sampled on a sphere and corrupted with a lot of noise. Top-right: reconstructed surface mesh. Bottom-left: smoothed point set. Bottom-right: reconstructed surface mesh. Reconstruction error (mm) against number of points for the Bimba con Nastrino point set with 1.6M points as well as for simplified versions. CGAL 4.10 - Poisson Surface Reconstruction. The sensitivity of biological finite element models to the resolution of surface geometry: a case study of crocodilian crania: "Example of the simplified models. C. moreletti models composed of 20k, 30k, 90k and 300k surface (mesh) elements." https://doi.org/10.7717/peerj.988 Point cloud & mesh processing, MAY 27 2017, posted by Taylor Wang: the final goal is to get a fully editable NURBS CAD model so that it can be modified by any CAD software to improve the design or reproduce the product.
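To make the size/quality trade-off above concrete, here is a hedged numpy-only sketch of voxel-grid downsampling, the same idea behind the decimation filters in common point-cloud tools (PCL, Open3D, CGAL pipelines): the chosen voxel size directly trades point count, and hence file size and processing time, against geometric detail. The random cloud is only a stand-in for a real scan, where points concentrate on surfaces and the reduction is typically much stronger.

```python
# Voxel-grid downsampling: keep one centroid per occupied voxel.
import numpy as np

def voxel_downsample(points, voxel_size):
    """points: (N, 3) array; returns the centroid of the points in each occupied voxel."""
    keys = np.floor(points / voxel_size).astype(np.int64)
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.ravel()
    sums = np.zeros((counts.size, 3))
    np.add.at(sums, inverse, points)         # accumulate point coordinates per voxel
    return sums / counts[:, None]

cloud = np.random.rand(1_000_000, 3) * 10.0  # stand-in for a 10 m room scan, in metres
for vs in (0.01, 0.05, 0.20):
    print(f"voxel {vs:.2f} m -> {len(voxel_downsample(cloud, vs)):,} points")
```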
  • 73. Point Cloud Library (PCL): the most popular open-source library http://unanancyowen.com/en/pcl-with-velodyne/ https://www.youtube.com/watch?v=7BUFxkyH1r0 https://doi.org/10.1109/MRA.2012.2206675 Cited by 186 articles - see Related articles
  • 75. Drift correction for proper scan registration https://doi.org/10.1109/ROBOT.2010.5509312 Correcting for drift (distortion) between different scans or overlapping point clouds with added velocity information for the ICP (Iterative Closest Point) algorithm. (a) is a given environment. Blue points in (b) show the distortion of the scan, and red points in (b) show the compensated scan. A transformation estimated using distorted data includes inevitable errors (c). A transformation estimated from the rectified scan gives us more accurate results (d). Kaarta - Common point cloud registration issues http://www.kaarta.com/cloud-registration-issues/ Published: 8 March 2017 http://dx.doi.org/10.3390/s17030539 Keywords: LiDAR; inertial measurement unit; iterative closest point; iterated sigma point Kalman filter; time delay calibration
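For reference, a minimal point-to-point ICP sketch in numpy/scipy showing the registration step this slide is about; the velocity-based motion compensation of the cited work would rectify each scan before this step and is not shown, and this is an illustrative baseline rather than the cited authors' code.

```python
# Point-to-point ICP: alternate nearest-neighbour correspondences with a closed-form
# rigid fit (Kabsch / SVD) until the alignment settles.
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Least-squares R, t aligning src to dst (row-wise corresponding points)."""
    cs, cd = src.mean(0), dst.mean(0)
    H = (src - cs).T @ (dst - cd)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    return R, cd - R @ cs

def icp(src, dst, iters=20):
    tree = cKDTree(dst)
    R_total, t_total = np.eye(3), np.zeros(3)
    cur = src.copy()
    for _ in range(iters):
        _, idx = tree.query(cur)             # nearest-neighbour correspondences
        R, t = best_rigid_transform(cur, dst[idx])
        cur = cur @ R.T + t
        R_total, t_total = R @ R_total, R @ t_total + t
    return R_total, t_total

# toy check: register a scan against a slightly rotated and shifted copy of itself
rng = np.random.default_rng(0)
scan_a = rng.random((2000, 3))
angle = np.deg2rad(5)
R_true = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                   [np.sin(angle),  np.cos(angle), 0.0],
                   [0.0, 0.0, 1.0]])
scan_b = scan_a @ R_true.T + np.array([0.05, 0.00, 0.02])
R_est, t_est = icp(scan_a, scan_b)           # should approach R_true and the shift
```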
  • 76. DataReduction andsimplificationfor storage Imran Ashraf ; Soojung Hur ; Yongwan Park https://doi.org/10.1109/ACCESS.2017.2699686 LIDAR produces large point cloud, but, while generating images for limited field of view, data sparsity results in poor quality images. Moreover, 3D to 2D data transformation also involves data reduction, which further deteriorates the quality of images. http://dx.doi.org/10.1117/12.2270833 31 October 2016 https://doi.org/10.1109/TIP.2016.2623488 https://www.google.com/patents/US9582939 https://arxiv.org/abs/1609.00893 Keywords: Tensor networks, Function-related tensors, CP decomposition, Tucker models, tensor train (TT) decompositions, matrix product states (MPS), matrix product operators (MPO), basic tensor operations, multiway component analysis, multilinear blind source separation, tensor completion, linear/multilinear dimensionality reduction, large-scale optimization problems, symmetric eigenvalue decomposition (EVD), PCA/SVD, huge systems of linear equations, pseudo-inverse of very large matrices, Lasso and Canonical Correlation Analysis (CCA) https://doi.org/10.1016/j.isprsjprs.2016.06.012 In-base point cloud management pipeline in the point cloud server (PCS).
  • 77. Data reduction: compressing point clouds. Dynamic polygon cloud compression, Eduardo Pavez; Philip A. Chou (2017) https://doi.org/10.1109/ICASSP.2017.7952694 We introduce a compressible representation of 3D geometry (including its attributes, such as color texture) intermediate between polygonal meshes and point clouds, called a polygon cloud. Polygon clouds, compared to polygonal meshes, are more robust to live capture noise and artifacts. Furthermore, dynamic polygon clouds, compared to dynamic point clouds, are easier to compress, if certain challenges are addressed. In this paper, we propose methods for compressing dynamic polygon clouds using transform coding of color and motion residuals. Real-time compression of point cloud streams, Julius Kammerl; Nico Blodow; Radu Bogdan Rusu; Suat Gedikli; Michael Beetz; Eckehard Steinbach (2012) https://doi.org/10.1109/ICRA.2012.6224647 We present a novel lossy compression approach for point cloud streams which exploits spatial and temporal redundancy within the point data. Our proposed compression framework can handle general point cloud streams of arbitrary and varying size, point order and point density. Furthermore, it allows for controlling coding complexity and coding precision. To compress the point clouds, we perform a spatial decomposition based on octree data structures. 3D Reconstruction Framework for Multiple Remote Robots on Cloud System, Phuong Minh Chu, Seoungjae Cho, Simon Fong, Yong Woon Park and Kyungeun Cho (2017) http://dx.doi.org/10.3390/sym9040055 This paper proposes a cloud-based framework that optimizes the three-dimensional (3D) reconstruction of multiple types of sensor data captured from multiple remote robots. A working environment using multiple remote robots requires massive amounts of data processing in real-time, which cannot be achieved using a single computer. In the proposed framework, reconstruction is carried out in cloud-based servers via distributed data processing.
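A hedged sketch of the octree-based spatial decomposition mentioned in the Kammerl et al. abstract: only an occupancy byte per non-empty node is stored, so empty space costs nothing and precision is set by the maximum depth. Real codecs add entropy coding, attribute coding and temporal (double-buffered) prediction; the depth and point count here are illustrative.

```python
# Octree occupancy coding: recursively split occupied cells into 8 children and emit
# one occupancy byte (8 child bits) per internal node.
import numpy as np

def encode_octree(points, center, half, depth, out):
    if depth == 0 or len(points) == 0:
        return
    mask_bits = 0
    children = []
    for child in range(8):
        offset = np.array([(child >> i) & 1 for i in range(3)]) * 2 - 1
        c_center = center + offset * half / 2.0
        inside = np.all(np.abs(points - c_center) <= half / 2.0, axis=1)
        if inside.any():
            mask_bits |= 1 << child
            children.append((points[inside], c_center))
    out.append(mask_bits)                      # one occupancy byte per non-empty node
    for pts, c_center in children:
        encode_octree(pts, c_center, half / 2.0, depth - 1, out)

cloud = np.random.rand(50_000, 3)              # stand-in scan inside a unit cube
stream = []
encode_octree(cloud, center=np.array([0.5] * 3), half=0.5, depth=6, out=stream)
blob = bytes(stream)                           # depth 6 corresponds to a 64^3 leaf grid
print(f"raw float32: {cloud.astype(np.float32).nbytes} B, occupancy stream: {len(blob)} B")
```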
  • 79. Deep learning beyond Euclidean data (non-Euclidean problems). Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, and Pierre Vandergheynst https://doi.org/10.1109/MSP.2017.2693418 https://arxiv.org/abs/1705.10819
  • 81. Deep learning: PointNet++. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas, Stanford University (Submitted on 7 Jun 2017) https://arxiv.org/abs/1706.02413 Illustration of our hierarchical feature learning architecture and its application to set segmentation and classification, using points in 2D Euclidean space as an example. Single-scale point grouping is visualized here. Left: point cloud with random point dropout. Right: curve showing the advantage of our density-adaptive strategy in dealing with non-uniform density. DP means random input dropout during training; otherwise training is on uniformly dense points. ScanNet labeling results: PointNet captures the overall layout of the room correctly but fails to discover the furniture. Our approach, in contrast, is much better at segmenting objects besides the room layout.
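As a reference point, here is a hedged PyTorch sketch of the PointNet-style building block that PointNet++ applies hierarchically: a shared per-point MLP followed by a symmetric max-pool, which makes the descriptor invariant to the ordering of the unordered point set. The farthest-point sampling and multi-scale grouping layers that make PointNet++ hierarchical and density-adaptive are omitted, and the layer sizes are illustrative.

```python
# Shared per-point MLP + order-invariant max-pool (the core PointNet set function).
import torch
import torch.nn as nn

class SharedMLPMaxPool(nn.Module):
    def __init__(self, in_dim=3, dims=(64, 128, 1024)):
        super().__init__()
        layers, prev = [], in_dim
        for d in dims:
            layers += [nn.Conv1d(prev, d, 1), nn.BatchNorm1d(d), nn.ReLU()]
            prev = d
        self.mlp = nn.Sequential(*layers)

    def forward(self, pts):                    # pts: (B, N, 3), an unordered point set
        feat = self.mlp(pts.transpose(1, 2))   # 1x1 convs = the same MLP applied to every point
        return feat.max(dim=2).values          # symmetric pooling -> (B, 1024) descriptor

model = SharedMLPMaxPool().eval()
x = torch.rand(2, 2048, 3)
perm = torch.randperm(2048)
assert torch.allclose(model(x), model(x[:, perm]), atol=1e-5)   # point order does not matter
print(model(x).shape)                          # torch.Size([2, 1024])
```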
  • 82. Deep learning: 2D feature descriptors. Instead of using the old-school SIFT, SURF, ORB, etc., feature description / matching can be done with a data-driven deep learning network as well. Note: this model was trained with SfM data, which does not have strong rotation changes. Newer models work better in this case and will be released soon. In the meantime, you can also use the models in the learn-orientation and benchmark-orientation repositories. https://github.com/cvlab-epfl/LIFT https://arxiv.org/abs/1603.09114 | Cited by 23 - Related articles
  • 83. DeepLearning3DFeatureDescriptors https://arxiv.org/abs/1706.04496 We present a view-based convolutional network that produces local, point-based shape descriptors. The network is trained such that geometrically and semantically similar points across different 3D shapes are embedded close to each other in descriptor space (left). Our produced descriptors are quite generic — they can be used in a variety of shape analysis applications, including dense matching, prediction of human affordance regions, partial scan-to-shape matching, and shape segmentation (right). In contrast to findings in the image analysis community where learned 2D descriptors are ubiquitous and general (e.g. LIFT), learned 3D descriptors have not been as powerful as 2D counterparts because they (1) rely on limited training data originating from small-scale shape databases, (2) are computed at low spatial resolutions resulting in loss of detail sensitivity, and (3) are designed to operate on specific shape classes, such as deformable shapes. We generate training correspondences automatically by leveraging highly structured databases of consistently segmented shapes with labeled parts. The largest such database is the segmented ShapeNetCore dataset [ Yi et al. 2016, https://www.shapenet.org/] that includes 17K man-made shapes distributed in 16 categories
  • 84. Meshgenerativeshapeswith GAN https://arxiv.org/abs/1705.02090 Our key insight is that 3D shapes are effectively characterized by their hierarchical organization of parts, which reflects fundamental intra-shape relationships such as adjacency and symmetry. We develop a recursive neural net (RvNN) based autoencoder to map a flat, unlabeled, arbitrary part layout to a compact code. The code effectively captures hierarchical structures of man-made 3D objects of varying structural complexities despite being fixed-dimensional: an associated decoder maps a code back to a full hierarchy. The learned bidirectional mapping is further tuned using an adversarial setup to yield a generative model of plausible structures, from which novel structures can be sampled. It would be interesting to thoroughly investigate the effect of code length on structure encoding. Finally, it is worth exploring recent developments in GANs, e.g. Wasserstein GAN [Arjovsky et al. 2017], in our problem setting. It would also be interesting to compare with plain VAE and other generative adaptations.
  • 85. Point clouds: generative GANs for point clouds #1a https://arxiv.org/abs/1707.02392 We build an end-to-end pipeline for 3D point clouds that uses an autoencoder (AE) to create a latent representation, and a Generative Adversarial Network (GAN) to generate new samples in that latent space. Our AE is designed with a structural loss tailored to unordered point clouds. Our learned latent space, while compact, has excellent class-discriminative ability: per our classification results, it outperforms recent GAN-based representations by 4.3%. In addition, the latent space allows for vector arithmetic, which we apply in a number of shape editing scenarios, such as interpolation and structural manipulation. We argue that jointly learning the representation and training the GAN is unnecessary for our modality. We propose a workflow that first learns a representation by training an AE with a compact bottleneck layer, then trains a plain GAN in that fixed latent representation. One benefit of this approach is that AEs are a mature technology: training them is much easier and they are compatible with more architectures than GANs. We point to theory that supports this idea, and verify it empirically: we show that GANs trained in our learned AE-based latent space generate visibly improved results, even with a generator and discriminator as shallow as a single hidden layer. Within a handful of epochs, we generate geometries that are recognized in their right object class at a rate close to that of ground truth data. Importantly, we report significantly better diversity measures (10x divergence reduction) over the state of the art, establishing that we cover more of the original data distribution. In summary, we contribute: ● An effective cross-category AE-based latent representation on point clouds. ● The first (monolithic) GAN architecture operating on 3D point clouds. ● A surprisingly simpler, state-of-the-art GAN working in the AE's latent space. 1) Autoencoder for a fixed latent representation, with vector arithmetic 2) Generative Adversarial Network using the fixed latent representation. In our latent-space GAN, instead of operating on the raw point cloud input, we pass the data through our pre-trained autoencoder, trained separately for each object class with the Earth Mover's distance (EMD) loss function. Both the generator and the discriminator of the GAN then operate on the 512-dimensional bottleneck variable of the AE. Finally, once the GAN training is over, the output of the generator is decoded to a point cloud via the AE decoder. We found that very shallow designs for both the generator and discriminator (in our case, 1 hidden layer for the generator and 2 for the discriminator) are sufficient to produce realistic results.
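A hedged sketch of the two-stage latent-space GAN workflow summarized above (not the authors' code): the autoencoder is assumed to be trained and frozen beforehand, so the GAN only has to model the 512-dimensional bottleneck codes, which is why very shallow generators and discriminators are enough. The `real_codes` tensor below is a stand-in for actual encoder outputs, and `frozen_ae_decoder` is a hypothetical name for the pre-trained decoder.

```python
# Shallow GAN trained in a fixed 512-d autoencoder latent space (one training step shown).
import torch
import torch.nn as nn

latent_dim, noise_dim, batch = 512, 128, 64
G = nn.Sequential(nn.Linear(noise_dim, 256), nn.ReLU(), nn.Linear(256, latent_dim))
D = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                  nn.Linear(256, 64), nn.ReLU(), nn.Linear(64, 1))
opt_g = torch.optim.Adam(G.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)
bce = nn.BCEWithLogitsLoss()

real_codes = torch.randn(batch, latent_dim)      # stand-in for frozen-encoder outputs

z = torch.randn(batch, noise_dim)
fake_codes = G(z)

# discriminator step: real codes vs. detached fake codes
d_loss = (bce(D(real_codes), torch.ones(batch, 1))
          + bce(D(fake_codes.detach()), torch.zeros(batch, 1)))
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# generator step: try to fool the (updated) discriminator
g_loss = bce(D(fake_codes), torch.ones(batch, 1))
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
# at sampling time: point_cloud = frozen_ae_decoder(G(torch.randn(1, noise_dim)))  # hypothetical decoder
```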
  • 86. Point clouds: generative GANs for point clouds #1b. Interpolating between different point clouds using our latent-space representation; note the interpolation between structurally and topologically different shapes. Generative results using our latent-space GAN; note the variability and fidelity of the results. For a recap on GANs, see for example: https://arxiv.org/abs/1701.07875 Cited by 106 - Related articles What do GANs for point clouds mean in practice? Point-cloud super-resolution (e.g. Ledig et al. 2016 for natural images) to improve model appearance (e.g. remove staircasing), and inpainting (e.g. Iizuka et al. 2017) to handle occlusion and gaps from indoor scans ("shape completion"). "Visual plastic surgery", in other words (Tung et al. 2017). Sung et al. (2015) Data-driven Structural Priors for Shape Completion. Mönch et al. (2010) Staircase-Aware Smoothing of Medical Surface Meshes
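The latent-space editing shown on this slide reduces to simple vector operations on the autoencoder codes. A hedged sketch follows, where `encoder` and `decoder` stand for the frozen autoencoder of the previous slide (hypothetical names, not a published API):

```python
# Linear interpolation (and analogy-style arithmetic) between 512-d shape codes.
import torch

def interpolate_codes(code_a, code_b, steps=5):
    ts = torch.linspace(0.0, 1.0, steps).view(-1, 1)
    return (1.0 - ts) * code_a + ts * code_b       # (steps, 512) intermediate codes

codes = interpolate_codes(torch.randn(1, 512), torch.randn(1, 512))
print(codes.shape)                                 # torch.Size([5, 512])
# each row would be decoded back to a point cloud:   shape_i = decoder(codes[i:i+1])
# analogy-style edits work the same way, e.g.
#   edited = decoder(encoder(chair) + (encoder(armchair) - encoder(plain_chair)))
```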
  • 87. Hardware point-cloud super-resolution from multiple scans https://doi.org/10.2312/SPBG/SPBG06/009-015 Cited by 47 articles On the left, one scan of the parrot statue, with a sample spacing of about 1 mm. Center, we combine 100 nearly identical such scans to produce the surface in the center, produced on a grid with sample spacing of about 0.3 mm. Notice the noise reduction and the improvement in the detail, for instance in the face, neck and wing feathers. On the right, a photograph of the parrot statue. Super-resolution reconstruction using only 30 input scans at the left and increasing to 140 at the right. Noise is reduced dramatically at the beginning but more slowly at the end. Surfaces were reconstructed from subsets which were pre-registered using all 140 scans. For absolute measurement accuracy (e.g. Biljecki et al. 2017), one can scan the same space multiple times. A thin strip of the super-resolved surface, and the nearby sample points from the input scans. The input is very noisy, but the points are densely and randomly distributed near the surface with few outliers, so the average gives an accurate representation of the surface. (a) One scan. (b) Final super-resolved surface from 100 scans. (c) Photo of the object (a plaster cast of a subway token). The bottom row shows some results of other kinds of processing, to evaluate the importance of the various steps of the algorithm. (d) One scan, bilinearly interpolated onto the finer grid and smoothed. Detail is missing. (e) The entire algorithm except for the final bilateral filtering step. The noise removed by the filtering seems to be residual registration error, which perhaps could be improved. (f) Just averaging 100 scans taken without moving the scanner, using the same Gaussian kernel. Noise is decreased, but there is aliasing from the lower-resolution grid obscuring detail visible in (b).
  • 88. DeepLearningSuper-Resolution Plentyofoptionsforimage/video/volumesuper-resolution https://arxiv.org/abs/1706.03142 https://arxiv.org/abs/1704.02738 https://arxiv.org/abs/1704.02470 https://arxiv.org/abs/1612.00085 Novel texture enhancement framework creates an HR style image that is rich in details, which can be used to restore high-frequency texture details back into the initial HR image via the style transfer algorithm. Four examples of SR results for nearest neighbor and cubic interpolation, the best-performing sparse coding, 3D- FSRCNN, and 3D-SRU-Net configurations. Arrows indicate regions in which at least one SR result mis- interprets a cell boundary or an ultrastructural feature. Scale bar 500 nm. Our method includes a sub-pixel motion compensation (SPMC) layer that can better handle inter-frame motion for this task. Our detail fusion (DF) network that can effectively fuse image details from multiple images after SPMC alignment
  • 89. Point-cloudsuper-resolution Upsampling‘on-the-fly’toavoid“dataexplosion”? Jason Schreier 4/17/17 12:05pm Horizon Zero Dawn, Kotaku http://kotaku.com/horizon-zero-dawn-uses-all-sorts- of-clever-tricks-to-lo-1794385026 Games like this don’t just look incredible because of ‘hyper-realism’ but because their engineers use all sorts of tricks [LOD’ing, or Level of Detail; Mipmapping; frustum culling, etc.] to save memory. The engine is designed to produce models in CityGML and does so in multiple LODs. Besides the generation of multiple geometric LODs, we implement the realisation of multiple levels of spatiosemantic coherence, geometric reference variants, and indoor representations. The datasets produced by Random3Dcity are suited for several applications, as we show in this paper with documented uses. The developed engine is available under an open-source licence at Github at http://github.com/tudelft3d/Random3Dcity http://doi.org/10.5194/isprs-annals-IV-4-W1-51-2016 Filip Biljecki, Hugo Ledoux, Jantien Stoter Level of detail texture filtering with dithering and mipmaps US 5831624 A Original Assignee 3Dfx Interactive Inc https://www.google.com/patents/US5831624 Level-of-detail rendering: colors identify different subdivision levels as stated in the top left corner. Feature-Adaptive Rendering of Loop Subdivision Surfaces on Modern GPUs November 2014 DOI: 10.1007/s11390-014-1486-x ManyLoDs: Parallel Many-View Level-of-Detail Selection for Real- Time Global Illumination Matthias Hollander, Tobias Ritschel, Elmar Eisemann, Tamy Boubekeur (2011) http://dx.doi.org/10.1111/j.1467-8659.2011.01982.x
  • 90. 3DContentgeneration VolumetricCapture Generatecontentbyscanningreal-lifescenesandobjects Kul Wadhwa's and Roddy O'Hara's Uncorporeal http://www.uncorporeal.com/ Uncorporeal: volumetric capture systems for VR & AR content creation. The team includes a technical Oscar-winner and engineering and product leadership from WETA, Google X, Lucas ILM, and Wikimedia. https://venturebeat.com/2016/10/13/pathbreaker-ventures-raises-12-milli on-to-invest-in-emerging-tech-such-as-vr-ar-and-robotics/ Ryan Gembala, founder of Pathbreaker Ventures believes connected homes and cars and autonomous vehicles will create a lot of opportunities in vertical applications for startups. And he also thinks that space technologies such as small satellites, analysis of space-captured data, consumer transport, space mining, and others are interesting. REALITYVIRTUAL.CO - A NEW ZEALAND BASED CREATIVE TECHNOLOGIES RESEARCH & DEVELOPMENT COLLECTIVE WITH AN ENTHUSIAST TOWARDS THE VISUAL REALM: ● unique post production & signal processing techniques including the development of deep learning image enhancement & automation throughout our 3D pipeline for PBR workflow ● strong emphasis on advanced robotics & autonomous operations for large data acquisition of 3D environments. 3D Scene Creation with Photogrammetry
  • 91. 3DContentgeneration Automaticphotorealism#1 Stillcanbequitelabor-intensivetocreaterealisticcontent Get to know Rense de Boer, a technical art director from Sweden, who is not only pushing the envelope of photo-real CGI environments, but he’s doing it all in a real-time engine! Art by Rens https://news.developer.nvidia.com/artist-spotlight-creating-photorealistic-cgi-environments-in-real-time/ https://www.youtube.com/watch?v=bXouFfqSfxg One Ph.D. position (supervision by Profs Niessner and Rüdiger Westermann) is available at our chair in the area of photorealistic rendering for deep learning and online reconstruction Research in this project includes the development of photorealistic realtime rendering algorithms that can be used in deep learning applications for scene understanding, and for high-quality scalable rendering of point scans from depth sensors and RGB stereo image reconstruction. If you are interested in applying, you should have a strong background in computer science, i.e., efficient algorithms and data structures, and GPU programming, have experience implementing C/C++ algorithms, and you should be excited to work on state-of-the-art research in the 3D computer graphics. https://wwwcg.in.tum.de/group/joboffers/phd-position-photorealistic-rendering-for-deep-le arning-and-online-reconstruction.html Ph.D. Position – Photorealistic Rendering for Deep Learning and Online Reconstruction
  • 92. 3DContentgeneration Automaticphotorealism#2 ConvertingLiDARscanstovisuallyhighquality3Dcontent Atom View is a new piece of software that allows content creators to translate real-world scans into assets for virtual environments. Not only does it aim to produce realistic results but also reduce the workflow for content creation. The standalone app takes files captured from volumetric cameras, offline graphics renderers, 360 lidar and more. Volumetric capture is a promising area of development that could one day allow content creators to skip over several of the more laborious steps of traditional 3D content creation with better results. With Atom View, users can even edit objects once they’ve been imported. https://youtu.be/YxRI_3gKP8g
  • 93. 3DContentgeneration Styletransfer formaps Neural Networks and The Future of 3D Procedural Content Generation by Sam Snider-Held, Creative Technologist at MediaMonks, focusing on the intersection of AR, VR, AI, UX, and Style transfer output on the left, real terrain on the right. Both are planes whose vertices are being displaced by the height map texture. Now was time to create my own style transfer light field and light field renderer. I basically reimplemented Andrew Lowndes’ WebGl light field renderer in Unity. What this post demonstrates is the idea that neural network could radically change how we generate 3D content. I went with light fields because currently my GPU is not fast enough to style transfer or any other generative network at 60 FPS. But if we do get to that point, it’s entirely possible see generative neural networks become an alternative rendering pipe line to the standard rasterization approach. In this way, neural networks could generate each frame of a game in real time, based on realtime feedback from the user. But it also potentially allows for a much more powerful creative approach, for the creator and the end user. Imagine playing Gears of War, but then telling the computer “Keep the gameplay, story, and 3d models, but make it look like Zelda: Breath of the Wild.” This is how creating or playing a future gaming experience could be, all because computers now know what things “look like” and can make other things “look like” them too.
  • 94. 3DContentgeneration from Videoto3D Production-Level Facial Performance Capture Using Deep Convolutional Neural Networks In Proceedings of SCA'17, Los Angeles, CA, USA, July 28-30, 2017 http://research.nvidia.com/publication/facial-performance-capture-deep -neural-networks Samuli Laine, Tero Karras, Timo Aila, Antti Herva (Remedy Entertainment), Shunsuke Saito (Pinscreen, University of Southern California), Ronald Yu (Pinscreen, University of Southern California), Hao Li (USC Institute for Creative Technologies, University of Southern California, Pinscreen), Jaakko Lehtinen (NVIDIA, Aalto University) NVIDIA and game developer Remedy (Alan Wake, Quantum Break) showcased their team-up solution to streamlining motion capture and animation using a deep learning neural network, running on NVIDIA’s powerful DGX-1 server. After being “trained” with information on previously produced animations, the network is able to generate sophisticated 3D facial animation from videos of live actors, greatly alleviating the time and labor burden of traditional mo-cap animation — it can even learn enough to generate facial animation from just an audio clip. The companies believe this system could eventually produce animation that’s just as good or better than traditionally produced fare. http://www.animationmagazine.net/events/siggraph-facial-animation-advances-fabri c-engine-the-french-contingent/ “We present a real-time deep learning framework for video-based facial performance capture -- the dense 3D tracking of an actor's face given a monocular video. Our pipeline begins with accurately capturing a subject using a high-end production facial capture pipeline based on multi-view stereo tracking and artist- enhanced animations. With 5-10 minutes of captured footage, we train a convolutional neural network to produce high-quality output, including self-occluded regions, from a monocular video sequence of that subject. Since this 3D facial performance capture is fully automated, our system can drastically reduce the amount of labor involved in the development of modern narrative-driven video games or films involving realistic digital doubles of actors and potentially hours of animated dialogue per character. “
  • 95. 3DContentgeneration from Video(&Audio) toVideo Face2Face: Real-time Face Capture and Reenactment of RGB Videos Justus Thies1 Michael Zollhöfer 2 Marc Stamminger 1 Christian Theobalt 2 Matthias Nießner 3 1 University of Erlangen-Nuremberg2 Max Planck Institute for Informatics 3 Stanford University http://www.graphics.stanford.edu/~niessner/thies2016face.html https://doi.org/10.1109/CVPR.2016.262 Neural Face Editing with Intrinsic Image Disentangling Zhixin Shu, Ersin Yumer, Sunil Hadap, Kalyan Sunkavalli, Eli Shechtman, Dimitris Samaras (Submitted on 13 Apr 2017) https://arxiv.org/abs/1704.04131 University of Washington researchers have developed new algorithms that solve a thorny challenge in the field of computer vision: turning audio clips into a realistic, lip-synced video of the person speaking those words. As detailed in a paper to be presented Aug. 2 at SIGGRAPH 2017, the team successfully generated highly-realistic video of former president Barack Obama talking about terrorism, fatherhood, job creation and other topics using audio clips of those speeches and existing weekly video addresses that were originally on a different topic. Synthesizing Obama: learning lip sync from audioSupasorn Suwajanakorn, Steven M. Seitz, Ira Kemelmacher-Shlizerman ACM Transactions on Graphics (TOG), Volume 36 Issue 4, July 2017, https://doi.org/10.1145/3072959.3073640 http://www.washington.edu/news/2017/07 /11/lip-syncing-obama-new-tools-turn-a udio-clips-into-realistic-video/