The document discusses building Java applications that use MongoDB as the database. It covers connecting to MongoDB from Java using the driver, designing schemas for embedded documents and arrays, building Java objects to represent and insert data, and performing basic operations like inserts. The document also mentions using an object-document mapper like Morphia to simplify interactions between Java objects and MongoDB documents.
2. Contents
• From Sampling to n=all
• Implications from a Data Standpoint
• Building an Application in Java
• Core Features of MongoDB
• Connecting the Dots
3. Random Sample
Image Source: SurveyMonkey
Sampling
• Based on Random Sampling
• Used in a variety of fields: Opinion Polls, Bug Estimation etc.
Issues
• Loss of detail: 3% margin of error
• Are the samples truly random?
• Outliers might have very interesting information
• Black Swan Events have a massive impact that cannot be captured in a Normal Distribution
4. N=All
Causation to Correlation
From Why to What
Ever-more ubiquitous access to the Digital World
Cost of Storage has plummeted over the years
Ability to process unstructured and semi-structured information
Software tools that can process the data in real time
Source: Big Data, by Viktor Mayer-Schönberger and Kenneth Cukier
8. Example Application Requirements
• Skillsets of Employees
• Certification and Skill level
• Dashboard View of real-time data
• Scalable, Reliable and Performant Database
9. Design the Schema
Embedded Information
Sub-documents, Arrays etc. Natively Supported
Differing Data
RD DVa FP DA DVo GS RTA DD
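To make the embedded shape concrete, here is a minimal sketch of one employee document built from plain Java collections, so it runs without the MongoDB driver. The field names follow the Java classes shown later in the deck; the class name `EmployeeDocumentSketch` and all sample values are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Models one employee document with plain collections; no driver required.
class EmployeeDocumentSketch {

    // Builds one skill sub-document.
    static Map<String, Object> skill(String name, int level, String version, boolean certified) {
        Map<String, Object> s = new LinkedHashMap<>();
        s.put("skill", name);
        s.put("level", level);
        s.put("version", version);
        s.put("certified", certified);
        return s;
    }

    // Builds the whole document: scalar fields, an array of sub-documents,
    // and one embedded sub-document. Sample values are invented.
    static Map<String, Object> employee(int id, String name) {
        List<Map<String, Object>> skills = new ArrayList<>();
        skills.add(skill("Java", 3, "8", true));
        skills.add(skill("MongoDB", 2, "3.0", false));

        Map<String, Object> info = new LinkedHashMap<>();
        info.put("dept", "Engineering");
        info.put("experience", 5);
        info.put("gps", Arrays.asList(-73.97, 40.77)); // [longitude, latitude]
        info.put("location", "New York");
        info.put("reviewed", false);

        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", id);
        doc.put("name", name);
        doc.put("skills", skills); // array of sub-documents
        doc.put("info", info);     // embedded sub-document
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(employee(1, "Alice"));
    }
}
```

Everything about one employee lives in a single document, so one read can return the skills and the info without a join.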
10. Preparing the Java Application
• Add the driver libraries to the classpath
3.0 New Features
– Generic MongoCollection Interface
– New Asynchronous API
– New Codec Infrastructure
– New Core Driver
• Start the MongoDB instance. Let's start with a standalone instance. For a write-performant storage engine, start the mongod with --storageEngine wiredTiger
11. Build the Java Object
public class DataObject {
    private int id;
    private String name;
    private List<SkillObject> obj;
    private InfoObject info;
}

public class SkillObject {
    private String skill;
    private int level;
    private String version;
    private boolean certified;
}

public class InfoObject {
    private String dept;
    private int experience;
    private List<Double> gps;
    private String location;
    private boolean reviewed;
}

Or use an Object-Document Mapper such as Morphia

@Entity
public class coll {
    @Id
    private int id;
    private String name;
    @Embedded
    private List<SkillsPOJO> skills;
    @Embedded
    private InfoPOJO info;
}

@Embedded
public class SkillsPOJO {
    private String skill;
    private int level;
    private String version;
    private boolean certified;
}

// Similarly for Info POJO
12. Connect to MongoDB
[Diagram: Java Client (Driver) connecting to mongod in the DB Tier]
import java.util.ArrayList;
import java.util.List;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;

// client, database and collection are fields of the enclosing class
public void MongoConnect(String[] hosts) {
    List<ServerAddress> seeds = new ArrayList<ServerAddress>();
    for (String h : hosts) {
        // MongoDB Server address and Port
        seeds.add(new ServerAddress(h));
    }
    // MongoDB client with internal connection pooling.
    client = new MongoClient(seeds);
    // The database to connect to
    database = client.getDatabase("mydb");
    // The collection to connect to
    collection = database.getCollection("coll");
}
13. Or Use an ODM
import java.util.ArrayList;
import java.util.List;
import com.mongodb.MongoClient;
import com.mongodb.ServerAddress;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.mongodb.morphia.Datastore;
import org.mongodb.morphia.Morphia;

// client, morphia and ds are fields of the enclosing class
public void MorphiaConnect(String[] hosts) {
    List<ServerAddress> seeds = new ArrayList<ServerAddress>();
    for (String h : hosts) {
        seeds.add(new ServerAddress(h));
    }
    client = new MongoClient(seeds);
    morphia = new Morphia();
    // Map the Morphia Object
    morphia.map(coll.class).map(SkillsPOJO.class).map(InfoPOJO.class);
    // Create a datastore to interact with MongoDB using POJOs
    ds = morphia.createDatastore(client, "mydb");
}
23. Delete Data
import static com.mongodb.client.model.Filters.*;
…
public void delete(int id) {
    collection.deleteOne(eq("_id", id));
    System.out.println("Deleted Document with id: " + id + "\n");
    …
}

Using Morphia
public void delete(int id) {
    Query<coll> query = ds.createQuery(coll.class)
                          .field("id").equal(id);
    ds.delete(query);
    …
}
24. High Availability
[Diagram: Java Client (Driver) connected to a Replica Set of one Primary and two Secondaries, all healthy (✔), linked by Heartbeat]
• Automated Fail-over
• Rolling upgrades
• Multi Data Center Support
• Data Durability and Strong Consistency
25. MongoDB set up
Use MongoDB OpsManager or Cloud Manager Automation to set up the cluster
(or)
sudo mongod --port 27017 --dbpath /data/rs1 --replSet rs --logpath /logs/rs1.log --fork
sudo mongod --port 27018 --dbpath /data/rs2 --replSet rs --logpath /logs/rs2.log --fork
sudo mongod --port 27019 --dbpath /data/rs3 --replSet rs --logpath /logs/rs3.log --fork
mongo --port 27017
> config = { "_id" : "rs", "members" : [
... {"host":"localhost:27017", "_id":0},
... {"host":"localhost:27018", "_id":1},
... {"host":"localhost:27019", "_id":2}
... ]
... }
rs.initiate(config)
In the Java program, pass the addresses and ports of the replica set members as part of the connection string.
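As a rough illustration of that last step, the sketch below assembles a standard `mongodb://` connection string from the three members and the replica set name `rs` used in the commands above. The class name `ReplicaSetUri` and its `build` method are hypothetical helpers, not part of the driver; only JDK classes are used.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: builds a MongoDB connection string for a replica set.
class ReplicaSetUri {

    // hostsAndPorts entries look like "localhost:27017".
    static String build(List<String> hostsAndPorts, String database, String replicaSetName) {
        if (hostsAndPorts.isEmpty()) {
            throw new IllegalArgumentException("at least one host is required");
        }
        return "mongodb://" + String.join(",", hostsAndPorts)
                + "/" + database
                + "?replicaSet=" + replicaSetName;
    }

    public static void main(String[] args) {
        List<String> members = Arrays.asList(
                "localhost:27017", "localhost:27018", "localhost:27019");
        System.out.println(build(members, "mydb", "rs"));
    }
}
```

With the 3.0 driver, a string of this form can be wrapped in a `MongoClientURI` and handed to the `MongoClient` constructor; listing more than one member lets the driver find the set even if one seed is down.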
26. Ensuring Durability
• By default, WriteConcern is Acknowledged => the mongod has received the write operation and has applied the change in memory
• A Primary Server crash means that the data might be lost
• Use a stricter WriteConcern such as Majority or w:2
for (int retry = 0; retry < 3; retry++) {
    try {
        collection.withWriteConcern(WriteConcern.MAJORITY)
                  .insertOne(doc);
        break;
    } catch (Exception e) {
        // Report the failure, then wait before the next attempt
        System.err.println(e.getMessage());
        Thread.sleep(5000);
    }
}
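The retry loop above can be factored into a small reusable helper. The following is a sketch under stated assumptions: `RetrySketch` and `withRetries` are hypothetical names, and the write is passed in as a `Supplier`, so the block runs without the driver. It relies on the fact that the driver's exceptions are unchecked.

```java
import java.util.function.Supplier;

// Hypothetical helper that retries an operation a fixed number of times.
class RetrySketch {

    static <T> T withRetries(Supplier<T> operation, int maxAttempts, long delayMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return operation.get(); // success: stop retrying
            } catch (RuntimeException e) {
                last = e;
                System.err.println("Attempt " + attempt + " failed: " + e.getMessage());
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(delayMillis); // wait before the next attempt
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("Interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw last; // every attempt failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // Stand-in for an insert that fails twice before succeeding.
        String result = withRetries(() -> {
            if (++calls[0] < 3) {
                throw new IllegalStateException("simulated transient failure");
            }
            return "written";
        }, 3, 100L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

A retried insert is only safe if the operation is idempotent; with a fixed `_id`, a retry after a write that actually succeeded surfaces as a duplicate-key error instead of a second document.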
27. Eventual Consistency
[Diagram: Reporting Application (Driver) reading from a Replica Set (P, S, S); the Reporting Application and a Secondary Member are in the same DC]
• Read from the nearest node for lower latency
• Read-only applications where eventual consistency is OK, for example Reporting Applications
• Can be achieved using ReadPreference in MongoDB
• Modes of Primary, PrimaryPreferred, Secondary, SecondaryPreferred and Nearest
myDoc = collection
    .withReadPreference(ReadPreference.nearest())
    .find(eq("_id", id)).first();
28. HA Best Practices
• HA against DC failures and active-active => 5 Nodes across 3 DCs
• For Writes => a Majority of Nodes need to be in Active State
• For Reads => Secondary Reads can continue
• Majority Inactive => Force Reconfig to continue Writes
rs:SECONDARY> config = { "_id" : "rs", "members" : [
... {"host":"localhost:27018", "_id":1}
... ]
... }
rs:SECONDARY> rs.reconfig(config, {force:true})
{ "ok" : 1 }
rs:PRIMARY>
[Diagram: Replica Set with two members removed (✗) and the remaining member (✔) as Primary, connected to the Java Client (Driver)]
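The majority rule behind these recommendations is simple arithmetic, sketched below. The 2 + 2 + 1 placement of the five nodes is an assumption for illustration (the slide says only "5 Nodes across 3 DCs"), and `MajoritySketch` is a hypothetical class name.

```java
// Hypothetical sketch of the majority arithmetic for replica set elections.
class MajoritySketch {

    // Smallest number of voting members that forms a strict majority.
    static int majority(int votingMembers) {
        return votingMembers / 2 + 1;
    }

    // A primary can be elected, and writes can continue, only with a majority reachable.
    static boolean canElectPrimary(int votingMembers, int reachableMembers) {
        return reachableMembers >= majority(votingMembers);
    }

    public static void main(String[] args) {
        int[] nodesPerDc = {2, 2, 1}; // assumed placement of 5 nodes across 3 DCs
        int total = 5;
        for (int dc = 0; dc < nodesPerDc.length; dc++) {
            int reachable = total - nodesPerDc[dc];
            System.out.println("Lose DC " + (dc + 1) + ": " + reachable + " of " + total
                    + " reachable, primary possible = " + canElectPrimary(total, reachable));
        }
        // One survivor out of three members, as in the diagram: no majority.
        System.out.println("1 of 3 reachable, primary possible = " + canElectPrimary(3, 1));
    }
}
```

Under that placement, losing any single data center still leaves at least three of five members, so writes continue without manual action. The one-of-three case is the situation where the forced reconfig shown above becomes necessary.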
31. Sharding
[Diagram: Client Tier and DB Tier; Java Client (Driver), Routers, three Config Servers, and Shards 1 to n, each with P, S, S members]
• Scale as you grow
• Redundancy is built-in at all levels
• 3 Types of Sharding: Range, Hashed or Tag-Aware
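The difference between range and hashed sharding can be illustrated with a toy router. This is a simplified sketch, not how MongoDB implements it: the real system splits data into chunks, balances them across shards, and uses its own hash function, and tag-aware sharding is not covered here. The class name, shard names, and boundaries are all invented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;

// Toy illustration of range-based versus hash-based routing of a shard key.
class ShardRoutingSketch {

    // Range: each entry maps the inclusive lower bound of a key range to a shard.
    static String routeByRange(TreeMap<Integer, String> lowerBounds, int key) {
        Map.Entry<Integer, String> entry = lowerBounds.floorEntry(key);
        if (entry == null) {
            throw new IllegalArgumentException("no range covers key " + key);
        }
        return entry.getValue();
    }

    // Hashed: a stand-in hash of the key picks the shard, spreading adjacent keys.
    static String routeByHash(List<String> shards, int key) {
        return shards.get(Math.floorMod(Objects.hash(key), shards.size()));
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> ranges = new TreeMap<>();
        ranges.put(Integer.MIN_VALUE, "shard1"); // keys below 1000
        ranges.put(1000, "shard2");              // keys from 1000 upward
        List<String> shards = Arrays.asList("shard1", "shard2");

        for (int key : new int[] {1, 2, 1500}) {
            System.out.println("key " + key
                    + ": range -> " + routeByRange(ranges, key)
                    + ", hashed -> " + routeByHash(shards, key));
        }
    }
}
```

Range routing keeps neighbouring keys together, which suits range queries but can concentrate monotonically increasing inserts on one shard. Hashed routing spreads such inserts out, at the cost of scattering range queries.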
32. MongoDB set up
Use MongoDB OpsManager or Cloud Manager Automation to set up the cluster
(or)
sudo mongod --port 37017 --dbpath /data/shard1 --logpath /logs/shard1.log --fork
sudo mongod --port 37018 --dbpath /data/shard2 --logpath /logs/shard2.log --fork
sudo mongod --port 47017 --dbpath /data/cfg --configsvr --logpath /logs/cfg.log --fork
sudo mongos --port 57017 --configdb localhost:47017
sudo mongos --port 57018 --configdb localhost:47017
mongo --port 57017
> sh.addShard("localhost:37017")
> sh.addShard("localhost:37018")
> sh.enableSharding("mydb")
> sh.shardCollection("mydb.coll",{"_id":1})
In the Java program, pass the Router IP addresses and ports as part of the connection string.
33. MongoDB for a Big Data World
• Rich Data
• Data Variety
• Fast Processing
• Data Availability
• Data Volume
• Geo-Spatial
• Real-time Access
• Data Durability
34. MongoDB for a Big Data World
• Flexible Data Model and Dynamic Schema
• Embedded Data
• Native Replication Across Data Centers
• Appropriate WriteConcern
• Rich Query Model and Aggregation
• Native Geo-Spatial Features
• Horizontal Scalability as you grow
• Sub-documents, Arrays etc.
35. More Information – Java/MongoDB
Resource Location
MongoDB Java Driver http://docs.mongodb.org/ecosystem/drivers/java/
Java API to connect to MongoDB http://api.mongodb.org/java/3.0/
Driver Download http://mongodb.github.io/mongo-java-driver/
Morphia Project https://github.com/mongodb/morphia
Hadoop Driver for MongoDB http://docs.mongodb.org/ecosystem/tools/hadoop/
University Course https://university.mongodb.com/courses/M101J/about?jmp=docs&_ga=1.249916550.1866581253.1440492145
36. More Information – MongoDB
Resource Location
Case Studies mongodb.com/customers
Presentations mongodb.com/presentations
Free Online Training university.mongodb.com
Webinars and Events mongodb.com/events
Documentation docs.mongodb.org
MongoDB Downloads mongodb.com/download
Additional Info info@mongodb.com
Core Driver – alternative API
MongoDB Async Driver - A new asynchronous API that can leverage either Netty or Java 7’s AsynchronousSocketChannel for fast and non-blocking IO.
Netty is a non-blocking I/O (NIO) client-server framework for the development of Java network applications such as protocol servers and clients.
Pool of connections to the Database – even with multiple threads
MongoClientOptions.Builder()
connectionsPerHost
heartbeatConnectTimeout
heartbeatFrequency
maxConnectionIdleTime
To create a capped collection, use createCollection (options: maxDocuments, usePowerOf2Sizes, capped); getCollection defers creation until data is written
Lambda function as well – Java 8
SingleResultCallback<T> - An interface to describe the completion of an asynchronous operation.
QueryFilter
A position is the fundamental geometry construct. The "coordinates" member of a geometry object is composed of one position (in the case of a Point geometry), an array of positions (LineString or MultiPoint geometries), an array of arrays of positions (Polygons, MultiLineStrings), or a multidimensional array of positions (MultiPolygon)