How to Write the Fastest JSON
Parser/Writer in the World
Milo Yip
Tencent
28 Mar 2015
Milo Yip 叶劲峰
• Expert Engineer (2011 to now)
– Engine Technology Center, R & D Department,
Interactive Entertainment Group (IEG), Tencent
• Master of Philosophy in System Engineering &
Engineering Management, CUHK
• Bachelor of Cognitive Science, HKU
• https://github.com/miloyip
• http://www.cnblogs.com/miloyip
• http://www.zhihu.com/people/miloyip
Table of Contents
1. Introduction
2. Benchmark
3. Design
4. Limitations
5. Thoughts
6. References
1. INTRODUCTION
JSON
• JavaScript Object Notation
• Alternative to XML
• Human-readable text to transmit/persist data
• RFC 7159/ECMA-404
• Common uses
– Open API (e.g. Twitter, Facebook, etc.)
– Data storage/exchange (e.g. GeoJSON)
RapidJSON
• https://github.com/miloyip/rapidjson
• MIT License
• C++ Header-only Library
• Started in Nov 2011
• Inspired by RapidXML
• Will release 1.0 under Tencent *soon*
Features
• Both SAX and DOM style API
• Fast
• Cross platform/compiler
• No dependencies
• Memory friendly
• UTF-8/16/32/ASCII and transcoding
• In-situ Parsing
• More at http://miloyip.github.io/rapidjson/md_doc_features.html
Hello RapidJSON!
#include "rapidjson/document.h"
#include "rapidjson/writer.h"
#include "rapidjson/stringbuffer.h"
#include <iostream>
using namespace rapidjson;
int main() {
    // 1. Parse a JSON string into DOM.
    const char* json = "{\"project\":\"rapidjson\",\"stars\":10}";
    Document d;
    d.Parse(json);
    // 2. Modify it by DOM.
    Value& s = d["stars"];
    s.SetInt(s.GetInt() + 1);
    // 3. Stringify the DOM
    StringBuffer buffer;
    Writer<StringBuffer> writer(buffer);
    d.Accept(writer);
    // Output {"project":"rapidjson","stars":11}
    std::cout << buffer.GetString() << std::endl;
    return 0;
}
Fast, AND Reliable
• 103 Unit Tests
• Continuous Integration
– Travis on Linux
– AppVeyor on Windows
– Valgrind (Linux) for memory leak checking
• Use in real applications
– Use in client and server applications at Tencent
– A user reported parsing 50 million JSON daily
Public Projects using RapidJSON
• Cocos2D-X: Cross-Platform 2D Game Engine
http://cocos2d-x.org/
• Microsoft Bond: Cross-Platform Serialization
https://github.com/Microsoft/bond/
• Google Angle: OpenGL ES 2 for Windows
https://chromium.googlesource.com/angle/angle/
• CERN LHCb: Large Hadron Collider beauty
http://lhcb-comp.web.cern.ch/lhcb-comp/
• Tell me if you know more
2. BENCHMARK
Benchmarks for Native JSON libraries
• https://github.com/miloyip/nativejson-benchmark
• Compare 20 open source C/C++ JSON libraries
• Evaluate speed, memory and code size
• For parsing, stringify, traversal, and more
Libraries
• CAJUN
• Casablanca
• cJSON
• dropbox/json11
• FastJson
• gason
• jansson
• json-c
• json spirit
• Json Box
• JsonCpp
• JSON++
• parson
• picojson
• RapidJSON
• simplejson
• udp/json
• ujson4c
• vincenthz/libjson
• YAJL
Results: Parsing Speed
Results: Parsing Memory
Results: Stringify Speed
Results: Code Size
Benchmarks for Spine
• Spine is a 2D skeletal animation tool
• Spine-C is the official runtime in C
https://github.com/EsotericSoftware/spine-runtimes/tree/master/spine-c
• It uses JSON as data format
• It has a custom JSON parser
• Adapt RapidJSON and compare loading time
Test Data
• http://esotericsoftware.com/forum/viewtopic.php?f=3&t=2831
• Original 80KB JSON
• Interpolate to get
multiple JSON files
• Load 100 times
Results
3. DESIGN
The Zero Overhead Principle
• Bjarne Stroustrup[1]:
“What you don't use, you don't pay for.”
• RapidJSON tries to obey this principle
– SAX and DOM
– Combinable options, configurations
SAX
StartObject()
Key("hello", 5, true)
String("world", 5, true)
Key("t", 1, true)
Bool(true)
Key("f", 1, true)
Bool(false)
Key("n", 1, true)
Null()
Key("i")
Uint(123)
Key("pi")
Double(3.1416)
Key("a")
StartArray()
Uint(1)
Uint(2)
Uint(3)
Uint(4)
EndArray(4)
EndObject(7)
DOM
When parsing a JSON to DOM, use SAX events to build a DOM.
When stringifying a DOM, traverse it and generate SAX events.
{"hello":"world", "t":true, "f":false, "n":null,
"i":123, "pi":3.1416, "a":[1, 2, 3, 4]}
DOM
SAX
Architecture
(Diagram: classes Value, Document, Reader, Writer; concepts Handler, Stream, Encoding, Allocator; relations labeled calls, implements, accepts, has.)
Handler: Template Parameter
• Handler handles SAX event callbacks
• How to implement callbacks?
– Traditional: virtual function
– RapidJSON: template parameter
template <unsigned parseFlags, typename InputStream, typename Handler>
ParseResult Reader::Parse(InputStream& is, Handler& handler);
• No virtual function overhead
• Inline callback functions
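The idea can be sketched without RapidJSON itself. Below is a minimal, hypothetical mini reader (`ParseUintArray` and `SumHandler` are names made up for this sketch, not library API) that parses a flat array of unsigned integers and reports SAX-style events to whatever handler type it is instantiated with. Because the handler is a template parameter, the calls are resolved at compile time and can be inlined.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical mini reader: parses text such as "[1, 2, 3]" and reports
// SAX-style events to any type that provides the three methods used below.
// No virtual dispatch is involved; the compiler sees the concrete handler.
template <typename Handler>
bool ParseUintArray(const char* s, Handler& handler) {
    assert(s != nullptr);
    while (*s == ' ') ++s;
    if (*s++ != '[') return false;
    handler.StartArray();
    std::size_t count = 0;
    for (;;) {
        while (*s == ' ') ++s;
        if (*s == ']' && count == 0) break;   // empty array
        if (!std::isdigit(static_cast<unsigned char>(*s))) return false;
        unsigned value = 0;
        while (std::isdigit(static_cast<unsigned char>(*s)))
            value = value * 10 + static_cast<unsigned>(*s++ - '0');
        handler.Uint(value);
        ++count;
        while (*s == ' ') ++s;
        if (*s == ',') { ++s; continue; }
        if (*s == ']') break;
        return false;                         // unexpected character
    }
    handler.EndArray(count);
    return true;
}

// Example handler: sums the elements it is told about.
struct SumHandler {
    unsigned sum = 0;
    std::size_t elements = 0;
    void StartArray() {}
    void Uint(unsigned u) { sum += u; }
    void EndArray(std::size_t n) { elements = n; }
};
```

Passing a `SumHandler` to `ParseUintArray("[1, 2, 3, 4]", h)` leaves `h.sum` at 10 and `h.elements` at 4. A handler that ignores an event simply provides an empty method, which costs nothing after inlining.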
Parsing Options: Template Argument
• Many parse options -> Zero overhead principle
• Use integer template argument
template <unsigned parseFlags, typename InputStream, typename Handler>
ParseResult Reader::Parse(InputStream& is, Handler& handler);
if (parseFlags & kParseInsituFlag) {
    // ...
}
else {
    // ...
}
• Compiler optimization eliminates unused code
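A small self-contained sketch of the same pattern follows. The flag names echo the slide, but the enum values and the `ReadString` helper are assumptions made for illustration only. Since `parseFlags` is a compile-time constant, each instantiation keeps just one branch.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical flags for this sketch (values are made up).
enum ParseFlag : unsigned {
    kParseDefaultFlags = 0,
    kParseInsituFlag = 1
};

// Reads characters up to the closing '"' from a mutable buffer.
// The condition below is constant per instantiation, so the optimizer can
// drop the branch that is never taken.
template <unsigned parseFlags>
const char* ReadString(char* src, std::string& storage) {
    assert(src != nullptr);
    char* end = std::strchr(src, '"');
    if (end == nullptr) return nullptr;   // unterminated string
    if (parseFlags & kParseInsituFlag) {
        *end = '\0';                      // terminate in place, no copy
        return src;
    }
    else {
        storage.assign(src, end);         // copy into separate storage
        return storage.c_str();
    }
}
```

`ReadString<kParseInsituFlag>(buf, tmp)` returns a pointer into `buf` itself, while `ReadString<kParseDefaultFlags>(buf, tmp)` leaves `buf` untouched and returns a copy held by `tmp`.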
Recursive SAX Parser
• Simple to write/optimize by hand
• Use program stack to maintain parsing state of
the tree structure
• Prone to stack overflow
– So also provide an iterative parser
(Contributed by Don Ding @thebusytypist)
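To illustrate why an iterative design avoids stack overflow, here is a minimal sketch (not RapidJSON's iterative parser) that tracks nesting with an explicit, heap-allocated stack instead of recursion. `BracketsBalanced` is a hypothetical helper that only looks at brackets and does not understand string literals.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Checks that '[' ']' and '{' '}' nest correctly, ignoring other characters.
// Nesting state lives in a std::vector rather than on the call stack, so the
// depth is bounded by available memory, not by the program stack size.
bool BracketsBalanced(const char* s, std::size_t* maxDepth) {
    assert(s != nullptr);
    std::vector<char> stack;   // closing brackets we still expect
    std::size_t deepest = 0;
    for (; *s != '\0'; ++s) {
        switch (*s) {
        case '[': stack.push_back(']'); break;
        case '{': stack.push_back('}'); break;
        case ']':
        case '}':
            if (stack.empty() || stack.back() != *s) return false;
            stack.pop_back();
            break;
        default: break;
        }
        if (stack.size() > deepest) deepest = stack.size();
    }
    if (maxDepth != nullptr) *maxDepth = deepest;
    return stack.empty();
}
```

Input nested 100,000 levels deep is handled with a flat loop here, whereas a recursive-descent function would need one stack frame per level.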
Normal Parsing
In situ Parsing
No allocation and copying for strings! Cache Friendly!
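A rough sketch of the in-situ idea, assuming the JSON text sits in a mutable buffer: the decoded string is written back over the source bytes, which works because a decoded string is never longer than its escaped form. `DecodeInsitu` is a hypothetical helper that handles only one-character escapes; `\uXXXX` is left out for brevity.

```cpp
#include <cassert>
#include <cstddef>

// `p` points just after the opening quote of a string inside a mutable buffer.
// Decoded characters overwrite the buffer, a '\0' is written at the end, and
// the decoded length is returned. Returns (size_t)-1 on malformed input.
std::size_t DecodeInsitu(char* p) {
    assert(p != nullptr);
    char* dst = p;
    for (const char* src = p; ; ) {
        char c = *src++;
        if (c == '\0') return static_cast<std::size_t>(-1);  // no closing quote
        if (c == '"') break;
        if (c == '\\') {
            switch (*src++) {
            case '"':  c = '"';  break;
            case '\\': c = '\\'; break;
            case '/':  c = '/';  break;
            case 'b':  c = '\b'; break;
            case 'f':  c = '\f'; break;
            case 'n':  c = '\n'; break;
            case 'r':  c = '\r'; break;
            case 't':  c = '\t'; break;
            default:   return static_cast<std::size_t>(-1);  // unsupported escape
            }
        }
        *dst++ = c;   // dst never runs ahead of src
    }
    *dst = '\0';
    return static_cast<std::size_t>(dst - p);
}
```

After the call, the caller can keep a pointer into the original buffer as the string value; no separate allocation or copy is made. The price is that the input buffer is modified and must outlive the parsed values.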
Parsing Number: the Pain ;(
• RapidJSON supports parsing JSON number to
uint32_t, int32_t, uint64_t, int64_t, double
• Difficult to detect the right type in a single pass
• Even more difficult for double (strtod() is slow)
• Implemented kFullPrecision option using
1. Fast-path
2. DIY-FP (https://github.com/floitsch/double-conversion)
3. Big Integer method [2]
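The fast path (step 1) can be sketched as follows. This is a simplified illustration of the classic case described in [2], not RapidJSON's code: when the decimal significand fits in 53 bits and the power of ten is at most 10^22, both operands are exact doubles, so one multiplication or division gives the correctly rounded result. The sketch assumes double arithmetic is evaluated in IEEE 754 binary64 (for example SSE2 on x86-64); `FastPathStrtod` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Computes significand * 10^exp10 when the fast path applies.
// Returns false when a slower, exact method is needed instead.
bool FastPathStrtod(std::uint64_t significand, int exp10, double* out) {
    static const double kPow10[] = {
        1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,
        1e8,  1e9,  1e10, 1e11, 1e12, 1e13, 1e14, 1e15,
        1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22
    };
    assert(out != nullptr);
    // Integers up to 2^53 are exactly representable as doubles.
    if (significand > (std::uint64_t(1) << 53)) return false;
    // Powers of ten up to 10^22 are exactly representable as doubles.
    if (exp10 < -22 || exp10 > 22) return false;
    const double d = static_cast<double>(significand);
    *out = (exp10 >= 0) ? d * kPow10[exp10] : d / kPow10[-exp10];
    return true;
}
```

For "3.1416" a parser would collect the significand 31416 and the exponent -4, and the fast path applies. Inputs such as the 2.2250738585072011e-308 cases fall outside it and need the slower methods.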
How difficult?
• PHP Hangs On Numeric Value 2.2250738585072011e-308
http://www.exploringbinary.com/php-hangs-on-numeric-value-2-2250738585072011e-308/
• Java Hangs When Converting 2.2250738585072012e-308
http://www.exploringbinary.com/java-hangs-when-converting-2-2250738585072012e-308/
• "2.22507385850720113605740979670913197593481954635164564e-308" → 2.2250738585072009e-308
• "2.22507385850720113605740979670913197593481954635164565e-308" → 2.2250738585072014e-308
• And need to be fast…
DOM Designed for Fast Parsing
• A JSON value can be one of 6 types
– object, array, number, string, boolean, null
• Inheritance needs new for each value
• RapidJSON uses a single variant type Value
Layout of Value
String
  Ch* str | SizeType length | unsigned flags
Number
  one of: int i + 0 | unsigned u + 0 | int64_t i64 | uint64_t u64 | double d
  unsigned flags
Object
  Member* members | SizeType size | SizeType capacity | unsigned flags
Array
  Value* values | SizeType size | SizeType capacity | unsigned flags
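As a rough sketch of the single-variant idea, the struct below stores any JSON type in one fixed-size object, so an array of values is one contiguous allocation and no per-value `new` is needed. Field names follow the slide where possible; the flag values, the setters and the exact member order are assumptions for illustration, and objects are omitted for brevity.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef unsigned SizeType;

// Hypothetical type tags for this sketch.
enum TypeFlag : unsigned { kNull, kFalse, kTrue, kNumber, kString, kArray, kObject };

struct Value {
    // All representations share the same storage.
    union Data {
        struct { const char* str; SizeType length; } s;
        struct { Value* values; SizeType size; SizeType capacity; } a;
        std::int64_t i64;
        std::uint64_t u64;
        double d;
    };
    Data data;
    unsigned flags;   // tells which union member is active

    Value() : flags(kNull) { data.u64 = 0; }

    void SetDouble(double v) { data.d = v; flags = kNumber; }

    // Stores the pointer only; the characters are not copied.
    void SetString(const char* text) {
        assert(text != nullptr);
        data.s.str = text;
        data.s.length = static_cast<SizeType>(std::strlen(text));
        flags = kString;
    }
};
```

Changing a value's type is just an overwrite of the union plus the flags, which is what makes building the DOM from SAX events cheap.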
Move Semantics
• Deep copying object/array/string is slow
• RapidJSON enforces move semantics
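A minimal sketch of what "assignment moves" means, using a hypothetical string-only class (`MovingValue` is not a RapidJSON type): `a = b` transfers b's buffer to a and leaves b null, so no deep copy takes place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

class MovingValue {
public:
    MovingValue() : str_(nullptr) {}
    explicit MovingValue(const char* s) : str_(new char[std::strlen(s) + 1]) {
        std::strcpy(str_, s);
    }
    ~MovingValue() { delete[] str_; }

    // Copy construction is deliberately unavailable.
    MovingValue(const MovingValue&) = delete;

    // Takes a non-const reference on purpose: the right-hand side is modified.
    MovingValue& operator=(MovingValue& rhs) {
        assert(this != &rhs);
        delete[] str_;
        str_ = rhs.str_;      // steal the buffer
        rhs.str_ = nullptr;   // the source becomes null
        return *this;
    }

    bool IsNull() const { return str_ == nullptr; }
    const char* GetString() const { return str_; }

private:
    char* str_;
};
```

After `a = b`, `b.IsNull()` is true and `a` owns the very same buffer `b` held before. The cost is the surprising syntax: an expression that looks like a copy changes its right-hand side.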
The Default Allocator
• Internally allocates a singly linked list of
buffers
• Does not free objects (thus FAST!)
• Suitable for parsing (creating values
consecutively)
• Not suitable for DOM manipulation
Custom Initial Buffer
• User can provide a custom initial buffer
– For example, buffer on stack, scratch buffer
• The allocator uses that buffer first until it is full
• Possible to achieve zero allocation in parsing
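The allocator described on the last two slides can be sketched roughly as below. `PoolAllocator` is a hypothetical class written for illustration (names, default chunk size and alignment policy are assumptions): memory is carved sequentially from chunks kept in a singly linked list, individual blocks are never freed, and an optional user buffer is consumed before any heap chunk is requested. The user buffer is assumed to be suitably aligned.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

class PoolAllocator {
public:
    PoolAllocator(void* userBuffer, std::size_t userSize,
                  std::size_t chunkSize = 64 * 1024)
        : head_(nullptr), user_(static_cast<char*>(userBuffer)),
          userSize_(userSize), userUsed_(0),
          chunkSize_(chunkSize), heapChunks_(0) {}

    // Everything is released at once; the user buffer is not owned.
    ~PoolAllocator() {
        while (head_ != nullptr) {
            Chunk* next = head_->next;
            std::free(head_);
            head_ = next;
        }
    }

    PoolAllocator(const PoolAllocator&) = delete;
    PoolAllocator& operator=(const PoolAllocator&) = delete;

    void* Malloc(std::size_t size) {
        assert(size > 0);
        size = (size + kAlign - 1) / kAlign * kAlign;   // keep blocks aligned
        if (size <= userSize_ - userUsed_) {            // user buffer first
            void* p = user_ + userUsed_;
            userUsed_ += size;
            return p;
        }
        if (head_ == nullptr || size > head_->capacity - head_->used) {
            if (!AddChunk(size > chunkSize_ ? size : chunkSize_)) return nullptr;
        }
        void* p = reinterpret_cast<char*>(head_ + 1) + head_->used;
        head_->used += size;
        return p;
    }

    void Free(void*) {}   // intentionally a no-op

    std::size_t HeapChunks() const { return heapChunks_; }

private:
    static const std::size_t kAlign = alignof(std::max_align_t);

    // Header placed in front of each heap chunk's payload.
    struct alignas(std::max_align_t) Chunk {
        Chunk* next;
        std::size_t capacity;
        std::size_t used;
    };

    bool AddChunk(std::size_t capacity) {
        void* raw = std::malloc(sizeof(Chunk) + capacity);
        if (raw == nullptr) return false;
        Chunk* c = static_cast<Chunk*>(raw);
        c->next = head_;
        c->capacity = capacity;
        c->used = 0;
        head_ = c;
        ++heapChunks_;
        return true;
    }

    Chunk* head_;
    char* user_;
    std::size_t userSize_;
    std::size_t userUsed_;
    std::size_t chunkSize_;
    std::size_t heapChunks_;
};
```

With a buffer on the stack large enough for the whole document, `HeapChunks()` stays at 0 for the entire parse, which is the zero-allocation case mentioned above. Because `Free` does nothing, repeatedly replacing values in a long-lived DOM keeps growing memory use.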
Short String Optimization
• Many JSON keys are short
• Contributor @Kosta-Github submitted a PR to
optimize short strings
String
  Ch* str | SizeType length | unsigned flags
ShortString
  Ch str[11] | uint8_t x | unsigned flags
Let length = 11 - x
So for an 11-char string x = 0, which doubles as the terminating '\0'
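The length trick can be sketched in isolation. The hypothetical `ShortString` below uses one 12-byte array, with the last byte playing the role of `x`, and leaves out the flags word; it is an illustration of the encoding, not the library's actual struct.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// 12 bytes hold up to 11 characters. The last byte stores x = 11 - length.
// Shorter strings get a '\0' right after the text; for an 11-character
// string x is 0, so the same byte is also the terminator.
struct ShortString {
    enum { kMaxChars = 11 };
    char bytes[kMaxChars + 1];   // bytes[11] is x

    static bool Usable(std::size_t len) { return len <= kMaxChars; }

    void Set(const char* s, std::size_t len) {
        assert(Usable(len));
        std::memcpy(bytes, s, len);
        bytes[len] = '\0';                                      // may land on bytes[11]
        bytes[kMaxChars] = static_cast<char>(kMaxChars - len);  // x
    }

    std::size_t Length() const {
        return kMaxChars - static_cast<std::size_t>(bytes[kMaxChars]);
    }

    const char* c_str() const { return bytes; }
};
```

Storing `11 - length` instead of `length` is what buys the extra character: no separate terminator byte is needed when the slot is full.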
SIMD Optimization
• Using SSE2/SSE4 to skip whitespaces
(space, tab, LF, CR)
• Each iteration compares 16 chars against the 4 whitespace chars
• Fast for JSON with indentation
• Visual C++ 2010 32-bit test, skip 1M whitespace (ms):
  strlen() (for ref.)     752
  strspn()               3011
  RapidJSON (no SIMD)    1349
  RapidJSON (SSE2)        170
  RapidJSON (SSE4)        102
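A sketch of the SSE2 variant, under some assumptions that differ from the slide: it takes an explicit length and uses unaligned loads so it never reads past the input, and it falls back to a scalar loop when SSE2 is not available. `SkipWhitespace` and the `SKETCH_HAS_SSE2` macro are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#endif

inline bool IsJsonWhitespace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Returns the number of leading whitespace characters in [p, p + n).
std::size_t SkipWhitespace(const char* p, std::size_t n) {
    assert(p != nullptr || n == 0);
    std::size_t i = 0;
#ifdef SKETCH_HAS_SSE2
    const __m128i space = _mm_set1_epi8(' ');
    const __m128i tab   = _mm_set1_epi8('\t');
    const __m128i lf    = _mm_set1_epi8('\n');
    const __m128i cr    = _mm_set1_epi8('\r');
    for (; i + 16 <= n; i += 16) {
        // Compare 16 input bytes against each of the 4 whitespace characters.
        const __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        __m128i ws = _mm_cmpeq_epi8(s, space);
        ws = _mm_or_si128(ws, _mm_cmpeq_epi8(s, tab));
        ws = _mm_or_si128(ws, _mm_cmpeq_epi8(s, lf));
        ws = _mm_or_si128(ws, _mm_cmpeq_epi8(s, cr));
        // One mask bit per byte (1 = whitespace); invert to find the others.
        const unsigned nonWs = ~static_cast<unsigned>(_mm_movemask_epi8(ws)) & 0xFFFFu;
        if (nonWs != 0) {
            unsigned bit = 0;
            while (((nonWs >> bit) & 1u) == 0) ++bit;   // first non-whitespace byte
            return i + bit;
        }
    }
#endif
    // Scalar tail, or the whole input when SSE2 is unavailable.
    while (i < n && IsJsonWhitespace(p[i])) ++i;
    return i;
}
```

The gain comes from pretty-printed JSON, where long runs of indentation are consumed 16 bytes per iteration instead of one.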
Integer-to-String Optimization
• Integer-To-String conversion is simple
– E.g. 123 -> “123”
• But standard library is quite slow
– E.g. sprintf(), _itoa(), etc.
• Tried various implementations
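One widely used technique for this, shown here only as an illustrative sketch and not necessarily the variant that performed best in the benchmark, is to emit two digits at a time from a lookup table, which halves the number of divisions. `U32ToA` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Writes the decimal form of `value` into `out` (at least 11 bytes) and
// returns a pointer to the terminating '\0'.
char* U32ToA(std::uint32_t value, char* out) {
    // "00" "01" ... "99": entry i holds the two digits of i.
    static const char kDigitPairs[201] =
        "00010203040506070809"
        "10111213141516171819"
        "20212223242526272829"
        "30313233343536373839"
        "40414243444546474849"
        "50515253545556575859"
        "60616263646566676869"
        "70717273747576777879"
        "80818283848586878889"
        "90919293949596979899";
    assert(out != nullptr);
    char temp[10];                    // a uint32_t has at most 10 digits
    char* p = temp + sizeof(temp);
    while (value >= 100) {            // peel off two digits per division
        const unsigned i = (value % 100) * 2;
        value /= 100;
        *--p = kDigitPairs[i + 1];
        *--p = kDigitPairs[i];
    }
    if (value < 10) {                 // one leading digit left
        *--p = static_cast<char>('0' + value);
    } else {                          // two leading digits left
        const unsigned i = value * 2;
        *--p = kDigitPairs[i + 1];
        *--p = kDigitPairs[i];
    }
    const std::size_t len = static_cast<std::size_t>(temp + sizeof(temp) - p);
    std::memcpy(out, p, len);
    out[len] = '\0';
    return out + len;
}
```

Unlike `sprintf()`, there is no format string to interpret and no locale handling, which is where much of the standard library's overhead tends to come from.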
My implementations
• https://github.com/miloyip/itoa-benchmark
• Visual C++ 2013 on Windows 64-bit
Double-to-String Optimization
• Double-to-string conversion is very slow
– E.g. 3.14 -> “3.14”
• Grisu2 is a fast algorithm for this[3]
– Gives correct results in 100% of cases
– Gives optimal results in >99% of cases
• Google V8 has an implementation
– https://github.com/floitsch/double-conversion
– But not header-only, so…
My Grisu2 Implementation
• https://github.com/miloyip/dtoa-benchmark
• Visual C++ 2013 on Windows 64-bit:
4. LIMITATIONS
Tradeoff: User-Friendliness
• DOM only supports move semantics
– Cannot copy-construct Value/Document
– So, cannot pass them by value, put in containers
• DOM APIs need an allocator as a parameter, e.g.
numbers.PushBack(1, allocator);
• Users must manage the life cycle of the allocator
and its allocated values
Pausing in Parsing
• Cannot pause in parsing and resume it later
– Not keeping all parsing states explicitly
– Doing so will be much slower
• Typical Scenario
– Streaming JSON from network
– Don’t want to store the JSON in memory
• Solution
– Parse in a separate thread
– Block the input stream to pause
5. THOUGHTS
Origin
• RapidJSON is my hobby project in 2011
• Also my first open source project
• First version released in 2 weeks
Community
• Google Code helped with tracking bugs, but made it hard to
attract contributions
• After migrating to GitHub in 2014
– Community much more active
– Issue tracking more powerful
– Pull requests ease contributions
Future
• Official Release under Tencent
– 1.0 beta → 1.0 release (after 3+ years…)
– Can work on it in working time
– Involve marketing and other colleagues
– Establish Community in China
• Post-1.0 Features
– Easy DOM API (but slower)
– JSON Schema
– Relaxed JSON syntax
– Optimization on Object Member Access
• Open source our internal projects at Tencent
To Establish an Open Source Project
• Courage
• Start Small
• Make It Different
– Innovative Idea?
– Easy to Use?
– Good Performance?
• Embrace Community
• Learn
References
1. Stroustrup, Bjarne. The Design and Evolution of C++. Pearson Education India, 1994.
2. Clinger, William D. "How to read floating point numbers accurately." Vol. 25, No. 6. ACM, 1990.
3. Loitsch, Florian. "Printing floating-point numbers quickly and accurately with integers." ACM SIGPLAN Notices 45.6 (2010): 233-243.
Q&A

SIP trunking in Janus @ Kamailio World 2024
 
Powerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time ClashPowerpoint exploring the locations used in television show Time Clash
Powerpoint exploring the locations used in television show Time Clash
 
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
Integration and Automation in Practice: CI/CD in Mule Integration and Automat...
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko"Debugging python applications inside k8s environment", Andrii Soldatenko
"Debugging python applications inside k8s environment", Andrii Soldatenko
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Install Stable Diffusion in windows machine
Install Stable Diffusion in windows machineInstall Stable Diffusion in windows machine
Install Stable Diffusion in windows machine
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 

How to Write the Fastest JSON Parser/Writer in the World

  • 1. How to Write the Fastest JSON Parser/Writer in the World Milo Yip Tencent 28 Mar 2015
  • 2. Milo Yip 叶劲峰 • Expert Engineer (2011 to now) – Engine Technology Center, R & D Department, Interactive Entertainment Group (IEG), Tencent • Master of Philosophy in System Engineering & Engineering Management, CUHK • Bachelor of Cognitive Science, HKU • https://github.com/miloyip • http://www.cnblogs.com/miloyip • http://www.zhihu.com/people/miloyip
  • 4. Table of Contents 1. Introduction 2. Benchmark 3. Design 4. Limitations 5. Thoughts 6. References
  • 6. JSON • JavaScript Object Notation • Alternative to XML • Human-readable text to transmit/persist data • RFC 7159/ECMA-404 • Common uses – Open API (e.g. Twitter, Facebook, etc.) – Data storage/exchange (e.g. GeoJSON)
  • 7. RapidJSON • https://github.com/miloyip/rapidjson • MIT License • C++ Header-only Library • Started in Nov 2011 • Inspired by RapidXML • Will release 1.0 under Tencent *soon*
  • 8. Features • Both SAX and DOM style API • Fast • Cross platform/compiler • No dependencies • Memory friendly • UTF-8/16/32/ASCII and transcoding • In-situ Parsing • More at http://miloyip.github.io/rapidjson/md_doc_features.html
  • 9. Hello RapidJSON!
        #include "rapidjson/document.h"
        #include "rapidjson/writer.h"
        #include "rapidjson/stringbuffer.h"
        #include <iostream>
        using namespace rapidjson;
        int main() {
            // 1. Parse a JSON string into DOM.
            const char* json = "{\"project\":\"rapidjson\",\"stars\":10}";
            Document d;
            d.Parse(json);
            // 2. Modify it by DOM.
            Value& s = d["stars"];
            s.SetInt(s.GetInt() + 1);
            // 3. Stringify the DOM
            StringBuffer buffer;
            Writer<StringBuffer> writer(buffer);
            d.Accept(writer);
            // Output {"project":"rapidjson","stars":11}
            std::cout << buffer.GetString() << std::endl;
            return 0;
        }
  • 10. Fast, AND Reliable • 103 Unit Tests • Continuous Integration – Travis on Linux – AppVeyor on Windows – Valgrind (Linux) for memory leak checking • Used in real applications – Used in client and server applications at Tencent – A user reported parsing 50 million JSON documents daily
  • 11. Public Projects using RapidJSON • Cocos2D-X: Cross-Platform 2D Game Engine http://cocos2d-x.org/ • Microsoft Bond: Cross-Platform Serialization https://github.com/Microsoft/bond/ • Google Angle: OpenGL ES 2 for Windows https://chromium.googlesource.com/angle/angle/ • CERN LHCb: Large Hadron Collider beauty http://lhcb-comp.web.cern.ch/lhcb-comp/ • Tell me if you know more
  • 13. Benchmarks for Native JSON libraries • https://github.com/miloyip/nativejson-benchmark • Compare 20 open source C/C++ JSON libraries • Evaluate speed, memory and code size • For parsing, stringify, traversal, and more
  • 14. Libraries • CAJUN • Casablanca • cJSON • dropbox/json11 • FastJson • gason • jansson • json-c • json spirit • Json Box • JsonCpp • JSON++ • parson • picojson • RapidJSON • simplejson • udp/json • ujson4c • vincenthz/libjson • YAJL
  • 19. Benchmarks for Spine • Spine is a 2D skeletal animation tool • Spine-C is the official runtime in C https://github.com/EsotericSoftware/spine-runtimes/tree/master/spine-c • It uses JSON as data format • It has a custom JSON parser • Adapt RapidJSON and compare loading time
  • 20. Test Data • http://esotericsoftware.com/forum/viewtopic.php?f=3&t=2831 • Original 80KB JSON • Interpolate to get multiple JSON files • Load 100 times
  • 23. The Zero Overhead Principle • Bjarne Stroustrup[1]: “What you don't use, you don't pay for.” • RapidJSON tries to obey this principle – SAX and DOM – Combinable options, configurations
  • 24. SAX and DOM • JSON: {"hello":"world", "t":true, "f":false, "n":null, "i":123, "pi":3.1416, "a":[1, 2, 3, 4]} • SAX events: StartObject() Key("hello", 5, true) String("world", 5, true) Key("t", 1, true) Bool(true) Key("f", 1, true) Bool(false) Key("n", 1, true) Null() Key("i") Uint(123) Key("pi") Double(3.1416) Key("a") StartArray() Uint(1) Uint(2) Uint(3) Uint(4) EndArray(4) EndObject(7) • When parsing JSON into a DOM, SAX events are used to build the DOM. When stringifying a DOM, it is traversed and SAX events are generated.
  • 26. Handler: Template Parameter • Handler handles SAX event callbacks • How to implement callbacks? – Traditional: virtual function – RapidJSON: template parameter template <unsigned parseFlags, typename InputStream, typename Handler> ParseResult Reader::Parse(InputStream& is, Handler& handler); • No virtual function overhead • Inline callback functions
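The template-parameter approach can be illustrated without RapidJSON itself. The following is a minimal sketch, not RapidJSON's `Reader`: `EmitDigits` and `SumHandler` are hypothetical names, and the "parser" only reports the digits of a string as `Uint` events. Because the handler type is known at compile time, the callback needs no virtual dispatch and can be inlined.

```cpp
#include <cassert>
#include <cstddef>

// Toy event source (hypothetical, not RapidJSON's Reader): walks a string of
// digits and reports each digit as a SAX-style Uint event. Handler is a
// template parameter, so handler.Uint() is bound at compile time.
template <typename Handler>
bool EmitDigits(const char* s, Handler& handler) {
    assert(s != nullptr);
    for (; *s != '\0'; ++s) {
        if (*s < '0' || *s > '9')
            return false;  // not a digit: report failure
        if (!handler.Uint(static_cast<unsigned>(*s - '0')))
            return false;  // the handler asked to stop
    }
    return true;
}

// A handler is any type that provides the expected member functions.
// No common base class and no virtual table are involved.
struct SumHandler {
    unsigned sum = 0;
    std::size_t count = 0;
    bool Uint(unsigned u) {
        sum += u;
        ++count;
        return true;
    }
};
```

Calling `EmitDigits("1234", h)` with a `SumHandler h` leaves `h.sum` at 10 and `h.count` at 4. The cost of this design is that each handler type instantiates its own copy of the parsing function.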
  • 27. Parsing Options: Template Argument • Many parse options -> Zero overhead principle • Use integer template argument template <unsigned parseFlags, typename InputStream, typename Handler> ParseResult Reader::Parse(InputStream& is, Handler& handler); if (parseFlags & kParseInsituFlag) { // ... } else { // ... } • Compiler optimization eliminates unused code
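A small standalone sketch of compile-time option flags follows. The flag names (`kTrimFlag`, `kUppercaseFlag`) and the `Process` function are made up for illustration and are not RapidJSON parse flags; only the idea of passing a bitmask as an integer template argument comes from the slide.

```cpp
#include <cassert>
#include <string>

// Hypothetical option flags, for illustration only.
enum : unsigned {
    kNoFlags       = 0,
    kUppercaseFlag = 1,  // convert ASCII lower-case letters to upper case
    kTrimFlag      = 2   // drop leading spaces
};

// Each distinct value of `flags` instantiates its own copy of Process.
// The conditions below are compile-time constants, so an optimizing
// compiler can remove the branches that are not selected.
template <unsigned flags>
std::string Process(const std::string& in) {
    std::string out = in;
    if (flags & kTrimFlag) {
        // erase(0, npos) clears the whole string when it is all spaces
        out.erase(0, out.find_first_not_of(' '));
    }
    if (flags & kUppercaseFlag) {
        for (char& c : out)
            if (c >= 'a' && c <= 'z')
                c = static_cast<char>(c - 'a' + 'A');
    }
    return out;
}
```

For example, `Process<kTrimFlag | kUppercaseFlag>("  ab")` yields `"AB"`, while `Process<kNoFlags>("  ab")` returns the input unchanged and, in principle, pays for neither option.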
  • 28. Recursive SAX Parser • Simple to write/optimize by hand • Use program stack to maintain parsing state of the tree structure • Prone to stack overflow – So also provide an iterative parser (Contributed by Don Ding @thebusytypist)
  • 30. In situ Parsing No allocation and copying for strings! Cache Friendly!
  • 31. Parsing Number: the Pain ;( • RapidJSON supports parsing a JSON number to uint32_t, int32_t, uint64_t, int64_t, double • Difficult to detect the type in a single pass • Even more difficult for double (strtod() is slow) • Implemented kFullPrecision option using 1. Fast-path 2. DIY-FP (https://github.com/floitsch/double-conversion) 3. Big Integer method [2]
  • 32. How difficult? • PHP Hangs On Numeric Value 2.2250738585072011e-308 http://www.exploringbinary.com/php-hangs-on-numeric-value-2-2250738585072011e-308/ • Java Hangs When Converting 2.2250738585072012e-308 http://www.exploringbinary.com/java-hangs-when-converting-2-2250738585072012e-308/ • "2.22507385850720113605740979670913197593481954635164564e-308" → 2.2250738585072009e-308 • "2.22507385850720113605740979670913197593481954635164565e-308" → 2.2250738585072014e-308 • And it needs to be fast…
  • 33. DOM Designed for Fast Parsing • A JSON value can be one of 6 types – object, array, number, string, boolean, null • An inheritance-based design needs a new for each value • RapidJSON uses a single variant type Value
  • 34. Layout of Value • String: Ch* str | SizeType length | unsigned flags • Number: int i / unsigned u / int64_t i64 / uint64_t u64 / double d | 0 | 0 | unsigned flags • Object: Member* members | SizeType size | SizeType capacity | unsigned flags • Array: Value* values | SizeType size | SizeType capacity | unsigned flags
  • 35. Move Semantics • Deep copying object/array/string is slow • RapidJSON enforces move semantics
  • 36. The Default Allocator • Internally allocates a singly linked list of buffers • Does not free objects (thus FAST!) • Suitable for parsing (creating values consecutively) • Not suitable for DOM manipulation
  • 37. Custom Initial Buffer • User can provide a custom initial buffer – For example, buffer on stack, scratch buffer • The allocator uses that buffer first until it is full • Possible to achieve zero allocation in parsing
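The allocation strategy described on these two slides can be sketched as a small bump allocator. This is an illustrative reimplementation under stated assumptions, not RapidJSON's allocator: the class name `BumpAllocator`, the 4096-byte default chunk size, and the requirement that the caller's buffer is aligned for `std::max_align_t` are all choices made here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical bump allocator: memory is taken from a user-supplied buffer
// first, then from a singly linked list of heap chunks. Individual
// allocations are never freed; heap chunks are released in the destructor.
class BumpAllocator {
public:
    BumpAllocator(void* userBuffer, std::size_t userSize,
                  std::size_t chunkSize = 4096)
        : cur_(static_cast<char*>(userBuffer)), left_(userSize),
          chunkSize_(chunkSize), head_(nullptr), heapChunks_(0) {
        // Assumption of this sketch: the caller passes an aligned buffer.
        assert(reinterpret_cast<std::uintptr_t>(userBuffer) % kAlign == 0);
    }

    ~BumpAllocator() {
        while (head_ != nullptr) {  // the user buffer is not ours to free
            Chunk* next = head_->next;
            std::free(head_);
            head_ = next;
        }
    }

    BumpAllocator(const BumpAllocator&) = delete;
    BumpAllocator& operator=(const BumpAllocator&) = delete;

    void* Malloc(std::size_t size) {
        size = (size + kAlign - 1) & ~(kAlign - 1);  // round up to kAlign
        if (size > left_) {
            // Current buffer is exhausted: link in a new heap chunk.
            const std::size_t payload = size > chunkSize_ ? size : chunkSize_;
            Chunk* c = static_cast<Chunk*>(std::malloc(sizeof(Chunk) + payload));
            if (c == nullptr) return nullptr;
            c->next = head_;
            head_ = c;
            ++heapChunks_;
            cur_ = reinterpret_cast<char*>(c + 1);
            left_ = payload;
        }
        void* p = cur_;
        cur_ += size;
        left_ -= size;
        return p;
    }

    // Number of heap chunks allocated so far (0 means zero heap allocation).
    std::size_t HeapChunks() const { return heapChunks_; }

private:
    static constexpr std::size_t kAlign = alignof(std::max_align_t);
    struct alignas(std::max_align_t) Chunk { Chunk* next; };

    char* cur_;
    std::size_t left_;
    std::size_t chunkSize_;
    Chunk* head_;
    std::size_t heapChunks_;
};
```

As long as all requests fit in the initial buffer, `HeapChunks()` stays at 0, which is the "zero allocation" case. Since nothing is freed individually, a workload that repeatedly replaces values keeps growing the chunk list, which is why this style suits parsing better than DOM manipulation.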
  • 38. Short String Optimization • Many JSON keys are short • Contributor @Kosta-Github submitted a PR to optimize short strings • String layout: Ch* str | SizeType length | unsigned flags • ShortString layout: Ch str[11] | uint8_t x | unsigned flags • Let length = 11 – x, so for an 11-character string x = 0 and doubles as the terminating '\0'
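The length encoding can be shown in isolation. The struct below is a standalone sketch of the idea for `char` strings only; the member functions `Set`, `Length` and `Data` are hypothetical names, and the `flags` field from the slide is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Standalone sketch of the ShortString idea: 11 inline characters plus one
// byte x that stores (11 - length) instead of the length itself.
struct ShortString {
    static constexpr std::size_t kMaxChars = 11;

    char str[kMaxChars];
    std::uint8_t x;  // kMaxChars - length; equals 0 when the string is full

    // Stores the string inline; returns false if it does not fit.
    bool Set(const char* s, std::size_t length) {
        if (length > kMaxChars) return false;
        std::memcpy(str, s, length);
        if (length < kMaxChars) str[length] = '\0';
        x = static_cast<std::uint8_t>(kMaxChars - length);
        return true;
    }

    std::size_t Length() const { return kMaxChars - x; }

    // For a full 11-character string the terminator is the byte x, which
    // sits directly after str in memory (checked by the static_assert).
    const char* Data() const { return str; }
};

static_assert(sizeof(ShortString) == 12, "x must directly follow str");
```

Storing `11 - length` rather than `length` is what lets the twelfth byte serve two purposes: for a full string it is 0, so it also acts as the null terminator and no extra byte is needed.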
  • 39. SIMD Optimization • Using SSE2/SSE4 to skip whitespace (space, tab, LF, CR) • Each iteration compares 16 chars × 4 chars • Fast for JSON with indentation • Visual C++ 2010 32-bit test, skip 1M whitespace (ms): strlen() (for reference) 752 | strspn() 3011 | RapidJSON (no SIMD) 1349 | RapidJSON (SSE2) 170 | RapidJSON (SSE4) 102
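A minimal sketch of the SSE2 variant follows. It is written from the description on the slide rather than copied from RapidJSON, and it makes its own assumptions: the input is given as a `[p, end)` range instead of a null-terminated string, unaligned loads are used, and the code falls back to a scalar loop when SSE2 is not available. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#endif

inline bool IsJsonSpace(char c) {
    return c == ' ' || c == '\t' || c == '\n' || c == '\r';
}

// Scalar reference: one character per iteration.
inline const char* SkipWhitespaceScalar(const char* p, const char* end) {
    while (p != end && IsJsonSpace(*p)) ++p;
    return p;
}

// Returns a pointer to the first non-whitespace character in [p, end),
// or end if there is none.
inline const char* SkipWhitespace(const char* p, const char* end) {
    assert(p <= end);
#ifdef SKETCH_HAS_SSE2
    const __m128i sp  = _mm_set1_epi8(' ');
    const __m128i tab = _mm_set1_epi8('\t');
    const __m128i lf  = _mm_set1_epi8('\n');
    const __m128i cr  = _mm_set1_epi8('\r');
    while (end - p >= 16) {
        // Compare 16 input characters against each of the 4 whitespace chars.
        const __m128i s = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p));
        __m128i m = _mm_cmpeq_epi8(s, sp);
        m = _mm_or_si128(m, _mm_cmpeq_epi8(s, tab));
        m = _mm_or_si128(m, _mm_cmpeq_epi8(s, lf));
        m = _mm_or_si128(m, _mm_cmpeq_epi8(s, cr));
        // Bit i of mask is set when character i is whitespace.
        const unsigned mask = static_cast<unsigned>(_mm_movemask_epi8(m));
        if (mask != 0xFFFFu) {
            unsigned notWs = ~mask & 0xFFFFu;  // lowest set bit = first non-space
            while ((notWs & 1u) == 0) {
                notWs >>= 1;
                ++p;
            }
            return p;
        }
        p += 16;
    }
#endif
    return SkipWhitespaceScalar(p, end);  // tail shorter than 16 characters
}
```

The gain comes from long runs of indentation, where a whole 16-character block is classified with a handful of instructions. For compact JSON with little whitespace, most calls return within the first block and the benefit is smaller.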
  • 40. Integer-to-String Optimization • Integer-To-String conversion is simple – E.g. 123 -> “123” • But standard library is quite slow – E.g. sprintf(), _itoa(), etc. • Tried various implementations
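The slide does not say which implementations were tried. As one illustration, a commonly used technique is to emit two decimal digits per division using a 200-character lookup table; the sketch below assumes that technique, and the function name `U32ToString` is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Writes the decimal form of value to buffer (no terminator) and returns a
// pointer just past the last character written. The buffer must have room
// for at least 10 characters. Sketch of a digit-pair lookup table approach.
inline char* U32ToString(std::uint32_t value, char* buffer) {
    assert(buffer != nullptr);
    static const char kDigits[201] =
        "0001020304050607080910111213141516171819"
        "2021222324252627282930313233343536373839"
        "4041424344454647484950515253545556575859"
        "6061626364656667686970717273747576777879"
        "8081828384858687888990919293949596979899";

    char temp[10];  // a uint32_t has at most 10 decimal digits
    char* p = temp + sizeof(temp);

    // Peel off two digits per iteration, filling temp from the back.
    while (value >= 100) {
        const unsigned i = (value % 100) * 2;
        value /= 100;
        *--p = kDigits[i + 1];
        *--p = kDigits[i];
    }
    if (value >= 10) {
        const unsigned i = value * 2;
        *--p = kDigits[i + 1];
        *--p = kDigits[i];
    } else {
        *--p = static_cast<char>('0' + value);
    }

    // Copy the digits out in the correct order.
    while (p != temp + sizeof(temp)) *buffer++ = *p++;
    return buffer;
}
```

Compared with `sprintf()`, this avoids format-string parsing and halves the number of divisions. Whether it is the fastest option depends on the compiler and CPU, so it is worth measuring rather than assuming.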
  • 42. Double-to-String Optimization • Double-to-string conversion is very slow – E.g. 3.14 -> “3.14” • Grisu2 is a fast algorithm for this[3] – 100% cases give correct results – >99% cases give optimal results • Google V8 has an implementation – https://github.com/floitsch/double-conversion – But not header-only, so…
  • 43. My Grisu2 Implementation • https://github.com/miloyip/dtoa-benchmark • Visual C++ 2013 on Windows 64-bit:
  • 45. Tradeoff: User-Friendliness • DOM only supports move semantics – Cannot copy-construct Value/Document – So, cannot pass them by value or put them in containers • DOM APIs need an allocator as a parameter, e.g. numbers.PushBack(1, allocator); • Users need to mind the life cycle of the allocator and the values it allocated
  • 46. Pausing in Parsing • Cannot pause parsing and resume it later – Not keeping all parsing states explicitly – Doing so would be much slower • Typical Scenario – Streaming JSON from network – Don’t want to store the JSON in memory • Solution – Parse in a separate thread – Block the input stream to pause
  • 48. Origin • RapidJSON was my hobby project in 2011 • Also my first open source project • First version released in 2 weeks
  • 49. Community • Google Code helped with tracking bugs but made it hard to attract contributions • After migrating to GitHub in 2014 – Community much more active – Issue tracking more powerful – Pull requests ease contributions
  • 50. Future • Official Release under Tencent – 1.0 beta → 1.0 release (after 3+ years…) – Can work on it during working hours – Involve marketing and other colleagues – Establish Community in China • Post-1.0 Features – Easy DOM API (but slower) – JSON Schema – Relaxed JSON syntax – Optimization on Object Member Access • Open source our internal projects at Tencent
  • 51. To Establish an Open Source Project • Courage • Start Small • Make Different – Innovative Idea? – Easy to Use? – Good Performance? • Embrace Community • Learn
  • 52. References 1. Stroustrup, Bjarne. The Design and Evolution of C++. Pearson Education India, 1994. 2. Clinger, William D. "How to read floating point numbers accurately." ACM SIGPLAN Notices 25.6 (1990): 92-101. 3. Loitsch, Florian. "Printing floating-point numbers quickly and accurately with integers." ACM SIGPLAN Notices 45.6 (2010): 233-243.
  • 53. Q&A