13. Fluentd stream aggregation:
Good for simple data/calculation
collect parse
clean up
process
visualize
store process
14. Our services:
More and more different services
Many changes in a day (including logging)
Many kind of logs for each services
Many different metrics for each services
15. Fluentd stream aggregation:
Not good for processing
about complex/fragile environment...
collect parse
clean up
process
visualize
store process
16. We want to:
add/remove queries anytime we want
write many queries for a service log stream
ignore events without data we want
make our service directors / growth hackers to write their own
queries!
20. Norikra:
Schema-less Stream Processing with SQL
Server software, written in JRuby, runs on JVM
Open source software (GPLv2)
http://norikra.github.io/
https://github.com/norikra/norikra
21. How To Setup Norikra:
Install JRuby
download jruby.tar.gz, extract it and export $PATH
use rbenv
rbenv install jruby-1.7.xx
rbenv shell jruby-..
Install Norikra
gem install norikra
Execute Norikra server
norikra start
22. Norikra Interface:
CLI client/Client library: norikra-client
norikra-client target open ...
norikra-client query add ...
tail -f ... | norikra-client event send ...
WebUI
show status
show/add/remove queries
HTTP API
JSON, MessagePack
23. Norikra:
Schema-less event stream:
Add/Remove data fields whenever you want
SQL:
No more restarts to add/remove queries
w/ JOINs, w/ SubQueries
w/ UDF (in Java/Ruby as rubygems)
Truly Complex events:
Nested Hash/Array, accessible directly from SQL
25. Norikra Queries: (1)
{“name”:”tagomoris”,
“age”:35, “address”:”Tokyo”,
“corp”:”LINE”, “current”:”San Diego”}
SELECT name, age
FROM events
{“name”:”tagomoris”,”age”:35}
26. Norikra Queries: (1)
{“name”:”tagomoris”,
“address”:”Tokyo”,
“corp”:”LINE”, “current”:”San Diego”}
without “age”
SELECT name, age
FROM events
nothing
27. Norikra Queries: (2)
{“name”:”tagomoris”,
“address”:”Tokyo”,
“corp”:”LINE”, “current”:”San Diego”}
SELECT name, age
FROM events
WHERE current=”San Diego”
{“name”:”tagomoris”,”age”:35}
28. Norikra Queries: (2)
{“name”:”nobu”,
“age”:0, “address”:”Somewhere”,
“corp”:”Heroku”, “current”:”SAN”}
current is not “San Diego”
SELECT name, age
FROM events
WHERE current=”San Diego”
nothing
29. Norikra Queries: (3)
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
30. Norikra Queries: (3)
{“name”:”tagomoris”,
“address”:”Tokyo”,
“corp”:”LINE”, “current”:”San Diego”}
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
every 5 mins
{”age”:35,”cnt”:3}, {“age”:33,”cnt”:1}, ...
31. Norikra Queries: (4)
{“name”:”tagomoris”,
“address”:”Tokyo”,
“corp”:”LINE”, “current”:”San Diego”}
SELECT age, COUNT(*) as cnt
FROM
events.win:time_batch(5 mins)
GROUP BY age
{”age”:35,”cnt”:3},
{“age”:33,”cnt”:1},
...
SELECT max(age) as max
FROM
events.win:time_batch(5 mins)
{“max”:51}
every 5 mins
32. Norikra Queries: (5)
{“name”:”tagomoris”,
“user:{“age”:35, “corp”:”LINE”,
“address”:”Tokyo”},
“current”:”San Diego”,
“speaker”:true,
“attend”:[true,true,false, ...]
}
SELECT age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY age
33. Norikra Queries: (5)
{“name”:”tagomoris”,
“user:{“age”:35, “corp”:”LINE”,
“address”:”Tokyo”},
“current”:”San Diego”,
“speaker”:true,
“attend”:[true,true,false, ...]
}
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
GROUP BY user.age
34. Norikra Queries: (5)
{“name”:”tagomoris”,
“user:{“age”:35, “corp”:”LINE”,
“address”:”Tokyo”},
“current”:”San Diego”,
“speaker”:true,
“attend”:[true,true,false, ...]
}
SELECT user.age, COUNT(*) as cnt
FROM events.win:time_batch(5 mins)
WHERE current=”San Diego”
AND attend.$0 AND attend.$1
GROUP BY user.age
36. Use case 1:
External API call reports for partners (LINE)
External API call for LINE Business Connect
LINE backend sends requests to partner’s API
endpoint using users’ messages
http://developers.linecorp.com/blog/?p=3386
37. Use case 1:
External API call reports for partners (LINE)
channel
gateway
partner’s
server
logs
query
results
MySQL Mail
SELECT
channelId
AS
channel_id,
reason,
detail,
count(*)
AS
error_count,
min(timestamp)
AS
first_timestamp,
max(timestamp)
AS
last_timestamp
FROM
api_error_log.win:time_batch(60
sec)
GROUP
BY
channelId,reason,detail
HAVING
count(*)
>
0
http://developers.linecorp.com/blog/?p=3386
38. Use case 1:
External API call reports for partners (LINE)
API error response summaries
http://developers.linecorp.com/blog/?p=3386
39. Use case 2: Lambda architecture
Prompt reports for Ad service console
Prompt reports with Norikra + Fixed reports with Hive
app
serverapp
serverapp
server
app
serverapp
serverapp
server
Fluentd
HDFS
console
service
execute hive query
(daily)
fetch query results
(frequently)
impression
logs
40. Use case 2:
Prompt reports for Ad service console
SELECT
yyyymmdd,
hh,
campaign_id,
region,
lang,
COUNT(*)
AS
click,
COUNT(DISTINCT
member_id)
AS
uu
FROM
(
SELECT
yyyymmdd,
hh,
get_json_object(log,
'$.campaign.id')
AS
campaign_id,
get_json_object(log,
'$.member.region')
AS
region,
get_json_object(log,
'$.member.lang')
AS
lang,
get_json_object(log,
'$.member.id')
AS
member_id
FROM
applog
WHERE
service='myservice'
AND
yyyymmdd='20140913'
AND
get_json_object(log,
'$.type')='click'
)
x
GROUP
BY
yyyymmdd,
hh,
campaign_id,
region,
lang
Hive query
for fixed reports
41. Use case 2:
Prompt reports for Ad service console
Norikra query for prompt reports
SELECT
campaign.id
AS
campaign_id,
member.region
AS
region,
member.lang
AS
lang,
COUNT(*)
AS
click,
COUNT(DISTINCT
member.id)
AS
uu
FROM
myservice.win:time_batch(1
hours)
WHERE
type="click"
GROUP
BY
campaign.id,
member.region,
member.lang
42. Use case 3:
Realtime access dashboard on Google Platform
Access log visualization
Count using Norikra (2-step), Store on Google BigQuery
Dashboard on Google Spreadsheet + Apps Script
http://qiita.com/kazunori279/items/6329df57635799405547
https://www.youtube.com/watch?v=EZkw5TDcCGw
43. Use case 3:
Realtime access dashboard on Google Platform
Server
Fluentd
http://qiita.com/kazunori279/items/6329df57635799405547
https://www.youtube.com/watch?v=EZkw5TDcCGw
ngnix
access log
access logs
to BigQuery
norikra query results
norikra query to aggregate node
to aggregate locally
44. Use case 3:
Realtime access dashboard on Google Platform
70 servers, 120,000 requests/sec (or more!)
Fluentd
logs to store
http://qiita.com/kazunori279/items/6329df57635799405547
https://www.youtube.com/watch?v=EZkw5TDcCGw
ngnix
ngngninxix ngngninxix ngngninxix ngngninxix
ngngninxix ngngninxix ngngninxix ngngninxix ngnix
Google
BigQuery
Google
Spreadsheet
+ Apps script
...
counts per host
total count
45. Why Norikra is written in JRuby
Esper
CEP(Complex Event Processing) library, written in Java
Rubygems.org
Open repository, for public UDF plugins of Norikra provided as gem
46. JRuby for me
Ruby! (by great JRuby developer team!)
makes developing Norikra dramatically faster
with rubygems and rubygems.org for easy deployment/installation
with Java libraries, ex: Jetty, Esper, ...
There are not so many users in Tokyo :(
47. More queries, more simplicity
and less latency
in data processing
Thanks!
photo: by my co-workers
http://norikra.github.io/
https://github.com/norikra/norikra
48. See also:
http://norikra.github.io/
“Lambda Architecture Platform Using SQL”
http://www.slideshare.net/tagomoris/lambda-architecture-using-sql-hadoopcon-
2014-taiwan
“Stream processing and Norikra”
http://www.slideshare.net/tagomoris/stream-processing-and-norikra
“Batch processing and Stream processing by SQL”
http://www.slideshare.net/tagomoris/hcj2014-sql
“Norikra in Action”
http://www.slideshare.net/tagomoris/norikra-in-action-ver-2014-spring
http://www.slideshare.net/tagomoris/presentations
49. Storm or Norikra?
Simple and fixed workload for huge traffic
Use Storm!
Complex and fragile workload for non-huge traffic
Use Norikra!