2. About Me
• a.k.a. ihower
• http://ihower.tw
• http://twitter.com/ihower
• http://github.com/ihower
• Ruby on Rails Developer since 2006
• Ruby Taiwan Community
• http://ruby.tw
3. Agenda
• Distributed Ruby
• Distributed Message Queues
• Background-processing in Rails
• Message Queues for Rails
• SOA for Rails
• Distributed Filesystem
• Distributed database
5. DRb
• Ruby's RMI system
(remote method invocation)
• an object in one Ruby process can invoke
methods on an object in another Ruby
process on the same or a different machine
6. DRb (cont.)
• no defined interface, so development is faster
• tightly couples applications, because there is no
defined API, only methods on objects
• unreliable in large-scale, heavy-load
production environments
7. server example 1
require 'drb'
class HelloWorldServer
  def say_hello
    'Hello, world!'
  end
end
DRb.start_service("druby://127.0.0.1:61676", HelloWorldServer.new)
DRb.thread.join
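A matching client could look roughly like the sketch below. To stay self-contained it starts the server and the client in one process and asks for port 0 (any free port), which is an assumption for illustration only; the slide uses a fixed port and a separate client process. When the URI points at the current process, DRb may dispatch the call locally instead of going over the socket.

```ruby
require 'drb'

class HelloWorldServer
  def say_hello
    'Hello, world!'
  end
end

# Port 0 asks the OS for any free port (illustration only; the slide uses 61676).
DRb.start_service('druby://127.0.0.1:0', HelloWorldServer.new)

# Client side: build a proxy from the server URI and call it like a local object.
remote = DRbObject.new_with_uri(DRb.uri)
greeting = remote.say_hello
puts greeting

DRb.stop_service
```

In a real deployment the last four lines would live in their own script and use the server's published URI.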
10. server example 2
require 'drb'
require_relative 'user' # user.rb sits next to this file (plain require 'user' on Ruby 1.8)
class UserServer
  attr_accessor :users
  def find(id)
    users[id - 1]
  end
end
user_server = UserServer.new
user_server.users = []
5.times do |i|
  user = User.new
  user.username = i + 1
  user_server.users << user
end
DRb.start_service("druby://127.0.0.1:61676", user_server)
DRb.thread.join
13. Why? DRbUndumped
• Default DRb operation
• Pass by value
• Must share code
• With DRbUndumped
• Pass by reference
• No need to share code
14. Example 2 Fixed
# user.rb
class User
  include DRbUndumped
  attr_accessor :username
end
# <DRb::DRbObject:0x1003b84f8 @ref=2149433940, @uri="druby://127.0.0.1:61676">
# Username: 2
# Username: ihower
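The difference between the two modes can be seen without a network at all. The sketch below is an illustration, not the slides' own example: the class names ValueUser and ReferenceUser are made up. It relies on DRb marshalling arguments and return values with Marshal, and on DRbUndumped refusing to be marshalled, which is what makes DRb fall back to sending a reference.

```ruby
require 'drb'

# Plain class: DRb can marshal it, so the receiver gets a copy (pass by value).
class ValueUser
  attr_accessor :username
end

# DRbUndumped refuses to be marshalled, so DRb sends a DRbObject reference instead.
class ReferenceUser
  include DRbUndumped
  attr_accessor :username
end

original = ValueUser.new
original.username = 'ihower'

# Round-trip through Marshal, as a DRb call would do with a dumpable object.
copy = Marshal.load(Marshal.dump(original))
copy.username = 'changed'
puts original.username # changes to the copy never reach the original

# Trying to marshal a DRbUndumped object raises instead of producing a copy.
marshal_error = begin
  Marshal.dump(ReferenceUser.new)
  nil
rescue TypeError => e
  e
end
puts marshal_error.class
```

This is why the first version of example 2 changed only a client-side copy, while the fixed version changes the object held by the server.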
15. Why use DRbUndumped?
• Big objects
• Singleton objects
• Lightweight clients
• Rapidly changing software
16. ID conversion
• Converts reference into DRb object on server
• DRbIdConv (Default)
• TimerIdConv
• NamedIdConv
• GWIdConv
17. Beware of garbage collection
• referenced objects may be collected on
the server (usually doesn't matter)
• Build your own ID converter if you want
to control persistent state.
18. DRb security
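require 'drb'
ro = DRbObject.new_with_uri("druby://127.0.0.1:61676")
# Remove the proxy's own instance_eval so the call is forwarded to the server
class << ro
  undef :instance_eval
end
# !!!!!!!! WARNING !!!!!!!!! DO NOT RUN
# The string is evaluated on the server, which then runs the shell command
ro.instance_eval("`rm -rf *`")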
20. DRb security (cont.)
• Access Control Lists (ACLs)
• via IP address array
• denial-of-service attacks are still possible
• DRb over SSL
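As a rough sketch of the ACL option, the standard library's drb/acl can be installed before the service starts. The rule list and the sample addresses below are assumptions for illustration; allow_addr? is called directly only to show how the rules are evaluated.

```ruby
require 'drb'
require 'drb/acl'

# Deny everyone, then allow only the loopback address.
acl = ACL.new(%w[deny all allow 127.0.0.1])

# Must be installed before DRb.start_service for the server to use it.
DRb.install_acl(acl)

# ACL#allow_addr? takes an address array shaped like a socket's peeraddr:
# [family, port, hostname, ip]. The values below are made up for the check.
local_ok  = acl.allow_addr?(['AF_INET', 0, 'localhost', '127.0.0.1'])
remote_ok = acl.allow_addr?(['AF_INET', 0, 'example.invalid', '192.0.2.10'])
puts "loopback allowed: #{local_ok}, other host allowed: #{remote_ok}"
```

An ACL only filters by address; any allowed client can still call anything the front object exposes.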
21. Rinda
• Rinda is a Ruby port of the Linda distributed
computing paradigm.
• Linda is a model of coordination and communication among several parallel processes
operating upon objects stored in and retrieved from shared, virtual, associative memory. This
model is implemented as a "coordination language" in which several primitives operating on
ordered sequence of typed data objects, "tuples," are added to a sequential language, such
as C, and a logically global associative memory, called a tuplespace, in which processes
store and retrieve tuples. (Wikipedia)
22. Rinda (cont.)
• Rinda consists of:
• a TupleSpace implementation
• a RingServer that allows DRb services to
automatically discover each other.
23. RingServer
• Hardcoding IP addresses in DRb programs
couples applications tightly and makes
fault tolerance difficult.
• With a RingServer, services can detect and
interact with other services on the network
without knowing their IP addresses.
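24. RingServer discovery (diagram)
1. Client @ 192.168.1.100 -> RingServer, via UDP broadcast address: "Where is Service X?"
2. RingServer -> Client: "Service X: 192.168.1.12"
3. Client -> Service X @ 192.168.1.12: "Hi, Service X @ 192.168.1.12"
4. Service X -> Client: "Hi there, 192.168.1.100"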
25. ring server example
require 'rinda/ring'
require 'rinda/tuplespace'
DRb.start_service
Rinda::RingServer.new(Rinda::TupleSpace.new)
DRb.thread.join
26. service example
require 'rinda/ring'
class HelloWorldServer
  include DRbUndumped # Needed for RingServer
  def say_hello
    'Hello, world!'
  end
end
DRb.start_service
ring_server = Rinda::RingFinger.primary
ring_server.write([:hello_world_service, :HelloWorldServer, HelloWorldServer.new, 'I like to say hi!'],
                  Rinda::SimpleRenewer.new)
DRb.thread.join
27. client example
require 'rinda/ring'
DRb.start_service
ring_server = Rinda::RingFinger.primary
service = ring_server.read([:hello_world_service, nil,nil,nil])
server = service[2]
puts server.say_hello
puts service.inspect
# Hello, world!
# [:hello_world_service, :HelloWorldServer,
#  #<DRb::DRbObject:0x10039b650 @uri="druby://fe80::21b:63ff:fec9:335f%en1:57416", @ref=2149388540>,
#  "I like to say hi!"]
28. TupleSpaces
• Shared object space
• Atomic access
• Just like a bulletin board
• The tuple template is
[:name, :Class, object, 'description']
29. 5 Basic Operations
• write
• read
• take (Atomic Read+Delete)
• read_all
• notify (Callback for write/take/delete)
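The first four operations can be tried against a local Rinda::TupleSpace without any DRb service. This is a minimal sketch: the string 'stand-in object' replaces the real service object from the earlier slides, and notify is left out.

```ruby
require 'rinda/tuplespace'

ts = Rinda::TupleSpace.new
template = [:hello_world_service, nil, nil, nil] # nil matches any value

ts.write([:hello_world_service, :HelloWorldServer, 'stand-in object', 'I like to say hi!'])

seen   = ts.read(template)          # returns the tuple and leaves it in the space
before = ts.read_all(template).size # every matching tuple, still non-destructive
taken  = ts.take(template)          # atomic read + delete
after  = ts.read_all(template).size

puts "read: #{seen[3]}, before take: #{before}, after take: #{after}"
```

Note that read and take block when nothing matches, so a template typo shows up as a hang rather than as nil.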
30. Starfish
• Starfish is a utility to make distributed
programming ridiculously easy
• It runs both the server and the client in
infinite loops
• MapReduce with ActiveRecord or Files
31. starfish foo.rb
# foo.rb
class Foo
  attr_reader :i
  def initialize
    @i = 0
  end
  def inc
    logger.info "YAY it incremented by 1 up to #{@i}"
    @i += 1
  end
end
server :log => "foo.log" do |object|
  object = Foo.new
end
client do |object|
  object.inc
end
32. starfish server example
ARGV.unshift('server.rb')
require 'rubygems'
require 'starfish'
class HelloWorld
  def say_hi
    'Hi There'
  end
end
Starfish.server = lambda do |object|
  object = HelloWorld.new
end
Starfish.new('hello_world').server
33. starfish client example
ARGV.unshift('client.rb')
require 'rubygems'
require 'starfish'
Starfish.client = lambda do |object|
  puts object.say_hi
  exit(0) # exit program immediately
end
Starfish.new('hello_world').client
34. starfish client example (another way)
ARGV.unshift('server.rb')
require 'rubygems'
require 'starfish'
catch(:halt) do
  Starfish.client = lambda do |object|
    puts object.say_hi
    throw :halt
  end
  Starfish.new('hello_world').client
end
puts "bye bye"
35. MapReduce
• introduced by Google to support
distributed computing on large data sets on
clusters of computers.
• inspired by map and reduce functions
commonly used in functional programming.
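The functional roots are easy to see in plain Ruby. The word count below is a single-process sketch with made-up input lines; a real MapReduce system would spread the map and reduce steps across machines.

```ruby
lines = ['distributed ruby', 'ruby on rails', 'ruby message queues']

# Map: emit a [word, 1] pair for every word in every line.
pairs = lines.flat_map { |line| line.split.map { |word| [word, 1] } }

# Shuffle: group the pairs by word. Reduce: sum the counts in each group.
counts = pairs.group_by(&:first).transform_values do |group|
  group.map(&:last).reduce(0, :+)
end

puts counts.inspect
```

Because each map call looks at one line and each reduce call at one word, both steps can be handed to separate workers.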
37. starfish client example
ARGV.unshift('client.rb')
require 'rubygems'
require 'starfish'
Starfish.client = lambda { |logs|
  logs.each do |log|
    puts "Processing #{log}"
    sleep(1)
  end
}
Starfish.new("log_server").client
38. Other implementations
• Skynet
• Use TupleSpace or MySQL as message queue
• Include an extension for ActiveRecord
• http://skynet.rubyforge.org/
• MRToolkit based on Hadoop
• http://code.google.com/p/mrtoolkit/
39. MagLev VM
• a fast, stable, Ruby implementation with
integrated object persistence and
distributed shared cache.
• http://maglev.gemstone.com/
• currently in public alpha
42. Why not DRb?
• DRb has security risks and poorly designed APIs
• distributed message queue is a great way to do
distributed programming: reliable and scalable.
43. Starling
• a light-weight persistent queue server that
speaks the Memcache protocol (mimics its
API)
• Fast, effective, quick setup and ease of use
• Powered by EventMachine
http://eventmachine.rubyforge.org/EventMachine.html
• Twitter’s open source project; they used it
before 2009 (and have since switched to Kestrel, a port of Starling from Ruby
to Scala)
45. Starling set example
require 'rubygems'
require 'starling'
starling = Starling.new('192.168.1.4:22122')
100.times do |i|
  # set appends to the queue; it does not overwrite as in Memcached
  starling.set('my_queue', i)
end
46. Starling get example
require 'rubygems'
require 'starling'
starling = Starling.new('192.168.2.4:22122')
loop do
  puts starling.get("my_queue")
end
47. get method
• FIFO
• After get, the object is no longer in the
queue. You will lose the message if a processing
error occurs.
• The get method blocks until something is
returned; internally it loops until a message arrives.
48. Handle processing error exception
require 'rubygems'
require 'starling'
starling = Starling.new('192.168.2.4:22122')
results = starling.get("my_queue")
begin
  puts results.flatten
rescue NoMethodError => e
  puts e.message
  # put the message back, wrapped in an array so flatten works next time
  starling.set("my_queue", [results])
rescue Exception => e
  # put the message back before re-raising so it is not lost
  starling.set("my_queue", results)
  raise e
end
49. Starling cons
• Clients have to poll the queue constantly
• With RabbitMQ you can subscribe to a queue,
which notifies you when a message is available for
processing.
50. AMQP/RabbitMQ
• a complete and highly reliable enterprise
messaging system based on the emerging
AMQP standard.
• Erlang
• http://github.com/tmm1/amqp
• Powered by EventMachine
51. Stomp/ActiveMQ
• Apache ActiveMQ is the most popular and
powerful open source messaging and
Integration Patterns provider.
• sudo gem install stomp
• ActiveMessaging plugin for Rails
52. beanstalkd
• Beanstalk is a simple, fast workqueue
service. Its interface is generic, but was
originally designed for reducing the latency
of page views in high-volume web
applications by running time-consuming tasks
asynchronously.
• http://kr.github.com/beanstalkd/
• http://beanstalk.rubyforge.org/
• Open source; originally built for the Causes application on Facebook
53. Why do we need asynchronous/background processing in Rails?
• cron-like processing (computing daily statistics, creating reports, updating the
full-text search index, etc.)
• long-running tasks (sending mail, resizing photos, encoding videos,
generating PDFs, uploading images to S3, posting to Twitter, etc.)
• Server traffic jam: an expensive request blocks
server resources (i.e. your Rails app)
• Bad user experience: users may reload
again and again! (responsiveness matters)
57. cron
• Cron is a time-based job scheduler in Unix-
like computer operating systems.
• crontab -e
58. Whenever
http://github.com/javan/whenever
• A Ruby DSL for Defining Cron Jobs
• http://asciicasts.com/episodes/164-cron-in-ruby
• or http://cronedit.rubyforge.org/
every 3.hours do
  runner "MyModel.some_process"
  rake "my:rake:task"
  command "/usr/bin/my_great_command"
end
60. rufus-scheduler
http://github.com/jmettraux/rufus-scheduler
• scheduling pieces of code (jobs)
• Not a replacement for cron/at, since it runs
inside of Ruby.
require 'rubygems'
require 'rufus/scheduler'
scheduler = Rufus::Scheduler.start_new
scheduler.every '5s' do
  puts 'check blood pressure'
end
scheduler.join
61. Daemon Kit
http://github.com/kennethkalmer/daemon-kit
• Helps you create Ruby daemons by providing a
sound application skeleton (through a
generator), task-specific generators (Jabber
bot, etc.) and robust environment
management code.
62. Monitor your daemon
• http://mmonit.com/monit/
• http://github.com/arya/bluepill
• http://god.rubyforge.org/
66. run_later plugin
http://github.com/mattmatt/run_later
• Borrowed from Merb
• Uses worker thread and a queue
• Simple solution for simple tasks
run_later do
  AccountMailer.deliver_signup(@user)
end
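The "worker thread and a queue" idea can be sketched in plain Ruby as follows. This is not the plugin's implementation: run_later_sketch, JOB_QUEUE and the :stop marker are hypothetical names, and the mail delivery is replaced by appending to an array.

```ruby
# Hypothetical stand-in for the plugin: one worker thread draining a queue.
JOB_QUEUE = Queue.new

worker = Thread.new do
  # Queue#pop blocks until a job arrives; :stop is our own shutdown marker.
  while (job = JOB_QUEUE.pop) != :stop
    job.call
  end
end

# Named run_later_sketch to make clear this is not the plugin's own method.
def run_later_sketch(&block)
  JOB_QUEUE << block
end

delivered = []
run_later_sketch { delivered << 'signup mail for user 1' }
run_later_sketch { delivered << 'signup mail for user 2' }

# Ask the worker to finish and wait until the queued jobs have run.
JOB_QUEUE << :stop
worker.join
puts delivered.inspect
```

The queue lives in the web process's memory, so pending jobs are lost on restart; that is the trade-off behind "simple solution for simple tasks".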
68. spawn (cont.)
• By default, spawn uses fork to create
child processes. You can configure it to use
threads instead.
• Works by creating new database
connections in ActiveRecord::Base for the
spawned block.
• Fork needs to copy the Rails process every time
69. threading vs. forking
• Forking advantages:
• more reliable? - the ActiveRecord code is not thread-safe.
• keep running - subprocess can live longer than its parent.
• easier - just works with Rails default settings. Threading
requires you to set allow_concurrency=true. Also,
beware of automatic reloading of classes in development
mode (config.cache_classes = false).
• Threading advantages:
• less filling - threads take less resources... how much less?
it depends.
• debugging - you can set breakpoints in your threads
70. Okay, we need a reliable messaging system:
• Persistent
• Scheduling: not necessarily all at the same time
• Scalability: just throw in more instances of your
program to speed up processing
• Loosely coupled components that merely ‘talk’
to each other
• Ability to easily replace Ruby with something
else for specific tasks
• Easy to debug and monitor
72. Rails only?
• Easy to use/write code
• Jobs are Ruby classes or objects
• But need to load Rails environment
73. ar_mailer
http://seattlerb.rubyforge.org/ar_mailer/
• a two-phase delivery agent for ActionMailer.
• Store messages into the database
• Delivered later by a separate process, ar_sendmail.
74. BackgrounDRb
http://backgroundrb.rubyforge.org/
• BackgrounDRb is a Ruby job server and
scheduler.
• Has scalability problems (~20 servers for Mark Bates)
• Hard to know whether a processing error occurred
• Uses the database to persist tasks
• Uses memcached to report processing results
75. workling
http://github.com/purzelrakete/workling
• Gives your Rails App a simple API that you
can use to make code run in the
background, outside of your request.
• Supports Starling(default), BackgroundJob,
Spawn and AMQP/RabbitMQ Runners.
77. Workling example
class EmailWorker < Workling::Base
  def deliver(options)
    user = User.find(options[:id])
    user.deliver_activation_email
  end
end
# in your controller
def create
  EmailWorker.asynch_deliver(:id => 1)
end
78. delayed_job
• Database backed asynchronous priority
queue
• Extracted from Shopify
• you can place any Ruby object on its queue
as arguments
• Loads the Rails environment only once
80. delayed_job example
send_later
def deliver
  mailing = Mailing.find(params[:id])
  mailing.send_later(:deliver)
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
81. delayed_job example
custom workers
class MailingJob < Struct.new(:mailing_id)
  def perform
    mailing = Mailing.find(mailing_id)
    mailing.deliver
  end
end
# in your controller
def deliver
  Delayed::Job.enqueue(MailingJob.new(params[:id]))
  flash[:notice] = "Mailing is being delivered."
  redirect_to mailings_url
end
82. delayed_job example
always asynchronously
class Device
  def deliver
    # long running method
  end
  handle_asynchronously :deliver
end
device = Device.new
device.deliver
83. Running jobs
• rake jobs:work
(Don’t use this in production; it will exit if the database has any network connectivity
problems.)
• RAILS_ENV=production script/delayed_job start
• RAILS_ENV=production script/delayed_job stop
84. Priority
just an Integer, default is 0
• you can run multiple workers to handle jobs of different
priorities
• RAILS_ENV=production script/delayed_job --min-priority 3 start
Delayed::Job.enqueue(MailingJob.new(params[:id]), 3)
Delayed::Job.enqueue(MailingJob.new(params[:id]), -3)
85. Scheduled
no guarantee of an exact time; the job just runs at some point after run_at
Delayed::Job.enqueue(MailingJob.new(params[:id]), 3, 3.days.from_now)
Delayed::Job.enqueue(MailingJob.new(params[:id]),
3, 1.month.from_now.beginning_of_month)
86. Configuring Delayed Job
# config/initializers/delayed_job_config.rb
Delayed::Worker.destroy_failed_jobs = false
Delayed::Worker.sleep_delay = 5 # seconds to sleep when the queue is empty
Delayed::Worker.max_attempts = 25
Delayed::Worker.max_run_time = 4.hours # set to the time the longest task will take
87. Automatic retry on failure
• If a method throws an exception it will be
caught and the method rerun later.
• The method will be retried up to 25
(default) times at increasingly longer
intervals until it passes.
• The longest interval is about 108 hours (25 ** 4 + 5 seconds)
Job.db_time_now + (job.attempts ** 4) + 5
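The formula above can be worked through in a few lines of Ruby. This is only arithmetic on the slide's formula, assuming the attempt count runs from 1 up to the default of 25; retry_delay_seconds is a name made up for the sketch.

```ruby
# Delay in seconds before the next try, following the slide's formula.
def retry_delay_seconds(attempts)
  (attempts**4) + 5
end

# Assumption: attempt counts run from 1 up to the default max_attempts of 25.
delays = (1..25).map { |attempts| retry_delay_seconds(attempts) }

first_delay_seconds = delays.first
longest_delay_hours = (delays.last / 3600.0).round(1)
puts "first retry after #{first_delay_seconds}s, longest interval #{longest_delay_hours}h"
```

The fourth power keeps early retries within seconds while pushing later ones out to days, so a briefly unavailable dependency is retried quickly and a long outage does not flood the queue.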
88. Capistrano Recipes
• Remember to restart delayed_job after
deployment
• Check out lib/delayed_job/recipes.rb
after "deploy:stop", "delayed_job:stop"
after "deploy:start", "delayed_job:start"
after "deploy:restart", "delayed_job:restart"
89. Resque
http://github.com/defunkt/resque
• a Redis-backed library for creating background jobs,
placing those jobs on multiple queues, and processing
them later.
• GitHub’s open source project
• you can only place JSONable Ruby objects on a queue
• includes a Sinatra app for monitoring what's going on
• supports multiple queues
• a good fit when you expect a lot of failure/chaos
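What "JSONable" means in practice can be shown with the standard json library alone. The payload below is a hypothetical job description (a class name plus arguments); no Resque call is made.

```ruby
require 'json'

# Hypothetical job description: a class name and its arguments.
payload = { 'class' => 'MailingJob', 'args' => [42, { state: :pending }] }

# What a worker would see after the payload has been stored as JSON and read back.
restored = JSON.parse(JSON.generate(payload))

puts restored.inspect
```

Symbols come back as strings and arbitrary objects do not survive at all, so jobs are usually given record ids and simple values rather than model instances.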
90. My recommendations:
• General purpose: delayed_job
(GitHub highly recommends DelayedJob to anyone whose site is not 50% background work.)
• Time-scheduled: cron + rake
91. 5. SOA for Rails
• What’s SOA
• Why SOA
• Considerations
• The tool set
92. What’s SOA
Service oriented architectures
• “monolithic” approach is not enough
• SOA is a way to design complex applications
by splitting out major components into
individual services and communicating via
APIs.
• a service is a vertical slice of functionality:
database, application code and caching layer
93. a monolithic web app example
(Diagram: request -> Load Balancer -> WebApps -> Database)
94. a SOA example
(Diagram: requests reach a WebApp for Administration and, through a Load Balancer,
the WebApps for User; the web apps call Services A and Services B, each with its own Database.)
96. Shared Resources
• Different front-end websites use the same
resources.
• SOA helps you avoid duplicating databases
and code.
• Why not only share the database?
• code is not DRY
• caching will be problematic
(Diagram: WebApp for Administration and WebApps for User both connect to one Database.)
97. Encapsulation
• you can change the underlying implementation of a
service without affecting other parts of the system
• upgrade library
• upgrade to Ruby 1.9
• you can provide API versioning
98. Scalability 1: Partitioned Data Provides
• The database is the first bottleneck; a single DB
server cannot scale. SOA helps you reduce
database load.
• Anti-pattern: only splitting the database
• model relationships are broken
• referential integrity is lost
(Diagram: WebApps connected directly to Database A and Database B.)
• Myth: database replication can not help you
with speed and consistency
99. Scalability 2: Caching
• SOA makes it easier to design a caching system
• Cache data at the right times and expire it
at the right times
• Cache the logical model, not the physical one
• You do not need to cache views everywhere
100. Scalability 3: Efficient
• Different components carry different
loads; SOA lets you scale each service separately.
(Diagram: WebApps call two Load Balancers; one fronts two instances of Services A,
the other four instances of Services B.)
101. Security
• Different services can sit behind different
firewalls
• You can expose only the public web apps and
services; the others stay inside the firewall.
102. Interoperability
• HTTP is the common interface; SOA helps
you integrate:
• Multiple languages
• Internal system e.g. Full-text searching engine
• Legacy database, system
• External vendors
103. Reuse
• Reuse across multiple applications
• Reuse for public APIs
• Example: Amazon Web Services (AWS)
105. Reduce Local Complexity
• Team modularity along the same module
splits as your software
• Understandability: The amount of code is
minimized to a quantity understandable by
a small team
• Source code control
107. How to Partition into Separate Services
• Partitioning on Logical Function
• Partitioning on Read/Write Frequencies
• Partitioning by Minimizing Joins
• Partitioning by Iteration Speed
108. API Design
• Send Everything you need
• Parallel HTTP requests
• Send as Little as Possible
• Use Logical Models
109. Physical Models & Logical Models
• Physical models are mapped to database
tables through ORM. (It’s 3NF)
• Logical models are mapped to your
business problem. (External API use it)
• Logical models are mapped to physical
models by you.
110. Logical Models
• Not relational or normalized
• Maintainability
• can change with no change to data store
• can stay the same while the data store
changes
• Better fit for REST interfaces
• Better caching
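A small sketch may make the split concrete. Everything here is hypothetical: UserRow and AddressRow stand in for normalized tables, and UserProfile is the logical model a service would expose; real code would map from ActiveRecord objects instead of Structs.

```ruby
require 'json'

# Hypothetical physical models: one Struct per normalized table.
UserRow    = Struct.new(:id, :first_name, :last_name)
AddressRow = Struct.new(:user_id, :city, :country)

# Hypothetical logical model: the shape the service exposes to its clients.
class UserProfile
  def initialize(user_row, address_row)
    @user = user_row
    @address = address_row
  end

  # Hand-written mapping from the physical rows to the public representation.
  def to_h
    {
      id: @user.id,
      name: "#{@user.first_name} #{@user.last_name}",
      location: "#{@address.city}, #{@address.country}"
    }
  end

  def to_json(*args)
    to_h.to_json(*args)
  end
end

profile = UserProfile.new(UserRow.new(1, 'Ada', 'Lovelace'),
                          AddressRow.new(1, 'London', 'UK'))
json = profile.to_json
puts json
```

If the address table is later split or renamed, only to_h changes; clients and caches keyed on the logical model are unaffected.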
116. XML parser
• http://nokogiri.org/
• Nokogiri (鋸) is an HTML, XML, SAX, and
Reader parser. Among Nokogiri’s many
features is the ability to search documents
via XPath or CSS3 selectors.
119. Tips
• Define your logical model (i.e. your service
request result) first.
• model.to_json and model.to_xml are easy to
use, but not useful in practice.
120. 6. Distributed File System
• NFS does not scale
• we can use rsync to duplicate files
• MogileFS
• http://www.danga.com/mogilefs/
• http://seattlerb.rubyforge.org/mogilefs-client/
• Amazon S3
• HDFS (Hadoop Distributed File System)
• GlusterFS
123. References
• Books&Articles:
• Distributed Programming with Ruby, Mark Bates (Addison Wesley)
• Enterprise Rails, Dan Chak (O’Reilly)
• Service-Oriented Design with Ruby and Rails, Paul Dix (Addison Wesley)
• RESTful Web Services, Richardson&Ruby (O’Reilly)
• RESTful Web Services Cookbook, Allamaraju&Amundsen (O’Reilly)
• Enterprise Recipes with Ruby on Rails, Maik Schmidt (The Pragmatic Programmers)
• Ruby in Practice, McAnally&Arkin (Manning)
• Building Scalable Web Sites, Cal Henderson (O’Reilly)
• Background Processing in Rails, Erik Andrejko (Rails Magazine)
• Background Processing with Delayed_Job, James Harrison (Rails Magazine)
• Web 点 ( )
• Slides:
• Background Processing (Rob Mack) Austin on Rails - April 2009
• The Current State of Asynchronous Processing in Ruby (Mathias Meyer, Peritor GmbH)
• Asynchronous Processing (Jonathan Dahl)
• Long-Running Tasks In Rails Without Much Effort (Andy Stewart) - April 2008
• Starling + Workling: simple distributed background jobs with Twitter’s queuing system, Rany Keddo 2008
• Physical Models & Logical Models in Rails, Dan Chak
125. Todo (maybe next time)
• AMQP/RabbitMQ example code
• How about Nanite?
• XMPP
• MagLev VM
• More MapReduce example code
• How about Amazon Elastic MapReduce?
• Resque example code
• More SOA example and code
• MogileFS example code