SlideShare ist ein Scribd-Unternehmen logo
1 von 48
Downloaden Sie, um offline zu lesen
GPU Accelerated
Libraries for
Ruby
Prasun Anand
About me
● SciRuby Contributor
● Google Summer of Code 2016, 2017
● Genenetwork project
● Ruby Grant 2017
● Projects:
○ JRuby port of NMatrix
○ ArrayFire gem
○ RbCUDA
Scientific Computing
Why should Python
programmers have all the
fun ?
Arrays / Matrices
BLAS and LAPACK
GPU Computing is not easy !
CUDA and OpenCL
Af_Array
[1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4]
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
1.0000 3.0000
2.0000 4.0000
=> #<ArrayFire::Af_Array:0x000000020aeab8>
Af_Array
[1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4]
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
1.0000 3.0000
2.0000 4.0000
=> #<ArrayFire::Af_Array:0x000000020aeab8>
Af_Array
[1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4]
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
1.0000 3.0000
2.0000 4.0000
=> #<ArrayFire::Af_Array:0x000000020aeab8>
Af_Array
[1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4]
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
1.0000 3.0000
2.0000 4.0000
=> #<ArrayFire::Af_Array:0x000000020aeab8>
[2] pry(main)> b = a + a
No Name Array
[2 2 1 1]
Offsets: [0 0 0 0]
Strides: [1 2 4 4]
2.0000 6.0000
4.0000 8.0000
=> #<ArrayFire::Af_Array:0x000000020625c8>
[1] pry(main)> left = ArrayFire::Af_Array.new 2 , [3,3] , [1, 4, 6, 4, 11 , 2 ,-5, 8, 10]
No Name Array
[3 3 1 1]
1.0000 4.0000 -5.0000
4.0000 11.0000 8.0000
6.0000 2.0000 10.0000
=> #<ArrayFire::Af_Array:0x000000014e56c8>
[2] pry(main)> right = ArrayFire::Af_Array.new 2 , [3,2] , [1, 0, 8, 10, -11, 8]
No Name Array
[3 2 1 1]
1.0000 10.0000
0.0000 -11.0000
8.0000 8.0000
=> #<ArrayFire::Af_Array:0x00000001591db0>
[3] pry(main)> result = ArrayFire::BLAS.matmul(left, right, :AF_MAT_NONE, :AF_MAT_NONE)
No Name Array
[3 2 1 1]
-39.0000 -74.0000
68.0000 -17.0000
86.0000 118.0000
=> #<ArrayFire::Af_Array:0x000000016136f8>
VALUE arf_init(int argc, VALUE* argv, VALUE self)
{
afstruct* afarray;
Data_Get_Struct(self, afstruct, afarray);
dim_t ndims = (dim_t)NUM2LONG(argv[0]);
dim_t* dimensions = (dim_t*)malloc(ndims * sizeof(dim_t));
dim_t count = 1;
for (size_t index = 0; index < ndims; index++) {
dimensions[index] = (dim_t)NUM2LONG(RARRAY_AREF(argv[1], index));
count *= dimensions[index];
}
double* host_array = (double*)malloc(count * sizeof(double));
for (size_t index = 0; index < count; index++) {
host_array[index] = (double)NUM2DBL(RARRAY_AREF(argv[2], index));
}
af_create_array(&afarray->carray, host_array, ndims, dimensions, f64);
return self;
}
static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val, VALUE left_prop_val, VALUE
right_prop_val){
afstruct* left;
afstruct* right;
afstruct* result = ALLOC(afstruct);
Data_Get_Struct(left_val, afstruct, left);
Data_Get_Struct(right_val, afstruct, right);
af_mat_prop left_mat_prop = arf_mat_type_from_rbsymbol(left_prop_val);
af_mat_prop right_mat_prop = arf_mat_type_from_rbsymbol(right_prop_val);
af_matmul(&result->carray, left->carray, right->carray, left_mat_prop, right_mat_prop);
return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result);
}
BLAS functionalities
● Matmult
● Transpose
LAPACK functionalities
● Det
● Inverse
● Norm
● Qr
● Cholesky
● Svd
● lu
Statistics
● Mean
● Median
● Variance
Benchmarks
● AMD FX 8350 octacore processor
● Nvidia GTX 750Ti GPU
● Double dtype
10 X
Faster than NMatrix-Ruby-Lapack
10,000 X
Faster than NMatrix-Ruby
100,000 X
Faster than NMatrix-Ruby-BLAS
RbCUDA
GPU Array
● Generic pointer used to handle an array of elements on the GPU.
● Memory copying from CPU to GPU and vice-versa.
● Interfaced with NMatrix and NArray
vadd_kernel_src = <<-EOS
extern "C" {
__global__ void matSum(int *a, int *b, int *c)
{
int tid = blockIdx.x;
if (tid < 100)
c[tid] = a[tid] + b[tid];
}
}
EOS
f = compile(vadd_kernel_src)
RbCUDA::Driver.run_kernel(f.path)
● CuBLAS
● CuSolver
● CuRand
Benchmarks
● AMD FX 8350 octacore processor
● Nvidia GTX 750Ti GPU
● Double dtype
1,000,000 X
Faster than NMatrix-Ruby-BLAS
Future Work
● Image Processing APIs and Indexers
● Multiple dtypes
● RbCUDA is under active development.
● Project RbCUDA is being funded by Ruby Association
(Ruby Grant 2017)
Hobby Projects
1. Synapse : A Ruby first Deep Learning Library
2. Ninjaplot : A plotting library
● https://github.com/arrayfire/arrayfire-rb
● https://github.com/prasunanand/rbcuda
● https://github.com/prasunanand/synapse
● https://github.com/prasunanand/ninjaplot
Contributions are Welcome!
Acknowledgements
1. Pjotr Prins
2. Pradeep Garigipati
3. Kenta Murata
Thanks!
Github: prasunanand
Twitter: @prasun_anand
Blog: prasunanand.com
Questions

Weitere ähnliche Inhalte

Was ist angesagt?

Beyond tf idf why, what & how
Beyond tf idf why, what & howBeyond tf idf why, what & how
Beyond tf idf why, what & howlucenerevolution
 
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵Wanbok Choi
 
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみた
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみたスマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみた
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみたTaro Matsuzawa
 
Parallel Computing in R
Parallel Computing in RParallel Computing in R
Parallel Computing in Rmickey24
 
2014-04-09, Data mining demo for astronomy researchers
2014-04-09, Data mining demo for astronomy researchers2014-04-09, Data mining demo for astronomy researchers
2014-04-09, Data mining demo for astronomy researchersSamuel Harrold
 
Statistical Schema Induction
Statistical Schema InductionStatistical Schema Induction
Statistical Schema InductionJohanna Voelker
 
Unleash your inner console cowboy
Unleash your inner console cowboyUnleash your inner console cowboy
Unleash your inner console cowboyKenneth Geisshirt
 
What they don't tell you about JavaScript
What they don't tell you about JavaScriptWhat they don't tell you about JavaScript
What they don't tell you about JavaScriptRaphael Cruzeiro
 
Introduction to R
Introduction to RIntroduction to R
Introduction to RHappy Garg
 
Javascript Arrays
Javascript ArraysJavascript Arrays
Javascript Arraysshaheenakv
 
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...Flink Forward
 
Pointer Events in Canvas
Pointer Events in CanvasPointer Events in Canvas
Pointer Events in Canvasdeanhudson
 
LLVM Backend の紹介
LLVM Backend の紹介LLVM Backend の紹介
LLVM Backend の紹介Akira Maruoka
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases LabFabio Fumarola
 
Python 101 language features and functional programming
Python 101 language features and functional programmingPython 101 language features and functional programming
Python 101 language features and functional programmingLukasz Dynowski
 

Was ist angesagt? (19)

Beyond tf idf why, what & how
Beyond tf idf why, what & howBeyond tf idf why, what & how
Beyond tf idf why, what & how
 
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
[Let'Swift 2019] 실용적인 함수형 프로그래밍 워크샵
 
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみた
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみたスマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみた
スマートフォン勉強会@関東 #11 どう考えてもdisconなものをiPhoneに移植してみた
 
Parallel Computing in R
Parallel Computing in RParallel Computing in R
Parallel Computing in R
 
Spl Not A Bridge Too Far phpNW09
Spl Not A Bridge Too Far phpNW09Spl Not A Bridge Too Far phpNW09
Spl Not A Bridge Too Far phpNW09
 
2014-04-09, Data mining demo for astronomy researchers
2014-04-09, Data mining demo for astronomy researchers2014-04-09, Data mining demo for astronomy researchers
2014-04-09, Data mining demo for astronomy researchers
 
Statistical Schema Induction
Statistical Schema InductionStatistical Schema Induction
Statistical Schema Induction
 
Unleash your inner console cowboy
Unleash your inner console cowboyUnleash your inner console cowboy
Unleash your inner console cowboy
 
Python GC
Python GCPython GC
Python GC
 
What they don't tell you about JavaScript
What they don't tell you about JavaScriptWhat they don't tell you about JavaScript
What they don't tell you about JavaScript
 
Introduction to R
Introduction to RIntroduction to R
Introduction to R
 
Javascript Arrays
Javascript ArraysJavascript Arrays
Javascript Arrays
 
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...
Flink Forward Berlin 2017: Max Kiessling, Martin Junghanns - Cypher-based Gra...
 
Pointer Events in Canvas
Pointer Events in CanvasPointer Events in Canvas
Pointer Events in Canvas
 
JS OO and Closures
JS OO and ClosuresJS OO and Closures
JS OO and Closures
 
D3.js workshop
D3.js workshopD3.js workshop
D3.js workshop
 
LLVM Backend の紹介
LLVM Backend の紹介LLVM Backend の紹介
LLVM Backend の紹介
 
10b. Graph Databases Lab
10b. Graph Databases Lab10b. Graph Databases Lab
10b. Graph Databases Lab
 
Python 101 language features and functional programming
Python 101 language features and functional programmingPython 101 language features and functional programming
Python 101 language features and functional programming
 

Ähnlich wie Rubyconfindia2018 - GPU accelerated libraries for Ruby

High Performance GPU computing with Ruby, Rubykaigi 2018
High Performance GPU computing with Ruby, Rubykaigi 2018High Performance GPU computing with Ruby, Rubykaigi 2018
High Performance GPU computing with Ruby, Rubykaigi 2018Prasun Anand
 
High performance GPU computing with Ruby
High performance GPU computing with RubyHigh performance GPU computing with Ruby
High performance GPU computing with RubyPrasun Anand
 
Fosdem2017 Scientific computing on Jruby
Fosdem2017  Scientific computing on JrubyFosdem2017  Scientific computing on Jruby
Fosdem2017 Scientific computing on JrubyPrasun Anand
 
Coscup2021 - useful abstractions at rust and it's practical usage
Coscup2021 - useful abstractions at rust and it's practical usageCoscup2021 - useful abstractions at rust and it's practical usage
Coscup2021 - useful abstractions at rust and it's practical usageWayne Tsai
 
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]RootedCON
 
Getting Functional with Scala
Getting Functional with ScalaGetting Functional with Scala
Getting Functional with ScalaJorge Paez
 
Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for SpeedYung-Yu Chen
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyKimikazu Kato
 
Regular expressions, Alex Perry, Google, PyCon2014
Regular expressions, Alex Perry, Google, PyCon2014Regular expressions, Alex Perry, Google, PyCon2014
Regular expressions, Alex Perry, Google, PyCon2014alex_perry
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...Andrew Lamb
 
SP-First-Lecture.ppt
SP-First-Lecture.pptSP-First-Lecture.ppt
SP-First-Lecture.pptFareedIhsas
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with ClojureDmitry Buzdin
 
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov
 
PHP and MySQL Tips and tricks, DC 2007
PHP and MySQL Tips and tricks, DC 2007PHP and MySQL Tips and tricks, DC 2007
PHP and MySQL Tips and tricks, DC 2007Damien Seguy
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabCloudxLab
 
Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Mark Smith
 

Ähnlich wie Rubyconfindia2018 - GPU accelerated libraries for Ruby (20)

High Performance GPU computing with Ruby, Rubykaigi 2018
High Performance GPU computing with Ruby, Rubykaigi 2018High Performance GPU computing with Ruby, Rubykaigi 2018
High Performance GPU computing with Ruby, Rubykaigi 2018
 
High performance GPU computing with Ruby
High performance GPU computing with RubyHigh performance GPU computing with Ruby
High performance GPU computing with Ruby
 
Fosdem2017 Scientific computing on Jruby
Fosdem2017  Scientific computing on JrubyFosdem2017  Scientific computing on Jruby
Fosdem2017 Scientific computing on Jruby
 
Distributed computing with spark
Distributed computing with sparkDistributed computing with spark
Distributed computing with spark
 
Coscup2021 - useful abstractions at rust and it's practical usage
Coscup2021 - useful abstractions at rust and it's practical usageCoscup2021 - useful abstractions at rust and it's practical usage
Coscup2021 - useful abstractions at rust and it's practical usage
 
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
Sergi Álvarez & Roi Martín - Radare2 Preview [RootedCON 2010]
 
Getting Functional with Scala
Getting Functional with ScalaGetting Functional with Scala
Getting Functional with Scala
 
Python lecture 05
Python lecture 05Python lecture 05
Python lecture 05
 
Write Python for Speed
Write Python for SpeedWrite Python for Speed
Write Python for Speed
 
Effective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPyEffective Numerical Computation in NumPy and SciPy
Effective Numerical Computation in NumPy and SciPy
 
Eta
EtaEta
Eta
 
Regular expressions, Alex Perry, Google, PyCon2014
Regular expressions, Alex Perry, Google, PyCon2014Regular expressions, Alex Perry, Google, PyCon2014
Regular expressions, Alex Perry, Google, PyCon2014
 
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
A Rusty introduction to Apache Arrow and how it applies to a  time series dat...A Rusty introduction to Apache Arrow and how it applies to a  time series dat...
A Rusty introduction to Apache Arrow and how it applies to a time series dat...
 
Chp4(ref dynamic)
Chp4(ref dynamic)Chp4(ref dynamic)
Chp4(ref dynamic)
 
SP-First-Lecture.ppt
SP-First-Lecture.pptSP-First-Lecture.ppt
SP-First-Lecture.ppt
 
Refactoring to Macros with Clojure
Refactoring to Macros with ClojureRefactoring to Macros with Clojure
Refactoring to Macros with Clojure
 
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizationsEgor Bogatov - .NET Core intrinsics and other micro-optimizations
Egor Bogatov - .NET Core intrinsics and other micro-optimizations
 
PHP and MySQL Tips and tricks, DC 2007
PHP and MySQL Tips and tricks, DC 2007PHP and MySQL Tips and tricks, DC 2007
PHP and MySQL Tips and tricks, DC 2007
 
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLabApache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
Apache Spark - Basics of RDD | Big Data Hadoop Spark Tutorial | CloudxLab
 
Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016
 

Kürzlich hochgeladen

VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Developmentvyaparkranti
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slidesvaideheekore1
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsSafe Software
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationBradBedford3
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...Bert Jan Schrijver
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalLionel Briand
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Cizo Technology Services
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...OnePlan Solutions
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesVictoriaMetrics
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxAndreas Kunz
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Anthony Dahanne
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...confluent
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldRoberto Pérez Alcolea
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecturerahul_net
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtimeandrehoraa
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?Alexandre Beguel
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profileakrivarotava
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Angel Borroy López
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingShane Coughlan
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZABSYZ Inc
 

Kürzlich hochgeladen (20)

VK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web DevelopmentVK Business Profile - provides IT solutions and Web Development
VK Business Profile - provides IT solutions and Web Development
 
Introduction to Firebase Workshop Slides
Introduction to Firebase Workshop SlidesIntroduction to Firebase Workshop Slides
Introduction to Firebase Workshop Slides
 
Powering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data StreamsPowering Real-Time Decisions with Continuous Data Streams
Powering Real-Time Decisions with Continuous Data Streams
 
How to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion ApplicationHow to submit a standout Adobe Champion Application
How to submit a standout Adobe Champion Application
 
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
JavaLand 2024 - Going serverless with Quarkus GraalVM native images and AWS L...
 
Precise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive GoalPrecise and Complete Requirements? An Elusive Goal
Precise and Complete Requirements? An Elusive Goal
 
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
Global Identity Enrolment and Verification Pro Solution - Cizo Technology Ser...
 
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
Revolutionizing the Digital Transformation Office - Leveraging OnePlan’s AI a...
 
What’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 UpdatesWhat’s New in VictoriaMetrics: Q1 2024 Updates
What’s New in VictoriaMetrics: Q1 2024 Updates
 
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptxUI5ers live - Custom Controls wrapping 3rd-party libs.pptx
UI5ers live - Custom Controls wrapping 3rd-party libs.pptx
 
Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024Not a Kubernetes fan? The state of PaaS in 2024
Not a Kubernetes fan? The state of PaaS in 2024
 
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
Catch the Wave: SAP Event-Driven and Data Streaming for the Intelligence Ente...
 
Keeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository worldKeeping your build tool updated in a multi repository world
Keeping your build tool updated in a multi repository world
 
Understanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM ArchitectureUnderstanding Flamingo - DeepMind's VLM Architecture
Understanding Flamingo - DeepMind's VLM Architecture
 
SpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at RuntimeSpotFlow: Tracking Method Calls and States at Runtime
SpotFlow: Tracking Method Calls and States at Runtime
 
SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?SAM Training Session - How to use EXCEL ?
SAM Training Session - How to use EXCEL ?
 
SoftTeco - Software Development Company Profile
SoftTeco - Software Development Company ProfileSoftTeco - Software Development Company Profile
SoftTeco - Software Development Company Profile
 
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
Alfresco TTL#157 - Troubleshooting Made Easy: Deciphering Alfresco mTLS Confi...
 
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full RecordingOpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
OpenChain Education Work Group Monthly Meeting - 2024-04-10 - Full Recording
 
Salesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZSalesforce Implementation Services PPT By ABSYZ
Salesforce Implementation Services PPT By ABSYZ
 

Rubyconfindia2018 - GPU accelerated libraries for Ruby

  • 2. About me ● SciRuby Contributor ● Google Summer of Code 2016, 2017 ● Genenetwork project ● Ruby Grant 2017 ● Projects: ○ JRuby port of NMatrix ○ ArrayFire gem ○ RbCUDA
  • 4. Why should Python programmers have all the fun ?
  • 7. GPU Computing is not easy !
  • 9.
  • 10. Af_Array [1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4] No Name Array [2 2 1 1] Offsets: [0 0 0 0] Strides: [1 2 4 4] 1.0000 3.0000 2.0000 4.0000 => #<ArrayFire::Af_Array:0x000000020aeab8>
  • 11. Af_Array [1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4] No Name Array [2 2 1 1] Offsets: [0 0 0 0] Strides: [1 2 4 4] 1.0000 3.0000 2.0000 4.0000 => #<ArrayFire::Af_Array:0x000000020aeab8>
  • 12. Af_Array [1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4] No Name Array [2 2 1 1] Offsets: [0 0 0 0] Strides: [1 2 4 4] 1.0000 3.0000 2.0000 4.0000 => #<ArrayFire::Af_Array:0x000000020aeab8>
  • 13. Af_Array [1] pry(main)> a = ArrayFire::Af_Array.new 2, [2,2],[1,2,3,4] No Name Array [2 2 1 1] Offsets: [0 0 0 0] Strides: [1 2 4 4] 1.0000 3.0000 2.0000 4.0000 => #<ArrayFire::Af_Array:0x000000020aeab8>
  • 14. [2] pry(main)> b = a + a No Name Array [2 2 1 1] Offsets: [0 0 0 0] Strides: [1 2 4 4] 2.0000 6.0000 4.0000 8.0000 => #<ArrayFire::Af_Array:0x000000020625c8>
  • 15. [1] pry(main)> left = ArrayFire::Af_Array.new 2 , [3,3] , [1, 4, 6, 4, 11 , 2 ,-5, 8, 10] No Name Array [3 3 1 1] 1.0000 4.0000 -5.0000 4.0000 11.0000 8.0000 6.0000 2.0000 10.0000 => #<ArrayFire::Af_Array:0x000000014e56c8> [2] pry(main)> right = ArrayFire::Af_Array.new 2 , [3,2] , [1, 0, 8, 10, -11, 8] No Name Array [3 2 1 1] 1.0000 10.0000 0.0000 -11.0000 8.0000 8.0000 => #<ArrayFire::Af_Array:0x00000001591db0>
  • 16. [3] pry(main)> result = ArrayFire::BLAS.matmul(left, right, :AF_MAT_NONE, :AF_MAT_NONE) No Name Array [3 2 1 1] -39.0000 -74.0000 68.0000 -17.0000 86.0000 118.0000 => #<ArrayFire::Af_Array:0x000000016136f8>
  • 17. VALUE arf_init(int argc, VALUE* argv, VALUE self) { afstruct* afarray; Data_Get_Struct(self, afstruct, afarray); dim_t ndims = (dim_t)NUM2LONG(argv[0]); dim_t* dimensions = (dim_t*)malloc(ndims * sizeof(dim_t)); dim_t count = 1; for (size_t index = 0; index < ndims; index++) { dimensions[index] = (dim_t)NUM2LONG(RARRAY_AREF(argv[1], index)); count *= dimensions[index]; } double* host_array = (double*)malloc(count * sizeof(double)); for (size_t index = 0; index < count; index++) { host_array[index] = (double)NUM2DBL(RARRAY_AREF(argv[2], index)); } af_create_array(&afarray->carray, host_array, ndims, dimensions, f64); return self; }
  • 18. static VALUE arf_matmul(VALUE self, VALUE left_val, VALUE right_val, VALUE left_prop_val, VALUE right_prop_val){ afstruct* left; afstruct* right; afstruct* result = ALLOC(afstruct); Data_Get_Struct(left_val, afstruct, left); Data_Get_Struct(right_val, afstruct, right); af_mat_prop left_mat_prop = arf_mat_type_from_rbsymbol(left_prop_val); af_mat_prop right_mat_prop = arf_mat_type_from_rbsymbol(right_prop_val); af_matmul(&result->carray, left->carray, right->carray, left_mat_prop, right_mat_prop); return Data_Wrap_Struct(CLASS_OF(left_val), NULL, arf_free, result); }
  • 19. BLAS functionalities ● Matmult ● Transpose LAPACK functionalities ● Det ● Inverse ● Norm ● Qr ● Cholesky ● Svd ● lu
  • 21. Benchmarks ● AMD FX 8350 octacore processor ● Nvidia GTX 750Ti GPU ● Double dtype
  • 22.
  • 23. 10 X Faster than NMatrix-Ruby-Lapack
  • 24.
  • 25.
  • 26.
  • 27. 10,000 X Faster than NMatrix-Ruby
  • 28.
  • 29.
  • 30.
  • 31. 100,000 X Faster than NMatrix-Ruby-BLAS
  • 32.
  • 33.
  • 35. GPU Array ● Generic pointer used to handle an array of elements on the GPU. ● Memory copying from CPU to GPU and vice-versa. ● Interfaced with NMatrix and NArray
  • 36. vadd_kernel_src = <<-EOS extern "C" { __global__ void matSum(int *a, int *b, int *c) { int tid = blockIdx.x; if (tid < 100) c[tid] = a[tid] + b[tid]; } } EOS f = compile(vadd_kernel_src) RbCUDA::Driver.run_kernel(f.path)
  • 38. Benchmarks ● AMD FX 8350 octacore processor ● Nvidia GTX 750Ti GPU ● Double dtype
  • 39.
  • 40. 1,000,000 X Faster than NMatrix-Ruby-BLAS
  • 41.
  • 42.
  • 43. Future Work ● Image Processing APIs and Indexers ● Multiple dtypes ● RbCUDA is under active development. ● Project RbCUDA is being funded by Ruby Association (Ruby Grant 2017)
  • 44. Hobby Projects 1. Synapse : A Ruby first Deep Learning Library 2. Ninjaplot : A plotting library
  • 45. ● https://github.com/arrayfire/arrayfire-rb ● https://github.com/prasunanand/rbcuda ● https://github.com/prasunanand/synapse ● https://github.com/prasunanand/ninjaplot Contributions are Welcome!
  • 46. Acknowledgements 1. Pjotr Prins 2. Pradeep Garigipati 3. Kenta Murata