This document discusses tail call elimination in the OpenSmalltalk programming language. It begins by defining what a tail call is and providing an example in Smalltalk. It then describes how tail calls are implemented in the stack interpreter and Cog VM without and with tail call elimination. The document finds that tail call elimination provides significant performance improvements for tail recursive functions and more modest gains for general applications. Future work includes better supporting polymorphic calls and reducing redundant code generation in the Cog VM.
1. Tail Call Elimination in Opensmalltalk
Matthew Ralston Dave Mason
Department of Computer Science
Ryerson University
c 2019 Matthew Ralston
2. Agenda
What is a Tail Call?
Tail Call Elimination
Stack Interpreter Implementation
Cog VM JIT Implementation
Results
Conclusions and Future Work
3. What is a Tail Call?
Call followed by a return
Smalltalk Example
Array class>>new: sizeRequested
^ self basicNew: sizeRequested
Call to basicNew: is a tail call
Immediately followed by a return
4. What is a Tail Call?
Call followed by a return
Smalltalk Example
Array class>>new: sizeRequested
^ self basicNew: sizeRequested
Call to basicNew: is a tail call
Immediately followed by a return
5. What is a Tail Call?
Call followed by a return
Smalltalk Example
Array class>>new: sizeRequested
^ self basicNew: sizeRequested
Call to basicNew: is a tail call
Immediately followed by a return
6. What is a Tail Call?
Call followed by a return
Smalltalk Example
Array class>>new: sizeRequested
^ self basicNew: sizeRequested
Call to basicNew: is a tail call
Immediately followed by a return
7. Tail call in Bytecode
Bytecode for Array class»new:
self
pushTemp: 0
send: basicNew:
returnTop
8. Calling new without TCE
Stack after calling new:
new:’s stack frame
new:’s argument
Sender of new:’s frame
9. Calling new without TCE
Stack after calling new:
new:’s stack frame
new:’s argument
Sender of new:’s frame
10. Calling new without TCE - cont.
Stack after calling basicNew:
basicNew:’s stack frame
basicNew:’s argument
new:’s stack frame
new:’s argument
Sender of new:’s frame
11. Calling new without TCE - cont.
Stack after calling basicNew:
basicNew:’s stack frame
basicNew:’s argument
new:’s stack frame
new:’s argument
Sender of new:’s frame
12. Calling new without TCE - cont.
Stack after returning from basicNew:
basicNew:’s result
new:’s stack frame
new:’s argument
Sender of new:’s frame
13. Calling new without TCE - cont.
Stack after returning from basicNew:
basicNew:’s result
new:’s stack frame
new:’s argument
Sender of new:’s frame
14. Calling new without TCE - cont.
Stack after returning from new:
new:’s result
Sender of new:’s frame
15. Calling new without TCE - cont.
Stack after returning from new:
new:’s result
Sender of new:’s frame
23. Motivation
Well Known Optimization
Support in functional languages, CLR, etc.
Not supported in JVM, Python
Necessary for functional languages
Can be useful for OO as well
In common patterns like Visitor Pattern
Iteration in Smalltalk
24. Motivation
Well Known Optimization
Support in functional languages, CLR, etc.
Not supported in JVM, Python
Necessary for functional languages
Can be useful for OO as well
In common patterns like Visitor Pattern
Iteration in Smalltalk
25. Motivation
Well Known Optimization
Support in functional languages, CLR, etc.
Not supported in JVM, Python
Necessary for functional languages
Can be useful for OO as well
In common patterns like Visitor Pattern
Iteration in Smalltalk
26. Motivation
Well Known Optimization
Support in functional languages, CLR, etc.
Not supported in JVM, Python
Necessary for functional languages
Can be useful for OO as well
In common patterns like Visitor Pattern
Iteration in Smalltalk
27. Motivation
Well Known Optimization
Support in functional languages, CLR, etc.
Not supported in JVM, Python
Necessary for functional languages
Can be useful for OO as well
In common patterns like Visitor Pattern
Iteration in Smalltalk
28. Motivation
Well Known Optimization
Support in functional languages, CLR, etc.
Not supported in JVM, Python
Necessary for functional languages
Can be useful for OO as well
In common patterns like Visitor Pattern
Iteration in Smalltalk
29. Motivation
Well Known Optimization
Support in functional languages, CLR, etc.
Not supported in JVM, Python
Necessary for functional languages
Can be useful for OO as well
In common patterns like Visitor Pattern
Iteration in Smalltalk
30. Frequency of Tail Calls
Static Frequency
Platform Packages Tail Calls Total Percentage
Squeak - All 25162 407971 6.17
Squeak - Compiler 863 8747 9.87
Higher Dynamic Frequency
Action Tail Calls Total Percentage
Startup 47669 219054 21.76
Recompile 92041349 551250016 16.70
31. Frequency of Tail Calls
Static Frequency
Platform Packages Tail Calls Total Percentage
Squeak - All 25162 407971 6.17
Squeak - Compiler 863 8747 9.87
Higher Dynamic Frequency
Action Tail Calls Total Percentage
Startup 47669 219054 21.76
Recompile 92041349 551250016 16.70
34. Stack Interpreter
Interpreter only
Look ahead to next bytecode for return
Switch to tail call eliminating implementation
Remove the existing stack frame and arguments
Pushes the new arguments and calls the next method
35. Stack Interpreter
Interpreter only
Look ahead to next bytecode for return
Switch to tail call eliminating implementation
Remove the existing stack frame and arguments
Pushes the new arguments and calls the next method
36. Stack Interpreter
Interpreter only
Look ahead to next bytecode for return
Switch to tail call eliminating implementation
Remove the existing stack frame and arguments
Pushes the new arguments and calls the next method
37. Stack Interpreter
Interpreter only
Look ahead to next bytecode for return
Switch to tail call eliminating implementation
Remove the existing stack frame and arguments
Pushes the new arguments and calls the next method
38. Stack Interpreter
Interpreter only
Look ahead to next bytecode for return
Switch to tail call eliminating implementation
Remove the existing stack frame and arguments
Pushes the new arguments and calls the next method
39. Cog JIT Compiler
Not optimizing interpreted calls
JIT compile tail calls as jumps
Cost of tail call check moved to JIT compile time
40. Cog JIT Compiler
Not optimizing interpreted calls
JIT compile tail calls as jumps
Cost of tail call check moved to JIT compile time
41. Cog JIT Compiler
Not optimizing interpreted calls
JIT compile tail calls as jumps
Cost of tail call check moved to JIT compile time
42. Cog VM - Inline Caching
Cog VM - levels of inline caching
No Inline Cache
Monomorphic Send Sites
Polymorphic Send Sites
Megamorphic Send Sites
43. Cog VM - Inline Caching
Cog VM - levels of inline caching
No Inline Cache
Monomorphic Send Sites
Polymorphic Send Sites
Megamorphic Send Sites
44. Cog VM - Inline Caching
Cog VM - levels of inline caching
No Inline Cache
Monomorphic Send Sites
Polymorphic Send Sites
Megamorphic Send Sites
45. Cog VM - Inline Caching
Cog VM - levels of inline caching
No Inline Cache
Monomorphic Send Sites
Polymorphic Send Sites
Megamorphic Send Sites
46. Cog VM - Inline Caching
Cog VM - levels of inline caching
No Inline Cache
Monomorphic Send Sites
Polymorphic Send Sites
Megamorphic Send Sites
47. Cog VM - Inline Caching - cont.
Bypass for unlinked send sites
Activate for monomorphic send sites
Bypass for polymorphic and megamorphic send sites
Need tail call and non-tail call JIT code for each send
Copy method lookup code to sender in tail calls
48. Cog VM - Inline Caching - cont.
Bypass for unlinked send sites
Activate for monomorphic send sites
Bypass for polymorphic and megamorphic send sites
Need tail call and non-tail call JIT code for each send
Copy method lookup code to sender in tail calls
49. Cog VM - Inline Caching - cont.
Bypass for unlinked send sites
Activate for monomorphic send sites
Bypass for polymorphic and megamorphic send sites
Need tail call and non-tail call JIT code for each send
Copy method lookup code to sender in tail calls
50. Cog VM - Inline Caching - cont.
Bypass for unlinked send sites
Activate for monomorphic send sites
Bypass for polymorphic and megamorphic send sites
Need tail call and non-tail call JIT code for each send
Copy method lookup code to sender in tail calls
51. Cog VM - Inline Caching - cont.
Bypass for unlinked send sites
Activate for monomorphic send sites
Bypass for polymorphic and megamorphic send sites
Need tail call and non-tail call JIT code for each send
Copy method lookup code to sender in tail calls
60. Conclusions
Significant improvements in execution time for tail recursive cases
Improvements in execution time for most general applications
Memory usage is only significantly affected for deep recursive call
chains
Stack Interpreter outperforms Cog in some tests
Stack Interpreter supports polymorphic calls
Increased JIT compiled method size leads to overhead
61. Conclusions
Significant improvements in execution time for tail recursive cases
Improvements in execution time for most general applications
Memory usage is only significantly affected for deep recursive call
chains
Stack Interpreter outperforms Cog in some tests
Stack Interpreter supports polymorphic calls
Increased JIT compiled method size leads to overhead
62. Conclusions
Significant improvements in execution time for tail recursive cases
Improvements in execution time for most general applications
Memory usage is only significantly affected for deep recursive call
chains
Stack Interpreter outperforms Cog in some tests
Stack Interpreter supports polymorphic calls
Increased JIT compiled method size leads to overhead
63. Conclusions
Significant improvements in execution time for tail recursive cases
Improvements in execution time for most general applications
Memory usage is only significantly affected for deep recursive call
chains
Stack Interpreter outperforms Cog in some tests
Stack Interpreter supports polymorphic calls
Increased JIT compiled method size leads to overhead
64. Conclusions
Significant improvements in execution time for tail recursive cases
Improvements in execution time for most general applications
Memory usage is only significantly affected for deep recursive call
chains
Stack Interpreter outperforms Cog in some tests
Stack Interpreter supports polymorphic calls
Increased JIT compiled method size leads to overhead
65. Conclusions
Significant improvements in execution time for tail recursive cases
Improvements in execution time for most general applications
Memory usage is only significantly affected for deep recursive call
chains
Stack Interpreter outperforms Cog in some tests
Stack Interpreter supports polymorphic calls
Increased JIT compiled method size leads to overhead
66. Conclusions and Future Work
Support polymorphic caches (partially complete!)
Reduce redundant code generation for Cog
67. Conclusions and Future Work
Support polymorphic caches (partially complete!)
Reduce redundant code generation for Cog