This document presents Numenta's cortical theory of machine intelligence. It begins by comparing computing in the 1940s-1950s with machine intelligence in the 2010s-2020s, noting that in each era many competing approaches gave way to a single dominant paradigm because of flexibility, scalability, and network effects. It then outlines the cortical theory behind hierarchical temporal memory (HTM), how HTM models the neocortex, and Numenta's research applying HTM to anomaly detection, language processing, and vision. It argues that machine intelligence will converge on cortical principles because the neocortex runs one common, highly adaptable learning algorithm across all sensory and motor modalities, and HTM captures that algorithm.
What the Brain Says About Machine Intelligence
1. What the Brain Says About Machine Intelligence
Jeff Hawkins
jhawkins@Numenta.com
November 21, 2014
2. The Birth of Programmable Computing: 1940s-1950s
Many approaches:
- Dedicated vs. universal
- Analog vs. digital
- Decimal vs. binary
- Wired vs. memory-based programming
- Serial vs. random access memory
One dominant paradigm:
- Universal
- Digital
- Binary
- Memory-based programming
- Two-tier memory
Why Did One Paradigm Win?
- Network effects
Why Did This Paradigm Win?
- Most flexible
- Most scalable
3. The Birth of Machine Intelligence: 2010s-2020s
Many approaches:
- Specific vs. universal algorithms
- Mathematical vs. memory-based
- Batch vs. on-line learning
- Labeled vs. behavior-based learning
One dominant paradigm:
- Universal algorithms
- Memory-based
- On-line learning
- Behavior-based learning
Why Will One Paradigm Win?
- Network effects
Why Will This Paradigm Win?
- Most flexible
- Most scalable
How Do We Know This is Going to Happen?
- Brain is proof case
- We have made great progress
4. Numenta's Mission
1) Discover the operating principles of the neocortex.
2) Create machine intelligence technology based on neocortical principles.
Talk Topics
- Cortical facts
- Cortical theory
- Research roadmap
- Applications
- Thoughts on Machine Intelligence
5. What the Cortex Does
Learns a model of the world from changing patterns of sensory data.
The model generates:
- predictions
- anomalies
- actions
Most sensory changes are due to your own movement, so the neocortex learns a sensory-motor model of the world.
[Diagram: light, sound, and touch arrive at the retina, cochlea, and somatic senses, each delivering changing patterns to the cortex.]
7. Cortical Theory
Hierarchy
Cellular layers
Mini-columns
Neurons: 3-10K synapses
- 10% proximal
- 90% distal
Active dendrites
Learning = new synapses
Remarkably uniform
- anatomically
- functionally
Sheet of cells

HTM
Hierarchical Temporal Memory
1) Hierarchy of identical regions
2) Each region learns sequences
3) Stability increases going up hierarchy if
input is predictable
4) Sequences unfold going down
Questions
- What does a region do?
- What do the cellular layers do?
- How do neurons implement this?
- How does this work in hierarchy?
8. Cellular Layers
- Layer 2/3: sequence memory for inference (high-order)
- Layer 4: sequence memory for inference (sensory-motor)
- Layer 5: sequence memory for motor
- Layer 6: sequence memory for attention
Each layer is a variation of a common sequence memory algorithm.
These are universal functions. They apply to:
- all cortical regions
- all sensory-motor modalities
[Diagram: feedforward sensor data and a copy of motor commands arrive from lower regions and sub-cortical motor centers; feedback returns from higher regions.]
10. HTM Temporal Memory
Learns sequences
Recognizes and recalls sequences
Predicts next inputs
- High capacity
- Distributed
- Local learning rules
- Fault tolerant
- No sensitive parameters
- Generalizes
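To make these properties concrete, below is a minimal sketch of sequence prediction over SDRs (toy Python for illustration only; it is not NuPIC's temporal memory, and the sizes, names, and matching threshold are assumptions). Learning is local, each stored transition records only which pattern followed which, and recall is fault tolerant because a noisy input still matches by bit overlap.

```python
# Toy sketch of SDR sequence prediction (illustration only; NuPIC's
# TemporalMemory at numenta.org is the real HTM implementation).
import random

N, W = 1024, 20                       # SDR width and number of active bits

def sdr():
    """A random SDR, represented as the set of its active bit indices."""
    return frozenset(random.sample(range(N), W))

A, B, C = sdr(), sdr(), sdr()

transitions = {}                      # local learning: pattern -> next pattern

def learn(seq):
    for prev, nxt in zip(seq, seq[1:]):
        transitions[prev] = nxt

def predict(pattern, threshold=12):
    """Fault tolerant recall: a noisy input still matches a stored
    pattern if they share at least `threshold` active bits."""
    for stored, nxt in transitions.items():
        if len(stored & pattern) >= threshold:
            return nxt
    return None

learn([A, B, C])
noisy_B = frozenset(random.sample(sorted(B), 15))   # drop 25% of B's bits
assert predict(noisy_B) == C          # still predicts C after a noisy B
```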
11. HTM Temporal Memory: Not Just Another ANN
1) Cortical anatomy
- Mini-columns
- Inhibitory cells
- Cell connectivity patterns
2) Sparse distributed representations
3) Realistic neurons
- Active dendrites
- Thousands of synapses
- Learn via synapse formation
numenta.com/learn/
12. Research Roadmap
- High-order inference (layer 2/3): theory 98%, extensively tested, commercial
- Sensory-motor inference (layer 4): theory 80%, in development
- Motor sequences (layer 5): theory 50%
- Attention/feedback (layer 6): theory 30%
Streaming data
- Capabilities: prediction, anomaly detection, classification
- Applications: predictive maintenance, security, natural language processing
22. Natural Language
Document corpus (e.g. Wikipedia)
100K "Word SDRs", 128 x 128 bits each
Example: Apple - Fruit = Computer, Macintosh, Microsoft, Mac, Linux, Operating system, ….
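The expression above can be illustrated with hand-built vectors (a hypothetical sketch; real Word SDRs are derived from the corpus, e.g. by Cortical.io, and the bit assignments below are invented for clarity). Because "apple" shares bits with fruit words and with computer words, subtracting the fruit bits leaves the computer meaning:

```python
# Hand-built word SDRs as sets of active bit indices (hypothetical
# feature ranges; real Word SDRs come from a document corpus).
fruit_bits    = set(range(0, 20))          # stand-in "fruit" features
computer_bits = set(range(100, 120))       # stand-in "computer" features

apple     = fruit_bits | computer_bits     # ambiguous: fruit and company
fruit     = fruit_bits | set(range(40, 60))
macintosh = computer_bits | set(range(140, 160))
banana    = fruit_bits | set(range(60, 80))

query = apple - fruit                      # "Apple" minus "Fruit"
vocab = {"Macintosh": macintosh, "banana": banana}
best = max(vocab, key=lambda w: len(vocab[w] & query))
print(best)                                # Macintosh: the computer sense remains
```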
23. Training set: sequences of Word SDRs (Word 1, Word 2, Word 3)
frog eats flies
cow eats grain
elephant eats leaves
goat eats grass
wolf eats rabbit
cat likes ball
elephant likes water
sheep eats grass
cat eats salmon
wolf eats mice
lion eats cow
dog likes sleep
elephant likes water
cat likes ball
coyote eats rodent
coyote eats rabbit
wolf eats squirrel
dog likes sleep
cat likes ball
Each three-word sentence is fed to the HTM as a sequence of Word SDRs.
24. Same training set, presented to the HTM as sequences of Word SDRs.
Query: "fox" eats … ?
("fox" appears nowhere in the training set.)
25. Same training set. Answer: "fox" eats … rodent
- Learning is unsupervised
- Semantic generalization: the HTM has never seen "fox", but fox's Word SDR overlaps those of wolf and coyote, so it predicts a wolf/coyote-like food
- Works across languages
- Many applications: intelligent search, sentiment analysis, semantic filtering
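A minimal sketch of that generalization (toy Python; the SDRs are built with controlled overlap rather than learned from a corpus, and the predictor is a simple nearest-overlap match, not an HTM):

```python
# Semantic generalization via SDR overlap (toy illustration).
import random

N, W = 1024, 20                     # SDR width and number of active bits

def random_sdr():
    return set(random.sample(range(N), W))

def related_sdr(base, shared=12):
    """An SDR sharing `shared` bits with `base`, i.e. a similar word."""
    keep = set(random.sample(sorted(base), shared))
    rest = random.sample([i for i in range(N) if i not in base], W - shared)
    return keep | set(rest)

wolf   = random_sdr()
coyote = related_sdr(wolf)          # semantically close to wolf
fox    = related_sdr(wolf)          # never seen during training

# Train on (subject SDR -> object word) pairs from the corpus.
memory = [(wolf, "rabbit"), (coyote, "rodent")]

def predict(sdr):
    # Nearest stored pattern by bit overlap.
    return max(memory, key=lambda m: len(m[0] & sdr))[1]

print(predict(fox))   # "rabbit" or "rodent": the unseen word "fox"
                      # inherits a plausible prediction from its overlap
                      # with wolf and coyote
```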
26. Server metrics, human metrics, natural language, GPS data, EEG data, financial data: all these applications run on the exact same HTM code.
27. Research Roadmap (continued)
Streaming data: capabilities and applications as above (IT, security, natural language processing)
Static data (via active learning)
- Capabilities: classification, prediction
- Applications: vision image classification, network classification, classification of connected graphs
28. Research Roadmap (continued)
Static and/or streaming data
- Capabilities: goal-oriented behavior
- Applications: robotics, smart bots, proactive defense
29. Research Roadmap (continued)
- Enables: multi-sensory modalities, multi-behavioral modalities
30. Research Transparency
- Algorithms are documented
- Multiple independent implementations
- Numenta's software is open source (GPLv3): NuPIC, www.Numenta.org
- Numenta's daily research code is online
- Active discussion groups for theory and implementation
- Collaborative: IBM Almaden Research (San Jose, CA); DARPA (Washington, D.C.); Cortical.IO (Austria)
45. Learning Transitions
- This is a first-order sequence memory.
- Multiple predictions can occur at once (A-B, A-C, A-D: after A, all of B, C, and D are predicted).
- It cannot learn A-B-C-D vs. X-B-C-Y.
- Mini-columns turn this into a high-order sequence memory (see the sketch below).
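A toy illustration of that limitation (hypothetical Python; the class is invented for this example): a first-order memory stores only pairwise transitions, so after C it cannot tell whether the sequence began with A or with X.

```python
# First-order transition memory: remembers only symbol -> next-symbol.
from collections import defaultdict

class FirstOrderMemory:
    def __init__(self):
        self.transitions = defaultdict(set)

    def learn(self, sequence):
        for prev, nxt in zip(sequence, sequence[1:]):
            self.transitions[prev].add(nxt)

    def predict(self, symbol):
        return self.transitions[symbol]

m = FirstOrderMemory()
m.learn("ABCD")
m.learn("XBCY")
print(m.predict("A"))  # {'B'}
print(m.predict("C"))  # {'D', 'Y'}: both are predicted, because the
                       # memory has no record of how the sequence began
```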
46. Forming High-Order Representations
Feedforward input causes sparse activation of columns.
- Unpredicted input: the column bursts (all of its cells become active).
- Predicted input: a highly sparse, unique pattern (only the predicted cell in each column becomes active).
47. Representing High-Order Sequences
Before training: the sequences A-B-C-D and X-B-C-Y activate the same columns for B and for C, so the shared middle of the two sequences is ambiguous.
After training: A-B'-C'-D' vs. X-B''-C''-Y''. The same columns are active, but only one cell is active per column, and a different cell for each sequence context.
IF 40 active columns and 10 cells per column,
THEN there are 10^40 ways to represent the same input in different contexts.
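Here is a small sketch of "same columns, different cells" (a hypothetical encoding for illustration; the real temporal memory selects cells through learned dendritic connections, not a lookup table):

```python
# Toy high-order encoding: the symbol picks the column, the sequence
# context picks the cell within that column.
from itertools import count

CELLS_PER_COLUMN = 10
_cell_ids = {}
_next_id = count()

def cell_for_context(prefix):
    """Deterministically assign one of the column's cells to a context."""
    if prefix not in _cell_ids:
        _cell_ids[prefix] = next(_next_id) % CELLS_PER_COLUMN
    return _cell_ids[prefix]

def encode(sequence):
    return [(symbol, cell_for_context(sequence[:i]))
            for i, symbol in enumerate(sequence)]

print(encode("ABCD"))  # [('A', 0), ('B', 1), ('C', 2), ('D', 3)]
print(encode("XBCY"))  # [('X', 0), ('B', 4), ('C', 5), ('Y', 6)]:
                       # B and C occupy the same columns as before but
                       # different cells, so D vs. Y remain predictable
```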
48. SDR Properties
1) Similarity: shared bits = semantic similarity; subsampling is OK
2) Store and compare: store only the indices of the active bits (e.g. indices 1, 2, 3, 4, 5, …, 40); a subsample (indices 1, 2, …, 10) is enough to compare
3) Union membership: OR together many SDRs (e.g. ten SDRs at 2% sparsity give a union at roughly 20%), then ask "is this SDR a member?"
49. What Can Be Done With Software
1 layer: 2,048 columns, 65,000 neurons, 300M synapses
30 msec per learning-inference-prediction step
About 10^-6 of human cortex
50. Challenges and Opportunities for Neuromorphic HW
Challenges:
- Dendritic regions
- Active dendrites
- 1,000s of synapses, 10,000s of potential synapses
- Continuous learning
Opportunities:
- Low-precision memory (synapses)
- Fault tolerant: memory, connectivity, neurons, natural recovery
- Simple activation states (no spikes)
- Connectivity: very sparse, topological
51. Cellular Layers (recap)
Same four-layer diagram as slide 8: each layer implements a variation of a common sequence memory algorithm, with feedforward input from sensors/lower cortex, feedback from higher cortex, and output to motor centers.
52. Why Will Machine Intelligence Be Based on Cortical Principles?
1) The cortex uses a common learning algorithm (vision, hearing, touch, behavior).
2) The cortical algorithm is incredibly adaptable (languages, engineering, science, arts, …).
3) Network effects: hardware and software efforts will focus on the most universal solution.
53. Cellular Layers (recap)
Each layer is a variation of a common sequence memory algorithm.
Inputs/outputs define the role of each layer: feedforward from sensors/lower cortex, feedback from higher cortex, and projections to sub-cortical motor centers.
56. Sparse Distributed Representations (SDRs)
Sparse Distributed Representations are used everywhere in the cortex:
- Sensory perception
- Planning
- Motor control
- Prediction
- Attention
57. Sparse Distributed Representations
What they are:
• Many bits (thousands)
• Few 1's, mostly 0's
• Example: 2,000 bits, 2% active
• Each bit has semantic meaning
• No bit is essential
01000000000000000001000000000000000000000000000000000010000…………01000
Desirable attributes:
• High capacity
• Robust to noise and deletion
• Efficient and fast
• Enable new operations
58. SDR Operations
1) Similarity: shared bits = semantic similarity; subsampling is OK
2) Store and compare: store only the indices of the active bits (e.g. indices 1, 2, 3, 4, 5, …, 40); a subsample (indices 1, 2, …, 10) is enough to compare
3) Union membership: OR together many SDRs (e.g. ten SDRs at 2% sparsity give a union at roughly 20%), then test whether a given SDR is a member
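The three operations fit in a few lines (a toy sketch using Python sets; the sizes follow the slides, and the union-membership test is probabilistic rather than exact, though it essentially never false-positives at this sparsity):

```python
# SDR operations sketched with sets of active bit indices.
import random

N, W = 2048, 40                          # 2% sparsity, as on the SDR slides

def sdr():
    return frozenset(random.sample(range(N), W))

# 1) Similarity: shared active bits indicate semantic similarity.
a, b = sdr(), sdr()
overlap = len(a & b)                     # near zero for unrelated SDRs

# 2) Store and compare: keep only the indices of active bits; a
#    subsample of ~10 of the 40 bits is enough to recognize a match.
stored = frozenset(random.sample(sorted(a), 10))
assert stored <= a                       # the subsample still matches a

# 3) Union membership: OR several SDRs together; a candidate matches
#    if its active bits all fall inside the union.
patterns = [sdr() for _ in range(10)]
union = frozenset().union(*patterns)     # roughly 20% of bits active
assert all(p <= union for p in patterns)
assert not (sdr() <= union)              # a random SDR essentially never
                                         # lands entirely inside the union
```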
67. SDR Basics
x = 0100000000000000000100000000000110000000
• Large number of neurons, few active at once
• Every cell represents something
• Information is distributed
• SDRs are binary
Attributes:
• Extremely high capacity
• Robust to noise and deletions
• Have many desirable properties
• Solve the semantic representation problem
10 to 15 synapses are sufficient to recognize patterns in thousands of cells. A single dendrite can recognize multiple unique patterns without confusion.
68. Example: SDR Classification Capacity in Presence of Noise
• n = number of bits in the SDR
• w = number of 1 bits
• w_x = number of 1 bits in a stored vector x
• Ω_x(n, w, b) = set of vectors with w active bits that overlap x in exactly b bits
• θ = match threshold (minimum overlap)

$$|\Omega_x(n,w,b)| = \binom{w_x}{b} \times \binom{n-w_x}{w-b}$$

Probability of a false positive for one stored pattern:

$$fp_w^n(\theta) = \frac{\sum_{b=\theta}^{w} |\Omega_x(n,w,b)|}{\binom{n}{w}}$$

Probability of a false positive for M stored patterns:

$$fp_X(\theta) \le \sum_{i=0}^{M-1} fp_{w_{x_i}}^{n}(\theta)$$

n = 2048, w = 40: with 50% noise, you can classify 10^15 patterns with an error < 10^-11.
n = 64, w = 12: with 33% noise, you can classify only 10 patterns with an error of 0.04%.
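The formulas translate directly into Python (a sketch; the function names are mine, and math.comb requires Python 3.8+; exact integer arithmetic keeps the tiny probabilities accurate):

```python
# False positive rate of SDR matching, following the formulas above.
from math import comb

def overlap_set_size(n, w, b, wx=None):
    """|Omega_x(n, w, b)|: w-bit SDRs that overlap x in exactly b bits."""
    wx = w if wx is None else wx
    return comb(wx, b) * comb(n - wx, w - b)

def fp_rate(n, w, theta):
    """Probability that a random SDR matches x with overlap >= theta."""
    matches = sum(overlap_set_size(n, w, b) for b in range(theta, w + 1))
    return matches / comb(n, w)

# n = 2048, w = 40; 50% noise means a match threshold theta = w/2 = 20.
print(fp_rate(2048, 40, 20))   # astronomically small for one pattern;
                               # for M patterns, fp_total <= M * fp_rate

# n = 64, w = 12; 33% noise means theta = 8. Roughly 4e-5 per pattern,
# so 10 stored patterns give ~0.04% total error, matching the slide.
print(fp_rate(64, 12, 8))
```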
Link.to.whitepaper.com