Amit Sheth's keynote at IEEE BigData 2014, Oct 29, 2014.
Abstract from:
http://cci.drexel.edu/bigdata/bigdata2014/keynotespeech.htm
Big Data has captured a lot of interest in industry, with the emphasis on the challenges of the four Vs of Big Data: Volume, Variety, Velocity, and Veracity, and their applications to drive value for businesses. Recently, there has been rapid growth in situations where a big data challenge relates to making individually relevant decisions. A key example is personalized digital health, which relates to making better decisions about our health, fitness, and well-being. Consider, for instance, understanding the reasons for and avoiding an asthma attack based on Big Data in the form of personal health signals (e.g., physiological data measured by devices/sensors or Internet of Things around humans, on the humans, and inside/within the humans), public health signals (e.g., information coming from the healthcare system such as hospital admissions), and population health signals (such as Tweets by people related to asthma occurrences and allergens, Web services providing pollen and smog information). However, no individual has the ability to process all these data without the help of appropriate technology, and each human has a different set of relevant data!
In this talk, I will describe Smart Data that is realized by extracting value from Big Data, to benefit not just large companies but each individual. If my child is an asthma patient, for all the data relevant to my child with the four V-challenges, what I care about is simply, “How is her current health, and what is the risk of her having an asthma attack in her current situation (now and today), especially if that risk has changed?” As I will show, Smart Data that gives such personalized and actionable information will need to utilize metadata, use domain-specific knowledge, employ semantics and intelligent processing, and go beyond traditional reliance on ML and NLP. I will motivate the need for a synergistic combination of techniques similar to the close interworking of the top brain and the bottom brain in the cognitive models.
For harnessing volume, I will discuss the concept of Semantic Perception, that is, how to convert massive amounts of data into information, meaning, and insight useful for human decision-making. For dealing with Variety, I will discuss experience in using agreement represented in the form of ontologies, domain models, or vocabularies, to support semantic interoperability and integration. For Velocity, I will discuss somewhat more recent work on Continuous Semantics, which seeks to use dynamically created models of new objects, concepts, and relationships, using them to better understand new cues in the data that capture rapidly evolving events and situations.
Smart Data applications in development at Kno.e.sis come from the domains of personalized health, energy, disaster response, and smart city.
Smart Data - How you and I will exploit Big Data for personalized digital health and many other activities
1. Put Knoesis Banner
Smart Data - How you and I will exploit Big Data for
personalized digital health and many other activities
Keynote at IEEE BigData 2014, Oct 28, 2014
Amit Sheth
LexisNexis Ohio Eminent Scholar & Exec. Director,
The Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
Wright State, USA
2. 2
Thanks: My team (missing Pramod, Hemant, ...)
Collaborators: Clinicians: Dr. William Abrahams (OSU-Wexner), Dr. Shalini Forbis (Dayton Children's), Dr. Sangeeta Agrawal (VA); Valerie Shalin (WSU, cognitive scientist), Payam Barnaghi (U-Surrey), Ramesh Jain (UCI), …
Funding: NSF (esp. IIS-1111183 “SoCS: Social Media Enhanced Organizational Sensemaking in Emergency
Response,”), AFRL, NIH, Industry….
3. 3
Big Data 2014
http://hrboss.com/hiringboss/articles/big-data-infographic
4. Only 0.5% to 1% of
the data is used for
analysis.
http://www.csc.com/insights/flxwd/78931-big_data_growth_just_beginning_to_explode 4
http://www.guardian.co.uk/news/datablog/2012/dec/19/big-data-study-digital-universe-global-volume
5. Variety – not just structure but modality: multimodal, multisensory
Semi structured
5
6. Velocity
Fast Data
Rapid Changes
Real-Time/Stream Analysis
Current application examples: financial services, stock brokerage, weather tracking, movies/entertainment and online retail 6
7. 7
Ever Increasing Connected Devices and People
About 2 billion of the 5+ billion have data connections – so they perform “citizen sensing”.
And there are more devices connected to the Internet than the entire human population.
These ~2 billion citizen sensors and 10 billion devices & objects connected to the Internet
make this an era of IoT (Internet of Things) and Internet of Everything (IoE).
http://www.cisco.com/web/about/ac79/docs/innov/IoT_IBSG_0411FINAL.pdf
8. 8
Internet of Things / Everything : Future Trends
“The next wave of dramatic Internet growth will come through the confluence of people,
process, data, and things — the Internet of Everything (IoE).”
- CISCO IBSG, 2013
Beyond the IoE-based infrastructure, it is the possibility of developing applications that span
the Physical, Cyber, and Social Worlds that is very exciting.
http://www.cisco.com/web/about/ac79/docs/innov/IoE_Economy.pdf
9. 10
What has not changed?
We are still working on the simpler representations of the real-world!
http://artint.info/html/ArtInt_8.html
http://en.wikipedia.org/wiki/Traffic_congestion
solve
represent interpret real-world
simplified representation
compute
10. 11
What should change?
solve
represent interpret real-world
richer representation
compute
We need computational paradigms to tap into the
rich pulse of the human populace, and utilize
diverse data
Represent, capture, and compute with richer and fine-grained
representations of real-world problems
+
Richer representation of
traffic observations
Effective solutions
People interpreting a
real-world event
11. Physical-Cyber-Social Computing for Actionable Insights from Multimodal Data
High CO influences
Wheezing Level (Low/High)
High CO
Reduced
CO level =>
better
Asthma
control
High Wheeze
Vertical operators (semantic abstraction) operate on artifacts at each level and lift them to the next level.
Horizontal operators (semantic integration) operate on data from heterogeneous sources to create integrated/correlated data streams.
Figure labels: Carbon Monoxide, Luminosity, Wheeze; High Luminosity, Low Luminosity, Low Wheeze
“a holistic treatment of data, information, and knowledge from the PCS worlds to integrate, correlate, interpret, and provide contextually relevant abstractions to humans.”1
1 Amit Sheth, Pramod Anantharam, Cory Henson, 'Physical-Cyber-Social Computing: An Early 21st Century Approach,' IEEE Intelligent Systems, vol. 28, no. 1, pp. 78-82, Jan.-Feb. 2013. http://doi.ieeecomputersociety.org/10.1109/MIS.2013.20
12
12. • Healthcare:
ADHF, Asthma,
GI, Dementia
– Using kHealth system
• Traffic Analytics:
– Understanding traffic flow
• Social Media Analysis :
– Crisis coordination using Twitris
13
I will use applications in 3 domains to demonstrate
13. 14
MIT Technology Review, 2012
The Patient of the Future
http://www.technologyreview.com/featuredstory/426968/the-patient-of-the-future/
14. Asthma: A Multi-faceted and Symptomatically Variable Health Challenge
15
Personal level
Signals
Public level
Signals
Population level
Signals
“ … survey indicates that adult patients and caregivers of pediatric patients
report variability in asthma symptoms over time, even when asthma medications are taken.”1
1Marcus, Philip, Kevin R. Murphy, Abid Rahman, and Christopher D. O’Brien. "Intrapatient symptom
variability in adults and children with asthma: Results of a survey." Advances in therapy 22, no. 5 (2005): 488-497.
15. Far better an approximate answer to the right question, which is often vague, than the exact answer to the
wrong question, which can always be made precise.
-- John Tukey, Ann. Math. Stat. 33 (1962)
16
Asthma: Actionable Information
How is my Asthma control?
Should I take additional medication today?
How can I reduce my asthma attacks at home?
16. 17
Asthma: Challenges in Heterogeneity, Variability, and Personalization
Contextual Personalized Actionable
Personal level
Signals
Public level
Signals
Population level
Signals
Domain
Knowledge
http://www.tuberktoraks.org/managete/fu_folder/2011-03/html/2011-3-291-311.html
OR
17. 18
My 2004-2005 formulation of SMART DATA - Semagix
Formulation of Smart Data
strategy providing services
for Search, Explore, Notify.
“Use of Ontologies and
Data repositories to gain
relevant insights”
18. Smart Data (2014 retake)
Smart data makes sense out of Big data
It provides value from harnessing the
challenges posed by the volume, velocity, variety,
and veracity of big data, in turn providing
actionable information and improving decision
making.
19
19. Another perspective on Smart Data
OF human, BY human FOR human
Smart data is about extracting value by
improving human involvement in data creation,
processing and consumption.
It is about (improving)
computing for human experience.
20
20. ‘OF human’ : Relevant Real-time Data Streams for Human Experience
Petabytes of Physical(sensory)-Cyber-Social Data everyday!
More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
21
21. Use of Prior Human-created Knowledge Models
22
‘BY human’: Involving Crowd Intelligence in data processing
Crowdsourcing and Domain-expert guided
Machine Learning Modeling
22. Weather Application
Asthma Healthcare
Application
Personal
Public Health
Detection of events, such as wheezing
sound, indoor temperature, humidity,
dust, and CO level
High CO content at
home during day
23
‘FOR human’ : Improving Human Experience (Smart Health)
Population Level
Action in the Physical World
Luminosity
CO level
CO in gush
during day time
23. ‘FOR human’ : Improving Human Experience (Smart Energy)
Weather Application
Power Monitoring Application
Personal Level Observations
Electricity usage over a day, device at
work, power consumption, cost/kWh,
heat index, relative humidity, and public
events from social stream
24
Population Level Observations
Action in the Physical World
Washing and drying has
resulted in significant cost
since it was done during peak
load period. Consider
changing this time to night.
24. 25
Big Data is pervasive -
It is Smart Data that matters!
25. DATA
Observations from
machine and social
sensors
KNOWLEDGE
for interpretation of
observations
ACTIONS
situation awareness useful
for decision making
26
Primary challenge is to bridge the gap between data and
actions
Contextualization
Personalization
26. “the top part of the brain is involved in setting up plans, controlling movements, registering
changes in where objects are located in space, and revising plans when anticipated events
do not occur.”
27
In the process, engaging both top and bottom brain
“bottom is involved in classifying and interpreting what we perceive, and allows us to
confer meaning on the world.”
“The Theory of Cognitive Modes* emphasizes the constant and
close interaction of the top and bottom systems. They don’t work in
isolation — or in competition — but seamlessly together.”
*http://brainblogger.com/2013/12/19/top-brain-bottom-brain-part-3-the-theory-of-cognitive-modes/
by G. Wayne Miller and Stephen M. Kosslyn, PhD | December 19, 2013
27. 28
Can we take inspiration from the ‘Theory of Cognitive Modes’ to develop a
computational model?
T & B B T
Mover Perceiver Simulator Adaptor
http://online.stanford.edu/pgm-fa12
T- Top brain, B- Bottom brain
our baby step toward
a computational model for perception
(Machine Perception)
28. 29
Toward a symbiotic partnership between machines and people
J.
McCarthy
M.
Weiser
D.
Engelbart
J. C. R. Licklider
http://j.mp/k-che
http://knoesis.org/index.php/Computing_For_Human_Experience
29. 30
How are machines supposed to integrate and interpret sensor data?
RDF OWL
Semantic Sensor Networks (SSN)
30. 31
W3C Semantic Sensor Network Ontology
Lefort, L., Henson, C., Taylor, K., Barnaghi, P., Compton, M., Corcho, O., Garcia-Castro, R., Graybeal, J., Herzog, A., Janowicz, K.,
Neuhaus, H., Nikolov, A., and Page, K.: Semantic Sensor Network XG Final Report, W3C Incubator Group Report (2011).
32. SSN
Ontology
3 Interpreted data
(abductive)
[in OWL]
e.g., diagnosis
2 Interpreted data
(deductive)
[in OWL]
e.g., threshold
1 Annotated Data
[in RDF]
e.g., label
0 Raw Data
[in TEXT]
e.g., number
Intellego
Hyperthyroidism
… …
Elevated
Blood
Pressure
Systolic blood pressure of 150 mmHg
“150”
33
Levels of Abstraction
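The first three levels of this ladder can be sketched in a few lines of Python. This is a minimal illustration, not the Intellego implementation: the `Observation` type, the function names, and the 140 mmHg cutoff are assumptions made for the example (the slide only says "threshold"). The abductive step (level 3) is left out here.

```python
from dataclasses import dataclass

# Assumption: 140 mmHg is a placeholder cutoff chosen for this sketch;
# the slide does not state which threshold is used.
ELEVATED_SYSTOLIC_MMHG = 140


@dataclass
class Observation:
    property_name: str
    value: float
    unit: str


def annotate(raw: str) -> Observation:
    """Level 0 -> 1: attach a label and a unit to a raw reading."""
    return Observation("systolic blood pressure", float(raw), "mmHg")


def deduce(obs: Observation) -> str:
    """Level 1 -> 2: apply a threshold rule to get a qualitative abstraction."""
    if obs.value >= ELEVATED_SYSTOLIC_MMHG:
        return "elevated blood pressure"
    return "normal blood pressure"


obs = annotate("150")
abstraction = deduce(obs)
print(obs, "->", abstraction)
```

The raw string "150" carries no meaning on its own; each step adds a little context until the result is something a person can act on.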
33. 34
What if we could automate this interpretation of Data?
… and do it efficiently and at scale
34. 35
Making sense of sensor data with
Henson et al., 'An Ontological Approach to Focusing Attention and Enhancing Machine Perception on the Web,' Applied Ontology, 2011
35. 36
People are good at making sense of sensory input
What can we learn from cognitive models of perception?
The key ingredient is prior knowledge
36. Observe
Property
* based on Neisser’s cognitive model of perception
Perceive
Feature
Explanation
Discrimination
1
2
Translating low-level signals
into high-level knowledge
Focusing attention on those
aspects of the environment that
provide useful information
Prior Knowledge
37
Convert a large number of observations to semantic
abstractions that provide insights and translate into
decisions
Perception Cycle*
37. 38
To enable machine perception,
Semantic Web technology is used to integrate
sensor data with prior knowledge on the Web
W3C SSN XG 2010-2011, SSN Ontology
38. W3C Semantic Sensor
Network (SSN) Ontology Bi-partite Graph
39
Prior knowledge on the Web
40. Observe
Property
Perceive
Feature
Explanation
1
Explanation
Translating low-level
signals into high-level
knowledge
41
Explanation is the act of choosing the objects or events that best account
for a set of observations; often referred to as hypothesis building
41. Inference to the best explanation
• In general, explanation is an abductive problem;
and hard to compute
Finding the sweet spot between abduction and OWL
• Single-feature assumption* enables use of
OWL-DL deductive reasoner
* An explanation must be a single feature which accounts for
all observed properties
42
Explanation is the act of choosing the objects or events that best account
for a set of observations; often referred to as hypothesis building
Representation of Parsimonious Covering Theory in OWL-DL
Explanation
42. Explanation
ExplanatoryFeature ≡ ∃ssn:isPropertyOf⁻.{p1} ⊓ … ⊓ ∃ssn:isPropertyOf⁻.{pn}
Observed Property Explanatory Feature
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
43
Explanatory Feature: a feature that explains the set of observed
properties
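Under the single-feature assumption, the class definition above amounts to a set intersection: a feature is explanatory if it has every observed property. The sketch below shows that reading with plain Python sets rather than an OWL-DL reasoner. The edges in `PROPERTIES_OF` are assumptions for illustration, since the slide shows the property-feature graph only as an image.

```python
# Assumed prior knowledge: the properties each feature (disorder) can exhibit.
# These edges are illustrative, not taken from the slide.
PROPERTIES_OF = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}


def explanatory_features(observed):
    """Return the features that account for every observed property."""
    observed = set(observed)
    return {f for f, props in PROPERTIES_OF.items() if observed <= props}


print(sorted(explanatory_features({"elevated blood pressure"})))
print(sorted(explanatory_features({"elevated blood pressure", "palpitations"})))
```

Each extra observation can only shrink the candidate set, which is why further observations are worth choosing carefully.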
43. Discrimination
Observe
Property
Perceive
Feature
Explanation
Discrimination
2
Focusing attention on those
aspects of the environment
that provide useful
information
44
Discrimination is the act of finding those properties that, if observed,
would help distinguish between multiple explanatory features
44. Discrimination
ExpectedProperty ≡ ∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ∃ssn:isPropertyOf.{fn}
Expected Property Explanatory Feature
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
45
Expected Property: would be explained by every explanatory feature
45. Discrimination
NotApplicableProperty ≡ ¬∃ssn:isPropertyOf.{f1} ⊓ … ⊓ ¬∃ssn:isPropertyOf.{fn}
Not Applicable Property Explanatory Feature
elevated blood pressure
clammy skin
palpitations
Hypertension
Hyperthyroidism
Pulmonary Edema
46
Not Applicable Property: would not be explained by any explanatory
feature
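The two definitions above, plus what is left over, can be sketched with set operations. A property is expected if every explanatory feature has it, not applicable if none does, and otherwise it is worth observing because it would split the candidates. The knowledge base edges and the name `classify_properties` are assumptions for this example.

```python
# Assumed prior knowledge (illustrative edges, not taken from the slide).
PROPERTIES_OF = {
    "Hypertension": {"elevated blood pressure"},
    "Hyperthyroidism": {"elevated blood pressure", "clammy skin", "palpitations"},
    "Pulmonary Edema": {"elevated blood pressure", "palpitations"},
}
ALL_PROPERTIES = set().union(*PROPERTIES_OF.values())


def classify_properties(explanatory):
    """Split properties into expected, not applicable, and discriminating."""
    prop_sets = [PROPERTIES_OF[f] for f in explanatory]
    if not prop_sets:
        raise ValueError("need at least one explanatory feature")
    expected = set.intersection(*prop_sets)                  # held by every feature
    not_applicable = ALL_PROPERTIES - set.union(*prop_sets)  # held by no feature
    discriminating = ALL_PROPERTIES - expected - not_applicable
    return expected, not_applicable, discriminating


expected, not_applicable, discriminating = classify_properties(
    ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]
)
print(sorted(expected), sorted(not_applicable), sorted(discriminating))
```

With all three features still in play, only clammy skin and palpitations are worth asking about; elevated blood pressure would tell us nothing new.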
47. Semantic scalability: Resource savings of abstracting sensor data
48
Orders of magnitude resource savings for generating and storing relevant
abstractions vs. raw observations.
Relevant abstractions
Raw observations
48. Qualities
-High BP
-Increased Weight
Entities
-Hypertension
-Hypothyroidism
kHealth
Machine Sensors
Personal Input
EMR/PHR
Comorbidity risk
score e.g.,
Charlson Index
Longitudinal studies
of cardiovascular
risks
- Find risk factors
- Validation
- domain knowledge
- domain expert
Find contribution of
each risk factor
Risk Assessment Model
Current
Observations
-Physical
-Physiological
-History
Risk Score
(e.g., 1 => continue
3 => contact clinic)
Validate correlations Model Creation
Historical
observations e.g.,
EMR, sensor
observations
49
Risk Score: from Data to Abstraction and Actionable Information
49. Use of OWL reasoner is resource intensive
(especially on resource-constrained devices),
in terms of both memory and time
• Runs out of resources with prior knowledge >> 15 nodes
• Asymptotic complexity: O(n³)
50
How do we implement machine perception efficiently on a
resource-constrained device?
50. Approach 1: Send all sensor observations to the cloud for processing
51
Approach 2 (intelligence at the edge): downscale semantic processing so that each device is capable of machine perception
51. Efficient execution of machine perception
52
Use bit vector encodings and their operations to encode prior
knowledge and execute semantic reasoning
Henson et al., 'An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices,' ISWC 2012.
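To give a feel for the idea, here is a minimal sketch of explanation done with bit vectors, using Python integers as the vectors. It is written in the spirit of the approach named above and is not the algorithm from the cited paper; the feature order, the edges in `EXPLAINED_BY`, and the function names are assumptions for the example.

```python
FEATURES = ["Hypertension", "Hyperthyroidism", "Pulmonary Edema"]

# Bit i of a mask stands for FEATURES[i] (bit 0 is the least significant).
# For each property, the mask marks the features that can exhibit it.
# These edges are assumed for illustration.
EXPLAINED_BY = {
    "elevated blood pressure": 0b111,  # all three features
    "clammy skin": 0b010,              # Hyperthyroidism only
    "palpitations": 0b110,             # Hyperthyroidism and Pulmonary Edema
}


def explain(observed):
    """AND together the masks of all observed properties."""
    mask = (1 << len(FEATURES)) - 1  # start with every feature as a candidate
    for prop in observed:
        mask &= EXPLAINED_BY[prop]   # keep features that explain this property
    return mask


def decode(mask):
    """Turn a mask back into feature names."""
    return [f for i, f in enumerate(FEATURES) if (mask >> i) & 1]


result = explain(["elevated blood pressure", "palpitations"])
print(bin(result), decode(result))
```

Each observation costs one bitwise AND, which is the kind of operation a phone or sensor node can run cheaply.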
52. Efficiency Improvement
• Problem size increased from 10’s to 1000’s of
nodes
• Time reduced from minutes to milliseconds
• Complexity growth reduced from polynomial to
linear
O(n³) < x < O(n⁴) → O(n)
53
Evaluation on a mobile device
53. 1 Translate low-level data to high-level knowledge
Machine perception can be used to convert low-level
sensory signals into high-level knowledge useful for
decision making
2 Prior knowledge is the key to perception
Using SW technologies, machine perception can be
formalized and integrated with prior knowledge on the
Web
3 Intelligence at the edge
By downscaling semantic inference, machine
perception can execute efficiently on resource-constrained
devices
54
Semantic Perception for smarter analytics: 3 ideas to take away
56. Empowering Individuals (who are not Larry Smarr!) for their own health
Through physical monitoring and
analysis, our cellphones could act as
an early warning system to detect
serious health conditions, and
provide actionable information
canary in a coal mine
kHealth: knowledge-enabled healthcare
57
58. WHY Big Data to Smart Data: Asthma example
What can we do to avoid an asthma episode?
Understanding relationships between
health signals and asthma attacks
for providing actionable information
61
Value
What risk factors influence asthma control?
What is the contribution of each risk factor?
semantics
Velocity Veracity
Variety Volume
Real-time health signals from personal level (e.g., Wheezometer, NO in breath,
accelerometer, microphone), public health (e.g., CDC, Hospital EMR), and
population level (e.g., pollen level, CO2) arriving continuously in fine grained
samples potentially with missing information and uneven sampling frequencies.
59. kHealth: Health Signal Processing Architecture
Personal level
Signals
Public level
Signals
Population level
Signals
Domain
Knowledge
Risk Model
Events from
Social Streams
Take Medication before
going to work
Contact doctor
Avoid going out in the
evening due to high pollen
levels
Analysis
Personalized
Actionable
Information
Data Acquisition &
aggregation
62
60. 63
Asthma Domain Knowledge
Asthma Control
and Actionable Information
Domain Knowledge
Asthma Control → (Recommended Action)
Severity Level of Asthma | Daily Medication Choices for starting therapy | Not Well Controlled | Poorly Controlled
Intermittent Asthma | SABA prn | - | -
Mild Persistent Asthma | Low dose ICS | Medium ICS | Medium ICS
Moderate Persistent Asthma | Medium dose ICS alone or with LABA/montelukast | Medium ICS + LABA/montelukast or High dose ICS | Medium ICS + LABA/montelukast or High dose ICS*
Severe Persistent Asthma | High dose ICS with LABA/montelukast | Needs specialist care | Needs specialist care
ICS = inhaled corticosteroid, LABA = inhaled long-acting beta2-agonist, SABA = inhaled short-acting beta2-agonist
* consider referral to specialist
61. 64
Patient Health Score (diagnostic)
How controlled is my asthma?
Risk assessment
model
Semantic
Perception
Personal level
Signals
Public level
Signals
Domain
Knowledge
Population level
Signals
GREEN -- Well controlled
YELLOW -- Not well controlled
RED -- Poorly controlled
62. Background
Knowledge
65
Patient Health Score (diagnostic): Details
Physical-Cyber-Social System Observations Health Signal Extraction Health Signal Understanding
Personal
Population Level
Acceleration readings from
on-phone sensors
Wheeze – Yes
Do you have tightness of chest? –Yes
Risk Category assigned by
doctors
<Wheezing=Yes, time, location>
<ChestTightness=Yes, time, location>
<PollenLevel=Medium, time, location>
<Pollution=Yes, time, location>
<Activity=High, time, location>
PollenLevel
Wheezing
ChestTightness
Pollution
Activity
PollenLevel
Wheezing
ChestTightness
Pollution
Activity
RiskCategory
<PollenLevel, ChestTightness, Pollution,
Activity, Wheezing, RiskCategory>
<2, 1, 1,3, 1, RiskCategory>
<2, 1, 1,3, 1, RiskCategory>
<2, 1, 1,3, 1, RiskCategory>
<2, 1, 1,3, 1, RiskCategory>
.
.
.
Expert
Knowledge
Sensor and personal
observations
tweet reporting pollution level
and asthma attacks
Signals from personal, personal
spaces, and community spaces
Qualify
Quantify
Enrich
Outdoor pollen and pollution
Public Health
Well Controlled – continue
Not Well Controlled – contact nurse
Poorly Controlled – contact doctor
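The "Qualify, Quantify" step on this slide turns labeled observations into the numeric tuple shown. A minimal sketch follows, assuming Low/Medium/High map to 1/2/3 and No/Yes map to 0/1; the slide shows only the resulting numbers, so the coding tables and the `encode` helper are assumptions. The time and location fields are left out to keep the example short.

```python
# Assumed codings; the slide shows only the resulting numbers.
LEVELS = {"Low": 1, "Medium": 2, "High": 3}
YES_NO = {"No": 0, "Yes": 1}

# Field order follows the tuple on the slide.
FIELDS = ["PollenLevel", "ChestTightness", "Pollution", "Activity", "Wheezing"]
CODING = {
    "PollenLevel": LEVELS,
    "ChestTightness": YES_NO,
    "Pollution": YES_NO,
    "Activity": LEVELS,
    "Wheezing": YES_NO,
}


def encode(observations):
    """Map one day's labeled observations to a numeric feature vector."""
    return [CODING[field][observations[field]] for field in FIELDS]


day = {
    "Wheezing": "Yes",
    "ChestTightness": "Yes",
    "PollenLevel": "Medium",
    "Pollution": "Yes",
    "Activity": "High",
}
vector = encode(day)
print(vector)
```

Pairing each such vector with the risk category assigned by doctors gives the labeled rows a risk model can learn from.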
63. 66
Patient Vulnerability Score (prognostic)
How vulnerable* is my control level today?
Risk assessment
model
Semantic
Perception
Personal level
Signals
Public level
Signals
Domain
Knowledge
Population level
Signals
Patient health
Score
*considering changing environmental conditions and current control level
64. 67
Patient Vulnerability Score (prognostic): Details
Sensordrone – for monitoring
environmental air quality
Wheezometer – for monitoring
wheezing sounds
Can I reduce my asthma attacks at night?
What are the triggers? What is the wheezing level?
What is the exposure level over a day?
What is the propensity toward asthma?
Commute to Work
Luminosity
CO level
CO in gush
during day time
Actionable
Information
Personal level
Signals
Public level
Signals
Population level
Signals
What is the air quality indoors?
65. Sensordrone
(Carbon monoxide,
temperature, humidity)
Node Sensor
(exhaled Nitric Oxide)
68
Sensors
Android Device
(w/ kHealth App)
Total cost: ~ $500
kHealth Kit for the application for Asthma management
Along with two sensors in the kit, the application uses a variety of population
level signals from the web:
Pollen level Air Quality Temperature & Humidity
66. 69
Usability and decision support trial
Dr. Shalini G. Forbis, MD, MPH
67. Preliminary insights from patient data
S1 S2
Sensor data QA data
Number of
Observations
36
108
40
121
68. Medication (Albuterol) related to decreasing Exhaled Nitric Oxide
[Chart: "Did patient take albuterol last night due to cough or wheeze?" (scale 0 to 1) plotted with Exhaled Nitric Oxide (scale 0 to 0.25)]
69. Activity limitation related to high exhaled Nitric Oxide
[Chart: "How much did asthma or asthma symptoms limit patient's activity today?" (scale 0 to 1) plotted with Exhaled Nitric Oxide (scale 0 to 0.25)]
70. Low exhaled Nitric Oxide observed with absence of coughing
[Chart, 6/2/2014 to 6/12/2014: "Has patient had wheeze, chest tightness, or asthma related cough today?" (scale 0 to 1) plotted with Nitric Oxide (scale 0 to 0.25)]
71. Activity limitation observed with high pollen activity
[Chart: Pollen (scale 0 to 2.5) plotted with "How much did asthma or asthma symptoms limit patient's activity today?" (scale 0 to 1)]
72. 75
Two research directions for kHealth asthma with more data…
Root cause analysis Action Recommendation
Find Triggers of Asthma
Derive the cause of asthma
attacks for a given patient
using statistical techniques
+ knowledge of asthma and
its triggers
Minimize Asthma Attacks
Model actions based on the
utility theory (cost of
actions & its rewards) +
knowledge of action
consequences
73. • Healthcare:
ADHF, Asthma, GI
– Using kHealth system
• Traffic Analytics:
– Understanding traffic flow
• Social Media Analysis :
– Crisis coordination using Twitris
76
I will use applications in 3 domains to demonstrate
75. Big Data to Smart Data: Traffic Management example
Vehicular traffic data from San Francisco Bay Area aggregated from on-road
sensors (numerical) and incident reports (textual)
Value
Can we detect the onset of traffic congestion?
Can we characterize traffic congestion based on events?
Can we provide actionable information to decision makers?
semantics
Velocity Veracity
Variety Volume
Representing prior knowledge of
traffic leads to a focused exploration
of this massive dataset
Every minute update of speed, volume, travel time, and occupancy resulting in
178 million link status observations, 738 active events, and 146 scheduled
events with many unevenly sampled observations collected over 3 months.
79 http://511.org/
76. Semantic Annotation using Background Knowledge
slow-moving-traffic
Domain knowledge in the
form of traffic vocabulary
Image Credit: http://traffic.511.org/index
Domain knowledge of
traffic flow synthesized
from sensor data
80
Explained-by
Horizontal operator: relating/mapping data
from different modalities to a concept
(theme) within a spatio-temporal context.
Spatial context even includes what it means
to have slow traffic for the type of road
77. • Healthcare:
ADHF, Asthma, GI
– Using kHealth system
• Traffic Analytics:
– Understanding traffic flow
• Social Media Analysis :
– Crisis coordination using Twitris
81
I will use applications in 3 domains to demonstrate
78. [BIG] Ad-hoc Community with Varying but [FEW] Important Intents
Image: http://www.gizmodo.com.au/2012/04/how-we-identify-single-voices-in-a-crowd/
82
Me and @CeceVancePR are
coordinating a clothing/food drive
for families affected by Hurricane
Sandy. If you would like to donate,
DM us
Does anyone know how to donate
clothes to hurricane #Sandy
victims?
BIG QUESTION: Can these needles be identified in the
haystack of massive datasets?
[REQUEST/DEMAND]
[OFFER/SUPPLY]
Coordination teams
want to hear!
79. Uncoordinated Engagement
• May lead to a second disaster to be managed:
– Under-supply of required demands
– Over-supply of not required resources
• Hurricane Sandy example: “Thanks, but no thanks”, NPR, Jan 12, 2013
Story link: http://www.npr.org/2013/01/09/168946170/thanks-but-no-thanks-when-post-disaster-donations-overwhelm
80. 84
How to volunteer, donate to Hurricane
Sandy: <URL>
If you have clothes to donate to those who
are victims of Hurricane Sandy …
Red Cross is urging blood donations to
support those affected <URL>
I have TONS of cute shoes & purses I want
to donate to hurricane victims …
Does anyone know how to donate clothes
to hurricane #Sandy victims?
Does anyone know of community service
organizations to volunteer to help out?
Needs to get something, suggests scarcity:
REQUEST (demand)
Offers or wants to give, suggests abundance:
OFFER (supply)
Matching requests with offers
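The two definitions on this slide can be illustrated with a toy cue-word classifier. This is only a sketch of the distinction between "needs to get" and "wants to give"; the cue lists and the `classify` function are assumptions made for the example, and a real system would need far richer features than a handful of patterns.

```python
import re

# Assumed cue phrases, chosen by hand for this sketch.
REQUEST_CUES = [r"\burging\b", r"\bneed(s|ed)?\b", r"\bplease (donate|help)\b"]
OFFER_CUES = [r"\bi have\b", r"\b(i|we) want to (donate|give|help)\b"]


def count_cues(text, cues):
    """Count how many cue patterns occur in the text (case-insensitive)."""
    return sum(1 for cue in cues if re.search(cue, text, re.IGNORECASE))


def classify(tweet):
    """Label a tweet REQUEST, OFFER, or UNKNOWN by comparing cue counts."""
    requests = count_cues(tweet, REQUEST_CUES)
    offers = count_cues(tweet, OFFER_CUES)
    if requests > offers:
        return "REQUEST"
    if offers > requests:
        return "OFFER"
    return "UNKNOWN"


print(classify("Red Cross is urging blood donations to support those affected"))
print(classify("I have TONS of cute shoes & purses I want to donate to hurricane victims"))
```

Tweets with no cue, or with equally many cues on both sides, come back as UNKNOWN rather than being forced into a class, which matters when most of the stream is neither a request nor an offer.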
81. Tweets from citizen sensors:
“Want to help animals in #Oklahoma? @ASPCA tells how you can help: http://t.co/mt8l9PwzmO”
“Where do I go to help out for volunteer work around Moore? Anyone know?”
“Anyone know where to donate to help the animals from the Oklahoma disaster? #oklahoma #dogs”
“If you would like to volunteer today, help is desperately needed in Shawnee. Call 273-5331 for more info”
Diagram labels: CITIZEN SENSORS; VICTIM SITE; RESPONSE TEAMS (including humanitarian org. and ‘pseudo’ responders); DEMAND; SUPPLY; Matchable (shown twice)
85
Match-making: Assisting Coordination
Image: http://offthewallsocial.com/tag/social-media/
82. Two excellent videos
• Vinod Khosla: the Power of Storytelling and
the Future of Healthcare
• Larry Smarr: The Human Microbiome and the
Revolution in Digital Health
86
Wrapping up: For more on importance of what we talked about
83. • Big Data is everywhere
– at individual and community levels, not just
limited to corporations
– with growing complexity: Physical-Cyber-Social
• Analysis is not sufficient
• Need interaction between bottom up
techniques and top down processing
87
Wrapping up: Take Away
84. Wrapping up: Take Away
• Focus on Humans and Improve human life and
experience with SMART Data.
– Data to Information to Personally and Contextually
Relevant Abstractions (Semantic Perception)
– Actionable Information (Value from data) to assist
and support human in decision making.
• Focus on Value -- SMART Data
– Big Data Challenges without the intention of deriving
Value is a “Journey without GOAL”.
88
85. Special thanks: Pramod. This presentation covers some of the work of my PhD students.
Key contributors: Pramod Anantharam, Cory Henson and TK Prasad.
Amit Sheth’s PhD students:
Ashutosh Jadhav*, Hemant Purohit, Vinh Nguyen, Lu Chen, Pavan Kapanipathi*, Pramod Anantharam*, Sujan Perera, Maryam Panahiazar, Sarasi Lalithsena, Shreyansh Batt, Kalpa Gunaratna, Delroy Cameron, Sanjaya Wijeratne, Wenbo Wang
89
Special thanks
86. • Among top universities in the world in World Wide Web (cf: 10-yr impact,
Microsoft Academic Search: among top 10 in June 2014)
• Among the largest academic groups in the US in Semantic Web + Social/Sensor
Webs, Mobile/Cloud/Cognitive Computing, Big Data, IoT, Health/Clinical &
Biomedicine Applications
• Exceptional student success: internships and jobs at top salary (IBM
Watson/Research, MSR, Amazon, CISCO, Oracle, Yahoo!, Samsung, research
universities, NLM, startups )
• 100 researchers including 15 World Class faculty (>3K citations/faculty avg) and
~45 PhD students- practically all funded
• Extensive research for largely multidisciplinary projects; world class resources;
industry sponsorships/collaborations (Google, IBM, …)
90
89. 93
Smart Data - How you and I will exploit Big Data
thank you, and please visit us at
http://knoesis.org
Editor's notes
Starting slide
Various Big data problems – Traditional examples vs what we are doing examples.
Variety and Velocity than Volume. kHealth problem. People will be interested in Smart Data.
Traditional ML techniques, High Performance Computing, Statistics. Human level of Abstraction is Smart data.
Types of Data
Formats of Data
Also talk about the increase in the platforms that helps generating these data
Example high velocity Big Data applications at work: financial services, stock brokerage, weather tracking, movies/entertainment and online retail.
Fast data (rate at which data is coming: esp from mobile, social and sensor sources),
Rapid changes – in the data content,
Stream analysis – to cope with the incoming data for real-time online analytics
Over 99.4% of the physical devices that may one day be connected to
the Internet are still unconnected.
- CISCO IBSG, 2013
Human interpretation of the world along with personalization context …
Raw data annotated data statistical analysis background knowledge based interpretation for actionable information
- Larry Smarr is a professor at the University of California, San Diego
He was diagnosed with Crohn's disease
What’s interesting about this case is that Larry diagnosed himself
He is a pioneer in the area of Quantified Self, which uses sensors to monitor physiological symptoms
Through this process he discovered inflammation, which led him to the discovery of Crohn's disease
This type of self-tracking is becoming more and more common
add link to video
Characteristics of asthma – why is it a complex condition?
Asthma requires that we provide contextual, personalized, and actionable information to the patient by analyzing observations from Personal, Public, and Population level modalities
- HUMAN CENTRIC!!
All the data related to human activity, existence and experiences
More on PCS Computing: http://wiki.knoesis.org/index.php/PCS
Information is CREATED by humans with the machinery available – Wikipedia tools, sensors and social networks
Information is STORED in Man+Machine readable format, LOD
Information is PROCESSED using the LOD and Human assisted Knowledge-based
Higher level abstraction on info is now consumed in many mechanistic ways (including GIS) to provide EXPERIENCE for humans
Example of a human guided modeling and improved performance
http://research.microsoft.com/en-us/um/people/akapoor/papers/IJCAI%202011a.pdf
Actionable information example:
In the asthma use case we have a sensor – Sensordrone – which records luminosity and CO levels
A high correlation between CO level and luminosity is found
This is actionable information: the user can interpret it as CO gushing in during the daytime
=> A mitigating action can be “closing the window” during the day
Also, we have a weather application which performs abstraction on weather sensor observations to identify blizzard conditions (food for actions!!):
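The correlation check described above can be sketched as follows. This is a minimal illustration, not the kHealth implementation: the readings are made up, and the 0.8 cut-off for calling a correlation "high" is an assumption.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical paired readings over one day (illustrative values only).
luminosity = [5, 10, 200, 450, 600, 580, 300, 20]
co_level = [0.4, 0.5, 1.8, 3.1, 4.0, 3.9, 2.2, 0.6]

HIGH_CORRELATION = 0.8  # assumed cut-off, not taken from the talk

r = pearson(luminosity, co_level)
if r >= HIGH_CORRELATION:
    print(f"r = {r:.2f}: CO rises with daylight; check for an open window")
```

The coefficient only flags the pattern. The interpretation (CO entering from outside during the day) still comes from the person's own context.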
-- 20,000 weather stations (with ~5 sensors per station)
-- Real-Time Feature Streams
- live demo: http://knoesis1.wright.edu/EventStreams/
- video demo: https://skydrive.live.com/?cid=77950e284187e848&sc=photos&id=77950E284187E848%21276
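A rule-based sketch of this kind of abstraction, turning low-level weather readings into a named condition. The `Observation` fields and the labels are hypothetical; the thresholds loosely follow the commonly cited U.S. National Weather Service blizzard criteria (wind of 35 mph or more, visibility of a quarter mile or less) and ignore the duration requirement, so treat them as assumptions. The demo system itself uses semantic technologies rather than hard-coded rules.

```python
from dataclasses import dataclass

@dataclass
class Observation:
    """One station reading (hypothetical schema)."""
    wind_mph: float
    visibility_miles: float
    snowing: bool

def abstract_condition(obs: Observation) -> str:
    """Map raw sensor values to a high-level weather abstraction."""
    # Assumed thresholds; the 3-hour duration criterion is left out.
    if obs.snowing and obs.wind_mph >= 35 and obs.visibility_miles <= 0.25:
        return "Blizzard"
    if obs.snowing:
        return "Snow"
    return "NoSnowEvent"

readings = [
    Observation(wind_mph=40, visibility_miles=0.1, snowing=True),
    Observation(wind_mph=10, visibility_miles=2.0, snowing=True),
    Observation(wind_mph=45, visibility_miles=5.0, snowing=False),
]
labels = [abstract_condition(o) for o in readings]
print(labels)
```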
Let's find it.
Add personalization and contextual
- what if we could automate this sense making ability?
- and what if we could do this at scale?
sense making based on human cognitive models
perception cycle contains two primary phases
explanation
translating low-level signals into high-level abstractions
inference to the best explanation
discrimination
focusing attention on those properties that will help distinguish between multiple possible explanations
used to intelligently task sensors and collect additional observations (rather than brute force approach of blindly collecting all observations)
A single-feature (disease) assumption means that all the observed properties (symptoms) must be explained by a single feature.
i.e., this framework is not expressive enough to model comorbidity where there may be more than one feature (disease) co-existing
For example, if two diseases cause disjoint sets of symptoms and all the symptoms of both diseases are
observed, then no single disease covers the observations, and this framework returns no diseases.
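The two phases and the single-feature limitation can be sketched with plain sets. This is a toy model under stated assumptions: the knowledge base `KB` uses abstract labels (features A, B, C and properties p1 to p4) rather than real diseases and symptoms, and the function names are hypothetical.

```python
# Toy knowledge base: feature (disease) -> properties (symptoms) it explains.
KB = {
    "A": {"p1", "p2"},
    "B": {"p1", "p3"},
    "C": {"p4"},
}

def explain(observed, kb=KB):
    """Explanation: features that alone account for ALL observed properties."""
    return {f for f, props in kb.items() if observed <= props}

def discriminate(observed, kb=KB):
    """Discrimination: unobserved properties that separate the candidates."""
    candidates = explain(observed, kb)
    if not candidates:
        return set()
    prop_sets = [kb[f] for f in candidates]
    possible = set.union(*prop_sets) - observed
    shared_by_all = set.intersection(*prop_sets) - observed
    # A property shared by every candidate cannot tell them apart.
    return possible - shared_by_all

print(explain({"p1"}))          # both A and B explain p1
print(discriminate({"p1"}))     # observing p2 or p3 would narrow it down
print(explain({"p1", "p2", "p4"}))  # A and C together: no single feature covers
```

The last call is the comorbidity case: A explains p1 and p2, C explains p4, but no single feature explains all three, so the result is empty.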
Intelligence distributed at the edge of the network
Requires resource-constrained devices (mobile phones, gateway nodes, etc.) to be able to utilize SW technologies
Henson et al., "An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices," ISWC 2012.
compute high-complexity machine perception inferences (i.e., explanation and discrimination) on resource-constrained devices in milliseconds
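A simplified sketch of why bit vectors suit small devices: if each property is encoded as a bit mask over the features that explain it, explanation reduces to bitwise AND. This illustrates the general idea only and is not the algorithm from the cited paper; the toy knowledge base and all names are hypothetical.

```python
FEATURES = ["A", "B", "C"]            # bit i stands for FEATURES[i]
KB = {"A": {"p1", "p2"}, "B": {"p1", "p3"}, "C": {"p4"}}

def build_property_vectors(kb, features):
    """For each property, set bit i if features[i] explains that property."""
    vectors = {}
    for i, feature in enumerate(features):
        for prop in kb[feature]:
            vectors[prop] = vectors.get(prop, 0) | (1 << i)
    return vectors

def explain_bits(observed, vectors, n_features):
    """AND the masks of all observed properties; set bits are explanations."""
    candidates = (1 << n_features) - 1   # start with every feature possible
    for prop in observed:
        candidates &= vectors.get(prop, 0)
    return candidates

def decode(bits, features):
    """Turn a bit mask back into feature names."""
    return [f for i, f in enumerate(features) if bits & (1 << i)]

VECTORS = build_property_vectors(KB, FEATURES)
print(decode(explain_bits({"p1"}, VECTORS, len(FEATURES)), FEATURES))
print(decode(explain_bits({"p1", "p2"}, VECTORS, len(FEATURES)), FEATURES))
```

Each observed property costs one AND on a machine word, which is the kind of operation a phone or gateway node can run quickly.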
Difference between the other systems and what this system provides
Intelligence at the edge. Shipping computation and domain models to the edge (Distributed)
ADHF – Acute Decompensated Heart Failure
- With this ability, many problems could be solved
- For example: we could help solve health problems (before they become serious health problems) through monitoring symptoms and real-time sense making, acting as an early warning system to detect problematic health conditions
Research on Asthma has three phases
Data collection: what signals to collect?
Analysis: what analysis is to be done?
Actionable information: what action to recommend?
In the next slide, we take a peek into the analysis that we do for Asthma
What is the current state of a person/patient? => Summarizing all the observations (sensor and personal) into a single score indicating health of a person
Instead of presenting all the raw data, which is often too much to be comprehensible to the patient (e.g., the asthma application we have developed collects CO, temperature, and humidity every 10 seconds, resulting in 8,640 observations/day), we empower patients by providing actionable summaries.
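The arithmetic behind the 8,640 figure, plus one very simple form of summary, can be sketched as below. The `summarize` helper and its min/mean/max output are assumptions for illustration; the summaries in the actual application are richer than this.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def observations_per_day(interval_seconds):
    """Number of sampling instants in one day at a fixed interval."""
    return SECONDS_PER_DAY // interval_seconds

def summarize(readings):
    """Collapse a day's readings of one signal into a few numbers."""
    return {
        "min": min(readings),
        "mean": sum(readings) / len(readings),
        "max": max(readings),
    }

print(observations_per_day(10))      # sampling every 10 seconds
print(summarize([1.0, 2.0, 6.0]))    # illustrative values only
```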
There are two components in making sense of Health Signals:
Health signal extraction – processing, aggregating, and abstracting from raw sensor/textual data to create human intelligible abstractions
Health signal understanding – derive (1) connections between abstractions and (2)
Action recommendation:
Continue
Contact nurse
Contact doctor
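A minimal sketch of mapping a summary score to one of the three recommendations above. The score scale (0 to 1, higher meaning worse) and both thresholds are hypothetical; the talk does not specify how the mapping is done.

```python
def recommend_action(severity):
    """Map a severity score in [0, 1] (assumed scale) to a recommendation."""
    if not 0.0 <= severity <= 1.0:
        raise ValueError("severity must be between 0 and 1")
    if severity < 0.3:       # assumed threshold
        return "Continue"
    if severity < 0.7:       # assumed threshold
        return "Contact nurse"
    return "Contact doctor"

for score in (0.1, 0.5, 0.9):
    print(score, "->", recommend_action(score))
```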
What is the likely state of the person in future? => Given the current state and the changing environmental conditions, estimate the state of the person by summarizing it into a number which is actionable.
For example, the vulnerability score for a person with asthma is computed from environmental factors (pollen, air quality, external temperature and humidity) and the current state of the patient.
Intuitively, when both are in a poor environmental state, a person with well-controlled asthma should have a lower vulnerability score than a person with poorly controlled asthma.
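One way to make this intuition concrete is sketched below. The formula is an assumption, not the scoring used in the kHealth work: environmental factors are taken as already normalized to 0..1 (higher meaning worse), averaged, and scaled by a hypothetical weight for the patient's control level.

```python
# Hypothetical weights: worse asthma control amplifies environmental risk.
CONTROL_WEIGHT = {
    "well_controlled": 0.5,
    "not_well_controlled": 0.75,
    "poorly_controlled": 1.0,
}

def environmental_risk(pollen, air_quality, temperature, humidity):
    """Average of four risk factors, each pre-normalized to [0, 1]."""
    factors = [pollen, air_quality, temperature, humidity]
    return sum(factors) / len(factors)

def vulnerability_score(env_risk, control_level):
    """Score on a 0..100 scale; higher means more vulnerable."""
    return 100 * env_risk * CONTROL_WEIGHT[control_level]

poor_environment = environmental_risk(0.9, 0.8, 0.7, 0.6)
well = vulnerability_score(poor_environment, "well_controlled")
poorly = vulnerability_score(poor_environment, "poorly_controlled")
print(well, poorly)
```

With the same poor environment, the well-controlled patient gets the lower score, which is the ordering the note asks for.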
In the absence of declarative knowledge in a domain, we resort to statistical approaches to glean insights from data
Even if there is declarative knowledge of a domain, it may have to be personalized
The CO level may be related to the luminosity as observed by the sensordrone – as it gets brighter the CO level also increases => high CO level in daytime
If such an insight is provided to a person, the interpretation can be:
Some activity inside the house leads to high CO levels
Outside activity leads to high CO levels inside the house
Since the person knows that he/she is away from the house during mornings, it has to be something from outside.
- The person narrows it down to a window at home that was possibly left open (one they often forget to close)
1) www.pollen.com (for pollen levels)
2) http://www.airnow.gov/ (for air quality levels)
3) http://www.weatherforyou.com/ (for temperature and humidity)
Subject 1
121 Data points from sensor observations
40 Data points from QA including one comment
Subject 2
108 Data points from sensor observations
36 Data points from QA including one comment
Pucher, J., Korattyswaroopam, N., & Ittyerah, N. (2004). The crisis of public transport in India: Overwhelming needs but limited resources. Journal of Public Transportation, 7(4), 1-30.
Horizontal operation
People join these SM communities with a variety of intentions.
Varying intent may include a very small sample of important intentions to assist the coordination of actions
--- request to help
--- offer to help
(1) Example overview
Alright, so let’s motivate this with a situation during an emergency
- Various actors: resource seekers, responder teams, resource providers at remote site
And
- each of these actor groups has questions ---
- needs
- providers
- responders: wondering!
Here we have a social network to connect these actors and bridge the communication gap
But its potential for providing effective help is yet to be realized
Because.. (next slide)
More at: http://wiki.knoesis.org/index.php/PCS
And http://knoesis.org/projects/ssw/