The document discusses research on integrating spatiotemporal sensor data and user-generated content. It provides an overview of pervasive sensor networks that generate continuous data streams about the environment and how people-centric sensing through mobile devices is creating a new layer of contextual location-based data. The integration of these different data sources could provide increased situational awareness for applications like emergency response management and urban planning. It also presents some challenges around data heterogeneity that semantic technologies may be able to address.
Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data
1. Digital Enterprise Research Institute www.deri.ie
Towards the Integration of Spatiotemporal Sensor Data and User-Generated Content
Cornelius Rabsch • cornelius@rabsch.net
Chapter
Copyright 2008 Digital Enterprise Research Institute. All rights reserved.
“Towards the Integration of Spatiotemporal User-Generated Content and Sensor Data” by C. Rabsch
Abstract:
Pervasive sensor networks are the source of continuous data streams about our physical environment. With the rise of the Mobile Web, people-centric sensing yields a new layer of spatiotemporal contextual data, from qualitative user-generated content (e.g. geo-referenced multimedia messages) to quantitative sensor measurements (e.g. earthquake or hazard alerts). This mobile sensed content is made accessible within an ecosystem of heterogeneous service providers, from social networks to social data networks. The mining, analysis and processing of these streams presents many challenges, and semantic technologies can be utilized to overcome this heterogeneity. The integration of sensor data with a user-generated context will provide increased situational awareness and contextual knowledge, resulting in application scenarios from more efficient emergency response management to improved urban planning.
This talk gives an overview of the research I am doing at the Digital Enterprise Research Institute as part of my final thesis for my studies of Business Administration and Computer Science (Diploma degree, equivalent to a Master's) at the University of Mannheim, Germany. A detailed presentation about the spatiotemporal data integration steps and formalisms will be published at a later stage.
Alternative title:
“A geospatial activity-based approach to semantically link user-generated content and sensor data.”
Thesis supervision by:
Prof. Dr. Manfred Hauswirth - Sensor Middleware Unit, Digital Enterprise Research Institute, National University of Ireland, Galway -
www.deri.ie
Prof. H. Stuckenschmidt - Knowledge Representation and Knowledge Management Research Group, University of Mannheim,
Germany - ki.informatik.uni-mannheim.de
Index terms:
people-centric sensing, mobile sensing, user-generated content, sensor data, spatiotemporal integration
Contact: Cornelius Rabsch - cornelius@rabsch.net - http://www.inperspektive.com
2. Ubiquitous Mobile Sensing
(Slide figures: National Research Council USA; Nokia Research Center, 2008 [1])
The global trend toward pervasive sensor networks and sensors embedded in everyday devices goes along with a steadily increasing number of deployed mobile devices, already reaching over 4 billion in 2009 [1]. Sensors and mobile phones are closely linked, since every modern phone embeds a variety of sensors: location, acoustic, light or orientation, for example. According to Nokia Research Center [1] there will be a shift from traditional sensor networks to a participatory sensing infrastructure that leverages available devices and puts humans in the loop, leading to a sensing network that utilizes people and their mobility.
Mobile sensing is the origin of large heterogeneous spatiotemporal data sets that have to be mined, processed and analyzed, which poses challenging tasks and complex problems.
[1] Nokia Research Center, Sensing the World with Mobile Devices, http://research.nokia.com/files/insight/NTI_Sensing_-_Dec_2008.pdf, 2008
3. Sensors + Mobile Phones + People
To give an overview of the main concepts, we take a closer look at sensors connected to mobile phones, people carrying mobile phones, and mobile phones as the place of origin for user-generated content and sensor data.
Looking solely at networked sensors, we have a well-understood infrastructure where sensors are part of sensor networks, and sensor bases or social data networks can provide a persistence layer with varying degrees of data accessibility. Providers of this kind of service are Sensorbase [2], Sensorpedia [3] or Pachube [4], for example. Sensor middleware can be utilized to collect, process and analyze large amounts of sensor data.
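The core task of such a middleware layer can be illustrated with a small sketch: grouping timestamped readings into fixed time windows and averaging them per sensor. The function and the data are purely illustrative and not part of any of the cited platforms.

```python
from collections import defaultdict
from statistics import mean

def aggregate_readings(readings, window=60):
    """Group timestamped sensor readings into fixed windows (in seconds)
    and average them per sensor -- a toy stand-in for the kind of
    processing a sensor middleware layer performs."""
    buckets = defaultdict(list)
    for sensor_id, timestamp, value in readings:
        buckets[(sensor_id, timestamp // window)].append(value)
    return {key: mean(values) for key, values in buckets.items()}

readings = [
    ("noise-01", 0, 55.0), ("noise-01", 30, 65.0),   # first minute
    ("noise-01", 70, 40.0),                          # second minute
]
print(aggregate_readings(readings))
# {('noise-01', 0): 60.0, ('noise-01', 1): 40.0}
```

In a real deployment the windowed aggregates would be persisted and exposed through a service such as the sensor bases mentioned above.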
On the other hand, people in a networked world are part of online communities and create content within social networks. All user-generated content in Social Media is distributed over a variety of heterogeneous services, for example the photo community Flickr [5], the video community YouTube [6] or the generic social network Facebook [7]. These services act as valuable centralized places to collect and analyze social media contributions.
Bridging the quantitative sensor world with the human-centric social media world is a challenging task. Mobile phones can play an important role as the place of origin for user-generated content and sensor data. The concept of mobile sensing focuses on the mobile device and its embedded sensors. Humans are seen as carriers providing the mobility, which results in an opportunistic sensing infrastructure [1].
In both scenarios, sensors embedded into mobile phones and people submitting content with mobile phones, the networking aspect is essential and Internet connectivity is the enabler for exchanging and aggregating data. The Internet, and the World Wide Web in particular, provide a scalable infrastructure with shared, well-accepted standards and technologies, and thus form a ready-to-use foundation and platform for data distribution, mining and analysis.
The specifics of data sharing around the aspects of provenance, trust, permissions and access rights won't be in the focus of this research.
[1] Lane et al., Urban sensing systems: opportunistic or participatory?, 9th workshop on Mobile computing systems, 2008,
www.cl.cam.ac.uk/~mm753/papers/hotmobile08.pdf
[2] Sensorbase.org
[3] Sensorpedia.org
[4] Pachube.com
[5] Flickr.com
[6] Youtube.com
[7] Facebook.com
[8] Campbell et al., The Rise of People-Centric Sensing. Internet Computing, IEEE (2008) vol. 12 (4) pp. 12 - 21
4. Mobile + Sensors
(Slide figures: BioMapping.net by C. Nold; NoiseTube, Sony Computer Science Laboratory; CitySense, Sense Networks; Real Time Rome, MIT SENSEable City Lab)
We continue with a short overview of typical mobile sensing scenarios to clarify how mobile devices can be used to sense the physical environment.
The NoiseTube project “turns your mobile phone into an environmental sensor and participates to the monitoring of noise pollution” [1]. Users can also annotate the sensed data with tags such as 'Construction building' to give more background about the sensed location.
CitySense [2] is built on top of Sense Networks' Macrosense location analytics platform [3] to report on nightlife activity in San Francisco by analyzing location traces and consumer behavior.
BioMapping [4] measures the Galvanic Skin Response (GSR) to recognize emotional arousal at specific places such as bridges or hard-to-cross streets.
The Real Time Rome project [5] by the MIT SENSEable City Lab [6] analyzes cell phone data within the city of Rome to study consumer behavior and find patterns in urban data collections [7].
The goal of these sensing applications can be the utilization and analysis of the sensed data to influence human behavior: where to go next, which places to avoid, or how to select the fastest path to a specific place, for example.
[1] NoiseTube by SONY Computer Science Laboratory, http://noisetube.net/
[2] CitySense, Sense Networks, http://www.citysense.com, “Live San Francisco Nightlife Activity”
[3] MacroSense, Sense Networks, http://www.sensenetworks.com/macrosense.php
[4] BioMapping, C. Nold, http://biomapping.net
[5] Real Time Rome, MIT SENSEable City Lab, http://senseable.mit.edu/realtimerome/
[6] MIT SENSEable City Lab, http://senseable.mit.edu/
[7] Reades et al., Cellular Census: Explorations in Urban Data Collection, Pervasive Computing, IEEE (2007) Vol. 6 (3) pp. 30 - 38
5. Mobile + People
(Slide figures: twitter.com/arunshanbhag; www.flickr.com/photos/vinu; twitpic.com/135xa)
“There's a plane in the Hudson. i'm on the ferry going to pick up people. Crazy.”
User-generated content, also referred to as social media contributions, can be personal, entertaining, of uncertain quality, timely, contextual and often unpredictable. For the sake of simplicity and clarity, we give an overview of typical contributions of high importance and high quality. Citizen journalism refers to the act of citizens reporting about their neighborhood and the community they live in, ranging from hyper-local news to on-the-spot emergency reports. The tools that facilitate citizen journalism are manifold and often free and easy to use.
User 'arunshanbhag' reports about a terrorist attack at the Taj Hotel in Mumbai [1] using the micro-blogging service Twitter; user 'vinu' on Flickr uploads and geo-references a photo of the same event with only a minor delay [2]; user 'jkrums' witnesses a plane crash and immediately takes a picture that gets tens of thousands of views on Twitpic [3], to name a few real-life citizen journalism examples.
Mobile mapping of earthquake catastrophes [4] is another example, where it is important to know where a house is burning or has collapsed and how big the impact was.
It is important to note that location often matters: user-generated content is in many cases already geo-referenced, either by utilizing built-in GPS sensors in mobile phones or by manually selecting the correct location on a web mapping interface.
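As a small illustration of how such geo-references can be put to work, the great-circle distance between a content item's GPS fix and a reported event location tells us whether the item is "near" the event. The coordinates below are approximate and used purely as an example.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance in kilometres between two GPS fixes."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(a))  # 6371 km: mean Earth radius

# Taj Mahal Palace Hotel, Mumbai (approximate coordinates)
event = (18.9217, 72.8330)
# a hypothetical geo-tagged photo taken a few blocks away
photo = (18.9250, 72.8300)

print(round(haversine_km(*event, *photo), 2), "km")
```

A simple radius threshold on this distance is enough to cluster citizen journalism contributions around an incident.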
[1] “Mumbai Blasts”, Twitter user arunshanbhag, http://twitter.com/arunshanbhag, accessed April 2009
[2] “Mumbai Attacks on Nov 26th 2008”, Flickr user vinu, http://www.flickr.com/photos/vinu/sets/72157610144709049/, accessed April
2009
[3] “US Airways Hudson River Plane Crash”, Twitpic user jkrums, http://twitpic.com/135xa, accessed April 2009
[4] “Earthquake L'Aquila”, Google Maps user TMG, http://maps.google.com/maps/ms?
msa=0&msid=112463924814795169379.000466dc7c10ff9a99bd4&ie=UTF8&ll=42.3496,13.397613&spn=0.013638,0.032187&t=p&
z=15, accessed April 2009
6. Related Work
To structure the related work we use a) a breakdown by type of content and b) a breakdown by heterogeneity. The types of data can be user-generated content, sensor data, or scenarios where both are considered for integration and analysis. With regard to heterogeneity, the distinction is between a closed-world scenario, where one or multiple well-known data sources are applied, and an open-world scenario, where multiple heterogeneous data sources are applied. The latter involves much higher complexity because of the missing semantic interoperability between service providers and their data schemas.
The focus lies on the data integration and analysis of spatiotemporal user-generated content and sensor data in a mobile context.
Note: Will be clarified and regrouped later on.
[1] Eagle and Pentland, Reality mining: sensing complex social systems. Personal and Ubiquitous Computing, 2006
[2] Sheth et al., Semantic Sensor Web, IEEE Internet Computing, 2008
[3] Calabrese et al., Wikicity: Real-time Location-sensitive Tools For The City, IEEE Pervasive Computing, 2007
[4] Girardin et al., Digital Footprinting: Uncovering Tourists with User-Generated Content, IEEE Pervasive Computing, vol. 7 (4) pp.
36 - 43, 2008
[5] Urban Sensing, CENS/UCLA, Center for Embedded Networked Sensing, http://urban.cens.ucla.edu/
[6] PEIR, the Personal Environmental Impact Report, http://urban.cens.ucla.edu/projects/peir/
[7] Real Time Rome, MIT SENSEable City Lab, http://senseable.mit.edu/realtimerome/
[8] Sensorbase.org
[9] Sensorpedia.org
[10] Pachube.com
[11] SIOC, Semantically-Interlinked Online Communities, http://sioc-project.org/
[12] BioMapping, C. Nold, http://biomapping.net
[13] NoiseTube by SONY Computer Science Laboratory, http://noisetube.net/
7. Research Questions
How can spatiotemporal user-generated content be semantically linked to sensor data?
What kind of semantics are required to describe, analyze and query heterogeneous geospatial activities?
How can we increase and assess the contextual information provided by sensor data and user-generated content?
Why? Increased efficiency in emergency response management, urban planning or targeted advertising, for example.
The focus will be on data access and data integration in a scenario where the data's life-cycle is important to understand, i.e. mobile-originated, web-accessible content provided by heterogeneous services.
8. Geospatial Activity Semantics
* M. Perry, A Framework to Support Spatial, Temporal and Thematic Analytics over Semantic Web Data, 2008
The underlying concepts of the geospatial activity vocabulary are the shared semantics of time, space and theme [1] of the user-generated content and the sensor data. GeoAct is a vocabulary utilizing Semantic Web technologies such as RDF for the representation of spatiotemporal thematic data. RDF provides the basis for easy-to-use extension and integration mechanisms.
The red circles show common vocabularies such as FOAF, Dublin Core or SIOC that can be interlinked with the GeoAct Activity base class. A gazetteer schema such as the one of Geonames [2] can be utilized to map GPS coordinates to well-known geographical identifiers as a way to geographically cluster activity data.
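A minimal sketch of what a serialized GeoAct activity could look like, reusing Dublin Core and the W3C WGS84 vocabulary for theme, time and space. The geoact namespace URI and the exact property layout are assumptions for illustration, not the final schema.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
GEOACT = "http://example.org/geoact#"  # hypothetical namespace URI

def activity_rdf(uri, title, lat, lon, timestamp):
    """Serialize a minimal geoact:Activity as RDF/XML, expressing
    theme (dc:title), time (dc:date) and space (geo:lat / geo:long)."""
    root = ET.Element(f"{{{RDF}}}RDF")
    act = ET.SubElement(root, f"{{{GEOACT}}}Activity", {f"{{{RDF}}}about": uri})
    ET.SubElement(act, f"{{{DC}}}title").text = title
    ET.SubElement(act, f"{{{DC}}}date").text = timestamp
    ET.SubElement(act, f"{{{GEO}}}lat").text = str(lat)
    ET.SubElement(act, f"{{{GEO}}}long").text = str(lon)
    return ET.tostring(root, encoding="unicode")

print(activity_rdf("http://example.org/act/1",
                   "M 4.1 earthquake", 42.35, 13.40, "2009-04-06T01:32:00Z"))
```

Because the output is plain RDF, it can be loaded into a triple store and interlinked with FOAF or SIOC descriptions without further conversion.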
This section and the following about the GeoAct schema and the underlying concepts will be reviewed in detail in a different
presentation.
[1] Sheth and Perry, Traveling the Semantic Web through Space, Time, and Theme, IEEE Internet Computing, 2008
[2] http://www.geonames.org/
9. Activity Example as RDF/XML
Semantic Gap!
RDF / Atom Triples?
Atom / RSS Feed Content: Earthquakes, Hazards, Weather, Traffic, Photos, Multimedia, Messages, Events,...
A sample RDF/XML extract of a geoact:Activity class about an earthquake notification as provided by the U.S. Geological Survey. The semantic gap shows that the more specific semantics of the earthquake notification and the seismic wave sensors are hidden in unstructured text. One solution is to use more domain-specific ontologies extending the GeoAct vocabulary.
Web feeds (Atom or RSS with GeoRSS extensions) are already providing a variety of activity information from earthquakes to
hazards to multimedia messages or event information.
Sample feed sources for New York City, USA:
USGS M2.5+ Earthquakes
http://earthquake.usgs.gov/eqcenter/catalogs/1day-M2.5.xml
Brightkite Place Stream
http://brightkite.com/places/ede07eeea22411dda0ef53e233ec57ca/objects.rss?limit=100&filters=notes,photos
Flickr Photo Stream
http://www.flickr.com/services/feeds/geo/?tags=newyork or http://www.flickr.com/services/feeds/geo/?tags=manhattan
Upcoming Event Stream
http://upcoming.yahoo.com/syndicate/v2/place/hVUWVhqbBZlZSrZU
Yahoo! Traffic Alert Stream
http://local.yahooapis.com/MapsService/rss/trafficData.xml?appid=YdnDemo&location=new%20york%20city
Weather, NYC, Central Park
http://www.weather.gov/xml/current_obs/KNYC.rss
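To make the feed-based pipeline concrete, the sketch below parses a single Atom entry with a GeoRSS point into the shared time/space/theme attributes. The entry is handcrafted to resemble a USGS earthquake notification; it is not actual feed output.

```python
import xml.etree.ElementTree as ET

# A single Atom entry with a GeoRSS point, shaped like an entry from
# the USGS earthquake feed (structure assumed for illustration).
ENTRY = """<entry xmlns="http://www.w3.org/2005/Atom"
                  xmlns:georss="http://www.georss.org/georss">
  <title>M 2.7, Northern California</title>
  <updated>2009-04-02T08:15:00Z</updated>
  <georss:point>38.80 -122.82</georss:point>
</entry>"""

NS = {"atom": "http://www.w3.org/2005/Atom",
      "georss": "http://www.georss.org/georss"}

def parse_entry(xml_text):
    """Extract the shared time/space/theme attributes from a feed entry."""
    entry = ET.fromstring(xml_text)
    lat, lon = map(float, entry.findtext("georss:point", namespaces=NS).split())
    return {"title": entry.findtext("atom:title", namespaces=NS),
            "time": entry.findtext("atom:updated", namespaces=NS),
            "lat": lat, "lon": lon}

print(parse_entry(ENTRY))
```

The same three attributes can be extracted from the photo, event and traffic feeds listed above, which is exactly what makes the heterogeneous sources comparable.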
10. GeoAct Data Flow
The GeoAct prototype utilizes the GeoAct vocabulary for the integration of sensor data and user-generated content.
Social networks for user-generated content and social data networks for sensor data are the 'sinks' of the platform that provide web-accessibility by supporting web feeds such as Atom or other open standards. These services should act as neutral storage and aggregation services in an ecosystem where sensor data and user-generated content are shared in equal ways.
Note: More to follow. First the focus is on available web content, dropping the assumption that the data should be mobile-originated.
11. GeoAct TimeMap Visualization
The screenshot shows one dynamic view of the TimeMap [1] interface where spatiotemporal activity content from a variety of sources around New York City was aggregated from web feeds and visualized. The data mining is decoupled from the querying and visualization part.
The GeoAct prototype takes a bounding box as the geospatial query parameter and a time period as the temporal query parameter to visualize crawled, web-accessible geospatial activity data on an interactive TimeMap. The goal is to visualize heterogeneous spatiotemporal data from a variety of service providers, e.g. Flickr, Brightkite, Twitter, Yahoo! Traffic or the U.S. Geological Survey.
The implementation is built around the Ruby on Rails web framework [2] and the relational database PostgreSQL [3] with the PostGIS [4] extension for improved geospatial queries. A triple store with a SPARQL endpoint will be used for advanced querying.
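The bounding-box-plus-time-period query can be sketched in a few lines. In the prototype this filtering is performed by PostGIS on the server side; the activity record layout here is a simplification for illustration.

```python
from datetime import datetime

def in_bbox_and_period(activity, bbox, start, end):
    """Check a crawled activity against a bounding box
    (min_lat, min_lon, max_lat, max_lon) and a time period --
    the two query parameters the prototype accepts."""
    min_lat, min_lon, max_lat, max_lon = bbox
    t = datetime.fromisoformat(activity["time"])
    return (min_lat <= activity["lat"] <= max_lat
            and min_lon <= activity["lon"] <= max_lon
            and start <= t <= end)

nyc = (40.55, -74.10, 40.90, -73.70)  # rough NYC bounding box
photo = {"lat": 40.758, "lon": -73.985, "time": "2009-04-02T14:00:00"}

print(in_bbox_and_period(photo, nyc,
                         datetime(2009, 4, 1), datetime(2009, 4, 3)))  # True
```

Every matching activity is then handed to the TimeMap front end, which places it on the map and on the timeline simultaneously.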
For simplicity, privacy considerations are not taken into account in this work and all crawled content is publicly available. For user-generated content this means that the user agreed to make his or her social media contribution accessible to everyone.
[1] Timemap.js is a Javascript library to help use Google Maps with a SIMILE timeline, http://code.google.com/p/timemap/
[2] Ruby on Rails, http://rubyonrails.org/
[3] PostgreSQL, http://www.postgresql.org/
[4] PostGIS, http://postgis.refractions.net/
12. First Conclusions
Sensor data and UGC can fit together: integrated analysis of both is required to fully understand the context and to increase situational awareness; raw sensor data streams are not useful in the open-world scenario (middleware, social data networks?)
The mobile phone provides the same origin for the production of UGC and sensor data
Neighborhood-level “real-time” data is not (yet) realistic
Demand for a (social) sensor sharing infrastructure: FireEagle, Pachube, SensorPedia, SensorBase, web feeds, UGC metadata (machine tags; EXIF)
To increase situational awareness, heterogeneous data sets with spatiotemporal sensor data and user-generated content can be used. Shared semantics for time, location and theme provide a central point for the data integration steps.
There is a growing demand for a sensor data sharing infrastructure where sensors and sensor networks provide their collected data streams in accessible ways to third parties interested in integrating and remixing the data. These services not only take care of data aggregation and accessibility; a management and provenance layer can also help to track the flow and origin of the data, for example. We refer to them as social data networks, in analogy to social networks. Social networks or online communities are carriers of user-generated content and provide privacy layers and mechanisms to distribute publicly available content.
In [1] the integration of social networks and sensor networks is considered, along with the question of how sensors can extend social networks or replace humans in answering certain queries. The focus is on using existing connections and privacy concepts within a social network to share and access sensor devices.
[1] Breslin et. al, Integrating Social Networks and Sensor Networks, W3C Workshop on the Future of Social Networking, 2009
http://www.w3.org/2008/09/msnws/papers/sensors.html
13. Next Steps
Sample SPARQL queries and inference: “All activity topics in case of an earthquake on April 2nd 2009 by the service www.flickr.com near Times Square”
Advanced web mining: named entity recognition to extract location data from unstructured text (e.g. Twitter messages) via APIs
Prototype refinements (querying, export, documentation, ...)
Thesis write-up (until June '09)
Walking through several SPARQL [1] queries that require the availability of both sensor data and user-generated content helps to understand how querying on the crawled web content can be done. Reasoning can be described by utilizing extended GeoAct activities.
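The semantics of such a query can be illustrated with a toy in-memory triple match. The triples and the vocabulary terms (geoact:topic, dc:source) are assumptions for illustration, the spatial "near Times Square" constraint is omitted for brevity, and a real deployment would run SPARQL against the triple store instead.

```python
# Toy triple store: each entry is a (subject, predicate, object) triple.
triples = [
    ("act:1", "geoact:topic", "earthquake"),
    ("act:1", "dc:source", "www.flickr.com"),
    ("act:1", "dc:date", "2009-04-02"),
    ("act:2", "geoact:topic", "traffic"),
    ("act:2", "dc:source", "local.yahooapis.com"),
    ("act:2", "dc:date", "2009-04-02"),
]

def match(patterns):
    """Return the subjects satisfying all (predicate, object) constraints --
    a hand-rolled stand-in for a SPARQL basic graph pattern."""
    subjects = {s for s, _, _ in triples}
    for pred, obj in patterns:
        subjects &= {s for s, p, o in triples if p == pred and o == obj}
    return subjects

# "All activities about an earthquake on 2009-04-02 from www.flickr.com"
print(match([("geoact:topic", "earthquake"),
             ("dc:source", "www.flickr.com"),
             ("dc:date", "2009-04-02")]))
# {'act:1'}
```

The same constraint pattern, written as SPARQL triple patterns over GeoAct data, is what the planned SPARQL endpoint will evaluate.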
[1] SPARQL Query Language for RDF, http://www.w3.org/TR/rdf-sparql-query/