1. Exploring the Networks in Open Public Data
Uldis Bojārs
Institute of Mathematics and Computer Science
University of Latvia
Using Open Data Workshop
Brussels, 20-Jun-2012
2. About us
• Institute of Mathematics and Computer
Science, University of Latvia
– http://www.lumii.lv/resource/show/170
– Uldis Bojārs @CaptSolo
– Valdis Krebs http://orgnet.com
– Pēteris Ručevskis
3. Network visualisation and analysis
Applications:
• discover interesting patterns
• explore data in [more] detail
Work from the Open Data Hackathon in Riga
• analysis of Saeima voting patterns
• http://opendata.lv
4. Overview
• Data needs to be Open
• Pre-processing and filtering the data
– selecting what to show
• Data visualisation
– iterative process (visualize, refine, repeat)
• What’s next?
5. Open Data needed first (!)
“Open data is data that can be
freely used, reused and redistributed by anyone …”
http://opendefinition.org/
Data needs to be:
• open
• easy to use
Still a problem in Latvia:
• only a few datasets are open in
an easy-to-consume form (PDF does not count :)
7. Pre-processing
• Input:
– raw vote data (scraped from the website)
published at http://data.opendata.lv/
• Output:
– nodes (MPs)
– edges (connections between them)
• What is a connection?
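A minimal sketch of this pre-processing step in Python. It assumes, hypothetically, that the scraped votes have been flattened into a CSV with one row per MP per vote and columns named vote_id, mp and choice. The real layout of the data at data.opendata.lv may differ; the column names and sample rows are invented for illustration.

```python
import csv
import io
from collections import defaultdict

# Hypothetical CSV layout: one row per (vote, MP) pair.
# The column names vote_id, mp and choice are assumptions for illustration.
RAW = """vote_id,mp,choice
1,Alice,for
1,Bob,for
1,Carol,against
2,Alice,against
2,Bob,for
"""

def load_votes(text):
    """Group raw vote rows into {mp: {vote_id: choice}}."""
    votes = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        votes[row["mp"]][row["vote_id"]] = row["choice"]
    return dict(votes)

votes = load_votes(RAW)
nodes = sorted(votes)  # every MP who appears in the data becomes a node
print(nodes)           # ['Alice', 'Bob', 'Carol']
print(votes["Bob"])    # {'1': 'for', '2': 'for'}
```

The edges still have to be derived from these per-MP vote records, which is where the question "what is a connection?" comes in.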
8. Defining graph connections
• Connect MPs if they have voted similarly
– disagreed on at most n% of decisions
• Filter out cases where almost all
MPs voted the same
• Filter out trivial decisions
• Filter out noise
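The connection rule and filters above could be sketched as follows. This is an illustrative implementation, not the code used at the hackathon: the noise threshold (min_votes), the near-unanimity cut-off (unanimity) and the exclude set of procedural vote IDs are hypothetical parameters, and deciding which decisions are "trivial" is assumed to happen upstream.

```python
from itertools import combinations

def build_edges(votes, n, min_votes=10, unanimity=0.9, exclude=frozenset()):
    """Connect two MPs if they disagreed on at most n% of the informative votes
    both took part in.

    votes     -- {mp: {vote_id: choice}}
    n         -- maximum disagreement, in percent
    min_votes -- hypothetical noise filter: MPs with fewer recorded votes are dropped
    unanimity -- hypothetical filter: drop votes where at least this share of MPs agreed
    exclude   -- hypothetical set of vote IDs judged trivial (e.g. procedural) upstream
    """
    # Noise: MPs who voted only a few times would distort the percentages.
    mps = {mp: v for mp, v in votes.items() if len(v) >= min_votes}

    # Collect every choice per vote so near-unanimous votes can be detected.
    by_vote = {}
    for choices in mps.values():
        for vote_id, choice in choices.items():
            by_vote.setdefault(vote_id, []).append(choice)

    informative = {
        vote_id
        for vote_id, choices in by_vote.items()
        if vote_id not in exclude
        and max(choices.count(c) for c in set(choices)) / len(choices) < unanimity
    }

    edges = []
    for a, b in combinations(sorted(mps), 2):
        shared = [v for v in mps[a] if v in mps[b] and v in informative]
        if not shared:
            continue
        disagreements = sum(mps[a][v] != mps[b][v] for v in shared)
        # Integer form of disagreements / len(shared) <= n / 100.
        if disagreements * 100 <= n * len(shared):
            edges.append((a, b))
    return edges


# Invented toy data: vote 1 is unanimous and is filtered out.
votes = {
    "A": {1: "for", 2: "for", 3: "against", 4: "for"},
    "B": {1: "for", 2: "for", 3: "against", 4: "against"},
    "C": {1: "for", 2: "against", 3: "for", 4: "against"},
    "D": {1: "for", 2: "against", 3: "for", 4: "against"},
}
print(build_edges(votes, n=35, min_votes=3))  # [('A', 'B'), ('C', 'D')]
```

Comparing with integers rather than floats avoids round-off surprises when a pair of MPs sits exactly on the n% boundary.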
9. Node colour legend
• Ruling coalition:
– Zatler’s Reform Party
– Unity
– the National Alliance
• Opposition:
– Harmony Centre
– Greens / Farmers Party
• a few non-party MPs
10. MPs who always vote the same (n = 0%)
Connection criteria too narrow
11. MPs who disagree in less than 35% of cases
Connection criteria too broad
(everyone agrees, really?)
12. Refining the visualisation
• Need to find the right cut-off values (n%)
– where patterns [start to] appear
– and the visualisation makes sense
• Show the results to domain experts
– MPs, journalists, political researchers, …
• Experts:
– help improve visualisations
– can discover new things for themselves
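One possible numeric aid when searching for the cut-off, offered as an assumption rather than the method described here (which relies on looking at the graph and on expert feedback): track graph density, the share of all possible MP pairs that end up connected, as n changes. Density near 0 suggests the criteria are too narrow; density near 1 means an almost full graph. The pairwise disagreement rates below are invented.

```python
# Invented pairwise disagreement rates, in percent of shared votes.
disagreement = {
    ("A", "B"): 5,
    ("A", "C"): 30,
    ("A", "D"): 40,
    ("B", "C"): 20,
    ("B", "D"): 33,
    ("C", "D"): 8,
}

def density(disagreement, n, num_nodes):
    """Share of all possible node pairs that are connected at cut-off n%."""
    edges = sum(1 for rate in disagreement.values() if rate <= n)
    possible = num_nodes * (num_nodes - 1) // 2
    return edges / possible

for n in (0, 11, 25, 35):
    print(f"n = {n}%: density {density(disagreement, n, 4):.2f}")
```

On this toy data the loop prints densities of 0.00, 0.33, 0.50 and 0.83; a number like this can flag the extremes, but only inspection and domain experts can say whether the patterns in between make sense.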
13. MPs who disagree in less than 11% of cases
Opposition parties [sometimes] vote the same
14. MPs who disagree in less than 25% of cases
Bridges appear between coalition and opposition parties
(see slides 21, 22 regarding the bridging role of yellow nodes)
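The bridging role can also be checked directly: hide the bridging nodes and test whether any path still links the two sides. A minimal breadth-first search sketch; the node names and edges form a hypothetical toy graph, not the actual Saeima data.

```python
from collections import deque

def connected(edges, start, goal, hidden=frozenset()):
    """Breadth-first search: is there a path from start to goal that avoids hidden nodes?"""
    adj = {}
    for a, b in edges:
        if a in hidden or b in hidden:
            continue  # edges touching a hidden node are removed from the graph
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Hypothetical toy graph: one Greens / Farmers node links the two sides.
edges = [
    ("Coalition1", "Coalition2"),
    ("Coalition2", "GreensFarmers1"),
    ("GreensFarmers1", "Harmony1"),
    ("Harmony1", "Harmony2"),
]
print(connected(edges, "Coalition1", "Harmony2"))                             # True
print(connected(edges, "Coalition1", "Harmony2", hidden={"GreensFarmers1"}))  # False
```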
15. What next?
• Improve our understanding of data
• Enhance visualisations
– add clusters, etc.
• Create multiple visualisations
– different topics, changes in time, etc.
• Bring in more data
– explain nodes & edges
16. Network visualisation example #1
Donations to political parties
http://www.thenetworkthinkers.com/2011/12/innovation-happens-at-intersections.html
17. Network visualisation example #2
Intra-company communication patterns
18. Conclusion
• Need more, useful Open Data
• Discovering patterns, making sense of data
– helping make sense = purpose of visualisations
• Looking forward to collaboration re:
– Using Open Data
– Data Visualisation and Analysis
19. More info
• Uldis Bojārs
uldis.bojars@gmail.com
• Social Network Analysis talk / Valdis Krebs
http://www.slideshare.net/DERIGalway/valdis-krebs-social-network-analysis-19872007
• Smart Network Analyzer tool
http://sna.lumii.lv/
in development at IMCS, University of Latvia
Speaker notes
The raw data is not always immediately useful to the wider public - using open data - discovering patterns - making sense of it
It’s worthwhile to explore networks that emerge from the data you’re looking at. Various kinds of networks: - people in companies (who communicates with whom) - MPs, based on co-voting patterns - networks of companies
Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike. - http://opendefinition.org/ - http://opendatahandbook.org/en/what-is-open-data/index.html
- scrape the data - make it open - clean up the data - transform the data - make it usable [for the purpose]. How do we define an edge?
We want to choose the parts of the data from which we can deduce something: simple procedural decisions are out. We chose voting instances where there were notable differences of opinion. Noise = MPs who voted only a few times (this throws off the percentages). --- Some votes are more important than others.
Harmony Centre; Greens / Farmers: choice of (a) joining one of the two clusters; (b) isolation; (c) bridging between them
Strong voting discipline in the Harmony Centre. The majority of the rest do not vote the same (at this value of n%).
Far opposition / near opposition / coalition. Looks pretty, but does not give much useful information: almost a full graph.
Does it look right at first sight? (the “sniff test”) Show to domain experts. People can make pretty graphs, but what do they mean? What can we explain or show via them?
The Greens / Farmers party is bridging between the strong opposition party Harmony Centre and the ruling coalition: sometimes they agree with the opposition, sometimes with the coalition. See slides 21, 22 regarding the “live animation” showing what happens if you take them off the graph.
Learned from experts: not everything appears as a vote; some votes are more important than others. More insights -> better visualisations (more truthful, etc.). Some advanced visualisations will need more information, e.g., to define which laws are on which topics. Bringing in more data: - annotate nodes & edges with additional data / explanations of why an edge appears here - profiles for members of parliament (e.g., the TheyWorkForYou site in the UK) - linked data
Another example of an open data graph visualisation
Another view of this data: http://www.slideshare.net/DERIGalway/valdis-krebs-social-network-analysis-19872007/15 The central red cluster corresponds to the company headquarters. Each vertex in the network represents an employee, colored according to the location they work at. Graph edges denote frequent, confirmed, work-related communications between employees. Cluster overlaps reveal which employees frequently interact with other locations, serving as boundary-spanners. This visualization helps to identify key connectors in the company [0].
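The figure identifies boundary-spanners visually, from cluster overlaps. A simpler edge-based proxy, offered as an assumption rather than the method behind the figure, flags every employee with at least one link to someone at a different location. The names and locations below are invented.

```python
def boundary_spanners(edges, location):
    """Return employees with at least one edge to someone at a different location."""
    spanners = set()
    for a, b in edges:
        if location[a] != location[b]:
            spanners.update((a, b))  # both ends of a cross-location edge span a boundary
    return sorted(spanners)

# Invented example: two locations, one cross-location link.
location = {"ann": "HQ", "bob": "HQ", "cai": "Riga", "dan": "Riga"}
edges = [("ann", "bob"), ("bob", "cai"), ("cai", "dan")]
print(boundary_spanners(edges, location))  # ['bob', 'cai']
```

Unlike the cluster-overlap view, this proxy counts a single cross-location link the same as many, so it is only a first filter for finding key connectors.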
What do we do with these visualisations next? That is, how do we use them (to have impact, explain data, …)?
Social network visualisation & analysis let us see what was previously invisible. “Social Network Analysis” talk by Valdis Krebs: for more info on SNA and network visualization
Demo of how the Greens / Farmers party is bridging between the strong opposition Harmony Centre and the ruling coalition: sometimes they agree with the opposition, sometimes with the coalition (edge connection criterion n = 25%).
Demo of how the Greens / Farmers party is bridging between the strong opposition Harmony Centre and the ruling coalition. When the Greens / Farmers party nodes are hidden from the graph, there is no connection: the coalition and the Harmony Centre do not vote the same.