Exploring the Networks in Open Public Data




  1. Exploring the Networks in Open Public Data. Uldis Bojārs, Institute of Mathematics and Computer Science, University of Latvia. Using Open Data Workshop, Brussels, 20-Jun-2012
  2. About us • Institute of Mathematics and Computer Science, University of Latvia – Uldis Bojārs @CaptSolo – Valdis Krebs – Pēteris Ručevskis
  3. Network visualisation and analysis. Applications: • discover interesting patterns • explore data in [more] detail. Work from the Open Data Hackathon in Riga: • analysis of Saeima voting patterns
  4. Overview • Data needs to be Open • Pre-processing and filtering the data – selecting what to show • Data visualization – iterative process (visualize, refine, repeat) • What’s next?
  5. Open Data needed first (!) “Open data is data that can be freely used, reused and redistributed by anyone …” Data needs to be: • open • easy to use. Still a problem in Latvia: • only a few datasets are open in an easy-to-consume form (PDF does not count :)
  6. 9DEA96450E79B7E5C2257944007E589D?OpenDocument
  7. Pre-processing • Input: – raw vote data (scraped from the website) published at • Output: – nodes (MPs) – edges (connections between them) • What is a connection?
  8. Defining graph connections • Connect MPs if they have voted similarly – disagreed on at most n% of decisions • Filter out cases where almost all MPs voted the same • Filter out trivial decisions • Filter out noise
  9. Node colour legend • Ruling coalition: – Zatlers’ Reform Party – Unity – the National Alliance • Opposition: – Harmony Centre – Greens / Farmers Party • a few non-party MPs
  10. MPs who always vote the same (n = 0%). Connection criteria too narrow
  11. MPs who disagree in less than 35% of cases. Connection criteria too broad (everyone agrees, really?)
  12. Refining the visualisation • Need to find the right cut-off values (n%) – where patterns [start to] appear – and the visualisation makes sense • Show the results to domain experts – MPs, journalists, political researchers, … • Experts: – help improve visualisations – can discover new things for themselves
  13. MPs who disagree in less than 11% of cases. Opposition parties [sometimes] vote the same
  14. MPs who disagree in less than 25% of cases. Bridges appear between the ruling coalition and opposition parties (see slides 21, 22 re the bridging role of yellow nodes)
  15. What next? • Improve our understanding of data • Enhance visualisations – add clusters, etc. • Create multiple visualisations – different topics, changes in time, etc. • Bring in more data – explain nodes & edges
  16. Network visualisation example #1: Donations to political parties innovation-happens-at-intersections.html
  17. Network visualisation example #2: Intra-company communication patterns
  18. Conclusion • Need more, useful Open Data • Discovering patterns, making sense of data – helping make sense = purpose of visualisations • Looking forward to collaboration re: – Using Open Data – Data Visualisation and Analysis
  19. More info • Uldis Bojārs • Social Network Analysis talk / Valdis Krebs valdis-krebs-social-network-analysis-19872007 • Smart Network Analyzer tool in development at IMCS, University of Latvia
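
The edge definition from slides 7 and 8 can be sketched in Python roughly as follows. This is a minimal illustration, not the code used for the Saeima analysis: the data layout (`votes[decision][mp]`), the function names `filter_decisions` and `build_edges`, and the `max_majority` and `min_shared` cut-offs are all assumptions made for the example.

```python
from itertools import combinations


def filter_decisions(votes, max_majority=0.9):
    """Drop decisions where almost all MPs voted the same way.

    votes: hypothetical layout, votes[decision_id][mp_name] = choice string.
    max_majority: assumed cut-off; a decision is dropped when the most
    common choice exceeds this share of the ballots cast.
    """
    kept = {}
    for decision, ballots in votes.items():
        if not ballots:
            continue
        counts = {}
        for choice in ballots.values():
            counts[choice] = counts.get(choice, 0) + 1
        if max(counts.values()) / len(ballots) <= max_majority:
            kept[decision] = ballots
    return kept


def build_edges(votes, n_percent, min_shared=3):
    """Connect two MPs if they disagreed on at most n_percent of the
    decisions they both voted on.

    min_shared: assumed noise filter; pairs with fewer shared decisions
    are skipped because a handful of votes distorts the percentage.
    """
    mps = sorted({mp for ballots in votes.values() for mp in ballots})
    edges = []
    for a, b in combinations(mps, 2):
        shared = [d for d, ballots in votes.items() if a in ballots and b in ballots]
        if len(shared) < min_shared:
            continue
        disagreements = sum(1 for d in shared if votes[d][a] != votes[d][b])
        if 100.0 * disagreements / len(shared) <= n_percent:
            edges.append((a, b))
    return edges
```

Calling `build_edges` on the filtered votes for several values of `n_percent` (for example 0, 11, 25 and 35, the cut-offs shown in slides 10 to 14) gives one edge list per cut-off, which is one way to carry out the "visualize, refine, repeat" loop.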

Editor's Notes

  • raw data is not always immediately useful to the wider public - using open data - discovering patterns - making sense of it
  • It’s worthwhile to explore networks that emerge from the data you’re looking at. Various kinds of networks: - people in companies (who communicates with whom) - MPs, based on co-voting patterns - companies (networks of)
  • Open data is data that can be freely used, reused and redistributed by anyone - subject only, at most, to the requirement to attribute and sharealike.
  • scrape the data - make it open - clean up the data - transform the data - make it usable [for the purpose]. How do we define an edge?
  • We want to choose those parts of the data from which we can deduce something - simple procedural decisions are out. Chose voting instances where there were notable differences of opinion. Noise = MPs who voted only a few times (throws off the percentages). Some votes are more important than others.
  • Harmony Centre. Greens/Farmers - choice: (a) join one of two clusters; (b) isolation; (c) bridge between them
  • strong voting discipline in the Harmony Centre. The majority of the rest do not vote the same (at this value of n%)
  • far opposition / near opposition / coalition. Looks pretty, but does not give much useful information - almost a full graph
  • does it look right at first sight? (the “sniff test”). Show to domain experts. People can make pretty graphs - but what do they mean? - what can we explain or show via them?
  • the Greens / Farmers party is bridging between the strong opposition party Harmony Centre and the ruling coalition - sometimes agree with the opposition, sometimes with the coalition. See slides 21, 22 re “live animation” showing what happens if you take them off the graph
  • learned from experts: not everything appears as a vote; some votes are more important than others - more insights -> better visualisations (more truthful, etc.). Some advanced visualisations will need more information - e.g., to define what laws are on what topics. Bringing in more data - annotate nodes & edges with additional data / explanations of why this edge appears here - profiles for members of parliament (e.g., the TheyWorkForYou site in the UK) - linked data
  • another example of an open data graph visualisation
  • another view of this data: the central red cluster corresponds to the company headquarters. Each vertex in the network represents an employee, colored according to the location they work at. Graph edges denote frequent, confirmed, work-related communications between employees. Cluster overlaps reveal which employees frequently interact with other locations, serving as boundary-spanners. This visualization helps to identify key connectors in the company [0].
  • what do we do with these visualisations next? = how do we use them (to have impact, explain data, …)
  • social network visualisation & analysis allow us to see what was previously invisible. “Social Network Analysis” talk by Valdis Krebs - for more info re SNA and network visualization
  • demo how the Greens / Farmers party is bridging between the strong opposition Harmony Centre and the ruling coalition - sometimes agree with the opposition, sometimes with the coalition - (edge connection criteria n = 25%)
  • demo how the Greens / Farmers party is bridging between the strong opposition Harmony Centre and the ruling coalition. When the Greens / Farmers party nodes are hidden from the graph, there is no connection - the coalition and the Harmony Centre do not vote the same
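
The bridging demo described in the last two notes amounts to a reachability check: hide one group of nodes and see whether the other two groups are still connected. A minimal Python sketch of that check is given here, assuming an edge list of MP pairs; the function name `groups_connected` and the toy node names are hypothetical and do not come from the Smart Network Analyzer tool.

```python
from collections import deque


def groups_connected(edges, group_a, group_b, hidden=frozenset()):
    """Return True if some node in group_a can reach some node in group_b
    along the edges, ignoring every node listed in hidden."""
    adjacency = {}
    for a, b in edges:
        if a in hidden or b in hidden:
            continue  # an edge disappears when either endpoint is hidden
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    targets = set(group_b) - set(hidden)
    queue = deque(node for node in group_a if node not in hidden)
    seen = set(queue)
    # Breadth-first search outward from every visible node of group_a.
    while queue:
        node = queue.popleft()
        if node in targets:
            return True
        for neighbour in adjacency.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return False


# Toy graph (made-up node names): g1 is the only link between the groups.
coalition = {"c1", "c2"}
opposition = {"h1", "h2"}
bridge = {"g1"}
toy_edges = [("c1", "c2"), ("h1", "h2"), ("c2", "g1"), ("g1", "h1")]
```

On this toy graph, `groups_connected(toy_edges, coalition, opposition)` holds while the bridge node is visible and fails once `hidden=bridge` is passed, which mirrors the behaviour the notes describe for the Greens / Farmers nodes at n = 25%.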