SlideShare ist ein Scribd-Unternehmen logo
1 von 50
Big Data and the Future of
Knowledge
Eric T. Meyer; Ralph Schroeder
Collaborators:
Linnet Taylor, Josh Cowls, Greg Taylor, Monica Bulger
SDP, July, 2014
Overview
• Projects
• Questions
• Issues
• Definition
• How knowledge advances
• Examples
• Big Data Issues in Research and Beyond
• Policy Implications
• The Future of Social Science Knowledge
Accessing and Using Big Data to Advance
Social Science Knowledge
• Funded by Sloan Foundation
• Data sources
• 100+ interviews, mainly with social scientists
• Reports, workshops
• Publications, conferences
• No representative sample, but some patterns of
disciplinary and skills background and career
trajectory
See http://www.oii.ox.ac.uk/research/projects/?id=98
Data-driven economic models: challenges
and opportunities of big data
• Funded by Research Councils UK (RCUK),
New Economic Models in the Digital
Economy (NEMODE) network
• Data Sources:
– 25+ interviews
– Case studies
– Issues include how models relate to national
contexts (ie. privacy laws in Germany), where
skills are located (plus gaps), use of
public/private data, standardization
Source: http://www.forbes.com/sites/davefeinleib/2012/06/19/the-big-data-landscape/
Source: Leonard John Matthews, CC-BY-SA (http://www.flickr.com/photos/mythoto/3033590171)
Image source: Leonard John Matthews, CC-BY-SA (http://www.flickr.com/photos/mythoto/3033590171)
Spurious Correlations
Narrative vs. Database
http://www.hollywoodreporter.com/news/barack-obama-mitt-romney-presidential-
campaign-twitter-facebook-361929
Twitter-bots
OII master’s students Alexander Furnas and Devin Gaffney saw a large spike in then-US
presidential candidate Mitt Romney’sTwitter followers, and decided to look at the new
followers:
Furnas, A. and Gaffney, D. (2012). ‘Statistical Probability That Mitt Romney's New Twitter Followers Are Just Normal Users: 0%’. The Atlantic, July 31,
http://www.theatlantic.com/technology/archive/2012/07/statistical-probability-that-mitt-romneys-new-twitter-followers-are-just-normal-users-0/260539/ (accessed August 31, 2012).
Google Images: Big Data
Big data in the commercial world
• Commercial uses are: ‘in house’,
‘outsourced own data’, ‘data analysis as a
consultancy service’
• Careers in data analysis entail as a baseline
computer science/statistical expertise, plus
different domains of ‘sorting people’ and
being able to ‘manipulate’ them (ie.
predict their behaviour)
Definition
• ‘Big data’
– the advance of knowledge via a leap in the scale
and scope in relation to a given object or
phenomenon
‘Data’
– Belongs to the object
– ‘taking…before interpreting’ (Ian Hacking)
• the view that ‘all data are of their nature interpreted’ is
misleading: ‘data are made, but as a good first
approximation, the making and taking come before
interpreting’
– The most atomizable useful unit of analysis
Computational Manipulability?
• ‘the distinctiveness of the network of mathematical
practitioners is that they focus their attention on the pure,
contentless form of human communicative operations: on
the gestures of marking items as equivalent and of ordering
them in series, and on the higher-order operations which
reflexively investigate the combinations of such operations’
• ‘mathematical rapid-discovery science…the lineage of
techniques for manipulating formal symbols representing
classes of communicative operations’
• Why is big data a big deal? Manipulability, plus new data
sources
Research computing
The Grid
Supercomputing
Clouds
Big Data
Web 2.0
Digital Objects and their Referents
Digital Object
(Examples: Twitter,
Tesco Loyalty card
information
Real World
(People / Physical
Objects)
Represent / Manipulate
Representing
Manipulating
Limits
Digital Data
010101 Knowledge
Uses and Limits
• Big data research uses (academic, commercial, government) are limited to
the exploitation of suitable objects, and the objects which ‘give off’ digital
data, and the phenomena they lay bare, are limited
• The knowledge produced is aimed at ‘sorting people’ and advancing
‘representing and intervening’ (but without ‘manipulating’, except where
this is warranted by practical economic and political objectives)
• Difference commercial versus academic world is that knowledge provides
competitive and practical advantage as against advancing (high-consensus
rapid-discovery) knowledge
– The limits in both cases are the objects (to which the data ‘belong’), and that need to
have available digitally manipulable data points
• How available these objects are differs, but also…
– Causation and theoretical embedding matters for academic social science
– For commercial (and non-academic uses), ‘predicting’ consumer choices and other
behaviours, for limited purposes and without increasing scientific knowledge, is good
enough
• There are many objects, for non-academics and scientists to humanities
scholars (physical, human, cultural), but they are not infinite
• This availability, not skills or other issues, determines the future of big data
research
113 240 278 367
558
1,195
1,538
2,350
3,960
6,787
7,276
9,010
-
1,000
2,000
3,000
4,000
5,000
6,000
7,000
8,000
9,000
10,000
1st Q 2nd Q 3rd Q 4th Q 1st Q 2nd Q 3rd Q 4th Q 1st Q 2nd Q 3rd Q 4th Q
2010
(n=998)
2011
(n=5,641)
2012
(n=27,033)
Number of News Articles on Big Data
Source: Nexis data compiled by Meyer & Schroeder
Platform Paper Size of Data in relation to
phenomenon investigated
Theoretical
question/practical aim
Key findings
Facebook Backstrom et al. (2012) 69 billion friendship links
between 721 million Facebook
users
Re-examine Milgram’s ‘six
degrees of separation’
online
Four degrees of separation on
Facebook
Ugander et al. (2012) 54 million invitation emails to
Facebook users
How does structure of
contacts affect invitation
acceptance?
Not number of contacts, but
number of distinct contexts,
matters for acceptance
Bond et al. (2012) 600000 Facebook users Facebook experiment about
how to mobilize voters
Voters can be mobilized via
Facebook friends more than via
informational messages
Twitter Kwak et al. (2010) 1.47 billion directed Twitter
relations
Is Twitter a broadcast
medium or a social
network?
Most use is for information, not
as a social network
Cha et al. (2010) 1.7 billion tweets among 54
million users
Who influences whom? Top influentials dominate, but
some variation by topic
Bakshy et al. (2011) 1.6 million Twitter users Who influences whom? ‘Ordinary user’ influencers can
sometimes be more effective
than top influencers
Wikipedia Loubser (2009) All Wikipedia activity How is editing organized? Administrators can impact
negatively on participation
Yasseri, Kertesz (2012) Editorial activity on Wikipedia,
especially reverts
Understanding conflict and
collaboration
Types of conflicts can be
modelled
West, Weber and Castillo
(2012)
Wikipedia contributions related
to Yahoo! browsing
What characterizes
Wikipedia contributors’
information behaviour
compared to Wikipedia
readers and non-readers
Wikipedia contributors are more
‘information hungry’, especially
about their topics
Example 1:
Search engine behaviour
Waller’s analysis ofAustralian Google Users
Key findings:
- Mainly leisure
- > 2% contemporary issues
- No perceptible ‘class’ differences
Novel advance:
- Unprecedented insight into what people search for
Challenge:
- Replicability
- Securing access to commercial data
?
?
?
?
?
?
?
?
?
“Surprisingly, the distribution of
types of search query did not vary
significantly across the different
Lifestyle Groups (p>0.01).”
Source: Waller, V. (2011). “Not Just Information:Who Searches for What on the Search Engine Google?” Journal of the American Society for Information Science &
Technology 62(4): 761-775.
Example 2:
Large-scale text analysis
Michel et al. ‘culturomic’ analysis of 5 Million Digitized Google
Books and Heuser & Le-Khac of 2779 19th Century British
Novels
Key findings:
- Patterns of key terms
- Industrialization tied to shift from abstract to concrete
words
Novel advance:
- Replicability, extension to other areas, systematic
analysis of cultural materials
Challenge:
- Data quality
J Michel et al. Science 2011;331:176-182
Example 3:
Social network or news?
Kwak et al.’s analysis ofTwitter
Key findings:
- 1.47 billion social relations
- 2/3 of users are not followers or not followed by any of their
followings
- Celebrities, politicians and news are among top 20 being followed
Novel advance:
-Volume of relations and topics
Challenge:
- News or social network needs to be contextualized in media
ecology
- Securing access to commercial data
Example 4: Wikipedia page views predict ticket sales
Mestyán, Márton, Taha Yasseri, and János Kertész. "Early prediction of movie box office success based
on Wikipedia activity big data." PloS one 8.8 (2013): e71226.
Example 5: Wikipedia edits
Graham’s focus is on geography, using geo-coded
Wikipedia edits
Key findings:
- asymmetries are about power relations (what is represented
or not represented, or what is visible and what stays invisible)
in relation to the world’s places and spaces
- persistent inequalities between global North and South
Novel advance:
- Tying geo-data to content updates
- As Yasseri et al (2012) point out: Wikipedia ‘is one of the few
human societies that the history of all actions of its members
are recorded and accessible’.
Challenge:
- Very incomplete geo-coding data
- Moving from visualization to theory
Mark Graham & Bernie
Hogan’s project
investigated inequalities
in the creation of
knowledge
The map shown here
reveals uneven spread of
geo-tagged Wikipedia
articles 2011-12
Example 6: British Library as Web Archive Curator
• Data
– Internet Archives data of .uk back to 1996
– Annual crawls of .uk websites since 2013
– 2.7 billion nodes, 40 TB compressed
• Features
– Full text search (in progress, IHR)
– Network analysis (OII)
– N-gram analysis
• Limitations
– Page content data access limited
Growth of sub-domains
N.B. y-axis on log scale
Relative sector size on the web
Sectoral linking on the UK Web
2010
(Big) data definition enables
pinpointing impacts and threats
• ‘Google Plus may not be much of a competitor to Facebook as
a social network, but…some analysts…say that Google
understands more about people’s social activity than
Facebook does.’
– New York Times, 15.2. 2014, p. A1 ‘The Plus in Google Plus? It’s Mostly for Google’.
• Facebook Likes: ‘Predicting users’ individual attributes and
preferences can beused to improve numerous products and
services. For instance, digital systems and devices (such as
online stores or cars) could be designed to adjust their
behavior to best fit each user’s inferred profile…online
insurance…advertisements might emphasize security when
facing emotionally unstable (neurotic) users but stress
potential threats when dealing with emotionally stable ones’
– ‘Private traits and attributes are predictable from digital records of human behavior.’ Kosinski M,
Stillwell D, Graepel T.,Proc Natl Acad Sci 2013 Apr 9;110(15):5802-5.
• More powerful knowledge will enable better services, and
more manipulation
‘Big data‘ for understanding society
• Real-time transactional data (unlike survey
data, traditional staple of social science)
• Outside capability of normal desktop
computing environment (‘Too big to
handle’)
• Big potential for understanding
institutions and individual behaviour
Social Science and Big Data
Research
• Dominated by social media
• Issues of ‘whole universe’
– What population, offline and online, does it
represent
– Data quality and replicability
– How does ‘modality’ determine findings about
implications
• How to embed the research
– In existing theory (but also advance theory)
– In existing ecology of media uses in society
(including ones that extend existing ones)
Scientificity and Big Data: Pro and
Con
• Pro
– Replicability, extension to new domain
– ‘Total’ datasets, ‘whole universe’
– (Often) no sampling needed, data for all behaviour and over
whole existence
– Ready made manipulability
– Powerful relation of data to object
• Con
– Limited access to object, skills needed for manipulability
– (Often) not known who users are
– No or little knowledge of how (commercial) data were gathered
– Researcher does not ask what is of interest without ‘givenness’
– Datasets capture limited dimensions, and about one object
– Object in isolation, not framed for social change significance
Other positions on Big Data
Implications 1
• Mayer-Schoenberger and Cukier, boyd and Crawford argue that not
all information can or should be captured
– No, need to create the legal and ethical social space which protects the
individual. The solution does not rely on denying the powerfulness of
knowledge, but harnessing it appropriately.
• Mayer-Schoenberger and Cukier solution of 1.more transparent
algorithm, 2. Certifiying validity of algorithm 3. Allowing
disprovability of prediction (p.176) –
– Yes, but within social science, solution is to make knowledge more
scientific.
• Underlying all these problems is more powerful knowledge
– This goes against free, untrammelled behaviour
– Solution: Society becomes more self-aware and shapes knowledge to
constrain it
• Crawford, Marwick: big data is product of neoliberal capitalism? No,
uses by different societies, and for purposes apart from ‘neoliberal
capitalist’ ones, such as open government data and Wikipedia
analysis
Other Positions on Big Data Implications 2
• Savage and Burrows: ask are commercial data outpacing
social science?
• Boyd and Crawford: does big data raise epistemological
conundrums, and isn’t it always already (social) contextual ?
• Mayer-Schoenberger and Cukier: what are the political and
commercial harms of wrong knowledge, especially when it
changes ‘everything’?
... No ...
• Knowledge depends on the relation between research
technologies and the advance of knowledge
• The threats and opportunities are not contextual, but
depend on how more powerful knowledge is used
• Big data contributes to more ‘scientific’ (i.e. cumulative)
social sciences, but within limits, and there are limits to
commercial and political uses too
Consumer (and gov’t) Big Data
• Consumer data and privacy (ie. Target pregnancy case)
– Solution: data protection
• Consumer data and prediction and control (ie. click
behaviour): affects consumer without transparency, predictive
privacy harm
– Solution: transparency, ‘due process’ (Crawford and Schultz)
• Consumer data – and government data - and exclusion from
benefits thereof (ie. no or little use of digital devices) - if not
captured by data, left out
– Solution: Data antisubordination (Lerman)
– Solution: government may need more data about us (and
counteract the data invisibility of parts of the population)
• Consumer data from digital media (ie. search engines) – manipulate
what is found without transparenyc, inappropriate personalization
(Pariser)
– Solution: transparency, consumer protection
Big Data and Policy
• Probabilistic rather than ‘causal’ commercial and
government uses of data (ie. profiling) - only probable,
not definite causal behaviour of data emitters
established (Mayer-Schoenberger and Cukier)
– Solution: more accurate knowledge
• Exposure of Data emitter because of identifiers in large-
scale and linked data (Netflix, AOL, Google Streetview,
National Security Administration), such that
anonymization does not work
– Solution: data protection, better anonymization,
opting out, consent
• Social media used in authoritarian regimes for control
(Weibo in China)
– Solution: more commercial independence, more civil
society pushback, researcher non-cooperation
Outlook and Implications
• There is an overlap between real world research and
the world of academic research which is closer than
elsewhere
– because this is the research front in both
– because they share common objects
• For research
– Develop theoretical frame in which to embed big data (for
social media), including power/function, relation to
traditional media, and role in society
• For society
– Awareness of how research can generate transparency and
manipulability
• Big Brother?
– Yes, but also Brave New World of Omniscience, with Social
Science as Handmaiden
Future of Big Data Research
• Difference commercial versus academic world is that
knowledge provides competitive advantage as against
advancing (high-consensus rapid-discovery) knowledge
• The limits in both cases are the objects (to which the data
‘belong’), and that need to have available digitally
manipulable data points
• How available these objects are differs
• There are many objects, for non-academics and scientists to
humanities scholars (physical, human, cultural), but they are
not infinite
• This availability, not skills or other issues, determines the
future of big data research
• A Golden Age of Quantification and New Sources of Data…A
Dark Age (so far) of understanding new online phenomena
and their social significance
The Future of Social Science
Knowledge
• Social science will be opened to other
disciplines, and be forced to consider its turf
• A major area of social science (‘big data’,
‘computational social science’) will have a
rapidly advancing research front and come to
prominence
• Three limits
– Only part of our lives are spent online and/or
digitally captured
– The social sciences are pluralist
– All (‘quant’) research advances must be made to
contribute to theoretical understandings of social
change (in ‘words’)
Additional readings and references
Bond, Robert et al. (2012). ‘A 61-million-person experiment in social influence and political mobilization’,
Nature 489: 295–298.
Bruns, A. and Liang,Y.E. (2012). ‘Tools and methods for capturingTwitter data during natural disasters’, First
Monday, 17 (4 – 2), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/3937/3193
Furnas, A. and Gaffney, D. (2012). ‘Statistical ProbabilityThat Mitt Romney's NewTwitter Followers Are Just
Normal Users: 0%’. The Atlantic, July 31, http://www.theatlantic.com/technology/archive/2012/07/statistical-
probability-that-mitt-romneys-new-twitter-followers-are-just-normal-users-0/260539/ (accessed August 31,
2012).
Giles, J. (2012). ‘Making the Links: From E-mails to Social Networks, the DigitalTraces left Life in the
ModernWorld areTransforming Social Science’, Nature, 488: 448-50.
Kwak, H. et al. (2010). ‘What isTwitter, a Social Network or a News Media?’ Proceedings of the 19th
InternationalWorldWide Web (WWW) Conference, April 26-30, 2010, Raleigh NC.
Manyika, J. et al. (2011). ‘Big data: the next frontier for innovation, competition and productivity’, McKinsey
Global Institute, available at: http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/
big_data_the_next_frontier_for_innovation (last accessed August 29, 2012).
Silver, Nate. (2012). The Signal and the Noise:The Art and Science of Prediction. London:Allen Lane.
Tancer, B. (2009). Click:What Millions of People are Doing Online andWhy It Matters. NewYork: Harper
Collins, 2009.
Wu, S. , J.M. Hofman,W.A. Mason, and D.J. Watts, (2011). ‘Who says what to whom on twitter’, Proceedings
of the 20th international conference onWorldWideWeb. (on DuncanWatts webpage,
http://research.microsoft.com/en-us/people/duncan/, last accessed August 29, 2012).
Project Papers
Schroeder, Ralph (Forthcoming). ‘Big Data: Towards a More Scientific Social Science and Humanities’ in Mark Graham and William H
Dutton (eds.), Society and the Internet: How Networks of Information are Changing our Lives. Forthcoming.
Schroeder, Ralph, & Taylor, Linnet (Forthcoming). ‘Is bigger better? The emergence of big data as a tool for international development
policy.’ GeoJournal.
Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, August). ‘Big Data in the Study of Twitter, Facebook and Wikipedia: On the Uses
and Disadvantages of Scientificity for Social Research.’ Paper presented at the proceedings of the Annual Meeting of the American
Sociological Association. (being submitted)
Schroeder, Ralph, & Taylor, Linnet. ‘Big Data and Wikipedia Research: Social Science Knowledge across Disciplinary Divides’. Submitted to
Information, Communication and Society.
Taylor, Linnet. ‘No place to hide? The ethics and analytics of tracking mobility using African mobile phone data. Submitted to Population,
Space and Place.
Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet. ‘Big Data in the Social Sciences: Towards a New Research Paradigm?’ (being
submitted).
Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, November). ‘The Boundaries of Big Data.’ Paper presented at SIG-SI Symposium,
ASIST 2013, November 1-6, 2013, Montreal, Quebec, Canada.
Schroeder, Ralph and Cowls, Josh. ‘Answering Questions and Questioning Answers in the Era of Big Data.’ In preparation.
Taylor, Linnet, Meyer, Eric T., & Schroeder, Ralph. ‘Bigger and better, or more of the same? Emerging practices and perspectives on big
data analysis in economics”. Forthcoming in Big Data & Society.
Cowls, Josh. ‘The Crowd in the Cloud?’, forthcoming presentation and IPP 2014’
Cowls, Josh ‘Big Data and Policy Implementation’, in preparation.
Schroeder, Ralph ‘Big Data and Policy Implications’, in preparation.
Oxford Internet Institute
With support from:
Ralph Schroeder
ralph.schroeder@oii.ox.ac.uk
http://www.oii.ox.ac.uk/people/?id=26
See http://www.oii.ox.ac.uk/research/projects/?id=98

Weitere ähnliche Inhalte

Was ist angesagt?

Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Big Data Spain
 
Text Mining, Term Mining, and Visualization - Improving the Impact of Scholar...
Text Mining, Term Mining, and Visualization - Improving the Impact of Scholar...Text Mining, Term Mining, and Visualization - Improving the Impact of Scholar...
Text Mining, Term Mining, and Visualization - Improving the Impact of Scholar...Access Innovations, Inc.
 
Big Data Content Organization, Discovery, and Management
Big Data Content Organization, Discovery, and ManagementBig Data Content Organization, Discovery, and Management
Big Data Content Organization, Discovery, and ManagementAccess Innovations, Inc.
 
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018Joe Keating
 
Smart IoT for Connected Manufacturing
Smart IoT for Connected ManufacturingSmart IoT for Connected Manufacturing
Smart IoT for Connected ManufacturingAmit Sheth
 
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...Sirris
 
Foresight by Online Communities - The Case of Renewable Energies
Foresight by Online Communities - The Case of Renewable EnergiesForesight by Online Communities - The Case of Renewable Energies
Foresight by Online Communities - The Case of Renewable EnergiesMichael Andreas Zeng
 
Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Amit Sheth
 
Conforming to Destiny or Adapting to Circumstance: The State of Cataloging in...
Conforming to Destiny or Adapting to Circumstance: The State of Cataloging in...Conforming to Destiny or Adapting to Circumstance: The State of Cataloging in...
Conforming to Destiny or Adapting to Circumstance: The State of Cataloging in...WiLS
 
From Big Data to the Big Picture
From Big Data to the Big PictureFrom Big Data to the Big Picture
From Big Data to the Big PictureSAGE Publishing
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so farEnrico Daga
 
Data Center Computing for Data Science: an evolution of machines, middleware,...
Data Center Computing for Data Science: an evolution of machines, middleware,...Data Center Computing for Data Science: an evolution of machines, middleware,...
Data Center Computing for Data Science: an evolution of machines, middleware,...Paco Nathan
 
UT Dallas CS - Rise of Crowd Computing
UT Dallas CS - Rise of Crowd ComputingUT Dallas CS - Rise of Crowd Computing
UT Dallas CS - Rise of Crowd ComputingMatthew Lease
 
Isolating values from big data with the help of four v’s
Isolating values from big data with the help of four v’sIsolating values from big data with the help of four v’s
Isolating values from big data with the help of four v’seSAT Journals
 

Was ist angesagt? (20)

Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
Data Science in 2016: Moving up by Paco Nathan at Big Data Spain 2015
 
Text Mining, Term Mining, and Visualization - Improving the Impact of Scholar...
Text Mining, Term Mining, and Visualization - Improving the Impact of Scholar...Text Mining, Term Mining, and Visualization - Improving the Impact of Scholar...
Text Mining, Term Mining, and Visualization - Improving the Impact of Scholar...
 
Big Data
Big Data Big Data
Big Data
 
Big Data Content Organization, Discovery, and Management
Big Data Content Organization, Discovery, and ManagementBig Data Content Organization, Discovery, and Management
Big Data Content Organization, Discovery, and Management
 
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018
Glantus Presentation: Ethical Data Science - BoI Analytics Connect 2018
 
Smart IoT for Connected Manufacturing
Smart IoT for Connected ManufacturingSmart IoT for Connected Manufacturing
Smart IoT for Connected Manufacturing
 
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
Sirris innovate2011 - Smart Products with smart data - introduction, Dr. Elen...
 
Big dataorig
Big dataorigBig dataorig
Big dataorig
 
Foresight by Online Communities - The Case of Renewable Energies
Foresight by Online Communities - The Case of Renewable EnergiesForesight by Online Communities - The Case of Renewable Energies
Foresight by Online Communities - The Case of Renewable Energies
 
Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...Smart Data - How you and I will exploit Big Data for personalized digital hea...
Smart Data - How you and I will exploit Big Data for personalized digital hea...
 
Conforming to Destiny or Adapting to Circumstance: The State of Cataloging in...
Conforming to Destiny or Adapting to Circumstance: The State of Cataloging in...Conforming to Destiny or Adapting to Circumstance: The State of Cataloging in...
Conforming to Destiny or Adapting to Circumstance: The State of Cataloging in...
 
Innovation ecosystems value co-creation
Innovation ecosystems value co-creationInnovation ecosystems value co-creation
Innovation ecosystems value co-creation
 
From Big Data to the Big Picture
From Big Data to the Big PictureFrom Big Data to the Big Picture
From Big Data to the Big Picture
 
Linked Data at the OU - the story so far
Linked Data at the OU - the story so farLinked Data at the OU - the story so far
Linked Data at the OU - the story so far
 
Data Center Computing for Data Science: an evolution of machines, middleware,...
Data Center Computing for Data Science: an evolution of machines, middleware,...Data Center Computing for Data Science: an evolution of machines, middleware,...
Data Center Computing for Data Science: an evolution of machines, middleware,...
 
Big data Paper
Big data PaperBig data Paper
Big data Paper
 
Big data trends in 2020
Big data trends in 2020Big data trends in 2020
Big data trends in 2020
 
Data Science and Urban Science @ UW
Data Science and Urban Science @ UWData Science and Urban Science @ UW
Data Science and Urban Science @ UW
 
UT Dallas CS - Rise of Crowd Computing
UT Dallas CS - Rise of Crowd ComputingUT Dallas CS - Rise of Crowd Computing
UT Dallas CS - Rise of Crowd Computing
 
Isolating values from big data with the help of four v’s
Isolating values from big data with the help of four v’sIsolating values from big data with the help of four v’s
Isolating values from big data with the help of four v’s
 

Andere mochten auch

Zook making sense of geosocial media-final
Zook making sense of geosocial media-finalZook making sense of geosocial media-final
Zook making sense of geosocial media-finaloiisdp
 
Jonathan bright - collecting social media data with the python programming la...
Jonathan bright - collecting social media data with the python programming la...Jonathan bright - collecting social media data with the python programming la...
Jonathan bright - collecting social media data with the python programming la...oiisdp
 
Rebecca eynon learning & interaction in moo cs
Rebecca eynon learning & interaction in moo csRebecca eynon learning & interaction in moo cs
Rebecca eynon learning & interaction in moo csoiisdp
 
2014 digital ethography_eric meyer
2014 digital ethography_eric meyer2014 digital ethography_eric meyer
2014 digital ethography_eric meyeroiisdp
 
Kathryn eccles digital humanities
Kathryn eccles   digital humanitiesKathryn eccles   digital humanities
Kathryn eccles digital humanitiesoiisdp
 
Grant blank
Grant blankGrant blank
Grant blankoiisdp
 
Rebecca eynon e research ethics 2014
Rebecca eynon e research ethics 2014Rebecca eynon e research ethics 2014
Rebecca eynon e research ethics 2014oiisdp
 
Joss wright
Joss wrightJoss wright
Joss wrightoiisdp
 
Luciano Floridi
Luciano FloridiLuciano Floridi
Luciano Floridioiisdp
 
Ian walden - data protection in cloud computing
Ian walden - data protection in cloud computingIan walden - data protection in cloud computing
Ian walden - data protection in cloud computingoiisdp
 

Andere mochten auch (10)

Zook making sense of geosocial media-final
Zook making sense of geosocial media-finalZook making sense of geosocial media-final
Zook making sense of geosocial media-final
 
Jonathan bright - collecting social media data with the python programming la...
Jonathan bright - collecting social media data with the python programming la...Jonathan bright - collecting social media data with the python programming la...
Jonathan bright - collecting social media data with the python programming la...
 
Rebecca eynon learning & interaction in moo cs
Rebecca eynon learning & interaction in moo csRebecca eynon learning & interaction in moo cs
Rebecca eynon learning & interaction in moo cs
 
2014 digital ethography_eric meyer
2014 digital ethography_eric meyer2014 digital ethography_eric meyer
2014 digital ethography_eric meyer
 
Kathryn eccles digital humanities
Kathryn eccles   digital humanitiesKathryn eccles   digital humanities
Kathryn eccles digital humanities
 
Grant blank
Grant blankGrant blank
Grant blank
 
Rebecca eynon e research ethics 2014
Rebecca eynon e research ethics 2014Rebecca eynon e research ethics 2014
Rebecca eynon e research ethics 2014
 
Joss wright
Joss wrightJoss wright
Joss wright
 
Luciano Floridi
Luciano FloridiLuciano Floridi
Luciano Floridi
 
Ian walden - data protection in cloud computing
Ian walden - data protection in cloud computingIan walden - data protection in cloud computing
Ian walden - data protection in cloud computing
 

Ähnlich wie Ralph schroeder and eric meyer

Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...Lauri Eloranta
 
International Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data ScienceInternational Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data Sciencedatasciencekorea
 
Making our mark: the important role of social scientists in the ‘era of big d...
Making our mark: the important role of social scientists in the ‘era of big d...Making our mark: the important role of social scientists in the ‘era of big d...
Making our mark: the important role of social scientists in the ‘era of big d...The Higher Education Academy
 
Decomposing Social and Semantic Networks in Emerging “Big Data” Research
Decomposing Social and Semantic Networks in Emerging “Big Data” ResearchDecomposing Social and Semantic Networks in Emerging “Big Data” Research
Decomposing Social and Semantic Networks in Emerging “Big Data” ResearchHan Woo PARK
 
Internet Archives and Social Science Research - Yeungnam University
Internet Archives and Social Science Research - Yeungnam UniversityInternet Archives and Social Science Research - Yeungnam University
Internet Archives and Social Science Research - Yeungnam Universitymwe400
 
Sci 2011 big_data(30_may13)2nd revised _ loet
Sci 2011 big_data(30_may13)2nd revised _ loetSci 2011 big_data(30_may13)2nd revised _ loet
Sci 2011 big_data(30_may13)2nd revised _ loetHan Woo PARK
 
Data Policy for Open Science
Data Policy for Open ScienceData Policy for Open Science
Data Policy for Open ScienceMark Parsons
 
The End(s) of e-Research
The End(s) of e-ResearchThe End(s) of e-Research
The End(s) of e-ResearchEric Meyer
 
Studying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & BiasStudying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & Biasgloriakt
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisFarida Vis
 
Meyer Big Data SDP13
Meyer Big Data SDP13Meyer Big Data SDP13
Meyer Big Data SDP13Eric Meyer
 
A Revolution in Open Science: Open Data and the Role of Libraries (Professor ...
A Revolution in Open Science: Open Data and the Role of Libraries (Professor ...A Revolution in Open Science: Open Data and the Role of Libraries (Professor ...
A Revolution in Open Science: Open Data and the Role of Libraries (Professor ...LIBER Europe
 
Big data divided (24 march2014)
Big data divided (24 march2014)Big data divided (24 march2014)
Big data divided (24 march2014)Han Woo PARK
 
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014Jisc
 
Mapping Movements: Social movement research and big data: critiques and alter...
Mapping Movements: Social movement research and big data: critiques and alter...Mapping Movements: Social movement research and big data: critiques and alter...
Mapping Movements: Social movement research and big data: critiques and alter...Tim Highfield
 
Big data for qualitative research by kathy a. mills (z lib.org)
Big data for qualitative research by kathy a. mills (z lib.org)Big data for qualitative research by kathy a. mills (z lib.org)
Big data for qualitative research by kathy a. mills (z lib.org)MiguelRosario24
 

Ähnlich wie Ralph schroeder and eric meyer (20)

Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
Ethical and Legal Issues in Computational Social Science - Lecture 7 in Intro...
 
International Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data ScienceInternational Collaboration Networks in the Emerging (Big) Data Science
International Collaboration Networks in the Emerging (Big) Data Science
 
Big Data Research Trend and Forecast (2005-2015): An Informetrics Perspective
Big Data Research Trend and Forecast (2005-2015): An Informetrics PerspectiveBig Data Research Trend and Forecast (2005-2015): An Informetrics Perspective
Big Data Research Trend and Forecast (2005-2015): An Informetrics Perspective
 
Making our mark: the important role of social scientists in the ‘era of big d...
Making our mark: the important role of social scientists in the ‘era of big d...Making our mark: the important role of social scientists in the ‘era of big d...
Making our mark: the important role of social scientists in the ‘era of big d...
 
Decomposing Social and Semantic Networks in Emerging “Big Data” Research
Decomposing Social and Semantic Networks in Emerging “Big Data” ResearchDecomposing Social and Semantic Networks in Emerging “Big Data” Research
Decomposing Social and Semantic Networks in Emerging “Big Data” Research
 
Internet Archives and Social Science Research - Yeungnam University
Internet Archives and Social Science Research - Yeungnam UniversityInternet Archives and Social Science Research - Yeungnam University
Internet Archives and Social Science Research - Yeungnam University
 
Sci 2011 big_data(30_may13)2nd revised _ loet
Sci 2011 big_data(30_may13)2nd revised _ loetSci 2011 big_data(30_may13)2nd revised _ loet
Sci 2011 big_data(30_may13)2nd revised _ loet
 
Data Policy for Open Science
Data Policy for Open ScienceData Policy for Open Science
Data Policy for Open Science
 
Data Policy for Open Science
Data Policy for Open ScienceData Policy for Open Science
Data Policy for Open Science
 
The End(s) of e-Research
The End(s) of e-ResearchThe End(s) of e-Research
The End(s) of e-Research
 
Studying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & BiasStudying Cybercrime: Raising Awareness of Objectivity & Bias
Studying Cybercrime: Raising Awareness of Objectivity & Bias
 
Researching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media AnalysisResearching Social Media – Big Data and Social Media Analysis
Researching Social Media – Big Data and Social Media Analysis
 
Meyer Big Data SDP13
Meyer Big Data SDP13Meyer Big Data SDP13
Meyer Big Data SDP13
 
A Revolution in Open Science: Open Data and the Role of Libraries (Professor ...
A Revolution in Open Science: Open Data and the Role of Libraries (Professor ...A Revolution in Open Science: Open Data and the Role of Libraries (Professor ...
A Revolution in Open Science: Open Data and the Role of Libraries (Professor ...
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
 
Big data divided (24 march2014)
Big data divided (24 march2014)Big data divided (24 march2014)
Big data divided (24 march2014)
 
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
Big Data for the Social Sciences - David De Roure - Jisc Digital Festival 2014
 
Mapping Movements: Social movement research and big data: critiques and alter...
Mapping Movements: Social movement research and big data: critiques and alter...Mapping Movements: Social movement research and big data: critiques and alter...
Mapping Movements: Social movement research and big data: critiques and alter...
 
Bowdoin: Data Driven Socities 2014 - Defining Data & Redefining Privacy 2/10/14
Bowdoin: Data Driven Socities 2014 - Defining Data & Redefining Privacy 2/10/14Bowdoin: Data Driven Socities 2014 - Defining Data & Redefining Privacy 2/10/14
Bowdoin: Data Driven Socities 2014 - Defining Data & Redefining Privacy 2/10/14
 
Big data for qualitative research by kathy a. mills (z lib.org)
Big data for qualitative research by kathy a. mills (z lib.org)Big data for qualitative research by kathy a. mills (z lib.org)
Big data for qualitative research by kathy a. mills (z lib.org)
 

Ralph schroeder and eric meyer

  • 1. Big Data and the Future of Knowledge Eric T. Meyer; Ralph Schroeder Collaborators: Linnet Taylor, Josh Cowls, Greg Taylor, Monica Bulger SDP, July, 2014
  • 2. Overview • Projects • Questions • Issues • Definition • How knowledge advances • Examples • Big Data Issues in Research and Beyond • Policy Implications • The Future of Social Science Knowledge
  • 3. Accessing and Using Big Data to Advance Social Science Knowledge • Funded by Sloan Foundation • Data sources • 100+ interviews, mainly with social scientists • Reports, workshops • Publications, conferences • No representative sample, but some patterns of disciplinary and skills background and career trajectory
  • 5. Data-driven economic models: challenges and opportunities of big data • Funded by Research Councils UK (RCUK), New Economic Models in the Digital Economy (NEMODE) network • Data Sources: – 25+ interviews – Case studies – Issues include how models relate to national contexts (ie. privacy laws in Germany), where skills are located (plus gaps), use of public/private data, standardization
  • 7. Source: Leonard John Matthews, CC-BY-SA (http://www.flickr.com/photos/mythoto/3033590171)
  • 8. Image source: Leonard John Matthews, CC-BY-SA (http://www.flickr.com/photos/mythoto/3033590171)
  • 11. Twitter-bots OII master’s students Alexander Furnas and Devin Gaffney saw a large spike in then-US presidential candidate Mitt Romney’sTwitter followers, and decided to look at the new followers: Furnas, A. and Gaffney, D. (2012). ‘Statistical Probability That Mitt Romney's New Twitter Followers Are Just Normal Users: 0%’. The Atlantic, July 31, http://www.theatlantic.com/technology/archive/2012/07/statistical-probability-that-mitt-romneys-new-twitter-followers-are-just-normal-users-0/260539/ (accessed August 31, 2012).
  • 13. Big data in the commercial world • Commercial uses are: ‘in house’, ‘outsourced own data’, ‘data analysis as a consultancy service’ • Careers in data analysis entail as a baseline computer science/statistical expertise, plus different domains of ‘sorting people’ and being able to ‘manipulate’ them (ie. predict their behaviour)
  • 14. Definition • ‘Big data’ – the advance of knowledge via a leap in the scale and scope in relation to a given object or phenomenon ‘Data’ – Belongs to the object – ‘taking…before interpreting’ (Ian Hacking) • the view that ‘all data are of their nature interpreted’ is misleading: ‘data are made, but as a good first approximation, the making and taking come before interpreting’ – The most atomizable useful unit of analysis
  • 15. Computational Manipulability? • ‘the distinctiveness of the network of mathematical practitioners is that they focus their attention on the pure, contentless form of human communicative operations: on the gestures of marking items as equivalent and of ordering them in series, and on the higher-order operations which reflexively investigate the combinations of such operations’ • ‘mathematical rapid-discovery science…the lineage of techniques for manipulating formal symbols representing classes of communicative operations’ • Why is big data a big deal? Manipulability, plus new data sources
  • 17. Digital Objects and their Referents Digital Object (Examples: Twitter, Tesco Loyalty card information Real World (People / Physical Objects) Represent / Manipulate
  • 20. Uses and Limits • Big data research uses (academic, commercial, government) are limited to the exploitation of suitable objects, and the objects which ‘give off’ digital data, and the phenomena they lay bare, are limited • The knowledge produced is aimed at ‘sorting people’ and advancing ‘representing and intervening’ (but without ‘manipulating’, except where this is warranted by practical economic and political objectives) • Difference commercial versus academic world is that knowledge provides competitive and practical advantage as against advancing (high-consensus rapid-discovery) knowledge – The limits in both cases are the objects (to which the data ‘belong’), and that need to have available digitally manipulable data points • How available these objects are differs, but also… – Causation and theoretical embedding matters for academic social science – For commercial (and non-academic uses), ‘predicting’ consumer choices and other behaviours, for limited purposes and without increasing scientific knowledge, is good enough • There are many objects, for non-academics and scientists to humanities scholars (physical, human, cultural), but they are not infinite • This availability, not skills or other issues, determines the future of big data research
  • 21. 113 240 278 367 558 1,195 1,538 2,350 3,960 6,787 7,276 9,010 - 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000 10,000 1st Q 2nd Q 3rd Q 4th Q 1st Q 2nd Q 3rd Q 4th Q 1st Q 2nd Q 3rd Q 4th Q 2010 (n=998) 2011 (n=5,641) 2012 (n=27,033) Number of News Articles on Big Data Source: Nexis data compiled by Meyer & Schroeder
  • 22. Platform Paper Size of Data in relation to phenomenon investigated Theoretical question/practical aim Key findings Facebook Backstrom et al. (2012) 69 billion friendship links between 721 million Facebook users Re-examine Milgram’s ‘six degrees of separation’ online Four degrees of separation on Facebook Ugander et al. (2012) 54 million invitation emails to Facebook users How does structure of contacts affect invitation acceptance? Not number of contacts, but number of distinct contexts, matters for acceptance Bond et al. (2012) 600000 Facebook users Facebook experiment about how to mobilize voters Voters can be mobilized via Facebook friends more than via informational messages Twitter Kwak et al. (2010) 1.47 billion directed Twitter relations Is Twitter a broadcast medium or a social network? Most use is for information, not as a social network Cha et al. (2010) 1.7 billion tweets among 54 million users Who influences whom? Top influentials dominate, but some variation by topic Bakshy et al. (2011) 1.6 million Twitter users Who influences whom? ‘Ordinary user’ influencers can sometimes be more effective than top influencers Wikipedia Loubser (2009) All Wikipedia activity How is editing organized? Administrators can impact negatively on participation Yasseri, Kertesz (2012) Editorial activity on Wikipedia, especially reverts Understanding conflict and collaboration Types of conflicts can be modelled West, Weber and Castillo (2012) Wikipedia contributions related to Yahoo! browsing What characterizes Wikipedia contributors’ information behaviour compared to Wikipedia readers and non-readers Wikipedia contributors are more ‘information hungry’, especially about their topics
  • 23. Example 1: Search engine behaviour Waller’s analysis ofAustralian Google Users Key findings: - Mainly leisure - > 2% contemporary issues - No perceptible ‘class’ differences Novel advance: - Unprecedented insight into what people search for Challenge: - Replicability - Securing access to commercial data
  • 24. ? ? ? ? ? ? ? ? ? “Surprisingly, the distribution of types of search query did not vary significantly across the different Lifestyle Groups (p>0.01).” Source: Waller, V. (2011). “Not Just Information:Who Searches for What on the Search Engine Google?” Journal of the American Society for Information Science & Technology 62(4): 761-775.
  • 25. Example 2: Large-scale text analysis Michel et al. ‘culturomic’ analysis of 5 Million Digitized Google Books and Heuser & Le-Khac of 2779 19th Century British Novels Key findings: - Patterns of key terms - Industrialization tied to shift from abstract to concrete words Novel advance: - Replicability, extension to other areas, systematic analysis of cultural materials Challenge: - Data quality
  • 26. J Michel et al. Science 2011;331:176-182
  • 27. Example 3: Social network or news? Kwak et al.’s analysis ofTwitter Key findings: - 1.47 billion social relations - 2/3 of users are not followers or not followed by any of their followings - Celebrities, politicians and news are among top 20 being followed Novel advance: -Volume of relations and topics Challenge: - News or social network needs to be contextualized in media ecology - Securing access to commercial data
  • 28. Example 4: Wikipedia page views predict ticket sales Mestyán, Márton, Taha Yasseri, and János Kertész. "Early prediction of movie box office success based on Wikipedia activity big data." PloS one 8.8 (2013): e71226.
  • 29. Example 5: Wikipedia edits Graham’s focus is on geography, using geo-coded Wikipedia edits Key findings: - asymmetries are about power relations (what is represented or not represented, or what is visible and what stays invisible) in relation to the world’s places and spaces - persistent inequalities between global North and South Novel advance: - Tying geo-data to content updates - As Yasseri et al (2012) point out: Wikipedia ‘is one of the few human societies that the history of all actions of its members are recorded and accessible’. Challenge: - Very incomplete geo-coding data - Moving from visualization to theory
  • 30. Mark Graham & Bernie Hogan’s project investigated inequalities in the creation of knowledge The map shown here reveals uneven spread of geo-tagged Wikipedia articles 2011-12
  • 31. Example 6: British Library as Web Archive Curator • Data – Internet Archives data of .uk back to 1996 – Annual crawls of .uk websites since 2013 – 2.7 billion nodes, 40 TB compressed • Features – Full text search (in progress, IHR) – Network analysis (OII) – N-gram analysis • Limitations – Page content data access limited
  • 32. Growth of sub-domains N.B. y-axis on log scale
  • 33. Relative sector size on the web
  • 34. Sectoral linking on the UK Web 2010
  • 35. (Big) data definition enables pinpointing impacts and threats • ‘Google Plus may not be much of a competitor to Facebook as a social network, but…some analysts…say that Google understands more about people’s social activity than Facebook does.’ – New York Times, 15.2. 2014, p. A1 ‘The Plus in Google Plus? It’s Mostly for Google’. • Facebook Likes: ‘Predicting users’ individual attributes and preferences can beused to improve numerous products and services. For instance, digital systems and devices (such as online stores or cars) could be designed to adjust their behavior to best fit each user’s inferred profile…online insurance…advertisements might emphasize security when facing emotionally unstable (neurotic) users but stress potential threats when dealing with emotionally stable ones’ – ‘Private traits and attributes are predictable from digital records of human behavior.’ Kosinski M, Stillwell D, Graepel T.,Proc Natl Acad Sci 2013 Apr 9;110(15):5802-5. • More powerful knowledge will enable better services, and more manipulation
  • 36. ‘Big data‘ for understanding society • Real-time transactional data (unlike survey data, traditional staple of social science) • Outside capability of normal desktop computing environment (‘Too big to handle’) • Big potential for understanding institutions and individual behaviour
  • 37. Social Science and Big Data Research • Dominated by social media • Issues of ‘whole universe’ – What population, offline and online, does it represent – Data quality and replicability – How does ‘modality’ determine findings about implications • How to embed the research – In existing theory (but also advance theory) – In existing ecology of media uses in society (including ones that extend existing ones)
  • 38. Scientificity and Big Data: Pro and Con • Pro – Replicability, extension to new domain – ‘Total’ datasets, ‘whole universe’ – (Often) no sampling needed, data for all behaviour and over whole existence – Ready made manipulability – Powerful relation of data to object • Con – Limited access to object, skills needed for manipulability – (Often) not known who users are – No or little knowledge of how (commercial) data were gathered – Researcher does not ask what is of interest without ‘givenness’ – Datasets capture limited dimensions, and about one object – Object in isolation, not framed for social change significance
  • 39. Other positions on Big Data Implications 1 • Mayer-Schoenberger and Cukier, boyd and Crawford argue that not all information can or should be captured – No, need to create the legal and ethical social space which protects the individual. The solution does not rely on denying the powerfulness of knowledge, but harnessing it appropriately. • Mayer-Schoenberger and Cukier solution of 1.more transparent algorithm, 2. Certifiying validity of algorithm 3. Allowing disprovability of prediction (p.176) – – Yes, but within social science, solution is to make knowledge more scientific. • Underlying all these problems is more powerful knowledge – This goes against free, untrammelled behaviour – Solution: Society becomes more self-aware and shapes knowledge to constrain it • Crawford, Marwick: big data is product of neoliberal capitalism? No, uses by different societies, and for purposes apart from ‘neoliberal capitalist’ ones, such as open government data and Wikipedia analysis
  • 40. Other Positions on Big Data Implications 2 • Savage and Burrows: ask are commercial data outpacing social science? • Boyd and Crawford: does big data raise epistemological conundrums, and isn’t it always already (social) contextual ? • Mayer-Schoenberger and Cukier: what are the political and commercial harms of wrong knowledge, especially when it changes ‘everything’? ... No ... • Knowledge depends on the relation between research technologies and the advance of knowledge • The threats and opportunities are not contextual, but depend on how more powerful knowledge is used • Big data contributes to more ‘scientific’ (i.e. cumulative) social sciences, but within limits, and there are limits to commercial and political uses too
  • 41. Consumer (and gov’t) Big Data • Consumer data and privacy (ie. Target pregnancy case) – Solution: data protection • Consumer data and prediction and control (ie. click behaviour): affects consumer without transparency, predictive privacy harm – Solution: transparency, ‘due process’ (Crawford and Schultz) • Consumer data – and government data - and exclusion from benefits thereof (ie. no or little use of digital devices) - if not captured by data, left out – Solution: Data antisubordination (Lerman) – Solution: government may need more data about us (and counteract the data invisibility of parts of the population) • Consumer data from digital media (ie. search engines) – manipulate what is found without transparenyc, inappropriate personalization (Pariser) – Solution: transparency, consumer protection
  • 42. Big Data and Policy • Probabilistic rather than ‘causal’ commercial and government uses of data (ie. profiling) - only probable, not definite causal behaviour of data emitters established (Mayer-Schoenberger and Cukier) – Solution: more accurate knowledge • Exposure of Data emitter because of identifiers in large- scale and linked data (Netflix, AOL, Google Streetview, National Security Administration), such that anonymization does not work – Solution: data protection, better anonymization, opting out, consent • Social media used in authoritarian regimes for control (Weibo in China) – Solution: more commercial independence, more civil society pushback, researcher non-cooperation
  • 43.
  • 44.
  • 45. Outlook and Implications • There is an overlap between real world research and the world of academic research which is closer than elsewhere – because this is the research front in both – because they share common objects • For research – Develop theoretical frame in which to embed big data (for social media), including power/function, relation to traditional media, and role in society • For society – Awareness of how research can generate transparency and manipulability • Big Brother? – Yes, but also Brave New World of Omniscience, with Social Science as Handmaiden
  • 46. Future of Big Data Research • Difference commercial versus academic world is that knowledge provides competitive advantage as against advancing (high-consensus rapid-discovery) knowledge • The limits in both cases are the objects (to which the data ‘belong’), and that need to have available digitally manipulable data points • How available these objects are differs • There are many objects, for non-academics and scientists to humanities scholars (physical, human, cultural), but they are not infinite • This availability, not skills or other issues, determines the future of big data research • A Golden Age of Quantification and New Sources of Data…A Dark Age (so far) of understanding new online phenomena and their social significance
  • 47. The Future of Social Science Knowledge • Social science will be opened to other disciplines, and be forced to consider its turf • A major area of social science (‘big data’, ‘computational social science’) will have a rapidly advancing research front and come to prominence • Three limits – Only part of our lives are spent online and/or digitally captured – The social sciences are pluralist – All (‘quant’) research advances must be made to contribute to theoretical understandings of social change (in ‘words’)
  • 48. Additional readings and references Bond, Robert et al. (2012). ‘A 61-million-person experiment in social influence and political mobilization’, Nature 489: 295–298. Bruns, A. and Liang,Y.E. (2012). ‘Tools and methods for capturingTwitter data during natural disasters’, First Monday, 17 (4 – 2), http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/3937/3193 Furnas, A. and Gaffney, D. (2012). ‘Statistical ProbabilityThat Mitt Romney's NewTwitter Followers Are Just Normal Users: 0%’. The Atlantic, July 31, http://www.theatlantic.com/technology/archive/2012/07/statistical- probability-that-mitt-romneys-new-twitter-followers-are-just-normal-users-0/260539/ (accessed August 31, 2012). Giles, J. (2012). ‘Making the Links: From E-mails to Social Networks, the DigitalTraces left Life in the ModernWorld areTransforming Social Science’, Nature, 488: 448-50. Kwak, H. et al. (2010). ‘What isTwitter, a Social Network or a News Media?’ Proceedings of the 19th InternationalWorldWide Web (WWW) Conference, April 26-30, 2010, Raleigh NC. Manyika, J. et al. (2011). ‘Big data: the next frontier for innovation, competition and productivity’, McKinsey Global Institute, available at: http://www.mckinsey.com/insights/mgi/research/technology_and_innovation/ big_data_the_next_frontier_for_innovation (last accessed August 29, 2012). Silver, Nate. (2012). The Signal and the Noise:The Art and Science of Prediction. London:Allen Lane. Tancer, B. (2009). Click:What Millions of People are Doing Online andWhy It Matters. NewYork: Harper Collins, 2009. Wu, S. , J.M. Hofman,W.A. Mason, and D.J. Watts, (2011). ‘Who says what to whom on twitter’, Proceedings of the 20th international conference onWorldWideWeb. (on DuncanWatts webpage, http://research.microsoft.com/en-us/people/duncan/, last accessed August 29, 2012).
  • 49. Project Papers Schroeder, Ralph (Forthcoming). ‘Big Data: Towards a More Scientific Social Science and Humanities’ in Mark Graham and William H Dutton (eds.), Society and the Internet: How Networks of Information are Changing our Lives. Forthcoming. Schroeder, Ralph, & Taylor, Linnet (Forthcoming). ‘Is bigger better? The emergence of big data as a tool for international development policy.’ GeoJournal. Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, August). ‘Big Data in the Study of Twitter, Facebook and Wikipedia: On the Uses and Disadvantages of Scientificity for Social Research.’ Paper presented at the proceedings of the Annual Meeting of the American Sociological Association. (being submitted) Schroeder, Ralph, & Taylor, Linnet. ‘Big Data and Wikipedia Research: Social Science Knowledge across Disciplinary Divides’. Submitted to Information, Communication and Society. Taylor, Linnet. ‘No place to hide? The ethics and analytics of tracking mobility using African mobile phone data. Submitted to Population, Space and Place. Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet. ‘Big Data in the Social Sciences: Towards a New Research Paradigm?’ (being submitted). Meyer, Eric T., Schroeder, Ralph, & Taylor, Linnet (2013, November). ‘The Boundaries of Big Data.’ Paper presented at SIG-SI Symposium, ASIST 2013, November 1-6, 2013, Montreal, Quebec, Canada. Schroeder, Ralph and Cowls, Josh. ‘Answering Questions and Questioning Answers in the Era of Big Data.’ In preparation. Taylor, Linnet, Meyer, Eric T., & Schroeder, Ralph. ‘Bigger and better, or more of the same? Emerging practices and perspectives on big data analysis in economics”. Forthcoming in Big Data & Society. Cowls, Josh. ‘The Crowd in the Cloud?’, forthcoming presentation and IPP 2014’ Cowls, Josh ‘Big Data and Policy Implementation’, in preparation. Schroeder, Ralph ‘Big Data and Policy Implications’, in preparation.
  • 50. Oxford Internet Institute With support from: Ralph Schroeder ralph.schroeder@oii.ox.ac.uk http://www.oii.ox.ac.uk/people/?id=26 See http://www.oii.ox.ac.uk/research/projects/?id=98