Overview of data collection methods and a deep dive into data (primary vs. secondary, qualitative vs. quantitative). Bias. Data processing and structured, unstructured and semi-structured data. An example of personal data tracking.
4. LESSON 2
THE PROJECT
Dear Data is a project by Giorgia Lupi and Stefanie Posavec, developed over 12
months (between 2014 and 2015) on both sides of the Atlantic, in London and
New York. Each week, the two designers collected and gave shape to a particular
kind of data (actions and thoughts, from the number of clocks they had seen to the
number of greetings they had made), made a drawing on a postcard and sent it,
dropping it into a postbox and a mailbox respectively. The front showed a drawing of
the data and the back displayed a key to decode it. A rite of observation and
translation, but also a personal documentary.
▸ https://it.moleskine.com/mind-maps-and-infographics/p0198
WHAT ARE DATA
Data are individual units of information.
A datum describes a single quality or quantity of some object or phenomenon.
Data are measured, collected, reported and analyzed, after which they can
be visualized using graphs, images or other analysis tools.
PRIMARY VS SECONDARY DATA
▸ Primary data is data that is observed or collected from first-hand sources
▸ Secondary data is data gathered from studies, surveys, or experiments that
have been run by other people
PRIMARY DATA PRO & CON
Pros:
▸ Tailored to the research needs
▸ The researcher can determine exactly what data will be collected
▸ Defined and consistent protocol
▸ Completeness of the data can be ensured
Cons:
▸ Time-consuming
▸ Relies on the subjects' recall and communication abilities
▸ Bias may occur due to various factors
▸ The reliability of raters needs to be checked
CC-BY-NC XKCD http://imgs.xkcd.com/comics/1_to_10.png
SECONDARY DATA PRO & CON
Pros:
▸ It is easier and quicker
▸ Absence of the researcher's own biases
▸ Economical and time-saving
▸ Participants' cooperation may not be necessary, which eliminates the biases
related to participant awareness
Cons:
▸ Accuracy, completeness and reliability depend on whoever originally
collected the data
▸ May not be suitable for answering the current research question
▸ Missing data and inaccuracies are common
▸ Biases from the original collection are to be expected
QUALITATIVE VS QUANTITATIVE
▸ Quantitative data comes in the form of numbers, quantities and values.
Pro: it’s concrete and easily measurable.
▸ Qualitative data is descriptive, based on attributes.
It helps explain the “why” behind what quantitative data reveals.
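As a rough illustration of the distinction, a dataset's fields can be sorted into quantitative (numeric values that support arithmetic) and qualitative (descriptive attributes that are summarized by counting categories). This is a minimal sketch: the records and field names are hypothetical, and the type check is only a heuristic.

```python
from collections import Counter
from numbers import Number

# Hypothetical survey records: each field is either a quantity or an attribute
records = [
    {"age": 24, "hours_online": 3.5, "city": "Milano", "favourite_app": "Instagram"},
    {"age": 31, "hours_online": 1.0, "city": "Como", "favourite_app": "YouTube"},
    {"age": 27, "hours_online": 5.25, "city": "Milano", "favourite_app": "Facebook"},
]

def classify_fields(rows):
    """Label a field 'quantitative' if every value is numeric, else 'qualitative'."""
    kinds = {}
    for field in rows[0]:
        values = [row[field] for row in rows]
        numeric = all(isinstance(v, Number) and not isinstance(v, bool) for v in values)
        kinds[field] = "quantitative" if numeric else "qualitative"
    return kinds

kinds = classify_fields(records)
print(kinds)

# Quantitative fields support arithmetic, e.g. an average...
mean_hours = sum(r["hours_online"] for r in records) / len(records)
# ...while qualitative fields are summarized by counting categories
city_counts = Counter(r["city"] for r in records)
print(mean_hours, city_counts)
```

Numbers can also encode categories (postal codes, phone prefixes), so in practice the meaning of a field, not just its type, decides which kind of data it is.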
PRIMARY DATA COLLECTION
▸ Observation
▸ Surveys & questionnaires
▸ Interviews
▸ Focus groups
PRIMARY DATA COLLECTION
▸ In-Person Interviews
Pros: In-depth, with a high degree of confidence in the data
Cons: Time-consuming, expensive, and can be dismissed as anecdotal
▸ Mail Surveys
Pros: Can reach anyone and everyone, with no access barrier
Cons: Expensive, data collection errors, lag time
▸ Phone Surveys
Pros: High degree of confidence in the data collected, can reach almost anyone
Cons: Expensive, cannot be self-administered, may require hiring an agency
▸ Web/Online Surveys
Pros: Cheap, can be self-administered, very low probability of data errors
Cons: Not all your customers may have an email address or be online, and customers may be wary of
divulging information online.
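Much of the low error rate of online surveys comes from input validation: the form can refuse answers outside a fixed set before they are ever stored. A minimal sketch, assuming a hypothetical form with a closed list of countries and a 1-10 rating scale (the field names `country` and `rating` are illustrative):

```python
# Hypothetical closed list of options offered by the form (e.g. a drop-down)
ALLOWED_COUNTRIES = {"Italy", "Spain", "China", "Colombia", "Montenegro"}

def validate_response(response: dict) -> list[str]:
    """Return a list of problems; an empty list means the response is accepted."""
    errors = []
    country = response.get("country")
    if country not in ALLOWED_COUNTRIES:
        errors.append(f"country must be one of {sorted(ALLOWED_COUNTRIES)}")
    rating = response.get("rating")
    # Reject missing values, non-integers (bool is excluded explicitly) and out-of-range scores
    if not isinstance(rating, int) or isinstance(rating, bool) or not 1 <= rating <= 10:
        errors.append("rating must be an integer from 1 to 10")
    return errors

print(validate_response({"country": "Italy", "rating": 7}))
print(validate_response({"country": "In cucina", "rating": 11}))
```

A free-text box accepts anything, so the same check has to be done later, during data cleaning.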
BIAS
Bias in data collection is a distortion that makes the information not truly representative
of the situation you are trying to investigate. Bias occurs, for example, when systematic error is
introduced into sampling or testing by selecting or encouraging one outcome or answer over others.
It can result from:
▸ survey questions that are constructed with a particular slant
▸ choosing a known group with a particular background to respond to surveys
▸ reporting data in misleading categorical groupings
▸ non-random selections when sampling
▸ systematic measurement errors
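The effect of non-random selection can be shown with a small simulation: asking only a known group gives a systematically wrong estimate, while a random sample of the same size lands near the true value. The population, groups and scores below are invented for illustration.

```python
import random
from statistics import mean

random.seed(42)  # fixed seed so the illustration is reproducible

# Invented population: 500 people from group A (score 10) and 500 from group B (score 20)
population = [("A", 10)] * 500 + [("B", 20)] * 500
true_mean = mean(score for _, score in population)

# Non-random selection: only a known group is asked to respond
biased_sample = [p for p in population if p[0] == "A"][:200]
biased_mean = mean(score for _, score in biased_sample)

# Simple random sample of the same size
random_sample = random.sample(population, 200)
random_mean = mean(score for _, score in random_sample)

print(f"true={true_mean} biased={biased_mean} random={random_mean:.2f}")
```

Collecting more answers from the same group would not help: the error is systematic, not random, so it does not shrink with sample size.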
DATA CLEANING - WHERE ARE YOU FROM?
▸ 91 answers
▸ 70 different values!
▸ "Nata in Calabria residente a Milano" (born in Calabria, living in Milan)
▸ Manzano, UD, Friuli Venezia Giulia
▸ "In cucina" (in the kitchen)
▸ Bollate vs Bollate (MI)
▸ sardegna - Puglia - Basilicata
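Part of the gap between 91 answers and 70 distinct values comes from spelling variants of the same place. A minimal sketch of a few automatic cleaning rules (trim spaces, drop a trailing province code such as "(MI)", keep the first comma-separated part, fix capitalization); the sample answers are modeled on the ones above, and the function name and rules are hypothetical:

```python
import re
from collections import Counter

# Sample raw answers in the style of those quoted above (illustrative, not the full survey)
raw_answers = [
    "Bollate",
    "Bollate (MI)",
    "  bollate ",
    "Milano",
    "milano",
    "Manzano, UD, Friuli Venezia Giulia",
    "In cucina",
]

# A trailing two-letter province code in parentheses, e.g. " (MI)"
PROVINCE_SUFFIX = re.compile(r"\s*\([A-Z]{2}\)\s*$")

def normalize_city(answer: str) -> str:
    """Hypothetical cleaning rule for the 'where are you from?' answers."""
    text = answer.strip()
    text = PROVINCE_SUFFIX.sub("", text)   # "Bollate (MI)" -> "Bollate"
    text = text.split(",")[0].strip()      # "Manzano, UD, ..." -> "Manzano"
    text = re.sub(r"\s+", " ", text)       # collapse repeated spaces
    return text.title()                    # "milano" -> "Milano"

print(len(set(raw_answers)), "raw values")
print(Counter(normalize_city(a) for a in raw_answers))
```

Rules like these only merge variants; answers such as "In cucina" or "sardegna - Puglia - Basilicata" still need a human decision.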
DATA CLEANING - COUNTRY
▸ 5 countries (Italy plus 4 others)
▸ 1 unknown ("In cucina")
Country #
In cucina 1
China 1
Colombia 1
Italy 86
Montenegro 1
Spain 1
Total 91
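A frequency table like this one can be built by mapping each cleaned answer through a lookup table and counting the results. Answers the lookup does not recognize are counted as "unknown" rather than silently dropped, so the total still equals the number of answers. The lookup excerpt and the sample answers are hypothetical.

```python
from collections import Counter

# Hypothetical lookup from cleaned answers to a country (an excerpt only)
PLACE_TO_COUNTRY = {
    "Milano": "Italy",
    "Bollate": "Italy",
    "Manzano": "Italy",
    "Madrid": "Spain",
}

def to_country(place: str) -> str:
    # Anything the lookup does not recognize stays visible as "unknown"
    return PLACE_TO_COUNTRY.get(place, "unknown")

cleaned = ["Milano", "Bollate", "Madrid", "Milano", "In Cucina"]
country_counts = Counter(to_country(p) for p in cleaned)

for country, n in sorted(country_counts.items()):
    print(country, n)
print("Total", sum(country_counts.values()))
```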
DATA CLEANING - REGION
Region #
- 4
Abruzzo 1
Basilicata 2
Calabria 5
Campania 2
Emilia Romagna 4
Friuli Venezia Giulia 5
Liguria 2
Lombardia 44
Marche 1
Piemonte 2
Puglia 4
Sardegna 2
Sicilia 4
Spain 1
Toscana 2
Veneto 6
Grand Total 91
DATA CLEANING - PROVINCE
Prov #
- 9
AG 1
AP 1
BG 2
BI 1
BS 2
CO 5
CT 1
CZ 3
EN 1
FE 1
GE 1
LE 1
LO 1
MB 1
ME 1
MI 25
MO 1
NA 1
NO 1
PD 1
PE 1
PO 1
PR 1
PZ 1
RC 2
RE 1
SA 1
SI 1
SO 2
SS 1
SV 1
TA 1
TS 2
TV 1
UD 3
VA 6
VE 1
VI 2
VR 1
Grand Total 91
Region Province #
- - 5
Abruzzo PE 1
Basilicata - 1
PZ 1
Calabria CZ 3
RC 2
Campania NA 1
SA 1
Emilia Romagna FE 1
MO 1
PR 1
RE 1
Friuli Venezia Giulia TS 2
UD 3
Liguria GE 1
SV 1
Lombardia BG 2
BS 2
CO 5
LO 1
MB 1
MI 25
SO 2
VA 6
Marche AP 1
Piemonte BI 1
NO 1
Puglia - 2
LE 1
TA 1
Sardegna - 1
SS 1
Sicilia AG 1
CT 1
EN 1
ME 1
Toscana PO 1
SI 1
Veneto PD 1
TV 1
VE 1
VI 2
VR 1
Grand Total 91
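Because every Italian province belongs to exactly one region, the region can be derived from the province code rather than collected separately, and grouping by both gives a nested table like the one above. A sketch using a hypothetical list of answers and only an excerpt of the real province-to-region mapping; codes the mapping does not know are grouped under "-":

```python
from collections import Counter, defaultdict

# Excerpt of the province -> region mapping (Italian province codes)
PROVINCE_TO_REGION = {
    "MI": "Lombardia", "CO": "Lombardia", "VA": "Lombardia",
    "UD": "Friuli Venezia Giulia", "TS": "Friuli Venezia Giulia",
    "NA": "Campania", "SA": "Campania",
}

def region_province_table(provinces):
    """Group province codes under their region; unknown codes go under '-'."""
    table = defaultdict(Counter)
    for prov in provinces:
        region = PROVINCE_TO_REGION.get(prov, "-")
        table[region][prov] += 1
    return table

# Hypothetical cleaned answers ("-" marks a missing province)
answers = ["MI", "MI", "CO", "UD", "TS", "TS", "NA", "-"]
table = region_province_table(answers)
for region in sorted(table):
    for prov, n in sorted(table[region].items()):
        print(region, prov, n)
print("Grand Total", sum(sum(c.values()) for c in table.values()))
```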
DATA CLEANING - CITY
Where are you from? #
- 8
Alghero 1
Amalfi 1
Ascoli Piceno 1
Bassano del Grappa 1
Bergamo 2
Biella 1
Bollate 2
Brescia 2
Busto Arsizio 2
Canicattì 1
Castellanza 1
Catania 1
Catanzaro 1
Cinisello Balsamo 1
Como 4
Enna 1
Ferrara 1
Genova 1
Lainate 1
Lamezia Terme 1
Legnano 1
Lignano Sabbiadoro 1
Lissone 1
Lurate Caccivio 1
Madrid 1
Manzano 1
Milano 18
Milazzo 1
Modena 1
Monza 1
Napoli 1
Novara 1
Padova 1
Parabiago 1
Parma 1
Pescara 1
Pietra Ligure 1
Potenza 1
Prato 1
Racale 1
Reggio Calabria 2
Reggio Emilia 1
Saronno 1
Sesto Calende 1
Siena 1
Somma Lombardo 1
Sondrio 2
Sordio 1
Soverato 1
Taranto 1
Treviso 1
Trieste 1
Trieste 1
Udine 1
Verona 2
Vicenza 1
Grand Total 91
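Two rows with the same visible label, such as the two "Trieste" rows above, usually mean that the underlying values differ in a way the table does not show. A trailing space or a different capitalization is one plausible cause, although the source does not say which. A small check that groups values by a normalized key and reports keys with more than one raw spelling (the sample values are hypothetical):

```python
from collections import defaultdict

def hidden_duplicates(values):
    """Map each normalized key to the distinct raw spellings that share it,
    keeping only keys with more than one spelling."""
    groups = defaultdict(set)
    for raw in values:
        key = " ".join(raw.split()).casefold()  # trim, collapse spaces, ignore case
        groups[key].add(raw)
    return {key: sorted(spellings) for key, spellings in groups.items() if len(spellings) > 1}

answers = ["Trieste", "Trieste ", "Milano", "milano", "Como"]
for key, spellings in hidden_duplicates(answers).items():
    print(key, [repr(s) for s in spellings])
```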
SECONDARY DATA SOURCES
▸ Our data:
▸ Personal information, likes, activities and interests (Facebook, Instagram,
YouTube, …)
▸ Personal data (from mobile phones)
APPLE HEALTH DATA
▸ Heart rate, sleeping habits, workouts,
steps and walking routines
▸ Introduced in September 2014 with iOS
8, the Apple Health app is pre-installed
on all iPhones.
▸ Low-energy sensors constantly collect
information about the user's physical
activity. With optional extra hardware
(e.g. Apple Watch), Apple Health can
collect significantly more information.
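Once exported, this kind of personal data can be processed like any other semi-structured file. A minimal sketch, assuming the export is XML whose `Record` elements carry `type`, `startDate` and `value` attributes as in the inline sample below (the sample values are invented); it totals steps per day with the standard library's `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Invented sample in the assumed layout of an Apple Health XML export
SAMPLE = """<HealthData>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          startDate="2024-03-01 08:10:00 +0100" value="1200"/>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          startDate="2024-03-01 18:40:00 +0100" value="3400"/>
  <Record type="HKQuantityTypeIdentifierHeartRate" unit="count/min"
          startDate="2024-03-01 18:45:00 +0100" value="92"/>
  <Record type="HKQuantityTypeIdentifierStepCount" unit="count"
          startDate="2024-03-02 09:00:00 +0100" value="5100"/>
</HealthData>"""

def daily_steps(xml_text: str) -> dict:
    """Sum step-count records per calendar day.

    The day is taken from the first 10 characters of startDate,
    i.e. the local date as written in the record.
    """
    root = ET.fromstring(xml_text)
    totals = defaultdict(float)
    for record in root.iter("Record"):
        if record.get("type") == "HKQuantityTypeIdentifierStepCount":
            day = record.get("startDate")[:10]
            totals[day] += float(record.get("value"))
    return dict(totals)

print(daily_steps(SAMPLE))
```

Heart-rate records are skipped here; filtering on a different `type` value would summarize them instead.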
FLIGHTRADAR24
▸ Flightradar24 is a global flight tracking
service that provides you with real-time
information about thousands of aircraft
around the world.
▸ Flightradar24 tracks 180,000+ flights,
from 1,200+ airlines, flying to or from
4,000+ airports around the world in real
time.
▸ https://www.flightradar24.com
HISTORICAL CLIMATE DATA
▸ Many of the historical sources available to
climate historians mention weather in
some way, but these references are
buried in a huge volume of information.
▸ In recent years, initiatives have
transcribed, quantified and digitized:
a) historical observations,
b) historical activities that must have been
strongly influenced by weather.
▸ https://www.historicalclimatology.com/databases.html
ATLAS OF URBAN EXPANSION
▸ As of 2010, the world contained 4,231 cities
with 100,000 or more people.
▸ The Atlas of Urban Expansion collects and
analyzes data on the quantity and quality of
urban expansion in a stratified global
sample of 200 cities.
▸ The Atlas presents the output of the first two
phases of the Monitoring Global Urban
Expansion Program, an initiative that gathers
data and evidence on cities worldwide.
▸ http://atlasofurbanexpansion.org/cities/view/Milan
MIT’S URBAN SENSING
▸ MIT quantified the sensing power of a
taxi fleet, i.e. how many of a city's street
segments its taxis cover during a day
▸ The model helps city planners and
policy makers to quantify the number of
mobile sensors necessary to cover
different urban areas, as well as the
temporal coverage requirements.
▸ http://senseable.mit.edu/urban-sensing/
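The idea of a fleet's sensing power can be illustrated with set coverage: each taxi covers some street segments, and the fleet's coverage is the share of all segments visited by at least one taxi. This is a simplified sketch with invented segment IDs and routes, not the model used in the MIT study:

```python
# Invented example: a toy city with street segments 0..9
ALL_SEGMENTS = set(range(10))

# Invented street segments visited by each taxi during one day
TAXI_ROUTES = {
    "taxi_1": {0, 1, 2, 3},
    "taxi_2": {2, 3, 4, 5},
    "taxi_3": {6, 7},
    "taxi_4": {1, 7, 8},
}

def coverage(taxis):
    """Fraction of street segments visited by at least one of the given taxis."""
    visited = set().union(*(TAXI_ROUTES[t] for t in taxis))
    return len(visited & ALL_SEGMENTS) / len(ALL_SEGMENTS)

# How coverage grows as more taxis are used as mobile sensors
fleet = list(TAXI_ROUTES)
for n in range(1, len(fleet) + 1):
    print(n, "taxis ->", coverage(fleet[:n]))
```

Overlapping routes add little, and segment 9 is never visited, so even the whole toy fleet stays below full coverage. The sketch ignores the temporal dimension that the study also considered.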
THE MOST POPULOUS CITY THROUGH TIME
https://www.youtube.com/watch?v=pMs5xapBewM
37. data & content design
DATA COLLECTION MAY BE AFFECTED BY
THEIR USE!
STRUCTURED DATA
Structured data is usually organized in rows and columns, and its elements can be mapped onto a fixed,
predefined model. Examples of sources:
▸ SQL Databases
▸ Spreadsheets such as Excel
▸ OLTP (online transaction processing) systems
▸ Online forms
▸ Sensors such as GPS or RFID tags
▸ Network and Web server logs
▸ Medical devices
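Mapping each element onto a fixed, predefined model is what makes structured data easy to process. A minimal sketch that reads rows from a hypothetical online-form export in CSV format into a fixed record type, converting each column to its declared type (the column names are illustrative):

```python
import csv
import io
from dataclasses import dataclass

# Hypothetical export of an online form: every row has the same columns
CSV_TEXT = """respondent_id,age,province
1,24,MI
2,31,CO
3,27,UD
"""

@dataclass
class Response:
    respondent_id: int
    age: int
    province: str

def load_responses(text: str) -> list[Response]:
    """Map each CSV row onto the fixed Response model, converting types."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        Response(int(row["respondent_id"]), int(row["age"]), row["province"])
        for row in reader
    ]

responses = load_responses(CSV_TEXT)
print(responses[0])
```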
UNSTRUCTURED DATA
Unstructured data cannot be stored in a row-and-column format and does not follow a predefined data
model. Examples of sources:
▸ Web pages
▸ Images (JPEG, GIF, PNG, etc.)
▸ Videos
▸ Memos
▸ Reports
▸ Word documents and presentations
▸ Surveys
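Because unstructured data has no data model, analysis usually starts by extracting a few structured fields from it. A minimal sketch that pulls dates and e-mail addresses out of a free-text memo with regular expressions; the memo and both patterns are illustrative assumptions, and real text would need more robust rules:

```python
import re

# Invented memo text
MEMO = """Meeting moved to 2024-05-14. Send the survey results to
anna.rossi@example.com before 2024-05-10, and copy luca@example.org."""

# ISO-style dates such as 2024-05-14
DATE_PATTERN = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
# Simplified e-mail pattern: local part, "@", dotted domain
EMAIL_PATTERN = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")

def extract_fields(text: str) -> dict:
    """Turn free text into a small structured record."""
    return {
        "dates": DATE_PATTERN.findall(text),
        "emails": EMAIL_PATTERN.findall(text),
    }

print(extract_fields(MEMO))
```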
SEMI-STRUCTURED DATA
Semi-structured data sits between the two: it has some defining or consistent characteristics, such as
tags or keys, but does not conform to a rigid structure. Examples of sources:
▸ E-mails
▸ XML and other markup languages
▸ Binary executables
▸ TCP/IP packets
▸ Zipped files
▸ JSON
▸ Web pages
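JSON records often share a general shape while differing in which fields they contain. A minimal sketch that flattens such records into rows and columns, using the union of all keys as the columns and `None` for missing values; the records and the `flatten` helper are hypothetical:

```python
import json

# Hypothetical JSON records: same general shape, but fields vary between records
RAW = """[
  {"id": 1, "name": "Anna", "contact": {"email": "anna@example.com"}},
  {"id": 2, "name": "Luca", "contact": {"email": "luca@example.org", "phone": "+39 02 0000"}},
  {"id": 3, "name": "Sara", "tags": ["survey", "pilot"]}
]"""

def flatten(record: dict, prefix: str = "") -> dict:
    """Flatten nested objects into 'parent.child' keys; lists become ';'-joined strings."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        elif isinstance(value, list):
            flat[name] = ";".join(map(str, value))
        else:
            flat[name] = value
    return flat

rows = [flatten(r) for r in json.loads(RAW)]
# The union of all keys gives the columns; missing values become None
columns = sorted({key for row in rows for key in row})
table = [[row.get(col) for col in columns] for row in rows]
print(columns)
for line in table:
    print(line)
```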
THE REPORTS
The reports (FAR) document measurements of a number of the author's personal activities
over the course of a year.
Set out in maps and infographics, the reports reveal data gathered from
everyday actions: distance traveled on foot, time spent eating or traveling on
public transport, the way of greeting different individuals, time spent with mom
or other specific individuals, and time devoted to reading or sleeping.
They include qualitative and quantitative data, measurements and behavioral
patterns, expertly combined in a functional and attractive way.
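Reports of this kind start from a personal log in which each entry mixes a quantitative measure with qualitative attributes. A minimal sketch of how such a log could be summarized; the entries and field names are invented, and the summaries are only examples of what can be computed:

```python
from collections import defaultdict
from typing import NamedTuple

class Entry(NamedTuple):
    date: str
    activity: str   # qualitative attribute
    minutes: int    # quantitative measure
    company: str    # qualitative attribute

# Invented personal log
LOG = [
    Entry("2024-01-01", "reading", 40, "alone"),
    Entry("2024-01-01", "eating", 55, "mom"),
    Entry("2024-01-02", "public transport", 35, "alone"),
    Entry("2024-01-02", "eating", 30, "friends"),
    Entry("2024-01-03", "reading", 25, "alone"),
]

def minutes_by(field: str) -> dict:
    """Total logged minutes grouped by one attribute of the log."""
    totals = defaultdict(int)
    for entry in LOG:
        totals[getattr(entry, field)] += entry.minutes
    return dict(totals)

by_activity = minutes_by("activity")  # time per activity
by_company = minutes_by("company")    # the same measure split by who was there
print(by_activity)
print(by_company)
```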
INFORMATION GRAPHICS
There is a magic in information graphics. Maps float you above the land for a bird’s
eye view. Timelines arrange memories on the page for all to see. Diagrams reveal
the parts inside without requiring disassembly, or incision.
Henry D. Hubbard
This exhibition examines information graphics that depict space, time, nature, and
society.
https://exhibits.stanford.edu/dataviz