Our schedule
• Day 1:
– Find (any) initial common ground
– Breakout groups to explore a shared question
• How to share insights, models, methods, data about software?
• Day 2,3:
– Review, reassess, reevaluate, re-task
• Day 4:
– Let's write a manifesto
• Day 5:
– Some report writing tasks.
Day 1: What can we learn
from each other?
How to share methods?
Write!
• To really understand
something…
• … try to explain it to
someone else
Read!
– MSR
– PROMISE
– ICSE
– FSE
– ASE
– EMSE
– TSE
– …
But how else can we
better share
methods?
How to share methods?
• Related questions:
– How to train newcomers?
– How to certify (say) a masters program in data
science?
– If you are hiring, what core competencies should
you expect in applications?
How to represent models?
Less is more
(contrast set learning)
• The difference between N things
– is smaller than the things themselves
• Useful for learning ..
– What to do
– What not to do
– Link modeling to optimization
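A minimal sketch of the "less is more" idea, assuming rows are plain dictionaries split into a "good" and a "bad" group. It ranks attribute=value pairs by how much more often they appear in one group than in the other. The function name contrast_set and the toy data are hypothetical, and this is not the learner from the cited paper.

```python
from collections import Counter

def contrast_set(good_rows, bad_rows, top=2):
    """Rank (attribute, value) pairs by frequency in good_rows minus frequency in bad_rows."""
    def freq(rows):
        counts = Counter(pair for row in rows for pair in row.items())
        return {pair: n / len(rows) for pair, n in counts.items()}

    good, bad = freq(good_rows), freq(bad_rows)
    deltas = {pair: good.get(pair, 0.0) - bad.get(pair, 0.0)
              for pair in set(good) | set(bad)}
    # Largest positive delta first ("what to do"); ties broken by name for determinism.
    ranked = sorted(deltas.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top]

# Hypothetical toy data: only "reviews" separates the two groups.
good = [{"reviews": "yes", "lang": "java"}, {"reviews": "yes", "lang": "c"}]
bad = [{"reviews": "no", "lang": "java"}, {"reviews": "no", "lang": "c"}]
print(contrast_set(good, bad, top=1))
```

The pairs with the most negative delta would be the "what not to do" list; the many pairs with a delta near zero are the part that can be ignored.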
Bayes nets
• New = old + now
• Graphical form, visualizable
• Updatable
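"New = old + now" can be illustrated with a single updatable belief, assuming a Beta prior stored as two pseudo-counts. This is only one node, not a full Bayes net, and the names update and mean are hypothetical.

```python
def update(prior, observed):
    """Beta-Bernoulli update: add newly observed (hits, misses) to prior pseudo-counts (a, b)."""
    a, b = prior
    hits, misses = observed
    return (a + hits, b + misses)

def mean(belief):
    """Expected probability of a hit under the current belief."""
    a, b = belief
    return a / (a + b)

old = (1, 1)                 # "old": a uniform prior, no opinion yet
new = update(old, (3, 1))    # "now": e.g. 3 defective modules seen, 1 clean
print(new, mean(new))
```

Because the belief is just a pair of counts, it can be updated again whenever more data arrives, without revisiting the earlier records.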
Tim Menzies and Ying Hu. 2003. Data Mining for Very Busy People. Computer 36, 11 (November 2003), 22-29.
Tosun Misirli, A.; Basar Bener, A., "Bayesian Networks For Evidence-Based Decision-Making in Software Engineering," IEEE TSE, pre-print
How to share models?
Incremental adaptation
• Update N variants of the
current model as new data
arrives
• For estimation, use the
M<N models scoring best
Ensemble learning
• Build N different opinions
• Vote across the committee
• Ensemble out-performs
solos
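The two ideas above can be combined in one small sketch: keep N model variants, re-score each as every new record arrives, and estimate by voting across the M best. The class name Committee, the median vote, and the toy models are assumptions for illustration; the variants here are fixed functions, whereas a fuller version would also re-train them.

```python
from statistics import median

class Committee:
    """N estimators whose running error scores are updated as each record arrives."""

    def __init__(self, models):
        self.models = models                 # callables: features -> estimate
        self.errors = [0.0] * len(models)    # cumulative absolute error per model

    def learn(self, features, actual):
        # Re-score every variant against the newly arrived record.
        for i, model in enumerate(self.models):
            self.errors[i] += abs(model(features) - actual)

    def estimate(self, features, m):
        # Vote (median) across the m models with the lowest error so far.
        best = sorted(range(len(self.models)), key=lambda i: self.errors[i])[:m]
        return median(self.models[i](features) for i in best)

# Hypothetical variants: effort as a multiple of project size.
models = [lambda x: 2 * x, lambda x: 3 * x, lambda x: 10 * x]
committee = Committee(models)
for size, effort in [(1, 2.5), (2, 5.0), (4, 10.0)]:
    committee.learn(size, effort)
print(committee.estimate(10, m=2))
```

With these records the third variant accumulates far more error than the other two, so it is left out of the vote.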
L. L. Minku and X. Yao. Ensembles and locality: Insight on
improving software effort estimation. Information and
Software Technology (IST), 55(8):1512–1528, 2013.
Kocaguneli, E.; Menzies, T.; Keung, J.W., "On the Value
of Ensemble Effort Estimation," IEEE TSE, 38(6),
pp. 1403-1416, Nov.-Dec. 2012
Re-learn when each
new record arrives
New: listen to N-variants
But how else can we
better share models?
How to share data?
Relevancy filtering
• TEAK:
– prune regions of noisy
instances;
– cluster the rest
• For new examples,
– only use data in nearest
cluster
• Finds useful data from
projects either
– decades-old
– or geographically remote
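A rough sketch of the "only use data in nearest cluster" step, assuming the clusters have already been built and each row is a (features, effort) pair. It leaves out TEAK's pruning of noisy regions, and the function names and numbers are hypothetical.

```python
import math

def nearest_cluster(clusters, query):
    """Return the cluster whose centroid is closest to the query features."""
    def centroid(rows):
        features = [feats for feats, _ in rows]
        return [sum(col) / len(col) for col in zip(*features)]
    return min(clusters, key=lambda rows: math.dist(centroid(rows), query))

def estimate(clusters, query):
    """Estimate effort from the nearest cluster only, ignoring all other data."""
    rows = nearest_cluster(clusters, query)
    return sum(effort for _, effort in rows) / len(rows)

# Hypothetical toy data: two clusters of (features, effort) rows.
clusters = [
    [((1.0, 1.0), 10.0), ((2.0, 1.0), 14.0)],
    [((9.0, 8.0), 100.0), ((10.0, 9.0), 120.0)],
]
print(estimate(clusters, (1.5, 1.2)))
```

Nothing in the distance measure knows where or when a row was collected, which is why relevant rows can come from old or remote projects.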
Transfer learning
• Map terms in old and new
language to a new set of
dimensions
Kocaguneli, Menzies, Mendes, Transfer learning in effort
estimation, Empirical Software Engineering, March 2014
Nam, Pan and Kim, "Transfer Defect Learning"
ICSE’13 San Francisco, May 18-26, 2013
Handling Suspect Data
• Dealing with "holes"
in the data
• Effectiveness of quick
& dirty techniques to
narrow a big search
space
"Software Bertillonage: Determining the Provenance of Software Development Artifacts", by Julius Davies, Daniel M.
German, Michael W. Godfrey, and Abram Hindle, Empirical Software Engineering, 18(6), December 2013.
And sometimes, data breeds data
• Sum greater than
parts
• E.g. Mining and
correlating different
types of artifacts
– e.g., bugs and
design/architecture
(anti)patterns
– e.g., learning common
error patterns
• Visualizations
J Garcia, I Ivkovic, N Medvidovic. A comparative
analysis of software architecture recovery
techniques. 28th IEEE/ACM International
Conference on Automated Software Engineering
(ASE), 2013.
Benjamin Livshits and Thomas Zimmermann. 2005. DynaMine:
finding common error patterns by mining software revision
histories. SIGSOFT Softw. Eng. Notes 30, 5 (September 2005),
296-305.
Jian-Guang Lou, Qiang Fu, Shengqi Yang, Ye Xu, and Jiang Li,
Mining Invariants from Console Logs for System Problem
Detection, in Proceedings of the 2010 USENIX Annual
Technical Conference, USENIX, June 2010.
How to share data?
Privacy preserving data mining
• Compress data by X%,
– now, 100-X is private ^*
• More space between data
– Elbow room to
mutate/obfuscate data*
SE data compression
• Most SE data can be greatly
compressed
– without losing its signal
– median: 90% to 98% %&
• Share less, preserve privacy
• Store less, visualize faster
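One simple way to see how much a data set can shrink is to keep a row only when it is not close to a row already kept. This is an illustrative sketch, not the method of any of the cited papers; the function name compress, the radius parameter, and the toy rows are assumptions.

```python
import math

def compress(rows, radius):
    """Keep a row only if it is farther than `radius` from every row kept so far."""
    kept = []
    for row in rows:
        if all(math.dist(row, k) > radius for k in kept):
            kept.append(row)
    return kept

# Hypothetical toy data: two tight groups of points.
rows = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0), (5.1, 5.0)]
kept = compress(rows, radius=1.0)
saved = 100 * (len(rows) - len(kept)) / len(rows)   # percent of rows never shared
print(kept, saved)
```

The rows that are dropped never leave the owner's site, and the kept rows sit far enough apart that there is room to perturb them before sharing.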
^ Boyang Li, Mark Grechanik, and Denys Poshyvanyk.
Sanitizing And Minimizing DBS For Software
Application Test Outsourcing. ICST14
* Peters, Menzies, Gong, Zhang, "Balancing Privacy
and Utility in Cross-Company Defect Prediction," IEEE
TSE, 39(8) Aug., 2013
% Vasil Papakroni, Data Carving: Identifying and Removing Irrelevancies
in the Data, Master's thesis, WVU, 2013 http://goo.gl/i6caq7
& Kocaguneli, Menzies, Keung, Cok, Madachy: Active Learning and
Effort Estimation IEEE TSE. 39(8): 1040-1053 (2013)
But how else can we
better share data?
How to
share insight?
• Open issue
• We don’t even know
how to measure
“insight”
• But how to share it?
– Elevators?
– Number of times the users
invite you back?
– Number of issues visited and
retired in a meeting?
– Number of hypotheses
rejected?
– Repertory grids?
Nathalie Girard. Categorizing stakeholders’ practices with repertory grids for sustainable
development, Management, 16(1), 31-48, 2013
Q: How to share insight
A: Do it again and again and again…
• “A conclusion is simply the place
where you got tired of thinking.” : Dan Chaon
• Experience is adaptive and accumulative.
– And data science is “just” how we report our
experiences.
• For an individual to find better conclusions:
– Just keep looking
• For a community to find better conclusions
– Discuss more, share more
• Theobald Smith (American pathologist and microbiologist):
– “Research has deserted the individual and entered the group.
– “The individual worker finds the problem too large, not too difficult.
– “(They) must learn to work with others.”
Insight is a
cyclic process
Learning to ask
the right questions
• actionable mining,
• tools for analytics,
• domain specific analytics
(mobile data, personal data,
etc),
• programming by examples
for analytics.
Kim, M.; Zimmermann, T.; Nagappan, N., "An Empirical
Study of Refactoring Challenges and Benefits at
Microsoft," IEEE TSE, pre-print 2014
Linares-Vásquez, M., Bavota, G., Bernal-Cárdenas,
C., Di Penta, M., Oliveto, R., and Poshyvanyk, D.,
"API Change and Fault Proneness: A Threat to
Success of Android Apps",
Q: How to share insights
A: Step 1: find them
• One tool is card sorting.
• Labor intensive, but insightful
• E.g. we routinely use cross-validation to verify
data mining results, which is a
statement on how well the past
predicts new future data.
• Yet two-thirds of the information needs
of software developers are for insights
into the past and present.
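For readers who have not met cross-validation, a minimal sketch follows, assuming a trivial "model" that predicts the mean of its training rows. The function names and data are hypothetical; the point is only that every row is held out once and predicted from the rest.

```python
def k_fold(rows, k):
    """Yield (train, test) splits; each row lands in exactly one test set."""
    for i in range(k):
        test = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, test

def mean_abs_error(rows, k=3):
    """Average error when each held-out row is predicted from the other folds."""
    errors = []
    for train, test in k_fold(rows, k):
        guess = sum(train) / len(train)      # trivial model: the training mean
        errors += [abs(guess - actual) for actual in test]
    return sum(errors) / len(errors)

data = [10, 12, 11, 13, 12, 14]              # hypothetical effort values
print(mean_abs_error(data))
```

The score says how well already-seen data predicts unseen data; it does not by itself explain what happened in the past or what is happening now.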
Raymond P.L. Buse, Thomas Zimmermann. Information
Needs for Software Development Analytics. ICSE 2012 SEIP.
Andrew Begel and Thomas Zimmermann, Analyze This! 145
Questions for Data Scientists in Software Engineering, ICSE’14
Alberto Bacchelli and Christian Bird, Expectations, Outcomes,
and Challenges of Modern Code Review, in Proceedings of the
International Conference on Software Engineering, IEEE, May
2013
                      Past        Present     Future
Exploration (find)    Trends      Alerts      Forecasts
Analysis (explain)    Summarize   Overlays    Goals
Experiment (what-if)  Model       Benchmarks  Simulate
Finding insights (more)
• Interpretation of
data,
• Visualization
– To (e.g.) avoid (sub-)optimization
based on data,
• But how to
capture/aggregate
diverse aspects of
software quality?
Engström, E., M. Mäntylä, P. Runeson, and M. Borg (2014). Supporting Regression Test Scoping with Visual Analytics, IEEE
International Conference on Software Testing, Verification, and Validation, pp.283–292.
Diversity in Software Engineering Research http://research.microsoft.com/apps/pubs/default.aspx?id=193433
(Collecting a Heap of Shapes) http://research.microsoft.com/apps/pubs/default.aspx?id=196194
Wagner et al. The Quamoco Quality Modeling and Assessment Approach, ICSE’12
An Industrial Case Study on the Risk of Software Changes, E. Shihab, A. E. Hassan, B. Adams and J. Jiang, In FSE'12, Nov. 2012
Building big insight
from little parts
• How to go from simple
predictions to explanations
and theory formation?
• How to make analysis
generalizable and repeatable?
• Qualitative data analysis
methods
• Falsifiability of results
Patrick Wagstrom, Corey Jergensen, Anita Sarma: A network of rails: a graph dataset of ruby on rails and associated
projects. MSR 2013: 229-232
Walid Maalej and Martin P. Robillard. Patterns of Knowledge in API Reference Documentation. IEEE Transactions on
Software Engineering, 39(9):1264-1282, September 2013. http://www.cs.mcgill.ca/~martin/papers/tse2013a.pdf
Categorizing bugs with social networks: A case study on four open source software communities, ICSE’13,
Zanetti, Marcelo Serrano; Scholtes, Ingo; Tessone, Claudio Juan; Schweitzer, Frank
Words for a fledgling Manifesto?
• Vilfredo Pareto
– “Give me the fruitful
error any time, full of
seeds, bursting with its
own corrections. You can
keep your sterile truth
for yourself.”
• Susan Sontag:
– “The only interesting
answers are those which
destroy the questions.”
• Martin H. Fischer
– “A machine has value
only as it produces more
than it consumes, so
check your value to the
community.”
• Tim Menzies
– “More conversations,
less conclusions.”

Weitere ähnliche Inhalte

Was ist angesagt?

Past, Present, and Future of Analyzing Software Data
Past, Present, and Future of Analyzing Software DataPast, Present, and Future of Analyzing Software Data
Past, Present, and Future of Analyzing Software DataJeongwhan Choi
 
A Semantics-based Approach to Machine Perception
A Semantics-based Approach to Machine PerceptionA Semantics-based Approach to Machine Perception
A Semantics-based Approach to Machine PerceptionCory Andrew Henson
 
Biological Foundations for Deep Learning: Towards Decision Networks
 Biological Foundations for Deep Learning: Towards Decision Networks Biological Foundations for Deep Learning: Towards Decision Networks
Biological Foundations for Deep Learning: Towards Decision Networksdiannepatricia
 
Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1Deakin University
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)James Hendler
 
Knowledge Will Propel Machine Understanding of Big Data
Knowledge Will Propel Machine Understanding of Big DataKnowledge Will Propel Machine Understanding of Big Data
Knowledge Will Propel Machine Understanding of Big DataAmit Sheth
 
1. introduction to data science —
1. introduction to data science —1. introduction to data science —
1. introduction to data science —swethaT16
 
Collaborative Learning in Data Science Education: a Data Expedition as a Form...
Collaborative Learning in Data Science Education: a Data Expedition as a Form...Collaborative Learning in Data Science Education: a Data Expedition as a Form...
Collaborative Learning in Data Science Education: a Data Expedition as a Form...Olga Maksimenkova
 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for RealJames Hendler
 
AI/ML as an empirical science
AI/ML as an empirical scienceAI/ML as an empirical science
AI/ML as an empirical scienceDeakin University
 
Machine Reasoning at A2I2, Deakin University
Machine Reasoning at A2I2, Deakin UniversityMachine Reasoning at A2I2, Deakin University
Machine Reasoning at A2I2, Deakin UniversityDeakin University
 
A metadata scheme of the software-data relationship: A proposal
A metadata scheme of the software-data relationship: A proposalA metadata scheme of the software-data relationship: A proposal
A metadata scheme of the software-data relationship: A proposalKai Li
 
Deep Neural Networks for Machine Learning
Deep Neural Networks for Machine LearningDeep Neural Networks for Machine Learning
Deep Neural Networks for Machine LearningJustin Beirold
 
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...Amit Sheth
 
A Pragmatic Perspective on Software Visualization
A Pragmatic Perspective on Software VisualizationA Pragmatic Perspective on Software Visualization
A Pragmatic Perspective on Software VisualizationArie van Deursen
 
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...Artificial Intelligence Institute at UofSC
 

Was ist angesagt? (20)

Past, Present, and Future of Analyzing Software Data
Past, Present, and Future of Analyzing Software DataPast, Present, and Future of Analyzing Software Data
Past, Present, and Future of Analyzing Software Data
 
A Semantics-based Approach to Machine Perception
A Semantics-based Approach to Machine PerceptionA Semantics-based Approach to Machine Perception
A Semantics-based Approach to Machine Perception
 
Biological Foundations for Deep Learning: Towards Decision Networks
 Biological Foundations for Deep Learning: Towards Decision Networks Biological Foundations for Deep Learning: Towards Decision Networks
Biological Foundations for Deep Learning: Towards Decision Networks
 
Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1Deep learning 1.0 and Beyond, Part 1
Deep learning 1.0 and Beyond, Part 1
 
Lecture #01
Lecture #01Lecture #01
Lecture #01
 
Broad Data (India 2015)
Broad Data (India 2015)Broad Data (India 2015)
Broad Data (India 2015)
 
Knowledge Will Propel Machine Understanding of Big Data
Knowledge Will Propel Machine Understanding of Big DataKnowledge Will Propel Machine Understanding of Big Data
Knowledge Will Propel Machine Understanding of Big Data
 
Hands-on Introduction to Machine Learning
Hands-on Introduction to Machine LearningHands-on Introduction to Machine Learning
Hands-on Introduction to Machine Learning
 
1. introduction to data science —
1. introduction to data science —1. introduction to data science —
1. introduction to data science —
 
Collaborative Learning in Data Science Education: a Data Expedition as a Form...
Collaborative Learning in Data Science Education: a Data Expedition as a Form...Collaborative Learning in Data Science Education: a Data Expedition as a Form...
Collaborative Learning in Data Science Education: a Data Expedition as a Form...
 
The Semantic Web: It's for Real
The Semantic Web: It's for RealThe Semantic Web: It's for Real
The Semantic Web: It's for Real
 
AI/ML as an empirical science
AI/ML as an empirical scienceAI/ML as an empirical science
AI/ML as an empirical science
 
Machine Reasoning at A2I2, Deakin University
Machine Reasoning at A2I2, Deakin UniversityMachine Reasoning at A2I2, Deakin University
Machine Reasoning at A2I2, Deakin University
 
Deep learning and Healthcare
Deep learning and HealthcareDeep learning and Healthcare
Deep learning and Healthcare
 
A metadata scheme of the software-data relationship: A proposal
A metadata scheme of the software-data relationship: A proposalA metadata scheme of the software-data relationship: A proposal
A metadata scheme of the software-data relationship: A proposal
 
NLP & ML Webinar
NLP & ML WebinarNLP & ML Webinar
NLP & ML Webinar
 
Deep Neural Networks for Machine Learning
Deep Neural Networks for Machine LearningDeep Neural Networks for Machine Learning
Deep Neural Networks for Machine Learning
 
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...
Semantic, Cognitive, and Perceptual Computing – three intertwined strands of ...
 
A Pragmatic Perspective on Software Visualization
A Pragmatic Perspective on Software VisualizationA Pragmatic Perspective on Software Visualization
A Pragmatic Perspective on Software Visualization
 
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...
Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social ...
 

Andere mochten auch (6)

06558266
0655826606558266
06558266
 
Cicling2005
Cicling2005Cicling2005
Cicling2005
 
Annotation of anaphora and coreference for automatic processing
Annotation of anaphora and coreference for automatic processingAnnotation of anaphora and coreference for automatic processing
Annotation of anaphora and coreference for automatic processing
 
2011 NASA Open Source Summit - Patrick Hogan
2011 NASA Open Source Summit - Patrick Hogan2011 NASA Open Source Summit - Patrick Hogan
2011 NASA Open Source Summit - Patrick Hogan
 
Conll
ConllConll
Conll
 
RCM Services
RCM ServicesRCM Services
RCM Services
 

Ähnlich wie Dagstuhl14 intro-v1

Tim Menzies, directions in Data Science
Tim Menzies, directions in Data ScienceTim Menzies, directions in Data Science
Tim Menzies, directions in Data ScienceCS, NcState
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)HPCC Systems
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest linkCS, NcState
 
Ci2004-10.doc
Ci2004-10.docCi2004-10.doc
Ci2004-10.docbutest
 
Ml pluss ejan2013
Ml pluss ejan2013Ml pluss ejan2013
Ml pluss ejan2013CS, NcState
 
Icse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceIcse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceCS, NcState
 
What Metrics Matter?
What Metrics Matter? What Metrics Matter?
What Metrics Matter? CS, NcState
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper ProvenancePaul Groth
 
How to make impact with journal publications on Software Process ImprovementH...
How to make impact with journal publications on Software Process ImprovementH...How to make impact with journal publications on Software Process ImprovementH...
How to make impact with journal publications on Software Process ImprovementH...Torgeir Dingsøyr
 
Agile Development in Large-Scale: Challenges and Insight from Research
Agile Development in Large-Scale: Challenges and Insight from ResearchAgile Development in Large-Scale: Challenges and Insight from Research
Agile Development in Large-Scale: Challenges and Insight from ResearchTorgeir Dingsøyr
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectbodaceacat
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSara-Jayne Terp
 
Assessing Complex Problem Solving Performances
Assessing Complex Problem Solving PerformancesAssessing Complex Problem Solving Performances
Assessing Complex Problem Solving PerformancesRenee Lewis
 
Data Collaboration Stack
Data Collaboration StackData Collaboration Stack
Data Collaboration StackPierre Brunelle
 
Analysing User Knowledge, Competence and Learning during Online Activities
Analysing User Knowledge, Competence and Learning during Online ActivitiesAnalysing User Knowledge, Competence and Learning during Online Activities
Analysing User Knowledge, Competence and Learning during Online ActivitiesStefan Dietze
 
"Awareness, Trust, and Software Tool Support in Distance Collaborations" by D...
"Awareness, Trust, and Software Tool Support in Distance Collaborations" by D..."Awareness, Trust, and Software Tool Support in Distance Collaborations" by D...
"Awareness, Trust, and Software Tool Support in Distance Collaborations" by D...Fabio Calefato
 
2016-04-27 research seminar
2016-04-27 research seminar2016-04-27 research seminar
2016-04-27 research seminarifi8106tlu
 
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...Editor IJCATR
 
Getstarteddssd12717sd
Getstarteddssd12717sdGetstarteddssd12717sd
Getstarteddssd12717sdThinkful
 

Ähnlich wie Dagstuhl14 intro-v1 (20)

Tim Menzies, directions in Data Science
Tim Menzies, directions in Data ScienceTim Menzies, directions in Data Science
Tim Menzies, directions in Data Science
 
Analyzing Big Data's Weakest Link (hint: it might be you)
Analyzing Big Data's Weakest Link  (hint: it might be you)Analyzing Big Data's Weakest Link  (hint: it might be you)
Analyzing Big Data's Weakest Link (hint: it might be you)
 
Big Data: the weakest link
Big Data: the weakest linkBig Data: the weakest link
Big Data: the weakest link
 
Ci2004-10.doc
Ci2004-10.docCi2004-10.doc
Ci2004-10.doc
 
Lecture_1_Intro_toDS&AI.pptx
Lecture_1_Intro_toDS&AI.pptxLecture_1_Intro_toDS&AI.pptx
Lecture_1_Intro_toDS&AI.pptx
 
Ml pluss ejan2013
Ml pluss ejan2013Ml pluss ejan2013
Ml pluss ejan2013
 
Icse15 Tech-briefing Data Science
Icse15 Tech-briefing Data ScienceIcse15 Tech-briefing Data Science
Icse15 Tech-briefing Data Science
 
What Metrics Matter?
What Metrics Matter? What Metrics Matter?
What Metrics Matter?
 
Thoughts on Knowledge Graphs & Deeper Provenance
Thoughts on Knowledge Graphs  & Deeper ProvenanceThoughts on Knowledge Graphs  & Deeper Provenance
Thoughts on Knowledge Graphs & Deeper Provenance
 
How to make impact with journal publications on Software Process ImprovementH...
How to make impact with journal publications on Software Process ImprovementH...How to make impact with journal publications on Software Process ImprovementH...
How to make impact with journal publications on Software Process ImprovementH...
 
Agile Development in Large-Scale: Challenges and Insight from Research
Agile Development in Large-Scale: Challenges and Insight from ResearchAgile Development in Large-Scale: Challenges and Insight from Research
Agile Development in Large-Scale: Challenges and Insight from Research
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science project
 
Session 01 designing and scoping a data science project
Session 01 designing and scoping a data science projectSession 01 designing and scoping a data science project
Session 01 designing and scoping a data science project
 
Assessing Complex Problem Solving Performances
Assessing Complex Problem Solving PerformancesAssessing Complex Problem Solving Performances
Assessing Complex Problem Solving Performances
 
Data Collaboration Stack
Data Collaboration StackData Collaboration Stack
Data Collaboration Stack
 
Analysing User Knowledge, Competence and Learning during Online Activities
Analysing User Knowledge, Competence and Learning during Online ActivitiesAnalysing User Knowledge, Competence and Learning during Online Activities
Analysing User Knowledge, Competence and Learning during Online Activities
 
"Awareness, Trust, and Software Tool Support in Distance Collaborations" by D...
"Awareness, Trust, and Software Tool Support in Distance Collaborations" by D..."Awareness, Trust, and Software Tool Support in Distance Collaborations" by D...
"Awareness, Trust, and Software Tool Support in Distance Collaborations" by D...
 
2016-04-27 research seminar
2016-04-27 research seminar2016-04-27 research seminar
2016-04-27 research seminar
 
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
A Comparative Study of Various Data Mining Techniques: Statistics, Decision T...
 
Getstarteddssd12717sd
Getstarteddssd12717sdGetstarteddssd12717sd
Getstarteddssd12717sd
 

Mehr von CS, NcState

Talks2015 novdec
Talks2015 novdecTalks2015 novdec
Talks2015 novdecCS, NcState
 
GALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringGALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringCS, NcState
 
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...CS, NcState
 
Lexisnexis june9
Lexisnexis june9Lexisnexis june9
Lexisnexis june9CS, NcState
 
Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).CS, NcState
 
Kits to Find the Bits that Fits
Kits to Find  the Bits that Fits Kits to Find  the Bits that Fits
Kits to Find the Bits that Fits CS, NcState
 
Ai4se lab template
Ai4se lab templateAi4se lab template
Ai4se lab templateCS, NcState
 
Automated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSUAutomated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSUCS, NcState
 
Requirements Engineering
Requirements EngineeringRequirements Engineering
Requirements EngineeringCS, NcState
 
172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginia172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginiaCS, NcState
 
Automated Software Engineering
Automated Software EngineeringAutomated Software Engineering
Automated Software EngineeringCS, NcState
 
Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)CS, NcState
 
In the age of Big Data, what role for Software Engineers?
In the age of Big Data, what role for Software Engineers?In the age of Big Data, what role for Software Engineers?
In the age of Big Data, what role for Software Engineers?CS, NcState
 
Sayyad slides ase13_v4
Sayyad slides ase13_v4Sayyad slides ase13_v4
Sayyad slides ase13_v4CS, NcState
 
Warning: don't do CS
Warning: don't do CSWarning: don't do CS
Warning: don't do CSCS, NcState
 
How to do better experiments in SE
How to do better experiments in SEHow to do better experiments in SE
How to do better experiments in SECS, NcState
 

Mehr von CS, NcState (20)

Talks2015 novdec
Talks2015 novdecTalks2015 novdec
Talks2015 novdec
 
Future se oct15
Future se oct15Future se oct15
Future se oct15
 
GALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software EngineeringGALE: Geometric active learning for Search-Based Software Engineering
GALE: Geometric active learning for Search-Based Software Engineering
 
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...Three Laws of Trusted Data Sharing:(Building a Better Business Case for Dat...
Three Laws of Trusted Data Sharing: (Building a Better Business Case for Dat...
 
Lexisnexis june9
Lexisnexis june9Lexisnexis june9
Lexisnexis june9
 
Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).Welcome to ICSE NIER’15 (new ideas and emerging results).
Welcome to ICSE NIER’15 (new ideas and emerging results).
 
Kits to Find the Bits that Fits
Kits to Find  the Bits that Fits Kits to Find  the Bits that Fits
Kits to Find the Bits that Fits
 
Ai4se lab template
Ai4se lab templateAi4se lab template
Ai4se lab template
 
Automated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSUAutomated Software Enging, Fall 2015, NCSU
Automated Software Enging, Fall 2015, NCSU
 
Requirements Engineering
Requirements EngineeringRequirements Engineering
Requirements Engineering
 
172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginia172529main ken and_tim_software_assurance_research_at_west_virginia
172529main ken and_tim_software_assurance_research_at_west_virginia
 
Automated Software Engineering
Automated Software EngineeringAutomated Software Engineering
Automated Software Engineering
 
Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)Next Generation “Treatment Learning” (finding the diamonds in the dust)
Next Generation “Treatment Learning” (finding the diamonds in the dust)
 
Goldrush
GoldrushGoldrush
Goldrush
 
Know thy tools
Know thy toolsKnow thy tools
Know thy tools
 
In the age of Big Data, what role for Software Engineers?
In the age of Big Data, what role for Software Engineers?In the age of Big Data, what role for Software Engineers?
In the age of Big Data, what role for Software Engineers?
 
Sayyad slides ase13_v4
Sayyad slides ase13_v4Sayyad slides ase13_v4
Sayyad slides ase13_v4
 
Ase2013
Ase2013Ase2013
Ase2013
 
Warning: don't do CS
Warning: don't do CSWarning: don't do CS
Warning: don't do CS
 
How to do better experiments in SE
How to do better experiments in SEHow to do better experiments in SE
How to do better experiments in SE
 

Dagstuhl14 intro-v1

  • 1. Our schedule
      • Day 1:
          – Find (any) initial common ground
          – Breakout groups to explore a shared question
              • How to share insights, models, methods, data about software?
      • Day 2, 3:
          – Review, reassess, reevaluate, re-task
      • Day 4:
          – Let’s write a manifesto
      • Day 5:
          – Some report writing tasks.
  • 2. Day 1: What can we learn from each other?
  • 3. What can we learn from each other?
  • 4. How to share methods?
      Write!
      • To really understand something…
      • … try to explain it to someone else
      Read!
      – MSR, PROMISE, ICSE, FSE, ASE, EMSE, TSE, …
      But how else can we better share methods?
  • 5. How to share methods?
      • Related questions:
          – How to train newcomers?
          – How to certify (say) a master’s program in data science?
          – If you are hiring, what core competencies should you expect in applications?
      But how else can we better share methods?
  • 6. What can we learn from each other?
  • 7. How to represent models?
      Less is more (contrast set learning)
      • The difference between N things
          – Is smaller than the things themselves
      • Useful for learning…
          – What to do
          – What not to do
          – Link modeling to optimization
      Bayes nets
      • New = old + now
      • Graphical form, visualizable
      • Updatable
      Tim Menzies and Ying Hu. 2003. Data Mining for Very Busy People. Computer 36, 11 (November 2003), 22-29.
      Tosun Misirli, A.; Basar Bener, A., "Bayesian Networks For Evidence-Based Decision-Making in Software Engineering," IEEE TSE, pre-print
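The "New = old + now" bullet can be illustrated with a very small sketch. This is not a Bayes net; it is a single-variable Bayesian update using pseudo-counts, and the `Belief` class, its field names, and the example numbers are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Belief:
    """Hypothetical Beta-style belief about a defect rate, stored as pseudo-counts."""
    defective: float
    clean: float

    def update(self, defective: int, clean: int) -> "Belief":
        # New = old + now: add the fresh observations to the old counts.
        return Belief(self.defective + defective, self.clean + clean)

    @property
    def rate(self) -> float:
        # Expected defect rate under the current belief.
        return self.defective / (self.defective + self.clean)


old = Belief(defective=1, clean=1)       # assumed uninformed starting point
new = old.update(defective=3, clean=1)   # "now": four newly inspected modules
print(new, round(new.rate, 3))
```

Because the update only adds counts, the same object can keep absorbing data as it arrives, which is the "updatable" property the slide lists.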
  • 8. How to share models?
      Incremental adaptation
      • Update N variants of the current model as new data arrives
      • For estimation, use the M<N models scoring best
      Ensemble learning
      • Build N different opinions
      • Vote across the committee
      • Ensemble outperforms solos
      L. L. Minku and X. Yao. Ensembles and locality: Insight on improving software effort estimation. Information and Software Technology (IST), 55(8):1512–1528, 2013.
      Kocaguneli, E.; Menzies, T.; Keung, J.W., "On the Value of Ensemble Effort Estimation," IEEE TSE, 38(6), pp. 1403-1416, Nov.-Dec. 2012
      Re-learn when each new record arrives
      New: listen to N-variants
      But how else can we better share models?
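A minimal sketch of the two ideas on this slide, under stated assumptions: the four "solo" estimators, the `(size, effort)` record layout, and the leave-one-out scoring are invented for illustration and are not the methods of the cited papers. The committee votes by taking the median; `best_m` keeps the M<N solos with the lowest past error.

```python
from statistics import mean, median

# Each record is a hypothetical (size, effort) pair from a past project.

def mean_solo(history, size):
    return mean(effort for _, effort in history)

def median_solo(history, size):
    return median(effort for _, effort in history)

def nearest_solo(history, size):
    # Effort of the past project whose size is closest to the new one.
    return min(history, key=lambda rec: abs(rec[0] - size))[1]

def ratio_solo(history, size):
    # Average effort per unit of size, scaled to the new project.
    return sum(e for _, e in history) / sum(s for s, _ in history) * size

SOLOS = [mean_solo, median_solo, nearest_solo, ratio_solo]

def ensemble_estimate(history, size, solos=SOLOS):
    # Vote across the committee: take the median opinion.
    return median(solo(history, size) for solo in solos)

def best_m(history, solos, m):
    # Score each solo by leave-one-out absolute error, keep the M best.
    def loo_error(solo):
        return sum(
            abs(solo(history[:i] + history[i + 1:], s) - e)
            for i, (s, e) in enumerate(history)
        )
    return sorted(solos, key=loo_error)[:m]

history = [(10, 20), (20, 40), (30, 60), (40, 80)]
print(ensemble_estimate(history, 25))
print([solo.__name__ for solo in best_m(history, SOLOS, 1)])
```

Re-running `best_m` whenever a new record is appended to `history` gives a crude version of "re-learn when each new record arrives".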
  • 9. What can we learn from each other?
  • 10. How to share data?
      Relevancy filtering
      • TEAK:
          – prune regions of noisy instances;
          – cluster the rest
      • For new examples,
          – only use data in nearest cluster
      • Finds useful data from projects either
          – decades-old
          – or geographically remote
      Transfer learning
      • Map terms in old and new language to a new set of dimensions
      Kocaguneli, Menzies, Mendes, Transfer learning in effort estimation, Empirical Software Engineering, March 2014
      Nam, Pan and Kim, "Transfer Defect Learning", ICSE’13, San Francisco, May 18-26, 2013
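The "only use data in nearest cluster" step can be sketched as follows. This is a simplified illustration, not TEAK itself: the pruning of noisy regions and the clustering are assumed to have happened already, and the function names and the `(features, effort)` row layout are hypothetical.

```python
from math import dist
from statistics import median


def centroid(rows):
    # Mean of each feature column across the rows of one cluster.
    columns = zip(*(features for features, _ in rows))
    return tuple(sum(col) / len(rows) for col in columns)


def estimate_from_nearest_cluster(clusters, new_features):
    # Relevancy filter: ignore every cluster except the closest one.
    nearest = min(clusters, key=lambda rows: dist(centroid(rows), new_features))
    return median(effort for _, effort in nearest)


clusters = [
    [((1, 1), 10), ((2, 2), 12)],        # small projects (assumed data)
    [((10, 10), 100), ((12, 12), 140)],  # large projects (assumed data)
]
print(estimate_from_nearest_cluster(clusters, (11, 10)))
```

Nothing in the filter looks at where or when a row was collected, which is one way to read the slide's point that useful data can come from old or remote projects.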
  • 11. Handling Suspect Data
      • Dealing with "holes" in the data
      • Effectiveness of quick & dirty techniques to narrow a big search space
      "Software Bertillonage: Determining the Provenance of Software Development Artifacts", by Julius Davies, Daniel M. German, Michael W. Godfrey, and Abram Hindle, Empirical Software Engineering, 18(6), December 2013.
  • 12. And sometimes, data breeds data
      • Sum greater than parts
      • E.g. mining and correlating different types of artifacts
          – e.g., bugs and design/architecture (anti)patterns
          – e.g., learning common error patterns
      • Visualizations
      J Garcia, I Ivkovic, N Medvidovic. A comparative analysis of software architecture recovery techniques. 28th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2013.
      Benjamin Livshits and Thomas Zimmermann. 2005. DynaMine: finding common error patterns by mining software revision histories. SIGSOFT Softw. Eng. Notes 30, 5 (September 2005), 296-305.
      Jian-Guang Lou, Qiang Fu, Shengqi Yang, Ye Xu, and Jiang Li, Mining Invariants from Console Logs for System Problem Detection, in Proceedings of the 2010 USENIX Annual Technical Conference, USENIX, June 2010.
  • 13. How to share data?
      Privacy preserving data mining
      • Compress data by X%,
          – now, 100-X is private ^ *
      • More space between data
          – Elbow room to mutate/obfuscate data *
      SE data compression
      • Most SE data can be greatly compressed
          – without losing its signal
          – median: 90% to 98% (see % and &)
      • Share less, preserve privacy
      • Store less, visualize faster
      ^ Boyang Li, Mark Grechanik, and Denys Poshyvanyk. Sanitizing And Minimizing DBS For Software Application Test Outsourcing. ICST14
      * Peters, Menzies, Gong, Zhang, "Balancing Privacy and Utility in Cross-Company Defect Prediction," IEEE TSE, 39(8) Aug., 2013
      % Vasil Papakroni, Data Carving: Identifying and Removing Irrelevancies in the Data, Masters thesis, WVU, 2013 http://goo.gl/i6caq7
      & Kocaguneli, Menzies, Keung, Cok, Madachy: Active Learning and Effort Estimation IEEE TSE. 39(8): 1040-1053 (2013)
      But how else can we better share data?
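One way to read "elbow room to mutate/obfuscate data" is sketched below. It is an illustration under stated assumptions, not the algorithm of the cited papers: each row is pushed a fixed fraction `r` away from its nearest neighbour of a different class, so the shared values differ from the originals while each row stays on its own side of the class boundary. The function names, the row layout, and the default `r` are hypothetical.

```python
from math import dist


def nearest_unlike(features, label, rows):
    # Closest row that carries a different class label.
    others = [f for f, l in rows if l != label]
    return min(others, key=lambda f: dist(f, features))


def obfuscate(rows, r=0.2):
    """Move each row away from its nearest unlike neighbour by fraction r of the gap."""
    mutated = []
    for features, label in rows:
        z = nearest_unlike(features, label, rows)
        moved = tuple(x + (x - zx) * r for x, zx in zip(features, z))
        mutated.append((moved, label))
    return mutated


rows = [((0.0, 0.0), "clean"), ((1.0, 0.0), "buggy")]
print(obfuscate(rows, r=0.5))
```

The fewer rows that survive compression, the larger the gaps between unlike neighbours, which is where the "elbow room" for this kind of mutation would come from.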
  • 14. What can we learn from each other?
  • 15. How to share insight?
      • Open issue
      • We don’t even know how to measure “insight”
      • But how to share it?
          – Elevators?
          – Number of times the users invite you back?
          – Number of issues visited and retired in a meeting?
          – Number of hypotheses rejected?
          – Repertory grids?
      Nathalie Girard. Categorizing stakeholders’ practices with repertory grids for sustainable development, Management, 16(1), 31-48, 2013
  • 16. Q: How to share insight? A: Do it again and again and again…
      • “A conclusion is simply the place where you got tired of thinking.” (Dan Chaon)
      • Experience is adaptive and accumulative.
          – And data science is “just” how we report our experiences.
      • For an individual to find better conclusions:
          – Just keep looking
      • For a community to find better conclusions:
          – Discuss more, share more
      • Theobald Smith (American pathologist and microbiologist):
          – “Research has deserted the individual and entered the group.”
          – “The individual worker finds the problem too large, not too difficult.”
          – “(They) must learn to work with others.”
      Insight is a cyclic process
  • 17. Learning to ask the right questions
      • actionable mining,
      • tools for analytics,
      • domain specific analytics (mobile data, personal data, etc.),
      • programming by examples for analytics.
      Kim, M.; Zimmermann, T.; Nagappan, N., "An Empirical Study of Refactoring Challenges and Benefits at Microsoft," IEEE TSE, pre-print 2014
      Linares-Vásquez, M., Bavota, G., Bernal-Cárdenas, C., Di Penta, M., Oliveto, R., and Poshyvanyk, D., "API Change and Fault Proneness: A Threat to Success of Android Apps",
  • 18. Q: How to share insights? A: Step 1: find them
      • One tool is card sorting.
      • Labor intensive, but insightful
      • E.g. we routinely use cross-val to verify data mining results, which is a statement on how well the part predicts for new future data.
      • Yet two-thirds of the information needs for Software Developers are for insights into the past and present.
      Raymond P.L. Buse, Thomas Zimmermann. Information Needs for Software Development Analytics. ICSE 2012 SEIP.
      Andrew Begel and Thomas Zimmermann, Analyze This! 145 Questions for Data Scientists in Software Engineering, ICSE’14
      Alberto Bacchelli and Christian Bird, Expectations, Outcomes, and Challenges of Modern Code Review, in Proceedings of the International Conference on Software Engineering, IEEE, May 2013

                              Past         Present       Future
      Exploration (find)      Trends       Alerts        Forecasts
      Analysis (explain)      Summarize    Overlays      Goals
      Experiment (what-if)    Model        Benchmarks    Simulate
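For readers unfamiliar with the "cross-val" mentioned on this slide, here is a minimal sketch of a k-fold split. The `k_folds` helper is hypothetical and uses a simple interleaved split with no shuffling; real studies usually randomize the order first.

```python
def k_folds(rows, k):
    """Yield (train, test) pairs; every row lands in exactly one test fold."""
    for i in range(k):
        test = rows[i::k]  # every k-th row, starting at offset i
        train = [row for j, row in enumerate(rows) if j % k != i]
        yield train, test


rows = list(range(10))  # stand-in for ten data records
for train, test in k_folds(rows, 3):
    print(len(train), test)
```

A learner is built on each `train` part and scored on the matching `test` part, so the result speaks only to prediction on held-out data, which is the contrast the slide draws with developers' interest in the past and present.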
  • 19. Finding insights (more)
      • Interpretation of data,
      • Visualization
          – To (e.g.) avoid (sub-)optimization based on data,
      • But how to capture/aggregate diverse aspects of software quality?
      Engström, E., M. Mäntylä, P. Runeson, and M. Borg (2014). Supporting Regression Test Scoping with Visual Analytics, IEEE International Conference on Software Testing, Verification, and Validation, pp. 283–292.
      Diversity in Software Engineering Research http://research.microsoft.com/apps/pubs/default.aspx?id=193433
      (Collecting a Heap of Shapes) http://research.microsoft.com/apps/pubs/default.aspx?id=196194
      Wagner et al. The Quamoco Quality Modeling and Assessment Approach, ICSE’12
      An Industrial Case Study on the Risk of Software Changes, E. Shihab, A. E. Hassan, B. Adams and J. Jiang, In FSE'12, Nov. 2012
  • 20. Building big insight from little parts
      • How to go from simple predictions to explanations and theory formation?
      • How to make analysis generalizable and repeatable?
      • Qualitative data analysis methods
      • Falsifiability of results
      Patrick Wagstrom, Corey Jergensen, Anita Sarma: A network of rails: a graph dataset of ruby on rails and associated projects. MSR 2013: 229-232
      Walid Maalej and Martin P. Robillard. Patterns of Knowledge in API Reference Documentation. IEEE Transactions on Software Engineering, 39(9):1264-1282, September 2013. http://www.cs.mcgill.ca/~martin/papers/tse2013a.pdf
      Categorizing bugs with social networks: A case study on four open source software communities, ICSE’13, Zanetti, Marcelo Serrano; Scholtes, Ingo; Tessone, Claudio Juan; Schweitzer, Frank
  • 21. What can we learn from each other?
  • 22. Words for a fledgling Manifesto?
      • Vilfredo Pareto
          – “Give me the fruitful error any time, full of seeds, bursting with its own corrections. You can keep your sterile truth for yourself.”
      • Susan Sontag
          – “The only interesting answers are those which destroy the questions.”
      • Martin H. Fischer
          – “A machine has value only as it produces more than it consumes, so check your value to the community.”
      • Tim Menzies
          – “More conversations, less conclusions.”
  • 23. What can we learn from each other?
  • 24. Our schedule
      • Day 1:
          – Find (any) initial common ground
          – Breakout groups to explore a shared question
              • How to share insights, models, methods, data about software?
      • Day 2, 3:
          – Review, reassess, reevaluate, re-task
      • Day 4:
          – Let’s write a manifesto
      • Day 5:
          – Some report writing tasks.