Six Easy Pieces (of Quantitatively Analyzing Open Source Software)
  Dirk Riehle
  SAP Research, SAP Labs LLC
  dirk@riehle.org, www.riehle.org, twitter.com/driehle
Open Source Software
Talk Overview (Agenda)
  1. The Growth of Open Source Software
  2. Data Mining for Fun and Profit
  3. Efficiently Estimating Commit Sizes
  4. The Commit Size Distribution of Open Source
  5. Developer Activity in Open Source Software Projects
  6. The Commenting Practice of Open Source
  7. Team Size Evolution in Open Source Projects
  8. Conclusions
The Growth of Open Source Software Amit Deshpande, Dirk Riehle. “The Total Growth of Open Source.” In Proceedings of the Fourth Conference on Open Source Systems (OSS 2008). Springer Verlag, 2008. Page 197-209. http://www.riehle.org/2008/03/14/the-total-growth-of-open-source/
Source Code Growth in Open Source
  SLoC = source lines of code
Model of Source Code Growth
  Approach      Model                       R-square value
  Upper bound   y = 784098 * e^{0.0555x}    0.961
  Lower bound   y = 2E+06 * e^{0.0464x}     0.964
  where y: total open source lines of code; x: time from Jan 1995 to Dec 2006 in months
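As a worked reading of the two models on this slide: an exponential y = a * e^(b*x) doubles every ln(2)/b units of x. The following Python sketch plugs in the slide's coefficients; the function and variable names are illustrative and not taken from the talk, and the coefficient 2E+06 is used as printed (it is presumably rounded).

```python
import math

# Coefficients (a, b) of y = a * e^(b * x), copied from the slide;
# x is the number of months since January 1995.
MODELS = {
    "upper bound": (784098.0, 0.0555),
    "lower bound": (2e6, 0.0464),
}

def predicted_sloc(a: float, b: float, months: float) -> float:
    """Evaluate the exponential growth model at a given month offset."""
    return a * math.exp(b * months)

def doubling_time_months(b: float) -> float:
    """Time for an exponential with rate b to double: ln(2) / b."""
    return math.log(2) / b

for name, (a, b) in MODELS.items():
    print(f"{name}: doubles every {doubling_time_months(b):.1f} months")
```

With these coefficients the upper-bound model doubles roughly every 12.5 months and the lower-bound model roughly every 14.9 months.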
Project Growth in Open Source
Model of Project Growth
  Model                       R-square value
  y = 7.1511 * e^{0.0499x}    0.956
  where y: total number of open source projects; x: time from Jan 1995 to Dec 2006 in months
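The slide does not say how the model was fitted or how the R-square value was computed. One common way to obtain an exponential model of this shape is ordinary least squares on (x, ln y); the Python sketch below shows that approach as an assumption, on synthetic noise-free data generated from the slide's own coefficients, with R-square computed in log space.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = a * e^(b * x) by ordinary least squares on (x, ln y).

    Returns (a, b, r_squared), where r_squared is computed in log space.
    """
    n = len(xs)
    log_ys = [math.log(y) for y in ys]
    mean_x = sum(xs) / n
    mean_ly = sum(log_ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (ly - mean_ly) for x, ly in zip(xs, log_ys))
    b = sxy / sxx
    ln_a = mean_ly - b * mean_x
    ss_res = sum((ly - (ln_a + b * x)) ** 2 for x, ly in zip(xs, log_ys))
    ss_tot = sum((ly - mean_ly) ** 2 for ly in log_ys)
    return math.exp(ln_a), b, 1.0 - ss_res / ss_tot

# Synthetic data for illustration only: 144 monthly points (Jan 1995 to
# Dec 2006) generated from the slide's model, not the study's real data.
months = list(range(144))
projects = [7.1511 * math.exp(0.0499 * m) for m in months]

a, b, r2 = fit_exponential(months, projects)
print(f"a = {a:.4f}, b = {b:.4f}, R^2 = {r2:.3f}")
```

Because the synthetic data contains no noise, the fit recovers the coefficients essentially exactly; real project counts would give an R-square below 1, as on the slide.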
Where Open Source is Growing
Data Mining for Fun and Profit Oliver Arafat, Amit Deshpande, Philipp Hofmann, Dirk Riehle. http://www.riehle.org/publications/
Motivation and Approach
Data Source, Data Quality
Open Source Analytics Tool Chain
Efficiently Estimating Commit Sizes Philipp Hofmann, Dirk Riehle. “Estimating Commit Sizes Efficiently.” In Proceedings of the 5th International Conference on Open Source Systems (OSS 2009). Springer Verlag, 2009. Forthcoming. http://www.riehle.org/2009/02/11/estimating-commit-sizes-efficiently/
Definition of Commit Size
What Diff Does
  a.txt (one letter per line): a b c d f g h m n
  b.txt (one letter per line): a b c e e e g h j m
  diff a.txt b.txt
    4,5c4,6
    < d
    < f
    ---
    > e
    > e
    > e
    7a9
    > j
    9d10
    < n
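The per-section line counts in this diff output can be reproduced with Python's standard difflib module. This is only a sketch: difflib is not GNU diff and can pick different sections on other inputs, and the helper name diff_sections is made up for this example.

```python
from difflib import SequenceMatcher

# File contents from the slide, one list entry per line.
a_txt = ["a", "b", "c", "d", "f", "g", "h", "m", "n"]
b_txt = ["a", "b", "c", "e", "e", "e", "g", "h", "j", "m"]

def diff_sections(old, new):
    """Return (removed, added) line counts for each differing section."""
    sections = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Lines i1..i2 of the old file were replaced by lines j1..j2
            # of the new file; either range may be empty.
            sections.append((i2 - i1, j2 - j1))
    return sections

sections = diff_sections(a_txt, b_txt)
print(sections)  # [(2, 3), (0, 1), (1, 0)]
```

The three tuples correspond to the sections 4,5c4,6 (two lines removed, three added), 7a9 (one line added) and 9d10 (one line removed).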
The Trouble with Diff
Some Diff Section Size Examples
  Diff section (1, 1):
    Event     SLoC added   SLoC removed   SLoC changed   Modifications
    Event 1   0            0              1              1
    Event 2   1            1              0              2
  Diff section (4, 3):
    Event     SLoC added   SLoC removed   SLoC changed   Modifications
    Event 1   1            0              3              4
    Event 2   2            1              2              5
    Event 3   3            2              1              6
    Event 4   4            3              0              7
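The rows of both example tables follow one pattern: a diff section that reports a given number of added and removed lines can hide anywhere from zero changed lines up to the smaller of the two counts. The Python sketch below enumerates those possibilities. It assumes, as the tables suggest, that each changed line appears in diff output as one removed plus one added line; the function name is illustrative.

```python
def possible_events(added: int, removed: int):
    """List every way a diff section reporting (added, removed) lines
    could have arisen, assuming each changed line is reported by diff
    as one removed line plus one added line.

    Returns a list of (sloc_added, sloc_removed, sloc_changed, modifications).
    """
    events = []
    for changed in range(min(added, removed), -1, -1):
        truly_added = added - changed
        truly_removed = removed - changed
        modifications = truly_added + truly_removed + changed
        events.append((truly_added, truly_removed, changed, modifications))
    return events

for section in [(1, 1), (4, 3)]:
    print(section)
    for number, event in enumerate(possible_events(*section), start=1):
        print(f"  Event {number}: {event}")
```

For the sections (1, 1) and (4, 3) this reproduces the two and four events listed on the slide, in the same order.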
Garden Variety of Heuristics
  #   Approach            Error Mean   Error Standard Deviation
  1   Lower Bound          3.86        16.64
  2   Upper Bound         -4.41         6.39
  3   Bounds Mean         -0.27         7.68
  4   GNU diff            -1.96        19.55
  5   GNU diff -d         -3.06        30.87
  6   Ldiff               -5.95        40.35
  7   Linear Estimation    0            5.44
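This transcript names the heuristics but does not define them. The Python sketch below shows one plausible reading of the first three, inferred from the diff section size examples: the lower bound pairs up as many added and removed lines as possible into changed lines, the upper bound counts every added and removed line separately, and the bounds mean is their midpoint. These definitions are assumptions, and the sample commit is hypothetical.

```python
def lower_bound(sections):
    """Fewest modifications: pair added and removed lines into changes."""
    return sum(max(added, removed) for added, removed in sections)

def upper_bound(sections):
    """Most modifications: count every added and removed line separately."""
    return sum(added + removed for added, removed in sections)

def bounds_mean(sections):
    """Midpoint between the lower and the upper bound."""
    return (lower_bound(sections) + upper_bound(sections)) / 2

# Hypothetical commit made of two diff sections, each given as (added, removed).
commit = [(1, 1), (4, 3)]
print(lower_bound(commit), upper_bound(commit), bounds_mean(commit))  # 5 9 7.0
```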
Visual Comparison of Heuristics
Definition of Commit Size
The Commit Size Distribution of Open Source Oliver Arafat, Dirk Riehle. “The Commit Size Distribution of Open Source Software.” In Proceedings of the 42nd Hawaii International Conference on System Sciences (HICSS-42). IEEE Press, 2009. Page 1-8. http://www.riehle.org/2008/09/23/the-commit-size-distribution-of-open-source-software/
The Overall Commit Size Distribution
The Dominance of Small Commits
The Overall Commit Size Distribution
Developer Activity in Open Source Software Projects Dirk Riehle, Oliver Arafat, Amit Deshpande. “Developer Activity in Open Source Software Projects.” In preparation. Amit Deshpande, Dirk Riehle. “Continuous Integration in Open Source Software Development.” In Proceedings of the Fourth Conference on Open Source Systems (OSS 2008). Springer Verlag, 2008. Page 273-280. http://www.riehle.org/2008/03/08/continuous-integration-in-open-source-software-development/
Average Commit Size
Average Commit Frequency
Changes in Developer Behavior
The Commenting Practice of Open Source Oliver Arafat, Dirk Riehle. “The Comment Density of Open Source Software Code.” In Companion to Proceedings of the 31st International Conference on Software Engineering (ICSE 2009). IEEE Press, 2009. Forthcoming. http://www.riehle.org/2009/02/04/the-comment-density-of-open-source-software-code/
Average Comment Density
Comment Density by Programming Language
  #    Language     Average [%]   Stddev [%]   Population Size
  1.   Java         26%           11%          1085
  2.   php          22%           12%          559
  3.   C/C++        18%           8%           1621
  4.   Javascript   16%           9%           276
  5.   Python       11%           8%           534
  6.   Perl         10%           7%           273
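To read the per-language figures as a single number, one can take the mean of the averages weighted by the slide's Population Size column. The Python sketch below does that arithmetic. It is an illustration only: the slide's percentages are rounded, and the transcript does not say whether this weighted mean equals the overall average reported in the talk.

```python
# (language, average comment density in %, population size), from the slide.
ROWS = [
    ("Java", 26, 1085),
    ("php", 22, 559),
    ("C/C++", 18, 1621),
    ("Javascript", 16, 276),
    ("Python", 11, 534),
    ("Perl", 10, 273),
]

def weighted_mean_density(rows):
    """Population-weighted mean of the per-language average densities."""
    total = sum(size for _, _, size in rows)
    return sum(avg * size for _, avg, size in rows) / total

print(f"{weighted_mean_density(ROWS):.1f}%")  # 19.0%
```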
Comment Density by Commit Size
Comment Density by Team Size
Comment Density by Project Age
Commenting in Open Source
Team Size Evolution in Open Source Projects Philipp Hofmann, Dirk Riehle. “Team Size Evolution in Open Source Software Projects.” In  preparation.
Team Size Evolution Figure
Is Open Source Scale-Free?
Conclusions
Thank you! dirk@riehle.org, www.riehle.org, twitter.com/driehle Comments are welcome!
