The Evolution of the R Software Ecosystem (CSMR 2013)


Published on March 14, 2014

Author: bramadams

Source: slideshare.net

Description

Software ecosystems form the heart of modern companies’ collaboration strategies with end users, open source developers and other companies. An ecosystem consists of a core platform and a halo of user contributions that provide value to a company or project. To sustain the level and number of high-quality contributions, companies and contributors need to understand how ecosystems tend to evolve and how they can be maintained successfully over time.

As a first step, this presentation explores the evolution characteristics of the statistical computing project GNU R, a successful end-user programming ecosystem. We find that the ecosystem of user-contributed R packages has grown steadily since R’s conception, at a significantly faster rate than the core packages, yet each individual package remains stable in size. We also identify differences in how user-contributed and core packages attract an active community of users.

The Evolution of the R Software Ecosystem
Daniel M. German, University of Victoria
Bram Adams, École Polytechnique de Montréal
Ahmed E. Hassan, Queen's University

An Ecosystem is ...

“a set of (1) businesses functioning as a unit and interacting with a shared market for (2) software and services, together with (3) the relationships among [the businesses].” (Jansen et al., ICSE '09)

In Other Words

An ecosystem = a core platform + user contributions building on the platform + ecosystem infrastructure.

For R, the ecosystem infrastructure is CRAN, and the user contributions are packages such as ggplot, wethepeople, data.table, Sim.DiffProc, randomForest, rbundler, foreach, RODBC, rms, WGCNA, minpack.lm, fields, caret, heavy, plm, rv, ggplot2 and Sim.DiffProcGUI.

“Desktop ecosystems for end-user programming are the holy grail of software platforms!” (Bosch, SPLC '09)

Sources: http://www.tiobe.com, http://www.rexeranalytics.com/Data-Miner-Survey-Results-2011.html

But How Did They Get This Far?

Robert Gentleman, 1993, with a focus on non-programmers

The R Language

    # Goals: A first look at R objects - vectors, lists, matrices, data frames.
    # To make vectors "x" "y" "year" and "names"
    x <- c(2,3,7,9)
    y <- c(9,7,3,2)
    year <- 1990:1993
    names <- c("payal", "shraddha", "kritika", "itida")
    # Accessing the 1st and last elements of y
    y[1]
    y[length(y)]
    # To make a list "person"
    person <- list(name="payal", x=2, y=9, year=1990)
    person
    # Accessing things inside a list
    person$name
    person$x
    # To make a matrix, pasting together the columns "year" "x" and "y"
    # The verb cbind() stands for "column bind"
    cbind(year, x, y)
    # To make a "data frame", which is a list of vectors of the same length
    D <- data.frame(names, year, x, y)
    nrow(D)
    # Accessing one of these vectors
    D$names
    # Accessing the last element of this vector
    D$names[nrow(D)]
    # Or equally,
    D$names[length(D$names)]

R Has an ACTIVE Community: package infrastructure, mailing lists, blogs, books, commercial partners and a conference.

How Does a Successful Ecosystem like R Evolve?
Package Characteristics
Package Evolution
Package Dependencies
Package Community

Package Data Used
CRAN, from 23/04/1997 to 25/02/2011: 80 official R versions
Package categories: base, recommended, popular and contributed (counts shown on the slide: 2,733, 15, 13 and 179), plus 19,593 package versions

How to Define Popular Packages?
Data: a contest that provided the lists of packages installed by 52 users.
[Figure: number of different packages per user vs. number of packages installed (log scale, 1 to 1000); series: all packages, and packages installed by at least 20% of users]
Popular packages = packages installed by at least 20% of these users.
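The 20% rule can be sketched in R as follows. This is a minimal sketch, assuming each user's installed packages are available as a character vector; the `installed` list and the `popular_packages()` helper are hypothetical, and the toy data is not from the study.

```r
# Hypothetical toy data: the packages installed by each user (not the study's data).
installed <- list(
  u1 = c("ggplot2", "MASS", "foreach"),
  u2 = c("ggplot2", "caret"),
  u3 = c("MASS", "ggplot2", "rms"),
  u4 = c("data.table", "caret"),
  u5 = c("ggplot2", "MASS"),
  u6 = c("randomForest")
)

# Hypothetical helper: packages installed by at least `threshold` (a fraction) of users.
popular_packages <- function(installed, threshold = 0.2) {
  n_users <- length(installed)
  # Count each package at most once per user, then tally across users.
  counts <- table(unlist(lapply(installed, unique), use.names = FALSE))
  share <- as.numeric(counts) / n_users
  names(counts)[share >= threshold]
}

popular_packages(installed)
```

With six users, a package needs at least two installs (2/6, about 33%) to pass the 20% threshold, so only ggplot2, MASS and caret qualify here; with the study's 52 users, the cut-off would be 11 installs.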

Mailing List Data Used
Sources: the R-help and R-devel mailing lists; processed with MailMiner [Bettenburg et al.]; stored in PostgreSQL
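The slides do not say how messages were linked to packages. One plausible approach, shown here purely as an assumption (it is not necessarily what MailMiner does), is to count the messages whose subject line mentions a package name as a whole word; `count_mentions()` and the sample subjects are hypothetical.

```r
# Hypothetical helper: number of messages whose subject mentions each package.
count_mentions <- function(subjects, packages) {
  # Escape the dots that occur in package names such as "data.table".
  escaped <- gsub("\\.", "\\\\.", packages)
  # \b word boundaries keep "ggplot" from matching inside "ggplot2".
  patterns <- paste0("\\b", escaped, "\\b")
  counts <- vapply(patterns,
                   function(p) sum(grepl(p, subjects, perl = TRUE)),
                   integer(1))
  setNames(counts, packages)
}

# Hypothetical subject lines.
subjects <- c("[R] ggplot2: facet labels",
              "[R] data.table join question",
              "[R] ggplot vs ggplot2?",
              "[R] lm() residuals")
count_mentions(subjects, c("ggplot", "ggplot2", "data.table"))
```

Matching is case-sensitive, as R package names are; a real pipeline would also have to deal with replies, threads and message bodies.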


Documentation Files Dominate!
[Figure: proportion of files per extension (0.0 to 0.5) for base, recommended, popular and contributed packages; extensions include rd, r, txt, hpp, rda, c, h, description, pdf, cpp, namespace, f, rdata, png, gif, java, rnw, save, html, xml, tex, s, q and citation; highlighted groups: documentation and source code]
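A per-package version of this measurement might look like the sketch below. It assumes, based only on the figure's labels (`description`, `namespace`, `citation`), that files without an extension are keyed by their lower-cased name; `extension_shares()` and the file list are hypothetical.

```r
# Hypothetical helper: share of files per extension in a package source tree.
extension_shares <- function(files) {
  ext <- tolower(tools::file_ext(files))
  # Files without an extension (DESCRIPTION, NAMESPACE, CITATION) are keyed by name.
  no_ext <- ext == ""
  ext[no_ext] <- tolower(basename(files[no_ext]))
  counts <- table(ext)
  shares <- setNames(as.numeric(counts), names(counts)) / length(files)
  sort(shares, decreasing = TRUE)
}

# Hypothetical file list for a small package.
files <- c("DESCRIPTION", "NAMESPACE", "R/fit.R", "R/plot.R",
           "man/fit.Rd", "man/plot.Rd", "man/pkg-package.Rd", "src/fit.c")
round(extension_shares(files), 3)
```

For a real package, the file list could come from `list.files(path, recursive = TRUE)`.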

Extensive Package Documentation
[Figure: size of documentation per package, in lines of documentation files (.rd), log scale 0 to 100k, for base, recommended, popular and contributed packages; annotated values: 5.3k, 3.6k, 1.7k, 0.6k]

Contributed Packages Contain Less Code
[Figure: size of source code per package in SLOC (log scale, 0 to 1M), for R code and for all source code, across base, recommended, popular and contributed packages; annotated values: 7.3k, 3.5k, 1.8k, 0.7k]

Package Characteristics: extensive documentation, small contributed packages

Fast Growth of Contributed Packages
[Figure: number of packages over time, 1998 to 2010 (log scale, 1 to 500), in total and for base, recommended, popular and contributed packages]
Contributed packages show super-linear growth; base and recommended packages evolve conservatively.
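One way to check whether growth is super-linear is to fit a power law N(t) = a * t^b on a log-log scale and test whether b exceeds 1. This is an illustrative sketch on synthetic counts (not CRAN data); the slides do not state which model the authors fitted.

```r
# Synthetic yearly package counts (NOT CRAN data): exactly 10 * yrs^1.5.
yrs <- 1:12                    # years since the first snapshot
n   <- 10 * yrs^1.5            # number of packages in each year

# Fit log(n) = log(a) + b * log(yrs); b > 1 indicates super-linear growth.
fit <- lm(log(n) ~ log(yrs))
b <- unname(coef(fit)[2])
a <- exp(unname(coef(fit)[1]))
cat(sprintf("a = %.2f, b = %.2f, super-linear: %s\n", a, b, b > 1))
# prints: a = 10.00, b = 1.50, super-linear: TRUE
```

On real snapshot counts the fit will not be exact, so the estimate of b should be read together with its standard error from `summary(fit)`.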

Contributed Packages have Stable Size
[Figure: evolution of the size of source code per package, 1999 to 2011 (log scale, 0 to 1M), with panels for base, recommended, popular and contributed packages]

The Less Core, the Less Releases
[Figure: cumulative proportion of packages (0 to 1) vs. number of releases per package (log scale, 1 to 160) for recommended, popular and contributed packages]
Annotations: 50% had at most 3 releases; 50% had at most 17 releases.
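The "50% had at most k releases" annotations are medians of the per-package release counts. A minimal sketch, assuming a hypothetical table with one row per released version (toy data, not CRAN):

```r
# Hypothetical release history: one row per released version (not CRAN data).
releases <- data.frame(
  package  = c("p1", "p1", "p1", "p2", "p3", "p3", "p4", "p4", "p4", "p4"),
  category = c(rep("contributed", 6), rep("recommended", 4)),
  version  = c("0.1", "0.2", "1.0", "1.0", "0.1", "0.2",
               "1.0-1", "1.0-2", "1.0-3", "1.0-4"),
  stringsAsFactors = FALSE
)

# Number of releases per package, and each package's category.
n_releases <- tapply(releases$version, releases$package, length)
category   <- tapply(releases$category, releases$package, function(x) x[1])

# Median number of releases per category ("50% had <= k releases").
median_releases <- tapply(as.vector(n_releases), as.vector(category), median)
median_releases
```

Here the three contributed packages have 3, 1 and 2 releases, so their median is 2.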

... but Contributed Packages are Actively Maintained!
[Figure: cumulative proportion of packages vs. date of the latest release per package, 2003 to 2011, for recommended, popular and contributed packages]
More than 90% of packages had a release in the last 2 years.
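The "release in the last 2 years" share can be computed from each package's latest release date. A minimal sketch, assuming the window ends at the last snapshot date (25/02/2011) and using hypothetical release dates:

```r
# Hypothetical release dates (not CRAN data).
releases <- data.frame(
  package = c("p1", "p1", "p2", "p3", "p3", "p4"),
  date    = as.Date(c("2005-03-01", "2010-06-15", "2008-01-10",
                      "2009-05-02", "2010-12-01", "2011-01-20")),
  stringsAsFactors = FALSE
)

snapshot <- as.Date("2011-02-25")                             # end of the studied period
cutoff   <- seq(snapshot, by = "-2 years", length.out = 2)[2] # 2009-02-25

# Latest release per package (tapply drops the Date class, so compare as numbers).
latest <- tapply(as.numeric(releases$date), releases$package, max)
share_recent <- mean(latest >= as.numeric(cutoff))
share_recent
```

Only p2, whose latest release dates from 2008, falls outside the window, so the share is 0.75.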


Package Evolution: fast growth of contributed packages, stable package size, active maintenance

Packages have Few Dependencies
[Figure: cumulative proportion of packages vs. number of dependencies per package (0 to 25) for recommended, popular and contributed packages]
1/3 of packages have no dependencies; 1/4 have exactly 1 dependency.
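Dependency counts like these can be derived from the fields of each package's DESCRIPTION file (readable with `read.dcf()`). In this sketch, counting the union of Depends and Imports and ignoring the R version requirement are assumptions, not the study's documented definition; `parse_deps()` and the toy fields are hypothetical.

```r
# Hypothetical parser for a DESCRIPTION field like "R (>= 2.10), MASS, lattice (>= 0.20)".
parse_deps <- function(field) {
  if (is.na(field) || !nzchar(trimws(field))) return(character(0))
  parts <- trimws(strsplit(field, ",", fixed = TRUE)[[1]])
  pkgs  <- trimws(sub("\\(.*\\)", "", parts))   # drop version constraints
  unique(pkgs[nzchar(pkgs) & pkgs != "R"])      # "R" is the interpreter, not a package
}

# Hypothetical Depends/Imports fields for four packages.
desc <- data.frame(
  package = c("pkgA", "pkgB", "pkgC", "pkgD"),
  depends = c("R (>= 2.10)", "MASS", "R (>= 2.6), lattice, MASS", NA),
  imports = c(NA, NA, "grid", NA),
  stringsAsFactors = FALSE
)

n_deps <- mapply(function(d, i) length(union(parse_deps(d), parse_deps(i))),
                 desc$depends, desc$imports, USE.NAMES = FALSE)
names(n_deps) <- desc$package
n_deps                 # pkgA 0, pkgB 1, pkgC 3, pkgD 0
c(none = mean(n_deps == 0), one = mean(n_deps == 1))
```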

Contributed Packages are Higher-Level
[Figure: cumulative proportion of packages vs. number of dependents per package (log scale, 0 to 260) for recommended, popular and contributed packages]
Annotations: NO dependents; 50% of popular packages have at most 6 dependents.
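Dependents are the reverse of dependencies: inverting the dependency lists gives, for each package, how many packages build on it. A minimal sketch on a hypothetical five-package graph; `count_dependents()` is a hypothetical helper, not a CRAN tool.

```r
# Hypothetical dependency lists: each package maps to the packages it depends on.
deps <- list(
  pkgA = c("pkgC", "pkgD"),
  pkgB = "pkgC",
  pkgC = character(0),
  pkgD = "pkgC",
  pkgE = character(0)
)

# Invert the graph: count how many packages depend on each package.
# Dependencies outside the list (e.g. base packages) become NA in factor() and are dropped.
count_dependents <- function(deps) {
  targets <- factor(unlist(deps, use.names = FALSE), levels = names(deps))
  counts  <- table(targets)
  setNames(as.integer(counts), names(deps))
}

n_dependents <- count_dependents(deps)
n_dependents                  # pkgA 0, pkgB 0, pkgC 3, pkgD 1, pkgE 0
mean(n_dependents == 0)       # share of packages with no dependents: 0.6
```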

Package Dependencies: few dependencies, contributed packages are higher level

Contributed Packages Generate More User Traffic
[Figure: number of messages over time, 1998 to 2010 (0 to 20,000), for base, recommended, popular and contributed packages]

Contributed Packages take over Developer Traffic
[Figure: number of messages over time, 1998 to 2010 (0 to 2,500), for base, recommended, popular and contributed packages]

The Less Core, the Less Traffic
[Figure: total number of messages per package (log scale, 1 to 10,000) for base, recommended, popular and contributed packages]
Annotation: strong competition

Starting up a Community takes 1 Year
[Figure: time (instant, day, week, month, year, 5 years, 10 years) at which packages reach their 1st, 10th, 100th and 1000th message, for base, recommended, popular and contributed packages]
Annotations: 3 months; 1 year; 5 months slower; 44.9% get here; only 6.5% get this far.
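Time-to-k-th-message curves like these can be computed per package. This sketch assumes messages are already attributed to packages and measures time from each package's first message; the slides do not state the reference point, so that choice and the toy data are hypothetical.

```r
# Hypothetical mailing-list messages attributed to packages (not the study's data).
msgs <- data.frame(
  package = c("p1", "p1", "p1", "p2", "p2", "p1"),
  date    = as.Date(c("2009-01-01", "2009-02-01", "2009-04-01",
                      "2010-01-01", "2010-03-01", "2009-06-01")),
  stringsAsFactors = FALSE
)

# Days from a package's first message to its k-th message (NA if never reached).
days_to_kth_message <- function(msgs, k) {
  by_pkg <- split(msgs$date, msgs$package)
  vapply(by_pkg, function(d) {
    d <- sort(d)
    if (length(d) < k) NA_real_ else as.numeric(d[k] - d[1], units = "days")
  }, numeric(1))
}

days3 <- days_to_kth_message(msgs, k = 3)
days3                      # p1 90, p2 NA
mean(!is.na(days3))        # share of packages that reach 3 messages: 0.5
```

The share of non-missing values corresponds to annotations such as "44.9% get here" when k is 10, 100 or 1000.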

Package Characteristics: extensive documentation, small contributed packages
Package Evolution: fast growth of contributed packages, stable package size, active maintenance
Package Dependencies: few dependencies, contributed packages are higher level
Package Community: strong competition for attention, building a community takes a year

So What?
• How do contributors deal with the fight for attention?
  – What is their motivation?
  – How much effort do they spend on their package?
• How does a package become popular/recommended?
  – Do bloggers/books have an impact?
  – Or is it the other way around?
• How do R-Forge and the core team ensure high-quality releases without broken packages?
• ...


Case Study on R
CRAN, from 23/04/1997 to 25/02/2011: 80 official R versions; base, recommended, popular and contributed packages (counts shown on the slide: 2,733, 15, 13 and 179), plus 19,593 package versions


RELENG 2013: 1st International Workshop on Release Engineering, May 20, 2013, San Francisco, USA (http://releng.polymtl.ca)


Paper: The Evolution of the R Software Ecosystem. CSMR '13: Proceedings of the 2013 17th European Conference on Software Maintenance and Reengineering. DOI: 10.1109/CSMR.2013.33
