Changes and Bugs: Mining and Predicting Development Activities

Information about Changes and Bugs: Mining and Predicting Development Activities

Published on June 23, 2008

Author: tom.zimmermann

Source: slideshare.net

Description

PhD defense talk, 2008.

MINING SOFTWARE ARCHIVES
Changes and Bugs: Mining and Predicting Development Activities
Thomas Zimmermann
Saarbrücken, May 26, 2008

Software development Build

Collaboration: Comm. Archive, Version Archive, Bug Database
Mining Software Archives

eROSE: Guiding developers
Purchase History: "Customers who bought this item also bought..."
Version Archive: "Developers who changed this function also changed..."

eROSE suggests further locations.

THIS THESIS
additions analysis architecture archives aspects bug cached calls changes collaboration complexities component concerns cross-cutting cvs data defects design development drawing dynamine eclipse effort evolves failures fine-grained fix fix-inducing graphs hatari history locate matching method mining predicting program programmers report repositories revision software support system taking transactions version visualizing

Contributions of the thesis
1. Fine-grained analysis of version archives.
   - Project-specific usage patterns of methods (FSE 2005)
   - Identification of cross-cutting changes (ASE 2006)
2. Mining bug databases to predict defects.
   - Dependencies predict defects (ISSRE 2007, ICSE 2008)
   - Domino effect: depending on defect-prone binaries increases the chances of having defects (in submission)

Fine-grained analysis

public void createPartControl(Composite parent) {
    ...
    // add listener for editor page activation
    getSite().getPage().addPartListener(partListener);
}

public void dispose() {
    ...
    getSite().getPage().removePartListener(partListener);
}

co-added: addPartListener(...) in createPartControl() and removePartListener(...) in dispose()
Other call names shown on the slide: close, open, println, begin

Co-added items = patterns
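One way to read "co-added items = patterns" is to treat each commit as the set of method calls it added and to count how often two calls are added together. The sketch below is only an illustration of that idea, not the miner used in the thesis; the transaction data, the input format, and the `min_support` threshold are assumptions made up for the example.

```python
from collections import Counter
from itertools import combinations


def co_added_pairs(transactions, min_support=2):
    """Count pairs of calls that were added in the same transaction.

    transactions: iterable of sets of call names, one set per commit
    (hypothetical input format). Returns {pair: count} for every pair
    seen in at least min_support transactions.
    """
    counts = Counter()
    for added in transactions:
        # sorted() gives each unordered pair a single canonical form
        for pair in combinations(sorted(added), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Hypothetical commits: each set lists the calls one commit added.
transactions = [
    {"addPartListener", "removePartListener"},
    {"addPartListener", "removePartListener", "println"},
    {"open", "close"},
    {"open", "close", "println"},
    {"println"},
]

patterns = co_added_pairs(transactions)
for pair, count in sorted(patterns.items()):
    print(pair, count)
```

With this toy data only the add/remove and open/close pairs reach the threshold; `println` is added often but with no stable partner, so it does not form a pattern.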

Fine-grained analysis

public static final native void _XFree(int address);

public static final void XFree(int /*long*/ address) {
    lock.lock();
    try {
        _XFree(address);
    } finally {
        lock.unlock();
    }
}

CHANGED IN 1284 LOCATIONS

Crosscutting changes = aspect candidates
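A crosscutting change can be spotted by asking how many distinct locations received the same added call within one transaction. The following sketch assumes a simple list of `(transaction, location, call)` triples; that format, the `min_locations` threshold, and the location names other than `XFree` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def aspect_candidates(additions, min_locations=3):
    """Find calls added to many locations within one transaction.

    additions: list of (transaction_id, location, call) triples
    (hypothetical format). Returns {(transaction_id, call): n_locations}
    for calls added in at least min_locations distinct locations.
    """
    locations = defaultdict(set)
    for tx, location, call in additions:
        locations[(tx, call)].add(location)
    return {
        key: len(locs)
        for key, locs in locations.items()
        if len(locs) >= min_locations
    }


# Toy data: one commit wraps several functions in lock()/unlock().
additions = [
    ("t1", "XFree", "lock.lock"),
    ("t1", "XFree", "lock.unlock"),
    ("t1", "XFlush", "lock.lock"),    # location name made up for the example
    ("t1", "XFlush", "lock.unlock"),
    ("t1", "XSync", "lock.lock"),     # location name made up for the example
    ("t1", "XSync", "lock.unlock"),
    ("t2", "dispose", "removePartListener"),
]

candidates = aspect_candidates(additions)
print(candidates)
```

Here the locking calls show up in three locations of the same transaction and are reported, while the single `removePartListener` addition is not.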

Bugs! Bugs! Bugs!

Quality assurance is limited... ...by time... ...and by money.

Spend resources on the components that need them most, i.e., those that are most likely to fail.

Indicators of defects
• Code complexity - Basili et al. 1996, Subramanyam and Krishnan 2003, Binkley and Schach 1998, Ohlsson and Alberg 1996, Nagappan et al. 2006, Knab et al. 2006
• Code churn - Nagappan and Ball 2005
• Historical data - Khoshgoftaar et al. 1996, Graves et al. 2000, Kim et al. 2007, Ostrand et al. 2005, Mockus et al. 2005
• Code dependencies - Nagappan and Ball 2007, Schröter et al. 2006

2252 Binaries 28.3 MLOC

Windows Server layout

Hypotheses
Subsystem level: Complexity of dependency graphs
- correlates with the number of post-release defects (H1)
- can predict the number of post-release defects (H2)
Binary level: Network measures on dependency graphs
- correlate with the number of post-release defects (H3)
- can predict the number of post-release defects (H4)
- can indicate critical "escrow" binaries (H5)

DATA

Data collection
Release point for Windows Server 2003: Dependencies, Network Measures, Complexity Metrics
Six months to collect defects: Defects

Dependencies
• Directed relationship between two pieces of code (here: binaries)
• MaX dependency analysis framework
  - Caller-callee dependencies
  - Imports and exports
  - RPC, COM
  - Runtime dependencies (such as LoadLibrary)
  - Registry access
  - etc.

Centrality
Degree: Blue binary has dependencies to many other binaries.
Closeness: Blue binary is close to all other binaries (only two steps).
Betweenness: Blue binary connects the left with the right graph (bridge).

Centrality
• Degree centrality - counts the number of dependencies
• Closeness centrality - takes distance to all other binaries into account
  - Closeness: How close are the other binaries?
  - Reach: How many binaries can be reached (weighted)?
  - Eigenvector: similar to PageRank
• Betweenness centrality - counts the number of shortest paths through a binary
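The three centrality families can be made concrete on a toy graph. The sketch below is a minimal, standard-library illustration and makes two simplifying assumptions that are not from the slides: the dependency graph is treated as undirected, and the node names are invented. Betweenness uses Brandes' algorithm for unweighted graphs; this is not necessarily how the measures were computed for the study.

```python
from collections import deque


def degree_centrality(graph):
    """Number of direct neighbors (dependencies) per node."""
    return {node: len(neighbors) for node, neighbors in graph.items()}


def closeness_centrality(graph):
    """(reachable nodes) / (sum of shortest-path distances), via BFS."""
    result = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    """Brandes' algorithm: shortest paths passing through each node."""
    score = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # each unordered pair was counted from both endpoints
    return {v: c / 2 for v, c in score.items()}


# Toy graph: "hub" bridges a left triangle (a, b) and a right triangle (c, d).
graph = {
    "a": ["b", "hub"],
    "b": ["a", "hub"],
    "hub": ["a", "b", "c", "d"],
    "c": ["hub", "d"],
    "d": ["hub", "c"],
}

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
print(degree["hub"], closeness["hub"], betweenness["hub"])
```

In this toy graph the bridging node scores highest on all three measures, which matches the intuition of the "blue binary" pictures on the preceding slide.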

Ego networks
EGO, IN, OUT, INOUT
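The slide only shows the labels, so the following is an interpretation rather than the author's definition: assume the IN network holds the ego plus the binaries that depend on it, the OUT network holds the ego plus the binaries it depends on, and INOUT is the union of both. The binary names and the `deps` format are hypothetical.

```python
def ego_networks(deps, ego):
    """Return the node sets of the IN, OUT and INOUT ego networks.

    deps: {binary: set of binaries it depends on} (hypothetical format).
    The IN/OUT/INOUT reading used here is an assumption, see lead-in.
    """
    out_nodes = set(deps.get(ego, ())) | {ego}
    in_nodes = {b for b, targets in deps.items() if ego in targets} | {ego}
    return {"IN": in_nodes, "OUT": out_nodes, "INOUT": in_nodes | out_nodes}


# Made-up dependency graph: two executables use core.dll,
# which in turn uses kernel.dll.
deps = {
    "app.exe": {"core.dll"},
    "tool.exe": {"core.dll"},
    "core.dll": {"kernel.dll"},
    "kernel.dll": set(),
}

networks = ego_networks(deps, "core.dll")
for name, nodes in networks.items():
    print(name, sorted(nodes))
```

Network measures such as size or density could then be computed separately on each of the three node sets.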

Complexity metrics (Group, Metrics, Aggregation)
Module metrics for a binary B
- # functions in B
- # global variables in B
Per-function metrics for a function f() (Aggregation: Total, Max)
- # executable lines in f()
- # parameters in f()
- # functions calling f()
- # functions called by f()
- McCabe's cyclomatic complexity of f()
OO metrics for a class C (Aggregation: Total, Max)
- # methods in C
- # subclasses of C
- Depth of C in the inheritance tree
- Coupling between classes
- Cyclic coupling between classes

RESULTS

1 PATTERNS

Star pattern With defects No defects

Undirected cliques Average number of defects is higher for binaries in large cliques.

2 PREDICTION

Prediction
Input metrics and measures: Metrics, SNA, Metrics+SNA
Model: PCA, Regression
Prediction: Classification, Ranking

Classification: Does a binary have a defect or not?

Ranking: Which binaries have the most defects?

Random splits 4×50×

Classification (logistic regression)
SNA increases the recall by 0.10 (at p=0.01) while precision remains comparable.
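For readers unfamiliar with the two measures, precision is the share of binaries predicted as defect-prone that really had defects, and recall is the share of defective binaries that the model found. The sketch below computes both from label lists; the labels are invented and do not reproduce any result from the talk.

```python
def precision_recall(actual, predicted):
    """Compute precision and recall for binary labels (1 = has defects)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # true positives
    fp = sum(1 for a, p in pairs if not a and p)    # false positives
    fn = sum(1 for a, p in pairs if a and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up labels for eight binaries.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 0, 0, 1, 0, 0, 0]

precision, recall = precision_recall(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

A higher recall at comparable precision, as reported on the slide, means more defective binaries are found without raising the share of false alarms.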

Ranking (linear regression)
SNA+METRICS increases the correlation by 0.10 (significant at p=0.01).
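The slide does not say which correlation coefficient was used to evaluate the ranking. As an assumption, the sketch below uses Spearman's rank correlation between predicted and actual defect counts, computed as the Pearson correlation of average ranks; the numbers are invented for illustration.

```python
import math


def average_ranks(values):
    """1-based ranks in ascending order; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up defect counts for four binaries and a model's predictions.
actual_defects = [10, 5, 3, 0]
predicted_defects = [8.0, 6.5, 1.0, 2.0]

rho = spearman(actual_defects, predicted_defects)
print(f"rho={rho:.2f}")
```

A rank-based coefficient fits the ranking question because only the order of the binaries matters, not the exact predicted counts.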

FUTURE WORK
analysis archives aspects bug changes collaboration complexities component concerns cross-cutting cvs data defects design development drawing eclipse erose evolves factor failures fine-grained fix fix-inducing fm graphs guide hatari history human matching mining networking predicting program programmers quality report repositories revision social software support system taking version

Collaboration: Collab. Data, Effort Data, Comm. Archive, Version Archive, Bug Database
Social Networking for Software Development
