BEIRUT: Repository Mining for Defect Prediction

Amir Elmishali, Bruno Sotto-Mayor, Inbal Roshanski, Amit Sultan, Meir Kalech

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Peer-review



Software defect prediction is an important activity in the testing phase of the software development life cycle. Research on new defect prediction approaches, and on the selection of training sets for the classification task, has analyzed several benchmarks in the literature. These benchmarks provide features and defect information for specific software archives, which is why they are commonly used to evaluate new approaches. However, current benchmarks have several limitations: lack of project variability, outdated benchmarks, single-version projects, a small number of projects and metrics, unavailable resources, poor usability, and non-extensible tools. We therefore introduce BEIRUT (Bgu rEpository mIning foR bUg predicTion), a novel tool for generating defect prediction benchmarks. Given an open-source repository from GitHub, BEIRUT mines it in three main steps: (1) selecting the best k versions based on the defect rate of each version; (2) generating training sets and a testing set for defect prediction, composed of a large number of metrics and defect information extracted from each selected version; and (3) creating defect prediction models from the extracted metrics. As a result, BEIRUT extracts a diverse catalog of 644 metrics, together with defect information, for each component of k versions that are selected automatically according to their defect rates. The data were collected from 512 different projects, starting from 2009. The tool also includes an easy-to-use web interface that offers a configurable selection of projects and metrics and lets users manage defect prediction tasks. Moreover, the tool is designed to be extended with new projects and new extractors that add new metrics to the benchmark. The web service tool can be found at
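The abstract does not define how the "best" k versions are chosen beyond their defect rates, so the following is only a minimal sketch under stated assumptions. It treats a version's defect rate as the fraction of its components that are defective, keeps versions whose rate falls inside a hypothetical band (`min_rate`, `max_rate`), prefers the highest rates, and returns the chosen k in release order. The split into training versions (all but the latest selected) and a testing version (the latest selected) is also an assumption. `Version`, `select_versions`, and `split_train_test` are hypothetical names, not BEIRUT's API.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Version:
    """One release of a project (hypothetical model, not BEIRUT's data format)."""
    name: str
    order: int       # chronological position of the release
    components: int  # total number of components (e.g. files or classes)
    defective: int   # components labeled defective in this version

    @property
    def defect_rate(self) -> float:
        # Fraction of components that are defective; 0.0 for an empty version.
        return self.defective / self.components if self.components else 0.0


def select_versions(versions: List[Version], k: int,
                    min_rate: float = 0.05, max_rate: float = 0.30) -> List[Version]:
    """Assumed rule: keep versions whose defect rate lies in [min_rate, max_rate],
    take the k with the highest rates, and return them in release order."""
    eligible = [v for v in versions if min_rate <= v.defect_rate <= max_rate]
    if len(eligible) < k:
        raise ValueError(f"only {len(eligible)} eligible versions, need {k}")
    best = sorted(eligible, key=lambda v: v.defect_rate, reverse=True)[:k]
    return sorted(best, key=lambda v: v.order)


def split_train_test(selected: List[Version]) -> Tuple[List[Version], Version]:
    """Assumed split: earlier selected versions train, the latest one tests."""
    return selected[:-1], selected[-1]


if __name__ == "__main__":
    versions = [
        Version("1.0", 0, 200, 8),    # rate 0.04, below the band
        Version("1.1", 1, 220, 33),   # rate 0.15
        Version("1.2", 2, 250, 50),   # rate 0.20
        Version("2.0", 3, 300, 120),  # rate 0.40, above the band
        Version("2.1", 4, 310, 31),   # rate 0.10
    ]
    train, test = split_train_test(select_versions(versions, k=3))
    print([v.name for v in train], test.name)  # prints ['1.1', '1.2'] 2.1
```

Filtering by a band before ranking is one way to avoid versions with almost no defects (too few positive labels) or an implausibly high rate; the actual criterion BEIRUT uses may differ.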

Original language: English
Title of host publication: Proceedings - 2021 IEEE 32nd International Symposium on Software Reliability Engineering, ISSRE 2021
Editors: Zhi Jin, Xuandong Li, Jianwen Xiang, Leonardo Mariani, Ting Liu, Xiao Yu, Naghmeh Ivaki
Publisher: IEEE Computer Society
Number of pages: 10
ISBN (Electronic): 9781665425872
State: Published - 2021
Externally published: Yes
Event: 32nd IEEE International Symposium on Software Reliability Engineering, ISSRE 2021 - Wuhan, China
Duration: 25 Oct 2021 - 28 Oct 2021

Publication series

Name: Proceedings - International Symposium on Software Reliability Engineering, ISSRE
ISSN (Print): 1071-9458



Bibliographical note

Publisher Copyright:
© 2021 IEEE.


Keywords

  • Defect Prediction
  • Open Source Metrics
  • Repository Mining Tool
  • Software Quality Metrics


