Prognostics and Health Management (PHM) is an engineering discipline focused on predicting the point at which systems or components will no longer perform as intended. This prediction is often expressed as a Remaining Useful Life (RUL). RUL is an important decision-making tool for contingency mitigation: predicting an RUL (and its associated confidence) enables decisions about how and when to maintain the system. PHM is generally applied to hardware systems in both electronics and non-electronics domains. The application of PHM (and RUL) concepts to software has not been explored.
Today, software (SW) health management is confined to diagnostic assessments that identify existing problems, whereas a prognostic assessment can indicate when in the future a problem will become detrimental to the operation of the system. Related areas exist, such as SW defect prediction, SW reliability prediction, predictive maintenance of SW, SW degradation, and SW performance prediction, but all rely on static models built upon historical data, and none of them can calculate an RUL.
This paper addresses the application of PHM concepts to software systems for fault prediction and RUL estimation. Specifically, we address how PHM can be used to make decisions for SW systems, such as version updates, module changes, rejuvenation, maintenance scheduling, and abandonment.
This paper presents a method to continuously predict the RUL of a SW system based on usage parameters (e.g., number and categories of releases) and multiple performance parameters (e.g., response time). The model is validated by comparing actual performance-parameter data generated by test beds against the data predicted by the model; statistical (regression) validation has also been carried out. The test beds replicate and validate faults collected from a real application in a controlled, standard test (staging) environment. A case study based on publicly available data on faults and enhancement requests for the open-source Bugzilla application is presented. The case study demonstrates that PHM concepts can be applied to SW systems and that an RUL can be calculated to support decisions on software version updates or upgrades, module changes, rejuvenation, maintenance scheduling, and total abandonment.
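To make the idea of an RUL derived from a performance parameter concrete, the following minimal sketch extrapolates a monitored parameter (e.g., response time) to a failure threshold. It is an illustration only and not the model developed in this paper: the linear degradation trend, the threshold value, and the function name `estimate_rul` are all assumptions introduced here.

```python
from typing import Optional, Sequence


def estimate_rul(
    times: Sequence[float],
    values: Sequence[float],
    threshold: float,
) -> Optional[float]:
    """Illustrative RUL estimate (hypothetical helper, not the paper's model).

    Fits an ordinary least-squares line through the observations and
    extrapolates it to the failure threshold. Assumes the parameter
    degrades by increasing (e.g., response time grows) and that `times`
    is in ascending order.

    Returns the time remaining after the last observation until the
    fitted trend reaches the threshold, 0.0 if the trend has already
    reached it, or None if the trend is not increasing.
    """
    n = len(times)
    if n < 2 or n != len(values):
        raise ValueError("need at least two paired observations")

    mean_t = sum(times) / n
    mean_v = sum(values) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("observation times must not all be equal")
    sxy = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))

    slope = sxy / sxx
    intercept = mean_v - slope * mean_t
    if slope <= 0:
        return None  # no degradation trend, so no crossing is predicted

    crossing_time = (threshold - intercept) / slope
    return max(0.0, crossing_time - times[-1])


# Assumed example data: response time in ms observed at five usage intervals.
observed_times = [0, 1, 2, 3, 4]
observed_response_ms = [100, 110, 120, 130, 140]
rul = estimate_rul(observed_times, observed_response_ms, threshold=200)
```

With these assumed observations the fitted trend rises by 10 ms per interval, so the sketch predicts that the 200 ms threshold is reached 6 intervals after the last observation. A usable prognostic model would also need to report the confidence associated with the estimate, which this sketch omits.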