Introduction
The {weibulltools} package includes statistical methods and visualizations that can be used in reliability engineering. This web application introduces its content and lets you explore all functions interactively.
For an introduction, read the Getting Started guide on this page. This guide presents a basic Weibull analysis and shows the core functions of {weibulltools}. Details for all functions and datasets can be found on the other pages, which you can reach via the sidebar. The user interface of each function resembles genuine R code. Both the results and the code needed to reproduce them are included as well.
Reliability Data
First, we need to clarify what we mean by reliability data and Weibull analysis. Reliability data consist of a lifetime variable and a status variable. The lifetime variable can represent various characteristics influencing the reliability of a product, e.g. operating time (days/months in service), mileage (km, miles) or load cycles. The status variable indicates whether a unit has failed or is non-defective (censored) at the observed lifetime. A Weibull analysis uses these reliability data to estimate the overall reliability of a product.
For an introductory Weibull analysis, we will use the shock dataset, which is included in {weibulltools}. This dataset reports kilometer-dependent problems that have occurred on shock absorbers. In addition to failed items, the dataset also contains censored (non-defective) observations. The raw dataset has to be marked as reliability data using the function reliability_data().
library(weibulltools)
shock_tbl <- reliability_data(shock, x = distance, status = status)
shock_tbl
Estimation of Failure Probabilities
Now that the data is in the correct format, we can use it to estimate failure probabilities by calling estimate_cdf(). This function takes the output of reliability_data() and one or multiple methods specified by the methods argument. The following box contains one tab per usable method.
shock_cdf <- estimate_cdf(shock_tbl, methods = "mr")
shock_cdf <- estimate_cdf(shock_tbl, methods = "johnson")
shock_cdf <- estimate_cdf(shock_tbl, methods = "kaplan")
shock_cdf <- estimate_cdf(shock_tbl, methods = "nelson")
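To give a feel for what a median rank estimate looks like, the following sketch applies Benard's approximation to a small complete sample in base R. The lifetimes are made-up values, censoring is ignored, and whether this formula matches what the "mr" method computes by default is an assumption that should be checked against the estimate_cdf() documentation.

```r
# Hypothetical complete sample of lifetimes (no censored units)
lifetimes <- c(1200, 450, 3100, 800, 2000)

n <- length(lifetimes)

# Rank of each unit among the ordered failures
ranks <- rank(lifetimes, ties.method = "first")

# Benard's approximation of the median rank
prob <- (ranks - 0.3) / (n + 0.4)

# Failure probabilities sorted by lifetime
result <- data.frame(x = lifetimes, rank = ranks, prob = prob)
result[order(result$x), ]
```

The estimated probabilities increase with the rank and stay strictly between 0 and 1, which is what a probability plot requires.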
Probability Plotting
The estimated probabilities can now be presented in a probability plot. For the sake of simplicity, we assume that the failure probabilities were estimated with methods = "johnson". With plot_prob(), probability plots for several lifetime distributions can be constructed and estimates of multiple methods can be displayed at once. The axes are transformed in such a way that the cumulative distribution function (CDF) is represented by a straight line. If the plotted probabilities lie approximately on a straight line, the chosen distribution can be considered adequate.
p_weib <- plot_prob(shock_cdf, "weibull")
p_weib <- plot_prob(shock_cdf, "lognormal")
p_weib <- plot_prob(shock_cdf, "loglogistic")
p_weib <- plot_prob(shock_cdf, "normal")
p_weib <- plot_prob(shock_cdf, "logistic")
p_weib <- plot_prob(shock_cdf, "sev")
p_weib <- plot_prob(shock_cdf, "exponential")
p_weib <- plot_prob(shock_cdf, "weibull3")
p_weib <- plot_prob(shock_cdf, "lognormal3")
p_weib <- plot_prob(shock_cdf, "loglogistic3")
p_weib <- plot_prob(shock_cdf, "exponential2")
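The axis transformation behind a Weibull probability plot can be sketched in base R. For a two-parameter Weibull CDF, log(-log(1 - F(x))) is a linear function of log(x) with slope equal to the shape parameter. The parameter values and lifetimes below are hypothetical, and the sketch is not meant to reproduce the internals of plot_prob().

```r
# Hypothetical Weibull parameters and lifetimes
shape <- 2
scale <- 100
x <- c(20, 50, 100, 200, 300)

# Exact failure probabilities under the assumed model
cdf <- pweibull(x, shape = shape, scale = scale)

# Transformed axes of a Weibull probability plot
x_trans <- log(x)
y_trans <- log(-log(1 - cdf))

# A straight line fits the transformed points exactly:
# slope = shape, intercept = -shape * log(scale)
fit <- lm(y_trans ~ x_trans)
coef(fit)
```

With estimated instead of exact probabilities, the points scatter around such a line, and the amount of scatter indicates how well the distribution fits.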
Estimation of Parametric Lifetime Distributions
{weibulltools} comes with two basic methods for the parameter estimation of lifetime distributions. Whereas rank_regression() fits a straight line through the plotting positions of the calculated failure probabilities, ml_estimation() maximizes the likelihood function of the parameters given the sample data. rank_regression() and ml_estimation() can be applied to complete data as well as to failure and (multiple) right-censored data. Both methods can also deal with three-parametric models that include a threshold parameter γ.
rr <- rank_regression(
  shock_cdf,
  distribution = "weibull"
)
rr
rr <- rank_regression(
  shock_cdf,
  distribution = "lognormal"
)
rr
rr <- rank_regression(
  shock_cdf,
  distribution = "loglogistic"
)
rr
rr <- rank_regression(
  shock_cdf,
  distribution = "normal"
)
rr
rr <- rank_regression(
  shock_cdf,
  distribution = "logistic"
)
rr
rr <- rank_regression(
  shock_cdf,
  distribution = "sev"
)
rr
rr <- rank_regression(
  shock_cdf,
  distribution = "exponential"
)
rr
rr <- rank_regression(
  shock_cdf,
  distribution = "weibull3"
)
rr
rr <- rank_regression(
  shock_cdf,
  distribution = "lognormal3"
)
rr
rr <- rank_regression(
  shock_cdf,
  distribution = "loglogistic3"
)
rr
rr <- rank_regression(
  shock_cdf,
  distribution = "exponential2"
)
rr
mle <- ml_estimation(
  shock_tbl,
  distribution = "weibull"
)
mle
mle <- ml_estimation(
  shock_tbl,
  distribution = "lognormal"
)
mle
mle <- ml_estimation(
  shock_tbl,
  distribution = "loglogistic"
)
mle
mle <- ml_estimation(
  shock_tbl,
  distribution = "normal"
)
mle
mle <- ml_estimation(
  shock_tbl,
  distribution = "logistic"
)
mle
mle <- ml_estimation(
  shock_tbl,
  distribution = "sev"
)
mle
mle <- ml_estimation(
  shock_tbl,
  distribution = "exponential"
)
mle
mle <- ml_estimation(
  shock_tbl,
  distribution = "weibull3"
)
mle
mle <- ml_estimation(
  shock_tbl,
  distribution = "lognormal3"
)
mle
mle <- ml_estimation(
  shock_tbl,
  distribution = "loglogistic3"
)
mle
mle <- ml_estimation(
  shock_tbl,
  distribution = "exponential2"
)
mle
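The idea of maximum likelihood estimation with right-censored data can be illustrated without the package. The sketch below, written in base R with made-up data, lets failed units contribute the log-density and censored units the log-survival probability to the likelihood of a two-parameter Weibull model. It is a simplified illustration and not the implementation used by ml_estimation().

```r
# Hypothetical right-censored sample: status 1 = failed, 0 = censored
x      <- c(150, 340, 560, 800, 1130, 1720, 2470, 4210, 5230, 6890)
status <- c(1, 1, 0, 1, 1, 0, 1, 1, 0, 1)

# Negative log-likelihood of a two-parameter Weibull model.
# Parameters are optimized on the log scale to keep them positive.
neg_loglik <- function(par, x, status) {
  shape <- exp(par[1])
  scale <- exp(par[2])
  log_dens <- dweibull(x, shape = shape, scale = scale, log = TRUE)
  log_surv <- pweibull(x, shape = shape, scale = scale,
                       lower.tail = FALSE, log.p = TRUE)
  -sum(status * log_dens + (1 - status) * log_surv)
}

# Numerical minimization, starting at shape = 1 and scale = mean lifetime
fit <- optim(c(0, log(mean(x))), neg_loglik, x = x, status = status)

estimates <- c(shape = exp(fit$par[1]), scale = exp(fit$par[2]))
estimates
```

Rank regression, in contrast, would first estimate failure probabilities and then fit a line on the transformed axes of the probability plot.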
Visualization of Parametric Lifetime Distributions
After the parameters are obtained, a cumulative distribution function (CDF) can be computed and added to a probability plot with plot_mod().
p_weib %>%
  plot_mod(
    rr,
    title_trace = "Rank Regression"
  )
p_weib %>%
  plot_mod(
    mle,
    title_trace = "ML Estimation"
  )
Description
A dataset containing the number of cycles of fatigue life for Alloy T7987 specimens.
alloy
Format
A tibble with 72 rows and 2 variables:
- cycles
Number of cycles (in thousands).
- status
1 if the specimen failed before 300 thousand cycles, else 0.
Source
Meeker, William Q; Escobar, Luis A., Statistical Methods for Reliability Data, New York: Wiley series in probability and statistics (1998, p.131)
Description
Distance to failure for 38 vehicle shock absorbers.
shock
Format
A tibble with 38 rows and 3 variables:
- distance
Observed distance.
- failure_mode
One of two failure modes (mode_1 and mode_2) or censored if no failure occurred.
- status
1 if failure_mode is either mode_1 or mode_2, else 0.
Source
Meeker, William Q; Escobar, Luis A., Statistical Methods for Reliability Data, New York: Wiley series in probability and statistics (1998, p.630)
Description
High Voltage Stress Test for the Dielectric Insulation of Generator armature bars
A sample of 58 segments of bars were subjected to a high voltage stress test. Two failure modes occurred, Mode D (degradation failure) and Mode E (early failure).
voltage
Format
A tibble with 58 rows and 3 variables:
- hours
Observed hours.
- failure_mode
One of two failure modes (D and E) or censored if no failure occurred.
- status
1 if failure_mode is either D or E, else 0.
Source
Doganaksoy, N.; Hahn, G.; Meeker, W. Q., Reliability Analysis by Failure Mode, Quality Progress, 35(6), 47-52, 2002
Description
An illustrative field dataset that contains a variety of variables commonly collected in the automotive sector.
The dataset has complete information about failed and incomplete information about intact vehicles. See 'Format' and 'Details' for further insights.
field_data
Format
A tibble with 10,684 rows and 20 variables:
- vin
Vehicle identification number.
- dis
Days in service.
- mileage
Distances covered, which are unknown for censored units.
- status
1 for failed and 0 for censored units.
- production_date
Date of production.
- registration_date
Date of registration. Known for all failed units and for a few intact units.
- repair_date
The date on which the failure was repaired. It is assumed that the repair date is equal to the date of failure occurrence.
- report_date
The date on which lifetime information about the failure was available.
- country
Delivering country.
- region
The region within the country of delivery. Known for registered vehicles, NA for units with a missing registration_date.
- climatic_zone
Climatic zone based on "Köppen-Geiger" climate classification. Known for registered vehicles, NA for units with a missing registration_date.
- climatic_subzone
Climatic subzone based on "Köppen-Geiger" climate classification. Known for registered vehicles, NA for units with a missing registration_date.
- brand
Brand of the vehicle.
- vehicle_model
Model of the vehicle.
- engine_type
Type of the engine.
- engine_date
Date on which the engine was installed.
- gear_type
Type of the gear.
- gear_date
Date on which the gear was installed.
- transmission
Transmission of the vehicle.
- fuel
Vehicle fuel.
Details
All vehicles were produced in 2014 and an analysis of the field data was made at the end of 2015. At the date of analysis, there were 684 failed and 10,000 intact vehicles.
Censored vehicles:
For censored units the service time (dis) was computed as the difference of the date of analysis "2015-12-31" and the registration_date.
For many units the latter date is unknown. For these, the difference of the analysis date and production_date was used to get a rough estimation of the real service time. This uncertainty has to be considered in the subsequent analysis (see delay in registration in the section 'Details' of mcs_delay).
Furthermore, due to the delay in report, the computed service time could also be inaccurate. This uncertainty should be considered as well (see delay in report in the section 'Details' of mcs_delay).
The lifetime characteristic mileage is unknown for all censored units. If an analysis is to be made for this lifetime characteristic, covered distances for these units have to be estimated (see mcs_mileage).
Failed vehicles:
For failed units the service time (dis) is computed as the difference of repair_date and registration_date, which are known for all of them.
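The service time rule described for censored vehicles can be sketched in base R. The three units below are made up for illustration and are not taken from field_data; only the analysis date "2015-12-31" comes from the description above.

```r
# Hypothetical censored units; NA marks an unknown registration date
units <- data.frame(
  production_date   = as.Date(c("2014-03-10", "2014-06-01", "2014-09-15")),
  registration_date = as.Date(c("2014-05-02", NA, "2014-11-20"))
)
analysis_date <- as.Date("2015-12-31")

# Use the registration date where known, otherwise the production date
start_date <- units$registration_date
missing_reg <- is.na(start_date)
start_date[missing_reg] <- units$production_date[missing_reg]

# Days in service up to the date of analysis
units$dis <- as.numeric(analysis_date - start_date)
units
```

For the unit without a registration date, the resulting value overstates the real service time by the unknown delay between production and registration, which is the uncertainty that mcs_delay is meant to address.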
Reliability Data
MCS Delay Data
MCS Mileage Data
mcs_mileage_data(
  data = field_data,
  mileage = mileage,
  time = dis,
  status = status,
  id = NULL
)
Non-Parametric Failure Probabilities
Probability Plotting Method for Univariate Lifetime Distributions
plot_prob(x, distribution, title_main, title_x, title_y, title_trace, plot_method = "plotly")
ML Estimation
Rank Regression
EM Algorithm
Probability Plotting Method for Univariate Lifetime Distributions
plot_prob(x, title_main, title_x, title_y, title_trace, plot_method)
Add Estimated Population Line(s) to a Probability Plot
p_mod_mix <- plot_mod(p_obj, x, title_trace)
Segmented Regression
Probability Plotting Method for Univariate Lifetime Distributions
plot_prob(x, title_main, title_x, title_y, title_trace, plot_method)
Add Estimated Population Line(s) to a Probability Plot
p_mod_mix <- plot_mod(p_obj, x, title_trace)
Beta Binomial Confidence Bounds
Add Confidence Region(s) for Quantiles and Probabilities
p_conf <- plot_conf(p_obj, x, title_trace_mod, title_trace_conf)
Fisher's Confidence Bounds
Add Confidence Region(s) for Quantiles and Probabilities
p_conf <- plot_conf(p_obj, x, title_trace_mod, title_trace_conf)
Simulation of Delays
Adjustment of Operating Times by Delays using a Monte Carlo Approach
mcs_delay_result <- mcs_delay(x, distribution)
Simulation of Mileages
Simulation of Unknown Covered Distances using a Monte Carlo Approach
mcs_mileage_result <- mcs_mileage(x, distribution)