Recently I've been refreshing my knowledge of reliability analysis, which is the use of data to assess a product's ability to perform over time. Quality engineers typically use reliability analysis to predict what percentage of products will fail within a given amount of time.
Why does choice of distribution matter for reliability analysis?
Using statistical software to identify the distribution of reliability data
- Do the data follow a symmetric distribution? Are they skewed left or right?
- Is the failure rate rising or falling? Or is it staying constant?
- What distribution has worked for this analysis in the past?
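The first two questions can be roughed out numerically before opening any software. The sketch below is only an illustration: the failure times are invented, and `sample_skewness` and `weibull_hazard` are hypothetical helper names, not part of any package. It computes a moment-based skewness (positive means a long right tail) and shows how the shape parameter of a Weibull model determines whether the failure rate falls, stays constant, or rises.

```python
def sample_skewness(values):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def weibull_hazard(t, shape, scale):
    """Instantaneous failure rate of a Weibull(shape, scale) model at time t > 0."""
    return (shape / scale) * (t / scale) ** (shape - 1)


# Hypothetical failure times in hours, invented for illustration only.
failure_times = [12.0, 15.0, 18.0, 22.0, 27.0, 35.0, 48.0, 70.0, 110.0]
print("skewness:", round(sample_skewness(failure_times), 2))

# Compare the failure rate early and late in life for three shape values.
for shape in (0.5, 1.0, 2.0):
    early = weibull_hazard(10.0, shape, 50.0)
    late = weibull_hazard(100.0, shape, 50.0)
    trend = "falling" if late < early else "rising" if late > early else "constant"
    print(f"shape={shape}: failure rate is {trend}")
```

A shape of exactly 1 reduces the Weibull model to the exponential distribution, which is why a constant failure rate points toward the exponential.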
Choosing the best distribution model from the identification plot
We're looking to see which distribution line is the best match for our data. We can immediately rule out the exponential distribution, where barely any of our data points follow the best-fit line. The other three look better, but the points follow the straight line of the lognormal plot most closely, so the lognormal distribution would be a good choice for subsequent reliability analyses.
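"How closely do the points follow the line" can be turned into a number: the correlation between the ordered data and the quantiles each distribution predicts. The sketch below assumes complete (uncensored) data, median-rank plotting positions via Benard's approximation, and textbook linearizations for each distribution. The data are synthetic, the function names are hypothetical, and the result is not claimed to match what any particular package prints on its identification plot.

```python
import math
import random
from statistics import NormalDist


def median_ranks(n):
    """Benard's approximation to median-rank plotting positions."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def plot_correlations(times):
    """Correlation of each linearized probability plot (closer to 1 is straighter)."""
    data = sorted(times)
    p = median_ranks(len(data))
    z = NormalDist()  # standard normal, used for lognormal quantiles
    log_t = [math.log(t) for t in data]
    return {
        # lognormal: ln(t) against standard normal quantiles
        "lognormal": pearson_r(log_t, [z.inv_cdf(q) for q in p]),
        # Weibull: ln(t) against ln(-ln(1 - p))
        "weibull": pearson_r(log_t, [math.log(-math.log(1 - q)) for q in p]),
        # exponential: t against -ln(1 - p)
        "exponential": pearson_r(data, [-math.log(1 - q) for q in p]),
    }


# Synthetic failure times drawn from a lognormal, for illustration only.
random.seed(1)
times = [random.lognormvariate(3.0, 0.4) for _ in range(50)]
for name, r in plot_correlations(times).items():
    print(f"{name}: r = {r:.3f}")
```

Correlations on probability plots tend to be high even for poor fits, so small differences between candidates deserve a second look with a goodness-of-fit statistic.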
It can sometimes be difficult to tell which distribution is the best fit from the graph, so you should also check the Anderson-Darling goodness-of-fit values and other statistics in the Session window output. The Anderson-Darling values appear alongside the "Correlation Coefficients" on the plot. The smaller the Anderson-Darling value, the better the fit of the distribution.
For our data, the Anderson-Darling value for the lognormal distribution is lower than those for other distributions, further supporting the lognormal distribution as the best fit.
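To make the "smaller is better" rule concrete, here is a minimal sketch of the classical Anderson-Darling statistic for two candidate distributions. It assumes complete (uncensored) data, maximum likelihood estimates for the parameters, and synthetic failure times. The function names are hypothetical, and statistical packages may report an adjusted variant of the statistic, so these values need not match software output.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def anderson_darling(sorted_data, cdf):
    """Classical A^2 statistic for sorted data against a fitted CDF."""
    n = len(sorted_data)
    eps = 1e-12  # keep the logarithms finite at the extremes
    f = [min(max(cdf(x), eps), 1 - eps) for x in sorted_data]
    total = sum(
        (2 * i - 1) * (math.log(f[i - 1]) + math.log(1 - f[n - i]))
        for i in range(1, n + 1)
    )
    return -n - total / n


def ad_by_distribution(times):
    """Fit lognormal and exponential models by MLE and score each with A^2."""
    data = sorted(times)
    logs = [math.log(t) for t in data]
    # Lognormal MLE: mean and population standard deviation of ln(t).
    log_model = NormalDist(fmean(logs), pstdev(logs))
    # Exponential MLE: the mean time to failure.
    mean_life = fmean(data)
    return {
        "lognormal": anderson_darling(data, lambda t: log_model.cdf(math.log(t))),
        "exponential": anderson_darling(data, lambda t: 1 - math.exp(-t / mean_life)),
    }


# Synthetic failure times drawn from a lognormal, for illustration only.
random.seed(7)
times = [random.lognormvariate(3.0, 0.4) for _ in range(50)]
for name, a2 in ad_by_distribution(times).items():
    print(f"{name}: A^2 = {a2:.2f}")
```

The statistic weights the tails of the distribution more heavily than the middle, which is useful in reliability work where early and late failures matter most.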
Have you ever needed to identify the distribution of your data? How did you do it?