If you collect and analyze real data for a living, the idea of using simulated data for a Monte Carlo simulation sounds a bit odd. How can you improve a real product with simulated data?
Companion by Minitab is a software platform that combines a desktop app for executing quality projects with a web dashboard that makes reporting on your entire quality initiative literally effortless. Among the first-in-class tools in the desktop app is a Monte Carlo simulation tool that makes this method extremely accessible.
The Monte Carlo method uses repeated random sampling to generate simulated data to use with a mathematical model. This model often comes from a statistical analysis, such as a designed experiment or a regression analysis. Check out our webinar recording, Seeing the Unknown: Identifying Risk and Quantifying Probability with Monte Carlo Simulation, to see Monte Carlo in action.
Suppose you study a process and use statistics to model it like this:
With this type of linear model, you can enter the process input values into the equation and predict the process output. However, in the real world, the input values won’t be a single value thanks to variability. Unfortunately, this input variability causes variability and defects in the output.
To design a better process, you could collect a mountain of data in order to determine how input variability relates to output variability under a variety of conditions. However, if you understand the typical distribution of the input values and you have an equation that models the process, you can easily generate a vast amount of simulated input values and enter them into the process equation to produce a simulated distribution of the process outputs.
You can also easily change these input distributions to answer "what if" types of questions. That's what Monte Carlo simulation is all about. In the example we are about to work through, we'll change both the mean and standard deviation of the simulated data to improve the quality of a product.
Today, simulated data is routinely used in situations where resources are limited or gathering real data would be too expensive or impractical.