Bruno Scibilia

I practiced quality improvement statistical techniques in manufacturing for many years, and I look forward to sharing some of what I've learned about quality improvement with you! Continue Reading »

There may be huge potential benefits waiting in the data in your servers. These data may be used for many different purposes. Better data allows better decisions, of course. Banks, insurance firms, and telecom companies already own a large amount of data about their customers. These resources are useful for building a more personal relationship with each customer. Some organizations already use... Continue Reading
Businesses are getting more and more data from existing and potential customers: whenever we click on a web site, for example, it can be recorded in the vendor's database. And whenever we use electronic ID cards to access public transportation or other services, our movements across the city may be analyzed. In the very near future, connected objects such as cars and electrical appliances will... Continue Reading

This is an era of massive data. A huge amount of data is being generated from the web and from customer relations records, not to mention from sensors used in the manufacturing industry (semiconductor, pharmaceutical, petrochemical, and many other industries). Univariate Control Charts In the manufacturing industry, critical product characteristics are routinely collected to ensure... Continue Reading
In my last post, I discussed how a DOE was chosen to optimize a chemical-mechanical polishing process in the microelectronics industry. This important process improved the plant's final manufacturing yields. We selected an experimental design that let us study the effects of six process parameters in 16 runs. Analyzing the Design Now we'll examine the analysis of the DOE results after the actual... Continue Reading
I used to work in the manufacturing industry. Some processes were so complex that even a very experienced and competent engineer would not necessarily know how to identify the best settings for the manufacturing equipment. You could make a guess using a general idea of what should be done regarding the optimal settings, but that was not sufficient. You need very precise indications of the correct... Continue Reading
There are many reasons why a distribution might not be normal/Gaussian. A non-normal pattern might be caused by several distributions being mixed together, or by a drift in time, or by one or several outliers, or by an asymmetrical behavior, some out-of-control points, etc. I recently collected the scores of three different teams (the Blue team, the Yellow team and the Pink team) after a laser... Continue Reading
Having delivered training courses on capability analysis with Minitab several times, I have noticed that one question is sure to be asked during the course: What is the difference between the Cpk and the Ppk indices? Ppk vs. Cpk indices The terms Cpk and Ppk are often confused, so that when quality or process engineers refer to the Cpk index, they often actually intend to... Continue Reading
When performing a design of experiments (DOE), some factor levels may be very difficult to change (for example, the temperature of a furnace). Under these circumstances, completely randomizing the order in which tests are run becomes almost impossible. To minimize the number of factor level changes for a Hard-to-Change (HTC) factor, a split-plot design is required. Why Do We Want to Randomize a... Continue Reading
Kappa statistics are commonly used to indicate the degree of agreement of nominal assessments made by multiple appraisers. They are typically used for visual inspection to identify defects. Another example might be inspectors rating defects on TV sets: Do they consistently agree on their classifications of scratches, low picture quality, poor sound? Another application could be patients examined... Continue Reading
The Cp and Cpk are well known capability indices commonly used to ensure that a process spread is as small as possible compared to the tolerance interval (Cp), or that it stays well within specifications (Cpk). Yet another type of capability index exists: the Cpm, which is much less known and used less frequently. The main difference between the Cpm and the other capability indices is that the... Continue Reading
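As a rough illustration of the three indices mentioned in this teaser, here is a minimal Python sketch. It rests on several assumptions that are not from the post: the function name `capability_indices` and the sample data are hypothetical, the overall sample standard deviation is used for simplicity (Minitab's Cp and Cpk are normally based on a within-subgroup estimate), and the Cpm shown is one common textbook form that adds the squared distance between the mean and the target to the variance.

```python
import statistics


def capability_indices(data, lsl, usl, target):
    """Return (Cp, Cpk, Cpm) for one sample.

    Simplification (an assumption of this sketch): the overall sample
    standard deviation stands in for the within-subgroup estimate.
    """
    mean = statistics.fmean(data)
    sd = statistics.stdev(data)  # sample standard deviation (n - 1)

    # Cp: tolerance width compared with the process spread (6 sigma)
    cp = (usl - lsl) / (6 * sd)

    # Cpk: distance from the mean to the nearest specification limit
    cpk = min(usl - mean, mean - lsl) / (3 * sd)

    # Cpm: like Cp, but the spread is inflated by any offset from the target
    tau = (sd ** 2 + (mean - target) ** 2) ** 0.5
    cpm = (usl - lsl) / (6 * tau)

    return cp, cpk, cpm


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only
    sample = [9.8, 10.0, 10.2, 10.1, 9.9]
    cp, cpk, cpm = capability_indices(sample, lsl=9.0, usl=11.0, target=10.0)
    print(f"Cp={cp:.2f}  Cpk={cpk:.2f}  Cpm={cpm:.2f}")
```

When the process mean sits on the target and midway between the limits, the three values coincide in this sketch; moving the target away from the mean lowers only Cpm.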
Imagine that you are watching a race and that you are located close to the finish line. When the first and fastest runners complete the race, the differences in times between them will probably be quite small. Now wait until the last runners arrive and consider their finishing times. For these slowest runners, the differences in completion times will be extremely large. This is due to the fact that... Continue Reading
Suppose that you have designed a brand new product with many improved features that will help create a much better customer experience. Now you must ensure that it is manufactured according to the best quality and reliability standards, so that it gets the excellent long-term reputation it deserves from potential customers. You need to move quickly and seamlessly from Research and Development into... Continue Reading
In my recent meetings with people from various companies in the service industries, I realized that one of the problems they face is that they collect large amounts of "qualitative" data: types of product, customer profiles, different subsidiaries, several customer requirements, etc. As I discussed in my previous post, one way to look at qualitative data is to use different types of... Continue Reading
In several previous blogs, I have discussed the use of statistics for quality improvement in the service sector. Understandably, services account for a very large part of the economy. Lately, when meeting with several people from financial companies, I realized that one of the problems they faced was that they were collecting large amounts of "qualitative" data: types of product, customer... Continue Reading
Suppose that you plan to source a substantial amount of parts or subcomponents from a new supplier. To ensure that their quality level is acceptable to you, you might want to assess the capability levels (Ppk and Cpk indices) of their manufacturing processes and check whether their critical process parameters are fully under control (using control charts). If you are not sure about the efficiency... Continue Reading
Using statistical techniques to optimize manufacturing processes is quite common now, but applying the same techniques to social topics is still innovative. For example, if our objective is to improve students' academic performance, should we increase teachers' wages, or would it be better to reduce the number of students in a class? Many social topics (the effect of increasing the minimum... Continue Reading
Screening experimental designs allow you to study a very large number of factors in a very limited number of runs. The objective is to focus on the few factors that have a real effect and eliminate the effects that are not significant. This is typically the initial objective of any experimenter when a DOE (design of experiments) is performed. Table of Factorial Designs Consider the table below.... Continue Reading
Choosing the right type of subgroup in a control chart is crucial. In a rational subgroup, the variability within a subgroup should reflect only common-cause, random, short-term variability, that is, the “normal,” “typical,” natural process variation, whereas differences between subgroups are useful to detect drifts over time (due to “special” or “assignable” causes). Variation within... Continue Reading
There is more than just the p value in a probability plot—the overall graphical pattern also provides a great deal of useful information. Probability plots are a powerful tool to better understand your data. In this post, I intend to present the main principles of probability plots and focus on their visual interpretation using some real data. In probability plots, the data density distribution... Continue Reading
In April 2012, I wrote a short paper on binary logistic regression to analyze wine tasting data. At that time, François Hollande was about to get elected as French president and in the U.S., Mitt Romney was winning the Republican primaries. That seems like a long time ago… Now, in 2014, Minitab 17 Statistical Software has just been released. Had Minitab 17 been available in 2012, would I have... Continue Reading