
Got Risk? Using FMEA to Improve Quality Assurance

Balancing Cost, Quality and Delivery

Quality improvement pros know that balancing cost, quality and delivery can be a risky proposition. Let's talk about risk-based testing. I should start by saying that this is a sensitive subject; it's why a career in quality is not for the faint of heart. The fact is that cost, quality and delivery are at constant odds in every industry, including software. Quality professionals play a key role in this balancing act, and whenever things become "unbalanced," we are often cast as the bad guys, or at least we feel that way.

For a long time, I felt sorry for myself. Why couldn’t people understand how difficult my job was? Fighting so hard each and every day against those who might sacrifice quality in the name of cost and/or delivery, standing up for truth and justice and...well, okay, so there were no villains, I’m not a superhero and it wasn’t quite so dramatic. But keeping the perfect balance between quality and delivery is challenging!

Over the years, however, it's become evident that there is no correlation between the amount of time we spend testing and the quality of the products we deliver. What a relief, right?  Well, not really. It doesn't mean ensuring quality is easier; it just means "more of the same" isn't the answer. It's much more complex than that.

I've learned that quality assurance is not about how much we test, it’s how we test. Truth be told, quite often it’s not about testing at all (more on that in a future post).

Risk-Based Testing

So, how do we balance cost, quality and delivery at Minitab?  Very delicately. I'd love to say that we nail it every time but I'd be lying—and one thing a quality professional needs is rigorous honesty. Again, being a quality improvement professional is not for the faint of heart! But I can say that Minitab takes quality very seriously, and everyone who works here plays a part in continuously improving it.


While this subject could fill many posts, we'll start with Risk-Based Testing (RBT). Since there is a nearly infinite number of test cases for any feature or release, running every possible test would be both cost- and time-prohibitive. The role of a quality professional is to determine which tests should be run, how often they should be run, and how they are run. I liken this to my days in manufacturing, when we used sampling plans for inspection. We couldn't inspect everything, but we could be very smart about what we did inspect.

Failure Modes and Effects Analysis, or FMEA

At Minitab, we use RBT to develop test strategies based on the risk of failure, which is a function of the probability of occurrence and severity. To assist our efforts, we utilize Companion by Minitab's Failure Modes and Effects Analysis (FMEA).

I first began using FMEA while working in the medical device industry in the early 90s. It worked well then and, almost 20 years later, it's still one of my go-to tools. Of course, Companion has made it much easier to manage than it was 20 years ago! As illustrated below, FMEA is used to help prioritize failures based on seriousness, frequency of occurrence, and ease of detection. Based on this assessment, clear strategies for risk mitigation can be identified and implemented.

Excerpt from an FMEA at Minitab:

| Step# | Process Map - Activity | Potential Failure Mode | Potential Failure Effects | SEV | OCC | Current Controls | DET | RPN |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 10 | Dialog Testing | Access key validation isn't automated for each dialog | The hot keys are initially validated within development. If a developer makes a change that breaks a hot key assignment, it may not be immediately detected. | 3 | 2 | Initial manual testing and follow-up manual regression testing of changes for each dialog. General automated testing of access keys. | 5 | 30 |

| Actions Recommended | Responsibility | Target End Date | Actions Taken | Actual End Date |
| --- | --- | --- | --- | --- |
| An evaluation will be done to determine if some automated testing can be performed | JRoan (Test Architect) | 3/1/2011 | Automated monitoring of dialog file to detect changes in assignments. | 2/15/2011 |

Revised metrics:

| SEV | OCC | DET | RPN |
| --- | --- | --- | --- |
| 3 | 2 | 1 | 6 |

 

Steps for Completing the FMEA

1) In Process Map - Activity, enter each process step, feature or type of activity. In the example above, it's dialog testing at a feature level.

2) In Potential Failure Mode, identify ways the process can fail for each activity. Multiple failure modes may exist.  In the example above, we don't have test scripts running to validate the functionality of each access key in each dialog.

3) In Potential Failure Effects, enter the potential failure effects for each failure mode. Any failure mode can have multiple failure effects. The potential effect above is that if a developer makes a change after the dialog has been verified, and that change breaks the access key, the failure may not be detected.

4) In SEV (Severity Rating), estimate the severity of each failure effect. Use a 1 to 10 scale, where 10 signifies high and 1 signifies low.  This is a relative assignment. In our world, the access keys would have a lower severity than, for example, the statistical or graphical results. We assigned this a 3 severity rating.

5) In OCC (Occurrence Rating), estimate the probability of occurrence of the cause. Use a 1 to 10 scale, where 10 signifies high frequency (guaranteed ongoing problem) and 1 signifies low frequency (extremely unlikely to occur).  In our example, the probability of making an access key change after the initial assignment is low, but not impossible. We assigned this an occurrence rating of 2.

6) In Current Controls, enter the manner in which the failure causes/modes are detected or controlled. In the dialog testing, we manually validate each access key, automate the general validation of access keys, and perform quick tests.

7) In DET (Detection Rating), evaluate the ability of each control to detect or control the failure cause/mode. Use a 1 to 10 scale, where 10 signifies poor detection/control (the customer will almost surely receive a flawed output) and 1 signifies high detection/control (almost certain detection, generally finding the cause before it has a chance to create the failure mode).  In our example, the probability of detecting the problem wasn't high but we may find it due to the manual testing performed throughout. We assigned a detection rating of 5.

8) Evaluating the RPN: The RPN (Risk Priority Number) is the product of the SEV, OCC, and DET scores. The RPN is the overall score for a combination of Mode/Effect/Cause. The higher the RPN, the more severe, more frequent, or less controlled a potential problem is, indicating a greater need for immediate attention. In our example, the RPN was 3 × 2 × 5 = 30, and we decided to evaluate whether any automation could be performed. Our test architect identified an alternative solution that would improve our probability of detection. (A small worked sketch of this calculation follows the list.)

9) Once corrective action has been taken, enter the new SEV, OCC, and DET values to calculate a revised RPN. In our example, the improved detection rating reduced the RPN to a very acceptable 3 × 2 × 1 = 6!
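To make the arithmetic concrete, here is a minimal sketch in Python of computing and ranking RPNs. This is only an illustration, not Companion's implementation; the first failure mode comes from the example above, and the other two are hypothetical entries added to show the ranking:

```python
# Sketch of the FMEA scoring used above: RPN = SEV x OCC x DET.
# Only the first failure mode is from this post; the other two are
# hypothetical, included just to demonstrate the ranking.

def rpn(sev: int, occ: int, det: int) -> int:
    """Risk Priority Number: the product of the severity, occurrence,
    and detection ratings, each on a 1-10 scale."""
    for name, score in (("SEV", sev), ("OCC", occ), ("DET", det)):
        if not 1 <= score <= 10:
            raise ValueError(f"{name} must be between 1 and 10, got {score}")
    return sev * occ * det

failure_modes = [
    # (description, SEV, OCC, DET)
    ("Access key validation isn't automated for each dialog", 3, 2, 5),
    ("Graph renders with the wrong axis scale", 8, 2, 4),   # hypothetical
    ("Statistical output rounds incorrectly", 9, 1, 3),     # hypothetical
]

# Rank failure modes by RPN, highest risk (most urgent) first.
for desc, sev, occ, det in sorted(failure_modes, key=lambda m: -rpn(*m[1:])):
    print(f"RPN {rpn(sev, occ, det):3d}  {desc}")

# After the corrective action, DET improved from 5 to 1:
print("Revised RPN:", rpn(3, 2, 1))  # prints 6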

Making FMEA Easy

The FMEA analysis tool in Companion has provided a great way to analyze, manage and communicate our risk-based assessments and resulting test strategies. In the example above, an automated system was created to alert QA of changes to the files. This was a cost-effective risk mitigation strategy for the issue at hand. The FMEA provided a structure to guide the teams through the analysis and then document the reasoning behind the decision.
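The post doesn't describe how that alerting system was built, but as a rough sketch of the idea, a small script run on a schedule or in a build pipeline could hash the dialog resource file and flag QA whenever the hash changes. Everything below (file names and paths) is hypothetical:

```python
# Hypothetical sketch: detect changes to a dialog resource file so QA
# can re-validate access key assignments. The file paths are made up;
# this is not Minitab's actual system.
import hashlib
from pathlib import Path

DIALOG_FILE = Path("resources/dialogs.rc")  # hypothetical dialog definitions
HASH_FILE = Path("qa/dialogs.rc.sha256")    # last known-good content hash

def file_digest(path: Path) -> str:
    """SHA-256 hex digest of the file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def check_for_changes() -> bool:
    """Return True (and record the new hash) if the dialog file changed
    since the last check; this is where an alert to QA would be raised."""
    current = file_digest(DIALOG_FILE)
    previous = HASH_FILE.read_text().strip() if HASH_FILE.exists() else None
    if current != previous:
        HASH_FILE.write_text(current)
        print(f"{DIALOG_FILE} changed -- re-validate access key assignments")
        return True
    return False

if __name__ == "__main__":
    check_for_changes()
```

A content hash is a cheap way to improve the detection rating: it can't say *what* changed, but it reliably tells QA *that* something changed, which is all the FMEA mitigation above required.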

The FMEA also is maintained and available as a record of these types of decisions—I don’t have to tell any quality professional how helpful that is in preventing future “Why exactly did we decide to do that?” discussions. I just wish Companion had been around when I started doing FMEAs 20 years ago! 

How do you analyze risk in your organization? 

Comments

Name: Sjay • Monday, February 24, 2014

Thanks for the article. How would you suggest doing this in an Agile project, where we don't have the luxury of time to go through this detailed analysis of the system? Thoughts?


Name: Dawn Keller • Tuesday, February 25, 2014

That is an excellent question! We use an agile development model at Minitab as well. We've found that evaluating and managing risk is a key factor in supporting our development. To that end, our FMEA is really a continuous assessment done by our development leads. Rather than plan it all out at the very beginning of the project, we continuously evaluate risk during each sprint, and the FMEA provides a framework for doing so. I hope that answers your question. Please let me know if you'd like more information on our approach, and thank you so much for your insight!


Name: SJ • Wednesday, February 26, 2014

Thanks for the response. It's great you could make time to do this analysis. A few questions:

1) Isn't QA involved in the analysis? If not, why? And if you are, what is your role? Isn't it a little too risky to have just the dev leads analyze risk, instead of doing it as a team that includes BAs, QAs and others, where SMEs from each team would serve the purpose better?

2) How did you folks implement this Six Sigma methodology in your SDLC? Did you gain support from top management, or was it driven top-down?

3) I am aware of the Six Sigma methodologies (to a certain extent), like the FB (fishbone) diagram (which can be used to find the root cause of production issues) and FMEA (to try to minimize risks/issues proactively), etc. But my dilemma is: how do you take these concepts and apply them as a framework in an organization as part of your SDLC cycle? I would love to know your thoughts/experience. Thanks again for your time.


Name: Dawn Keller • Monday, March 3, 2014

These are excellent questions!

Sorry about the reference to Development Teams without context. When we refer to the development team at Minitab, we mean Software Engineering, Quality Assurance, Product Management, Design and Project Management. All groups are involved in understanding and managing risk throughout development. The team drives most of the analysis and reports on it. I'm involved to some degree, depending on the risk level.

Regarding the implementation of the methodology within the SDLC, I would say it's both top-down and bottom-up. The tools chosen are driven by the project teams. However, the support for Quality Improvement is definitely top-down. Our executive team at Minitab is actively supportive and involved in how we manage Quality throughout the organization. In fact, it is not uncommon for our CFO to call me with questions regarding Quality status and strategies, as he is keenly aware of its impact on finances. That said, I have worked at many organizations, and I understand both that this level of support is not common and how difficult it is to attempt an implementation without it.

Your third question is an excellent one! To implement these methodologies, ideally there is support for doing so. As the Quality Manager, I work with the teams and with project management to identify the best tools (FMEA, FB, etc.), and they vary by team and by project. We don't necessarily insist on a certain set of tools, but we do lay the foundation and expectations for Quality. Our SDLC is our "process," Quality is our foundation, and the teams determine the best strategy and tools to support those goals.

Thank you again for your insightful questions. As this is such a broad and interesting topic, it’s sometimes difficult to answer via comments. However, when we have a more substantial document, whether it be a white paper, more detailed blog posts or a combo of resources, I would love to keep your email and get your input on it.

Until then, please let me know if I can provide further information. Have a wonderful day and thank you again!

Best,
Dawn

