What is Root Cause Analysis?
Root cause analysis is a systematic method for determining the underlying cause or causes of a problem. Once the “real” problem is identified, you can set about fixing it. You will also need to follow up and confirm that the problem was resolved after the fix was implemented. All too often, the root cause of an equipment failure is not addressed, so the failures keep happening. It is just like gardening: if you do not pull out the weed’s root, the weed will quickly grow back.
Where is Your Maintenance Organization?
The goal of root cause analysis is to move from a reactive to a proactive maintenance organization. Consider the following scenario. A belt breaks in the supply fan for the air handler that cools the plant manager’s office. What happens next?
1) The parts budget has been exceeded for the month. The belt will be fixed next month.
2) The belt is expedited from the supplier, and maintenance works overtime to replace the belt.
3) The belt is in stock, and the equipment is repaired. It is discovered that there is no PM on the equipment. A six-month PM to inspect the belt for damage is set up.
4) The belt is in stock, and the equipment is repaired. Root cause analysis determines that a severe pulley misalignment caused the belt failure. Laser alignment tools are used during the repair to ensure proper pulley alignment.
5) In addition to the previous step, all maintenance personnel are trained on how to use laser alignment tools. A new policy is instituted requiring all new equipment to be checked for proper alignment and aligned, if necessary, during the commissioning process.
DMAIC Process for Root Cause Analysis
Before explaining the process, it is worth clarifying when root cause analysis should be performed: when it is not obvious why the equipment failed, when the equipment is very important, or when the problem keeps recurring. DMAIC is a Six Sigma improvement process for determining the cause of a problem and implementing a lasting improvement. It consists of the following steps:
Define: Write a clear problem statement.
Measure: Collect data on the problem.
Analyze: Analyze the data to determine the root cause.
Improve: Develop improvement actions based on the root cause(s).
Control: Develop controls to prevent recurrence of the problem.
The initial step of the DMAIC process is the define stage. In this stage, it is important to define what the problem is, where it is happening and whom it affects. It is also essential to write a problem statement that defines the problem clearly for root cause analysis. Next, write down what the desired state should be. Yogi Berra’s wisdom for the define stage: “You’ve got to be very careful if you don’t know where you are going because you might not get there.”
The second step of the DMAIC and root cause analysis process is the measure stage. In this stage, you collect data relating to the problem and observe the failing equipment in person. Once the data has been collected, it can be organized in a format such as a Pareto chart. Yogi Berra’s wisdom for the measure stage: “You can observe a lot by watching.”
The third stage of the DMAIC and root cause analysis process is the analyze phase. Its purpose is to determine the root of the problem. The data collected in the measure phase may illuminate the problem very clearly. If not, the following techniques can be used to determine the cause. The “5 Why” approach involves asking why repeatedly until you get to the cause. Another approach is to use brainstorming to list potential root causes.
The man, method, machine, materials diagram is a way to organize the results of the brainstorming. An alternative method is to rank the potential root causes against different categories, such as how likely each one is to be causing the problem and how easy it is to test. A numerical value is assigned for each category. Next, focus on confirming whether the highest-ranking items are the true cause. This may involve additional testing. Forget Yogi Berra here and turn to the Beatles instead: you need help from other people to complete the analyze phase.
The fourth stage of the DMAIC process is the improve phase. This is where you implement improvements that address the cause of the problem found in the analyze phase. Before you implement the improvement, think through whether your “improvement” is going to create a new problem. You must also understand the expected results. Then go ahead and implement your solution. Yogi Berra’s wisdom for the improve phase: “When you come to a fork in the road, take it.”
The fifth and final stage of the DMAIC process is the control phase. This is when you confirm that the improvement solved the issue. You should also look to apply the improvement to other equipment. Yogi Berra’s wisdom for the control phase: “It ain’t over, till it’s over.”
One way to determine whether you truly solved the problem using root cause analysis is to use the rule of three standard deviations. Standard deviation is just a fancy way of asking whether something occurs regularly or unpredictably. If the standard deviation is small, the event occurs very regularly. If the standard deviation is large, the event is more unpredictable. Below is an example of standard deviation for a belt breaking. You can use Excel to perform a standard deviation calculation. The graph below shows the time range for the 99 percent confidence interval.
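The rule of three standard deviations can be sketched in a few lines of Python. This is a minimal illustration, not part of the original analysis: the function name `three_sigma_window` is hypothetical, the inputs are the belt example's figures of 93 days and 15 days (read here as the mean time to failure and its standard deviation), and the six-month PM is approximated as 182 days. It also assumes the failure times are roughly normally distributed, which is what lets three standard deviations stand for better than 99 percent confidence.

```python
def three_sigma_window(mean_days, std_days, k=3):
    """Return (earliest, latest) expected failure time in days.

    The window is mean - k*std to mean + k*std; k=3 is the
    "rule of three standard deviations" described in the text.
    """
    return mean_days - k * std_days, mean_days + k * std_days


# Belt example: mean of 93 days, standard deviation of 15 days.
earliest, latest = three_sigma_window(93, 15)
print(earliest, latest)  # 48 138

# Assumption: a six-month PM interval is taken as roughly 182 days.
pm_interval_days = 182

# If the belt now survives past the upper bound of the old window,
# the change in behavior is very unlikely to be chance.
print(pm_interval_days > latest)  # True
```

If you have your own list of days between failures, `statistics.mean` and `statistics.stdev` from the Python standard library (or AVERAGE and STDEV.S in Excel) can supply the two inputs.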
I am more than 99 percent confident that the belt will fail between 48 days (93 - 15 × 3) and 138 days (93 + 15 × 3), where 93 days is the mean time to failure and 15 days is the standard deviation. I implemented an improvement that gave maintenance people laser alignment tools and then trained them on their proper use. Now the belt does not fail before it is replaced on a six-month PM. Before the improvement, I would have been 99 percent confident that the belt would fail before the six-month PM. Since the belt did not fail in six months, I know a real improvement was made. I now have objective criteria for closing the DMAIC. I have controlled the problem of the belt failing with root cause analysis.
Now let’s move on to some real-world examples in which the DMAIC process was used to solve three different problems: the failure of a dust collector relief vent, an unexplained discharge of a sprinkler head and the failure of coating pan spray nozzles.
Failure of the Dust Collector Relief Vent
There were a large number of “identical” dust collectors onsite, but one had an issue with the relief vent failing about every three months. No one could figure out why the vent was failing, and no root cause analysis had been conducted. I was asked to resolve the issue using the DMAIC process and root cause analysis.
I started with the define stage. The problem statement was, “Encapsulator #1 dust collector has had repeated, unexplained discharges of the relief vent over the last three years, including seven relief vent failures since 2011. The last event was on Aug. 7, 2013. MTBF (mean time between failures) is 108 days with standard deviation of 57 days since 2011. Desired state: No discharges of the relief vent on the encapsulator #1 dust collector.”
Next, I began the measure phase. Only one product was run through this dust collector. I had a laboratory test conducted to understand how explosive the dust was. The results showed that the dust had a low explosive potential. I observed the dust collector after the vent failed on Aug. 7, 2013, and did not see any evidence of a fire. I looked at the relief vents on all the “identical” dust collectors. I observed the following:
Guess which dust collector vent was failing repeatedly?
After I collected some data, I was ready to move to the analyze phase. I knew the dust was not explosive and that the encapsulator dust collector’s relief vent was different from those on the other dust collectors onsite. What was causing the flat vent to fail? I discussed what I found with the dust collector manufacturer. They reported issues with flat-style vents failing without dust explosions. Please note that I had called the manufacturer when I first started working on the problem, before I had pictures, and they did not offer an explanation of why the vent was failing. The recommendation is to always get pictures. The manufacturer’s explanation was that the flat vent had a design flaw. Their engineering department believed the flat vent failed in fatigue from the force of the air pulse used to “knock” dust off the filters and into the dust collection bucket (as shown below).
I now progressed to the improve phase. The manufacturer’s recommendation was to use a domed vent that had been used successfully in the field for a number of years. I compared the flat vent with the domed vent in terms of its ability to resist deflection. I remembered from beam deflection calculations that a rectangular section’s resistance to bending scales with base × height^3. The flat vent’s height is its metal thickness, while the domed vent is more than 1 inch tall. I calculated that the domed vent was approximately 10,000 times more resistant to deflection than the flat vent. The domed vent was ordered and installed in November 2013.
Finally, I reached the control phase of root cause analysis. How long did I have to wait until I was confident the domed vent had resolved the problem? The rule of three standard deviations applies. From the data I collected in the define stage, I knew that historically the vent failed every 108 days on average, with a standard deviation of 57 days. Using the rule of three standard deviations, I calculated that we would need to wait about nine months to close this DMAIC (108 + 57 × 3 = 279 days).
If the domed vent did not address the root cause of the vent failing, I would be 99 percent confident that the vent would fail within the next nine months. In September 2014, I closed the DMAIC and root cause analysis, as no failure had occurred since the domed vent was installed.
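The waiting period for this example can be checked with a short Python sketch. It is an illustration under stated assumptions, not the author's own calculation: the function name `earliest_close_date` is hypothetical, and because the text gives only "November 2013" for the installation, the sketch brackets the answer using the first and last day of that month.

```python
from datetime import date, timedelta


def earliest_close_date(install_date, mtbf_days, std_days, k=3):
    """Date by which a failure would be expected if the fix had not worked.

    The wait is MTBF + k standard deviations, counted from installation.
    """
    return install_date + timedelta(days=mtbf_days + k * std_days)


# From the problem statement: MTBF of 108 days, standard deviation of 57 days.
# Assumption: the exact install day is unknown, so try both ends of November 2013.
print(earliest_close_date(date(2013, 11, 1), 108, 57))   # 2014-08-07
print(earliest_close_date(date(2013, 11, 30), 108, 57))  # 2014-09-05
```

Either way, the 279-day window runs out no later than early September 2014, which is consistent with closing the DMAIC that month.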