Root Cause Failure Analysis (RCFA) is a very useful tool for improving the reliability of plant process equipment. It is a logical, structured, and deductive technique for identifying the causes behind a failure. This case study demonstrates how an RCFA was used to uncover the root causes of chronic bearing failures on a continuous belt filter. It also shows how those root causes were addressed, which resulted in a significant improvement in the reliability of the belt filter.
Root Cause Failure Analysis
One definition of RCFA is the process of identifying the most basic reason for a failure which, if eliminated or corrected, would have prevented the failure from existing or occurring. That “basic reason” is generally referred to as the root cause. There can be, and often are, multiple causes involved when a failure occurs. Root causes can be categorized into three areas based on their nature and origin:
- Physical root causes are typically the easiest ones to uncover. They are usually the tangible results of the failure (i.e., they can be seen, touched, smelled, etc.).
- Systematic root causes are more difficult to discern. These have their basis in procedures, communication, training, documentation, etc.
- Human root causes are related to errors in human judgment and behavior.
A root cause failure analysis is a pre-determined and structured procedure that is followed during an investigation. But its structure is not so rigorous that it does not allow for some flexibility. The approach of the analysis can be tailored to the particular failure or event for best effectiveness. It can be very simple or very involved. The goal of the RCFA is to provide enough factual information about the failure to allow for the development of an accurate and appropriate response. An accurate and appropriate response is one that will eliminate (or at least reduce) the likelihood of that failure occurring again. It is a reactive response to a failure that has occurred. It is also often used to analyze chronic equipment failures.
Why perform an RCFA? When a piece of equipment experiences a failure, the nature and cause of the failure should be investigated and analyzed. By understanding the how, why, what, and when of a failure, steps can be taken to prevent a reoccurrence. There are a number of benefits that can be gained from performing an RCFA:
- Increase in OEE (Overall Equipment Effectiveness). The equipment runs more and runs better. It is available for production more and makes quality product.
- Cost reduction. Fewer operational, maintenance, and administrative resources are required to keep the equipment operating.
- Education. Those involved in a Root Cause Failure Analysis come out of the experience with a greater understanding of how the equipment operates and what can affect its performance. This knowledge is documented and can be passed on to the appropriate individuals. The lessons learned can be shared with other sites that have the same or similar equipment. The workforce will become familiar with the RCFA process and more adept at performing future RCFAs.
- Safety and the environment. Some failures can affect the environmental and safety performance of the equipment. Reducing or eliminating those failures can prevent injury or damage to the environment.
- Mental health. Chronic problem machines can cause fatigue and frustration for those operating and maintaining them. Equipment problems corrected via an RCFA can improve the morale of the workforce and validate the RCFA process.
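The OEE benefit listed above is conventionally quantified as the product of three ratios: availability, performance, and quality. The following is a minimal sketch of that calculation. The function name and the input figures are hypothetical and are not taken from the case study.

```python
def oee(availability: float, performance: float, quality: float) -> float:
    """Return OEE as the product of three ratios, each between 0 and 1."""
    for name, value in (("availability", availability),
                        ("performance", performance),
                        ("quality", quality)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return availability * performance * quality


# Hypothetical figures for illustration only (not from the case study):
# fewer failures raise availability while the other two ratios stay the same.
oee_before = oee(availability=0.85, performance=0.90, quality=0.98)
oee_after = oee(availability=0.95, performance=0.90, quality=0.98)
print(f"OEE before: {oee_before:.1%}, OEE after: {oee_after:.1%}")
```

In this sketch, only the availability term changes, which is the term most directly affected when chronic failures are reduced.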
Not every failure requires an RCFA. The cause (or causes) of some failures can be obvious. Likewise, the appropriate solution may also be obvious. A word of caution here: while a root cause may seem obvious on the surface, there could be other contributing causes that are not so obvious. The systematic and human root causes noted above are often not obvious. Failing to uncover those causes could lead to a less-than-effective solution and a repeat of the failure. Good judgment must be exercised, and if there is any doubt, a Root Cause Failure Analysis should be done. This is especially true if there are safety implications and/or the equipment is high on the criticality list.
RCFAs can be done by an individual or a team. The one-person approach is generally adequate for simple, straightforward failures. Again, as noted above, use good judgment when deciding how to proceed. Seek input from others. If there is doubt, put together a team. Once you have decided to facilitate a team approach, carefully consider who should be on the team. Some things to consider are the complexity of the equipment, the consequences of the failure, the number of shifts and operators involved, the number of departments affected, and who the knowledgeable people are. Keep the size of the team reasonable and appropriate for the task. Too many people can slow and disrupt the process. Too few could result in missed or inaccurate information and a faulty result. Consider all resources, including not only plant personnel but also vendors, contractors, and subject matter experts. At the first meeting, follow the usual group project/meeting guidelines, including a clear statement of the problem and goal, a schedule of meetings and events, and establishment of roles. Also, decide what approach the team will take.
There are several methodologies that are typically used when performing an RCFA. It is up to the individual or team facilitating the RCFA to determine which method is best for their purposes. Each method is based on cause and effect logic but they each approach it in different ways. Here is a brief summary of the methods:
- The 5 WHYS – This method simply asks the question WHY five times. You begin with a statement of the problem or failure and ask WHY did this happen. This is the first why. The answer is typically BECAUSE THIS happened. This leads to the second question, WHY did THIS happen, and so on. After asking why five times, you have drilled down to the root cause.
- Causal Factor Mapping – This is similar to the 5 WHYS approach but more graphical. You begin with the problem statement and, working to the left, write down the factors that led to the problem. Multiple factors are typically listed, and the interaction between factors is noted using the words AND or OR. An AND connector signifies that both factors must occur for the failure to occur. An OR connector signifies that either factor alone is enough to produce the failure.
- Logic Tree – The logic tree approach is suitable for more complex problems. It is more of a flow chart approach and includes detailed failure descriptions that rely on complete and accurate data.
- Fish Bone Diagram – A fish bone diagram is much like it sounds. You draw lines like the bones of a fish on an easel. There is a long horizontal line (like the spine of a fish) with shorter angled lines coming off that line above and below (like the ribs of a fish). The spine represents the path to the problem or failure and the ribs are the events or conditions that help produce the problem.
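The AND/OR logic described for causal factor mapping can be sketched in a few lines of code. This is only an illustration: the `Factor` class, the `occurs` function, and the example map are hypothetical names and data, not part of any standard RCFA tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Factor:
    """One node in a causal factor map."""
    name: str
    gate: str = "LEAF"  # "AND", "OR", or "LEAF" for a basic factor
    children: List["Factor"] = field(default_factory=list)


def occurs(node: Factor, observed: Dict[str, bool]) -> bool:
    """Return True if the factor occurs, given which basic factors were observed."""
    if node.gate == "LEAF":
        return observed.get(node.name, False)
    results = [occurs(child, observed) for child in node.children]
    if node.gate == "AND":
        return all(results)  # every contributing factor must occur
    if node.gate == "OR":
        return any(results)  # any one contributing factor is enough
    raise ValueError(f"unknown gate: {node.gate}")


# Hypothetical map: the failure needs moisture AND a loss of protection,
# where protection is lost if the seal fails OR the grease fitting is blocked.
failure = Factor("bearing failure", "AND", [
    Factor("moisture present"),
    Factor("protection lost", "OR", [
        Factor("seal failed"),
        Factor("grease fitting blocked"),
    ]),
])

print(occurs(failure, {"moisture present": True, "seal failed": True}))  # True
print(occurs(failure, {"moisture present": True}))                       # False
```

Writing the map this way makes it easy to check which combinations of factors are sufficient to produce the failure.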
The facilitator must promote a neutral and open atmosphere as he or she leads and guides the group through the RCFA process. It is important to take ownership of the process and be enthusiastic. The facilitator must drive the process. Assist the team with gathering pertinent and accurate data and information. This can include inspections of the equipment and parts, maintenance records, equipment files, control system records and trends, condition monitoring records and reports, drawings, specifications, and video data. Interviews with witnesses and/or involved personnel such as operators, mechanics, electricians, foremen, engineers, and technicians can be enormously enlightening and revealing. Throughout the process, be cautious of false information that is based on hearsay, emotion, or cultural beliefs. Try to use only reliable factual information in your deliberations. Keep accurate records of the meetings and the team's work. When the Root Cause Failure Analysis is complete, use this documentation to create a clear and concise report of the findings, the root cause(s), and any recommendations.
The team may have accomplished its mission and found the root cause(s), but there is still more work to be done. A solution must be developed. The solution almost always involves some level of change. It could be a change in equipment design or the adoption of a new or different technology. It could be a process or procedure change. Change can be difficult to implement. The facilitator must be a champion of the Root Cause Failure Analysis and work to ensure a complete and successful culmination. It is important to remember to update the equipment files and any other documentation, especially when physical changes have been made to a piece of equipment as a result of the RCFA.
Look for other opportunities to apply the results of the RCFA. Important and new knowledge about the equipment and/or the process that was learned needs to be shared with the appropriate people. That includes others in the company and other sites that may have the same equipment and processes.
Root Cause Analysis
I would like to touch briefly on Root Cause Analysis (RCA). It is a little different from RCFA. One definition of RCA is a systematic, analytical work process of identifying defects/actions which, if eliminated or corrected, will prevent specific undesirable conditions or situations from ever happening. Note that an RCA is a proactive approach used to prevent a failure or problem from occurring. The methodologies used are similar, but there is no failure to analyze. Rather, the team attempts to determine possible failure modes and what events or causes would produce those failures. It is typically a more rigorous and involved process than an RCFA, but with a potentially greater pay-off.
This is a case study of chronic bearing failures on a continuous belt filter at the Curtis Bay Plant of W.R. Grace & Co., located in Baltimore, Maryland. This location has produced specialty chemicals of one type or another for over 100 years. Presently the plant produces fluid cracking catalysts, silica gel products, and specialty plastics catalysts. The belt filter that is the subject of this case study is part of the fluid cracking catalyst facility. A continuous belt filter is a piece of processing equipment that takes a slurry solution, removes much of the water from it, and forms a filter cake that is returned to the process stream. The machine itself is like a horizontal rubber belt conveyor, 40 feet long and 4 feet wide. The belt has grooves cut in the top surface and holes drilled through the belt in the center. A porous filter cloth belt rests on top of this rubber belt and travels with the rubber belt as it runs over the pulleys. In between the pulleys and resting against the bottom side of the rubber belt at the center is a vacuum pan. Water is used to lubricate the belt as it travels on the vacuum pan. The slurry drops onto the cloth belt at the tail pulley. As the cloth and belt travel, the vacuum within the center pan pulls the water out of the slurry.
The remaining solids on the top of the cloth form a filter cake, which is discharged off the cloth at the drive pulley end of the machine. This process is continuous. The rubber belt is supported on the return loop by rubber-covered rollers. The rubber-covered return rollers are supported by pillow block style ball bearings. The filter operates 24 hours per day, 7 days per week. The operators are on a rotating shift schedule. There is a dedicated maintenance crew for this plant during the day shift, with coverage on the off shifts by the site night maintenance group. A single lubrication technician assigned to this plant uses a handheld grease gun to periodically lubricate the bearings. The machine is critical to plant production (without it, product cannot be produced).
The problem is the premature failure of the pillow block bearings. The decision was made to address this issue to improve the reliability of the machine. The reliability engineer determined that a team was not necessary for this particular RCFA. The process began with an inspection and observation of the filter. The bearings are physically located on the lower half of the machine. Any chemicals or water dripping from the belt above will make direct contact with the bearings. Several failed bearings, at various stages of failure and with various operating hours, were inspected and analyzed. Severe corrosion was typical. The seals and inner race would rust and fail, allowing water to enter the bearings and grease to come out. During interviews, the mechanics explained how the rusted inner races would seize to the roller shafts. This would make replacements very difficult and would often require replacement of the entire roller. The lubrication technician said he would often encounter rusted Zerk fittings that would not accept grease. Inspections of bearings that had been in service only a short time found significant amounts of moisture (water) in the raceways and on the balls.
The 5 WHY method was selected for this RCFA because it is a simple and effective method and this analysis was not a complex one. The first why asked the question: why did the bearings fail? The answer was because they came apart or seized. Why did they come apart or seize? Because the seals failed. Why did the seals fail? Because the steel corroded. Why did the steel corrode? Because it was exposed to a corrosive environment. Why was it exposed to this corrosive environment? Because of the design of the machine. So the root cause in this RCFA (as determined by the 5 WHY method) was the design of the machine, which promoted corrosion of the steel bearings.
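The chain of questions and answers above can be recorded as a simple ordered list, which keeps the reasoning easy to review and to include in the final report. This is a minimal sketch. The variable and function names are hypothetical, and the answers are paraphrased from the paragraph above.

```python
from typing import List, Tuple

# Each entry pairs a "why" question with its answer, in the order asked.
five_whys: List[Tuple[str, str]] = [
    ("Why did the bearings fail?", "They came apart or seized."),
    ("Why did they come apart or seize?", "The seals failed."),
    ("Why did the seals fail?", "The steel corroded."),
    ("Why did the steel corrode?", "It was exposed to a corrosive environment."),
    ("Why was it exposed to a corrosive environment?", "The design of the machine."),
]


def root_cause(chain: List[Tuple[str, str]]) -> str:
    """Return the answer to the last 'why', which the method treats as the root cause."""
    if not chain:
        raise ValueError("the chain of whys is empty")
    return chain[-1][1]


def render(chain: List[Tuple[str, str]]) -> str:
    """Format the chain as numbered lines for a report."""
    lines = []
    for depth, (question, answer) in enumerate(chain, start=1):
        lines.append(f"{depth}. {question} -> {answer}")
    return "\n".join(lines)


print(render(five_whys))
print("Root cause:", root_cause(five_whys))
```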
The next step was to formulate a solution strategy. The results of the Root Cause Failure Analysis indicated that a change in machine design would be necessary to prevent the corrosive material and water from destroying the bearings. Changing the machine design to stop the flow of water and corrosive material was not a viable option. Water and chemical drips are inherent in the design of this type of machine. It was noted, however, that the spillage could be reduced and controlled with more attentive operator control. The idea of installing protective covers over the bearings was investigated. During interviews, the operators noted that covers had been tried years ago but were not effective. The corrosion still occurred. In fact, the covers made it more difficult to visually inspect, lubricate, and maintain the bearings. The protective cover idea was dropped. So, if we can’t stop the spills and drips, what could we do to mitigate their effect on the bearings? Could we make the bearings corrosion resistant? Reviews of various bearing catalogs were conducted. Discussions were held with the local power transmission distributor.
These activities led to a more reasonable design change: upgrading the materials of construction of the bearings. We found that pillow block bearings of the same type and dimensions as the existing bearings are available in corrosion-resistant materials. They are generically called “wash down duty” bearings and are used in the food processing industry. They are designed to survive in an environment where frequent washing with water and detergents occurs. The housings are constructed of a polymer material rather than cast iron, and the bearings and seals are stainless steel rather than carbon steel. Because they have the same dimensions as the existing bearings, they are bolt-in replacements (no machine modifications required). They are also reasonably priced and readily available (not special order). It was decided to convert the bearings to this new design.
During the investigation and analysis phase of the Root Cause Failure Analysis, it was noted that contamination of the bearing lubricant was an issue, even in bearings with seemingly intact seals. We had concerns that the water contamination would compromise the effectiveness of the grease and lead to bearing failures from poor lubrication. So we decided to address this concern also. Best-practice lubrication calls for applying the correct lubricant in the correct amount at the correct time. One of the most effective ways to do this is with an automatic lubrication system. We discussed this option with our lubrication consultant. He made a recommendation for a system. We reviewed the proposal with our lubrication technician.
The system was simple and rugged. It consisted of an electrically operated pump, a discharge manifold, a lubricant reservoir, injectors, polyethylene tubing, and nickel-plated fittings. The injectors are preset to deliver a specific amount of lubricant each time they are activated. Each injector is preset for the particular size of bearing it is servicing. The injectors are attached to the pump manifold. Tubing is run from each injector to its corresponding bearing. An adjustable timer activates the pump. Each injector fires once per cycle. The reservoir can be refilled by attaching a hand, electric, or air-operated grease gun. The lubrication tech was trained in the operation of the system. We agreed to proceed with the purchase and installation of the automatic system.
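Because each injector fires once per timer cycle, the amount of grease a bearing receives per day follows from two settings: the injector's preset output and the timer interval. The sketch below illustrates that relationship. The function name, the bearing labels, the injector outputs, and the 8-hour interval are all hypothetical values chosen for illustration, not settings from the actual installation.

```python
def daily_grease_cc(injector_output_cc: float, timer_interval_hours: float) -> float:
    """Grease delivered to one bearing per day, assuming one injector shot per cycle."""
    if injector_output_cc < 0:
        raise ValueError("injector output cannot be negative")
    if timer_interval_hours <= 0:
        raise ValueError("timer interval must be positive")
    cycles_per_day = 24.0 / timer_interval_hours
    return injector_output_cc * cycles_per_day


# Hypothetical injector presets in cubic centimeters per shot.
injector_presets_cc = {
    "return roller bearing (small)": 0.05,
    "return roller bearing (large)": 0.10,
}
timer_interval_hours = 8.0  # hypothetical timer setting: three cycles per day

for bearing, output_cc in injector_presets_cc.items():
    per_day = daily_grease_cc(output_cc, timer_interval_hours)
    print(f"{bearing}: {per_day:.2f} cc/day")
```

Adjusting the timer changes the delivery to every bearing at once, while changing an injector preset affects only the bearing that injector serves.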
The project engineering group had a capital maintenance project to replace the main belt of the filter. We were able to piggyback the bearing upgrade and automatic lube system onto that project. Once the modifications to the filter were complete and proven to be effective, the equipment files were updated. Spares of the new type bearings and lube fittings were purchased and placed in spare parts inventory.
As noted at the beginning of this paper, the ultimate goal of an RCFA is to eliminate or at least reduce the reoccurrence of a failure. From that standpoint, this was a successful project. We did not eliminate the failures but there has been a reduction in failures. A review of the maintenance and production records revealed the following quantitative results:
- During the two-year period from July 2009 to July 2011 (before any modifications), we had seventeen (17) bearing failures.
- During the two-year period from July 2011 to July 2013 (after the modifications), we had two (2) bearing failures.
- During the two-year period from July 2009 to July 2011 (before any modifications), we had maintenance costs of $12,374 for bearing failures.
- During the two-year period from July 2011 to July 2013 (after the modifications), we had maintenance costs of $3,750 for bearing failures.
- During the two-year period from July 2009 to July 2011 (before any modifications), we had lost production costs of $70,125.
- During the two-year period from July 2011 to July 2013 (after the modifications), we had lost production costs of $8,250.
- During the two-year period from July 2011 to July 2013 (after the modifications), we had 93+ maintenance man-hours freed up to work on other equipment.
- During the two-year period from July 2011 to July 2013 (after the modifications), we had 50+ lubrication technician man-hours freed up to work on other equipment.
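The before-and-after figures listed above can be turned into percentage reductions and a combined savings figure with simple arithmetic. The following sketch uses only the numbers reported in the list. The variable and function names are hypothetical.

```python
# Figures reported for the two-year periods before and after the modifications.
before = {
    "bearing_failures": 17,
    "maintenance_cost_usd": 12374,
    "lost_production_usd": 70125,
}
after = {
    "bearing_failures": 2,
    "maintenance_cost_usd": 3750,
    "lost_production_usd": 8250,
}


def percent_reduction(before_value: float, after_value: float) -> float:
    """Reduction from before to after, as a percentage of the before value."""
    return (before_value - after_value) / before_value * 100.0


reductions = {key: percent_reduction(before[key], after[key]) for key in before}

# Combined maintenance and lost-production savings over the two-year period.
total_savings_usd = (
    (before["maintenance_cost_usd"] + before["lost_production_usd"])
    - (after["maintenance_cost_usd"] + after["lost_production_usd"])
)

for key, value in reductions.items():
    print(f"{key}: {value:.1f}% reduction")
print(f"Total savings: ${total_savings_usd:,}")
```

With these figures, bearing failures and lost production costs each fall by about 88%, maintenance costs fall by about 70%, and the combined savings over the two years come to $70,499.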
There have also been some positive qualitative benefits as a result of this work:
- If a roller bearing fails and goes unnoticed, it can cause belt tracking issues. These tracking issues can adversely affect the operation of the filter. We have not had any belt tracking issues since the modifications.
- The use of administrative resources (storeroom, purchasing, maintenance planning and scheduling, etc.) has been reduced.
- The operations people are smiling more. There are fewer problems with the filter.
- The success of this project has demonstrated the value of performing RCFAs in improving the reliability of equipment.
- The success of this project has demonstrated the value of the proper application of materials and lubrication technologies.
This paper has endeavored to show how Root Cause Failure Analysis can be used to improve the reliability of industrial process equipment. A brief explanation of RCFA was offered. Some of the methods used when conducting an RCFA were presented. Some of the key factors to consider when performing and/or facilitating an RCFA were reviewed. We noted the importance of follow-up and follow-through after the RCFA when implementing the solution. An example of an RCFA was presented. The 5 WHY methodology was used to analyze bearing failures on a belt filter. We demonstrated how two technologies (materials of construction and lubrication) were applied to provide a solution to the root cause uncovered in the RCFA. Finally, the quantitative and qualitative benefits achieved as a result of the RCFA were listed.
There are many tools available to improve the reliability of machinery. RCFA is a very useful one. This paper has just scratched the surface of the information and resources available. The authors hope that this paper has encouraged and stimulated the reader to learn more about Root Cause Failure Analysis and how it can be applied to the challenges they face in their particular situations.
This article was previously published in the Reliable Plant 2014 Conference Proceedings.
By Allan Andrycak, W.R. Grace & Co. and Chris Nowlen, Lubrication Engineers Inc.