An Approach to Reliability Engineering
Downtime reporting is a technique that allows reliability engineers to understand equipment and how it is failing. Companies intent on optimizing uptime and capacity utilization, particularly in industries that run continuous processes around the clock, should care about downtime reporting because it improves asset reliability and maintainability, increases uptime, reduces the cost of unplanned or reactive maintenance, and supports safety and regulatory compliance.
It is important to remember that it is not the equipment but the components of the equipment that ultimately fail. By understanding these failures and getting to the bottom of why they happen, you will be able to properly manage their impact through maintenance and reliability activities. The data collected by investigating failures and reporting on downtime gives reliability engineers the information they need to analyze the behavior of equipment and to build a knowledge base of how the physical infrastructure performs. This provides a strategic advantage and allows you to define actions for increasing plant availability.
Many organizations already have a computerized maintenance management system (CMMS) or enterprise asset management (EAM) system, such as IBM Maximo, that can record downtime and some of the related costs as well as track operational problems and their remedies. This is a great way to start tracking downtime, but to truly create a foundation for world-class reliability engineering, there are additional activities to consider.
Downtime and Reliability Analysis
Establishing a downtime reporting program enables the reliability engineering team to properly record events, failures and repairs as well as determine key performance indicators and monitor equipment conditions. A method of reporting and recording downtime for all required pieces of equipment has to take into consideration:
- Plant processes
- Systems and functions and their characteristics, roles and performance
- Logical connections and boundaries between elements
- Equipment criticality as determined by a risk assessment tool
- Failure trees
- Reliability block diagrams
- Online data from automation tools and process historians
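As a rough illustration of how recorded events turn into key performance indicators, the sketch below computes mean time between failures (MTBF), mean time to repair (MTTR) and availability from a list of downtime events. The `DowntimeEvent` fields, the equipment tag and the failure codes are hypothetical and do not reflect the schema of any particular CMMS or EAM system. It also assumes that events do not overlap and that all of them fall inside the reporting period.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class DowntimeEvent:
    """One recorded failure. Field names are hypothetical, not a CMMS schema."""
    equipment_tag: str
    failure_code: str
    start: datetime
    end: datetime

    @property
    def duration(self) -> timedelta:
        return self.end - self.start


def reliability_kpis(events, period_start, period_end):
    """Return MTBF and MTTR in hours plus availability for one piece of equipment.

    Assumes the events do not overlap and all fall inside the period.
    """
    period_hours = (period_end - period_start).total_seconds() / 3600
    down_hours = sum(e.duration.total_seconds() for e in events) / 3600
    up_hours = period_hours - down_hours
    failures = len(events)
    mtbf = up_hours / failures if failures else float("inf")
    mttr = down_hours / failures if failures else 0.0
    return {
        "mtbf_h": mtbf,
        "mttr_h": mttr,
        "availability": up_hours / period_hours,
    }


# Two illustrative failures on a hypothetical pump during a 30-day period.
events = [
    DowntimeEvent("P-101", "BRG", datetime(2024, 1, 5, 8), datetime(2024, 1, 5, 12)),
    DowntimeEvent("P-101", "SEAL", datetime(2024, 1, 20, 0), datetime(2024, 1, 20, 6)),
]
kpis = reliability_kpis(events, datetime(2024, 1, 1), datetime(2024, 1, 31))
print(kpis)
```

With 10 hours of downtime across two failures in a 720-hour period, this gives an MTBF of 355 hours, an MTTR of 5 hours and an availability of roughly 98.6 percent.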
Downtime Reporting Road Map
A downtime reporting road map is like a blueprint for you to follow as you go about modeling operations with a view to optimizing plant performance. On the road map, you can see all of the equipment and the relationships between different assets as the production process occurs.
Data Gathering and Plant Process Analysis
Before you can start optimizing anything, it is important to fully understand all of the processes and operating procedures in place during the production cycle. The first step is to gather data and do a plant process analysis by examining process flow diagrams, piping and instrumentation diagrams and control narratives. Next, you will need to take a close look at your systems and their components in order to understand the taxonomy and hierarchy of the plant’s assets and how they are configured. Once you understand the relationships between the assets, it becomes easier to identify which equipment is truly the most critical.
As you build your equipment hierarchy, remember that all equipment that is part of the production process must be included. Equipment that is decommissioned or not used in production can be omitted. Also remember that each “child” asset has only one “parent” asset.
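The single-parent rule is easy to check automatically. The following sketch, with made-up asset tags, builds a child-to-parent map from a list of links and rejects any asset that is assigned to two different parents. The function names are illustrative and not part of any asset management product.

```python
def build_hierarchy(links):
    """Build a child -> parent map from (parent, child) pairs.

    Raises ValueError if any child is assigned to more than one parent.
    """
    parent_of = {}
    for parent, child in links:
        if child in parent_of and parent_of[child] != parent:
            raise ValueError(
                f"{child} has two parents: {parent_of[child]} and {parent}"
            )
        parent_of[child] = parent
    return parent_of


def path_to_root(parent_of, tag):
    """Return the chain from an asset up to the top of the hierarchy."""
    path = [tag]
    while path[-1] in parent_of:
        next_tag = parent_of[path[-1]]
        if next_tag in path:
            raise ValueError(f"cycle detected at {next_tag}")
        path.append(next_tag)
    return path


# Hypothetical asset tags for illustration only.
links = [
    ("PLANT", "COOLING-SYSTEM"),
    ("COOLING-SYSTEM", "P-101"),
    ("P-101", "P-101-MOTOR"),
    ("P-101", "P-101-BEARING"),
]
parent_of = build_hierarchy(links)
print(path_to_root(parent_of, "P-101-BEARING"))
```

Storing the hierarchy as a child-to-parent map makes the rule structural: a dictionary key can hold only one value, so a second, conflicting parent is caught at load time.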
Failure Mode Analysis
Failure mode analysis is a systematic procedure for analyzing a system or function to identify its potential failure modes and their impact on performance. Starting with what you believe to be the most critical systems or functions in the plant, list the equipment that performs them and assign each item according to the decomposition (breakdown structure) of the function. Next, analyze the elements of the breakdown from the bottom up, with the aim of identifying the end effect of a failure on the system.
One approach is to categorize relevant failure codes according to their equipment class. This means that you use a catalog of known failure codes (like the one in ISO 14224) and evaluate each code to see where it should fall in terms of priority for your particular organization. Another option is a risk assessment matrix review, a tool that helps you evaluate the probability and consequence of a particular failure. If the probability of a particular failure is low but the consequences would be severe, preventing that failure may still be a high priority for your plant. The assessment of each failure needs to be weighted according to organizational objectives in order to establish priorities for preventive maintenance. You can also review process functions and analyze why the failure mode happens and what the consequences are when a failure occurs.
The following are the most important objectives of the failure mode analysis:
- Identify failures which have unwanted effects on system operation
- Make improvements to system reliability or safety
- Make improvements to system maintainability
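To make the risk assessment matrix concrete, here is a minimal sketch that scores failure modes on a 5x5 matrix. The rank scale, the thresholds and the rule that any top-rank consequence is high priority are all illustrative assumptions, and the failure modes are invented. A real matrix would be weighted according to your own organizational objectives.

```python
def risk_score(probability, consequence):
    """Score a failure on a 5x5 matrix; both inputs are ranks from 1 to 5."""
    if not (1 <= probability <= 5 and 1 <= consequence <= 5):
        raise ValueError("ranks must be between 1 and 5")
    return probability * consequence


def priority(probability, consequence):
    """Map ranks to a priority band. Thresholds are illustrative assumptions."""
    score = risk_score(probability, consequence)
    # A severe consequence is treated as high priority even when unlikely,
    # which covers the low-probability, high-consequence case.
    if consequence == 5 or score >= 12:
        return "high"
    if score >= 6:
        return "medium"
    return "low"


# (equipment tag, failure mode, probability rank, consequence rank), all hypothetical.
failure_modes = [
    ("P-101", "bearing seizure", 2, 4),
    ("P-101", "seal leak", 4, 2),
    ("V-200", "vessel rupture", 1, 5),
]
for tag, mode, p, c in failure_modes:
    print(tag, mode, risk_score(p, c), priority(p, c))
```

Note that the vessel rupture has the lowest raw score of the three but still lands in the high band, which is why a plain probability-times-consequence product is often supplemented with an explicit severity rule.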
Reliability Block Diagrams
A reliability block diagram (RBD) is a graphical representation of the logical structure and connections of a system or function in a plant. It visually divides the plant into functional blocks to facilitate reliability analysis. By using RBDs, you can begin to represent the impact of failure in the plant. These diagrams outline the relationships between the assets and help you see what could happen if something fails.
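The arithmetic behind a simple RBD can be sketched in a few lines. Assuming that blocks fail independently, blocks in series multiply their reliabilities, while a redundant (parallel) group fails only if every block in it fails. The system layout and the reliability values below are hypothetical.

```python
from math import prod


def series(*reliabilities):
    """All blocks must work: multiply the individual reliabilities."""
    return prod(reliabilities)


def parallel(*reliabilities):
    """Redundant blocks: the group fails only if every block fails."""
    return 1 - prod(1 - r for r in reliabilities)


# Hypothetical system: a feed pump, then two redundant compressors, then a filter.
pump, compressor, filter_unit = 0.95, 0.90, 0.99
system = series(pump, parallel(compressor, compressor), filter_unit)
print(round(system, 4))
```

Because `series` and `parallel` both return a single reliability value, they can be nested to mirror the drill-down structure of the diagram itself.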
Model Validation
Now, you can load data, calculate KPIs and compare your indicators to the plan. If the amount of deviation between the plan and the indicator value is small, the model works. If there is too much deviation, you will have to adjust the structure and connections of the RBDs and reapply the model until you find the right balance. It is important to have sub-diagrams that provide the ability to drill down (and up) to see different systems and subsystems. You also want to associate the equipment tags with reliability blocks, configure the failure codes according to each piece of equipment and create a graphical dashboard to provide a clear snapshot of performance.
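One simple way to express the comparison is a relative deviation check against a tolerance. In the sketch below, the KPI names, the values and the 5 percent tolerance are assumptions made for illustration. What counts as a "small" deviation is a decision for your own team.

```python
def validate_model(planned, modeled, tolerance=0.05):
    """Return the KPIs whose relative deviation from plan exceeds the tolerance.

    The 5% default is an illustrative assumption, not a recommended threshold.
    """
    out_of_tolerance = {}
    for name, plan_value in planned.items():
        deviation = abs(modeled[name] - plan_value) / abs(plan_value)
        if deviation > tolerance:
            out_of_tolerance[name] = deviation
    return out_of_tolerance


# Hypothetical planned values and values produced by the model.
planned = {"availability": 0.95, "mtbf_h": 400.0, "mttr_h": 5.0}
modeled = {"availability": 0.94, "mtbf_h": 355.0, "mttr_h": 5.1}
failed = validate_model(planned, modeled)
print(failed)
```

Here only the MTBF deviates by more than 5 percent, which would point you toward reviewing the RBD structure for the blocks that drive that indicator.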
Downtime Model Results
Ideally, you want to use a downtime modeling tool that performs correlation analysis between related operational parameters and reliability indicators. Integration with data historians, such as plant information management systems, and with business systems is also helpful, as it gives you an overall business unit downtime dashboard. Customized reports will help you understand the trends around availability, reliability and maintainability. The reports that come as a part of the RtDuet program, for example, give you KPI reporting as well as standard reporting including:
- Process parameters downtime reports
- Equipment downtime reports
- Overall business downtime reports
- Operational equipment effectiveness reports
- Operational process parameters effectiveness reports
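For readers who want to see what a correlation analysis between an operational parameter and a reliability indicator involves, the sketch below computes a Pearson correlation coefficient by hand. The weekly bearing temperature and downtime figures are invented, and this is not a description of how RtDuet or any other tool performs the calculation.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


# Hypothetical weekly data: average bearing temperature and downtime hours.
bearing_temp_c = [61.0, 63.5, 66.0, 70.5, 74.0, 79.5]
downtime_h = [1.0, 1.5, 1.0, 3.0, 4.5, 6.0]
r = pearson(bearing_temp_c, downtime_h)
print(round(r, 2))
```

A coefficient close to 1 suggests that the two series rise together, but it does not by itself show that the parameter causes the downtime. It is a prompt for further failure analysis.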
Challenges and Benefits of Downtime Reporting
Downtime reporting is not without its challenges, but once the processes are in place, the benefits of the program outweigh the legwork involved in building a solid foundation.
Challenges:
- Model creation requires thorough analysis.
- Timeline model requires agreement throughout the different business units.
- Capturing downtime involves participation from operations, management, engineering and maintenance.
- Roles and responsibilities need to be established.
Benefits:
- Organization-wide knowledge base
- Full visibility of plant performance
- More efficient data analysis and reporting
- Proactive prevention of critical failures
- Dynamic analysis and continuous improvement of reliability KPIs
- Increased plant and asset availability
- Reduced unplanned maintenance
- Improved processes around operations, maintenance and reliability
- Reduced maintenance costs