Downtime Reporting for Reliability Engineering Programs

Reliability engineering departments are placing more focus on improving plant reliability and maintainability in an effort to improve return on investment (ROI). The purpose of this paper is to analyze and suggest a new method of reporting and recording downtime as part of an asset performance optimization program for key/required equipment. The approach we will discuss takes into consideration equipment criticality, reliability block diagrams and online data from automation tools and process historians. The result is higher-quality data to drive maintenance and reliability activities.

An Approach to Reliability Engineering

Downtime reporting is a technique that allows reliability engineers to understand equipment and how it is failing. Companies focused on optimizing uptime and capacity utilization, particularly those running continuous, 24/7 operations, should care about downtime reporting because it improves asset reliability and maintainability, increases uptime, reduces the cost of unplanned or reactive maintenance, and supports safety and regulatory compliance.

It is important to remember that it is not the equipment but the components of the equipment that ultimately fail. By understanding these failures and getting to the bottom of why they happen, you will be able to properly manage their impact through maintenance and reliability activities.

The data collected as a result of investigating failures and reporting on downtime provides the information reliability engineers need in order to analyze the behavior of equipment and to create a knowledge base of the performance of the physical infrastructure. This provides a strategic advantage and allows you to define actions for increasing plant availability.

Many organizations already have a computerized maintenance management system (CMMS) or enterprise asset management (EAM) system like IBM Maximo that can record downtime and some of the related costs as well as track operational problems and their remedies. This is a great way to start tracking downtime, but in order to truly create a foundation for world-class reliability engineering, there are additional activities to consider.

Downtime and Reliability Analysis

Establishing a downtime reporting program enables the reliability engineering team to properly record events, failures and repairs as well as determine key performance indicators and monitor equipment conditions. It is important to note that a method of reporting and recording downtime for all required pieces of equipment has to take into consideration:

Plant processes
Systems and functions and their characteristics, roles and performance
Logical connections and boundaries between elements
Equipment criticality as determined by a risk assessment tool
Failure trees
Reliability block diagrams
Online data from automation tools and process historians

Models are incredibly useful to demonstrate a plant’s reliability under different conditions and help establish improvement actions to optimize plant performance. Following a systematic approach, a failure tree of the plant can be established. By collecting statistical data about failures and their consequences, you can determine indicators such as mean time between failures (MTBF), mean time to failure (MTTF) and mean time to repair (MTTR).
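As a minimal sketch of how these indicators can be derived from recorded downtime, the Python snippet below computes MTBF, MTTR and availability for one repairable piece of equipment. The `DowntimeEvent` record, the function name and the sample figures are illustrative assumptions, not part of any particular CMMS or historian. It uses the common conventions MTBF = operating time / number of failures, MTTR = repair time / number of failures, and availability = MTBF / (MTBF + MTTR).

```python
from dataclasses import dataclass


@dataclass
class DowntimeEvent:
    """Hypothetical record of one failure, in hours since the window started."""
    start_hour: float
    end_hour: float


def reliability_indicators(events, window_hours):
    """Return MTBF, MTTR and availability for one repairable asset."""
    failures = len(events)
    if failures == 0:
        raise ValueError("at least one failure event is needed")
    downtime = sum(e.end_hour - e.start_hour for e in events)
    uptime = window_hours - downtime
    mtbf = uptime / failures      # mean operating time between failures
    mttr = downtime / failures    # mean time to restore after a failure
    availability = mtbf / (mtbf + mttr)
    return {"mtbf": mtbf, "mttr": mttr, "availability": availability}


# Illustrative data only: three failures in a 1,000-hour window.
sample_events = [
    DowntimeEvent(100, 104),
    DowntimeEvent(400, 410),
    DowntimeEvent(900, 906),
]
print(reliability_indicators(sample_events, window_hours=1000))
```

With 20 hours of downtime in a 1,000-hour window, the availability in this example works out to about 98 percent. MTTF is left out of the sketch because it is normally reserved for non-repairable components.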

These indicators not only help with defining a timeline model for the operations of the plant, they also provide a framework for determining and analyzing events that affect plant availability and capacity utilization.

In examining the failure data and the corresponding indicators, the pieces of equipment that are most prone to failures affecting plant production (or “bad actors”) will be revealed. To identify the “bad actors,” which must be addressed as a priority, it is necessary to understand the time the unit is down, the classification of the downtime and the impact of the downtime. The impact of downtime against the subsystem and the plant as a whole can also be measured, allowing for further information that can be used in planning proactive maintenance activities.
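One simple way to surface bad actors is a Pareto-style ranking of total downtime per equipment tag. The sketch below assumes downtime records are available as `(equipment_tag, downtime_hours)` pairs; the tags, the function name and the 80 percent cut-off are hypothetical choices for illustration. In practice you would first filter the records by downtime classification and weight them by impact.

```python
from collections import defaultdict


def rank_bad_actors(records, top_share=0.8):
    """Return the tags that together account for roughly `top_share` of downtime.

    records: iterable of (equipment_tag, downtime_hours) pairs.
    """
    totals = defaultdict(float)
    for tag, hours in records:
        totals[tag] += hours

    # Largest contributors first.
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    grand_total = sum(totals.values())

    bad_actors, cumulative = [], 0.0
    for tag, hours in ranked:
        if grand_total == 0 or cumulative / grand_total >= top_share:
            break
        bad_actors.append((tag, hours))
        cumulative += hours
    return bad_actors


# Illustrative records only; the tags are made up.
sample_records = [
    ("P-101", 40), ("C-201", 10), ("P-101", 30), ("V-301", 5), ("E-401", 15),
]
print(rank_bad_actors(sample_records))
```

Here pump P-101 alone contributes 70 of the 100 recorded hours, so it heads the list and would be the first candidate for proactive maintenance.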

Downtime Reporting Road Map

A downtime reporting road map is like a blueprint for you to follow as you go about modelling operations with a view to optimizing plant performance. On the road map, you can see all of the equipment and the relationship between different assets as the production process occurs.

Data Gathering and Plant Process Analysis

Before you can start optimizing anything, it is important to fully understand all of the processes and operating procedures in place during the production cycle. The first step is to gather data and do a plant process analysis by examining process flow diagrams, piping and instrumentation diagrams and control narratives. Next, you will need to take a close look at your systems and their components in order to understand the taxonomy and hierarchy of the plant’s assets and how they are configured. Once you understand the relationship between the assets, it becomes easier to identify which equipment is truly the most critical.

As you build your equipment hierarchy, remember, all equipment must be included in the hierarchy if it is part of the production process. If it is decommissioned or not used in production, it can be omitted. Another thing to remember when building a hierarchy is that each “child” asset has only one “parent” asset.
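The one-parent rule is easy to check automatically before the hierarchy is loaded into a downtime model. The following sketch assumes the hierarchy is exported as a list of `(child, parent)` links; the asset names and the function name are hypothetical.

```python
from collections import defaultdict


def find_multi_parent_assets(links):
    """Return every child asset that is linked to more than one parent.

    links: iterable of (child_tag, parent_tag) pairs.
    """
    parents = defaultdict(set)
    for child, parent in links:
        parents[child].add(parent)
    return {
        child: sorted(found)
        for child, found in parents.items()
        if len(found) > 1
    }


# Illustrative hierarchy: MOTOR-1 is wrongly attached to two parents.
sample_links = [
    ("PUMP-1", "UNIT-A"),
    ("MOTOR-1", "PUMP-1"),
    ("MOTOR-1", "UNIT-B"),
]
print(find_multi_parent_assets(sample_links))
```

An empty result means every child has exactly one parent; any entry returned points to a link that should be corrected.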

Failure Mode Analysis

Failure mode analysis is a systematic procedure that facilitates the analysis of a system or function to identify the potential failure modes and their impact on the performance of the system or function.

Starting with what you believe to be the most critical systems or functions performed in the plant, a list of the equipment performing these systems or functions needs to be established and assigned as per the decomposition (breakdown structure) of the function.

Next, an analysis of the elements of the breakdown, from the bottom up, is performed with the aim of identifying the end effect of a failure on the system.

One approach is to categorize relevant failure codes according to their equipment class. This means that you use a catalog of known failure codes (such as the one in ISO 14224) and evaluate each code to see where it should fall in terms of priority for your particular organization.

Another option is to use a risk assessment matrix review. This is a tool that helps you evaluate the probability and consequence of a particular failure. If you can determine that the probability of a particular failure is low, but the consequences would be severe, it may be determined that preventing the failure from happening is a high priority for your plant. The assessment for each failure needs to be evaluated and weighted according to organizational objectives in order to establish priorities for preventative maintenance.
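A risk assessment matrix can be reduced to a small scoring function. The sketch below assumes probability and consequence are each ranked from 1 (lowest) to 5 (highest); the scales, the thresholds and the rule that any catastrophic consequence is treated as high priority are illustrative assumptions that each organization would set for itself.

```python
def risk_priority(probability, consequence):
    """Score a failure mode on an assumed 5x5 risk matrix.

    Both inputs are integer ranks from 1 (lowest) to 5 (highest).
    Returns (score, priority) where priority is "low", "medium" or "high".
    """
    for name, value in (("probability", probability), ("consequence", consequence)):
        if not 1 <= value <= 5:
            raise ValueError(f"{name} must be between 1 and 5")

    score = probability * consequence
    # A severe consequence is treated as high priority even when unlikely.
    if consequence == 5 or score >= 15:
        return score, "high"
    if score >= 6:
        return score, "medium"
    return score, "low"


# A rare failure (rank 1) with a severe consequence (rank 5).
print(risk_priority(1, 5))
```

The override for a consequence rank of 5 mirrors the case described above, where a low-probability failure still becomes a high priority because of its severity.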

You can also review process functions and perform an analysis of why the failure mode happens and what the consequences are when a failure occurs.

The following are the most important objectives of the failure mode analysis:

Identify failures which have unwanted effects on system operation
Make improvements to system reliability or safety
Make improvements to system maintainability

Reliability Block Diagrams

A reliability block diagram (RBD) is a graphical representation of the logical structure and connections of a system or function in a plant. It visually divides the plant into functional blocks to help facilitate reliability analysis.

By using RBDs, you can begin to represent the impact of failure in the plant. These diagrams will outline the relationships between the assets and will help you see what could happen if something fails.
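To illustrate how an RBD turns asset relationships into a number, the sketch below evaluates the two basic block arrangements under the usual assumption that blocks fail independently. Blocks in series all have to work, so their reliabilities multiply; redundant blocks in parallel fail only if every block fails. The function names and the reliability values are assumptions chosen for the example.

```python
from math import prod


def series(*reliabilities):
    """Reliability of blocks in series: every block must work."""
    return prod(reliabilities)


def parallel(*reliabilities):
    """Reliability of redundant blocks: at least one block must work."""
    return 1 - prod(1 - r for r in reliabilities)


# Illustrative system: two redundant pumps (0.90 each) feeding one compressor (0.95).
pump_stage = parallel(0.90, 0.90)
system = series(pump_stage, 0.95)
print(f"pump stage: {pump_stage:.4f}, system: {system:.4f}")
```

The redundant pump stage comes out at about 0.99 and the overall system at about 0.94, which shows that the single compressor, not the pumps, limits system reliability in this arrangement.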

Model Validation

Now, you can load data, calculate KPIs and compare your indicators to the plan. If the amount of deviation between the plan and the indicator value is small, the model works. If there is too much deviation, you will have to adjust the structure and connections of the RBDs and reapply the model until you find the right balance.
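The comparison between plan and indicator values can be sketched as a relative-deviation check. The 5 percent tolerance, the KPI names and the function name below are assumptions for illustration; what counts as a small deviation is a decision for your own organization.

```python
def validate_model(planned, modeled, tolerance=0.05):
    """Return the KPIs whose relative deviation from plan exceeds the tolerance.

    planned, modeled: dicts mapping a KPI name to its value.
    Planned values must be non-zero because deviation is relative to plan.
    """
    out_of_tolerance = {}
    for kpi, plan_value in planned.items():
        deviation = abs(modeled[kpi] - plan_value) / abs(plan_value)
        if deviation > tolerance:
            out_of_tolerance[kpi] = deviation
    return out_of_tolerance


# Illustrative values only.
planned = {"availability": 0.95, "mtbf_hours": 400}
modeled = {"availability": 0.94, "mtbf_hours": 300}
print(validate_model(planned, modeled))
```

An empty result suggests the model tracks the plan; any KPI returned is a cue to revisit the structure and connections of the RBDs before rerunning the model.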

It is important to have sub-diagrams that provide the ability to drill down (and up) to see different systems and subsystems. You also want to associate the equipment tags to reliability blocks, configure the failure codes according to each piece of equipment and create a graphical dashboard to provide a clear snapshot of performance.

Downtime Model Results

Ideally, you want to use a downtime modeling tool that will do correlation analysis between related operational parameters and reliability indicators. Integration with a data historian, such as a plant information management system, and with business systems is also helpful, as it will give you an overall business unit downtime dashboard.
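If your tool does not provide correlation analysis, a basic version is straightforward to sketch. The snippet below computes the Pearson correlation coefficient between an operational parameter and a reliability indicator; the variable names and sample values are hypothetical, and a commercial downtime modeling tool may well use a different method.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length of at least 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(spread_x * spread_y)


# Illustrative weekly data: average vibration (mm/s) versus downtime hours.
vibration = [2.1, 2.4, 3.0, 3.8, 4.5]
downtime_hours = [1.0, 1.5, 2.5, 4.0, 6.0]
print(round(pearson(vibration, downtime_hours), 3))
```

A coefficient close to 1 or -1 flags a parameter worth investigating, but it does not prove that the parameter causes the downtime.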

Customized reports will help you understand the trends around availability, reliability and maintainability. The reports that come as a part of the RtDuet program, for example, give you KPI reporting as well as standard reporting including:

Process parameters downtime reports
Equipment downtime reports
Overall business downtime reports
Operational equipment effectiveness reports
Operational process parameters effectiveness reports

The results of the reports can be captured from the data historian and can be recorded into Maximo.

Challenges and Benefits of Downtime Reporting

Downtime reporting is not without its challenges, but once the processes are in place, the benefits of the program outweigh the legwork involved in building a solid foundation.

Challenges:

Model creation requires thorough analysis.
Timeline model requires agreement throughout the different business units.
Capturing downtime involves participation from operations, management, engineering and maintenance.
Roles and responsibilities need to be established.

Benefits:

Organization-wide knowledge base
Full visibility of plant performance
More efficient data analysis and reporting
Proactive prevention of critical failures
Dynamic analysis and continuous improvement of reliability KPIs
Increased plant and asset availability
Reduced unplanned maintenance
Improved processes around operations, maintenance and reliability
Reduced maintenance costs

A company's profitability and competitiveness depend on optimizing asset and plant availability. Plant availability is driven by tight control of asset reliability and maintainability. Keeping track of failures, why they happen and the resulting downtime forms the foundation for effective reliability engineering. The data uncovered as a result of downtime reporting will pave the way for more reliable equipment with fewer failures, increased uptime and improved operations.

This article was previously published in the Reliable Plant 2014 Conference Proceedings.

By Eduardo Neira
