
How to build a foundation for predictive reliability
Predictive reliability is an often sought-after but challenging use case in data science. The idea of avoiding failures by monitoring the health of a system and applying advanced algorithms is very enticing.
If successful, predictive reliability promises to reduce expensive failures and increase the productivity of equipment by avoiding unnecessary downtime. Yet, it is often a use case just out of reach for manufacturers and operators of industrial equipment alike. The reason? Lack of a solid foundation.
In this post, I am going to lay out the steps to build the necessary foundation for a predictive reliability program that can:
- Reduce operational costs
- Reduce expensive equipment failures
- Increase equipment availability
But first, a bit of definition.
What is predictive reliability?
Predictive reliability is a programmatic approach to maintenance where maintenance activities are tailored based on the condition of equipment to prevent unnecessary failures and improve operational performance. Predictive reliability is based on the idea that algorithms can detect symptoms of a potential maintenance issue so that action can be taken before that issue leads to an expensive failure event.
Key building blocks of predictive reliability
A solid foundation for predictive reliability is made up of a few key building blocks, including:
- Standardized sensor data collection strategy
- Clear operational history
- Comprehensive maintenance records
- Deployment strategy
I will cover the techniques for designing algorithms to predict failures in a separate blog post. In this post, I am going to focus on these key building blocks, which are necessary for developing the algorithms to detect potential failures.
Standardized sensor data collection strategy
The first building block is a standardized sensor data collection strategy. Most modern industrial and field equipment has a good collection of sensors as part of a control or monitoring system.
However, in order to be an effective data source for identifying potential issues, these sensors must be aligned with potential failure modes. In other words, the sensors must be able to detect a change in the condition of the equipment that is relevant to a failure or pre-failure state. Temperature sensors, for example, are important for equipment that is at risk of overheating or that operates under high friction. Accelerometers are important for detecting an imbalance in a part of the system.
Sensor data must also be sampled fast enough to observe the event of interest. For example, if the telltale sign of an imbalanced engine is a vibration that occurs twice per second (2 hertz), then the accelerometer should be sampled more than four times per second (4 hertz), that is, at more than twice the frequency of the vibration. Otherwise, the event won’t be seen in the data. By analogy, just imagine trying to measure seasonal weather patterns by measuring temperature only once per year. There may be four seasons every year, but you would never see these patterns if you only measured the temperature once a year in the summer.
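To make the sampling point concrete, here is a minimal sketch that samples an idealized 2 hertz vibration at three different rates and compares the swing visible in the samples. The pure sine wave, the function names, and the chosen rates are assumptions for illustration only; a real accelerometer signal would be noisier and would contain more than one frequency.

```python
import math

def sample_vibration(signal_hz, sampling_hz, duration_s):
    """Sample a unit-amplitude sine wave of signal_hz at sampling_hz for duration_s seconds."""
    n_samples = int(duration_s * sampling_hz)
    return [math.sin(2 * math.pi * signal_hz * i / sampling_hz) for i in range(n_samples)]

def peak_to_peak(samples):
    """Largest swing visible in the sampled data."""
    return max(samples) - min(samples)

VIBRATION_HZ = 2.0  # hypothetical imbalance signature: two cycles per second

# 1 Hz: every sample lands at the same point of the cycle, so the vibration is invisible.
too_slow = peak_to_peak(sample_vibration(VIBRATION_HZ, 1.0, 5.0))

# Exactly 4 Hz: with this phase the samples fall on the zero crossings, so it is still invisible.
borderline = peak_to_peak(sample_vibration(VIBRATION_HZ, 4.0, 5.0))

# 10 Hz: comfortably above twice the vibration frequency, so the swing shows up clearly.
fast_enough = peak_to_peak(sample_vibration(VIBRATION_HZ, 10.0, 5.0))
```

In this idealized case the swing is essentially zero at 1 hertz and at exactly 4 hertz, and close to the true value of 2 at 10 hertz, which is why it is safer to sample with some margin above the theoretical minimum.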
Another requirement is that sensor data must be collected in a manner that is consistent across similar devices. Central to the concept of predictive reliability is the idea that the signal pattern associated with failures can be learned from past data or can be identified through comparison to data from normal operating periods. In order to make this sort of comparison, sensor data must be sampled and stored in a consistent manner across similar equipment. Otherwise, it becomes an apples-to-oranges comparison that supports no real conclusions.
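When historical data was not collected consistently, one common workaround is to resample each device's readings onto a shared time grid before comparing them. The sketch below uses simple linear interpolation; the function name, the one-second grid, and the sample readings are hypothetical, and interpolation is only a partial substitute for a consistent collection strategy.

```python
from bisect import bisect_right

def resample_linear(timestamps, values, step_s):
    """Linearly interpolate irregular readings onto a fixed grid starting at the first timestamp.

    timestamps must be sorted in increasing order and have the same length as values.
    """
    if len(timestamps) != len(values) or len(timestamps) < 2:
        raise ValueError("need at least two readings with matching timestamps and values")
    start, end = timestamps[0], timestamps[-1]
    n_steps = int((end - start) // step_s)
    resampled = []
    for k in range(n_steps + 1):
        t = start + k * step_s
        i = bisect_right(timestamps, t) - 1  # index of the last reading at or before t
        if i >= len(timestamps) - 1:
            v = values[-1]
        else:
            t0, t1 = timestamps[i], timestamps[i + 1]
            frac = (t - t0) / (t1 - t0)
            v = values[i] + frac * (values[i + 1] - values[i])
        resampled.append((t, v))
    return resampled

# Two hypothetical devices that logged the same kind of sensor at different intervals.
device_a = resample_linear([0.0, 1.0, 2.0, 4.0], [10.0, 12.0, 14.0, 18.0], step_s=1.0)
device_b = resample_linear([0.0, 2.0, 4.0], [20.0, 24.0, 28.0], step_s=1.0)
```

After resampling, both devices have one value per second over the same window, so their readings can be compared point by point.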
Clear operational history
Tracking operational history provides sensor data with the necessary context and provides a means to filter data that is not relevant to the predictive reliability use case.
The state of the system at the time that the sensor data is collected is especially important. Is the equipment currently operating? Is it in standby? Is the equipment currently operating in a reduced state or a safe mode? These are some of the basic questions that can be addressed by reviewing the operational history data. For a more advanced consideration of the operational history, one can look at factors such as duty cycle and RPMs to develop an understanding of the degree of stress that the equipment has experienced over its operational lifetime.
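As a small illustration of how operational state gives sensor data its context, the sketch below filters readings by state and estimates a rough duty cycle. The `Reading` structure, the state labels, and the sample values are assumptions made for this example; real control systems expose state in their own formats.

```python
from dataclasses import dataclass

@dataclass
class Reading:
    timestamp_s: float
    temperature_c: float
    state: str  # hypothetical labels: "running", "standby", "safe_mode"

def readings_in_state(readings, state="running"):
    """Keep only the readings taken while the equipment was in the given state."""
    return [r for r in readings if r.state == state]

def duty_cycle(readings, active_state="running"):
    """Fraction of readings taken in the active state (assumes evenly spaced readings)."""
    if not readings:
        return 0.0
    return sum(r.state == active_state for r in readings) / len(readings)

history = [
    Reading(0.0, 41.0, "standby"),
    Reading(60.0, 78.5, "running"),
    Reading(120.0, 80.2, "running"),
    Reading(180.0, 79.9, "running"),
]

running_only = readings_in_state(history)
utilization = duty_cycle(history)
```

Filtering out the standby reading keeps its low temperature from being mistaken for a change in equipment health, and the duty cycle gives a first, coarse measure of how hard the equipment has been worked.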
Also important to maintaining a clear operational history are the system logs or records of actions relevant to the system. Logs are important because they can explain changes in sensor data and so help rule out issues that the data might otherwise appear to indicate. For example, high revolutions per minute in a car’s engine can either be an important indicator of an issue with the engine, or just a sign that the operator is making a strong effort to accelerate the car. The difference can be determined based on the position of the accelerator pedal.
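The car example can be sketched in a few lines: a high RPM reading is only flagged when the logged pedal position does not explain it. The field names and both thresholds are hypothetical values chosen for illustration, not recommended settings.

```python
def unexplained_high_rpm(samples, rpm_limit=4000, pedal_threshold=0.5):
    """Return samples where RPM is high but the accelerator pedal is not pressed hard.

    Each sample is a dict with 'rpm' and 'pedal', where pedal runs from 0.0 to 1.0.
    """
    return [
        s for s in samples
        if s["rpm"] > rpm_limit and s["pedal"] < pedal_threshold
    ]

samples = [
    {"t": 0, "rpm": 1800, "pedal": 0.10},  # normal cruising
    {"t": 1, "rpm": 5200, "pedal": 0.95},  # high RPM explained by hard acceleration
    {"t": 2, "rpm": 5100, "pedal": 0.05},  # high RPM with no operator input: worth a look
]

suspicious = unexplained_high_rpm(samples)
```

Only the last sample is flagged, because the operator action recorded in the log accounts for the other high reading.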
Alarms and warnings are the final category of information we will cover in operational history. Alarms and warnings are the results of logic that has been built into the control system, often by the original manufacturer. These indicators are designed to provide clues about potential issues and can add further context about the state of the system.
Comprehensive maintenance records
Another key building block is maintenance records. These records are key for applying modeling approaches related to remaining useful life (RUL) and for providing additional operational context. In general, equipment that has been serviced consistently can be expected to last longer and with fewer issues than equipment that has been serviced sporadically or neglected entirely.
Maintenance records should be complete and stored in a digital format. Maintenance records provide limited value if the information is not readily available or cannot be combined with other data sources. There are several tools available to facilitate recording maintenance records digitally. I recommend standardizing the collection process as much as possible so that the key details such as service provided and issues fixed can be easily identified and incorporated into the analysis.
Another key element of good maintenance logs is a record of any parts that have been replaced or refurbished. This is key for the remaining useful life (RUL) models. It is often wrongly assumed that all the systems or parts of an industrial machine will fail at the same time. Each major part or system in a piece of equipment has its own history that should be considered in developing a custom predictive reliability program. Just look at classic cars for an example of this. There are many cases of classic cars that still have an original frame 25+ years old but have engines that were rebuilt in the last couple of years and tires that are brand new.
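One way to make use of replacement records is to track the age of each component separately instead of using a single age for the whole machine. The sketch below follows the classic car example; the function name, the component names, and all of the dates are made up for illustration.

```python
from datetime import date

def component_ages_days(commissioned, replacements, components, as_of):
    """Age of each component in days, counted from its latest replacement or from commissioning.

    replacements is a list of (component_name, replacement_date) tuples.
    """
    last_reset = {name: commissioned for name in components}
    for name, replaced_on in replacements:
        if name in last_reset and replaced_on > last_reset[name]:
            last_reset[name] = replaced_on
    return {name: (as_of - reset_date).days for name, reset_date in last_reset.items()}

ages = component_ages_days(
    commissioned=date(1995, 1, 1),
    replacements=[("engine", date(2022, 6, 1)), ("tires", date(2024, 1, 1))],
    components=["frame", "engine", "tires"],
    as_of=date(2024, 1, 31),
)
```

Here the frame carries the full age of the vehicle, while the engine and tires restart their clocks at their last replacement. Per-component ages like these are a more sensible input to an RUL model than one age for the entire asset.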
Deployment strategy
The deployment strategy is the final building block for a solid foundation in predictive reliability. In many applications, deployment is a decision that may be delayed until after the predictive models have been developed.
With predictive reliability programs, it is important to consider options for deployment up front because deployment affects the types of failure modes that can be detected and controlled. Some failure modes have a long pre-failure period where signs of a failure event are present days or weeks before the failure occurs. These failure modes can be detected by analyzing data centrally or in the cloud. Other failure modes can shift from a detectable pre-failure mode to failure in a matter of minutes or seconds. In such cases, it is often necessary to include edge computing as part of your deployment strategy so that the data can be processed, and preventive action can be taken, in real time.
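A rough way to reason about this choice is to compare how much warning a failure mode gives with how long a centralized detect-and-respond loop takes. The sketch below is a simplified rule of thumb; the function, the safety factor, the failure modes, and the timings are all assumptions for illustration, and a real decision would also weigh connectivity, cost, and safety requirements.

```python
def choose_deployment(lead_time_s, central_round_trip_s, safety_factor=10.0):
    """Suggest 'edge' when the warning window is too short for a centralized loop.

    lead_time_s: typical time between the first detectable symptom and failure.
    central_round_trip_s: time to ship data, analyze it centrally, and act on the result.
    """
    if lead_time_s < central_round_trip_s * safety_factor:
        return "edge"
    return "central"

# Hypothetical failure modes with their approximate warning windows in seconds.
failure_modes = {
    "bearing_wear": 14 * 24 * 3600,  # symptoms appear weeks ahead
    "pressure_surge": 30,            # seconds between symptom and failure
}

plan = {
    name: choose_deployment(lead_time, central_round_trip_s=60)
    for name, lead_time in failure_modes.items()
}
```

With these assumed numbers, the slow-developing bearing wear can be handled centrally, while the pressure surge needs detection at the edge.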
For critical equipment, I recommend considering a dual strategy from the start that incorporates a centralized analysis program and deployment of predictive reliability components at the remote edge. This enables the program to capture both fast-acting failure modes and slower modes that may benefit from further analysis by a human operator through comparison with similar systems. Deploying a dual strategy has the benefit of allowing the flexibility to deal with a broad range of failure modes in a way that maximizes the long-term health and operational capabilities of the equipment.
From building blocks to competitive advantage
Building effective predictive reliability requires careful planning and long-term investment. With the right investments and strategy, predictive reliability programs can lead to great benefits starting with improved operational KPIs and ending with a long-term competitive advantage. By recognizing and implementing the key building blocks of a predictive reliability program, you will be well on your way to developing a capability that can truly help transform your operations.
For more information on predictive reliability, please reach out to Neal Analytics to learn how we can help you build your foundation.