An essential part of any rail infrastructure project or asset management system is a robust Reliability, Availability, Maintainability, Safety (RAMS) assessment process. But what is RAMS and how does each element relate to one another?
RAMS is a decision-making tool to identify how to increase the availability of a system. In some industries RAMS refers to a Reliability, Availability, Maintainability, Study or Schedule. However, in the rail industry, the ‘S’ generally refers to functional safety and standard EN 50126 is called: “Railway Applications. The Specification and Demonstration of Reliability, Availability, Maintainability and Safety (RAMS)”. EN 50126 also defines the types of analysis that must be carried out for many railway applications.
In simple terms, reliability is the probability of no failures occurring over a defined time. Availability is the percentage of time a system is considered available when required, and maintainability is a measure of the ease with which a system can be restored to operation after a failure. Safety (in the context of this article and RAMS) is the condition of being protected from danger, risk, or injury by the safety function of an asset. For example, a signalling interlocking ensures that signals and points act together for the safe routing and movement of trains.
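The first two definitions can be turned into numbers with a minimal sketch. It assumes a constant failure rate (so reliability follows an exponential curve) and uses a mean time between failures (MTBF) figure; neither assumption comes from the text above, and the function names and figures are purely illustrative.

```python
import math


def reliability(mission_hours: float, mtbf_hours: float) -> float:
    """Probability of no failure over mission_hours.

    Assumes a constant failure rate of 1 / mtbf_hours (an assumption,
    not something every real asset satisfies).
    """
    return math.exp(-mission_hours / mtbf_hours)


def observed_availability(required_hours: float, downtime_hours: float) -> float:
    """Fraction of the required time during which the system was available."""
    return (required_hours - downtime_hours) / required_hours


if __name__ == "__main__":
    # Illustrative figures only, not taken from any real asset.
    print(f"Reliability over 24 h: {reliability(24, 10_000):.4f}")
    print(f"Availability over a year: {observed_availability(8_760, 12):.4%}")
```

Multiplying the availability by 100 gives the percentage of time referred to above.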
There are many academic and detailed papers explaining RAMS with calculations that can become quite complex. However, the principle is that good availability is delivered by good reliability and maintainability, and the safety function of a safety critical or safety related asset is dependent on good RAM. A RAMS study should be conducted in the early stages of a project and be reviewed and updated as the project progresses. This will identify targets for RAMS, together with any significant causes of loss of availability or the safety function. The study will also identify improvements to the design or maintenance regime to achieve the identified targets.
The relationship between reliability, availability, maintainability, and functional safety is shown in Figure 1. Availability is at the top of the first triangle, as it is dependent on reliability and maintainability. Consider the availability of an asset that is very reliable but has poor maintainability. This could be for a variety of reasons, such as poor access or location, no competent staff available, no working spare parts, or it may be an asset where it is difficult to identify what has actually failed. In extreme cases it could be that a spare part or a competent person has to be flown from another part of the world (and this has occurred on more than one occasion). So, in this scenario the availability target could be missed by a significant margin.
Now let’s consider another extreme example of an asset which is very unreliable, but has excellent maintainability, with many competent, knowledgeable technicians and engineers readily available and with many spare parts at hand which are easy to change. In this scenario it could be that an asset with poor reliability has good maintainability and, therefore, acceptable availability. ‘A’ also stands for Affordability and, generally, good maintainability will cost more in terms of competent staff, spares, and support contracts, so affordability is another factor that has to be included in any RAMS study.
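The two extreme cases above can be compared with a short sketch. It uses the common steady-state approximation, availability = MTBF / (MTBF + MTTR), where MTTR is taken here as the mean time to restore, including any wait for staff and spares. The formula choice and every figure below are assumptions made for illustration, not data from a real asset.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Long-run fraction of time in service: MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical asset 1: fails rarely, but each failure takes three days
# to fix because a spare has to be flown in.
reliable_but_hard_to_fix = steady_state_availability(mtbf_hours=20_000, mttr_hours=72)

# Hypothetical asset 2: fails often, but spares and technicians are on
# hand and it is back in service within an hour.
unreliable_but_easy_to_fix = steady_state_availability(mtbf_hours=500, mttr_hours=1)

if __name__ == "__main__":
    print(f"Reliable, poor maintainability:  {reliable_but_hard_to_fix:.4%}")
    print(f"Unreliable, good maintainability: {unreliable_but_easy_to_fix:.4%}")
```

With these made-up figures the unreliable asset comes out slightly ahead on availability, although it fails forty times as often. The sketch says nothing about what that level of maintenance cover costs, which is where affordability comes in.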
If an asset is providing a safety function, then its RAM must be of an acceptable level to ensure the safety function is delivered when required and any failure of the asset results in a safe state. So, for example, a failed track circuit must return the protecting signal to red. How reliability and maintainability affect availability is shown in simple terms in Figure 2.
So, how can reliability be improved to deliver better availability? It’s vital to identify why an asset fails, so that something can be done to improve its reliability. A root cause analysis should be carried out, with tools such as Failure Mode and Effects Analysis (FMEA). Asset managers and maintainers need to continually review, revise, and learn from failure reports, which could also involve independent forensic investigations from specialist engineering experts. Some Original Equipment Manufacturers (OEMs) may be defensive when cooperating with third party investigations, but any independent analysis of a problem should be welcomed. Cooperation and collaboration are how we will deliver a better railway for all.
Higher quality components could also be looked at to improve reliability, although with safety related and safety critical assets this could involve a safety case and design change, with appropriate verification and validation, and testing.
EMC immunisation should always be checked out if any electronic assets are failing for no apparent reason. Many railway electronic assets are old and were designed before modern immunisation standards were in place. It is not unknown for new assets to be installed in existing equipment rooms which, while complying with modern standards, can cause older equipment to fail. With more and more modern electronic equipment installed in older equipment rooms with poor immunisation protection, this problem may increase in the future.
Frequency of maintenance
Reliability Centred Maintenance (RCM) is one tool available to asset managers. Rather than being based on a period of time or mileage, RCM sets the frequency of maintenance interventions to take into account the criticality of the asset with respect to its function and historic reliability. So, if an asset is likely to fail more often, and the failure is going to adversely affect availability, it should be checked more often.
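The idea that criticality and historic reliability drive the frequency of checks can be illustrated with a small sketch. This is not the RCM process itself, which is a structured analysis; it is one hypothetical rule that assumes a constant failure rate and picks the longest interval for which the chance of a failure before the next visit stays below a chosen limit. The function name, the MTBF figure, and the probability limits are all assumptions.

```python
import math


def inspection_interval_hours(mtbf_hours: float, max_failure_probability: float) -> float:
    """Longest interval with P(failure before next visit) <= the given limit.

    Assumes a constant failure rate, so P(failure by t) = 1 - exp(-t / MTBF).
    Solving for t gives t = -MTBF * ln(1 - p).
    """
    if not 0 < max_failure_probability < 1:
        raise ValueError("max_failure_probability must be between 0 and 1")
    return -mtbf_hours * math.log(1 - max_failure_probability)


if __name__ == "__main__":
    historic_mtbf = 8_000  # hours, illustrative only
    # A more critical asset is given a tighter limit, so it is visited more often.
    for label, limit in [("critical", 0.01), ("less critical", 0.10)]:
        hours = inspection_interval_hours(historic_mtbf, limit)
        print(f"{label}: inspect every {hours:.0f} h")
```

Under this rule a lower historic MTBF or a tighter limit both shorten the interval, which matches the principle described above.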
Increasing redundancy in the design of a system will also improve reliability. If designed properly with hot standby or load sharing, when a vital asset fails another one seamlessly takes over. Processes need to be in place to detect the non-service-affecting failure and resolve it, otherwise reduced availability is simply being delayed, rather than the failure being mitigated. Sometimes it may require a healthy asset to be taken out of service to resolve the issue and operators may be unwilling to assist, because as far as they are concerned there is no failure.
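A rough sketch shows why redundancy helps and why an unrepaired standby quietly removes the benefit. It assumes identical units that fail independently and a changeover that always works; real systems have common-cause failures and imperfect switching, so the result is an upper bound. The function name and the 99% figure are illustrative.

```python
def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability when any one of `units` identical units can carry the service.

    Assumes independent failures and perfect changeover (both are
    simplifying assumptions).
    """
    return 1 - (1 - unit_availability) ** units


if __name__ == "__main__":
    single = parallel_availability(0.99, 1)
    duplicated = parallel_availability(0.99, 2)
    print(f"Single unit:     {single:.4f}")
    print(f"Duplicated pair: {duplicated:.4f}")
    # If the failed half of the pair is never detected and repaired, the
    # system is effectively running on one unit again, i.e. `single`.
```

The gain from the second unit only exists while both units are healthy, which is why detecting and fixing the hidden failure matters.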
Signalling interlockings are provided with redundancy through several Safety Integrity Level (SIL) processors. The interlocking systems have to be carefully designed and tested so that if they do fail, they fail safely and do not create unsafe train paths.
Detect likely failures
Remote condition monitoring is increasingly being used in many engineering disciplines to detect likely failures before they occur, so that an intervention can take place before an asset fails. Greater remote monitoring with more functionality and more intelligent infrastructure enables technicians to safely inspect and predict ‘work arising’ earlier and more accurately. This results in fewer people working on the track ‘at risk’, and provides the ability to plan work earlier and in safety, and deliver a more reliable railway.
A good example of the benefits of remote condition monitoring is track circuit monitoring. With intermittent faults it can be difficult to identify the root cause, which can result in time-consuming attempts at fault finding and incorrect ‘fixes’ being applied. Remote condition monitoring of track circuits enables prompt detection of intermittent faults and the correct identification of the root cause prior to attendance on site.
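As a simplified illustration of how logged data can expose an intermittent fault, the sketch below counts how often a monitored value dips below a threshold and then recovers. The readings, the threshold, and the function names are hypothetical; a real track circuit monitoring product will use its own measurements and far richer analysis.

```python
from typing import Sequence


def count_dropouts(readings: Sequence[float], threshold: float) -> int:
    """Count the number of times the value falls from healthy to below threshold.

    A series that starts below the threshold counts as one dropout.
    """
    dropouts = 0
    previously_healthy = True
    for value in readings:
        healthy = value >= threshold
        if previously_healthy and not healthy:
            dropouts += 1
        previously_healthy = healthy
    return dropouts


def needs_investigation(readings: Sequence[float], threshold: float, max_dropouts: int) -> bool:
    """Flag the asset when dropouts in the logged period exceed the allowed count."""
    return count_dropouts(readings, threshold) > max_dropouts


if __name__ == "__main__":
    # Made-up normalised readings: three brief dips that each recover.
    log = [1.0, 1.0, 0.2, 1.0, 0.3, 0.3, 1.0, 0.1, 1.0]
    print(count_dropouts(log, threshold=0.5))
    print(needs_investigation(log, threshold=0.5, max_dropouts=2))
```

Each dip here clears on its own, so a technician arriving on site might find nothing wrong; the logged count is what reveals the pattern.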
There are examples where too much data and false positives can overload operators. One answer is to use Artificial Intelligence (AI) to create useable information, rather than just collect raw data, and this is one area where AI has much to offer asset management and improving reliability in the future. AI could one day automatically instigate a manual intervention or may even be able to deliver a robotic repair.
An earthworks example of remote condition monitoring is the Insight Earthworks Monitoring system by L.B. Foster. This won the Equipment Innovation category at the GE Awards in 2021. The system uses LiDAR technology and was used on a project for Network Rail in Gloucestershire. Little Hagloe is on the coastal rail line adjacent to the River Severn in Gloucestershire. The railway runs along the bottom of a steep embankment and has a history of failures of the cutting slope. The Insight LiDAR units are now providing real-time monitoring of slope integrity at critical sites along the line.
When an asset does fail, its maintainability is the ease with which it can be restored to operation. So, competent, well-trained staff equipped with the right tools and spares must be readily available. Therefore, good training to enable technicians to fault the asset is essential. A good project should engage with the maintenance organisation at an early stage to identify and scope the required training and, just as important, the maintainer needs to communicate with the project during its development.
Good documentation/faulting guides can also assist technicians with the maintainability and faulting of systems, and it is important that projects communicate and consult with the maintainer throughout the project development to ensure they are provided with the right assistance to fault the system. This is not always the case and sometimes generic training courses provided right at the end of the project may not provide what is required to deliver good availability. It is too late to start thinking of the training requirements just before the commissioning of a new system. Similarly, test equipment, spares, and documentation, including faulting guides, should be addressed at an early stage of any project.
It is somewhat ironic that a reliable system will not provide technicians and engineers the opportunity to practise their faulting skills. Therefore, further training throughout the system’s asset life should be considered, both for current and new staff.
Siemens Surelock – the successor to the Type 63 point machine, widely used on London Underground and designed with maintainability in mind.
- Four easily identifiable modules.
- Red is screw drive, blue is motor, yellow is detection and control, black is escapement.
- All the electrical connections are plug-coupled. Each module is light and easily carried.
- Remote condition monitoring can be provided with a plug coupler to the outside world.
- Fitted with a ‘mechanical fuse’, so if the points are run through, this breaks before any damage is done to the point machine.
- Every bolt that needs to be removed is the same size – so only one spanner required.
Consideration should also be given to establishing a test/training rig for staff to practise their fault-finding skills and to receive training on the system, along with competency assessment. This could be located at the maintainer’s training school, on site, or at the OEM premises. This will also be useful for the OEM or a third party to develop and test replacements for obsolete parts during the life of the asset. Test/training rigs will not be cheap but could be invaluable in ensuring systems are maintainable throughout their life.
Access to the system must also be considered during its design, both for good access to the site by technicians with tools, test equipment, and spares; and to easily access any parts that may need replacement. So, equipment cubicles may need front, rear, and side access. It should be easy to disconnect electrical connections and remove items with standard tools. Remote diagnostic alarms and analysis of systems will also aid maintainability. Ideally, if people have to go to site they should be provided with details of the intervention required well before they arrive.
The maintenance organisation will also need to carefully plan and optimise the faulting cover to aid maintainability. This needs to make allowance for travelling to site at all times of the day and week. An escalation process will also be required, both for telephone and on-site assistance, 24/7. This could include in house, third party, and OEM support. No matter how good and competent a person is, it always helps to seek another independent opinion, especially with difficult and safety related/critical tasks. There is an engineer’s phrase: “If you plan for the worst, you may not need it; but if you don’t plan for the worst, you will definitely need it!”
“Are you building it right?”
Independence is also an important part of design and testing, especially with safety critical assets. One important area of independence in project development and implementation is verification and validation. This is sometimes carried out by specialist verification and validation engineers, who will assess RAMS among other project deliverables.
Verification is intended to check that a product, service, or system meets a set of design specifications. It asks “Are you building it right?” and is a process used to evaluate whether a product, service, or system complies with standards, regulations, and specifications.
Validation is intended to ensure that a product, service, or system meets the operational needs of the user, which must also include the maintainer. “Are you building the right thing?” Identifying design problems and solving them as early in the project as possible is the key to keeping projects on time and within budget, and providing systems that are reliable and maintainable, as well as delivering their safety function.
When applied appropriately, from the early stages and throughout development and implementation of a project, RAMS modelling is an effective tool for assessing system reliability, availability, maintainability, and functional safety. RAMS calculations can become quite complex because of inter-dependence between the various elements, but good RAMS modelling is crucial to support the whole life availability and viability of projects.