Aircraft Engine Failure Prediction

A Classification Case Study

Aircraft are an important part of the modern age. The number of passengers traveling by air had been increasing every year (until Covid happened). In 2019, the global airline industry carried over 4.54 billion scheduled passengers, according to Statista. So the safety of aircraft passengers is of paramount importance.

It is crucial that aircraft engines undergo proper maintenance. Routine maintenance on a fixed schedule can be very expensive. Predictive maintenance is an effective alternative that can reduce cost. It is also called condition-based maintenance, because the degrading state of an item is estimated in order to schedule its maintenance. Machine learning and deep learning are widely used for predictive maintenance.

Photo by Josue Isai Ramos Figueroa on Unsplash


The dataset is provided by NASA in the form of text files and can be downloaded here. The train set consists of run-to-failure data for 100 aircraft engines. The test set consists of operating data for 100 aircraft engines without the failure events recorded. The RUL (remaining useful life) files hold the number of remaining cycles for each engine in the test set. The dataset has four different sets of these files, simulated under different combinations of operational conditions and fault modes.

Dataset Files

I have used the first set, FD001, for this case study. The engine degradation simulation was carried out using C-MAPSS, which stands for ‘Commercial Modular Aero-Propulsion System Simulation’. C-MAPSS simulates realistic data for large commercial turbofan engines.
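As a rough sketch of how one of these text files could be read with pandas: the files are whitespace-separated and have no header row, so the column names below (`id`, `cycles`, `setting1`..`setting3`, `s1`..`sN`) are labels I am assuming for illustration, and `load_cmapss` is a hypothetical helper, not part of any library.

```python
import io
import pandas as pd

def load_cmapss(source):
    """Read a whitespace-separated C-MAPSS style file into a DataFrame.

    The column names are illustrative; the raw files carry no header.
    """
    df = pd.read_csv(source, sep=r"\s+", header=None)
    n_sensors = df.shape[1] - 5  # everything after id, cycles and 3 settings
    df.columns = (
        ["id", "cycles", "setting1", "setting2", "setting3"]
        + [f"s{i}" for i in range(1, n_sensors + 1)]
    )
    return df

# Tiny made-up sample in the same layout (not real C-MAPSS values).
sample = io.StringIO(
    "1 1 0.1 0.2 100.0 518.6 641.8\n"
    "1 2 0.3 0.1 100.0 518.6 642.1\n"
    "2 1 0.2 0.4 100.0 518.6 642.3\n"
)
demo = load_cmapss(sample)
print(demo.head())
```

For the real data, a file path would be passed in place of the in-memory sample.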

Problem Statement

For this dataset, the problem can be posed as regression, binary classification, or multi-class classification. In this case study, binary classification is done: the code predicts whether the engine will fail within the next 30 cycles. Class label 1 means it will fail within the next 30 cycles and class label 0 means it won’t. These labels are not given in the dataset; they are generated by the code.
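One way to generate such labels for run-to-failure data is sketched below. It assumes a DataFrame with `id` and `cycles` columns, takes the last recorded cycle of each engine as the failure point, and treats a remaining life of exactly 30 cycles as "will fail" (whether that boundary is inclusive is my assumption). The helper name `add_failure_label` is hypothetical.

```python
import pandas as pd

def add_failure_label(train, horizon=30):
    """Add remaining cycles (RUL) and a binary 'fails within horizon' label."""
    out = train.copy()
    # In run-to-failure data the last cycle of each engine is its failure.
    last_cycle = out.groupby("id")["cycles"].transform("max")
    out["RUL"] = last_cycle - out["cycles"]
    out["label30"] = (out["RUL"] <= horizon).astype(int)
    return out

# Toy engine with 5 cycles and a horizon of 2, so the effect is easy to see.
toy = pd.DataFrame({"id": [1] * 5, "cycles": [1, 2, 3, 4, 5]})
labeled = add_failure_label(toy, horizon=2)
print(labeled)
```

In the toy example the remaining cycles run 4, 3, 2, 1, 0, so only the last three rows get label 1.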

Because it is more important to correctly flag an engine that is going to fail than to avoid false alarms, recall is used as the performance metric in this case study.
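For reference, recall is the share of actual failures that the model catches: true positives divided by true positives plus false negatives. A small plain-Python sketch with made-up labels:

```python
def recall(y_true, y_pred):
    """Fraction of actual positives (label 1) that were predicted as 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else 0.0

# Made-up example: 4 engines are about to fail, the model catches 3 of them.
y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]
score = recall(y_true, y_pred)
print(score)  # 0.75
```

Note that the false alarm in the fifth position does not lower recall; it would lower precision instead.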

Features Description

The given train and test files have 28 features.

Feature Description

In the train set, the ‘cycles’ column has increasing values for every ID. The last value of ‘cycles’ for a particular engine ID represents the failure of that engine. The test set does not have run-to-failure data, so the last value of ‘cycles’ for a particular engine ID does not represent the failure of that engine. Instead, the corresponding RUL file tells how many more cycles are left before failure for each engine in the test data.

For example, in the train data, ID 1 has cycle values from 1 to 192, which means the engine failed after 192 cycles. In the test set, ID 1 has cycle values from 1 to 31 and its corresponding entry in the RUL file is 112. It means that after finishing the 31 cycles given in the test set, the engine will run another 112 cycles. Sensors 22 and 23 have all null values, so these columns are removed.
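The test-set example above can be turned into a per-row remaining-cycle count by adding the RUL file entry to the cycles still left in the recording. The sketch below assumes the RUL values are ordered by engine ID (one value per engine) and uses the hypothetical helper name `add_test_rul`.

```python
import pandas as pd

def add_test_rul(test, rul_values):
    """Attach remaining cycles to every test row.

    rul_values: one remaining-cycle count per engine, assumed to be
    ordered by engine id.
    """
    out = test.copy()
    ids = sorted(out["id"].unique())
    remaining_after_last = dict(zip(ids, rul_values))
    last_cycle = out.groupby("id")["cycles"].transform("max")
    out["RUL"] = last_cycle - out["cycles"] + out["id"].map(remaining_after_last)
    return out

# Engine 1 from the example: 31 recorded cycles, 112 cycles left afterwards.
test_demo = pd.DataFrame({"id": [1] * 31, "cycles": list(range(1, 32))})
with_rul = add_test_rul(test_demo, [112])
print(with_rul.iloc[[0, -1]])
```

At cycle 31 the remaining life is 112, and at cycle 1 it is 30 + 112 = 142.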

Distribution of Number of cycles to failure

Histograms are plotted for the distribution of the number of cycles to failure. It’s somewhat bell shaped (or not?). In the train set, around 12 engines failed at about 200 cycles, which is the peak of this distribution. Very few engines failed after 300 cycles. In the test set, where the cycles are recorded counts rather than failures, the peak is somewhere around 70, which is different from the peak of the train distribution.

Some observations about the minimum and maximum number of cycles:

1. In the train set, the engine with id=69 took the maximum number of cycles to fail, i.e. 362 cycles.

2. In the train set, the engine with id=39 took the minimum number of cycles to fail, i.e. 128 cycles.

3. In the test set, the engine with id=49 has the maximum number of recorded cycles, i.e. 303 cycles.

4. In the test set, the engine with id=1 has the minimum number of recorded cycles, i.e. 31 cycles.
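Observations like these can be read off with a group-by on the engine ID. A minimal sketch, assuming `id` and `cycles` columns and using made-up data (`cycle_extremes` is a hypothetical helper):

```python
import pandas as pd

def cycle_extremes(df):
    """Return (engine id, cycle count) for the longest and shortest runs."""
    per_engine = df.groupby("id")["cycles"].max()
    return {
        "longest": (int(per_engine.idxmax()), int(per_engine.max())),
        "shortest": (int(per_engine.idxmin()), int(per_engine.min())),
    }

# Made-up data: engine 3 runs for 4 cycles, engine 2 for only 2.
demo = pd.DataFrame({
    "id":     [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "cycles": [1, 2, 3, 1, 2, 1, 2, 3, 4],
})
extremes = cycle_extremes(demo)
print(extremes)
```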

Data Pre-processing

An LSTM is used for classification. An LSTM takes a series of fixed-length sequences as input, and the sequence length is an important hyperparameter. After some experimentation, a length of 70 was found to work well for classification. But the engines with IDs 1, 85, 39, 22, 14, 25, 2, 33, 69, 44, 9, 87, 71 and 88 have fewer than 70 cycles in the test set, so the records with these IDs are removed. The train and test sets are normalized using MinMaxScaler from sklearn.
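A sketch of these two steps, under some assumptions of mine: engines are dropped when they have fewer rows than the sequence length, and the scaler is fitted on the train set only and then applied to the test set (the article does not say how the scaler was fitted). The helper names are hypothetical and the data is made up.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

def drop_short_engines(df, seq_len=70):
    """Keep only engines with at least seq_len recorded cycles."""
    lengths = df.groupby("id")["cycles"].transform("count")
    return df[lengths >= seq_len].copy()

def scale_features(train, test, feature_cols):
    """Fit MinMaxScaler on train features and apply it to both sets."""
    scaler = MinMaxScaler()
    train, test = train.copy(), test.copy()
    train[feature_cols] = scaler.fit_transform(train[feature_cols])
    test[feature_cols] = scaler.transform(test[feature_cols])
    return train, test

# Made-up data with a sequence length of 3 to keep the example small.
train_demo = pd.DataFrame({
    "id": [1, 1, 1], "cycles": [1, 2, 3], "s1": [0.0, 5.0, 10.0],
})
test_demo = pd.DataFrame({
    "id": [1, 1, 1, 2, 2], "cycles": [1, 2, 3, 1, 2],
    "s1": [2.5, 5.0, 20.0, 1.0, 2.0],
})
test_kept = drop_short_engines(test_demo, seq_len=3)   # engine 2 is dropped
train_scaled, test_scaled = scale_features(train_demo, test_kept, ["s1"])
print(test_scaled)
```

With a scaler fitted on train only, test values can fall outside the 0 to 1 range, as the 20.0 reading does here.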

Some features have zero std deviation

Both the train and test sets contain 7 columns with zero standard deviation.

Pre-processing steps:

1. The columns with zero standard deviation are removed.

2. The class label column ‘label30’ is generated. It is 1 when the engine is going to fail within the next 30 cycles and 0 otherwise.

3. Now we have 18 useful features and a sequence length of 70. A 3D array is created from each of the train and test sets as input to the LSTM. This array has dimensions (13731, 70, 18) for the train set and (6473, 70, 18) for the test set.

4. Arrays of train labels and test labels are generated with sizes (13731, 1) and (6473, 1) respectively.
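The steps above can be sketched roughly as follows. This is my reading of the windowing, not necessarily the exact code behind the article: a window of `seq_len` consecutive rows slides over each engine, so an engine with n cycles yields n - seq_len + 1 windows, and each window takes the label of its last row (that choice is an assumption). Column and helper names are hypothetical, and the demo uses made-up data with a sequence length of 3.

```python
import numpy as np
import pandas as pd

def drop_constant_columns(df, feature_cols):
    """Remove feature columns whose standard deviation is zero."""
    constant = [c for c in feature_cols if df[c].std() == 0]
    kept = [c for c in feature_cols if c not in constant]
    return df.drop(columns=constant), kept

def make_sequences(df, feature_cols, seq_len=70, label_col="label30"):
    """Build a (windows, seq_len, features) array and a (windows, 1) label array."""
    xs, ys = [], []
    for _, engine in df.groupby("id"):
        engine = engine.sort_values("cycles")
        values = engine[feature_cols].to_numpy()
        labels = engine[label_col].to_numpy()
        for end in range(seq_len, len(engine) + 1):
            xs.append(values[end - seq_len:end])
            ys.append(labels[end - 1])  # label of the window's last cycle
    return np.array(xs), np.array(ys).reshape(-1, 1)

demo = pd.DataFrame({
    "id":      [1, 1, 1, 1, 1, 2, 2, 2, 2],
    "cycles":  [1, 2, 3, 4, 5, 1, 2, 3, 4],
    "s1":      [0.1, 0.2, 0.3, 0.4, 0.5, 0.2, 0.3, 0.4, 0.5],
    "s2":      [1.0] * 9,  # constant, so it gets removed
    "s3":      [5.0, 4.0, 3.0, 2.0, 1.0, 4.0, 3.0, 2.0, 1.0],
    "label30": [0, 0, 0, 1, 1, 0, 0, 1, 1],
})
reduced, kept = drop_constant_columns(demo, ["s1", "s2", "s3"])
X, y = make_sequences(reduced, kept, seq_len=3)
print(X.shape, y.shape)
```

Here the two engines have 5 and 4 cycles, so they contribute 3 + 2 = 5 windows of 3 steps with 2 features each.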

Behavior of Sensor output towards the end of life

The values of all settings and sensors are plotted to observe how they behave when the engine is about to fail.

Variations in settings and sensor data (ID=1 from train set)

As can be seen from the above plot:

1. As the engine gets closer to failure, the waveforms get closer to each other and overlap.

2. Then they spread out again before failure.

Variations in settings and sensor data (ID=19 from train set)

The same can be seen in another plot, for the engine with ID=19.

Variations in settings and sensor data (ID=99 from train set)

Same same!!!

Now, for the test set, two engines are chosen for plotting: one that is close to the end of its life and one that is not.

Variations in settings and sensor data (ID=20 from test set)

This engine from the test set has only 20 cycles left before it fails, hence the same pattern is observed in this plot.

Variations in settings and sensor data (ID=96 from test set)

But this one from the test set has 137 cycles left before failure. It is far away from failure. Hence it’s quite quite different from the previous plots. (Microsoft Word shows red under the second quite, but my school teacher used to say quite two times. Hence I am gonna keep it!!!)

1. The waveforms do not get closer to each other and overlap.

2. The frequency (number of variations in a given time) of the waveforms is also lower when they are far from failure.


A sequential LSTM model is built with the Adam optimizer and sigmoid as the activation function. L2 regularization is used.
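A model of this kind could look roughly like the Keras sketch below. Only the ingredients named above (a sequential model, an LSTM, Adam, a sigmoid activation, L2 regularization) come from the article. The single LSTM layer, the number of units, the L2 strength, the binary cross-entropy loss and the placement of the sigmoid on the output layer are assumptions of mine, and `build_model` is a hypothetical helper.

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

def build_model(seq_len=70, n_features=18, units=32, l2=1e-3):
    """Small LSTM binary classifier; sizes here are illustrative guesses."""
    model = keras.Sequential([
        keras.Input(shape=(seq_len, n_features)),
        # L2 penalty on the LSTM input weights (strength is an assumption).
        layers.LSTM(units, kernel_regularizer=regularizers.l2(l2)),
        # Sigmoid output gives the probability of failure within 30 cycles.
        layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer="adam",
        loss="binary_crossentropy",
        metrics=[keras.metrics.Recall()],
    )
    return model

model = build_model()
model.summary()
```

Tracking `Recall` as a metric during training matches the choice of recall as the performance metric for this case study.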

Model Summary

This model obtains a recall of 0.9082. The complete code can be found on my GitHub.


This is a very interesting dataset and a popular one. It addresses a real-world problem that really matters, one directly related to the lives of passengers. I don’t claim to have given the best solution, but I really enjoyed solving it. It is amazing to see that deep learning networks learn the patterns without any feature engineering.

Photo by Emiel Molenaar on Unsplash

This is my first Medium article. All suggestions and feedback are very welcome. There is nothing like learning together.

This was just one of the approaches to scheduling predictive maintenance of aircraft engines. I also intend to pose this as a regression problem, predicting the RUL for the test data with the same dataset, and that might be my next Medium article.

Stay tuned!!!

I am a hands-on guy. I appreciate the beauty of theory but understand its futility without application. ML, DL and computer vision are my interests.
