Submissions for the LANL Earthquake Prediction Kaggle competition: https://www.kaggle.com/c/LANL-Earthquake-Prediction
Given a segment of an acoustic signal, the task is to predict its 'time_to_failure', i.e. the time remaining before the next laboratory earthquake. Models are evaluated using Mean Absolute Error (MAE).
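For reference, MAE is the mean of the absolute differences between the true and predicted values, (1/n) Σ |y_i − ŷ_i|. The helper below is a minimal NumPy sketch of that formula; the name `mae` and the sample values are illustrative only. scikit-learn's `sklearn.metrics.mean_absolute_error` computes the same quantity.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean Absolute Error: the average of |y_true - y_pred|."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


# Made-up time_to_failure values, for illustration only.
print(mae([1.0, 2.5, 4.0], [1.5, 2.0, 4.0]))
```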
I will keep updating the repository with the best approach I have found so far. The repo is live and subject to change.
I used gradient boosting on several features extracted from the input acoustic data.
Data Engineering:
- Split the input signal into segments of 150,000 data points each, based on information given by the competition organisers (each test segment is 150,000 data points long).
- For each segment, several statistical features are calculated (the full list is in the notebook).
- The resulting training set consists of 4,194 of these segments.
- The features are then standardised using scikit-learn's `StandardScaler`.
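The steps above can be sketched as follows. This is a minimal illustration, not the notebook's code: the helper names `extract_features` and `make_segments`, the particular statistics, and the choice of labelling each segment with the 'time_to_failure' of its last sample are all assumptions. The demo runs on synthetic data with a shorter segment length so it finishes quickly.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

SEGMENT_LENGTH = 150_000  # data points per segment, as stated above


def extract_features(chunk):
    """Compute a few summary statistics for one segment.

    Hypothetical feature set for illustration; the notebook defines the real one.
    """
    x = np.asarray(chunk, dtype=float)
    return {
        "mean": x.mean(),
        "std": x.std(),
        "min": x.min(),
        "max": x.max(),
        "abs_mean": np.abs(x).mean(),
        "q01": np.quantile(x, 0.01),
        "q99": np.quantile(x, 0.99),
    }


def make_segments(signal, ttf, segment_length=SEGMENT_LENGTH):
    """Split the signal into non-overlapping segments and build (X, y).

    Assumption: each segment is labelled with the time_to_failure of its
    last sample; any trailing partial segment is dropped.
    """
    n_segments = len(signal) // segment_length
    rows, targets = [], []
    for i in range(n_segments):
        start = i * segment_length
        end = start + segment_length
        rows.append(extract_features(signal[start:end]))
        targets.append(ttf[end - 1])
    return pd.DataFrame(rows), np.asarray(targets)


# Synthetic demo data (not competition data), using a short segment length.
rng = np.random.default_rng(0)
signal = rng.normal(size=10_000)
ttf = np.linspace(10.0, 0.0, 10_000)
X, y = make_segments(signal, ttf, segment_length=1_000)

# Standardise each feature column to zero mean and unit variance.
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
print(X.shape, X_scaled.shape)  # (10, 7) (10, 7)
```

At prediction time, the same fitted scaler should be reused on the test features via `scaler.transform(...)` rather than refitted, so that both sets are standardised with the training mean and variance.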
Next, I train scikit-learn's `GradientBoostingRegressor` on the training data using K-fold cross-validation, saving the fitted estimator (model) from each fold. After examining the results, I choose the best estimator to make predictions on the test data. I used the Huber loss function because it is more robust to outliers than squared error, and in this case it appeared to give better approximations.
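A minimal sketch of the training loop described above, assuming scikit-learn's `KFold` with 5 shuffled folds and illustrative hyperparameters (`n_estimators`, `learning_rate`, `max_depth`) that are not taken from the notebook. Treating "best" as the fold model with the lowest validation MAE is also an assumption. With `loss="huber"`, scikit-learn uses a loss that is quadratic for small residuals and linear for large ones; the switch-over point is set by the `alpha` quantile parameter (default 0.9). The demo uses synthetic data.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import KFold


def train_kfold(X, y, n_splits=5, random_state=0):
    """Fit one Huber-loss GradientBoostingRegressor per fold.

    Returns the fold model with the lowest validation MAE (an assumed
    selection rule) plus the list of per-fold validation MAEs.
    Hyperparameters below are illustrative assumptions.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    kfold = KFold(n_splits=n_splits, shuffle=True, random_state=random_state)
    models, scores = [], []
    for train_idx, val_idx in kfold.split(X):
        model = GradientBoostingRegressor(
            loss="huber",  # quadratic for small residuals, linear for large ones
            n_estimators=100,
            learning_rate=0.05,
            max_depth=3,
            random_state=random_state,
        )
        model.fit(X[train_idx], y[train_idx])
        models.append(model)  # keep every fold's estimator
        val_pred = model.predict(X[val_idx])
        scores.append(mean_absolute_error(y[val_idx], val_pred))
    best = int(np.argmin(scores))
    return models[best], scores


# Synthetic demo data (not competition data).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 7))
y_demo = 3.0 * X_demo[:, 0] + rng.normal(scale=0.5, size=200)
best_model, fold_maes = train_kfold(X_demo, y_demo)
print([round(s, 3) for s in fold_maes])
```

The selected model would then be applied to the scaled test features, for example `best_model.predict(scaler.transform(X_test))`, where `scaler` and `X_test` stand for the fitted scaler and the test feature table (hypothetical names).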
This approach scored an MAE of 1.583 on the Public Leaderboard.