Text and illustrations: Przemyslaw Domel
The Vestnesa Ridge, an active fluid flow system in the Fram Strait offshore Svalbard, continuously releases methane from the seafloor. Systems such as this can be investigated by a variety of methods, one of which is monitoring seismic signals. Since we are recording data deep underwater, more than 100 km from land, we need instruments deployed locally to record all the seismic “murmurs” that tell us which areas are active and how this activity changes over time. For this purpose, we deployed instruments called ocean bottom seismometers (OBS) from the deck of the ship, directly in areas with known methane release. The OBSs sink freely to the seafloor and continuously record signals connected to geological processes. However, these are not the only signals they record. The ocean is not a huge reservoir of calm water: in this part of the Arctic, underwater currents “shake” the instruments and generate large amounts of noise (“tremor”). Other signals, unwanted by us but greatly appreciated in general, come from marine mammals singing to each other (e.g., blue and fin whales). Because we usually have many instruments recording at the same time, and we try to record at least a year’s worth of data, detecting local seismicity becomes a difficult task, and we first need a way to filter the data.
We tried to devise a method of recognizing these signals using the ever-so-popular field of machine learning, specifically an algorithm called Random Forest. This method is relatively simple and relies on a large number of objects called “decision trees”. A decision tree is a sequence of simple conditional statements; the simplest one consists of a single rule, for instance: if a signal lasts longer than 5 seconds, it is an earthquake; otherwise, it is noise.
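A single rule of this kind (label a signal longer than 5 seconds as an earthquake, anything shorter as noise) can be stored as a tiny tree and evaluated by walking from the root to a leaf. The Python sketch below is only an illustration: the `Node` and `Leaf` classes, the `predict` function and the `duration_s` feature name are hypothetical, and the 5-second threshold is the illustrative rule discussed here, not a parameter from the study.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str          # final category, e.g. "earthquake" or "noise"


@dataclass
class Node:
    feature: str        # hypothetical name of the metric to test
    threshold: float    # split value
    left: Tree          # branch taken when value <= threshold
    right: Tree         # branch taken when value > threshold


Tree = Union[Leaf, Node]


def predict(tree: Tree, features: dict[str, float]) -> str:
    # Follow one conditional per node until a leaf is reached.
    while isinstance(tree, Node):
        tree = tree.right if features[tree.feature] > tree.threshold else tree.left
    return tree.label


# Illustrative one-rule tree: longer than 5 s -> earthquake, otherwise noise.
duration_rule = Node("duration_s", 5.0, left=Leaf("noise"), right=Leaf("earthquake"))

print(predict(duration_rule, {"duration_s": 12.0}))
print(predict(duration_rule, {"duration_s": 3.0}))
```

Deeper trees are built the same way, by placing further `Node` objects where the leaves are.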

This is only an example, and obviously one such statement will not help us much, since there can be earthquakes shorter than 5 seconds, and noise can last longer than that. But Random Forest works with hundreds, or even thousands, of such trees. To train it to recognize different signals, we manually selected several hundred examples of earthquakes, micro-seismic signals (referred to as “SDE”) and noise. For each example, we calculated almost 200 individual metrics related to signal duration, amount of energy, frequency content, polarization, etc. During training, the Random Forest algorithm grows each tree from a random sample of the training examples, considering only a random subset of these parameters at each split, so that the resulting set (“forest”) of decision trees fits the training data as well as possible. Based on the combined answers of all the decision trees (“voting”), it assigns each signal to one of the final categories we created: earthquake, SDE or noise. In our recently published article, we show that this type of algorithm works very well with seismological data from OBSs. We used metrics that had previously been shown to give accurate results in landslide recognition and volcano monitoring. The diagram below shows how well the trained model recognized all the signals it was trained on, along with examples of the different signals we tried to differentiate.
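To make the training and voting steps more concrete, here is a toy random forest written from scratch in Python. It is a minimal sketch, not the study's implementation: each tree is grown on a bootstrap sample (examples drawn with replacement), each split considers a random subset of the features and keeps the threshold with the lowest Gini impurity, and the forest's answer is the majority vote of its trees. The three features (duration, energy, dominant frequency) and all their values are invented stand-ins for the roughly 200 metrics used in the study, and function names such as `fit_forest` and `predict_forest` are hypothetical.

```python
import random
from collections import Counter

# Hypothetical feature vectors (duration_s, energy, dominant_freq_hz) with labels.
# All numbers are invented for illustration; they are not data from the study.
TRAIN = [
    ((20.0, 50.0, 6.0), "earthquake"), ((22.0, 60.0, 7.0), "earthquake"),
    ((25.0, 70.0, 7.5), "earthquake"), ((28.0, 85.0, 8.0), "earthquake"),
    ((30.0, 100.0, 9.0), "earthquake"),
    ((0.3, 5.0, 20.0), "SDE"), ((0.4, 5.5, 22.0), "SDE"),
    ((0.5, 6.0, 25.0), "SDE"), ((0.6, 7.0, 28.0), "SDE"), ((0.8, 8.0, 30.0), "SDE"),
    ((100.0, 0.1, 1.0), "noise"), ((110.0, 0.2, 1.2), "noise"),
    ((120.0, 0.3, 1.5), "noise"), ((135.0, 0.4, 1.8), "noise"), ((150.0, 0.5, 2.0), "noise"),
]


def gini(labels):
    """Gini impurity: 0 for a pure group, larger when labels are mixed."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels, feature_ids):
    """Try midpoints between sorted distinct values of each candidate feature;
    return (weighted impurity, feature index, threshold) of the best split."""
    best = None
    for f in feature_ids:
        values = sorted({row[f] for row in rows})
        for lo, hi in zip(values, values[1:]):
            threshold = (lo + hi) / 2
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            score = len(left) * gini(left) + len(right) * gini(right)
            if best is None or score < best[0]:
                best = (score, f, threshold)
    return best


def build_tree(rows, labels, depth, max_depth, max_features, rng):
    """A leaf is a label string; an internal node is
    (feature index, threshold, left subtree, right subtree)."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority
    # Only a random subset of the features is considered at each split.
    feature_ids = rng.sample(range(len(rows[0])), max_features)
    split = best_split(rows, labels, feature_ids)
    if split is None:  # the chosen features are constant here, so stop
        return majority
    _, f, threshold = split
    go_left = [row[f] <= threshold for row in rows]
    left = build_tree([r for r, g in zip(rows, go_left) if g],
                      [y for y, g in zip(labels, go_left) if g],
                      depth + 1, max_depth, max_features, rng)
    right = build_tree([r for r, g in zip(rows, go_left) if not g],
                       [y for y, g in zip(labels, go_left) if not g],
                       depth + 1, max_depth, max_features, rng)
    return (f, threshold, left, right)


def predict_tree(node, x):
    """Follow one branch per conditional until a leaf label is reached."""
    while isinstance(node, tuple):
        f, threshold, left, right = node
        node = left if x[f] <= threshold else right
    return node


def fit_forest(data, n_trees=25, max_depth=3, max_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap sample: draw len(data) examples with replacement.
        sample = [rng.choice(data) for _ in data]
        rows = [x for x, _ in sample]
        labels = [y for _, y in sample]
        forest.append(build_tree(rows, labels, 0, max_depth, max_features, rng))
    return forest


def predict_forest(forest, x):
    """Each tree casts one vote; the most common label wins."""
    votes = Counter(predict_tree(tree, x) for tree in forest)
    return votes.most_common(1)[0][0]


forest = fit_forest(TRAIN)
for signal in [(24.0, 75.0, 7.0), (0.45, 6.5, 24.0), (125.0, 0.25, 1.6)]:
    print(signal, "->", predict_forest(forest, signal))
```

Majority voting follows Breiman's original formulation of Random Forest; some libraries, for example scikit-learn's `RandomForestClassifier`, instead average the class probabilities predicted by the individual trees.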

The study is published in Geophysical Journal International: https://doi.org/10.1093/gji/ggad244