Hospitalized patients, especially the elderly, tend to weaken over time and are prone to never fully recovering [2]. An adequate exercise regimen during hospitalization can stop or even reverse that effect [3]. Sometimes, however, patients faint during remobilization. After a syncope, patients experience additional stress, which results in a lack of self-confidence and undermines their trust in the rehabilitation process. This could often be avoided by predicting whether a person is likely to faint. Such a patient could then be referred for further rehabilitation without being exposed to the stress associated with a collapse.
Several systems have been proposed for falls prediction [4]. Among others, there have been attempts to assess the syncope risk by applying machine learning (an isolation forest in particular) to the outcomes of cognitive and motor tests [5], or by using accelerometer data gathered by a wearable sensor [6].
The purpose of this work is to develop a model based on recurrent neural networks that would be capable of forecasting occurrences of syncope by using real-life cardiological time series. The collapse ideally should be predicted ahead of time, allowing the technician to interrupt the examination. The models should also be highly sensitive – patients who are certainly going to faint must be classified correctly, even at the cost of the overall accuracy.
Section 2 of the paper provides information about the dataset and preprocessing. Quality measures are described in Section 3, and the experimental setup and results are presented in Section 4. Finally, the conclusions of our research are presented in Section 5.
The provided data consisted of nearly 700 files. Each file was labeled either as „syncope” (indicating fainting during examination) or „no findings”, which for simplicity we will call „nosyncope” from now on. For each patient, several measurements were performed, which upon further investigation turned out to be strongly correlated. After consultations, we used only two signals (with sampling rate 1.25 Hz) in further tests:
• mBP – mean blood pressure, and
• HR – heart rate.
Since the selected data was incomplete, unbalanced, and fuzzy, it needed further cleaning. In a few cases, files were wrongly labeled or appeared in both classes. This issue went unnoticed for some time. As a result, we initially trained, evaluated, and tested the models on erroneous data, which lowered their predictive capability and overall accuracy. The problem was eventually solved by fixing the labels and removing the redundant files.
Falls Prediction in eldery people using Gated Recurrent Units Dataset
Figure 1: Example signals before and after interpolation.
We trimmed the first 500 and the last 50 samples of every time series, because they were recorded while the measuring devices were being started, fine-tuned, or stopped. After the trimming, we removed the series shorter than 500 samples, since they were deemed unusable for model training.
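The trimming and length filtering described above can be sketched as follows. This is a minimal illustration, assuming each series is a plain Python list of samples; the function and constant names are hypothetical, and only the numeric values come from the text.

```python
HEAD_TRIM = 500   # samples dropped from the start (value from the text)
TAIL_TRIM = 50    # samples dropped from the end (value from the text)
MIN_LENGTH = 500  # minimum usable length after trimming (value from the text)


def trim_series(series):
    """Drop the first HEAD_TRIM and the last TAIL_TRIM samples of one series."""
    return series[HEAD_TRIM:len(series) - TAIL_TRIM]


def trim_and_filter(dataset):
    """Trim every series and keep only those with at least MIN_LENGTH samples left."""
    trimmed = (trim_series(series) for series in dataset)
    return [series for series in trimmed if len(series) >= MIN_LENGTH]
```

A series needs at least 1050 raw samples to survive both steps under these assumptions.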
Often, there were gaps in the signals, and the mBP and HR started and ended independently from each other. Depending on the gap location, we followed one of the two scenarios:
• when the missing data was at the start or the end of the series, the first or last available data value was used to fill it,
• when the gap was in the middle of the signal, linear interpolation was used.
Please see Fig. 1 for the example data series before and after interpolation.
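The two gap-filling scenarios can be sketched as below. This is an illustration only: it assumes missing samples are marked with None, and that gaps at the edges are filled by repeating the nearest available value. The function name is hypothetical.

```python
def fill_gaps(signal):
    """Fill missing samples (None) in a list of floats.

    Leading and trailing gaps are filled with the nearest available value;
    interior gaps are filled by linear interpolation between their neighbours.
    """
    known = [i for i, value in enumerate(signal) if value is not None]
    if not known:
        raise ValueError("signal contains no valid samples")
    filled = list(signal)
    first, last = known[0], known[-1]
    # Edge gaps: repeat the first / last available value.
    for i in range(first):
        filled[i] = signal[first]
    for i in range(last + 1, len(signal)):
        filled[i] = signal[last]
    # Interior gaps: interpolate linearly between consecutive known samples.
    for left, right in zip(known, known[1:]):
        step = (signal[right] - signal[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = signal[left] + step * (i - left)
    return filled
```

For example, `fill_gaps([None, 1.0, None, None, 4.0, None])` repeats 1.0 at the start and 4.0 at the end, and fills the interior gap with 2.0 and 3.0.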
We devised an iterative procedure composed of several steps to remove the outliers from the signal:
1. perform input signal studentization,
2. apply the median filter (window size 31),
3. find difference between the studentized signal and the median filter output,
4. identify outliers by comparing the difference to the threshold value,
5. remove outliers from the input signal and interpolate the gaps.
The threshold value decreased with each iteration, and 2 to 5 iterations were sufficient to clean even very noisy signals.
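The iterative outlier removal can be sketched as follows. The window size of 31 comes from the text. The threshold schedule, the use of the absolute difference, the truncated window at the signal edges, and all function names are assumptions made for this illustration.

```python
from statistics import mean, median, pstdev


def studentize(signal):
    """Centre the signal and scale it to unit standard deviation."""
    mu, sigma = mean(signal), pstdev(signal)
    if sigma == 0:
        return [0.0] * len(signal)
    return [(value - mu) / sigma for value in signal]


def median_filter(signal, window=31):
    """Sliding median; the window is truncated at the signal edges."""
    half = window // 2
    return [median(signal[max(0, i - half):i + half + 1])
            for i in range(len(signal))]


def interpolate(signal, bad):
    """Replace samples at the indices in `bad` using the remaining samples."""
    good = [i for i in range(len(signal)) if i not in bad]
    if not good:
        return list(signal)
    out = list(signal)
    for i in range(good[0]):                      # leading gap
        out[i] = signal[good[0]]
    for i in range(good[-1] + 1, len(signal)):    # trailing gap
        out[i] = signal[good[-1]]
    for left, right in zip(good, good[1:]):       # interior gaps
        step = (signal[right] - signal[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = signal[left] + step * (i - left)
    return out


def remove_outliers(signal, thresholds=(3.0, 2.5, 2.0)):
    """Run one cleaning pass per threshold; the thresholds are assumed values."""
    cleaned = list(signal)
    for threshold in thresholds:
        z = studentize(cleaned)                               # step 1
        smooth = median_filter(z, window=31)                  # step 2
        diff = [abs(a - b) for a, b in zip(z, smooth)]        # step 3
        bad = {i for i, d in enumerate(diff) if d > threshold}  # step 4
        cleaned = interpolate(cleaned, bad)                   # step 5
    return cleaned
```

Because the signal is studentized again in every pass, removing a large spike shrinks the standard deviation, so a smaller threshold in the next pass can expose milder outliers that the spike previously masked.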
As a final preprocessing step, the data was normalized using min-max normalization and rescaled to a fixed range.
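A min-max rescaling step could look like the sketch below. The target range is not preserved in the text, so the default bounds of 0 and 1 are an assumption, as is the choice to scale each series separately. The function name is hypothetical.

```python
def minmax_scale(signal, low=0.0, high=1.0):
    """Linearly map the signal so that its minimum becomes `low` and its maximum `high`."""
    smallest, largest = min(signal), max(signal)
    if largest == smallest:
        # A constant signal carries no range information; map it to `low`.
        return [low for _ in signal]
    scale = (high - low) / (largest - smallest)
    return [low + (value - smallest) * scale for value in signal]
```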
The ratio of „nosyncope” to „syncope” series in the dataset was almost 6:1. To treat the problem as a classification task, we balanced the data: all „syncope” series were used, while the „nosyncope” series were selected at random to match their number. Then, we divided the balanced data into two sets:
• training set (154 series) – used to train the models,
• test set (38 series) – used to evaluate the models.
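The balancing and splitting can be sketched as random undersampling followed by a shuffled split. The set sizes imply 192 balanced series, that is, 96 per class. The 20% test fraction, the fixed seed, and the function name are assumptions chosen to reproduce the 154/38 split; whether the original split was stratified is not stated.

```python
import random


def balance_and_split(syncope, nosyncope, test_fraction=0.2, seed=0):
    """Undersample the majority class, then split into training and test sets.

    Returns two lists of (series, label) pairs, with label 1 for syncope.
    """
    rng = random.Random(seed)
    # Randomly pick as many "nosyncope" series as there are "syncope" series.
    sampled = rng.sample(nosyncope, k=len(syncope))
    labeled = [(s, 1) for s in syncope] + [(s, 0) for s in sampled]
    rng.shuffle(labeled)
    n_test = round(len(labeled) * test_fraction)
    return labeled[n_test:], labeled[:n_test]
```

With 96 series per class, this yields 154 training and 38 test series.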
The F-measure was used as the quality metric in the experiments. It is calculated from two helper metrics, recall (1), also called sensitivity, and precision (2):
\[ \mathrm{recall} = \frac{tp}{tp + fn} \tag{1} \]
\[ \mathrm{precision} = \frac{tp}{tp + fp} \tag{2} \]
where:
• tp – true positive – item correctly classified as an anomaly,
• fp – false positive – item incorrectly classified as an anomaly,
• fn – false negative – item incorrectly classified as a part of normal operation.
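The parameter $\beta$ controls the importance of recall relative to precision when calculating the F-measure (3):
\[ F_\beta = (1 + \beta^2) \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\beta^2 \cdot \mathrm{precision} + \mathrm{recall}} \tag{3} \]
During the experiments, $\beta = 1$ was used.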
Additionally, since the classes were balanced, accuracy could be used as a quality measure. Given the values t and f representing, respectively, the numbers of correctly and incorrectly classified samples, the accuracy can be defined as in (4):
\[ \mathrm{accuracy} = \frac{t}{t + f} \tag{4} \]
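The quality measures can be computed directly from the counts. The sketch below uses the standard definitions of recall, precision, F-measure, and accuracy; the function names are illustrative, and returning 0 for an empty denominator is a convention chosen here.

```python
def recall(tp, fn):
    """Fraction of actual positives that were detected (sensitivity)."""
    return tp / (tp + fn) if tp + fn else 0.0


def precision(tp, fp):
    """Fraction of predicted positives that were correct."""
    return tp / (tp + fp) if tp + fp else 0.0


def f_measure(tp, fp, fn, beta=1.0):
    """F-measure; beta > 1 weights recall more heavily than precision."""
    r, p = recall(tp, fn), precision(tp, fp)
    denominator = beta ** 2 * p + r
    return (1 + beta ** 2) * p * r / denominator if denominator else 0.0


def accuracy(t, f):
    """Fraction of correctly classified samples."""
    return t / (t + f)
```

Since the stated goal is high sensitivity, a value of beta above 1 would shift the score towards recall; with beta equal to 1 the measure reduces to the F1 score.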
4 Experiments
Figure 2: Neural network architecture
As mentioned, we structured the syncope prediction problem as a classification task. An overview of the neural network architecture is presented in Fig. 2. We considered two main architecture variants: one using vanilla GRU layers and one using bidirectional GRU layers. The models were trained with the categorical cross-entropy loss, a batch size of 16, and the ADADELTA optimizer. The softmax output was compared with a threshold to select the final label. We optimized the threshold value during the experiments.
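The final labeling step can be sketched as below. It assumes a two-class output in which the threshold is applied to the softmax probability of the syncope class. The class order, the default threshold, and the function names are assumptions made for this illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of raw network outputs."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(logits, threshold=0.7, syncope_index=1):
    """Label a sample as syncope when its softmax probability reaches the threshold."""
    probability = softmax(logits)[syncope_index]
    return "syncope" if probability >= threshold else "nosyncope"
```

Lowering the threshold makes the classifier report syncope more readily, which trades precision for recall.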
4.1 Hyperparameter optimization
The hyperparameter optimization, performed with Bayesian optimization, was conducted in two phases. The first phase aimed to determine which parameters have the most significant impact on the quality of the proposed model. Seven parameters were tested: the number of GRU units, the number of GRU layers, history window size, batch size, learning rate, learning rate decay, and output threshold. The experiments showed that:
• the relevance of the number of layers decreases when the number of units increases,
• models tend to perform better with a smaller batch size,
• the output threshold should be no higher than 0.75.
The second phase focused on the most influential parameters: the number of GRU units, the number of GRU layers, and the history window size. Fig. 3 shows the partial dependence plots for those parameters.
4.2 Vanilla GRU
All vanilla GRU models generated false negatives and did not significantly improve the reaction time when compared to the manually labeled time of the syncope. Applying the per-model optimized threshold generally improved the accuracy and F1 score when compared to applying a fixed threshold of 0.7 (Tab. 1). However, when one considers the recall (sensitivity) and the reaction time, lower thresholds tended to yield better results (Fig. 4).
Figure 3: Partial dependence plots showing the relations between the number of GRU units per layer, the number of GRU layers, and the history window size, and their influence on classification error (line charts).
Table 1: F1 score and accuracy obtained for three vanilla GRU architecture variants.
Figure 4: Relationships between threshold value and quality scores (on the left) and its influence on the syncope detection when compared to the manual marking (on the right) for the model with single vanilla GRU layer containing 200 units.
Table 2: F1 score and accuracy obtained for three bidirectional GRU architecture variants.
4.3 Bidirectional GRU
For the bidirectional GRU, applying the per-model optimized threshold also improved the results when compared to applying a fixed threshold of 0.7 (see Tab. 2). However, the gains were not as significant as in the vanilla GRU case. Moreover, as can be seen in Fig. 5, for the best of the considered models, changing the threshold value has almost no impact on model quality. Additionally, only the extreme threshold values significantly affect the reaction time.
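One simple way to pick a per-model threshold is a sweep over candidate values that keeps the one with the highest F1 score. This grid search is an illustrative substitute, not necessarily the procedure used in the experiments. It assumes `probs` holds predicted syncope probabilities and `labels` holds 1 for syncope and 0 otherwise; the function names are hypothetical.

```python
def f1_at_threshold(probs, labels, threshold):
    """F1 score obtained when samples with probability >= threshold are labeled syncope."""
    tp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 1)
    fp = sum(1 for p, y in zip(probs, labels) if p >= threshold and y == 0)
    fn = sum(1 for p, y in zip(probs, labels) if p < threshold and y == 1)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def best_threshold(probs, labels, candidates):
    """Return the candidate with the highest F1 score (the first one on ties)."""
    return max(candidates, key=lambda t: f1_at_threshold(probs, labels, t))
```

To favour sensitivity, the same sweep could rank candidates by recall or by an F-measure with beta above 1 instead of F1.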
Figure 5: Relationships between threshold value and quality scores (on the left) and its influence on the syncope detection when compared to the manual marking (on the right) for the model with two bidirectional GRU layers containing 100 units each.
5 Conclusions and future work
This work is preliminary research that addresses several issues related to falls prediction, such as data preparation and model training. The best bidirectional GRU model detected a forthcoming fall approximately ten minutes before the event with approximately 90% accuracy. It is worth noting that the model is well suited for implementation in wearable devices with appropriate compression [7]. As future work, the authors will focus on enhancing the vanilla GRU solution with more advanced mechanisms and architecture modifications for better performance.
This project was carried out in collaboration with the Laboratory for Gravitational Physiology, Aging and Medicine Research Unit, Institute of Physiology, the Medical University of Graz, and with the cooperation of the Department of Health Sciences and Information technologies at Alma Mater Europea University, Maribor, Slovenia. We would especially like to thank Professor Nandu Goswami, who provided the data and explanations used to train and evaluate the models.
[1] Marcin Radzio. Falls prediction with recurrent neural networks. Master’s thesis, AGH University of Science and Technology, 2019.
[2] Nicolás Martínez-Velilla, Alvaro Casas-Herrero, Fabrício Zambom-Ferraresi, Nacho Suárez, Javier Alonso-Renedo, Koldo Cambra Contín, Mikel López-Sáez de Asteasu, Nuria Fernandez Echeverria, María Gonzalo Lázaro, and Mikel Izquierdo. Functional and cognitive impairment prevention through early physical activity for geriatric hospitalized patients: study protocol for a randomized controlled trial. BMC Geriatrics, 15(1), 2015. ISSN 1471-2318. doi: 10.1186/s12877-015-0109-x.
[3] Nicolás Martínez-Velilla, Alvaro Casas-Herrero, Fabricio Zambom-Ferraresi, Mikel López-Sáez de Asteasu, Alejandro Lucia, Arkaitz Galbete, Agurne García-Baztán, Javier Alonso-Renedo, Belen González-Glaría, María Gonzalo-Lázaro, Itziar Apezteguía Iráizoz, Marta Gutiérrez-Valencia, Leocadio Rodríguez-Mañas, and Mikel Izquierdo. Effect of Exercise Intervention on Functional Decline in Very Elderly Patients During Acute Hospitalization: A Randomized Clinical Trial. JAMA Internal Medicine, 179(1):28–36, Jan 2019. ISSN 2168-6106. doi: 10.1001/jamainternmed.2018.4869.
[4] Ramesh Rajagopalan, Irene Litvan, and Tzyy-Ping Jung. Fall prediction and prevention systems: Recent trends, challenges, and future research directions. Sensors, 17(11), 2017. ISSN 1424-8220. doi: 10.3390/s17112509.
[5] Bilal A. Mateen, Matthias Bussas, Catherine Doogan, Denise Waller, Alessia Saverino, Franz J. Király, and E. Diane Playford. Machine learning in falls prediction; A cognition-based predictor of falls for the acute neurological in-patient population. CoRR, abs/1607.07751, 2016.
[6] Ahmed Nait Aicha, Gwenn Englebienne, Kimberley S. Van Schooten, Mirjam Pijnappels, and Ben Kröse. Deep learning to predict falls in older adults based on daily-life trunk accelerometry. Sensors, 18(5), 2018. ISSN 1424-8220. doi: 10.3390/s18051654.
[7] Maciej Wielgosz and Michał Karwatowski. Mapping neural networks to FPGA-based IoT devices for ultra-low latency processing. Sensors, 19(13), 2019. ISSN 1424-8220. doi: 10.3390/s19132981.