Prof. Dr. Florian Haselbeck

Josef Eiglsperger, Prof. Dr. Florian Haselbeck, Viola Stiele, Claudia Guadarrama Serrano, Kelly Lim-Trinh, Prof. Dr. Klaus Menrad, Prof. Dr. Thomas Hannus, Prof. Dr. Dominik Grimm
Permissions: Open Access

Permissions: Peer Reviewed
Forecasting seasonally fluctuating sales of perishable products in the horticultural industry (2024) Expert Systems with Applications 249. DOI: 10.1016/j.eswa.2024.123438

Accurately forecasting demand is a potential competitive advantage, especially when dealing with perishable products. The multi-billion dollar horticultural industry is highly affected by perishability, but has received limited attention in forecasting research. In this paper, we analyze the applicability of general compared to dataset-specific predictors, as well as the influence of external information and online model update schemes. We employ a heterogeneous set of horticultural data, three classical, and twelve machine learning-based forecasting approaches. Our results show a superiority of multivariate machine learning methods, in particular the ensemble learner XGBoost. These advantages highlight the importance of external factors, with the feature set containing statistical, calendrical, and weather-related features leading to the most robust performance. We further observe that a general model is unable to capture the heterogeneity of the data and is outperformed by dataset-specific predictors. Moreover, frequent model updates have a negligible impact on forecasting quality, allowing long-term forecasting without significant performance degradation.
Prof. Dr. Florian Haselbeck, Maura John, Yuqi Zhang, Jonathan Pirnay, Juan Pablo Fuenzalida-Werner, Ruben Costa, Prof. Dr. Dominik Grimm
Permissions: Open Access

Permissions: Peer Reviewed
Superior protein thermophilicity prediction with protein language model embeddings (2023) NAR Genomics and Bioinformatics 5 (4). DOI: 10.1093/nargab/lqad087

Protein thermostability is important in many areas of biotechnology, including enzyme engineering and protein-hybrid optoelectronics. Ever-growing protein databases and information on stability at different temperatures allow the training of machine learning models to predict whether proteins are thermophilic. In silico predictions could reduce costs and accelerate the development process by guiding researchers to more promising candidates. Existing models for predicting protein thermophilicity rely mainly on features derived from physicochemical properties. Recently, modern protein language models that directly use sequence information have demonstrated superior performance in several tasks. In this study, we evaluate the usefulness of protein language model embeddings for thermophilicity prediction with ProLaTherm, a Protein Language model-based Thermophilicity predictor. ProLaTherm significantly outperforms all feature-, sequence- and literature-based comparison partners on multiple evaluation metrics. In terms of the Matthews correlation coefficient, ProLaTherm outperforms the second-best competitor by 18.1% in a nested cross-validation setup. Using proteins from species not overlapping with species from the training data, ProLaTherm outperforms all competitors by at least 9.7%. On these data, it misclassified only one nonthermophilic protein as thermophilic. Furthermore, it correctly identified 97.4% of all thermophilic proteins in our test set with an optimal growth temperature above 70°C.
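
For readers unfamiliar with the metric named in this abstract, the following is a minimal sketch of the Matthews correlation coefficient for binary labels. The function name `mcc` and the toy labels are illustrative assumptions; they are not taken from ProLaTherm or its code base.

```python
import math


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = positive, 0 = negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumption: return 0.0 when a row or column of the confusion matrix is empty.
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical toy labels: 1 = thermophilic, 0 = nonthermophilic.
toy_true = [1, 1, 0, 0]
toy_pred = [1, 0, 0, 0]
toy_score = mcc(toy_true, toy_pred)
```

The coefficient ranges from -1 (complete disagreement) through 0 (no better than chance) to 1 (perfect prediction), which makes it a common choice when the two classes are imbalanced.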
Josef Eiglsperger, Prof. Dr. Florian Haselbeck, Prof. Dr. Dominik Grimm
Permissions: Open Access

Permissions: Peer Reviewed
ForeTiS: A comprehensive time series forecasting framework in Python (2023) Machine Learning with Applications 12. DOI: 10.1016/j.mlwa.2023.100467

Summary: Time series forecasting is a research area with applications in various domains, yet no predominant method has emerged so far. We present ForeTiS, a comprehensive and open source Python framework that allows rigorous training, comparison, and analysis of state-of-the-art time series forecasting approaches. Our framework includes fully automated yet configurable data preprocessing and feature engineering. In addition, we use advanced Bayesian optimization for automatic hyperparameter search. ForeTiS is easy to use, even for non-programmers, requiring only a single line of code to apply state-of-the-art time series forecasting. Various prediction models, ranging from classical forecasting approaches to machine learning techniques and deep learning architectures, are already integrated. More importantly, as a key benefit for researchers aiming to develop new forecasting models, ForeTiS is designed to allow for rapid integration and fair benchmarking in a reliable framework. Thus, we provide a powerful framework for both end users and forecasting experts. Availability: ForeTiS is available at https://github.com/grimmlab/ForeTiS. We provide a setup using Docker, as well as a Python package at https://pypi.org/project/ForeTiS/. Extensive online documentation with hands-on tutorials and videos can be found at https://foretis.readthedocs.io.
Prof. Dr. Florian Haselbeck, Maura John, Prof. Dr. Dominik Grimm
Permissions: Open Access

Permissions: Peer Reviewed
easyPheno: An easy-to-use and easy-to-extend Python framework for phenotype prediction using Bayesian optimization (2023) Bioinformatics Advances 3 (1). DOI: 10.1093/bioadv/vbad035

Summary: Predicting complex traits from genotypic information is a major challenge in various biological domains. With easyPheno, we present a comprehensive Python framework enabling the rigorous training, comparison, and analysis of phenotype predictions for a variety of different models, ranging from common genomic selection approaches and classical machine learning to modern deep learning-based techniques. Our framework is easy to use, even for non-programming experts, and includes an automatic hyperparameter search using state-of-the-art Bayesian optimization. Moreover, easyPheno provides various benefits for bioinformaticians developing new prediction models. easyPheno makes it possible to quickly integrate novel models and functionalities in a reliable framework and to benchmark against various integrated prediction models in a comparable setup. In addition, the framework allows the assessment of newly developed prediction models under pre-defined settings using simulated data. We provide detailed documentation with various hands-on tutorials and videos explaining the usage of easyPheno to novice users. Availability and Implementation: easyPheno is publicly available at https://github.com/grimmlab/easyPheno and can be easily installed as a Python package via https://pypi.org/project/easypheno/ or using Docker. Supplementary information: Comprehensive documentation including various tutorials complemented with videos can be found at https://easypheno.readthedocs.io/. In addition, we provide examples of how to use easyPheno with real and simulated data in the Supplementary.
Nikita Genze, Raymond Ajekwe, Zeynep Güreli, Prof. Dr. Florian Haselbeck, Michael Grieb, Prof. Dr. Dominik Grimm
Permissions: Open Access

Permissions: Peer Reviewed
Deep Learning-based Early Weed Segmentation using Motion Blurred UAV Images of Sorghum Fields (2022) Computers and Electronics in Agriculture 202. DOI: 10.1016/j.compag.2022.107388

Weeds are undesired plants in agricultural fields that affect crop yield and quality by competing for nutrients, water, sunlight and space. For centuries, farmers have used several strategies and resources to remove weeds. The use of herbicide is still the most common control strategy. To reduce the amount of herbicide and the impact caused by uniform spraying, site-specific weed management (SSWM) through variable rate herbicide application and mechanical weed control has long been recommended. To implement such precise strategies, accurate detection and classification of weeds in crop fields is a crucial first step. Due to the phenotypic similarity between some weeds and crops as well as changing weather conditions, it is challenging to design an automated system for general weed detection. For efficiency, unmanned aerial vehicles (UAV) are commonly used for image capturing. However, high wind pressure and different drone settings have a severe effect on the capturing quality, which potentially results in degraded images, e.g., due to motion blur. In this paper, we investigate the generalization capabilities of Deep Learning methods for early weed detection in sorghum fields under such challenging capturing conditions. For this purpose, we developed weed segmentation models using three different state-of-the-art Deep Learning architectures in combination with residual neural networks as feature extractors. We further publish a manually annotated and expert-curated UAV imagery dataset for weed detection in sorghum fields under challenging conditions. Our results show that our trained models generalize well regarding the detection of weeds, even for degraded captures due to motion blur. A UNet-like architecture with a ResNet-34 feature extractor achieved an F1-score of over 89 % on a hold-out test set. Further analysis indicates that the trained model performed well in predicting the general plant shape, while most misclassifications appeared at the borders of the plants. Beyond that, in contrast to existing research, our approach can detect intra-row weeds without additional information as well as partly occluded plants. All data, including the newly generated and annotated UAV imagery dataset, and code are publicly available on GitHub: https://github.com/grimmlab/UAVWeedSegmentation and Mendeley Data: https://doi.org/10.17632/4hh45vkp38.3
Maura John, Prof. Dr. Florian Haselbeck, Rupashree Dass, Christoph Malisi, Christian Dreischer, Sebastian J Schultheiss, Prof. Dr. Dominik Grimm
Permissions: Open Access

Permissions: Peer Reviewed
A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species (2022) Frontiers in Plant Science 13. DOI: 10.3389/fpls.2022.932512

Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data, leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare twelve different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allows us to analyze all prediction models and especially the selected markers under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we can confirm results from other studies that there is no superiority of more complex neural network-based architectures for phenotype prediction compared to well-established methods. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research.
Prof. Dr. Florian Haselbeck, Jennifer Killinger, Prof. Dr. Klaus Menrad, Prof. Dr. Thomas Hannus, Prof. Dr. Dominik Grimm
Permissions: Open Access

Permissions: Peer Reviewed
Machine Learning Outperforms Classical Forecasting on Horticultural Sales Predictions (2022) Machine Learning with Applications 7. DOI: 10.1016/j.mlwa.2021.100239

Forecasting future demand is of high importance for many companies as it affects operational decisions. This is especially relevant for products with a short shelf life due to the potential disposal of unsold items. Horticultural products are highly affected by this but have so far received limited attention in forecasting research. Beyond that, many forecasting competitions show a competitive performance of classical forecasting methods. For the first time, we empirically compared the performance of nine state-of-the-art machine learning and three classical forecasting algorithms for horticultural sales predictions. We show that machine learning methods were superior in all our experiments, with the gradient boosted ensemble learner XGBoost being the top performer in 14 out of 15 comparisons. This advantage over classical forecasting approaches increased for datasets with multiple seasons. Further, we show that including additional external factors, such as weather and holiday information, as well as meta-features led to a boost in predictive performance. In addition, we investigated whether the algorithms can capture the sudden increase in demand of horticultural products during the SARS-CoV-2 pandemic in 2020. For this special case, XGBoost was also superior. All code and data are publicly available on GitHub: https://github.com/grimmlab/HorticulturalSalesPredictions.

Prof. Dr. Florian Haselbeck

Time Series Forecasting with Self-Adaptive Gaussian Process Regression (2023) Dissertation at the TUM Campus Straubing for Biotechnology and Sustainability, 29 November 2023.

Prof. Dr. Florian Haselbeck

KI im Einzelhandel: Einblick in das Projekt PlantGrid [AI in retail: insights into the PlantGrid project] (2023) Talk on 8 September 2023.

Jan D Hüwel, Prof. Dr. Florian Haselbeck, Prof. Dr. Dominik Grimm, Christian Beecks

Dynamically Self-Adjusting Gaussian Processes for Data Stream Modelling (2022) Talk at the 45th German Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence, on 22 September 2022, held virtually in Trier.

One of the major challenges in time series analysis is changing data distributions, especially when processing data streams. To ensure an up-to-date model delivering useful predictions at all times, model reconfigurations are required to adapt to such evolving streams. For Gaussian processes, this might require the adaptation of the internal kernel expression. In this paper, we present dynamically self-adjusting Gaussian processes by introducing Event Triggered Kernel Adjustments in Gaussian process modelling (ETKA), a novel data stream modelling algorithm that can handle evolving and changing data distributions. To this end, we enhance the recently introduced Adjusting Kernel Search with a novel online change point detection method. Our experiments on simulated data with varying change point patterns suggest a broad applicability of ETKA. On real-world data, ETKA outperforms comparison partners that differ regarding the model adjustment and its refitting trigger in nine and ten out of 14 cases, respectively. These results confirm ETKA's ability to enable more accurate and, in some settings, also more efficient data stream processing via Gaussian processes. Code availability: https://github.com/JanHuewel/ETKA
Prof. Dr. Florian Haselbeck, Prof. Dr. Dominik Grimm

EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (2021) 44th German Conference on Artificial Intelligence (Virtual Conference). DOI: 10.1007/978-3-030-87626-5_11

Time series forecasting is a growing domain with diverse applications. However, changes of the system behavior over time due to internal or external influences are challenging. Therefore, predictions of a previously learned forecasting model might not be useful anymore. In this paper, we present EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (EVARS-GPR), a novel online algorithm that is able to handle sudden shifts in the target variable scale of seasonal data. For this purpose, EVARS-GPR combines online change point detection with a refitting of the prediction model using data augmentation for samples prior to a change point. Our experiments on simulated data show that EVARS-GPR is applicable for a wide range of output scale changes. EVARS-GPR has on average a 20.8% lower RMSE on different real-world datasets compared to methods with a similar computational resource consumption. Furthermore, we show that our algorithm leads to a six-fold reduction of the averaged runtime in relation to all comparison partners with a periodical refitting strategy. In summary, we present a computationally efficient online forecasting algorithm for seasonal time series with changes of the target variable scale and demonstrate its functionality on simulated as well as real-world data. All code is publicly available on GitHub: https://github.com/grimmlab/evars-gpr.
Prof. Dr. Klaus Menrad, Prof. Dr. Florian Haselbeck, M.Sc. Daniel Berki-Kiss, Dr. Thomas Decker, Prof. Dr. Dominik Grimm, Prof. Dr. Thomas Hannus, Dr. rer. pol. Kai Sparke, M. Lehberger, M. Drechsler, Prof. Dr. Andreas Holzapfel, S. Schröder, Gerald Neu, F. Bertlich

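PlantGrid – Digitale Management-Unterstützungssysteme für kleine und mittelständische Unternehmen in Wertschöpfungsketten von Zierpflanzen, Stauden und Schnittblumen [PlantGrid: digital management support systems for small and medium-sized enterprises in the value chains of ornamental plants, perennials, and cut flowers] (2021) Talk at the status workshop of the R&D projects in the BMEL funding priority Gartenbau 4.0, 4-5 May 2021 (digital conference).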

Jan D Hüwel, Prof. Dr. Florian Haselbeck, Prof. Dr. Dominik Grimm, Christian Beecks
Permissions: Open Access

Permissions: Peer Reviewed
Dynamically Self-Adjusting Gaussian Processes for Data Stream Modelling (2022) In: KI 2022: Advances in Artificial Intelligence. KI 2022. Lecture Notes in Computer Science, vol 13404. Springer, Cham, pp. 96-114. DOI: 10.1007/978-3-031-15791-2_10

One of the major challenges in time series analysis is changing data distributions, especially when processing data streams. To ensure an up-to-date model delivering useful predictions at all times, model reconfigurations are required to adapt to such evolving streams. For Gaussian processes, this might require the adaptation of the internal kernel expression. In this paper, we present dynamically self-adjusting Gaussian processes by introducing Event Triggered Kernel Adjustments in Gaussian process modelling (ETKA), a novel data stream modelling algorithm that can handle evolving and changing data distributions. To this end, we enhance the recently introduced Adjusting Kernel Search with a novel online change point detection method. Our experiments on simulated data with varying change point patterns suggest a broad applicability of ETKA. On real-world data, ETKA outperforms comparison partners that differ regarding the model adjustment and its refitting trigger in nine and ten out of 14 cases, respectively. These results confirm ETKA's ability to enable more accurate and, in some settings, also more efficient data stream processing via Gaussian processes. Code availability: https://github.com/JanHuewel/ETKA
Prof. Dr. Florian Haselbeck, Prof. Dr. Dominik Grimm
Permissions: Peer Reviewed
EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (2021) In: Edelkamp, S., Möller, R., Rueckert, E. (eds) KI 2021: Advances in Artificial Intelligence. KI 2021. Lecture Notes in Computer Science, vol 12873. Springer, Cham, pp. 135-157. DOI: 10.1007/978-3-030-87626-5_11

Time series forecasting is a growing domain with diverse applications. However, changes of the system behavior over time due to internal or external influences are challenging. Therefore, predictions of a previously learned forecasting model might not be useful anymore. In this paper, we present EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (EVARS-GPR), a novel online algorithm that is able to handle sudden shifts in the target variable scale of seasonal data. For this purpose, EVARS-GPR combines online change point detection with a refitting of the prediction model using data augmentation for samples prior to a change point. Our experiments on simulated data show that EVARS-GPR is applicable for a wide range of output scale changes. EVARS-GPR has on average a 20.8% lower RMSE on different real-world datasets compared to methods with a similar computational resource consumption. Furthermore, we show that our algorithm leads to a six-fold reduction of the averaged runtime in relation to all comparison partners with a periodical refitting strategy. In summary, we present a computationally efficient online forecasting algorithm for seasonal time series with changes of the target variable scale and demonstrate its functionality on simulated as well as real-world data. All code is publicly available on GitHub: https://github.com/grimmlab/evars-gpr.
Martin Golz, Adolf Schenka, Prof. Dr. Florian Haselbeck, Martin Patrick Pauli
Permissions: Open Access

Permissions: Peer Reviewed
Inter-individual variability of EEG features during microsleep events (2019) Current Directions in Biomedical Engineering 2019 (1), pp. 13-16. DOI: 10.1515/cdbme-2019-0004

This paper examines the question of how strongly the spectral properties of the EEG during microsleep differ between individuals. For this purpose, 3859 microsleep examples were compared with 4044 counterexamples in which drivers were very drowsy but were able to perform the driving task. Two types of signal features were compared: logarithmic power spectral densities and entropy measures of wavelet coefficient series. Discriminant analyses were performed with the following machine learning methods: support-vector machines, gradient boosting, and learning vector quantization. To the best of our knowledge, this is the first time that results of the leave-one-subject-out cross-validation (LOSO CV) for the detection of microsleep are presented. Error rates lower than 5.0 % were obtained for 17 subjects and lower than 13 % for another 11 subjects. In 3 individuals, EEG features could not be explained by the pool of EEG features of all other individuals; for them, detection errors were 15.1 %, 17.1 %, and 27.0 %. In comparison, cross-validation by means of repeated random subsampling, in which individuality is not considered, yielded mean error rates of 5.0 ± 0.5 %. A subsequent inspection of raw EEG data showed that in two individuals bad signal quality due to poor electrode attachment could be the cause, and in one individual a very unusual behavior: a high and long-lasting eyelid activity which interfered with the recorded EEG in all channels.
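
To illustrate how leave-one-subject-out cross-validation differs from random subsampling, the following minimal sketch generates the splits from a list of per-sample subject identifiers. The function name and the toy identifiers are hypothetical and are not taken from the paper.

```python
def leave_one_subject_out(subject_ids):
    """Yield (held_out_subject, train_indices, test_indices) for each subject.

    Every sample of the held-out subject goes into the test fold, so the model
    is always evaluated on an individual it has never seen during training.
    """
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test


# Hypothetical example: five samples recorded from three subjects.
toy_subjects = ["a", "a", "b", "c", "c"]
toy_splits = list(leave_one_subject_out(toy_subjects))
```

With random subsampling, samples from the same person can land in both the training and the test fold, which is why the abstract describes that scheme as not considering individuality.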
Prof. Dr. Florian Haselbeck, Prof. Dr. Dominik Grimm
Permissions: Open Access
EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (2021) arXiv:2101.04422.

Time series forecasting is a growing domain with diverse applications. However, changes of the system behavior over time due to internal or external influences are challenging. Therefore, predictions of a previously learned forecasting model might not be useful anymore. In this paper, we present EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (EVARS-GPR), a novel online algorithm that is able to handle sudden shifts in the target variable scale of seasonal data. For this purpose, EVARS-GPR combines online change point detection with a refitting of the prediction model using data augmentation for samples prior to a change point. Our experiments on simulated data show that EVARS-GPR is applicable for a wide range of output scale changes. EVARS-GPR has on average a 20.8 % lower RMSE on different real-world datasets compared to methods with a similar computational resource consumption. Furthermore, we show that our algorithm leads to a six-fold reduction of the averaged runtime in relation to all comparison partners with a periodical refitting strategy. In summary, we present a computationally efficient online forecasting algorithm for seasonal time series with changes of the target variable scale and demonstrate its functionality on simulated as well as real-world data. All code is publicly available on GitHub: https://github.com/grimmlab/evars-gpr
Prof. Dr. Florian Haselbeck

Publications

Journal articles (peer-reviewed)

Forecasting seasonally fluctuating sales of perishable products in the horticultural industry (2024) Expert Systems with Applications 249. DOI: 10.1016/j.eswa.2024.123438

Superior protein thermophilicity prediction with protein language model embeddings (2023) NAR Genomics and Bioinformatics 5 (4). DOI: 10.1093/nargab/lqad087

ForeTiS: A comprehensive time series forecasting framework in Python (2023) Machine Learning with Applications 12. DOI: 10.1016/j.mlwa.2023.100467

easyPheno: An easy-to-use and easy-to-extend Python framework for phenotype prediction using Bayesian optimization (2023) Bioinformatics Advances 3 (1). DOI: 10.1093/bioadv/vbad035

Deep Learning-based Early Weed Segmentation using Motion Blurred UAV Images of Sorghum Fields (2022) Computers and Electronics in Agriculture 202. DOI: 10.1016/j.compag.2022.107388

A comparison of classical and machine learning-based phenotype prediction methods on simulated data and three plant species (2022) Frontiers in Plant Science 13. DOI: 10.3389/fpls.2022.932512

Machine Learning Outperforms Classical Forecasting on Horticultural Sales Predictions (2022) Machine Learning with Applications 7. DOI: 10.1016/j.mlwa.2021.100239

Dissertation

Time Series Forecasting with Self-Adaptive Gaussian Process Regression (2023) Dissertation at the TUM Campus Straubing for Biotechnology and Sustainability, 29 November 2023.

Talks

KI im Einzelhandel: Einblick in das Projekt PlantGrid [AI in retail: insights into the PlantGrid project] (2023) Talk on 8 September 2023.

Dynamically Self-Adjusting Gaussian Processes for Data Stream Modelling (2022) Talk at the 45th German Conference on Artificial Intelligence, Lecture Notes in Artificial Intelligence, on 22 September 2022, held virtually in Trier.

EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (2021) 44th German Conference on Artificial Intelligence (Virtual Conference). DOI: 10.1007/978-3-030-87626-5_11

PlantGrid – Digitale Management-Unterstützungssysteme für kleine und mittelständische Unternehmen in Wertschöpfungsketten von Zierpflanzen, Stauden und Schnittblumen [PlantGrid: digital management support systems for small and medium-sized enterprises in the value chains of ornamental plants, perennials, and cut flowers] (2021) Talk at the status workshop of the R&D projects in the BMEL funding priority Gartenbau 4.0, 4-5 May 2021 (digital conference).

Contributions to scientific conferences

Dynamically Self-Adjusting Gaussian Processes for Data Stream Modelling (2022) In: KI 2022: Advances in Artificial Intelligence. KI 2022. Lecture Notes in Computer Science, vol 13404. Springer, Cham, pp. 96-114. DOI: 10.1007/978-3-031-15791-2_10

Inter-individual variability of EEG features during microsleep events (2019) Current Directions in Biomedical Engineering 2019 (1), pp. 13-16. DOI: 10.1515/cdbme-2019-0004

Other publications

EVARS-GPR: EVent-triggered Augmented Refitting of Gaussian Process Regression for Seasonal Data (2021) arXiv:2101.04422.

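Research projects

Digitale Management-Unterstützungssysteme für kleine und mittelständische Unternehmen in Wertschöpfungsketten von Zierpflanzen, Stauden und Schnittblumen (PlantGrid) [Digital management support systems for small and medium-sized enterprises in the value chains of ornamental plants, perennials, and cut flowers]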