Josef Eiglsperger,
Prof. Dr. Florian Haselbeck,
Viola Stiele,
Claudia Guadarrama Serrano,
Kelly Lim-Trinh,
Prof. Dr. Klaus Menrad,
Prof. Dr. Thomas Hannus,
Prof. Dr. Dominik Grimm
Accurately forecasting demand is a potential competitive advantage, especially when dealing with perishable products. The multi-billion dollar horticultural industry is highly affected by perishability, but has received limited attention in forecasting research. In this paper, we analyze the applicability of general compared to dataset-specific predictors, as well as the influence of external information and online model update schemes. We employ a heterogeneous set of horticultural data, three classical, and twelve machine learning-based forecasting approaches. Our results show a superiority of multivariate machine learning methods, in particular the ensemble learner XGBoost. These advantages highlight the importance of external factors, with the feature set containing statistical, calendrical, and weather-related features leading to the most robust performance. We further observe that a general model is unable to capture the heterogeneity of the data and is outperformed by dataset-specific predictors. Moreover, frequent model updates have a negligible impact on forecasting quality, allowing long-term forecasting without significant performance degradation.
Nikita Genze,
Wouter K Vahl,
Jennifer Groth,
Maximilian Wirth,
Michael Grieb,
Prof. Dr. Dominik Grimm
Sustainable weed management strategies are critical to feeding the world’s population while preserving ecosystems and biodiversity. Therefore, site-specific weed control strategies based on automation are needed to reduce the additional time and effort required for weeding. Machine vision-based methods appear to be a promising approach for weed detection, but require high quality data on the species in a specific agricultural area. Here we present a dataset, the Moving Fields Weed Dataset (MFWD), which captures the growth of 28 weed species commonly found in sorghum and maize fields in Germany. A total of 94,321 images were acquired in a fully automated, high-throughput phenotyping facility to track over 5,000 individual plants at high spatial and temporal resolution. A rich set of manually curated ground truth information is also provided, which can be used not only for plant species classification, object detection and instance segmentation tasks, but also for multiple object tracking.
Fabian Schäfer,
Manuel Walther,
Prof. Dr. Dominik Grimm,
Alexander Hübner
Assigning inpatients to hospital beds impacts patient satisfaction and the workload of nurses and doctors. The assignment is subject to unknown inpatient arrivals, in particular for emergency patients. Hospitals, therefore, need to deal with uncertainty on actual bed requirements and potential shortage situations as bed capacities are limited. This paper develops a model and solution approach for solving the patient bed-assignment problem that is based on a machine learning (ML) approach to forecasting emergency patients. First, it contributes by improving the anticipation of emergency patients using ML approaches, incorporating weather data, time and dates, important local and regional events, as well as current and historical occupancy levels. Drawing on real-life data from a large case hospital, we were able to improve forecasting accuracy for emergency inpatient arrivals. We achieved up to 17% better root mean square error (RMSE) when using ML methods compared to a baseline approach relying on averages for historical arrival rates. We further show that the ML methods outperform time series forecasts. Second, we develop a new hyper-heuristic for solving real-life problem instances based on the pilot method and a specialized greedy look-ahead (GLA) heuristic. When applying the hyper-heuristic to test sets, we were able to increase the objective function by up to 5.3% in comparison to the benchmark approach in [40]. A benchmark against a Genetic Algorithm also shows the superiority of the hyper-heuristic. Third, the combination of ML for emergency patient admission forecasting with advanced optimization through the hyper-heuristic allowed us to obtain an improvement of up to 3.3% on a real-life problem.
Prof. Dr. Florian Haselbeck,
Maura John,
Yuqi Zhang,
Jonathan Pirnay,
Juan Pablo Fuenzalida-Werner,
Ruben Costa,
Prof. Dr. Dominik Grimm
Protein thermostability is important in many areas of biotechnology, including enzyme engineering and protein-hybrid optoelectronics. Ever-growing protein databases and information on stability at different temperatures allow the training of machine learning models to predict whether proteins are thermophilic. In silico predictions could reduce costs and accelerate the development process by guiding researchers to more promising candidates. Existing models for predicting protein thermophilicity rely mainly on features derived from physicochemical properties. Recently, modern protein language models that directly use sequence information have demonstrated superior performance in several tasks. In this study, we evaluate the usefulness of protein language model embeddings for thermophilicity prediction with ProLaTherm, a Protein Language model-based Thermophilicity predictor. ProLaTherm significantly outperforms all feature-, sequence- and literature-based comparison partners on multiple evaluation metrics. In terms of the Matthews correlation coefficient, ProLaTherm outperforms the second-best competitor by 18.1% in a nested cross-validation setup. Using proteins from species not overlapping with species from the training data, ProLaTherm outperforms all competitors by at least 9.7%. On these data, it misclassified only one nonthermophilic protein as thermophilic. Furthermore, it correctly identified 97.4% of all thermophilic proteins in our test set with an optimal growth temperature above 70°C.
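For readers unfamiliar with the metric named in this abstract, the Matthews correlation coefficient can be computed from the four counts of a binary confusion matrix. The following is a minimal sketch only: the function name and the zero-denominator convention are assumptions made for illustration and are not taken from ProLaTherm.

```python
import math


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: return 0 when a row or column of the
        # confusion matrix is empty and the coefficient is undefined.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The value ranges from -1 (every prediction wrong) through 0 (no better than chance) to 1 (every prediction correct), which makes it more informative than accuracy when the classes are imbalanced.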
Nikita Genze,
Maximilian Wirth,
Christian Schreiner,
Raymond Ajekwe,
Michael Grieb,
Prof. Dr. Dominik Grimm
Background: Efficient and site-specific weed management is a critical step in many agricultural tasks. Image captures from drones and modern machine learning based computer vision methods can be used to assess weed infestation in agricultural fields more efficiently. However, the image quality of the captures can be affected by several factors, including motion blur. Image captures can be blurred because the drone moves during the image capturing process, e.g. due to wind pressure or camera settings. These influences complicate the annotation of training and test samples and can also lead to reduced predictive power in segmentation and classification tasks. Results: In this study, we propose DeBlurWeedSeg, a combined deblurring and segmentation model for weed and crop segmentation in motion blurred images. For this purpose, we first collected a new dataset of matching sharp and naturally blurred image pairs of real sorghum and weed plants from drone images of the same agricultural field. The data was used to train and evaluate the performance of DeBlurWeedSeg on both sharp and blurred images of a hold-out test-set. We show that DeBlurWeedSeg outperforms a standard segmentation model that does not include an integrated deblurring step, with a relative improvement of 13.4% in terms of the Sørensen-Dice coefficient. Conclusion: Our combined deblurring and segmentation model DeBlurWeedSeg is able to accurately segment weeds from sorghum and background, in both sharp as well as motion blurred drone captures. This has high practical implications, as lower error rates in weed and crop segmentation could lead to better weed control, e.g. when using robots for mechanical weed removal.
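The Sørensen-Dice coefficient used as the evaluation metric above measures the overlap between a predicted and a reference segmentation mask. A minimal sketch for flattened binary masks follows; the function name and the handling of two empty masks are illustrative assumptions, not part of DeBlurWeedSeg.

```python
def dice_coefficient(pred, target):
    """Sørensen-Dice coefficient between two flat binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same length")
    # Number of pixels marked as foreground in both masks.
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total
```

For multi-class segmentation (for example weed, sorghum, and background), one common choice is to compute this score per class and average the results; whether the paper does so is not stated in the abstract.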
Quirin Göttl,
Jonathan Pirnay,
Prof. Dr. Dominik Grimm,
Prof. Dr.-Ing. Jakob Burger
The determination of liquid phase equilibria plays an important role in chemical process simulation. This work presents a generalization of an approach called the convex envelope method (CEM), which constructs all liquid phase equilibria over the whole composition space for a given system with an arbitrary number of components. For this purpose, the composition space is discretized and the convex envelope of the Gibbs energy graph is computed. Employing the tangent plane criterion, all liquid phase equilibria can be determined in a robust way. The generalized CEM is described within a mathematical framework and it is shown to work numerically with various examples of up to six components from the literature.
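The idea of discretizing the composition space and taking the convex envelope of the Gibbs energy graph can be illustrated for the simplest case of a binary mixture. The sketch below is not the generalized method of the paper: it assumes a hypothetical one-parameter excess Gibbs energy model, a uniform one-dimensional grid, and a plain lower-convex-hull routine, and all function names are invented for illustration.

```python
import math


def gibbs_mixing(x: float, a: float) -> float:
    """Dimensionless Gibbs energy of mixing g/RT for a binary mixture.

    Assumed model (not from the paper): ideal mixing plus the
    one-parameter excess term a * x * (1 - x).
    """
    def xlogx(v: float) -> float:
        return v * math.log(v) if v > 0.0 else 0.0

    return xlogx(x) + xlogx(1.0 - x) + a * x * (1.0 - x)


def lower_convex_hull(points):
    """Lower convex hull of points sorted by x (monotone chain)."""
    hull = []
    for p in points:
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            cross = (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox)
            if cross > 0:  # strict left turn: hull[-1] stays on the hull
                break
            hull.pop()
        hull.append(p)
    return hull


def miscibility_gaps(a: float, n: int = 100):
    """Return (x_left, x_right) pairs where the hull skips grid points."""
    step = 1.0 / n
    grid = [i / n for i in range(n + 1)]
    hull = lower_convex_hull([(x, gibbs_mixing(x, a)) for x in grid])
    # A hull edge longer than one grid step bridges a non-convex region;
    # its end points approximate the two coexisting liquid phases.
    return [(p[0], q[0]) for p, q in zip(hull, hull[1:]) if q[0] - p[0] > 1.5 * step]
```

With the assumed model, `miscibility_gaps(3.0)` reports a single gap whose end points lie near x = 0.07 and x = 0.93, while `miscibility_gaps(1.0)` returns an empty list because the Gibbs energy curve is convex everywhere. The accuracy of the end points is limited by the grid spacing.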
Josef Eiglsperger,
Prof. Dr. Florian Haselbeck,
Prof. Dr. Dominik Grimm
Summary: Time series forecasting is a research area with applications in various domains, nevertheless without yielding a predominant method so far. We present ForeTiS, a comprehensive and open source Python framework that allows rigorous training, comparison, and analysis of state-of-the-art time series forecasting approaches. Our framework includes fully automated yet configurable data preprocessing and feature engineering. In addition, we use advanced Bayesian optimization for automatic hyperparameter search. ForeTiS is easy to use, even for non-programmers, requiring only a single line of code to apply state-of-the-art time series forecasting. Various prediction models, ranging from classical forecasting approaches to machine learning techniques and deep learning architectures, are already integrated. More importantly, as a key benefit for researchers aiming to develop new forecasting models, ForeTiS is designed to allow for rapid integration and fair benchmarking in a reliable framework. Thus, we provide a powerful framework for both end users and forecasting experts. Availability: ForeTiS is available at https://github.com/grimmlab/ForeTiS. We provide a setup using Docker, as well as a Python package at https://pypi.org/project/ForeTiS/. Extensive online documentation with hands-on tutorials and videos can be found at https://foretis.readthedocs.io.
Prof. Dr. Florian Haselbeck,
Maura John,
Prof. Dr. Dominik Grimm
Summary: Predicting complex traits from genotypic information is a major challenge in various biological domains. With easyPheno, we present a comprehensive Python framework enabling the rigorous training, comparison, and analysis of phenotype predictions for a variety of different models, ranging from common genomic selection approaches to classical machine learning and modern deep learning-based techniques. Our framework is easy to use, even for non-programmers, and includes an automatic hyperparameter search using state-of-the-art Bayesian optimization. Moreover, easyPheno provides various benefits for bioinformaticians developing new prediction models: it makes it possible to quickly integrate novel models and functionalities in a reliable framework and to benchmark them against various integrated prediction models in a comparable setup. In addition, the framework allows the assessment of newly developed prediction models under pre-defined settings using simulated data. We provide detailed documentation with various hands-on tutorials and videos explaining the usage of easyPheno to novice users. Availability and Implementation: easyPheno is publicly available at https://github.com/grimmlab/easyPheno and can be easily installed as a Python package via https://pypi.org/project/easypheno/ or using Docker. Supplementary information: Comprehensive documentation, including various tutorials complemented with videos, can be found at https://easypheno.readthedocs.io/. In addition, we provide examples of how to use easyPheno with real and simulated data in the Supplementary.
Natalia Bercovich,
Nikita Genze,
Marco Todesco,
Gregory L. Owens,
Sébastien Légaré,
Kaichi Huang,
Loren H. Rieseberg,
Prof. Dr. Dominik Grimm
Genomic studies often attempt to link natural genetic variation with important phenotypic variation. To succeed, robust and reliable phenotypic data, as well as curated genomic assemblies, are required. Wild sunflowers, originally from North America, are adapted to diverse and often extreme environments and have historically been a widely used model plant system for the study of population genomics, adaptation, and speciation. Moreover, cultivated sunflower, domesticated from a wild relative (Helianthus annuus), is a global oil crop, ranking fourth in production of vegetable oils worldwide. Public availability of data resources, both for the plant research community and for the associated agricultural sector, is extremely valuable. We have created HeliantHOME (http://www.helianthome.org), a curated, public, and interactive database of phenotypes including developmental, structural and environmental ones, obtained from a large collection of both wild and cultivated sunflower individuals. Additionally, the database is enriched with external genomic data and results of genome-wide association studies. Finally, being a community open-source platform, HeliantHOME is expected to expand as new knowledge and resources become available.
Maura John,
Markus J Ankenbrand,
Carolin Artmann,
Jan A Freudenthal,
Arthur Korte,
Prof. Dr. Dominik Grimm
Motivation: Genome-wide Association Studies (GWAS) are an integral tool for studying the architecture of complex genotype and phenotype relationships. Linear Mixed Models (LMMs) are commonly used to detect associations between genetic markers and a trait of interest, while at the same time allowing to account for population structure and cryptic relatedness. Assumptions of LMMs include a normal distribution of the residuals and that the genetic markers are independent and identically distributed - both assumptions are often violated in real data. Permutation-based methods can help to overcome some of these limitations and provide more realistic thresholds for the discovery of true associations. Still, in practice they are rarely implemented due to the high computational complexity. Results: We propose permGWAS, an efficient linear mixed model reformulation based on 4D-tensors that can provide permutation-based significance thresholds. We show that our method outperforms current state-of-the-art LMMs with respect to runtime and that permutation-based thresholds have lower false discovery rates for skewed phenotypes compared to the commonly used Bonferroni threshold. Furthermore, using permGWAS we re-analyzed more than 500 Arabidopsis thaliana phenotypes with 100 permutations each in less than eight days on a single GPU. Our re-analyses suggest that applying a permutation-based threshold can improve and refine the interpretation of GWAS results. Availability: permGWAS is open-source and publicly available on GitHub for download: https://github.com/grimmlab/permGWAS
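The general principle behind a permutation-based significance threshold can be sketched in a few lines. The example below is a deliberately simplified illustration and not the permGWAS implementation: it swaps the linear mixed model for a plain absolute Pearson correlation per marker, ignores population structure, and runs on the CPU with the standard library only. All function and parameter names are hypothetical.

```python
import random
import statistics


def abs_correlation(x, y):
    """Absolute Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant marker or phenotype carries no signal
    return abs(sxy) / (sxx * syy) ** 0.5


def permutation_threshold(markers, phenotype, n_perm=100, alpha=0.05, seed=0):
    """Empirical (1 - alpha) quantile of the maximum statistic under permutation.

    Each permutation shuffles the phenotype, which breaks any true
    marker-trait association, and records the largest statistic seen
    across all markers.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        shuffled = list(phenotype)
        rng.shuffle(shuffled)
        maxima.append(max(abs_correlation(m, shuffled) for m in markers))
    maxima.sort()
    index = min(n_perm - 1, int((1 - alpha) * n_perm))
    return maxima[index]
```

A marker would then be called significant if its statistic on the unpermuted phenotype exceeds the returned threshold. Because the threshold is derived from the data themselves, it adapts to skewed phenotype distributions, which is the property the abstract contrasts with the Bonferroni threshold.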
Richa Bharti,
Daniel Siebert,
Bastian Blombach,
Prof. Dr. Dominik Grimm
Transcriptional-translational coupling is accepted to be a fundamental mechanism of gene expression in prokaryotes and therefore has been analyzed in detail. However, the underlying genomic architecture of the expression machinery has not been well investigated so far. In this study, we established a bioinformatics pipeline to systematically investigate >1800 bacterial genomes for the abundance of transcriptional and translational associated genes clustered in distinct gene cassettes. We identified three highly frequent cassettes containing transcriptional and translational genes, i.e. rplk-nusG (gene cassette 1; in 553 genomes), rpoA-rplQ-rpsD-rpsK-rpsM (gene cassette 2; in 656 genomes) and nusA-infB (gene cassette 3; in 877 genomes). Interestingly, each of the three cassettes harbors a gene (nusG, rpsD and nusA) encoding a protein which links transcription and translation in bacteria. The analyses suggest an enrichment of these cassettes in pathogenic bacterial phyla with >70% for cassette 3 (i.e. Neisseria, Salmonella and Escherichia) and >50% for cassette 1 (i.e. Treponema, Prevotella, Leptospira and Fusobacterium) and cassette 2 (i.e. Helicobacter, Campylobacter, Treponema and Prevotella). These insights form the basis to analyze the transcriptional regulatory mechanisms orchestrating transcriptional–translational coupling and might open novel avenues for future biotechnological approaches.
Indrajit Nanda,
Sarah K Schröder,
Claus Steinlein,
Thomas Haaf,
Eva M. Buhl,
Prof. Dr. Dominik Grimm,
Ralf Weiskirchen
Hepatic stellate cells (HSCs) are also known as lipocytes, fat-storing cells, perisinusoidal cells, or Ito cells. These liver-specific mesenchymal cells represent about 5% to 8% of all liver cells, playing a key role in maintaining the microenvironment of the hepatic sinusoid. Upon chronic liver injury or in primary culture, these cells become activated and transdifferentiate into a contractile phenotype, i.e., the myofibroblast, capable of producing and secreting large quantities of extracellular matrix compounds. Based on their central role in the initiation and progression of chronic liver diseases, cultured HSCs are valuable in vitro tools to study molecular and cellular aspects of liver diseases. However, the isolation of these cells requires special equipment, trained personnel, and in some cases needs approval from respective authorities. To overcome these limitations, several immortalized HSC lines were established. One of these cell lines is CFSC, which was originally established from cirrhotic rat livers induced by carbon tetrachloride. First introduced in 1991, this cell line and derivatives thereof (i.e., CFSC-2G, CFSC-3H, CFSC-5H, and CFSC-8B) are now used in many laboratories as an established in vitro HSC model. We here describe molecular features that are suitable for cell authentication. Importantly, chromosome banding and multicolor spectral karyotyping (SKY) analysis demonstrate that the CFSC-2G genome has accumulated extensive chromosome rearrangements and most chromosomes exist in multiple copies producing a pseudo-triploid karyotype. Furthermore, our study documents a defined short tandem repeat (STR) profile including 31 species-specific markers, and a list of genes expressed in CFSC-2G established by bulk mRNA next-generation sequencing (NGS).
Nikita Genze,
Raymond Ajekwe,
Zeynep Güreli,
Prof. Dr. Florian Haselbeck,
Michael Grieb,
Prof. Dr. Dominik Grimm
Weeds are undesired plants in agricultural fields that affect crop yield and quality by competing for nutrients, water, sunlight and space. For centuries, farmers have used several strategies and resources to remove weeds. The use of herbicide is still the most common control strategy. To reduce the amount of herbicide and impact caused by uniform spraying, site-specific weed management (SSWM) through variable rate herbicide application and mechanical weed control have long been recommended. To implement such precise strategies, accurate detection and classification of weeds in crop fields is a crucial first step. Due to the phenotypic similarity between some weeds and crops as well as changing weather conditions, it is challenging to design an automated system for general weed detection. For efficiency, unmanned aerial vehicles (UAV) are commonly used for image capturing. However, high wind pressure and different drone settings have a severe effect on the capturing quality, which potentially results in degraded images, e.g., due to motion blur. In this paper, we investigate the generalization capabilities of Deep Learning methods for early weed detection in sorghum fields under such challenging capturing conditions. For this purpose, we developed weed segmentation models using three different state-of-the-art Deep Learning architectures in combination with residual neural networks as feature extractors. We further publish a manually annotated and expert-curated UAV imagery dataset for weed detection in sorghum fields under challenging conditions. Our results show that our trained models generalize well regarding the detection of weeds, even for degraded captures due to motion blur. A UNet-like architecture with a ResNet-34 feature extractor achieved an F1-score of over 89% on a hold-out test-set. Further analysis indicates that the trained model performed well in predicting the general plant shape, while most misclassifications appeared at borders of the plants. Beyond that, our approach can detect intra-row weeds without additional information as well as partly occluded plants, in contrast to existing research. All data, including the newly generated and annotated UAV imagery dataset, and code are publicly available on GitHub: https://github.com/grimmlab/UAVWeedSegmentation and Mendeley Data: https://doi.org/10.17632/4hh45vkp38.3
Maura John,
Prof. Dr. Florian Haselbeck,
Rupashree Dass,
Christoph Malisi,
Christian Dreischer,
Sebastian J Schultheiss,
Prof. Dr. Dominik Grimm
Genomic selection is an integral tool for breeders to accurately select plants directly from genotype data leading to faster and more resource-efficient breeding programs. Several prediction methods have been established in the last few years. These range from classical linear mixed models to complex non-linear machine learning approaches, such as Support Vector Regression, and modern deep learning-based architectures. Many of these methods have been extensively evaluated on different crop species with varying outcomes. In this work, our aim is to systematically compare twelve different phenotype prediction models, ranging from basic genomic selection methods to more advanced deep learning-based techniques. More importantly, we assess the performance of these models on simulated phenotype data as well as on real-world data from Arabidopsis thaliana and two breeding datasets from soy and corn. The synthetic phenotypic data allow us to analyze all prediction models and especially the selected markers under controlled and predefined settings. We show that Bayes B and linear regression models with sparsity constraints perform best under different simulation settings with respect to explained variance. Further, we can confirm results from other studies that there is no superiority of more complex neural network-based architectures for phenotype prediction compared to well established methods. However, on real-world data, for which several prediction models yield comparable results with slight advantages for Elastic Net, this picture is less clear, suggesting that there is a lot of room for future research.
Contributions to monographs, edited volumes, and publication series
Genome-wide association studies (GWAS) are a powerful tool to elucidate the genotype–phenotype map. Although GWAS are usually used to assess simple univariate associations between genetic markers and traits of interest, it is also possible to infer the underlying genetic architecture and to predict gene regulatory interactions. In this chapter, we describe the latest methods and tools to perform GWAS by calculating permutation-based significance thresholds. For this purpose, we first provide guidelines on univariate GWAS analyses that are extended in the second part of this chapter to more complex models that enable the inference of gene regulatory networks and how these networks vary.
Contributions to scientific conferences and meetings
Jonathan Pirnay,
Quirin Göttl,
Jakob Burger,
Prof. Dr. Dominik Grimm
AlphaZero-type algorithms may stop improving on single-player tasks in case the value network guiding the tree search is unable to approximate the outcome of an episode sufficiently well. One technique to address this problem is transforming the single-player task through self-competition. The main idea is to compute a scalar baseline from the agent’s historical performances and to reshape an episode’s reward into a binary output, indicating whether the baseline has been exceeded or not. However, this baseline only carries limited information for the agent about how to improve. We leverage the idea of self-competition and directly incorporate a historical policy into the planning process instead of its scalar performance. Based on the recently introduced Gumbel AlphaZero (GAZ), we propose our algorithm GAZ ‘Play-to-Plan’ (GAZ PTP), in which the agent learns to find strong trajectories by planning against possible strategies of its past self. We show the effectiveness of our approach in two well-known combinatorial optimization problems, the Traveling Salesman Problem and the Job-Shop Scheduling Problem. With only half of the simulation budget for search, GAZ PTP consistently outperforms all selected single-player variants of GAZ.
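The scalar-baseline form of self-competition described above, which GAZ PTP moves beyond, can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: the class name, the sliding-window mean as the baseline, the +1/-1 encoding, and the treatment of the first episode are all hypothetical choices, and the task is assumed to be a maximization problem.

```python
from collections import deque
from statistics import fmean


class SelfCompetitionBaseline:
    """Reshape episode returns into binary rewards against past performance."""

    def __init__(self, window: int = 50):
        # Assumed baseline: mean return over the most recent episodes.
        self.history = deque(maxlen=window)

    def reshape(self, episode_return: float) -> int:
        """Return +1 if the episode beat the baseline, otherwise -1."""
        if self.history:
            reward = 1 if episode_return > fmean(self.history) else -1
        else:
            reward = 1  # assumed: the first episode counts as a win
        self.history.append(episode_return)
        return reward
```

The binary signal tells the agent only whether it did better than before, not by how much or why, which is the limitation the abstract points out when motivating planning against a historical policy instead.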
Jan D Hüwel,
Prof. Dr. Florian Haselbeck,
Prof. Dr. Dominik Grimm,
Christian Beecks
One of the major challenges in time series analysis is changing data distributions, especially when processing data streams. To ensure an up-to-date model delivering useful predictions at all times, model reconfigurations are required to adapt to such evolving streams. For Gaussian processes, this might require the adaptation of the internal kernel expression. In this paper, we present dynamically self-adjusting Gaussian processes by introducing Event Triggered Kernel Adjustments in Gaussian process modelling (ETKA), a novel data stream modelling algorithm that can handle evolving and changing data distributions. To this end, we enhance the recently introduced Adjusting Kernel Search with a novel online change point detection method. Our experiments on simulated data with varying change point patterns suggest a broad applicability of ETKA. On real-world data, ETKA outperforms comparison partners that differ regarding the model adjustment and its refitting trigger in nine and ten out of 14 cases, respectively. These results confirm ETKA's ability to enable a more accurate and, in some settings, also more efficient data stream processing via Gaussian processes. Code availability: https://github.com/JanHuewel/ETKA
Quirin Göttl,
Prof. Dr. Dominik Grimm,
Prof. Dr.-Ing. Jakob Burger
The present work uses reinforcement learning (RL) for automated flowsheet synthesis. The task of synthesizing a flowsheet is reformulated into a two-player game, in which an agent learns by self-play without prior knowledge. The hierarchical RL scheme developed in our previous work (Göttl et al., 2021b) is coupled with an improved training process. The training process is analyzed in detail using the synthesis of ethyl tert-butyl ether (ETBE) as an example. This analysis uncovers how the agent’s evolution is driven by the two-player setup.
Talks
Jan D Hüwel,
Prof. Dr. Florian Haselbeck,
Prof. Dr. Dominik Grimm,
Christian Beecks
One of the major challenges in time series analysis is changing data distributions, especially when processing data streams. To ensure an up-to-date model delivering useful predictions at all times, model reconfigurations are required to adapt to such evolving streams. For Gaussian processes, this might require the adaptation of the internal kernel expression. In this paper, we present dynamically self-adjusting Gaussian processes by introducing Event Triggered Kernel Adjustments in Gaussian process modelling (ETKA), a novel data stream modelling algorithm that can handle evolving and changing data distributions. To this end, we enhance the recently introduced Adjusting Kernel Search with a novel online change point detection method. Our experiments on simulated data with varying change point patterns suggest a broad applicability of ETKA. On real-world data, ETKA outperforms comparison partners that differ regarding the model adjustment and its refitting trigger in nine and ten out of 14 cases, respectively. These results confirm ETKA's ability to enable a more accurate and, in some settings, also more efficient data stream processing via Gaussian processes. Code availability: https://github.com/JanHuewel/ETKA
Prof. Dr. Dominik Grimm
Towards a better understanding of the genetic architecture of complex traits (2022) Keynote @TüBMI 2022, Tübinger Bioinformatics and Medical Informatics Days 2022.
The goal is to develop intelligent, adaptable, freely available, and easy-to-use methods for AI- and image-based phenotyping of various leaf diseases for an improved assessment …
The goal is to use drone photos of sorghum and maize to develop and validate maps of the spatial distribution pattern of weed infestation, as a basis for site-specific mechanical …