Tobias Jakobi
- Assistant Professor
- Member of the Graduate Faculty
- Assistant Professor, BIO5 Institute
- (602) 827-2078
- COLLEGE OF MEDICINE PHX, Rm. 2225
- PHOENIX, AZ 85004-2230
- tjakobi@arizona.edu
Biography
I am a bioinformatician trained with an emphasis on the interconnection of wet lab and computational research. My academic and research training included eukaryotic biology, genome research, and wet lab work, in addition to comprehensive training in theoretical and applied bioinformatics, which allows fluent communication between wet lab and bioinformatics teams. My work has focused on the analysis of genomics and transcriptomics data to answer questions in industrial biotechnology, aging, and cardiovascular research. Furthermore, I was part of the team that developed a highly efficient DNA compression and indexing algorithm during an internship at Illumina, Cambridge, UK. As a postdoctoral fellow at the Max Planck Institute for Biology of Ageing in Cologne, Germany, I continued my work on developing reproducible RNA-seq bioinformatics workflows and became interested in circular RNAs (circRNAs), an exciting species of covalently closed RNA molecules that are expressed in diverse cell types but whose functions, potential involvement in disease, and regulation remain largely unknown. I moved to Heidelberg University Hospital, Heidelberg, Germany, in 2015 to help establish a bioinformatics research group focusing on computational cardiology and continued my work as a postdoctoral fellow. I was responsible for the design, setup, and maintenance of a high-performance computer cluster in the Department of Cardiology that is used by the bioinformatics group for collaborations with cardiovascular researchers. I contributed my expertise in transcriptomics data analysis and data visualization to several cardiac-centric studies, followed up on the functions and effects of circRNAs in the cardiac context, and published the first study of circRNAs in the murine heart. Moreover, while in Heidelberg, I also started developing software and analysis methods for single cell sequencing projects in collaboration with other research groups.
In March 2021, I was recruited as a tenure-track faculty member to the Department of Internal Medicine and the new Translational Cardiovascular Research Center (TCRC) at The University of Arizona College of Medicine - Phoenix, where I established my own research lab. My independent research program develops state-of-the-art computational approaches to answer cardiovascular questions, with a specific interest in the dynamics of circRNAs and RNA biology in health and disease.
Degrees
- Ph.D. Bioinformatics & Genome Research
- Bielefeld University, Bielefeld, Germany
- Bioinformatic methods for eukaryotic RNA-Seq-based promoter identification
- M.S. Bioinformatics & Genome Research
- Bielefeld University, Bielefeld, Germany
- Semiglobal Alignment of Short Reads using CUDA and Needleman-Wunsch
- B.S. Bioinformatics & Genome Research
- Bielefeld University, Bielefeld, Germany
- Adaption of the IGetDB data warehouse for the TRUNCATULIX project
Work Experience
- Heidelberg University Hospital (2015 - 2021)
- Max Planck Institute for Biology of Ageing (2014 - 2015)
- UK Computational Biology Group Illumina Ltd (2011)
- Bielefeld University (2010 - 2013)
Awards
- Career Development Award
- University of Arizona Health Sciences, Spring 2023
- New Investigator Award
- Arizona Biomedical Research Centre, Spring 2023
- Rapid-Turnaround Seed Grant
- BIO5 Institute, Fall 2022
- Best Paper Award
- International Conference on Intelligent Systems for Molecular Biology (ISMB), Fall 2012
Interests
Teaching
- Foundations of bioinformatics
- Data science
- Introduction to high performance computing for bioinformatics
- RNA Biology
Research
- Bioinformatics
- Circular RNAs (circRNAs)
- Computational RNA biology
- RNA editing
- Systems cardiology
- High performance computing
- Compute cluster infrastructure
Courses
2024-25 Courses
- Independent Study, CTS 699 (Fall 2024)
2023-24 Courses
- Introduction to Bioinformatics, CTS 505 (Fall 2023)
Scholarly Contributions
Chapters
- Jakobi, T. (2024). State-of-the-Art Circular RNA Analytics Using the Circtools Software Suite. In Methods in Molecular Biology (pp. 23-46). Circular RNAs (circRNAs) are a class of RNA molecules that have been discovered relatively recently and have been found to be widely expressed in eukaryotic cells. Unlike canonical linear RNA molecules, circRNAs form a covalently closed continuous loop structure without a 5' or 3' end. They are generated by a process called back-splicing, in which a downstream splice donor site is joined to an upstream splice acceptor site. CircRNAs have been found to play important roles in various biological processes, including gene regulation, alternative splicing, and protein translation. They can act as sponges for microRNAs or RNA-binding proteins and can also encode peptides or proteins. Additionally, circRNAs have been implicated in several diseases, including cancer, neurological disorders, and cardiovascular diseases. This protocol provides all necessary steps to detect and analyze circRNAs in silico from RNA sequencing data using the circtools circRNA analytics software suite. The protocol starts from raw sequencing data with circRNA detection via back-splice events and includes statistical testing of circRNAs as well as primer design for follow-up wet lab experiments.
- Worpenberg, L., Jakobi, T., Dieterich, C., & Roignant, J. Y. (2019). Identification of Methylated Transcripts Using the TRIBE Approach. In Methods in Molecular Biology (pp. 89-106). m6A is the most abundant internal modification on mRNA. Recent improvements in high-throughput sequencing techniques enable its detection at the transcriptome level, even at nucleotide resolution. However, most current techniques require large amounts of starting material to detect the modification. Here, we describe a technique complementary to standard meRIP-seq/miCLIP-seq approaches to identify methylated RNA using a low amount of material. We believe this approach can be applied in vivo to identify methylated targets in specific tissues or subpopulations of cells.
- Jakobi, T., & Dieterich, C. (2018). Deep Computational Circular RNA Analytics from RNA-seq Data. In Methods in Molecular Biology (pp. 9-25). Circular RNAs (circRNAs) were first described as "scrambled exons" in the 1990s. CircRNAs originate from back splicing or exon skipping of linear RNA templates and have continuously gained attention in recent years due to the availability of high-throughput whole-transcriptome sequencing methods. Numerous manuscripts describe thousands of circRNAs throughout uni- and multicellular eukaryote species and demonstrate that they are conserved, stable, and abundant in specific tissues or conditions. This manuscript provides a walk-through of our bioinformatics toolbox, which covers all aspects of in silico circRNA analysis, starting from raw sequencing data and back-splicing junction discovery to circRNA quantitation and reconstruction of the internal circRNA structure.
Journals/Publications
- Hofmann, C., Aghajani, M., Alcock, C. D., Blackwood, E. A., Sandmann, C., Herzog, N., Groß, J., Plate, L., Wiseman, R. L., Kaufman, R. J., Katus, H. A., Jakobi, T., Völkers, M., Glembotski, C. C., & Doroudgar, S. (2024). ATF6 protects against protein misfolding during cardiac hypertrophy. Journal of molecular and cellular cardiology, 189, 12-24. Cardiomyocytes activate the unfolded protein response (UPR) transcription factor ATF6 during pressure overload-induced hypertrophic growth. The UPR is thought to increase ER protein folding capacity and maintain proteostasis. ATF6 deficiency during pressure overload leads to heart failure, suggesting that ATF6 protects against myocardial dysfunction by preventing protein misfolding. However, conclusive evidence that ATF6 prevents toxic protein misfolding during cardiac hypertrophy is still pending. Here, we found that activation of the UPR, including ATF6, is a common response to pathological cardiac hypertrophy in mice. ATF6 KO mice failed to induce sufficient levels of UPR target genes in response to chronic isoproterenol infusion or transverse aortic constriction (TAC), resulting in impaired cardiac growth. To investigate the effects of ATF6 on protein folding, the accumulation of poly-ubiquitinated proteins as well as soluble amyloid oligomers were directly quantified in hypertrophied hearts of WT and ATF6 KO mice. Whereas only low levels of protein misfolding were observed in WT hearts after TAC, ATF6 KO mice accumulated increased quantities of misfolded protein, which was associated with impaired myocardial function. Collectively, the data suggest that ATF6 plays a critical adaptive role during cardiac hypertrophy by protecting against protein misfolding.
- Hofmann, C., Serafin, A., Schwerdt, O. M., Fischer, J., Sicklinger, F., Younesi, F. S., Byrne, N. J., Meyer, I. S., Malovrh, E., Sandmann, C., Jürgensen, L., Kamuf-Schenk, V., Stroh, C., Löwenthal, Z., Finke, D., Boileau, E., Beisaw, A., Bugger, H., Rettel, M., Stein, F., et al. (2024). Transient Inhibition of Translation Improves Cardiac Function After Ischemia/Reperfusion by Attenuating the Inflammatory Response. Circulation. The myocardium adapts to ischemia/reperfusion (I/R) by changes in gene expression, determining the cardiac response to reperfusion. mRNA translation is a key component of gene expression. It is largely unknown how regulation of mRNA translation contributes to cardiac gene expression and inflammation in response to reperfusion and whether it can be targeted to mitigate I/R injury.
- Kmietczyk, V., Gupta, P., Varma, E., Hartl, S., Furkel, J., Konstandin, M., Marx, A., Loewenthal, Z., Kamuf-Schenk, V., Stroh, C., Gorska, A., Martin-Garrido, A., Heineke, J., Jakobi, T., Frey, N., Jürgensen, L., Oelschläger, J., & Völkers, M. (2023). Ythdf2 regulates cardiac remodeling through its mRNA target transcripts. J Mol Cell Cardiol., 181. doi:10.1016/j.yjmcc.2023.06.001. m6A mRNA methylation controls cardiomyocyte function, and increased overall m6A levels are a stereotypical finding in heart failure independent of the underlying etiology. However, it is largely unknown how the information is read by m6A reader proteins in heart failure. Here we show that the m6A reader protein Ythdf2 controls cardiac function and identify a novel mechanism by which reader proteins control gene expression and cardiac function. Deletion of Ythdf2 in cardiomyocytes in vivo leads to mild cardiac hypertrophy, reduced heart function, and increased fibrosis during pressure overload as well as during aging. Similarly, in vitro the knockdown of Ythdf2 results in cardiomyocyte growth and remodeling. Mechanistically, we identified the eukaryotic elongation factor 2 as post-transcriptionally regulated by Ythdf2 using cell type specific Ribo-seq data. Our study expands our understanding of the regulatory functions of m6A methylation in cardiomyocytes and how cardiac function is controlled by the m6A reader protein Ythdf2.
- Vromman, M., Anckaert, J., Bortoluzzi, S., Buratin, A., Chen, C. Y., Chu, Q., Chuang, T. J., Dehghannasiri, R., Dieterich, C., Dong, X., Flicek, P., Gaffo, E., Gu, W., He, C., Hoffmann, S., Izuogu, O., Jackson, M. S., Jakobi, T., Lai, E. C., Nuytens, J., et al. (2023). Large-scale benchmarking of circRNA detection tools reveals large differences in sensitivity but not in precision. Nature methods, 20(8), 1159-1169. The detection of circular RNA molecules (circRNAs) is typically based on short-read RNA sequencing data processed using computational tools. Numerous such tools have been developed, but a systematic comparison with orthogonal validation is missing. Here, we set up a circRNA detection tool benchmarking study, in which 16 tools detected more than 315,000 unique circRNAs in three deeply sequenced human cell types. Next, 1,516 predicted circRNAs were validated using three orthogonal methods. Generally, tool-specific precision is high and similar (median of 98.8%, 96.3% and 95.5% for qPCR, RNase R and amplicon sequencing, respectively) whereas the sensitivity and number of predicted circRNAs (ranging from 1,372 to 58,032) are the most significant differentiators. Of note, precision values are lower when evaluating low-abundance circRNAs. We also show that the tools can be used complementarily to increase detection sensitivity. Finally, we offer recommendations for future circRNA detection and validation.
- Jakobi, T., Groß, J., Cyganek, L., & Doroudgar, S. (2022). Transcriptional Effects of Candidate COVID-19 Treatments on Cardiac Myocytes. Frontiers in cardiovascular medicine, 9, 844441. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) disease (COVID-19) has emerged as a major cause of morbidity and mortality worldwide, placing unprecedented pressure on healthcare. Cardiomyopathy is described in patients with severe COVID-19 and increasing evidence suggests that cardiovascular involvement portends a high mortality. To facilitate fast development of antiviral interventions, drugs initially developed to treat other diseases are currently being repurposed as COVID-19 treatments. While it has been shown that SARS-CoV-2 invades cells through the angiotensin-converting enzyme 2 receptor (ACE2), the effect of drugs currently repurposed to treat COVID-19 on the heart requires further investigation.
- Jakobi, T., Voelkers, M., Frey, N., Gorska, A., Stroh, C., Juergensen, L., Kamuf-Schenk, V., Loewenthal, Z., Gupta, P., Marx, A., Konstandin, M. H., Hartl, S., Varma, E., Oelschlaeger, J., & Kmietczyk, V. (2022). Ythdf2 regulates cardiac remodeling through its m6A-mRNA target transcripts. Cold Spring Harbor Laboratory - bioRxiv. doi:10.1101/2022.12.16.520765. m6A mRNA methylation controls cardiomyocyte function, and increased overall m6A levels are a stereotypical finding in heart failure independent of the underlying etiology. However, it is largely unknown how the information is read by m6A reader proteins in heart failure. Here we show that the m6A reader protein Ythdf2 controls cardiac function and identify a novel mechanism by which reader proteins control gene expression and cardiac function. Deletion of Ythdf2 in cardiomyocytes in vivo leads to cardiac hypertrophy, reduced heart function, and increased fibrosis during pressure overload as well as during aging. Similarly, in vitro the knockdown of Ythdf2 results in cardiomyocyte growth and remodeling. Mechanistically, we identified the eukaryotic elongation factor 2 as a major target of Ythdf2 using cell type specific Ribo-seq data. Our study expands our understanding of the regulatory functions of m6A methylation in cardiomyocytes and how cardiac function is controlled by the m6A reader protein Ythdf2.
- Wagner, J. U., Bojkova, D., Shumliakivska, M., Luxán, G., Nicin, L., Aslan, G. S., Milting, H., Kandler, J. D., Dendorfer, A., Heumueller, A. W., Fleming, I., Bibli, S. I., Jakobi, T., Dieterich, C., Zeiher, A. M., Ciesek, S., Cinatl, J., & Dimmeler, S. (2021). Increased susceptibility of human endothelial cells to infections by SARS-CoV-2 variants. Basic research in cardiology, 116(1), 42. Coronavirus disease 2019 (COVID-19) spawned a global health crisis in late 2019 and is caused by the novel coronavirus SARS-CoV-2. SARS-CoV-2 infection can lead to elevated markers of endothelial dysfunction associated with higher risk of mortality. It is unclear whether endothelial dysfunction is caused by direct infection of endothelial cells or is mainly secondary to inflammation. Here, we investigate whether different types of endothelial cells are susceptible to SARS-CoV-2. Human endothelial cells from different vascular beds including umbilical vein endothelial cells, coronary artery endothelial cells (HCAEC), cardiac and lung microvascular endothelial cells, or pulmonary arterial cells were inoculated in vitro with SARS-CoV-2. Viral spike protein was only detected in HCAECs after SARS-CoV-2 infection but not in the other endothelial cells tested. Consistently, only HCAEC expressed the SARS-CoV-2 receptor angiotensin-converting enzyme 2 (ACE2), required for virus infection. Infection with the SARS-CoV-2 variants B.1.1.7, B.1.351, and P.2 resulted in significantly higher levels of viral spike protein. Despite this, no intracellular double-stranded viral RNA was detected and the supernatant did not contain infectious virus. Analysis of the cellular distribution of the spike protein revealed that it co-localized with endosomal calnexin. SARS-CoV-2 infection did induce the ER stress gene EDEM1, which is responsible for clearance of misfolded proteins from the ER.
Whereas the wild type of SARS-CoV-2 did not induce cytotoxic or pro-inflammatory effects, the variant B.1.1.7 reduced the HCAEC cell number. Of the different tested endothelial cells, HCAECs showed highest viral uptake but did not promote virus replication. Effects on cell number were only observed after infection with the variant B.1.1.7, suggesting that endothelial protection may be particularly important in patients infected with this variant.
- Blackwood, E. A., Thuerauf, D. J., Stastna, M., Stephens, H., Sand, Z., Pentoney, A., Azizi, K., Jakobi, T., Van Eyk, J. E., Katus, H. A., Glembotski, C. C., & Doroudgar, S. (2020). Proteomic analysis of the cardiac myocyte secretome reveals extracellular protective functions for the ER stress response. Journal of molecular and cellular cardiology, 143, 132-144. The effects of ER stress on protein secretion by cardiac myocytes are not well understood. In this study, the ER stressor thapsigargin (TG), which depletes ER calcium, induced death of cultured neonatal rat ventricular myocytes (NRVMs) in high media volume but fostered protection in low media volume. In contrast, another ER stressor, tunicamycin (TM), a protein glycosylation inhibitor, induced NRVM death in all media volumes, suggesting that protective proteins were secreted in response to TG but not TM. Proteomic analyses of TG- and TM-conditioned media showed that the secretion of most proteins was inhibited by TG and TM; however, secretion of several ER-resident proteins, including GRP78, was increased by TG but not TM. Simulated ischemia, which decreases ER/SR calcium, also increased secretion of these proteins. Mechanistically, secreted GRP78 was shown to enhance survival of NRVMs by collaborating with a cell-surface protein, CRIPTO, to activate protective AKT signaling and to inhibit death-promoting SMAD2 signaling. Thus, proteins secreted during ER stress mediated by ER calcium depletion can enhance cardiac myocyte viability.
- Jakobi, T., Siede, D., Eschenbach, J., Heumüller, A. W., Busch, M., Nietsch, R., Meder, B., Most, P., Dimmeler, S., Backs, J., Katus, H. A., & Dieterich, C. (2020). Deep Characterization of Circular RNAs from Human Cardiovascular Cell Models and Cardiac Tissue. Cells, 9(7). For decades, cardiovascular disease (CVD) has been the leading cause of death throughout most developed countries. Several studies relate RNA splicing, and more recently also circular RNAs (circRNAs), to CVD. CircRNAs originate from linear transcripts and have been shown to exhibit tissue-specific expression profiles. Here, we present an in-depth analysis of sequence, structure, modification, and cardiac circRNA interactions. We used human induced pluripotent stem cell-derived cardiac myocytes (hiPSC-CMs), human healthy and diseased (ischemic cardiomyopathy, dilated cardiomyopathy) cardiac tissue, and human umbilical vein endothelial cells (HUVECs) to profile circRNAs. We identified shared circRNAs across all samples, as well as model-specific circRNA signatures. Based on these circRNAs, we identified 63 positionally conserved and expressed circRNAs in human, pig, and mouse hearts. Furthermore, we found that the sequence of circRNAs can deviate from the sequence derived from the genome sequence, an important factor in assessing potential functions. Integration of additional data yielded evidence for m6A-methylation of circRNAs, potentially linked to translation, as well as circRNAs overlapping with potential Argonaute 2 binding sites, indicating potential association with the RISC complex. Moreover, we describe, for the first time in cardiac model systems, a subclass of circRNAs containing the start codon of their primary transcript (AUG circRNAs) and observe an enrichment for m6A-methylation for AUG circRNAs.
- Kapoor, U., Licht, K., Amman, F., Jakobi, T., Martin, D., Dieterich, C., & Jantsch, M. F. (2020). ADAR-deficiency perturbs the global splicing landscape in mouse tissues. Genome research, 30(8), 1107-1118. Adenosine-to-inosine RNA editing and pre-mRNA splicing largely occur cotranscriptionally and influence each other. Here, we use mice deficient in either one of the two editing enzymes ADAR (ADAR1) or ADARB1 (ADAR2) to determine the transcriptome-wide impact of RNA editing on splicing across different tissues. We find that ADAR has a 100× higher impact on splicing than ADARB1, although both enzymes target a similar number of substrates with a large common overlap. Consistently, differentially spliced regions frequently harbor ADAR editing sites. Moreover, catalytically dead ADAR also impacts splicing, demonstrating that RNA binding of ADAR affects splicing. In contrast, ADARB1 editing sites are found enriched 5' of differentially spliced regions. Several of these ADARB1-mediated editing events change splice consensus sequences, therefore strongly influencing splicing of some mRNAs. A significant overlap between differentially edited and differentially spliced sites suggests evolutionary selection toward splicing being regulated by editing in a tissue-specific manner.
- Bischof, L., Haurat, M., Hoffmann, L., Albersmeier, A., Wolf, J., Neu, A., Pham, T., Albaum, S., Jakobi, T., Schouten, S., Neumann-Schaal, M., Wright, P., Kalinowski, J., Siebers, B., & Albers, S. (2019). Early response of Sulfolobus acidocaldarius to nutrient limitation. Front. Microbiol., 10(JAN). doi:10.3389/fmicb.2018.03201. In natural environments microorganisms encounter extreme changes in temperature, pH, osmolarities and nutrient availability. The stress response of many bacterial species has been described in detail; however, knowledge in Archaea is limited. Here, we describe the cellular response triggered by nutrient limitation in the thermoacidophilic crenarchaeon Sulfolobus acidocaldarius. We measured changes in gene transcription and protein abundance upon nutrient depletion up to 4 h after initiation of nutrient depletion. Transcript levels of 1118 of 2223 protein coding genes and abundance of approximately 500 proteins with functions in almost all cellular processes were affected by nutrient depletion. Our study reveals a significant rerouting of the metabolism with respect to degradation of internal as well as extracellular-bound organic carbon and degradation of proteins. Moreover, changes in membrane lipid composition were observed in order to access alternative sources of energy and to maintain pH homeostasis. At transcript level, the cellular response to nutrient depletion in S. acidocaldarius seems to be controlled by the general transcription factors TFB2 and TFEβ. In addition, ribosome biogenesis is reduced, while an increased protein degradation is accompanied by a loss of protein quality control. This study provides first insights into the early cellular response of Sulfolobus to organic carbon and organic nitrogen depletion.
- Blackwood, E. A., Hofmann, C., Santo Domingo, M., Bilal, A. S., Sarakki, A., Stauffer, W., Arrieta, A., Thuerauf, D. J., Kolkhorst, F. W., Müller, O. J., Jakobi, T., Dieterich, C., Katus, H. A., Doroudgar, S., & Glembotski, C. C. (2019). ATF6 Regulates Cardiac Hypertrophy by Transcriptional Induction of the mTORC1 Activator, Rheb. Circulation research, 124(1), 79-93. Endoplasmic reticulum (ER) stress dysregulates ER proteostasis, which activates the transcription factor, ATF6 (activating transcription factor 6α), an inducer of genes that enhance protein folding and restore ER proteostasis. Because of increased protein synthesis, it is possible that protein folding and ER proteostasis are challenged during cardiac myocyte growth. However, it is not known whether ATF6 is activated, and if so, what its function is during hypertrophic growth of cardiac myocytes.
- Doroudgar, S., Hofmann, C., Boileau, E., Malone, B., Riechert, E., Gorska, A. A., Jakobi, T., Sandmann, C., Jürgensen, L., Kmietczyk, V., Malovrh, E., Burghaus, J., Rettel, M., Stein, F., Younesi, F., Friedrich, U. A., Mauz, V., Backs, J., Kramer, G., Katus, H. A., et al. (2019). Monitoring Cell-Type-Specific Gene Expression Using Ribosome Profiling In Vivo During Cardiac Hemodynamic Stress. Circulation research, 125(4), 431-448. Gene expression profiles have been mainly determined by analysis of transcript abundance. However, these analyses cannot capture posttranscriptional gene expression control at the level of translation, which is a key step in the regulation of gene expression, as evidenced by the fact that transcript levels often poorly correlate with protein levels. Furthermore, genome-wide transcript profiling of distinct cell types is challenging due to the fact that lysates from tissues always represent a mixture of cells.
- Jakobi, T., Uvarovskii, A., & Dieterich, C. (2019). circtools - a one-stop software solution for circular RNA research. Bioinformatics (Oxford, England), 35(13), 2326-2328. Circular RNAs (circRNAs) originate through back-splicing events from linear primary transcripts, are resistant to exonucleases, are not polyadenylated and have been shown to be highly specific for cell type and developmental stage. CircRNA detection starts from high-throughput sequencing data and is a multi-stage bioinformatics process yielding sets of potential circRNA candidates that require further analyses. While a number of tools for the prediction process already exist, publicly available analysis tools for further characterization are rare. Our work provides researchers with a harmonized workflow that covers different stages of in silico circRNA analyses, from prediction to first functional insights.
- Bischof, L. F., Haurat, M. F., Hoffmann, L., Albersmeier, A., Wolf, J., Neu, A., Pham, T. K., Albaum, S. P., Jakobi, T., Schouten, S., Neumann-Schaal, M., Wright, P. C., Kalinowski, J., Siebers, B., & Albers, S. V. (2018). Early Response of Sulfolobus acidocaldarius to Nutrient Limitation. Frontiers in microbiology, 9, 3201. In natural environments microorganisms encounter extreme changes in temperature, pH, osmolarities and nutrient availability. The stress response of many bacterial species has been described in detail; however, knowledge in Archaea is limited. Here, we describe the cellular response triggered by nutrient limitation in the thermoacidophilic crenarchaeon Sulfolobus acidocaldarius. We measured changes in gene transcription and protein abundance upon nutrient depletion up to 4 h after initiation of nutrient depletion. Transcript levels of 1118 of 2223 protein coding genes and abundance of approximately 500 proteins with functions in almost all cellular processes were affected by nutrient depletion. Our study reveals a significant rerouting of the metabolism with respect to degradation of internal as well as extracellular-bound organic carbon and degradation of proteins. Moreover, changes in membrane lipid composition were observed in order to access alternative sources of energy and to maintain pH homeostasis. At transcript level, the cellular response to nutrient depletion in S. acidocaldarius seems to be controlled by the general transcription factors TFB2 and TFEβ. In addition, ribosome biogenesis is reduced, while an increased protein degradation is accompanied by a loss of protein quality control. This study provides first insights into the early cellular response of Sulfolobus to organic carbon and organic nitrogen depletion.
- Jakobi, T., Czaja-Hasse, L. F., Reinhardt, R., & Dieterich, C. (2016). Profiling and Validation of the Circular RNA Repertoire in Adult Murine Hearts. Genomics, proteomics & bioinformatics, 14(4), 216-23. For several decades, cardiovascular disease has been the leading cause of death throughout all countries. There is a strong genetic component to many disease subtypes (e.g., cardiomyopathy) and we are just beginning to understand the relevant genetic factors. Several studies have related RNA splicing to cardiovascular disease and circular RNAs (circRNAs) are an emerging player. circRNAs, which originate through back-splicing events from primary transcripts, are resistant to exonucleases and typically not polyadenylated. Initial functional studies show clear phenotypic outcomes for selected circRNAs. We provide, for the first time, a comprehensive catalogue of RNase R-resistant circRNA species for the adult murine heart. This work combines state-of-the-art circle sequencing with our novel DCC software to explore the circRNA landscape of heart tissue. Overall, we identified 575 circRNA species that pass a beta-binomial test for enrichment (false discovery rate of 1%) in the exonuclease-treated sequencing sample. Several circRNAs can be directly attributed to host genes that have been previously described as associated with cardiovascular disease. Further studies of these candidate circRNAs may reveal disease-relevant properties or functions of specific circRNAs.
- Langenkämper, D., Jakobi, T., Feld, D., Jelonek, L., Goesmann, A., & Nattkemper, T. W. (2016). Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations. Frontiers in genetics, 7, 5. In recent years, clock rates of modern processors have stagnated while the demand for computing power has continued to grow. This applies particularly to the fields of life sciences and bioinformatics, where new technologies keep on creating rapidly growing piles of raw data with increasing speed. The number of cores per processor increased in an attempt to compensate for slight increments of clock rates. This technological shift demands changes in software development, especially in the field of high performance computing where parallelization techniques are gaining in importance due to the pressing issue of large sized datasets generated by e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples for automatic acceleration of two use cases to show typical problems and gains of transforming a serial application to a parallel one. The paper should aid the reader in deciding on a certain technique for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC). Their performance as well as their applicability for selected use cases is discussed. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. But performance is only superior at a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed.
Automatic optimizers for CPU are mature and usually no additional manual adjustment is required. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures.
- Morgner, J., Ghatak, S., Jakobi, T., Dieterich, C., Aumailley, M., & Wickström, S. A. (2015). Integrin-linked kinase regulates the niche of quiescent epidermal stem cells. Nature communications, 6, 8198. Stem cells reside in specialized niches that are critical for their function. Quiescent hair follicle stem cells (HFSCs) are confined within the bulge niche, but how the molecular composition of the niche regulates stem cell behaviour is poorly understood. Here we show that integrin-linked kinase (ILK) is a key regulator of the bulge extracellular matrix microenvironment, thereby governing the activation and maintenance of HFSCs. ILK mediates deposition of inverse laminin (LN)-332 and LN-511 gradients within the basement membrane (BM) wrapping the hair follicles. The precise BM composition tunes activities of Wnt and transforming growth factor-β pathways and subsequently regulates HFSC activation. Notably, reconstituting an optimal LN microenvironment restores the altered signalling in ILK-deficient cells. Aberrant stem cell activation in ILK-deficient epidermis leads to increased replicative stress, predisposing the tissue to carcinogenesis. Overall, our findings uncover a critical role for the BM niche in regulating stem cell activation and thereby skin homeostasis.
- Jakobi, T., Brinkrolf, K., Tauch, A., Noll, T., Stoye, J., Pühler, A., & Goesmann, A. (2014). Discovery of transcription start sites in the Chinese hamster genome by next-generation RNA sequencing. Journal of biotechnology, 190, 64-75. Chinese hamster ovary (CHO) cell lines are one of the major production tools for monoclonal antibodies, recombinant proteins, and therapeutics. Although many efforts have significantly improved the availability of sequence information for CHO cells in the last years, forthcoming draft genomes still lack the information depth known from the mouse or human genomes. Many genes annotated for CHO cells and the Chinese hamster reference genome still are in silico predictions, only insufficiently verified by biological experiments. The correct annotation of transcription start sites (TSSs) is of special interest for CHO cells, as these directly define the location of the eukaryotic core promoter. Our study aims to elucidate these largely unexplored regions, trying to shed light on promoter landscapes in the Chinese hamster genome. Based on a 5' enriched dual library RNA sequencing approach 6547 TSSs were identified, of which over 90% were assigned to known genes. These TSSs were used to perform extensive promoter studies using a novel, modular bioinformatics pipeline, incorporating analyses of important regulatory elements of the eukaryotic core promoter on per-gene level and on genomic scale.
- Cox, A. J., Bauer, M. J., Jakobi, T., & Rosone, G. (2012). Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform. Bioinformatics (Oxford, England), 28(11), 1415-9. The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets.
- Hackl, M., Jadhav, V., Jakobi, T., Rupp, O., Brinkrolf, K., Goesmann, A., Pühler, A., Noll, T., Borth, N., & Grillari, J. (2012). Computational identification of microRNA gene loci and precursor microRNA sequences in CHO cell lines. Journal of biotechnology, 158(3), 151-5. MicroRNAs (miRNAs) have recently entered Chinese hamster ovary (CHO) cell culture technology, due to their severe impact on the regulation of cellular phenotypes. Applications of miRNAs that are envisioned range from biomarkers of favorable phenotypes to cell engineering targets. These applications, however, require a profound knowledge of miRNA sequences and their genomic organization, which exceeds the currently available information of ~400 conserved mature CHO miRNA sequences. Based on these recently published sequences and two independent CHO-K1 genome assemblies, this publication describes the computational identification of CHO miRNA genomic loci. Using BLAST alignment, 415 previously reported CHO miRNAs were mapped to the reference genomes, and subsequently assigned to a distinct genomic miRNA locus. Sequences of the respective precursor-miRNAs were extracted from both reference genomes, folded in silico to verify correct structures and cross-compared. In the end, 212 genomic loci and pre-miRNA sequences representing 319 expressed mature miRNAs (approximately 50% of miRNAs represented matching pairs of 5' and 3' miRNAs) were submitted to the miRBase miRNA repository. As a proof-of-principle for the usability of the published genomic loci, four likely polycistronic miRNA clusters were chosen for PCR amplification using CHO-K1 and DHFR (-) genomic DNA. Overall, these data on the genomic context of miRNA expression in CHO will simplify the development of tools employing stable overexpression or deletion of miRNAs, allow the identification of miRNA promoters and improve detection methods such as microarrays.
- Becker, J., Hackl, M., Rupp, O., Jakobi, T., Schneider, J., Szczepanowski, R., Bekel, T., Borth, N., Goesmann, A., Grillari, J., Kaltschmidt, C., Noll, T., Pühler, A., Tauch, A., & Brinkrolf, K. (2011). Unraveling the Chinese hamster ovary cell line transcriptome by next-generation sequencing. Journal of biotechnology, 156(3), 227-35. The pyrosequencing technology from 454 Life Sciences and a novel assembly approach for cDNA sequences with the Newbler Assembler were used to achieve a major step forward to unravel the transcriptome of Chinese hamster ovary (CHO) cells. Normalized cDNA libraries originating from several cell lines and diverse culture conditions were sequenced and the resulting 1.84 million reads were assembled into 32,801 contiguous sequences, 29,184 isotigs, and 24,576 isogroups. A taxonomic classification of the isotigs showed that more than 70% of the assembled data is most similar to the transcriptome of Mus musculus, with most of the remaining isotigs being homologous to DNA sequences from Rattus norvegicus. Mapping of the CHO cell line contigs to the mouse transcriptome demonstrated that 9124 mouse transcripts, representing 6701 genes, are covered by more than 95% of their sequence length. Metabolic pathways of the central carbohydrate metabolism and biosynthesis routes of sugars used for protein N-glycosylation were reconstructed from the transcriptome data. All relevant genes representing major steps in the N-glycosylation pathway of CHO cells were detected. The present manuscript represents a data set of assembled and annotated genes for CHO cells that can now be used for a detailed analysis of the molecular functioning of CHO cell lines.
- Blom, J., Jakobi, T., Doppmeier, D., Jaenicke, S., Kalinowski, J., Stoye, J., & Goesmann, A. (2011). Exact and complete short-read alignment to microbial genomes using Graphics Processing Unit programming. Bioinformatics (Oxford, England), 27(10), 1351-8. The introduction of next-generation sequencing techniques and especially the high-throughput systems Solexa (Illumina Inc.) and SOLiD (ABI) made the mapping of short reads to reference sequences a standard application in modern bioinformatics. Short-read alignment is needed for reference based re-sequencing of complete genomes as well as for gene expression analysis based on transcriptome sequencing. Several approaches were developed during the last years allowing for a fast alignment of short sequences to a given template. Methods available to date use heuristic techniques to gain a speedup of the alignments, thereby missing possible alignment positions. Furthermore, most approaches return only one best hit for every query sequence, thus losing the potentially valuable information of alternative alignment positions with identical scores.
- Hackl, M., Jakobi, T., Blom, J., Doppmeier, D., Brinkrolf, K., Szczepanowski, R., Bernhart, S. H., Höner Zu Siederdissen, C., Bort, J. A., Wieser, M., Kunert, R., Jeffs, S., Hofacker, I. L., Goesmann, A., Pühler, A., Borth, N., & Grillari, J. (2011). Next-generation sequencing of the Chinese hamster ovary microRNA transcriptome: Identification, annotation and profiling of microRNAs as targets for cellular engineering. Journal of biotechnology, 153(1-2), 62-75. Chinese hamster ovary (CHO) cells are the predominant cell factory for the production of recombinant therapeutic proteins. Nevertheless, the lack of publicly available sequence information is severely limiting advances in CHO cell biology, including the exploration of microRNAs (miRNA) as tools for CHO cell characterization and engineering. In an effort to identify and annotate both conserved and novel CHO miRNAs in the absence of a Chinese hamster genome, we deep-sequenced small RNA fractions of 6 biotechnologically relevant cell lines and mapped the resulting reads to an artificial reference sequence consisting of all known miRNA hairpins. Read alignment patterns and read count ratios of 5' and 3' mature miRNAs were obtained and used for an independent classification into miR/miR* and 5p/3p miRNA pairs and discrimination of miRNAs from other non-coding RNAs, resulting in the annotation of 387 mature CHO miRNAs. The quantitative content of next-generation sequencing data was analyzed and confirmed using qPCR, to find that miRNAs are markers of cell status. Finally, cDNA sequencing of 26 validated targets of miR-17-92 suggests conserved functions for miRNAs in CHO cells, which together with the now publicly available sequence information sets the stage for developing novel RNAi tools for CHO cell engineering.
- Henckel, K., Runte, K. J., Bekel, T., Dondrup, M., Jakobi, T., Küster, H., & Goesmann, A. (2009). TRUNCATULIX--a data warehouse for the legume community. BMC plant biology, 9, 19. Databases for either sequence, annotation, or microarray experiments data are extremely beneficial to the research community, as they centrally gather information from experiments performed by different scientists. However, data from different sources develop their full capacities only when combined. The idea of a data warehouse directly addresses this problem and solves it by integrating all required data into one single database - hence there are already many data warehouses available to genetics. For the model legume Medicago truncatula, there is currently no such single data warehouse that integrates all freely available gene sequences, the corresponding gene expression data, and annotation information. Thus, we created the data warehouse TRUNCATULIX, an integrative database of Medicago truncatula sequence and expression data.
Proceedings Publications
- Cox, A., Jakobi, T., Rosone, G., & Schulz-Trieglaff, O. (2012). Comparing DNA sequence collections by direct comparison of compressed text indexes. In ISMB. Popular sequence alignment tools such as BWA convert a reference genome to an indexing data structure based on the Burrows-Wheeler Transform (BWT), from which matches to individual query sequences can be rapidly determined. However the utility of also indexing the query sequences themselves remains relatively unexplored. Here we show that an all-against-all comparison of two sequence collections can be computed from the BWT of each collection with the BWTs held entirely in external memory, i.e. on disk and not in RAM. As an application of this technique, we show that BWTs of transcriptomic and genomic reads can be compared to obtain reference-free predictions of splice junctions that have high overlap with results from more standard reference-based methods. Code to construct and compare the BWT of large genomic data sets is available at http://beetl.github.com/BEETL/ as part of the BEETL library. © 2012 Springer-Verlag.