Tobias Jakobi
- Assistant Professor
- Member of the Graduate Faculty
- (602) 827-2078
- COLLEGE OF MEDICINE PHX
- PHOENIX, AZ 85004-2230
- tjakobi@arizona.edu
Biography
I am a bioinformatician trained with an emphasis on the interconnection of wet lab research and computational research. My academic and research training covered eukaryotic biology, genome research, and wet lab work, in addition to comprehensive training in theoretical and applied bioinformatics, allowing fluent communication between wet lab and bioinformatics researchers.
I am an Assistant Professor in the Department of Internal Medicine and in the new Translational Cardiovascular Research Center (TCRC) at The University of Arizona College of Medicine – Phoenix.
My lab focuses on the interplay of different RNA species and their underlying functional networks in the heart.
Additionally, my lab develops new open-source computational tools that other researchers can use in their own fields of interest.
Courses
2024-25 Courses
- Introduction to Bioinformatics, CTS 505 (Fall 2024)
2023-24 Courses
- Introduction to Bioinformatics, CTS 505 (Fall 2023)
Scholarly Contributions
Journals/Publications
- Jakobi, T., Groß, J., Cyganek, L., & Doroudgar, S. (2022). Transcriptional Effects of Candidate COVID-19 Treatments on Cardiac Myocytes. Frontiers in cardiovascular medicine, 9, 844441.
  Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) disease (COVID-19) has emerged as a major cause of morbidity and mortality worldwide, placing unprecedented pressure on healthcare. Cardiomyopathy is described in patients with severe COVID-19, and increasing evidence suggests that cardiovascular involvement portends high mortality. To facilitate fast development of antiviral interventions, drugs initially developed to treat other diseases are currently being repurposed as COVID-19 treatments. While it has been shown that SARS-CoV-2 invades cells through the angiotensin-converting enzyme 2 receptor (ACE2), the effect on the heart of drugs currently repurposed to treat COVID-19 requires further investigation.
- Jakobi, T., Voelkers, M., Frey, N., Gorska, A., Stroh, C., Juergensen, L., Kamuf-Schenk, V., Loewenthal, Z., Gupta, P., Marx, A., Konstandin, M. H., Hartl, S., Varma, E., Oelschlaeger, J., & Kmietczyk, V. (2022). Ythdf2 regulates cardiac remodeling through its m6A-mRNA target transcripts. Cold Spring Harbor Laboratory - bioRxiv. doi:10.1101/2022.12.16.520765
  m6A mRNA methylation controls cardiomyocyte function, and increased overall m6A levels are a stereotypical finding in heart failure independent of the underlying etiology. However, it is largely unknown how this information is read by m6A reader proteins in heart failure. Here we show that the m6A reader protein Ythdf2 controls cardiac function, and we identify a novel mechanism by which reader proteins control gene expression and cardiac function. Deletion of Ythdf2 in cardiomyocytes in vivo leads to cardiac hypertrophy, reduced heart function, and increased fibrosis during pressure overload as well as during aging. Similarly, in vitro knockdown of Ythdf2 results in cardiomyocyte growth and remodeling. Mechanistically, we identified eukaryotic elongation factor 2 as a major target of Ythdf2 using cell-type-specific Ribo-seq data. Our study expands our understanding of the regulatory functions of m6A methylation in cardiomyocytes and of how cardiac function is controlled by the m6A reader protein Ythdf2.
- Wagner, J. U., Bojkova, D., Shumliakivska, M., Luxán, G., Nicin, L., Aslan, G. S., Milting, H., Kandler, J. D., Dendorfer, A., Heumueller, A. W., Fleming, I., Bibli, S. I., Jakobi, T., Dieterich, C., Zeiher, A. M., Ciesek, S., Cinatl, J., & Dimmeler, S. (2021). Increased susceptibility of human endothelial cells to infections by SARS-CoV-2 variants. Basic research in cardiology, 116(1), 42.
  Coronavirus disease 2019 (COVID-19) spawned a global health crisis in late 2019 and is caused by the novel coronavirus SARS-CoV-2. SARS-CoV-2 infection can lead to elevated markers of endothelial dysfunction associated with a higher risk of mortality. It is unclear whether endothelial dysfunction is caused by direct infection of endothelial cells or is mainly secondary to inflammation. Here, we investigate whether different types of endothelial cells are susceptible to SARS-CoV-2. Human endothelial cells from different vascular beds, including umbilical vein endothelial cells, coronary artery endothelial cells (HCAEC), cardiac and lung microvascular endothelial cells, and pulmonary arterial cells, were inoculated in vitro with SARS-CoV-2. Viral spike protein was detected only in HCAECs after SARS-CoV-2 infection, not in the other endothelial cells tested. Consistently, only HCAECs expressed the SARS-CoV-2 receptor angiotensin-converting enzyme 2 (ACE2), which is required for virus infection. Infection with the SARS-CoV-2 variants B.1.1.7, B.1.351, and P.2 resulted in significantly higher levels of viral spike protein. Despite this, no intracellular double-stranded viral RNA was detected, and the supernatant did not contain infectious virus. Analysis of the cellular distribution of the spike protein revealed that it co-localized with endosomal calnexin. SARS-CoV-2 infection did induce the ER stress gene EDEM1, which is responsible for clearance of misfolded proteins from the ER. Whereas wild-type SARS-CoV-2 did not induce cytotoxic or pro-inflammatory effects, the variant B.1.1.7 reduced HCAEC cell number. Of the different endothelial cells tested, HCAECs showed the highest viral uptake but did not promote virus replication. Effects on cell number were only observed after infection with the variant B.1.1.7, suggesting that endothelial protection may be particularly important in patients infected with this variant.
- Blackwood, E. A., Thuerauf, D. J., Stastna, M., Stephens, H., Sand, Z., Pentoney, A., Azizi, K., Jakobi, T., Van Eyk, J. E., Katus, H. A., Glembotski, C. C., & Doroudgar, S. (2020). Proteomic analysis of the cardiac myocyte secretome reveals extracellular protective functions for the ER stress response. Journal of molecular and cellular cardiology, 143, 132-144.
  The effects of ER stress on protein secretion by cardiac myocytes are not well understood. In this study, the ER stressor thapsigargin (TG), which depletes ER calcium, induced death of cultured neonatal rat ventricular myocytes (NRVMs) in high media volume but fostered protection in low media volume. In contrast, another ER stressor, tunicamycin (TM), a protein glycosylation inhibitor, induced NRVM death in all media volumes, suggesting that protective proteins were secreted in response to TG but not TM. Proteomic analyses of TG- and TM-conditioned media showed that the secretion of most proteins was inhibited by TG and TM; however, secretion of several ER-resident proteins, including GRP78, was increased by TG but not TM. Simulated ischemia, which decreases ER/SR calcium, also increased secretion of these proteins. Mechanistically, secreted GRP78 was shown to enhance survival of NRVMs by collaborating with a cell-surface protein, CRIPTO, to activate protective AKT signaling and to inhibit death-promoting SMAD2 signaling. Thus, proteins secreted during ER stress mediated by ER calcium depletion can enhance cardiac myocyte viability.
- Jakobi, T., Siede, D., Eschenbach, J., Heumüller, A. W., Busch, M., Nietsch, R., Meder, B., Most, P., Dimmeler, S., Backs, J., Katus, H. A., & Dieterich, C. (2020). Deep Characterization of Circular RNAs from Human Cardiovascular Cell Models and Cardiac Tissue. Cells, 9(7).
  For decades, cardiovascular disease (CVD) has been the leading cause of death throughout most developed countries. Several studies relate RNA splicing, and more recently also circular RNAs (circRNAs), to CVD. CircRNAs originate from linear transcripts and have been shown to exhibit tissue-specific expression profiles. Here, we present an in-depth analysis of sequence, structure, modification, and cardiac circRNA interactions. We used human induced pluripotent stem cell-derived cardiac myocytes (hiPSC-CMs), human healthy and diseased (ischemic cardiomyopathy, dilated cardiomyopathy) cardiac tissue, and human umbilical vein endothelial cells (HUVECs) to profile circRNAs. We identified shared circRNAs across all samples, as well as model-specific circRNA signatures. Based on these circRNAs, we identified 63 positionally conserved and expressed circRNAs in human, pig, and mouse hearts. Furthermore, we found that the sequence of circRNAs can deviate from the corresponding genomic sequence, an important factor in assessing potential functions. Integration of additional data yielded evidence for m6A methylation of circRNAs, potentially linked to translation, as well as for circRNAs overlapping with potential Argonaute 2 binding sites, indicating a potential association with the RISC complex. Moreover, we describe, for the first time in cardiac model systems, a subclass of circRNAs containing the start codon of their primary transcript (AUG circRNAs) and observe an enrichment of m6A methylation in AUG circRNAs.
- Kapoor, U., Licht, K., Amman, F., Jakobi, T., Martin, D., Dieterich, C., & Jantsch, M. F. (2020). ADAR-deficiency perturbs the global splicing landscape in mouse tissues. Genome research, 30(8), 1107-1118.
  Adenosine-to-inosine RNA editing and pre-mRNA splicing largely occur cotranscriptionally and influence each other. Here, we use mice deficient in either one of the two editing enzymes ADAR (ADAR1) or ADARB1 (ADAR2) to determine the transcriptome-wide impact of RNA editing on splicing across different tissues. We find that ADAR has a 100× higher impact on splicing than ADARB1, although both enzymes target a similar number of substrates with a large common overlap. Consistently, differentially spliced regions frequently harbor ADAR editing sites. Moreover, catalytically dead ADAR also impacts splicing, demonstrating that RNA binding of ADAR affects splicing. In contrast, ADARB1 editing sites are enriched 5' of differentially spliced regions. Several of these ADARB1-mediated editing events change splice consensus sequences, thereby strongly influencing splicing of some mRNAs. A significant overlap between differentially edited and differentially spliced sites suggests evolutionary selection toward splicing being regulated by editing in a tissue-specific manner.
- Blackwood, E. A., Hofmann, C., Santo Domingo, M., Bilal, A. S., Sarakki, A., Stauffer, W., Arrieta, A., Thuerauf, D. J., Kolkhorst, F. W., Müller, O. J., Jakobi, T., Dieterich, C., Katus, H. A., Doroudgar, S., & Glembotski, C. C. (2019). ATF6 Regulates Cardiac Hypertrophy by Transcriptional Induction of the mTORC1 Activator, Rheb. Circulation research, 124(1), 79-93.
  Endoplasmic reticulum (ER) stress dysregulates ER proteostasis, which activates the transcription factor, ATF6 (activating transcription factor 6α), an inducer of genes that enhance protein folding and restore ER proteostasis. Because of increased protein synthesis, it is possible that protein folding and ER proteostasis are challenged during cardiac myocyte growth. However, it is not known whether ATF6 is activated, and if so, what its function is during hypertrophic growth of cardiac myocytes.
- Doroudgar, S., Hofmann, C., Boileau, E., Malone, B., Riechert, E., Gorska, A. A., Jakobi, T., Sandmann, C., Jürgensen, L., Kmietczyk, V., Malovrh, E., Burghaus, J., Rettel, M., Stein, F., Younesi, F., Friedrich, U. A., Mauz, V., Backs, J., Kramer, G., Katus, H. A., et al. (2019). Monitoring Cell-Type-Specific Gene Expression Using Ribosome Profiling In Vivo During Cardiac Hemodynamic Stress. Circulation research, 125(4), 431-448.
  Gene expression profiles have been mainly determined by analysis of transcript abundance. However, these analyses cannot capture posttranscriptional gene expression control at the level of translation, which is a key step in the regulation of gene expression, as evidenced by the fact that transcript levels often poorly correlate with protein levels. Furthermore, genome-wide transcript profiling of distinct cell types is challenging because lysates from tissues always represent a mixture of cells.
- Jakobi, T., & Dieterich, C. (2019). Computational approaches for circular RNA analysis. Wiley interdisciplinary reviews. RNA, 10(3), e1528.
  Circular RNAs (circRNAs) are a recent addition to the expanding universe of RNA species and originate through back-splicing events from linear primary transcripts. CircRNAs show specific expression profiles with regard to cell type and developmental stage. Importantly, only a few circRNAs have been functionally characterized to date. The detection of circRNAs from RNA sequencing data is a complex computational workflow that, depending on tissue and condition, typically yields hundreds or thousands of circRNA candidates. Here, we provide an overview of the computational analysis tools and pipelines that have become available in recent years. We outline technical and experimental requirements that are common to all approaches and point out potential pitfalls during the computational analysis. Although computational prediction of circRNAs has become quite mature in recent years, we also provide a set of valuable validation strategies, both in silico and in vitro. In addition to circRNA detection via back-splicing junctions, we present available analysis pipelines for delineating the primary sequence and for predicting possible functions of circRNAs. Finally, we outline the most important web resources for circRNA research. This article is categorized under: RNA Methods > RNA Analyses in vitro and In Silico; RNA Evolution and Genomics > Computational Analyses of RNA.
- Jakobi, T., Uvarovskii, A., & Dieterich, C. (2019). circtools - a one-stop software solution for circular RNA research. Bioinformatics (Oxford, England), 35(13), 2326-2328.
  Circular RNAs (circRNAs) originate through back-splicing events from linear primary transcripts, are resistant to exonucleases, are not polyadenylated and have been shown to be highly specific for cell type and developmental stage. CircRNA detection starts from high-throughput sequencing data and is a multi-stage bioinformatics process yielding sets of potential circRNA candidates that require further analyses. While a number of tools for the prediction process already exist, publicly available analysis tools for further characterization are rare. Our work provides researchers with a harmonized workflow that covers different stages of in silico circRNA analyses, from prediction to first functional insights.
- Worpenberg, L., Jakobi, T., Dieterich, C., & Roignant, J. Y. (2019). Identification of Methylated Transcripts Using the TRIBE Approach. Methods in molecular biology (Clifton, N.J.), 1870, 89-106.
  m6A is the most abundant internal modification on mRNA. Recent improvements in high-throughput sequencing techniques enable its detection at the transcriptome level, even at single-nucleotide resolution. However, most current techniques require large amounts of starting material to detect the modification. Here, we describe a technique, complementary to standard meRIP-seq/miCLIP-seq approaches, that identifies methylated RNA using a low amount of material. We believe this approach can be applied in vivo to identify methylated targets in specific tissues or subpopulations of cells.
- Bischof, L. F., Haurat, M. F., Hoffmann, L., Albersmeier, A., Wolf, J., Neu, A., Pham, T. K., Albaum, S. P., Jakobi, T., Schouten, S., Neumann-Schaal, M., Wright, P. C., Kalinowski, J., Siebers, B., & Albers, S. V. (2018). Early Response of Sulfolobus acidocaldarius to Nutrient Limitation. Frontiers in microbiology, 9, 3201.
  In natural environments, microorganisms encounter extreme changes in temperature, pH, osmolarity and nutrient availability. The stress response of many bacterial species has been described in detail; however, knowledge of stress responses in Archaea is limited. Here, we describe the cellular response triggered by nutrient limitation in the thermoacidophilic crenarchaeon Sulfolobus acidocaldarius. We measured changes in gene transcription and protein abundance for up to 4 h after the onset of nutrient depletion. Transcript levels of 1118 of 2223 protein-coding genes and the abundance of approximately 500 proteins with functions in almost all cellular processes were affected by nutrient depletion. Our study reveals a significant rerouting of metabolism with respect to the degradation of internal as well as extracellular-bound organic carbon and the degradation of proteins. Moreover, changes in membrane lipid composition were observed in order to access alternative sources of energy and to maintain pH homeostasis. At the transcript level, the cellular response to nutrient depletion in S. acidocaldarius seems to be controlled by the general transcription factors TFB2 and TFEβ. In addition, ribosome biogenesis is reduced, while increased protein degradation is accompanied by a loss of protein quality control. This study provides first insights into the early cellular response of S. acidocaldarius to organic carbon and organic nitrogen depletion.
- Jakobi, T., & Dieterich, C. (2018). Deep Computational Circular RNA Analytics from RNA-seq Data. Methods in molecular biology (Clifton, N.J.), 1724, 9-25.
  Circular RNAs (circRNAs) were first described as "scrambled exons" in the 1990s. CircRNAs originate from back splicing or exon skipping of linear RNA templates and have continuously gained attention in recent years due to the availability of high-throughput whole-transcriptome sequencing methods. Numerous manuscripts describe thousands of circRNAs throughout unicellular and multicellular eukaryotic species and demonstrate that they are conserved, stable, and abundant in specific tissues or conditions. This manuscript provides a walk-through of our bioinformatics toolbox, which covers all aspects of in silico circRNA analysis, from raw sequencing data and back-splicing junction discovery to circRNA quantitation and reconstruction of the internal circRNA structure.
- Jakobi, T., Czaja-Hasse, L. F., Reinhardt, R., & Dieterich, C. (2016). Profiling and Validation of the Circular RNA Repertoire in Adult Murine Hearts. Genomics, proteomics & bioinformatics, 14(4), 216-23.
  For several decades, cardiovascular disease has been the leading cause of death throughout all countries. There is a strong genetic component to many disease subtypes (e.g., cardiomyopathy) and we are just beginning to understand the relevant genetic factors. Several studies have related RNA splicing to cardiovascular disease and circular RNAs (circRNAs) are an emerging player. circRNAs, which originate through back-splicing events from primary transcripts, are resistant to exonucleases and typically not polyadenylated. Initial functional studies show clear phenotypic outcomes for selected circRNAs. We provide, for the first time, a comprehensive catalogue of RNase R-resistant circRNA species for the adult murine heart. This work combines state-of-the-art circle sequencing with our novel DCC software to explore the circRNA landscape of heart tissue. Overall, we identified 575 circRNA species that pass a beta-binomial test for enrichment (false discovery rate of 1%) in the exonuclease-treated sequencing sample. Several circRNAs can be directly attributed to host genes that have been previously described as associated with cardiovascular disease. Further studies of these candidate circRNAs may reveal disease-relevant properties or functions of specific circRNAs.
- Langenkämper, D., Jakobi, T., Feld, D., Jelonek, L., Goesmann, A., & Nattkemper, T. W. (2016). Comparison of Acceleration Techniques for Selected Low-Level Bioinformatics Operations. Frontiers in genetics, 7, 5.
  In recent years, the clock rates of modern processors have stagnated while the demand for computing power has continued to grow. This applies particularly to the life sciences and bioinformatics, where new technologies keep generating rapidly growing volumes of raw data at increasing speed. The number of cores per processor has increased in an attempt to compensate for the only slight increases in clock rates. This technological shift demands changes in software development, especially in high-performance computing, where parallelization techniques are gaining importance due to the large datasets generated by, e.g., modern genomics. This paper presents an overview of state-of-the-art manual and automatic acceleration techniques and lists some applications employing these in different areas of sequence informatics. Furthermore, we provide examples of automatic acceleration for two use cases to show typical problems and gains of transforming a serial application into a parallel one. The paper should help the reader choose a technique for the problem at hand. We compare four different state-of-the-art automatic acceleration approaches (OpenMP, PluTo-SICA, PPCG, and OpenACC) and discuss their performance as well as their applicability to selected use cases. While optimizations targeting the CPU worked better in the complex k-mer use case, optimizers for Graphics Processing Units (GPUs) performed better in the matrix multiplication example. However, GPU performance is only superior beyond a certain problem size due to data migration overhead. We show that automatic code parallelization is feasible with current compiler software and yields significant increases in execution speed. Automatic optimizers for CPUs are mature and usually require no additional manual adjustment. In contrast, some automatic parallelizers targeting GPUs still lack maturity and are limited to simple statements and structures.
- Morgner, J., Ghatak, S., Jakobi, T., Dieterich, C., Aumailley, M., & Wickström, S. A. (2015). Integrin-linked kinase regulates the niche of quiescent epidermal stem cells. Nature communications, 6, 8198.
  Stem cells reside in specialized niches that are critical for their function. Quiescent hair follicle stem cells (HFSCs) are confined within the bulge niche, but how the molecular composition of the niche regulates stem cell behaviour is poorly understood. Here we show that integrin-linked kinase (ILK) is a key regulator of the bulge extracellular matrix microenvironment, thereby governing the activation and maintenance of HFSCs. ILK mediates deposition of inverse laminin (LN)-332 and LN-511 gradients within the basement membrane (BM) wrapping the hair follicles. The precise BM composition tunes activities of Wnt and transforming growth factor-β pathways and subsequently regulates HFSC activation. Notably, reconstituting an optimal LN microenvironment restores the altered signalling in ILK-deficient cells. Aberrant stem cell activation in ILK-deficient epidermis leads to increased replicative stress, predisposing the tissue to carcinogenesis. Overall, our findings uncover a critical role for the BM niche in regulating stem cell activation and thereby skin homeostasis.
- Jakobi, T., Brinkrolf, K., Tauch, A., Noll, T., Stoye, J., Pühler, A., & Goesmann, A. (2014). Discovery of transcription start sites in the Chinese hamster genome by next-generation RNA sequencing. Journal of biotechnology, 190, 64-75.
  Chinese hamster ovary (CHO) cell lines are among the major production tools for monoclonal antibodies, recombinant proteins, and therapeutics. Although many efforts have significantly improved the availability of sequence information for CHO cells in recent years, forthcoming draft genomes still lack the depth of information available for the mouse or human genomes. Many genes annotated for CHO cells and the Chinese hamster reference genome are still in silico predictions that have been insufficiently verified by biological experiments. The correct annotation of transcription start sites (TSSs) is of special interest for CHO cells, as TSSs directly define the location of the eukaryotic core promoter. Our study aims to elucidate these largely unexplored regions and to shed light on promoter landscapes in the Chinese hamster genome. Using a 5'-enriched dual-library RNA sequencing approach, 6547 TSSs were identified, over 90% of which were assigned to known genes. These TSSs were used to perform extensive promoter studies with a novel, modular bioinformatics pipeline that analyzes important regulatory elements of the eukaryotic core promoter at the per-gene level and at genomic scale.
- Cox, A. J., Bauer, M. J., Jakobi, T., & Rosone, G. (2012). Large-scale compression of genomic sequence databases with the Burrows-Wheeler transform. Bioinformatics (Oxford, England), 28(11), 1415-9.
  The Burrows-Wheeler transform (BWT) is the foundation of many algorithms for compression and indexing of text data, but the cost of computing the BWT of very large string collections has prevented these techniques from being widely applied to the large sets of sequences often encountered as the outcome of DNA sequencing experiments. In previous work, we presented a novel algorithm that allows the BWT of human genome scale data to be computed on very moderate hardware, thus enabling us to investigate the BWT as a tool for the compression of such datasets.
- Hackl, M., Jadhav, V., Jakobi, T., Rupp, O., Brinkrolf, K., Goesmann, A., Pühler, A., Noll, T., Borth, N., & Grillari, J. (2012). Computational identification of microRNA gene loci and precursor microRNA sequences in CHO cell lines. Journal of biotechnology, 158(3), 151-5.
  MicroRNAs (miRNAs) have recently entered Chinese hamster ovary (CHO) cell culture technology due to their strong impact on the regulation of cellular phenotypes. Envisioned applications of miRNAs range from biomarkers of favorable phenotypes to cell engineering targets. These applications, however, require a profound knowledge of miRNA sequences and their genomic organization, which goes beyond the currently available information on ~400 conserved mature CHO miRNA sequences. Based on these recently published sequences and two independent CHO-K1 genome assemblies, this publication describes the computational identification of CHO miRNA genomic loci. Using BLAST alignment, 415 previously reported CHO miRNAs were mapped to the reference genomes and subsequently assigned to a distinct genomic miRNA locus. Sequences of the respective precursor miRNAs were extracted from both reference genomes, folded in silico to verify correct structures, and cross-compared. In the end, 212 genomic loci and pre-miRNA sequences representing 319 expressed mature miRNAs (approximately 50% of miRNAs represented matching pairs of 5' and 3' miRNAs) were submitted to the miRBase miRNA repository. As a proof of principle for the usability of the published genomic loci, four likely polycistronic miRNA clusters were chosen for PCR amplification using CHO-K1 and DHFR (-) genomic DNA. Overall, these data on the genomic context of miRNA expression in CHO will simplify the development of tools employing stable overexpression or deletion of miRNAs, allow the identification of miRNA promoters, and improve detection methods such as microarrays.
- Becker, J., Hackl, M., Rupp, O., Jakobi, T., Schneider, J., Szczepanowski, R., Bekel, T., Borth, N., Goesmann, A., Grillari, J., Kaltschmidt, C., Noll, T., Pühler, A., Tauch, A., & Brinkrolf, K. (2011). Unraveling the Chinese hamster ovary cell line transcriptome by next-generation sequencing. Journal of biotechnology, 156(3), 227-35.
  The pyrosequencing technology from 454 Life Sciences and a novel assembly approach for cDNA sequences with the Newbler Assembler were used to take a major step toward unraveling the transcriptome of Chinese hamster ovary (CHO) cells. Normalized cDNA libraries originating from several cell lines and diverse culture conditions were sequenced, and the resulting 1.84 million reads were assembled into 32,801 contiguous sequences, 29,184 isotigs, and 24,576 isogroups. A taxonomic classification of the isotigs showed that more than 70% of the assembled data is most similar to the transcriptome of Mus musculus, with most of the remaining isotigs being homologous to DNA sequences from Rattus norvegicus. Mapping of the CHO cell line contigs to the mouse transcriptome demonstrated that 9124 mouse transcripts, representing 6701 genes, are covered over more than 95% of their sequence length. Metabolic pathways of the central carbohydrate metabolism and biosynthesis routes of sugars used for protein N-glycosylation were reconstructed from the transcriptome data. All relevant genes representing major steps in the N-glycosylation pathway of CHO cells were detected. The present manuscript represents a data set of assembled and annotated genes for CHO cells that can now be used for a detailed analysis of the molecular functioning of CHO cell lines.
- Blom, J., Jakobi, T., Doppmeier, D., Jaenicke, S., Kalinowski, J., Stoye, J., & Goesmann, A. (2011). Exact and complete short-read alignment to microbial genomes using Graphics Processing Unit programming. Bioinformatics (Oxford, England), 27(10), 1351-8.
  The introduction of next-generation sequencing techniques, especially the high-throughput systems Solexa (Illumina Inc.) and SOLiD (ABI), made the mapping of short reads to reference sequences a standard application in modern bioinformatics. Short-read alignment is needed for reference-based re-sequencing of complete genomes as well as for gene expression analysis based on transcriptome sequencing. Several approaches developed in recent years allow fast alignment of short sequences to a given template. Methods available to date use heuristic techniques to speed up the alignments, thereby missing possible alignment positions. Furthermore, most approaches return only one best hit for every query sequence, thus losing the potentially valuable information of alternative alignment positions with identical scores.
- Hackl, M., Jakobi, T., Blom, J., Doppmeier, D., Brinkrolf, K., Szczepanowski, R., Bernhart, S. H., Höner Zu Siederdissen, C., Bort, J. A., Wieser, M., Kunert, R., Jeffs, S., Hofacker, I. L., Goesmann, A., Pühler, A., Borth, N., & Grillari, J. (2011). Next-generation sequencing of the Chinese hamster ovary microRNA transcriptome: Identification, annotation and profiling of microRNAs as targets for cellular engineering. Journal of biotechnology, 153(1-2), 62-75.
  Chinese hamster ovary (CHO) cells are the predominant cell factory for the production of recombinant therapeutic proteins. Nevertheless, the lack of publicly available sequence information is severely limiting advances in CHO cell biology, including the exploration of microRNAs (miRNA) as tools for CHO cell characterization and engineering. In an effort to identify and annotate both conserved and novel CHO miRNAs in the absence of a Chinese hamster genome, we deep-sequenced small RNA fractions of 6 biotechnologically relevant cell lines and mapped the resulting reads to an artificial reference sequence consisting of all known miRNA hairpins. Read alignment patterns and read count ratios of 5' and 3' mature miRNAs were obtained and used for an independent classification into miR/miR* and 5p/3p miRNA pairs and for the discrimination of miRNAs from other non-coding RNAs, resulting in the annotation of 387 mature CHO miRNAs. The quantitative content of the next-generation sequencing data was analyzed and confirmed using qPCR, which showed that miRNAs are markers of cell status. Finally, cDNA sequencing of 26 validated targets of miR-17-92 suggests conserved functions for miRNAs in CHO cells, which, together with the now publicly available sequence information, sets the stage for developing novel RNAi tools for CHO cell engineering.
- Henckel, K., Runte, K. J., Bekel, T., Dondrup, M., Jakobi, T., Küster, H., & Goesmann, A. (2009). TRUNCATULIX - a data warehouse for the legume community. BMC plant biology, 9, 19.
  Databases for sequence, annotation, or microarray experiment data are extremely beneficial to the research community, as they centrally gather information from experiments performed by different scientists. However, data from different sources develop their full potential only when combined. The idea of a data warehouse directly addresses this problem and solves it by integrating all required data into one single database; many data warehouses are therefore already available in genetics. For the model legume Medicago truncatula, there is currently no single data warehouse that integrates all freely available gene sequences, the corresponding gene expression data, and annotation information. Thus, we created the data warehouse TRUNCATULIX, an integrative database of Medicago truncatula sequence and expression data.