We provide data analysis and statistics support to other departments and engage in research projects that explore alternative sequencing technologies (e.g., PacBio or Oxford Nanopore sequencing), novel algorithmic approaches, and machine learning for their potential to improve HLA genotyping, characterize novel alleles, or enhance quality assurance.
Long-read sequencing technologies provide data with characteristics and error profiles that differ from those of standard Illumina sequencing. At the same time, these technologies deliver read lengths that potentially allow high-throughput HLA genotyping based on the full genomic sequence of the genes. We explore novel procedures and algorithms to achieve robust, accurate, and fast identification of HLA genotypes within the constraints of a high-throughput workflow. Together with the IT team, we maintain a high-performance compute cluster infrastructure and design and build the database infrastructures and production-grade data pipelines that support large-scale, data-intensive genotyping operations.
At present, the complete genomic sequence is known for only 30% of the known HLA alleles, and the reliable, reference-grade characterization of the genomic sequences of newly discovered HLA alleles remains technically challenging. To help improve the quality and comprehensiveness of the international reference sequence databases for HLA alleles, we develop and maintain tools for generating full-length, phase-defined haplotype sequences of reference quality (DR2S) and for automating the database submission process (TypeLoader).
Together with the Research and Development Department and other partners, we also engage in basic research centered around HLA and KIR polymorphisms and diversity, as well as in tool development projects aimed at improving future leukemia diagnostics and therapy.