LOLAweb The past few years have seen an explosion of interest in understanding the role of regulatory DNA. This interest has driven large-scale production of functional genomics data resources and analytical methods. One popular analysis is to test for enrichment of overlaps between a query set of genomic regions and a database of region sets. In this way, annotations from external data sources can be easily connected to new genomic data.
SOM Research Computing is working with faculty in the UVA Center for Public Health Genomics to implement LOLAweb, an online tool for performing genomic locus overlap annotations and analyses. This project, written in the statistical programming language R, allows users to specify region set data in BED format for automated enrichment analysis.
Reference genome assemblies are essential for high-throughput sequencing analysis projects. Typically, genome assemblies are stored on disk alongside related resources; e.g., many sequence aligners require the assembly to be indexed. The resulting indexes are broadly applicable for downstream analysis, so it makes sense to share them. However, there is no simple tool to do this.
Refgenie is a reference genome assembly asset manager. Refgenie makes it easier to organize, retrieve, and share genome analysis resources. In addition to genome indexes, refgenie can manage any files related to reference genomes, including sequences and annotation files. Refgenie includes a command line interface and a server application that provides a RESTful API, so it is useful for both tool development and analysis.
BART (Binding Analysis for Regulation of Transcription) Web Working with researchers in the Zang Lab in the Center for Public Health Genomics (CPHG), RC helped launch BARTweb, an interactive web-based tool for users to analyze their Genelist or ChIP-seq datasets. BARTweb is a containerized Flask front-end (written in Python) that ingests files and submits them to a more robust Python-based genomics pipeline running on Rivanna, UVA’s high performance computing cluster (HPC). This architecture – of a public web application that uses a supercomputer to process data – is a new model for UVA, and one that eases the learning curve for researchers who may not have access to an HPC system or the expertise to run a BART pipeline in the command-line.
Coronary artery disease (CAD) is the major cause of morbidity and mortality worldwide. Recent genome wide association studies (GWAS) have revealed more than 50 genomic loci that are associated with increased risk for CAD. However, the pathological mechanisms for majority of the GWAS loci leading to increased susceptibility to this complex disorder are still unclear. Many of the CAD loci appear to act through the vessel wall, presumably affecting smooth muscle cell (SMC) function.
UVA Research Computing (RC) is working with Redouane Aherrahrou from the Center for Public Health Genomics who aims to study the impact of the CAD-associated genetic factors on the cellular and molecular SMC phenotypes, as well as the underlying biological pathways that are perturbed by these genetic factors.
RC is working with researchers in the Center for Public Health Genomics to write an R package to calculate Relative Proportion of Sites with Intermediate Methylation (RPIM) scores, which represent the epigenetic heterogeneity in a bisulfite sequencing sample.
PI: Nathan Sheffield (Center for Public Health Genomics)