Previous research has shown that older adults are more susceptible to severe injury than their younger counterparts after being involved in a motor vehicle collision. Dr. Hartka was interested in determining whether there are age-related differences in the accuracy of severe injury prediction following a motor vehicle collision. Using R, Research Computing developed age-specific logistic regression models and assessed their accuracy, and generated unique graphs and animations to visualize the data more effectively.
PI: Thomas Hartka
Calcium oscillations signify communication between zona glomerulosa cells of the mouse adrenal gland. Researchers in the Barrett Lab can capture these oscillatory events with calcium imaging, but they had difficulty analyzing the results. The Barrett Lab was in need of a comprehensive MATLAB program for quantitative analysis of the intracellular calcium signals from their cell imaging experiments. Prior to Research Computing’s involvement in the project, the Barrett Lab had been using fragments of code to analyze their data with little success. Research Computing developed a MATLAB application to create an efficient, centralized workflow that is also accessible to people who are new to MATLAB and programming.
The Biocomplexity Institute at the University of Virginia has been at the forefront of epidemiological modeling to track the COVID-19 pandemic and has developed a suite of COVID-19 epidemic response resources, including a series of dashboards to help the public and the government better understand the pandemic. This is a static view of the Institute’s interactive COVID-19 Surveillance Dashboard, which provides a visualization of COVID-19 cases, recoveries, and deaths across the globe. To support planning and response efforts for the recent coronavirus pandemic, researchers prepared this visualization tool, which provides a unique way of examining data curated from different sources.
Multiphoton fluorescence lifetime imaging microscopy (FLIM) offers many opportunities to investigate processes in live cells, tissue and animal model systems. For redox measurements, FLIM data are mostly published as cell mean values and intensity-based redox ratios. Our method is based entirely on FLIM parameters generated by 3-detector time domain microscopy capturing autofluorescent signals of NAD(P)H, FAD and novel FLIM-FRET application of Tryptophan and NAD(P)H-a2%/FAD-a1% redox ratio. Furthermore, image data are analyzed in segmented cells thresholded by 2 × 2 pixel Regions of Interest (ROIs) to separate mitochondrial oxidative phosphorylation from cytosolic glycolysis in a prostate cancer cell line. Hundreds of data points make it possible to demonstrate heterogeneity in response to intervention and to identify the cells that respond to treatment, thereby revealing different sub-populations.
LOLAweb
The past few years have seen an explosion of interest in understanding the role of regulatory DNA. This interest has driven large-scale production of functional genomics data resources and analytical methods. One popular analysis is to test for enrichment of overlaps between a query set of genomic regions and a database of region sets. In this way, annotations from external data sources can be easily connected to new genomic data.
SOM Research Computing is working with faculty in the UVA Center for Public Health Genomics to implement LOLAweb, an online tool for performing genomic locus overlap annotations and analyses. This project, written in the statistical programming language R, allows users to specify region set data in BED format for automated enrichment analysis.
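As a rough illustration of the overlap-enrichment idea (this is not LOLAweb's actual R code), the Python sketch below counts how many query regions overlap a database region set and scores the count with a one-sided Fisher's exact test against a background universe. The choice of test, the function names, the requirement that every query region also appears in the universe, and the toy coordinates are all assumptions made for this example.

```python
from math import comb


def overlaps(region, region_set):
    """True if region (chrom, start, end) overlaps any interval in region_set.

    Coordinates are treated as half-open, as in BED files.
    """
    chrom, start, end = region
    return any(c == chrom and s < end and start < e for c, s, e in region_set)


def enrichment_p_value(query, universe, db_set):
    """One-sided Fisher's exact test for enrichment of query/db_set overlaps.

    Assumes every query region is also a member of the universe.
    """
    n_universe = len(universe)
    n_query = len(query)
    hits_universe = sum(overlaps(r, db_set) for r in universe)
    hits_query = sum(overlaps(r, db_set) for r in query)
    # Hypergeometric upper tail: probability of at least hits_query overlaps
    # when n_query regions are drawn at random from the universe.
    max_hits = min(n_query, hits_universe)
    tail = sum(
        comb(hits_universe, k) * comb(n_universe - hits_universe, n_query - k)
        for k in range(hits_query, max_hits + 1)
    )
    return tail / comb(n_universe, n_query)


# Toy data (hypothetical): ten background regions on chr1, one database interval.
universe = [("chr1", i * 100, i * 100 + 50) for i in range(10)]
db_set = [("chr1", 0, 400)]  # overlaps the first four universe regions
query = universe[:3]         # all three query regions overlap db_set
p_value = enrichment_p_value(query, universe, db_set)
print(f"enrichment p-value: {p_value:.4f}")
```

In practice the same calculation would be repeated for every region set in the database and the results ranked, with a multiple-testing correction applied to the p-values.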
Reference genome assemblies are essential for high-throughput sequencing analysis projects. Typically, genome assemblies are stored on disk alongside related resources; e.g., many sequence aligners require the assembly to be indexed. The resulting indexes are broadly applicable for downstream analysis, so it makes sense to share them. However, there is no simple tool to do this.
Refgenie is a reference genome assembly asset manager. Refgenie makes it easier to organize, retrieve, and share genome analysis resources. In addition to genome indexes, refgenie can manage any files related to reference genomes, including sequences and annotation files. Refgenie includes a command line interface and a server application that provides a RESTful API, so it is useful for both tool development and analysis.
Dr. Zhigilei and his team are using Rivanna to perform large-scale atomistic simulations aimed at revealing fundamental processes responsible for the modification of surface morphology and microstructure of metal targets treated by short-pulse laser irradiation. The simulations are performed with a highly optimized parallel computer code capable of reproducing collective dynamics in systems consisting of up to billions of atoms. As a result, the simulations naturally account for the complexity of the material response to the rapid laser energy deposition and provide clear visual representations, or “atomic movies,” of laser-induced dynamic processes. The mechanistic insights revealed in the simulations have an immediate impact on the development of the theoretical understanding of laser-induced processes and assist in the optimization of laser processing parameters in current applications based on laser surface modification and nanoparticle generation in laser ablation.
Professor Reidenbach and his team are using Rivanna to run computational fluid dynamics simulations of wave- and tide-driven flows over coral reefs in order to determine how storms, nutrient inputs, and sediments impact reef health. This is an image of dye fluxing from the surface of the Hawaiian coral Porites compressa, captured using a technique known as planar laser-induced fluorescence (PLIF). Reefs such as this one have been severely impacted by human alteration, both locally through additional inputs of sediments and nutrients, and globally through increased sea surface temperatures caused by climate change. Reidenbach is hopeful that his computational models will allow scientists to better predict the future health of reefs based on human activity and improve global reef restoration efforts.
While conducting research for a highly technical study of market behavior, Dr. Ciliberto realized that he needed to parallelize an integration over a sample distribution. RC staff member Ed Hall successfully parallelized Ciliberto’s MATLAB code and taught him how to do production runs on the University’s high-performance clusters. “The second stage estimator was computationally intensive,” Ciliberto recalls. “We needed to compute the distribution of the residuals and unobservables for multiple parameter values and at many different points of the distribution, which requires parallelizing the computation. Ed Hall’s expertise in this area was crucial. In fact, without Ed’s contribution, this project could not have been completed.”
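The general pattern of parallelizing an integration over a sample distribution can be sketched in Python (the project itself used MATLAB, and this is not Dr. Ciliberto's estimator). The example estimates the expectation of exp(X) for a standard normal X by Monte Carlo, splitting the draws into independent chunks that can be evaluated concurrently. The integrand, the chunk sizes, and the function names are assumptions chosen only to keep the illustration small.

```python
import math
import random


def chunk_sum(task):
    """Sum exp(x) over one independent chunk of standard-normal draws."""
    seed, n_draws = task
    rng = random.Random(seed)  # a per-chunk seed keeps results reproducible
    return sum(math.exp(rng.gauss(0.0, 1.0)) for _ in range(n_draws))


def expectation(n_chunks, n_draws, executor=None):
    """Monte Carlo estimate of E[exp(X)], X ~ N(0, 1), computed chunk by chunk.

    If executor is a concurrent.futures executor, the chunks are evaluated
    through its map method; otherwise they are evaluated serially.
    """
    tasks = [(seed, n_draws) for seed in range(n_chunks)]
    mapper = executor.map if executor is not None else map
    total = sum(mapper(chunk_sum, tasks))
    return total / (n_chunks * n_draws)


# Serial run for illustration; the exact value is exp(0.5).
estimate = expectation(n_chunks=4, n_draws=20_000)
print(f"estimate: {estimate:.3f}  exact: {math.exp(0.5):.3f}")
```

Because each chunk owns its random number generator and the chunks share no state, passing a concurrent.futures.ProcessPoolExecutor as the executor spreads the work across cores without changing the result.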
Ed Hall worked with the Brodie Lab in the Department of Biology to set up a workflow for analyzing videos of bug tracking experiments on the Rivanna Linux cluster. The lab wanted to use the community MATLAB software idTracker to track beetle movement. Their two goals were to shorten the software runtime and to automate the process, since there was a large backlog of videos to go through. Ed installed the idTracker software on Rivanna and modified the code to parallelize the bug tracking process. He wrote and documented shell scripts to automate their workflow on the cluster.
PI: Edmund Brodie, PhD (Department of Biology)
Some galaxies have an extremely energetic central region known as an Active Galactic Nucleus. These regions are among the brightest objects in the universe, often outshining all of the stars in their home galaxy combined. In at least some cases, the power source at the center of these extraordinary nuclei is actually a black hole; as gases are drawn toward the black hole, they spiral around it, releasing gravitational energy that is converted into heat and electromagnetic waves. A simulation created by Prof. John Hawley (CLAS) with collaborators from Johns Hopkins University reveals this process at an unprecedented level of detail.
A powerful new technique for quantifying regions of the cerebral cortex was developed by Nick Tustison and James Stone at the University of Virginia along with collaborators from the University of Pennsylvania. It was evaluated using large data sets composed of magnetic resonance imaging (MRI) of the human brain processed on a high-performance computing cluster at the University of Virginia. Because the technique is available as open-source software, other neuroscientists are now able to investigate various hypotheses concerning the relationship between brain structure and development. Tustison and Stone’s software has been widely disseminated and is being actively incorporated into a variety of clinical research studies, including a collaborative effort between the Department of Defense and the Department of Veterans Affairs exploring the long-term effects of traumatic brain injury (TBI) among military service members.
BART (Binding Analysis for Regulation of Transcription) Web
Working with researchers in the Zang Lab in the Center for Public Health Genomics (CPHG), RC helped launch BARTweb, an interactive web-based tool for users to analyze their gene list or ChIP-seq datasets. BARTweb is a containerized Flask front-end (written in Python) that ingests files and submits them to a more robust Python-based genomics pipeline running on Rivanna, UVA’s high-performance computing (HPC) cluster. This architecture, in which a public web application uses a supercomputer to process data, is a new model for UVA, and one that eases the learning curve for researchers who may not have access to an HPC system or the expertise to run a BART pipeline on the command line.
Episodes of bradycardia and oxygen desaturation (BD) are common among preterm very low birthweight (VLBW) infants, but their association with adverse outcomes such as bronchopulmonary dysplasia (BPD) is unclear. A better understanding of this relationship could lead to improved clinical interventions.
RC is helping neonatologists describe BD events in a large single-NICU VLBW cohort and test the hypothesis that measures of BD in the neonatal period add to clinical variables to predict BPD or death and other adverse outcomes. RC has implemented statistical modeling and machine learning techniques to assess the primary outcome of BPD in the context of a combination of clinical characteristics (like birthweight and gestational age) and bedside monitor features.
Coronary artery disease (CAD) is the major cause of morbidity and mortality worldwide. Recent genome-wide association studies (GWAS) have revealed more than 50 genomic loci that are associated with increased risk for CAD. However, the pathological mechanisms by which the majority of the GWAS loci lead to increased susceptibility to this complex disorder are still unclear. Many of the CAD loci appear to act through the vessel wall, presumably affecting smooth muscle cell (SMC) function.
UVA Research Computing (RC) is working with Redouane Aherrahrou from the Center for Public Health Genomics who aims to study the impact of the CAD-associated genetic factors on the cellular and molecular SMC phenotypes, as well as the underlying biological pathways that are perturbed by these genetic factors.
In their research on continuous glucose monitoring and the automated maintenance of insulin for patients, the Center for Diabetes Technology (CDT) is exploring data drawn from external sources such as DexCom and FitBit. RC has assisted the CDT by designing a secure computing footprint in Amazon Web Services to pull in, parse, and process these data for deeper analytics through machine learning. In January 2018, the CDT sponsored a ski camp at Wintergreen Resort for a group of youth diagnosed with type 1 diabetes, with the goal of importing glucose, insulin, and exercise metrics at the end of each day through remote web APIs.
RC is working with researchers in the Center for Public Health Genomics to write an R package to calculate Relative Proportion of Sites with Intermediate Methylation (RPIM) scores, which represent the epigenetic heterogeneity in a bisulfite sequencing sample.
PI: Nathan Sheffield (Center for Public Health Genomics)
Functional magnetic resonance imaging (fMRI) can be used to assess functional activity in the brain and connectivity between different regions of interest (ROIs), and a functional connectome is a map of the interactions between ROIs. Previous research has shown that a functional connectome contains enough unique characteristics, not unlike a fingerprint, that it can be used for accurate identification of an individual subject from a large group. RC is working with the UVA Functional Neuroradiology Lab to perform this fingerprinting analysis for a wide variety of populations and to develop innovative ways to visualize the results.
PI: Jason Druzgal (Radiology and Medical Imaging)
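One common way to frame connectome fingerprinting, sketched here in Python, is to correlate each subject's connectivity vector from one scan session against every subject's vector from another session and predict the identity with the highest correlation. This is a minimal illustration under that assumption, not the lab's pipeline; the function names and the tiny four-edge connectomes are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def identify(targets, database):
    """Map each target subject to the database subject with the most similar connectome."""
    return {
        target_id: max(database, key=lambda subject: pearson(vector, database[subject]))
        for target_id, vector in targets.items()
    }


def identification_accuracy(predictions):
    """Fraction of targets matched to their own database entry."""
    return sum(t == p for t, p in predictions.items()) / len(predictions)


# Hypothetical connectivity vectors (flattened ROI-to-ROI edge strengths).
session_1 = {"subj_a": [0.10, 0.50, 0.90, 0.30], "subj_b": [0.80, 0.20, 0.40, 0.60]}
session_2 = {"subj_a": [0.15, 0.45, 0.85, 0.35], "subj_b": [0.75, 0.25, 0.35, 0.65]}

predictions = identify(session_1, session_2)
accuracy = identification_accuracy(predictions)
print(predictions, accuracy)
```

Real connectomes contain thousands of edges per subject, so an array library would normally replace the pure-Python loops.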
Published, evidence-based heart rate ranges for premature neonates are limited, yet knowing heart rate reference ranges in the premature neonatal population can be beneficial for bedside assessment in the Neonatal Intensive Care Unit (NICU).
RC is collaborating with clinical researchers in the Department of Pediatrics to establish baseline ranges for heart rate data in premature infants. These results are summarized from more than two billion data points collected via bedside monitoring in the NICU. RC staff has contributed data analysis and visualization expertise to aggregate the data, generate interactive heatmaps and produce tables of these ranges by gestational age.
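The aggregation step, producing reference ranges by gestational age, can be sketched in Python as grouping heart rate samples by age and taking percentiles within each group. The 5th, 50th, and 95th percentiles, the function name, and the synthetic samples are assumptions for illustration; the source does not state which percentiles define the published ranges.

```python
from collections import defaultdict
from statistics import quantiles


def reference_ranges(samples, percentiles=(5, 50, 95)):
    """Return {gestational_age_weeks: {percentile: heart_rate}}.

    samples is an iterable of (gestational_age_weeks, heart_rate_bpm) pairs.
    Each age group needs at least two samples.
    """
    by_age = defaultdict(list)
    for age_weeks, heart_rate in samples:
        by_age[age_weeks].append(heart_rate)

    table = {}
    for age_weeks, rates in sorted(by_age.items()):
        # n=100 yields 99 cut points; cut point p - 1 is the p-th percentile.
        cuts = quantiles(rates, n=100, method="inclusive")
        table[age_weeks] = {p: cuts[p - 1] for p in percentiles}
    return table


# Synthetic samples (hypothetical): evenly spread heart rates for two age groups.
samples = [(28, hr) for hr in range(100, 201)] + [(32, hr) for hr in range(90, 191)]
ranges = reference_ranges(samples)
for age_weeks, row in ranges.items():
    print(age_weeks, row)
```

With billions of monitor samples, holding every value in memory is not realistic; a streaming histogram per gestational age would give the same percentiles at a fraction of the memory cost.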
Sink drains are notorious reservoirs of pathogens that cause nosocomial transmission in hospitals worldwide. Outbreaks in which sinks have been implicated as a source of antibiotic-resistant bacteria have surged over the last few years. To understand transmission dynamics, the University of Virginia School of Medicine has established a unique “Sink Lab” for this research. This one-of-a-kind laboratory establishes UVA as a worldwide frontrunner in investigating sink-related antibiotic-resistant bacteria and how they spread. RC is working with the UVA Sink Lab on genomic analysis of the sink biomass.
RC is contributing to:
Comparative genomic analysis of gram-negative bacterial isolates: The analysis aims to track the mobile blaKPC gene, which encodes the Klebsiella pneumoniae carbapenemase (KPC) enzyme that confers resistance to all beta-lactam agents, including penicillins, cephalosporins, monobactams and carbapenems.
Coronary artery disease (CAD) is the major cause of morbidity and mortality worldwide. Recent genome-wide association studies (GWAS) have revealed more than 50 genomic loci that are associated with increased risk for CAD. However, the pathological mechanisms by which the majority of the GWAS loci lead to increased susceptibility to this complex disorder are still unclear. RC is working with Redouane Aherrahrou (CPHG), who aims to study the impact of the CAD-associated genetic factors on the cellular and molecular phenotypes of smooth muscle cells (SMCs). Support for this project has included preparation of scripts for programmatic data analyses, data visualization, statistical modeling, and assistance with use of the Rivanna high-performance computing cluster.
Before patients are admitted to the emergency room, they are assigned a triage level based on the severity of their health problems. This is accomplished using the Emergency Severity Index (ESI), an emergency department triage algorithm that classifies patient cases into five different levels of urgency. Researchers are interested in using machine learning to develop a model to predict patient triage level. This model would not only analyze the typical vital signs that are used in the ESI, but also demographic data and patients’ history of health.
Demographic and health data have been collected. RC is helping to prepare and normalize the data for use in a machine learning model.
RC is working with Dr. Eric Schneider to create a secure computing environment for the research of the Healthcare Surgical Outcome team. Data from this project will contain HIPAA identifiers, as well as Medicare information, and requires more security and control of data ingress/egress than projects previously hosted on the Ivy platform. After successful implementation of this project, RC will create a similar computing environment for DoD blast and traumatic brain injury data collected by Dr. Schneider before he joined UVA.
PI: Eric Schneider (Department of Surgery)
In partnership with researchers in the Center for Public Health Genomics, School of Medicine Research Computing has contributed to the development of a novel package for computationally efficient caching and loading of data in R. simpleCache provides an interface to a series of functions to store and retrieve cached objects, including in the context of batch processing or HPC environments. The package further extends base R functionality for saving and loading external representations of objects by enabling caching to pre-defined directories and timed cache operations.
RC helped document and develop new functions for the package ahead of its release to the Comprehensive R Archive Network (CRAN).
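The store-or-load pattern that such a caching package provides can be sketched in Python as follows. This is an analogy only and does not reproduce simpleCache's R interface; the function name cached, its arguments, and the pickle file layout are assumptions made for the example.

```python
import pickle
import tempfile
import time
from pathlib import Path


def cached(name, compute, cache_dir, recreate=False):
    """Load the object called name from cache_dir, or compute and store it.

    compute is a zero-argument callable that is run only on a cache miss
    (or when recreate is True); its run time is reported.
    """
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{name}.pkl"

    if path.exists() and not recreate:
        with path.open("rb") as handle:
            return pickle.load(handle)

    started = time.perf_counter()
    value = compute()
    elapsed = time.perf_counter() - started
    with path.open("wb") as handle:
        pickle.dump(value, handle)
    print(f"cached {name!r} after {elapsed:.3f}s of computation")
    return value


# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as demo_dir:
    first = cached("squares", lambda: [n * n for n in range(5)], demo_dir)
    # Second call finds the file on disk, so this compute function never runs.
    second = cached("squares", lambda: [], demo_dir)
    third = cached("squares", lambda: ["rebuilt"], demo_dir, recreate=True)
```

Pointing cache_dir at shared storage lets separate batch jobs on a cluster reuse one another's results instead of recomputing them.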
Researchers are using sonomicrometry to study the biomechanics of the human brain. While at times the signals collected do not require any preprocessing, more frequently they do require some denoising or are too noisy to analyze. Currently, researchers are manually categorizing the quality of thousands of these sonomicrometry signals and preprocessing them individually. RC is helping researchers develop a machine learning model to classify the signals and to determine the necessary preprocessing steps.
Preliminary sonomicrometry data have been collected, and RC is working to classify, prepare, and normalize the data for use in a machine learning model. RC is currently developing preliminary models to classify the data by signal quality, along with automated preprocessing techniques that will later be applied to noisy signals.
Two important measures of the in vivo interaction of transcription factors with chromatin are the search time and the residence time. The former is the time it takes a factor to find its binding location, while the latter is the length of time the factor remains physically attached to the chromatin. By quantifying the interaction dynamics of transcription factors, researchers hope to understand the role of these factors in basic cellular processes such as transcription and gene regulation. The RC team is working with collaborators from UVA and the NIH to understand the dynamics of the Gal4 protein in yeast. The project involves quantitatively analyzing ChIP-qPCR data, writing and running non-linear regression and statistical routines in Mathematica, and developing numerical simulations to determine the error bounds on the kinetic parameters.