Workshops
UVA Research Computing provides training opportunities covering a variety of data analysis, basic programming, and computational topics. All of the classes listed below are taught by experts and are freely available to UVA faculty, staff, and students.
New to High-Performance Computing? We offer orientation sessions to introduce you to the Afton & Rivanna HPC systems on Wednesdays (appointment required).
Wednesdays, 3:00-4:00pm: Sign up for an “Intro to HPC” session
Upcoming Workshops
DATE | WORKSHOP | INSTRUCTOR
Oct 23, 2024 | Distributed Deep Learning on HPC | Marcus Bobar, Ahmad Sheikhzada
Nov 4, 2024 | Using Containers on HPC | Ruoshi Sun
Nov 6, 2024
education, workshops
bioinformatics, containers, HPC, image processing, Ivy, Matlab, programming, Python, R, Rivanna, Shiny
ACCORD
Welcome to ACCORD (Assuring Controls Compliance of Research Data), a web-based platform which allows researchers from public universities across the state of Virginia to analyze and store their sensitive data in a central location.
ACCORD is appropriate for de-identified PII, FERPA, de-identified HIPAA, business confidential, and other types of de-identified sensitive data.
Thanks to funding provided by the National Science Foundation (Award #: 1919667), ACCORD is available at no cost to researchers in the state of Virginia.
Partners
Listed below are our partner universities for ACCORD:
Courses
In addition to providing free, in-person workshop training, UVA Research Computing staff teach for-credit courses. Below is a selection of courses that members of our group have taught, co-taught, or guest lectured in:
BIMS 8382: Introduction to Biomedical Data Science (Spring 2017, Spring 2018)
This course introduces methods, tools, and software for reproducibly managing, manipulating, analyzing, and visualizing large-scale biomedical data. Specifically, the course introduces the R statistical computing environment and packages for manipulating and visualizing high-dimensional data, covers strategies for reproducible research, and culminates with analysis of data from a real RNA-seq experiment using R and Bioconductor packages.
CS 6501: Distributed & Cloud Computing (Spring 2017, Spring 2018)
ACCORD Environments
After creating a project and logging into the ACCORD platform, you will next choose an environment. The environments currently available on ACCORD are listed below. We welcome your suggestions for additional environments to be included in the future.
RStudio: RStudio is the standard IDE for research using the R programming language.
JupyterLab: JupyterLab allows for interactive, notebook-based analysis of data. It is a good choice for pulling quick results or refining your code in numerous languages, including Python, R, Julia, and bash.
Theia Python: Theia Python is a rich IDE that allows researchers to manage their files and data, write code with an intelligent editor, and execute code within a terminal session.
Launching RStudio Server from an Apptainer Container
Rocker provides many software containers for R. Due to the default permission settings of our file system, launching an RStudio Server session is not straightforward. If you are interested in using Rocker containers on the HPC system, please follow these steps.
Pull container
Use Apptainer to pull the container. We will use geospatial in this example.
    module load apptainer
    apptainer pull docker://rocker/geospatial
You should see geospatial_latest.sif in your current directory.
One-time setup
The commands in this section are to be executed as a one-time setup on the frontend. You may need to repeat these steps when running a new Rocker container.
Political Sentiment Analysis
The nature of political communication has been fundamentally altered by the emergence of social media. In earlier eras, social scientists, journalists, and citizens could focus on static statements by politicians and candidates in order to understand the nature of political discourse. Social scientists studying political communication would design surveys and focus groups to understand which messages were received by citizens, and with what effect. Today, as news moves to digital platforms and as political figures increasingly rely on social media, political communication is fundamentally dynamic. Studying patterns of communication among politicians, their supporters, and their critics requires scholarly focus on the content, sentiment, and framing of posts on various social media platforms.
Predicting Injury Severity
Previous research has shown that older adults are more susceptible to severe injury than their younger counterparts after being involved in a motor vehicle collision. Dr. Hartka was interested in determining whether there are age-related differences in the accuracy of severe injury prediction following a motor vehicle collision. Using R, Research Computing developed age-specific logistic regression models and assessed their accuracy, and generated unique graphs and animations to visualize the data more effectively.
PI: Thomas Hartka
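The general approach of fitting age-specific logistic regression models and comparing their accuracy can be sketched as follows. This is only an illustration on simulated data: the predictors (delta_v, belted), the age cutoff of 65, and every coefficient are assumptions made for the example, not the variables or results of the study.

```r
set.seed(42)  # simulated data only; not the study's data

# Hypothetical predictors: change in speed at impact (delta_v) and seat belt use.
n <- 2000
crashes <- data.frame(
  age     = sample(16:90, n, replace = TRUE),
  delta_v = runif(n, 5, 80),
  belted  = rbinom(n, 1, 0.85)
)

# Assumed relationship used to simulate outcomes, with extra risk at age 65+.
older <- crashes$age >= 65
logit <- -5 + 0.07 * crashes$delta_v - 1.0 * crashes$belted + 0.8 * older
crashes$severe    <- rbinom(n, 1, plogis(logit))
crashes$age_group <- ifelse(older, "65+", "under 65")

# Area under the ROC curve, computed from the rank-sum (Mann-Whitney) statistic.
auc <- function(score, outcome) {
  r  <- rank(score)
  n1 <- sum(outcome == 1)
  n0 <- sum(outcome == 0)
  (sum(r[outcome == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Fit one logistic regression per age group and report its in-sample AUC.
fit_group <- function(d) {
  model <- glm(severe ~ delta_v + belted, data = d, family = binomial)
  c(n = nrow(d), auc = auc(fitted(model), d$severe))
}
results <- t(sapply(split(crashes, crashes$age_group), fit_group))
print(round(results, 3))
```

The AUC here is computed on the same data used for fitting, so it is optimistic; a real comparison would likely use held-out data or cross-validation.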
LOLAweb
The past few years have seen an explosion of interest in understanding the role of regulatory DNA. This interest has driven large-scale production of functional genomics data resources and analytical methods. One popular analysis is to test for enrichment of overlaps between a query set of genomic regions and a database of region sets. In this way, annotations from external data sources can be easily connected to new genomic data.
SOM Research Computing is working with faculty in the UVA Center for Public Health Genomics to implement LOLAweb, an online tool for performing genomic locus overlap annotations and analyses. This project, written in the statistical programming language R, allows users to specify region set data in BED format for automated enrichment analysis.
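The idea of testing for enrichment of overlaps can be illustrated with a small base R sketch. It assumes a toy "universe" of regions, a query set drawn from that universe, and a single database region set; all coordinates are made up, the helper overlaps_any() is hypothetical, and a one-sided Fisher's exact test is used as one common choice of test. It is not LOLAweb's own code.

```r
# TRUE for each row of `regions` that overlaps at least one row of `db`.
# Coordinates follow the BED convention (0-based start, exclusive end).
overlaps_any <- function(regions, db) {
  vapply(seq_len(nrow(regions)), function(i) {
    any(db$chr == regions$chr[i] &
        db$start < regions$end[i] &
        db$end   > regions$start[i])
  }, logical(1))
}

# Toy universe of 10 regions on chr1 (made-up coordinates).
universe <- data.frame(chr   = "chr1",
                       start = seq(100, 1000, by = 100),
                       end   = seq(150, 1050, by = 100))

# The query set is the first four universe regions.
in_query <- seq_len(nrow(universe)) <= 4

# One hypothetical database region set.
db <- data.frame(chr   = "chr1",
                 start = c(90, 210, 320, 900),
                 end   = c(120, 230, 330, 910))

# Cross-tabulate query membership against overlap with the database set.
hit <- overlaps_any(universe, db)
counts <- table(query   = factor(in_query, levels = c(TRUE, FALSE)),
                overlap = factor(hit,      levels = c(TRUE, FALSE)))

# One-sided test: do query regions overlap the database set more than expected?
enrichment <- fisher.test(counts, alternative = "greater")
print(counts)
print(enrichment$p.value)
```

In practice the same test would be repeated for every region set in the database, with the resulting p-values ranked and adjusted for multiple testing.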
epihet
RC is working with researchers in the Center for Public Health Genomics to write an R package to calculate Relative Proportion of Sites with Intermediate Methylation (RPIM) scores, which represent the epigenetic heterogeneity in a bisulfite sequencing sample.
https://github.com/databio/epihet
PI: Nathan Sheffield (Center for Public Health Genomics)
PHACTR1 and Smooth Muscle Cell Behavior
Coronary artery disease (CAD) is the major cause of morbidity and mortality worldwide. Recent genome-wide association studies (GWAS) have revealed more than 50 genomic loci that are associated with increased risk for CAD. However, the pathological mechanisms by which the majority of the GWAS loci lead to increased susceptibility to this complex disorder are still unclear. RC is working with Redouane Aherrahrou (CPHG), who aims to study the impact of CAD-associated genetic factors on the cellular and molecular phenotypes of smooth muscle cells (SMC). Support for this project has included preparation of scripts for programmatic data analyses, data visualization, statistical modeling, and assistance with use of the Rivanna high-performance computing cluster.
simpleCache
In partnership with researchers in the Center for Public Health Genomics, School of Medicine Research Computing has contributed to the development of a novel package for computationally efficient caching and loading of data in R. simpleCache provides an interface to a series of functions to store and retrieve cached objects, including in the context of batch processing or HPC environments. The package further extends base R functionality for saving and loading external representations of objects by enabling caching to pre-defined directories and timed cache operations.
RC helped document and develop new functions for the package ahead of its release to the Comprehensive R Archive Network (CRAN).
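The caching pattern described above can be sketched in base R alone. The helper cached_eval() below is hypothetical and is not part of simpleCache; it only illustrates the idea of computing a result once, saving it to a pre-defined directory, timing the computation, and reloading the saved object on later calls.

```r
# Hypothetical helper (not simpleCache's API): evaluate `expr` once and store
# the result as <cache_dir>/<name>.rds; later calls reload the saved file.
cached_eval <- function(name, expr, cache_dir = tempdir(), recreate = FALSE) {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path) && !recreate) {
    return(readRDS(path))  # cache hit: `expr` is never evaluated
  }
  # `expr` is a lazy argument, so the computation only runs here.
  elapsed <- system.time(value <- expr)[["elapsed"]]
  message(sprintf("computed '%s' in %.2f s", name, elapsed))
  saveRDS(value, path)
  value
}

# Use a dedicated cache directory under the session's temporary directory.
cache_dir <- file.path(tempdir(), "cache_demo")
dir.create(cache_dir, showWarnings = FALSE)

first  <- cached_eval("norm_sample", rnorm(1e5), cache_dir)  # computes and saves
second <- cached_eval("norm_sample", rnorm(1e5), cache_dir)  # loads from disk
print(identical(first, second))
```

Because the second call reads the saved object instead of drawing new random numbers, both results are the same, which is the behavior that makes cached batch or HPC jobs cheap to re-run.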
Preinstalled R on Ivy Linux VM
R Overview
R is an open source programming language used by data miners, scientists, data analysts, and statisticians. It is available under the GNU GPL v2 license from the Comprehensive R Archive Network (CRAN).
R can be used for many statistical, modeling, and graphical solutions. It supports object-oriented programming and is easily extensible.
Running the command line R console
Type R at the terminal to launch the R console.
Installing packages
Our Linux VMs come equipped with R preinstalled. Most major R packages are also installed. Additional packages can be installed from CRAN from within the R console.
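As a minimal sketch, packages are normally installed from CRAN within the R console using the standard install.packages() function. The two helper functions below and the CRAN mirror URL are assumptions chosen for illustration; they only wrap install.packages() so that packages already present on the VM are not reinstalled.

```r
# Hypothetical helper: return the requested packages that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: install from CRAN only the packages that are missing.
# The mirror URL is an assumption; use whichever mirror your VM can reach.
install_if_missing <- function(pkgs, repos = "https://cloud.r-project.org") {
  todo <- missing_packages(pkgs)
  if (length(todo) > 0) {
    install.packages(todo, repos = repos)
  }
  invisible(todo)
}

# Both of these packages ship with R, so this call installs nothing;
# substitute the names of the packages you actually need.
install_if_missing(c("stats", "utils"))
```

A single package can also be installed directly by calling install.packages() with the package name as a quoted string.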
Preinstalled R on Ivy Windows VM
R Overview
R is an open source programming language used by data miners, scientists, data analysts, and statisticians. It is available under the GNU GPL v2 license from the Comprehensive R Archive Network (CRAN).
R can be used for many statistical, modeling, and graphical solutions. It supports object-oriented programming and is easily extensible.
Running RStudio from the desktop
You can start R in a graphical interface using the RStudio application from the desktop.
Running the command line R console
Type R at the command prompt to launch the R console.
Installing packages
Our Windows VMs come equipped with R preinstalled.