UCL Centre for Digital Innovation

Technical Case Study: Enhancing Batch Processing Workflows for UCL Spin-Outs with ARC's Expertise

UCL Advanced Research Computing collaborated with startups chronostics and StoreGene to create an open-source Terraform module for efficient AWS batch-processing pipelines.

6 June 2024

As part of their journey in the UCL CDI Impact Accelerator, the startups chronostics and StoreGene collaborated with Research Software Engineers (RSEs) from UCL Advanced Research Computing (ARC) to enhance their technical capabilities. ARC developers helped both startups to develop and implement new infrastructure. A key result was an open-source Terraform module, developed by ARC RSEs, for deploying a batch-processing pipeline on AWS. The module lets users quickly spin up infrastructure for a common processing pattern: upload data to S3, trigger a containerised processing pipeline, then store the results back in S3. The module, named terraform-aws-batch-processing, is designed around the AWS Well-Architected Framework, with the aim of using both budget and compute resources efficiently.

Technical Overview: Streamlined Serverless Architecture 

The module implements a serverless architecture that follows the best practices set out in the AWS Well-Architected Framework, with the aim of keeping both cost and compute usage low. The high-level architecture is shown below.

[Figure: high-level architecture of terraform-aws-batch-processing]

First, a user uploads an object to S3. The upload sends an event notification that invokes an AWS Lambda function, which starts an AWS Step Functions workflow. AWS Step Functions is a service for orchestrating workflows that involve other AWS services. In this instance, the workflow consists of three steps that run in series:

  1. Data Synchronization: Use AWS DataSync to sync data from the 'Upload' S3 bucket to EFS.

  2. Batch Processing: Submit a job to an AWS Batch queue to run a containerised processing pipeline on Fargate. The Fargate containers mount the EFS file system so that they can access the necessary data.

  3. Results Storage: Use AWS DataSync to sync data from EFS to a 'Reports' S3 bucket.
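
As a rough illustration of the flow described above, the Python sketch below builds a three-step workflow definition in Amazon States Language and shows how an S3 upload notification could be turned into a workflow execution. It is not taken from the module: the state names, the job name, the function names and the payload shape are all hypothetical. The DataSync states use the Step Functions AWS SDK integration, which returns as soon as the task execution has started, so a real workflow would need an extra wait or polling step before moving on; how the module handles this is not covered here.

```python
import json
from urllib.parse import unquote_plus


def build_definition(upload_task_arn, job_queue_arn, job_definition_arn, reports_task_arn):
    """Return an Amazon States Language definition with three serial Task states."""
    # AWS SDK integration: starts a DataSync task execution but does not wait for it.
    datasync = "arn:aws:states:::aws-sdk:datasync:startTaskExecution"
    return {
        "StartAt": "SyncUploadToEfs",
        "States": {
            # Step 1: copy the uploaded data from S3 to EFS.
            "SyncUploadToEfs": {
                "Type": "Task",
                "Resource": datasync,
                "Parameters": {"TaskArn": upload_task_arn},
                "Next": "RunBatchJob",
            },
            # Step 2: submit the container job to AWS Batch and wait for it to finish.
            "RunBatchJob": {
                "Type": "Task",
                "Resource": "arn:aws:states:::batch:submitJob.sync",
                "Parameters": {
                    "JobName": "process-upload",  # hypothetical job name
                    "JobQueue": job_queue_arn,
                    "JobDefinition": job_definition_arn,
                },
                "Next": "SyncEfsToReports",
            },
            # Step 3: copy the results from EFS to the reports bucket.
            "SyncEfsToReports": {
                "Type": "Task",
                "Resource": datasync,
                "Parameters": {"TaskArn": reports_task_arn},
                "End": True,
            },
        },
    }


def execution_order(definition):
    """Follow the Next pointers from StartAt and list the state names in run order."""
    order, name = [], definition["StartAt"]
    while name is not None:
        order.append(name)
        name = definition["States"][name].get("Next")
    return order


def start_workflow(event, sfn_client, state_machine_arn):
    """Start one execution per uploaded object named in an S3 event notification."""
    responses = []
    for record in event["Records"]:
        payload = {
            "bucket": record["s3"]["bucket"]["name"],
            # Object keys arrive URL-encoded in S3 notifications.
            "key": unquote_plus(record["s3"]["object"]["key"]),
        }
        responses.append(
            sfn_client.start_execution(
                stateMachineArn=state_machine_arn,
                input=json.dumps(payload),
            )
        )
    return responses
```

In a Lambda function, `start_workflow` would be called from the handler with `boto3.client("stepfunctions")` as `sfn_client`. Passing the client in as an argument keeps the sketch testable without AWS credentials.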

The Terraform module optimises for cost and performance, but not at the expense of the other Well-Architected pillars. Security is one example: terraform-aws-batch-processing defines IAM roles that have only the permissions needed for each part of the workflow, following the principle of least privilege. The infrastructure is also automatically replicated in private subnets across multiple Availability Zones, providing resilience within the region and improving reliability.
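
To make the least-privilege idea concrete, the following Python sketch builds an IAM policy document for a trigger function that is only allowed to start executions of one state machine, and contrasts it with an over-broad policy. This is an illustration under stated assumptions, not the module's actual policies: the helper names and the ARN are hypothetical, and the other roles in the workflow would need their own, differently scoped permissions.

```python
import json


def allow_policy(actions, resources):
    """Build an IAM policy document allowing only the listed actions on the listed resources."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": list(actions), "Resource": list(resources)}
        ],
    }


def has_wildcard(policy):
    """Return True if any action or resource in the policy contains a '*' wildcard."""
    return any(
        "*" in value
        for statement in policy["Statement"]
        for value in statement["Action"] + statement["Resource"]
    )


# Hypothetical ARN, used only for illustration.
STATE_MACHINE_ARN = "arn:aws:states:eu-west-2:123456789012:stateMachine:batch-processing"

# The trigger function only needs to start executions of this one state machine.
trigger_policy = allow_policy(["states:StartExecution"], [STATE_MACHINE_ARN])

# An over-broad policy of the kind that least privilege is meant to avoid.
broad_policy = allow_policy(["states:*"], ["*"])

if __name__ == "__main__":
    print(json.dumps(trigger_policy, indent=2))
```

A check such as `has_wildcard` is a simple way to flag policies that grant more than a single, named action on a single, named resource.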

Application in Startups: chronostics and StoreGene add batch processing capabilities to improve patient outcomes 

The terraform-aws-batch-processing module was developed during Cohort 3 of the Impact Accelerator in collaboration with two startups from the cohort: chronostics and StoreGene.

chronostics: Enhancing Clinical Trial Efficiency 

chronostics is a UCL spin-out using data-driven disease progression modelling to improve clinical trials and patient outcomes. Their platform helps clinical researchers to recruit the most appropriate patients and streamline the clinical trial process, and in the future it is intended to help ensure that patients receive the most effective treatment for their needs.

Challenge: Before collaborating with ARC, chronostics performed all processing locally on an ad-hoc basis, limiting their scalability and responsiveness. 

Solution: As part of Cohort 3 of the Impact Accelerator, chronostics worked with ARC to integrate the terraform-aws-batch-processing module into their platform. This integration allowed them to scale their processing operations and respond to processing requests more effectively.

Outcome: The enhanced batch processing capabilities have enabled chronostics to provide a better service to their customers and to support a larger number of clinical trials efficiently.

StoreGene: Advancing Personalised Healthcare

StoreGene is another UCL spin-out, providing personalised healthcare based on a patient's whole genome. Their platform helps clinicians to interpret, understand, and make decisions based on the vast amounts of data in their patients' genomes. Given that around 50% of diseases are heritable, this personalised approach to medicine based on whole genome analysis has the potential to improve a person's health at every stage of their life - from birth to old age. 

Challenge: StoreGene needed to improve their data processing pipelines for whole genome analysis and streamline the process of running bioinformatics pipelines upon data upload. 

Solution: During Cohort 3 of the Impact Accelerator, StoreGene collaborated with ARC to integrate the terraform-aws-batch-processing module into their platform. This automation allows them to run bioinformatics pipelines when a clinician uploads a patient's genome data to the site. Additionally, they integrated Amazon QuickSight into their platform to present the final reports to clinicians. Together, these changes give clinicians a streamlined way to get actionable insight and provide patients with the most effective care based on their genome.

Outcome: This integration provided StoreGene with a seamless and efficient way to process and analyse genomic data, offering clinicians actionable insights to deliver personalised care to patients.

A Leap Forward in Digital Innovation 

The collaboration between ARC's Research Software Engineers and the UCL spin-outs chronostics and StoreGene, facilitated by the UCL CDI Impact Accelerator, highlights the significant impact of advanced batch processing workflows. By leveraging the terraform-aws-batch-processing module, both startups have enhanced their platforms, improved their service delivery, and positioned themselves for greater scalability and success in their respective fields. Through this partnership, ARC has demonstrated its vital role in driving technological innovation and supporting the growth of cutting-edge healthcare solutions. 

Further info 

The terraform-aws-batch-processing module is permissively licensed and available on GitHub.