Overview
Documentation and User Guide for the Personal Health Train (PHT), an open-source, container-based platform for secure distributed analysis. For more information about the PHT team, projects and collaborations, you can also visit our website.
If you want to deploy our platform in production in a clinical environment, please e-mail us at: pht(at)medizin.uni-tuebingen.de
We can share operational and technical documentation to help you obtain clearance from your local IT security and data protection officers.
Introduction
The Personal Health Train (PHT) is a paradigm proposed within the GO:FAIR initiative as one solution for the distributed analysis of medical data that also enhances the data's FAIRness. Rather than transferring data to a central analysis site, the analysis algorithm (wrapped in a ‘train’) travels between multiple sites (e.g., hospitals, the so-called ‘train stations’) that securely host the data.
The following overview shows all interactions between service components to execute a train iteratively over three stations with our PHT-TBI architecture.
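The idea of a train visiting stations one after another can be illustrated with a minimal sketch. It is not the PHT implementation: the `Station` and `MeanAgeTrain` classes, the `execute_route` helper, and the age values are all hypothetical and synthetic. It only shows the principle that the algorithm moves to the data and that the train carries aggregate state, not raw records, from site to site.

```python
from dataclasses import dataclass, field


@dataclass
class Station:
    """Hypothetical station: its records stay local to the site."""
    name: str
    ages: list[float]  # synthetic example values, not real data


@dataclass
class MeanAgeTrain:
    """Hypothetical train that carries only aggregate state between stations."""
    count: int = 0
    total: float = 0.0
    visited: list[str] = field(default_factory=list)

    def run_at(self, station: Station) -> None:
        # Runs "inside" the station: reads local records, keeps only aggregates.
        self.count += len(station.ages)
        self.total += sum(station.ages)
        self.visited.append(station.name)

    @property
    def mean(self) -> float:
        if self.count == 0:
            raise ValueError("train has not processed any records yet")
        return self.total / self.count


def execute_route(train: MeanAgeTrain, route: list[Station]) -> MeanAgeTrain:
    """Execute the train at each station in the order given by the route."""
    for station in route:
        train.run_at(station)
    return train


route = [
    Station("station_1", [54.0, 61.0]),
    Station("station_2", [47.0]),
    Station("station_3", [70.0, 58.0, 64.0]),
]
train = execute_route(MeanAgeTrain(), route)
print(train.visited, train.count, train.mean)
```

In this sketch, the result after the third station is the same mean that a central analysis over all six records would produce, although no station's records were copied into the train.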
Mission Statement
Healthcare can profit from Machine Learning (ML) by ‘learning’ models that support clinical practice in treatment decision support systems (TDSS). The robustness of the resulting model and the meaningfulness of the analysis outcome generally depend on the number of training samples and on data quality.
However, meaningful data for improving predictions in medical research and healthcare is often distributed across multiple sites and is not easily accessible. This data contains highly sensitive patient information, may be stored in different formats at each site, and cannot be shared without the explicit consent of the patient. Our goal is to make this data available to trains at the stations, supporting privacy-preserving distributed machine learning in healthcare with our open-source implementation of the PHT.
Implementing trains as light-weight containers enables even complex data analysis workflows to travel between sites, for example, genomics pipelines or deep-learning algorithms. These are analytics methods that are not easily amenable to established distributed queries or simple statistics.
Architecture
Central Services
RabbitMQ
- Message broker for consuming and publishing commands & events between different services
Harbor
- Docker registry to manage (train-) images
Vault
- Secret storage to securely store sensitive information
User Interface (UI)
- Frontend application for proposal and train management, downloading of results and much more
API
- Backend application to manage resources and trigger commands & events through the message broker
Train Manager
- Microservice serving different components:
Train Building
- Build and distribute train images to a registry
Train Routing
- Move trains between projects & registries according to the route of the train
Result Extracting
- Download, extract & serve encrypted results from the registry
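The routing step described above (moving a train to the next stop on its route) can be sketched as a small pure function. This is an illustration under stated assumptions, not the Train Manager's actual code: the function name `next_project`, the station identifiers, and the convention that each station appears at most once in a route are all hypothetical.

```python
from typing import Optional, Sequence


def next_project(route: Sequence[str], current: Optional[str]) -> Optional[str]:
    """Return the next stop on the route, or None once the route is complete.

    `current` is None while the train has not yet started its journey.
    Assumes each station appears at most once in the route.
    """
    if not route:
        raise ValueError("route must contain at least one station")
    if current is None:
        return route[0]
    index = route.index(current)  # raises ValueError if not on the route
    if index + 1 < len(route):
        return route[index + 1]
    return None


route = ["station_1", "station_2", "station_3"]
stop = next_project(route, None)
while stop is not None:
    print("move train image to", stop)
    stop = next_project(route, stop)
print("route complete, results can be extracted")
```

Keeping the decision in a function without side effects makes it easy to test separately from the registry and message broker calls that would surround it in a real service.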
Local/Station Services
Airflow
- Open-source tool to create and schedule workflows; it enables persistent access to data as well as the execution and monitoring of trains
Keycloak
- Identity and Access Management (IAM) to manage users and roles
Desktop App
- GUI to manage key pairs and decrypt results locally
Security
Security Protocol
The following flow chart depicts the security protocol used for protecting participating stations against malicious code, as well as encrypting any stored results using envelope encryption.
This ensures that only approved algorithms are executed and that only previously registered participants in an analysis can access the results.
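One building block behind "only approved algorithms are executed" can be sketched as a digest comparison. This is not the PHT security protocol, whose details are in the flow chart and are not reproduced here. It assumes, purely for illustration, that a digest of the train's files was recorded at approval time and is re-checked before execution. The function names `digest_train_files` and `is_approved` and the file contents are hypothetical, and envelope encryption of results is deliberately left out.

```python
import hashlib
import hmac


def digest_train_files(files: dict[str, bytes]) -> str:
    """Compute a SHA-256 digest over file names and contents in a stable order."""
    hasher = hashlib.sha256()
    for name in sorted(files):
        content = files[name]
        hasher.update(name.encode("utf-8"))
        hasher.update(b"\0")  # separator between name and length
        hasher.update(len(content).to_bytes(8, "big"))  # length prefix avoids ambiguity
        hasher.update(content)
    return hasher.hexdigest()


def is_approved(files: dict[str, bytes], approved_digest: str) -> bool:
    """Check that the files still match the digest recorded at approval time."""
    return hmac.compare_digest(digest_train_files(files), approved_digest)


# Hypothetical train contents recorded when the algorithm was approved.
approved_files = {"entrypoint.py": b"print('count patients')\n"}
approved_digest = digest_train_files(approved_files)

# The same files pass the check; a modified entrypoint does not.
tampered_files = {"entrypoint.py": b"print('export all records')\n"}
print(is_approved(approved_files, approved_digest))
print(is_approved(tampered_files, approved_digest))
```

A station that runs such a check before starting a container would refuse any train whose code changed after approval. A production protocol would typically also sign the digest so that the approval itself cannot be forged.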
Languages
JavaScript
Wikipedia: JavaScript (https://developer.mozilla.org/en/docs/Web/JavaScript), often abbreviated JS, is a programming language that is one of the core technologies of the World Wide Web, alongside HTML and CSS.
TypeScript
Wikipedia: TypeScript (https://www.typescriptlang.org/) is a programming language developed and maintained by Microsoft. It is a strict syntactical superset of JavaScript and adds optional static typing to the language. It is designed for the development of large applications and transpiles to JavaScript.
Python
Wikipedia: Python (https://python.org) is an interpreted high-level general-purpose programming language. Python's design philosophy emphasizes code readability with its notable use of significant indentation. Its language constructs as well as its object-oriented approach aim to help programmers write clear, logical code for small and large-scale projects.