Because infection drives the expansion of pathogen-specific T cells, each expressing a unique receptor encoded in its DNA, an individual's exposure history can be read from their T-cell receptors (TCRs). These unique sequences can serve as biomarkers for tracking T-cell responses and cataloging immunological history.

The advent of high-throughput sequencing and analysis of immune cell receptor sequences presents an opportunity to improve our understanding of immunological responses to infection. After the body recovers, pathogen-specific immune cells and their receptor sequences persist at elevated frequencies, which helps protect against subsequent infection. Because they persist in the body, these T-cell sequences are a stable biomarker, useful for diagnosing infections and evaluating vaccine efficacy. However, this requires analyzing massive datasets at an accuracy beyond the capabilities of traditional statistical tests.

Here we use a deep neural network to identify viral infection or vaccination status in four TCR-sequenced cohorts: mouse monkeypox, mouse smallpox, human cytomegalovirus serostatus, and human smallpox. The success of these experiments points to the potential for rapid creation of low-cost, highly accurate diagnostic assays.

Data: T-Cell Receptor Sequences

Figure_1

Figure 1. A visualization of one of our four developmental TCR cohorts. Cohorts are created by merging all samples of a particular disease and patient species into a single representative tab-delimited file, with individual sample columns denoting TCR abundance and antigen exposure status. Our work focused primarily on four cohorts across two species: monkeypox and smallpox in mice, and cytomegalovirus and smallpox in humans.
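A minimal sketch of how such a cohort table might be assembled, assuming each sample is available as a mapping from TCR sequence to abundance plus an exposure label. The function names `merge_cohort` and `write_tsv`, the `exposed` status row, and the toy sequences are illustrative assumptions, not the file layout actually used for these cohorts.

```python
import csv
import io


def merge_cohort(samples):
    """Merge per-sample TCR counts into one table.

    `samples` maps a sample name to a (counts, exposed) pair, where `counts`
    maps a TCR sequence to its abundance and `exposed` is True or False.
    Returns (header, status_row, rows); rows are sorted by TCR sequence.
    """
    names = sorted(samples)
    sequences = sorted({seq for counts, _ in samples.values() for seq in counts})
    header = ["tcr_sequence"] + names
    # One exposure flag per sample column (1 = exposed, 0 = unexposed).
    status_row = ["exposed"] + [int(samples[n][1]) for n in names]
    # Sequences missing from a sample get an abundance of 0.
    rows = [[seq] + [samples[n][0].get(seq, 0) for n in names] for seq in sequences]
    return header, status_row, rows


def write_tsv(header, status_row, rows):
    """Serialize the merged table as tab-delimited text."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerow(status_row)
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical toy samples; real repertoires hold far more sequences.
toy_samples = {
    "sample_a": ({"CASSLGQGYEQYF": 12, "CASSPDRGNTEAFF": 3}, True),
    "sample_b": ({"CASSLGQGYEQYF": 1}, False),
}
header, status_row, rows = merge_cohort(toy_samples)
cohort_tsv = write_tsv(header, status_row, rows)
```

Keeping one column per sample means a new sample can be appended without restructuring the table, at the cost of many zero entries when repertoires overlap little.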

T-Cell Proliferation

* T cells are lymphocytes that play a pivotal role in the adaptive immune system.
* T cells are identified by their T-cell receptor (TCR), a cell-surface receptor encoded by a DNA sequence at the TCR locus that is randomly generated by a process known as VDJ recombination.
* During infection, T cells whose unique TCR DNA sequences match the invading pathogen quickly proliferate.
* The effective TCR sequences are preserved within memory T cells, which retain the unique pathogen-specific DNA rearrangements within their TCR loci.
* These long-lasting memory T cells can be surveyed as biomarkers of pathogen exposure to assess immunological history.

Figure_2

Figure 2. The binding of a T cell to an antigen-presenting cell (APC). The T-cell receptor (TCR) on both CD4+ helper T cells and CD8+ cytotoxic T cells binds to the antigen as it is held in a structure called the major histocompatibility complex (MHC) on the surface of the APC. This initiates the primary activation of the T cell.

Figure_3

Figure 3. The formation of a TCRβ chain from the random selection and recombination of the V, D, and J segments. While the V and J segments of the TCR locus can be influenced by a patient’s major histocompatibility complex (MHC), the CDR3 segment is solely antigen specific.

Figure_4

Figure 4. After the initial spike in pathogen-related TCR sequences that follows infection, a heightened level of these specific sequences remains, serving as both a protective measure and a stable immunological biomarker.

Infection Differentiation

* In samples with characteristically low TCR diversity, specifically our two mouse cohorts, direct statistical comparisons can be made between samples before and after antigen exposure.
* Using Fisher’s exact test, we can easily identify and catalog disease-specific TCR sequences.
* However, mice average around 50,000 total T cells. By contrast, there are approximately 4 × 10^11 T cells circulating in the adult human body, with fewer than 1 × 10^7 relating to disease.
* In response to this abundance of background noise, we developed a multi-layer deep neural network capable of discerning hidden patterns and features in our TCR data and outputting a binary classification through a final logistic regression layer.
* Unlike our statistical methods, which return libraries of pathogen-related TCR sequences, a trained deep neural network persists as a diagnostic assay. This gives us the ability to classify new TCR samples based on learned features.
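The classifier described in the last two points can be sketched as a forward pass through a small fully connected network that ends in a logistic (sigmoid) output. This is an illustration only: the layer sizes, ReLU activations, random initialization, and the made-up feature vector are assumptions, and the weights here are untrained, so the output carries no diagnostic meaning.

```python
import math
import random


def relu(values):
    """Rectified linear activation applied element-wise."""
    return [max(0.0, v) for v in values]


def dense(inputs, weights, biases):
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases (illustrative initialization)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_exposure(features, hidden_layers, output_layer):
    """Forward pass: ReLU hidden layers, then a logistic regression output."""
    activations = features
    for weights, biases in hidden_layers:
        activations = relu(dense(activations, weights, biases))
    out_weights, out_biases = output_layer
    logit = dense(activations, out_weights, out_biases)[0]
    return sigmoid(logit)


rng = random.Random(0)
n_features = 6  # hypothetical: one normalized abundance per TCR sequence
hidden_layers = [init_layer(n_features, 4, rng), init_layer(4, 3, rng)]
output_layer = init_layer(3, 1, rng)

sample = [0.0, 0.12, 0.0, 0.03, 0.0, 0.85]  # made-up abundances
probability = predict_exposure(sample, hidden_layers, output_layer)
label = "exposed" if probability >= 0.5 else "unexposed"
```

Because the final layer is a logistic regression over learned hidden features, the output can be read as a probability of exposure and thresholded (0.5 here, an arbitrary choice) to give the binary call.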

Figure_5

Figure 5. An illustration of the noticeable separation between exposed and unexposed samples in mouse cohorts. Due to the low TCR count and shorter immune cell lifespan in mice, an adaptive immune response is much more likely to significantly shift the TCR profile as a whole, allowing basic statistical comparisons to identify TCR sequences of significant impact.
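The per-sequence comparison used for the mouse cohorts can be sketched with a one-sided Fisher's exact test on presence or absence of each TCR sequence in exposed versus unexposed samples. The choice of a one-sided test, the 0.05 threshold, the function names, and the toy sequences are assumptions made for illustration; no multiple-testing correction is applied here, which a real analysis over many sequences would likely need.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: exposed samples containing the sequence
    b: exposed samples lacking it
    c: unexposed samples containing the sequence
    d: unexposed samples lacking it
    Returns P(X >= a) under the hypergeometric null of no association.
    """
    row1 = a + b          # number of exposed samples
    col1 = a + c          # number of samples containing the sequence
    total = a + b + c + d
    upper = min(row1, col1)
    tail = sum(comb(row1, k) * comb(total - row1, col1 - k)
               for k in range(a, upper + 1))
    return tail / comb(total, col1)


def enriched_sequences(exposed, unexposed, alpha=0.05):
    """Return {sequence: p_value} for sequences enriched in exposed samples.

    `exposed` and `unexposed` are lists of sets, one set of TCR sequences
    per sample.
    """
    results = {}
    for seq in set().union(*exposed):
        a = sum(seq in s for s in exposed)
        c = sum(seq in s for s in unexposed)
        p = fisher_exact_greater(a, len(exposed) - a, c, len(unexposed) - c)
        if p < alpha:
            results[seq] = p
    return results


# Hypothetical toy cohort: one sequence shared by all samples, one seen
# only after exposure.
exposed = [{"SEQ_COMMON", "SEQ_PATHOGEN"} for _ in range(5)]
unexposed = [{"SEQ_COMMON"} for _ in range(5)]
hits = enriched_sequences(exposed, unexposed)
```

In this toy cohort only `SEQ_PATHOGEN` is reported, since `SEQ_COMMON` appears in every sample and carries no information about exposure.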

Figure_6

Figure 6. An illustration of the considerable overlap between exposed and unexposed TCR libraries characteristic of the human immune system, and the confusion this background noise can cause a basic statistical model. Due to the resilience and diversity of human T cells, human samples maintain a healthy TCR repertoire regardless of recent infection status.

Results

Figure_7

Figure 7. A confusion matrix visualizing the performance of our deep neural network on our flagship dataset: a 129-sample human smallpox cohort sequenced at the ultra-deep level for the highest possible TCR diversity and, as a by-product, the most background noise. The cohort was split 75%/25% into training and testing sets, with the deep neural network classifying the training set with 100% accuracy and the test set with 97.0% accuracy.
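The evaluation described in this caption can be sketched as a shuffled 75%/25% split followed by a confusion matrix on the held-out samples. The sketch assumes the test-set size is rounded up, which gives 33 of 129 samples; the exact split sizes, labels, and predictions are not stated in the source, so the values below are made up. One misclassification in 33 gives 32/33 ≈ 97.0%, which is consistent with the reported figure but does not confirm how it was obtained.

```python
import math
import random


def split_cohort(sample_ids, test_fraction=0.25, seed=0):
    """Shuffle sample ids and hold out ceil(n * test_fraction) for testing."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)
    n_test = math.ceil(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


def confusion_matrix(actual, predicted):
    """Counts for binary labels (True = exposed), keyed tp/fp/fn/tn."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, guess in zip(actual, predicted):
        if guess:
            counts["tp" if truth else "fp"] += 1
        else:
            counts["fn" if truth else "tn"] += 1
    return counts


def accuracy(counts):
    """Fraction of samples on the diagonal of the confusion matrix."""
    return (counts["tp"] + counts["tn"]) / sum(counts.values())


train_ids, test_ids = split_cohort(range(129))

# Made-up test labels and predictions with a single missed exposure.
actual = [True] * 17 + [False] * 16
predicted = list(actual)
predicted[0] = False

counts = confusion_matrix(actual, predicted)
test_accuracy = accuracy(counts)
```

With a test set this small, each misclassified sample moves accuracy by about three percentage points, so the confusion matrix is more informative than the single accuracy figure.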

Conclusion

The ability to identify pathogen-specific TCR sequences in a variety of infected datasets is crucial to a quick response to novel pathogens. Moreover, the ability to turn the differences between infected and uninfected samples into a persistent diagnostic assay, in the form of a deep neural network, offers the possibility of a highly adaptive diagnostic platform built from the existing abundance of TCR sequence data. However, T cells recognize more than invasive pathogens. The adaptive immune system’s antigen response includes, but is not limited to, cancers, allergens, markers for diabetes, and poisons. The diagnostic power of our deep neural network is similarly unbounded, and as we move forward, we hope to expand both its accessibility and flexibility in order to provide a low-cost alternative to current diagnostic assay development.