Mathematics, computers and medicine

By Benny M Chain, on 6 February 2017

The eminent 19th Century physician Sir William Osler, the founder of the John Hopkins School of Medicine in Baltimore was famously quoted as having said “Medicine will become a science when doctors learn to count”. Sadly, his prophetic words have had little impact. Mathematics, and its younger partner, computer science make barely any contribution to the current medical school curriculum in most universities, just as they did not 150 years ago. And yet medicine is now in the midst of a revolution which may be as profound as the impact of molecular biology in the second half of the last Century. A revolution driven primarily by the extraordinary developments in computers and processing power. Scarcely a field of medicine remains untouched. Primary care provision is coming to terms with vast new data sets comprising comprehensive electronic patient records which will link an individual’s genetic makeup to behavioural and social data and medical history. Miniaturisation and price reduction are boosting point-of-care diagnostics and real-time physiological monitoring. A score of new imaging modalities which leverage the latest advances in artificial intelligence and machine learning are driving the emergence of computational anatomy, providing real time dynamic images of every part of the human body. Robotics are transforming surgery. Genomics, proteomics and metabolomics are creating precision medicine. The list is endless.
Computers and mathematical models lie at the heart of all these advances. Computers are needed to drive the new instrumentation, and then to capture, store and organise the vast data sets they generate. But generating, storing and classifying data are not enough. The real benefits can only arise from the ability to extract the underlying biology and pathology which are embedded in this data. This ability, in turn, requires mathematical modelling : the ability to recognise and formulate the rules and relationships which allow the data to make sense, which enable us to predict the future based on the past, and which ultimately will guide and inform the clinician and establish best medical practice.
The mathematics may be of many different types. It may be classical mathematics, fields such as classical , statistical or quantum mechanics, linear and non-linear differential equations, probability theory, the mathematics of light etc. Or it may be newer branches of mathematics, fields which some mathematicians may not even recognise as legitimate branches of mathematics at all: artificial intelligence, machine learning, high dimensional statistics, computational algorithms etc. Indeed so fluid and dynamic is the interface between mathematics, computer science and biology and medicine that it would be a brave individual who would predict with any confidence the pace or direction of progress which we may observe in the coming decades.
So how can we prepare the next generation of doctors for this brave new world, where they will properly be able to exploit the data and computational revolutions which are already transforming medical care in the 21st Century ? Clearly mathematics and computation will gradually need to be embedded in the medical school curriculum, in the same way that molecular biology and biochemistry have now taken their rightful place alongside the classical study of anatomy and pathology. UCL, one of the top medical schools in the UK, with a long-established commitment to train doctors who are at the cutting edge of medical advances, has recently taken an important step in this direction. UCL’s Programme and Module Approval Panel has recently approved my proposal for a new Intercalated BSc degree in Mathematics, Computers and Medicine, which will be launched in the academic year 2018/2019. The one year degree will be offered as an option in the third year of the medical school curriculum, when all students at UCL have to chose a BSC in a subject of their choice. The course will be interdisciplinary, and will be run jointly by the Faculties of Medical Sciences and the Faculty of Engineering. It will provide teaching in advanced mathematics (capitalising on the fact that a substantial portion of the entry into medical school have an A in Mathematics at A level) , computer programing , mathematical modelling and analysis of big data. Perhaps of greatest importance, the new degree will capitalise on the wealth of interdisciplinary research which already occurs at UCL to offer student research projects at the frontier between biomedicine, computers science and mathematics.
The new degree is a small step in the necessary reform of the medical school curriculum to keep pace with the computer revolution. The graduates will not necessarily become adept mathematicians or computer scientists, just as most of the graduates of the other IBScs do not spend their careers as laboratory scientists. But it will provide a core group of new talented doctors, who have confidence in their ability to understand and interact with mathematicians and computer scientists, and who will provide the leaders in the computational revolution of medical practice.

Two new jobs at the Chain lab. !! Please forward to anyone who might be interested,

By Benny M Chain, on 8 January 2017

Position  1 http://www.jobs.ac.uk/job/AWG425/research-technician/

Research Technician
University College London – Cancer Institute / Research Department of Cancer Biology
Location: London
Salary: £34,056 to £41,163 per annum, inclusive of London Allowance, UCL Grade 7
Hours: Full Time
Contract Type: Contract / Temporary
Placed on: 19th December 2016
Closes: 15th January 2017
Job Ref: 1619025
A position for Research Technician is available to carry out RNAseq and T cell receptor sequencing on clinical samples for a variety of projects related to tumour immunotherapy.

The postholder will work alongside existing research technicians to help run the Pathogen Genomics Unit (PGU), a unique facility providing pathogen genomics, transcriptional profiling and T/B cell receptor repertoire sequencing. The role will involve processing samples for next generation sequencing and adapting existing library preparation pipelines for high throughput robotics. The postholder will receive training in all aspects of sample processing and next generation sequencing.

The post is funded by the Cancer Research UK Cancer Immunotherapy Accelerator (CITA) Network for twenty four months in the first instance.

Position 2 http://www.jobs.ac.uk/job/AWM627/research-associate/

Research Associate
University College London – UCL Division of Infection and Immunity
Location: London
Salary: £34,056 to £41,163 per annum (UCL Grade 7), inclusive of London Allowance.
Hours: Full Time
Contract Type: Contract / Temporary
Placed on: 6th January 2017
Closes: 24th January 2017
Job Ref: 1620086
Applications are invited for a Postdoctoral Research Associate in the Division of Infection & Immunity at UCL, funded by Unilever PLC, and available immediately, to work on an analysis of the T cell receptor repertoire in the context of allergic contact dermatitis.

The postholder will be supervised by Professor Benny Chain and be based in Cruciform Building. Prof Chain works closely with members of the Computer Sciences Department at UCL, to develop algorithms with which to study high dimensional immunological data sets, with a special emphasis on T cell receptor repertoire (see http://blogs.ucl.ac.uk/innate2adaptive/ and www.innate2adaptive.com) The postholder will also have the opportunity to interact with other groups within the Unilever led consortium, including in-house modeling expertise within the toxicology group at Unilever.

The post is available until 31st December 2017 in the first instance. Further funding may then become available.

How quantitative is quantitative immunology ?

By Benny M Chain, on 9 March 2016

I recently attended a workshop on Quantitative Immunology held under the auspices of the Kavli Institute of Theoretical Physics at the University of California, Santa Barbara (UCSB). The 30 or so participants came for a variety of backgrounds : the majority from physics, a fair number of classical experimental immunologists and a handful of applied mathematicians. The meeting, which lasted three weeks, was structured round seminars given by the participants, but these provided the focus for lots of interaction, discussion and debate. In refreshing contrast to most conventional immunology conferences, the emphasis was very much on ideas rather than overwhelming displays of data. I would like to consider briefly what themes emerged from the meeting, and how these may help define the scientific space of Quantitative Immunology.
Evolution, in many guises, was a nexus joining the realms of physics and biology. Mutation/selection, in the evolutionary sense was invoked at a molecular level to shape antibody and T cell repertoires, at a cellular level to create the naïve and memory compartments of adaptive immunity populations, and at a systems level to form entire signalling networks. The ability to apply well developed stochastic models proved an enticing approach for physicists to enter the world of immunology. Quantities borrowed directly from physics (information content, entropy) , or from ecology (e.g. diversity) were frequently used although the meaning of these terms in the immunological context would have benefited from more precise explanation.
The meeting was full of interesting approaches and I certainly learnt a lot about how physicists might approach the complicated world of classical immunology. But I was struck by how often the emphasis on the “quantitative” was absent. There were of course exceptions. The careful measurements of IL2 consumption/diffusion Oleg Krichevsky; the probabilistic models of TCR generation (Thierry Mora and Aleksandra Walczak) , the kinetic proof reading models of TCR transduction (Paul Francois) , and the simple but elegant birth/death models of repertoire generation (Rob de Boer) were examples (just a few of my favourites and not meant to be an exclusive list!) of modelling which generated precise quantitative predictions amenable to experimental measurement and verification/falsification.
But often, quantitative model parametrisation seemed almost an afterthought, and quantitative prediction not really the goal. Instead, models were used as conceptual tools to describe and hence gain intuition into very complex biological systems. This type of modelling can surely be helpful. But it is a long way from the precise equations of physics, based on a handful of fundamental “constants” and whose power lies precisely in their ability to precisely predict quantitative behaviour of a physical system to the limits of measurable accuracy available.

The impression I came away with is that bringing immunology into the quantitative era is still a project in its infancy. Of course, data of a more-or-less quantitative nature is being produced at an exponentially increasing rate : single cell data, flow cytometric data, genomics data, transcriptomics data. But we lack a conceptual framework which will somehow capture this data in mathematical formulae, which can be predictive, accurate and offer intuitively appealing interpretations of immunological complexity.

What causes HIV-induced immunodeficiency?

By Benny M Chain, on 11 January 2016

Chronic HIV infection almost always leads to a progressive immunodeficiency. The manifestations of the immunodeficiency are quite characteristic to HIV infection. Specifically, commensal or non-pathogenic environmental micro-organisms, which are generally contained by the immune system asymptomatically, lead to uncontrolled infections, morbidity and, if untreated, death. Interestingly, although HIV infection leads to a specific loss of CD4 cells, many of the major HIV-associated infections are caused by intracellular organism, both viral (e.g. KSHV, HSV, CMV), bacterial (e.g. Mycobacterium) and fungal (e.g. Cryptococcus), infections in which the CD8 T cell response has an important protective role. The mechanisms which underlie HIV pathogenesis continue to be hotly debated. Functional defects have been reported not just in the primary target of HIV (CD4 T cells) but in almost every other facet of immunity. But unravelling which changes cause immunodeficiency, and which are the indirect consequence of it remains an unsolved challenge.
We have recently re-examined the T cell repertoire in chronic untreated HIV infection using high throughput sequencing to generate a global picture of alpha and beta gene usage (Frontiers in Immunology). We observe multiple levels of immune dysregulation, including a decrease in overall diversity, which seems to be driven both by a reduced CD4 cell repertoire, but also by a very extensive over expansion of individual CD8-associated T cell clones which take up a progressively larger proportion of the available repertoire space. We repeated our analysis on samples taken from the same patients three-four months after they started ART treatment, by which time viral load was very low or undetectable in all patients. Clinical experience has established that this short treatment period leads to dramatic improvements in patients suffering from HIV-associated infections at the time of treatment initiation, suggesting that even this brief time is sufficient for significant functional repair of immune function. As expected, CD4 T cell numbers were still very low at the time of the second sample, and we observed only a very small increase in overall repertoire diversity. However, the ability to quantitatively track the frequency of individual alpha and beta genes in the two sequential samples, showed a remarkably dynamic picture underlying the apparent stable overall T cell numbers and diversity (see fig 5 in paper). Many of the TCRs which were highly expanded in the HIV+ samples showed decreases in abundance of 100 fold or more, while other TCRs increase in abundance. These dramatic changes in T cell receptor abundance are much larger than are observed when comparing paired samples taken at similar time intervals from healthy volunteers. Although the repertoire method we used does not allow pairing between alpha and beta chains, we were able to identify a number of TCR genes present in the HIV+ samples which had been characterised as forming part of HIV specific TCRs in previous studies. These sequences generally showed a fall in abundance following three months of therapy, in agreement with the rapid decrease in HIV specific T cells previously observed to accompany the ART induced fall in viral load.
On the basis of these observations, we propose a model linking dysregulation of TCR repertoire to immunodeficiency. The model remains speculative, but we hope it will stimulate further discussion and experimental verification. We propose that HIV infection drives an unregulated expansion of specific CD8 T cells, responding either directly to chronic exposure to HIV antigens, or to gut microbial antigens which enter the circulation as a result of HIV induced damage to the gastrointenstinal barrier . The overexpansion of these clones distorts the CD8 T cell repertoire, and may specifically out-compete some of the CD8 T cells responsible for containing the growth of microorganisms to which we are commonly exposed or by which we are already chronically but asymptomatically infected. As a result, organisms which are normally harmless can no longer be adequately controlled and lead to overt infection and morbidity. The rapid fall in viral load following ART leads to a rapid decrease in the abundance of these over-expanded clones (as antigen load diminishes). This in turn leads to a normalisation of the CD8 effector repertoire, and hence reversal of functional immunodeficiency and control of facultative pathogens long before the slow regeneration of the CD4 compartment can take place.

Heterogeneity in the PCR

By Benny M Chain, on 13 October 2015

Unexpected heterogeneity in the PCR highlights the importance of molecular barcoding in immune repertoire studies. Benny Chain and Katharine Best Read the paper here
The polymerase chain reaction (PCR) is probably the most widely used technique in biology today. It is based on the process which defines all living organisms, namely the template driven replication of a nucleic acid chain by a polymerase enzyme. Since each piece of DNA gives rise to two daughter strands at each cycle, the amplification is exponential. This gives PCR extraordinary sensitivity (a single DNA molecule can readily be detected). However, it also means that the degree of amplification achieved is very sensitive to the efficiency of amplification, namely the proportion of strands which are correctly replicated at each step. This efficiency is well known to depend on a whole variety of experimental factors, including the sequences of both oligonucleotide primers and template DNA. Traditional approaches to quantitative PCR therefore relied on measuring the average amplification of an unknown number of molecules of a specific target sequence, in comparison to known numbers of a reference sequence. However, the advent of massively parallel sequence technologies now allows one to actually count the number of times each a particular sequence is present in a PCR amplified mixture. Indeed counting heterogeneous DNA mixtures by sequencing lies at the heart of RNA-seq transcriptional profiling, and especially at attempts to describe the adaptive immune repertoire by T cell or B cell receptor sequencing. We have therefore re-examined the heterogeneity of PCR amplification using a combination of experimental and computational analysis.

In our paper we label each template molecule of DNA with a unique sequence identifier (a molecular barcode), amplify by PCR and count the number of times each DNA molecule is replicated using Illumina parallel sequencing. Remarkably, identical molecules of DNA, amplified within the same reaction tube, can give rise to numbers of progeny which can differ by orders of magnitude. PCR is an example of a branching process (see adjacent figure ). The mathematical properties of such processes have been studied quite extensively, but turn out to be complex. Indeed equations describing the processes cannot be obtained in closed form except in certain very limited specific cases. Instead, we build a “PCR” simulator (the code for which is freely available) which carries out PCR reactions in silico, tracking the fate of each molecule though a series of cycles. Using the simulator, we demonstrate that the observed heterogeneity can only be explained if different molecules of identical DNA are replicated with different efficiencies, and these differences are inherited by the progeny at ach cycle. The physical explanation of this heterogeneity remains mysterious. But these results highlight that the results of single molecule counting following PCR must be treated with caution, and highlight the importance of molecular barcoding in achieving reliable quantitative descriptions of complex molecular mixtures such as those representing the vertebrate adaptive immune repertoire.
Benny Chain and Katharine Best

Are anti-self responses ubiquitous and necessary for a successful immune response ? 

By Benny M Chain, on 7 August 2015


Some of you might be interested in a new paper which we have just published in Frontiers in Immunology (link). The paper outlines a new model built on cellular cooperation to explain the establishment of peripheral T cell tolerance. Unusually, the paper is wholly theoretical. The paper is also somewhat technical, since the model is formulated mathematically as a linear programing optimization problem that can be implemented as a multiplicative update algorithm, which shows a rapid convergence to a stable state. We (i.e. the authors) would welcome feedback on the paper, as it represents the first step in a more ambitious plan to build a broader theoretical framework underpinning the human immune system.

The topic I want to discuss here, however, relates to some fundamental predictions of the model, with far reaching implications across many areas of immunology. In brief the model predicts that anti-self T cells remain present as part of the normal self-tolerant T cell repertoire. The recognition that cells with the potential to react against self are present in normal healthy people (i.e. thymic deletion is not complete)   is now widely accepted. A particularly striking example is the latest paper from Mark Davis’s lab. http://www.ncbi.nlm.nih.gov/pubmed/25992863 ). The model further predicts, however, that some of these anti-self T cells will become activated (to a greater or lesser extent), whenever an immune response to a foreign antigen (e.g. a pathogen) is initiated. I formulate the possible significance of these predictions as three clear and testable hypotheses :

  1. T cells carrying receptors which recognise and respond to self are activated whenever there is a response to a foreign antigen.
  1. The dynamics of anti-self and anti-foreign T cell activation and proliferation  will generally be different, such that tolerance to self is usually rapidly re-established.
  1. The self-responsive T cells play a key role in amplifying and strengthening normal responses to foreign antigen.

If correct, these hypotheses have wide-ranging implications for many aspects of immunity. I select just three settings in which anti-self responses may play a key role in immune protection or immune pathogenesis:

  1. Cancer. The extraordinary successes of anti-checkpoint inhibitors in many types of cancer suggest a paradox for conventional theories of specific antigen-driven immunity. There is increasing evidence that the efficacy of checkpoint inhibitor therapy is driven by responses to non-self “neoantigens” (for example see a recent paper from the Schumacher http://www.ncbi.nlm.nih.gov/pubmed/25531942) .  Yet specific responses against such neoantigens, in the context of a mutationally unstable tumour, might be expected to rapidly drive the emergence of escape mutations, and the outgrowth of resistant tumour clones. The profound and long lasting responses observed clinically may depend on a concomitant but transient activation of polyclonal “helper” antiself responses, bolstered by the therapeutic suppression of natural self-regulating networks.
  1. Tuberculosis. There is a growing appreciation that “successful” immunological control of Mycobacterium tuberculosis infection reflects a subtle balance between too strong and too weak an immune response. Indeed the “caseating granuloma” which epitomises tuberculosis is a classic example of damage to self resulting from responses to foreign antigens. The hypotheses outlined above suggest that the robust and chronic stimulation of anti-mycobacterial responses may drive a concerted antiself response which will overcome the normal  homeostatic regulatory mechanisms and result in autoimmunity and cell death.
  1. HIV. Despite decades of research, the exact mechanism which drives AIDS pathogenesis remain unclear. In particular, the rapid turnover of T cells far exceeds the predicted number of infected T cells, and suggests an important role for large scale “bystander” killing of uninfected T cells. As in the case of tuberculosis, the persistent strong immune response to HIV antigens may activate a sustained and pathogenic antiself response, which will contribute to the gradual erosion of the T cell compartment.

The hypotheses and these illustrative examples of how they may operate were stimulated by a purely theoretical paper, a relative rarity in the immunological literature. Theoretical physics has a well-established and respected role within the physics corpus. In contrast, theoretical biology (including theoretical immunology) remains very much a poor relation (the Journal of Theoretical Biology’s most recent impact factor was a modest 2.1). It is true that in the absence of experimental data, the hypotheses presented remain speculative. Nevertheless, I hope that putting these ideas in the public domain may stimulate experiments which will support, or perhaps falsify the hypotheses. It is time our concepts of self-nonself balance were revisited.

Single cells and their stochastic perambulations

By Benny M Chain, on 30 June 2015

I returned last week from a small but intense meeting entitled “Stochastic single-cell dynamics in immunology : experimental and theoretical approaches”. The meeting took place in the splendid 17th century Tripenhuis building in Amsterdam, the home of  the Royal Netherlands Academy of Arts and Science for over two centuries. The meeting was organised by Leïla Perié, Johannes Textor and  Rob de Boer (all from the University of Utrecht) and Ton Schumacher (NKI Amsterdam).

Perhaps unsurprisingly, the meeting was driven by technical and methodological advances, both experimental and computational. Dirk Busch (TUM, Munich) presented further results from his elegant  single cell adoptive transfer in vivo model system, which has already revealed the striking heterogeneity in both proliferative and differentiation fate which arise from different naïve T cells or the same specificity. A nice extension of the system was the introduction of dendritic cells carrying the diphtheria toxin receptor, which could be depleted at fixed time after immunisation, thus allowing the role of persistence of antigen to be dissected. Remarkably, perhaps, the mechanistic source of the stochastic variation remains incompletely understood, and both this group and other groups are looking to in vitro models to give a more precise handle on this fundamental question.

Another highlight of technological virtuosity was, as ever, Ido Amit’s (Weizmann Institute, Israel) presentation of his experimental system which allows global transcriptomic and chromatin modification analysis to be carried out on ever smaller numbers of cells. Single cell transcriptomics of blood has revealed an increasingly complex mixture of heterogenous “subpopulations”, although the challenge of conceptualising meaningful “shapes” in such vertiginously high dimensional space remains.

A very considerable effort from many groups focused on improving the length of time, and the precision, with which cells could be imaged in vitro. The prodigious ability of T cells to race around in three dimensions provide special challenges for imaging, which Philippe Bousso  (Institut Pasteur, France) tackled by forcing them to migrate up and down narrow microchannels (reminding me rather of Olympic swimmers in their allotted lanes), thus allowing very accurate quantitation of migration velocities in one dimension. Tim Schroeder (ETH, Switzerland) presented another very elegant experimental system to tackle the challenge arising from the very slow turnover kinetics of stem cells, which therefore requires very long tracking times (in the order of weeks) in order to follow cells through differentiation. Although perhaps relying less on technical virtuosity, I liked Gregoire Altan-Bonnet’s (Memorial Sloan-Kettering Cancer Centre, USA) approach of extracting information from the natural heterogeneity captured by most flow cytometry data sets, which is otherwise largely ignored by  the thousands of practitioners of flow cytometry. He illustrated how a combination of this data analysis with elegant and simple mathematical modelling can be used to extract functional regulatory networks from quite “low-tech” data sets.

Technology aside, what were the conceptual questions emerging from the meeting ? Questions of lineage loomed very large. Tracing cell family trees (questions of progenitors, descendants , branch points) has long exercised immunologists and haematologists, and new technologies have certainly promised to provide more accurate and detailed answers. In the context of the very real prospect of successful clinical stem cell therapies such questions are surely important, although the nagging concern over lineage stamp collecting remains for me a niggling concern. An idea raised by several speakers was the “stemness” properties of the T memory cell. For this writer, the idea that memory cell “self renew” seemed almost tautological.  And in this context, it seemed astonishing that teleomeres scarcely got a mention ! Another surprising emphasis was the re-emergence of the concept that T cells can affect neighbouring cells through cytokine secretion. This idea, which was in some way the fundamental concept underlying the prolonged era of cytokine discovery, has gradually been replaced by the concept of specific targeted secretion dominated by tight immunological junctions. But the wheel, it appears, is about to turn full circle. A particularly intriguing extension of this is the suggestion that cytotoxic T cell mediated killing is often mediated at a distance via interferon gamma secretion. Further experimental and quantifiable evidence on the importance of this process in vivo is urgently required.

I did not emerge from this meeting with the feeling of having heard any major specific breakthrough, but rather with the sensation that the meeting represented a gathering of pioneers  who are exploring the possibilities of a new type of immunology. The common theme among the participants was a willingness to engage at the frontier between computational and experimental science. The rules of this new subject, how to develop it, which approaches will be fruitful, and which will turn out to be relics of the physical sciences with little relevance to the world of cell biology : all these questions remain to be answered. But it is encouraging to find a group of scientists seriously engaged in this exciting challenge (discussion throughout the meeting was lively).  Many of the senior investigators told me they faced a continual battle to secure funding and to ensure high impact publication in the face of a scientific establishment which often pays lip service to interdisciplinary research, but doesn’t understand or appreciate it when it actually finds some. In this context, one can hardly avoid mention of the enormous influence of Rob de Boer and his ever growing school of talented and imaginative disciples on the field as a whole.

I finish with a plea to conference organisers. Can we introduce a rule that any talk must present one, or at absolute maximum, two key findings ?  Perhaps its my limited attention span, but learning one really important new thing from a talk is about as much as I can take in or remember. And anyway, how many of us have the privileges of discovering more than one really important thing at a time?

On the spread of viruses : the enemy within.

By Benny M Chain, on 13 April 2015

I hope all readers saw our recent paper in PLOS Computational Biology  where we explored the parallels between spread of HIV within the body , which uses a combination of cell-to-cell and cell-blood-cell transmission and the spread of computer worms, some of which use a combination of local neighbourhood and remote probing to achieve maximal efficiency and persistence. Here I would like to extend the discussion of these results to a more speculative level than is generally allowed in the confines of rigorous academic peer review. I apologise that this discussion is light on quantitative analysis, contrary to the spirit of the blog. But perhaps someone can propose some quantitative way to address the questions I raise below.


The first question I would like to pose is from the perspective of HIV. What advantage, from an evolutionary perspective, is provided by dual spreading modes for the virus ? There is clearly a substantial cost to the virus involved in producing high levels of secreted virions, the vast majority of which die within 20-60 minutes without infecting another cell.

One hypothesis might be to suggest that it allows more rapid circulation of the virus and hence access to a larger reservoir of susceptible cells. Migrating  cells routinely encounter hundreds, or even thousands of other T cells per hour [1,2] ) . Infected cells, however, may migrate much more slowly [3]. Furthermore activated proliferating T cells, which provide by far the most effective targets of infection, may also migrate much more slowly, and remain attached to their cognate antigen presenting cell for much of their susceptible period [3].  Nevertheless, the extent of T cell recirculation makes this hypothesis less attractive.

An alternative hypothesis to explain the evolutionary advantage of hybrid spreading might be that cell free virus is transmitted from host to host much more efficiently than cell-associated virus, since the half-life of transferred cells in an allogeneic host is likely to be in the order of minutes, and may be too short to allow the establishment of the cellular synapse which is required for cell-cell transmission.  An instructive comparison is to compare HIV-1 with HTLV-1, another human retrovirus. HTLV-1 has been shown to spread almost exclusively by cell-cell transmission [4], and virus in blood is generally at undetectable levels. Although the prevalence of HTLV-1 in some endemic areas is very high, the virus has an ancient history in humans, and is accompanied by a relatively indolent pathology. The virus has therefor had plenty of time to establish itself in its host population. Certainly, there is no evidence for the remarkably rapid pandemic which has characterised the spread of HIV within the last 50 years. Comparison to other viruses would surely be instructive. I am not aware of what other viruses might use two parallel modes of spread, and would love to hear of any examples.

The second question is from the point of view of computer virus security. A key component of our analysis, which is explored in more detail in two other companion papers (http://arxiv.org/abs/1409.7291; http://arxiv.org/abs/1406.6046) is that the spread of computer malaware within a local network environment, is much more efficient and rapid than spread between one local network and another. We did not investigate the reasons for this behaviour in detail, but plausibly the major block to spread of the virus are institutional firewalls and once these have been penetrated the ability to infect another virus within the same system is much greater. A corollary of this phenomenon is that the ability to discriminate and suppress viruses or worms within an organisation may be as important, or even more important, than preventing the original security breach in the first place. Strategies for identifying and decommissioning malaware spread within a local network (“neighbourhood watch” approaches ) poses formidable challenges. But such strategies do start with the advantage that the set of potential sources and targets of infection are limited in number and known in advance. And useful solutions could be of major benefit to preventing the catastrophic results of cybercrime even if the primary infection event itself cannot always be prevented.

  1. Munoz MA, Biro M, Weninger W. T cell migration in intact lymph nodes in vivo. Curr Opin Cell Biol. 2014;30: 17–24. doi:10.1016/j.ceb.2014.05.002
  2. Gadhamsetty S, Marée AFM, Beltman JB, de Boer RJ. A general functional response of cytotoxic T lymphocyte-mediated killing of target cells. Biophys J. 2014;106: 1780–91. doi:10.1016/j.bpj.2014.01.048
  3. Murooka TT, Deruaz M, Marangoni F, Vrbanac VD, Seung E, von Andrian UH, et al. HIV-infected T cells are migratory vehicles for viral dissemination. Nature. Nature Publishing Group; 2012;490: 283–7. doi:10.1038/nature11398
  4. Pique C, Jones KS. Pathways of cell-cell transmission of HTLV-1. Front Microbiol. 2012;3: 378. doi:10.3389/fmicb.2012.00378


How do we cope with so many potential antigen-specific receptors ?

By Benny M Chain, on 26 October 2014

Welcome back to Immunology with Numbers and apologies for a long break. Struggling to interpret high throughput T cell receptor sequence data from antigen stimulated repertoires prompted the following thoughts. As always, I would love to receive comments and especially criticism. Also, do take a moment to have a look at our new Chain/Noursadeghi web site.

The vertebrate immune system uses a combination of somatic gene rearrangements and non-template junctional editing to produce a very large number of different receptors which bind a wide variety of antigens. Recent T cell receptor and B cell receptor high throughput sequencing have allowed some preliminary estimates of the size of the potential diversity . The most thoroughly analysed datasets are those from the T cell receptor β chain. The consensus appears to be that the number of possible different human β chain sequences is in the order of 1014 or above. The α chain diversity has been studied much less. Nevertheless our own sequencing, as well as other published studies suggest that the diversity in the human α chain is in the same order of magnitude. We don’t know the rules which govern the interaction of α and β  chains, or what restrictions on such pairings exist. But, even if we assume that such rulings impose a significant restriction on the total number of possible receptors, the number of possible receptors is likely to exceed 1020. And the overall germ line diversity of antibody is likely to be of the same order of magnitude.

It is immediately apparent that these large numbers pose some interesting conceptual problems in terms of specificity. If, as an extreme example, only one single receptor (defined by a unique germline sequence for both chains) recognizes a single antigen epitope, then since an individual has only around 1012 lymphocytes, almost all individuals would be missing the relevant receptor from their repertoire and would never recognise the antigen in question. Even if an individual did have such a receptor, it would at best occur only once during recombination. The frequency of individual T cell clones can be extremely low, possibly as little as 109. Despite the relatively high motility of T cells, dendritic cells sample T cells at rates of less than 1000 cells per hour. Each dendritic cell presenting the specified antigen would therefore have to wait many hours before encountering even one relevant T cell. What might be a solution to this paradox arising from the sheer number of possible receptors ? One possibility has been discussed in some depth by Alexandra Walczak and Thierry Mora. As they and others have shown, recombination is a random but non-uniform process. Some receptors are produced at much higher frequencies than others, and indeed are produced so frequently that they are apparently expressed in all individuals (so called public clones). In their most recent publication they suggest that recombination machinery, as well as the germline sequences themselves, have somehow evolved to produce receptors which are naturally selected and perhaps therefore more useful than others. This idea could be extended to suggest that the commoner receptors are those which have evolved to recognise common pathogen components : a sort of adaptive pattern recognition receptor.Putting aside the difficulty of envisaging a mechanism by which an apparently stochastic process which generates non-germline encoded diversity has “evolved” to favour certain sequences over others, this answer can at best account for only a small proportion of the diverse array of antigens against which the human immune system responds.

I would like to suggest that the diversity paradox can be resolved in only two inter-related but distinct ways. One approach is to increase receptor cross-reactivity at the expense of loss of specificity. If, for example, a single antigen epitope is recognised by 1010 different receptors, which are each generated with a wide range of probabilities, then even from a pool of 1020 possible receptors, each individual will have a high probability of having at least a few receptors which recognise any antigen, and the frequency of available receptors within an individual will in general be high enough that a new immune response can be generated within a few hours. The cross reactivity of the T cell receptor has been debated for some considerable time. A recent elegant paper from Christopher Garcia and Mark Davis’s laboratory demonstrate clearly that a single T cell receptor can indeed recognise millions, or even thousands of millions of different peptide targets, although interestingly the targets tend to share conserved “hot-spots” which represent conserved areas of TCR/peptide interaction. Of course, the correlate of high cross-reaction is low affinity, a well known property of T cell receptors, which is offset by the extreme sensitivity of the signalling mechanism, and the avidity factor which arises from displaying many identical receptors on the cell surface. The most common TCRs generated by the recombination machinery are likely to be of lower affinity than rarer higher affinity TCRs, but they may provide some cover while the higher affinity clones have time to find antigen and expand. Nevertheless, the overall characteristic of this highly diverse TCR repertoire will be low affinity and high cross reactivity.

The second approach is provided by somatic hypermutation, the characteristic feature of B cells and antibody. In this case, once again low precursor frequencies can be mitigated by having early responses mediated by low affinity, cross-reactive clones. This ensures that sufficient numbers of precursor cells exist for any epitope and can be recruited into an ongoing immune response within hours of exposure to an antigen. However, once these clones have been selected and expanded, somatic hypermutation provides a way that the BCR can be gradually fashioned to provide higher and higher affinity by repeated rounds of selection. Since selection only has to operate on the initial limited pool of low-affinity precursors recruited into the immune response, the initial enormous diversity of the BCR will not preclude the gradual emergence of antibodies with extremely high specificity, and very low cross-reactivity.

In conclusion, the extraordinary hypervariability built into the immune system’s receptor generating system imposes severe constraints on the ability of an individual to recognise anything at all. The evolutionary solution is to combine two sets of lymphocytes, T cells which respond at low affinity with extensive cross-reactivity and B cells which evolve gradually to produce antibody of high affinity and low cross-reactivity.

Novel insights from computational approaches to infection and immunity

By Benny M Chain, on 4 July 2014

The UCL  Infection and Immunity Computational Hub was officially launched with a one day meeting entitled “Novel Insights from Computational Approaches to Infection and Immunity”, which took place at the Royal Free Hospital, London.  The meeting certainly highlighted the diversity of computational approaches which can be used to better understand host/pathogen interactions. But themes encapsulated the day’s presentations for me. The first was the use of genomics, and specifically high throughput sequencing, to map both geographical and  evolutionary  change (whether of viruses, bacteria, T cells or humans !) over space and time. A second theme was that simplified but intelligently designed mechanistic models (whether peptide/MHC binding, T cell homeostasis, or cytotoxic T cell killing) could provide real insight into understanding extremely complex biological systems.

The meeting was opened by Hans Stauss, the Director of the new research Institute of Immunity and Transplantation , who emphasized the central role for  computational and bioinformatics analysis in maximizing the potential of the new Institute to deliver advances in biomedical research which would have a real impact on patient care. And, on cue, personalized medicine was the theme for the first speaker , Peter Coveney (Chemistry, UCL). He emphasized that large multiscale mechanistic models of biological processes would be key to using patient-specific  data to inform clinical care, and hence deliver truly  personalized medicine. He briefly outlined two examples, the processing of HIV polyproteins, and subsequent loading of individual peptides onto MHC molecules. The molecular dynamic simulations required for the latter were computationally intensive, and needed to be replicated many times in order to have confidence in the outcome. Access to high-performance parallel grid computing, potentially using tens of thousands of cores, was an absolute requirement for such approaches. In a similar vein, Robin Callard (Institute of Child Health) integrated data-driven statistical regression, with mechanistic ODE models of T cell homeostasis to probe the recovery of the CD4 cell compartment in HIV children following antiretroviral therapy. Strikingly, delay in treatment led to a more rapid rate of recovery, but a long term deficit in CD4 T cell counts. The models clearly predicted that earlier intervention led to better long term reconstitution, putting into question the  current clinical practice of delaying anti-retroviral treatment.

Richard Goldstein (Division of Infection and Immunity, UCL and my co-organiser of the meeting) addressed the question of whether HIV viral sequencing can be used to identify transmission events and hence map the parameters of the HIV epidemic at global or individual scale. Using a large data set linking viral sequence to clinical information, he developed evolutionary Bayesian models which could be used to infer infectivity values (man-to-man, man-to-woman etc.) , and predict the most probable infection pathways within a specific outbreak. Richard argued the large data set made the inferred model parameters robust at a population levels, although stochasticity and uncertainty limited the predictive accuracy in individual cases. The results suggested that over reliance on patient derived information introduced significant systematic bias into estimates of infectivity rates in different patient groups. Vincent Plagnol explored the topical area of expression quantitative trait loci (eQTL), describing improved more statistically rigorous methods for linking known single base pair polymorphisms (SNPs) to heterogeneous levels of specific gene transcription. The strategy offers a powerful new approach for identifying functional gene modules, thus gaining insight into the mechanisms regulating the host pathogen interaction.

After lunch, I spoke about computational challenges in analysis of T cell receptor sequence sets obtained by  high throughput sequencing. The enormous diversity (up to 10^14) of possible alpha and beta chains means that even genetically identical individuals will end up with a very distinct set of receptors. The challenge is to recognize antigen-dependent changes in repertoire against this enormous amount of “molecular noise”. I discussed the analogous problem of automatically identifying objects in images or  sentences in texts, and outlined an approach which deconstructs TCR CDR3 sequences into a series of overlapping  amino acid triplets, and then counts the frequency of clusters of similar triplets in immunized or unimmunized mice. The approach, borrowed from the “bag-of-words”  machine learning algorithm showed promise in distinguishing repertoires of T cells isolated from mice at different times post-immunization.

I was followed by Nick Thomson (Sanger Institute and London school of Tropical Medicine and Hygiene). He discussed the application of bacterial  genomics to map both the global distribution and local transmission routes of Shigella and Cholera. The enormous current political and media interest in the spread of antibiotic-resistance in human bacterial pathogens made this a particularly timely topic. Francois Balloux (Institute of Human Genetics) returned to evolutionary questions, discussing the challenge of reconstructing either human migration patterns or the spread of microbial pathogens from genomic sequence data. Despite showing some beautiful dynamic images modeling the global spread of the human population through the ages, François ended with a note of caution, emphasizing that temporal as well as spatial data series were required to make robust inferences of migratory or transmission patterns. A lively discussion between Richard and Francois ensued and was eventually adjourned for continuation at a later date over a drink ! The final UCL talk was given by Alexei Zaikin (Mathematics and Women’s Health, UCL) who asked whether intracellular regulatory circuits could give rise to intelligent behavior. He outlined several examples of such molecular intelligences, including a fascinating genetic implementation of Pavlovian conditioning. He then went on to show how, counter intuitively, the introduction of “noise” (stochastic variation in the signal) could under certain circumstances improve the reliability of the decision making process. For those of us who don’t like noise, its worth mentioning that in all cases too much noise utterly destroyed intelligent behavior !

The final key note talk was given by Rob de Boer, Director of the Institute for Biodynamics and Biocomplexity, Utrecht University, The Netherlands. His elegant talk addressed the surprising observation that, as HIV infection progresses the rate of mutational escape decreases, and the apparent benefit in terms of increased viral replication diminishes. This data had led some to the provocative suggestion that cytotoxic T cell immune control of HIV becomes unimportant as infection progresses. Using a simplified but beautiful model of a multi-epitope immune response, Rob showed that the observed changes arose naturally from the fact that the more epitopes are involved in a response, the less important is each epitope individually. Far from implying that CTL responses were unimportant, the breadth of the response as well as the magnitude turn out to be key to long term control of this virus.

The meeting was closed by Judy Breuer, head of the Department of Infection, UCL, who thanked all the speakers and reiterated the commitment of the Division of Infection and Immunity to strengthening the research in the computational arena. It was a long but fascinating day, and reinforced the extraordinary breadth of high quality computational biology already going on at UCL and the LSTHM. The signs are this area is going to be one of continued growth for some time to come !