Dr James Bray
Bioinformatics Postdoctoral Research Scientist
Maiden Lab
I am a bioinformatics scientist working in the bacterial genomics field.
Bacterial species identification using ribosomal multilocus sequence typing (rMLST)
I curate the ribosomal multilocus sequence typing (rMLST) scheme and associated databases. The goal of rMLST is the accurate taxonomic classification of all bacterial isolates using the DNA sequences found within the ribosomal protein-encoding genes (pubmlst.org/species-id).
My research involves applying rMLST in three ways:
Large-scale bacterial genomics projects
I am responsible for assembling high-throughput DNA sequencing data for the large-scale genomic projects with the Maiden Group (with a focus on Campylobacter, Neisseria and Streptococcus). As a result, I manage the exchange of large amounts of data between sequencing centres, different labs and the ENA Sequence Read Archive (SRA).
Assembled genomes are accessible on the PubMLST website (pubmlst.org) for bacterial population studies, pathogen surveillance and gene-by-gene analysis.
My research interests include:
In double-membraned bacteria, non-equilibrium processes that occur at the outer membrane are typically coupled to the chemiosmotically energized inner membrane. TolA and TonB are homologous proteins which energetically couple inner membrane motor proteins to the essential processes of outer membrane stabilization and substrate import, respectively. The evolutionary trajectories of these proteins have been difficult to elucidate due to low sequence conservation, yet they are thought to transduce force similarly. Here, this problem was addressed using structural prediction approaches to identify and annotate force transduction operons to trace their distribution and evolutionary origins. In the process, we identify a novel outer membrane-tethering system and a previously unknown family of monomeric force transducers. This approach revealed putative tolA genes, and thus the core organizational principles of the tol-pal operon throughout diverse bacterial taxa. We discovered that the α-helical structure of the periplasm-spanning domain II of TolA, previously thought to be its hallmark, is anomalous amongst most Tol-Pal systems. This structure is mainly prevalent in γ-proteobacteria, likely in adaptation to their lifestyle. Comparison of Tol-Pal and Ton system distribution suggests that TolA emerged from a TonB paralogue and co-emerged with Pal, the outer membrane-tethering lipoprotein that functionalizes the Tol-Pal system. We also determined that TolB, the Pal-mobilizing protein, likely emerged from a family of outer membrane proteins; and CpoB, a periplasmic factor that coordinates peptidoglycan remodeling with cell division, was originally a lipoprotein present in the ancestral Tol-Pal system. The extensive conservation of the Tol-Pal system throughout Gracilicutes highlights its significance in bacterial cell biology.
Tol-Pal system, Ton system, bacteria, bacterial cell envelope, force transduction, molecular motors
Motivation: Target enrichment strategies generate genomic data from multiple pathogens in a single process, greatly improving sensitivity over metagenomic sequencing and enabling cost-effective, high-throughput surveillance and clinical applications. However, uptake by research and clinical laboratories is constrained by an absence of computational tools that are specifically designed for the analysis of multi-pathogen enrichment sequence data. Here we present an analysis pipeline, Castanet, for use with multi-pathogen enrichment sequencing data. Castanet is designed to work with short-read data produced by existing targeted enrichment strategies, but can be readily deployed on any BAM file generated by another methodology. Also included are an optional graphical interface and installer script. Results: In addition to genome reconstruction, Castanet reports method-specific metrics that enable quantification of capture efficiency, estimation of pathogen load, differentiation of low-level positives from contamination, and assessment of sequencing quality. Castanet can be used as a traditional end-to-end pipeline for consensus generation, but its strength lies in the ability to process a flexible, pre-defined set of pathogens of interest directly from multi-pathogen enrichment experiments. In our tests, Castanet consensus sequences were accurate reconstructions of reference sequences, including in instances where multiple strains of the same pathogen were present. Castanet performs effectively on standard computers and can process the entire output of a 96-sample enrichment sequencing run (50M reads) using a single batch process command, in $
Investigating the genomic epidemiology of major bacterial pathogens is integral to understanding transmission, evolution, colonization, disease, antimicrobial resistance and vaccine impact. Furthermore, the recent accumulation of large numbers of whole genome sequences for many bacterial species enhances the development of robust genome-wide typing schemes to define the overall bacterial population structure and lineages within it. Using previously published data, we developed the Pneumococcal Genome Library (PGL), a curated dataset of 30 976 genomes and contextual data for carriage and disease pneumococci recovered between 1916 and 2018 in 82 countries. We leveraged the size and diversity of the PGL to develop a core genome multilocus sequence typing (cgMLST) scheme comprising 1222 loci. Finally, using multilevel single-linkage clustering, we stratified pneumococci into hierarchical clusters based on allelic similarity thresholds and defined these with a taxonomic life identification number (LIN) barcoding system. The PGL, cgMLST scheme and LIN barcodes represent a high-quality genomic resource and fine-scale clustering approaches for the analysis of pneumococcal populations, which support the genomic epidemiology and surveillance of this leading global pathogen.
population structure, cgMLST, genotyping, genome library
Respiratory syncytial virus (RSV) is the leading cause of hospitalisation for respiratory infection in young children. RSV disease severity is known to be age-dependent and highest in young infants, but other correlates of severity, particularly the presence of additional respiratory pathogens, are less well understood. In this study, nasopharyngeal swabs were collected from two cohorts of RSV-positive infants 100 pathogens, including all common respiratory viruses and bacteria, from samples collected from 433 infants, that burden of additional viruses is common (111/433, 26%) but only modestly correlates with RSV disease severity. In contrast, there is strong evidence in both cohorts and across age groups that presence of Haemophilus bacteria (194/433, 45%) is associated with higher severity, including much higher rates of hospitalisation (odds ratio 4.25, 95% CI 2.03–9.31). There is no evidence for association between higher severity and other detected bacteria, and no difference in severity between RSV genotypes. Our findings reveal the genomic diversity of additional pathogens during RSV infection in infants, and provide an evidence base for future causal investigations of the impact of co-infection on RSV disease severity.
immunization, genetics, biotechnology, human genome, generic health relevance, infectious diseases, lung, infection, vaccine related
E: james.bray@biology.ox.ac.uk
T: 01865 281067
Maiden Lab profile