Like natural selection, the journey of discovery of the gut microbiome started on a boat. Charles Darwin’s On the Origin of Species began aboard the HMS Beagle. Craig Venter set sail on his voyage to the edges of the mapped scientific world on Sorcerer II more than 170 years later.
Venter, an American biotechnologist, was the pioneer explorer of the human genome. He is well known for his part in mapping our DNA, the instruction manual of the human body. But when Bill Clinton’s administration effectively nationalised the human genome project and put the data in the hands of public bodies, rather than leaving it to West Coast businesses for commercial exploitation, Venter took to the high seas on his yacht. As you would.
His voyage was the beginning of a much bigger odyssey, perhaps the most consequential for our understanding of life on earth since Darwin set forth from Devonport on the Beagle in December 1831.
The Sorcerer II’s mission was to sample the microorganisms of the ocean. The more of those organisms it mapped, the more of the minuscule world of bacteria came into focus, not just in the depths of the ocean but deep within ourselves. The staging posts in our understanding of the gut microbiome began with the mapping of the human genome and then the sampling of oceanic microbiota.
But, first, what is the gut microbiome? Potentially, it’s the backstage pass to all the organs in the human body. It’s the mechanism by which we can revolutionise medicine and enable healthier, happier lives.
But the honest answer to the question of what is the gut microbiome is this: we don’t know. Yet. We are discovering a universe within. One hundred million universes, in fact.
Bacteria are the most successful forms of reproductive life on Earth. To give you some idea of scale, there are 100 million times as many bacteria in the oceans (13 × 10²⁸) as there are stars in the known universe. And remember that 70 per cent of all bacteria actually live at least three miles underground (that is about 23 billion tonnes of bacteria).
They also exist in highly diverse and complex colonies throughout our bodies. This symbiosis looks even more intricate, and more important for our day-to-day existence, when we view it from the perspective of the human genome, which has to interact with the bacteria, yeasts, viruses and parasites that naturally coexist within the many nooks and crannies of your body.
The largest of these exists in your gut and is known as the gut microbiome. The genes stored within the gut microbiome and the functions they encode are critically important for your organ development, growth and health.
But, if the microbiome is really that important to us, how come we are only just hearing about it?
A good place to start this story is with the human genome, which was finally mapped by the Human Genome Project (HGP) in 2003. This staggering achievement was supposed to be a complete understanding of human biology; from this blueprint, the plan was to conquer modern medicine by recoding our DNA to cure all known diseases. It was a human-centric version of health that has not in itself had the transformative impact imagined at its inception. Your doctor doesn’t routinely screen your genes and give you a diagnosis or even a treatment that alters your genes. Not yet, anyway.
Human genomic data only tells you about the potential for disease; it doesn’t really tell you what these genes are doing at any point in time, nor what the biological consequences of these genes being switched on or off actually are. More importantly, the HGP didn’t tell us anything about gene-environment interactions, nor about the trillions of other genomes that ours has to interact with to stay healthy.
The HGP was a global initiative, launched by the National Institutes of Health at a cost of $3 billion. There is gold in the human genome, so the mapping experiment actually took place as part of a biological “space race” fought between academia and a commercial enterprise led by the colourful biotech engineer Venter and his company Celera Genomics. The result was a rapid escalation in the development and application of genomic sequencing and computing technologies for mining the genome. Celera was banking on patenting genes for drug and biomarker discovery, right up until it was announced in 2000 that all genomic data had to be in the public domain.
So what do you do when you have completed one of the greatest achievements in modern medicine and your share price takes a dive? Venter headed off on Sorcerer II – and he took sampling pots for storing oceanic water.
In the early 2000s, microbiome prospectors began to gather in California. A new technology born out of the HGP was starting to make noise: an ingenious way of studying not just the genes of one organism but the genes of many organisms simultaneously from within any environmental niche. It was first developed by Ed DeLong and Steve Giovannoni, but Jo Handelsman and Bob Goodman coined the neologism “Metagenomics” in 1998. This revolution was enabled by parallel advances in computing power and Moore’s Law. Scientists needed supercomputers to process and store this information. Even with a supercomputer, it took four to six hours of computation just to assemble one metagenome in 2008. But scientists also had to develop the mathematical tools to make sense of these new mountains of data. This gave birth to a completely new field of bioinformatics, and mathematicians became the rockstars of microbiome science. They are pretty wild.
One of the first environmental microbiomes to be reported was the oceanic ecosystem sampled from the back of Venter’s yacht. At the time, this represented the largest Metagenomic dataset ever put into the public domain, with more than 7.7 million sequences or 6.3 billion base pairs of DNA. It quickly became apparent that this approach had applications in human healthcare and ultimately to the analysis of human microbial ecosystems.
Until that point, our knowledge of the gut microbiome was largely based on the science of culture and chemistry. To identify and study a bacterium, you had to use a carefully constructed menu of molecular soups and environmental conditions that allowed it to grow. You were basically trying to recreate the specific conditions of the bacterium’s natural habitat. At the start of the Metagenomics revolution we were only able to culture about 30 per cent of the organisms within the gut. More modern culture techniques can now successfully grow about 90 to 95 per cent.
In the late noughties, two large microbiome projects were funded: the US-based Human Microbiome Project and the European MetaHIT consortium. A second race began to try to apply this technology to the study of the human microbial niches, with the aim of determining “who was there” and working out whether humans host a core microbiome of bugs that could be targeted to improve our health.
They both had a dramatic impact on our depth of knowledge about the microbiome. But, much like the Human Genome Project, when one layer of complexity was pulled back, another was discovered underneath.
Two fundamental techniques were used to study the structure and function of the gut microbiome. The first is true Metagenomics; this is an attempt to try and sequence all of the DNA genes found in all of the genomes of all of the life forms found in a specific niche, such as your gut. Confusingly, some viruses have genomes made from RNA.
Bacterial genomes are typically (but not exclusively) circular and contain somewhere between 130,000 and 14,000,000 letters of DNA code. This is sort of like computer memory, where 1,000 letters (or “base pairs”, as they are known) is referred to as a “kilobase pair”, and a “megabase pair” equals one million. To sequence these genomes, scientists will typically employ a technique called shotgun sequencing, by which large pieces of DNA are broken down into smaller, more manageable chunks for random sequencing and amplification, before being reconstructed, using a computer, back into a whole genome map. This reconstruction is the hardest part: think of it like doing a jigsaw in which you have millions of pieces but no reference picture.
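To make the jigsaw idea concrete, here is a minimal sketch in Python of how overlapping fragments can be stitched back together. It is a toy "greedy overlap" approach, not the method any real assembler or either consortium used; the function names, the 17-letter "genome" and the four reads are all invented for illustration.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that matches a prefix of `b`."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of fragments with the longest overlap.

    Toy illustration only: real assemblers must cope with sequencing
    errors, repeats and millions of reads, none of which is handled here.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        # Discard fragments already contained inside another fragment.
        reads = [r for r in reads if not any(r != o and r in o for o in reads)]
        if len(reads) <= 1:
            break
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing overlaps any more; leave separate contigs
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Four made-up, shuffled fragments of the made-up genome ATGGCATTCGAGCTAAC.
toy_reads = ["GAGCTAAC", "ATGGCATT", "GGCATTCG", "CATTCGAG"]
contigs = greedy_assemble(toy_reads)
print(contigs)
```

With only four clean pieces the puzzle is easy. The difficulty described above comes from scale: millions of pieces, many of them near-identical, from thousands of different genomes mixed together.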
The sheer size of the gut microbiome means this is difficult to do. It turns out that these bug genes are so numerous that they dwarf your puny human genome. To give you a sense of scale, genetically speaking you are made from 22,300 protein-coding genes. In 2010 the MetaHIT group published the first gut microbial gene catalogue. They characterised 3.3 million microbial protein-coding genes. Since then, a number of other reference microbiome metagenomes have been published and the most recent estimation puts the total number of bacterial genes somewhere nearer 30 to 50 million, and it is getting bigger all the time.
So, if you hadn’t realised it by now, from a genetic standpoint at least, you are less than one per cent human.
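The arithmetic behind that claim can be checked in a few lines. This is a back-of-the-envelope sketch using only the gene counts quoted above; the function name is invented, and it assumes "human share" means human genes divided by human plus microbial genes.

```python
HUMAN_GENES = 22_300                # protein-coding genes quoted above
MICROBIAL_GENES_2010 = 3_300_000    # MetaHIT catalogue, 2010
MICROBIAL_GENES_RECENT = 30_000_000 # lower end of the more recent estimate


def human_share(human_genes, microbial_genes):
    """Fraction of the combined gene pool that is human (assumed definition)."""
    return human_genes / (human_genes + microbial_genes)


share_2010 = human_share(HUMAN_GENES, MICROBIAL_GENES_2010)
share_recent = human_share(HUMAN_GENES, MICROBIAL_GENES_RECENT)

print(f"2010 catalogue:  {share_2010:.2%} human")
print(f"recent estimate: {share_recent:.3%} human")
```

On the 2010 figures the human share comes out at roughly two thirds of one per cent, and it shrinks further as the microbial catalogue grows.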
The main benefit of Metagenomics is that it allows us to measure all the genes in a genome and get a very precise measure of “who” is there. It also gives us some idea of what potential functions these bacteria actually perform. The downside is that it is expensive to perform and complicated to analyse. A single teaspoon of stool contains enough bacterial DNA that its genetic data would need to be stored on 100,000 1TB USB drives. Typically this is not the sort of analysis you can run on a desktop computer and you will need a kind bioinformatician to help you make sense of the information.
A second, more focused, approach just sequences short gene sequences that are highly specific to different types of bugs. For DNA from any living organism to be useful it must first unzip itself and generate lines of molecular messaging code called RNA, which are transported to cellular assembly plants known as ribosomes. Here, RNA is magically turned into proteins as one amino acid is bolted onto another. Ribosomes are amazing molecular machines made up of different sections. One of these, the 16S subunit, is very specific to different types of bacteria. If you have a database of these short 16S ribosomal RNA sequences, you can then work out what is in the sample of bugs you are analysing without having to culture them or look at the whole genome. It’s a bit like a barcode for poo.
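The barcode analogy can be sketched as a simple lookup: compare each short sequence from a sample against a reference table and report the closest match. The Python below is an illustration only. The two "reference" sequences are invented strings, not real 16S genes, and the scoring rule (shared four-letter words) and the 0.5 cut-off are assumptions chosen for simplicity rather than what real pipelines do.

```python
# Hypothetical reference "barcodes": made-up strings, not real 16S sequences.
TOY_REFERENCE = {
    "Bacteroides (toy)": "ACGTTGCAAGTCCGATAGGC",
    "Escherichia (toy)": "TTGACCGGATACCTGAGTCA",
}


def kmers(seq, k=4):
    """Set of all overlapping k-letter words in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def classify(read, reference, k=4, min_score=0.5):
    """Return (best matching name, similarity score) for one read.

    Similarity is the share of k-letter words the read and the reference
    have in common (Jaccard index). Reads scoring below `min_score`
    against every reference are reported as "unclassified".
    """
    read_kmers = kmers(read, k)
    best_name, best_score = "unclassified", 0.0
    for name, ref_seq in reference.items():
        ref_kmers = kmers(ref_seq, k)
        union = read_kmers | ref_kmers
        score = len(read_kmers & ref_kmers) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_score:
        return "unclassified", best_score
    return best_name, best_score


print(classify("ACGTTGCAAGTCCGATAGGC", TOY_REFERENCE))  # exact copy of a barcode
print(classify("GGGGGGGGGGGG", TOY_REFERENCE))          # matches nothing in the table
```

The second call hints at a limit of the approach: a barcode that is missing from the database simply comes back as unknown.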
The relative simplicity of this method meant that, suddenly, microbiome science was accessible. Because it is not necessary to sequence every gene in a bacterial genome, it was also cheap. This was therefore the technology that launched tens of thousands of microbiome clinical association studies and which gave birth to commercial companies that now offer to sequence your microbiome. The problem is that 16S rRNA sequencing doesn’t tell you what is there with amazing accuracy (you can’t tell the difference between subspecies of bacteria) and it doesn’t tell you what these bacteria are doing.
Despite the challenges, both these techniques have helped us build a completely new map of life. So what have we learned from the Human Microbiome Project and the MetaHIT programme? Is there a core microbiome? It depends on how you define it. In the gut, bacteria and archaea (single-celled organisms without nuclei that look like bacteria, but which are actually from a completely different branch of the tree of life) account for more than 99 per cent of the unique characterised genes and biomass. We can draw phylogenetic trees for all of the microbiomes in our body. In the gut, four bacterial phyla (the broadest possible description of bacterial families) predominate, namely Bacteroidetes, Actinobacteria, Firmicutes and Proteobacteria. Their abundances in faeces are relatively stable over time in adults but they are affected by events such as travel, sickness, and antibiotic usage.
Twin studies also show us that the gut microbiome is shared among family members, but even in identical twins there are marked variations in some bacterial lineages. Certainly there is a core microbiome at the gene level, but not at the strain level.
Bacteria do mutate, but unlike humans they do not need to mutate to evolve. They can ingeniously share genes through a process known as “horizontal gene transfer”. This is very useful if you are a bacterium suddenly exposed to a large and frightening environmental catastrophe, like an antibiotic, and you want to share a survival gene. But this may also have a benefit for us. For example, Japanese sushi-eaters can only metabolise a certain type of seaweed because of a bacterium called Bacteroides plebeius. This bug contains an enzyme that breaks down a complex sugar found in red seaweed called agarose. The bacterium is only found in individuals with Japanese ancestry and it can only perform this task because marine bacteria have transferred to it the genes that code for this enzyme.
The most important finding from the last decade of research is that inter-individual (ie the difference between you and me) variation in the gut microbiome is massive, and typically overwhelms the differences driven by your genome or even a disease. The MetaHIT project found 1,150 species of bugs in the faeces of 124 people, but only ten per cent of these species were consistently shared between those individuals. In 2019, an analysis suggested that there are actually 2,058 species, so we probably have even less in common than ten per cent. To put that in perspective, you share about 99.9 per cent of your genome with me. In other words, your faeces have more personality than your genome.
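One way to picture a "shared" fraction like this is as a set calculation: list the species seen in each person, then ask how many appear in everyone. The sketch below uses three imaginary people and placeholder species labels, and it assumes "shared" means present in every sample, which may be stricter than the definition used in the studies themselves.

```python
def core_fraction(samples):
    """Share of all observed species that turn up in every single sample."""
    all_species = set().union(*samples)  # everything seen in anyone
    core = set.intersection(*samples)    # only what is seen in everyone
    return len(core) / len(all_species)


# Placeholder labels for three imaginary people, not real survey data.
toy_samples = [
    {"species_a", "species_b", "species_c"},
    {"species_a", "species_c", "species_d"},
    {"species_a", "species_e"},
]

print(f"{core_fraction(toy_samples):.0%} of species are shared by everyone")
```

Each extra person added to the comparison can only shrink the shared set or leave it unchanged, which is one reason a universal core is so hard to pin down at the species level.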
The gut microbiome does not just contain bacterial DNA. The virome (the viruses that inhabit the gut) is even larger than the gut bacteriome and is dominated by bacteriophages. A virus is a microscopic parasite, generally much smaller than a bacterium. It teeters on the definition of what we could call life, because although it contains DNA or RNA, it cannot do anything with that information. In other words, it cannot reproduce without the host it is inhabiting. Viruses get a bad rap because they generally cause outbreaks of infection, like flu (H5N1) or the norovirus which attacks the elderly on cruise liners. Phage are viruses that dominate the gut and they are like something from your worst nightmare; they are dead-eyed bacterial predators that resemble alien spacecraft. They land on the surface of a bacterium and inject their viral DNA through the bacterial cell wall, which then gets spliced into the bacterial genome. When the viral genes are switched on, the bacterium will keep creating phage until it bursts like the victim of a Tarantino assassination.
As a result, viruses shape microbial communities and they are important mediators of horizontal gene transfer. Phage represent one of the biggest gaps in our understanding of the human microbiome. All of bacterial life on earth is recycled every 48 hours by phage, which regulate the ecology of the gut, and we know very little about how they influence human health. Up to 90 per cent of virome sequences do not appear on any of our databases. We also have a limited understanding of the fungi, yeast and parasite communities that exist within the gut and more importantly how they interact to maintain a healthy ecosystem. These unmapped regions of the gut microbiome are referred to as “dark matter”.
The Human Microbiome Project also demonstrated that how you sample the microbiome and how you choose to analyse it have a massive impact on the results.
Fields of science that require computational analysis of large data are known as “omics” sciences, and genomics isn’t the only thoroughbred in the stable. We have proteomics (proteins), metabonomics (metabolism), transcriptomics (gene expression) … the list goes on. Intriguingly, gene expression is relatively stable in the microbiome, although changes are found in response to some dietary interventions and drugs.
From these disciplines “systems biology” was born, ie the computational analysis and integration of these omics data sets to study human biology as a whole. Instead of looking from the bottom up (ie at a single gene or pathway) like 19th-century scientists would have done, we look from the top down to get a god’s eye view. We look at all the information we can get, and identify patterns from within it.
This has brought with it a bewildering new language of bioinformatics and it is the language through which we interpret human health in the 21st century. And one of its most important observations is that the microbiome is a very important cog in a very complicated engine. It’s not that the human genome is not important: it defines you. It predicts the colour of your eyes and your hair and your chances of getting some inherited diseases.
But over the last decade we have begun to appreciate that the gut microbiome is much more important for our health than was previously realised.
The gut is not just a fermentation engine; it is a massive chemical superhighway. Thousands of biochemical signalling pathways either originate there or transit through it. So how do we begin to understand the molecular language of these communities of bacteria, given this level of complexity? Well, imagine the gut microbiome is a giant orchestra. Its signalling to you comes in the form of a rich and endless symphony that manipulates your mood and maintains your health each day.
Note that at no stage have I told you what a “healthy” microbiome is. That is because I don’t know: the structural and functional variation is so great that at the moment it is hard to say. What is healthy for you may not be healthy for me.
The average gut microbiome has a degree of functional redundancy. This means that even when the orchestra loses a horn section, it can play on and the melody will usually remain constant. It is able to regulate its core activities, and the size and complexity of the orchestra, depending on the situation of the host. It also means that in the face of adversity (illness, travel, antibiotics or surgery) it can be resilient, stabilising the ecosystem and its functions and maintaining your health.
In short, you are made up of trillions of genomes from trillions of bacteria that have an infinite power to share genes. These amazing bugs can change how your brain works and sustain you in any environment. They protect you from evil pathogens and regenerate you when you are injured. If we could engineer the gut microbiome we might even give you superhuman powers.
So the Sorcerer II really did conjure some magic on its two-year voyage. The discoveries in those pots dangled from the back of the yacht showed the way to layers of interlocking, genetic complexity, not just among bacteria in the sea but within us. And the gut microbiome – of all the genomes now being mapped, measured and monitored – is the book of spells that may yet help us understand the magic tricks in all the other parts of the body. Both the ones that make us healthier and happier. And those that can destroy us.
It is not a magic wand. But it is good shit.