Debug Helps Build Improved Aedes aegypti Genome Assembly
Monday, November 19, 2018
We are pleased to announce the publication of a greatly improved reference genome of the mosquito Aedes aegypti, details of which were reported last week in the journal Nature. Led by Dr. Leslie Vosshall from Rockefeller University, the project to build the genome of Ae. aegypti began more than two years ago and involved 72 authors from 32 institutions, including four Debug researchers from Verily. Below we discuss why this genome assembly is so important, how the broader team addressed some of the research challenges, and what Verily contributed to the effort.
Why is a good genome assembly important?
As in humans and other animals, mosquito DNA is organized into long strands called chromosomes. Not all organisms have the same number of chromosomes: humans have twenty-three pairs, while Ae. aegypti has only three. Understanding the order and sequence of DNA along the different chromosomes enables scientists to answer a wide range of questions about the biology of an organism, which, in the case of the mosquito, can help us control the spread of the diseases it carries.
Indeed, this new assembly of the Ae. aegypti genome has already helped researchers better understand fundamental aspects of mosquito biology, including how mosquitoes find blood meals using chemosensory receptors that guide them to their human hosts, why some mosquito populations are better than others at transmitting diseases such as dengue, and the molecular basis of how mosquito sex is determined.
What were some challenges?
High-quality genome assemblies have been available for insects related to Aedes, such as the fruit fly (Drosophila melanogaster) and the malaria mosquito (Anopheles gambiae), for more than a decade. Although Ae. aegypti has roughly the same number of genes as these other species, its genome has more than five times as much DNA! So what is all that other DNA? It is mostly repeated regions of DNA sequence known as repetitive elements, which account for more than two-thirds of the Aedes genome (Figure 1). With so much repetitive DNA, it was challenging to determine the correct order of DNA along the chromosomes and to correctly annotate the different parts of genes (exons), which are separated by vast stretches of repeats. A previous genome assembly contained more than 36,000 DNA fragments for just three chromosomes! To overcome these challenges, our team of collaborators used a variety of cutting-edge technologies, including a number of long-read sequencing platforms, to correctly map the genome of Ae. aegypti.
Figure 1. As shown in the pie chart above, a majority of the Aedes aegypti genome is composed of DNA repeats as opposed to unique DNA that codes for genes.
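To give a feel for why repeats make assembly hard and why longer reads help, here is a minimal Python sketch. It is a toy illustration only, not the method used in the project: the function name `ambiguous_read_fraction` and the tiny example sequence are hypothetical, and real assemblers are far more sophisticated. The sketch cuts a sequence into every possible read of a given length and reports the fraction of reads whose sequence occurs at more than one position, so that their place in the genome cannot be determined from the read alone.

```python
from collections import Counter


def ambiguous_read_fraction(genome: str, read_length: int) -> float:
    """Fraction of error-free reads whose sequence occurs more than once.

    Toy model (an assumption for illustration): a "read" is any substring
    of `genome` with exactly `read_length` bases.
    """
    if not 0 < read_length <= len(genome):
        raise ValueError("read_length must be between 1 and len(genome)")
    # Every possible read, one per start position.
    reads = [
        genome[start:start + read_length]
        for start in range(len(genome) - read_length + 1)
    ]
    counts = Counter(reads)
    # A read is ambiguous if an identical read comes from another position.
    ambiguous = sum(1 for read in reads if counts[read] > 1)
    return ambiguous / len(reads)


if __name__ == "__main__":
    toy_genome = "ACGTACGTTT"  # hypothetical sequence containing a short repeat
    for length in (4, 6):
        fraction = ambiguous_read_fraction(toy_genome, length)
        print(f"read length {length}: {fraction:.2f} of reads are ambiguous")
```

In this toy sequence the repeated stretch "ACGT" makes some 4-base reads ambiguous, while every 6-base read is unique because it spans the repeat and reaches into the unique sequence beside it. The same idea, at a vastly larger scale, is why long reads were valuable for a genome that is mostly repeats.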
Our contribution
After a genome is correctly assembled, the next step is identifying and annotating the genes it encodes. Verily produced data that helped identify the locations and sequences of genes. We did this through a process called Iso-Seq, which, unlike many commonly employed short-read technologies, allows researchers to sequence genes in their entirety. The long-read sequencing platform we used, developed by Pacific Biosciences, can achieve reads as long as 20,000 DNA base pairs, over 60 times longer than short-read techniques! Our full-length transcript data helped to ensure that when scientists have a gene of interest, they will find the correct sequence in the genome.
Verily scientists also generated data highlighting the utility of the new genome. To do this, we performed whole genome sequencing on four Ae. aegypti colonies collected from different parts of the world. The sequencing data provided insight into regions of the genome that are especially diverse or conserved across the globe, an inference that would not have been possible without the new genome assembly.
Figure 2. Results of whole genome sequencing performed by the Debug team. On the left, diversity (e.g. nucleotide variation) of Aedes aegypti is visualized along the three chromosomes of the mosquito. A pronounced decrease in diversity occurs around the centromeres (center) of each chromosome. Summary statistics for each sequenced colony are given on the right.
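For readers curious how diversity along a chromosome can be summarized, the following Python sketch computes nucleotide diversity (the average number of pairwise differences per site) in consecutive windows. This is a simplified illustration under stated assumptions, not the Debug team's actual pipeline: it assumes the input is a small set of already aligned, equal-length sequences, and the function names `nucleotide_diversity` and `windowed_diversity` are hypothetical.

```python
from itertools import combinations


def nucleotide_diversity(sequences: list[str]) -> float:
    """Average pairwise differences per site among aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    # Count mismatched positions for every pair of sequences.
    total_differences = sum(
        sum(base_a != base_b for base_a, base_b in zip(seq_a, seq_b))
        for seq_a, seq_b in pairs
    )
    return total_differences / (len(pairs) * length)


def windowed_diversity(sequences: list[str], window: int) -> list[tuple[int, float]]:
    """Diversity in consecutive, non-overlapping windows of `window` sites.

    Returns (window start, diversity) pairs; a trailing partial window is skipped.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    length = len(sequences[0])
    return [
        (start, nucleotide_diversity([seq[start:start + window] for seq in sequences]))
        for start in range(0, length - window + 1, window)
    ]


if __name__ == "__main__":
    # Hypothetical aligned sequences, for illustration only.
    aligned = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
    print(nucleotide_diversity(aligned))
    print(windowed_diversity(aligned, window=4))
```

Plotting the per-window values against window position gives a profile of the kind shown on the left of Figure 2, where low values mark regions with little variation.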
How will the new genome assembly benefit the field going forward?
Debug is also leading an effort to sequence the genomes of 1,000 wild Ae. aegypti from more than 40 populations around the globe. This collaborative effort with members of the mosquito research community will enable the Debug team to understand genetic factors that could impede efforts for mosquito control on the ground. With the improved genome assembly, we will gain insights into when this mosquito migrated out of Africa, how it has been able to adapt to so many different environments around the world, and how genes flow between different populations. Such data will be critical as we scale our Wolbachia-based SIT program and bring Debug to new locations.
Sara Mitchell, PhD, Senior Scientist, Verily and Brad White, PhD, Debug Lead Scientist, Verily