One of my current projects in the Matthew Hahn Lab is to investigate the effectiveness of a few different whole-genome alignment methods. My mentor and I have been studying a new program called progressiveCactus and comparing its output to that of other alignment methods. By counting the indels (that is, insertions and deletions) each method infers between different species, we can judge how the methods stack up against one another. So far, though, most of my time has been spent struggling to get programs to run and deciding on the best way to parse their output files.
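To make the indel-counting idea a bit more concrete, here is a minimal Python sketch. It is not the lab's actual pipeline: it assumes the alignment has already been reduced to two gapped strings of equal length, with `-` marking a gap, and the function name `count_indels` is just something I made up for illustration. Each unbroken run of gaps counts as one indel event.

```python
def count_indels(ref_aligned, query_aligned, gap="-"):
    """Count indel events between two rows of a pairwise alignment.

    Hypothetical helper: both inputs are assumed to be gapped strings
    of the same length. A run of gaps in the reference is counted as
    one insertion (relative to the reference); a run of gaps in the
    query is counted as one deletion.
    """
    if len(ref_aligned) != len(query_aligned):
        raise ValueError("aligned sequences must have the same length")

    insertions = 0
    deletions = 0
    state = None  # "ins", "del", or None when the last column was a match/mismatch

    for r, q in zip(ref_aligned, query_aligned):
        if r == gap and q == gap:
            # Column gapped in both rows (can happen when two rows are
            # pulled out of a multiple alignment); ignore it.
            continue
        if r == gap:
            if state != "ins":
                insertions += 1  # a new run of reference gaps begins
            state = "ins"
        elif q == gap:
            if state != "del":
                deletions += 1  # a new run of query gaps begins
            state = "del"
        else:
            state = None  # aligned bases end any open gap run

    return insertions, deletions


ref = "ACG--TACGT"
qry = "ACGTTTA-GT"
print(count_indels(ref, qry))  # (1, 1)
```

Counting runs rather than individual gap characters means a 2-base insertion is one event, not two. Whether that is the right unit depends on the comparison you want to make.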
How does progressiveCactus work, you ask? When I tried to answer that question the moment I began working in the Hahn Lab, I couldn’t figure a thing out. After gaining much more experience in bioinformatics and the analysis of complex systems, though, it makes a lot more sense to me.
|Circular genome plot|
In order to allow multiple genomes to align to one another in any possible way, we can arrange them in a circular pattern, as shown above. This lets us draw threads of different colors, where each color represents a different sequence. The ends of the boxes (A1 and A4) are the telomeres, that is, the ends of the chromosomes. With this layout, it’s easy to spot reverse complements, similarities, and other neat features. When we combine all of the different circular genome plots in this way, we can create “cactus” graphs.
|Pictured: a cactus|
From these chains, we can build networks upon networks that give us fully aligned genomes.
progressiveMauve, another genome aligner, has been shown to be very quick and effective with a small number of different genomes, and it has a very attractive GUI as well. We’re comparing the output of this program to that of progressiveCactus.