What is Bioinformatics?

One scientist might call it “the intersection of biology, computer science, and sometimes statistics.” Another may say it is the use of “computational methods for comparative analysis of genome data.” For many who practice it, it’s just a bunch of compiler errors and pull requests. To most people, “bioinformatics” is so new and obscure that there isn’t even a standardized or popularly used abbreviation for it. (“Bioinfo”? “Bioinf”? “BI”?) Fortunately, this only means that you may run out of breath when speaking about your discipline or run out of characters when tweeting about your genome-wide association results #JustBioinfoThings. But before I dive into what bioinformatics is, let’s understand a bit about the history of computer science and how biologists came to need it.

When English mathematician Charles Babbage worked on the “Difference Engine” in the first half of the 19th century, a set of metal gears that computed successive differences to carry out basic mathematical operations, he only had in mind calculating answers to simple problems at a level just a bit above a modern four-function calculator. But when the young Ada Lovelace (daughter of Lord Byron) came to work with him on the “Analytical Engine” (the precursor to our computer), she helped Babbage realize greater dreams. Through her metaphysical interpretations of the nature of computer technology as well as her innovative skills in mathematics and science, Lovelace, often called the first computer programmer, helped provide a foundation for algorithm design, number theory, logic, and other tenets of computer science. From her work, we would later create subroutines, loops, and conditions for writing programs that computers could read. But, more importantly, her metaphysical writing would pave the road for the discovery of a greater purpose in computers.

Of cloudless compilers and starry stacks

Lovelace wrote that the science of operations derived from mathematics is a science in itself with its own “abstract truth.” By “operation” she was referring to anything you can do in mathematics (addition, multiplication, exponentiation, etc.). Under the influence of Lovelace, studying science through computers emerged as a unique discipline that distinguished itself from the early 19th-century doldrums of machines used solely for mathematical calculations. One might like to think that she picked up poetic beauty from her father as she wrote, “We may say most aptly that the Analytical Engine weaves algebraical patterns just as the Jacquard loom weaves flowers and leaves.” The study of the world around us shifted towards a pursuit of relationships and patterns in all facets of what we could observe when Lovelace commented that “It may be desirable to explain that, by the word operation, we mean any process which alters the mutual relation of two or more things…. [this] would include all subjects in the universe.”

Pictured: Jacquard loom of the early 19th century. “The holes punched in different sequences instruct the machine to weave out specific patterns and designs. It will later on be used in computer programming to instruct the early computers on performing certain tasks.”

Did Lovelace’s notes on the Analytical Engine predict the formation of bioinformatics? After the 1950s discoveries of the structure of DNA, the path from nucleotides to amino acids to proteins, and the genetic code in which those biological structures were written, the dreams of biologists became data-driven, marked by searches for information-based tools and techniques to manage this newfound information. Scientists needed a better way to access, deposit, and play around with the multitudes of data, numbers, and theory of the natural world. Chemist Margaret Oakley Dayhoff, the “mother and father of bioinformatics”, compiled books of protein sequence information in the 1960s. By then, Frederick Sanger had completed the first protein sequencing (of insulin) and discovered its underlying amino acid structure. Biomedical scientist Elvin A. Kabat compiled volumes of antibody sequences in the following decade, and, in 1981, the European Molecular Biology Laboratory initiated the first computer database of nucleotide sequence data in the world. Amid these advances in what we now call “computational sciences”, the term “bioinformatics” was born in 1970 to describe “the study of informatic processes in biotic systems.”

Though the term “bioinformatics” was originally coined without computers in mind, it was the computer revolution that drove the success and popularity of the field. Talk of modeling, data mining, and simulation of gene expression and mutation became hip among computer scientists and biologists alike. Could the advent of bioinformatics have been possible without computers? While it is true that the foundations of bioinformatics were laid long before the sweet grace of technology, the field would hardly be where it is nowadays without the PC. When biology transformed from peapod-counting to phylogenetic modeling, we saw a beauty that we continue to illuminate. And the ideas upon which computer science was created prevail in the convoluted, serendipitous turnings of bioinformatics.

Phylogenetic trees allow us to visualize evolutionary relationships as they stem from common ancestors.

If we observe the surface of bioinformatics (when we look at all the things that “bioinformaticians” do), bioinformatics could probably best be described as “the study of biology through computers.” But this assertion doesn’t do adequate justice to the value of the discipline and leaves the non-scientist confuzzled as to how the science is carried out. Computers might be one medium for conducting the work and might even have been necessary for the birth of bioinformatics itself, but the heart and soul of the field lie in the way the study is conducted, not in the instruments used. To make matters more confusing, the goals and features of the intersections of biology and computer science have become so diverse that separate names are now given to “computational biology” (the development of computational methods to study biology) and “biological computation” (the development of biological techniques for computational problems), narrowing “bioinformatics” itself to “making biological data understandable.” Obviously, though, these fields overlap far more than the separate names suggest.

More recently, though, bioinformatics has moved towards elucidating the deeper, more nuanced meaning behind biological terms that we had previously taken for granted. We now see “evolution” in terms of different levels of organization at different scales. Each of these “levels” has its own distinct ways of grouping and classifying its parts within the whole. In biology, we have developed models that view evolution in terms of strength, evolvability, neutrality, and several other factors. From the most basic mutations in an individual’s DNA to differences in survivability of individual organisms in a species, these discoveries have given a deeper meaning to biology the same way Lovelace’s contributions expanded the limits of the computer.

And though “bioinformatics” might just be an umbrella term for several different ways of performing research relevant to biology and computer science, there is no doubt that we are returning to our original definition: the study of informatic processes. But, as Lovelace emphasized, computers themselves don’t give thought to the truth of the relationships between natural processes. It is up to us to continue to wonder where science will take us.

Ada! Wilt thou by affection’s law,
My mind from the darken’d past withdraw?
Teach me to live in the future day …
-Lord Byron
