Heuristic

The Uncertainty of Stochastic Models and Human Mortality

Stochastic models help us predict events that deal with uncertainty. We can use them to do cool things like predicting the levels of noise in gene expression [1]. The randomness of genetic mutation, epigenetic factors, and other biological mechanisms that influence genetic expression isn’t something that we have to look at as some sort of black box that we can never know. Not only is it a truly remarkable demonstration of concepts that are inherent to theoretical physics in the messy world of biology, but I love how these types of models incorporate the epigenetic factors that we have previously deemed “unpredictable” on the gene expression scale.

There are some things we can’t really know with much certainty, though. Death is one of them.

My grandparents are the coolest people I know. Growing up in a household with my parents and grandparents is like living in a time capsule. But the message I get from them is not as clear as you might think. Both my grandpa and my grandma are about the same age, but, if you had met both of them, you would have never guessed they shared similar life stories. When you visit my house, you’ll find that my grandpa stays in his room, watching TV and reading books for most of his time. But, while you’re in my house, you might not expect to meet my grandma because she’s always spending time with the neighbors, working in the garden, or swimming. (Yes, swimming. My grandma swims. Usually, for four hours a day.) To me, it always seemed like my grandpa accepted his poor health as a harbinger of the end of his life while my grandma wanted to punch death in the face. I always admired both of them.

My understanding of the world is shaped not only by what my parents have to offer, but by what my grandparents have to offer as well, and I’ve always had a tremendous amount of respect for the elderly. Aside from the unique experience and wisdom that comes from their long, meaningful lives, the contrast between the way my grandpa and my grandma view their roles in life raised questions about how we should address issues in elderly care. In particular, end-of-life issues such as predicting the risks of certain treatments for fatal diseases and judging the quality of life for patients who undergo such treatments were reflected in my own home.

After reading Atul Gawande’s new book, “Being Mortal,” it has become more and more apparent to me how much we need to re-evaluate the way we care for the elderly and address these end-of-life issues. (Dr. Gawande was actually one of the people who inspired me to take an interest in medical ethics.) From my own perspective of living with my grandparents, the idea of sending the elderly off to nursing homes and foster care for senior citizens has always been completely foreign and horrendous to me. Both points of view have their own benefits that we should try to embrace, and these types of living facilities have been becoming more common in most parts of the world. The markedly different cultures between my generation and the generation before have helped me realize that, as human beings, we need not view death as something to be avoided at any cost to the way we live. I hope that, in the future, I can tackle these problems in an ever-changing world to make the world safe for the elderly. (But hopefully they won’t be the same problems that I will face when I approach the end of my life.)

[1] Raser JM, O’Shea EK (2004) Control of stochasticity in eukaryotic gene expression. Science 304: 1811–1814 http://www.sciencemag.org/content/304/5678/1811.full

November 30, 2014

Medicine, Science
Float like a butterfly; sting like a bee.

Your hands can’t hit what your eyes can’t see.

For this reason, we have to be careful when feeding numbers into our computer. Check out what happens when you ask a basic math question to Python:

Pictured: the folly of man

Huh, that’s strange. Where did all those zeros come from? It turns out that machines use binary to represent numbers, fractions included. This means that, for the number .13, instead of summing 0/1 + 1/10 + 3/100, the computer must use 0/2 + 0/4 + 1/8 + … whatever else comes after that in binary. Since that expansion never ends, the computer stores only the closest value it can fit. (Python calls this a float.) Make sure you keep this in mind when working on mathematically intensive projects.
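To make the binary expansion concrete, here is a small sketch using only the standard library. The helper name binary_fraction_bits is my own invention for illustration, not something built into Python. It peels off the first few binary digits of 13/100 and then shows the classic rounding surprise.

```python
from fractions import Fraction
import math


def binary_fraction_bits(value, n_bits):
    """Return the first n_bits binary digits after the point for 0 <= value < 1."""
    bits = []
    remainder = Fraction(value)
    for _ in range(n_bits):
        remainder *= 2                 # shift one binary place to the left
        bit = int(remainder >= 1)      # the digit that crossed the point
        bits.append(bit)
        remainder -= bit
    return bits


# Exact arithmetic on 13/100, so no float rounding sneaks into the digits.
bits = binary_fraction_bits(Fraction(13, 100), 8)
print(bits)  # [0, 0, 1, 0, 0, 0, 0, 1], i.e. 0/2 + 0/4 + 1/8 + ...

# The stored floats are only approximations, so exact equality can fail.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True
```

When comparing floats, a tolerance check such as math.isclose is usually the safer habit.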

Pi is an oddity. It’s never-ending just like our efforts to calculate it. At the same time, it’s always nice to appreciate how there are so many different ways to calculate pi. For example, you could drop needles. And, sure, we can always measure the ratio of the circumference of a circle to its diameter, but how do we tell a computer, a fundamentally deterministic and causal entity, how to calculate pi? Let’s ask two different programming languages and see what they have to say.

Python:

import numpy as np
pi=4*np.arctan2(1,1)

Fortran:

double precision pi
pi=4.d0*datan(1.d0)

(Notice the “d” in the Fortran statement. It marks the number as double precision, and the digit after it is the power of ten by which we multiply our number.)

It appears Python and Fortran are in consensus about this one. If you use a bit of intuition when reading the two statements, you might be able to tell that they are both defining pi as four times the angle created from arctan 1. But what value of pi do they both actually give us?

Python: 3.1415926535897931
Fortran: 3.1415926535897931

According to Value of Pi, pi is 3.1415926535897932384…with several different computational techniques. Looks like it’s a tie for this round.
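If you want to repeat the comparison without NumPy or a Fortran compiler, here is a minimal sketch that swaps in the standard library’s math.atan for the calls above. Printing 16 digits after the decimal point is my own choice, made to match the values listed in this post.

```python
import math

# Four times the angle whose tangent is 1, as in both statements above.
pi_from_arctan = 4 * math.atan(1.0)

print(format(pi_from_arctan, ".16f"))
print(format(math.pi, ".16f"))  # 3.1415926535897931

# Compare with a tolerance rather than exact equality, since these are floats.
print(math.isclose(pi_from_arctan, math.pi))
```

Both values stop agreeing with the true digits of pi after roughly 16 significant figures, which is all a double-precision float can hold.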

I’ve found that working with two different languages on the same project forces you to really understand the syntax and meaning behind each of the languages. With my example of Fortran and Python, we can really see the difference between the two.

October 22, 2014

Science
Promoting the discussion of Ethics

The time is 6:30 am. I’m outside my residence hall, having completed my morning run. I look around me and see faces and lights begin to appear. People in cars and buses slowly move into the empty streets. The chirps of birds and songs of the bugs break the desolate wasteland of the hour before. The dark void of isolation is warm as always. The sun will shine, and the world is mine. In the morning, I tell the world that I’ll be there when you wake up. I own the world in everything I do during the day. Time and tide wait for no man because time had better catch up with me.

As I began my years at Indiana University-Bloomington in the fall of 2013, the spurious curtain of the undergraduate’s desire for achievement and professionalism soon fell before my eyes. I found an atmosphere in which we were told to seek leadership positions and involvement. To the contrary, I stood resolute in my conviction that one does not become a leader before making a difference. I believed that one makes a difference and, through that process, becomes a leader. I knew that, in order to be successful, I had to create my own meaning and not receive it from somewhere else. On top of that, I noticed that neurotic and obsessive pre-medical students weren’t realizing the real beauty of an undergraduate education that comes from the curious pursuit of knowledge. I wanted to create an environment that would give students, especially pre-medical students, this opportunity.

Sometime in the middle of my Introduction to Philosophy class freshman year, I had the idea to create a Medical Ethics Discussion Circle. We would meet as groups to read articles with relevant ethical issues in bioethics, sociopolitical ethics, and other areas that would relate to medicine and health care. This would prepare students for careers in health care, but, more importantly, give them a greater, curious understanding of the world to help them cultivate their own passions. I wanted to create an atmosphere in which students could offer ideas, criticism, discussions, and arguments. On top of that, I plan on inviting other speakers to lead discussions and provide their own perspectives on ethical issues in medicine if we have enough support. (Perhaps it was also fueled by my skeptical nature that led me to arrive at the conclusions I stated in the previous paragraph.)

When I pitched the idea to MAPS (Minority Association of Pre-medical Students), I was met with immediate praise, and, with a bit of planning and recruiting, the Medical Ethics Discussion Circle is now under way. I’ve heard a lot of positive feedback already, even though we’ve had only one meeting so far. I hope that my efforts to promote a love of humanistic causes will help others pursue their goals in life.

The path to making a difference can be lonely and tortuous. But, every now and then I stumble upon moments of satisfaction and glory. I believe that the undergraduate education is a journey, and I hope you enjoy how I share it with you.

October 12, 2014

Philosophy
Programming for Particle Physics – Monte Carlo simulations and Markov Chains

Call me Ishmael. Some years ago – never mind how long precisely – having little or no money in my purse, and nothing particular to interest me on shore, I thought I would sail about a little and see the watery part of the world.

I’ll tell you a story from my ongoing adventure in physics.

Though I do a lot of bioinformatics research, I’ve always secretly loved physics more than any other subject. (Maybe it’s not much of a secret since I am a physics major.) I never really saw a reason why I couldn’t do research in both fields while I was an undergraduate, so I decided to just follow wherever my heart leads me in whatever field I enjoy the most. That’s why I do research in both fields. Recently, I’ve joined a project at a computational physics lab here at IU in the Fox Lab. As excited as I am to get into this research, it requires a very deep level of understanding of mathematics and abstract physics concepts, unlike my research in Biology.

From my weeklong excursion to Canada last summer. (Kitchener uptown)

Imagine that, one fine, beautiful night, around 1 or 2 in the morning, you’re walking home from a party. Unfortunately, you had a bit too much fun, so you stumble down the street and struggle to see the road in front of you. Instead of doing the right thing and calling for a friend or trusted adult to help, you decide to latch yourself onto a streetlamp until you start really feeling it again. Now, as you stand by this streetlamp, every now and then, you put together your willpower and courage in order to let go and start walking again. But you find yourself meandering aimlessly without any control whatsoever, and occasionally you end up hitting your head on the streetlamp.

The moral of the story is to study mathematics. Especially if you want to be a physicist.

This drunken behavior is a random walk. And it’s similar to the basis for the Markov Chain, which describes a process in which the probability of each future event depends only on the present conditions, not on the events that came before. It basically accounts for randomness in different sciences and processes such as Brownian motion.
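Here is a minimal sketch of that stumble home as a one-dimensional random walk. The function name, the unit step size, and the seed are all assumptions I made for illustration. Each new position depends only on the current one, which is the Markov property in miniature.

```python
import random


def random_walk(n_steps, seed=None):
    """Return the positions visited by a walker taking n_steps of +1 or -1."""
    rng = random.Random(seed)   # a seeded generator keeps the walk reproducible
    position = 0                # start at the streetlamp
    path = [position]
    for _ in range(n_steps):
        # The next step ignores everything except where we are right now.
        position += rng.choice((-1, 1))
        path.append(position)
    return path


path = random_walk(1000, seed=42)
print(path[-1])                    # where the walker ended up
print(max(abs(p) for p in path))   # farthest distance from the streetlamp
```

Running it with different seeds gives different paths, and that spread of outcomes is exactly what a Monte Carlo study collects.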

Brownian Motion (Source)

A while ago I made a blog post about Comparing Genome Alignments in which I touched on a few models for creating alignments from given genome inputs. What this algorithm uses is a Markov Model which creates nodes that string together to form full-fledged genomes. Markov Models work by assuming the Markov property that I just described, and the Genome Alignment algorithm uses it to, well, align genomes of different species. But why limit yourself to Biology? Why not extend it to particle physics?

And what could be more exciting than the world of particle physics? Making Monte Carlo simulations to study the decay and interactions of subatomic particles, of course!

We can probabilistically determine the energies and momenta of the particles in the following reaction:

γ + p → π⁺ + π⁻ + π⁰ + p

This is how a photon and proton collide in order to form three pions (plus, minus, and naught), along with a proton. When this collision happens, we want to know how the energy and momenta of the particles change. We can do a Monte Carlo simulation (or a computational test that gives us probabilities of different results) of this Markov process to get the probabilities of various outcomes of this interaction. We run thousands of trials with different collisions and take a look at the values of the particles throughout the collision. This means that, according to our Markov property, the process is like a bunch of drunk people running into each other on the street.
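To give a flavor of what running thousands of trials looks like, here is a toy Monte Carlo sketch. It is not the lab’s code and it does not model the reaction itself: it only draws random emission directions uniformly over a sphere and counts how often a particle flies into the forward hemisphere. The function names and the forward-hemisphere question are assumptions chosen for illustration, and a real event generator would also have to conserve energy and momentum.

```python
import math
import random


def random_direction(rng):
    """Draw a unit vector uniformly over the sphere."""
    cos_theta = rng.uniform(-1.0, 1.0)        # uniform in cos(theta), not theta
    phi = rng.uniform(0.0, 2.0 * math.pi)     # uniform azimuthal angle
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)


def forward_fraction(n_trials, seed=0):
    """Estimate the probability that a direction has a positive z component."""
    rng = random.Random(seed)
    forward = sum(1 for _ in range(n_trials) if random_direction(rng)[2] > 0)
    return forward / n_trials


# One trial says almost nothing; many trials settle near one half.
print(forward_fraction(100))
print(forward_fraction(100_000))
```

The estimate tightens as the number of trials grows, which is why these simulations run so many collisions.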

I like to think of the randomness of particle physics almost like the tumultuous waves of the ocean that rock back and forth. A single event doesn’t tell much, but, together, they show beautiful patterns about the world we live in.

From my stay at Cornell University last summer

I do not know what I may appear to the world, but to myself I seem to have been only like a boy playing on the sea-shore, and diverting myself in now and then finding a smoother pebble or a prettier shell than ordinary, whilst the great ocean of truth lay all undiscovered before me. – Sir Isaac Newton

October 1, 2014

Science
The Importance of Hashing

People who are just beginning to code make a lot of mistakes and do a lot of stupid things. I once used to struggle with parsing through every line of a file before I learned that it would be much easier to use split-functions and similar lists. One common mistake is to store large amounts of data in lists and then search through them every time that information is needed. But, with a bit more expertise, those newbies can throw away those “for” loops and sort-methods. There’s a new kid in town. And his name is “Hashing.”

for i in range(len(s)):
    c += 1
    # searching a list means scanning it item by item
    if c in indices_start:
        line = int(indices_start.index(c))
        for _ in s[i:indices_end[line]+1]:
            s2 += "N"

The problem is that this takes forever and a half to run. But, if we just replaced the two lists with a single hash (or dictionary) which we will name indices, then the program runs like clockwork.

for i in range(len(s)):
    c += 1
    if c in indices:
        line = int(indices[c])
        for _ in s[i:line+1]:
            s2 += "N"

Just goes to show how much of a difference code optimization can make.

END

(Excuse me for that last line. My Fortran is leaking.)
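For the curious, here is a small self-contained sketch of the swap. The list names indices_start and indices_end and the dictionary name indices come from the snippets above, but the numbers in them are made up for illustration. Both helpers return the same answer; the difference is that the list version scans indices_start up to twice per lookup, while the dictionary goes straight to the key.

```python
# Hypothetical region boundaries, invented for this example.
indices_start = [3, 10, 20]
indices_end = [5, 12, 25]

# One dictionary replaces the two parallel lists: start -> end.
indices = dict(zip(indices_start, indices_end))


def end_from_lists(c):
    """Look up the end position by scanning the start list."""
    if c in indices_start:                        # linear scan
        return indices_end[indices_start.index(c)]  # second linear scan
    return None


def end_from_dict(c):
    """Look up the end position with a single hash lookup."""
    return indices.get(c)


print(end_from_lists(10), end_from_dict(10))  # 12 12
print(end_from_lists(11), end_from_dict(11))  # None None
```

With three entries nobody would notice the difference, but with a genome’s worth of positions the scanning adds up quickly.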

September 24, 2014

Science

Longest Common Subsequence and NumPy

Perl may be crafty and efficient like a ninja, Ruby may be written like prose or a work of fiction, but, for most purposes, Python, with its simplicity and elegance, is usually my weapon of choice when it comes to programming languages. (To be frank, as long as it’s not some cryptic code like Fortran that should probably be waiting for the rain to wash it away, it floats my boat.) With my knack for mathematics, I had been reconstructing various equations and theorems from scratch in most of my scripts. Recently, I’ve begun to embrace NumPy, both to give me more functionality for things like matrices and arrays and so that I can do all the things my MATLAB friends do without too much effort spent learning extra languages.

(In fact, while we’re at it, let’s just put everything in Python! Python-Excel, Python-sql, import everything!)

Back in my pancake post, I talked about how you can use a simple two-step algorithm for sorting out a string of numbers. In this post, however, I’m going to talk about sorting through two different strings of letters to find their longest common subsequence. Unlike the challenge of the longest common substring, the subsequence need not consist solely of letters that are adjacent to one another, but can contain letters that are separated. So, this means that the longest common subsequence between “AACTTG” and “ACTGG” would not be “ACT”, but “ACTG”.

I found this problem interesting because it gave me the chance to flex my NumPy muscles and look for more “indirect” ways of solving a problem rather than using a brute-force Ctrl+F-esque approach that wipes away your entire RAM when you search through strings longer than 30 characters.

Fret not! For we can construct a matrix of some sort to help us with this issue. In bioinformatics, we can solve this problem using a scoring matrix. By taking advantage of the set of possible DNA bases {A, C, T, G}:

Simply place your two DNA strings on the axes and fill in each number in the grid from the top-left to the bottom-right. If there is a match in the base between the two strings at a certain location, add one to the number. Then follow your path from highest to lowest value. (source)

This is known as the Traceback approach, and we can optimize it further with hashes for lengths and in other ways.
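The grid-and-traceback idea can be sketched in a few lines. This is a textbook-style version written for illustration, not the solution linked in this post, and the function name is my own. I’ve used plain nested lists so it runs anywhere; a NumPy array created with np.zeros((rows, cols), dtype=int) could stand in for the grid.

```python
def longest_common_subsequence(a, b):
    """Return one longest common subsequence of strings a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = length of the LCS of a[:i] and b[:j]
    score = [[0] * cols for _ in range(rows)]

    # Fill the grid from the top-left to the bottom-right.
    for i in range(1, rows):
        for j in range(1, cols):
            if a[i - 1] == b[j - 1]:
                score[i][j] = score[i - 1][j - 1] + 1   # a match adds one
            else:
                score[i][j] = max(score[i - 1][j], score[i][j - 1])

    # Traceback: walk from the highest value back toward the origin.
    i, j = len(a), len(b)
    letters = []
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            letters.append(a[i - 1])
            i -= 1
            j -= 1
        elif score[i - 1][j] >= score[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(letters))


print(longest_common_subsequence("AACTTG", "ACTGG"))  # ACTG
```

The grid costs time and memory proportional to the product of the two string lengths, which is far kinder to your RAM than trying every possible subsequence.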

I wrote a solution from this method (drawing heavily upon other sources) to solve this problem here, although I’m still fixing up some issues in it from converting between different syntaxes and formats.

September 24, 2014

Science

Comparing genome alignment methods

One of my current projects in the Matthew Hahn Lab is to investigate the effectiveness of a few different full-genome alignment methods. My mentor and I have been studying a new program called progressiveCactus, and comparing its output to other alignment methods. By comparing the number of indels (that is, insertions and deletions) between different species, we can compare the effectiveness of different genome-alignment methods. But my work has mostly been spent struggling to figure out how to get programs to run, and deciding the best way to parse output files.
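As a rough illustration of what counting indels means, here is a minimal sketch that is not taken from the lab’s pipeline. It assumes a pairwise alignment has already been reduced to two equal-length strings with “-” marking gaps, and it treats each unbroken run of gaps in one sequence as a single insertion or deletion event. The function name and that counting rule are my own assumptions.

```python
def count_indels(aligned_a, aligned_b):
    """Count gap runs between two aligned, equal-length sequences."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")

    events = 0
    previous = None  # which sequence, if any, had a gap in the last column
    for x, y in zip(aligned_a, aligned_b):
        if x == "-" and y != "-":
            current = "a"
        elif y == "-" and x != "-":
            current = "b"
        else:
            current = None
        # A new event starts whenever a gap run begins or switches sides.
        if current is not None and current != previous:
            events += 1
        previous = current
    return events


print(count_indels("AC--GT", "ACTTG-"))  # 2
```

Real alignment formats carry much more information than this, so parsing them is where most of the work goes.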

How does progressiveCactus work, you ask? When I tried to answer that question the moment I began working in the Hahn Lab, I couldn’t figure a thing out. After gaining much more experience in bioinformatics and analysis of complex systems, though, it has made more sense to me.

Circular genome plot

In order to allow multiple genomes to align to one another in any possible way, we can arrange them in a circular pattern, as shown above. This lets us create threads of different colors, in which each color represents a different sequence. The ends of the boxes (A1 and A4) are the telomeres, as in, the ends of the chromosomes. It’s easy to find reverse complements, similarities, and other neat features. When we combine all of the different circular genome plots this way, we can create “cactus” graphs.

Pictured: a cactus

From these chains, we can create entire networks upon networks to give us fully aligned genomes. progressiveMauve has been shown to be very quick and effective with a small number of different genomes, and it has a very attractive GUI, as well. We’re focusing on the output from this program to compare to that of progressiveCactus.

Ever since I finished my work at Cornell, I’ve been much more confident and focused in my research at IU. I look forward to moving on to bigger and better things in research and elsewhere.

September 11, 2014

Science
Genetic Inversions, Bill Gates, and Pancakes

Imagine that you are a waiter running back and forth in your breakfast restaurant. Your life is constantly moving between the kitchen and the seating area in your usual “flow”. Most days you have to work very hard to make ends meet, so you don’t have time to sit back and smell the roses or rose the smells. It’s a shame that your work prevents you from studying the world around you through mathematics and algorithms. Every now and then, a guest orders a stack of pancakes, but, when the cook hands you the plate of pancakes, you’re a bit disappointed because the pancakes aren’t stacked by size.

what is this madness

What self-respecting philopancakist would tolerate such blasphemy? The proper pancake stack must place the largest pancake on bottom, the smallest pancake on top, and fill the space with the pancakes in ascending order. This is the only way you can pour syrup on it so that the syrup touches each pancake. It should be the responsibility of the cook to flip the pancakes in such a way that lines them up from largest on bottom to smallest on top with everything making sense in between.

This begs the question: if someone gives you a randomly assorted stack of pancakes, how do you sort them through flipping them? Namely, what’s the most efficient way for us to look at a set of random numbers and sort them from least to greatest (or vice versa) by reversing different segments of those numbers? This is the Pancake-Sort problem, and the smallest number of flips needed is known as the reversal distance.

A man named Bill Gates proposed a solution to the Pancake-Sort problem. You can read it here.
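For comparison, here is a sketch of the simple textbook approach, which is not the method from that paper: find the largest pancake that is still out of place, flip it to the top, then flip it down into position. The function names are my own, and I treat the first item of the list as the top of the stack. This always sorts the stack, but it does not promise the fewest possible flips.

```python
def flip(stack, k):
    """Reverse the top k pancakes (stack[0] is the top of the stack)."""
    return stack[:k][::-1] + stack[k:]


def pancake_sort(stack):
    """Sort so the largest pancake ends up on the bottom; return stack and flips."""
    stack = list(stack)
    flips = []
    # Settle the largest unsorted pancake first, then shrink the unsorted part.
    for size in range(len(stack), 1, -1):
        biggest = stack.index(max(stack[:size]))
        if biggest == size - 1:
            continue                      # already where it belongs
        if biggest != 0:
            stack = flip(stack, biggest + 1)   # step one: bring it to the top
            flips.append(biggest + 1)
        stack = flip(stack, size)              # step two: flip it into place
        flips.append(size)
    return stack, flips


sorted_stack, flips = pancake_sort([3, 1, 4, 2])
print(sorted_stack, flips)  # [1, 2, 3, 4] [3, 4, 2, 3]
```

Each pancake needs at most two flips, so the total never exceeds about twice the size of the stack.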

What makes this problem more interesting is that it has an application in biology in the study of genetic inversions. DNA bases experience a type of mutation known as inversions in which segments of bases are reversed. This can occur with small segments of genes or multiple genes.

We want to know how many inversions a certain gene or region of the genome has undergone because that tells us how old the DNA is or how much it has evolved. Two species separated by a large reversal distance may have evolved farther apart than two separated by a small one.

Perhaps it is ironic to mention that mother nature’s love for making molecular biological interactions actually makes this problem much easier to solve. In biology, we don’t have more than four different DNA bases, and our bases are actually aligned in such a way that there is a “forward” and “reverse” direction to each string. This means that each base must be aligned in the forward or reverse direction in order for that string to function properly. Taking these into account will make the problem simpler because we can restrict ourselves to aligning the DNA strings so that these conditions are satisfied.

When I first approached a simple version of this problem, I wrote a solution that would take the input string that needs to be sorted and judge potential inversions by their Hamming distance from the desired end sequence. By constantly following the potential inversion that had the lowest Hamming distance, we would hope to find the end result. (Hamming distance is the number of bases between two strings that do not match when the two are aligned. So “AAAG” and “AAAA” would have a Hamming distance of 1 since they differ by one base.) Basically, this approach would try to find the shortest way to get from the beginning to the end by seeing which inversion would match the end result the most, and repeating this process until the end result is reached. But, even intuitively, this approach would not necessarily find the end result in all scenarios. It may end up creating loops and traversing through inversions that would have low Hamming distances but not move along the optimal path from the first string to the end. (You can see some of my solution here.)
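A single greedy step of that idea can be sketched as follows. This is an illustration written for this post rather than my linked solution, the function names are assumptions, and an inversion here is a plain reversal of a slice of the string without taking the complement.

```python
def hamming_distance(a, b):
    """Count the positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return sum(x != y for x, y in zip(a, b))


def invert(seq, i, j):
    """Return seq with the slice seq[i:j] reversed."""
    return seq[:i] + seq[i:j][::-1] + seq[j:]


def best_inversion(seq, target):
    """Greedy step: try every inversion and keep the one closest to target."""
    candidates = (
        invert(seq, i, j)
        for i in range(len(seq))
        for j in range(i + 2, len(seq) + 1)   # reverse at least two bases
    )
    return min(candidates, key=lambda c: hamming_distance(c, target))


print(hamming_distance("AAAG", "AAAA"))  # 1
print(best_inversion("AGCT", "ACGT"))    # ACGT
```

Repeating best_inversion until the target is reached is the greedy strategy described above, and it can get stuck in exactly the loops mentioned there.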

September 4, 2014

Science
Helping other students with Undergraduate Research Awards and Opportunities

My university recently featured me on their webpage for their new Office of Competitive Awards and Research for my recent REU at Cornell University. It’s definitely exciting to get press coverage. And it looks like REUs are definitely the gift that keeps on giving.

This new office at Indiana University is actually keeping in touch with me to promote research opportunities for other students at my university.

To give a brief background, when I entered Indiana University, I was so obsessed with science that I was almost desperate to join a research lab. After emailing around a few professors, I was offered a spot in the Matthew Hahn Lab to study Bioinformatics. Soon enough, I helped a few of my friends get into labs, too, by giving them advice and instructions about how I did it. Later, during my freshman year, I was accepted to a full-time summer internship at Cornell University that paid for transportation, housing, and food, and gave a $5000 stipend. From all of these experiences, I’ve compiled my advice and instructions into a guide from the beginning to the end.

If you’re a college student reading this blog and looking for research opportunities at your university or advice and tips about undergraduate research or you just want to read more about my experience, check out my “Scientific Research Guide” under “My Work” over there on the right.

As for now, I’m currently working on a new approach to my project in my lab.

September 1, 2014

Science
“Memorizing a Deck of Cards in a Minute” or “Why You Have an Amazing Untapped Memory”

I like hobbies. Hobbies are fun. Especially when they’re challenging.

During the summer of 2013, a young boy sat on the floor of his living room, staring into the empty void that is the internet. With three empty months in his hands, unburdened by school, limited by nothing but his imagination, liberated by the warm blessing of the summer, you’d find it disturbingly ironic that he’s occupied and entertained for hours on end by looking at image macros and reading about the late Steve Jobs. But little did he know that adventure was knocking at his doorstep.

Okay, okay, enough with the fiction novel. Last summer, while I was looking for hobbies and fun things to learn, I stumbled upon this video of a world-famous memory superstar named Dominic O’Brien. Despite his old age, he was able to memorize a deck of cards in a minute using a technique called memory palaces (or Method of Loci). This technique could also be used to memorize grocery lists, names of people, or anything about anything. I had always wanted to become exceptionally good at something that most others weren’t. Fascinated by the challenge, I decided to take it up myself.

If you’re not familiar with how memory palaces work, here’s the idea. We’ll use the deck of cards as an example. For each item you want to memorize, choose a visual image. That means, for each card in a standard deck, you choose a familiar, memorable, easy-to-imagine, visual image to associate with that card. Choose things that you can make into a story. It’s better to choose people or characters for these images since you can give them “life” by crafting them into a story. For example, if you are a Beatles fan:

Ace of Diamonds – John
Ace of Hearts – Paul
Ace of Spades – George
Ace of Clubs – Ringo

Be creative and make each card distinct from one another. (For some people, the Beatles might not be the best choice unless they can distinguish each member clearly.) After each card has an assigned “character,” you’re ready for the next step. Also, it may help for you to associate a distinct and unique “object” with each “character.” (For example, John – Peace symbol/Guitar, Paul – Bass, George – Guitar/Sitar, Ringo – Drums.)

To actually memorize the deck of cards, go through each card of the deck one by one and, as you see the card, visualize the “object” in your mind. Choose a familiar setting (for example, your daily commute, walking through your house, etc.) As you “walk” through that setting, visualize each “object” from the cards of the deck. Make a story and see each image. I suggest you start with a few cards at first, then gradually increase the number until you can walk through an entire deck of cards. Make a “chain” of ideas that can let you move from one object to the next. Don’t stop, and don’t hesitate. Also, don’t go too fast or the story won’t “stick” with you. Trust yourself that you’ll be able to remember the story. Try not to cycle or “revisit” the same place in your story twice, or that may confuse you in the end. Also, don’t make it too “logical.” Make it bizarre or fun. That way, it’s much more memorable and easy.

I’ve bolded the characters and objects in this small example:

As I shut the door to my dorm room, I see John Lennon in front of me, strumming his guitar. He is confronted by Giorgio Armani, wearing the finest of suits. I turn left and see Sam Eagle sitting on the floor and waving an American flag while listening to a lecture by Albert Einstein, writing on the chalkboard on the opposite side of the hall. I walk down the hall towards the exit door and see Che Guevara standing outside, sporting his signature bandana, and having a conversation with Ellen DeGeneres who sits on her couch.

After your story is over, you will find that you can easily remember the story from the beginning to the end. You’ll be able to recall each image and the associated card that goes with it.

Anyways, when I began my memory journey, I made my mnemonic dictionary. I began with only 5-10 cards at once and increased the number until I was able to memorize a whole deck of cards. From there, I began to impress my friends at parties and even use it to memorize a few concepts in my classes. Within about 4-6 weeks, I could memorize a shuffled deck of cards in a minute.

Human memory isn’t as important as it once was. Before the invention of paper and spread of modern technology, our ancestors had to rely on their own memories for preserving history, geography, science, and other information. For this reason, people must have had amazing memories. Sure, ancient civilizations such as the Egyptians and Mesopotamians had systems of writing such as hieroglyphics and cuneiform, respectively, but human beings will always create more information and require more efficient means of being able to store and retrieve that information. Scholars must have had to commit a large amount of information to memory.

Nowadays, that information can be stored in computers, books, and just about anything you can think of. But the people who are alive today, the descendants of those ancient civilizations, still retain much of this amazing memory. All people have this amazing memory ability, but, because we don’t need to use it anymore, we’ve grown to distrust ourselves in remembering things. Nowadays, we forget people’s names a few minutes after meeting them. We forget where we placed our keys. We forget what we learned in class a few hours ago. Obviously I’m not advocating that we should abandon daily reminders or throw out all forms of writing, but we should take advantage of our own potential and never underestimate ourselves.

Personally, however, I grew a lot more confident in my ability to learn and retain information. I hope to develop more as a scholar and a human on this amazing adventure through memory.

Memrise is a helpful website that can teach you this method, too.

Learn more about how I used memory techniques in my Organic Chemistry class freshman year.

August 23, 2014

Education