September 6th, 2006


You will fall before my Screaming Eagle Fist technique

We had our first "real" project in the C++ class today, which was actually surprisingly cool considering how simplistic a first project needs to be. We had to write a small program to compare the contents of two chromosome sites, nucleotide by nucleotide, and give an overall "genetic distance" between those sites, based on the chart on the right.

Dr. Tang said we all ought to be able to get it done in fewer than 20 lines of code, but I'm pretty sure he was smoking crack. I mean, don't get me wrong, I got it done in 14 - but only by noticing that, if you treat each character as a byte-sized integer, adding the two characters of any given nucleotide pair gives you a result unique to that pair: that is to say, A+G is not equal to A+A or A+C or A+T or C+G or C+T or G+T. Once you realize that, you can do the individual nucleotide comparisons with a single line of code: increment the value of a counter by the Boolean value of one lazy OR comparison multiplied by 5, plus the Boolean value of another lazy OR comparison.
distance += ((n1+n2)==132 || (n1+n2)==138 || (n1+n2)==149 || (n1+n2)==155)*5 + ((n1+n2)==136 || (n1+n2)==151);
I seriously doubt he was expecting all - if any - of his students to take that approach. Regardless, I was pleased as hell with it, even though the problem itself is so simple: after all, it's exactly the kind of simple problem that's likely to need solving on very large datasets, and my additive method is a pretty damn efficient way to do it.
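
For anyone who wants to see the trick in context, here is a minimal sketch of how that one-liner might sit inside a complete function. It is not the actual 14-line solution. It assumes uppercase ASCII input (A=65, C=67, G=71, T=84), two sites of equal length, and the weights implied by the line above: 5 for A/C, C/G, A/T and G/T pairs, 1 for A/G and C/T pairs, 0 for a match. The name genetic_distance and its parameters are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not from the assignment): totals the per-position cost
// between two aligned sites made only of uppercase 'A', 'C', 'G', 'T'.
int genetic_distance(const std::string& site1, const std::string& site2) {
    assert(site1.size() == site2.size());  // assumption: sites have equal length
    int distance = 0;
    for (std::size_t i = 0; i < site1.size(); ++i) {
        // In ASCII, A=65, C=67, G=71, T=84, so every unordered pair has its own sum.
        const int sum = site1[i] + site2[i];
        // 132=A+C, 138=C+G, 149=A+T, 155=G+T add 5; 136=A+G, 151=C+T add 1.
        // Matching pairs (130, 134, 142, 168) pass neither test and add 0.
        distance += (sum == 132 || sum == 138 || sum == 149 || sum == 155) * 5
                  + (sum == 136 || sum == 151);
    }
    return distance;
}
```

Under those assumptions, genetic_distance("AACG", "GTGG") works out to 11: 1 for A/G, 5 for A/T, 5 for C/G, and 0 for G/G. Because the sum is the same whichever strand a nucleotide comes from, A/G and G/A need no separate cases.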

It is sooooo nice to finally be taking a class that teaches a language I actually want to learn. =)