So you want to be a computational biologist?

Nice review that I found in Nature:

A very important point:

You’re a scientist, not a programmer

The perfect is the enemy of the good. Remember you are a scientist and the quality of your research is what is important, not how pretty your source code looks. Perfectly written, extensively documented, elegant code that gets the answer wrong is not as useful as a basic script that gets it right. Having said that, once you’re sure your core algorithm works, spend time making it elegant and documenting how to use it. Use your biological knowledge as much as possible—that’s what makes you a computational biologist.


Be suspicious and trust nobody

The following experiment is often performed during statistics training. First, a large matrix of random numbers is created and each column is designated as ‘case’ or ‘control’. A statistical test is then applied to each row to test for significant differences between the case data and the control data. You should not be surprised to learn that hundreds of rows come back with P values indicating statistical significance. Biological datasets, such as those generated by genomics experiments, are just like this: large and full of noise. Your data analysis will produce both false positives and false negatives, and there may be systematic bias in the data, introduced either in the experiment or during the analysis.
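
The experiment is easy to try yourself. Here is a minimal Python sketch of it, under a few assumptions of my own: the noise is standard normal, and each row is tested with a two-sided z-test with known variance (a simplification so that only the standard library is needed; in practice you would more likely reach for a t-test). The function name, the matrix size and the 0.05 threshold are just illustrative choices, not something taken from the review.

```python
import random
from statistics import NormalDist, fmean


def count_false_positives(n_rows=10_000, n_case=10, n_control=10,
                          alpha=0.05, seed=0):
    """Count rows of pure noise that look 'significant' at level alpha."""
    rng = random.Random(seed)
    std_normal = NormalDist()  # standard normal, mean 0 and sd 1
    # Standard error of the difference of two means when the variance is 1.
    se = (1 / n_case + 1 / n_control) ** 0.5
    hits = 0
    for _ in range(n_rows):
        # Every value comes from the same distribution: there is no real effect.
        case = [rng.gauss(0, 1) for _ in range(n_case)]
        control = [rng.gauss(0, 1) for _ in range(n_control)]
        z = (fmean(case) - fmean(control)) / se
        p_value = 2 * (1 - std_normal.cdf(abs(z)))  # two-sided P value
        if p_value < alpha:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_false_positives()
    print(f"{hits} of 10000 pure-noise rows have P < 0.05")
```

With a threshold of 0.05, roughly 5% of the rows should come back as "significant" even though nothing is going on, which is where the hundreds of hits come from.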

There is a temptation, even among biologists trained in statistical techniques, to throw caution to the wind when particular software or pipelines produce an interesting result. Instead, treat results with great suspicion, and carry out further tests to determine whether the results can be explained by experimental error or bias. If multiple approaches agree, then your confidence in those answers increases. But for many findings, validation and further work in the laboratory may be necessary. Knowledge of biology is vital in the interpretation of computational results. Setting traps, or tests, as mentioned above, is only part of this. Those tests are meant to ensure that your software or pipeline is working as you expect it to work; it doesn’t necessarily mean that the answers produced are correct.

Have fun ;)

Building MongoDB with a custom Boost library on Linux (C++ client driver too)

This is much more a note to myself than a tutorial, but I am posting it because I have not found any up-to-date reference for it.
By the way, I tested this during the 2.5.x development cycle, so it can probably change. (And it will. (As it did.))

Just the commands.

Download Boost, unzip it, and go to the folder:

./b2 variant=debug link=static threading=multi --with-system --with-thread --with-date_time --with-regex --with-serialization --with-program_options --with-filesystem stage
mkdir /opt/boost
./b2 install --prefix=/opt/boost

For MongoDB:

cd ~
mkdir mongo
cd mongo
git clone
mkdir /opt/mongo
scons --full -j 64 --prefix=/opt/mongo/ --cpppath=/opt/boost --libpath=/epidb/opt/boost/

It takes some time because it builds *everything*.


For testing:

cd ~
mkdir db
/opt/mongo/mongod --dbpath db

Building MongoDB with clang and libc++

I am working on a project that uses MongoDB as its data storage system. I really like to use the “trunk” version, directly from git. Even on a Mac, I used to compile MongoDB with gcc, because MongoDB had a problem with the clang suite. Today I noticed that it has been fixed.

Similarly, since I updated to Mavericks I have had small problems building my project, mainly in the linking phase: for some reason it was linking the binaries with libc++ where they were expected to link with libstdc++ (because MongoDB and Boost had been built using libstdc++).

So, after some time, I compiled the Boost libraries, MongoDB, and my project using clang/clang++ and libc++.

I will explain the steps here. If you want to know why I used one parameter or another, please ask in the comments.

Here is what I did:

mkdir /opt/boost-clang-libc++/ 
mkdir /opt/mongo-clang-libc++/ 

I downloaded the Boost libraries, uncompressed them, and inside the directory I ran:

./bootstrap.sh --prefix=/opt/boost-clang-libc++/
./b2 variant=debug link=static threading=multi toolset=clang cxxflags="-stdlib=libc++ -arch x86_64" linkflags="-stdlib=libc++" --with-system --with-thread --with-date_time --with-regex --with-serialization --with-program_options --with-filesystem stage
./b2 install --prefix=/opt/boost-clang-libc++/

Patience… patience… built!

For MongoDB:

git clone
# Remember that this is a VERY unstable version! You can instead check out a tag or download the source code of a stable MongoDB release.
scons --full install --64=FORCE64 -j 16 --prefix /opt/mongo-clang-libc++/ --use-system-boost --extrapath=/opt/boost-clang-libc++/ --osx-version-min=10.8 --libc++  # osx-version-min=10.8 is for Mavericks; you can use 10.7 as well.

Wait, wait… built!

To run MongoDB, you have to tell the dynamic loader where Boost’s dynamic libraries are:

export DYLD_LIBRARY_PATH=/opt/boost-clang-libc++/lib:$DYLD_LIBRARY_PATH

I suggest putting it in your .bash_profile.

To compile a program that uses, for example, boost_filesystem and the MongoDB C++ library, run:

clang++ yourprogram.cpp -o program -L/opt/mongo-clang-libc++/lib -lmongoclient /opt/boost-clang-libc++/lib/libboost_filesystem.a

I have one problem…

– I have one problem.
– Let’s use XML and Perl
– Now I have three problems!

Let’s use Java instead!
Now you have a ProblemFactory problemFactory = new ProblemFactory(ProblemFactory.PROBLEM_FACTORY);
Let’s use C instead!
Now I am rewriting the string library! Now I am rewriting malloc()!
Let’s use Erlang.
Now I have no idea what I am doing.
Let’s use LISP.
Now I feel superior, but it takes 200 times longer to run.
Let’s use FORTRAN!
And now every computer scientist hates me.
Let’s use Python.
Now I just “import work” and go home.

“Knock, knock.”
“Who’s there?”

(very long pause…)

“Java.”

If you put a million monkeys at a million keyboards, one of them will eventually write a Java program.
The rest of them will write Perl programs.

A COBOL programmer made so much money doing Y2K remediation that he was able to have himself cryogenically frozen when he died. One day in the future, he was unexpectedly resurrected.
When he asked why he was unfrozen, he was told:
“It’s the year 9999 – and you know COBOL.”

Q: How many Prolog programmers does it take to change a lightbulb?
A: Yes.