
Pastures new

As of this week (Feb 2011) I’ve left UCL to start a new job as a data scientist. Big up to the CATH and Gene3D crew, who’ve made it such an excellent few years!

This site’s staying here but I probably won’t be updating much — check the journal instead for updates.


Importing Delicious bookmarks to Google Bookmarks, with tags

Along with a lot of other people, I was a bit perturbed by the impending closure of Delicious. I’ve been using it for years and have hundreds of bookmarks, and none of the new crop of competitors seem up to the job, either lacking in features, having no bulk-import facility, or just being too new to have the all-important smell of permanence.

Google Bookmarks is the closest one that feels reliable enough, and is perfectly adequate if you don’t care about the social features or other bells and whistles — which I don’t, I just want a portable, centralized bookmarks list with an informal tagging scheme. But getting Delicious bookmarks into it in bulk is a real pain, involving merging a couple of files with a Ruby script, importing the results into Firefox, installing the Firefox Google toolbar, and importing into Google Bookmarks through that. And all the tags are given the prefix “Tag:” by the toolbar, for no good reason whatsoever. And of course this requires Firefox, and bulk actions on hundreds of bookmarks are amazingly slow in Firefox, taking several minutes at near-100% CPU.

That was until Hacker News user kevko mentioned their JS bookmarklet, which does a bulk import from a Delicious dump with no hassle, no requirement for Firefox, and no silly tag mangling. From 2006! Check it out here.

Much respect to kevko.


Maschinenfest 2010 highlights

My Maschinenfest roundup, featuring my top five acts of the festival — Frl. Linientreu, Matta, Niveau Zero, Subheim and Architect — is online now at Connexion Bizarre.


RapidMiner tutorial, 25 Nov 2010

For the next BioGeeks Tech Meet, I’ll be giving a tutorial on RapidMiner, the nifty data analysis package.

RapidMiner — machine learning for the rest of us

All are welcome. Although I’ll be using examples from biology, there’s nothing bio-specific in RapidMiner, and it might be useful/interesting to geeks of other varieties too.


October BioGeeks at Imperial — next gen sequencing

This month’s London BioGeeks will be at Imperial on the 21st of October, and we’re bringing you a special selection of talks on next-generation sequencing:

Experience in variant calling from exome sequencing
Francesco Lescai, Elia Stupka

Sequencing whole exomes in order to identify highly penetrant variants in a few individuals is becoming relatively easy, and calling variants is apparently an easy push-one-button procedure. However, understanding data quality and filtering out potential false positives in SNP calling is far more difficult. We will give a tour through the key QC and filtering issues, and discuss our experiences in calling variants from exome sequencing projects at UCL Genomics.

Analysing sequencing data on the NGS Cloud
Caroline Johnston, Matteo Turilli

The generation of large next-generation sequencing datasets is rapidly becoming a standard procedure in biology, but the resulting data requires compute resources beyond those normally available in a lab. The National Grid Service’s prototype Cloud is a first step towards a non-commercial, scalable solution for UK researchers. We will give a brief introduction to the NGS and to the Cloud prototype and will run a demo to process some short read data.

AQuA-NGS: A Quality Assessment Tool for Next Generation Sequencing Data
Zabeen Patel

The recent advancement of high-throughput sequencing enables the experimentalist to generate huge amounts of data at the genomic, transcriptomic, and epigenetic levels. However, as this is a relatively new technology, the methods for assessing the quality of the data are still limited. The AQuA-NGS system was developed as a platform-independent desktop application for viewing quality-assessment metrics generated by the R/Bioconductor package ShortRead. These metrics are stored in a MySQL database, together with run and sample metadata. Using the Flash-based GUI, the bioinformatician can submit new data, browse the database, view metrics via interactive tables and charts, and directly compare QA metrics across samples on the basis of multiple criteria. The system in its present, foundational state can perform the basic functions of generating, viewing, and comparing a limited set of QA metrics generated from Illumina/Solexa export files. It requires additional development to make it ready for public release, such as the ability to process non-Solexa files and work with remote destinations. Once complete, it will hopefully be incorporated into the pre-processing pipeline of multiple next generation sequencing platforms.

Head for the Flowers Building, room G47A for 18:00 [map ref 31]. There’ll be drinks in the lovely Eastside Bar afterwards [map ref 19]. Campus map (PDF)


BioGeeks tech meet, Oct 2010 — looking for speaker

The next London BioGeeks tech meet will be on 21st of October at Imperial College — full details to follow.

We’re looking for another speaker. If you want to do an informal talk on a topic to do with bioinformatics, genomics, or any practical tech subject that might be of interest to biogeeks — cloud computing, big data management, machine learning, development tools, algorithm tuning etc. etc. etc. — then give me a shout.

Talks are normally 20-30mins but that’s negotiable.

See here for examples of previous meets, to get an idea of what we’re about.


Posting to Twitter automatically using OAuth

Twitter recently switched off basic HTTP authentication, forcing developers to use the more complex (but hopefully safer) OAuth. There are lots of OAuth examples out there, but they all seem to focus on interactive apps, where the user is sent to Twitter to authenticate, and then the app uses the resulting access token to post on the user’s behalf.

However, for FuncNet we have a simple script running in a cron job, which posts a status message to @FuncNet every so often. This runs without any supervision, so I was left scratching my head as to how I could obtain the access token and access token secret required to post. The app registration page for each app only shows its consumer key and consumer secret, which are something different.

Eventually, thanks to Net::Twitter developer Marc Mims on this thread, I discovered that there’s a whole separate page for each app which has the access token and access token secret required for the app to post to its own account.

N.B. The URLs for these pages include NNNNNN, the numeric ID for your application. If you don’t know what this is, just go to your list of registered apps, click on the app name to get the consumer strings, and then click on My Access Token to get the access strings.

Once you have these, you can post like this (thanks again to Marc for the example):

    use Net::Twitter;
    my $nt = Net::Twitter->new(
        traits => [qw/OAuth API::REST/],
        consumer_key        => $YOUR_CONSUMER_KEY,
        consumer_secret     => $YOUR_CONSUMER_SECRET,
        access_token        => $YOUR_ACCESS_TOKEN,
        access_token_secret => $YOUR_ACCESS_SECRET,
    );
    $nt->update("Bob's your uncle!");
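
Since the whole point is unattended posting, the script just needs a crontab entry. A minimal sketch — the script path, schedule and log file here are made up for illustration, standing in for whatever wraps the Net::Twitter call:

```shell
# Hypothetical crontab entry: run the status-posting script once an hour,
# appending any output/errors to a log file for debugging.
0 * * * * /opt/funcnet/tweet-status.pl >> /var/log/funcnet-tweet.log 2>&1
```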

Why you need 4 distinct incomprehensible strings in order to post a single tweet, I don’t know, but presumably it’s justified on security grounds. What isn’t justified is Twitter hiding two of them somewhere else entirely, and not linking to that place from the main OAuth page for the app.

Not useful, guys.

UPDATE: It seems there are bigger problems with Twitter’s new process. This detailed Ars Technica article describes OAuth 1.0a as “an inelegant hack” and Twitter’s implementation of it as being “against all reason”.

If you’re having trouble with it, you’re not alone; it seems almost designed to cause problems for app developers, and particularly open-source app developers. The article’s well worth a read.


Installing Spotify on Fedora 13

I’ve recently taken possession of a flashy new workstation running Fedora, and with joy in my heart went to install Spotify on it, only to discover… Noooo… They only offer .deb packages for Debian and Ubuntu.

Thankfully, the solution was easier than I thought. Do all the following as root…

1. Install dpkg, the Debian package management tool:

yum install dpkg

2. Download the base package for your architecture, and the gnome support package, from the Spotify download site.

3. Create a temp directory, and unpack them there (we want to check for clashes):

mkdir spotify
cd spotify
dpkg -x ../spotify-client-qt_0.4.7.132.g9df34c0-1_amd64.deb .
dpkg -x ../spotify-client-gnome-support_0.4.7.132.g9df34c0-1_all.deb .

4. This gives you a directory tree starting at /usr. You can check for clashes like this:

find . -not -type d -exec ls -l /'{}' \;

… and make sure no files are listed — any successful ls output means that file already exists in / and would be overwritten.

5. Then re-extract them into your root partition (hence checking for clashes first):

dpkg -x ../spotify-client-qt_0.4.7.132.g9df34c0-1_amd64.deb /
dpkg -x ../spotify-client-gnome-support_0.4.7.132.g9df34c0-1_all.deb /

6. Finally, install the qt-x11 dependency manually, since dpkg -x doesn’t resolve dependencies:

yum install qt-x11

7. Then just type spotify and log in!

Seems pretty stable so far, apart from a couple of minor glitches.

EDIT: As suggested by Tyson Key in this thread, I got rid of the audio glitches by starting the PulseAudio volume control (/usr/bin/pavucontrol) before Spotify. Maybe this adds some buffering or something.

N.B. I take no responsibility if this process damages your computer, your music collection, your hearing or your sanity. Try at your own risk :-)


Tunnelling a connection through 2 servers via ssh

This took a bit of head-scratching, so for future reference, or anyone else looking:

Say I am working outside the office firewall, on a machine called home, and I need to get into a MySQL server inside it. (Doesn’t have to be MySQL, but just for argument’s sake.)

There’s a machine called gateway I can ssh to and tunnel through, but for security reasons, the database server mysql doesn’t accept connections from gateway directly. But my desktop machine at the office (err… desktop) can connect to mysql.

One way round it is to ssh from home to gateway and forward a port on gateway to the ssh server on desktop:

home $ ssh -tAY -L 2222:desktop:22 gateway

And then in another terminal, ssh from home to desktop via this tunnel, forwarding another port on home to the incoming connections port on mysql (3306 in MySQL’s case usually):

home $ ssh -p 2222 -L 23306:mysql:3306 127.0.0.1

This time, you’re connecting to home port 2222, but because of the first command, this forwards you straight to desktop port 22.

Now both tunnels are in place, you can just connect to port 23306 on home and arrive by magic at mysql. In another terminal (or from your MySQL GUI):

home $ mysql -uUSER -pPASS -h127.0.0.1 -P23306

This example shows a tunnel-within-a-tunnel. There should be a way to make this work using end-to-end tunnelling instead; I tried, but didn’t get anywhere. That might be due to ssh server restrictions on our equivalent of gateway.
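For the record, the end-to-end version can probably be done with ProxyCommand, bouncing the second ssh through gateway with netcat. A sketch, assuming gateway has nc installed and permits it — I haven’t tested this against the setup above:

```shell
# In ~/.ssh/config on home (host names as in the post):
#
#   Host desktop-via-gateway
#       HostName desktop
#       ProxyCommand ssh gateway nc %h %p
#
# Then a single command forwards home:23306 all the way to mysql:3306:
ssh -L 23306:mysql:3306 desktop-via-gateway
```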

If none of this means anything, there’s an intro to ssh port forwarding here.


Best new feature in Eclipse 3.6 Helios

… is hidden away on the last tab of the Java Formatting Profile editor (Preferences -> Java -> Code Style -> Formatter -> Edit).

They’ve finally (after nearly six years) added the ability to temporarily turn the code formatter off for a tricksy block that needs its own custom formatting. e.g.:

			// @formatter:off
			.append( "<node id='" )
			.append( protein )
			.append( "'><data key='class'>" )
			.append( cls )
			.append( "</data><data key='label'>" )
			.append( protein )
			.append( "</data></node>" );
			// @formatter:on

See the Off/On Tags tab for details.

Nice one guys… Eventually.
