Gene/protein interaction extraction baseline
While working on my PhD, I stumbled across a dumb-but-effective baseline method for extracting gene/protein interactions from text. It is called ‘Algorithm III’ in chapter 3 of my PhD thesis (PDF, 730KB) and is tested further in Kabiljo et al. 2009.
The complete word list used in the paper is here, and I will post the script too as soon as I’ve removed everything specific to the LLL Challenge data format.
Syntactic pattern matching with GraphSpider and MPL
Please check the GraphSpider/MPL website for software and data for syntactic pattern matching and information extraction.
Legacy data
If you are looking for downloads relating to my thesis or any of my older published papers, see my old site at Birkbeck.