<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>biotext.org.uk &#187; bioinformatics</title>
	<atom:link href="http://biotext.org.uk/tag/bioinformatics/feed/" rel="self" type="application/rss+xml" />
	<link>http://biotext.org.uk</link>
	<description>Not a typewriter</description>
	<lastBuildDate>Sat, 05 Feb 2011 14:18:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0.1</generator>
		<item>
		<title>October BioGeeks at Imperial &#8212; next gen sequencing</title>
		<link>http://biotext.org.uk/october-biogeeks-at-imperial-next-gen-sequencing/</link>
		<comments>http://biotext.org.uk/october-biogeeks-at-imperial-next-gen-sequencing/#comments</comments>
		<pubDate>Thu, 07 Oct 2010 11:43:35 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[bioinformatics]]></category>

		<guid isPermaLink="false">http://biotext.org.uk/?p=520</guid>
		<description><![CDATA[This month&#8217;s London BioGeeks will be at Imperial on the 21st of October. This month we&#8217;re bringing you a special selection of talks on next generation sequencing: Experience in variant calling from exome sequencing Francesco Lescai, Elia Stupka Sequencing whole exomes in order to identify high penetrant variants in few individuals is becoming relatively easy, [...]]]></description>
			<content:encoded><![CDATA[<p>This month&#8217;s <a href="http://biogeeks.wordpress.com/">London BioGeeks</a> will be at Imperial on the 21st of October. This month we&#8217;re bringing you a special selection of talks on next generation sequencing:</p>
<p><strong>Experience in variant calling from exome sequencing</strong><br />
<em>Francesco Lescai, Elia Stupka</em></p>
<p>Sequencing whole exomes in order to identify highly penetrant variants in a few individuals is becoming relatively easy, and calling variants is apparently an easy push-one-button procedure. However, understanding data quality and filtering out potential false positives in SNP calling is far more difficult. We will give a tour of the key QC and filtering issues, and discuss our experiences in calling variants from exome sequencing projects at UCL Genomics.</p>
<p><strong>Analysing sequencing data on the NGS Cloud</strong><br />
<em>Caroline Johnston, Matteo Turilli</em></p>
<p>The generation of large next-generation sequencing datasets is rapidly becoming a standard procedure in biology, but the resulting data requires compute resources beyond those normally available in a lab.  The National Grid Service&#8217;s prototype Cloud is a first step towards a non-commercial, scalable solution for UK researchers. We will give a brief introduction to the NGS and to the Cloud prototype and will run a demo to process some short read data.</p>
<p><strong>AQuA-NGS: A Quality Assessment Tool for Next Generation Sequencing Data</strong><br />
<em>Zabeen Patel</em></p>
<p>The recent advancement of high-throughput sequencing enables the experimentalist to generate huge amounts of data at the genomic, transcriptomic, and epigenetic levels. However, as this is a relatively new technology, the methods for assessing the quality of the data are still limited. The AQuA-NGS system was developed as a platform-independent desktop application for viewing quality assessment metrics generated by the R/Bioconductor package ShortRead. These metrics are stored in a MySQL database, together with run and sample metadata. Using the Flash-based GUI, the bioinformatician may submit new data, browse the database, view metrics via interactive tables and charts, and directly compare QA metrics across samples, on the basis of multiple criteria. The system in its present, foundational state can perform the basic functions of generating, viewing, and comparing a limited set of QA metrics generated from Illumina/Solexa export files. It requires additional development to make it ready for public release, such as the ability to process non-Solexa files and work with remote destinations. Once complete, it will hopefully be incorporated into the pre-processing pipeline of multiple next generation sequencing platforms.</p>
<p>Head for the Flowers Building, room G47A for 18:00 [map ref 31]. There&#8217;ll be drinks in the lovely Eastside Bar afterwards [map ref 19]. <a href="http://www.imperial.ac.uk/workspace/campusinfo/public/sthkencampus.pdf">Campus map</a> (PDF)</p>
]]></content:encoded>
			<wfw:commentRss>http://biotext.org.uk/october-biogeeks-at-imperial-next-gen-sequencing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>BioGeeks tech meet, Oct 2010 &#8212; looking for speaker</title>
		<link>http://biotext.org.uk/biogeeks-tech-meet-oct-2010-looking-for-speaker/</link>
		<comments>http://biotext.org.uk/biogeeks-tech-meet-oct-2010-looking-for-speaker/#comments</comments>
		<pubDate>Thu, 30 Sep 2010 10:23:40 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[bioinformatics]]></category>

		<guid isPermaLink="false">http://biotext.org.uk/?p=514</guid>
		<description><![CDATA[The next London BioGeeks tech meet will be on 21st of October at Imperial College &#8212; full details to follow. We&#8217;re looking for another speaker. If you want to do an informal talk on a topic to do with bioinformatics, genomics, or any practical tech subject that might be of interest to biogeeks &#8212; cloud [...]]]></description>
			<content:encoded><![CDATA[<p>The next London BioGeeks tech meet will be on 21st of October at Imperial College &#8212; full details to follow.</p>
<p>We&#8217;re looking for another speaker. If you want to do an informal talk on a topic to do with bioinformatics, genomics, or any practical tech subject that might be of interest to biogeeks &#8212; cloud computing, big data management, machine learning, development tools, algorithm tuning etc. etc. etc. &#8212; then give me a shout.</p>
<p>Talks are normally 20-30 mins but that&#8217;s negotiable.</p>
<p>See <a href="http://is.gd/fBPUM">here</a> for examples of previous meets, to get an idea of what we&#8217;re about.</p>
]]></content:encoded>
			<wfw:commentRss>http://biotext.org.uk/biogeeks-tech-meet-oct-2010-looking-for-speaker/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>London BioGeeks &#8212; May Tech Meet is next week</title>
		<link>http://biotext.org.uk/london-biogeeks-may-tech-meet-is-next-week/</link>
		<comments>http://biotext.org.uk/london-biogeeks-may-tech-meet-is-next-week/#comments</comments>
		<pubDate>Thu, 13 May 2010 16:11:52 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[bioinformatics]]></category>

		<guid isPermaLink="false">http://biotext.org.uk/?p=470</guid>
		<description><![CDATA[The May tech meet is on Thursday 20th at Imperial College. This month&#8217;s speakers: Catherine Canevet &#8212; Ondex: Data integration and visualisation Christopher Barnes &#8212; ABC-SysBio: Approximate Bayesian Computation in Python with GPU support N. Purswani, L. Tweedy, Z. Patel, C. Suriel-Melchor &#8212; DASbrick: A cloud based Rich internet application for Synthetic Biology Parts Registries [...]]]></description>
			<content:encoded><![CDATA[<p>The <a href="http://biogeeks.wordpress.com/2010/05/07/may-tech-meet/">May tech meet</a> is on Thursday 20th at Imperial College.</p>
<p>This month&#8217;s speakers:</p>
<p><a href="http://www.rothamsted.bbsrc.ac.uk/Research/Centres/PersonDetails.php?PIID=5497">Catherine Canevet</a> &#8212; <a href="http://ondex.org/">Ondex</a>: Data integration and visualisation</p>
<p><a href="http://www3.imperial.ac.uk/theoreticalsystemsbiology/people/christopherbarnes">Christopher Barnes</a> &#8212; <a href="http://abc-sysbio.sourceforge.net/">ABC-SysBio</a>: Approximate Bayesian Computation in Python with GPU support</p>
<p>N. Purswani, L. Tweedy, Z. Patel, C. Suriel-Melchor &#8212; DASbrick: A cloud based Rich internet application for Synthetic Biology Parts Registries</p>
<p>Does anyone have a link for DASbrick?</p>
<p>Drinks afterwards at Imperial&#8217;s Eastside Bar. See <a href="http://biogeeks.wordpress.com/2010/05/07/may-tech-meet/">the BioGeeks blog</a> for full details.</p>
]]></content:encoded>
			<wfw:commentRss>http://biotext.org.uk/london-biogeeks-may-tech-meet-is-next-week/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>London BioGeeks &#8212; April Tech Meet</title>
		<link>http://biotext.org.uk/london-biogeeks-april-tech-meet/</link>
		<comments>http://biotext.org.uk/london-biogeeks-april-tech-meet/#comments</comments>
		<pubDate>Mon, 19 Apr 2010 10:59:31 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[bioinformatics]]></category>

		<guid isPermaLink="false">http://biotext.org.uk/?p=452</guid>
		<description><![CDATA[This month&#8217;s tech meet is at 6pm on 21st April at University College London. We have talks from&#8230; Alison Cuff, UCL The CATH database &#8212; Structural Diversity and the Question of the Fold Continuum Andrew Martin, UCL SAPTF &#8212; Sequence Analysis Plugin Tool Framework John Pinney, Imperial College GLASS &#8212; Gene LAyout by Semantic Similarity [...]]]></description>
			<content:encoded><![CDATA[<p><strong>This month&#8217;s tech meet is at 6pm on 21st April at University College London.</strong></p>
<p>We have talks from&#8230;</p>
<p><em>Alison Cuff, UCL</em></p>
<p>The CATH database &#8212; Structural Diversity and the Question of the Fold Continuum</p>
<p><em>Andrew Martin, UCL</em></p>
<p>SAPTF &#8212; Sequence Analysis Plugin Tool Framework</p>
<p><em>John Pinney, Imperial College</em></p>
<p>GLASS &#8212; Gene LAyout by Semantic Similarity</p>
<p>Followed by <strong>drinks at 7:30-ish</strong> at the <a href="http://www.yelp.co.uk/biz/college-arms-london">College Arms</a>.</p>
<p>Full details, maps, directions etc. are <a href="http://biogeeks.wordpress.com/2010/03/18/april-tech-meet/">on the BioGeeks blog</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://biotext.org.uk/london-biogeeks-april-tech-meet/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Solr presentation slides available</title>
		<link>http://biotext.org.uk/solr-presentation-slides-available/</link>
		<comments>http://biotext.org.uk/solr-presentation-slides-available/#comments</comments>
		<pubDate>Wed, 17 Feb 2010 14:01:08 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[biogeeks]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[git]]></category>
		<category><![CDATA[karyodas]]></category>
		<category><![CDATA[solr]]></category>

		<guid isPermaLink="false">http://biotext.org.uk/?p=432</guid>
		<description><![CDATA[Tomorrow I&#8217;m giving a London BioGeeks talk about Solr, the Lucene-based search engine we&#8217;re using at CATH, and soon Smesh too. The slides are available here (PDF, 500KB). If you&#8217;re in London, come along, everyone&#8217;s welcome. Details here. We also have Manuel Corpas on KaryoDAS, and Phil Dawes on Git. And beer afterwards.]]></description>
			<content:encoded><![CDATA[<p><a href="http://biogeeks.wordpress.com/2010/01/07/february-tech-meet/">Tomorrow</a> I&#8217;m giving a <a href="http://groups.google.com/group/londonbiogeeks">London BioGeeks</a> talk about <a href="http://lucene.apache.org/solr/">Solr</a>, the <a href="http://lucene.apache.org/">Lucene</a>-based search engine we&#8217;re using at <a href="http://www.cathdb.info/">CATH</a>, and soon <a href="http://smeshup.com/">Smesh</a> too.</p>
<p><a href="/static/biogeeks_solr_feb10.pdf">The slides are available here (PDF, 500KB).</a></p>
<p>If you&#8217;re in London, come along, everyone&#8217;s welcome. Details <a href="http://biogeeks.wordpress.com/2010/01/07/february-tech-meet/">here</a>. We also have <a href="http://manuelcorpas.com/">Manuel Corpas</a> on <a href="https://decipher.sanger.ac.uk/karyodas/display.html">KaryoDAS</a>, and <a href="http://www.phildawes.net/blog/">Phil Dawes</a> on <a href="http://git-scm.com/">Git</a>. And <a href="http://www.theprinceregentgloucesterroad.co.uk/">beer</a> afterwards.</p>
]]></content:encoded>
			<wfw:commentRss>http://biotext.org.uk/solr-presentation-slides-available/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>More live gigs!</title>
		<link>http://biotext.org.uk/more-live-gigs/</link>
		<comments>http://biotext.org.uk/more-live-gigs/#comments</comments>
		<pubDate>Fri, 26 Jun 2009 09:45:36 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[FuncNet]]></category>
		<category><![CDATA[webservices]]></category>

		<guid isPermaLink="false">http://biotext.org.uk/?p=377</guid>
		<description><![CDATA[I&#8217;ll also be running an interactive workshop on FuncNet at: The EMBRACE-ENFIN workshop on Expression, Interactions, and System Level Modeling Helsinki, 5th-6th October 2009]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll also be running an interactive workshop on <a href="http://funcnet.eu/">FuncNet</a> at:</p>
<p><a href="http://www.enfin.org/page.php?page=embrace_enfin">The EMBRACE-ENFIN workshop on Expression, Interactions, and System Level Modeling</a></p>
<p>Helsinki, 5th-6th October 2009</p>
]]></content:encoded>
			<wfw:commentRss>http://biotext.org.uk/more-live-gigs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Live gigs!</title>
		<link>http://biotext.org.uk/live-gigs/</link>
		<comments>http://biotext.org.uk/live-gigs/#comments</comments>
		<pubDate>Thu, 04 Jun 2009 17:52:33 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[FuncNet]]></category>
		<category><![CDATA[webservices]]></category>

		<guid isPermaLink="false">http://biotext.org.uk/?p=373</guid>
		<description><![CDATA[Couple of upcoming events I&#8217;ll be going to&#8230; 1. Data Integration in the Life Sciences (DILS 2009) in Manchester next month, with a poster and abstract about FuncNet. 2. EMBL-EBI/ENFIN 2009 annual forum for small-medium enterprises (SMEs), in Vienna in September, with a half-hour talk on the same subject. No ISMB for me this year, [...]]]></description>
			<content:encoded><![CDATA[<p>Couple of upcoming events I&#8217;ll be going to&#8230;</p>
<p>1. <a href="http://www.cs.manchester.ac.uk/DILS09/index.php">Data Integration in the Life Sciences</a> (DILS 2009) in Manchester next month, with a poster and abstract about <a href="http://funcnet.eu">FuncNet</a>.</p>
<p>2. <a href="http://www.ebi.ac.uk/industry/SME/">EMBL-EBI</a>/<a href="http://enfin.org/">ENFIN</a> <a href="http://www.enfin.org/page.php?page=sme_meeting_2009">2009 annual forum for small-medium enterprises</a> (SMEs), in Vienna in September, with a half-hour talk on the same subject.</p>
<p>No <a href="http://www.iscb.org/ismbeccb2009/">ISMB</a> for me this year, not economically justifiable without a speaking spot.</p>
<p>Andrew.</p>
]]></content:encoded>
			<wfw:commentRss>http://biotext.org.uk/live-gigs/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Bioinformatics in the pub. Free as in free beer&#8230;</title>
		<link>http://biotext.org.uk/bioinformatics-in-the-pub-free-as-in-free-beer/</link>
		<comments>http://biotext.org.uk/bioinformatics-in-the-pub-free-as-in-free-beer/#comments</comments>
		<pubDate>Thu, 14 May 2009 17:33:07 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Announcements]]></category>
		<category><![CDATA[Events]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[drinking]]></category>

		<guid isPermaLink="false">http://biotext.org.uk/?p=357</guid>
		<description><![CDATA[Pub meet-up for bioinformaticians / technophile biologists at: The Miller pub, near Guy’s Hospital (London Bridge) on Wednesday 27th May from 6pm onwards. This first meeting will just be a social event and a chance to chat to other bio-geeks but if there’s enough interest then we might organise some more technical events in the future (topic [...]]]></description>
			<content:encoded><![CDATA[<p>Pub meet-up for bioinformaticians / technophile biologists at:</p>
<p><em><a href="http://www.themiller.co.uk/pub/map.asp">The Miller pub</a>, near Guy’s Hospital (London Bridge) on Wednesday 27th May from 6pm onwards.</em></p>
<p>This first meeting will just be a social event and a chance to chat to other bio-geeks but if there’s enough interest then we might organise some more technical events in the future (topic suggestions welcome).</p>
<p>Feel free to tell anyone you think might be interested. If you want to come you can just turn up, but it would be helpful if you let me or <a href="http://www.cassj.co.uk/blog/?p=237">Cass</a> know you&#8217;re coming. She&#8217;s found a recruitment agency who are interested in sponsoring the event (probably in the form of beer and food) so it would be useful to have an idea of numbers.</p>
]]></content:encoded>
			<wfw:commentRss>http://biotext.org.uk/bioinformatics-in-the-pub-free-as-in-free-beer/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>SESL 2009 day two</title>
		<link>http://biotext.org.uk/sesl-2009-day-two/</link>
		<comments>http://biotext.org.uk/sesl-2009-day-two/#comments</comments>
		<pubDate>Tue, 31 Mar 2009 10:45:39 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[ontologies]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[SESL]]></category>
		<category><![CDATA[text_mining]]></category>

		<guid isPermaLink="false">http://biotext.org.uk/?p=312</guid>
		<description><![CDATA[Semantic Enrichment of the Scientific Literature 2009 Tue 31 Mar: &#8220;Semantic Enrichment of the literature for the benefit of all users&#8221; (Monday&#8217;s notes are here) Missed the early morning session. I don&#8217;t work in pharma any more so it didn&#8217;t seem worth a 5:45am wake-up (unhelpful train times). Although apparently Eric Neumann&#8217;s talk on linked [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ebi.ac.uk/Rebholz-srv/SESL/sesl.html">Semantic Enrichment of the Scientific Literature 2009</a></p>
<p><strong>Tue 31 Mar: &#8220;Semantic Enrichment of the literature for the benefit of all users&#8221;</strong></p>
<p>(Monday&#8217;s notes are <a href="http://biotext.org.uk/workshop-notes-sesl-2009/">here</a>)</p>
<p>Missed the early morning session. I don&#8217;t work in pharma any more so it didn&#8217;t seem worth a 5:45am wake-up (unhelpful train times). Although apparently Eric Neumann&#8217;s talk on linked data was good (&#8220;semantic web without the &#8216;semantic&#8217;&#8221; &#8212; <a href="http://duncan.hull.name/">Duncan</a>)</p>
<p>Alfonso Valencia &#8212; <a href="http://www.elixir-europe.org/">ELIXIR</a> &#8212; an EU project to upgrade Europe&#8217;s bioinformatics infrastructure. Includes a work package on literature integration &#8212; making lit. repositories, ontologies and traditional biological databases interoperate better. Good &#8212; too much text mining happens in isolation from the rest of the bioinformatics world. Targeted at wet-lab scientists not just computational people. Looks like it might include an effort to turn raw algorithms into usable tools/platforms. Still in the early phases.</p>
<p>He also discussed the <a href="http://biocreative.sourceforge.net/biocreative_2.html">BioCreative</a> project which has released various data sets and held challenges on several aspects of text mining. A spin-off from these is the <a href="http://bcms.bioinfo.cnio.es/">BioCreative MetaServer</a> which identifies genes and proteins mentioned in text by aggregating predictions from several prediction services.</p>
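<p>A minimal sketch of what aggregating predictions from several services could look like. It assumes simple vote counting, which is not necessarily how the MetaServer combines its inputs; the function name, the vote threshold and the example tagger outputs are all hypothetical.</p>

```python
from collections import Counter


def consensus_mentions(predictions, min_votes=2):
    """Keep mentions reported by at least min_votes of the services."""
    votes = Counter()
    for mentions in predictions:
        votes.update(set(mentions))  # each service votes once per mention
    return sorted(m for m, n in votes.items() if n >= min_votes)


# Hypothetical output of three gene/protein taggers run on the same abstract.
service_outputs = [
    ["BRCA1", "TP53"],
    ["BRCA1", "p53", "TP53"],
    ["BRCA1", "insulin"],
]
print(consensus_mentions(service_outputs))
```

<p>Raising <code>min_votes</code> trades recall for precision: fewer mentions survive, but each has more services behind it.</p>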
<p>Dietrich Rebholz-Schuhmann &#8212; <a href="http://ukpmc.ac.uk/">UKPMC</a> &#8212; a UK mirror of PubMed Central (with added value apparently) co-ordinated by the British Library. Working on information retrieval and data integration improvements. Sounds like the funding bodies are getting involved, many referring specifically to UKPMC in their open access policies. Paying for OA journal submissions is an issue. Apparently the Wellcome Trust have an OA fund which is under-utilized.</p>
<p>Also, <a href="http://www.ebi.ac.uk/Rebholz-srv/CALBC/">CALBC</a> &#8212; a project to semantically annotate a large biomedical corpus (named entities only?) by getting a consensus annotation from iteratively integrating the output of various information extraction systems, and then manually cleaning up the disagreements.</p>
<p>Stefano Bertolo (EU) &#8212; funding calls &#8212; deadline 3rd November&#8230;</p>
<p>(Great analogy: Human history has entered a phase where we can produce information by machine quicker than we can interpret it. What we need is &#8216;cognitive levers&#8217;.)</p>
<p><a href="http://cordis.europa.eu/fp7/ict/content-knowledge/">7th framework, SO 4.3, Call 5</a>, themes:</p>
<ul>
<li>Capturing tractable information</li>
<li>Delivering pertinent information</li>
<li>Collaboration and decision support</li>
<li>Personal sphere</li>
<li>Impact and science &#038; tech leadership</li>
</ul>
<p>They all sound a bit vague and buzzwordy without the explanations&#8230;</p>
<p>Key themes: large data sets and (close to) real-time processing. Requirement for robust, strongly-tested tools that can be distributed &#8212; not just &#8216;only works on the PC of the postdoc that wrote it, on a good day&#8217; :-)</p>
<p>Informal queries about proposals: infso-e2 at ec.europa.eu</p>
<p>Lunch! Then&#8230;</p>
<p>Keynote from UMLS guru Olivier Bodenreider on normalizing terms/concepts across different lexical/taxonomical/ontological resources. Lexical vs. semantic approaches &#8212; e.g. string munging vs. traversing known relationships. Latter complicated by fact that some pairs of concepts are synonymous in one resource and hyponymous in another. Also, semantic similarity &#8212; lowest common subsumer/definition by extension, e.g. famous Resnik measure.</p>
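<p>A rough sketch of the Resnik idea, not taken from the talk: the similarity of two concepts is the information content of their most informative common subsumer. The toy taxonomy, the probabilities and the choice of log base 2 are all made up for illustration.</p>

```python
import math

# Toy is-a taxonomy (child -> parent) and concept probabilities.
# Both are invented; real values come from an ontology and corpus counts.
PARENT = {"kinase": "enzyme", "protease": "enzyme", "enzyme": "protein",
          "receptor": "protein", "protein": None}
PROB = {"protein": 1.0, "enzyme": 0.5, "receptor": 0.25,
        "kinase": 0.125, "protease": 0.125}


def ancestors(concept):
    """Return the concept plus everything above it in the taxonomy."""
    found = []
    while concept is not None:
        found.append(concept)
        concept = PARENT[concept]
    return found


def resnik(a, b):
    """Information content, log2(1/p), of the most informative common subsumer."""
    shared = set(ancestors(a)).intersection(ancestors(b))
    return max(math.log2(1 / PROB[c]) for c in shared)


print(resnik("kinase", "protease"))  # most informative shared subsumer is 'enzyme'
print(resnik("kinase", "receptor"))  # only the root 'protein' is shared
```

<p>Concepts that only meet at the root score zero, because the root subsumes everything and so carries no information.</p>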
<p>Also mentioned <a href="http://bioportal.bioontology.org/">BioPortal</a>, not sure exactly how this differs from the UMLS in scope, probably more biological than medical? Must be overlap though.</p>
<p>These are forming a key part of CALBC (see above).</p>
<p>Sophia Ananiadou from <a href="http://www.nactem.ac.uk/">NaCTeM</a> &#8211; NLP view of semantic enrichment: terms and named entities &#8212; concepts &#8212; events and relationships. Termine and Acromine &#8212; extraction of terms and acronyms. <a href="http://www.biomedcentral.com/1471-2105/9/S11/S8">Accelerated annotation</a> methods &#8212; cunning. More on the importance of building proper tools rather than just prototypes/in-house algorithms. Glad the NLP scene is catching on to this. Hopefully they allow querying by unique accession rather than just names &#8212; this is another area where the NLP people don&#8217;t always understand what the bio people need.</p>
<p>She discussed some of NaCTeM&#8217;s flagship tools like MEDIE, FACTA and KLEIO &#8212; it does look like they&#8217;re starting to take all the pain out of text mining, by doing the difficult bits for us, so we can use the results to do actual mining. Also they are offering web service interfaces (&#8216;overdue&#8217; for some of them according to Sophia) &#8212; excellent news.</p>
<p>More from Udo &#8212; what do we mean by &#8216;semantics&#8217;? Mixed-bag talk. Problems with folksonomies/tag clouds, e.g. Flickr: &#8220;newyork&#8221; &#8220;newyorkcity&#8221; &#8220;nyc&#8221; &#8220;new&#8221; &#8220;york&#8221;. Biomedical lexicon an order of magnitude bigger than general English lexicon (based on Wordnet and typical competent speaker). Wow. Domain dictionaries like GO/UMLS: these inherit some of the problems of natural language because the terms themselves are stated/defined in natural language! Also often ontological relations are vague/underspecified/changing.</p>
<p>Last session&#8230;</p>
<p>Anita de Waard (Elsevier) &#8212; <em>FEBS Letters</em> structural digital abstracts experiment (author-provided PPI annotation). 75% author compliance, avg 1 hour per abstract. They&#8217;ve moved responsibility to the MINT curators instead of the authors, to increase compliance and efficiency! What does that tell us&#8230; Also mentioned <a href="http://www.okkam.org/">OKKAM</a> &#8212; a consortium trying to provide a UID for <em>every single entity on the web</em>. Umm&#8230; Holy crap. So far, 1.5 million entities covered, so they have a bit of a way to go, to say the least. She went on to discuss some aspects of discourse analysis of scientific text. Interesting point: hedging gets eroded by citation, so &#8220;these results suggest that&#8221; becomes &#8220;author X shows that&#8221;, which then becomes just a cited fact.</p>
<p>She also discussed the Elsevier Grand Challenge &#8212; what&#8217;s the most interesting thing you can do with half a million full text articles? Finalists have been chosen, the winners will be announced next month. Next year: Future of Research Communication conference on same themes, probably March at Harvard.</p>
<p>EU-ADR (Erik van Mulligen) &#8212; federated data mining/text mining/epidemiological analysis to discover &#038; monitor novel adverse drug reactions. Five-year pan-European project. Sounds like an enormous piece of work with lots of engineering challenges &#8212; anonymization etc. for a start.</p>
<p><a href="http://www.wikigenes.org/">WikiGenes</a> (Robert Hoffmann) &#8212; a wiki for genes, chemicals, MeSH terms obviously &#8212; but pre-seeded with sentences yanked from <a href="http://www.ihop-net.org/UniPub/iHOP/">iHOP</a>. So experts can step in and add/fix stuff but without the momentum barrier to getting started. &#8216;Narcissistic drive&#8217; for authors of missed papers to add their own &#8212; cunning. Custom engine based on Apache Cocoon and Lucene. Authorship tracking down to individual strings of text, and it&#8217;s easy to view this information. The idea is that scientists will want to add their own work and get credit for it.</p>
<p>He makes the point that this is in many ways a much better way to publish biological information than several thousand different journals, and gives much better influence metrics than impact factors and H-index etc.</p>
<p>Is it in direct competition with WikiProteins? Not according to Robert &#8212; that&#8217;s more about knowledge engineering and formal semantic relationships, machine-readable stuff, whereas this is more supposed to be a modern publishing medium for human-readable information. Which hopefully the biologists will take to more readily.</p>
]]></content:encoded>
			<wfw:commentRss>http://biotext.org.uk/sesl-2009-day-two/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Workshop notes &#8212; SESL 2009</title>
		<link>http://biotext.org.uk/workshop-notes-sesl-2009/</link>
		<comments>http://biotext.org.uk/workshop-notes-sesl-2009/#comments</comments>
		<pubDate>Mon, 30 Mar 2009 12:56:17 +0000</pubDate>
		<dc:creator>Andrew</dc:creator>
				<category><![CDATA[Events]]></category>
		<category><![CDATA[bioinformatics]]></category>
		<category><![CDATA[ontologies]]></category>
		<category><![CDATA[publishing]]></category>
		<category><![CDATA[SESL]]></category>
		<category><![CDATA[text_mining]]></category>

		<guid isPermaLink="false">http://biotext.org.uk/?p=288</guid>
		<description><![CDATA[Semantic Enrichment of the Scientific Literature 2009 Monday 30 Mar: &#8220;Reliable factual data from the literature based on ontological resources&#8221; Highlight of the morning session was Junichi Tsujii&#8217;s demo of the PathText system, which integrates manually-curated pathway information in CellDesigner or SBML format with text-mined relationships, and lets you browse the pathway maps and drill [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.ebi.ac.uk/Rebholz-srv/SESL/sesl.html">Semantic Enrichment of the Scientific Literature 2009</a></p>
<p><strong>Monday 30 Mar: &#8220;Reliable factual data from the literature based on ontological resources&#8221;</strong></p>
<p>Highlight of the morning session was Junichi Tsujii&#8217;s demo of the PathText system, which integrates manually-curated pathway information in CellDesigner or SBML format with text-mined relationships, and lets you browse the pathway maps and drill through to related literature.</p>
<p>It&#8217;s not finished yet but there&#8217;s a preview video available from <a href="http://www.nactem.ac.uk/pathtext/">NaCTeM</a>.</p>
<p>Also a bit of a preview of the <a href="http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/SharedTask/">BioNLP 2009 Shared Task</a> on extracting biomolecular events from text into semantic networks &#8212; which I&#8217;m reviewing entries for at the moment.</p>
<p>Lots of material about gene regulation today. An intro to the <a href="http://www.ebi.ac.uk/Rebholz-srv/GRO/GRO.html">Gene Regulation Ontology</a> (Jung-Jae Kim), a couple of talks about extracting regulatory events from free text (Kim and Udo Hahn), and the <a href="http://www.oreganno.org/oregano/Index.jsp">ORegAnno</a> project which is using text mining to support its manual curation of regulatory events (Stephen Montgomery). The new(ish) GeneReg corpus will be useful to anyone building systems like this, as would the BioNLP 2009 data; I&#8217;ll find out whether it is available to non-entrants.</p>
<p>Also a talk about populating/extending ontologies automatically from clinical reports (Wendy Chapman). Patterns like &#8220;NOUN_PHRASE_1, such as NOUN_PHRASE_2&#8221;. Simple and effective.</p>
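<p>A minimal sketch of matching the &#8220;such as&#8221; pattern, assuming a crude regex stand-in for noun phrases (a real system would presumably take them from a chunker or parser). The function name and the example sentence are hypothetical, not from the talk.</p>

```python
import re

# Crude stand-in for a noun phrase: one to three lowercase words.
# Assumption: mid-sentence this over-captures (verbs, determiners), so a real
# system would use proper noun-phrase chunking instead.
NP = r"[a-z]+(?: [a-z]+){0,2}"

# "NOUN_PHRASE_1, such as NOUN_PHRASE_2": phrase 2 is read as a kind of phrase 1.
PATTERN = re.compile(rf"({NP}), such as ({NP})")


def extract_hyponym_pairs(text):
    """Return (hypernym, hyponym) pairs found by the 'such as' pattern."""
    return [(m.group(1), m.group(2)) for m in PATTERN.finditer(text.lower())]


print(extract_hyponym_pairs("Respiratory symptoms, such as cough, were noted."))
```

<p>Each pair is a candidate is-a link that could be proposed as an ontology extension and checked by a curator.</p>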
<p>Back from lunch&#8230; And sitting at the front so I can hear better. Hence better notes!</p>
<p>Simonetta Montemagni just gave an excellent introduction to the BioLexicon project (also NaCTeM-related) which is essentially a huge database of biomedical/biological language, including such things as domain-specific syntax and semantic metadata, dead useful for text mining developers. Also contains a thesaurus of gene and protein names (inc. synonyms and variants) with links back to UniProt IDs which makes it much more useful for general bioinformatics use.</p>
<p>It isn&#8217;t all available yet, and will be published via a linguistic data provider, bit vague about licensing! So it may or may not be free (I&#8217;m guessing free for academic use, commercial for other uses).</p>
<p>Lots of data and tools for natural language processing from Udo Hahn&#8217;s group: <a href="http://www.julielab.de/">http://www.julielab.de/</a> &#8230; Plus some war stories about the difference in information extraction accuracy between &#8216;lab&#8217; tests and real world performance, e.g. from ~60% (close to human levels) to ~20% F-score. Ouch&#8230; But we&#8217;ve all been there. (See also note about GeneReg above)</p>
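<p>For reference, a small sketch of how an F-score is computed from extraction counts. It assumes the balanced F1 variant (the notes do not say which F-score was reported), and the function name is just illustrative.</p>

```python
def f_score(true_positives, false_positives, false_negatives):
    """Balanced F-score (F1): harmonic mean of precision and recall."""
    if true_positives == 0:
        return 0.0  # nothing correct was extracted
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 1 correct extraction, 1 spurious, 1 missed.
print(f_score(1, 1, 1))
```

<p>Because it is a harmonic mean, the score is pulled towards whichever of precision and recall is lower.</p>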
<p>Su Jian talked about designing evaluation tasks for genomic information retrieval (i.e. search engine) algorithms, and improving said algorithms with dedicated gene/protein name recognizers. Bit specialized for me &#8212; lots of score functions I didn&#8217;t know the definitions of&#8230;</p>
<p>Quick coffee break!</p>
<p>Nice talk about the <a href="http://www.ebi.ac.uk/microarray-srv/efo/">Experimental Factor Ontology</a> from the <a href="http://www.ebi.ac.uk/microarray-as/ae/">ArrayExpress</a> project (James Malone). This is for classifying experimental conditions in microarray experiments. They&#8217;ve gone to a lot of trouble to link their ontology into others as painlessly as possible, and have developed autonomous agents to trawl the semantic web for other ontologies that may be related, and to alert them when ontologies they link to change, as this might imply a link is no longer true. Cute. The EFO has also allowed them to offer federated queries with other databases, and they use it for sanity-checking the data people submit via reasoning rules &#8212; e.g. cardiovascular disease can&#8217;t occur in hair follicle cells.</p>
<p>UCSD (Lynn Fink) have written a very nice <a href="http://www.codeplex.com/UCSDBioLit">plugin for Word 2007</a> that watches your text as you type and automatically tags biomolecular database identifiers and terms from OBO ontologies when they appear &#8212; with the option to add/edit/remove/override manually of course, and the tags being preserved in Word&#8217;s XML files. Kind of like a spellchecker/thesaurus for semantic markup. I&#8217;m not a fan of word processors (give me LaTeX any day) but this is an excellent idea. Hopefully publishers and curators will be able to parse useful metadata out of the resulting files.</p>
<p>Some similar ideas from <a href="http://wwwdev.ebi.ac.uk/tc-test/textmining/PublicationValidator/">PaperMaker</a> (Piotr Pezik) which also does semantic tagging, along with things like spotting missing references, acronyms that haven&#8217;t been defined, and genes/proteins that have been referred to by non-recommended identifiers. It can also trawl PubMed for similar publications, at the whole-doc or paragraph level. Throws in spell checking, word count etc. Neat work, but I&#8217;m not entirely sure who it&#8217;s aimed at &#8212; biologists would surely prefer this to be a Word plugin like the previous one.</p>
<p><strong>More <a href="http://biotext.org.uk/sesl-2009-day-two/">tomorrow</a>.</strong></p>
<p>Going back over notes and adding links as I have time.</p>
]]></content:encoded>
			<wfw:commentRss>http://biotext.org.uk/workshop-notes-sesl-2009/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

