Sunday, November 30, 2008

The TREC 2008 Blog track workshop

We just came back from Gaithersburg a few days ago. It was a nice (and cold!) week at the TREC 2008 conference. Besides presenting the main results of our participation in the Blog, Enterprise, and Relevance Feedback tracks, we had fruitful discussions at the Blog track workshop regarding the directions of the track for 2009.

There was a consensus among the attendees that opinion retrieval and polarity detection are still open, relevant problems. While a few groups deployed interesting techniques that achieved consistent opinion retrieval performance across several strongly performing baselines in the track this year, polarity detection approaches looked rather naive. It was suggested that polarity detection be investigated at a finer granularity (e.g., at the sentence rather than the document level). This, however, could result in crossing into the territory of the TAC conference.

Nonetheless, believing that, after three years, the Blog track has contributed a comprehensive experimental setting for those who wish to continue investigating these search scenarios, the organisers decided to discontinue the opinion finding and polarity tasks, at least in their current format. Instead, they propose to investigate the opinionated nature of blogs as one of many interesting facets of a broader search task. This task extends the current blog distillation task by moving beyond topic relevance and introducing additional requirements that a "good" blog must meet: it should show a recurrent interest in a given topic and also fulfil a set of predefined "facets". This way, for instance, one could search for humorous blogs about the government, or opinionated blogs about whisky.

Besides this faceted blog distillation task, the workshop attendees considered a second task relevant and worth investigating, namely, tracking stories in the blogosphere. The aim is to investigate how stories emerge and evolve over the time frame of the blog corpus. It was also noted that this task could be linked to a news search task so as to draw a connection between stories published in the blogosphere and in the mainstream media.

As pointed out, however, the 11-week time frame of the Blogs06 collection does not adequately support the story tracking task. Furthermore, a more representative sample of the blogosphere is an important step towards addressing blog search as a social media problem. For these reasons, a new corpus with a much larger size and a longer time frame will be used in 2009.

For those who did not attend the Blog track workshop at TREC, please feel free to post your comments about the proposed tasks for 2009.

Hope you all join us in the TREC 2009 Blog track!

Saturday, November 15, 2008

TREC 2008

Shortly, we will be travelling to attend the TREC 2008 conference in Gaithersburg, Maryland (18-21 November 2008). We have been very busy analysing the sheer volume of data that was collected in the Blog track this year. Indeed, this year we ran a very large-scale experiment with the aim of gaining a better understanding of the most effective and stable opinion-finding techniques. Moreover, we also tightened up the blog distillation task (feed search task), so that it truly runs as a distillation task. Following the traditional TREC conference cycle, the Blog track 2008 results will first be presented to the TREC 2008 participating groups next week. They will then be made available to all interested parties around February 2009, when the TREC 2008 final Proceedings go online.

Plans for the TREC 2009 Blog track will be discussed and refined during the TREC Blog track workshop in the afternoon of Thursday 20th November.

In addition to our involvement in the organisation of the Blog track, we will be giving a presentation on the work we did this year in the newly introduced Relevance Feedback track. We have also prepared two posters summarising our results in the Enterprise and Blog tracks.

It looks like we are set for a very exciting and busy week. We hope to see many of you at TREC.

Monday, November 10, 2008


We are continuing to organise events in Glasgow. After the ESSIR2007 summer school and the ECIR2008 conference, we will be organising the second Practical Semantic Astronomy Workshop (SEMAST 2009) from 2nd to 5th March 2009.

Practical Semantic Astronomy 2009 is the second in a series of workshops first held at Caltech in February 2008. The workshop brings together experts from a broad range of disciplines using semantic technologies, alongside practitioners experimenting with these techniques, to address current problems in astroinformatics.

Our involvement in the organisation of this workshop is under the auspices of the Explicator project, where we have been working with astronomers and physicists on developing techniques to provide intelligent access to multiple sources. The Explicator project supports the efforts of the Virtual Observatory community.

The Virtual Observatory is a loose planet-wide collaboration of astronomy computing projects, aiming to make the high-volume and rich data of astronomy available. Although astronomical data is generally well described, it is widely dispersed, which creates a substantial data-discovery problem and makes it fertile ground for the kinds of semantic approaches applied with such success in other disciplines.

The Explicator project aims to bridge the gap between information retrieval and semantic web technologies in a domain-specific application. The SEMAST 2009 workshop is a continuation of this effort. We hope to see many of you in Glasgow.