I am Head of Data Science at Lumi, Semion Ltd (www.lumi.do), a startup by the founders of last.fm that provides personalised recommendations of Web content to users based on interests discovered from their browsing history or Twitter streams. Before that, I worked as a research consultant at Microsoft Bing and at Microsoft Research. My background is in computer science, with over 9 years of research experience in information retrieval (IR). My work focuses on user-oriented aspects of IR and personal information management (PIM), with influences from HCI. My research interests include recommender systems and their offline/online evaluation, crowdsourcing, IR evaluation, social IR, information seeking behaviour, activity-based PIM, book search and personal digital libraries. My current focus is on crowdsourcing and search engine evaluation.
I have been one of the founders and organisers of the INEX Book Track since 2007 and of the TREC Crowdsourcing Track since 2011. I am currently working on a book, “Crowdsourcing for Search Engine Evaluation”, with Omar Alonso and Stefano Mizzaro.
I hold a PhD in IR from Queen Mary University of London. My PhD work covered the evaluation of focused information retrieval approaches. I have published over 90 papers and organised several workshops and an IR conference.
Recommender systems and their evaluation: I am heading a team of data scientists working with recommendation engineers to evaluate and improve the user experience at www.lumi.do, a personalised recommendation engine that lets you discover interesting, relevant or trending content on the Web that you may not have found otherwise. Try it out for yourself and send us your feedback!
Crowdsourcing Search Relevance: My research in crowdsourcing, in the context of relevance data gathering, focuses on developing methods and metrics to measure the influence of task design decisions on the output quality of crowdsourcing engagements, and on the human factors that characterise the crowds. I work on methods for spam worker detection and for analysing bias and noise in the resulting labels, using gold-set data derived from search engine click logs as well as behavioural observations such as personality traits or mouse movement logging.
IR Evaluation Measures: This line of research builds on my PhD and includes the development of algorithms and methods for evaluating the effectiveness of search systems, taking into account users' browsing behaviour.
Social Information Retrieval: My work in this area focuses on models that incorporate notions of trust, reputation, authoritativeness and popularity to aid personalised retrieval or recommender systems.
Book Search: This work concerns the development of algorithms and systems for domain-specific searching and browsing of collections of digitized books (see www.booksearch.org.uk), as well as associated user studies investigating users' post-query browsing behaviour.
Research Desktop: Designing and developing technologies to aid the everyday work of knowledge workers, with a focus on four key areas: 1) support for activity-based computing, 2) pervasive research tools, 3) library, and 4) notes. See also the Colletta project.
ScholarLynk: Design and development of a cloud architecture and desktop client prototype supporting the collaborative use and sharing of scholarly search results through reading lists.
INEX Book Track: I have been founder and organiser of the Book Track at the INEX evaluation initiative since 2007. The track investigates techniques to support users in searching and navigating the full texts of digitized books and complementary social media. My current research focuses on methodology and systems for evaluating book search engines and on crowdsourcing relevance judgements on parts of books. In this context, I developed a crowdsourcing system for collecting relevance judgements for digitized books as part of a social game.
TREC Crowdsourcing Track: I am co-founder and co-organiser of the TREC Crowdsourcing Track with Matthew Lease, Panagiotis G. Ipeirotis and Mark D. Smucker. The track investigates crowdsourcing techniques for IR evaluation across a range of media types (textual documents, images, web pages) and search tasks.