Gabriella Kazai

I am Head of Data Science at Lumi (Semion Ltd), a startup that provides personalised recommendations of Web content to users, based on interests discovered from their browsing history or Twitter streams. Before that, I worked as a research consultant at Microsoft Bing and at Microsoft Research. My background is in computer science, with over nine years of research experience in information retrieval (IR). My work focuses on user-oriented aspects of IR and personal information management (PIM), with influences from HCI. My research interests include recommender systems and their offline/online evaluation, crowdsourcing, IR evaluation, social IR, information seeking behaviour, activity-based PIM, book search and personal digital libraries. My current focus is on crowdsourcing and search engine evaluation.

I am one of the founders and organisers of the INEX Book Track (since 2007) and the TREC Crowdsourcing Track (since 2011). I am currently working on the book “Crowdsourcing for Search Engine Evaluation” with Omar Alonso and Stefano Mizzaro.

I hold a PhD in IR from Queen Mary University of London. My PhD work covered the evaluation of focused information retrieval approaches. I have published over 90 papers and organised several workshops and an IR conference.

Recent professional activities

  • PC Chair of ECIR 2015
  • Organiser of GamifIR at ECIR 2014
  • SIGIR 2014 Workshops Chair and Conference Area Chair
  • Organiser of the TREC 2013 Crowdsourcing Track
  • Keynote talks at IIiX 2014, DIR 2011, the SIGIR 2011 Crowdsourcing for IR workshop, and KKNTPD 2010
  • Numerous invited talks, e.g., at CWI (2014), MSR PhD Summer Schools, UCL, City University, the Glasgow IR group, and Search Solutions 2011
  • I serve as a (Senior) PC member for several conferences, including SIGIR, CIKM, CHI, HCOMP, ECIR and WI

Current projects

  • Recommender systems and their evaluation: I head a team of data scientists working with recommendation engineers to evaluate and improve the user experience of Lumi, a personalised recommendation engine that lets you discover interesting, relevant or trending content on the Web that you may not have found otherwise. Try it out for yourself and send us your feedback!

  • Crowdsourcing Search Relevance: My research in crowdsourcing, in the context of relevance data gathering, focuses on developing methods and metrics to measure the influence of task design decisions on the output quality of crowdsourcing engagements, and on the human factors that characterize the crowds. I work on methods for spam-worker detection and for analysing bias and noise in the resulting labels, using gold-set data derived from search engine click logs as well as behavioural observations, e.g., personality traits or mouse-movement logging.

  • IR Evaluation Measures: This line of research builds on my PhD and includes the development of algorithms and methods for evaluating the effectiveness of search systems, taking into account the user's browsing behaviour.

  • Social Information Retrieval: My work in this area focuses on models incorporating the notions of trust and reputation, authoritativeness and popularity to aid personalised retrieval or recommender systems.

  • Book Search: Concerns the development of algorithms and systems for the domain-specific searching and browsing of collections of digitized books, as well as associated user studies investigating users’ post-query browsing behaviour.

  • Research Desktop: Designing and developing technologies to aid the everyday work of knowledge workers, with focus on four key areas: 1) Support for activity based computing, 2) Pervasive research tools, 3) Library, and 4) Notes. See also Colletta project.

  • ScholarLynk: Design and development of a Cloud architecture and desktop client prototype supporting the collaborative use and sharing of scholarly search results through reading lists.

  • INEX Book Track: I have been the founder and an organiser of the Book Track at the INEX evaluation initiative since 2007; the track investigates techniques to support users in searching and navigating the full texts of digitized books and complementary social media. My current research focuses on methodology and systems for evaluating book search engines and on crowdsourcing relevance judgements for parts of books. In this context, I developed a crowdsourcing system that collects relevance judgements for digitized books as part of a social game.

  • TREC Crowdsourcing Track: I am co-founder and co-organiser of the TREC Crowdsourcing Track with Matthew Lease, Panagiotis G. Ipeirotis and Mark D. Smucker. The track investigates crowdsourcing techniques for IR evaluation for a range of media types and search tasks: textual documents, images, web pages.
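
The label aggregation and spam-worker screening described under Crowdsourcing Search Relevance can be illustrated with a minimal sketch. This is purely illustrative: the data, function names and gold-set values are assumptions, not the actual methods or datasets from this work, which use far richer models (see, e.g., the Bayesian aggregation papers below).

```python
from collections import Counter, defaultdict

def majority_vote(worker_labels):
    """Aggregate each item's worker labels by simple majority vote."""
    votes = defaultdict(list)
    for answers in worker_labels.values():
        for item, label in answers.items():
            votes[item].append(label)
    return {item: Counter(labels).most_common(1)[0][0]
            for item, labels in votes.items()}

def gold_set_accuracy(worker_labels, gold):
    """Score each worker against gold-set items; persistently low
    accuracy is one simple signal of a spam or careless worker."""
    scores = {}
    for worker, answers in worker_labels.items():
        judged = {i: l for i, l in answers.items() if i in gold}
        if judged:
            correct = sum(l == gold[i] for i, l in judged.items())
            scores[worker] = correct / len(judged)
    return scores

# Hypothetical labels from three workers over three documents.
labels = {
    "w1": {"d1": "rel", "d2": "nonrel", "d3": "rel"},
    "w2": {"d1": "rel", "d2": "nonrel", "d3": "nonrel"},
    "w3": {"d1": "nonrel", "d2": "rel", "d3": "nonrel"},
}
gold = {"d1": "rel", "d2": "nonrel"}  # e.g. derived from click logs

consensus = majority_vote(labels)
accuracy = gold_set_accuracy(labels, gold)  # w3 scores 0.0 on the gold set
```

In practice, majority voting is only a baseline: weighting votes by estimated worker accuracy, as in the community-based Bayesian models cited below, typically yields more reliable consensus labels.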

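The browsing-aware evaluation measures mentioned under IR Evaluation Measures extend gain-discount metrics in the DCG family. As a rough, textbook-level illustration (plain nDCG with a log2 discount, not the extended or probabilistic-navigation measures developed in this work):

```python
import math

def dcg(gains):
    """Discounted cumulative gain: relevance found deeper in the
    ranking is discounted, modelling a user who is progressively
    less likely to browse further down the result list."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(gains))

def ndcg(gains):
    """Normalise by the DCG of the ideal (best possible) ordering."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0

perfect = ndcg([3, 2, 1, 0])  # ideal ordering scores 1.0
swapped = ndcg([0, 1, 2, 3])  # same documents, worst ordering
```

The choice of discount function is exactly where a user browsing model enters: replacing the fixed log2 discount with probabilities from an observed navigation model is the kind of refinement the work below pursues.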
Selected Publications

  • O. Alonso, G. Kazai, S. Mizzaro: Crowdsourcing for Search Engine Evaluation. Springer. (In preparation)
  • M. Venanzi, J. Guiver, G. Kazai, P. Kohli, M. Shokouhi: Community-Based Bayesian Aggregation Models for Crowdsourcing. WWW 2014.
  • G. Kazai: Dissimilarity based Query Prioritization for Efficient Preference based IR Evaluation. ECIR 2014.
  • M. Venanzi, J. Guiver, G. Kazai, P. Kohli: Bayesian Combination of Crowd-Based Tweet Sentiment Analysis Judgments. Crowdsourcing at Scale 2013. (Winner of the Shared Task Challenge)
  • G. Kazai, E. Yilmaz, N. Craswell, S.M.M. Tahaghoghi: User intent and assessor disagreement in web search evaluation. CIKM 2013: 699-708.
  • J. Kim, G. Kazai, I. Zitouni: Relevance Dimensions in Preference-based IR Evaluation. SIGIR 2013.
  • G. Kazai, J. Kamps, N. Milic-Frayling: Human Factors and Label Accuracy in Crowdsourcing Relevance Judgements. Information Retrieval Journal, Volume 16, Issue 2 , pp 138-178 , Springer, 2013.
  • M. Koolen, J. Kamps, G. Kazai: Social book search: comparing topical relevance judgements and book suggestions for evaluation. CIKM 2012.
  • M. Hosseini, I.J. Cox, N. Milic-Frayling, G. Kazai, V. Vinay: On Aggregating Labels from Multiple Crowd Workers to Infer Relevance of Documents. ECIR 2012.
  • E. Yilmaz, G. Kazai, N. Craswell, S.M.M. Tahaghoghi: On judgments obtained from a commercial search engine. SIGIR 2012.
  • G. Kazai, J. Kamps, N. Milic-Frayling: The face of quality in crowdsourcing relevance labels: demographics, personality and labeling accuracy. CIKM 2012.
  • G. Kazai, N. Craswell, E. Yilmaz, S.M.M. Tahaghoghi: An analysis of systematic judging errors in information retrieval. CIKM 2012.
  • G. Kazai, J. Kamps, M. Koolen, N. Milic-Frayling: Crowdsourcing for Book Search Evaluation: Impact of Quality on Comparative System Ranking. SIGIR 2011.
  • G. Kazai: In Search of Quality in Crowdsourcing for Search Engine Evaluation. ECIR 2011: 165-176.
  • S. Attfield, G. Kazai, M. Lalmas, B. Piwowarski: Towards a science of user engagement, WSDM Workshop on User Modelling for Web Applications 2011.
  • G. Kazai, J. Kamps, N. Milic-Frayling: Worker types and personality traits in crowdsourcing relevance labels. CIKM 2011: 1941-1944.
  • P. Bennett, E. Kamar, G. Kazai: MSRC at TREC 2011 Crowdsourcing Track. TREC 2011.
  • G. Kazai, P. Manghi, K. Iatropoulou, T. Haughton, M. Mikulicic, A. Lempesis, N. Milic-Frayling, N. Manola: Architecture for a Collaborative Research Environment Based on Reading List Sharing. ECDL 2010: 294-306.
  • G. Oleksik, M. Wilson, C. Tashman, E. Mendes Rodrigues, G. Kazai, N. Milic-Frayling, R. Jones: Lightweight Tagging Expands Information and Activity Management Practices. CHI 2009.
  • C.T. Lee, E. Mendes Rodrigues, G. Kazai, N. Milic-Frayling, A. Ignjatovic: Model for Voter Scoring and Best Answer Selection in Community Q&A Services. WI 2009.
  • G. Kazai, N. Milic-Frayling, J. Costello: Towards methods for the collective gathering and quality control of relevance assessments. SIGIR 2009.
  • G. Kazai, N. Milic-Frayling: Effects of Social Approval Votes on Search Performance. 6th Intl. Conf. on Information Technology: New Generations (ITNG’09), Social Computing Track, 2009.
  • M. Koolen, G. Kazai, N. Craswell: Wikipedia Pages as Entry Points for Book Search. WSDM 2009.
  • G. Kazai, N. Milic-Frayling: Trust, authority and popularity in social information retrieval. CIKM 2008: 1503–1504. (Best poster)
  • S. Ali, M. Consens, G. Kazai, M. Lalmas: Structural relevance: A common basis for the evaluation of structured document retrieval. CIKM 2008. (Best paper runner-up)
  • G. Kazai, B. Piwowarski, S. Robertson: Effort-precision and gain-recall based on a probabilistic navigation model. In Studies in Theory of Information Retrieval (Proceedings of ICTIR 2007), pp. 23–36, Foundation for Information Society, Budapest, 2007.
  • G. Kazai, M. Lalmas: eXtended cumulated gain measures for the evaluation of content-oriented XML retrieval. ACM Trans. Inf. Syst., vol. 24, no. 4, pp. 503-542, 2006.
  • G. Kazai, M. Lalmas, A.P. de Vries: The overlap problem in content-oriented XML retrieval evaluation. SIGIR 2004.
More publications can be found on my DBLP and Google Scholar profiles.