International Workshop on Data Search

July 12th, 2018, Ann Arbor, Michigan, USA. Co-located with SIGIR 2018.


We are excited to announce Professor Krisztian Balog as our keynote speaker.
His talk is titled "Table Retrieval and Generation".


As more and more data becomes available on the web, searching for it becomes an increasingly important, timely topic. The web hosts a whole range of new data species, published in structured and semi-structured formats - from web markup and web tables to open government data portals, knowledge bases such as Wikidata and scientific data repositories. Just like any other resource on the web, data benefits from network effects - it becomes more useful, and creates more value, when it is discoverable.

The opportunities to share and establish links between different perspectives on search and discovery for different kinds of data are significant and can inform the design of a wide range of information retrieval technologies, including search engines, recommender systems and conversational agents. We will seek contributions and encourage interactions to discuss how principles, techniques and experiences could be applied across research fields that so far have mostly pursued related data search questions in isolation. We see a large space for discussion and future research in the development of federated data discovery and search technologies that leverage the most recent advances in information retrieval, Semantic Web and databases, and are mindful of human factors.

The aim of the workshop is to be a venue to present and exchange ideas and experiences for discovering and searching all types of structured or semi-structured datasets and to discuss how concepts and lessons learned from academic search, entity search, digital libraries, and web search could be transferred to data search scenarios. We want to facilitate a discussion around data search across formats and domain-specific applications. We envision the workshop as a forum for researchers and practitioners from various disciplines to come together and discuss common challenges and identify synergies for joint initiatives.

DATA:SEARCH’18 is not meant to be a mini-conference, but a highly interactive event.

Important Dates

Workshop papers due: May 11, 2018 (extended from May 4, 2018)
Workshop paper notifications: May 25, 2018
Camera-ready deadline for workshop papers: June 8, 2018
Workshop Day: July 12, 2018

Topics of Interest

  • Analyzing behavioral traces during data search
  • Approaches to personalization and contextualization in dataset search
  • Data indexing and profiling approaches
  • Data summarization
  • Dataset representation for retrieval (standards, models, workarounds)
  • Decentralized and distributed architectures and algorithms in data search
  • Deep linking of datasets
  • Entity recognition in datasets
  • Evaluation of dataset search tools and algorithms
  • Fusing, cleaning, ranking and refining dataset search results
  • Information seeking behavior for data (interactive data retrieval)
  • Learning to rank for data search
  • Query routing taking into account relevance, quality and profiles of distributed datasets
  • Retrieval models for data search
  • Scalability and performance of distributed data queries
  • Search results presentation for datasets
  • Semantic dataset search
  • Systems and user studies in data search in vertical domains, including transport, geospatial data, science, weather etc.
  • Usability of data portals and data discovery tools
  • User modeling for data search
  • Visual and speech interfaces to datasets

Submission Guidelines

We encourage short papers (4 pages), position papers (2 pages) as well as demo submissions (1 page plus online demo).

Submissions of workshop papers must be in English, in PDF format, and should not exceed the appropriate length requirements in the current ACM two-column conference format. Submissions must describe work that is not previously published, not accepted for publication elsewhere, and not currently under review elsewhere.

We will follow a single-blind process with at least two reviewers per paper. Papers will be evaluated according to their significance, originality, technical content, style, clarity, relevance to the workshop, and likelihood of generating discussion. All papers are to be submitted via EasyChair.

Tentative Schedule

Time Programme
1:30-2:30 Introduction + Keynote:
Krisztian Balog - Table Retrieval and Generation
2:35-3:00 Paper presentation:
Recognizing Quantity Names for Tabular Data
by Yang Yi, Zhiyu Chen, Jeff Heflin and Brian Davison
3:00-3:30 Coffee break
3:30-4:30 Lightning talks:
Philipp Mayr - Searching beyond datasets in the Social Sciences
Emilia Kacprzak - Discussing data search queries
Brian Davison - Searching for datasets
Jamie Callan - Scientific table search using keyword queries
'data search versus document search'
4:30-5:00 Summary of discussion & call to action

Organizers

Paul Groth, Elsevier Labs

Laura Koesten, The Open Data Institute

Philipp Mayr, GESIS Leibniz-Institute for the Social Sciences

Maarten de Rijke, University of Amsterdam

Elena Simperl, University of Southampton

Program Committee

  • Alexander Kotov
  • Arjen de Vries
  • Arno Scharl
  • Axel Polleres
  • Eva Méndez
  • Kuansan Wang
  • Laura Dietz
  • Michael Gubanov
  • Peter Haase
  • Steffen Lohmann


For questions about the workshop contact: