Top-k queries have been studied intensively in the database community and they are an important means to reduce query cost when only the "best" or "most interesting" results are needed instead of the full output. While some optimality results exist, e.g., the famous Threshold Algorithm, they hold only in a fairly limited model of computation that does not account for the cost incurred by large intermediate results and hence is not aligned with typical database-optimizer cost models. On the other hand, the idea of avoiding large intermediate results is arguably the main goal of recent work on optimal join algorithms, which uses the standard RAM model of computation to determine algorithm complexity. This research has created a lot of excitement due to its promise of reducing the time complexity of join queries with cycles, but it has mostly focused on full-output computation.

We argue that the two areas can and should be studied from a unified point of view in order to achieve optimality in the common model of computation for a very general class of top-k-style join queries. This tutorial has two main objectives. First, we will explore and contrast the main assumptions, concepts, and algorithmic achievements of the two research areas. Second, we will cover recent, as well as some older, approaches that emerged at the intersection to support efficient ranked enumeration of join-query results. These are related to classic work on k-shortest path algorithms and more general optimization problems, some of which dates back to the 1950s. We demonstrate that this line of research warrants renewed attention in the challenging context of ranked enumeration for general join queries.

Part 1: Top-k (pdf | pptx)

Top-k selection problem
Threshold Algorithm
Top-k join problem
J* algorithm
Discussion on cost models
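
As a rough illustration of the Threshold Algorithm listed above, the following is a minimal sketch and not the tutorial's own code. It assumes that every object appears in every score list, that a higher score is better, and that the aggregate is a plain sum; the function name `threshold_algorithm` and the input layout (one dict per list) are hypothetical choices made for this example.

```python
import heapq

def threshold_algorithm(lists, k):
    """Return the k objects with the largest summed score as (total, object) pairs.

    lists: one dict {object_id: score} per attribute (assumed layout).
    Assumes every object appears in every dict.
    """
    # Sorted access: each list in descending score order.
    sorted_lists = [
        sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        for scores in lists
    ]
    seen = set()
    top = []  # min-heap holding the best k (total, object) pairs seen so far
    for depth in range(len(sorted_lists[0])):
        frontier = []  # score last read from each list at this depth
        for entries in sorted_lists:
            obj, score = entries[depth]
            frontier.append(score)
            if obj not in seen:
                seen.add(obj)
                # Random access: look up the object's score in every list.
                total = sum(scores[obj] for scores in lists)
                if len(top) < k:
                    heapq.heappush(top, (total, obj))
                elif total > top[0][0]:
                    heapq.heapreplace(top, (total, obj))
        # No unseen object can have a total above this threshold.
        threshold = sum(frontier)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)

example_lists = [{"a": 9, "b": 8, "c": 1}, {"a": 2, "b": 7, "c": 9}]
top2 = threshold_algorithm(example_lists, 2)
```

The sketch counts only sorted and random accesses, which is the limited cost model the abstract refers to; it says nothing about the intermediate results a join would produce.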

Part 2: Optimal Join Algorithms (pdf | pptx)

Lower Bound and the Yannakakis Algorithm
Problems Caused by Cycles
Tree Decompositions
Summary and Further Reading

Part 3: Ranked Enumeration over Joins (pdf | pptx)

Ranked Enumeration
Top-1 Result for Path Queries
From Top-1 to Any-k
  Anyk-Part
  Anyk-Rec
Beyond Path Queries
Ranking Function
Open Problems
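
To make the idea of ranked enumeration concrete, here is a small sketch for the simplest possible case, a join of two weighted relations R(A, B) and S(B, C), with results returned in ascending order of summed weight. It is not the Anyk-Part or Anyk-Rec algorithm from the tutorial; it only illustrates the general pattern of emitting the best result first and lazily generating successors from a priority queue. The function name `ranked_join` and the tuple layouts are assumptions made for this example.

```python
import heapq
from collections import defaultdict
from itertools import islice

def ranked_join(R, S):
    """Yield (total_weight, (a, b, c)) for R join S on b, lightest first.

    R: list of (a, b, weight); S: list of (b, c, weight) (assumed layout).
    """
    # Group S by join value and sort each group by weight.
    by_b = defaultdict(list)
    for b, c, w in S:
        by_b[b].append((w, c))
    for group in by_b.values():
        group.sort()

    # Seed the queue with the best partner of every R tuple that joins.
    heap = [
        (w + by_b[b][0][0], i, 0)
        for i, (a, b, w) in enumerate(R)
        if b in by_b
    ]
    heapq.heapify(heap)

    while heap:
        total, i, pos = heapq.heappop(heap)
        a, b, w = R[i]
        group = by_b[b]
        yield total, (a, b, group[pos][1])
        # Successor: the same R tuple paired with its next-best S tuple.
        if pos + 1 < len(group):
            heapq.heappush(heap, (w + group[pos + 1][0], i, pos + 1))

R = [(1, "x", 1), (2, "x", 4), (3, "y", 2)]
S = [("x", 10, 1), ("x", 20, 5), ("y", 30, 1)]
first_two = list(islice(ranked_join(R, S), 2))
all_results = list(ranked_join(R, S))
```

Because the generator is lazy, a caller can stop after any number of results without fixing k in advance, and the full join output is never materialized up front.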

Authors from the DATA Lab at Northeastern University

Nikolaos Tziavelis (PhD student, lead researcher)
Wolfgang Gatterbauer (faculty)
Mirek Riedewald (faculty)

Bibliography

Optimal Join Algorithms meet Top-k

Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

SIGMOD tutorials, pp. 2659–2665, 2020

ACM | preprint | arXiv:2005.00448 | gs | bib

For more technical details, please see our Any-k project page and/or the bibliography.

Funding

This work has been supported in part by the National Institutes of Health (NIH) under award number R01 NS091421 and by the National Science Foundation (NSF) under award number CAREER IIS-1762268. Any opinions, findings, and conclusions or recommendations expressed in this project are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Citation

To cite this tutorial, please use the following BibTeX entry:

@inproceedings{TziavelisGR:2020,
  author = {Nikolaos Tziavelis and Wolfgang Gatterbauer and Mirek Riedewald},
  title = {Optimal Join Algorithms Meet Top-k},
  booktitle = {SIGMOD},
  pages = {2659--2665},
  year = {2020},
  doi = {10.1145/3318464.3383132}
}
