Any-k: Optimal ranked enumeration for conjunctive queries

We study ranked enumeration of join-query results according to very general orders deﬁned by selective dioids (semirings with an ordering property). Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had been studied in isolation. To this end, we extend classic algorithms that find the k-shortest paths in a weighted graph. For full conjunctive queries, including cyclic ones, our approach is optimal in terms of the time to return the top result and the delay between results. By performing a careful cost analysis, we are able to uncover a previously unknown trade-oﬀ between two incomparable enumeration approaches: one has lower complexity when the number of returned results is small, the other when the number is very large. We theoretically and empirically demonstrate the superiority of our techniques over batch algorithms, which produce the full result and then sort it. Our technique is not only faster for returning the first few results, but on some inputs beats the batch algorithm even when all results are produced.

LEFT: For a path query of length ℓ=4, the naive approach of batch sorting all answers and then returning them in sorted order takes 7.7 sec to return the top result (TTF = time-to-first), and 8.2 sec to return the last (TTL = time-to-last). In contrast, our approach has TTF = 0.05 sec for the top result (and thus more than 100 times faster) and TTL = 7.7 sec for one variant called recursive enumeration ("Anyk-Rec"). In other words, we not only get the first faster, we even get all results faster.

MIDDLE: The advantage of our approach for TTF increases with the size of the query (more than 1000 times faster for a path query of length ℓ=6).

RIGHT: TTL for our approach gets increasingly faster the longer the query is. The intuitive reason is that Anyk-Rec exploits some algebraic properties of relational joins and reduces the number of comparisons needed for sorting all output results.

People & Affiliations

Nikolaos Tziavelis (PhD student lead researcher)

Wolfgang Gatterbauer (faculty)

Mirek Riedewald (faculty)

Nofar Carmeli (Technion, Israel)

Benny Kimelfeld (Technion, Israel)

Deepak Ajwani (University College Dublin)

Xiaofeng Yang (former DATA Lab PhD student)

Publications

Efficient Ranked Access over Joins

Nikolaos Tziavelis

VLDB PhD Workshop 2023
Best Paper Award

CEUR-WS | slides| gs

Overview of this line of work.

Efficient Computation of Quantiles over Joins

Nikolaos Tziavelis, Nofar Carmeli, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald

PODS, pp. 303-315, 2023

ACM | arXiv:2305.16525 | video (17min) | slides| gs

Proposes algorithms for finding quantiles (e.g., the median) without materializing the join. For cases that are provably hard, gives approximation algorithms.

Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries

Nofar Carmeli*, Nikolaos Tziavelis*, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald

TODS, pp. 1-45, 2023

Extended version of the earlier PODS paper. Completes all dichotomy results and extends the study to Functional Dependencies.

Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries

Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

arXiv:2205.05649, 2022

arXiv:2205.05649 | gs | bib

main paper

Extended version of the earlier paper "Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries". Proposes the asymptotically fastest ranked enumeration approach called Anyk-Part+.

Toward Responsive DBMS: Optimal Join Algorithms, Enumeration, Factorization, Ranking, and Dynamic Programming

Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

ICDE tutorials, pp. 3205-3208, 2022

Tutorial page: Toward Responsive DBMS

Explores connections between a wide range of fundamental problems in computer science, including dynamic programming, factorized representations, join algorithms, and answer enumeration.

Beyond Equi-joins: Ranking, Enumeration and Factorization

Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

PVLDB, 14(11):2599-2612, 2021 Experiments reproduced by PVLDB

Develops methods for ranked enumeration of answer to queries with general join predicates that involve conjunctions and disjunctions of inequalities. Surprisingly, the suggested solution is within a polylogarithmic factor of the best-known guarantee for equi-joins.

Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries

Nofar Carmeli*, Nikolaos Tziavelis*, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald

PODS, pp. 325-341, 2021
(Invited to the Special Issue of TODS on "best of PODS 2021")

Poses a new question and gives partial answers: When can we allow for direct access to a ranked list of answers to a database query without (and considerably faster than) materializing all answers?

Optimal Join Algorithms meet Top-k

Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald

SIGMOD tutorials, pp. 2659–2665, 2020

Tutorial page: Optimal Join Algorithms meet Top-k

Presents a unified point of view for studying optimal join algorithms and top-k query evaluation.

Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries

Nikolaos Tziavelis, Deepak Ajwani, Wolfgang Gatterbauer, Mirek Riedewald, Xiaofeng Yang

PVLDB 13(9):1582-1597, 2020
Experiments reproduced by PVLDB

First paper that generalizes dynamic programming over certain algebraic structures called "selective dioids" (semirings with an ordering property) to optimal ranked enumeration of answers. Applies the theory to the special case of ranked enumeration over full conjunctive queries. The full version also treats projections.

Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs

Xiaofeng Yang, Deepak Ajwani, Wolfgang Gatterbauer, Patrick K. Nicholson, Mirek Riedewald, Alessandra Sala

WWW, pp. 489-498, 2018

ACM | preprint (long) | arXiv:1802.06060 (long) | gs | bib

Gives a ranked enumeration algorithm for tree patterns (or acyclic graph queries). Introduces the idea of any-k: a marriage between anytime algorithms and top-k evaluation.

Any-k Algorithms for Exploratory Analysis with Conjunctive Queries

Xiaofeng Yang, Mirek Riedewald, Rundong Li, Wolfgang Gatterbauer

ExploreDB@SIGMOD, pp. 1-3, 2018

ACM | preprint | slides | gs | bib

Introduces the idea of ranked enumeration of answers to cyclic CQs based on a union of multiple tree decompositions.

Code

Video tutorial and key ideas

The recordings of our VLDB 2020 talk and our SIGMOD 2020 tutorial give an easy and intuitive introduction to the key ideas. We also included a detailed bibliography on the tutorial page.

Funding

This work has been supported in part by the National Institutes of Health (NIH) under award number R01 NS091421 and by the National Science Foundation (NSF) under award numbers CAREER IIS-1762268 and IIS-1956096. Any opinions, findings, and conclusions or recommendations expressed in this project are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Citations

To cite the main paper of the project (the VLDB 2020 one), please use following bibtex entry: