We study ranked enumeration of join-query results according to very general orders defined by selective dioids (semirings with an ordering property). Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had been studied in isolation. To this end, we extend classic algorithms that find the k-shortest paths in a weighted graph. For full conjunctive queries, including cyclic ones, our approach is optimal in terms of the time to return the top result and the delay between results. By performing a careful cost analysis, we are able to uncover a previously unknown trade-off between two incomparable enumeration approaches: one has lower complexity when the number of returned results is small, the other when the number is very large. We theoretically and empirically demonstrate the superiority of our techniques over batch algorithms, which produce the full result and then sort it. Our technique is not only faster for returning the first few results, but on some inputs beats the batch algorithm even when all results are produced.

Above 3 figures illustrate the power of our approach for simple path queries of increasing length ℓ, yet constant output size |OUT| = 10 Million tuples.
LEFT: For a path query of length ℓ=4, the naive approach of batch sorting all answers and then returning them in sorted order takes 7.7 sec to return the top result (TTF = time-to-first), and 8.2 sec to return the last (TTL = time-to-last). In contrast, our approach has TTF = 0.05 sec for the top result (and thus more than 100 times faster) and TTL = 7.7 sec for one variant called recursive enumeration ("Anyk-Rec"). In other words, we not only get the first faster, we even get all results faster.
MIDDLE: The advantage of our approach for TTF increases with the size of the query (more than 1000 times faster for a path query of length ℓ=6).
RIGHT: TTL for our approach gets increasingly faster the longer the query is. The intuitive reason is that Anyk-Rec exploits some algebraic properties of relational joins and reduces the number of comparisons needed for sorting all output results.

People & Affiliations

DATA Lab team members
Nikolaos Tziavelis (PhD student lead researcher)
Mirek Riedewald (faculty)
External collaborators and past students
Nofar Carmeli (Technion, Israel)
Benny Kimelfeld (Technion, Israel)
Deepak Ajwani (University College Dublin)
Xiaofeng Yang (former DATA Lab PhD student)
Northeastern University, Khoury College of Computer Sciences University College Dublin, School of Computer Science Technion, Faculty of Computer Science

Publications

Efficient Ranked Access over Joins
Nikolaos Tziavelis
VLDB PhD Workshop 2023
Best Paper Award
Overview of this line of work.
Efficient Computation of Quantiles over Joins
Nikolaos Tziavelis, Nofar Carmeli, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald
Proposes algorithms for finding quantiles (e.g., the median) without materializing the join. For cases that are provably hard, gives approximation algorithms.
Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries
Nofar Carmeli*, Nikolaos Tziavelis*, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald
Extended version of the earlier PODS paper. Completes all dichotomy results and extends the study to Functional Dependencies.
Any-k Algorithms for Enumerating Ranked Answers to Conjunctive Queries
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
main paper
Extended version of the earlier paper "Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries". Proposes the asymptotically fastest ranked enumeration approach called Anyk-Part+.
Toward Responsive DBMS: Optimal Join Algorithms, Enumeration, Factorization, Ranking, and Dynamic Programming
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Explores connections between a wide range of fundamental problems in computer science, including dynamic programming, factorized representations, join algorithms, and answer enumeration.
Beyond Equi-joins: Ranking, Enumeration and Factorization
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
PVLDB, 14(11):2599-2612, 2021 Experiments reproduced by PVLDB
Develops methods for ranked enumeration of answer to queries with general join predicates that involve conjunctions and disjunctions of inequalities. Surprisingly, the suggested solution is within a polylogarithmic factor of the best-known guarantee for equi-joins.
Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries
Nofar Carmeli*, Nikolaos Tziavelis*, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald
PODS, pp. 325-341, 2021
(Invited to the Special Issue of TODS on "best of PODS 2021")
Poses a new question and gives partial answers: When can we allow for direct access to a ranked list of answers to a database query without (and considerably faster than) materializing all answers?
Optimal Join Algorithms meet Top-k
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Presents a unified point of view for studying optimal join algorithms and top-k query evaluation.
Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries
Nikolaos Tziavelis, Deepak Ajwani, Wolfgang Gatterbauer, Mirek Riedewald, Xiaofeng Yang
PVLDB 13(9):1582-1597, 2020
Experiments reproduced by PVLDB
First paper that generalizes dynamic programming over certain algebraic structures called "selective dioids" (semirings with an ordering property) to optimal ranked enumeration of answers. Applies the theory to the special case of ranked enumeration over full conjunctive queries. The full version also treats projections.
Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs
Xiaofeng Yang, Deepak Ajwani, Wolfgang Gatterbauer, Patrick K. Nicholson, Mirek Riedewald, Alessandra Sala
Gives a ranked enumeration algorithm for tree patterns (or acyclic graph queries). Introduces the idea of any-k: a marriage between anytime algorithms and top-k evaluation.
Any-k Algorithms for Exploratory Analysis with Conjunctive Queries
Xiaofeng Yang, Mirek Riedewald, Rundong Li, Wolfgang Gatterbauer
Introduces the idea of ranked enumeration of answers to cyclic CQs based on a union of multiple tree decompositions.

Code

We have developed a prototype implementation of our any-k framework in Java. The repository also includes instructions to reproduce the experiments of our VLDB 2020 paper.

Video tutorial and key ideas

The recordings of our VLDB 2020 talk and our SIGMOD 2020 tutorial give an easy and intuitive introduction to the key ideas. We also included a detailed bibliography on the tutorial page.

VLDB 2020:

Part-3 of Tutorial at SIGMOD 2020:

Funding

This work has been supported in part by the National Institutes of Health (NIH) under award number R01 NS091421 and by the National Science Foundation (NSF) under award numbers CAREER IIS-1762268 and IIS-1956096. Any opinions, findings, and conclusions or recommendations expressed in this project are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

National Institutes of Health National Science Foundation

Citations

To cite the main paper of the project (the VLDB 2020 one), please use following bibtex entry:

@article{TziavelisAGRY:2020,
  author = {Nikolaos Tziavelis and Deepak Ajwani and Wolfgang Gatterbauer and Mirek Riedewald and Xiaofeng Yang},
  title = {Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries},
  journal = {Proc. {VLDB} Endow.},
  volume = {13},
  number = {9},
  pages = {1582--1597},
  year = {2020},
  doi = {10.14778/3397230.3397250}
}