Optimal ranked enumeration for conjunctive queries
We study ranked enumeration of join-query results according to very general orders defined by selective dioids (semirings with an ordering property).
Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had been studied in isolation. To this end, we extend classic algorithms that find the k-shortest paths in a weighted graph. For full conjunctive queries, including cyclic ones, our approach is optimal in terms of the time to return the top result and the delay between results. By performing a careful cost analysis, we are able to uncover a previously unknown trade-off between two incomparable enumeration approaches: one has lower complexity when the number of returned results is small, the other when the number is very large. We theoretically and empirically demonstrate the superiority of our techniques over batch algorithms, which produce the full result and then sort it. Our technique is not only faster for returning the first few results, but on some inputs beats the batch algorithm even when all results are produced.
Above 3 figures illustrate the power of our approach for simple path queries
of increasing length ℓ, yet constant output size |OUT| = 10 Million tuples.
LEFT: For a path query of length ℓ=4,
the naive approach of batch sorting all answers and then returning them in sorted order takes 7.7 sec to return the top result (TTF = time-to-first), and 8.2 sec to return the last (TTL = time-to-last).
In contrast, our approach has TTF = 0.05 sec for the top result (and thus more than 100 times faster)
and TTL = 7.7 sec for one variant called recursive enumeration ("Anyk-Rec").
In other words, we not only get the first faster, we even get all results faster.
MIDDLE: The advantage of our approach for TTF increases with the size of the query (more than 1000 times faster for a path query of length ℓ=6).
RIGHT: TTL for our approach gets increasingly faster the longer the query is.
The intuitive reason is that Anyk-Rec exploits some algebraic properties of relational joins and reduces the number of comparisons needed for sorting all output results.
Proposes algorithms for finding quantiles (e.g., the median) without materializing the join.
For cases that are provably hard, gives approximation algorithms.
Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries
Nofar Carmeli*, Nikolaos Tziavelis*, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald
Extended version of the earlier paper "Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries".
Proposes the asymptotically fastest ranked enumeration approach called Anyk-Part+.
Explores connections between a wide range of fundamental problems in computer science, including dynamic programming, factorized representations, join algorithms, and answer enumeration.
Beyond Equi-joins: Ranking, Enumeration and Factorization
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Develops methods for ranked enumeration of answer to queries with general join predicates that involve conjunctions and disjunctions of inequalities. Surprisingly, the suggested solution is within a polylogarithmic factor of the best-known guarantee for equi-joins.
Tractable Orders for Direct Access to Ranked Answers of Conjunctive Queries
Nofar Carmeli*, Nikolaos Tziavelis*, Wolfgang Gatterbauer, Benny Kimelfeld, Mirek Riedewald
Poses a new question and gives partial answers: When can we allow for direct access to a ranked list of answers to a database query without (and considerably faster than) materializing all answers?
Optimal Join Algorithms meet Top-k
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
First paper that generalizes dynamic programming over certain algebraic structures called "selective dioids" (semirings with an ordering property) to optimal ranked enumeration of answers. Applies the theory to the special case of ranked enumeration over full conjunctive queries. The full version also treats projections.
Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs
Xiaofeng Yang, Deepak Ajwani, Wolfgang Gatterbauer, Patrick K. Nicholson, Mirek Riedewald, Alessandra Sala
Gives a ranked enumeration algorithm for tree patterns (or acyclic graph queries).
Introduces the idea of any-k: a marriage between anytime algorithms and top-k evaluation.
Any-k Algorithms for Exploratory Analysis with Conjunctive Queries
Xiaofeng Yang, Mirek Riedewald, Rundong Li, Wolfgang Gatterbauer
The recordings of our VLDB 2020 talk and our SIGMOD 2020 tutorial give an easy and intuitive introduction to the key ideas. We also included a
detailed bibliography on the tutorial page.
This work has been supported in part
by the National Institutes of Health (NIH) under award number R01 NS091421 and by the National Science Foundation (NSF) under award numbers CAREER IIS-1762268 and IIS-1956096. Any opinions, findings, and conclusions or recommendations expressed in this project are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
Citations
To cite the main paper of the project (the VLDB 2020 one), please use following bibtex entry:
@article{TziavelisAGRY:2020,
author = {Nikolaos Tziavelis and Deepak Ajwani and Wolfgang Gatterbauer and Mirek Riedewald and Xiaofeng Yang},
title = {Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries},
journal = {Proc. {VLDB} Endow.},
volume = {13},
number = {9},
pages = {1582--1597},
year = {2020},
doi = {10.14778/3397230.3397250}
}