We study ranked enumeration of join-query results according to very general orders defined by selective dioids. Our main contribution is a framework for ranked enumeration over a class of dynamic programming problems that generalizes seemingly different problems that had been studied in isolation. To this end, we extend classic algorithms that find the k-shortest paths in a weighted graph. For full conjunctive queries, including cyclic ones, our approach is optimal in terms of the time to return the top result and the delay between results. These optimality properties are derived for the widely used notion of data complexity, which treats query size as a constant. By performing a careful cost analysis, we are able to uncover a previously unknown trade-off between two incomparable enumeration approaches: one has lower complexity when the number of returned results is small, the other when the number is very large. We theoretically and empirically demonstrate the superiority of our techniques over batch algorithms, which produce the full result and then sort it. Our technique is not only faster for returning the first few results, but on some inputs beats the batch algorithm even when all results are produced.

Above 3 figures illustrate the power of our approach for simple path queries R1(x1,x2), R2(x2,x3), ..., R(x, xℓ+1) of increasing length ℓ yet constant output size |OUT| = 10 Mio tuples. LEFT: For a path query of length ℓ=4, the naive approach of batch sorting all answers and then returning them in sorted order takes 7.7 seconds to return the top result (TTF = time-to-first), and 8.2 seconds to return the last (TTL = time-to-last). In contrast, our approach has TTF = 0.05 seconds for the top result (and thus more than 100 times faster) and TTL = 7.7 seconds for one variant called recursive enumeration ("Anyk-Rec"). In other words, we get both the first and last result faster. MIDDLE: The advantage of our approach for TTF increases with the size of the query. RIGHT: TTL for our approach gets increasingly faster the longer the query is. The intuitive reason is that Anyk-Rec exploits some algebraic properties of relational joins and reduces the number of comparisons needed for sorting all output results.

People & Affiliations

DATA Lab team members
Nikolaos Tziavelis (PhD student lead researcher)
Mirek Riedewald (faculty)
External collaborators and past students
Deepak Ajwani (University College Dublin)
Xiaofeng Yang (former DATA Lab PhD student)
Northeastern University, Khoury College of Computer Sciences University College Dublin, School of Computer Science

Publications

Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries
Nikolaos Tziavelis, Deepak Ajwani, Wolfgang Gatterbauer, Mirek Riedewald, Xiaofeng Yang
Optimal Join Algorithms meet Top-k
Nikolaos Tziavelis, Wolfgang Gatterbauer, Mirek Riedewald
Any-k: Anytime Top-k Tree Pattern Retrieval in Labeled Graphs
Xiaofeng Yang, Deepak Ajwani, Wolfgang Gatterbauer, Patrick K. Nicholson, Mirek Riedewald, Alessandra Sala
Any-k Algorithms for Exploratory Analysis with Conjunctive Queries
Xiaofeng Yang, Mirek Riedewald, Rundong Li, Wolfgang Gatterbauer

Part-3 of Tutorial at SIGMOD 2020

Funding

This work has been supported in part by the National Institutes of Health (NIH) under award number R01 NS091421 and by the National Science Foundation (NSF) under award number CAREER IIS-1762268. Any opinions, findings, and conclusions or recommendations expressed in this project are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

National Institutes of Health National Science Foundation

Citations

To cite our VLDB 2020 paper, please use following bibtex entry:

@article{TziavelisAGRY:2020,
  author = {Nikolaos Tziavelis and Deepak Ajwani and Wolfgang Gatterbauer and Mirek Riedewald and Xiaofeng Yang},
  title = {Optimal Algorithms for Ranked Enumeration of Answers to Full Conjunctive Queries},
  journal = {Proc. {VLDB} Endow.},
  volume = {13},
  number = {9},
  pages = {1582--1597},
  year = {2020},
  doi = {10.14778/3397230.3397250}
}