Dichotomy theorems, which characterize the conditions under which a problem can be solved efficiently, have helped identify important tractability borders for problems such as probabilistic query evaluation, view maintenance, and query containment. However, dichotomy theorems for many such problems remain elusive under key settings such as bag semantics or for queries with self-joins. This work aims to unearth dichotomies (i.e. the frontiers of tractability) for fundamental problems in reverse data management and knowledge representation.

We discuss a novel approach for solving these problems: instead of creating dedicated algorithms for easy (PTIME) and hard (NP-complete) cases, we suggest using a unified algorithm that is guaranteed to terminate in PTIME for all easy cases. Our algorithm is also unified in that it can solve both previously studied restrictions and new cases (e.g., CQs with self-joins under set or bag semantics). Our approach opens up the door to new variants and new fine-grained analysis: we have discovered new tractable cases for the problem of minimal factorization of provenance formulas as well as dichotomies under bag semantics for the problems of resilience and causal responsibility.


A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations
Neha Makhija, Wolfgang Gatterbauer
Proposes a unified approach for solving reverse data management problems: all cases (including self-joins, bags, PTIME or NP-complete cases) are solved with the same algorithm. The algorithm just "happens" to terminate in PTIME for all known PTIME cases. We apply our techniques with sucess to the problems of resilience and causal responsibility, and envision that this approach can be applied to other problems in reverse data management - including various interventions for fairness and explanations. Also suggests a non-trivial Disjunctive Logic Program that can automatically find "hardness certificates" (think gadgets) for the resilience problem over conjunctive queries.
Minimally Factorizing the Provenance of Self-join Free Conjunctive Queries
Neha Makhija, Wolfgang Gatterbauer
Proposes the problem of finding the minimal-size factorization of the provenance of queries. Shows it to be a natural generalization ofand to recover all known PTIME cases of the problem of exact probabilistic inference. Makes first steps towards a dichotomy of the data complexity of the problem.
Discovering Dichotomies for Problems in Database Theory
Neha Makhija
VLDB 2023 PhD Workshop

Authors from the DATA Lab at Northeastern University

Northeastern University, Khoury College of Computer Sciences


This work has been supported in part by the National Science Foundation (NSF) under award numbers IIS-1762268 and IIS-1956096, and conducted in part while the authors were visiting the Simons Institute for the Theory of Computing. Any opinions, findings, and conclusions or recommendations expressed in this project are those of the author(s) and do not necessarily reflect the views of the Funding Agencies.

National Science Foundation Simons Institute for the Theory of Computing