Dichotomy theorems, which characterize the conditions under which a problem can be solved efficiently, have helped identify important tractability borders for problems such as probabilistic query evaluation, view maintenance, and query containment. However, dichotomy theorems for many such problems remain elusive under key settings such as bag semantics or for queries with self-joins. This work aims to unearth dichotomies (i.e. the frontiers of tractability) for fundamental problems in reverse data management and knowledge representation.

We discuss a novel approach for solving these problems: instead of creating dedicated algorithms for easy (PTIME) and hard (NP-complete) cases, we suggest using a unified algorithm that is guaranteed to terminate in PTIME for all easy cases. Our algorithm is also unified in that it can solve both previously studied restrictions and new cases (e.g., CQs with self-joins under set or bag semantics). Our approach opens up the door to new variants and new fine-grained analysis: we have discovered new tractable cases for the problem of minimal factorization of provenance formulas as well as dichotomies under bag semantics for the problems of resilience and causal responsibility.


A Unified Approach for Resilience and Causal Responsibility with Integer Linear Programming (ILP) and LP Relaxations
Neha Makhija, Wolfgang Gatterbauer
PACMMOD, 1(4), Article 228, December 2023. (SIGMOD'24)
Proposes a unified approach for solving reverse data management problems (including various interventions for fairness): all cases (including self-joins, bags, PTIME or NP-complete cases) are solved with the same algorithm. The algorithm just "happens" to terminate in PTIME for all known PTIME cases.
Discovering Dichotomies for Problems in Database Theory
Neha Makhija
VLDB 2023 PhD Workshop
Towards a Dichotomy for Minimally Factorizing the Provenance of Self-Join Free Conjunctive Queries
Neha Makhija, Wolfgang Gatterbauer
Proposes the problem of finding the minimal-size factorization of the provenance of queries. Shows it to be a natural generalization and to recover all known PTIME cases of the problem of exact probabilistic inference. Makes first steps towards a dichotomy of the data complexity of the problem.

Authors from the DATA Lab at Northeastern University

Northeastern University, Khoury College of Computer Sciences


This work has been supported in part by the National Science Foundation (NSF) under award numbers IIS-1762268 and IIS-1956096, and conducted in part while the authors were visiting the Simons Institute for the Theory of Computing. Any opinions, findings, and conclusions or recommendations expressed in this project are those of the author(s) and do not necessarily reflect the views of the Funding Agencies.

National Science Foundation Simons Institute for the Theory of Computing