Reverse Data Management problems are an important, well-studied class of problems in database literature. Unlike traditional Forward Data Management, where the focus is on applying transformations (queries) on data to produce results, Reverse Data Management involves determining the changes needed to achieve a desired outcome post transformation.
In other words, Reverse Data Management asks, "What modifications (interventions) should be made to the input data so that when a given transformation (query) is applied to it, the resulting output satisfies specific conditions?"
Decades of research on traditional data managment has lead to the modern Query Optimizer, which can efficiently find the optimal execution plans for a variety of different Data Management problems. However, no similar solution exists for Reverse Data Management problems. The known algorithms and tractability criteria for different RDM problems are significantly different. Furthermore, in cases of bag semantics or queries with self-joins, many computational complexity questions are still open.
We discuss a novel approach for solving these problems: instead of creating dedicated algorithms for easy (PTIME) and hard (NP-complete) cases, we suggest using a unified algorithm that is guaranteed to terminate in PTIME for all easy cases. Our algorithm is also unified in that it can solve both previously studied restrictions and new cases (e.g., CQs with self-joins under set or bag semantics).
Our approach opens up the door to new variants and new fine-grained analysis: we have discovered new tractable cases for the problem of minimal factorization of provenance formulas as well as dichotomies under bag semantics for the problems of resilience and causal responsibility.
The recordings of our conference talks at SIGMOD 2024 and PODS 2024 give an easy and intuitive introduction to the key ideas.
The SIGMOD 2024 talk is an overview of the unified approach and its application to the problems of resilience and causal responsibility. The PODS 2024 talk is a overview of the minimal factorization of provenance formulas and how we can apply the unified approach for this problem as well.
SIGMOD 2024:
PODS 2024:
This work has been supported in part by the National Science Foundation (NSF) under award numbers IIS-1762268 and IIS-1956096, and conducted in part while the authors were visiting the Simons Institute for the Theory of Computing. Any opinions, findings, and conclusions or recommendations expressed in this project are those of the author(s) and do not necessarily reflect the views of the Funding Agencies.