This work considers why-not questions in the context of top-k queries and score-based ranking functions. Following the popular linear scalarization approach for multi-objective optimization, we study rankings based on the weighted sum of multiple scores. A given weight choice may be controversial or perceived as unfair to certain individuals or organizations, triggering the question why some entity of interest has not yet shown up in the top-k. We introduce various notions of such why-not-yet queries and formally define them as satisfiability or optimization problems, whose goal is to propose alternative ranking functions that address the placement of the entities of interest. While some why-not-yet problems have linear constraints, others require quantifiers, disjunction, and negation. We propose several optimizations, ranging from a monotonic-core construction that approximates the complex constraints with a conjunction of linear ones, to various techniques that let the user control the tradeoff between running time and approximation quality. Experiments with real and synthetic data demonstrate the practicality and scalability of our technique, showing its superiority compared to the state of the art (SOA).

People & Affiliations

Zixuan Chen (PhD student lead researcher)
Mirek Riedewald (faculty)
Northeastern University, Khoury College of Computer Sciences

Publications

Why Not Yet: Fixing a Top-k Ranking that Is Not Fair to Individuals
Zixuan Chen, Panagiotis Manolios, Mirek Riedewald
defines new why-not-yet problems, formalizes them as satisfaction or optimization problems with linear constraints and proposes optimizations and approximation techniques that trade result quality for improved running time.

Code

We provide an implementation of the framework for answering why-not-yet questions over top-k queries in Java at this Github repository. The repository also includes code and data to reproduce the experiments from our VLDB 2023 paper: Why not yet: Fixing a top-k ranking that is not fair to individuals.

Funding

This work was supported in part by the National Science Foundation (NSF) under award number IIS-1956096.

National Science Foundation