CS 7240: Topics and approximate agenda (Spring'20)
This schedule will be updated regularly as the course goes on. Check back here frequently!
Topic 1: Data models and query languages
-
Lecture 1 (Tue 1/7):
Course introduction, SQL refresher
-
Lecture 2 (Fri 1/10):
SQL continued, Logic & relational calculus
-
Lecture 3 (Tue 1/14):
Relational calculus & relational algebra
-
Lecture 4 (Fri 1/17):
Relational algebra & Codd's theorem, Datalog
-
Lecture 5 (Tue 1/21):
Datalog and more expressive variations
-
Lecture 6 (Fri 1/24): (A1 due)
Datalog vs. stable model semantics
-
Lecture 7 (Tue 1/28):
Datalog evaluation strategies,
NoSQL
Pointers to relevant concepts & supplementary material:
-
SQL refresher: [SAMS'12], [Cow'03] Ch3 & Ch5, [Complete'08] Ch6
-
First-order logic,
relational calculus:
[Barland+'08] 4.1.2 & 4.2.1 & 4.4,
[Genesereth+] Ch6,
[Halpern+'01],
[Cow'03] Ch4.3 & 4.4,
[Elamsri, Navathe'15] Ch8.6 & Ch8.7,
[Silberschatz+'10] Ch6.2 & Ch6.3
[Alice'95] Ch3.1-3.3 & Ch4.2 & Ch4.4 & Ch5.3-5.4
-
Relational algebra,
Codd's theorem:
[Cow'03] Ch4.2,
[Complete'08] Ch2.4 & Ch5.1-5.2,
[Elmasri, Navathe'15] Ch8,
[Silberschatz+'10] Ch6.1,
[Alice'95] Ch4.4 & Ch5.4
-
Datalog, stable model semantics:
[Complete'08] Ch5.3, [Cow'03] Ch 24,
[Koutris'19] L9 & L10,
[Gatterbauer, Suciu'10]
-
Alternative data models, NoSQL:
[Hellerstein, Stonebraker'05], [Sadalage, Fowler'12], [Harrison'16]
Topic 2: Complexity of query evaluation
-
Lecture 8 (Fri 1/31):
Query equivalence, homomorphisms
-
Lecture 9 (Tue 2/4):
Homomorphism & conjunctive query containment
-
Lecture 10 (Fri 2/7): (A2 due)
Query minimization & nested queries
-
Lecture 11 (Tue 2/11):
Tree pattern queries, Provenance
-
Lecture 12 (Fri 2/14): (P1 due)
Resilience
-
Lecture 13 (Tue 2/18):
Resilience
Pointers to relevant concepts & supplementary material:
-
Query evaluation of conjunctive queries, data vs. query complexity, homomorphisms,
constraint satisfaction, query containment, query minimization, absorption:
[Kolaitis, Vardi'00], [Vardi'00],
[Kolaitis'16],
[Koutris'19] L1 & L2
-
Beyond conjunctive queries,
unions of conjunctive queries, bag semantics,
nested queries, tree pattern queries:
[Kolaitis'16],
[Tan+'14], [Gatterbauer'11], [Martens'17]
-
Provenance:
[Buneman+02],
[Green+07],
[Cheney+09],
[Green,Tannen'17],
[Kepner+16],
[Buneman, Tan'18]
-
Reverse data management, update propagation, resilience:
[Buneman+02],
[Kimelfeld+12],
[Freire+15]
Topic 3: Efficient query evaluation & factorized representations
-
Lecture 14 (Fri 2/21):
Acyclic queries
-
Lecture 15 (Tue 2/25): (A3 due)
Cyclic queries
-
Lecture 16 (Fri 2/28): (P2 due)
Cyclic queries
-
Spring break
-
Lecture 17 (Tue 3/10):
Top-1, factorized databases
Pointers to relevant concepts & supplementary material:
-
Acyclic joins:
query hypergraph, Yannakakis algorithm, GYO reduction,
dynamic programming, algebraic semirings, ranked enumeration:
[Alice] Ch6.4, [Koutris'19] L4, [Tziavelis+'19]
-
Cyclic joins:
tree & hypertree decomposition,
query widths,
fractional hypertree width,
AGM bound,
worst-case optimal join algorithms,
optimal algorithms,
submodular width and multiple decompositions:
[AGM'13], [NPRR'18], [KNR'17], [KNS'17]
-
Factorized databases:
-
Ranking, top-k: [Roughgarden'10], [Ilyas+08], [Rahul, Tao'19], [Tziavelis+'19]
-
Normal forms and their information-theoretic justification:
[Complete'18] Ch3, [Lee'87], [Arenas, Libkin'05]
Topic 4: Linear algebra & rankings on graphs
-
Lecture 17 (Tue 3/10):
Linear algebra, random walks, and graph propagation
-
Lecture 18 (Fri 3/13): Midterm exam
-
Lecture 19 (Tue 3/17):
Factorized graph representations
Pointers to relevant concepts & supplementary material:
-
linear algebra vs. relational algebra:
-
random walks, PageRank, label propagation, belief propagation, semi-supervised learning:
Topic 5: Logic, uncertainty & inconsistency
-
Lecture 20 (Fri 3/20):
Probabilistic inference & approximation
-
Lecture 21 (Tue 3/24):Lecture 24 (Fri 4/3):
Inconsistency management
Pointers to relevant concepts & supplementary material:
-
Boolean functions,
query lineage,
probabilistic databases,
complexity of probabilistic inference:
-
network reliability,
models of propagation on graphs and hypergraphs,
scalable approximations to probabilistic inference,
extensional query plans:
-
measures for inconsistency,
database repairs,
semantics for inconsistency resolution:
-
Maximum entropy principle:
Topic 6: Serial & Parallel data processing
-
Lecture 22 (Fri 3/27): (P3 due):
I/O-based sort and join algorithms
-
Lecture 23 (Tue 3/31):
Parallel query evaluation
-
Lecture 24 (Fri 4/3):
Distributed theta-joins
Pointers to relevant concepts & supplementary material:
-
query optimization, lifecycle of a query plan, I/O based algorithms
-
complexity of parallel data management:
-
pipeline versus partitioned parallelism,
MapReduce and Spark,
theta-joins vs equi-joins:
Space for other topics & project presentations
-
Lecture 25 (Tue 4/7):
TBD
-
Lecture 26 (Fri 4/10):
TBD
-
Lecture 27 (Tue 4/14): P4 Project presentations
-
Lecture 28 (Fri 4/17): P4 Project presentations
Literature list
Topic 1: Data models and query languages
-
SQL
-
Logic, relational calculus
-
[Barland+'08] Barland, Kolaitis, Vardi, Felleisen, Greiner.
Intro to Logic,
(alternative PDF version).
4.1.2 First-Order Logic: bound variables, free variables,
4.2.1 First-Order Logic: equivalences,
4.4 Exercises for First-Order Logic.
-
[Genesereth+] Genesereth et al. Introduction to logics.
Ch 6: Relational Logic.
-
[Halpern+'01] Halpern, Harper, Immerman, Kolaitis, Vardi, Vianu.
On the Unusual Effectiveness of Logic in Computer Science.
Bulletin of Symbolic Logic 2001.
-
[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 4.3: Relational calculus,
Ch 4.4: Safety.
-
[Elmasri, Navathe'15] Fundamentals of Database Systems. 7th ed 2015.
Ch 8.6: Tuple relational calculus,
Ch 8.7: Domain relational calculus.
-
[Silberschatz+'10] Silberschatz, Korth, Sudarshan. Database system concepts. 6th ed 2011.
Ch 6.2: Tuple relational calculus,
Ch 6.3: Domain relational calculus.
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 3.1: Structure of the relational model,
Ch 3.2: Named vs. unnamed perspective,
Ch 3.3: Conventioanl vs. logic programming perspective,
Ch 4.2: Logic-based perspective,
Ch 4.4: Algebraic perspectives,
Ch 5.3: Relational calculus, domain independence, Codd's theorem,
Ch 5.4: Syntactic Restrictions for Domain Independence.
-
Relational Algebra, Codd's theorem
-
[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 4.2: Relational algebra.
-
[Complete'08] Garcia-Molina, Ullman, Widom. Database Systems. 2nd ed 2008.
Ch 2.4: Relational algebra,
Ch 5.1-5.2: Relational Algebra and bags.
-
[Elmasri, Navathe'15] Fundamentals of Database Systems. 7th ed 2015.
Ch 8: Relational Algebra.
-
[Silberschatz+'10] Silberschatz, Korth, Sudarshan. Database system concepts. 6th ed 2011.
Ch 6.1: Relational algebra,
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 4.4: Algebraic perspectives,
Ch 5.4: Equivalence of algebra and calculus.
-
Datalog, stable models
-
[Complete'08] Garcia-Molina, Ullman, Widom. Database Systems. 2nd ed. 2008.
Ch 5.3-5.4: Datalog and bags,
Ch 10.2: Recursion in SQL.
-
[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 24.1-24.2: Datalog.
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 12.2: Datalog and model-theoretic semantics,
Ch 12.3: Fixpoint semantics,
Ch 13.1: Semi-naive evaluation,
Ch 15.2: Stratified semantics and precedence graphs
-
[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L9: Datalog evaluation,
L10: Datalog negation
-
[Gatterbauer, Suciu'10] Stable models: Data conflict resolution using trust mappings.
SIGMOD 2010.
[PPTX slides],
[PDF slides]
-
Alternative data models
-
[Sadalage, Fowler'12] NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. 2012.
Safari books eBook(NEU free online access).
Ch 8: Key-value stoes,
Ch 9: Document databases,
Ch 10: Column-famility stores,
Ch 11: Graph databases.
-
[Harrison'16] Next Generation Databases: NoSQL, NewSQL, and Big Data. 2016.
Safari books eBook (NEU free online access)].
p43-p51: CAP, Eventual Consistency, Hashing,
p61-p62: MongoDB,
p65-p68: Graphs,
p75-p80: Columns,
p145-156: Data models,
p163, 164, 166: Secondary indexing.
-
[Stonebraker, Hellerstein'05] What Goes Around Comes Around. 2005.
Online PDF.
Also see various discussions of the paper.
-
[Kepner+'16] Kepner, Gadepally, Hutchison, Jananthan, Mattson, Samsi, Reuther. Associative Array Model of SQL, NoSQL, and NewSQL Databases. HPEC 2016.
Topic 2: Complexity of query evaluation & reverse data management
-
Query evaluation of conjunctive queries
-
[Chandra, Merlin'77] Optimal implementation of conjunctive queries in relational data bases. STOC 1977.
-
[Vardi'81] The Complexity of Relational Query Languages. STOC 1982.
-
[Vardi'00] Constraint satisfaction and database theory: a tutorial. PODS 2000.
-
[Kolaitis, Vardi'00]: Conjunctive-Query Containment and Constraint Satisfaction. JCSS 2000.
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 2.1: Theoretical background,
Ch 6.2: Conjunctive queries & homomorphisms & query containment.
-
[Kolaitis'16]: Logic and Databases. Logical Structures in Computation Boot Camp, Berkeley 2016.
First-order logic as a database query language,
the query evaluation problem,
the query containment problem,
data complexity and combined complexity,
conjunctive queries.
-
[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L1: conjunctive queries,
L2: query containment.
-
Socratica on abstract algebra (Youtube):
why groups,
groups (details),
Cayley tables for groups,
group homomorphisms,
group isomorphisms
-
Beyond conjunctive queries
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 6.3: Undecidability of equivalence for calculus.
-
[Kolaitis'16]: Logic and Databases. Logical Structures in Computation Boot Camp, Berkeley 2016.
unions of conjunctive queries,
undecidability of query containment for unions of conjunctive queries.
-
[Tan+'14] Tan, Van den Bussche, Zhang. Undecidability of satisfiability in the algebra of finite binary relations with union, composition, and difference. Corr 1406.0349.
-
[Gatterbauer'11] Databases will visualize queries too. VLDB 2011.
Video,
Slides.
-
[Martens'17] Short summary of longer papers below Optimizing tree pattern queries: why cutting is not enough (invited talk). STOC 2017.
Czerwinski, Martens, Niewerth, Parys. Optimizing Tree Patterns for Querying Graph- and Tree-Structured Data. SIGMOD Record 2017.
Czerwinski, Martens, Niewerth, Parys. Minimization of Tree Patterns. JACM 2018.
-
Provenance
-
[Buneman+'02] Buneman, Khanna, Tan. On Propagation of Deletions and Annotations Through Views. PODS 2002.
-
[Cheney+'09] Cheney, Chiticariu, Tan. Provenance in databases: why, how, and where. Foundations and trends in databases 2009.
-
[Green+'07] Green, Karvounarakis, Tannen. Provenance Semirings. PODS 2007.
-
[Green, Tannen'17] PODS 2017 Test of Time Award: The Semiring Framework for Database Provenance. PODS 2017. (Slides).
See also:
Val Tannen's EDBT 2010 keynote: Provenance for Database Transformations. EDBT 2010.
Val Tannen's talk @ Logical Structures in Computation Boot Camp (including video):
Provenance for Database Transformations, Berkeley 2016.
-
[Kepner+'16] Kepner, Gadepally, Hutchison, Jananthan, Mattson, Samsi, Reuther. Associative Array Model of SQL, NoSQL, and NewSQL Databases. HPEC 2016.
-
[Buneman, Tang'18] Data Provenance: What next? SIGMOD Record 2018.
-
Reverse data management
-
[Buneman+'02] Buneman, Khanna, Tan. On Propagation of Deletions and Annotations Through Views. PODS 2002.
-
[Kimelfeld+12] Kimelfeld, Vondrak, Williams. Maximizing Conjunctive Views in Deletion Propagation. TODS 2012.
-
[Freire+'15] Freire, Gatterbauer, Immerman, Meliou. The complexity of resilience and responsibility for self-join-free conjunctive queries. PVLDB 2015.
-
[Freire+'20] Freire, Gatterbauer, Immerman, Meliou. New results for the complexity of resilience for binary conjunctive queries with self-joins. PODS 2020.
Topic 3: Efficient query evaluation & factorized representations
-
Acyclic queries
-
[Yannakakis'81] Algorithms for acyclic database schemes. VLDB 1981.
-
[Bernstein, Chiu'81]
Bernstein, Chiu. Using semi-joins to solve relational queries. JACM 1981.
-
[Bernstein, Goodman'81] Power of natural semi-joins. SIAM J., 1981.
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 6.4: Acyclic joins, semi-join reduction, Yannakakis, GYO algorithm.
-
[Gottlob+02] Gottlob, Leone, Scarcello. Hypertree Decompositions and Tractable Queries. JCSS 2002.
-
[Ullman'01] CS345 Lecture Notes. 2001.
Acyclic hypergraphs.
Computing acyclic joins, Yannakakis' algorithm.
-
[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L4: acyclic queries,
-
Cyclic queries
-
[AGM'13] AGM bound:Atserias, Grohe, Marx. Size bounds and query plans for relational joins. SIAM J. Comput. 2013.
-
[NPRR'18] WCO joins: Ngo, Porat, Re, Rudra. Worst-case optimal join algorithms. JACM 2018.
-
[KNR'17] FAQ: Khamis, Ngo, Rudra. Juggling functions inside a database. SIGMOD record 2017.
-
[KNS'17] Khamis, Ngo, Suciu. What do Shannon-type inequalities, submodular width, and disjunctive Datalog have to do with one another? PODS 2017.
-
Factorized representations
-
Rankings
-
[Fagin01] Fagin's algorithm: Fagin. Combining Fuzzy Information from Multiple Systems. JCSS 2001. (PODS 1996)
-
[Fagin+03] Threshold algorithm: Fagin, Lotem, Naor. Optimal aggregation algorithms for middleware. JCSS 2003. (PODS 2001)
-
[Roughgarden'10]: CS369N: Beyond Worst-Case Analysis Lecture #1: Instance Optimality. Stanford 2010.
-
[Ilyas+08] Ilyas, Beskales, Soliman. A survey of top-k query processing techniques in relational database systems. ACM Computing Surveys 2008.
-
[Rahul, Tao'19] Rahul, Tao. A Guide to Designing Top-k Indices. SIGMOD record 2019.
-
[Tziavelis+'19] Tziavelis, Ajwani, Gatterbauer, Riedewald, Yang. Optimal algorithms for ranked enumeration of answers to full conjunctive queries. Arxiv 2019.
-
[Gondran, Minoux'08]. Graphs, dioids and semirings: New Models and Algorithms. Springer 2008.
-
[Kepner, Jananthan'18]. Mathematics of Big Data: Spreadsheets, Databases, Matrices, and Graphs. MIT press 2018.
-
Normalization
Topic 4: Linear algebra
Topic 5: Logic, uncertainty & inconsistency
Topic 6: Serial & Parallel data processing
Overviews and additional possible topics