CS 7240: Topics and approximate agenda (Spring'21)
This schedule will be updated regularly as the course goes on. Check back here frequently!
Lecture slides will usually be posted by the end of the day following a lecture.
Either here on this website or in Canvas if I feel they are not yet ready to be released in public.
Topic 1: Data models and query languages

Lecture 1 (Tue 1/19):
Course introduction,
SQL,
PostgreSQL setup,
SQL Activities

Lecture 2 (Fri 1/22):
SQL

Lecture 3 (Tue 1/26):
SQL,
Logic

Lecture 4 (Fri 1/29):
Logic

Lecture 5 (Tue 2/2):
Logic, Relational Algebra

Lecture 6 (Fri 2/5):
Relational Algebra

Lecture 7 (Tue 2/9):
Relational Algebra,
Datalog & Recursion

Lecture 8 (Fri 2/16):
Datalog & Recursion

Lecture 9 (Tue 2/19):
Datalog & Recursion,
Alternative Data Models
Pointers to relevant concepts & supplementary material:

Unit 1. SQL:
[SAMS'12],
[CS3200'18]
[Cow'03] Ch3 & Ch5,
[Complete'08] Ch6,
[Silberschatz+'20] Ch3.8

Unit 2. Logic:
firstorder logic (FOL),
relational calculus (RC):
[Barland+'08] 4.1.2 & 4.2.1 & 4.4,
[Genesereth+] Ch6,
[Halpern+'01],
[Cow'03] Ch4.3 & 4.4,
[Elmasri, Navathe'15] Ch8.6 & Ch8.7,
[Silberschatz+'20] Ch27.1 & Ch27.2,
[Alice'95] Ch3.13.3 & Ch4.2 & Ch4.4 & Ch5.35.4,
[BarkerPlummer+'11] Ch11

Unit 3. Relational Algebra:
relational algebra (RA) &
Codd's theorem:
[Cow'03] Ch4.2,
[Complete'08] Ch2.4 & Ch5.15.2,
[Elmasri, Navathe'15] Ch8,
[Silberschatz+'20] Ch2.6,
[Alice'95] Ch4.4 & Ch5.4

Unit 4. Datalog & Recursion:
Datalog, recursion, stratified Datalog with negation,
Datalog evaluation strategies
stable model semantics:
[Complete'08] Ch5.3, [Cow'03] Ch 24,
[Koutris'19] L9 & L10,
[G., Suciu'10]

Unit 5. Alternative Data Models:
NoSQL:
[Hellerstein, Stonebraker'05], [Sadalage, Fowler'12], [Harrison'16]
Topic 2: Complexity of query evaluation

Lecture 10 (Fri 2/12):
Conjunctive Queries

Lecture 11 (Tue 2/23):
Conjunctive Queries

Lecture 12 (Fri 2/26): (P1 due)
Conjunctive Queries, Beyond Conjunctive Queries

Lecture 13 (Tue 3/2):
Beyond Conjunctive Queries, Provenance

Lecture 14 (Fri 3/5):
Provenance, Reverse Data Management
Pointers to relevant concepts & supplementary material:

Unit 1. Conjunctive Queries:
query evaluation of conjunctive queries (CQs),
data vs. query complexity, homomorphisms,
constraint satisfaction (CS), query containment, query minimization, absorption:
[Kolaitis, Vardi'00], [Vardi'00],
[Kolaitis'16],
[Koutris'19] L1 & L2

Unit 2. Beyond Conjunctive Queries:
unions of conjunctive queries, bag semantics,
nested queries, tree pattern queries:
[Kolaitis'16],
[Tan+'14], [G.'11], [Martens'17]

Unit 3. Provenance:
provenance & semirings:
[Buneman+02],
[Green+07],
[Cheney+09],
[Green,Tannen'17],
[Kepner+16],
[Buneman, Tan'18]

Unit 4. Reverse Data Management:
update propagation, resilience:
[Buneman+02],
[Kimelfeld+12],
[Freire+15],
[Meliou+'11]
Topic 3: Efficient query evaluation & factorized representations

Lecture 15 (Tue 3/9):
Acyclic queries

Lecture 16 (Fri 3/12): (P2 due)
Acyclic queries

Lecture 17 (Tue 3/16):
Acyclic queries,
Cyclic queries

Lecture 18 (Fri 3/19):
Cyclic queries

Lecture 19 (Tue 3/23):
Cyclic queries, Factorized Databases

Lecture 20 (Fri 3/26):
Factorized Databases

Lecture 21 (Tue 4/2):
Topk

Lecture 22 (Fri 4/2): (P3 due):
Topk

Lecture 23 (Tue 4/6):
Data fusion & normalization
Pointers to relevant concepts & supplementary material:

Unit 1. Acyclic queries:
query hypergraph, Yannakakis algorithm, GYO reduction,
dynamic programming (DP),
algebraic semirings,
ranked enumeration:
[Alice] Ch6.4, [Koutris'19] L4, [Tziavelis+'19]

Unit 2. Cyclic queries:
tree & hypertree decomposition,
query widths,
fractional hypertree width,
AGM bound,
worstcase optimal (WCO) join algorithms ,
submodular width and multiple decompositions:
[AGM'13], [NPRR'18], [KNR'17], [KNS'17]

Unit 3. Factorized databases:
factorized representations, normalization,
factorized databases:
[Olteanu, Schleich'16]

Unit 4. Topk:
shortest paths, dynamic programming (DP),
Yannakakis, Semirings,
Rankings,
topk: [Roughgarden'10], [Ilyas+08], [Rahul, Tao'19], Ranked Enumeration [Tziavelis+'19]

Unit 5. Normal forms:
normal forms and their informationtheoretic justification:
[Complete'18] Ch3, [Lee'87], [Arenas, Libkin'05]

Unit 6. Data fusion: full disjunction, subsumption:
[Dong, Naumann'09]
Topic 4: Linear algebra & rankings on graphs

Lecture 24 (Fri 4/9):
Graphs

Lecture 25 (Tue 4/13):
Graphs
Pointers to relevant concepts & supplementary material:

Unit 1. Graphs:
Linear algebra (LA),
random walks with restarts (RWR), PageRank,
graph propagation, label propagation,
belief propagation (BP),
semisupervised learning (SSL),
factorized graph representations
Topic 5: Logic, uncertainty & inconsistency

Lecture 26 (Fri 4/16):
Probabilities & Uncertainty
Pointers to relevant concepts & supplementary material:

Unit 1. Probabilities & Uncertainty:
Probabilistic inference & approximation, Boolean functions,
query lineage, probabilistic databases,
extensional query plans,
complexity of probabilistic inference,
network reliability,
models of propagation on graphs and hypergraphs,
scalable approximations to probabilistic inference

Unit 2. Inconsistency:
measures for inconsistency,
database repairs,
semantics for inconsistency resolution,
maximum entropy principle:
Project presentations

Lecture 27 (Tue 4/20): P4 Project presentations

Lecture 28 (Fri 4/23): P4 Project presentations
Literature list
Topic 1: Data models and query languages

Unit 1: SQL

[SAMS'19] SAMS: Teach yourself SQL in 10min by Forta. 5th ed. 2019.
It is available for free for Northeastern students
from Safari books eBook
(you may have to first login from our library website, then try again the previous link).
If the book is checked out online, you can use the 4th edition (there is almost no difference between 4th and 5th ed) as
Safari books eBook,
or as EBSCOhost eBook.

[cs3200] PostgreSQL setup,
PgAdmin 4 tutorial.
Files to follow along our SQL lectures: SQL Activities.

[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 5: SQL.

[Complete'08] GarciaMolina, Ullman, Widom. Database Systems. 2nd ed 2008.
Ch 6: SQL.

[Elmasri, Navathe'15] Fundamentals of Database Systems. 7th ed 2015.
Ch 6: SQL

[Silberschatz+'20] Silberschatz, Korth, Sudarshan. Database system concepts. 7th ed 2020.
Ch 3.8: Nested subqueries.

Unit 2: Logic, relational calculus

[Barland+'08] Barland, Kolaitis, Vardi, Felleisen, Greiner.
Intro to Logic,
(alternative PDF version).
4.1.2 FirstOrder Logic: bound variables, free variables,
4.2.1 FirstOrder Logic: equivalences,
4.4 Exercises for FirstOrder Logic.

[Genesereth+] Genesereth et al. Introduction to logics.
Ch 6: Relational Logic.

[Halpern+'01] Halpern, Harper, Immerman, Kolaitis, Vardi, Vianu.
On the Unusual Effectiveness of Logic in Computer Science.
Bulletin of Symbolic Logic 2001.

[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 4.3: Relational calculus,
Ch 4.4: Safety.

[Elmasri, Navathe'15] Fundamentals of Database Systems. 7th ed 2015.
Ch 8.6: Tuple relational calculus,
Ch 8.7: Domain relational calculus,
Ch 8.6.8: Safe expressions.

[Silberschatz+'20] Silberschatz, Korth, Sudarshan. Database system concepts. 7th ed 2020.
Ch 27.1: Tuple relational calculus,
Ch 27.2: Domain relational calculus,
Ch 27.1.3 & 27.2.3: Safe expressions.

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 3.1: Structure of the relational model,
Ch 3.2: Named vs. unnamed perspective,
Ch 3.3: Conventional vs. logic programming perspective,
Ch 4.2: Logicbased perspective,
Ch 5.3: Relational calculus, domain independence,
Ch 5.4: Syntactic Restrictions for Domain Independence.

[BarkerPlummer+'11]: BarkerPlummer, Barwise, Etchemendy. Language, Proof and Logic. 2nd ed. 2011.
Ch 11: Multiple Quantifiers.

Unit 3: Relational Algebra, Codd's theorem

[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 4.2: Relational algebra.

[Complete'08] GarciaMolina, Ullman, Widom. Database Systems. 2nd ed 2008.
Ch 2.4: Relational algebra,
Ch 5.15.2: Relational Algebra and bags.

[Elmasri, Navathe'15] Fundamentals of Database Systems. 7th ed 2015.
Ch 8: Relational Algebra.

[Silberschatz+'20] Silberschatz, Korth, Sudarshan. Database system concepts. 7th ed 2020.
Ch 2.6: Relational algebra.

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 4.4: Algebraic perspectives,
Ch 5.35.4: Codd's theorem: Equivalence of algebra and calculus.

Unit 4: Datalog, stable models

[Complete'08] GarciaMolina, Ullman, Widom. Database Systems. 2nd ed. 2008.
Ch 5.35.4: Datalog and bags,
Ch 10.2: Recursion in SQL.

[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 24.124.2: Datalog.

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 12.2: Datalog and modeltheoretic semantics,
Ch 12.3: Fixpoint semantics,
Ch 13.1: Seminaive evaluation,
Ch 15.2: Stratified semantics and precedence graphs

[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L9: Datalog evaluation,
L10: Datalog negation

[G., Suciu'10] Stable models: Gatterbauer, Suciu. Data conflict resolution using trust mappings.
SIGMOD 2010.
[PPTX slides],
[PDF slides]

Unit 5: Alternative data models

[Sadalage, Fowler'12] NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. 2012.
Safari books eBook(NEU free online access).
Ch 8: Keyvalue stoes,
Ch 9: Document databases,
Ch 10: Columnfamility stores,
Ch 11: Graph databases.

[Harrison'16] Next Generation Databases: NoSQL, NewSQL, and Big Data. 2016.
Safari books eBook (NEU free online access)].
p43p51: CAP, Eventual Consistency, Hashing,
p61p62: MongoDB,
p65p68: Graphs,
p75p80: Columns,
p145156: Data models,
p163, 164, 166: Secondary indexing.

[Stonebraker, Hellerstein'05] What Goes Around Comes Around. 2005.
Online PDF.
Also see various discussions of the paper.

[Kepner+'16] Kepner, Gadepally, Hutchison, Jananthan, Mattson, Samsi, Reuther. Associative Array Model of SQL, NoSQL, and NewSQL Databases. HPEC 2016.
Topic 2: Complexity of query evaluation & reverse data management

Unit 1: Query evaluation of conjunctive queries

[Chandra, Merlin'77] Optimal implementation of conjunctive queries in relational data bases. STOC 1977.

[Vardi'81] The Complexity of Relational Query Languages. STOC 1982.

[Vardi'00] Constraint satisfaction and database theory: a tutorial. PODS 2000.

[Kolaitis, Vardi'00]: ConjunctiveQuery Containment and Constraint Satisfaction. JCSS 2000.

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 2.1: Theoretical background,
Ch 6.2: Conjunctive queries & homomorphisms & query containment.

[Kolaitis'16]: Logic and Databases. Logical Structures in Computation Boot Camp, Berkeley 2016.
Firstorder logic as a database query language,
the query evaluation problem,
the query containment problem,
data complexity and combined complexity,
conjunctive queries.

[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L1: conjunctive queries,
L2: query containment.

Socratica on abstract algebra (Youtube):
why groups,
groups with examples,
Cayley tables for groups,
group homomorphisms,
group isomorphisms

3blue1brown on group homomorphisms (Youtube):
Euler's formula with introductory group theory,
Euler's number e

Unit 2: Beyond conjunctive queries

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 6.3: Undecidability of equivalence for calculus.

[Kolaitis'16]: Logic and Databases. Logical Structures in Computation Boot Camp, Berkeley 2016.
unions of conjunctive queries,
undecidability of query containment for unions of conjunctive queries.

[Tan+'14] Tan, Van den Bussche, Zhang. Undecidability of satisfiability in the algebra of ﬁnite binary relations with union, composition, and difference. Corr 1406.0349.

[G.'11] Gatterbauer. Databases will visualize queries too. VLDB 2011.
Video,
Slides.

[Martens'17] Short summary of longer papers below Optimizing tree pattern queries: why cutting is not enough (invited talk). STOC 2017.
Czerwinski, Martens, Niewerth, Parys. Optimizing Tree Patterns for Querying Graph and TreeStructured Data. SIGMOD Record 2017.
Czerwinski, Martens, Niewerth, Parys. Minimization of Tree Patterns. JACM 2018.

Unit 3: Provenance

[Buneman+'02] Buneman, Khanna, Tan. On Propagation of Deletions and Annotations Through Views. PODS 2002.

[Cheney+'09] Cheney, Chiticariu, Tan. Provenance in databases: why, how, and where. Foundations and trends in databases 2009.

[Green+'07] Green, Karvounarakis, Tannen. Provenance Semirings. PODS 2007.

[Green, Tannen'17] PODS 2017 Test of Time Award: The Semiring Framework for Database Provenance. PODS 2017. (Slides).
See also:
Val Tannen's EDBT 2010 keynote: Provenance for Database Transformations. EDBT 2010.
Val Tannen's talk @ Logical Structures in Computation Boot Camp (including video):
Provenance for Database Transformations, Berkeley 2016.

[Kepner+'16] Kepner, Gadepally, Hutchison, Jananthan, Mattson, Samsi, Reuther. Associative Array Model of SQL, NoSQL, and NewSQL Databases. HPEC 2016.

[Buneman, Tang'18] Data Provenance: What next? SIGMOD Record 2018.

Socratica on abstract algebra (Youtube):
Rings with examples

Unit 4: Reverse data management

[Buneman+'02] Buneman, Khanna, Tan. On Propagation of Deletions and Annotations Through Views. PODS 2002.

[Kimelfeld+12] Kimelfeld, Vondrak, Williams. Maximizing Conjunctive Views in Deletion Propagation. TODS 2012.

[Freire+'15] Freire, Gatterbauer, Immerman, Meliou. The complexity of resilience and responsibility for selfjoinfree conjunctive queries. PVLDB 2015.

[Freire+'20] Freire, Gatterbauer, Immerman, Meliou. New results for the complexity of resilience for binary conjunctive queries with selfjoins. PODS 2020.

[Meliou+'11] Meliou, Gatterbauer, Suciu.
Reverse data management.
PVLDB 2011 (challenges & vision track).

[Guidotti+'18] Guidotti, Monreale, Ruggieri, Turini, Giannotti, Pedreschi. A Survey of Methods for Explaining Black Box Models. ACM Computing Surveys 2018.
Topic 3: Efficient query evaluation & factorized representations & normalization

Unit 1: Acyclic queries

[Yannakakis'81] Algorithms for acyclic database schemes. VLDB 1981.

[Bernstein, Chiu'81]
Bernstein, Chiu. Using semijoins to solve relational queries. JACM 1981.

[Bernstein, Goodman'81] Power of natural semijoins. SIAM J., 1981.

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 6.4: Acyclic joins, semijoin reduction, Yannakakis, GYO algorithm.

[Gottlob+02] Gottlob, Leone, Scarcello. Hypertree Decompositions and Tractable Queries. JCSS 2002.

[Ullman'01] CS345 Lecture Notes. 2001.
Acyclic hypergraphs.
Computing acyclic joins, Yannakakis' algorithm.

[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L4: acyclic queries,

[BraultBaron'16] Hypergraph Acyclicity Revisited. ACM Computing Surveys 2016

Unit 2: Cyclic queries

[AGM'13] AGM bound:Atserias, Grohe, Marx. Size bounds and query plans for relational joins. SIAM J. Comput. 2013.

[NPRR'18] WCO joins: Ngo, Porat, Re, Rudra. Worstcase optimal join algorithms. JACM 2018.

[KNR'17] FAQ: Khamis, Ngo, Rudra. Juggling functions inside a database. SIGMOD record 2017.

[KNS'17] Khamis, Ngo, Suciu. What do Shannontype inequalities, submodular width, and disjunctive Datalog have to do with one another? PODS 2017.

Unit 3: Factorized representations

Unit 4: Rankings

[Fagin01] Fagin's algorithm: Fagin. Combining Fuzzy Information from Multiple Systems. JCSS 2001. (PODS 1996)

[Fagin+03] Threshold algorithm: Fagin, Lotem, Naor. Optimal aggregation algorithms for middleware. JCSS 2003. (PODS 2001)

[Roughgarden'10]: CS369N: Beyond WorstCase Analysis Lecture #1: Instance Optimality. Stanford 2010.

[Ilyas+08] Ilyas, Beskales, Soliman. A survey of topk query processing techniques in relational database systems. ACM Computing Surveys 2008.

[Rahul, Tao'19] Rahul, Tao. A Guide to Designing Topk Indices. SIGMOD record 2019.

[Tziavelis+'19] Tziavelis, Ajwani, Gatterbauer, Riedewald, Yang. Optimal algorithms for ranked enumeration of answers to full conjunctive queries. Arxiv 2019.

[Gondran, Minoux'08]. Graphs, dioids and semirings: New Models and Algorithms. Springer 2008.

[Kepner, Jananthan'18]. Mathematics of Big Data: Spreadsheets, Databases, Matrices, and Graphs. MIT press 2018.

[Dijkstra'16]. Dijkstra’s Shortest Path Algorithm: an intuitive Youtube video.

Unit 5: Normal forms

Unit 6: Data fusion
Topic 4: Linear algebra & rankings on graphs

[Kepner, Gilbert'11] Graph algorithms in the language of linear algebra. 2011.
Ch 13.

[Easley, Kleinberg'10] Networks, crowds, and markets: reasoning about a highly connected world. 2010. Ch 14.6.

[GGKF'15] Gatterbauer, Günnemann, Koutra, Faloutsos. Linearized and singlepass belief propagation. VLDB 2015. Youtube video.

[KLG'20] Kumar, Langton, Gatterbauer. Factorized Graph Representations for SemiSupervised Learning from Sparse Data. SIGMOD 2020. Youtube video.

Also see related past research seminar: Methods and Algorithms for determining relevance in networks (Fall'13).

[3Blue1Brown]: 3Blue1Brown on Essence of linear algebra:
2. Linear combination, Span, Basis vectors,
7. Inverse matrices, column space, null space,
14. Eigenvectors, eigenvalues,
15. Abstract vector spaces,
Topic 5: Logic, uncertainty & inconsistency

Unit 1: Probabilities & Uncertainty

[Dalvi, Suciu]
Efficient query evaluation on probabilistic databases. VLDBJ 2007.

[Suciu+'11]
Suciu, Olteanu, Re, Koch.
Probabilistic databases. Morgan & Claypool 2011.

[Gatterbauer, Suciu'14]
Oblivious bounds on the probability of Boolean functions. TODS 2014.

[Gatterbauer, Suciu'17]
Dissociation and propagation for approximate lifted inference with standard relational database management systems. VLDBJ 2017.

[Chou+'18]
Chou, Gatterbauer, Gogate.
Dissociationbased Oblivious Bounds for Weighted Model Counting . UAI 2018.

[Van den Heuvel+'19]
Van den Heuvel, Ivanov, Gatterbauer, Geerts, Theobald.
Anytime Approximation in Probabilistic Databases via Scaled Dissociations. SIGMOD 2019.

Unit 2: Inconsistency
Research best practice
Surveys and additional possible topics