CS 7240: Topics and approximate agenda (Spring'21)
This schedule will be updated regularly as the course goes on. Check back here frequently!
Lecture slides will usually be posted by the end of the day following a lecture.
Topic 1: Data models and query languages

Lecture 1 (Tue 1/19):
Course introduction,
SQL (a refresher),
PostgreSQL setup,
SQL Activities

Lecture 2 (Fri 1/22):
SQL (continued)

Lecture 3 (Tue 1/26):
SQL (continued),
Relational calculus

Lecture 4 (Fri 1/29):
Relational algebra & Codd's theorem, Datalog

Lecture 5 (Tue 2/2):
Datalog and more expressive variations

Lecture 6 (Fri 2/5): (A1 due)
Datalog vs. stable model semantics

Lecture 7 (Tue 2/9):
Datalog evaluation strategies,
NoSQL
Pointers to relevant concepts & supplementary material:

1. SQL:
[SAMS'12],
[CS3200'18]
[Cow'03] Ch3 & Ch5,
[Complete'08] Ch6,
[Silberschatz+'20] Ch3.8

2. Logic:
Firstorder logic,
relational calculus:
[Barland+'08] 4.1.2 & 4.2.1 & 4.4,
[Genesereth+] Ch6,
[Halpern+'01],
[Cow'03] Ch4.3 & 4.4,
[Elmasri, Navathe'15] Ch8.6 & Ch8.7,
[Silberschatz+'20] Ch27.1 & Ch27.2,
[Alice'95] Ch3.13.3 & Ch4.2 & Ch4.4 & Ch5.35.4,
[BarkerPlummer+'11] Ch11

3. Algebra:
Relational algebra,
Codd's theorem:
[Cow'03] Ch4.2,
[Complete'08] Ch2.4 & Ch5.15.2,
[Elmasri, Navathe'15] Ch8,
[Silberschatz+'20] Ch2.6,
[Alice'95] Ch4.4 & Ch5.4

4. Datalog:
Datalog, stable model semantics:
[Complete'08] Ch5.3, [Cow'03] Ch 24,
[Koutris'19] L9 & L10,
[G., Suciu'10]

5. Data models:
Alternative data models, NoSQL:
[Hellerstein, Stonebraker'05], [Sadalage, Fowler'12], [Harrison'16]
Topic 2: Complexity of query evaluation

Lecture 8 (Fri 2/12):
Query equivalence, homomorphisms

Lecture 9 (Tue 2/16):
Homomorphism & conjunctive query containment

Lecture 10 (Fri 2/19): (A2 due)
Query minimization & nested queries

Lecture 11 (Tue 2/23):
Tree pattern queries, Provenance

Lecture 12 (Fri 2/26): (P1 due)
Resilience

Lecture 13 (Tue 3/2):
Resilience
Pointers to relevant concepts & supplementary material:

1. Query evaluation of conjunctive queries: conjunctive queries, data vs. query complexity, homomorphisms,
constraint satisfaction, query containment, query minimization, absorption:
[Kolaitis, Vardi'00], [Vardi'00],
[Kolaitis'16],
[Koutris'19] L1 & L2

2. Beyond conjunctive queries:
unions of conjunctive queries, bag semantics,
nested queries, tree pattern queries:
[Kolaitis'16],
[Tan+'14], [G.'11], [Martens'17]

3. Provenance:
[Buneman+02],
[Green+07],
[Cheney+09],
[Green,Tannen'17],
[Kepner+16],
[Buneman, Tan'18]

4. Reverse data management: update propagation, resilience:
[Buneman+02],
[Kimelfeld+12],
[Freire+15]
Topic 3: Efficient query evaluation & factorized representations

Lecture 14 (Fri 3/5):
Acyclic queries

Lecture 15 (Tue 3/9): (A3 due)
Cyclic queries

Lecture 16 (Fri 3/12): (P2 due)
Cyclic queries

Lecture 17 (Tue 3/16):
Top1, factorized databases
Pointers to relevant concepts & supplementary material:

1. Acyclic queries:
query hypergraph, Yannakakis algorithm, GYO reduction,
dynamic programming, algebraic semirings, ranked enumeration:
[Alice] Ch6.4, [Koutris'19] L4, [Tziavelis+'19]

2. Cyclic queries:
tree & hypertree decomposition,
query widths,
fractional hypertree width,
AGM bound,
worstcase optimal join algorithms,
optimal algorithms,
submodular width and multiple decompositions:
[AGM'13], [NPRR'18], [KNR'17], [KNS'17]

3. Factorized representations: [Olteanu, Schleich'16]

4. Rankings: topk: [Roughgarden'10], [Ilyas+08], [Rahul, Tao'19], [Tziavelis+'19]

5. Normal forms: normal forms and their informationtheoretic justification:
[Complete'18] Ch3, [Lee'87], [Arenas, Libkin'05]

6. Data fusion: full disjunction, subsumption:
[Dong, Naumann'09]
Topic 4: Linear algebra & rankings on graphs

Lecture 18 (Fri 3/19):
Linear algebra, random walks, and graph propagation

Lecture 19 (Tue 3/23):
Factorized graph representations
Pointers to relevant concepts & supplementary material:

linear algebra vs. relational algebra:

random walks, PageRank, label propagation, belief propagation, semisupervised learning:
Topic 5: Logic, uncertainty & inconsistency

Lecture 20 (Fri 3/26):
Probabilistic inference & approximation

Lecture 21 (Tue 3/30):
Inconsistency management
Pointers to relevant concepts & supplementary material:

Boolean functions,
query lineage,
probabilistic databases,
complexity of probabilistic inference:

network reliability,
models of propagation on graphs and hypergraphs,
scalable approximations to probabilistic inference,
extensional query plans:

measures for inconsistency,
database repairs,
semantics for inconsistency resolution:

Maximum entropy principle:
Buffer space, other topics & project presentations

Lecture 22 (Fri 4/2): (P3 due):
TBD

Lecture 23 (Tue 4/6):
TBD

Lecture 24 (Fri 4/9):
TBD

Lecture 25 (Tue 4/13):
TBD

Lecture 26 (Fri 4/16):
TBD

Lecture 27 (Tue 4/20): P4 Project presentations

Lecture 28 (Fri 4/23): P4 Project presentations
Literature list
Topic 1: Data models and query languages

U1: SQL

[SAMS'19] SAMS: Teach yourself SQL in 10min by Forta. 5th ed. 2019.
It is available for free for Northeastern students
from Safari books eBook
(you may have to first login from our library website, then try again the previous link).
If the book is checked out online, you can use the 4th edition (there is almost no difference between 4th and 5th ed) as
Safari books eBook,
or as EBSCOhost eBook.

[cs3200] PostgreSQL setup,
PgAdmin 4 tutorial.
Files to follow along our SQL lectures: SQL Activities.

[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 5: SQL.

[Complete'08] GarciaMolina, Ullman, Widom. Database Systems. 2nd ed 2008.
Ch 6: SQL.

[Elmasri, Navathe'15] Fundamentals of Database Systems. 7th ed 2015.
Ch 6: SQL

[Silberschatz+'20] Silberschatz, Korth, Sudarshan. Database system concepts. 7th ed 2020.
Ch 3.8: Nested subqueries.

U2: Logic, relational calculus

[Barland+'08] Barland, Kolaitis, Vardi, Felleisen, Greiner.
Intro to Logic,
(alternative PDF version).
4.1.2 FirstOrder Logic: bound variables, free variables,
4.2.1 FirstOrder Logic: equivalences,
4.4 Exercises for FirstOrder Logic.

[Genesereth+] Genesereth et al. Introduction to logics.
Ch 6: Relational Logic.

[Halpern+'01] Halpern, Harper, Immerman, Kolaitis, Vardi, Vianu.
On the Unusual Effectiveness of Logic in Computer Science.
Bulletin of Symbolic Logic 2001.

[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 4.3: Relational calculus,
Ch 4.4: Safety.

[Elmasri, Navathe'15] Fundamentals of Database Systems. 7th ed 2015.
Ch 8.6: Tuple relational calculus,
Ch 8.7: Domain relational calculus,
Ch 8.6.8: Safe expressions.

[Silberschatz+'20] Silberschatz, Korth, Sudarshan. Database system concepts. 7th ed 2020.
Ch 27.1: Tuple relational calculus,
Ch 27.2: Domain relational calculus,
Ch 27.1.3 & 27.2.3: Safe expressions.

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 3.1: Structure of the relational model,
Ch 3.2: Named vs. unnamed perspective,
Ch 3.3: Conventional vs. logic programming perspective,
Ch 4.2: Logicbased perspective,
Ch 5.3: Relational calculus, domain independence,
Ch 5.4: Syntactic Restrictions for Domain Independence.

[BarkerPlummer+'11]: BarkerPlummer, Barwise, Etchemendy. Language, Proof and Logic. 2nd ed. 2011.
Ch 11: Multiple Quantifiers.

U3: Relational Algebra, Codd's theorem

[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 4.2: Relational algebra.

[Complete'08] GarciaMolina, Ullman, Widom. Database Systems. 2nd ed 2008.
Ch 2.4: Relational algebra,
Ch 5.15.2: Relational Algebra and bags.

[Elmasri, Navathe'15] Fundamentals of Database Systems. 7th ed 2015.
Ch 8: Relational Algebra.

[Silberschatz+'20] Silberschatz, Korth, Sudarshan. Database system concepts. 7th ed 2020.
Ch 2.6: Relational algebra.

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 4.4: Algebraic perspectives,
Ch 5.35.4: Codd's theorem: Equivalence of algebra and calculus.

U4: Datalog, stable models

[Complete'08] GarciaMolina, Ullman, Widom. Database Systems. 2nd ed. 2008.
Ch 5.35.4: Datalog and bags,
Ch 10.2: Recursion in SQL.

[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 24.124.2: Datalog.

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 12.2: Datalog and modeltheoretic semantics,
Ch 12.3: Fixpoint semantics,
Ch 13.1: Seminaive evaluation,
Ch 15.2: Stratified semantics and precedence graphs

[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L9: Datalog evaluation,
L10: Datalog negation

[G., Suciu'10] Stable models: Gatterbauer, Suciu. Data conflict resolution using trust mappings.
SIGMOD 2010.
[PPTX slides],
[PDF slides]

U5: Alternative data models

[Sadalage, Fowler'12] NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. 2012.
Safari books eBook(NEU free online access).
Ch 8: Keyvalue stoes,
Ch 9: Document databases,
Ch 10: Columnfamility stores,
Ch 11: Graph databases.

[Harrison'16] Next Generation Databases: NoSQL, NewSQL, and Big Data. 2016.
Safari books eBook (NEU free online access)].
p43p51: CAP, Eventual Consistency, Hashing,
p61p62: MongoDB,
p65p68: Graphs,
p75p80: Columns,
p145156: Data models,
p163, 164, 166: Secondary indexing.

[Stonebraker, Hellerstein'05] What Goes Around Comes Around. 2005.
Online PDF.
Also see various discussions of the paper.

[Kepner+'16] Kepner, Gadepally, Hutchison, Jananthan, Mattson, Samsi, Reuther. Associative Array Model of SQL, NoSQL, and NewSQL Databases. HPEC 2016.
Topic 2: Complexity of query evaluation & reverse data management

U1: Query evaluation of conjunctive queries

[Chandra, Merlin'77] Optimal implementation of conjunctive queries in relational data bases. STOC 1977.

[Vardi'81] The Complexity of Relational Query Languages. STOC 1982.

[Vardi'00] Constraint satisfaction and database theory: a tutorial. PODS 2000.

[Kolaitis, Vardi'00]: ConjunctiveQuery Containment and Constraint Satisfaction. JCSS 2000.

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 2.1: Theoretical background,
Ch 6.2: Conjunctive queries & homomorphisms & query containment.

[Kolaitis'16]: Logic and Databases. Logical Structures in Computation Boot Camp, Berkeley 2016.
Firstorder logic as a database query language,
the query evaluation problem,
the query containment problem,
data complexity and combined complexity,
conjunctive queries.

[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L1: conjunctive queries,
L2: query containment.

Socratica on abstract algebra (Youtube):
why groups,
groups with examples,
Cayley tables for groups,
group homomorphisms,
group isomorphisms

3blue1brown on group homomorphisms (Youtube):
Euler's formula with introductory group theory,
Euler's number e

U2: Beyond conjunctive queries

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 6.3: Undecidability of equivalence for calculus.

[Kolaitis'16]: Logic and Databases. Logical Structures in Computation Boot Camp, Berkeley 2016.
unions of conjunctive queries,
undecidability of query containment for unions of conjunctive queries.

[Tan+'14] Tan, Van den Bussche, Zhang. Undecidability of satisfiability in the algebra of ﬁnite binary relations with union, composition, and difference. Corr 1406.0349.

[G.'11] Gatterbauer. Databases will visualize queries too. VLDB 2011.
Video,
Slides.

[Martens'17] Short summary of longer papers below Optimizing tree pattern queries: why cutting is not enough (invited talk). STOC 2017.
Czerwinski, Martens, Niewerth, Parys. Optimizing Tree Patterns for Querying Graph and TreeStructured Data. SIGMOD Record 2017.
Czerwinski, Martens, Niewerth, Parys. Minimization of Tree Patterns. JACM 2018.

U3: Provenance

[Buneman+'02] Buneman, Khanna, Tan. On Propagation of Deletions and Annotations Through Views. PODS 2002.

[Cheney+'09] Cheney, Chiticariu, Tan. Provenance in databases: why, how, and where. Foundations and trends in databases 2009.

[Green+'07] Green, Karvounarakis, Tannen. Provenance Semirings. PODS 2007.

[Green, Tannen'17] PODS 2017 Test of Time Award: The Semiring Framework for Database Provenance. PODS 2017. (Slides).
See also:
Val Tannen's EDBT 2010 keynote: Provenance for Database Transformations. EDBT 2010.
Val Tannen's talk @ Logical Structures in Computation Boot Camp (including video):
Provenance for Database Transformations, Berkeley 2016.

[Kepner+'16] Kepner, Gadepally, Hutchison, Jananthan, Mattson, Samsi, Reuther. Associative Array Model of SQL, NoSQL, and NewSQL Databases. HPEC 2016.

[Buneman, Tang'18] Data Provenance: What next? SIGMOD Record 2018.

Socratica on abstract algebra (Youtube):
Rings with examples

U4: Reverse data management

[Buneman+'02] Buneman, Khanna, Tan. On Propagation of Deletions and Annotations Through Views. PODS 2002.

[Kimelfeld+12] Kimelfeld, Vondrak, Williams. Maximizing Conjunctive Views in Deletion Propagation. TODS 2012.

[Freire+'15] Freire, Gatterbauer, Immerman, Meliou. The complexity of resilience and responsibility for selfjoinfree conjunctive queries. PVLDB 2015.

[Freire+'20] Freire, Gatterbauer, Immerman, Meliou. New results for the complexity of resilience for binary conjunctive queries with selfjoins. PODS 2020.

[Guidotti+'18] Guidotti, Monreale, Ruggieri, Turini, Giannotti, Pedreschi. A Survey of Methods for Explaining Black Box Models. ACM Computing Surveys 2018.
Topic 3: Efficient query evaluation & factorized representations & normalization

U1: Acyclic queries

[Yannakakis'81] Algorithms for acyclic database schemes. VLDB 1981.

[Bernstein, Chiu'81]
Bernstein, Chiu. Using semijoins to solve relational queries. JACM 1981.

[Bernstein, Goodman'81] Power of natural semijoins. SIAM J., 1981.

[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 6.4: Acyclic joins, semijoin reduction, Yannakakis, GYO algorithm.

[Gottlob+02] Gottlob, Leone, Scarcello. Hypertree Decompositions and Tractable Queries. JCSS 2002.

[Ullman'01] CS345 Lecture Notes. 2001.
Acyclic hypergraphs.
Computing acyclic joins, Yannakakis' algorithm.

[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L4: acyclic queries,

[BraultBaron'16] Hypergraph Acyclicity Revisited. ACM Computing Surveys 2016

U2: Cyclic queries

[AGM'13] AGM bound:Atserias, Grohe, Marx. Size bounds and query plans for relational joins. SIAM J. Comput. 2013.

[NPRR'18] WCO joins: Ngo, Porat, Re, Rudra. Worstcase optimal join algorithms. JACM 2018.

[KNR'17] FAQ: Khamis, Ngo, Rudra. Juggling functions inside a database. SIGMOD record 2017.

[KNS'17] Khamis, Ngo, Suciu. What do Shannontype inequalities, submodular width, and disjunctive Datalog have to do with one another? PODS 2017.

U3: Factorized representations

U4: Rankings

[Fagin01] Fagin's algorithm: Fagin. Combining Fuzzy Information from Multiple Systems. JCSS 2001. (PODS 1996)

[Fagin+03] Threshold algorithm: Fagin, Lotem, Naor. Optimal aggregation algorithms for middleware. JCSS 2003. (PODS 2001)

[Roughgarden'10]: CS369N: Beyond WorstCase Analysis Lecture #1: Instance Optimality. Stanford 2010.

[Ilyas+08] Ilyas, Beskales, Soliman. A survey of topk query processing techniques in relational database systems. ACM Computing Surveys 2008.

[Rahul, Tao'19] Rahul, Tao. A Guide to Designing Topk Indices. SIGMOD record 2019.

[Tziavelis+'19] Tziavelis, Ajwani, Gatterbauer, Riedewald, Yang. Optimal algorithms for ranked enumeration of answers to full conjunctive queries. Arxiv 2019.

[Gondran, Minoux'08]. Graphs, dioids and semirings: New Models and Algorithms. Springer 2008.

[Kepner, Jananthan'18]. Mathematics of Big Data: Spreadsheets, Databases, Matrices, and Graphs. MIT press 2018.

[Dijkstra'16]. Dijkstra’s Shortest Path Algorithm: an intuitive Youtube video.

U5: Normal forms

U6: Data fusion
Topic 4: Linear algebra

[Kepner, Gilbert'11] Graph algorithms in the language of linear algebra. 2011.
Ch 13.

[Easley, Kleinberg'10] Networks, crowds, and markets: reasoning about a highly connected world. 2010. Ch 14.6.

[GGKF'15] Gatterbauer, Günnemann, Koutra, Faloutsos. Linearized and singlepass belief propagation. VLDB 2015. Youtube video.

[KLG'20] Kumar, Langton, Gatterbauer. Factorized Graph Representations for SemiSupervised Learning from Sparse Data. SIGMOD 2020. Youtube video.

Also see related past research seminar: Methods and Algorithms for determining relevance in networks (Fall'13).

[3Blue1Brown]: 3Blue1Brown on Essence of linear algebra:
2. Linear combination, Span, Basis vectors,
7. Inverse matrices, column space, null space,
14. Eigenvectors, eigenvalues,
15. Abstract vector spaces,
Topic 5: Logic, uncertainty & inconsistency
Research best practice
Surveys and additional possible topics