CS 7240: Topics and approximate agenda (Spring'21)
This schedule will be updated regularly as the course goes on. Check back here frequently!
Lecture slides will usually be posted by the end of the day following a lecture.
Either here on this website or in Canvas if I feel they are not yet ready to be released in public.
Topic 1: Data models and query languages
-
Lecture 1 (Tue 1/19):
Course introduction,
SQL,
PostgreSQL setup,
SQL Activities
-
Lecture 2 (Fri 1/22):
SQL
-
Lecture 3 (Tue 1/26):
SQL,
Logic
-
Lecture 4 (Fri 1/29):
Logic
-
Lecture 5 (Tue 2/2):
Logic, Relational Algebra
-
Lecture 6 (Fri 2/5):
Relational Algebra
-
Lecture 7 (Tue 2/9):
Relational Algebra,
Datalog & Recursion
-
Lecture 8 (Fri 2/16):
Datalog & Recursion
-
Lecture 9 (Tue 2/19):
Datalog & Recursion,
Alternative Data Models
Pointers to relevant concepts & supplementary material:
-
Unit 1. SQL:
[SAMS'12],
[CS3200'18]
[Cow'03] Ch3 & Ch5,
[Complete'08] Ch6,
[Silberschatz+'20] Ch3.8
-
Unit 2. Logic:
first-order logic (FOL),
relational calculus (RC):
[Barland+'08] 4.1.2 & 4.2.1 & 4.4,
[Genesereth+] Ch6,
[Halpern+'01],
[Cow'03] Ch4.3 & 4.4,
[Elmasri, Navathe'15] Ch8.6 & Ch8.7,
[Silberschatz+'20] Ch27.1 & Ch27.2,
[Alice'95] Ch3.1-3.3 & Ch4.2 & Ch4.4 & Ch5.3-5.4,
[Barker-Plummer+'11] Ch11
-
Unit 3. Relational Algebra:
relational algebra (RA) &
Codd's theorem:
[Cow'03] Ch4.2,
[Complete'08] Ch2.4 & Ch5.1-5.2,
[Elmasri, Navathe'15] Ch8,
[Silberschatz+'20] Ch2.6,
[Alice'95] Ch4.4 & Ch5.4
-
Unit 4. Datalog & Recursion:
Datalog, recursion, stratified Datalog with negation,
Datalog evaluation strategies
stable model semantics:
[Complete'08] Ch5.3, [Cow'03] Ch 24,
[Koutris'19] L9 & L10,
[G., Suciu'10]
-
Unit 5. Alternative Data Models:
NoSQL:
[Hellerstein, Stonebraker'05], [Sadalage, Fowler'12], [Harrison'16]
Topic 2: Complexity of query evaluation
-
Lecture 10 (Fri 2/12):
Conjunctive Queries
-
Lecture 11 (Tue 2/23):
Conjunctive Queries
-
Lecture 12 (Fri 2/26): (P1 due)
Conjunctive Queries, Beyond Conjunctive Queries
-
Lecture 13 (Tue 3/2):
Beyond Conjunctive Queries, Provenance
-
Lecture 14 (Fri 3/5):
Provenance, Reverse Data Management
Pointers to relevant concepts & supplementary material:
-
Unit 1. Conjunctive Queries:
query evaluation of conjunctive queries (CQs),
data vs. query complexity, homomorphisms,
constraint satisfaction (CS), query containment, query minimization, absorption:
[Kolaitis, Vardi'00], [Vardi'00],
[Kolaitis'16],
[Koutris'19] L1 & L2
-
Unit 2. Beyond Conjunctive Queries:
unions of conjunctive queries, bag semantics,
nested queries, tree pattern queries:
[Kolaitis'16],
[Tan+'14], [G.'11], [Martens'17]
-
Unit 3. Provenance:
provenance & semirings:
[Buneman+02],
[Green+07],
[Cheney+09],
[Green,Tannen'17],
[Kepner+16],
[Buneman, Tan'18]
-
Unit 4. Reverse Data Management:
update propagation, resilience:
[Buneman+02],
[Kimelfeld+12],
[Freire+15],
[Meliou+'11]
Topic 3: Efficient query evaluation & factorized representations
-
Lecture 15 (Tue 3/9):
Acyclic queries
-
Lecture 16 (Fri 3/12): (P2 due)
Acyclic queries
-
Lecture 17 (Tue 3/16):
Acyclic queries,
Cyclic queries
-
Lecture 18 (Fri 3/19):
Cyclic queries
-
Lecture 19 (Tue 3/23):
Cyclic queries, Factorized Databases
-
Lecture 20 (Fri 3/26):
Factorized Databases
-
Lecture 21 (Tue 4/2):
Top-k
-
Lecture 22 (Fri 4/2): (P3 due):
Top-k
-
Lecture 23 (Tue 4/6):
Data fusion & normalization
Pointers to relevant concepts & supplementary material:
-
Unit 1. Acyclic queries:
query hypergraph, Yannakakis algorithm, GYO reduction,
dynamic programming (DP),
algebraic semirings,
ranked enumeration:
[Alice] Ch6.4, [Koutris'19] L4, [Tziavelis+'19]
-
Unit 2. Cyclic queries:
tree & hypertree decomposition,
query widths,
fractional hypertree width,
AGM bound,
worst-case optimal (WCO) join algorithms ,
submodular width and multiple decompositions:
[AGM'13], [NPRR'18], [KNR'17], [KNS'17]
-
Unit 3. Factorized databases:
factorized representations, normalization,
factorized databases:
[Olteanu, Schleich'16]
-
Unit 4. Top-k:
shortest paths, dynamic programming (DP),
Yannakakis, Semirings,
Rankings,
top-k: [Roughgarden'10], [Ilyas+08], [Rahul, Tao'19], Ranked Enumeration [Tziavelis+'19]
-
Unit 5. Normal forms:
normal forms and their information-theoretic justification:
[Complete'18] Ch3, [Lee'87], [Arenas, Libkin'05]
-
Unit 6. Data fusion: full disjunction, subsumption:
[Dong, Naumann'09]
Topic 4: Linear algebra & rankings on graphs
-
Lecture 24 (Fri 4/9):
Graphs
-
Lecture 25 (Tue 4/13):
Graphs
Pointers to relevant concepts & supplementary material:
-
Unit 1. Graphs:
Linear algebra (LA),
random walks with restarts (RWR), PageRank,
graph propagation, label propagation,
belief propagation (BP),
semi-supervised learning (SSL),
factorized graph representations
Topic 5: Logic, uncertainty & inconsistency
-
Lecture 26 (Fri 4/16):
Probabilities & Uncertainty
Pointers to relevant concepts & supplementary material:
-
Unit 1. Probabilities & Uncertainty:
Probabilistic inference & approximation, Boolean functions,
query lineage, probabilistic databases,
extensional query plans,
complexity of probabilistic inference,
network reliability,
models of propagation on graphs and hypergraphs,
scalable approximations to probabilistic inference
-
Unit 2. Inconsistency:
measures for inconsistency,
database repairs,
semantics for inconsistency resolution,
maximum entropy principle:
Project presentations
-
Lecture 27 (Tue 4/20): P4 Project presentations
-
Lecture 28 (Fri 4/23): P4 Project presentations
Literature list
Topic 1: Data models and query languages
-
Unit 1: SQL
-
[SAMS'19] SAMS: Teach yourself SQL in 10min by Forta. 5th ed. 2019.
It is available for free for Northeastern students
from Safari books eBook
(you may have to first login from our library website, then try again the previous link).
If the book is checked out online, you can use the 4th edition (there is almost no difference between 4th and 5th ed) as
Safari books eBook,
or as EBSCOhost eBook.
-
[cs3200] PostgreSQL setup,
PgAdmin 4 tutorial.
Files to follow along our SQL lectures: SQL Activities.
-
[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 5: SQL.
-
[Complete'08] Garcia-Molina, Ullman, Widom. Database Systems. 2nd ed 2008.
Ch 6: SQL.
-
[Elmasri, Navathe'15] Fundamentals of Database Systems. 7th ed 2015.
Ch 6: SQL
-
[Silberschatz+'20] Silberschatz, Korth, Sudarshan. Database system concepts. 7th ed 2020.
Ch 3.8: Nested subqueries.
-
Unit 2: Logic, relational calculus
-
[Barland+'08] Barland, Kolaitis, Vardi, Felleisen, Greiner.
Intro to Logic,
(alternative PDF version).
4.1.2 First-Order Logic: bound variables, free variables,
4.2.1 First-Order Logic: equivalences,
4.4 Exercises for First-Order Logic.
-
[Genesereth+] Genesereth et al. Introduction to logics.
Ch 6: Relational Logic.
-
[Halpern+'01] Halpern, Harper, Immerman, Kolaitis, Vardi, Vianu.
On the Unusual Effectiveness of Logic in Computer Science.
Bulletin of Symbolic Logic 2001.
-
[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 4.3: Relational calculus,
Ch 4.4: Safety.
-
[Elmasri, Navathe'15] Fundamentals of Database Systems. 7th ed 2015.
Ch 8.6: Tuple relational calculus,
Ch 8.7: Domain relational calculus,
Ch 8.6.8: Safe expressions.
-
[Silberschatz+'20] Silberschatz, Korth, Sudarshan. Database system concepts. 7th ed 2020.
Ch 27.1: Tuple relational calculus,
Ch 27.2: Domain relational calculus,
Ch 27.1.3 & 27.2.3: Safe expressions.
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 3.1: Structure of the relational model,
Ch 3.2: Named vs. unnamed perspective,
Ch 3.3: Conventional vs. logic programming perspective,
Ch 4.2: Logic-based perspective,
Ch 5.3: Relational calculus, domain independence,
Ch 5.4: Syntactic Restrictions for Domain Independence.
-
[Barker-Plummer+'11]: Barker-Plummer, Barwise, Etchemendy. Language, Proof and Logic. 2nd ed. 2011.
Ch 11: Multiple Quantifiers.
-
Unit 3: Relational Algebra, Codd's theorem
-
[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 4.2: Relational algebra.
-
[Complete'08] Garcia-Molina, Ullman, Widom. Database Systems. 2nd ed 2008.
Ch 2.4: Relational algebra,
Ch 5.1-5.2: Relational Algebra and bags.
-
[Elmasri, Navathe'15] Fundamentals of Database Systems. 7th ed 2015.
Ch 8: Relational Algebra.
-
[Silberschatz+'20] Silberschatz, Korth, Sudarshan. Database system concepts. 7th ed 2020.
Ch 2.6: Relational algebra.
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 4.4: Algebraic perspectives,
Ch 5.3-5.4: Codd's theorem: Equivalence of algebra and calculus.
-
Unit 4: Datalog, stable models
-
[Complete'08] Garcia-Molina, Ullman, Widom. Database Systems. 2nd ed. 2008.
Ch 5.3-5.4: Datalog and bags,
Ch 10.2: Recursion in SQL.
-
[Cow'03] Ramakrishnan, Gehrke. Database Management Systems. 3rd ed 2003.
Ch 24.1-24.2: Datalog.
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 12.2: Datalog and model-theoretic semantics,
Ch 12.3: Fixpoint semantics,
Ch 13.1: Semi-naive evaluation,
Ch 15.2: Stratified semantics and precedence graphs
-
[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L9: Datalog evaluation,
L10: Datalog negation
-
[G., Suciu'10] Stable models: Gatterbauer, Suciu. Data conflict resolution using trust mappings.
SIGMOD 2010.
[PPTX slides],
[PDF slides]
-
Unit 5: Alternative data models
-
[Sadalage, Fowler'12] NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. 2012.
Safari books eBook(NEU free online access).
Ch 8: Key-value stoes,
Ch 9: Document databases,
Ch 10: Column-famility stores,
Ch 11: Graph databases.
-
[Harrison'16] Next Generation Databases: NoSQL, NewSQL, and Big Data. 2016.
Safari books eBook (NEU free online access)].
p43-p51: CAP, Eventual Consistency, Hashing,
p61-p62: MongoDB,
p65-p68: Graphs,
p75-p80: Columns,
p145-156: Data models,
p163, 164, 166: Secondary indexing.
-
[Stonebraker, Hellerstein'05] What Goes Around Comes Around. 2005.
Online PDF.
Also see various discussions of the paper.
-
[Kepner+'16] Kepner, Gadepally, Hutchison, Jananthan, Mattson, Samsi, Reuther. Associative Array Model of SQL, NoSQL, and NewSQL Databases. HPEC 2016.
Topic 2: Complexity of query evaluation & reverse data management
-
Unit 1: Query evaluation of conjunctive queries
-
[Chandra, Merlin'77] Optimal implementation of conjunctive queries in relational data bases. STOC 1977.
-
[Vardi'81] The Complexity of Relational Query Languages. STOC 1982.
-
[Vardi'00] Constraint satisfaction and database theory: a tutorial. PODS 2000.
-
[Kolaitis, Vardi'00]: Conjunctive-Query Containment and Constraint Satisfaction. JCSS 2000.
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 2.1: Theoretical background,
Ch 6.2: Conjunctive queries & homomorphisms & query containment.
-
[Kolaitis'16]: Logic and Databases. Logical Structures in Computation Boot Camp, Berkeley 2016.
First-order logic as a database query language,
the query evaluation problem,
the query containment problem,
data complexity and combined complexity,
conjunctive queries.
-
[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L1: conjunctive queries,
L2: query containment.
-
Socratica on abstract algebra (Youtube):
why groups,
groups with examples,
Cayley tables for groups,
group homomorphisms,
group isomorphisms
-
3blue1brown on group homomorphisms (Youtube):
Euler's formula with introductory group theory,
Euler's number e
-
Unit 2: Beyond conjunctive queries
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 6.3: Undecidability of equivalence for calculus.
-
[Kolaitis'16]: Logic and Databases. Logical Structures in Computation Boot Camp, Berkeley 2016.
unions of conjunctive queries,
undecidability of query containment for unions of conjunctive queries.
-
[Tan+'14] Tan, Van den Bussche, Zhang. Undecidability of satisfiability in the algebra of finite binary relations with union, composition, and difference. Corr 1406.0349.
-
[G.'11] Gatterbauer. Databases will visualize queries too. VLDB 2011.
Video,
Slides.
-
[Martens'17] Short summary of longer papers below Optimizing tree pattern queries: why cutting is not enough (invited talk). STOC 2017.
Czerwinski, Martens, Niewerth, Parys. Optimizing Tree Patterns for Querying Graph- and Tree-Structured Data. SIGMOD Record 2017.
Czerwinski, Martens, Niewerth, Parys. Minimization of Tree Patterns. JACM 2018.
-
Unit 3: Provenance
-
[Buneman+'02] Buneman, Khanna, Tan. On Propagation of Deletions and Annotations Through Views. PODS 2002.
-
[Cheney+'09] Cheney, Chiticariu, Tan. Provenance in databases: why, how, and where. Foundations and trends in databases 2009.
-
[Green+'07] Green, Karvounarakis, Tannen. Provenance Semirings. PODS 2007.
-
[Green, Tannen'17] PODS 2017 Test of Time Award: The Semiring Framework for Database Provenance. PODS 2017. (Slides).
See also:
Val Tannen's EDBT 2010 keynote: Provenance for Database Transformations. EDBT 2010.
Val Tannen's talk @ Logical Structures in Computation Boot Camp (including video):
Provenance for Database Transformations, Berkeley 2016.
-
[Kepner+'16] Kepner, Gadepally, Hutchison, Jananthan, Mattson, Samsi, Reuther. Associative Array Model of SQL, NoSQL, and NewSQL Databases. HPEC 2016.
-
[Buneman, Tang'18] Data Provenance: What next? SIGMOD Record 2018.
-
Socratica on abstract algebra (Youtube):
Rings with examples
-
Unit 4: Reverse data management
-
[Buneman+'02] Buneman, Khanna, Tan. On Propagation of Deletions and Annotations Through Views. PODS 2002.
-
[Kimelfeld+12] Kimelfeld, Vondrak, Williams. Maximizing Conjunctive Views in Deletion Propagation. TODS 2012.
-
[Freire+'15] Freire, Gatterbauer, Immerman, Meliou. The complexity of resilience and responsibility for self-join-free conjunctive queries. PVLDB 2015.
-
[Freire+'20] Freire, Gatterbauer, Immerman, Meliou. New results for the complexity of resilience for binary conjunctive queries with self-joins. PODS 2020.
-
[Meliou+'11] Meliou, Gatterbauer, Suciu.
Reverse data management.
PVLDB 2011 (challenges & vision track).
-
[Guidotti+'18] Guidotti, Monreale, Ruggieri, Turini, Giannotti, Pedreschi. A Survey of Methods for Explaining Black Box Models. ACM Computing Surveys 2018.
Topic 3: Efficient query evaluation & factorized representations & normalization
-
Unit 1: Acyclic queries
-
[Yannakakis'81] Algorithms for acyclic database schemes. VLDB 1981.
-
[Bernstein, Chiu'81]
Bernstein, Chiu. Using semi-joins to solve relational queries. JACM 1981.
-
[Bernstein, Goodman'81] Power of natural semi-joins. SIAM J., 1981.
-
[Alice'95] Abiteboul, Hull, Vianu. Foundations of Databases. 1995.
Ch 6.4: Acyclic joins, semi-join reduction, Yannakakis, GYO algorithm.
-
[Gottlob+02] Gottlob, Leone, Scarcello. Hypertree Decompositions and Tractable Queries. JCSS 2002.
-
[Ullman'01] CS345 Lecture Notes. 2001.
Acyclic hypergraphs.
Computing acyclic joins, Yannakakis' algorithm.
-
[Koutris'19] Lecture notes from CS784 Wisconsin 2019.
L4: acyclic queries,
-
[Brault-Baron'16] Hypergraph Acyclicity Revisited. ACM Computing Surveys 2016
-
Unit 2: Cyclic queries
-
[AGM'13] AGM bound:Atserias, Grohe, Marx. Size bounds and query plans for relational joins. SIAM J. Comput. 2013.
-
[NPRR'18] WCO joins: Ngo, Porat, Re, Rudra. Worst-case optimal join algorithms. JACM 2018.
-
[KNR'17] FAQ: Khamis, Ngo, Rudra. Juggling functions inside a database. SIGMOD record 2017.
-
[KNS'17] Khamis, Ngo, Suciu. What do Shannon-type inequalities, submodular width, and disjunctive Datalog have to do with one another? PODS 2017.
-
Unit 3: Factorized representations
-
Unit 4: Rankings
-
[Fagin01] Fagin's algorithm: Fagin. Combining Fuzzy Information from Multiple Systems. JCSS 2001. (PODS 1996)
-
[Fagin+03] Threshold algorithm: Fagin, Lotem, Naor. Optimal aggregation algorithms for middleware. JCSS 2003. (PODS 2001)
-
[Roughgarden'10]: CS369N: Beyond Worst-Case Analysis Lecture #1: Instance Optimality. Stanford 2010.
-
[Ilyas+08] Ilyas, Beskales, Soliman. A survey of top-k query processing techniques in relational database systems. ACM Computing Surveys 2008.
-
[Rahul, Tao'19] Rahul, Tao. A Guide to Designing Top-k Indices. SIGMOD record 2019.
-
[Tziavelis+'19] Tziavelis, Ajwani, Gatterbauer, Riedewald, Yang. Optimal algorithms for ranked enumeration of answers to full conjunctive queries. Arxiv 2019.
-
[Gondran, Minoux'08]. Graphs, dioids and semirings: New Models and Algorithms. Springer 2008.
-
[Kepner, Jananthan'18]. Mathematics of Big Data: Spreadsheets, Databases, Matrices, and Graphs. MIT press 2018.
-
[Dijkstra'16]. Dijkstra’s Shortest Path Algorithm: an intuitive Youtube video.
-
Unit 5: Normal forms
-
Unit 6: Data fusion
Topic 4: Linear algebra & rankings on graphs
-
[Kepner, Gilbert'11] Graph algorithms in the language of linear algebra. 2011.
Ch 1-3.
-
[Easley, Kleinberg'10] Networks, crowds, and markets: reasoning about a highly connected world. 2010. Ch 14.6.
-
[GGKF'15] Gatterbauer, Günnemann, Koutra, Faloutsos. Linearized and single-pass belief propagation. VLDB 2015. Youtube video.
-
[KLG'20] Kumar, Langton, Gatterbauer. Factorized Graph Representations for Semi-Supervised Learning from Sparse Data. SIGMOD 2020. Youtube video.
-
Also see related past research seminar: Methods and Algorithms for determining relevance in networks (Fall'13).
-
[3Blue1Brown]: 3Blue1Brown on Essence of linear algebra:
2. Linear combination, Span, Basis vectors,
7. Inverse matrices, column space, null space,
14. Eigenvectors, eigenvalues,
15. Abstract vector spaces,
Topic 5: Logic, uncertainty & inconsistency
-
Unit 1: Probabilities & Uncertainty
-
[Dalvi, Suciu]
Efficient query evaluation on probabilistic databases. VLDBJ 2007.
-
[Suciu+'11]
Suciu, Olteanu, Re, Koch.
Probabilistic databases. Morgan & Claypool 2011.
-
[Gatterbauer, Suciu'14]
Oblivious bounds on the probability of Boolean functions. TODS 2014.
-
[Gatterbauer, Suciu'17]
Dissociation and propagation for approximate lifted inference with standard relational database management systems. VLDBJ 2017.
-
[Chou+'18]
Chou, Gatterbauer, Gogate.
Dissociation-based Oblivious Bounds for Weighted Model Counting . UAI 2018.
-
[Van den Heuvel+'19]
Van den Heuvel, Ivanov, Gatterbauer, Geerts, Theobald.
Anytime Approximation in Probabilistic Databases via Scaled Dissociations. SIGMOD 2019.
-
Unit 2: Inconsistency
Research best practice
Surveys and additional possible topics