CS 7290, Fall 2017: Special Topics in Data Science:
Foundations in Scalable Data Management
Description, Project, Grading, Calendar, Readings
Coordinates: Tue 11:45 am - 1:25 pm, Thu 2:50 pm - 4:30 pm, in Ryder Hall 126
Instructors: Wolfgang Gatterbauer and Mirek Riedewald
Office hours: Directly after class and via email: {wolfgang,mirek)@ccs.neu.edu
Mailing list: Through Blackboard
Description
After
taking this class you will be able to understand latest trends and
research problems in large-scale data management & analysis and have
gained significant research experience in this field.
Topics: This
course explores research topics in analysis and management of large
data, with a focus on distributed & parallel approaches, join
processing, and imprecise data/approximation. We will discuss and
analyze papers covering applications, algorithms, systems, and
theory--with a focus on the most recent developments. This course is
designed for PhD students, as well as advanced Masters students
with a solid background in algorithms and one or more data-oriented
areas of computer science, incl. machine learning, AI, logics,
information retrieval, and security. A desired
outcome of the course project is the creation of research results that
are publishable in a peer-reviewed conference.
Course format:
Classes are intended to be discussion-based. Think of each class as an
extended brainstorming session or roundtable discussion about a specific
problem or paper. The goal is to create the maximum possible interaction
for the maximum possible understanding. For this purpose, a typical
class will be a mix of presentations of core material and research
papers by the instructors (e.g., to introduce the research areas during
the first 10 lectures), and presentations of research papers by students
(during the remaining classes), with lots of interactive discussions.
Prerequisites:
For CCIS PhD students, there are no formal requirements other than an
interest in the topics covered. For PhD students from other programs and
MS students, instructor approval is required in order to ensure that
this course is a good fit for you. This will be done case-by-case,
usually based on strong performance (grade A) in CCIS graduate-level
courses on big data topics (or equivalent outside CCIS). Please have a look at at the papers in our reading list and decide for yourself: if the content is *entirely* impenetrable for you, this class will be very time-intensive for you: We
are happy to provide you with extra background reading material that
you will need to study in order to understand the weekly readings and do
well on regular written paper summaries and in our class discussions.
A note to first and second year PhD students:
Prof. Gatterbauer is actively looking for multiple PhD students to work
with him. This course provides an excellent opportunity to explore
relevant topics.
Intended Learning Outcomes:
1. Learn about recent research topics and industry trends in large-scale data management.
2. Acquire a technical skill set and background knowledge that equips you to perform data science research.
3. Be able to critically read, evaluate, and present research papers.
4. Gain hands-on research experience during the project, ideally resulting in publishable results.
Course Project
A
big component of this course is a research project. By default each
team will have 2 members (exceptions require instructor approval). We
will facilitate group formation through informal socializing sessions at
the end of the first few lectures, but it generally is each student’s
responsibility to form and manage groups. Please start looking for
project partners right away and pick a topic that is related to this
class. We are happy to discuss ideas with you, and to suggest a list of
project topics, but you are free to select a project outside of this
list. You should meet with us periodically throughout the semester,
updating us on the progress of your project. We will help with the
direction, but unless you take the initiative to actively explore the
topic you choose, you are unlikely to accomplish much.
For
effective collaboration, we will work with tools such as Latex/Git or
Overleaf for document processing and Git or bitbucket for coding.
- Overleaf template for report authoring (template)
Project timeline
F, Oct 6 (noon) | Project proposal | - What is your thesis for his project (read this excellent advice: http://web.cs.ucla.edu/~palsberg/shivers.html)?
What is the problem you are solving? What is the key new idea or what
other work influenced your approach? What will be your contribution to
the state-of-the art (e.g., identifying a new problem, or solving an
existing problem better)?
- How are you going to evaluate success of your project?
- What are major risks why your project might not succeed?
- Maximal 1 page
|
T, Oct 17 | Proposal presentation | - Maximal 5 min overview of your project idea
|
M, Nov 13 (noon) | Interim report | - Show us that your team is on the right track
|
T, Nov 14 | Interim presentation | - Maximal 10 min update on your project progress
|
R, Nov 30 T, Dec 5 | Final presentation | - Maximal
25 min interactive presentation of your project. Be prepared to answer
questions during the presentation, not just at the end
- Optional demo your system / approach / algorithm
|
M, Dec 11 (noon) | Final report | - Incorporate feedback received during the presentations
|
Grading and Deadlines
There are no exams or traditional homework assignments. However, the class requires (1) an extensive project, (2) students are required to read, analyze, and review about 2-4 research papers per week, and (3) present 2-3 papers over the course of the class. Depending
on previous knowledge, some papers might require reading of additional
material to obtain the necessary background.
Paper reviews: Students will provide on average 2-4 brief paper reviews per week. We highly encourage you to form study groups, enabling you to jointly read and discuss the papers before coming to class.
You still submit your paper review individually, one for each required
paper. The review and optional discussions with your study group will
prepare you to be ready for the discussion in class. Reviews have to be submitted by 6pm the day before class via the following Google form https://goo.gl/forms/93IGjkVOCxp0fde62
Paper presentations:
Starting in October, students (alone or in team of 2) will present
research papers during the lecture, sometimes following a presentation
of introductory material by the instructors. Those presenting a paper
are also the "discussion leaders;" all other students will participate
in the discussion and ask questions. Send your slides to the instructors
by 6pm before the class you are presenting (we may respond with suggestions) and optionally come to office hours if you are stuck.
Class and Reading | 30% | - Pre-class paper reviews
- In-class participation
|
Paper presentation | 20% | - You will present and lead the discussion in some lectures
|
Course Project | 50% | - Project proposal
- Intermediate report
- Presentation
- Final report
|
Calendar
This calendar is very preliminary and will be updated as we progress
Overview classes (led by instructors) |
01 R, Sept 7 | Course introduction
Preparation - The Beckman Report on Database Research. Abadi et al. CACM 2016 (paper)
- Research Directions for Principles of Data Management. Abiteboul et al. Dagstuhl Workshop 16151 (paper)
|
02 T, Sept 12 | Data exploration
Papers to review - Controlling
False Discoveries During Interactive Data Exploration. Zhao, De
Stefani, Zgraggen, Binnig, Upfal, Kraska. SIGMOD 2017 (paper)
|
03 R, Sept 14 | Joins in databases
Papers to review - Skew Strikes Back: New Developments in the Theory of Join Algorithms. Ngo, Ré, Rudra. SIGMOD record 2013 (paper)
Additional preparation - Section 6.4 in Alice book: Computing with Acyclic Joins (website)
- Slides on GYO reduction (slides)
- Slides on Yannakakis algorithm (slides)
|
04 T, Sept 19 | Parallel query processing
Papers to review - Processing theta-joins using MapReduce. Okcan, Riedewald. SIGMOD 2011 (paper)
- Load Balancing and Skew Resilience for Parallel Joins. Vitorovic, Elseidy, Koch. ICDE 2016 (paper)
Additional preparation - "Mining book" Section 2.3.7 Computing Natural Join by MapReduce (website)
- "Mining book" Section 2.5.3 Multiway Joins (website)
|
05 R, Sept 21 | Graphs
Papers to review - Linearized and single-pass belief propagation. Gatterbauer, Günnemann, Koutra, Faloutsos. VLDB 2015 (paper) (Youtube video)
- "Network estimation" (Paper shared on Blackboard)
Additional preparation - Sections 1-3 from Graph algorithms in the language of linear algebra. Kepner, Gilbert. 2011
- Chapter 14.6 from Networks, crowds, and markets: reasoning about a highly connected world. Easley, Kleinberg. 2010 (book website)
|
06 T, Sept 26 | Dichotomy theorems in databases: Guest speaker Cibele Freire
Papers to review - The
complexity of resilience and responsibility for self-join-free
conjunctive queries. Freire, Gatterbauer, Immerman, Meliou. VLDB 2016 (paper)
|
07 R, Sept 28 | Data exploration: Guest speaker Jonathan Ullman
Papers / blog post to review - The
reusable holdout: Preserving validity in adaptive data analysis. Dwork,
Feldman, Hardt, Pitassi, Reingold, Roth. Science 2015 (paper)
- Competing in a data science contest without reading the data. Hardt. (blog post)
Optional preparation - Preserving Statistical Validity in Adaptive Data Analysis. Dwork, Feldman, Hardt, Pitassi, Reingold, Roth. STOC 2015 (long version)
- A firm foundation for private data analysis. Dwork. CACM 2011 (paper)
- The Ladder: A Reliable Leaderboard for Machine Learning Competitions. Blum, Hardt. ICML 2015 (paper)
|
08 T, Oct 3 | Uncertainty
Papers to review - Dissociation
and propagation for approximate lifted inference with standard
relational database management systems. Gatterbauer, Suciu. VLDBJ 2016 (paper)
Optional preparation - Oblivious bounds on the probability of Boolean functions. Gatterbauer, Suciu. TODS 2014 (paper)
- Probabilistic databases: diamonds in the dirt. Dalvi, Re, Suciu. CACM 2009 (paper)
|
09 R, Oct 5 | Uncertainty (continued)
Continued from last class
|
10 T, Oct 10 | Project discussions & "Factorized Evaluation"
Papers to review - Factorized Databases. Olteanu, Schleich. SIGMOD record 2016 (paper)
Optional preparation
|
11 R, Oct 12 | "Modern recommender systems: matrices, bandits, and neurons": Guest speaker Georgia Koutrika
Papers to review - CourseNavigator: interactive learning path exploration. Li, Papaemmanouil, Koutrika. ExploreDB 2016 (paper)
Optional preparation - Finding
Related Forum Posts through Content Similarity over Intention-Based
Segmentation. Papadimitriou, Koutrika, Velegrakis, Mylopoulos. TKDE 2017
(paper)
- Recsplorer:
recommendation algorithms based on precedence mining. Parameswaran,
Koutrika, Bercovitz, Garcia-Molina. SIGMOD 2010 (paper)
- FlexRecs: expressing and combining flexible recommendations. Koutrika, Bercovitz, Garcia-Molina. SIGMOD 2009 (paper)
We
will post a link to the slides as soon as Georgia shares them with us.
The talk is based on a recent longer 3 part tutorial, and for copyright
reasons with ACM, she cannot share them with us, and we cannot post them
yet: https://summerschool.acm.org/user-analytics-for-recommender-systems/
|
12 T, Oct 17 | Project proposal presentations & Factorized Evaluation (continued)
Factorized evaluation: Continued from last Tuesday
|
13 R, Oct 19 | Factorized Evaluation (continued)
Factorized evaluation continued
|
In-depth classes (led by students) |
14 T, Oct 24 | One-slide summaries & Parallel query processing: Equi-joins
Papers to review - Optimizing Joins in a Map-Reduce Environment. Afrati, Ullman EDBT 2010 (paper)
Rundong + Aditya
|
15 R, Oct 26 | Parallel query processing: Equi-joins (continued)
Papers to review - Optimizing Joins in a Map-Reduce Environment. Afrati, Ullman EDBT 2010 (paper)
Rundong + Aditya
|
16 T, Oct 31 | "Factors": Dataset versioning
Papers to review - Principles
of Dataset Versioning: Exploring the Recreation/Storage Tradeoff.
Bhattacherjee, Chavan, Huang, Deshpande, Parameswaran. VLDB 2015
Joe
|
17 R, Nov 2 | Incremental graph maintenance 1/2
Papers to review - Storing and Analyzing Historical Graph Data at Scale. Khurana, Deshpande. EDBT 2016 (paper)
Additional preparation - ImmortalGraph: A System for Storage and Analysis of Temporal Graphs. Miao et al. TOS 2015 (paper)
Joe
|
18 T, Nov 7 | Incremental graph maintenance 2/2
Papers to review - The
G* graph database: efficiently managing large distributed dynamic
graphs. Labouseur, Birnbaum, Olsen Jr., Spillane, Vijayan, Hwang, Han.
Distributed and Parallel Databases. 2016 (paper)
Joe
|
19 R, Nov 9 | Uncertainty: Learning probabilities: Guest lecture Niccolo Meneghetti
Papers to review - Beta
Probabilistic Databases: A Scalable Approach to Belief Updating and
Parameter Learning. Meneghetti, Kennedy, Gatterbauer. SIGMOD 2017 (paper)
Additional preparation - Learning Tuple Probabilities. Dylla, Theobald. Arxiv 2016 (paper)
|
20 T, Nov 14 | Project interim presentations |
21 R, Nov 16 | Distributing Frank-Wolfe via Map-Reduce: Guest lecture: Stratis Ioannidis
Paper to review - Distributing Frank-Wolfe via Map-Reduce. Moharrer, Ioannidis. ICDM 2017 (paper), (PPTX slides)
Optional preparation - Short optimization tutorial. Jaggi. 2015 (slides)
- The Recent Revival of the Frank–Wolfe Algorithm. Jaggi, Harchaoui. 2014 (paper)
|
22 T, Nov 21 | Approximate Query Processing ("AQP")
Papers to review - Wander Join: Online Aggregation via Random Walks. Li, Wu, Yi, Zhao. SIGMOD record 2017 [best of SIGMOD 2016]
Rundong + Aditya
|
R, Nov 23 | NO CLASS (Thanksgiving)
|
24 T, Nov 28 | Parallel query processing: University of Washington
Papers to review - From theory to practice: Efficient join query evaluation in a parallel database system. Chu, Balazinska, Suciu. SIGMOD 2015 (paper)
Rundong + Aditya
|
25 R, Nov 30 | Data exploration & uncertainty: University of Washington
Papers to review - Probabilistic Database Summarization for Interactive Data Exploration. Orr, Balazinska, Suciu. VLDB 2017
Xinyan + Xiaofeng
|
Project presentations |
T, Dec 5 | Project presentations |
R, Dec 7 | Project presentations |
M, Dec 11 | Final report due |
Topic and Reading list
1. Basic Database Background and overview
- "Alice book": Foundations of Databases. Abiteboul, Hull, Vianu (website). Ch 3, 4, and 6.4
- "Mining book": Mining of massive datasets. Leskovec, Rajaraman, Ullman (website). Ch. 2.3 and 2.5
- The Beckman Report on Database Research. Abadi et al. CACM 2016 (paper)
- Research Directions for Principles of Data Management. Abiteboul et al. Dagstuhl Workshop 16151 (paper)
2. Data Exploration
- Controlling
False Discoveries During Interactive Data Exploration. Zhao, De
Stefani, Zgraggen, Binnig, Upfal, Kraska. SIGMOD 2017 (paper)
- A Data Quality Metric (DQM): How to Estimate the Number of Undetected Errors in Data Sets. Chung, Krishnan, Kraska. VLDB 2017 (paper)
- Toward Sustainable Insights, or Why Polygamy is Bad for You. Binnig, De Stefani, Kraska, Upfel, Zgraggen, Zhao. CIDR 2017 (paper)
- Estimating the Impact of Unknown Unknowns on Aggregate Query Results. Chung, Mortensen, Binnig, Kraska. TODS 2017 (paper)
- Probabilistic Database Summarization for Interactive Data Exploration. Orr, Balazinska, Suciu. VLDB 2017 (paper)
- The
reusable holdout: Preserving validity in adaptive data analysis. Dwork,
Feldman, Hardt, Pitassi, Reingold, Roth. Science 2015 (paper)
- Preserving Statistical Validity in Adaptive Data Analysis. Dwork, Feldman, Hardt, Pitassi, Reingold, Roth. STOC 2015 (long version)
- A firm foundation for private data analysis. Dwork. CACM 2011 (paper)
- The Ladder: A Reliable Leaderboard for Machine Learning Competitions. Blum, Hardt. ICML 2015 (paper)
3. Join Algorithms
- "AGM algorithm": Size bounds and query plans for relational joins. Atserias, Grohe, Marx. SIAM J. Comput. 2013 (paper)
- Skew Strikes Back: New Developments in the Theory of Join Algorithms. Ngo, Ré, Rudra. SIGMOD record 2013 (paper)
- Leapfrog Triejoin: A Simple, Worst-Case Optimal Join Algorithm. Veidhulzen. ICDT 2014 (paper)
- Juggling Functions Inside a Database. Khamis, Ngo, Rudra. SIGMOD record 2017 (paper)
- In-Database Learning with Sparse Tensors. Ngo, Nguyen, Olteanu, Schleich. Arxiv pre-print (paper)
4. Parallel Query Processing
- Optimizing Joins in a Map-Reduce Environment. Afrati, Ullman. EDBT 2010 (paper)
- Communication Cost in Parallel Query Processing: A Tutorial. Suciu. PODS 2017 (PPTX slides, Youtube video)
- Processing theta-joins using MapReduce. Okcan, Riedewald. SIGMOD 2011 (paper)
- Load Balancing and Skew Resilience for Parallel Joins. Vitorovic, Elseidy, Koch. ICDE 2016 (paper)
- Scaling Out Continuous Multi-Way Theta-Joins. Hoffmann, Michel. BeyondMR 2017 (paper)
- Skew in Parallel Query Processing. Beame, Koutris, Suciu. JACM (paper, slides)
- Communication steps for parallel query processing. Beame, Koutris, Suciu. PODS 2013. (paper)
- From theory to practice: Efficient join query evaluation in a parallel database system. Chu, Balazinska, Suciu. SIGMOD 2015 (paper)
- Computing Marginals Using MapReduce. Afrati, Sharma, Ullman, Ullman. Journal of Computer and System Sciences 2017 (paper)
- Some pairs problems. Ullman, Ullman. Beyond MR 2016 (paper)
- Distributing Frank-Wolfe via Map-Reduce. Moharrer, Ioannidis. ICDM 2017 (paper)
5. "Factors" (Factorized databases / Dataset versioning / Incremental updates)
- The Dynamic Yannakakis Algorithm: Compact and Efficient Query Processing Under Updates. Idris, Ugarte, Vansummeren. SIGMOD 2017 (paper)
- LINVIEW: Incremental View Maintenance for Complex Analytical Queries. Nikolic, ElSeidy, Koch. SIGMOD 2014 (paper)
- Factorized Databases. Olteanu, Schleich. SIGMOD record 2016 (paper)
- Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff. Bhattacherjee, Chavan, Huang, Deshpande, Parameswaran. VLDB 2015 (paper)
- DEX: Query Execution in a Delta-based Storage System. Chavan, Deshpande. SIGMOD 2017 (paper)
- Managing structured collections of community data. Gatterbauer, Suciu. CIDR 2011 (paper)
- Learning Linear Regression Models over Factorized Joins. Schleich, Olteanu, Ciucanu. SIGMOD 2016 (paper)
- Incremental Maintenance of Regression Models over Joins. Nikolic, Olteanu. (arxiv preprint)
- Size Bounds for Factorised Representations of Query Results. Olteanu, Zavodny. TODS 2015 (paper)
- Aggregation and Ordering in Factorised Databases. Bakibayev, Kocisky, Olteanu, Zavodny. VLDB 2013 (paper)
- Scaling Factorization Machines to Relational Data. Rendle. VLDB 2013 (paper)
- Scalable linear algebra on a relational database system. Luo, Gao, Gubanov, Perez, Jermaine. ICDE 2017 (paper)
- LSH Ensemble: Internet-Scale Domain Search. Zhu, Nargesian, Pu, Miller. VLDB 2016 (paper)
6. Dichotomy Theorems in Databases
- The complexity of resilience and responsibility for self-join-free conjunctive queries. Freire, Gatterbauer, Immerman, Meliou. VLDB 2016. (paper)
- The data complexity of consistent query answering for self-join- free conjunctive queries under primary key constraints. Koutris, Wijsen SIGMOD record 2016 [best of PODS 2015] (paper, technical perspective)
- A dichotomy in the complexity of deletion propagation with functional dependencies. Kimelfeld. PODS 2012 (paper)
- Maximizing conjunctive views in deletion propagation. Kimelfeld, Vondrak, Williams. TODS 2012
- (paper)
- Dichotomies in the Complexity of Preferred Repairs. Fagin, Kimelfeld, Kolaitis. PODS 2015 (paper)
7. Graphs
- Graph algorithms in the language of linear algebra. Kepner, Gilbert. 2011 (book website): Chapters 1-3
- Networks, crowds, and markets: reasoning about a highly connected world. Easley, Kleinberg. 2010 (book website): Chapter 14.6
- Linearized and single-pass belief propagation. Gatterbauer, Günnemann, Koutra, Faloutsos. VLDB 2015 (paper)
- BePI: Fast and Memory-Efficient Method for Billion-Scale Random Walk with Restart. Jung, Park, Sael, Kang. SIGMOD 2017. (paper)
- Spinner: Scalable Graph Partitioning in the Cloud. Martella, Logothetis, Loukas, Siganos. ICDE 2017 (paper)
- Parallelizing Sequential Graph Computations. Fan, Xu, Wu, Yu, Jiang, Zheng, Zhang, Cao, Tian. SIGMOD 2017 (paper)
- Incremental Graph Computations: Doable and Undoable. Fan, Hu, Tian. SIGMOD 2017 (paper)
- MultiQuery Optimization for Subgraph Isomorphism Search. Ren, Wang. VLDB 2017. (paper)
- Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation. Lu, Cheng, Yan, Wu. VLDB 2015 (paper)
- ImmortalGraph: A System for Storage and Analysis of Temporal Graphs. Miao et al. TOS 2015 (paper)
- Storing and Analyzing Historical Graph Data at Scale. Khurana, Deshpande. EDBT 2016 (paper)
- The
G* graph database: efficiently managing large distributed dynamic
graphs. Labouseur, Birnbaum, Olsen Jr., Spillane, Vijayan, Hwang, Han.
Distributed and Parallel Databases. 2016 (paper)
- [Paper on Network estimation] shared on BlackBoard
8. Uncertainty
- Probabilistic databases. Suciu, Olteanu, Re, Koch. 2011 (book, alternative link)
- Probabilistic databases: diamonds in the dirt. Dalvi, Re, Suciu. CACM 2009 (paper)
- The dichotomy of probabilistic inference for unions of conjunctive queries. Dalvi, Suciu. JACM 2012 (paper)
- Dissociation
and propagation for approximate lifted inference with standard
relational database management systems. Gatterbauer, Suciu. VLDBJ 2016 (paper)
- Oblivious bounds on the probability of Boolean functions. Gatterbauer, Suciu. TODS 2014 (paper)
- Convergent Interactive Inference with Leaky Joins. Yang, Kennedy. EDBT 2017. (paper)
- Learning Tuple Probabilities in Probabilistic Databases. Dylla, Theobald. Reasoning Web 2014 (paper)
- Learning Tuple Probabilities. Dylla, Theobald. Arxiv 2016 (paper)
- Anytime approximation in probabilistic databases. Fink, Huang, Olteanu. VLDBJ 2013 (paper)
- Approximate confidence computation in probabilistic databases. Olteanu, Huang, Koch. ICDE 2010 (paper)
- Sensitivity
analysis and explanations for robust query evaluation in probabilistic
databases. Kanagal, Li, Deshpande. SIGMOD 2011 (paper)
- Beta
Probabilistic Databases: A Scalable Approach to Belief Updating and
Parameter Learning. Meneghetti, Kennedy, Gatterbauer. SIGMOD 2017 (paper)
- A Collective, Probabilistic Approach to Schema Mapping. Kimmig, Memory, Miller, Getoor. ICDE 2017 (paper)
- [Paper on Optimal Oblivious bounds] shared on BlackBoard
9. Approximate Query Processing (AQP)
- Approximate Query Processing: No Silver Bullet. Chaudhuri, Ding, Kandula. SIGMOD 2017 (keynote)
- Approximate Query Engines: Commercial Challenges and Research Opportunities. Mozafari. SIGMOD 2017 (keynote)
- Database Learning: Toward a Database that Becomes Smarter Every Time. Park, Tajik, Cafarella, Mozafari. SIGMOD 2017 (paper)
- Wander Join: Online Aggregation via Random Walks. Li, Wu, Yi, Zhao. SIGMOD record 2017 [best of SIGMOD 2016] (paper)
- Two-Level Sampling for Join Size Estimation. Chen, Yi. SIGMOD 2017 (paper)
- DAQ: A New Paradigm for Approximate Query Processing. Potti, Patel. VLDB 2015 (paper)
- The
Analytical Bootstrap: a New Method for Fast Error Estimation in
Approximate Query Processing. Zeng, Gao, Mozafari, Zaniolo. SIGMOD 2014 (paper)
10. Distributed Transactions
- Spanner: Becoming a SQL System. Bacon et al. SIGMOD 2017 (paper)
- The End of a Myth: Distributed Transactions Can Scale. Zamanian, Binnig, Harris, Kraska. VLDB 2017 (paper)
Basic Research Setup (Please read these on your own)
- Efficient Reading of Papers in Science and Technology. Hanson. 1989 (paper)
- Some Hints to Improve Writing of Technical Papers. Valduriez. 1994. (paper)
- How to write a technical paper. Ernst. 2005-2017 (website)
- Everything I wanted to know about C.S. graduate school at the beginning but didn’t learn until later. Azuma. 1997-2017 (website)
- Summary of "The Effective Engineer by Edmond Lau" 2016. (website)
- Metacademy: How to learn on your own. (website)