Recommended Textbook and Materials

There is no required textbook because the instructor provides textbook-like course material. For a deeper understanding of the material covered in this course, we recommend the following books (most should be available online at no cost to Northeastern University students through O’Reilly for Higher Education):

  • MapReduce Design Patterns by Donald Miner and Adam Shook
  • Hadoop: The Definitive Guide by Tom White
  • High Performance Spark by Holden Karau and Rachel Warren
  • Spark: The Definitive Guide by Bill Chambers and Matei Zaharia
  • Spark in Action by Petar Zecevic and Marko Bonaci
  • Programming Elastic MapReduce by Kevin Schmidt and Christopher Phillips

For some topics we will work with research papers or other online resources, e.g., the Hadoop and Spark API documentation.