Resources
Main Textbook
- Mining Massive Datasets, by Jure Leskovec, Anand Rajaraman and Jeff Ullman, Cambridge University Press, 2014. The main textbook of this course. You can download it free from here.
Coursera
- Mining Massive Datasets, Stanford University
Articles
- Feng Li, Beng Chin Ooi, M. Tamer Özsu, and Sai Wu. 2014. Distributed data management using MapReduce. ACM Comput. Surv. 46, 3, Article 31 (January 2014), 42 pages.
- Gregory Mone. 2013. Beyond Hadoop. Commun. ACM 56, 1 (January 2013), 22-24.
- A. Anderson, D. Huttenlocher, J. Kleinberg, J. Leskovec. Effects of User Similarity in Social Media. Proc. 5th ACM Symposium on Web Search and Data Mining, 2012.
- J. Tang, T. Lou, J. Kleinberg. Inferring Social Ties across Heterogenous Networks. Proc. 5th ACM Symposium on Web Search and Data Mining, 2012.
- M. Stonebraker. What Does Big Data Mean?, September 2012
- E. Adar, J. Teevan, S.T. Dumais and J. L. Elsas. The Web Changes Everything: Understanding the Dynamics of Web Content, Intern. Conf. on Web Search and Data Mining (WSDM), ACM, 2009
Useful Textbooks
- Mohammed Zaki and Wagner Meira Jr. Data Mining and Analysis: Fundamental Concepts and Algorithms. This book will be published by Cambridge University Press in 2014. You can download it free from here.
- Python Data Science Handbook, Jake VanderPlas, O'Reilly, 2016
- Ian H. Witten, Eibe Frank, Mark A. Hall: Data Mining: Practical Machine Learning Tools and Techniques (Third Edition), Morgan Kaufmann, January 2011
- Mahout in Action - Book by Sean Owen and Robin Anil, published by Manning Publications, 2011.
- Hadoop: The Definitive Guide, Second Edition is a book about Apache Hadoop by Tom White, published by O'Reilly Media.
- Introduction to Information Retrieval, by Christopher D. Manning, Prabhakar Raghavan & Hinrich Schütze, Cambridge University Press, 2008.
- Opinion mining and sentiment analysis, by Bo Pang and Lillian Lee, in Foundations and Trends in Information Retrieval 2(1-2), pp. 1–135, 2008. Also available as a book or e-book.
- D. Easley, J. Kleinberg. Networks, Crowds, and Markets: Reasoning About a Highly Connected World. Cambridge University Press, 2010.
- Modern Information Retrieval. R. Baeza-Yates, B. Ribeiro-Neto. Addison-Wesley, 2011.
- Community Detection and Mining in Social Media, Morgan & Claypool, 2010.
- Search Engines: Information Retrieval in Practice. Bruce Croft, Donald Metzler, Trevor Strohman. Addison-Wesley, 2009. An introductory book for search engines. Recommended for undergraduate students.
- Modeling the Internet and the Web: Probabilistic Methods and Algorithms. P. Baldi, P. Frasconi, P. Smyth. Wiley, 2003. Recommended for those who have a good foundation in probability theory.
- Google's PageRank and Beyond: The Science of Search Engine Rankings, Amy N. Langville & Carl D. Meyer, Princeton University Press, 2006. It offers a comprehensive and erudite presentation of PageRank and related search-engine algorithms, and it is written in an approachable way, given the mathematical foundations involved.
- Mining the Web: Analysis of Hypertext and Semi Structured Data. S. Chakrabarti. Morgan Kaufmann, 2002. The best introduction for Web-centric IR.
Journals
- Data Mining and Knowledge Discovery (DMKD) Journal
- ACM Transactions on the Web
- IEEE Transactions on Knowledge and Data Engineering (TKDE)
- IEEE Internet Computing
Proceedings of the International Conferences
Software Toolkits & Datasets
- Hadoop, an open-source implementation of the MapReduce paradigm
- Mahout
- Google Cloud Platform
- Data Integration and Data Preparation in Python
- Kaggle: Your Home for Data Science
- Weka
- Lucene: A free open source information retrieval library
Related Courses on the Web
- Mining Massive Datasets (Stanford)
- Analyzing Big Data with Twitter (Berkeley)
- Search Engines and Web Mining (CMU)
- Information Retrieval, Discovery, and Delivery (Princeton)
- Web Mining (Iowa)
- Web Data Mining (DePaul)
- Algorithms for Analyzing Massive Data Sets and Data Mining (Maryland)
- Data Mining (University of Ioannina)
- Data Mining (Boston)