SWE404 Big Data Analytics
Undergraduate course in software engineering, Xiamen University Malaysia, 2020-04
This course examines the big data tools Hadoop and Spark in detail, along with related tools that provide SQL-like access to unstructured data. More advanced topics such as Spark Streaming and MLlib are also introduced. We use PySpark, Spark's Python API, as the main programming tool for implementing big data applications. The course also covers machine learning techniques such as classification, regression, clustering, and collaborative filtering, and shows how to apply them in real applications using PySpark and the MLlib API.
Lecture Notes
Lecture 7: Machine Learning and MLlib
Lecture 8: Classification and Regression Algorithms I
Lecture 9: Classification and Regression Algorithms II
Lecture 10: Unsupervised Learning Algorithms
Lecture 11: Recommender Systems & Collaborative Filtering
Lecture 13: Data Visualization