SWE404 Big Data Analytics

Undergraduate course in software engineering, Xiamen University Malaysia, 2020-04

We look in detail at the big data tools Hadoop and Spark, along with related tools that provide SQL-like access to unstructured data. More advanced components such as Spark Streaming and MLlib are also introduced. We use PySpark, Spark's Python API, as the main programming tool for implementing big data applications. We also introduce machine learning techniques such as classification, regression, clustering, and collaborative filtering, and show how to apply them in real applications using the PySpark and MLlib APIs.