BACKGROUND: Massive amounts of data are generated every day, and machine learning (ML) approaches are used to learn from existing datasets and then make predictions about data not included in those datasets. Applications of machine learning include recommender systems, handwriting recognition, facial recognition, intrusion detection in computer networks, credit-card fraud detection, voice recognition, and self-driving cars (autonomous vehicles), to mention a few.
PROJECT OVERVIEW: The main goal of this project is to gain some familiarity with ML techniques and develop some understanding of which ML models are better suited to specific types of datasets. This knowledge will be useful in future ML-related projects.
In this project, students will use the scikit-learn ML library for Python, with its bundled datasets as well as real-world datasets from other sources, to explore approaches to learning from data (supervised and unsupervised machine learning). They will gain experience in training scikit-learn estimators (the library's term for machine learning algorithms, commonly referred to as models) on existing data and testing the accuracy of the resulting models' predictions. To determine which models perform better on specific datasets, students will customize models by adjusting some of their default parameter values (hyperparameters) and compare the prediction accuracy of the customized and default models. Students will also conduct a literature review of applications of ML in areas including recommender systems.
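As an illustration of the kind of comparison described above, the following is a minimal sketch, not the project's prescribed method. It assumes the bundled digits dataset, two classifiers (k-nearest neighbors and a support vector classifier), and arbitrarily chosen customized values (`n_neighbors=3`, `C=10`); the helper name `compare_models` and the `random_state` value are hypothetical choices made only for this example.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC


def compare_models(random_state=11):
    """Train default and customized estimators; return test accuracy per model."""
    # Bundled dataset: 8x8 images of handwritten digits, flattened to 64 features.
    digits = load_digits()

    # Hold out part of the data so accuracy is measured on unseen samples.
    X_train, X_test, y_train, y_test = train_test_split(
        digits.data,
        digits.target,
        random_state=random_state,
        stratify=digits.target,
    )

    # Default vs. customized estimators (the customized values are
    # illustrative assumptions, not recommended settings).
    estimators = {
        "KNN (defaults)": KNeighborsClassifier(),
        "KNN (n_neighbors=3)": KNeighborsClassifier(n_neighbors=3),
        "SVC (defaults)": SVC(),
        "SVC (C=10)": SVC(C=10),
    }

    scores = {}
    for name, estimator in estimators.items():
        estimator.fit(X_train, y_train)  # learn from the training data
        scores[name] = estimator.score(X_test, y_test)  # mean accuracy on test data
    return scores


if __name__ == "__main__":
    for name, accuracy in compare_models().items():
        print(f"{name}: {accuracy:.3f}")
```

A single train/test split gives only a rough comparison; in practice, cross-validation (for example with `sklearn.model_selection.cross_val_score`) would give a more reliable estimate of which model suits a given dataset.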
In the course of the project, students may also be introduced to parallel programming with CUDA on graphics processing units (GPUs). GPUs, which are designed to process large amounts of data in parallel, can considerably reduce the execution time of computationally intensive programs by running their parallelizable parts concurrently. The use of GPUs in training ML models may also be explored.
DETAILS: This project will be conducted over a period of nine 40-hour weeks. As some foundational programming experience is required for the project, applicants should at a minimum have taken CSCI 235. Having taken CSCI 255, or being currently enrolled in it, would be an added advantage, though this is not required. Students who work on the project will be expected to submit a weekly report summarizing their activities and findings for that week.
* The effects of COVID-19 notwithstanding, every effort will be made to ensure that the project continues as planned. Any modifications to the project details warranted by COVID-19 will be communicated where applicable.