Texts: Part I
You are a data scientist at an AI company. You are given a dataset of more than 15,000 images of indoor locations. The dataset comes from MIT and was built to tackle the problem of indoor scene recognition. All images are in JPEG format and are divided into 67 categories. The number of images varies by category, but each category has at least 100 images. Images in the following folders have quality issues: laboratorywet, laundromat, library, livingroom, lobby, locker_room. Remove these six folders to avoid program crashes.
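The folder removal described above can be scripted rather than done by hand. The following is a minimal sketch using only the standard library. The function name `remove_folders` and the `images_root` argument are hypothetical, and the sketch assumes that each category is a subfolder directly under the unzipped image directory. Names are matched case-insensitively, and any name that is not found on disk is reported back so that spelling differences between this text and the dataset can be caught.

```python
import shutil
from pathlib import Path


def remove_folders(images_root, folder_names):
    """Delete the named category folders under images_root.

    Returns (removed, missing): the on-disk names that were deleted and
    the requested names that matched no folder (case-insensitive match).
    """
    root = Path(images_root)
    # Map lower-cased folder name -> actual path, for case-insensitive lookup.
    on_disk = {p.name.lower(): p for p in root.iterdir() if p.is_dir()}
    removed, missing = [], []
    for name in folder_names:
        path = on_disk.pop(name.lower(), None)
        if path is None:
            missing.append(name)      # nothing on disk with this name
        else:
            shutil.rmtree(path)       # permanently deletes the folder
            removed.append(path.name)
    return removed, missing
```

Call it with the path to the unzipped image directory and the six folder names listed above. If the second returned list is not empty, compare those names with the actual directory names before training, because the deletion is permanent and a missed folder may still crash the data loader.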
Data Source: MIT Indoor Scenes Dataset
You are asked to perform image classification over the remaining sixty-one classes. Write your Python code in the attached Jupyter Notebook and run all the cells. You only need to submit the Jupyter Notebook.
Q1. Download the dataset (about 2 GB) from Kaggle to your local disk and unzip it.
Q2. Build a baseline CNN model on the training dataset and evaluate it on the test dataset.
Q3. Build a second CNN model with data augmentation and dropout and evaluate it on the test dataset.
Q4. Build a third CNN model based on a pre-trained model (transfer learning) and evaluate it on the test dataset.
Q5. Which of the models from Q2, Q3, and Q4 do you recommend? Justify your answer.
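The three models requested above could be structured along the following lines. This is a minimal Keras sketch under stated assumptions, not a reference solution: the assignment does not fix a framework, input size, architecture, or pre-trained network, so TensorFlow/Keras, the 160x160 input, the layer sizes, the dropout rates, and the choice of MobileNetV2 are all illustrative. The function names are hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 61            # 67 categories minus the 6 removed folders
IMG_SHAPE = (160, 160, 3)   # assumed input size; not specified by the assignment


def _conv_stack():
    """Three Conv2D + MaxPooling2D stages shared by the two small CNNs."""
    return [
        layers.Conv2D(32, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(64, 3, activation="relu"), layers.MaxPooling2D(),
        layers.Conv2D(128, 3, activation="relu"), layers.MaxPooling2D(),
    ]


def build_baseline():
    """Baseline: a plain CNN with no augmentation and no dropout."""
    return models.Sequential(
        [layers.Input(shape=IMG_SHAPE), layers.Rescaling(1.0 / 255)]
        + _conv_stack()
        + [layers.Flatten(),
           layers.Dense(128, activation="relu"),
           layers.Dense(NUM_CLASSES, activation="softmax")]
    )


def build_augmented(dropout_rate=0.5):
    """Same CNN plus random augmentation layers and dropout."""
    return models.Sequential(
        [layers.Input(shape=IMG_SHAPE),
         # Augmentation layers are active only during training.
         layers.RandomFlip("horizontal"),
         layers.RandomRotation(0.1),
         layers.RandomZoom(0.1),
         layers.Rescaling(1.0 / 255)]
        + _conv_stack()
        + [layers.Flatten(),
           layers.Dropout(dropout_rate),
           layers.Dense(128, activation="relu"),
           layers.Dense(NUM_CLASSES, activation="softmax")]
    )


def build_transfer(weights="imagenet"):
    """Frozen MobileNetV2 feature extractor with a new classification head."""
    base = tf.keras.applications.MobileNetV2(
        input_shape=IMG_SHAPE, include_top=False, weights=weights)
    base.trainable = False  # train only the new head
    inputs = tf.keras.Input(shape=IMG_SHAPE)
    # MobileNetV2 expects pixel values scaled to [-1, 1].
    x = layers.Rescaling(1.0 / 127.5, offset=-1)(inputs)
    x = base(x, training=False)
    x = layers.GlobalAveragePooling2D()(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(NUM_CLASSES, activation="softmax")(x)
    return tf.keras.Model(inputs, outputs)


def compile_model(model):
    """Compile for integer class labels (0..60)."""
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Each model can then be trained with `model.fit(...)` and scored with `model.evaluate(...)` on the same held-out test set, which gives directly comparable test accuracies for the final recommendation. If the images are loaded with `tf.keras.utils.image_dataset_from_directory`, the default integer labels match the sparse categorical cross-entropy loss used here.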