Texts: Assignment questions
1. Implement, in a computer language of your choice, the method of dominance-based multi-objective simulated annealing (DBMOSA) for finding high-quality hyper-parameter values for a one-layer perceptron, i.e. a standard feedforward neural network comprising one hidden layer. The dataset to which the neural network is applied is the MNIST digits classification dataset (accessible through TensorFlow, Keras, and PyTorch). This is a training set of 60,000 28x28 grayscale images of the 10 digits (i.e. 0, 1, 2, ..., 9), along with a test set of 10,000 images. For data pre-processing, scale each grayscale value to lie between zero and one, i.e. divide by 255.
Two objective functions are to be minimized:
1) Error, which can be expressed as 1 - classification accuracy, and
2) Complexity, which can be expressed as the number of weights in the neural network.
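The two objectives above can be sketched as small helper functions. This is a minimal illustration, not a prescribed implementation: the function names are hypothetical, the input size of 784 assumes the 28x28 images are flattened, and whether bias terms count towards "the number of weights" is an assumption exposed as a flag.

```python
def count_parameters(n_hidden, n_inputs=784, n_outputs=10, include_biases=True):
    """Complexity objective for a network with one hidden layer.

    Assumption: inputs are flattened 28x28 images (784 values) and the
    output layer has one unit per digit class (10 units).
    """
    weights = n_inputs * n_hidden + n_hidden * n_outputs
    # Assumption: biases are counted unless the flag is switched off.
    biases = n_hidden + n_outputs if include_biases else 0
    return weights + biases


def error_rate(n_correct, n_total):
    """Error objective, expressed as 1 - classification accuracy."""
    return 1.0 - n_correct / n_total
```

Because only the number of hidden neurons affects the layer sizes, the complexity objective depends on that decision variable alone; the learning rate and batch size influence only the error.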
The following decision variables (together with bounds) should be considered in respect of neural network hyper-parameters:
- The number of hidden neurons (integer-valued): [10, 50]
- The learning rate (real-valued): [0.001, 0.01]
- The batch size (integer-valued): [16, 128]
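The bounds listed above could be encoded as follows. This is a sketch under assumptions: the dictionary keys and the function name `sample_solution` are hypothetical, and the bounds are treated as inclusive.

```python
import random

# Decision-variable bounds from the assignment text (treated as inclusive).
BOUNDS = {
    "hidden_neurons": (10, 50),
    "learning_rate": (0.001, 0.01),
    "batch_size": (16, 128),
}
INTEGER_VARS = {"hidden_neurons", "batch_size"}


def sample_solution(rng):
    """Draw every decision variable uniformly at random from its bounds."""
    solution = {}
    for name, (low, high) in BOUNDS.items():
        if name in INTEGER_VARS:
            solution[name] = rng.randint(low, high)  # both endpoints included
        else:
            solution[name] = rng.uniform(low, high)
    return solution
```

Passing an explicit `random.Random(seed)` instance keeps each run reproducible, which helps when several independent runs are compared.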
Constrain the number of training epochs (not to be confused with DBMOSA epochs) to five and employ the vanilla stochastic gradient descent (SGD) optimizer. The remaining neural network hyper-parameters (e.g. activation functions, loss function, and weight initialization) can be decided arbitrarily. Regularization is not necessary.
The use of packages or libraries that specifically provide fully functional DBMOSA implementations is not permitted. You must therefore program it from scratch. Basic mathematical and data management operations can, of course, be performed using existing libraries or packages. Furthermore, the use of libraries and packages for implementing the neural network (such as Keras, TensorFlow, and PyTorch) is highly advisable.
Employ a starting temperature of T = 1, together with a geometric cooling schedule with a cooling rate of α = 0.98. Lower the temperature after each iteration. Do not employ reheating. Perform a total of 150 iterations. Employ the following stochastic move operator to generate a neighbor: randomly sample (from a uniform distribution) new decision variable values within the specified bounds.
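The annealing settings above can be combined into a loop along the following lines. This is a hedged sketch, not a reference implementation: it assumes one common formulation of the DBMOSA energy difference (the change in the fraction of the archive, augmented with the current and candidate solutions, that dominates each of them) and an acceptance probability of min(1, exp(-ΔE / T)). The names `dbmosa`, `evaluate`, and `sample` are hypothetical; `evaluate` is assumed to train the network and return the pair (error, complexity), and `sample` is assumed to implement the uniform move operator.

```python
import math
import random


def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def energy_difference(archive_objs, current, candidate):
    """Assumed energy: change in the share of the augmented archive that dominates."""
    augmented = archive_objs + [current, candidate]
    dom_candidate = sum(dominates(f, candidate) for f in augmented)
    dom_current = sum(dominates(f, current) for f in augmented)
    return (dom_candidate - dom_current) / len(augmented)


def update_archive(archive, solution, objs):
    """Add (solution, objs) if non-dominated and remove members it dominates."""
    if any(dominates(f, objs) or f == objs for _, f in archive):
        return archive
    return [(s, f) for s, f in archive if not dominates(objs, f)] + [(solution, objs)]


def dbmosa(evaluate, sample, iterations=150, t_start=1.0, alpha=0.98, seed=0):
    rng = random.Random(seed)
    current = sample(rng)
    current_objs = evaluate(current)
    archive = [(current, current_objs)]
    temperature = t_start
    snapshots = {}
    for it in range(1, iterations + 1):
        candidate = sample(rng)  # move operator: fresh uniform sample
        candidate_objs = evaluate(candidate)
        delta = energy_difference([f for _, f in archive], current_objs, candidate_objs)
        if rng.random() < min(1.0, math.exp(-delta / temperature)):
            current, current_objs = candidate, candidate_objs
            archive = update_archive(archive, current, current_objs)
        temperature *= alpha  # geometric cooling after every iteration, no reheating
        if it in (50, 100, 150):
            snapshots[it] = list(archive)  # archive copies for later plotting
    return archive, snapshots
```

Since every call to `evaluate` trains a network for five epochs, caching results for repeated hyper-parameter combinations may be worthwhile, although that is a design choice outside the assignment text.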
Report on the final archive of non-dominated solutions in terms of decision variable values and objective function values. Provide graphical illustrations of the approximate Pareto optimal set in decision space and objective space after 50, 100, and 150 iterations. Carry out at least 20 independent runs to account for the stochastic nature of DBMOSA.
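The assignment does not state how the repeated runs should be summarized. One possible option, shown here purely as an assumption, is to pool the final archives of all runs and keep only the points that remain non-dominated. The function names are hypothetical, and each archive is assumed to be a list of (error, complexity) tuples.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (both minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Keep the points that no other point dominates, dropping exact duplicates."""
    unique = list(dict.fromkeys(points))
    return [p for p in unique if not any(dominates(q, p) for q in unique)]


def merge_runs(run_archives):
    """Pool the archives of several runs into one combined non-dominated front."""
    pooled = [p for archive in run_archives for p in archive]
    return sorted(non_dominated(pooled))
```

For example, `merge_runs([[(0.10, 8000), (0.05, 20000)], [(0.08, 8000), (0.05, 30000)]])` returns `[(0.05, 20000), (0.08, 8000)]`, because the other two points are each dominated by a point from the other run. The same filter can be applied to the archives stored after 50, 100, and 150 iterations before plotting them.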