Problem 8. Consider the XOR problem.
[Figure: the four XOR data points, including (0,1) and (0,0), with the separating lines.]
It is impossible for a single neuron to solve the XOR problem with the above data without error. Recall that in a network with hidden nodes, each linear function of the input features is converted to a non-linear form using the sigmoid function or the sign function. The new outputs are then combined linearly. If the hidden outputs are not made non-linear, the composition is just another linear classifier and still cannot solve XOR.
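The collapse to a single linear classifier can be checked numerically. The following is a minimal sketch, not part of the problem: the weights W1, b1, w2 and b2 are arbitrary illustrative values, and the helper name affine is hypothetical. It stacks two layers with no activation in between and compares the result with one equivalent linear layer.

```python
# Sketch: a hidden layer with no non-linearity collapses to one linear classifier.
# All weights below are arbitrary illustrative values (assumptions, not from the problem).

def affine(W, b, x):
    """Return W x + b for a matrix W (list of rows), bias vector b, and input x."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

W1 = [[1.0, 1.0], [2.0, -1.0]]   # hidden-layer weights (two hidden nodes)
b1 = [-0.5, 0.25]                # hidden-layer biases
w2 = [[3.0, -2.0]]               # output-layer weights (one output node)
b2 = [0.5]                       # output-layer bias

# Collapse the two layers algebraically:
#   w2 (W1 x + b1) + b2 = (w2 W1) x + (w2 b1 + b2)
W_eq = [[sum(w2[0][k] * W1[k][j] for k in range(2)) for j in range(2)]]
b_eq = [sum(w2[0][k] * b1[k] for k in range(2)) + b2[0]]

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
two_layer = [affine(w2, b2, affine(W1, b1, x))[0] for x in points]
one_layer = [affine(W_eq, b_eq, x)[0] for x in points]

# The two lists agree on every point, so the stacked network is still linear.
print(two_layer == one_layer)
```

Because the stacked network computes the same function as a single neuron, it inherits the single neuron's inability to separate XOR. Inserting sign or sigmoid between the layers is what breaks this equivalence.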
We will use a network with one hidden layer of two nodes to classify XOR with zero error. First, we will separate (0 from the other points, as shown in the figure. We will pass the output of each linear classifier through the sign function, which simply returns the sign of its input.
The first line, drawn dotted in the figure above, is given by y = I and w_0 = -0.5, and the feature output is sign(w^T x + w_0). Write down the output of each of the four data points under this transformation.
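The transformation can be evaluated mechanically once the parameters are fixed. The sketch below is one possible reading, not the official solution: it treats -0.5 as the bias term, and the weight vector w = (1, 1) is an assumed placeholder chosen only for illustration, not a value taken from the problem. The helper names sign and feature are hypothetical.

```python
# Sketch: evaluate a sign-activated linear feature on the four XOR inputs.
# ASSUMPTION: w = (1, 1) is a placeholder weight vector, not given in the problem.
# ASSUMPTION: -0.5 is interpreted as the bias term w0.

def sign(z):
    """Return +1 for positive z, -1 for negative z, and 0 when z is exactly zero."""
    return (z > 0) - (z < 0)

def feature(w, w0, x):
    """Hidden feature sign(w . x + w0) for weight vector w, bias w0, input x."""
    return sign(sum(w_i * x_i for w_i, x_i in zip(w, x)) + w0)

w = (1.0, 1.0)   # assumed placeholder weights
w0 = -0.5        # bias, read from the problem text

points = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = {x: feature(w, w0, x) for x in points}

for x, h in outputs.items():
    print(x, h)
```

Under these assumed weights the point (0, 0) maps to -1 and the other three points map to +1. A different weight vector would give a different split, so the printed values should be rechecked against the actual parameters of the dotted line.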