We now take the energy function in Eq. (1), which defines a joint distribution over x and y given by Eq. (2). In Eq. (1), the -ηx_i y_i terms give a lower energy (and thus a higher probability) when x_i and y_i have the same sign and a higher energy when they have opposite signs. The -βx_i x_j terms make the energy lower when the two ground-truth pixels x_i and x_j have the same sign than when they have opposite signs. The hx_i terms bias the model towards pixel values of one particular sign in preference to the other.
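
To make the role of each term concrete, the following is a minimal sketch of such an energy function. It is not taken from Eq. (1) itself. It assumes the energy is the sum of the three kinds of terms described above, that pixel values are in {-1, +1}, and that the x_i x_j pairs are horizontally and vertically adjacent pixels on a grid. The function name `energy` and the default values of `h`, `beta` and `eta` are illustrative choices only.

```python
def energy(x, y, h=0.0, beta=1.0, eta=1.0):
    """Energy of a latent image x and an observed image y.

    x and y are lists of rows with entries in {-1, +1}.
    Assumed form (a sketch, not the source's Eq. (1) verbatim):
        E = h * sum_i x_i - beta * sum_{i,j} x_i x_j - eta * sum_i x_i y_i
    where {i,j} ranges over adjacent grid pixels (an assumption).
    """
    rows, cols = len(x), len(x[0])
    total = 0.0
    for r in range(rows):
        for c in range(cols):
            total += h * x[r][c]               # bias term h x_i
            total -= eta * x[r][c] * y[r][c]   # data term -eta x_i y_i
            # Pair terms -beta x_i x_j: count each adjacent pair once
            # by looking only at the pixel below and the pixel to the right.
            if r + 1 < rows:
                total -= beta * x[r][c] * x[r + 1][c]
            if c + 1 < cols:
                total -= beta * x[r][c] * x[r][c + 1]
    return total


# Hypothetical 2x2 example: the latent image agrees with the observation.
y_obs = [[1, 1], [1, 1]]
x_same = [[1, 1], [1, 1]]
x_flip = [[-1, 1], [1, 1]]   # one latent pixel disagrees with y and its neighbours

e_same = energy(x_same, y_obs)
e_flip = energy(x_flip, y_obs)
```

Under these assumptions, flipping a single pixel raises the energy, because both its data term and its two pair terms change sign. With `beta` and `eta` set to zero and a nonzero `h`, the energy depends only on how many pixels take each sign, which is the biasing effect described above.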