At the beginning of the video, two rectangular blocks are drawn and a grid is added. If the video stopped here, you would have information to draw upon to interpret the image, so simply sensing the image as seen would be referred to as:
Proximity
Linear Perspective
Priming
Bottom-Up Processing
Top-Down Processing