Take a look at the following image, which shows the cross-section of a lens and how light passes through the lens, converges at the crossing point (where the red lines cross), and then continues to the image sensor (the blue line).
If we say the blue line is a 1" image sensor and the yellow line is a 1/3" image sensor (obviously not to scale), you can see how the smaller sensor captures less of the overall image projected by the lens.
Let's assume both imagers produce a 1080p 16:9 output image (imager size has little or nothing to do with final resolution; different-sized imagers can produce the same final resolution).
The smaller imager sees only the middle portion of the image that falls on the larger imager; the effect is like zooming in. Put another way, to get the same total field of view from the blue imager that the yellow imager sees, you would have to zoom in.
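You can put a rough number on that "zoom" effect by comparing sensor diagonals. This is a minimal sketch using commonly quoted nominal dimensions for the two formats (the figures are assumptions here; exact active-area dimensions vary by sensor model, and "1-inch" type names don't describe the actual diagonal):

```python
import math

# Assumed nominal active-area dimensions in mm (width, height).
# These are typical published figures, not from a specific datasheet.
one_inch = (13.2, 8.8)    # "1-inch" type sensor
one_third = (4.8, 3.6)    # "1/3-inch" type sensor

def diagonal(w, h):
    """Sensor diagonal from width and height."""
    return math.hypot(w, h)

# Crop factor between the two formats: how much narrower the smaller
# sensor's view is when placed behind the same lens.
crop = diagonal(*one_inch) / diagonal(*one_third)
print(round(crop, 2))  # roughly 2.6x
```

So with these assumed dimensions, the 1/3" sensor behind the same lens sees a view roughly 2.6x narrower than the 1" sensor, which is exactly the cropped/zoomed effect described above.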
If you hold the angle of view (AOV) and scene geometry constant but change the imager size, you also need to change the lens/focal length to get the same output image.
Holding the lens constant and changing the imager size changes the scene coverage/AOV instead.
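Both statements follow from the standard rectilinear angle-of-view formula, AOV = 2·atan(w / 2f), where w is the sensor width and f the focal length. A quick sketch, again using assumed nominal sensor widths (13.2 mm for a "1-inch" type, 4.8 mm for a "1/3-inch" type; the 8 mm lens is just an example value):

```python
import math

def aov_deg(sensor_width_mm, focal_length_mm):
    """Horizontal angle of view (degrees) for a rectilinear lens."""
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_length_mm)))

f = 8.0  # example focal length in mm, same lens on both sensors

# Same lens, different imager sizes: the AOV changes.
wide = aov_deg(13.2, f)   # 1" sensor: wider view
tight = aov_deg(4.8, f)   # 1/3" sensor: narrower ("zoomed") view

# To get the same AOV on the smaller imager, scale the focal length
# by the ratio of sensor widths.
f_equiv = f * (4.8 / 13.2)
matched = aov_deg(4.8, f_equiv)  # equals the 1" sensor's AOV

print(round(wide, 1), round(tight, 1), round(matched, 1))
```

The first two numbers show the lens-constant case (changing the imager changes the AOV); the third shows the AOV-constant case (changing the imager forces a focal-length change to match).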
Let me know if that helps/makes sense.