In a whitepaper, researchers describe "image-to-image translation": using deep learning techniques to convert sample images into alternate versions of those images, for example adding or removing seasonal elements in a scene, or shifting a scene between day and night.
Any video exported in standard formats could conceivably be run through such a program to alter key details. Alternatively, attackers could use exploits against recorders and alter video within the recorder itself, making recorded events or scene details appear materially different.
Image examples, showing changes to hair color, expression, etc. for an input image of a person:
An input picture of a dog in some pose, which is then mutated into a different breed of dog in the same pose, with surrounding elements (grass, etc.) intact:
Video examples (to this approach, a video is just a series of images):
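The per-frame pipeline implied above can be sketched in a few lines. This is a minimal, hypothetical sketch, not the researchers' method: the `translate_frame` function below is a placeholder brightness shift standing in for a trained image-to-image generator (a real system would run each frame through a neural network), but the frame-by-frame structure is the point.

```python
import numpy as np

def translate_frame(frame):
    """Stand-in for a trained image-to-image generator G: domain A -> domain B.
    A real model would map a frame from one domain (e.g. day) to another
    (e.g. night); here we fake it with a brightness shift so the pipeline
    is runnable end to end."""
    return np.clip(frame.astype(np.int16) - 80, 0, 255).astype(np.uint8)

def translate_video(frames):
    """Apply the per-frame translator to every frame of a video.
    `frames` has shape (num_frames, height, width, channels), dtype uint8."""
    return np.stack([translate_frame(f) for f in frames])

# A tiny synthetic "video": 4 frames of 8x8 RGB noise.
video = np.random.randint(0, 256, size=(4, 8, 8, 3), dtype=np.uint8)
altered = translate_video(video)
assert altered.shape == video.shape  # same dimensions, altered content
```

Because the translator operates on individual frames, any video that can be decoded into frames can be fed through it and re-encoded, which is why the technique applies to exported recordings as readily as to still images.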
This should add some interesting new arguments to future debates over what counts as "admissible video evidence".