Subscriber Discussion

Off Topic: A Movie Is To A Record Like A Photograph Is To A ?

Or what's the audio equivalent of a picture?

Is there a decent answer?

*assume silent movie, vinyl record





If I hold one note, syllable, or noise indefinitely, what use is it anyway?

No use except maybe in tantric meditation. :)

I was just naively wondering why one stream (video) of information can be frozen and still be of use, whereas another (audio) cannot.


I think even if a 'frame' of audio is a thing, it more or less is just an academic unit. A single frame of video is useful on its own; it is a complete picture. A single 'note' of audio is not, and it takes a 'measure' (length of time) to gather enough 'notes' to provide useful information.

Perhaps IPVM would consider another addition to these discussion forums.

I'm thinking a check box where we can vote for dumbest post of the year.


Agreed #1703622, though I'm not sure you've thought it through: forum rules prevent one from voting on one's own posts! ;)

Getting back on the off topic, perhaps you wouldn't mind helping me to salvage something out of this for the sake of the forum.

What are the abstract qualities of a movie (video stream)?

One approach would be to say that it is:

sensory information (visual) received as waves modulating through a time domain.

What are the abstract qualities of an audio recording?

sensory information (audible) received as waves modulating through a time domain.

Similar in that respect, right?

What are the abstract qualities of a photograph?

At first blush you might say

sensory information received as waves, unchanging with respect to time

Surely the difference between a movie and a photograph has something to do with time, right? And there is no arguing that as humans we experience them distinctly and find both forms useful.

So the question to you, #622, is why is there no useful:

auditory sensory information received as waves unchanging with respect to time

if there is useful

visual sensory information received as waves unchanging with respect to time?

What's this have to do with IPVM? On the surface, not too much; that's why it's posted as off-topic. You might not know it, but there are at least a handful of genuine polymaths that post on this site, and I was honestly looking for some insight.

But actually, the whole thing stems from a spot-on post that Morton Nielsen made:

LIVE is a slightly different matter. For video you can pretty much decode and show as you get frames, but you just can't do that with audio. You can tolerate stuttering framerates, but audio is a whole different ballgame. You just cannot decode and play, it sounds TERRIBLE and it is useless.

So if you are uncomfortable with the metaphysics, just answer this question: why is "audio is a whole different ballgame" with respect to dropping frames, and why does it "sound[s] TERRIBLE and it is useless" when not buffered, while video doesn't?

I'm all ears...


I don't care.


Meat Slicer?

A record is a sound track, which a movie encapsulates.

A _____ is a _________, which a photograph encapsulates.

Pixel? Not really, a movie has (mostly) one sound track, but a picture has many pixels, and pixels appeal to only one sense.

Monochrome image? That supposes the picture is color, encapsulating three or more different color layers, plus same sensory problem.

Intensity? Pretty much the same as a pixel.

The challenge is, pixels or color "panes" in aggregate completely comprise a picture, whereas sound tracks cannot completely comprise a movie.

Also, a record can be appreciated by only one sensing modality, while a movie appeals to two sensing modalities. Or, in a theater, three (remember that opening scene in the first Star Wars where the cruiser fills the screen and the sound is so intense you can feel the vibration?).

Maybe we have to discount the sensory dimensions, because how can you go below 1 sensing modality?

HEARING/SIGHT is to HEARING as SIGHT is to NOTHINGNESS. Ah, maybe I see how Carl came to "Singularity."

Back to the larger discussion, ...

Video is comprised of individual pictures. Humans can perceive pictures one at a time, although we can extract a different kind of information from a time sequence of images presented at a rate compatible with our capabilities. Video stuttering is a nuisance, but we can still perceive most of the intended information because we can perceive each primitive, a photograph.

Sound is somewhat different in what constitutes a primitive. English is said to be comprised of 50 or so fundamental phonic elements. If speech were presented in these primitives, even if presented irregularly in time, we could probably perceive the intended information. For example, when the kids say something like "Daddy how do you spell 'that?'" I slowly phonetically say "Thhh", .... "aaaAAA!" ..... "t" and they (mostly) understand. However, when speech is arbitrarily chopped up and presented in pieces at varying, randomized rates, our processing is challenged to grasp those fundamental phonetic elements, because the random pieces are not presented in these fundamental perceivable quantities.

I like the primitives explanation, but first, before anything disappears down a singularity (like this discussion), let me say I did add the asterisked disclaimer at the end of my post to even out the modality.

*assume silent movie, vinyl record

Along the same lines as the "presented at a rate compatible with our capabilities" point, do you think maybe audio just doesn't provide anywhere near the amount of data that video does, in either raw pixels or number of primitives perceived per second, so it's not useful in such small quantities, since a small piece might not even comprise one primitive?

picture=1k words
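
To put rough numbers on the "amount of data" question, here is a back-of-the-envelope sketch in Python. The formats are my own assumptions picked for illustration (CD-style stereo audio, uncompressed 1080p video at 30 fps), not anything measured in this thread, and compressed streams would look quite different.

```python
# Rough raw (uncompressed) data rates. All formats here are illustrative assumptions.

def audio_bits_per_second(sample_rate_hz, bits_per_sample, channels):
    """Raw PCM audio bit rate."""
    return sample_rate_hz * bits_per_sample * channels

def video_bits_per_second(width, height, bits_per_pixel, frames_per_second):
    """Raw video bit rate."""
    return width * height * bits_per_pixel * frames_per_second

audio = audio_bits_per_second(44_100, 16, 2)        # CD-style stereo audio
video = video_bits_per_second(1920, 1080, 24, 30)   # 1080p, 24-bit color, 30 fps
frame_bits = 1920 * 1080 * 24                       # one still frame

print(audio)               # 1411200
print(video)               # 1492992000
print(round(video / audio))  # 1058
print(frame_bits / audio)  # seconds of audio holding as many raw bits as one frame
```

Under these assumptions a single raw frame holds about as many bits as 35 seconds of raw stereo audio, which at least fits the idea that a tiny slice of audio carries very little.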


This is a fun discussion. Perhaps long ago, some similar discussions may have led Claude Shannon to his theory of information.

Borrowing from the framework of compounds, molecules, elements, atoms, bosons/fermions, quarks, ... , let's think about digital representations. I'm challenged to think of a more fundamental digital quantity than a bit, but a bit provides almost no information in and of itself. Aggregating to a sample, which is a collection of bits of some length: a sample provides only (visually) a shade between white and black or a color intensity, or (audibly) nothing at all except the position of the speaker cone. Already the same volume of data (a sample) provides more information visually than audibly. That's because the representations are very different. We are sampling sound, but not light, at the Nyquist rate of at least one sample per half wavelength. For light, we're filtering for a few broad bands and simply capturing the intensity information within those bands. Sound, however, is much more dynamic because without a time sequence of samples, we have no sound, but only a static representation of the acoustic actuator's position. This static information provides no sound at all, since a change (not a static pressure) of the conductive medium is necessary to convey a sound. Effectively, we represent images as a crude Fourier series, which is an amplitude sequence of a particular frequency range (e.g. color band), while sound is much more precisely sampled. If sound were represented in a manner similar to how images are represented, a single sample would present a tone - a real quantity of information. Instead, it's sampled as a time series, which is much more impenetrable when presented asynchronously with respect to time.
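
The point about a lone audio sample versus a run of samples can be sketched in a few lines of Python. Everything here is an assumption chosen for illustration: an 8000 Hz sample rate, a 440 Hz test tone, a 0.1 second window, and a naive DFT helper I've called dominant_frequency. It only shows that a tone is recoverable from a window of samples and not from one sample.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (assumed)
TONE_HZ = 440        # test tone (assumed)
N = 800              # window length: 0.1 s, so DFT bins are 10 Hz apart

# A time series of samples: each value is just a speaker-cone position.
samples = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE) for n in range(N)]

def dominant_frequency(window, sample_rate):
    """Return the frequency (Hz) of the strongest nonzero DFT bin below Nyquist.

    Returns 0.0 when the window is too short to contain any such bin.
    """
    n = len(window)
    best_bin, best_mag = 0, 0.0
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(window))
        if abs(acc) > best_mag:
            best_bin, best_mag = k, abs(acc)
    return best_bin * sample_rate / n

one_sample = samples[5]
print(one_sample)                                      # a single position, no tone
print(dominant_frequency([one_sample], SAMPLE_RATE))   # 0.0
print(dominant_frequency(samples, SAMPLE_RATE))        # 440.0
```

A frequency-domain value (one DFT bin) plays the role of the "single sample that presents a tone" described above, but it can only be computed once enough of the time series has arrived.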

This has been a fun thought experiment. Hope I haven't rambled too much.

Chris, how about this? A movie is to a record like a photograph is to an audio clip with at least enough duration to convey a single idea.

One thing that strikes me is that a single picture can contain a lot of information presented virtually instantly (massively parallel input), whereas audio, to be useful, requires enough time to build a presentable idea for our brains (serial input).

Whether an object is a photograph isn't defined by how much useful information it presents. That falls under the qualitative judgement of how good a photograph it is. A photograph of a black cat in an unlit coal mine is still a photograph; it just doesn't contain enough detail to convey any useful information.

A silent movie is a series of photographic images presented rapidly and sequentially, enough so that they flow together to tell a story. A record does much the same thing, but requires more time to create the discrete audio images that then must be presented rapidly and sequentially together to tell their own story. Mess up the rate or the order of presentation in either medium and they can become unusable, and no story can be understood.

A movie is to a record as a photograph is to a conversation.

I like Ari's definition, but just like mine, it doesn't quite feel like we've got it nailed.

If we look at a frame from a camera, which is not exactly the same as a photograph, we have to acknowledge that the way a frame is assembled takes time, just as surely as the audio does. My understanding of how the camera works is as follows.

An image sensor, made up of an array of pixel columns and rows, absorbs photons of varying energy levels in each element of the array; the quantity and ratio of each energy level of these photons is what determines perceived color and level. The shutter speed determines how long this absorption process is allowed to build a number in the cells (the same time for each element). This sets Time S(et) at the shutter speed. Then the values in the cells are latched while they are off-loaded into a buffer, one element at a time, sequentially a row or column at a time. This sets Time L(oad). The camera then processes the raw data according to the compression algorithm used and any analytic programs that reside camera-side, and adds some framing and error detection data. This sets Time P(rocessing). This is surely an oversimplification of the total process, but it does show that we are dealing with a minimum time: Time M(inimum) = Time S + Time L + Time P. This Time M is obviously many orders of magnitude smaller than the time required to string together enough pressure waves to form a meaningful audio clip, but it seems likely to be subject to the same poor results if interrupted or truncated.
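
As a worked example of the Time M sum, here is a small Python sketch. Every number in it is invented for illustration (a 1/1000 s shutter, 5 ms readout, 5 ms processing, and a one-second audio clip); real cameras and real speech will differ, so treat the result as arithmetic only.

```python
# All timings below are invented for illustration; real cameras vary widely.
TIME_S_US = 1_000   # Time S: exposure at an assumed 1/1000 s shutter
TIME_L_US = 5_000   # Time L: assumed readout of the sensor into the buffer
TIME_P_US = 5_000   # Time P: assumed compression, analytics and framing

def minimum_frame_time_us(time_s, time_l, time_p):
    """Time M = Time S + Time L + Time P, in microseconds."""
    return time_s + time_l + time_p

time_m_us = minimum_frame_time_us(TIME_S_US, TIME_L_US, TIME_P_US)

# Assumed: about one second of speech is needed to convey a single idea.
AUDIO_CLIP_US = 1_000_000

print(time_m_us)                  # 11000
print(AUDIO_CLIP_US / time_m_us)  # how many frame times fit into the clip
```

With these made-up figures the audio clip is roughly 90 times longer than Time M. Different assumptions would move that ratio a lot in either direction.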

Tim, Horace, please forgive my stealing from both of your posts to posit the following.

Thinking about sensors (Tim) and primitives (Horace) led me to the following thought:

Although sound and light, as I said, are both waves conveying information, sound has an instantaneous resolution of only 2 at best, because the waves are delivered to two sensors, the ears. Light, on the other hand, is perceived in parallel by hundreds of thousands of sensors. Is that why there is so much more information?


No need to apologize, Chris, on my part. This is your thread (at least until John gets weary of us using his space).

Can I assume you are referring to the rods and cones in the human eye as opposed to the hammer and anvil in the ear?

Yes, and in all fairness, I see Brian also alluded to something similar.

Thanks guys for making me think! :)

Well it depends on the photograph of course.