In February, George stated that a "data compression algorithm used
today, consists of a number of sub-algorithms". I would agree, as
algorithms of all forms consist of sub-algorithms. An algorithm
("Al-Kwarism") is little more than a meticulous step-by-step way of
boiling a complex process down until a silicon chip or other imbecile
can understand it. This involves breaking the task into smaller ones.
Matt Mahoney gave a very competent account of the various techniques
used for data compression, and even showed Claude Elwood Shannon's
logarithmic limit.
Had Lempel, Ziv and Welsh taken the agonisingly slow approach of
Mohammed al-Kwarizmi ("Algebra"), where he discovered that "numbers are
constructed out of parts, and can be broken down into their parts",
etc, they might have found EXPNENTIAL compression.
This discovery I have made. You can see it at:
http://wehner.org/compress
I cannot tell you what the compression rate will be. This is because it
depends - as Shannon rightly said - upon the ENTRPY F THE DATA.
However, I can tell you that it is the BEST.
That is because it is fundamental. I abided by the rule of Bertrand
Russell that an open set cannot contain itself. I did not rush ahead,
as with LZW in a GIF file, and arrange for the compression to begin at
once - after the first byte. Instead, I arranged for TW bytes (or in
the language of Cantor, two "elements") to be transferred unchanged
from input to output.
Now the coding begins. These two bytes are coded under a new number
each. That new number is its SET number. But each is just a set on NE.
It is therefore a PRIMITIVE set. However, each has the merit of being a
CLSED set.
We now open a NEW set. We may have to make it primitive. the other
hand, if we are lucky, and the first pair of bytes repeat, we might
make this new set into a CMPUND set. A compound set always contains
two sets.
So we might get the Fibonacci sequence (1, 1, 2, 3, 5, 8, 13, 21, 34,
55. 89 ), where each number represents the number of elements in a
single coded set. For big numbers, this expands at Phi - the "GLDEN
SECTIN", or 1.618 times per step.
, we might fall behind in our compression because nothing
repeats. By the laws of probability, however, a non-repeat scenario
cannot go on forever.
If a large set of data suddenly repeats - as in a video frame - we find
that the system now CATCHES UP by expanding at the BINARY rate. Two
bytes become one set. Two sets of two become a superset of four. Two
supersets become a super-duper-set of eight and so on.
In a graphic, this new system - EXPNENTIAL CMPRESSIN - does as GIF
does, only better. It recognises repeating "poster paint" areas, such
as a blue sky of constant colour, and compresses with exponentially
improving efficiency. It thrives on abundant data. Unfortunately, CPUs
do not. A search through gigabytes of source to match a string is an
N-SQUARED algorithm. Special string-match chips will probably be needed
if the best is to be achieved.
When it comes to video, the SAME search for "newness" does what MPEG
does. It does frame-by-frame comparison. So pixel-matching and
frame-matching are both built-in to the system.
In audio, we have more of the same. Repeating cycles of sound become
coded better and better. However, the "entropy" of audio is detected
only over long periods because sound is by nature a CHANGE in
air-pressure. A tiny burst of sound will not compress well. A complete
song of three minutes will.
The minds of humans and animals "wake up" to alterations in the
environment. So does the Exponential Compression algorithm. Indeed, it
is "alert" to the tiniest change in the data.
Polly the horse is a very wooden actress - because she is a wooden
horse. You can see on my page how a tiny change in scale, brought about
by the commencement of the zoom, causes the image (of "leading-edge
differals") suddenly to portray the "nimbus" of a horse. Polly was
"seen" because she moved.
Chomsky's "Universal Grammar" (sets of data that have been labelled in
readiness to receive a name) is also there. The machine detects a sine
wave unaided, without knowing anything about sines - but the sines must
be phase-coherent for this to happen.
This then is the future.
There are other aspects of compression. Degrading compression was
touched upon, with the introduction of "laxity" of coding. This is
because ALL sampling systems are lax. Sound, for example, is an
"infinite-bit" data set with the Brownian motion of the atoms being
part of the live sound. We cannot hear the Brownian motion, so eight,
twelve, fourteen or sixteen bits might be used when the audio is
mastered, and the Brownian motion is lost forever.
Who cares? The "Golden Ear" self-styled experts may care, because they
believe that they can hear the motion of the atoms as well as the noise
caused by the mains plug not being gold-plated. Sensible people,
however, know that there is a limit to human perception. So the system
is made as lax as necessary to achieve good compression - but not lax
enough for the mind to perceive any error.
This brings in PSYCHLGY. Psychology is not psychiatry. It is the
science of the working of the mind. As our minds generalise data, and
discard the superfluous, our compression systems can do this. That is
the basis of JPEG/MPEG. It is a psychological trich based upon the
experience gained with coloured television.
We never had colour television. It was always black-and-white that had
been coloured-in with a smear of colour. The "Golden Ears" obviously
had no golden eyes. I never heard anybody complain!
I am still pondering the question of advances in the fields of
degrading compression. For now, my Exponential Compression (nickname
"Fibonacci") will have to suffice.
I have not as yet finalised an Internet standard for this system. Part
of that is due to the need for "streaming" and for the future provision
of degrading extensions. I had also been working on a built-in language
for animation.
However, you will have learned that the future, if not yet here, is at
least on its way.
Charles Douglas Wehner