|
|
ALGORITHMIC INFORMATION THEORY AND MINIMUM LENGTH ENCODING
A key idea in AIT is that a body of information is not random if it
can be generated by a computer program which is smaller than that
body of information.
This elegant idea makes a connection between computing and information
compression and this suggests, on a superficial reading, that there is
no difference between AIT and the SP theory.
But workers in AIT have not suggested that 'computing' in
itself might be understood as information compression. There is no
suggestion that the concept of an 'algorithm' might be understood in
those terms (see, for example, Uspensky, V. A., J Symbolic Logic
57(2), 385-412, 1992).
It is not very surprising, therefore, that AIT does not provide any
resolution of the apparent conflict between 'computing as compression'
and the obvious fact that computers can be used to de-compress information
and create redundancy.
The way in which 'computing as compression' may be reconciled with
the fact that computers can be used to create redundancy
is discussed in Computing and information compression: a reply and Computing as compression:
an overview of the SP theory and system with other observations on the relationship between the SP theory and AIT.
An example of how compression can achieve the effect of generation is given in Parsing as information compression by multiple alignment, unification and
search: SP52.
Information compression has been a significant theme in research on
inductive inference including grammatical inference under the banner
of Minimum Length Encoding (MLE) pioneered by Solomonoff, Wallace and Boulton,
Rissanen and others. MLE is an umbrella term covering the
closely-related concepts of Minimum Message Length encoding (MML) and
Minimum Description Length encoding (MDL).
The key idea in MLE is that in grammatical inference and related processing, one
should seek to minimise (G + E), where G is the size of the
grammar (in bits or comparable measure) and E is the size of the raw data
from which the grammar is derived, after the raw data has been encoded in terms
of the grammar.
This idea seems to produce a happy compromise between grammars that are, at one
extreme, very small but are inefficient for encoding data and, at the other
extreme, grammars that are very efficient for encoding data but are excessively
large.
This idea provides an explanation of how it is that children can make
grammatical generalisations and eventually correct their erroneous 'over'
generalisations (e.g. "He hitted me", "Look at those mouses")
without apparently needing explicit correction from other people. Both kinds of
generalisation, by definition, have zero frequency in the child's experience but
MLE provides a means of distinguishing 'good' ones' from 'bad' ones without
external supervision of the learning process (see Language
Learning as Compression).
Distinctive features of the SP programme compared with
AIT and MLE
The SP programme of research is based on MLE principles and
is related to concepts in AIT but
it has distinctive features compared with these fields:
- The SP programme seeks to integrate all kinds of computing and
formal reasoning within a framework of information compression. The goal
is broader than it is in other research in AIT or MLE.
- The SP programme is based on the
hypothesis that all kinds of compression may be understood in
terms of information compression by multiple alignment, unification and
search (ICMAUS). In essence this means that all kinds of information
compression is achieved by the unification of matching patterns. All
existing and projected SP models are restricted to ICMAUS mechanisms
and avoid 'arithmetic coding' and related techniques which are used in
some research on MLE.
The restriction has been imposed in the
interests of simplicity in the SP theory. If, as conjectured,
arithmetic may be understood in terms of ICMAUS then arithmetic coding
may also be understood in those terms. But until this has been
demonstrated, the adoption of arithmetic coding or any similar
technique would add unwanted complexity to the SP model.
- In the first point above, the phrase "all kinds of computing"
includes the concept of 'computing' itself in its full generality. Thus
the SP programme hypothesises that other models of computing such the
Turing model or the Post Canonical System may be understood within the
ICMAUS framework. By contrast, research in AIT and MLE accepts the
Turing model (and equivalent models) as the basis of all kinds of
computing.
|