NATURAL LANGUAGE PROCESSING AS INFORMATION COMPRESSION
Methods for representing the syntax of natural language (NL), and for parsing and producing NL, have been
developed within a programme of research based on the 'SP'
conjecture that:
All kinds of computing and formal reasoning may usefully be understood as information compression by multiple alignment, unification and search.
Although attention has so far been confined to representing
NL syntax and to parsing and producing syntactic structures,
the concepts are expected to generalise readily to the
integration of semantic structures with syntactic structures
and to the processing of the two together.
This approach to processing NL appears to be novel and,
when more fully developed, offers two potential benefits:
- A relatively simple and transparent method for representing
syntactic structures in NL, including 'context-sensitive' features.
- Precisely the same methods can serve for the parsing of
NL and for the production of NL.
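The second point can be illustrated with a deliberately simplified sketch. It is not the SP52 model and does not implement multiple alignment: it substitutes plain greedy longest-match dictionary coding, and the pattern store `PATTERNS`, the codes in it, and the functions `parse` and `produce` are all hypothetical names introduced here. It shows only the general idea that one store of patterns can serve both directions, with parsing as encoding (compression) and production as decoding.

```python
# Toy illustration only. PATTERNS and its codes are invented for this
# sketch; this is dictionary coding, not SP multiple alignment.
PATTERNS = {
    "NP0": ("the", "dog"),
    "D0": ("the",),
    "D1": ("a",),
    "N0": ("dog",),
    "N1": ("cat",),
    "V0": ("barks",),
    "V1": ("sleeps",),
}


def parse(words):
    """Encode a word list as a list of pattern codes (compression)."""
    codes = []
    i = 0
    while i < len(words):
        best = None
        # Search the store for the longest pattern matching at position i.
        for code, pattern in PATTERNS.items():
            if tuple(words[i:i + len(pattern)]) == pattern:
                if best is None or len(pattern) > len(PATTERNS[best]):
                    best = code
        if best is None:
            raise ValueError(f"no stored pattern matches at {words[i]!r}")
        codes.append(best)
        i += len(PATTERNS[best])
    return codes


def produce(codes):
    """Decode a list of pattern codes back into words (decompression)."""
    words = []
    for code in codes:
        words.extend(PATTERNS[code])
    return words


sentence = "the dog barks".split()
encoded = parse(sentence)      # three words become two codes
restored = produce(encoded)    # the same store regenerates the sentence
```

In this sketch the same `PATTERNS` store drives both functions, so adding a pattern extends parsing and production together.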
These ideas are described in
Syntax, parsing and production of natural language in a framework of
information compression by multiple alignment, unification and search
and, with more detail and more examples, in two other unpublished reports:
- The first describes the representation of syntax in the
proposed new framework and shows how parsing and
production of NL can be achieved. It also describes the SP52
computer model, which embodies these ideas.
- The second presents a range of examples showing how the
system can accommodate ambiguity and recursion in syntax, discontinuous
dependencies, cross-serial dependencies, and the
interesting inter-relation of primary structure and secondary
constraints in English auxiliary verbs.
The SP concepts have also been applied to an
interpretation of the nature of 'computing', mathematics and logic, to
modelling probabilistic reasoning, and to other
areas of computing (see Computing as Compression).