Natural Language Processing
Syntax, parsing and production of natural language in a framework of information compression by multiple alignment, unification and search
Journal of Universal Computer Science 6 (8), 781-829, 2000 (from 2007-01-01, this journal will be an open-access journal and all papers, including this one, may be downloaded without any charge), PDF, uk.arxiv.org/abs/cs.AI/0307014.
As its title suggests, this article describes how the syntax of natural language, and the parsing and production of natural language, may be cast in the SP framework.
Despite the simplicity of the concept of 'pattern' in this context, the SP framework gives it the 'power' of a context-sensitive grammar. Discontinuous dependencies can be represented in a manner that is arguably simpler and more transparent than in other systems.
An interesting feature of the SP framework is that it can be used, in exactly the same form, for both the parsing and the production of language.
This article is a condensed version of the two articles that are described next.
Parsing as information compression by multiple alignment, unification and search: SP52.
SEECS Report, February 1998. HTML (some diagrams may be too wide for printing).
Parsing as information compression by multiple alignment, unification and search: examples.
SEECS Report, February 1998. HTML (some diagrams may be too wide for printing).
These two articles, together, are a fuller version of the article described immediately above.
The first article describes the theoretical framework and the SP52 model in which the theory is embodied.
The second article gives examples of what the SP52 model can do.
The examples show how this framework can accommodate ambiguity in parsing and recursion in syntax. The framework allows 'discontinuous dependencies' (DDs) in syntax to be accommodated in a relatively simple and 'direct' manner, including DDs that are nested one within another and DDs that overlap each other.
Examples are included showing how the multiple alignment framework can accommodate the interesting interaction between primary structure and secondary constraints in English auxiliary verbs.
There is also an example showing how the multiple alignment framework can model the kind of 'cross-serial dependencies' in syntax which are found in Swiss German and Dutch.
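To make the terms concrete: in a cross-serial pattern the i-th noun phrase depends on the i-th verb, so the dependency arcs cross one another, whereas in a nested pattern the arcs are contained one inside another and never cross. The Python sketch below is purely illustrative and is not part of the SP52 model. The function names and the simplified layout (all nouns placed before all verbs) are assumptions made here for the sake of the example.

```python
def nested_pairs(n):
    """Nested dependencies: noun i is paired with verb n-1-i."""
    return [(i, n - 1 - i) for i in range(n)]


def cross_serial_pairs(n):
    """Cross-serial dependencies: noun i is paired with verb i."""
    return [(i, i) for i in range(n)]


def crossings(pairs, n):
    """Count pairs of dependency arcs that cross.

    Assumed layout: noun i sits at string position i and
    verb j sits at string position n + j.
    """
    arcs = [(i, n + j) for i, j in pairs]
    count = 0
    for a in range(len(arcs)):
        for b in range(a + 1, len(arcs)):
            (s1, e1), (s2, e2) = arcs[a], arcs[b]
            # Two arcs cross when exactly one endpoint of one arc
            # lies strictly inside the other arc.
            if s1 < s2 < e1 < e2 or s2 < s1 < e2 < e1:
                count += 1
    return count


if __name__ == "__main__":
    n = 3
    print("nested crossings:", crossings(nested_pairs(n), n))
    print("cross-serial crossings:", crossings(cross_serial_pairs(n), n))
```

With three noun-verb pairs, the nested pattern has no crossing arcs, while every pair of arcs crosses in the cross-serial pattern. This crossing is what makes such constructions awkward for context-free grammars.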