PARSING AS INFORMATION COMPRESSION BY MULTIPLE ALIGNMENT, UNIFICATION AND SEARCH: EXAMPLES

J Gerard Wolff

February 1998

School of Electronic Engineering and Computer Systems, University of Wales, Dean Street, Bangor, LL57 1UT, UK. Telephone: +44 1248 382691. E-mail: gerry@sees.bangor.ac.uk. Fax: +44 1248 361429. Web: http://www.sees.bangor.ac.uk/~gerry/.

TABLE OF CONTENTS
Abstract
1 INTRODUCTION
4.2 Nesting of Discontinuous Dependencies
4.3 Variability of Constituents
4.3.2 Figure 9 (b)
4.3.3 Figure 9 (c)
4.4.2 The Number Marking of the 'Head' of a Structure may be Used Instead of a Number Marking for the Whole Structure.
4.4.3 Any Pattern of Two or More Symbols May be Constructed from Binary Patterns.
4.4.4 Binary Dependencies can Accommodate Options in a Flexible Manner.
5.2 ICMAUS and English Auxiliary Verbs
5.2.2 The Secondary Constraints.
7 CONCLUSION
7.2 Linguistic Intuition and Information Compression
7.3 Further Development and Generalisation
References

Abstract
This article presents and discusses examples illustrating aspects of the proposition, described in the accompanying article (Wolff, 1998), that parsing may be understood as information compression by multiple alignment, unification and search (ICMAUS). The later examples show that the multiple alignment framework as described in the accompanying article has expressive power which is comparable with other 'context sensitive' systems used to represent the syntax of natural languages. In all the examples, the SP52 model, described in the accompanying article, is capable of finding an alignment which is intuitively 'correct' and assigning to it a 'compression score' which is higher than for any other alignment. The congruence which has been found between this range of alignments produced by a system which is dedicated to information compression and what is judged to be 'correct' in terms of linguistic intuition lends support to the hypothesis that linguistic intuition is itself a product of psychological processes of information compression. One example shows how, in cases of ambiguity, the model is capable of finding two or more 'good' parsings for a given input, corresponding to the alternative readings of the input, and with compression scores which are higher than the scores of any other alignments which have been formed. The model can also accommodate disambiguating context in an appropriate manner. A second example shows how the phenomenon of recursion in natural languages can be accommodated in the ICMAUS framework. Other examples show how 'discontinuous dependencies' in syntax may be expressed in a manner which is, arguably, simpler and more direct than in other systems. Discontinuous dependencies which are nested one within another can be accommodated, as can discontinuous dependencies which overlap each other.
Examples are presented showing how the interesting relationship between primary structure and secondary constraints in the syntax of English auxiliary verbs may be expressed in the ICMAUS framework. 'Cross-serial dependencies' is a form which appears in Swiss German and Dutch. Although this form cannot easily be expressed as a context-free phrase-structure grammar (without augmentation), it maps into the multiple alignment framework in a straightforward manner. An example is presented showing how this form may be parsed successfully by the SP52 model. The full range of examples suggests that there is sufficient promise in these ideas to justify further exploration and development.
1 INTRODUCTION
The accompanying article (Wolff, 1998) describes how parsing may be understood as information compression by multiple alignment, unification and search (ICMAUS) and describes a software model (SP52) which embodies these ideas, with some simple examples to show what it can do. This article presents a selection of other examples which are more realistic and which illustrate aspects of parsing such as ambiguity in the sentence (or other material) being parsed (including the effect of disambiguating context), recursion in syntax, discontinuous dependencies in syntax (including nested dependencies and overlapping dependencies), the combination of primary structure and secondary constraints in the syntax of English auxiliary verbs, and 'cross-serial dependencies' which occur in languages like Swiss German and Dutch. These later examples show that the multiple alignment framework as described in the accompanying article has expressive power which is comparable with other 'context sensitive' systems used to represent the syntax of natural languages.
In all these areas, the SP52 model is capable of delivering alignments which correspond with our intuitions about what a 'correct' parsing should be and which are identified by the model as the 'best' out of the alternative alignments because, in each case, the 'best' alignment has a higher compression score (CS) than any of the alternative alignments for the same input sequence and grammar. All the alignments shown in this article are actual output of the SP52 model. The congruence which has been found between this range of alignments produced by a system which is dedicated to information compression and what is judged to be 'correct' in terms of linguistic intuition lends support to the hypothesis that linguistic intuition is itself a product of psychological processes of information compression.
In this article, readers will see that, within the ICMAUS framework, there are often two or more alternative techniques for representing any given aspect of linguistic structure. This article attempts only to demonstrate some of the possibilities within the ICMAUS framework. Evaluation of the relative merits of alternatives is a matter for future research.
2 AMBIGUITY IN PARSING
It should be evident from the description of the SP52 model in the accompanying article that the model is well-adapted to finding alternative parsings of sentences or other input, including cases of 'ambiguity' where two or more of the alternative parsings are equally good or nearly so. On each application of the compress() function (shown in Figure 5 of the accompanying article), the model creates several alternative new alignments for storage in Old. A natural consequence of this style of processing is that the model normally delivers several alternative parsings of the input, each with its own CS. To confirm that the model can indeed recognise cases of ambiguity, it has been tested with the ambiguous input sequence corresponding to the phoneme sequence¹ ' ae i s k r ee m' (which can be read as "ice cream" or "I scream"), together with an appropriate grammar, shown in Figure 1, below.²,³
  S 0 NP #NP V #V ADV #ADV #S      100
  S 1 NP #NP VB #VB A #A #S        200
  NP 0 w ee #NP                    100
  NP 1 ae i #NP                     50
  NP 2 A #A N #N #NP               150
  A 0 ae i s #A                    100
  A 1 h o t #A                      80
  A 2 k o l d #A                    70
  N 0 k r ee m #N                   30
  N 1 m i l k #N                    20
  V 0 s k r ee m #V                150
  V 1 sh ae w t #V                  50
  VB 0 i z #VB                     200
  ADV 0 l ae w d l i #ADV           40
  ADV 1 k w ae i e t l i #ADV       60
Figure 1

As expected, the program discovers the two 'correct' parsings of this pattern and assigns CSs to them which are close in value to each other and higher than any others. These two parsings are shown in Figure 2 and Figure 3.⁴

[Alignment (a); rows top to bottom, New first]
  ae i s k r ee m
  A 0 ae i s #A
  N 0 k r ee m #N
  NP 2 A #A N #N #NP

[Alignment (b); rows top to bottom, New first]
  ae i s k r ee m
  NP 1 ae i #NP
  V 0 s k r ee m #V
  S 0 NP #NP V #V ADV #ADV #S

As expected, the provision of disambiguating context - as in ' ae i s k r ee m l ae w d l i' ("I scream loudly") or ' ae i s k r ee m i z k o l d' ("Ice cream is cold") - has the effect of swinging the CS decisively in favour of one interpretation or the other. In each of these two cases, the program finds the parsing which is correct in terms of our intuitions and assigns it a CS which is substantially higher than for any other parsing. These two 'best' parsings are shown in Figure 3.

[Alignment (a); rows top to bottom, New first]
  ae i s k r ee m l ae w d l i
  NP 1 ae i #NP
  V 0 s k r ee m #V
  ADV 0 l ae w d l i #ADV
  S 0 NP #NP V #V ADV #ADV #S

[Alignment (b); rows top to bottom, New first]
  ae i s k r ee m i z k o l d
  VB 0 i z #VB
  A 2 k o l d #A
  S 1 NP #NP VB #VB A #A #S
  A 0 ae i s #A
  N 0 k r ee m #N
  NP 2 A #A N #N #NP

1. The symbols used are an alphabetic adaptation of the normal phoneme symbols. This adaptation was adopted to facilitate processing by the SP52 model and has been retained here so that the actual alignments produced by the program could be used in Figures 2 to 4.
2. As with the grammar shown in Figure 4 of the accompanying article, all the grammars shown in this article (including Figure 1) show a number to the right of each pattern which is a notional frequency of occurrence of that pattern in an imaginary sample of text.
3. As was noted in the accompanying article, all the examples in these two articles are quite small - for the sake of clarity and to save space - and this means that many features of English cannot be accommodated in the grammars. However, for reasons given in Section 5.4 of the accompanying article, it appears that the ICMAUS approach to parsing may be applied with realistically large grammars and longer sentences without creating demands for processing time or storage space which are beyond the bounds of practicality.
4. As was noted in Section 3.2 of the accompanying article, the row in which any pattern appears in any of the alignments shown in these two articles is arbitrary except for the convention that New (the sentence or other pattern being parsed) is always shown at the top.
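The ambiguity of ' ae i s k r ee m' can be seen at the level of the word patterns alone. The following is a minimal sketch, not the SP52 search (which builds alignments and ranks them by compression score): it simply enumerates every way of covering the input with the word-level patterns of Figure 1. The names LEXICON and segmentations are illustrative assumptions, not part of the model.

```python
# Word-level patterns from Figure 1, keyed by (category, index).
# Frequencies are omitted because this sketch does no scoring.
LEXICON = {
    ("NP", 0): ["w", "ee"],
    ("NP", 1): ["ae", "i"],
    ("A", 0): ["ae", "i", "s"],
    ("A", 1): ["h", "o", "t"],
    ("A", 2): ["k", "o", "l", "d"],
    ("N", 0): ["k", "r", "ee", "m"],
    ("N", 1): ["m", "i", "l", "k"],
    ("V", 0): ["s", "k", "r", "ee", "m"],
    ("V", 1): ["sh", "ae", "w", "t"],
    ("VB", 0): ["i", "z"],
    ("ADV", 0): ["l", "ae", "w", "d", "l", "i"],
    ("ADV", 1): ["k", "w", "ae", "i", "e", "t", "l", "i"],
}


def segmentations(symbols, start=0):
    """Yield every way of covering symbols[start:] with lexicon entries."""
    if start == len(symbols):
        yield []
        return
    for key, word in LEXICON.items():
        end = start + len(word)
        if symbols[start:end] == word:
            for rest in segmentations(symbols, end):
                yield [key] + rest


readings = list(segmentations("ae i s k r ee m".split()))
for reading in readings:
    print(reading)
```

The two segmentations found correspond to the word patterns in the two parsings above ("I scream" and "ice cream"). Choosing between them when context is supplied requires the sentence-level patterns and the compression scores, which this sketch does not attempt.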
3 RECURSIVE STRUCTURES
Recursion is a prominent feature of natural languages, illustrated classically by the traditional nursery rhyme The House that Jack Built whose last verse begins:

  This is the farmer sowing his corn,
  That kept the cock that crowed in the morn,
  That waked the priest all shaven and shorn,
  That married the man all tattered and torn,
  ... and so on.⁵

In all the diverse manifestations of recursion (so brilliantly described by Douglas Hofstadter (1979)), the key feature is that there is at least one structure which contains a reference to itself, either immediately or at some lower 'level' within its (hierarchically organised) constituents. Figure 4 shows an SP grammar for a fragment of English where the second pattern ' S 1 PN #PN V #V DPN #DPN S #S #S' contains a reference to itself near the end of the pattern via the left and right boundary symbols ' S #S'.

  S 0 PN #PN V #V ADV #ADV #S          1000
  S 1 PN #PN V #V DPN #DPN S #S #S      700
  DPN t h a t #DPN                      300
  PN 0 w e #PN                          400
  PN 1 y o u #PN                        700
  PN 2 h e #PN                          350
  PN 3 i t #PN                          250
  V 0 s a y s #V                        200
  V 1 s a y #V                          210
  V 2 s a i d #V                        300
  V 3 t h i n k s #V                    250
  V 4 t h i n k #V                      200
  V 5 g o e s #V                        300
  V 6 g o #V                            240
  ADV 0 f a s t #ADV                    400
  ADV 1 a w a y #ADV                    250
  ADV 2 l a t e r #ADV                  350
Figure 4

Figure 5 shows how the recursive sentence "We think he said that it goes fast" may be parsed by multiple alignment, using the grammar shown in Figure 4. The fact that any pattern in the grammar may appear one or more times in an alignment means that the second pattern in the grammar may provide a framework for the whole sentence (at the bottom of the alignment) and may also provide a framework for the embedded sentence "he said that ...". Within this second sentence is the sentence "it goes fast" which is modelled on the pattern in the first line in the grammar.
[Alignment for Figure 5; rows top to bottom, New first]
  w e t h i n k h e s a i d t h a t i t g o e s f a s t
  PN 3 i t #PN
  V 5 g o e s #V
  ADV 0 f a s t #ADV
  S 0 PN #PN V #V ADV #ADV #S
  PN 2 h e #PN
  V 2 s a i d #V
  DPN t h a t #DPN
  S 1 PN #PN V #V DPN #DPN S #S #S
  PN 0 w e #PN
  V 4 t h i n k #V
  S 1 PN #PN V #V DPN #DPN S #S #S

5. From Mother Goose Nursery Rhymes, London: Heinemann, 1994.
6. In this figure and later ones in this article, readers may appreciate that parsings represented as alignments often take more space than more conventional kinds of representation. I hope that readers will appreciate the theoretical and practical value of understanding parsing as multiple alignment without being distracted unduly by the humdrum problem of representing large alignments within the confines of normal-sized pages.
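The self-reference that makes the grammar of Figure 4 recursive can be checked mechanically. The sketch below is an illustration only, with the hypothetical helper is_directly_recursive: it treats each pattern as a list of symbols and reports those whose body contains the pattern's own boundary pair (for example ' S #S' inside a pattern that begins with ' S'). It covers only immediate self-reference, not recursion through lower-level constituents.

```python
# Patterns from Figure 4, written without their notional frequencies.
GRAMMAR = [
    "S 0 PN #PN V #V ADV #ADV #S",
    "S 1 PN #PN V #V DPN #DPN S #S #S",
    "DPN t h a t #DPN",
    "PN 0 w e #PN",
    "PN 1 y o u #PN",
    "PN 2 h e #PN",
    "PN 3 i t #PN",
    "V 0 s a y s #V",
    "V 1 s a y #V",
    "V 2 s a i d #V",
    "V 3 t h i n k s #V",
    "V 4 t h i n k #V",
    "V 5 g o e s #V",
    "V 6 g o #V",
    "ADV 0 f a s t #ADV",
    "ADV 1 a w a y #ADV",
    "ADV 2 l a t e r #ADV",
]


def is_directly_recursive(pattern):
    """True if the pattern's body contains its own boundary pair 'X #X'."""
    symbols = pattern.split()
    head = symbols[0]
    body = symbols[1:-1]  # drop the pattern's own opening and closing symbols
    return any(a == head and b == "#" + head for a, b in zip(body, body[1:]))


recursive_patterns = [p for p in GRAMMAR if is_directly_recursive(p)]
print(recursive_patterns)
```

Only the second sentence pattern is reported, which is the pattern that appears twice in the alignment of Figure 5.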
4 PARSING WITH A 'CONTEXT SENSITIVE' GRAMMAR: DISCONTINUOUS DEPENDENCIES IN SYNTAX
Context-free phrase-structure grammars (CF-PSGs) like the one shown in Figure 2 of the accompanying article are quite adequate for representing the structure of simple sub-sets of a natural language but, since Chomsky's Syntactic Structures (Chomsky, 1957), it has been known that CF-PSGs are not adequate to represent the full complexity of natural languages, except at the cost of large amounts of redundancy in the representation. CF-PSGs cannot, in a succinct manner, represent 'discontinuous dependencies' (DDs) in syntax such as number dependency (singular or plural) between the subject of a sentence and the main verb (in English, for example) and gender dependencies throughout a sentence (in French, for example). The key point is that these kinds of dependencies can bridge arbitrarily large amounts of intervening structure.
However, solutions to the problem of representing DDs in a succinct manner are provided by Transformational Grammars (TGs, Chomsky (1957)), Definite Clause Grammars (DCGs, Pereira and Warren (1980)), and others (see Gazdar and Mellish (1989)). The similarity between the grammar in Figure 2 of the accompanying article and the set of patterns in Figure 3 of that article might suggest that grammars in the form of patterns suffer the same shortcomings as CF-PSGs. The suggestion here is that, given an appropriate system for finding 'good' alignments amongst patterns, it is possible to represent DDs in syntax in a succinct manner and, arguably, that the corresponding representations can be simpler and more 'direct' than can be achieved with TGs, DCGs or other existing systems with sufficient 'power' to represent DDs efficiently.
4.1 An Example
Consider the grammar shown in Figure 6, below. In this grammar, the dependency between a SNG (singular) noun phrase at the beginning of a sentence and a SNG verb following is expressed with the pattern ' S NP SNG ; #NP QL #QL V SNG #V #S'. Likewise, plural dependencies are expressed with the pattern ' S NP PL ; #NP QL #QL V PL #V #S'.⁷,⁸ These dependencies bridge the qualifying structure (' QL #QL') and this structure can be arbitrarily large.

  S NP SNG ; #NP QL #QL V SNG #V #S                1000
  S NP PL ; #NP QL #QL V PL #V #S                   700
  NP SNG ; 0 D SNG : #D A SNG : #A N SNG #N #NP     900
  NP SNG ; 1 PN SNG #PN #NP                         500
  NP PL ; 0 D PL : #D A PL : #A N PL #N #NP         600
  NP PL ; 1 PN PL #PN #NP                           300
  QL 0 DPN #DPN S #S #QL                            200
  QL 1 PP #PP A #A #QL                              250
  QL 2 PP #PP NP #NP #QL                            200
  D : 0 s o m e #D                                  150
  D : 1 o u r #D                                    200
  D : 2 t h e #D                                    500
  D SNG : 0 o n e #D                                100
  D SNG : 1 t h i s #D                              100
  D PL : 0 t h e s e #D                             200
  D PL : 1 t h o s e #D                             250
  N SNG 0 NR #NR #N                                 500
  N SNG 1 m a n #N                                  200
  N SNG 2 J o h n #N                                100
  N SNG 3 M a r y #N                                100
  N PL 0 NR #NR s #N                                500
  N PL 1 m e n #N                                   100
  NR 0 c a r #NR                                    125
  NR 1 r o a d #NR                                  150
  NR 2 h o r s e #NR                                150
  NR 3 d o g #NR                                     75
  PN SNG 0 i t #PN                                  250
  PN SNG 1 h e #PN                                  250
  PN PL 0 w e #PN                                   100
  PN PL 1 t h e y #PN                                75
  PN PL 2 t h o s e #PN                             125
  DPN 0 t h a t #DPN                                200
  DPN 1 w h i c h #DPN                              150
  PP 0 i n #PP                                      100
  PP 1 w i t h #PP                                  220
  PP 2 o f #PP                                      130
  A : 0 r e d #A                                    150
  A : 1 b l u e #A                                  250
  A : 2 g r e e n #A                                125
  A SNG : o n e #A                                  100
  A PL : 0 s e v e r a l #A                         200
  A PL : 1 m a n y #A                               250
  V SNG 0 VR #VR s #V                              1025
  V SNG 1 g o e s #V                                175
  V PL 0 VR #VR #V                                  700
  V PL 1 g o #V                                     150
  VR 0 w i n #VR                                    450
  VR 1 r u n #VR                                    475
  VR 2 l i k e #VR                                  350
  VR 3 g a l l o p #VR                              450
  VR 4 j u m p #VR                                  250
Figure 6

Given this grammar, a sentence like ' t h o s e i n g r e e n w i n' may be aligned with patterns in the grammar as shown in Figure 7. Given this alignment, the sentence may be specified completely with the sequence of symbols ' S PL 1 2 1 2 0 #S'.
In this coded representation of the sentence, ' PL' selects the plural sentence pattern (' S NP PL ; #NP QL #QL V PL #V #S') which ensures that a PL noun-phrase is selected and that a PL verb is selected too regardless of the intervening structure, ' QL #QL', however small or large that structure may be.

[Alignment for Figure 7; rows top to bottom, New first]
  t h o s e i n g r e e n w i n
  PN PL 2 t h o s e #PN
  NP PL ; 1 PN PL #PN #NP
  PP 0 i n #PP
  A : 2 g r e e n #A
  QL 1 PP #PP A #A #QL
  VR 0 w i n #VR
  V PL 0 VR #VR #V
  S NP PL ; #NP QL #QL V PL #V #S

Notice that this alignment yields more compression than would be possible if the ' PL' markers were omitted from the pattern ' S NP PL ; #NP QL #QL V PL #V #S' and from the parsing. In this case, the sentence would be encoded with the symbols ' S PL 1 2 1 2 PL 0 #S' because the number value of the verb would have to be specified independently of the number value of the subject noun-phrase. This second encoding of the sentence contains one more symbol than the encoding which is possible when ' PL' markers for the subject noun-phrase and the main verb are included in the sentence pattern - and is correspondingly less economical.
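The comparison of the two encodings comes down to a count of symbols, which can be written out as follows. This is a sketch that only counts symbols; it does not reproduce the compression score computed by SP52, and the helper name symbol_count is an assumption made for illustration.

```python
def symbol_count(code):
    """Number of space-separated symbols in an encoding."""
    return len(code.split())


# Encoding when the sentence pattern carries the ' PL' markers.
with_markers = "S PL 1 2 1 2 0 #S"
# Encoding when the verb's number must be specified independently.
without_markers = "S PL 1 2 1 2 PL 0 #S"

saving = symbol_count(without_markers) - symbol_count(with_markers)
print(symbol_count(with_markers), symbol_count(without_markers), saving)
```

The first encoding has 8 symbols and the second has 9, so marking the dependency in the sentence pattern saves one symbol for this sentence.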
4.2 Nesting of Discontinuous Dependencies
A possible snag with the method just proposed for marking discontinuous dependencies in syntax is that it might fail to discriminate between one set of dependencies and another when two (or more) sets of dependencies are embedded, one within another. If, for example, a plural dependency were nested within a plural dependency (schematically, (PL (PL PL) PL)) the method might interpret this as a plural dependency followed by a plural dependency - ((PL PL)(PL PL)) - or some other grouping. The example in Figure 7 only shows one set of dependencies and does not throw light on this issue.
However, the alignment in Figure 8 (a) shows one plural dependency nested within another and confirms that the dependency within the main structure ('those ... win') is separated quite clearly from the dependency within the subordinate clause ('... that we like ...'). This is because the main structure is modelled on one sentence pattern in which one dependency is embedded and the subordinate clause contains another sentence pattern containing its own dependency between a plural subject and a plural verb. The alignment in Figure 8 (b) confirms, as one would expect, that a singular dependency (in the subordinate clause '... that he likes ...') can be embedded within a plural dependency ('those ... win') without risk of confusion between the two dependencies.
[Alignment for Figure 8 (a); rows top to bottom, New first]
  t h o s e t h a t w e l i k e w i n
  VR 2 l i k e #VR
  V PL 0 VR #VR #V
  NP PL ; 1 w e #NP
  S NP PL ; #NP QL #QL V PL #V #S
  DPN 0 t h a t #DPN
  QL 0 DPN #DPN S #S #QL
  NP PL ; 3 t h o s e #NP
  S NP PL ; #NP QL #QL V PL #V #S
  VR 0 w i n #VR
  V PL 0 VR #VR #V

[Alignment for Figure 8 (b); rows top to bottom, New first]
  t h o s e t h a t h e l i k e s w i n
  PN PL 2 t h o s e #PN
  NP PL ; 1 PN PL #PN #NP
  DPN 0 t h a t #DPN
  QL 0 DPN #DPN S #S #QL
  S NP PL ; #NP QL #QL V PL #V #S
  V PL 0 VR #VR #V
  V SNG 0 VR #VR s #V
  VR 2 l i k e #VR
  S NP SNG ; #NP QL #QL V SNG #V #S
  PN SNG 1 h e #PN
  NP SNG ; 1 PN SNG #PN #NP
  VR 0 w i n #VR
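The difference between the nested grouping and the mistaken left-to-right grouping can be illustrated with a small sketch. It is not a model of how SP52 forms alignments: the Clause class and both checking functions are assumptions introduced here, with each clause standing for one sentence pattern that carries its own subject and verb number markers.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Clause:
    subject: str                      # number marker of the subject: "SNG" or "PL"
    verb: str                         # number marker of the verb
    embedded: Optional["Clause"] = None


def nested_agreement(clause):
    """Each clause checks its own subject-verb pair, then its embedded clause."""
    if clause.subject != clause.verb:
        return False
    return clause.embedded is None or nested_agreement(clause.embedded)


def markers_in_surface_order(clause):
    """Markers as they occur left to right: subject, embedded clause, verb."""
    inner = markers_in_surface_order(clause.embedded) if clause.embedded else []
    return [clause.subject] + inner + [clause.verb]


def adjacent_pair_agreement(markers):
    """The mistaken grouping ((m0 m1)(m2 m3)...): pair markers left to right."""
    return all(markers[i] == markers[i + 1] for i in range(0, len(markers), 2))


# 'those that he likes win': a singular dependency inside a plural one.
figure_8b = Clause("PL", "PL", embedded=Clause("SNG", "SNG"))
markers = markers_in_surface_order(figure_8b)
print(markers, nested_agreement(figure_8b), adjacent_pair_agreement(markers))
```

With one sentence pattern per clause, the nested check accepts the sentence of Figure 8 (b), whereas pairing the markers from left to right would wrongly reject it. For Figure 8 (a), where all four markers are plural, both groupings succeed, which is why that case on its own cannot show which grouping is in use.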
4.3 Variability of Constituents
Regarding the parsing in Figure 7, readers may object that noun phrases are much more variable than the parsing might suggest: noun phrases in English range from those, like the one in Figure 7, which contain a single word through those in which a singular or plural marking appears on a determiner and a noun (e.g., "those cars") through those containing a determiner, adjective and noun where all three words are marked for number (e.g., "those many cars") to those where the determiner is not marked for number (e.g., "the", "some") or the adjective is not marked for number (e.g., "red", "large" and most other adjectives) or some other combination (with a noun) of marked or unmarked determiner or adjective, either of which may be omitted. In addition, of course, there are more complicated noun-phrases containing intensifiers (e.g., 'very') which may occur recursively as can adjectives between the determiner and noun. The grammar in Figure 6 accommodates some of this variability as can be seen in the three example parsings in Figure 9.
4.3.1 Figure 9 (a) shows a noun phrase ('this one man') containing a determiner, adjective and noun, all of which are marked as singular. This pattern of words 'selects' the pattern ' NP SNG ; 0 D SNG : #D A SNG : #A N SNG #N #NP' in the grammar in Figure 6 and thus 'selects' the ' SNG' marker for the whole noun phrase (immediately after the first ' NP' symbol). This singular marker for the whole noun phrase aligns with a matching symbol in the pattern for a singular sentence (' S NP SNG ; #NP QL #QL V SNG #V #S'). This means that a singular verb ('runs') provides the best match at the end of the sentence.
In this same example parsing, the symbol ' ;' in the sentence pattern and a matching symbol in the noun-phrase pattern are needed to ensure that the ' SNG' symbol in the sentence pattern aligns with the singular marker for the whole noun-phrase, not one of the singular markers for the determiner, adjective or noun. The same effect could have been achieved without the use of the ' ;' symbol by using distinctive versions of the ' SNG' symbol such as ' NPSNG', ' DSNG', ' ASNG' and ' NSNG'. Which of the two styles is to be preferred is a matter for further study.

[Alignment for Figure 9 (a); rows top to bottom, New first]
  t h i s o n e m a n r u n s
  D SNG : 1 t h i s #D
  A SNG : o n e #A
  N SNG 1 m a n #N
  NP SNG ; 0 D SNG : #D A SNG : #A N SNG #N #NP
  V SNG 0 VR #VR s #V
  VR 1 r u n #VR
  S NP SNG ; #NP QL #QL V SNG #V #S

[Alignment for Figure 9 (b); rows top to bottom, New first]
  t h e s e d o g s j u m p
  NR 3 d o g #NR
  N PL 0 NR #NR s #N
  D PL : 0 t h e s e #D
  NP PL ; 0 D PL : #D A PL : #A N PL #N #NP
  S NP PL ; #NP QL #QL V PL #V #S
  VR 4 j u m p #VR
  V PL 0 VR #VR #V

[Alignment for Figure 9 (c); rows top to bottom, New first]
  t h e r e d c a r s g o
  NR 0 c a r #NR
  N PL 0 NR #NR s #N
  D : 2 t h e #D
  A : 0 r e d #A
  NP PL ; 0 D PL : #D A PL : #A N PL #N #NP
  V PL 1 g o #V
  S NP PL ; #NP QL #QL V PL #V #S

4.3.2 Figure 9 (b) shows an alignment containing a noun phrase ('these dogs') where there is no adjective between the determiner and noun.
With the grammar in Figure 6, no special provision is made for the omission of any constituent within a larger structure. If a constituent is missing, the symbols which represent its 'slot' in the larger structure (the symbols ' A PL : #A' in this case) appear in the alignment but nothing is aligned with those symbols. This is not entirely satisfactory as a way of showing that a given constituent is optional within a larger structure because it would allow the non-optional noun which constitutes the 'head' of the noun-phrase to be omitted in just the same way as the determiner or the adjective.
The rules governing when a constituent of a noun phrase is optional and when it is not are surprisingly complicated. For example, it is acceptable to form a plural noun phrase with a (plural) noun and without a determiner or adjective (e.g., "Dogs jump", "We like dogs") but with a singular noun phrase, there must be a determiner ("The dog jumps" is acceptable but "Dog jumps" is not).
One way to show where constituents are optional in a structure like a noun-phrase and where they are not is to provide a family of patterns covering the range of possible noun-phrases. The fact that all members of the family would contain a slot for a noun or pronoun but not all of them would contain slots for determiners or adjectives would accommodate the fact that the head noun is compulsory but the other constituents may not be. The fact that the determiner is compulsory for singular noun phrases but not for plural ones may be accommodated in the family of noun-phrase patterns by the inclusion of a plural pattern or patterns without the determiner but the omission from the family of noun-phrase patterns of any corresponding singular noun-phrase patterns.
4.3.3 Figure 9 (c) shows an alignment containing a noun phrase ("the red cars") where the determiner is not marked as singular or plural and neither is the adjective.
This is where the ' :' symbol plays its part by allowing ' D : 2 t h e #D' to be aligned with the symbols ' D PL : #D' within the pattern ' NP PL ; 0 D PL : #D A PL : #A N PL #N #NP'. If the ' :' symbol were omitted from either or both of the pattern for 'the' or the pattern for the plural noun phrase, there would be ambiguity about the relative positions, left to right, of the symbols ' 2 t h e' in the pattern ' D : 2 t h e #D' and the second instance of the symbol ' PL' in the noun phrase pattern.
4.4 An Alternative Technique for Marking Dependencies in Syntax
This sub-section describes a second way of marking dependencies in syntax (discontinuous or otherwise), illustrated by the grammar in Figure 10 and the alignment in Figure 11. Four features of the example are discussed in the sub-sections that follow.

  S NP #NP V #V #S                 1200
  NP D #D A #A N #N #NP            1200
  D 0 t h e #D                      175
  D 1 s o m e #D                    125
  D 2 o u r #D                      100
  D DSNG 0 o n e #D                 300
  D DSNG 1 t h i s #D               200
  D DPL 0 t h e s e #D              200
  D DPL 1 t h o s e #D              100
  N NSNG 0 NR #NR #N                400
  N NSNG 1 m a n #N                 150
  N NSNG 2 J o h n #N                50
  N NSNG 3 M a r y #N               100
  N NPL 0 NR #NR s #N               400
  N NPL 1 m e n #N                  100
  NR 0 c a r #NR                    200
  NR 1 r o a d #NR                  200
  A 0 r e d #A                      125
  A 1 b l u e #A                    200
  A 2 g r e e n #A                   75
  A ASNG o n e #A                   100
  A APL t w o #A                    120
  A APL 0 s e v e r a l #A           80
  A APL 1 m a n y #A                150
  V VSNG g o e s #V                 700
  V VPL g o #V                      500
  DSNG NSNG                         700
  ASNG NSNG                         100
  DPL NPL                           500
  APL NPL                           350
  NSNG VSNG                         700
  NPL VPL                           500
Figure 10

[Alignment for Figure 11; rows top to bottom, New first]
  t h o s e t w o c a r s g o
  D DPL 1 t h o s e #D
  A APL t w o #A
  NP D #D A #A N #N #NP
  NR 0 c a r #NR
  N NPL 0 NR #NR s #N
  V VPL g o #V
  S NP #NP V #V #S
  NPL VPL
  DPL NPL
  APL NPL

4.4.1 Patterns of Dependency can be Separated from Basic Syntactic Patterns. In the grammar and the figure, readers will see that the sentence pattern (' S NP #NP V #V #S') and the pattern for noun phrases (' NP D #D A #A N #N #NP') do not contain any markers for number (singular or plural) but that there are six small patterns at the bottom of the grammar which do express these dependencies.
Some of these patterns appear in the figure, linking words and their number markings in an appropriate manner: the plural determiner is linked to the plural noun, the plural adjective is also linked to the plural noun, and the plural noun is linked to the plural verb. This manner of marking dependencies in the noun phrase and in the sentence has a pleasing simplicity and clarity, but it may not be applicable in all situations. For example, it looks as if this manner of marking dependencies might fail in cases like the one discussed in Section 4.2 where one set of dependencies is nested inside another.
Preliminary experiments in this area suggest that this kind of technique may be used where there are nested dependencies, provided the nesting is marked in the patterns which record the dependencies. For example, the pattern ' NPL VPL' in Figure 10 may be modified to become ' NPL S #S VPL' (and likewise for ' NSNG VSNG'). This modification of the pattern for plural dependencies means that, if one sentence is nested within another, and if the symbols at each end of the inner sentence are aligned with ' S #S' in ' NPL S #S VPL', then mis-alignments of ' NPL' and ' VPL' with corresponding symbols in the outer sentence cannot easily occur.
4.4.2 The Number Marking of the 'Head' of a Structure may be Used Instead of a Number Marking for the Whole Structure. Readers will have noticed that, by contrast with the alignments in Figure 9, the alignment in Figure 11 does not have a number marking for the whole of the noun phrase. Instead, the alignment takes advantage of the fact that every noun phrase has a 'head' noun (or pronoun) and that the number marking of the head is the same as the number marking of the whole structure. Thus, the number marking for the whole structure may be omitted and the number marking of the head word (plural for ' c a r s' in Figure 11) may be used instead.
4.4.3 Any Pattern of Two or More Symbols May be Constructed from Binary Patterns.
In Figure 6 and Figure 9 (a), the singular noun phrase pattern (' NP SNG ; 0 D SNG : #D A SNG : #A N SNG #N #NP') contains a three-way number dependency which is, in effect, ' DSNG ASNG NSNG'. Likewise for the number dependency in the plural noun phrase in Figure 6 and in Figure 9 (b) and (c). By contrast with these three-way dependencies, the grammar in Figure 10 and the alignment in Figure 11 achieve the effect of a three-way dependency using patterns which each contain only two symbols. Because of the 'dominant' status in the noun phrase of the 'head' noun, it seemed appropriate in the grammar in Figure 10 and the alignment in Figure 11 to link the determiner to the head noun and the adjective to the head noun rather than link the determiner to the adjective and the adjective to the noun. There is another benefit of this arrangement, discussed next.
4.4.4 Binary Dependencies can Accommodate Options in a Flexible Manner. Choosing the first of the two possibilities just described has the advantage that it can show where constituents are optional and where they are not as discussed in Section 4.3.2: if either the determiner or the adjective is missing then the corresponding dependency with the head noun would be missing too. The second of the two options mentioned in the previous paragraph would fail if the adjective were missing - because the middle link in the chain between the determiner and the noun would be broken and so the dependency between the determiner and the noun could not be shown.
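The contrast between the two ways of building a three-way dependency from binary patterns can be sketched as follows. This is an illustration under stated assumptions, not part of the SP52 model: constituents are reduced to the labels 'D', 'A' and 'N', a binary pattern is usable only when both of its ends are present in the noun phrase, and the names HEAD_LINKS, CHAIN_LINKS and connected_to_noun are hypothetical.

```python
# Links to the head noun, in the style of Figure 10
# (for example ' DPL NPL' and ' APL NPL').
HEAD_LINKS = [("D", "N"), ("A", "N")]
# The alternative: determiner to adjective, adjective to noun.
CHAIN_LINKS = [("D", "A"), ("A", "N")]


def connected_to_noun(present, links):
    """Constituents whose number marking can be tied to the head noun 'N',
    using only those binary links whose two ends are both present."""
    usable = [(a, b) for a, b in links if a in present and b in present]
    reached = {"N"}
    changed = True
    while changed:
        changed = False
        for a, b in usable:
            # Extend the linked set when exactly one end is already reached.
            if (a in reached) != (b in reached):
                reached |= {a, b}
                changed = True
    return reached & set(present)


full_phrase = {"D", "A", "N"}      # e.g. "those two cars"
no_adjective = {"D", "N"}          # e.g. "those cars"

print(connected_to_noun(no_adjective, HEAD_LINKS))
print(connected_to_noun(no_adjective, CHAIN_LINKS))
```

When the adjective is missing, the head-linked arrangement still ties the determiner to the noun, while the chained arrangement leaves the determiner unconnected because its only link runs through the absent adjective.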
4.5 Discontinuous Dependencies which Overlap Each Other

In the French sentence Les plumes sont vertes ("The feathers are green") there are two sets of overlapping syntactic dependencies, one for number and one for gender. When the 'subject' of this sentence (Les plumes) is plural, the determiner (Les) must have the plural form, the noun (plume) must have a plural suffix (s), the verb (sont) must be plural, and the adjective (vert) must have a plural suffix (s). Likewise, the choice of a feminine noun (plume) means that the adjective (vert) must have a feminine suffix (e).

Figure 12 shows a fragment of French grammar expressed in the same manner as in Figure 10. Much of the discussion in Section 4.4 applies to the grammar and parsings shown in this section.

    S NP #NP VP #VP #S 500
    NP D #D N #N #NP 700
    VP 0 V #V A #A #VP 300
    VP 1 V #V P #P NP #NP #VP 200
    P 0 s u r #P 50
    P 1 s o u s #P 150
    V VSNG e s t #V 250
    V VPL s o n t #V 250
    D DSNG DM 0 l e #D 90
    D DSNG DM 1 u n #D 120
    D DSNG DF 0 l a #D 130
    D DSNG DF 1 u n e #D 110
    D DPL 0 l e s #D 125
    D DPL 1 d e s #D 125
    N NSNG NR #NR #N 450
    N NPL NR #NR s #N 250
    NR NM p a p i e r #NR 300
    NR NF p l u m e #NR 400
    A ASNG AM AR #AR #A 300
    A ASNG AF AR #AR e #A 300
    A APL AM AR #AR s #A 300
    A APL AF AR #AR e s #A 300
    AR 0 n o i r #AR 100
    AR 1 v e r t #AR 200
    DSNG NSNG 450
    DPL NPL 250
    DM NM 210
    DF NF 240
    VSNG ASNG 600
    VPL APL 600
    NSNG VSNG 550
    NPL VPL 250
    NM V #V AM 300
    NF V #V AF 400

Figure 12

The alignment in Figure 13 (a) shows how the French sentence, above, is parsed in terms of the grammar: the main constituents of the sentence are marked in an appropriate manner and dependencies for number and gender are marked by patterns appearing towards the bottom of the alignment.
Rows of the alignment, from top to bottom (the vertical lines linking matching symbols between rows are not reproduced):

    l e s p l u m e s s o n t v e r t e s
    N NPL NR #NR s #N
    NR NF p l u m e #NR
    D DPL 0 l e s #D
    NP D #D N #N #NP
    V VPL s o n t #V
    VP 0 V #V A #A #VP
    A APL AF AR #AR e s #A
    AR 1 v e r t #AR
    S NP #NP VP #VP #S
    NPL VPL
    DPL NPL
    VPL APL
    NF V #V AF

(a)

    l a p l u m e e s t s u r l e s p a p i e r s
    NR NF p l u m e #NR
    N NSNG NR #NR #N
    D DSNG DF 0 l a #D
    NP D #D N #N #NP
    V VSNG e s t #V
    P 0 s u r #P
    VP 1 V #V P #P NP #NP #VP
    D DPL 0 l e s #D
    NP D #D N #N #NP
    S NP #NP VP #VP #S
    N NPL NR #NR s #N
    NR NM p a p i e r #NR
    NSNG VSNG
    DF NF
    DPL NPL
    DSNG NSNG

(b)
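As a rough illustration of how the dependency patterns at the bottom of the alignment in Figure 13 (a) relate to the grammar in Figure 12, the following Python sketch checks which of the grammar's dependency patterns can be matched, in order, against the symbols of the parse. Several things here are assumptions rather than part of the article: the flattened symbol sequence `PARSE_A` is one reading of the alignment in Figure 13 (a), the ordered-subsequence test is a crude stand-in for the multiple alignment search performed by SP52, and the names `is_subsequence` and `matched_dependencies` are hypothetical.

```python
# Illustrative sketch only: an ordered-subsequence test stands in for the
# multiple alignment search of the SP52 model (an assumption for this example).

# Symbols of the parse of 'les plumes sont vertes', flattened left to right
# from the constituent patterns in Figure 13 (a) (a reading, not model output).
PARSE_A = (
    "S NP D DPL 0 l e s #D N NPL NR NF p l u m e #NR s #N #NP "
    "VP 0 V VPL s o n t #V A APL AF AR 1 v e r t #AR e s #A #VP #S"
).split()

# Dependency patterns from the grammar in Figure 12 (trailing numbers omitted).
DEPENDENCIES = [
    "DSNG NSNG", "DPL NPL", "DM NM", "DF NF", "VSNG ASNG",
    "VPL APL", "NSNG VSNG", "NPL VPL", "NM V #V AM", "NF V #V AF",
]


def is_subsequence(pattern, symbols):
    """True if every symbol of `pattern` occurs in `symbols` in the same order."""
    remaining = iter(symbols)
    # `sym in remaining` consumes the iterator up to and including the match,
    # so later pattern symbols must be found further to the right.
    return all(sym in remaining for sym in pattern)


def matched_dependencies(symbols):
    """Return the dependency patterns that can be matched against `symbols`."""
    return [d for d in DEPENDENCIES if is_subsequence(d.split(), symbols)]


print(matched_dependencies(PARSE_A))
# ['DPL NPL', 'VPL APL', 'NPL VPL', 'NF V #V AF']
```

Under these assumptions, the four patterns that match are the same four that appear at the bottom of the alignment in Figure 13 (a): three binary patterns for number and one pattern for gender.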
Using only binary dependencies, the plural determiner is linked to the plural noun, the plural noun is linked to the plural verb and this is linked to the plural adjective. Quite independently of this pattern of inter-linked binary dependencies for number, the gender dependency between the feminine noun and the feminine adjective is marked with the pattern ' NF V #V AF'.

Why are the symbols ' V #V' included in the pattern ' NF V #V AF'? In this example, the inclusion of these two symbols is not strictly necessary. But if the grammar were augmented slightly to accommodate the fact that, in French, an adjective within a noun phrase follows the noun (e.g., Les plumes vertes sont sur la table ("The green feathers are on the table")), then the symbols ' V #V' within the pattern ' NF V #V AF' would be necessary to show that this particular dependency requires a verb to intervene between the noun and the adjective (cf. the discussion in Section 4.4.1 of how discontinuous dependencies for number may be expressed when one sentence is embedded within another).

This example shows how overlapping patterns of dependency can be accommodated within the ICMAUS framework. Of course, these kinds of dependencies can be expressed quite well using other methods. However, work to date suggests that the ICMAUS framework may allow these kinds of dependency to be expressed with a pleasing simplicity and clarity compared with other methods.

7. Since, for reasons given earlier, the grammar in Figure 6 is quite small, the simplifying assumption has been made, contrary to fact, that the form of singular verbs does not depend on the relevant 'person' ('I', 'thou', 'he', 'she', 'it') and likewise for plural verbs. Similar simplifying assumptions have been made in subsequent grammars in this article.

8. Readers may be puzzled by the inclusion in the grammar of 'punctuation' symbols like ' ;' and ' :'. The reasons for including these symbols in the grammar are explained in Section 4.3.
5 DEPENDENCIES IN THE SYNTAX OF ENGLISH AUXILIARY VERBS

This section presents a grammar and examples showing how the syntax of English auxiliary verbs may be described in the ICMAUS framework. Before the grammar and examples are presented, the syntax of this part of English is described with words and diagrams and alternative formalisms for describing the syntax are briefly discussed.

In English, the syntax for main verbs and the 'auxiliary' verbs which may accompany them follows two quasi-independent patterns of constraint which interact in an interesting way. The primary framework may be expressed with this sequence of symbols,

    M H B B V,

which should be interpreted in the following way: