CognitionResearch.org

PARSING AS INFORMATION COMPRESSION BY MULTIPLE ALIGNMENT, UNIFICATION AND SEARCH: EXAMPLES

J Gerard Wolff

February 1998

School of Electronic Engineering and Computer Systems, University of Wales, Dean Street, Bangor, LL57 1UT, UK. Telephone: +44 1248 382691. E-mail: gerry@sees.bangor.ac.uk. Fax: +44 1248 361429. Web: http://www.sees.bangor.ac.uk/~gerry/.


TABLE OF CONTENTS

Abstract

1 INTRODUCTION
2 AMBIGUITY IN PARSING
3 RECURSIVE STRUCTURES
4 PARSING WITH A 'CONTEXT SENSITIVE' GRAMMAR: DISCONTINUOUS DEPENDENCIES IN SYNTAX
5 DEPENDENCIES IN THE SYNTAX OF ENGLISH AUXILIARY VERBS
6 CROSS SERIAL DEPENDENCIES
7 CONCLUSION
Acknowledgements
References

Abstract

This article presents and discusses examples illustrating aspects of the proposition, described in the accompanying article (Wolff, 1998), that parsing may be understood as information compression by multiple alignment, unification and search (ICMAUS).

The later examples show that the multiple alignment framework as described in the accompanying article has expressive power which is comparable with other 'context sensitive' systems used to represent the syntax of natural languages.

In all the examples, the SP52 model, described in the accompanying article, is capable of finding an alignment which is intuitively 'correct' and assigning to it a 'compression score' which is higher than for any other alignment.

The congruence which has been found between this range of alignments, produced by a system which is dedicated to information compression, and what is judged to be 'correct' in terms of linguistic intuition lends support to the hypothesis that linguistic intuition is itself a product of psychological processes of information compression.

One example shows how, in cases of ambiguity, the model is capable of finding two or more 'good' parsings for a given input, corresponding to the alternative readings of the input, and with compression scores which are higher than the scores of any other alignments which have been formed. The model can also accommodate disambiguating context in an appropriate manner.

A second example shows how the phenomenon of recursion in natural languages can be accommodated in the ICMAUS framework.

Other examples show how 'discontinuous dependencies' in syntax may be expressed in a manner which is, arguably, simpler and more direct than in other systems. Discontinuous dependencies which are nested one within another can be accommodated, as can discontinuous dependencies which overlap each other.

Examples are presented showing how the interesting relationship between primary structure and secondary constraints in the syntax of English auxiliary verbs may be expressed in the ICMAUS framework.

'Cross-serial dependencies' is a form which appears in Swiss German and Dutch. Although this form cannot easily be expressed as a context-free phrase-structure grammar (without augmentation) it maps into the multiple alignment framework in a straightforward manner. An example is presented showing how this form may be parsed successfully by the SP52 model.

The full range of examples suggests that there is sufficient promise in these ideas to justify further exploration and development.

1 INTRODUCTION

The accompanying article (Wolff, 1998) describes how parsing may be understood as information compression by multiple alignment, unification and search (ICMAUS) and describes a software model (SP52) which embodies these ideas, with some simple examples to show what it can do.

This article presents a selection of other examples which are more realistic and which illustrate aspects of parsing such as ambiguity in the sentence (or other material) being parsed (including the effect of disambiguating context), recursion in syntax, discontinuous dependencies in syntax (including nested dependencies and overlapping dependencies), the combination of primary structure and secondary constraints in the syntax of English auxiliary verbs, and 'cross-serial dependencies' which occur in languages like Swiss German and Dutch.

These later examples show that the multiple alignment framework as described in the accompanying article has expressive power which is comparable with other 'context sensitive' systems used to represent the syntax of natural languages.

In all these areas, the SP52 model is capable of delivering alignments which correspond with our intuitions about what a 'correct' parsing should be and which are identified by the model as the 'best' out of the alternative alignments because, in each case, the 'best' alignment has a higher compression score (CS) than any of the alternative alignments for the same input sequence and grammar. All the alignments shown in this article are actual output of the SP52 model.

The congruence which has been found between this range of alignments, produced by a system which is dedicated to information compression, and what is judged to be 'correct' in terms of linguistic intuition lends support to the hypothesis that linguistic intuition is itself a product of psychological processes of information compression.

In this article, readers will see that, within the ICMAUS framework, there are often two or more alternative techniques for representing any given aspect of linguistic structure. This article only attempts to demonstrate some of the possibilities within the ICMAUS framework. Evaluation of the relative merits of alternatives is a matter for future research.

2 AMBIGUITY IN PARSING

It should be evident from the description of the SP52 model in the accompanying article that the model is well-adapted to finding alternative parsings of sentences or other input, including cases of 'ambiguity' where two or more of the alternative parsings are equally good or nearly so.

On each application of the compress() function (shown in Figure 5 of the accompanying article), the model creates several alternative new alignments for storage in Old. A natural consequence of this style of processing is that the model normally delivers several alternative parsings of the input, each with its own CS.

To confirm that the model can indeed recognise cases of ambiguity, it has been tested with the ambiguous input sequence corresponding to the phoneme sequence1 ' ae i s k r ee m' (which can be read as "ice cream" or "I scream"), together with an appropriate grammar, shown in Figure 1, below.2,3

S 0 NP #NP V #V ADV #ADV #S       100
S 1 NP #NP VB #VB A #A #S         200
NP 0 w ee #NP                     100
NP 1 ae i #NP                      50
NP 2 A #A N #N #NP                150
A 0 ae i s #A                     100
A 1 h o t #A                       80
A 2 k o l d #A                     70
N 0 k r ee m #N                    30
N 1 m i l k #N                     20
V 0 s k r ee m #V                 150
V 1 sh ae w t #V                   50
VB 0 i z #VB                      200
ADV 0 l ae w d l i #ADV            40
ADV 1 k w ae i e t l i #ADV        60

Figure 1

A simple grammar for phoneme patterns which allows two main parsings for the phoneme sequence ' ae i s k r ee m'.

As expected, the program discovers the two 'correct' parsings of this pattern and assigns CSs to them which are close in value to each other and higher than any others. These two parsings are shown in Figure 2 (a) and (b).4

         ae i s        k r ee m        
         |  | |        | | |  |        
     A 0 ae i s #A     | | |  |        
     |          |      | | |  |        
     |          |  N 0 k r ee m #N     
     |          |  |            |      
NP 2 A          #A N            #N #NP

(a)

         ae i         s k r ee m                
         |  |         | | | |  |                
    NP 1 ae i #NP     | | | |  |                
    |          |      | | | |  |                
    |          |  V 0 s k r ee m #V             
    |          |  |              |              
S 0 NP        #NP V              #V ADV #ADV #S 

(b)

Figure 2

An alignment showing the two 'best' parsings of the phoneme sequence ' ae i s k r ee m' using the grammar shown in Figure 1.
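
The source of the ambiguity can be illustrated without any alignment machinery. The following is a minimal Python sketch, not part of SP52: the names `LEXICON` and `segmentations` are hypothetical, the word-level patterns are copied by hand from Figure 1, and the function does plain exhaustive segmentation with no compression score.

```python
# Word-level patterns copied from Figure 1 (pattern label -> phoneme symbols).
# LEXICON and segmentations() are illustrative names, not SP52 identifiers.
LEXICON = {
    "NP 0": "w ee",
    "NP 1": "ae i",
    "A 0": "ae i s",
    "A 1": "h o t",
    "A 2": "k o l d",
    "N 0": "k r ee m",
    "N 1": "m i l k",
    "V 0": "s k r ee m",
    "V 1": "sh ae w t",
    "VB 0": "i z",
    "ADV 0": "l ae w d l i",
    "ADV 1": "k w ae i e t l i",
}


def segmentations(phonemes, lexicon=LEXICON):
    """Return every way of splitting `phonemes` into word-level patterns."""
    phonemes = tuple(phonemes)
    if not phonemes:
        return [[]]  # one way to cover nothing: use no patterns
    found = []
    for label, spelling in lexicon.items():
        body = tuple(spelling.split())
        if phonemes[:len(body)] == body:
            # This pattern covers a prefix; segment the remainder recursively.
            for rest in segmentations(phonemes[len(body):], lexicon):
                found.append([label] + rest)
    return found


readings = segmentations("ae i s k r ee m".split())
for reading in readings:
    print(reading)
```

The sketch finds exactly two segmentations, 'NP 1' followed by 'V 0' and 'A 0' followed by 'N 0', which are the word-level patterns appearing in Figure 2 (b) and Figure 2 (a) respectively. Choosing between them, or ranking them, is what the compression score does in the model itself.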

As expected, the provision of disambiguating context - as in ' ae i s k r ee m l ae w d l i' ("I scream loudly") or ' ae i s k r ee m i z k o l d' ("Ice cream is cold") - has the effect of swinging the CS decisively in favour of one interpretation or the other. In each of these two cases, the program finds the parsing which is correct in terms of our intuitions and assigns it a CS which is substantially higher than for any other parsing. These two 'best' parsings are shown in Figure 3.

         ae i         s k r ee m          l ae w d l i         
         |  |         | | | |  |          | |  | | | |         
    NP 1 ae i #NP     | | | |  |          | |  | | | |         
    |          |      | | | |  |          | |  | | | |         
    |          |  V 0 s k r ee m #V       | |  | | | |         
    |          |  |              |        | |  | | | |         
    |          |  |              |  ADV 0 l ae w d l i #ADV    
    |          |  |              |   |                  |      
S 0 NP        #NP V              #V ADV                #ADV #S 

(a)

             ae i s        k r ee m             i z         k o l d       
             |  | |        | | |  |             | |         | | | |       
             |  | |        | | |  |        VB 0 i z #VB     | | | |       
             |  | |        | | |  |        |         |      | | | |       
             |  | |        | | |  |        |         |  A 2 k o l d #A    
             |  | |        | | |  |        |         |  |           |     
S 1 NP       |  | |        | | |  |    #NP VB       #VB A           #A #S 
    |        |  | |        | | |  |     |                                 
    |    A 0 ae i s #A     | | |  |     |                                 
    |    |          |      | | |  |     |                                 
    |    |          |  N 0 k r ee m #N  |                                 
    |    |          |  |            |   |                                 
    NP 2 A          #A N            #N #NP                                

(b)

Figure 3

(a) An alignment showing the 'best' parsing of the phoneme sequence ' ae i s k r ee m' when it is included within the larger sequence ' ae i s k r ee m l ae w d l i' - using the grammar shown in Figure 1. (b) An alignment showing the 'best' parsing of the phoneme sequence ' ae i s k r ee m' when it is included within the larger sequence ' ae i s k r ee m i z k o l d' - using the same grammar as for (a).
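
The effect of context can be cross-checked with a conventional technique. The sketch below is an ordinary backtracking recogniser in Python that reads the patterns of Figure 1 as if they were rewrite rules; it is an assumption-laden illustration, not the ICMAUS method, and the names `load`, `spans`, `match` and `full_parses` are hypothetical. The frequencies are read but not used, and no compression score is computed.

```python
GRAMMAR = """\
S 0 NP #NP V #V ADV #ADV #S 100
S 1 NP #NP VB #VB A #A #S 200
NP 0 w ee #NP 100
NP 1 ae i #NP 50
NP 2 A #A N #N #NP 150
A 0 ae i s #A 100
A 1 h o t #A 80
A 2 k o l d #A 70
N 0 k r ee m #N 30
N 1 m i l k #N 20
V 0 s k r ee m #V 150
V 1 sh ae w t #V 50
VB 0 i z #VB 200
ADV 0 l ae w d l i #ADV 40
ADV 1 k w ae i e t l i #ADV 60
"""


def load(text):
    """Map each category to its patterns as (label, body) pairs."""
    rules = {}
    for line in text.splitlines():
        *symbols, _frequency = line.split()
        category, index, *body, _closer = symbols
        rules.setdefault(category, []).append((f"{category} {index}", body))
    return rules


RULES = load(GRAMMAR)


def spans(category, phonemes, start):
    """Yield (end, tree) for each way `category` covers phonemes[start:end]."""
    for label, body in RULES[category]:
        yield from match(body, phonemes, start, (label,))


def match(body, phonemes, pos, tree):
    if not body:
        yield pos, tree
    elif body[0] in RULES:
        # Category reference 'X #X': expand X, then skip the closer '#X'.
        for end, subtree in spans(body[0], phonemes, pos):
            yield from match(body[2:], phonemes, end, tree + (subtree,))
    elif pos < len(phonemes) and phonemes[pos] == body[0]:
        # Terminal phoneme symbol: it must equal the next input symbol.
        yield from match(body[1:], phonemes, pos + 1, tree)


def full_parses(category, sentence):
    """Return the trees in which `category` covers the whole sentence."""
    phonemes = sentence.split()
    return [t for end, t in spans(category, phonemes, 0) if end == len(phonemes)]


print(full_parses("S", "ae i s k r ee m l ae w d l i"))
print(full_parses("S", "ae i s k r ee m i z k o l d"))
```

Each of the two longer sequences has a single complete parse under this reading of the grammar: the first uses 'S 0' with 'NP 1', 'V 0' and 'ADV 0', and the second uses 'S 1' with 'NP 2' (containing 'A 0' and 'N 0'), 'VB 0' and 'A 2'. These are the pattern selections shown in Figure 3 (a) and (b). Unlike SP52, the recogniser rejects anything that is not fully matched, so it cannot reproduce the partial 'S 0' alignment of Figure 2 (b).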

- -

1. The symbols used are an alphabetic adaptation of the normal phoneme symbols. This adaptation was adopted to facilitate processing by the SP52 model and has been retained here so that the actual alignments produced by the program could be used in Figures 2 to 4.

2. As with the grammar shown in Figure 4 of the accompanying article, all the grammars shown in this article (including Figure 1) show a number to the right of each pattern which is a notional frequency of occurrence of that pattern in an imaginary sample of text.

3. As was noted in the accompanying article, all the examples in these two articles are quite small - for the sake of clarity and to save space - and this means that many features of English cannot be accommodated in the grammars. However, for reasons given in Section 5.4 of the accompanying article, it appears that the ICMAUS approach to parsing may be applied with realistically large grammars and longer sentences without creating demands for processing time or storage space which are beyond the bounds of practicality.

4. As was noted in Section 3.2 of the accompanying article, the row in which any pattern appears in any of the alignments shown in these two articles is arbitrary except for the convention that New (the sentence or other pattern being parsed) is always shown at the top.

3 RECURSIVE STRUCTURES

Recursion is a prominent feature of natural languages, illustrated classically by the traditional nursery rhyme The House that Jack Built whose last verse begins: This is the farmer sowing his corn, That kept the cock that crowed in the morn, That waked the priest all shaven and shorn, That married the man all tattered and torn, ... and so on.5

In all the diverse manifestations of recursion (so brilliantly described by Douglas Hofstadter (1979)), the key feature is that there is at least one structure which contains a reference to itself, either immediately or at some lower 'level' within its (hierarchically organised) constituents.

Figure 4 shows an SP grammar for a fragment of English where the second pattern ' S 1 PN #PN V #V DPN #DPN S #S #S' contains a reference to itself near the end of the pattern via the left and right boundary symbols ' S #S'.

S 0 PN #PN V #V ADV #ADV #S          1000
S 1 PN #PN V #V DPN #DPN S #S #S      700
DPN t h a t #DPN                      300
PN 0 w e #PN                          400
PN 1 y o u #PN                        700
PN 2 h e #PN                          350
PN 3 i t #PN                          250
V 0 s a y s #V                        200
V 1 s a y #V                          210
V 2 s a i d #V                        300
V 3 t h i n k s #V                    250
V 4 t h i n k #V                      200
V 5 g o e s #V                        300
V 6 g o #V                            240
ADV 0 f a s t #ADV                    400
ADV 1 a w a y #ADV                    250
ADV 2 l a t e r #ADV                  350

Figure 4

A fragment of English grammar with recursion.

Figure 5 shows how the recursive sentence "We think he said that it goes fast" may be parsed by multiple alignment, using the grammar shown in Figure 4. The fact that any pattern in the grammar may appear one or more times in an alignment means that the second pattern in the grammar may provide a framework for the whole sentence (at the bottom of the alignment) and may also provide a framework for the embedded sentence "he said that ...". Within this second sentence is the sentence "it goes fast" which is modelled on the pattern in the first line in the grammar.

         w e         t h i n k                      h e         s a i d        t h a t               i t         g o e s          f a s t               
         | |         | | | | |                      | |         | | | |        | | | |               | |         | | | |          | | | |               
         | |         | | | | |                      | |         | | | |        | | | |          PN 3 i t #PN     | | | |          | | | |               
         | |         | | | | |                      | |         | | | |        | | | |          |         |      | | | |          | | | |               
         | |         | | | | |                      | |         | | | |        | | | |          |         |  V 5 g o e s #V       | | | |               
         | |         | | | | |                      | |         | | | |        | | | |          |         |  |           |        | | | |               
         | |         | | | | |                      | |         | | | |        | | | |          |         |  |           |  ADV 0 f a s t #ADV          
         | |         | | | | |                      | |         | | | |        | | | |          |         |  |           |   |             |            
         | |         | | | | |                      | |         | | | |        | | | |      S 0 PN       #PN V           #V ADV           #ADV #S       
         | |         | | | | |                      | |         | | | |        | | | |      |                                                  |        
         | |         | | | | |                 PN 2 h e #PN     | | | |        | | | |      |                                                  |        
         | |         | | | | |                 |         |      | | | |        | | | |      |                                                  |        
         | |         | | | | |                 |         |  V 2 s a i d #V     | | | |      |                                                  |        
         | |         | | | | |                 |         |  |           |      | | | |      |                                                  |        
         | |         | | | | |                 |         |  |           |  DPN t h a t #DPN |                                                  |        
         | |         | | | | |                 |         |  |           |   |           |   |                                                  |        
         | |         | | | | |             S 1 PN       #PN V           #V DPN         #DPN S                                                  #S #S    
         | |         | | | | |             |                                                                                                      |     
    PN 0 w e #PN     | | | | |             |                                                                                                      |     
    |         |      | | | | |             |                                                                                                      |     
    |         |  V 4 t h i n k #V          |                                                                                                      |     
    |         |  |             |           |                                                                                                      |     
S 1 PN       #PN V             #V DPN #DPN S                                                                                                      #S #S 

Figure 5

A parsing by multiple alignment with the grammar shown in Figure 4.6
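
The self-reference in the second pattern of Figure 4 can be mimicked with an ordinary recursive function. The following Python sketch is only an illustration of the recursion, not of multiple alignment: `expand` is a hypothetical name, the word lists are copied from Figure 4, and the sketch always emits 'that', whereas the sentence in Figure 5 leaves the outer ' DPN #DPN' unmatched.

```python
# Word lists copied from Figure 4; list positions follow the pattern indices.
PN = ["we", "you", "he", "it"]                                  # PN 0 .. PN 3
V = ["says", "say", "said", "thinks", "think", "goes", "go"]  # V 0 .. V 6
ADV = ["fast", "away", "later"]                                 # ADV 0 .. ADV 2


def expand(embedding, innermost):
    """Expand 'S 1' once per (pn, v) pair in `embedding`, then close with 'S 0'.

    `innermost` is a (pn, v, adv) triple for the final, non-recursive clause.
    """
    if not embedding:
        pn, v, adv = innermost
        # S 0 PN #PN V #V ADV #ADV #S
        return [PN[pn], V[v], ADV[adv]]
    (pn, v), *deeper = embedding
    # S 1 PN #PN V #V DPN #DPN S #S #S : the 'S #S' slot is filled recursively.
    return [PN[pn], V[v], "that"] + expand(deeper, innermost)


print(" ".join(expand([(0, 4), (2, 2)], (3, 5, 0))))
```

With two levels of embedding this yields "we think that he said that it goes fast", in which the 'S 1' pattern is used twice and 'S 0' once, as in Figure 5.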

- -

5. From Mother Goose Nursery Rhymes, London: Heinemann, 1994.

6. In this figure and later ones in this article, readers may appreciate that parsings represented as alignments often take more space than more conventional kinds of representation. I hope that readers will appreciate the theoretical and practical value of understanding parsing as multiple alignment without being distracted unduly by the humdrum problem of representing large alignments within the confines of normal-sized pages.

4 PARSING WITH A 'CONTEXT SENSITIVE' GRAMMAR: DISCONTINUOUS DEPENDENCIES IN SYNTAX

Context-free phrase-structure grammars (CF-PSGs) like the one shown in Figure 2 of the accompanying article are quite adequate for representing the structure of simple sub-sets of a natural language but, since Chomsky's Syntactic Structures (Chomsky, 1957), it has been known that CF-PSGs are not adequate to represent the full complexity of natural languages, except at the cost of large amounts of redundancy in the representation.

CF-PSGs cannot, in a succinct manner, represent 'discontinuous dependencies' (DDs) in syntax such as number dependency (singular or plural) between the subject of a sentence and the main verb (in English, for example) and gender dependencies throughout a sentence (in French, for example). The key point is that these kinds of dependencies can bridge arbitrarily large amounts of intervening structure. However, solutions to the problem of representing DDs in a succinct manner are provided by Transformational Grammars (TGs, Chomsky (1957)), Definite Clause Grammars (DCGs, Pereira and Warren (1980)), and others (see Gazdar and Mellish (1989)).

The similarity between the grammar in Figure 2 of the accompanying article and the set of patterns in Figure 3 of that article might suggest that grammars in the form of patterns suffer the same shortcomings as CF-PSGs. The suggestion here is that, given an appropriate system for finding 'good' alignments amongst patterns, it is possible to represent DDs in syntax in a succinct manner and, arguably, that the corresponding representations can be simpler and more 'direct' than can be achieved with TGs, DCGs or other existing systems with sufficient 'power' to represent DDs efficiently.

4.1 An Example

Consider the grammar shown in Figure 6, below. In this grammar, the dependency between a SNG (singular) noun phrase at the beginning of a sentence and a SNG verb following is expressed with the pattern ' S NP SNG ; #NP QL #QL V SNG #V #S'. Likewise, plural dependencies are expressed with the pattern ' S NP PL ; #NP QL #QL V PL #V #S'.7,8 These dependencies bridge the qualifying structure (' QL #QL') and this structure can be arbitrarily large.

S NP SNG ; #NP QL #QL V SNG #V #S                 1000
S NP PL ; #NP QL #QL V PL #V #S                    700
NP SNG ; 0 D SNG : #D A SNG : #A N SNG #N #NP      900
NP SNG ; 1 PN SNG #PN #NP                          500
NP PL ; 0 D PL : #D A PL : #A N PL #N #NP          600
NP PL ; 1 PN PL #PN #NP                            300
QL 0 DPN #DPN S #S #QL                             200
QL 1 PP #PP A #A #QL                               250
QL 2 PP #PP NP #NP #QL                             200
D : 0 s o m e #D                                   150
D : 1 o u r #D                                     200
D : 2 t h e #D                                     500
D SNG : 0 o n e #D                                 100
D SNG : 1 t h i s #D                               100
D PL : 0 t h e s e #D                              200
D PL : 1 t h o s e #D                              250
N SNG 0 NR #NR #N                                  500
N SNG 1 m a n #N                                   200
N SNG 2 J o h n #N                                 100
N SNG 3 M a r y #N                                 100
N PL 0 NR #NR s #N                                 500
N PL 1 m e n #N                                    100
NR 0 c a r #NR                                     125
NR 1 r o a d #NR                                   150
NR 2 h o r s e #NR                                 150
NR 3 d o g #NR                                      75
PN SNG 0 i t #PN                                   250
PN SNG 1 h e #PN                                   250
PN PL 0 w e #PN                                    100
PN PL 1 t h e y #PN                                 75
PN PL 2 t h o s e #PN                              125
DPN 0 t h a t #DPN                                 200
DPN 1 w h i c h #DPN                               150
PP 0 i n #PP                                       100
PP 1 w i t h #PP                                   220
PP 2 o f #PP                                       130
A : 0 r e d #A                                     150
A : 1 b l u e #A                                   250
A : 2 g r e e n #A                                 125
A SNG : o n e #A                                   100
A PL : 0 s e v e r a l #A                          200
A PL : 1 m a n y #A                                250
V SNG 0 VR #VR s #V                               1025
V SNG 1 g o e s #V                                 175
V PL 0 VR #VR #V                                   700
V PL 1 g o #V                                      150
VR 0 w i n #VR                                     450
VR 1 r u n #VR                                     475
VR 2 l i k e #VR                                   350
VR 3 g a l l o p #VR                               450
VR 4 j u m p #VR                                   250

Figure 6

A grammar in which discontinuous number dependencies in a sentence are expressed with the patterns ' S NP SNG ; #NP QL #QL V SNG #V #S' and ' S NP PL ; #NP QL #QL V PL #V #S'.

Given this grammar, a sentence like ' t h o s e i n g r e e n w i n' may be aligned with patterns in the grammar as shown in Figure 7. Given this alignment, the sentence may be specified completely with the sequence of symbols ' S PL 1 2 1 2 0 #S'. In this coded representation of the sentence, ' PL' selects the plural sentence pattern (' S NP PL ; #NP QL #QL V PL #V #S') which ensures that a PL noun-phrase is selected and that a PL verb is selected too, regardless of the intervening structure, ' QL #QL', however small or large that structure may be.

                    t h o s e                   i n           g r e e n                    w i n           
                    | | | | |                   | |           | | | | |                    | | |           
            PN PL 2 t h o s e #PN               | |           | | | | |                    | | |           
            |  |               |                | |           | | | | |                    | | |           
  NP PL ; 1 PN PL             #PN #NP           | |           | | | | |                    | | |           
  |  |  |                          |            | |           | | | | |                    | | |           
  |  |  |                          |       PP 0 i n #PP       | | | | |                    | | |           
  |  |  |                          |       |         |        | | | | |                    | | |           
  |  |  |                          |       |         |  A : 2 g r e e n #A                 | | |           
  |  |  |                          |       |         |  |               |                  | | |           
  |  |  |                          |  QL 1 PP       #PP A               #A #QL             | | |           
  |  |  |                          |  |                                     |              | | |           
  |  |  |                          |  |                                     |         VR 0 w i n #VR       
  |  |  |                          |  |                                     |         |           |        
  |  |  |                          |  |                                     |  V PL 0 VR         #VR #V    
  |  |  |                          |  |                                     |  | |                   |     
S NP PL ;                         #NP QL                                   #QL V PL                  #V #S 

Figure 7

An alignment showing how - with the grammar from Figure 6 - the discontinuous dependency between the 'plural' (PL) value of 'those' and the plural value of 'win' in 'Those in green win' may be marked despite the existence of an intervening structure ('in green') which may be arbitrarily large.
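
The way in which one sentence pattern constrains both the subject and the verb can be sketched in a few lines of Python. This is a deliberately crude stand-in for alignment: it assumes that a pattern 'fits' a slot only if it begins with exactly the symbols in that slot, whereas SP52 scores partial matches. The names `slots` and `fits` are hypothetical; the patterns are copied from Figure 6.

```python
PLURAL_SENTENCE = "S NP PL ; #NP QL #QL V PL #V #S".split()

CANDIDATES = [
    "NP SNG ; 1 PN SNG #PN #NP",
    "NP PL ; 1 PN PL #PN #NP",
    "QL 1 PP #PP A #A #QL",
    "V SNG 0 VR #VR s #V",
    "V PL 0 VR #VR #V",
]


def slots(pattern):
    """Split a pattern's interior into slots.

    Each slot runs from a category symbol up to, but not including, its
    '#' closer, so it carries any number marker written after the category.
    """
    result, current = [], []
    for symbol in pattern[1:-1]:  # drop the outer 'S' and '#S'
        if symbol.startswith("#"):
            result.append(tuple(current))
            current = []
        else:
            current.append(symbol)
    return result


def fits(slot, pattern):
    """Simplifying assumption: the pattern must start with the slot's symbols."""
    return tuple(pattern[:len(slot)]) == slot


for slot in slots(PLURAL_SENTENCE):
    matching = [c for c in CANDIDATES if fits(slot, c.split())]
    print(slot, "->", matching)
```

Under this assumption the slots of the plural sentence pattern are ('NP', 'PL', ';'), ('QL',) and ('V', 'PL'), and only the PL noun-phrase pattern and the PL verb pattern are admitted, whatever fills the ' QL #QL' slot in between.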

Notice that this alignment yields more compression than would be possible if the ' PL' markers were omitted from the pattern ' S NP PL ; #NP QL #QL V PL #V #S' and from the parsing. In this case, the sentence would be encoded with the symbols ' S PL 1 2 1 2 PL 0 #S' because the number value of the verb would have to be specified independently of the number value of the subject noun-phrase. This second encoding of the sentence contains one more symbol than the encoding which is possible when 'PL' markers for the subject noun-phrase and the main verb are included in the sentence pattern - and is correspondingly less economical.
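
The comparison can be checked by counting symbols. The short Python sketch below counts symbols only; the variable names are illustrative, and the way in which SP52 actually scores an encoding is described in the accompanying article, not here.

```python
# The two encodings quoted in the text, split into individual symbols.
with_markers = "S PL 1 2 1 2 0 #S".split()
without_markers = "S PL 1 2 1 2 PL 0 #S".split()

# Difference in length between the two encodings, in symbols.
extra_symbols = len(without_markers) - len(with_markers)
print(len(with_markers), len(without_markers), extra_symbols)
```

The first encoding has 8 symbols and the second has 9, a difference of one symbol.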

4.2 Nesting of Discontinuous Dependencies

A possible snag with the method just proposed for marking discontinuous dependencies in syntax is that it might fail to discriminate between one set of dependencies and another when two (or more) sets of dependencies are embedded, one within another. If, for example, a plural dependency were nested within a plural dependency (schematically, (PL (PL PL) PL)) the method might interpret this as a plural dependency followed by a plural dependency - ((PL PL)(PL PL)) - or some other grouping.

The example in Figure 7 only shows one set of dependencies and does not throw light on this issue. However, the alignment in Figure 8 (a) shows one plural dependency nested within another and confirms that the dependency within the main structure ('those ... win') is separated quite clearly from the dependency within the subordinate clause ('... that we like ...') because the main structure is modelled on one sentence pattern in which one dependency is embedded and the subordinate clause contains another sentence pattern containing its own dependency between a plural subject and a plural verb.

The alignment in Figure 8 (b) confirms, as one would expect, that a singular dependency (in the subordinate clause '... that he likes ...') can be embedded within a plural dependency ('those ... win') without risk of confusion between the two dependencies.
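
The reason the nested dependencies stay separate is that each level of embedding is a fresh instance of a sentence pattern carrying its own number marker. The Python sketch below illustrates this with ordinary recursion; it is not the alignment process itself, and the names `SUBJECTS`, `verb` and `sentence` are hypothetical. The subject words and the verb forms follow the PN and V patterns of Figure 6.

```python
# Subject words by number, taken from the PN patterns in Figure 6.
SUBJECTS = {"SNG": {"it", "he"}, "PL": {"we", "they", "those"}}


def verb(number, root):
    """'V SNG 0 VR #VR s #V' adds 's'; 'V PL 0 VR #VR #V' leaves the root bare."""
    return root + "s" if number == "SNG" else root


def sentence(number, subject, root, embedded=None):
    """Expand 'S NP <number> ; #NP QL #QL V <number> #V #S'.

    `number` is fixed once per call, so it governs both the subject and the
    verb of this clause only. `embedded` is an optional (number, subject,
    root) triple for a subordinate clause placed in the 'QL #QL' slot.
    """
    if subject not in SUBJECTS[number]:
        raise ValueError(f"{subject!r} is not a {number} subject")
    words = [subject]
    if embedded is not None:
        # QL 0 DPN #DPN S #S #QL : the inner clause has its own number marker.
        words += ["that"] + sentence(*embedded)
    words.append(verb(number, root))
    return words


print(" ".join(sentence("PL", "those", "win", ("PL", "we", "like"))))
print(" ".join(sentence("PL", "those", "win", ("SNG", "he", "like"))))
```

The two calls produce "those that we like win" and "those that he likes win", the sentences of Figure 8 (a) and (b). In each case the inner clause agrees within itself and the outer 'those ... win' dependency is unaffected.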


            t h o s e                t h a t                  w e                        l i k e                           w i n           
            | | | | |                | | | |                  | |                        | | | |                           | | |           
            | | | | |                | | | |                  | |                   VR 2 l i k e #VR                       | | |           
            | | | | |                | | | |                  | |                   |             |                        | | |           
            | | | | |                | | | |                  | |            V PL 0 VR           #VR #V                    | | |           
            | | | | |                | | | |                  | |            | |                     |                     | | |           
            | | | | |                | | | |        NP PL ; 1 w e #NP        | |                     |                     | | |           
            | | | | |                | | | |        |  |  |        |         | |                     |                     | | |           
            | | | | |                | | | |      S NP PL ;       #NP QL #QL V PL                    #V #S                 | | |           
            | | | | |                | | | |      |                                                     |                  | | |           
            | | | | |          DPN 0 t h a t #DPN |                                                     |                  | | |           
            | | | | |           |             |   |                                                     |                  | | |           
            | | | | |     QL 0 DPN           #DPN S                                                     #S #QL             | | |           
            | | | | |     |                                                                                 |              | | |           
  NP PL ; 3 t h o s e #NP |                                                                                 |              | | |           
  |  |  |              |  |                                                                                 |              | | |           
S NP PL ;             #NP QL                                                                               #QL V PL        | | |     #V #S 
                                                                                                               | |         | | |     |     
                                                                                                               | |    VR 0 w i n #VR |     
                                                                                                               | |    |           |  |     
                                                                                                               V PL 0 VR         #VR #V    

(a)




                    t h o s e                    t h a t                            h e                             l i k e     s                       w i n           
                    | | | | |                    | | | |                            | |                             | | | |     |                       | | |           
            PN PL 2 t h o s e #PN                | | | |                            | |                             | | | |     |                       | | |           
            |  |               |                 | | | |                            | |                             | | | |     |                       | | |           
  NP PL ; 1 PN PL             #PN #NP            | | | |                            | |                             | | | |     |                       | | |           
  |  |  |                          |             | | | |                            | |                             | | | |     |                       | | |           
  |  |  |                          |       DPN 0 t h a t #DPN                       | |                             | | | |     |                       | | |           
  |  |  |                          |        |             |                         | |                             | | | |     |                       | | |           
  |  |  |                          |  QL 0 DPN           #DPN S                     | |                             | | | |     |    #S #QL             | | |           
  |  |  |                          |  |                       |                     | |                             | | | |     |    |   |              | | |           
S NP PL ;                         #NP QL                      |                     | |                             | | | |     |    |  #QL V PL        | | |     #V #S 
                                                              |                     | |                             | | | |     |    |      | |         | | |     |     
                                                              |                     | |                             | | | |     |    |      V PL 0 VR   | | | #VR #V    
                                                              |                     | |                             | | | |     |    |             |    | | |  |        
                                                              |                     | |                V SNG 0 VR   | | | | #VR s #V |             |    | | |  |        
                                                              |                     | |                |  |    |    | | | |  |    |  |             |    | | |  |        
                                                              |                     | |                |  |    VR 2 l i k e #VR   |  |             |    | | |  |        
                                                              |                     | |                |  |                       |  |             |    | | |  |        
                                                              S NP SNG ;            | |     #NP QL #QL V SNG                      #V #S            |    | | |  |        
                                                                |   |  |            | |      |                                                     |    | | |  |        
                                                                |   |  |   PN SNG 1 h e #PN  |                                                     |    | | |  |        
                                                                |   |  |   |   |         |   |                                                     |    | | |  |        
                                                                NP SNG ; 1 PN SNG       #PN #NP                                                    |    | | |  |        
                                                                                                                                                   |    | | |  |        
                                                                                                                                                   VR 0 w i n #VR       

(b)


Figure 8

(a) A parsing, using the grammar in Figure 6, which shows how one plural dependency (in '... that we like ...') may be embedded in another plural dependency (in 'those ... win') without risk of confusion between the two dependencies. (b) A parsing using the same grammar showing how a singular dependency may be embedded within a plural dependency without risk of confusion between the two.

4.3 Variability of Constituents

Regarding the parsing in Figure 7, readers may object that noun phrases are much more variable than the parsing might suggest. Noun phrases in English range from those, like the one in Figure 7, which contain a single word, through those in which a singular or plural marking appears on a determiner and a noun (e.g., "those cars"), through those containing a determiner, adjective and noun where all three words are marked for number (e.g., "those many cars"), to those where the determiner is not marked for number (e.g., "the", "some") or the adjective is not marked for number (e.g., "red", "large" and most other adjectives), or some other combination (with a noun) of marked or unmarked determiner or adjective, either of which may be omitted. In addition, of course, there are more complicated noun phrases containing intensifiers (e.g., "very") which, like adjectives, may occur recursively between the determiner and noun.

The grammar in Figure 6 accommodates some of this variability as can be seen in the three example parsings in Figure 9.

4.3.1 Figure 9 (a) shows a noun phrase ('this one man') containing a determiner, adjective and noun, all of which are marked as singular. This pattern of words 'selects' the pattern ' NP SNG ; 0 D SNG : #D A SNG : #A N SNG #N #NP' in the grammar in Figure 6 and thus 'selects' the ' SNG' marker for the whole noun phrase (immediately after the first ' NP' symbol). This singular marker for the whole noun phrase aligns with a matching symbol in the pattern for a singular sentence (' S NP SNG ; #NP QL #QL V SNG #V #S'). This means that a singular verb ('runs') provides the best match at the end of the sentence.

In this same example parsing, the symbol ' ;' in the sentence pattern and a matching symbol in the noun-phrase pattern are needed to ensure that the ' SNG' symbol in the sentence pattern aligns with the singular marker for the whole noun-phrase, not one of the singular markers for the determiner, adjective or noun. The same effect could have been achieved without the use of the ' ;' symbol by using distinctive versions of the ' SNG' symbol such as ' NPSNG', ' DSNG', ' ASNG' and ' NSNG'. Which of the two styles is to be preferred is a matter for further study.


                       t h i s            o n e            m a n                            r u n     s       
                       | | | |            | | |            | | |                            | | |     |       
             D SNG : 1 t h i s #D         | | |            | | |                            | | |     |       
             |  |  |           |          | | |            | | |                            | | |     |       
             |  |  |           |  A SNG : o n e #A         | | |                            | | |     |       
             |  |  |           |  |  |  |       |          | | |                            | | |     |       
             |  |  |           |  |  |  |       |  N SNG 1 m a n #N                         | | |     |       
             |  |  |           |  |  |  |       |  |  |          |                          | | |     |       
  NP SNG ; 0 D SNG :           #D A SNG :       #A N SNG         #N #NP                     | | |     |       
  |   |  |                                                           |                      | | |     |       
  |   |  |                                                           |         V SNG 0 VR   | | | #VR s #V    
  |   |  |                                                           |         |  |    |    | | |  |    |     
  |   |  |                                                           |         |  |    VR 1 r u n #VR   |     
  |   |  |                                                           |         |  |                     |     
S NP SNG ;                                                          #NP QL #QL V SNG                    #V #S 

(a)




                     t h e s e                          d o g     s                           j u m p           
                     | | | | |                          | | |     |                           | | | |           
                     | | | | |                     NR 3 d o g #NR |                           | | | |           
                     | | | | |                     |           |  |                           | | | |           
                     | | | | |              N PL 0 NR         #NR s #N                        | | | |           
                     | | | | |              | |                     |                         | | | |           
            D PL : 0 t h e s e #D           | |                     |                         | | | |           
            | |  |             |            | |                     |                         | | | |           
  NP PL ; 0 D PL :             #D A PL : #A N PL                    #N #NP                    | | | |           
  |  |  |                                                               |                     | | | |           
S NP PL ;                                                              #NP QL #QL V PL        | | | |     #V #S 
                                                                                  | |         | | | |     |     
                                                                                  | |    VR 4 j u m p #VR |     
                                                                                  | |    |             |  |     
                                                                                  V PL 0 VR           #VR #V    

(b)




                     t h e             r e d                c a r     s                      g o       
                     | | |             | | |                | | |     |                      | |       
                     | | |             | | |           NR 0 c a r #NR |                      | |       
                     | | |             | | |           |           |  |                      | |       
                     | | |             | | |    N PL 0 NR         #NR s #N                   | |       
                     | | |             | | |    | |                     |                    | |       
            D    : 2 t h e #D          | | |    | |                     |                    | |       
            |    |         |           | | |    | |                     |                    | |       
            |    |         |  A    : 0 r e d #A | |                     |                    | |       
            |    |         |  |    |         |  | |                     |                    | |       
  NP PL ; 0 D PL :         #D A PL :         #A N PL                    #N #NP               | |       
  |  |  |                                                                   |                | |       
  |  |  |                                                                   |         V PL 1 g o #V    
  |  |  |                                                                   |         | |        |     
S NP PL ;                                                                  #NP QL #QL V PL       #V #S 

(c)

Figure 9

Three example parsings, using the grammar in Figure 6, showing how the variability of noun phrases may be accommodated.

4.3.2 Figure 9 (b) shows an alignment containing a noun phrase ('these dogs') where there is no adjective between the determiner and noun. With the grammar in Figure 6, no special provision is made for the omission of any constituent within a larger structure. If a constituent is missing, the symbols which represent its 'slot' in the larger structure (the symbols ' A PL : #A' in this case) appear in the alignment but nothing is aligned with those symbols.

This is not entirely satisfactory as a way of showing that a given constituent is optional within a larger structure because it would allow the non-optional noun which constitutes the 'head' of the noun-phrase to be omitted in just the same way as the determiner or the adjective.

The rules governing when a constituent of a noun phrase is optional and when it is not are surprisingly complicated. For example, it is acceptable to form a plural noun phrase with a (plural) noun and without a determiner or adjective (e.g., "Dogs jump", "We like dogs") but with a singular noun phrase, there must be a determiner ("The dog jumps" is acceptable but "Dog jumps" is not).

One way to show where constituents are optional in a structure like a noun-phrase and where they are not is to provide a family of patterns covering the range of possible noun-phrases. The fact that all members of the family would contain a slot for a noun or pronoun but not all of them would contain slots for determiners or adjectives would accommodate the fact that the head noun is compulsory but the other constituents may not be. The fact that the determiner is compulsory for singular noun phrases but not for plural ones may be accommodated in the family of noun-phrase patterns by the inclusion of a plural pattern or patterns without the determiner but the omission from the family of noun-phrase patterns of any corresponding singular noun-phrase patterns.
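The idea of a family of noun-phrase patterns can be sketched in Python as follows. This is only an illustration under stated assumptions: the table `NP_FAMILY` and the function `is_licensed` are hypothetical names, the family shown is not taken from the grammar in Figure 6, and pronouns and proper nouns are ignored.

```python
# Hypothetical family of noun-phrase patterns, keyed by number.
# Each tuple lists the constituent slots of one pattern:
# D = determiner, A = adjective, N = head noun.
# Every pattern has an N slot; only the plural family has
# patterns without a D slot.
NP_FAMILY = {
    "SNG": {("D", "A", "N"), ("D", "N")},
    "PL": {("D", "A", "N"), ("D", "N"), ("A", "N"), ("N",)},
}


def is_licensed(number, constituents):
    """Return True if some pattern in the family has exactly these slots."""
    return tuple(constituents) in NP_FAMILY[number]


print(is_licensed("PL", ["N"]))        # 'dogs'    -> True
print(is_licensed("SNG", ["N"]))       # 'dog'     -> False
print(is_licensed("SNG", ["D", "N"]))  # 'the dog' -> True
```

In this sketch the compulsory status of the head noun is not stated as a rule; it follows from the fact that no member of the family lacks an `N` slot.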

4.3.3 Figure 9 (c) shows an alignment containing a noun phrase ("the red cars") where the determiner is not marked as singular or plural and neither is the adjective. This is where the ' :' symbol plays its part by allowing ' D : 2 t h e #D' to be aligned with the symbols ' D PL : #D' within the pattern ' NP PL ; 0 D PL : #D A PL : #A N PL #N #NP'. If the ' :' symbol were omitted from either or both of the pattern for 'the' or the pattern for the plural noun phrase, there would be ambiguity about the relative positions, left to right, of the symbols ' 2 t h e' in the pattern ' D : 2 t h e #D' and the second instance of the symbol ' PL' in the noun phrase pattern.
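The role of the ' :' symbol as an anchor can be illustrated with a small Python sketch. It is not part of SP52: the function `ambiguous_gaps` is a hypothetical helper which assumes that the symbols shared by the two patterns are unique and occur in the same order in both. It reports each stretch between shared symbols where both patterns contain unmatched symbols, so that their left-to-right order is not determined.

```python
def ambiguous_gaps(a, b):
    """List the gaps between shared symbols in which both patterns
    have unmatched symbols (so their relative order is undetermined)."""
    shared = [s for s in a if s in b]  # assumed unique and in the same order
    gaps = []
    ia = ib = 0
    for s in shared + [None]:          # None marks the end of both patterns
        ja = a.index(s) if s is not None else len(a)
        jb = b.index(s) if s is not None else len(b)
        if a[ia:ja] and b[ib:jb]:
            gaps.append((a[ia:ja], b[ib:jb]))
        ia, ib = ja + 1, jb + 1
    return gaps


# With ' :' in both patterns, ' PL' falls before the anchor and
# ' 2 t h e' after it, so nothing is ambiguous.
print(ambiguous_gaps("D : 2 t h e #D".split(), "D PL : #D".split()))
# -> []

# Without ' :', ' PL' and ' 2 t h e' share the same gap.
print(ambiguous_gaps("D 2 t h e #D".split(), "D PL #D".split()))
# -> [(['2', 't', 'h', 'e'], ['PL'])]
```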

4.4 An Alternative Technique for Marking Dependencies in Syntax

This sub-section describes a second way of marking dependencies in syntax (discontinuous or otherwise), illustrated by the grammar in Figure 10 and the alignment in Figure 11. Four features of the example are discussed in the sub-sections that follow.

S NP #NP V #V #S            1200
NP D #D A #A N #N #NP       1200
D 0 t h e #D                 175
D 1 s o m e #D               125
D 2 o u r #D                 100
D DSNG 0 o n e #D            300
D DSNG 1 t h i s #D          200
D DPL 0 t h e s e #D         200
D DPL 1 t h o s e #D         100
N NSNG 0 NR #NR #N           400
N NSNG 1 m a n #N            150
N NSNG 2 J o h n #N           50
N NSNG 3 M a r y #N          100
N NPL 0 NR #NR s #N          400
N NPL 1 m e n #N             100
NR 0 c a r #NR               200
NR 1 r o a d #NR             200
A 0 r e d #A                 125
A 1 b l u e #A               200
A 2 g r e e n #A              75
A ASNG o n e #A              100
A APL t w o #A               120
A APL 0 s e v e r a l #A      80
A APL 1 m a n y #A           150
V VSNG g o e s #V            700
V VPL g o #V                 500
DSNG NSNG                    700
ASNG NSNG                    100
DPL NPL                      500
APL NPL                      350
NSNG VSNG                    700
NPL VPL                      500
Figure 10

A grammar showing an alternative way in which the variability of number dependencies in noun phrases and in sentences may be recorded.

             t h o s e          t w o                 c a r     s              g o       
             | | | | |          | | |                 | | |     |              | |       
     D DPL 1 t h o s e #D       | | |                 | | |     |              | |       
     |  |              |        | | |                 | | |     |              | |       
     |  |              |  A APL t w o #A              | | |     |              | |       
     |  |              |  |  |        |               | | |     |              | |       
  NP D  |              #D A  |        #A N            | | |     | #N #NP       | |       
  |     |                    |           |            | | |     | |   |        | |       
  |     |                    |           |       NR 0 c a r #NR | |   |        | |       
  |     |                    |           |       |           |  | |   |        | |       
  |     |                    |           N NPL 0 NR         #NR s #N  |        | |       
  |     |                    |              |                         |        | |       
  |     |                    |              |                         |  V VPL g o #V    
  |     |                    |              |                         |  |  |      |     
S NP    |                    |              |                        #NP V  |      #V #S 
        |                    |              |                               |            
        |                    |             NPL                             VPL           
        |                    |              |                                            
       DPL                   |             NPL                                           
                             |              |                                            
                            APL            NPL                                           

Figure 11

A sample parsing using the grammar shown in Figure 10 showing number dependencies within a noun phrase.

4.4.1 Patterns of Dependency can be Separated from Basic Syntactic Patterns. In the grammar and the figure, readers will see that the sentence pattern (' S NP #NP V #V #S') and the pattern for noun phrases (' NP D #D A #A N #N #NP') do not contain any markers for number (singular or plural) but that there are six small patterns at the bottom of the grammar which do express these dependencies. Some of these patterns appear in the figure, linking words and their number markings in an appropriate manner: the plural determiner is linked to the plural noun, the plural adjective is also linked to the plural noun, and the plural noun is linked to the plural verb.
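As a rough illustration of how the dependency patterns stand apart from the syntactic patterns, the following Python sketch takes the six two-symbol patterns from the bottom of Figure 10 and checks which of them are matched, in order, by the number markers of a sentence. The function `matched_dependencies` is a hypothetical name, and the check is a simple ordered lookup, not the search performed by SP52.

```python
# The six binary dependency patterns from the bottom of Figure 10.
DEPENDENCIES = [
    ("DSNG", "NSNG"), ("ASNG", "NSNG"),
    ("DPL", "NPL"), ("APL", "NPL"),
    ("NSNG", "VSNG"), ("NPL", "VPL"),
]


def matched_dependencies(markers):
    """Return the patterns whose two symbols occur, in order, in markers."""
    found = []
    for left, right in DEPENDENCIES:
        if left in markers and right in markers[markers.index(left) + 1:]:
            found.append((left, right))
    return found


# Number markers for 'those two cars go', read off Figure 11.
print(matched_dependencies(["DPL", "APL", "NPL", "VPL"]))
# -> [('DPL', 'NPL'), ('APL', 'NPL'), ('NPL', 'VPL')]
```

The three patterns returned are the three that appear at the bottom of the alignment in Figure 11.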

This manner of marking dependencies in the noun phrase and in the sentence has a pleasing simplicity and clarity, but it may not be applicable in all situations. For example, it looks as if this manner of marking dependencies might fail in cases like the one discussed in Section 4.2 where one set of dependencies is nested inside another.

Preliminary experiments in this area suggest that this kind of technique may be used where there are nested dependencies, provided the nesting is marked in the patterns which record the dependencies. For example, the pattern ' NPL VPL' in Figure 10 may be modified to become ' NPL S #S VPL' (and likewise for ' NSNG VSNG'). This modification of the pattern for plural dependencies means that, if one sentence is nested within another, and if the symbols at each end of the inner sentence are aligned with ' S #S' in ' NPL S #S VPL', then mis-alignments of ' NPL' and ' VPL' with corresponding symbols in the outer sentence cannot easily occur.
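The effect of the modified pattern may be sketched as follows. The flattened symbol sequence is a hypothetical simplification of a sentence like 'those that we like win' marked in the style of Figure 10, and the functions `link_plain` and `link_bracketed` are illustrative names only. They show which verb marker the first ' NPL' would be paired with under ' NPL VPL' and under ' NPL S #S VPL'.

```python
def link_plain(symbols, left, right):
    """Pair the first `left` with the first `right` that follows it."""
    i = symbols.index(left)
    return i, symbols.index(right, i + 1)


def link_bracketed(symbols, left, right, open_="S", close="#S"):
    """Pair the first `left` with the first `right` found after the
    balanced open_ ... close stretch that follows `left`."""
    i = symbols.index(left)
    depth = 0
    for k in range(symbols.index(open_, i + 1), len(symbols)):
        if symbols[k] == open_:
            depth += 1
        elif symbols[k] == close:
            depth -= 1
            if depth == 0:
                return i, symbols.index(right, k + 1)
    raise ValueError("unbalanced sentence brackets")


# Hypothetical markers: those(NPL) [S we(NPL) like(VPL) #S] win(VPL)
symbols = ["NPL", "S", "NPL", "VPL", "#S", "VPL"]
print(link_plain(symbols, "NPL", "VPL"))      # -> (0, 3): outer noun, inner verb
print(link_bracketed(symbols, "NPL", "VPL"))  # -> (0, 5): outer noun, outer verb
```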

4.4.2 The Number Marking of the 'Head' of a Structure may be Used Instead of a Number Marking for the Whole Structure. Readers will have noticed that, by contrast with the alignments in Figure 9, the alignment in Figure 11 does not have a number marking for the whole of the noun phrase. Instead, the alignment takes advantage of the fact that every noun phrase has a 'head' noun (or pronoun) and that the number marking of the head is the same as the number marking of the whole structure. Thus, the number marking for the whole structure may be omitted and the number marking of the head word (plural for ' c a r s' in Figure 11) may be used instead.

4.4.3 Any Pattern of Two or More Symbols May be Constructed from Binary Patterns. In Figure 6 and Figure 9 (a), the singular noun phrase pattern (' NP SNG ; 0 D SNG : #D A SNG : #A N SNG #N #NP') contains a three-way number dependency which is, in effect, ' DSNG ASNG NSNG'. Likewise for the number dependency in the plural noun phrase in Figure 6 and in Figure 9 (b) and (c). By contrast with these three-way dependencies, the grammar in Figure 10 and the alignment in Figure 11 achieve the effect of a three-way dependency using patterns which each contain only two symbols.

Because of the 'dominant' status of the 'head' noun in the noun phrase, it seemed appropriate, in the grammar in Figure 10 and the alignment in Figure 11, to link the determiner to the head noun and the adjective to the head noun, rather than to link the determiner to the adjective and the adjective to the noun. There is another benefit of this arrangement, discussed next.

4.4.4 Binary Dependencies can Accommodate Options in a Flexible Manner. Choosing the first of the two possibilities just described has the advantage that it can show where constituents are optional and where they are not as discussed in Section 4.3.2: if either the determiner or the adjective is missing then the corresponding dependency with the head noun would be missing too. The second of the two options mentioned in the previous paragraph would fail if the adjective were missing - because the middle link in the chain between the determiner and the noun would be broken and so the dependency between the determiner and the noun could not be shown.
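The contrast between the two arrangements can be made concrete with a short Python sketch. `HEAD_LINKS` follows the plural patterns in Figure 10, while `CHAIN_LINKS` is a hypothetical alternative that is not in the grammar. The function `determiner_linked_to_noun` is an illustrative helper which asks whether ' DPL' can still be connected to ' NPL' using only links whose two symbols are both present.

```python
HEAD_LINKS = [("DPL", "NPL"), ("APL", "NPL")]    # as in Figure 10
CHAIN_LINKS = [("DPL", "APL"), ("APL", "NPL")]   # hypothetical alternative


def determiner_linked_to_noun(markers, links):
    """True if DPL is connected to NPL through links whose symbols
    all occur among the markers of the noun phrase."""
    present = set(markers)
    usable = [(a, b) for a, b in links if a in present and b in present]
    reached = {"DPL"} & present
    changed = True
    while changed:
        changed = False
        for a, b in usable:
            if a in reached and b not in reached:
                reached.add(b)
                changed = True
    return "NPL" in reached


with_adjective = ["DPL", "APL", "NPL"]   # 'those two cars'
without_adjective = ["DPL", "NPL"]       # 'those cars'

print(determiner_linked_to_noun(with_adjective, CHAIN_LINKS))     # -> True
print(determiner_linked_to_noun(without_adjective, HEAD_LINKS))   # -> True
print(determiner_linked_to_noun(without_adjective, CHAIN_LINKS))  # -> False
```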

4.5 Discontinuous Dependencies which Overlap Each Other

In the French sentence Les plumes sont vertes ("The feathers are green") there are two sets of overlapping syntactic dependencies, one for number and one for gender.

When the 'subject' of this sentence (Les plumes) is plural then the determiner (Les) must have the plural form, the noun (plume) must have a plural suffix (s), the verb (sont) must be plural, and the adjective (vert) must have a plural suffix (s). Likewise, the choice of a feminine noun (plume) means that the adjective (vert) must have a feminine suffix (e).

Figure 12 shows a fragment of French grammar expressed in the same manner as in Figure 10. Much of the discussion in Section 4.4 applies to the grammar and parsings shown in this section.

S NP #NP VP #VP #S              500
NP D #D N #N #NP                700
VP 0 V #V A #A #VP              300
VP 1 V #V P #P NP #NP #VP       200
P 0 s u r #P                     50
P 1 s o u s #P                  150
V VSNG e s t #V                 250
V VPL s o n t #V                250
D DSNG DM 0 l e #D               90
D DSNG DM 1 u n #D              120
D DSNG DF 0 l a #D              130
D DSNG DF 1 u n e #D            110
D DPL 0 l e s #D                125
D DPL 1 d e s #D                125
N NSNG NR #NR #N                450
N NPL NR #NR s #N               250
NR NM p a p i e r #NR           300
NR NF p l u m e #NR             400
A ASNG AM AR #AR #A             300
A ASNG AF AR #AR e #A           300
A APL AM AR #AR s #A            300
A APL AF AR #AR e s #A          300
AR 0 n o i r #AR                100
AR 1 v e r t #AR                200
DSNG NSNG                       450
DPL NPL                         250
DM NM                           210
DF NF                           240
VSNG ASNG                       600
VPL APL                         600
NSNG VSNG                       550
NPL VPL                         250
NM V #V AM                      300
NF V #V AF                      400
Figure 12

A fragment of French grammar with patterns for number dependencies and gender dependencies.

The alignment in Figure 13 (a) shows how the French sentence, above, is parsed in terms of the grammar: the main constituents of the sentence are marked in an appropriate manner and dependencies for number and gender are marked by patterns appearing towards the bottom of the alignment.



             l e s                p l u m e     s                   s o n t                  v e r t     e s           
             | | |                | | | | |     |                   | | | |                  | | | |     | |           
             | | |    N NPL NR    | | | | | #NR s #N                | | | |                  | | | |     | |           
             | | |    |  |  |     | | | | |  |    |                 | | | |                  | | | |     | |           
             | | |    |  |  NR NF p l u m e #NR   |                 | | | |                  | | | |     | |           
             | | |    |  |     |                  |                 | | | |                  | | | |     | |           
     D DPL 0 l e s #D |  |     |                  |                 | | | |                  | | | |     | |           
     |  |          |  |  |     |                  |                 | | | |                  | | | |     | |           
  NP D  |          #D N  |     |                  #N #NP            | | | |                  | | | |     | |           
  |     |                |     |                      |             | | | |                  | | | |     | |           
  |     |                |     |                      |       V VPL s o n t #V               | | | |     | |           
  |     |                |     |                      |       |  |          |                | | | |     | |           
  |     |                |     |                      |  VP 0 V  |          #V A             | | | |     | | #A #VP    
  |     |                |     |                      |  |    |  |          |  |             | | | |     | | |   |     
  |     |                |     |                      |  |    |  |          |  A APL AF AR   | | | | #AR e s #A  |     
  |     |                |     |                      |  |    |  |          |     |  |  |    | | | |  |          |     
  |     |                |     |                      |  |    |  |          |     |  |  AR 1 v e r t #AR         |     
  |     |                |     |                      |  |    |  |          |     |  |                           |     
S NP    |                |     |                     #NP VP   |  |          |     |  |                          #VP #S 
        |                |     |                              |  |          |     |  |                                 
        |               NPL    |                              | VPL         |     |  |                                 
        |                |     |                              |  |          |     |  |                                 
       DPL              NPL    |                              |  |          |     |  |                                 
                               |                              |  |          |     |  |                                 
                               |                              | VPL         |    APL |                                 
                               |                              |             |        |                                 
                               NF                             V             #V       AF                                

(a)



                 l a                 p l u m e                        e s t        s u r               l e s                p a p i e r     s               
                 | |                 | | | | |                        | | |        | | |               | | |                | | | | | |     |               
                 | |           NR NF p l u m e #NR                    | | |        | | |               | | |                | | | | | |     |               
                 | |           |  |             |                     | | |        | | |               | | |                | | | | | |     |               
                 | |    N NSNG NR |            #NR #N                 | | |        | | |               | | |                | | | | | |     |               
                 | |    |  |      |                |                  | | |        | | |               | | |                | | | | | |     |               
     D DSNG DF 0 l a #D |  |      |                |                  | | |        | | |               | | |                | | | | | |     |               
     |  |   |        |  |  |      |                |                  | | |        | | |               | | |                | | | | | |     |               
  NP D  |   |        #D N  |      |                #N #NP             | | |        | | |               | | |                | | | | | |     |               
  |     |   |              |      |                    |              | | |        | | |               | | |                | | | | | |     |               
  |     |   |              |      |                    |       V VSNG e s t #V     | | |               | | |                | | | | | |     |               
  |     |   |              |      |                    |       |  |         |      | | |               | | |                | | | | | |     |               
  |     |   |              |      |                    |       |  |         |  P 0 s u r #P            | | |                | | | | | |     |               
  |     |   |              |      |                    |       |  |         |  |         |             | | |                | | | | | |     |               
  |     |   |              |      |                    |  VP 1 V  |         #V P         #P NP         | | |                | | | | | |     |    #NP #VP    
  |     |   |              |      |                    |  |       |                         |          | | |                | | | | | |     |     |   |     
  |     |   |              |      |                    |  |       |                         |  D DPL 0 l e s #D             | | | | | |     |     |   |     
  |     |   |              |      |                    |  |       |                         |  |  |          |              | | | | | |     |     |   |     
  |     |   |              |      |                    |  |       |                         NP D  |          #D N           | | | | | |     | #N #NP  |     
  |     |   |              |      |                    |  |       |                               |             |           | | | | | |     | |       |     
S NP    |   |              |      |                   #NP VP      |                               |             |           | | | | | |     | |      #VP #S 
        |   |              |      |                               |                               |             |           | | | | | |     | |             
        |   |              |      |                               |                               |             N NPL NR    | | | | | | #NR s #N            
        |   |              |      |                               |                               |                |  |     | | | | | |  |                  
        |   |              |      |                               |                               |                |  NR NM p a p i e r #NR                 
        |   |              |      |                               |                               |                |                                        
        |   |             NSNG    |                              VSNG                             |                |                                        
        |   |              |      |                                                               |                |                                        
        |   DF             |      NF                                                              |                |                                        
        |                  |                                                                      |                |                                        
        |                  |                                                                     DPL              NPL                                       
        |                  |                                                                                                                                
       DSNG               NSNG                                                                                                                              

(b)

Figure 13

Parsing by multiple alignment with the grammar shown in Figure 12. Discontinuous plural dependencies are marked with the pattern ' S PL : PL PL PL #S' and the overlapping feminine gender dependencies are marked with the pattern ' S : F : F F #S'.

Using only binary dependencies, the plural determiner is linked to the plural noun, the plural noun is linked to the plural verb and this is linked to the plural adjective. Quite independently of this pattern of inter-linked binary dependencies for number, the gender dependency between the feminine noun and the feminine adjective is marked with the pattern ' NF V #V AF'.

Why are the symbols ' V #V' included in the pattern ' NF V #V AF'? In this example, the inclusion of these two symbols is not strictly necessary. But if the grammar were augmented slightly to accommodate the fact that, in French, an adjective within a noun phrase follows the noun (e.g., Les plumes vertes sont sur la table ("The green feathers are on the table")), then the symbols ' V #V' within the pattern ' NF V #V AF' would be necessary to show that this particular dependency requires a verb to intervene between the noun and the adjective (cf discussion in Section 4.4.1 of how discontinuous dependencies for number may be expressed when one sentence is embedded within another).

This example shows how overlapping patterns of dependency can be accommodated within the ICMAUS framework. Of course, these kinds of dependencies can be expressed quite well using other methods. However, work to date suggests that the ICMAUS framework may allow these kinds of dependency to be expressed with a pleasing simplicity and clarity compared with other methods.

- -

7. Since, for reasons given earlier, the grammar in Figure 6 is quite small, the simplifying assumption has been made, contrary to fact, that the form of singular verbs does not depend on the relevant 'person' ('I', 'thou', 'he', 'she', 'it') and likewise for plural verbs. Similar simplifying assumptions have been made in subsequent grammars in this article.

8. Readers may be puzzled by the inclusion in the grammar of 'punctuation' symbols like ' ;' and ' :'. The reasons for including these symbols in the grammar are explained in Section 4.3.

5 DEPENDENCIES IN THE SYNTAX OF ENGLISH AUXILIARY VERBS

This section presents a grammar and examples showing how the syntax of English auxiliary verbs may be described in the ICMAUS framework. Before the grammar and examples are presented, the syntax of this part of English is described with words and diagrams, and alternative formalisms for describing the syntax are briefly discussed.

In English, the syntax for main verbs and the 'auxiliary' verbs which may accompany them follows two quasi-independent patterns of constraint which interact in an interesting way.

The primary framework may be expressed with this sequence of symbols,
M H B B V,
which should be interpreted in the following way:

  • Each letter represents a category for a single word:

    • 'M' stands for 'modal' verbs like 'will', 'can', 'would' etc.

    • 'H' stands for one of the various forms of the verb 'to have'.

    • Each instance of ' B' stands for one of the various forms of the verb 'to be'.

    • 'V' stands for the main verb which can be any verb except a modal verb (except, arguably, when it occurs by itself).

    • The words occur in the order shown and any of the words may be omitted.9

    • Questions of 'standard' form follow exactly the same pattern as statements except that the first verb, whatever it happens to be (' M', ' H', the first ' B', the second ' B' or 'V'), precedes the subject noun phrase instead of following it.

      Here are two examples of the primary pattern with all of the words included:

      The secondary constraints are these:

    • Apart from the modals, which always have the same form, the first verb in the sequence, whatever it happens to be (' H', the first ' B', the second ' B' or 'V'), always has a 'finite' form (the form it would take if it were used by itself with the subject).

    • If an ' M' auxiliary verb is chosen, then whatever follows it (' H', first ' B', second ' B' or ' V') must have an 'infinitive' form (i.e., the 'standard' form of the verb as it occurs in the context 'to ...', but without the word 'to').

    • If an ' H' auxiliary verb is chosen, then whatever follows it (the first ' B', the second ' B' or ' V') must have a past tense form such as 'been', 'seen', 'gone', 'slept', 'wanted' etc. In Chomsky's (1957) Syntactic Structures, these forms were characterised as en forms and the same convention has been adopted here.

    • If the first of the two ' B' auxiliary verbs is chosen, then whatever follows it (the second ' B' or ' V') must have an ing form, e.g., 'singing', 'eating', 'having', 'being' etc.

    • If the second of the two 'B' auxiliary verbs is chosen, then whatever follows it (only the main verb is possible now) must have a past tense form (as above).

      Figure 14 shows a selection of examples with the dependencies marked.

      
      

      Figure 14

      A selection of example sentences in English with markings of dependencies between the verbs. Key: M = modal, H = forms of the verb 'have', B1 = first instance of a form of the verb 'be', B2 = second instance of a form of the verb 'be', V = main verb, fin = a finite form, inf = an infinitive form, en = a past tense form, ing = a verb ending in 'ing'.
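    The primary framework and the secondary constraints described above can be checked mechanically. The following is a minimal sketch in Python, not part of the SP52 model: the function name `check_verb_sequence` and the representation of each verb as a (category, form) pair are assumptions made for illustration, with category and form labels taken from the key to Figure 14.

```python
# Category labels follow the key to Figure 14; the data layout is an assumption.
PRIMARY_ORDER = ["M", "H", "B1", "B2", "V"]

# Form that each auxiliary requires of the verb which follows it.
REQUIRED_NEXT_FORM = {"M": "inf", "H": "en", "B1": "ing", "B2": "en"}


def check_verb_sequence(verbs):
    """Return True if a list of (category, form) pairs obeys both sets of constraints."""
    if not verbs or any(category not in PRIMARY_ORDER for category, _ in verbs):
        return False
    # Primary framework: each category occurs at most once, in the fixed order.
    positions = [PRIMARY_ORDER.index(category) for category, _ in verbs]
    if positions != sorted(set(positions)):
        return False
    # The first verb must be finite unless it is a modal, whose form never varies.
    first_category, first_form = verbs[0]
    if first_category != "M" and first_form != "fin":
        return False
    # Each auxiliary constrains the form of its immediate successor.
    return all(
        next_form == REQUIRED_NEXT_FORM[category]
        for (category, _), (_, next_form) in zip(verbs, verbs[1:])
    )


# "will have been being broken": all five positions filled.
full = [("M", None), ("H", "inf"), ("B1", "en"), ("B2", "ing"), ("V", "en")]
print(check_verb_sequence(full))                           # True
print(check_verb_sequence([("B1", "fin"), ("V", "ing")]))  # "are walking": True
print(check_verb_sequence([("H", "fin"), ("V", "ing")]))   # "has walking": False
```

    The sketch treats the two sets of constraints separately, mirroring the description in the text: one test for the order of categories and one for the form of each verb given its predecessor.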

    5.1 Transformational Grammar and English Auxiliary Verbs

    In Figure 14 it can be seen that in many cases but not all, the dependencies which have been described may be regarded as discontinuous because they connect one word in the sequence to the suffix of the following word thus bridging the stem of the following word. Three instances of this kind of dependency can be seen in the first example in the figure.

    In Syntactic Structures, Chomsky (1957) showed that this kind of regularity in the syntax of English auxiliary verbs could be described using Transformational Grammar (TG). For each pair of symbols linked by a dependency (' M inf', ' H en', ' B1 ing', ' B2 en') the two symbols could be shown together in the 'deep structure' of a sentence and then moved into their proper position or modified in form (or both) using 'transformational rules'.

    This elegant demonstration argued persuasively in favour of TG compared with alternatives which were available at that time. However, later research has shown that the same kinds of regularities in the syntax of English auxiliary verbs can be described quite well without recourse to transformational rules, using DCGs or other systems which do not use that type of rule (see, for example, Pereira and Warren, 1980; Gazdar and Mellish, 1989). An example showing how English auxiliary verbs may be described using the DCG formalism may be found in Wolff (1987, pp. 183-4).
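    The transformational treatment summarised in this section, in which each auxiliary sits next to its dependent affix in the deep structure and the affix is then moved onto the following verb, can be illustrated with a toy sketch in Python. This is not Chomsky's formal rule system: the list-of-pairs representation, the function name `hop_affixes` and the assumption that the first verb is marked as finite are all hypothetical, and the sketch does not spell out surface forms such as 'been'.

```python
def hop_affixes(deep_structure):
    """Move each affix onto the verb that follows it, giving (stem, form) pairs."""
    surface = []
    pending = "fin"  # assumption: the first verb in the sequence is finite
    for kind, value in deep_structure:
        if kind == "affix":
            pending = value
        elif kind == "verb":
            surface.append((value, pending))
            pending = None
        else:
            raise ValueError(f"unknown element kind: {kind!r}")
    return surface


# Deep structure for "will have been writing": each auxiliary is adjacent to its affix.
deep = [
    ("verb", "will"), ("affix", "inf"),
    ("verb", "have"), ("affix", "en"),
    ("verb", "be"), ("affix", "ing"),
    ("verb", "write"),
]
print(hop_affixes(deep))
# [('will', 'fin'), ('have', 'inf'), ('be', 'en'), ('write', 'ing')]
```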

    5.2 ICMAUS and English Auxiliary Verbs

    Figure 15 shows an 'ICMAUS' grammar for English auxiliary verbs which exploits several of the ideas described earlier in this article. Figure 16 shows three alignments for three different sentences produced by the SP52 model using this grammar. In the following paragraphs, aspects of the grammar and of the examples are described and discussed.

    S ST NP #NP X1 #X1 XR #S               3000
    S Q X1 #X1 NP #NP XR #S                2000
    NP SNG i t #NP                         4000
    NP PL t h e y #NP                      1000
    X1 0 V M #V #X1 XR XH XB XB XV #S      1000
    X1 1 XH FIN #XH #X1 XR XB XB XV #S      900
    X1 2 XB1 FIN #XB1 #X1 XR XB XV #S      1900
    X1 3 V FIN #V #X1 XR #S                 900
    XH V H #V #XH XB #S                     200
    XB XB1 #XB1 XB #S                       300
    XB XB1 #XB1 XV #S                       300
    XB1 V B #V #XB1                         500
    XV V #V #S                             5000
    M INF                                  2000
    H EN                                   2400
    B XB ING                               2000
    B XV EN                                 700
    SNG SNG                                2500
    PL PL                                  2500
    V M 0 w i l l #V                       2500
    V M 1 w o u l d #V                     1000
    V M 2 c o u l d #V                      500
    V H INF h a v e #V                      600
    V H PL FIN h a v e #V                   400
    V H SNG FIN h a s #V                    200
    V H EN h a d #V                         500
    V H FIN h a d #V                        300
    V H ING h a v ING1 #ING1 #V             400
    V B SNG FIN 0 i s #V                    500
    V B SNG FIN 1 w a s #V                  400
    V B INF b e #V                          400
    V B EN b e EN1 #EN1 #V                  600
    V B ING b e ING1 #ING1 #V               700
    V B PL FIN 0 a r e #V                   300
    V B PL FIN 1 w e r e #V                 500
    V FIN w r o t e #V                      166
    V INF 0 w r i t e #V                    254
    V INF 1 c h e w #V                      138
    V INF 2 w a l k #V                      318
    V INF 3 w a s h #V                       99
    V ING 0 c h e w ING1 #ING1 #V           623
    V ING 1 w a l k ING1 #ING1 #V            58
    V ING 2 w a s h ING1 #ING1 #V           102
    V EN 0 m a d e #V                       155
    V EN 1 b r o k EN1 #EN1 #V              254
    V EN 2 t a k EN1 #EN1 #V                326
    V EN 3 l a s h ED #ED #V                160
    V EN 4 c l a s p ED #ED #V              635
    V EN 5 w a s h ED #ED #V                 23
    ING1 i n g #ING1                       1883
    EN1 e n #EN1                           1180
    ED e d #ED                              818
    
    Figure 15

    A grammar for the syntax of English auxiliary verbs using the ICMAUS principles described in the accompanying article.
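    For readers who wish to experiment with a grammar of this kind, the listing in Figure 15 is easy to load into a program. The sketch below, in Python, assumes that each line consists of space-separated symbols followed by a single integer, taken here to be the frequency value attached to the pattern. The function name `parse_grammar` and the tuple representation are illustrative choices, not part of SP52.

```python
def parse_grammar(text):
    """Split each non-blank line into (tuple of symbols, integer frequency)."""
    patterns = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The last field is assumed to be the frequency; the rest are symbols.
        *symbols, frequency = fields
        patterns.append((tuple(symbols), int(frequency)))
    return patterns


# Four lines copied from Figure 15.
SAMPLE = """
S ST NP #NP X1 #X1 XR #S               3000
S Q X1 #X1 NP #NP XR #S                2000
M INF                                  2000
V M 0 w i l l #V                       2500
"""

patterns = parse_grammar(SAMPLE)
for symbols, frequency in patterns:
    print(frequency, " ".join(symbols))
```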

    
    
                i t                            i s                             w a s h    e d           
                | |                            | |                             | | | |    | |           
                | |                            | |                             | | | | ED e d #ED       
                | |                            | |                             | | | | |       |        
                | |                            | |                      V EN 5 w a s h ED     #ED #V    
                | |                            | |                      | |                       |     
                | |                            | |                   XV V |                       #V #S 
                | |                            | |                   |    |                          |  
                | |                B           | |                   XV   EN                         |  
                | |                |           | |                   |                               |  
         NP SNG i t #NP            |           | |                   |                               |  
         |   |       |             |           | |                   |                               |  
    S ST NP  |      #NP X1         |           | |         #X1 XR    |                               #S 
             |          |          |           | |          |  |     |                               |  
             |          X1 2 XB1   |     FIN   | |    #XB1 #X1 XR XB XV                              #S 
             |                |    |      |    | |     |                                                
             |                |  V B SNG FIN 0 i s #V  |                                                
             |                |  | |  |            |   |                                                
             |               XB1 V B  |            #V #XB1                                              
             |                        |                                                                 
            SNG                      SNG                                                                
    
    (a)
    
    
    
                   w i l l               i t                   h a v e                         b e     e n                        b r o k     e n            
                   | | | |               | |                   | | | |                         | |     | |                        | | | |     | |            
                   | | | |               | |                   | | | |                  V B EN b e EN1 | | #EN1 #V                | | | |     | |            
                   | | | |               | |                   | | | |                  | | |       |  | |  |   |                 | | | |     | |            
                   | | | |               | |                   | | | |              XB1 V B |       |  | |  |   #V #XB1           | | | |     | |            
                   | | | |               | |                   | | | |               |    | |       |  | |  |       |             | | | |     | |            
                   | | | |               | |                   | | | |               |    | |      EN1 e n #EN1     |             | | | |     | |            
                   | | | |               | |                   | | | |               |    | |                       |             | | | |     | |            
                   | | | |               | |                   | | | |               |    | |                       |      V EN 1 b r o k EN1 | | #EN1 #V    
                   | | | |               | |                   | | | |               |    | |                       |      | |             |  | |  |   |     
                   | | | |               | |                   | | | |               |    | |                       |   XV V |             |  | |  |   #V #S 
                   | | | |               | |                   | | | |               |    | |                       |   |    |             |  | |  |      |  
                   | | | |               | |                   | | | |               |    | |                       |   |    |            EN1 e n #EN1    |  
                   | | | |               | |                   | | | |               |    | |                       |   |    |                            |  
                   | | | |               | |                   | | | |           XB XB1   | |                      #XB1 XV   |                            #S 
                   | | | |               | |                   | | | |           |        | |                           |    |                            |  
             V M 0 w i l l #V            | |                   | | | |           |        | |                           |    |                            |  
             | |           |             | |                   | | | |           |        | |                           |    |                            |  
        X1 0 V M           #V #X1        | |     XR XH         | | | |        XB XB       | |                           XV   |                            #S 
        |      |               |         | |     |  |          | | | |        |           | |                           |    |                            |  
        |      |               |  NP SNG i t #NP |  |          | | | |        |           | |                           |    |                            |  
        |      |               |  |           |  |  |          | | | |        |           | |                           |    |                            |  
    S Q X1     |              #X1 NP         #NP XR |          | | | |        |           | |                           |    |                            #S 
               |                                    |          | | | |        |           | |                           |    |                            |  
               |                                    |  V H INF h a v e #V     |           | |                           |    |                            |  
               |                                    |  | |  |          |      |           | |                           |    |                            |  
               |                                    XH V H  |          #V #XH XB          | |                           |    |                            #S 
               |                                         |  |                             | |                           |    |                               
               M                                         | INF                            | |                           |    |                               
                                                         |                                | |                           |    |                               
                                                         H                                | EN                          |    |                               
                                                                                          |                             |    |                               
                                                                                          B                             XV   EN                              
    
    (b)
    
    
    
                              a r e                   t h e y                      w a l k      i n g             
                              | | |                   | | | |                      | | | |      | | |             
                              | | |                   | | | |              V ING 1 w a l k ING1 | | | #ING1 #V    
                              | | |                   | | | |              |  |             |   | | |   |   |     
                              | | |                   | | | |           XV V  |             |   | | |   |   #V #S 
                              | | |                   | | | |           |     |             |   | | |   |      |  
                              | | |                   | | | |           |     |            ING1 i n g #ING1    |  
                              | | |                   | | | |           |     |                                |  
                              | | |             NP PL t h e y #NP       |     |                                |  
                              | | |             |  |           |        |     |                                |  
    S Q X1                    | | |         #X1 NP |          #NP XR    |     |                                #S 
        |                     | | |          |     |              |     |     |                                |  
        X1 2 XB1        FIN   | | |    #XB1 #X1    |              XR XB XV    |                                #S 
              |          |    | | |     |          |                 |        |                                   
              |  V B PL FIN 0 a r e #V  |          |                 |        |                                   
              |  | | |              |   |          |                 |        |                                   
             XB1 V B |              #V #XB1        |                 |        |                                   
                   | |                             |                 |        |                                   
                   | PL                            PL                |        |                                   
                   |                                                 |        |                                   
                   B                                                 XB      ING                                  
    
    (c)
    
    

    Figure 16

    Three parsings by the SP52 model using the grammar shown in Figure 15.

    5.2.1 The Primary Framework. The first line in the grammar is a sentence pattern for a statement (marked with the symbol ' ST') and the second line is a sentence pattern for a question (marked with the symbol ' Q'). Apart from these markers, the only difference between the two patterns is that, in the statement pattern, the symbols ' X1 #X1' follow the noun phrase symbols (' NP #NP'), whereas in the question pattern they precede the noun phrase symbols. As can be seen in the examples in Figure 16, the pair of symbols, ' X1 #X1', has the effect of selecting the first verb in the sequence of auxiliary verbs and ensuring its correct position with respect to the noun phrase. In Figure 16 (a) it follows the noun phrase, while in Figure 16 (b) and (c) it precedes the noun phrase.
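    The effect of the two sentence patterns on word order can be shown with a few lines of Python. This is only a sketch of the outcome, not of the alignment process by which SP52 arrives at it; the function name `order_words` and its arguments are hypothetical.

```python
def order_words(subject, verbs, question=False):
    """Place the first verb after the subject (statement) or before it (question)."""
    first_verb, *remaining_verbs = verbs  # raises ValueError if `verbs` is empty
    if question:
        return [first_verb, subject, *remaining_verbs]
    return [subject, first_verb, *remaining_verbs]


# The sentences of Figure 16 (a) and (b).
print(order_words("it", ["is", "washed"]))
# ['it', 'is', 'washed']
print(order_words("it", ["will", "have", "been", "broken"], question=True))
# ['will', 'it', 'have', 'been', 'broken']
```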

    Each of the next four patterns in the grammar has the form ' X1 ... #X1 XR ... #S'. The symbols ' X1' and ' #X1' align with the same pair of symbols in the sentence pattern. The symbols ' XR ... #S' encode the remainder of the sequence of verbs.

    The first ' X1' pattern encodes verb sequences which start with a modal verb (' M'), the second one is for verb sequences beginning with a finite form of the verb 'have' (' H'), the third is for sequences beginning with either of the two ' B' verbs in the primary sequence (see below), and the last ' X1' pattern is for sentences which contain a main verb without any auxiliaries.

    In the first of the ' X1' patterns, the subsequence ' XR ... #S' encodes the remainder of the sequence of auxiliary verbs using the symbols ' XH XB XB XV'. In a similar way, the subsequence ' XR ... #S' within each of the other ' X1' patterns encodes the verbs which follow the first verb in the sequence.

    Notice that the pattern ' X1 2 XB1 FIN #XB1 #X1 XR XB XV #S' can encode sentences which start with the first ' B' verb and also contain the second ' B' verb. It also serves for any sentence which starts with the first or the second ' B' verb with the omission of the other ' B' verb. In the latter two cases, the 'slot' between the symbols ' XB' and ' XV' is left vacant. Figure 16 (a) illustrates the case where the verb sequence starts with the second ' B' verb and the first ' B' verb has been omitted: 'is' is followed by the en form 'washed' and the dependency is marked by the pattern ' B XV EN'. Figure 16 (c) illustrates the case where the verb sequence starts with the first ' B' verb with the omission of the second ' B' verb: 'are' is followed by the ing form 'walking' and the dependency is marked by the pattern ' B XB ING'.

    5.2.2 The Secondary Constraints. The secondary constraints are represented using the patterns ' M INF', ' H EN', ' B XB ING' and ' B XV EN'. Singular and plural dependencies are marked in a similar way using the patterns ' SNG SNG' and ' PL PL'.

    Examples appear in all three alignments in Figure 16. In every case except one (the fifth row in Figure 16 (a)), the patterns representing secondary constraints appear in the bottom rows of the alignment.10 As with the examples presented in Section 4, these examples show how dependencies bridging arbitrarily large amounts of structure and constraints which often overlap each other can be represented with simplicity and transparency in the medium of multiple alignments.

    Notice, for example, how dependencies between the first and second verb in a sequence of auxiliary verbs are expressed in the same way regardless of whether the two verbs lie side by side (e.g., the statement in Figure 16 (a)) or whether they are separated from each other by the subject noun phrase (e.g., the two questions in Figure 16 (b) and (c)). Notice, again, how the overlapping dependencies in Figure 16 (b) and their independence from each other are expressed with simplicity and clarity in the ICMAUS framework.

    Readers may wonder why the two patterns representing dependencies between a 'B' verb and whatever follows it ('B XB ING' and 'B XV EN') contain three symbols rather than two. One reason is that, when two (or more) patterns begin with the same symbol (or sequence of symbols), the scoring method for evaluating alignments requires that the patterns can be distinguished from each other by at least one symbol in each pattern other than its terminal symbol. A second reason is that the second symbol in each pattern helps to determine whether the 'B' at the start of the pattern corresponds to the first or the second 'B' verb in the primary sequence:

    • 'B XB ING'. The inclusion of 'XB' in this pattern means that the 'B' verb is the first of the two 'B' verbs in the primary sequence and the following verb must be 'ING'.

    • 'B XV EN'. The inclusion of 'XV' in this pattern means that the 'B' may be the first or the second. However, since the first case is already covered by 'B XB ING', this pattern covers the constraint between the second 'B' verb and verbs of the category 'EN'.
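    One way to see how a dependency pattern can bridge intervening structure is to read off the columns of an alignment from left to right and ask whether the symbols of the pattern occur among them in the same order. The Python sketch below does this for Figure 16 (c). It is a simplification under stated assumptions: the column list has been transcribed by hand from the figure, the function name `is_subsequence` is illustrative, and an in-order test is much weaker than the alignment and scoring performed by SP52.

```python
def is_subsequence(pattern, columns):
    """True if the symbols of `pattern` occur in `columns` in order, gaps allowed."""
    remaining = iter(columns)
    # `symbol in remaining` advances the iterator until the symbol is found.
    return all(symbol in remaining for symbol in pattern)


# Columns of Figure 16 (c), "are they walking", read from left to right.
COLUMNS_16C = (
    "S Q X1 2 XB1 V B PL FIN 0 a r e #V #XB1 #X1 "
    "NP PL t h e y #NP XR XB XV V ING 1 w a l k ING1 i n g #ING1 #V #S"
).split()

print(is_subsequence("B XB ING".split(), COLUMNS_16C))  # True
print(is_subsequence("PL PL".split(), COLUMNS_16C))     # True
print(is_subsequence("B XV EN".split(), COLUMNS_16C))   # False
```

    The first two patterns are found even though the subject noun phrase lies between the verb 'are' and the material that depends on it, while ' B XV EN' is not found because this sentence contains no en form.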

    - -

    9. There is room for debate here about whether the main verb may be omitted. Colloquial English includes such forms as "I will", "He has been" and so on, which suggest that, at the level of syntax, the main verb may be omitted - even though, at the level of semantics, the meaning of the main verb may be understood. In this article, which focuses on syntax, it will be assumed that the main verb can be omitted.

    10. As previously noted, for all rows after the first, the order of the rows is arbitrary, depending on how the alignment has been built up by the SP52 model. For the sake of clarity in the interpretations of alignments, it is a happy accident that patterns representing secondary constraints normally appear at the bottom of alignments.

    6 CROSS SERIAL DEPENDENCIES

    This section considers a type of syntactic structure - 'cross-serial dependency' - which is difficult or impossible to represent using a basic CF-PSG although it can be expressed in a reasonably straightforward manner with augmented forms of PSG. Arguably, the multiple alignment representation of this form to be described below is even simpler and more transparent than existing alternatives.

    In Swiss German, a sentence which means "We let the children help Hans paint the house" may be expressed as Mer d'chind em Hans es huus lönd hälfed aastriiche, which corresponds with the following sequence of words in English: "We the children Hans the house let helped paint" (see Borsley, 1996, section 2.3). This kind of structure also appears in Dutch.11

    In the English form, this type of sentence is clearly recursive because it is easily extended, without limit, to become "We let the children help Hans help Jim help Mary ... paint the house", and so on. The same appears to be true of the cross-serial form.

    In this type of structure, the dependencies between verbs and their subjects and between the verbs and their objects are discontinuous and they overlap each other as can be seen schematically here:

    In both the English form and the cross-serial form, the relationship between one sentence and the next in the recursive sequence is curious because the object of every sentence except the last becomes the subject of the next.

    Given earlier remarks and examples showing how recursive structures and DDs (including DDs which overlap each other) may be accommodated within the ICMAUS framework, we might expect cross-serial dependencies to slip easily into the same mould.

    Figure 17 shows a grammar for sentences like "We the children Hans the house let helped paint" except that "the children" has been replaced by "them" and "the house" has been replaced by "it". These substitutions have been made to avoid creating alignments which are too long to be easily displayed. Figure 18 shows an alignment for the sentence "We them Hans it let helped paint" created by the SP52 model using the grammar in Figure 17.

    S NP #NP S NP #NP : : #S V #V #S       1000
    NP 0 h i m #NP                          500
    NP 1 h a n s #NP                        200
    NP 2 t h e m #NP                        300
    NP 3 w e #NP                            150
    NP 4 i t #NP                            250
    V 0 h e l p e d #V                      200
    V 1 l e t #V                            100
    V 2 e n c o u r a g e #V                300
    V 3 r u n #V                            100
    V 4 p a i n t #V                        125
    V 5 w a l k #V                          175
    

    Figure 17

    A grammar for cross-serial dependencies.

    
    
           w e            t h e m            h a n s            i t                    l e t           h e l p e d           p a i n t       
           | |            | | | |            | | | |            | |                    | | |           | | | | | |           | | | | |       
      NP 3 w e #NP        | | | |            | | | |            | |                    | | |           | | | | | |           | | | | |       
      |         |         | | | |            | | | |            | |                    | | |           | | | | | |           | | | | |       
      |         |    NP 2 t h e m #NP        | | | |            | |                    | | |           | | | | | |           | | | | |       
      |         |    |             |         | | | |            | |                    | | |           | | | | | |           | | | | |       
      |         |    |             |         | | | |            | |                V 3 l e t #V        | | | | | |           | | | | |       
      |         |    |             |         | | | |            | |                |         |         | | | | | |           | | | | |       
    S NP       #NP S NP           #NP        | | | |            | |         : : #S V         #V #S     | | | | | |           | | | | |       
                   | |             |         | | | |            | |         |                   |      | | | | | |           | | | | |       
                   S NP           #NP S NP   | | | | #NP        | |       : :                   #S V   | | | | | | #V #S     | | | | |       
                                      | |    | | | |  |         | |       |                        |   | | | | | | |  |      | | | | |       
                                      | NP 1 h a n s #NP        | |       |                        |   | | | | | | |  |      | | | | |       
                                      | |             |         | |       |                        |   | | | | | | |  |      | | | | |       
                                      | |             |    NP 4 i t #NP   |                        |   | | | | | | |  |      | | | | |       
                                      | |             |    |         |    |                        |   | | | | | | |  |      | | | | |       
                                      | |             |    |         |    |                        |   | | | | | | |  |  V 4 p a i n t #V    
                                      | |             |    |         |    |                        |   | | | | | | |  |  |             |     
                                      S NP           #NP S NP       #NP : :                        |   | | | | | | |  #S V             #V #S 
    
    

    Figure 18

    An alignment produced by the SP52 model using the grammar for cross-serial dependencies shown in Figure 17.

    The example sentence can be seen to be composed of three component sentences: "We them ... let", "them Hans ... helped" and "Hans it ... paint". Each of these three component sentences has the basic form: NP NP ... V which may be interpreted functionally as subject object ... verb.
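    The decomposition into component sentences can be sketched in Python as follows. The word lists are taken from the grammar in Figure 17; the function name `component_sentences` and the requirement that there be exactly one more noun phrase than verbs are assumptions made for this illustration, which says nothing about how SP52 itself finds the structure.

```python
# Terminal words from the grammar in Figure 17.
NOUN_PHRASES = {"him", "hans", "them", "we", "it"}
VERBS = {"helped", "let", "encourage", "run", "paint", "walk"}


def component_sentences(words):
    """Return (subject, object, verb) triples for a cross-serial word sequence."""
    words = list(words)
    noun_phrases = [word for word in words if word in NOUN_PHRASES]
    verbs = [word for word in words if word in VERBS]
    # Rejects unknown words and any verb that precedes a noun phrase.
    if noun_phrases + verbs != words:
        raise ValueError("expected known noun phrases followed by known verbs")
    if len(noun_phrases) != len(verbs) + 1:
        raise ValueError("expected exactly one more noun phrase than verbs")
    # The object of each component sentence is the subject of the next.
    return [
        (noun_phrases[i], noun_phrases[i + 1], verbs[i])
        for i in range(len(verbs))
    ]


print(component_sentences("we them hans it let helped paint".split()))
# [('we', 'them', 'let'), ('them', 'hans', 'helped'), ('hans', 'it', 'paint')]
```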

    The form of each component sentence (NP NP ... V) can be seen in the first pattern in the grammar: ' S NP #NP S NP #NP : : #S V #V #S'. Apart from the symbols ' S' and ' #S' at the beginning and end of this pattern, it also contains an instance of the ' S' symbol between the two pairs of ' NP #NP' symbols, a pair of colon symbols (' : :') and a ' #S' symbol between the second pair of ' NP #NP' symbols and the two ' V #V' symbols. These additions to the basic pattern support the recursive alignment of the pattern with itself as can be seen in Figure 18.

    Although the order of the second and subsequent rows in this alignment produced by SP52 is a little 'untidy' from the standpoint of readability, the structure shown in Figure 18 is in fact correct in terms of the analysis presented above. The component sentence "We them ... let" has been parsed correctly by the lowest of the three appearances of the sentence pattern ' S NP #NP S NP #NP : : #S V #V #S'; the component sentence "them Hans ... helped" is parsed by the middle appearance of the sentence pattern; and the component sentence "Hans it ... paint" is parsed by the top appearance of the sentence pattern.

    Notice that, in accordance with the analysis above, the word "them" has been identified as the second ('object') noun-phrase in the lowest sentence pattern and, at the same time, as the first ('subject') noun-phrase in the middle sentence pattern. Likewise, the word "Hans" has been identified as the second ('object') noun-phrase in the middle sentence pattern and, at the same time, as the first ('subject') noun-phrase in the top sentence pattern.
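    The overlapping roles described above can be made concrete with a minimal sketch. It is only an illustration of the cross-serial pattern NP NP ... V, not a description of how SP52 works: the function name `component_sentences` and the flat lists of noun phrases and verbs are assumptions introduced here, and the word lists are taken from the component sentences quoted in the text.

```python
def component_sentences(noun_phrases, verbs):
    """Pair noun phrases and verbs in the cross-serial pattern NP NP ... V.

    The i-th verb takes the i-th noun phrase as subject and the (i+1)-th
    noun phrase as object, so each inner noun phrase serves as the object
    of one component sentence and the subject of the next.
    """
    if len(noun_phrases) != len(verbs) + 1:
        raise ValueError("expected exactly one more noun phrase than verbs")
    return [
        (noun_phrases[i], noun_phrases[i + 1], verbs[i])
        for i in range(len(verbs))
    ]


# Words taken from the component sentences discussed in the text.
sentence_nps = ["We", "them", "Hans", "it"]
sentence_verbs = ["let", "helped", "paint"]

triples = component_sentences(sentence_nps, sentence_verbs)
for subject, obj, verb in triples:
    print(f"{subject} {obj} ... {verb}")
```

    In this sketch "them" and "Hans" each appear in two of the resulting triples, once as object and once as subject, which mirrors the double role that each word plays in the alignment.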

    What is the purpose of the colon symbols in the middle of the sentence pattern and why are there two of them? As with other 'punctuation' symbols discussed elsewhere in this article, the colon symbols are needed to ensure that the relative positions, left to right, of columns in the alignment are not ambiguous. Without these symbols, the alignment could not express the fact that all three verbs follow all the noun phrases.

    Two symbols are needed in the middle of the sentence pattern because of the rule that any one instance of a symbol can never be aligned with itself (see the accompanying article). The three appearances in Figure 18 of the basic sentence pattern are just that: they are appearances of a single sentence pattern, not three separate instances. If there were only one instance of the colon symbol in the middle of the pattern, the three appearances of that symbol (in the three appearances of the sentence pattern) could not be aligned, one with another, because that would mean aligning one instance of a symbol with itself.
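    The rule can be illustrated with a small check on a single alignment column. This is a hedged sketch under stated assumptions, not the SP52 implementation: a symbol instance is modelled here as a hypothetical (pattern name, position) pair, and the 'staggered' column, in which the first colon in one appearance of the pattern is aligned with the second colon in another appearance, is one plausible reading of how the two colons are used.

```python
from collections import Counter

# The sentence pattern from the grammar, split into symbols (positions 0..11).
PATTERN = "S NP #NP S NP #NP : : #S V #V #S".split()
FIRST_COLON, SECOND_COLON = 6, 7


def column_is_valid(column):
    """Return True if no symbol instance occurs more than once in a column.

    Each entry is a (pattern_name, position) pair identifying one instance
    of a symbol. Two appearances of the same pattern share their instances,
    so aligning the same position with itself repeats an entry.
    """
    return all(count == 1 for count in Counter(column).values())


# With a single colon, two appearances would contribute the same instance.
single_colon_column = [("sentence", FIRST_COLON), ("sentence", FIRST_COLON)]

# With two colons, the column can hold two different instances (assumed).
staggered_column = [("sentence", FIRST_COLON), ("sentence", SECOND_COLON)]

print(column_is_valid(single_colon_column))
print(column_is_valid(staggered_column))
```

    Under these assumptions the first column is rejected and the second is accepted, which is the distinction the second colon is there to make possible.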

    - -

    11. For ease of understanding by readers who are not familiar with Swiss German or Dutch, all the examples of cross-serial dependencies in the rest of this section substitute English words for words in Swiss German or Dutch.

    7 CONCLUSION

    In this article, I have tried to show that the proposals in the accompanying article for understanding parsing as a process of information compression by multiple alignment, unification and search (ICMAUS) are not restricted to simple forms of the kind used as illustration in the accompanying article.

    A range of examples have been presented and discussed, all of them illustrated with alignments produced by the SP52 model:

    • The ICMAUS framework is naturally sensitive to ambiguities in syntax and can adjust its parsings in the light of disambiguating context.

    • The phenomenon of recursion in syntactic structures can be detected and represented in the version of 'multiple alignment' which has been adopted in this research.

    • The framework can accommodate discontinuous dependencies in syntax, including discontinuous dependencies which are nested one within another and discontinuous dependencies which overlap each other. Discontinuous dependencies can be recognised even when instances of one or more structures within the dependency (e.g., noun phrases) are quite variable.

    • The interesting relation between primary structure and secondary constraints which is exhibited by English auxiliary verbs can be represented in a transparent manner as multiple alignment. Examples can be parsed by ICMAUS as it is realised in the SP52 model.

    • The phenomenon of cross-serial dependencies (which combines discontinuous dependencies with recursion) can also be accommodated within the ICMAUS framework.

    7.1 Representing Syntactic Structure as 'Patterns'

    As was noted in the introduction, the later examples show that representing syntactic structure as 'patterns' without meta-symbols, together with a system for finding good alignments, provides more expressive power than a CF-PSG and appears to have sufficient power to represent the kinds of structures found in natural languages which are beyond the scope of CF-PSGs.

    Arguably, this framework provides a means of describing syntactic structure in natural language which is simpler and more transparent than alternative systems.

    7.2 Linguistic Intuition and Information Compression

    As was noted also in the introduction, the fact that the alignments which have been presented reflect to a large extent our intuitions about 'correct' parsings for the sample sentences, coupled with the fact that the alignments have been produced by a system which is dedicated to information compression, lends support to the hypothesis that our intuitions about the analysis of sentences are themselves the product of psychological processes of IC.

    The evidence in this article that linguistic intuitions arise from IC applies only to the process of parsing and not to the design of the example grammars - which were the product of (the author's) linguistic intuitions. However, there is strong evidence that grammars designed in accordance with linguistic intuitions may also be the product of psychological processes which perform IC: artificial systems for inductive learning of grammars which are dedicated to IC have produced grammars containing structures which are very close to those that would be judged to be 'correct' in terms of linguistic intuition (see Wolff, 1988).

    7.3 Further Development and Generalisation

    The examples which have been presented suggest that, in relation to the parsing of natural languages, the ICMAUS concepts have sufficient promise to justify further exploration and development. However, as was described in the introduction to the accompanying article, there is the important additional motivation that, with relatively little modification, the model may be developed to handle the semantics of languages and to support such things as deductive and probabilistic inference in linguistic and non-linguistic domains, unsupervised inductive learning, best-match information retrieval, (fuzzy) pattern recognition and others.

    If these expectations are valid, the result is of theoretical interest from the standpoint of integrating diverse functions in computing. In more practical terms, it offers the prospect of developing a fully integrated system for natural language understanding and production which includes the kinds of capabilities indicated above.

    Acknowledgements

    I am grateful to Dr Bob Borsley of the School of English and Linguistics, University of Wales at Bangor, for drawing my attention to the phenomenon of cross-serial dependencies in syntax and for discussions of linguistic issues considered in this article.

    References

    Borsley, R. D. (1996) Modern Phrase Structure Grammar. Blackwell, Oxford.

    Chomsky, N. (1957) Syntactic Structures. Mouton, The Hague.

    Gazdar, G. and Mellish, C. (1989) Natural Language Processing in Prolog. Addison-Wesley, Wokingham.

    Hofstadter, D. R. (1979) Gödel, Escher, Bach: an Eternal Golden Braid. Penguin Books, Harmondsworth.

    Pereira, F. C. N. and Warren, D. H. D. (1980) Definite Clause Grammars for language analysis - a survey of the formalism and a comparison with augmented transition networks. Artificial Intelligence, 13, 231-278.

    Wolff, J. G. (1987) Cognitive development as optimization. In L. Bolc (ed.), Computational Models of Learning. Springer-Verlag, Berlin.

    Wolff, J. G. (1988) Learning syntax and meanings through optimization and distributional analysis. In Y. Levy, I. M. Schlesinger and M. D. S. Braine (eds.), Categories and Processes in Language Acquisition. Lawrence Erlbaum, Hillsdale, NJ. Reprinted in Chapter 2 of Wolff (1991).

    Wolff, J. G. (1991) Towards a Theory of Cognition and Computing. Ellis Horwood, Chichester.

    Wolff, J. G. (1998) Parsing as information compression by multiple alignment, unification and search: SP52. In this issue.

  
