CognitionResearch.org

The SP theory

Cognition

Language Learning

NATURAL LANGUAGE PROCESSING AS INFORMATION COMPRESSION

Methods for representing the syntax of natural language (NL), and for parsing and producing NL, have been developed in a programme of research on the 'SP' conjecture that

All kinds of computing and formal reasoning may usefully be understood as information compression by multiple alignment, unification and search.

Although attention has so far been confined to representing NL syntax and to parsing and producing syntactic structures, the concepts are expected to generalise readily to the integration of semantic structures with syntactic structures and to the processing of the two in conjunction.
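
As a rough illustration of what it can mean to treat parsing and production as information compression, the following Python sketch matches a sentence against a small set of stored patterns and records only the choice made at each point. It is not the SP52 model and performs no multiple alignment or search; the grammar, the function names and the bit counts are all assumptions made for the purpose of the example.

```python
import math

# Hypothetical stored patterns: one sentence pattern with three slots,
# each slot listing the words that may fill it.
GRAMMAR = [
    ("D", ["the", "a"]),
    ("N", ["cat", "dog"]),
    ("V", ["sleeps", "barks"]),
]


def parse(words):
    """Match words against the stored patterns and return one choice index per slot."""
    if len(words) != len(GRAMMAR):
        raise ValueError("sentence does not fit the stored sentence pattern")
    code = []
    for word, (slot, choices) in zip(words, GRAMMAR):
        if word not in choices:
            raise ValueError(f"no {slot} pattern matches {word!r}")
        code.append(choices.index(word))
    return code


def produce(code):
    """Rebuild a sentence from a code, using the same stored patterns as parse()."""
    if len(code) != len(GRAMMAR):
        raise ValueError("code does not fit the stored sentence pattern")
    return [choices[i] for i, (_, choices) in zip(code, GRAMMAR)]


def raw_bits(words):
    """Size of the sentence as plain text, assuming 8 bits per character."""
    return 8 * len(" ".join(words))


def code_bits():
    """Bits needed to name one choice per slot, assuming equally likely choices."""
    return sum(math.log2(len(choices)) for _, choices in GRAMMAR)


sentence = ["the", "dog", "sleeps"]
code = parse(sentence)       # [0, 1, 0]
restored = produce(code)     # ["the", "dog", "sleeps"]
print(code, restored, raw_bits(sentence), code_bits())
```

In this toy setting the sentence occupies 112 bits as raw text but only 3 bits as a code, and the same table of patterns serves in both directions: parse() derives the code from the sentence and produce() recovers the sentence from the code.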

This approach to processing NL appears to be novel and, when more fully developed, offers two potential benefits:

  • A relatively simple and transparent method for representing syntactic structures in NL including 'context sensitive' features.
  • Precisely the same methods can serve for the parsing of NL and for the production of NL.

These ideas are described in 'Syntax, parsing and production of natural language in a framework of information compression by multiple alignment, unification and search' and, with more detail and more examples, in two other unpublished reports:
  • The first describes the representation of syntax in the proposed new framework and shows how parsing and production of NL can be achieved. It also describes the SP52 computer model, which embodies these ideas.
  • The second presents a range of examples, including ones showing how the system can accommodate ambiguity and recursion in syntax, discontinuous dependencies and cross-serial dependencies in syntax, and the interaction of primary structure and secondary constraints in English auxiliary verbs.

The concept has also been applied in an interpretation of the nature of 'computing', mathematics and logic, in modelling probabilistic reasoning and in other areas of computing (see Computing as Compression).
