A Little History
Why should the publications cited in Language learning as information compression be so "out of date"? This little history of my research career attempts to explain and justify why I have cited these publications and why I feel that, despite the passage of time, they are still well worth reading by anyone interested in first language learning.
In the early 1970s, I started work trying to understand first language learning by children, exploring the potential of a 'statistical' approach to the subject, broadly within the area of 'empiricist' theories of first language learning. The method was to build computer models expressing the theory and see whether or not they could do the kinds of things that children can do.
With some honourable exceptions, the vast majority of academics and researchers in psychology had, at the time, accepted Chomsky's arguments that language 'acquisition' was too complex for any kind of empiricist mechanism to be successful and that some kind of 'nativist' theory must be where the truth lay. I think it is fair to say that 'statistical' approaches to understanding language learning were largely unknown, certainly amongst developmental psychologists. Consequently, the intellectual atmosphere was quite unsympathetic to the kind of approach I wanted to pursue. My PhD supervisor suggested encouragingly that what I wanted to do was "impossible" and that Chomsky had proved it.
After building and testing a good many models, I eventually arrived at a model (MK10) which, by 'statistical' means and without correction by a 'teacher' or negative examples, was quite successful at discovering word-like segments in unsegmented language-like texts and, after some minor refinements, was quite successful at discovering real words in unsegmented samples of real natural language (see Language learning as information compression).
After a good deal more effort, I developed another program (SNPR) which could not only do what MK10 could do but could learn artificial grammars too. (Fuller information is given in Language Learning as Compression.)
Shortly after I had developed the SNPR program, but before I had finished making quantitative measurements on it, I left university to work in industry. Between 1982 and 1988 I worked first at IBM, on a one-year fellowship, and then at Praxis Systems in Bath, developing software.
From working on software systems, I began to realise that many of the insights I had gained in the work on language learning could also be applied to many aspects of 'computing'. The urge to develop this new line of thinking led me back to academic work in the School of Informatics (formerly SEECS), University of Wales at Bangor. Since 1988, I have been developing these ideas, with results described in Computing as Compression.
Returning to the subject of language learning, there was a gradual shift in academic opinion during the 1980s and 1990s, so that the approach to the subject which I and a few others were pursuing in the 1970s is now becoming much more widely recognised and accepted.
I have not made an exhaustive search of the literature (assuming that were possible!) but from some searching, from reading recent articles, and from talking to people who are in touch with the recent literature, it seems that, in the areas of first language learning that they were designed to explain, the MK10 and SNPR models may still be regarded as "leading edge".
The MK10 model has been described inaccurately in the literature on more than one occasion, giving readers the impression that its capabilities are more limited than they actually are. It is interesting that the basic mechanism has been reinvented on several occasions for different purposes. I was not aware of it at the time I developed the model, but Solomonoff had described a version of the idea in 1964 in a paper about inductive inference. (The main difference between his version and the MK10 model is that the latter performs a repeated re-parsing of the data, which enables it to escape from errors that the system makes in the early stages of processing.)
The SNPR model has received relatively little attention. The main reason for this is probably that I was away from academic work for five years after it was developed and then started working on a new research programme. Thus, the model has not been publicised in the normal way in talks, conference papers and journal articles.
I believe the SNPR model deserves much more attention than it has received so far (see Language acquisition, data compression and generalization and Learning syntax and meanings through optimization and distributional analysis). It goes a long way beyond the MK10 model, demonstrating how, with unsegmented text, the learning of segmental structure can be integrated with the learning of disjunctive 'class' structures, and showing in particular how 'correct' generalisations of grammatical rules can be distinguished from 'incorrect' generalisations, without the aid of external correction by a 'teacher', without 'negative' samples and without any kind of 'grading' of language samples.
I hope the foregoing potted history provides some justification for citing articles which ordinarily one might regard as "out of date". For anyone interested in children's language learning, I believe the articles are still well worth reading.