Chaotic language

We are developing software to robustly analyse natural language.

Current technology fails to reach a satisfactory level of accuracy for even basic tasks.

Our software represents a totally different approach to the problem. This approach reverses the traditional relationship between example and rule. In our system, grammatical rules depend on examples; examples do not depend on rules.

This makes it well suited to less-studied languages. So far the algorithms have been applied with success to Danish and Chinese, in addition to English.

We seek collaboration for the improvement of the existing code and application to further languages.

Image of sample parser output for Chinese

Parsing Problem

The problem addressed by this system is branching association. It is central to all tasks of language description and comprehension.

In short, how do you teach a machine to distinguish between the following:

Branching ambiguity

((foreign exchange) dealer)

  (foreign exchange)      dealer
      /        \
  foreign    exchange

(foreign (exchange dealer))

  foreign      (exchange dealer)
                   /        \
               exchange    dealer
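To make the size of the search space concrete, the following minimal sketch lists every binary bracketing of a word sequence. It only illustrates the ambiguity itself and is not part of the system described here; the function name `bracketings` is an assumption introduced for this example.

```python
def bracketings(words):
    """Return every binary bracketing of a word sequence as a string."""
    if len(words) == 1:
        return [words[0]]
    results = []
    # Try every split point; each side is then bracketed recursively.
    for i in range(1, len(words)):
        for left in bracketings(words[:i]):
            for right in bracketings(words[i:]):
                results.append(f"({left} {right})")
    return results


for tree in bracketings(["foreign", "exchange", "dealer"]):
    print(tree)
```

For three words this prints the two readings shown above. The count grows quickly with sentence length (four words already give five bracketings), which is why a parser needs some principled way to choose between them.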

Current technology generally tries to classify the words and describe which combinations are possible between classes, i.e. to describe natural language in terms of grammar.

E.g.
NP <- ADJ + N
NP <- NP + N
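As a rough illustration of the class-based approach, the sketch below applies the two rules above to both bracketings of "foreign exchange dealer". The word tags in `LEXICON` are an assumption made for this example, as are the names `RULES`, `LEXICON` and `category`; it is not taken from any particular parser.

```python
# The two rules from the text, written as (left class, right class) -> parent.
RULES = {("ADJ", "N"): "NP", ("NP", "N"): "NP"}

# Hypothetical lexicon: these tags are assumed for illustration only.
LEXICON = {"foreign": "ADJ", "exchange": "N", "dealer": "N"}


def category(tree):
    """Return the class of a nested-tuple tree, or None if no rule applies."""
    if isinstance(tree, str):
        return LEXICON.get(tree)
    left, right = tree
    return RULES.get((category(left), category(right)))


left_branching = (("foreign", "exchange"), "dealer")
right_branching = ("foreign", ("exchange", "dealer"))

print(category(left_branching))   # NP
print(category(right_branching))  # None
```

Under these two rules and the assumed tags, only the left-branching reading receives a class. Covering the other reading would require a further hand-written rule for combining two nouns.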

Essentially we do the same, with the twist that we replace the classes with vectors, or lists, of examples. This means that, by virtue of one combination or another, we can find an infinite number of "virtual" rules at run time, the exact set depending on the exact words in the sentence (and on the examples relevant to an exact body of experience).

E.g.


NP (foreign exchange)
foreign exchange
foreign bonds
stock exchange
the stock
foreign currency
the securities
foreign languages
currency
foreign ministry
discount
foreign residents
equity
the capital
exchange
stock
global


ADJ (foreign)	N (exchange)
foreign	exchange
...	...
...	...
foreign	bonds
...	...
stock	exchange
foreign	currency
...	...
...	...
currency	exchange
...	...
securities	exchange
foreign	bond
...	...
...	...
regional	exchanges
regional	exchange
...	...
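One way to picture how example lists like those above might be assembled is sketched below. This is a toy approximation under stated assumptions, not the system's actual algorithm, which is not specified here. The word pairs are taken from the example lists, but treating them as a small corpus, and the names `PAIRS`, `like_first`, `like_second` and `virtual_rule`, are assumptions for illustration.

```python
from collections import defaultdict

# Toy "body of experience": word pairs taken from the example lists above.
# Treating them as a corpus is an assumption made for illustration.
PAIRS = [
    ("foreign", "exchange"), ("foreign", "bonds"), ("stock", "exchange"),
    ("foreign", "currency"), ("currency", "exchange"),
    ("securities", "exchange"), ("foreign", "bond"), ("regional", "exchange"),
    ("the", "stock"), ("the", "capital"),
]

followers = defaultdict(set)  # word -> words seen directly after it
preceders = defaultdict(set)  # word -> words seen directly before it
for first, second in PAIRS:
    followers[first].add(second)
    preceders[second].add(first)


def like_first(word):
    """Words that share at least one following word with `word`."""
    target = followers.get(word, set())
    return {w for w, seen in followers.items() if seen & target}


def like_second(word):
    """Words that share at least one preceding word with `word`."""
    target = preceders.get(word, set())
    return {w for w, seen in preceders.items() if seen & target}


def virtual_rule(first, second):
    """Observed pairs whose members are used like `first` and `second`."""
    firsts, seconds = like_first(first), like_second(second)
    return [(a, b) for a, b in PAIRS if a in firsts and b in seconds]


for a, b in virtual_rule("foreign", "exchange"):
    print(a, b)
```

In this sketch the "rule" for "foreign exchange" is simply the list of observed pairs it returns, so it changes whenever the words or the stored examples change. Pairs such as "the stock" are left out because "the" shares no following word with "foreign" in this toy data.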

The order expressed by such ad hoc rules is, in a sense, chaotic: chaotic in the sense of unstable order rather than disorder. This chaotic character gives the system much greater power than fixed perspectives of order captured as rules.