Let's bring back some memories of primary and secondary school... In grammar class, you were taught about the parts of speech, that is to say the grammatical categories of words (noun, verb, adjective, etc.), and about "function" in the sentence (subject, verb, direct object, preposition, etc.). To parse a sentence was to identify the syntactic function of every word or group of words in the sentence.
The syntactic analysis of a sentence can be difficult for a young student. It is an even more delicate task for software, where it is called parsing. The task of a parser is to identify, for each word in a sentence, which other word it depends on syntactically and through which syntactic relation.
In a simple sentence like "John loves Mary", this is not very complicated: the proper noun John depends on the verb loves via the subject relation ("John is the subject of the verb loves"), and the proper noun Mary depends on the verb loves via the object relation ("Mary is the object of the verb loves").
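One common way to write down such an analysis is to store, for each word, the position of the word it depends on and the name of the relation, as in the HEAD and DEPREL columns of the CoNLL-U format. The sketch below is a hand-written illustration under that assumption, not the output of Synomia's parser; the relation labels ("subject", "object", "root") are illustrative, and the main verb, which depends on no other word, is marked as the root with head 0.

```python
from typing import List

# Hand-written analysis of "John loves Mary" (illustrative only).
# heads: 1-based position of each word's head; 0 means "no head" (the root).
tokens: List[str] = ["John", "loves", "Mary"]
heads: List[int] = [2, 0, 2]  # John -> loves, loves -> root, Mary -> loves
relations: List[str] = ["subject", "root", "object"]


def explain(tokens: List[str], heads: List[int], relations: List[str]) -> List[str]:
    """Turn the head/relation arrays into readable sentences."""
    lines = []
    for word, head, rel in zip(tokens, heads, relations):
        if head == 0:
            lines.append(f"{word} is the root of the sentence")
        else:
            lines.append(f"{word} is the {rel} of {tokens[head - 1]}")
    return lines


for line in explain(tokens, heads, relations):
    print(line)
```

Storing positions rather than the head words themselves keeps the analysis unambiguous when the same word occurs twice in a sentence.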
But as the sentence gets longer, things get more complicated:
A parser is a piece of software that takes a corpus of texts as input and outputs the syntactic analysis of each sentence in each text, by computing which word each word in every sentence depends on.
Synomia applies its parser to various types of text corpora for different digital marketing applications:
Synomia is the only player in the field of big data to use a parser at the heart of its semantic technology.
Synomia's Technology
Automatic syntactic analysis and machine translation have been research topics in the field of Computational Linguistics since the beginning of the computer age in the 1950s, when U.S. authorities encouraged American research laboratories to develop automatic translation of Russian documents into English.
This is a very difficult problem. Without going into the theoretical and technical details, the difficulties can be summarized in three points:
Problem n°1 - The ambiguity of syntactic attachment
The prototypical problem of automatic parsing is the ambiguity of attachment, in particular with adjectives and prepositions. In phrase (a), "a bad ankle sprain", the adjective bad may depend on the noun ankle or on the noun sprain. In phrase (b), "a dark chocolate cake", the adjective dark may depend on the noun chocolate or on the noun cake. In sentence (c) the preposition with may depend on the verb hit or on the noun wall. In sentence (d) the preposition with may depend on the verb saw or on the noun man.
Human readers do not even notice these ambiguities, but for a machine they are significant. Developing algorithms and rules that allow the parser to make the right choice in the vast majority of cases is a very difficult task.
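In terms of the head-per-word representation sketched earlier, an attachment ambiguity means that one word has several possible heads, so the parser faces several complete analyses that differ only in that word's head. The sketch below spells this out for "a bad ankle sprain". It is an illustration under stated assumptions: the rest of the analysis ("a" and "ankle" attached to "sprain") is hand-written, and the helper `readings` is hypothetical.

```python
from typing import List, Optional

# Hand-written, partial analysis of "a bad ankle sprain" (illustrative only).
# Each entry is the 1-based position of the word's head; 0 marks the root.
tokens = ["a", "bad", "ankle", "sprain"]
base_heads: List[Optional[int]] = [4, None, 4, 0]  # head of "bad" left undecided


def readings(base: List[Optional[int]], open_pos: int,
             candidates: List[int]) -> List[List[Optional[int]]]:
    """Return one complete head array per candidate head of the word at open_pos."""
    result = []
    for head in candidates:
        heads = list(base)          # copy, so the readings stay independent
        heads[open_pos - 1] = head  # attach the ambiguous word to this candidate
        result.append(heads)
    return result


# "bad" (position 2) may attach to "ankle" (3) or to "sprain" (4).
for heads in readings(base_heads, 2, [3, 4]):
    print("bad ->", tokens[heads[1] - 1], heads)
```

The two printed analyses differ in a single number, yet they correspond to two different meanings: a sprain of a bad ankle, or a bad sprain of the ankle.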
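To see why the choice is hard, consider the simplest possible rule: attach an ambiguous word to the nearest candidate head. The sketch below implements that naive baseline; it is a hypothetical illustration, not the method used by Synomia's parser. On "a bad ankle sprain" it attaches bad to ankle, whereas the natural reading is that the sprain is bad, which is why a usable parser needs much richer rules than word distance alone.

```python
from typing import List


def nearest_candidate(word_pos: int, candidates: List[int]) -> int:
    """Naive baseline: attach the word to the candidate head closest to it.

    Positions are 1-based; ties are broken in favour of the candidate to the right.
    """
    return min(candidates, key=lambda c: (abs(c - word_pos), -c))


# "a bad ankle sprain": "bad" is word 2; candidates "ankle" (3) and "sprain" (4).
tokens = ["a", "bad", "ankle", "sprain"]
choice = nearest_candidate(2, [3, 4])
print("bad ->", tokens[choice - 1])
```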
Problem n°2 - Long-distance relationships
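"Proprietors of Web sites, known as 'content providers', contract with Akamai to deliver their Web sites' content to individual Internet users."
Here the subject Proprietors and its verb contract are separated by the whole phrase "of Web sites, known as 'content providers'", so the parser must link two words that are far apart in the sentence.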
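A simple way to quantify a long-distance relationship is the number of word positions between a dependent and its head. The sketch below computes it for the subject-verb pair of the example sentence, compared with "John loves Mary". Assumptions: punctuation is dropped, words are split on spaces, and the helper `distance` is hypothetical and looks up the first occurrence of each word only.

```python
from typing import List


def distance(words: List[str], dependent: str, head: str) -> int:
    """Number of word positions between a dependent and its head (first occurrences)."""
    return abs(words.index(head) - words.index(dependent))


sentence = ("Proprietors of Web sites known as content providers contract with "
            "Akamai to deliver their Web sites' content to individual Internet users")
words = sentence.split()

d = distance(words, "Proprietors", "contract")
print(d)
print(distance("John loves Mary".split(), "John", "loves"))
```

In the short sentence the subject sits right next to its verb; in the longer one, seven words intervene, and any of them could be mistaken for the head of the words around it.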
Problem n°3 - Intricacy: at the same time and within a single sentence, the analyzer must solve several problems of type 1 (ambiguity of syntactic attachment) and several of type 2 (long-distance relationships).
The root of the difficulty is that the problems add up: the analyzer must resolve several attachment ambiguities, some of which involve words that are far apart from each other. It must observe two basic conditions:
In the sentence below, two words can each depend on several other words: the preposition with may depend on the verb throws or on the noun ball, and the preposition in may depend on the verb throws or on the nouns ball, dots or cat.
Designing a data-processing architecture and algorithms capable of overcoming this difficulty is a vast undertaking. The Synomia analyzer is the result of several years of research, carried out as part of a partnership between the CNRS and Synomia.
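The number of candidate analyses grows multiplicatively: with two possible heads for with and four for in, there are 2 × 4 = 8 combinations to consider before any grammatical constraint is applied. The sketch below enumerates them using only the candidate lists given in the text; it deliberately applies no filtering, since the constraints a real parser uses to prune these combinations are not modelled here.

```python
from itertools import product
from math import prod

# Candidate heads for each ambiguous preposition, as listed in the text.
candidates = {
    "with": ["throws", "ball"],
    "in": ["throws", "ball", "dots", "cat"],
}

# Every way of choosing one head per preposition (no constraints applied).
combos = [dict(zip(candidates, choice)) for choice in product(*candidates.values())]

print(len(combos))                                     # enumerated combinations
print(prod(len(heads) for heads in candidates.values()))  # same count, computed directly
for combo in combos:
    print(combo)
```

Each additional ambiguous word multiplies this count again, which is why a parser cannot simply try every combination in long sentences.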