
DEEP PARSING

What is Deep parsing?

Let's bring back some memories of primary and secondary school... In grammar class, you were taught about the parts of speech, that is, the grammatical categories of words (noun, verb, adjective, etc.), and about their function in the sentence (subject, predicate, direct object, prepositional complement, etc.). To parse a sentence was to identify the syntactic function of every word or group of words in the sentence.

The syntactic analysis of a sentence can be difficult for a young student. It is an even more delicate task for software, where it is called parsing. The task of a parser is to identify, for each word in a sentence, which other word it depends on syntactically and through which syntactic relation.

 

In a simple sentence like John loves Mary, the analysis is not very complicated: the proper noun John depends on the word loves via the subject relation ("John is the subject of the verb loves"), and the proper noun Mary depends on the verb loves via the object relation ("Mary is the object of the verb loves").

John loves Mary
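The analysis of John loves Mary described above can be written down as a small set of (dependent, head, relation) triples. The sketch below is a minimal illustration under that assumption; the `Dependency` type and the `describe` helper are hypothetical names, not part of Synomia's software. The verb loves has no entry because it does not depend on any other word: it is the root of the sentence.

```python
from typing import NamedTuple


class Dependency(NamedTuple):
    dependent: str  # the word that depends on another word
    head: str       # the word it depends on
    relation: str   # the syntactic relation linking the two


# The analysis of "John loves Mary" given in the text;
# "loves" is the root, so it appears only as a head.
john_loves_mary = [
    Dependency("John", "loves", "subject"),
    Dependency("Mary", "loves", "object"),
]


def describe(deps):
    """Render each dependency as a plain-English sentence."""
    return [f"{d.dependent} is the {d.relation} of {d.head}" for d in deps]


for line in describe(john_loves_mary):
    print(line)
```

This prints "John is the subject of loves" and "Mary is the object of loves", mirroring the two relations spelled out in the paragraph.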

 But as the sentence gets longer, things get more complicated:

Mary’s cat is sleeping on the doormat

A parser is a piece of software that takes a corpus of texts as input and outputs the syntactic analysis of each sentence in each text, that is, for every word in every sentence, the word it depends on.
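A common way to store such output is one head position per word. The sketch below shows one plausible analysis of Mary's cat is sleeping on the doormat in that format. It assumes, as the examples later in this article suggest, that a preposition depends on the verb or noun it modifies, and it further assumes (our choice, not stated in the article) that the noun after a preposition depends on that preposition. The names `heads` and `head_pairs` are hypothetical.

```python
# One plausible analysis of "Mary's cat is sleeping on the doormat".
tokens = ["Mary's", "cat", "is", "sleeping", "on", "the", "doormat"]

# heads[i] is the 1-based position of the word that tokens[i] depends on;
# 0 marks the root (the main verb, which depends on no other word).
heads = [2, 4, 4, 0, 4, 7, 5]


def head_pairs(tokens, heads):
    """Pair each word with the word it depends on ('ROOT' for the main verb)."""
    return [(word, "ROOT" if h == 0 else tokens[h - 1])
            for word, h in zip(tokens, heads)]


for word, head in head_pairs(tokens, heads):
    print(f"{word} -> {head}")
```

Because each word gets exactly one slot in `heads`, this representation records a single head per word by construction.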

Synomia uses its parser on various types of text corpora for different digital marketing applications:

  • responses to open-ended survey questions, for verbatim analysis
  • opinions about a brand collected on social networks or forums, for sentiment analysis
  • pages of a web site and of its competitors' web sites, for semantic audits

 

Synomia is the only player in the field of big data to use a parser at the heart of its semantic technology.


Synomia's Technology:

Why is parsing so difficult?

Automated syntactic analysis and machine translation have been research topics in Computational Linguistics since the beginning of the computer age in the 1950s, when U.S. authorities encouraged American research laboratories to develop automatic translation of Russian documents into English.

This is a very difficult problem. Without going into the theoretical and technical details, the difficulties can be summarized in three points:

 

Problem n°1 - The ambiguity of syntactic attachment

The prototypical problem of automatic parsing is the ambiguity of attachment, in particular with adjectives and prepositions. In example (a), "a bad ankle sprain", the adjective bad may depend on the noun ankle or on the noun sprain. In example (b), "a dark chocolate cake", the adjective dark may depend on the noun chocolate or on the noun cake. In sentence (c), the preposition with may depend on the verb hit or on the noun wall. In sentence (d), the preposition with may depend on the verb saw or on the noun man.
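For examples (a) and (b), the ambiguity can be made concrete: every noun to the right of the adjective inside the phrase is a possible head. The sketch below lists those candidates, assuming the words have already been tagged with their parts of speech; the tag names and the `candidate_heads` function are illustrative, not Synomia's.

```python
def candidate_heads(tags, adj_index):
    """Positions of the nouns to the right of the adjective: each is a possible head."""
    return [i for i in range(adj_index + 1, len(tags)) if tags[i] == "NOUN"]


# Part-of-speech tags assumed for examples (a) and (b)
phrases = {
    "a bad ankle sprain": ["DET", "ADJ", "NOUN", "NOUN"],
    "a dark chocolate cake": ["DET", "ADJ", "NOUN", "NOUN"],
}

for phrase, tags in phrases.items():
    words = phrase.split()
    options = [words[i] for i in candidate_heads(tags, 1)]
    print(f"{words[1]!r} may depend on: {options}")
```

Both phrases yield two candidates, which is exactly the choice the parser has to make.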

These ambiguities usually go unnoticed by a human reader, but they are significant for a machine. Developing algorithms and rules that allow the parser to make the right choice in the vast majority of cases is a very difficult task.

The ambiguity of syntactic attachment
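To give a feel for what such a rule might look like, here is a deliberately simple sketch for the adjective cases (a) and (b). It is not Synomia's method: the collocation lexicon is hypothetical and contains a single entry chosen for illustration. The rule attaches the adjective to the first noun when the pair is a known collocation, and otherwise to the last noun, which heads the noun phrase in English.

```python
# Hypothetical lexicon of adjective + noun collocations (illustrative only)
KNOWN_COLLOCATIONS = {("dark", "chocolate")}


def attach_adjective(adjective, nouns):
    """Choose a head for an adjective followed by a sequence of nouns.

    Toy rule: if the adjective forms a known collocation with the first
    noun, attach it there; otherwise attach it to the last noun.
    """
    if (adjective, nouns[0]) in KNOWN_COLLOCATIONS:
        return nouns[0]
    return nouns[-1]


print(attach_adjective("dark", ["chocolate", "cake"]))  # chocolate
print(attach_adjective("bad", ["ankle", "sprain"]))     # sprain
```

Both outputs match the reading a human would most likely choose, but a real parser needs far richer knowledge than a one-entry lexicon, which is why this step is hard.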

Problem n°2 - Long-distance relationships

A sentence can be very long, and the distance between a word and the word it depends on syntactically can be large. This is another difficulty in building a parser (and it can also be a problem for human readers). In the sentence below, the noun proprietors and the preposition to both depend on the verb contract, but they are very far apart.

"Proprietors of Web sites, known as 'content providers', contract with Akamai to deliver their Web site content to individual Internet users."
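The distance involved can be measured simply by counting word positions. The sketch below does this for the Akamai sentence, assuming a naive tokenization that keeps only alphabetic words and drops punctuation and quotes; the `distance` helper is hypothetical.

```python
import re

sentence = ("Proprietors of Web sites, known as 'content providers', contract "
            "with Akamai to deliver their Web site content to individual "
            "Internet users.")

# Keep only the words, dropping punctuation and quotes
words = re.findall(r"[A-Za-z]+", sentence)


def distance(words, dependent, head):
    """Number of word positions between a word and the word it depends on.

    list.index returns the first occurrence, which is the one we want here.
    """
    return abs(words.index(head) - words.index(dependent))


print(distance(words, "Proprietors", "contract"))  # 8
print(distance(words, "to", "contract"))           # 3
```

Proprietors sits eight words before its head, with a whole appositive phrase in between, which is what makes this link hard to find.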

 


Problem n°3 – Intricacy: at the same time and in the same sentence, the analyzer must solve several problems of type 1 (ambiguity of syntactic attachment) and several of type 2 (long-distance relationships)


The root of the difficulty is that the problems compound one another. The analyzer must resolve several attachment ambiguities at once, some of which involve words that are far apart, while respecting two basic conditions:

  • a word depends on only one other word
  • syntactic links cannot cross
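Both conditions are easy to check mechanically once an analysis is written as (dependent, head) position pairs. The sketch below is a minimal checker under that assumption; the function names are illustrative. Two links cross when, drawn as arcs above the sentence, one starts inside the other and ends outside it (the property usually called projectivity).

```python
def has_single_heads(links):
    """links: (dependent, head) position pairs. True if no word has two heads."""
    dependents = [dep for dep, _ in links]
    return len(dependents) == len(set(dependents))


def has_crossing_links(links):
    """True if two links cross when drawn as arcs above the sentence."""
    spans = [tuple(sorted(link)) for link in links]
    for a, b in spans:
        for c, d in spans:
            if a < c < b < d:  # second arc starts inside the first, ends outside
                return True
    return False


# "Mary's cat is sleeping on the doormat", positions 1..7 (word 4 is the root)
mary_cat = [(1, 2), (2, 4), (3, 4), (5, 4), (6, 7), (7, 5)]
print(has_single_heads(mary_cat), has_crossing_links(mary_cat))  # True False

print(has_single_heads([(1, 2), (1, 3)]))    # False: word 1 has two heads
print(has_crossing_links([(1, 3), (2, 4)]))  # True: arcs 1-3 and 2-4 cross
```

A parser can use checks like these to discard candidate analyses that violate either condition.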

In the sentence below, two words each have several possible heads: the preposition with may depend on the verb throws or on the noun ball, and the preposition in may depend on the verb throws or on the nouns ball, dots or cat.

Mary throws the ball
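Using only the candidate heads listed for with and in, we can count how many combined analyses the parser has to choose from: each ambiguity multiplies the number of possibilities. The sketch below enumerates them; filtering out those that violate the non-crossing condition would require the word positions of the full sentence, so that step is not shown.

```python
from itertools import product

# Candidate heads for each ambiguous preposition, as listed in the text
candidates = {
    "with": ["throws", "ball"],
    "in": ["throws", "ball", "dots", "cat"],
}

# Every way of choosing one head for "with" and one head for "in"
readings = list(product(*candidates.values()))
print(len(readings))  # 8

for with_head, in_head in readings[:3]:
    print(f"with -> {with_head}, in -> {in_head}")
```

Two ambiguities with 2 and 4 candidates already yield 2 × 4 = 8 combined analyses; each additional ambiguous word multiplies this number again.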

Designing a data-processing architecture and algorithms that overcome this difficulty is a vast undertaking. The Synomia analyzer is the result of several years of research, carried out as part of a partnership between CNRS and Synomia.



CONTACT US

Synomia
63 bis rue de Sèvres
92100 Boulogne-Billancourt

Phone: +33 (0)1 46 10 06 40

