
Phrase Extraction

What is deep parsing used for? 

First, deep parsing is used to find noun phrases 

Word definitions alone are not enough to characterize the semantic content of a text. The “bag of words” model, which treats a text as a collection of words with no connections between them, can be sufficient in certain cases. It is the approach that Internet search engines use to index web pages.
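To make the limitation concrete, here is a minimal sketch of a bag-of-words representation. The tokenizer and the two sample sentences are illustrative assumptions, not part of any particular search engine.

```python
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Count lowercase word tokens, ignoring word order and syntax."""
    # Hyphenated words such as "pre-existing" are kept as single tokens.
    return Counter(re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower()))

# Two made-up sentences with the same words in a different order.
first = bag_of_words("The insurance company covers a pre-existing condition.")
second = bag_of_words("A pre-existing company covers the insurance condition.")

# The model cannot tell the two sentences apart.
print(first == second)  # True
```

Because only word counts are kept, the link between pre-existing and condition is lost, which is the gap that phrase extraction is meant to fill.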

Any semantic analyzer must at least be able to automatically extract “noun phrases” from text, i.e. expressions made up of two or more words. For instance, in texts related to health care: health insurance, marketplace, insurance company, private insurance, affordable coverage, pre-existing condition. These expressions are much more precise than single words (health, marketplace, coverage, condition). In this example in particular, the word condition is extremely vague, but the phrase pre-existing condition clearly refers to the medical sense of the noun condition. At Synomia (and in linguistics), these groups are called phrases.


 

In addition to words and phrases, one must also take into account named entities, that is to say the words or phrases that denote people, countries, cities, brands, etc. This is easily done with dictionaries and simple extraction.
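As a rough illustration of dictionary-based extraction, the sketch below looks up the longest matching entry at each position. The dictionary entries, the brand name Acme, and the sample sentence are all made up for the example.

```python
# Hypothetical dictionary: lowercase token tuples mapped to an entity type.
GAZETTEER = {
    ("acme",): "brand",
    ("paris",): "city",
    ("new", "york"): "city",
    ("france",): "country",
}
MAX_LEN = max(len(key) for key in GAZETTEER)

def find_entities(text):
    """Scan left to right, preferring the longest dictionary match."""
    tokens = [t.strip(".,") for t in text.split()]
    lowered = [t.lower() for t in tokens]
    found, i = [], 0
    while i < len(tokens):
        for n in range(MAX_LEN, 0, -1):
            key = tuple(lowered[i:i + n])
            if key in GAZETTEER:
                found.append((" ".join(tokens[i:i + n]), GAZETTEER[key]))
                i += len(key)
                break
        else:
            i += 1  # no entry starts here, move to the next token
    return found

entities = find_entities("Acme opened offices in Paris and New York.")
print(entities)
# [('Acme', 'brand'), ('Paris', 'city'), ('New York', 'city')]
```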

 

To extract noun phrases, semantic engines use rudimentary technology               

To extract noun phrases from text, semantic engines use a standard technology to identify all sequences of words that match valid sequences of word classes. These are, for example:

  •        An adjective followed by a noun (private insurance, affordable coverage)
  •        A noun followed by a preposition followed by a noun (enrollment for coverage)
  •        A noun followed by a noun (health insurance, insurance company)

This technology is too primitive to extract noun phrases from a text with sufficient coverage and accuracy. In particular, it generates a lot of noise.

For instance, in the sentence:

"It holds insurance companies accountable for unjustified premium increases."

A standard engine will extract unjustified premium (adjective followed by noun), which is an error because it is the increases that are unjustified, not the premium.

And in the sentence:

"If you're eligible, enroll in a Marketplace health plan."

A standard engine will extract Marketplace health (noun followed by noun), which is also a mistake.
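The word-class matching described above, and the two errors it produces, can be reproduced with a short sketch. The tag names and the hand-assigned tags are assumptions made for the example; a real engine would obtain them from a part-of-speech tagger.

```python
# The three word-class patterns listed above (tag names are illustrative).
PATTERNS = [
    ("ADJ", "NOUN"),
    ("NOUN", "PREP", "NOUN"),
    ("NOUN", "NOUN"),
]

def match_patterns(tagged):
    """Return every contiguous word sequence whose tags match a pattern."""
    found = []
    for start in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[start:start + len(pattern)]
            if tuple(tag for _, tag in window) == pattern:
                found.append(" ".join(word for word, _ in window))
    return found

# Hand-tagged versions of the two example sentences.
premium_sentence = [
    ("It", "PRON"), ("holds", "VERB"), ("insurance", "NOUN"),
    ("companies", "NOUN"), ("accountable", "ADJ"), ("for", "PREP"),
    ("unjustified", "ADJ"), ("premium", "NOUN"), ("increases", "NOUN"),
]
marketplace_sentence = [
    ("If", "CONJ"), ("you", "PRON"), ("'re", "VERB"), ("eligible", "ADJ"),
    ("enroll", "VERB"), ("in", "PREP"), ("a", "DET"),
    ("Marketplace", "NOUN"), ("health", "NOUN"), ("plan", "NOUN"),
]

premium_matches = match_patterns(premium_sentence)
marketplace_matches = match_patterns(marketplace_sentence)
print(premium_matches)
# ['insurance companies', 'unjustified premium', 'premium increases']
print(marketplace_matches)
# ['Marketplace health', 'health plan']
```

The matcher only sees adjacent word classes, so it has no way of knowing that unjustified and Marketplace attach to the last noun of their group rather than to the word right next to them.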


With deep parsing, the Synomia engine does not make these kinds of mistakes.


Thanks to deep parsing, the Synomia engine extracts noun phrases with maximum precision 

Let's go back to our examples. Because the parser was able to recognize that the adjective unjustified depends on the noun increases (green arrow) and not on the noun premium, the Synomia engine extracts the phrases unjustified premium increases, premium increases, and even unjustified increases, but not unjustified premium.


Because the parser was able to identify that the noun Marketplace before the noun health depends on the noun plan (green arrow), the Synomia engine extracts the phrase Marketplace plan as well as health plan, etc. However, it does not extract the erroneous group Marketplace health.
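One way to picture how dependency links lead to these results is the sketch below. It is not Synomia's actual algorithm: the dependency links are written by hand instead of coming from a parser, and combining the head noun with every non-empty subset of its modifiers is an assumption chosen because it reproduces the phrases listed above.

```python
from itertools import combinations

def phrases_from_dependencies(words, heads, head_index):
    """Combine a head noun with every non-empty subset of its direct modifiers."""
    modifiers = [i for i, head in enumerate(heads) if head == head_index]
    phrases = []
    for size in range(len(modifiers), 0, -1):
        for subset in combinations(modifiers, size):
            # Keep the original word order inside each phrase.
            indices = sorted(subset + (head_index,))
            phrases.append(" ".join(words[i] for i in indices))
    return phrases

# heads[i] is the index of the word that word i depends on (None for the head).
# Both modifiers attach to the last noun, as in the parses described above.
increase_phrases = phrases_from_dependencies(
    ["unjustified", "premium", "increases"], [2, 2, None], 2
)
plan_phrases = phrases_from_dependencies(
    ["Marketplace", "health", "plan"], [2, 2, None], 2
)
print(increase_phrases)
# ['unjustified premium increases', 'unjustified increases', 'premium increases']
print(plan_phrases)
# ['Marketplace health plan', 'Marketplace plan', 'health plan']
```

Since every generated phrase contains the head noun, groups such as unjustified premium or Marketplace health can never be produced.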


Thanks to syntactic analysis, the Synomia engine extracts noun phrases with maximum coverage

Because of their rudimentary technology, standard semantic engines extract noun phrases of length two, i.e. made up of two full words, either a noun and an adjective or two nouns. Because it exploits the complete syntactic analysis of the sentences, the Synomia engine does not suffer from these limitations. It can extract noun phrases of widely varying length and structure. In the previous examples, unjustified premium increases and Marketplace health plan have a length of 3, and new Marketplace health plan has a length of 4.


Phrases of length 3 or more are of major importance in any corpus.



The Synomia engine extracts these noun phrases whatever their frequency in the corpus. No frequency filter is needed to eliminate groups extracted on the basis of syntactic analysis. Standard engines, by contrast, are forced to apply frequency filters to hide their most erratic results.

The Synomia engine therefore makes it possible to detect true weak signals.
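The cost of a frequency filter can be shown with a small sketch. The candidate phrases and their counts are made up for the illustration, and the threshold of 2 is an arbitrary assumption.

```python
from collections import Counter

def frequency_filter(candidates, min_count):
    """Keep only the candidate phrases seen at least min_count times."""
    counts = Counter(candidates)
    return {phrase for phrase, n in counts.items() if n >= min_count}

# Hypothetical output of a pattern-based engine over a small corpus.
candidates = (
    ["health plan"] * 3
    + ["insurance company"] * 2
    + ["unjustified premium"]      # noise, seen once
    + ["pre-existing condition"]   # valid phrase, also seen once
)

kept = frequency_filter(candidates, min_count=2)
print(sorted(kept))  # ['health plan', 'insurance company']
```

The filter removes the erroneous unjustified premium, but it removes the rare yet valid pre-existing condition along with it, which is exactly the kind of weak signal a frequency threshold cannot preserve.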

 

Thanks to deep parsing, the Synomia engine extracts verb phrases

Standard semantic engines only extract noun phrases, made up of nouns and adjectives. But if a noun phrase like enrollment in a plan is deemed relevant, why leave aside the verb phrase enroll in a plan?


Thanks to syntactic analysis, the Synomia semantic engine extracts verb phrases made up of a verb and one or more complements. For instance, in the sentence "You can't enroll in a health plan for the rest of 2014.", the parser identifies that health plan and rest of 2014 are indirect objects of the verb enroll, so the Synomia engine is able to extract the verb phrases enroll in a health plan and enroll for the rest of 2014.
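The same idea can be sketched for verb phrases. The dependency links and role labels below are written by hand for the example sentence and are assumptions, not the output or the label set of the Synomia parser.

```python
def subtree(heads, root):
    """Indices of root and of every word that depends on it, directly or not."""
    nodes = [root]
    for i, head in enumerate(heads):
        if head == root:
            nodes.extend(subtree(heads, i))
    return nodes

def verb_phrases(words, heads, roles, verb):
    """Pair the verb with each of its complements, one phrase per complement."""
    phrases = []
    for i, (head, role) in enumerate(zip(heads, roles)):
        if head == verb and role == "comp":
            span = sorted(subtree(heads, i))
            phrases.append(words[verb] + " " + " ".join(words[j] for j in span))
    return phrases

words = ["You", "can't", "enroll", "in", "a", "health", "plan",
         "for", "the", "rest", "of", "2014"]
# heads[i] is the index of the word that word i depends on (None for the verb).
heads = [2, 2, None, 6, 6, 6, 2, 9, 9, 2, 11, 9]
roles = ["subj", "aux", "root", "prep", "det", "mod", "comp",
         "prep", "det", "comp", "prep", "mod"]

enroll_phrases = verb_phrases(words, heads, roles, verb=2)
print(enroll_phrases)
# ['enroll in a health plan', 'enroll for the rest of 2014']
```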


 

[Figure: a few examples of verb phrases]

 

 

CONTACT US

Synomia
63 bis rue de Sèvres
92100 Boulogne-Billancourt

Tel : +33 (0)1 46 10 06 40

