For instance, in texts related to health care: health insurance, marketplace, insurance company, private insurance, affordable coverage, pre-existing condition. These expressions are much more precise than single words (health, marketplace, coverage, condition). In this example in particular, the word condition is extremely vague, but the phrase pre-existing condition clearly refers to the medical sense of the noun condition. At Synomia (and in linguistics), these groups are called phrases.
In addition to words and phrases, one must also take into account named entities, that is to say the words or phrases that denote people, countries, cities, brands, etc. This, however, is easily done with dictionaries and simple extraction.
To extract noun phrases from text, semantic engines use a standard technology to identify all sequences of words that match valid sequences of word classes. These are, for example:
This technology is too primitive to extract noun phrases from a text with sufficient coverage and accuracy. In particular, it generates a lot of noise.
For instance, in the sentence:
"It holds insurance companies accountable for unjustified premium increases."
A standard engine will extract unjustified premium (adjective followed by noun), which is an error because it is the increases that are unjustified, not the premium.
And in the sentence:
"If you're eligible, enroll in a Marketplace health plan."
A standard engine will extract Marketplace health (noun followed by noun), which is also a mistake.
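The behaviour described above can be reproduced with a minimal sketch of pattern-based extraction. The part-of-speech tags are assigned by hand (an assumption; a real engine would run a tagger), punctuation is omitted, and the four patterns and the name extract_by_pattern are illustrative choices, not the actual list used by any particular engine.

```python
# Hand-assigned part-of-speech tags (an assumption for illustration only).
SENTENCE_1 = [
    ("It", "PRON"), ("holds", "VERB"), ("insurance", "NOUN"),
    ("companies", "NOUN"), ("accountable", "ADJ"), ("for", "ADP"),
    ("unjustified", "ADJ"), ("premium", "NOUN"), ("increases", "NOUN"),
]
SENTENCE_2 = [
    ("If", "SCONJ"), ("you", "PRON"), ("'re", "AUX"), ("eligible", "ADJ"),
    ("enroll", "VERB"), ("in", "ADP"), ("a", "DET"),
    ("Marketplace", "NOUN"), ("health", "NOUN"), ("plan", "NOUN"),
]

# Illustrative word-class sequences; not the list from the original article.
PATTERNS = [
    ("ADJ", "NOUN"),
    ("NOUN", "NOUN"),
    ("ADJ", "NOUN", "NOUN"),
    ("NOUN", "NOUN", "NOUN"),
]


def extract_by_pattern(tagged, patterns=PATTERNS):
    """Return every contiguous word sequence whose tags match a pattern."""
    tags = [tag for _, tag in tagged]
    found = []
    for pattern in patterns:
        n = len(pattern)
        for start in range(len(tags) - n + 1):
            if tuple(tags[start:start + n]) == pattern:
                words = [word for word, _ in tagged[start:start + n]]
                found.append(" ".join(words))
    return found


print(extract_by_pattern(SENTENCE_1))
print(extract_by_pattern(SENTENCE_2))
```

Because the matcher only looks at adjacent word classes, it has no way of knowing which noun an adjective or a noun modifier actually attaches to, so the erroneous groups unjustified premium and Marketplace health come out alongside the valid ones.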
With deep parsing, the Synomia engine does not make these kinds of mistakes.
Let's go back to our examples. Because the parser was able to recognize that the adjective unjustified depended on the noun increases (green arrow) and not on the noun premium, the Synomia engine extracts the phrases unjustified premium increases, premium increases, and even unjustified increases, but not unjustified premium.
Because the parser was able to identify that the noun Marketplace before the noun health depended on the noun plan (green arrow), the Synomia engine extracted the phrase Marketplace plan as well as health plan, etc. However, it did not extract the erroneous group Marketplace health.
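To make the contrast concrete, here is a minimal sketch of extraction driven by dependency links rather than by adjacency. It is not the Synomia implementation: the head indices are written by hand to mirror the parse described above, and the function name phrases_from_dependencies is hypothetical. Each head noun is combined with every non-empty subset of the words that depend on it directly.

```python
from itertools import combinations

# (word, index of the word it depends on). Heads are hand-assigned to mirror
# the parse described in the text; they are not produced by a real parser.
PHRASE_1 = [("unjustified", 2), ("premium", 2), ("increases", None)]
PHRASE_2 = [("Marketplace", 2), ("health", 2), ("plan", None)]


def phrases_from_dependencies(tokens):
    """Combine each head with every non-empty subset of its direct modifiers."""
    results = []
    for head_index, (head_word, _) in enumerate(tokens):
        # Modifiers are kept in sentence order.
        modifiers = [word for word, head in tokens if head == head_index]
        for size in range(len(modifiers), 0, -1):
            for subset in combinations(modifiers, size):
                results.append(" ".join(subset + (head_word,)))
    return results


print(phrases_from_dependencies(PHRASE_1))
print(phrases_from_dependencies(PHRASE_2))
```

Since unjustified and premium both point to increases and neither points to the other, the group unjustified premium can never be generated. The same holds for Marketplace health in the second example.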
Phrases of three or more words are of major importance in any corpus.
The Synomia engine extracts these noun phrases whatever their frequency of appearance in the corpus. Because the groups are extracted on the basis of syntactic analysis, no frequency filter is needed to eliminate bad ones. By contrast, standard engines are forced to apply frequency filters to hide their most erratic results.
Thanks to syntactic analysis, the Synomia semantic engine also extracts verb phrases made up of a verb and one or more complements. For instance, in the sentence "You can't enroll in a health plan for the rest of 2014." the parser identifies that health plan and rest of 2014 are indirect objects of the verb enroll, and so the Synomia engine is able to extract the verb phrases enroll in a health plan and enroll for the rest of 2014.
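The same idea can be sketched for verb phrases. The parse below is written by hand and its dictionary layout is a hypothetical simplification, not the output format of any real parser. The point it illustrates is that once each complement is attached to the verb, the verb can be paired with each complement separately, even when the two are not adjacent in the sentence.

```python
# Hand-written parse result (hypothetical structure): a verb and its
# complements, each given as a preposition plus the phrase it introduces.
PARSE = {
    "verb": "enroll",
    "complements": [("in", "a health plan"), ("for", "the rest of 2014")],
}


def verb_phrases(parse):
    """Pair the verb with each of its complements, one phrase per complement."""
    verb = parse["verb"]
    return [f"{verb} {prep} {phrase}" for prep, phrase in parse["complements"]]


print(verb_phrases(PARSE))
```

Note that enroll for the rest of 2014 does not appear as a contiguous string in the original sentence, which is why adjacency-based patterns cannot recover it.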
A few examples of verb phrases: