Noun-Phrase Analysis in Unrestricted Text for Information Retrieval

David A. Evans, Chengxiang Zhai

Laboratory for Computational Linguistics

Carnegie Mellon University

Pittsburgh, PA 15213

dae@cmu.edu, cz25@andrew.cmu.edu

Abstract

Information retrieval is an important application area of natural-language processing where one encounters the genuine challenge of processing large quantities of unrestricted natural-language text. This paper reports on the application of a few simple, yet robust and efficient noun-phrase analysis techniques to create better indexing phrases for information retrieval. In particular, we describe a hybrid approach to the extraction of meaningful (continuous or discontinuous) subcompounds from complex noun phrases using both corpus statistics and linguistic heuristics. Results of experiments show that indexing based on such extracted subcompounds improves both recall and precision in an information retrieval system. The noun-phrase analysis techniques are also potentially useful for book indexing and automatic thesaurus extraction.

1 Introduction

1.1 Information Retrieval

Information retrieval (IR) is an important application area of natural-language processing (NLP).¹ The IR (or perhaps more accurately "text retrieval") task may be characterized as the problem of selecting a subset of documents (from a document collection) whose content is relevant to the information need of a user as expressed by a query. The document collections involved in IR are often gigabytes of unrestricted natural-language text. A user's query may be expressed in a controlled language (e.g., a boolean expression of keywords) or, more desirably, a natural language, such as English.

1 (Evans, 1990; Evans et al., 1993; Smeaton, 1992; Lewis & Sparck Jones, 1996)

A typical IR system works as follows. The documents to be retrieved are processed to extract indexing terms or content carriers, which are usually single words or (less typically) phrases. The indexing terms provide a description of the document's content. Weights are often assigned to terms to indicate how well they describe the document. A (natural-language) query is processed in a similar way to extract query terms. Query terms are then matched against the indexing terms of a document to determine the relevance of each document to the query.
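The indexing-and-matching flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stopword list, the raw term-frequency weights, the sample documents, and the function names (`extract_terms`, `index_document`, `score`) are all hypothetical and are not taken from any particular IR system.

```python
from collections import Counter

# Illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "of", "in", "and", "to", "is", "are"}

def extract_terms(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return [w for w in words if w and w not in STOPWORDS]

def index_document(text):
    """Map each indexing term to a weight (here: raw term frequency)."""
    return Counter(extract_terms(text))

def score(query, doc_index):
    """Sum the weights of indexing terms that exactly match a query term."""
    return sum(doc_index[t] for t in set(extract_terms(query)))

docs = {
    "d1": "Stainless steel strip is treated to improve surface quality.",
    "d2": "The college enrolled a junior in the retrieval course.",
}
indexes = {doc_id: index_document(text) for doc_id, text in docs.items()}

query = "quality of steel strip"
ranked = sorted(indexes, key=lambda d: score(query, indexes[d]), reverse=True)
print(ranked)  # d1 shares three terms with the query, d2 none
```

Real systems typically use more informative weights than raw frequency; the point here is only the shape of the pipeline: extract terms, weight them, match query terms against them.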

The ultimate goal of an IR system is to increase both precision, the proportion of retrieved documents that are relevant, as well as recall, the proportion of relevant documents that are retrieved. However, the real challenge is to understand and represent appropriately the content of a document and query so that the relevance decision can be made efficiently, without degrading precision and recall. A typical solution to the problem of making relevance decisions efficient is to require exact matching of indexing terms and query terms, with an evaluation of the 'hits' based on a scoring metric. Thus, for instance, in vector-space models of relevance ranking, both the indexing terms of a document and the query terms are treated as vectors (with individual term weights) and the similarity between the two vectors is given by a cosine-distance measure, essentially the angle between any two vectors.²
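As a worked illustration of these definitions, the following sketch computes precision, recall, and the cosine of the angle between two weighted term vectors. The document identifiers, weights, and function names are hypothetical; term vectors are represented as plain dictionaries for simplicity.

```python
import math

def precision_recall(retrieved, relevant):
    """Precision = hits / retrieved; recall = hits / relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors (term -> weight)."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

# Four documents retrieved, three relevant overall, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d5"])
print(p, r)  # precision 2/4, recall 2/3

doc_vector = {"steel": 2.0, "strip": 1.0}
query_vector = {"steel": 1.0}
print(cosine_similarity(doc_vector, query_vector))  # 2 / sqrt(5)
```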

1.2 Natural-Language Processing for IR

One can regard almost any IR system as performing an NLP task: text is 'parsed' for terms and terms are used to express 'meaning'--to capture document content. Clearly, most traditional IR systems do not attempt to find structure in the natural-language text in the 'parsing' process; they merely extract word-like strings to use in indexing. Ideally, however, extracted structure would directly reflect the encoded linguistic relations among terms--capturing the conceptual content of the text better than simple word-strings.

There are several prerequisites for effective NLP in an IR application, including the following.

2 (Salton & McGill, 1983)


1. Ability to process large amounts of text

The amount of text in the databases accessed by modern IR systems is typically measured in gigabytes. This requires that the NLP used must be extraordinarily efficient in both its time and space requirements. It would be impractical to use a parser with the speed of one or two sentences per second.

2. Ability to process unrestricted text

The text database for an IR task is generally unrestricted natural-language text possibly encompassing many different domains and topics. A parser must be able to manage the many kinds of problems one sees in natural-language corpora, including the processing of unknown words, proper names, and unrecognized structures. Often more is required, as when spelling, transcription, or OCR errors occur. Thus, the NLP used must be especially robust.

3. Need for shallow understanding

While the large amount of unrestricted text makes NLP more difficult for IR, the fact that a deep and complete understanding of the text may not be necessary for IR makes NLP for IR relatively easier than other NLP tasks such as machine translation. The goal of an IR system is essentially to classify documents (as relevant or irrelevant) vis-a-vis a query. Thus, it may suffice to have a shallow and partial representation of the content of documents.

Information retrieval thus poses the genuine challenge of processing large volumes of unrestricted natural-language text but not necessarily at a deep level.

1.3 Our Work

This paper reports on our evaluation of the use of simple, yet robust and efficient noun-phrase analysis techniques to enhance phrase-based IR. In particular, we explored an extension of the phrase-based indexing in the CLARIT™ system³ using a hybrid approach to the extraction of meaningful (continuous or discontinuous) subcompounds from complex noun phrases exploiting both corpus statistics and linguistic heuristics. Using such subcompounds rather than whole noun phrases as indexing terms helps a phrase-based IR system solve the phrase normalization problem, that is, the problem of matching syntactically different, but semantically similar phrases. The results of our experiments show that both recall and precision are improved by using extracted subcompounds for indexing.

2 Phrase-Based Indexing

The selection of appropriate indexing terms is critical to the improvement of both precision and recall in an IR task. The ideal indexing terms would directly represent the concepts in a document. Since 'concepts' are difficult to represent and extract (as well as to define), concept-based indexing is an elusive goal. Virtually all commercial IR systems (with the exception of the CLARIT system) index only on "words", since the identification of words in texts is typically easier and more efficient than the identification of more complex structures. However, single words are rarely specific enough to support accurate discrimination and their groupings are often accidental. An often cited example is the contrast between "junior college" and "college junior". Word-based indexing cannot distinguish the phrases, though their meanings are quite different. Phrase-based indexing, on the other hand, as a step toward the ideal of concept-based indexing, can address such a case directly.
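The "junior college" versus "college junior" contrast can be made concrete with a small sketch. Here word-based indexing is modeled as an unordered set of words and phrase-based indexing as a set of adjacent word pairs; both representations and the function names are simplifying assumptions for illustration.

```python
def word_terms(text):
    """Word-based indexing: an unordered set of single words."""
    return frozenset(text.lower().split())

def phrase_terms(text):
    """Phrase-based indexing, approximated by ordered adjacent word pairs."""
    words = text.lower().split()
    return frozenset(zip(words, words[1:]))

a, b = "junior college", "college junior"

# Single words cannot tell the two phrases apart ...
print(word_terms(a) == word_terms(b))      # True
# ... but ordered word pairs can.
print(phrase_terms(a) == phrase_terms(b))  # False
```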

Indeed, it is interesting to note that the use of phrases as index terms has increased dramatically among the systems that participate in the TREC evaluations.⁴ Even relatively traditional word-based systems are exploring the use of multi-word terms by supplementing words with statistical phrases--selected high frequency adjacent word pairs (bigrams). And a few systems, such as CLARIT--which uses simplex noun phrases, attested subphrases, and contained words as index terms--and New York University's TREC system⁵--which uses "head-modifier pairs" derived from identified noun phrases--have demonstrated the practicality and effectiveness of thorough NLP in IR tasks.
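A minimal sketch of statistical phrases, understood here as adjacent word pairs whose corpus frequency reaches a threshold. The threshold `min_count`, the toy corpus, and the function name are hypothetical; actual TREC systems apply their own selection criteria.

```python
from collections import Counter

def statistical_phrases(sentences, min_count=2):
    """Return adjacent word pairs (bigrams) seen at least min_count times."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        counts.update(zip(words, words[1:]))
    return {" ".join(pair) for pair, n in counts.items() if n >= min_count}

corpus = [
    "stainless steel strip",
    "treated stainless steel",
    "steel strip surface",
]
print(sorted(statistical_phrases(corpus)))
```

With this toy corpus only "stainless steel" and "steel strip" occur twice, so only those two pairs are selected.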

The experiences of the CLARIT system are instructive. By using selective NLP to identify simplex NPs, CLARIT generates phrases, subphrases, and individual words to use in indexing documents and queries. Such a first-order analysis of the linguistic structures in texts approximates concepts and affords us alternative methods for calculating the fit between documents and queries. In particular, we can choose to treat some phrasal structures as atomic units and others as additional information about (or representations of) content. There are immediate effects in improving precision:

1. Phrases can replace individual indexing words.

For example, if both "dog" and "hot" are used for indexing, they will match any query in which both words occur. But if only the phrase "hot dog" is used as an index term, then it will only match the same phrase, not any of the individual words.

3 (Evans et al., 1991; Evans et al., 1993; Evans et al., 1995; Evans et al., 1996)

4 (Harman, 1995; Harman, 1996)

5 (Strzalkowski, 1994)

2. Phrases can supplement word-level matches.

For example, if only the individual words "junior" and "college" are used for indexing, both "junior college" and "college junior" will match a query with the phrase "junior college" equally well. But if we also use the phrase "junior college" for indexing, then "junior college" will match better than "college junior", even though the latter also will receive some credit as a match at the word level.

We can see, then, that it is desirable to distinguish--and, if possible, extract--two kinds of phrases: those that behave as lexical atoms and those that reflect more general linguistic relations.

Lexical atoms help us by obviating the possibility of extraneous word matches that have nothing to do with true relevance. We do not want "hot" or "dog" to match on "hot dog". In essence, we want to eliminate the effect of the independence assumption at the word level by creating new words--the lexical atoms--in which the individual word dependencies are explicit (structural).
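The idea of turning a lexical atom into a "new word" can be sketched as a tokenization step that fuses a known two-word atom into a single index term. The atom list is simply given here as an assumption, and the underscore-joined token format and function name are hypothetical choices for illustration.

```python
def merge_lexical_atoms(words, atoms):
    """Greedily replace known two-word atoms with one fused token."""
    merged, i = [], 0
    while i < len(words):
        pair = tuple(words[i:i + 2])
        if pair in atoms:
            merged.append("_".join(pair))  # the atom becomes a single term
            i += 2
        else:
            merged.append(words[i])
            i += 1
    return merged

ATOMS = {("hot", "dog")}  # assumed to be known in advance
terms = merge_lexical_atoms("he ate a hot dog".split(), ATOMS)
print(terms)           # ['he', 'ate', 'a', 'hot_dog']
print("hot" in terms)  # False: a query on "hot" no longer matches
```

Because the fused token is the only index term produced for the atom, the individual words "hot" and "dog" can no longer produce extraneous matches.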

More general phrases help us by adding detail. Indeed, all possible phrases (or paraphrases) of actual content in a document are potentially valuable in indexing. In practice, of course, the indexing term space has to be limited, so it is necessary to select a subset of phrases for indexing. Short phrases (often nominal compounds) are preferred over long complex phrases, because short phrases have better chances for matching short phrases in queries and will still match longer phrases owing to the short phrases they have in common. Using only short phrases also helps solve the phrase normalization problem of matching syntactically different long phrases (when they share similar meaning).⁶

Thus, lexical atoms and small nominal compounds should make good indexing phrases.

While the CLARIT system does index at the level of phrases and subphrases, it does not currently index on lexical atoms or on the small compounds that can be derived from complex NPs, in particular, reflecting cross-simplex NP dependency relations.

Thus, for example, under normal CLARIT processing the phrase "the quality of surface of treated stainless steel strip"⁷ would yield index terms such as "treated stainless steel strip", "treated stainless steel", "stainless steel strip", and "stainless steel" (as a phrase, not lexical atom), along with all the relevant single-word terms in the phrase. But the process would not identify "stainless steel" as a potential lexical atom or find terms such as "surface quality", "strip surface", and "treated strip".
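As a rough illustration of subphrase generation, the sketch below enumerates every contiguous multi-word sequence of a simplex noun phrase. This is an assumption-laden approximation and not CLARIT's actual procedure: CLARIT is described as using attested subphrases, whereas this hypothetical `contiguous_subphrases` function over-generates (it also yields, for example, "treated stainless").

```python
def contiguous_subphrases(phrase, min_len=2):
    """List all contiguous word sequences of at least min_len words."""
    words = phrase.split()
    n = len(words)
    return [
        " ".join(words[i:j])
        for i in range(n)
        for j in range(i + min_len, n + 1)
    ]

candidates = contiguous_subphrases("treated stainless steel strip")
for candidate in candidates:
    print(candidate)
```

A filter, such as keeping only candidates attested elsewhere in the corpus, would be needed to narrow these candidates down to useful index terms.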

6 (Smeaton, 1992)

7 This is an actual example from a U.S. patent document.

To achieve more complete (and accurate) phrase-based indexing, we propose to use the following four kinds of phrases as indexing terms:

1. Lexical atoms (e.g., "hot dog" or perhaps "stainless steel" in the example above)

2. Head modifier pairs (e.g., "treated strip" and "steel strip" in the example above)

3. Subcompounds (e.g., "stainless steel strip" in the example above)

4. Cross-preposition modification pairs (e.g., "surface quality" in the example above)

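The fourth kind of phrase can be illustrated with a small sketch that derives cross-preposition pairs from the patent example. It rests on strong simplifying assumptions that are not from the paper: only the preposition "of" is handled, determiners are dropped from a fixed list, and the head of each simplex NP is taken to be its last word. The function name is hypothetical.

```python
def cross_preposition_pairs(phrase, preposition="of",
                            determiners=("the", "a", "an")):
    """Pair the head of each NP with the head of the NP it modifies."""
    words = [w for w in phrase.lower().split() if w not in determiners]

    # Split the phrase into simplex NPs at each preposition.
    segments, current = [], []
    for word in words:
        if word == preposition:
            segments.append(current)
            current = []
        else:
            current.append(word)
    segments.append(current)

    # Assumption: the head of a simplex NP is its last word.
    heads = [segment[-1] for segment in segments if segment]

    # "X of Y" is rewritten as the modifier-head pair "Y X".
    return [f"{right} {left}" for left, right in zip(heads, heads[1:])]

pairs = cross_preposition_pairs(
    "the quality of surface of treated stainless steel strip")
print(pairs)  # ['surface quality', 'strip surface']
```

Under these assumptions the sketch recovers "surface quality" and "strip surface", two of the terms that normal CLARIT processing is said to miss.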
In effect, we aim to augment CLARIT indexing with lexical atoms and phrases capturing additional (discontinuous) modification relations beyond those that can be found within simplex NPs.

It is clear that a certain level of robust and efficient noun-phrase analysis is needed to extract the above four kinds of small compounds from a large unrestricted corpus. In fact, the set of small compounds extracted from a noun phrase can be regarded as a weak representation of the meaning of the noun phrase, since each meaningful small compound captures a part of the meaning of the noun phrase. In this sense, extraction of such small compounds is a step toward a shallow interpretation of noun phrases. Such weak interpretation is useful for tasks like information retrieval, document classification, and thesaurus extraction, and indeed forms the basis in the CLARIT system for automated