Knowledge-Aware Procedural Text Understanding with Multi-Stage Training

Zhihan Zhang, Peking University, Beijing, China (zhangzhihan@pku.edu.cn)
Xiubo Geng, STCA NLP Group, Microsoft, Beijing, China (xigeng@microsoft.com)
Tao Qin, STCA NLP Group, Microsoft, Beijing, China (taoqin@microsoft.com)
Yunfang Wu, Peking University, Beijing, China (wuyf@pku.edu.cn)
Daxin Jiang, STCA NLP Group, Microsoft, Beijing, China (djiang@microsoft.com)

ABSTRACT

Procedural text describes dynamic state changes during a step-by-step natural process (e.g., photosynthesis). In this work, we focus on the task of procedural text understanding, which aims to comprehend such documents and track entities' states and locations during a process. Although recent approaches have achieved substantial progress, their results are still far behind human performance. Two challenges, the difficulty of commonsense reasoning and data insufficiency, remain unsolved and call for the incorporation of external knowledge bases. Previous works on external knowledge injection usually rely on noisy web mining tools and heuristic rules with limited applicable scenarios. In this paper, we propose a novel KnOwledge-Aware proceduraL text understAnding (KoaLa) model, which effectively leverages multiple forms of external knowledge in this task. Specifically, we retrieve informative knowledge triples from ConceptNet and perform knowledge-aware reasoning while tracking the entities. Besides, we employ a multi-stage training schema which fine-tunes the BERT model over unlabeled data collected from Wikipedia before further fine-tuning it in the final model. Experimental results on two procedural text datasets, ProPara and Recipes, verify the effectiveness of the proposed methods, in which our model achieves state-of-the-art performance in comparison to various baselines.

KEYWORDS

Procedural Text Understanding, Commonsense Reasoning, Multi-Stage Training

Work was done while Zhihan Zhang was an intern at STCA NLP Group, Microsoft.

†Corresponding authors.

Code is available at https://github.com/ytyz1307zzh/KOALA.

This paper is published under the Creative Commons Attribution 4.0 International (CC-BY 4.0) license. Authors reserve their rights to disseminate the work on their personal and corporate Web sites with the appropriate attribution.

WWW '21, April 19-23, 2021, Ljubljana, Slovenia

2021 IW3C2 (International World Wide Web Conference Committee), published under Creative Commons CC-BY 4.0 License.

ACM ISBN 978-1-4503-8312-7/21/04.
https://doi.org/10.1145/3442381.3450126

ACM Reference Format: Zhihan Zhang, Xiubo Geng, Tao Qin, Yunfang Wu, and Daxin Jiang. 2021. Knowledge-Aware Procedural Text Understanding with Multi-Stage Training. In Proceedings of the Web Conference 2021 (WWW '21), April 19-23, 2021, Ljubljana, Slovenia. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3442381.3450126

1 INTRODUCTION

In this work, we focus on a challenging branch of natural language processing (NLP), namely procedural text understanding. Procedural text describes dynamic state changes and entity transitions of a step-by-step process (e.g., photosynthesis). Understanding such documents requires a model to track the states and locations of entities throughout a natural process [4,15]. Taking Figure 1 for example, given a paragraph describing the process of fossilization and an entity "bones", the model is asked to predict the state (not exist, exist, move, create or destroy) and location (a text span from the paragraph) of the entity at each timestep. Such tasks require the comprehension of the underlying dynamics of the process, and thus impose higher requirements on the reasoning ability of NLP systems.

Since the proposal of the procedural text understanding task [15], many models have emerged to solve this challenging task. These models aim to simulate the dynamic world of procedural texts and achieve competitive results [2,13,26]. However, the highest result so far (~65 F1) is still far behind human performance (83.9 F1). In particular, there are two major problems that have not been effectively solved in this task.

First, commonsense reasoning plays a critical role in understanding procedural text. Most end-to-end models assume that the clues for making predictions are explicitly stated in the text, which is often not the case in this task. Not only do entities usually undergo implicit state changes, but their locations are also omitted in many cases, especially when humans can easily infer the location through commonsense reasoning. For instance, in the example in Figure 1, due to the decoupling of the entity "bones" and the location "animal" in the paragraph, the initial location of "bones" is hard to infer directly from the plain text, unless the model is aware of the extra commonsense knowledge "bones

are parts of an animal". For statistical evidence, we manually check 50 instances from the popular ProPara dataset [15]. Among these samples, we find that an entity is not explicitly connected to its locations in 32% of the cases, and state changes (create/move/destroy) of an entity are not explicitly stated in 26% of the cases. These figures suggest that the need for commonsense knowledge is unneglectable for understanding procedural documents.

Step | Text Paragraph | State | Location
0 | N/A | N/A | animal
1 | An animal dies. | exist | animal
2 | It is buried in a watery environment. | exist | animal
3 | The soft tissues quickly decompose. | exist | animal
4 | The bones are left behind. | move | watery environment
5 | Over time, mud and silt accumulate over the bones. | move | mud and silt

Figure 1: An example of a procedural text paragraph describing fossilization, and the state & location labels of entity "bones". Step 0 is used to identify entities' initial locations before the process. Below is part of the ConceptNet knowledge graph pertaining to the process (nodes shown include bone, animal, skeleton, part of animal, connective tissue, organism and die).

Second, data insufficiency hinders large neural models from reaching their best performance. Since data annotation on this task includes the states and locations of all entities at each timestep, fully annotated data are costly to collect. As a result, existing datasets are limited in size. The benchmark ProPara dataset only contains

488 paragraphs including 1.9k entities. Although another recent

dataset, Recipes [4], contains 66k paragraphs, only 866 of them have reliable human-annotated labels, while the other paragraphs are automatically machine-annotated and contain lots of noise [12]. Moreover, such paragraphs usually fail to provide sufficient information considering the complexity of scientific processes. For example, each paragraph in ProPara only contains ~60 words on average (see Table 1 for more stats), which restricts it from describing a complex process in detail. Thus, data enrichment is in serious need on this task.

Due to the need for additional knowledge in this task, incorporating external knowledge to assist prediction has been an important research direction. ProStruct [25] writes heuristic rules to constrain the transitions of entity states, while using Web text to estimate the probability of an entity undergoing certain state changes. Similarly, XPAD [7] also collects a Web corpus to estimate the probability of action dependencies. However, their approaches have limitations in both the forms of knowledge and the applicable scenarios. Using unstructured Web text to calculate co-occurrence frequencies requires off-the-shelf tools or heuristic rules, which, unfortunately, often induce lots of noise. Besides, such methods are only applicable to refining the probability space of state change prediction, so they do not cover location prediction and have poor generalization ability. Different from previous works, in this paper, we aim to effectively leverage both structured and unstructured knowledge for procedural text understanding. Structured knowledge, like relational databases, provides clear and reliable commonsense knowledge compared to web-crawled text. As for unstructured knowledge like Web text, instead of directly mining probability information, we propose to utilize it with a multi-stage training schema on BERT encoders to circumvent the potential noise induced by Web search and text mining. Therefore, we propose task-specific methods to effectively leverage multiple forms of knowledge, both structured and unstructured, to help neural models understand procedural text.

Based on this motivation, we aim to address the above two issues, commonsense reasoning and data insufficiency, using external knowledge sources, namely ConceptNet and Wikipedia. To solve the challenge of commonsense reasoning, we perform knowledge infusion using ConceptNet [22]. Consisting of numerous (subject, relation, object) triples, ConceptNet is a relational knowledge base composed of concepts and inter-concept relations. Such a structure makes ConceptNet naturally suitable for entity-centric tasks like procedural text understanding. An entity in our task can be matched to a concept-centric subgraph in ConceptNet, including its relations with neighboring concepts. Such information can be used as extra commonsense knowledge to help models understand the attributes and properties of an entity, which further provides clues for making predictions even if the answers are not directly mentioned in the plain text. As shown in Figure 1, although it is hard to directly infer the initial location of "bones", we can find the triples (animal, HasA, bone) and (bone, IsA, part_of_animal) in the ConceptNet knowledge graph. These knowledge triples can serve as evidence for predicting entity states and locations that are not explicitly mentioned. Therefore, we propose to retrieve relevant knowledge triples from ConceptNet and apply attentive knowledge infusion to our model, which is further guided by a task-specific attention loss.
As for the challenge of data insufficiency, we propose to enrich the training procedure using Wikipedia paragraphs based on text retrieval. Inspired by the great success of the "pre-train then fine-tune" procedure of BERT models [9], we propose a multi-stage training schema for BERT encoders. Specifically, we simulate the writing style of procedural text to retrieve similar paragraphs from Wikipedia. Compared to paragraphs in existing datasets, such Wiki paragraphs are usually longer, more scientific procedural texts and contain more details about similar topics. We expect the BERT model to learn to better encode procedural text through fine-tuning on this expanded procedural text corpus. Thus, we train the BERT encoder on the retrieved Wiki paragraphs with a modified masked language model (MLM) objective, before further fine-tuning the whole model on the target dataset. We also conduct a similar multi-stage training schema on ConceptNet knowledge modeling, where we adopt another BERT encoder.

Based on the above approaches, we introduce our KnOwledge-Aware proceduraL text understAnding (KoaLa) model, which effectively incorporates knowledge from external knowledge bases, ConceptNet and Wikipedia. KoaLa infuses commonsense knowledge from ConceptNet during decoding and is trained with a multi-stage schema using an expanded corpus from Wikipedia. For evaluation, our main experiments on the ProPara dataset show that KoaLa reaches

state-of-the-art results. Besides, auxiliary experiments on the Recipes dataset also demonstrate the advantage of our model over strong baselines. The ablation tests and case studies further show the effectiveness of the proposed methods, which make KoaLa a more knowledgeable procedural text "reader". The main contributions of this work are summarized as follows.

•We propose to apply structured knowledge, ConceptNet triples, to satisfy the need for commonsense knowledge in understanding procedural text. Knowledge triples are extracted from the ConceptNet knowledge graph and incorporated into an end-to-end model in an attentive manner. A task-specific attention loss is introduced to guide knowledge selection.

•We propose to use unstructured knowledge, Wikipedia paragraphs, to address the issue of data insufficiency in this task. Through a multi-stage training procedure, the BERT encoder is first fine-tuned on retrieved Wiki paragraphs using task-specific training objectives before being further fine-tuned with the full model on the target dataset.

•Experimental results show that our knowledge-enhanced model achieves state-of-the-art results on two procedural text datasets, ProPara and Recipes. Further analyses prove that by effectively leveraging external knowledge sources, the proposed methods help the AI model better understand procedural text.

2 RELATED WORK

Procedural Text Datasets. Efforts have been made towards research in procedural text understanding since the era of deep learning. Some earlier datasets include bAbI [28], SCoNE [18] and ProcessBank [3]. bAbI is a relatively simple dataset which simulates actors manipulating objects and interacting with each other, using machine-generated text. SCoNE aims to handle ellipsis and coreference within sequential actions over simulated environments. ProcessBank consists of text describing biological processes and asks questions about event ordering or argument dependencies. In this paper, we mainly focus on ProPara [15], a more recent dataset containing paragraphs on a variety of natural processes. The goal is to track the states and locations of the given entities at each timestep. Additionally, we also conduct experiments on the Recipes dataset [4], which includes entity tracking in the cooking domain. These datasets are more challenging since AI models need to track the dynamic transitions of multiple entities throughout the process, instead of predicting the final state (SCoNE) or answering a single question (bAbI, ProcessBank). Besides, entities usually undergo implicit state changes and commonsense knowledge is often required in reasoning.

Procedural Text Understanding Models. Our paper is mainly related to the line of work on ProPara [15]. ProStruct [25] applies the VerbNet rulebase and Web search co-appearance to refine the probability space of entity state prediction. LACE [10] introduces a consistency-biased training objective to improve label consistency among different paragraphs with the same topic. KG-MRC [8] constructs knowledge graphs to dynamically store each entity's location and to assist location span prediction. NCET [13] tracks entity states with verb information and considers location prediction as a classification task over a candidate set. ET [12] conducts analyses on the application of pre-trained BERT and GPT models on the sub-task of state tracking. XPAD [7] builds dependency graphs on the ProPara dataset, which try to explain the action dependencies within the events happening in a process. Among more recent approaches, Dynapro [2] dynamically encodes procedural text through a BERT-based model to jointly identify entity attributes and transitions. ProGraph [33] constructs an entity-specific heterogeneous graph on the temporal dimension to assist state prediction from context. IEN [26] explores inter-entity relationships to discover the causal effects of entity actions on their state changes. In this paper, we aim at two main problems that have not been effectively solved by the above works: commonsense reasoning and data insufficiency. Benefiting from the commonsense knowledge in ConceptNet and the proposed multi-stage training schema, our model outperforms the aforementioned models on the

ProPara dataset.

Commonsense in Language Understanding. Incorporating commonsense knowledge to facilitate language understanding is another related line of work [23,32]. Yang et al. [31] infuse concepts from external knowledge bases into recurrent models for information extraction. Chen et al. [6] propose a knowledge-enriched co-attention model for natural language inference. Lin et al. [17] employ graph convolutional networks and a path-based attention mechanism on knowledge graphs to answer commonsense-related questions. Guan et al. [11] apply multi-source attention to connect hierarchical LSTMs with knowledge graphs for story ending generation. Min et al. [19] construct relational graphs using Wikipedia paragraphs to retrieve knowledge for open-domain QA. Wang et al. [27] inject factual and linguistic knowledge into language models by training multiple adapters independently. Inspired by previous works, we introduce commonsense knowledge from ConceptNet [22] into the procedural text understanding task, and prove that the retrieved knowledge contributes to the strong performance of our model.

3 PROBLEM DEFINITION

Here we define the task of procedural text understanding. Given: (i) a paragraph consisting of several sentences (timesteps) that describes a natural process or a cooking recipe, and (ii) a set of entities that are participants of the process. At each timestep, the model predicts: (i) the state of each entity, e.g., one of {not exist, exist, move, create, destroy} for ProPara, or a binary indicator of presence for Recipes; and (ii) the location of each entity, which should be a text span in the paragraph. A special '?' token denotes an unknown location, and a step 0 is used to identify each entity's initial location before the process begins.
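To make this input/output format concrete, the following minimal Python sketch (not part of the paper; all field names are illustrative) shows how one ProPara-style instance with its gold annotations could be represented, using the fossilization example from Figure 1.

```python
from dataclasses import dataclass
from typing import List, Optional

# Illustrative state inventory for ProPara-style tracking.
STATES = ("not exist", "exist", "move", "create", "destroy")

@dataclass
class EntityTrace:
    """Gold annotations for one entity across all timesteps."""
    entity: str                      # e.g. "bones"
    states: List[str]                # one state per sentence (timesteps 1..T)
    locations: List[Optional[str]]   # location span per timestep 0..T; None stands for '?'

@dataclass
class ProcessInstance:
    """One procedural paragraph with its tracked entities."""
    prompt: str                      # topic of the process
    sentences: List[str]             # the T sentences of the paragraph
    entities: List[EntityTrace]

# Example mirroring Figure 1 (entity "bones" during fossilization).
example = ProcessInstance(
    prompt="fossilization",
    sentences=[
        "An animal dies.",
        "It is buried in a watery environment.",
        "The soft tissues quickly decompose.",
        "The bones are left behind.",
        "Over time, mud and silt accumulate over the bones.",
    ],
    entities=[EntityTrace(
        entity="bones",
        states=["exist", "exist", "exist", "move", "move"],
        locations=["animal", "animal", "animal", "animal",
                   "watery environment", "mud and silt"],  # index 0 = initial location
    )],
)
```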


Figure 2: An overview of the KoaLa model (left) & a detailed illustration of the knowledge-aware reasoning modules (right), focusing on the entity "water". Note that the location prediction modules are applied to each location candidate (root, soil, leaf, etc.) in parallel, and perform classification among candidates at each timestep. Text & knowledge encoders are implemented using BERT. "Decoder" represents either the state decoder or the location decoder. (The example paragraph in the figure reads "Roots absorb water from soil. Water flows to the leaf. CO2 enters the leaf.", and the retrieved triples shown include (water, at location, root), (water, at location, plant), (water, capable of, flow) and (oxygen, part of, water).)

4 MODEL

In this section, we first present the overview of our model. Then, we describe our procedural text understanding model in detail, followed by the proposed knowledge-aware reasoning methods.

4.1 Overview

The base framework of KoaLa is built upon the previous state-of-the-art model NCET [13], as shown in Figure 2. Its major differences from NCET are the use of powerful BERT encoders, the knowledge-aware reasoning modules (Section 4.3) and the multi-stage training procedure (Section 5). Based on an encoder-decoder architecture, the model performs two sub-tasks in parallel: state tracking and location prediction. A text encoder is first used to obtain the contextual representations of the paragraph, and two decoders are then responsible for tracking the state and location changes of the given entity. Commonsense knowledge extracted from ConceptNet is integrated into the decoding process in an attentive manner. The final training objective is to jointly optimize state prediction, location prediction and knowledge selection.

4.2 Framework

We encode the input paragraph using a pre-trained BERT model to obtain the contextual embeddings of each text token. Meanwhile, we extract knowledge triples related to the given entity from ConceptNet. These triples are encoded by another BERT encoder for their representations, which we will elaborate on in Section 4.3.

State Tracking Modules. An entity's state changes are usually indicated by the verbs acting on it, so the decoder input at each timestep pairs the entity representation with a verb representation. If the verb is a phrase or there are multiple verbs in the sentence, we average their embeddings; if the sentence contains no related verb, we use a zero vector instead (Eq. 1).

The state tracking modules include a knowledge injector, a Bi-LSTM state decoder and a conditional random field (CRF) layer. The knowledge injector infuses the extracted ConceptNet knowledge into the decoder input (Section 4.3). The Bi-LSTM decoder acts on the sentence level and models the entity's state at each timestep, capturing the transitions of states on the temporal dimension; in the corresponding equations, a semicolon denotes vector concatenation. Finally, the CRF layer is applied to compute the conditional log likelihood of the ground-truth state sequence (Eq. 3), using the transition potentials between state tags, which are obtained from the CRF's transition score matrix.
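As a rough illustration of the state-tracking modules just described (timestep-level inputs, a sentence-level Bi-LSTM decoder, and a CRF over state tags), here is a hedged PyTorch-style sketch; the module layout, dimensions, tag set and the use of the third-party pytorch-crf package are assumptions rather than the authors' implementation.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party "pytorch-crf" package (assumed available)

NUM_STATES = 6  # illustrative tag set size, e.g. padding/O plus the five entity states

class StateTracker(nn.Module):
    """Sentence-level Bi-LSTM state decoder followed by a CRF layer (sketch)."""
    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.decoder = nn.LSTM(input_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.emission = nn.Linear(2 * hidden_dim, NUM_STATES)
        self.crf = CRF(NUM_STATES, batch_first=True)

    def forward(self, step_inputs, tags=None, mask=None):
        # step_inputs: (batch, T, input_dim), one vector per timestep, e.g. the
        # concatenation of entity, verb and injected-knowledge representations.
        hidden, _ = self.decoder(step_inputs)
        emissions = self.emission(hidden)              # (batch, T, NUM_STATES)
        if tags is not None:                           # training: negative log-likelihood
            return -self.crf(emissions, tags, mask=mask, reduction="mean")
        return self.crf.decode(emissions, mask=mask)   # inference: best state sequence
```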

Location Candidates. Predicting the entity's location equals predicting a text span from the input paragraph. Inspired by [13], we split this objective into two steps: we first extract all possible location candidates from the paragraph, then perform classification on this candidate set. Specifically, we use an off-the-shelf POS tagger [1] to extract all nouns and noun phrases as location candidates. Such heuristics reach an 87% recall of the ground-truth locations on the ProPara test set. We additionally define a learnable vector for the location '?', which acts as a special candidate location.

Location Prediction Modules. Similar to state tracking, for each location candidate we build a timestep-level representation from its mentions in the current sentence; if a candidate is not mentioned there, we replace it with an all-zero vector instead (Eq. 5). Similar to the state tracking modules, the location prediction modules include a knowledge injector and a Bi-LSTM location decoder followed by a linear classifier. The sentence-level Bi-LSTM location decoder simulates the dynamic changes of entity locations on the temporal dimension, and the linear classifier scores each candidate based on the decoder's hidden states. The scores of all location candidates at the same timestep are normalized, and training minimizes the negative log likelihood of the ground-truth locations. At inference time, we perform both sub-tasks, but only predict the entity's location when the model predicts its state as create or move, because other states will not alter the entity's location.
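The candidate extraction step above relies on an off-the-shelf POS tagger to collect nouns and noun phrases. A minimal sketch of such extraction, using spaCy as a stand-in for the tagger cited as [1] (so the exact candidate set will differ from the paper's), might look like this.

```python
import spacy

# Requires a downloaded English pipeline, e.g. `python -m spacy download en_core_web_sm`.
nlp = spacy.load("en_core_web_sm")

def location_candidates(paragraph: str):
    """Collect nouns and noun phrases as location candidates, plus the special '?'."""
    doc = nlp(paragraph)
    candidates = {chunk.text.lower() for chunk in doc.noun_chunks}              # noun phrases
    candidates |= {tok.text.lower() for tok in doc if tok.pos_ in ("NOUN", "PROPN")}  # single nouns
    candidates.add("?")  # special candidate for an unknown location
    return sorted(candidates)

print(location_candidates("Roots absorb water from soil. Water flows to the leaf."))
# e.g. ['?', 'leaf', 'roots', 'soil', 'the leaf', 'water']
```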

Figure 3: Left: the relevance of the retrieved ConceptNet knowledge to the input paragraph. Right: the novelty of the retrieved knowledge when ConceptNet triples provide useful knowledge.

4.3 Knowledge-Aware Reasoning

This section describes the knowledge-aware reasoning modules of KoaLa. We first extract the knowledge triples that are relevant to the given entity and input paragraph. Then, we encode these knowledge triples using a BERT encoder. The model attentively reads the knowledge triples and selects those most relevant to the current context. Additionally, we add a task-specific attention loss to guide the training of the knowledge selection modules.

4.3.1 ConceptNet Knowledge Extraction. As a large relational knowledge base, ConceptNet is composed of numerous concepts and inter-concept relations. Each knowledge piece in ConceptNet can be regarded as a (h, r, t; w) triple, which means head concept h has relation r with tail concept t, and w is its weight in the ConceptNet graph. We match each entity in our task to its corresponding ConceptNet concept and collect the one-hop subgraph of triples around it, i.e., its relations with neighboring concepts. For phrasal entities that contain multiple words, we also consider their component words when matching concepts. Then, we adopt two methods to retrieve relevant triples from this subgraph:

•Exact-match: the neighboring concept appears in the paragraph.

•Fuzzy-match: the neighboring concept is semantically related to the paragraph, measured by the highest embedding similarity between the neighboring concept and any content word in the paragraph.

The detailed retrieval algorithm is presented in Algorithm 1. We set a limit on the number of triples retrieved for each entity. To assess the quality of the retrieved knowledge, we manually evaluate 50 instances from the ProPara dataset. The results are shown in Figure 3. Regarding the relevance of the retrieved knowledge, in

36% of the cases, the knowledge triples provide direct evidence for

predicting the entity's state/location; in another 44% of the cases, the knowledge triples contain relevant knowledge that helps understand the entity and the context; while the retrieved triples have no relationship with the context in only 20% of the cases. Among the first two categories, 75% of the instances can obtain new knowledge that is not indicated in the text paragraph, which verifies the novelty of the retrieved knowledge. These results suggest that the retrieved ConceptNet knowledge is very likely to be helpful from human perspectives.
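To summarize the retrieval procedure of Section 4.3.1 in code form, here is a hedged Python sketch of the exact-match and fuzzy-match steps; the data structures, similarity function, threshold and ranking shown are illustrative assumptions, not a reproduction of Algorithm 1.

```python
def retrieve_triples(entity, paragraph_tokens, conceptnet_neighbors,
                     similarity, sim_threshold=0.75, top_k=10):
    """Sketch of ConceptNet triple retrieval (exact match + fuzzy match).

    conceptnet_neighbors: list of (head, relation, tail, weight) triples from the
    one-hop subgraph around `entity`; `similarity` is an assumed word-embedding
    similarity function returning a value in [0, 1].
    """
    paragraph_tokens = {t.lower() for t in paragraph_tokens}
    exact, fuzzy = [], []
    for head, rel, tail, weight in conceptnet_neighbors:
        neighbor = tail if head == entity else head            # concept on the other end
        if neighbor.lower() in paragraph_tokens:
            exact.append((head, rel, tail, weight))            # exact match: appears in text
        else:
            score = max(similarity(neighbor, tok) for tok in paragraph_tokens)
            if score >= sim_threshold:
                fuzzy.append((head, rel, tail, weight * score))  # fuzzy match: related by embedding
    # Prefer exact matches, then the highest-scoring fuzzy matches, up to top_k triples.
    ranked = sorted(exact, key=lambda x: -x[3]) + sorted(fuzzy, key=lambda x: -x[3])
    return ranked[:top_k]
```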

4.3.2 Attentive Knowledge Infusion. The external knowledge is injected into our model in an attentive manner before the decoders, as shown in the right part of Figure 2. We first encode the ConceptNet triples using BERT. The BERT inputs are formatted as [CLS] head [SEP] relation [SEP] tail [SEP], where the relation is interpreted as a natural language phrase; for instance, (leaf, PartOf, plant) can be transformed to "leaf is a part of plant". Such a formatting scheme converts the original triple into a text sequence while preserving its structural features. In Section 5.2, we will describe the multi-stage training procedure which trains this BERT encoder to better model ConceptNet triples.

Here, "decoder" refers to either the state decoder or the location decoder.

[Algorithm 1: retrieval of relevant ConceptNet triples from the entity's one-hop subgraph via exact match and fuzzy match.]
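For concreteness, the triple verbalization and BERT encoding described above could be sketched as follows with the Hugging Face transformers API; the relation-to-phrase mapping and pooling details here are assumptions for illustration, not the released implementation.

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

# Illustrative mapping from ConceptNet relation names to natural-language phrases.
RELATION_PHRASES = {"AtLocation": "at location", "PartOf": "is a part of",
                    "CapableOf": "capable of", "CreatedBy": "is created by"}

def encode_triple(head: str, relation: str, tail: str) -> torch.Tensor:
    """Encode a triple as [CLS] head [SEP] relation [SEP] tail [SEP] and
    average the non-special token outputs as its representation."""
    rel_text = RELATION_PHRASES.get(relation, relation)
    # "[SEP]" written in the raw text is recognized as the separator token by the tokenizer.
    text = f"{head} [SEP] {rel_text} [SEP] {tail}"
    enc = tokenizer(text, return_tensors="pt")            # adds leading [CLS] and trailing [SEP]
    with torch.no_grad():
        hidden = encoder(**enc).last_hidden_state[0]      # (seq_len, hidden_size)
    ids = enc["input_ids"][0]
    keep = ~torch.isin(ids, torch.tensor(tokenizer.all_special_ids))
    return hidden[keep].mean(dim=0)                       # mean over content tokens only

triple_vec = encode_triple("water", "AtLocation", "root")  # e.g. the (water, AtLocation, root) triple
```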

We use the average of the BERT outputs (excluding the [CLS] and [SEP] tokens) as the representation of a knowledge triple. In order to select the knowledge most relevant to the text paragraph, we use the decoder input as a query to attend over the retrieved ConceptNet triples, i.e., the triples extracted from the entity's one-hop ConceptNet graph. Finally, we equip the decoder with an input gate to select information from the original input and the injected knowledge, where the gate is computed with a sigmoid function. We empirically find that such gated integration of knowledge is beneficial.
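Since the corresponding equations were lost in extraction, the following LaTeX sketch gives one standard formulation consistent with the description above (an attention distribution over the K retrieved triples followed by a sigmoid gate); the symbols and exact parameterization are assumptions, not the paper's equations.

```latex
% Attention over the K retrieved triple representations c_1, ..., c_K,
% queried by the decoder input x_t at timestep t:
\alpha_{t,k} = \frac{\exp\!\big(x_t^{\top} W_a\, c_k\big)}
                    {\sum_{j=1}^{K} \exp\!\big(x_t^{\top} W_a\, c_j\big)},
\qquad
\tilde{c}_t = \sum_{k=1}^{K} \alpha_{t,k}\, c_k

% Gated integration of the original decoder input and the attended knowledge:
g_t = \sigma\!\big(W_g\,[\,x_t \,;\, \tilde{c}_t\,] + b_g\big),
\qquad
\hat{x}_t = g_t \odot x_t + (1 - g_t) \odot \tilde{c}_t
```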

4.3.3 Attention Loss on Knowledge Infusion. Although the attention mechanism can help the model attend to knowledge relevant to the context, it is still challenging in some cases to find the triple most useful to the prediction target (i.e., the target state and location of the entity). In order to assist the model in learning the dependency between the prediction target and the knowledge triples, we use an attention loss as explicit guidance. We heuristically label a subset of knowledge triples that are relevant to the prediction target, and guide the model to attend more to these labeled triples. A triple is labeled as relevant if:

•the ground-truth location of the current movement/creation is mentioned in the triple, consistent with the inference rule by which we only predict a new location when the expected state is move or create; or

•the triple includes a verb that usually indicates the occurrence of state changes.

Statistically, on the ProPara dataset, 61% of the data instances have at least one knowledge triple labeled as "relevant". On the triple level, 18% of the knowledge triples are labeled as "relevant" at least once. These figures verify the trainability of the attention loss, since its effect covers a considerable number of training instances. The training objective is to minimize the attention loss, which amounts to maximizing the attention weights of all "relevant" triples. The model is thus expected to better identify the relevance between ConceptNet knowledge and the prediction target during inference. Finally, the overall loss function is computed as a weighted sum of the three sub-task losses (state tracking, location prediction and knowledge attention), with weights balancing the corresponding sub-tasks in model optimization.
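A similarly hedged LaTeX sketch of the attention loss and overall objective described above (the notation and weighting scheme are assumed, not taken verbatim from the paper):

```latex
% Attention loss: encourage attention mass on the heuristically labeled
% set R_t of "relevant" triples at timestep t.
\mathcal{L}_{\mathrm{attn}} = -\sum_{t} \log \sum_{k \in R_t} \alpha_{t,k}

% Overall objective: weighted sum of the state, location and attention losses.
\mathcal{L} = \mathcal{L}_{\mathrm{state}}
            + \lambda_{1}\,\mathcal{L}_{\mathrm{loc}}
            + \lambda_{2}\,\mathcal{L}_{\mathrm{attn}}
```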

5 MULTI-STAGE TRAINING

5.1 Multi-Stage Training on Wikipedia

As mentioned in Section 1, we seek to collect additional procedural text documents from Wikipedia to remedy data insufficiency. Due to the high cost of human annotation and the unreliability of machine-annotated labels, we adopt self-supervised methods to incorporate Wiki paragraphs into the training procedure of the text encoder. Inspired by the strong performance of pre-trained BERT models on both open-domain [9] and in-domain data [24,30], we adopt a multi-stage training schema for the text encoder in our model. Specifically, given the original pre-trained BERT model, we utilize the following training procedure:

Figure 4: Four instances created from the triple (rain, CreatedBy, rain_clouds) in LM fine-tuning on ConceptNet:
[CLS] [MASK] [SEP] is created by [SEP] rain clouds [SEP]
[CLS] rain [SEP] is created by [SEP] [MASK] clouds [SEP]
[CLS] rain [SEP] is created by [SEP] rain [MASK] [SEP]
[CLS] rain [SEP] [MASK] [MASK] [MASK] [SEP] rain clouds [SEP]

1. The BERT text encoder is first fine-tuned (LM fine-tuning) on a procedural text corpus collected from Wikipedia. The training is based on a modified masked language modeling (MLM) objective.

2. The full KoaLa model, including the BERT encoder, is further fine-tuned on the target ProPara or Recipes dataset.

To build this corpus, for each paragraph in the target dataset, we split Wiki documents into paragraphs and use DrQA's TF-IDF ranker [5] to retrieve the top 50 Wiki paragraphs, with queries simulating the writing style of procedural text. By fine-tuning on a larger corpus of procedural text, we expect the BERT encoder to learn to better encode such documents. Then, we fine-tune the vanilla BERT on these Wiki paragraphs. In KoaLa, contextual representations of entities, verbs and locations ...
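As a concrete illustration of the ConceptNet LM fine-tuning instances shown in Figure 4, here is a small Python sketch that generates such masked variants from a verbalized triple; any generalization beyond the four instances shown in the figure (e.g., handling multi-word heads) is an assumption.

```python
def triple_mlm_instances(head: str, relation_phrase: str, tail: str):
    """Create masked-LM inputs from a verbalized triple, mirroring Figure 4:
    mask the head, each tail word in turn, and the whole relation phrase."""
    instances = [f"[CLS] [MASK] [SEP] {relation_phrase} [SEP] {tail} [SEP]"]   # mask the head
    tail_words = tail.split()
    for i in range(len(tail_words)):                                           # mask one tail word at a time
        masked_tail = " ".join("[MASK]" if j == i else w for j, w in enumerate(tail_words))
        instances.append(f"[CLS] {head} [SEP] {relation_phrase} [SEP] {masked_tail} [SEP]")
    masked_rel = " ".join("[MASK]" for _ in relation_phrase.split())           # mask the relation phrase
    instances.append(f"[CLS] {head} [SEP] {masked_rel} [SEP] {tail} [SEP]")
    return instances

# Reproduces the four instances of Figure 4 for (rain, CreatedBy, rain clouds):
for line in triple_mlm_instances("rain", "is created by", "rain clouds"):
    print(line)
```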