An Approach for Supplementing the Korean Wikipedia based on DBpedia

Eun-kyung Kim, DongHyun Choi, Jihye Lee, JinHyun Ahn, and Key-Sun Choi

Semantic Web Research Center, CS Department, KAIST, Korea, 305-701
{kekeeo, cdh4696, jhlee20, jhahn, kschoi}@world.kaist.ac.kr

Abstract. In this paper, we aim to supplement the knowledge base of an information-poor language, the Korean Wikipedia, by enriching it with information written in other languages. We propose an approach for translating infoboxes that enables Wikipedia content to be carried over from English to Korean.

1 Introduction

Wikipedia is a Web-based, free-content encyclopedia community that has grown rapidly into one of the largest reference websites. It is a multilingual project with more than 14,000,000 articles in more than 260 languages¹. However, Wikipedia still lacks sufficient support for non-Latin languages. For example, the English Wikipedia currently contains 3,227,911 articles, whereas the Korean Wikipedia contains only 130,629. In addition, smaller language editions cannot produce articles as fast as larger Wikipedias such as English or German, because they have too few editors and users. Because of this gap in article counts between the English and the non-Latin-language Wikipedias, the smaller editions need to be supplemented automatically. Our approach has two key features: (1) translating English infoboxes into Korean infoboxes, an essential first step toward a supplementation system; and (2) constructing an ontology schema based on infoboxes, which can improve the generation of new templates.

2 Korean DBpedia/Wikipedia Supplementation using Translation and Ontology

Most Wikipedia pages contain an infobox, which summarizes the most relevant information about a given concept. We focus mainly on translating infoboxes, because translation is often useful for spreading information between closely related articles in different languages. Dictionary-based translation is easy to set up and requires only access to a bilingual resource. We use bilingual word pairs for English-to-Korean translation that were originally created from interlanguage links [1].

¹ http://stats.wikimedia.org/

DBpedia [2] is a community project that harvests the information contained in infoboxes. Its infobox extraction algorithm detects infobox templates, recognizes their structure, and saves the extracted data as RDF triples. We translate the English DBpedia into Korean. The resulting datasets compare as follows:

- English triples in DBpedia: 43,974,018
- Korean dataset (existing triples / translated triples): 354,867 / 12,915,169

Translation thus yields over 30 times as many Korean triples as currently exist (12,915,169 / 354,867 ≈ 36.4). However, many of the translated triples have no predefined template in Korean, so a template schema is needed to organize the fine-grained template structure. We therefore built a template ontology, OntoCloud², from DBpedia and Wikipedia, released in September 2009, to build the template structure efficiently. Its construction consists of the following steps:

(1) extracting the templates of DBpedia as concepts in an ontology, for example, Template:Infobox_Person;
(2) extracting the attributes of these templates, for example, name of Person, and mapping these attributes to properties in the ontology;
(3) constructing the concept hierarchy by set inclusion of attributes. For example, Book is a subclass of Bookseries, because every attribute of Bookseries also belongs to Book:

Bookseries = {name, title_orig, translator, image, image_caption, author, illustrator, cover_artist, country, language, genre, publisher, media_type, pub_date, english_pub_date, preceded_by, followed_by}
Book = {name, title_orig, translator, image, image_caption, author, illustrator, cover_artist, country, language, genre, publisher, media_type, pub_date, english_pub_date, preceded_by, followed_by, pages, isbn, oclc, dewey, congress}

This means that Bookseries is the more general concept. The ontology-building process is also useful for aligning similar types of templates, which can then be grouped into classes; for example, Template:infobox_baseball_player and Template:infobox_asian_baseball_player both describe baseball players. Moreover, differently formatted properties with the same meaning can be normalized; for example, 'birthplace', 'birthplace and age', and 'place birth' are all mapped to 'birthPlace'. OntoCloud v0.2 currently has 1,927 classes, 74 object properties, and 101 data properties. As future work, we will consolidate the ontology schema and then generate new articles using OntoCloud and the infobox triples. Because these automatically generated articles are based on infoboxes, they can be treated as summaries.
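The translation and ontology-building steps described above can be illustrated with a minimal Python sketch. It is not the authors' OntoCloud implementation: the dictionary entries, the alias table, and every function name (translate_triple, normalize_property, infer_subclasses) are hypothetical, and subclass inference is shown as a plain proper-subset test over attribute sets, without the transitive reduction or noise handling a real ontology builder would probably need.

```python
# Minimal sketch of dictionary-based triple translation, property
# normalization, and subclass inference by attribute set inclusion.
# All names and data here are hypothetical illustrations, not OntoCloud code.
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]

# Hypothetical English-to-Korean word pairs (assumed to be harvested from
# interlanguage links); a real dictionary would be far larger.
BILINGUAL_DICT: Dict[str, str] = {
    "Seoul": "서울",
    "South Korea": "대한민국",
}

# Hypothetical alias table: variant attribute names -> one normalized property.
PROPERTY_ALIASES: Dict[str, str] = {
    "birthplace": "birthPlace",
    "birthplace and age": "birthPlace",
    "place birth": "birthPlace",
}


def translate_triple(triple: Triple, dictionary: Dict[str, str]) -> Triple:
    """Replace subject and object with their Korean counterparts when the
    dictionary has one; unknown terms are left unchanged. The predicate is
    kept as-is here (mapping it to a Korean attribute is a separate step)."""
    subj, pred, obj = triple
    return (dictionary.get(subj, subj), pred, dictionary.get(obj, obj))


def normalize_property(attribute: str) -> str:
    """Return the normalized property for a known variant, else the input."""
    return PROPERTY_ALIASES.get(attribute.strip().lower(), attribute)


def infer_subclasses(templates: Dict[str, Set[str]]) -> List[Tuple[str, str]]:
    """Return sorted (subclass, superclass) pairs: template A is a subclass
    of template B when B's attributes form a proper subset of A's.
    Transitive pairs are kept; no transitive reduction is performed."""
    pairs = []
    for sub, sub_attrs in templates.items():
        for sup, sup_attrs in templates.items():
            # A proper subset test is False for identical sets, so a
            # template is never paired with itself.
            if sup_attrs < sub_attrs:
                pairs.append((sub, sup))
    return sorted(pairs)


if __name__ == "__main__":
    bookseries = {
        "name", "title_orig", "translator", "image", "image_caption",
        "author", "illustrator", "cover_artist", "country", "language",
        "genre", "publisher", "media_type", "pub_date", "english_pub_date",
        "preceded_by", "followed_by",
    }
    book = bookseries | {"pages", "isbn", "oclc", "dewey", "congress"}
    print(infer_subclasses({"Bookseries": bookseries, "Book": book}))
    print(normalize_property("birthplace and age"))
    print(translate_triple(("Seoul", "country", "South Korea"), BILINGUAL_DICT))
```

Running the script prints [('Book', 'Bookseries')], then birthPlace, then ('서울', 'country', '대한민국'), matching the Book/Bookseries and birthPlace examples in the text. The pairwise comparison is quadratic in the number of templates; for the 1,927 classes of OntoCloud v0.2 that is about 3.7 million set comparisons, which is still manageable for a one-off build.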

References

1. E. Adar, M. Skinner, and D. S. Weld, Information arbitrage across multi-lingual Wikipedia. ACM, 2009, pp. 94-103.

2. S. Auer, C. Bizer, G. Kobilarov, J. Lehmann, R. Cyganiak, and Z. Ives, DBpedia: A Nucleus for a Web of Open Data. ISWC+ASWC 2007, November 2007, pp. 722-735.

² http://swrc.kaist.ac.kr/OntoCloud
