GRAMMAR RULE BASED
CROSS LANGUAGE INFORMATION RETRIEVAL
FOR TELUGU
A THESIS
Submitted by
DINESH MAVALURU
Under the guidance of
Dr. R. SHRIRAM
in partial fulfillment for the award of the degree of
DOCTOR OF PHILOSOPHY
in
COMPUTER SCIENCE
B.S.ABDUR RAHMAN UNIVERSITY
(B.S.ABDUR RAHMAN INSTITUTE OF SCIENCE & TECHNOLOGY)
(Estd. u/s 3 of the UGC Act. 1956)
www.bsauniv.ac.in
APRIL 2014
CERTIFICATE
This is to certify that all corrections and suggestions pointed out by theRule Based Cross Language Information Retrieval f
Mr. Dinesh Mavaluru.
(Dr. R. Shriram)
SUPERVISOR
Place: Chennai
Date: 04 July 2014
BONAFIDE CERTIFICATE
Certified that this thesis GRAMMAR RULE BASED CROSS LANGUAGE INFORMATION RETRIEVAL FOR TELUGU is the bonafide work of DINESH MAVALURU (RRN: 1194207), who carried out the thesis work under my supervision. Certified further, that to the best of my knowledge the work reported herein does not form part of any other thesis or dissertation on the basis of which a degree or award was conferred on an earlier occasion on this or any other candidate.
Dr. R. SHRIRAM
RESEARCH SUPERVISOR
Professor
Department of CSE
B.S. Abdur Rahman University
Vandalur, Chennai - 600 048
Dr. P. SHEIK ABDUL KHADER
HEAD OF THE DEPARTMENT
Professor & Head
Department of CA
B.S. Abdur Rahman University
Vandalur, Chennai - 600 048
ACKNOWLEDGEMENT
At the outset, I thank the Almighty, whose unbounded blessings and love have helped me in pursuing this research work.
I have always admired my adviser, Prof. R. Shriram, whose ideals had a big influence on me and changed the way I perceive this world. I am one of the fortunate students to have my name on his list of students. Without his support, I could not have imagined starting a research career. His generosity gave me the freedom to enjoy all the privileges. I remain indebted to him and his family members all my life, and a mere thank you is not sufficient.
I am greatly obliged to the members of my doctoral committee, Dr. A. Kannan, Professor, Department of Information Science and Technology, Anna University, Chennai, Dr. T. R. Rangaswamy, Professor, Department of Electronics and Instrumentation Engineering, B S Abdur Rahman University, Chennai, and Dr. P. Sheik Abdul Khader, Professor and Head, Department of Computer Applications, B S Abdur Rahman University, Chennai, for their guidance, valuable suggestions, continuous encouragement and critical reviews during the tenure of this research work.
I would like to express my most sincere gratitude to the members of my review committee, Dr. V. Sankaranarayanan and Dr. K. M. Mehata, who have influenced me greatly, and from whom I had the chance to learn throughout my research work through their valuable suggestions and guidance despite their tight schedules.
I owe my sincere thanks to Prof. V. Saravanan, Computer Sciences and Information Technology College, Majmaah University, Majmaah, Kingdom of Saudi Arabia, who made me realize the best in me and also taught me how to do research.
I am immensely grateful to the faculty members of the Department of Computer Applications, and to the Management and Administration of B S Abdur Rahman University, Chennai, for providing all the facilities to complete my research work successfully.
I would like to thank all my dear colleagues, in particular Shakthi Priyan, A. Venkat Narayanan, P. Kumaran, T. Nadana Ravi Shankar, V. K. Mohan Raj, B. Manikandan, S. Sumitra, P. Thiripurasundari and D. M. Ahamed Kabeer Bhadhusha, for their constant support during my research work.
My wholehearted thanks go to my family, Mrs. Gnanamani and my beloved G. Sonia, who motivated me to be strong and bold, helped me bring out my best from the beginning to the end of this research work, encouraged me to move on with my future goals, and helped me realize the importance of many things in my life.
Finally, I would like to acknowledge my friends D. Shyam Kiran, Amaresh and many others who have stood by me during my bad and good times. Without you all I am nowhere.
ABSTRACT
The rapid spread of the World Wide Web and improvements in information retrieval (IR) techniques have allowed people to access huge amounts of information. However, the majority of web content is in English. While content in languages like Telugu and Tamil is growing every day, a huge gap remains. This research work addresses that gap. In general information retrieval systems, relevant information is retrieved for a user query only if it is available in the query language. For example, a Telugu search engine retrieves only content in Telugu; it does not consider relevant information available in other languages for the same query. Cross Language Information Retrieval (CLIR) systems seek to overcome this gap. A CLIR system retrieves information written in a language different from the query language. The goal of this research work is to develop a new framework for Telugu - English Cross Language Information Retrieval using language grammar rules. The major challenges addressed are query ambiguity and the linguistic differences between the query language and the content language.
The steps in this research are as follows:
a) The user query is tokenized into keywords using a tokenizer. The language grammar rules are applied to the tokenized query terms to identify the subject, verb, object and inflection in the tokenized keywords.
b) The query processor searches the ontology for the English equivalents of the terms identified using the language grammar rules. Terms that are not available in the ontology are considered Out-Of-Vocabulary (OOV) terms and are literally transliterated into English.
c) The parser finds the subject, verb and object in English to assemble the query in English. Once query processing is complete and the query has been converted into English, the converted query is given to the search engine to retrieve relevant results.
d) The retrieved results are given to the post-processor, which converts them into Telugu. For this, the ontology that maps Telugu words to English words is used, and the previous stages are repeated until the results are converted into the target language representation.
The grammar rule based approach is a semantic way of approaching the IR problem: first finding the meaning of the query, then mapping the user query to the target language, finding relevant information in the target language, mapping it back to the source language, and displaying it to the user. This research work also evaluates the user acceptance of CLIR for Telugu using various metrics.
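Steps (a) to (c) can be illustrated with a minimal Python sketch. It is not the implementation described in this thesis: the three-word query, the two-entry `LEXICON` (a plain dictionary standing in for the bilingual ontology), the positional subject-object-verb rule and the tiny transliteration table are all hypothetical simplifications chosen for illustration, and inflection handling is omitted entirely.

```python
# Hypothetical stand-in for the bilingual ontology: Telugu term -> English term.
LEXICON = {
    "పుస్తకం": "book",
    "చదివాడు": "read",
}

# Tiny, incomplete transliteration table (an assumption, not a standard scheme).
CONSONANTS = {"ర": "r", "మ": "m", "గ": "g", "ల": "l"}
VOWEL_SIGNS = {"ా": "aa", "ు": "u", "ూ": "uu"}
VIRAMA = "్"  # suppresses the inherent vowel of the preceding consonant

SAMPLE_QUERY = "రాము పుస్తకం చదివాడు"  # subject, object, verb


def transliterate(word: str) -> str:
    """Romanize an out-of-vocabulary Telugu word character by character."""
    out = []
    inherent = False  # True while the last consonant still carries its inherent "a"
    for ch in word:
        if ch in CONSONANTS:
            if inherent:
                out.append("a")
            out.append(CONSONANTS[ch])
            inherent = True
        elif ch in VOWEL_SIGNS and inherent:
            out.append(VOWEL_SIGNS[ch])  # the vowel sign replaces the inherent "a"
            inherent = False
        elif ch == VIRAMA and inherent:
            inherent = False  # drop the inherent "a"
        else:
            if inherent:
                out.append("a")
                inherent = False
            out.append(ch)  # unknown characters pass through unchanged
    if inherent:
        out.append("a")
    return "".join(out)


def translate_query(query: str) -> str:
    """Tokenize, reorder SOV to SVO, then look up or transliterate each term."""
    tokens = query.split()  # step (a): whitespace tokenizer
    if len(tokens) >= 3:
        # Simplified grammar rule: first token is the subject, last is the verb.
        subject, *objects, verb = tokens
        ordered = [subject, verb, *objects]  # step (c): English word order
    else:
        ordered = tokens
    # Step (b): lexicon lookup, with transliteration as the OOV fallback.
    return " ".join(LEXICON.get(t) or transliterate(t) for t in ordered)


print(translate_query(SAMPLE_QUERY))
```

In this sketch the proper noun in the sample query is absent from the lexicon, so it takes the transliteration path, while the other two terms are resolved by lookup. A real system would replace the positional rule with the language grammar rules and the dictionary with the ontology.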
TABLE OF CONTENTS
CHAPTER NO. TITLE PAGE NO.
ABSTRACT V
LIST OF TABLES XI
LIST OF FIGURES XII
1. INTRODUCTION 1
1.1 General Introduction 1
1.2 Objectives 3
1.3 Contribution of The Work 4
1.4 Thesis Outline 5
2. LITERATURE REVIEW 7
2.1 Introduction 7
2.2 Information Retrieval 7
2.2.1 Retrieval Models 9
2.2.2 Improving Information Retrieval 14
2.3 Cross Language Information Retrieval 17
2.3.1 Non-Translation Approaches 17
2.3.2 Translation-Based Approaches 18
2.3.3 Challenges in CLIR 20
2.3.4 Current Approaches 21
2.4 Information Retrieval In The Telugu Language 24
2.4.1 Difficulties of Information Retrieval in Telugu 24
2.4.2 Monolingual IR in Telugu 25
2.4.3 CLIR and Telugu 26
2.5 Conclusion 28
3. PROPOSED FRAMEWORK FOR TELUGU CROSS LANGUAGE INFORMATION RETRIEVAL 30
3.1 Introduction 30
3.2 Methodology of Proposed Framework 30
3.3 Proposed Framework System 32
3.3.1 Pre-Processing 32
3.3.2 Post-Processing 34
3.4 Conclusion 37
4. PREPROCESSING 38
4.1 Introduction 38
4.2 Methodology of Proposed Pre-Processing 38
4.2.1 Tokenizer 39
4.2.2 Language Grammar Rules 41
4.2.3 Bilingual Ontology 51
4.2.4 OOV Component 54
4.3 Conclusion 58
5. POSTPROCESSING 59
5.1 Introduction 59
5.2 Methodology of Proposed Post-Processing 59
5.2.1 Tokenizer 60
5.2.2 Language Grammar Rules 61
5.2.3 Re-ranking System 61
5.2.4 Smoothening Approach 63
5.3 Conclusion 67
6. FRAMEWORK IMPLEMENTATION AND RESULTS 68
6.1 Introduction 68
6.2 Approaches For Evaluating Information Retrieval 68
6.3 Test Collection 69
6.4 Evaluation of Results 69
6.4.1 Mean Average Precision 70
6.5 Experimental Framework And Toolkit 70
6.6 Experimental Settings For Pre-Processing 71
6.7 Experimental Settings For Post-Processing 73
6.8 Testing and Results 73
6.9 Conclusion 84
7. EVALUATING USER ACCEPTANCE OF CLIR USING LANGUAGE GRAMMAR RULES 85
7.1 Introduction 85
7.2 Technology Acceptance Model (TAM) 85
7.3 Research Model And Hypotheses 87
7.3.1 CLIR System ease of use 87
7.3.2 CLIR System usefulness 87
7.3.3 Attitude towards using a CLIR System 87
7.3.4 Behavioral intentions for using a CLIR System 88
7.4 Research Methodology 88
7.5 Data Analysis and Results 92
7.6 Conclusion 97
8. CONCLUSION 98
REFERENCES 100
LIST OF TABLES
TABLE NO. TITLE PAGE NO.
4.1 Sample Telugu Sentence Order 43
4.2 Post positions for Telugu sentence order 45
4.3 Finite Verb Rules 47
4.4 Non-Finite Verb Rules 50
6.1 Relative Retrieval Efficiency 77
6.2 Time taken for Query processing in the Existing and proposed systems 80
6.3 Precision Percentages For Retrieved Results In Existing And Proposed Systems 81
6.4 Precision for Results 82
6.5 Weighted Precision 83
7.1 Profile of the system users 89
7.2 Instrument Reliability And Validity 94
7.3 Model fit summary for the final measurement and structural model 96
7.4 The contribution of the study to existing knowledge 96
LIST OF FIGURES
FIGURE NO. TITLE PAGE NO.
1.1 Techniques used in CLIR 2
2.1 Workflow of Information Retrieval 7
3.1 Overall process of CLIR for Telugu 30
3.2 Components for the Proposed System 31
3.3 Framework for the Proposed System 32
3.4 Retrieved Results before display 36
3.5 Retrieved results after display 37
4.1 overall process of query pre-processing 38
4.2 Tokenization component 39
4.3 Tokenizer process 39
4.4 Simple Telugu sentence tokenization 40
4.5 Tokenizer example 40
4.6 Tokenizer example for special expressions 41
4.7 Language Grammar rules component 42
4.8 Grammar rules component process 43
4.9 Ontology Component 52
4.10 Process flow of bilingual ontology component 52
4.11 Ontology Relationship Hierarchies 53
4.12 Sample ontology structure 54
4.13 Out of Vocabulary Component 55
4.14 Flow Chart for the Pre-Processing stage 57
5.1 Overall process of post-processing 59
5.2 Tokenizer process 60
5.3 Process Flow of system 61
5.4 Term frequency for the query terms relationship 62
5.5 Sample term frequency 63
5.6 Results retrieved related to the query 65
5.7 Final Results to the user for given query 65
5.8 Flow Chart for the Post-Processing stage 66
6.1 Step by Step Process of the System 74
existing system 76
proposed system 76
existing system 78
proposed system 79