[PDF] One-Unambiguous Regular Languages

In Section 2, after giving the basic definitions, we show that a regular expression E is 1-unambiguous if and only if GE is a deterministic finite-state automaton ( DFA)

A language A is regular if and only if there exists an NFA M such that L(M) = A Proof The forward direction is trivial, since A regular means there is a DFA that recognizes it, and a DFA can be seen as an NFA rather immediately Assume that A is a language that is recognized by an NFA M = (Q,Σ,∆,q0,F)

[PDF] Homework 4 - NJIT

is regular if and only if it is recognized by an NFA (Corollary 1 20) Note that the DFA M recognizes the language B, the complement of B Since B is recognized

[PDF] Automata Theory and Languages

Automata Theory, Languages and Computation - Mırian Halfeld-Ferrari – p There are only two symbols (0 and 1) in the string 01101, but 5 positions for The constants ǫ and ∅ are regular expressions, denoting the language {ǫ} and ∅,

[PDF] Regular and Nonregular Languages

The only way to generate/accept an infinite language with a finite description is to use: Kleene star (in regular expressions), or cycles (in automata) This forces

[PDF] Regular Languages - Jeff Erickson

Regular Languages Languages A formal language (or just a language) is a set of strings over some finite alphabet Σ, or equivalently, an arbitrary subset of Σ∗

[PDF] Languages That Are and Are Not Regular

(1) There are a countably infinite number of regular languages The only way to generate/accept an infinite language with a finite description is to use Kleene

[PDF] Regular Languages Notes - Department of Computer Science at the

4 fév 2010 · The proofs we do in cs3102 will involve only a few main types of argument: • Proof by Construction — If you can express the statement you are

[PDF] Non-regular languages and the pumping lemma - MIT

– Recall, a language is any (finite or infinite) set of (finite) strings – It turns out that there are many more sets of finite strings than there are DFAs; so just based on

[PDF] Regular and Context-Free Languages* - CORE

regular expressions and context-free grammars, concentrating on the relationship between A language L _C Z* is said to be bounded if and only if there

[PDF] One-Unambiguous Regular Languages - CORE

In Section 2, after giving the basic definitions, we show that a regular expression E is 1-unambiguous if and only if GE is a deterministic finite-state automaton ( DFA)

[PDF] a large scale attribute dataset for zero shot learning

[PDF] a level chemistry aqa mechanism questions

[PDF] a level chemistry calculations pdf

[PDF] a level computer science notes

[PDF] a level computer science online revision

[PDF] a level computer science past papers

[PDF] a level computer science revision

[PDF] a level computer science syllabus

[PDF] a level english language course

[PDF] a level english language model answers

[PDF] a level english language paper 1 model answers

[PDF] a level english language paper 1 question 3 example

[PDF] a level english language paper 2 model answers

[PDF] a level english language syllabus 2019

[PDF] a level english past papers

File: DISTIL 268801 . By:DS . Date:23:01:98 . Time:13:02 LOP8M. V8.B. Page 01:01 Codes: 4035 Signs: 2189 . Length: 58 pic 2 pts, 245 mm

Information and ComputationIC2688

Information and Computation140, 229253 (1998)

One-Unambiguous Regular Languages

Anne Bru ggemann-Klein*

Institut fu r Informatik,Technische Universita t Mu nchen,

Arcisstrasse21,80290Mu nchen,Germany

E-mail: brueggeminformatik.tu-muenchen.de

and

Derick Wood

Department of Computer Science,Hong Kong University of ScienceHTechnology,

Clear Water Bay,Kowloon,Hong Kong

E-mail: dwood

cs.ust.hk The ISO standard for the Standard Generalized Markup Language (SGML) provides a syntactic meta-language for the definition of textual markup systems. In the standard, the right-hand sides of productions are based on regular expressions, although only regular expressions that denote words unambiguously, in the sense of the ISO standard, are allowed. In general, a word that is denoted by a regular expression is witnessed by a sequence of occurrences of symbols in the regular expres- sion that match the word. In an unambiguous regular expression as defined by Booket al. (1971,IEEE Trans.Comput.C-20(2), 149153), each word has at most one witness. But the SGML standard also requires that a witness be computed incrementally from the word with a one- symbol lookahead; we call such regular expressions 1-unambiguous. A regular language is a 1-unambiguous languageif it is denoted by some 1-unambiguous regular expression. We give a Kleene theorem for

1-unambiguous languages and characterize 1-unambiguous regular

languages in terms of structural properties of the minimal deterministic automata that recognize them. As a result we are able to prove the decidability of whether a given regular expression denotes a 1-unam- biguous language; if it does, then we can construct an equivalent

1-unambiguous regular expression in worst-case optimal time.

]1998

Academic Press

Article No. IC972688

229

0890-54019825.00

Copyright1998 by Academic Press

All rights of reproduction in any form reserved.

* WWW: http:www11.informatik.tu-muechen.de. The work of the second author was supported under a Natural Sciences and Engineering Research Council of Canada Grant and under an Information Technology Research Centre of Ontario Grant.

WWW: http:www.cs.ust.hk

dwood.brought to you by COREView metadata, citation and similar papers at core.ac.ukprovided by Elsevier - Publisher Connector

File: DISTIL 268802 . By:DS . Date:23:01:98 . Time:13:02 LOP8M. V8.B. Page 01:01 Codes: 3817 Signs: 3242 . Length: 52 pic 10 pts, 222 mm

1. INTRODUCTION

Document processing systems such as editors, formatters, and retrieval systems deal with many different types of documents, such as books, articles, memoranda, dictionaries, and letters. The Standard Generalized Markup Language (SGML) establishes a common platform for the syntactic specification of document types and conforming documents [ISO86, Gol90]. SGML is an ISO standard that has been endorsed by a number of publishing houses throughout North America and Europe, by the European Community, and by the U.S. Department of Defense. Document types in SGML are defined, essentially, by bracketed, extended context-free grammars [GH67, Woo87]. The right-hand sides of productions, calledmodel groups, are essentially regular expressions with two major differences. First, model groups allow three new operators ?,6, and . Second, the model groups must beunambiguousin the sense of Clause 11.2.4.3 of the standard. The intent of the standard is to make it easier for a human to write regular expressions that can be interpreted unambiguously. The notion of unambiguity for model groups used in the ISO standard differs from, but is related to, unambiguity as defined by Booket al. [BEGO71] for regular expressions. Eilenberg [Eil74] defined ambiguity for finite-state automata, rather than for regular expressions; we discuss the relationship of his notion to the one of Booket al. at the end of Section 2. Booket al. required that each words be witnessed by at most one sequence of positions of symbols in the regular expression that matches the word. For example, consider the regular expression (a+b)*aa*. If we mark different positions of the same symbol with subscripts, we get (a 1 +b 1 )*a 2 a 3 *; now, there are three witnesses for the wordaaa, namelya 1 a 1 a 2 ,a 1 a 2 a 3 , anda 2 a 3 a 3 . Thus, (a+b)*aa* is ambiguous; however, there is an unambiguous regular expression that denotes the same language, namely (a+b)*a. Unambiguity as defined in SGML is a one-symbol-lookahead version of unam- biguity as defined by Booket al. In Clause 11.2.4.3 of the SGML standard a model group (regular expression) is defined to be unambiguous if 1 ``an element [a symbol]... that occurs in the document instance [word] must be able to satisfy only one primitive content token [position of the symbol in the regular expression] without looking ahead in the document instance.'' In other words, the only valid regular expressions are those that permit us to determine uniquely which position of a symbol in a regular expression should match a symbol in an input word without looking beyond that symbol in the input word. We call such regular expressions 1-unambiguous. Consider the regular expression (a+b)*amarked as (a 1 +b 1 )*a 2 . In the word baa, after we match symbolbwith positionb 1 , we cannot decide whether we should match the subsequentain the word with positiona 1 or with positiona 2 without looking ahead beyond the current symbolain the word. Therefore, although (a+b)*ais unambiguous in the sense of Booket al., it is not 1-unambiguous; however,b*a(b*a)* is a 1-unambiguous regular expression that denotes the same language as (a+b)*a.230

BRU GGEMANN-KLEIN AND WOOD

The phrases in brackets are our interpretations

File: DISTIL 268803 . By:DS . Date:23:01:98 . Time:13:02 LOP8M. V8.B. Page 01:01 Codes: 3789 Signs: 3387 . Length: 52 pic 10 pts, 222 mm For unambiguous regular expressions in the sense of Booket al. [BEGO71], the following results are known: Booket al. gave a construction that, for each regular expressionE, gives a nondeterministic finite-state automaton (NFA)G E that recognizes the language ofE. They show thatEis unambiguous if and only ifG E is unambiguous. Berry and Sethi [BS86] showed that this NFA is the canonical representation of the corresponding regular expression, because it has a natural connection with the derivatives [Brz64] of the regular expression. Regular expressions are built with the usual operators +, }, and *. SGML, however, deals with model groups that may also contain the operators ?,6, and (E? denotes L(E+=),F6Gdenotes L(FG+GF), andE denotes L(EE*).)

Whereas the transformations ofE

intoEE* and ofF6GintoFG+GFpreserve languages, they do not preserve 1-unambiguity. For example, (a*+b) is 1-unam- biguous, but (a*+b)(a*+b)* is not. Similarly,a?6bis 1-unambiguous, but a?b+ba? is not. In fact, there are languages that can be denoted by a 1-unam- biguous model group but not by any 1-unambiguous regular expression [BK93a]. Furthermore, 1-unambiguous model groups are exponentially more succinct than are 1-unambiguous regular expressions. For example, the smallest 1-unambiguous regular expression equivalent to the 1-unambiguous model groupa 1

6}}}6a

n has size exponential inn. We establish the basic results for 1-unambiguous regular expressions and languages which are the basis for the results by Ahonen [AH97] for transforming an ambiguous model group into an unambiguous one by generalizing the language of the model group, the decidability results by Bru ggemannKlein [BK93a] for model groups, and the results for SGML exceptions by Kilpela inen and Wood [KILW97]. In Section 2, after giving the basic definitions, we show that a regular expressionEis 1-unambiguous if and only ifG E is a deterministic finite-state automaton (DFA). Thus, one can decide, in time linear in the size of a regular expressionE, whetherEis 1-unambiguous, and if it is, then one can also construct the deterministic automatonG E in linear time [BK92a, BK93c]. We establish, in Section 3, that the family of 1-unambiguous languages is closed under derivatives and, in Section 4, we establish a Kleene characterization of the family of 1-unambiguous languages. An analogous Kleene characterization for model groups still eludes us. In Section 5, we present the main result of the papera characterization of the 1-unambiguous languages in terms of their minimal deterministic finite- state automata. As one application of this result we prove that the family of

1-unambiguous languages forms a proper subfamily of the regular languages,

in contrast to the result of Booket al. that each regular language is denoted by some unambiguous regular expression. The characterization yields a decision algorithm for 1-unambiguous languages. Moreover, if a regular language is

1-unambiguous, then we can construct an equivalent 1-unambiguous regular ex-

pression. The decision algorithm runs in time quadratic in the size of the minimal deterministic finite-state automaton for the given language. The 1-unambiguous regular expression that we construct from its minimal deterministic finite-state automatonMcan have size exponential in the size ofM, which is worst-case optimal.231

ONE-UNAMBIGUOUS REGULAR LANGUAGES

File: DISTIL 268804 . By:DS . Date:23:01:98 . Time:13:02 LOP8M. V8.B. Page 01:01 Codes: 3575 Signs: 2893 . Length: 52 pic 10 pts, 222 mm

2. UNAMBIGUOUS REGULAR EXPRESSIONS

Let7be an alphabet of symbols. Regular expressions over7are built from=,<, and symbols in7using the binary operators + and } and the unary operator *. The language specified by a regular expressionEis denoted by L(E). The symbols that occur in a regular expressionEare denoted by sym(E). To indicate different positions of the same symbol in a regular expression, we mark symbols with subscripts. For example, (a 1 +b 1quotesdbs_dbs4.pdfusesText_8

[PDF] [PDF] One-Unambiguous Regular Languages - CORE

[PDF] Theorem A language A is regular if and only if there exists an NFA