[PDF] [PDF] Minimum Edit Distance - Stanford University

How to find the Min Edit Distance? • Searching for a path (sequence of edits) from the start string to the final string:



Previous PDF Next PDF





[PDF] Minimum Edit Distance - Stanford University

How to find the Min Edit Distance? • Searching for a path (sequence of edits) from the start string to the final string:



[PDF] Using Adapted Levenshtein Distance for On-Line Signature

Using Adapted Levenshtein Distance for On-Line Signature Authentication Sascha Schimke, Claus Vielhauer, Jana Dittmann Otto-von-Guericke University  



[PDF] Unsupervised Learning of Edit Distance Weights for Retrieving

Informa- tion retrieval on historical corpora can deal with these variations using fuzzy matching techniques based on Levenshtein-Distance using stochastic



[PDF] A new edit distance for fuzzy hashing applications - TIC

Keywords: Edit distance, fuzzy hashing, similarity preserving hashing 1 Introduction [11] Wikipedia (2014) Damerau-Levenshtein distance [Online] Avail-



Online Handwriting Recognition Using Levenshtein Distance Metric

Abstract—In this article, we propose a novel scheme for online handwritten character recognition based on Levenshtein distance metric Both shape and 



[PDF] Unsupervised Learning of Edit Distance Weights for Retrieving

Informa- tion retrieval on historical corpora can deal with these variations using fuzzy matching techniques based on Levenshtein-Distance using stochastic



[PDF] Character Recognition with Minimum Edit Distance Method

ICR software has a self-learning system referred to as a neural network, which automatically updates the recognition database for new handwriting patterns It 



[PDF] Learning Balls of Strings from Edit Corrections∗ - Journal of

algorithm is resistant to approximate answers Keywords: grammatical inference, oracle learning, correction queries, edit distance, balls of strings 1 Introduction



[PDF] Levenshtein distance - RYBN

22 sept 2015 · the Levenshtein distance between two words is the minimum number of single- character edits (i e Algorithms and Data Structures [online]

[PDF] lever de soleil

[PDF] lever l'indétermination d'une limite

[PDF] lever une forme indéterminée

[PDF] lever une forme indéterminée 0*infini

[PDF] léviathan 2014

[PDF] levier inter resistant

[PDF] levinas altérité

[PDF] levinas ethique et infini pdf

[PDF] levinas le temps et l'autre pdf

[PDF] levinas visage citation

[PDF] levure ade2

[PDF] levure respiration fermentation

[PDF] Levures dans 4 milieux différents

[PDF] LEWIS ET CLARK LEUR EXPEDITIONT ,AIDE MOI SVP SES POUR DEMAIN !

[PDF] lex de physique "la réfraction"

MinimumEditDistance

Defini'onofMinimum

EditDistance

DanJurafsky

Howsimilararetwostrings?

• Spellcorrec'on• Theusertyped"graffe"Whichisclosest? • graf• graB• grail• giraffe • Computa'onalBiology• Aligntwosequencesofnucleo'des• Resul'ngalignment: • AlsoforMachineTransla'on,Informa'onExtrac'on,SpeechRecogni'on AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGGTCGATTTGCCCGAC -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---

TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC

DanJurafsky

EditDistance

• Theminimumeditdistancebetweentwostrings • Istheminimumnumberofedi'ngopera'ons • Inser'on• Dele'on• Subs'tu'on • Neededtotransformoneintotheother

DanJurafsky

MinimumEditDistance

• Twostringsandtheiralignment:

DanJurafsky

MinimumEditDistance

• Ifeachopera'onhascostof1• Distancebetweentheseis5 • Ifsubs'tu'onscost2(Levenshtein) • Distancebetweenthemis8

DanJurafsky

AlignmentinComputa9onalBiology

• Givenasequenceofbases • Analignment:• Giventwosequences,aligneachleXertoaleXerorgap

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGGTCGATTTGCCCGAC

DanJurafsky

OtherusesofEditDistanceinNLP

• Evalua'ngMachineTransla'onandspeechrecogni'on

R Spokesman confirms senior government adviser was shot!H Spokesman said the senior adviser was shot dead! S I D I!

• NamedEn'tyExtrac'onandEn'tyCoreference

• IBMInc.announcedtoday• IBMprofits• StanfordPresidentJohnHennessyannouncedyesterday• forStanfordUniversityPresidentJohnHennessy

DanJurafsky

HowtofindtheMinEditDistance?

• Searchingforapath(sequenceofedits)fromthestartstringtothefinalstring:• Ini9alstate:thewordwe'retransforming• Operators:insert,delete,subs'tute• Goalstate:thewordwe'retryingtogetto• Pathcost:whatwewanttominimize:thenumberofedits

8

DanJurafsky

MinimumEditasSearch

• Butthespaceofalleditsequencesishuge!• Wecan'taffordtonavigatenaïvely• Lotsofdis'nctpathswindupatthesamestate.• Wedon'thavetokeeptrackofallofthem• Justtheshortestpathtoeachofthoserevistedstates.

9

DanJurafsky

DefiningMinEditDistance

• Fortwostrings • Xoflengthn• Yoflengthm • WedefineD(i,j) • theeditdistancebetweenX[1..i]andY[1..j] • i.e.,thefirsticharactersofXandthefirstjcharactersofY • TheeditdistancebetweenXandYisthusD(n,m)

MinimumEditDistance

Defini'onofMinimum

EditDistance

MinimumEditDistance

Compu'ngMinimum

EditDistance

DanJurafsky

DynamicProgrammingforMinimumEditDistance

• Dynamicprogramming:Atabularcomputa'onofD(n,m) • Solvingproblemsbycombiningsolu'onstosubproblems.• BoXom-up

• WecomputeD(i,j)forsmalli,j• AndcomputelargerD(i,j)basedonpreviouslycomputedsmallervalues• i.e.,computeD(i,j)foralli(0

DanJurafsky

DefiningMinEditDistance(Levenshtein)

• Ini'aliza'on

D(i,0) = i!D(0,j) = j

• RecurrenceRela'on:

For each i = 1...M!! For each j = 1...N

D(i-1,j) + 1! D(i,j)= min D(i,j-1) + 1! D(i-1,j-1) + 2; if X(i) ≠ Y(j) ! 0; if X(i) = Y(j)!

• Termina'on:

D(N,M) is distance

DanJurafsky

N 9 O 8 I 7 T 6 N 5 E 4 T 3 N 2 I 1 # 0 1 2 3 4 5 6 7 8 9 # E X E C U T I O N

TheEditDistanceTable

DanJurafsky

N 9 O 8 I 7 T 6 N 5 E 4 T 3 N 2 I 1 # 0 1 2 3 4 5 6 7 8 9 # E X E C U T I O N

TheEditDistanceTable

DanJurafsky

N 9 O 8 I 7 T 6 N 5 E 4 T 3 N 2 I 1 # 0 1 2 3 4 5 6 7 8 9 # E X E C U T I O N

EditDistance

DanJurafsky

N 9 8 9 10 11 12 11 10 9 8 O 8 7 8 9 10 11 10 9 8 9 I 7 6 7 8 9 10 9 8 9 10 T 6 5 6 7 8 9 8 9 10 11 N 5 4 5 6 7 8 9 10 11 10 E 4 3 4 5 6 7 8 9 10 9 T 3 4 5 6 7 8 7 8 9 8 N 2 3 4 5 6 7 8 7 8 7 I 1 2 3 4 5 6 7 6 7 8 # 0 1 2 3 4 5 6 7 8 9 # E X E C U T I O N

TheEditDistanceTable

MinimumEditDistance

Compu'ngMinimum

EditDistance

MinimumEditDistance

Backtracefor

Compu'ngAlignments

DanJurafsky

Compu9ngalignments

• Editdistanceisn'tsufficient• WeoBenneedtoaligneachcharacterofthetwostringstoeachother

• Wedothisbykeepinga"backtrace"• Every'meweenteracell,rememberwherewecamefrom• Whenwereachtheend,

• Tracebackthepathfromtheupperrightcornertoreadoffthealignment

DanJurafsky

N 9 O 8 I 7 T 6 N 5 E 4 T 3 N 2 I 1 # 0 1 2 3 4 5 6 7 8 9 # E X E C U T I O N

EditDistance

DanJurafsky

MinEditwithBacktrace

DanJurafsky

AddingBacktracetoMinimumEditDistance

• Basecondi'ons:Termina'on: D(i,0) = i D(0,j) = j D(N,M) is distance • RecurrenceRela'on:

For each i = 1...M!

! For each j = 1...N

D(i-1,j) + 1!

D(i,j)= min D(i,j-1) + 1! D(i-1,j-1) + 2; if X(i) ≠ Y(j) ! 0; if X(i) = Y(j)! LEFT! ptr(i,j)= DOWN! DIAG!!

DanJurafsky

TheDistanceMatrix

SlideadaptedfromSerafimBatzoglou

DanJurafsky

ResultofBacktrace

• Twostringsandtheiralignment:

DanJurafsky

Performance• Time:

O(nm) • Space: O(nm) • Backtrace

O(n+m)

MinimumEditDistance

Backtracefor

Compu'ngAlignments

MinimumEditDistance

WeightedMinimumEdit

Distance

DanJurafsky

WeightedEditDistance

• Whywouldweaddweightstothecomputa'on?• SpellCorrec'on:someleXersaremorelikelytobemistypedthanothers• Biology:certainkindsofdele'onsorinser'onsaremorelikelythan

others

DanJurafsky

Confusionmatrixforspellingerrors

DanJurafsky

DanJurafsky

WeightedMinEditDistance

• RecurrenceRela'on: D(i-1,j) + del[x(i)]!D(i,j)= min D(i,j-1) + ins[y(j)]! D(i-1,j-1) + sub[x(i),y(j)]! • Termina'on:

D(N,M) is distance

DanJurafsky

Where did the name, dynamic programming, come from?

...The 1950s were not good years for mathematical research. [the] Secretary of Defense ...had a pathological fear and hatred of the word, research...

I decided therefore to use the word, "programming". I wanted to get across the idea that this was dynamic, this was multistage... I thought,

let's

take a word that has an absolutely precise meaning, namely dynamic... it's impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible.

Thus, I thought dynamic programming was a good name. It was something not even a

Congressman could object to."

Richard Bellman, "Eye of the Hurricane: an autobiography" 1984.

MinimumEditDistance

WeightedMinimumEdit

Distance

MinimumEditDistance

MinimumEditDistance

inComputa'onalBiology

DanJurafsky

SequenceAlignment

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGGTCGATTTGCCCGAC

DanJurafsky

Whysequencealignment?

• Comparinggenesorregionsfromdifferentspecies • tofindimportantregions• determinefunc'on• uncoverevolu'onaryforces • AssemblingfragmentstosequenceDNA• Compareindividualstolookingformuta'ons

DanJurafsky

Alignmentsintwofields• InNaturalLanguageProcessing • Wegenerallytalkaboutdistance(minimized) • Andweights • InComputa'onalBiology• Wegenerallytalkaboutsimilarity(maximized) • Andscores

DanJurafsky

TheNeedleman-WunschAlgorithm

• Ini'aliza'on:D(i,0) = -i * d!D(0,j) = -j * d • RecurrenceRela'on: D(i-1,j) - d!D(i,j)= min D(i,j-1) - d! D(i-1,j-1) + s[x(i),y(j)]! • Termina'on:

D(N,M) is distance

DanJurafsky

TheNeedleman-WunschMatrix

SlideadaptedfromSerafimBatzoglou

(NotethattheoriginisattheupperleB.)

DanJurafsky

Avariantofthebasicalgorithm:

• MaybeitisOKtohaveanunlimited#ofgapsinthebeginningandend:

SlidefromSerafimBatzoglou

----------CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC GCGAGTTCATCTATCAC--GACCGC--GGTCG-------------- • If so, we don't want to penalize gaps at the ends

DanJurafsky

Differenttypesofoverlaps

SlidefromSerafimBatzoglou

Example: 2 overlapping"reads" from a sequencing project Example: Search for a mouse gene within a human chromosome

DanJurafsky

TheOverlapDetec9onvariant

Changes:

1. Ini'aliza'on

For all i, j,!!F(i, 0) = 0!!F(0, j) = 0!

2. Termina'on

! max i

F(i, N)!F

OPT = max !! max j

F(M, j)

SlidefromSerafimBatzoglou

DanJurafsky

Giventwostringsx=x

1 ......x M y=y 1 ......y N

SlidefromSerafimBatzoglou

TheLocalAlignmentProblem

DanJurafsky

TheSmith-Watermanalgorithm

Idea:Ignorebadlyaligningregions

Modifica'onstoNeedleman-Wunsch:Ini9aliza9on:F(0, j) = 0!!!!F(i, 0) = 0!0!!Itera9on:F(i, j) = max F(i - 1, j) - d!!!!! F(i, j - 1) - d!!!!! F(i - 1, j - 1) + s(x

i , y j

SlidefromSerafimBatzoglou

DanJurafsky

TheSmith-Watermanalgorithm

Termina9on:

1. Ifwewantthebestlocalalignment...

F OPT =max i,j

F(i,j)FindF

OPT andtraceback

2. Ifwewantalllocalalignmentsscoring>t

??Foralli,jfindF(i,j)>t,andtraceback?

Complicatedbyoverlappinglocalalignments

SlidefromSerafimBatzoglou

DanJurafsky

Localalignmentexample

A !T!T!A!T!C!0!0!0!0!0!0!0!A!0!T!0!C!0!A!0!T!0!X = ATCAT!Y = ATTATC!

DanJurafsky

Localalignmentexample

A

!T!T!A!T!C!0!0!0!0!0!0!0!A!0!1!0!0!1!0!0!T!0!0!2!1!0!2!0!C!0!0!1!1!0!1!3!A!0!1!0!0!2!1!2!T!0!0!2!0!1!3!2!X = ATCAT!Y = ATTATC!

DanJurafsky

Localalignmentexample

A

!T!T!A!T!C!0!0!0!0!0!0!0!A!0!1!0!0!1!0!0!T!0!0!2!1!0!2!0!C!0!0!1!1!0!1!3!A!0!1!0!0!2!1!2!T!0!0!2!0!1!3!2!X = ATCAT!Y = ATTATC!

DanJurafsky

Localalignmentexample

A

!T!T!A!T!C!0!0!0!0!0!0!0!A!0!1!0!0!1!0!0!T!0!0!2!1!0!2!0!C!0!0!1!1!0!1!3!A!0!1!0!0!2!1!2!T!0!0!2!0!1!3!2!X = ATCAT!Y = ATTATC!

MinimumEditDistance

MinimumEditDistance

inComputa'onalBiologyquotesdbs_dbs47.pdfusesText_47