Implementation of Levenshtein Distance Algorithm for E
A Levenshtein Algorithm Phonetic distances between Standard Danish (Lyngby) and each of the other 17 Scandinavian language varieties were calculated by means of the Levenshtein algorithm When phonetic transcriptions of two pronunciations are compared with each other, Levenshtein distance is equal to the number of
Thread-cooperative, Bit-parallel Computation of Levenshtein
distance, by less than a value specified by the user In the context of biological sequence alignment one often employs Levenshtein distance, i e the minimum number of edit operations needed to transform the query into the match Each operation can be either a substitution, or an insertion, or a deletion of a single character Levenshtein
Plagiarism Detection Using Levenshtein Distance With Dynamic
sentence per sentence using a modified levenshtein distance solver implemented with dynamic programming The result of the distance solver will be passed on to another function which will determine whether or not a potential plagiarism is detected in a certain sentence by taking into factor the distance of sentence and the number of words B
Structural Off-line Handwriting Character Recognition Using
distance, where the character’s graph feature converted into string representation, and the edit distance between strings measured The string edit distance algorithm using Levenshtein distance method19 and related work using levenshtein distance for on-line handwriting recognition is also mentioned20
RESEARCH NOTES Distributions of cognates JOB SCHEPENS in
normalized Levenshtein distance function can efficiently and reliably simulate bilingual orthographic similarity ratings Orthographic similarity distributions of cognates and non-cognates were identified across pairs of six European languages: English, German, French, Spanish, Italian, and Dutch
DNA Multiple Sequence Alignment by a Hidden Markov Model and
The Fuzzy Levenshtein Distance of each of the sequences in the chromosome is then computed from the consensus sequence The Levenshtein distance between two words is the minimum number of single-character edits (insertion, deletion, substitution) required to change one word into the other
Minimum&Edit& Distance
DanJurafsky Where did the name, dynamic programming, come from? & The 1950s were not good years for mathematical research [the] Secretary of
[PDF] lever l'indétermination d'une limite
[PDF] lever une forme indéterminée
[PDF] lever une forme indéterminée 0*infini
[PDF] léviathan 2014
[PDF] levier inter resistant
[PDF] levinas altérité
[PDF] levinas ethique et infini pdf
[PDF] levinas le temps et l'autre pdf
[PDF] levinas visage citation
[PDF] levure ade2
[PDF] levure respiration fermentation
[PDF] Levures dans 4 milieux différents
[PDF] LEWIS ET CLARK LEUR EXPEDITIONT ,AIDE MOI SVP SES POUR DEMAIN !
[PDF] lex de physique "la réfraction"
MinimumEditDistance
Defini'onofMinimum
EditDistance
DanJurafsky
Howsimilararetwostrings?
• Spellcorrec'on• Theusertyped"graffe"Whichisclosest? • graf• graB• grail• giraffe • Computa'onalBiology• Aligntwosequencesofnucleo'des• Resul'ngalignment: • AlsoforMachineTransla'on,Informa'onExtrac'on,SpeechRecogni'on AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGGTCGATTTGCCCGAC -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC
DanJurafsky
EditDistance
• Theminimumeditdistancebetweentwostrings • Istheminimumnumberofedi'ngopera'ons • Inser'on• Dele'on• Subs'tu'on • NeededtotransformoneintotheotherDanJurafsky
MinimumEditDistance
• Twostringsandtheiralignment:DanJurafsky
MinimumEditDistance
• Ifeachopera'onhascostof1• Distancebetweentheseis5 • Ifsubs'tu'onscost2(Levenshtein) • Distancebetweenthemis8DanJurafsky
AlignmentinComputa9onalBiology
• Givenasequenceofbases • Analignment:• Giventwosequences,aligneachleXertoaleXerorgap-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGGTCGATTTGCCCGAC
DanJurafsky
OtherusesofEditDistanceinNLP
• Evalua'ngMachineTransla'onandspeechrecogni'onR Spokesman confirms senior government adviser was shot!H Spokesman said the senior adviser was shot dead! S I D I!
• NamedEn'tyExtrac'onandEn'tyCoreference• IBMInc.announcedtoday• IBMprofits• StanfordPresidentJohnHennessyannouncedyesterday• forStanfordUniversityPresidentJohnHennessy
DanJurafsky
HowtofindtheMinEditDistance?
• Searchingforapath(sequenceofedits)fromthestartstringtothefinalstring:• Ini9alstate:thewordwe'retransforming• Operators:insert,delete,subs'tute• Goalstate:thewordwe'retryingtogetto• Pathcost:whatwewanttominimize:thenumberofedits
8DanJurafsky
MinimumEditasSearch
• Butthespaceofalleditsequencesishuge!• Wecan'taffordtonavigatenaïvely• Lotsofdis'nctpathswindupatthesamestate.• Wedon'thavetokeeptrackofallofthem• Justtheshortestpathtoeachofthoserevistedstates.
9DanJurafsky
DefiningMinEditDistance
• Fortwostrings • Xoflengthn• Yoflengthm • WedefineD(i,j) • theeditdistancebetweenX[1..i]andY[1..j] • i.e.,thefirsticharactersofXandthefirstjcharactersofY • TheeditdistancebetweenXandYisthusD(n,m)MinimumEditDistance
Defini'onofMinimum
EditDistance
MinimumEditDistance
Compu'ngMinimum
EditDistance
DanJurafsky
DynamicProgrammingforMinimumEditDistance
• Dynamicprogramming:Atabularcomputa'onofD(n,m) • Solvingproblemsbycombiningsolu'onstosubproblems.• BoXom-up• WecomputeD(i,j)forsmalli,j• AndcomputelargerD(i,j)basedonpreviouslycomputedsmallervalues• i.e.,computeD(i,j)foralli(0 DanJurafsky
DefiningMinEditDistance(Levenshtein)
• Ini'aliza'on D(i,0) = i!D(0,j) = j
• RecurrenceRela'on: For each i = 1...M!! For each j = 1...N
D(i-1,j) + 1! D(i,j)= min D(i,j-1) + 1! D(i-1,j-1) + 2; if X(i) ≠ Y(j) ! 0; if X(i) = Y(j)!
• Termina'on:D(N,M) is distance
DanJurafsky
N 9 O 8 I 7 T 6 N 5 E 4 T 3 N 2 I 1 # 0 1 2 3 4 5 6 7 8 9 # E X E C U T I O NTheEditDistanceTable
DanJurafsky
N 9 O 8 I 7 T 6 N 5 E 4 T 3 N 2 I 1 # 0 1 2 3 4 5 6 7 8 9 # E X E C U T I O NTheEditDistanceTable
DanJurafsky
N 9 O 8 I 7 T 6 N 5 E 4 T 3 N 2 I 1 # 0 1 2 3 4 5 6 7 8 9 # E X E C U T I O NEditDistance
DanJurafsky
N 9 8 9 10 11 12 11 10 9 8 O 8 7 8 9 10 11 10 9 8 9 I 7 6 7 8 9 10 9 8 9 10 T 6 5 6 7 8 9 8 9 10 11 N 5 4 5 6 7 8 9 10 11 10 E 4 3 4 5 6 7 8 9 10 9 T 3 4 5 6 7 8 7 8 9 8 N 2 3 4 5 6 7 8 7 8 7 I 1 2 3 4 5 6 7 6 7 8 # 0 1 2 3 4 5 6 7 8 9 # E X E C U T I O N
TheEditDistanceTable
MinimumEditDistance
Compu'ngMinimum
EditDistance
MinimumEditDistance
Backtracefor
Compu'ngAlignments
DanJurafsky
Compu9ngalignments
• Editdistanceisn'tsufficient• WeoBenneedtoaligneachcharacterofthetwostringstoeachother• Wedothisbykeepinga"backtrace"• Every'meweenteracell,rememberwherewecamefrom• Whenwereachtheend,
• TracebackthepathfromtheupperrightcornertoreadoffthealignmentDanJurafsky
N 9 O 8 I 7 T 6 N 5 E 4 T 3 N 2 I 1 # 0 1 2 3 4 5 6 7 8 9 # E X E C U T I O NEditDistance
DanJurafsky
MinEditwithBacktrace
DanJurafsky
AddingBacktracetoMinimumEditDistance
• Basecondi'ons:Termina'on: D(i,0) = i D(0,j) = j D(N,M) is distance • RecurrenceRela'on:For each i = 1...M!
! For each j = 1...ND(i-1,j) + 1!
D(i,j)= min D(i,j-1) + 1! D(i-1,j-1) + 2; if X(i) ≠ Y(j) ! 0; if X(i) = Y(j)! LEFT! ptr(i,j)= DOWN! DIAG!!
DanJurafsky
TheDistanceMatrix
SlideadaptedfromSerafimBatzoglou
DanJurafsky
ResultofBacktrace
• Twostringsandtheiralignment:DanJurafsky
Performance• Time:
O(nm) • Space: O(nm) • BacktraceO(n+m)
MinimumEditDistance
Backtracefor
Compu'ngAlignments
MinimumEditDistance
WeightedMinimumEdit
Distance
DanJurafsky
WeightedEditDistance
• Whywouldweaddweightstothecomputa'on?• SpellCorrec'on:someleXersaremorelikelytobemistypedthanothers• Biology:certainkindsofdele'onsorinser'onsaremorelikelythan
othersDanJurafsky
Confusionmatrixforspellingerrors
DanJurafsky
DanJurafsky
WeightedMinEditDistance
• RecurrenceRela'on: D(i-1,j) + del[x(i)]!D(i,j)= min D(i,j-1) + ins[y(j)]! D(i-1,j-1) + sub[x(i),y(j)]! • Termina'on:D(N,M) is distance
DanJurafsky
Where did the name, dynamic programming, come from?...The 1950s were not good years for mathematical research. [the] Secretary of Defense ...had a pathological fear and hatred of the word, research...
I decided therefore to use the word, "programming". I wanted to get across the idea that this was dynamic, this was multistage... I thought,
let'stake a word that has an absolutely precise meaning, namely dynamic... it's impossible to use the word, dynamic, in a pejorative sense. Try thinking of some combination that will possibly give it a pejorative meaning. It's impossible.
Thus, I thought dynamic programming was a good name. It was something not even aCongressman could object to."
Richard Bellman, "Eye of the Hurricane: an autobiography" 1984.MinimumEditDistance
WeightedMinimumEdit
Distance
MinimumEditDistance
MinimumEditDistance
inComputa'onalBiologyDanJurafsky
SequenceAlignment
-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC--- TAG-CTATCAC--GACCGC--GGTCGATTTGCCCGAC AGGCTATCACCTGACCTCCAGGCCGATGCCC TAGCTATCACGACCGCGGTCGATTTGCCCGAC
DanJurafsky
Whysequencealignment?
• Comparinggenesorregionsfromdifferentspecies • tofindimportantregions• determinefunc'on• uncoverevolu'onaryforces • AssemblingfragmentstosequenceDNA• Compareindividualstolookingformuta'onsDanJurafsky
Alignmentsintwofields• InNaturalLanguageProcessing • Wegenerallytalkaboutdistance(minimized) • Andweights • InComputa'onalBiology• Wegenerallytalkaboutsimilarity(maximized) • AndscoresDanJurafsky
TheNeedleman-WunschAlgorithm
• Ini'aliza'on:D(i,0) = -i * d!D(0,j) = -j * d • RecurrenceRela'on: D(i-1,j) - d!D(i,j)= min D(i,j-1) - d! D(i-1,j-1) + s[x(i),y(j)]! • Termina'on:D(N,M) is distance
DanJurafsky
TheNeedleman-WunschMatrix
SlideadaptedfromSerafimBatzoglou
(NotethattheoriginisattheupperleB.)DanJurafsky
Avariantofthebasicalgorithm:
• MaybeitisOKtohaveanunlimited#ofgapsinthebeginningandend:SlidefromSerafimBatzoglou
----------CTATCACCTGACCTCCAGGCCGATGCCCCTTCCGGC GCGAGTTCATCTATCAC--GACCGC--GGTCG-------------- • If so, we don't want to penalize gaps at the ends
DanJurafsky
Differenttypesofoverlaps
SlidefromSerafimBatzoglou
Example: 2 overlapping"reads" from a sequencing project Example: Search for a mouse gene within a human chromosome
DanJurafsky
TheOverlapDetec9onvariant
Changes:
1. Ini'aliza'on
For all i, j,!!F(i, 0) = 0!!F(0, j) = 0!
2. Termina'on
! max iF(i, N)!F
OPT = max !! max jF(M, j)
SlidefromSerafimBatzoglou
DanJurafsky
Giventwostringsx=x
1 ......x M y=y 1 ......y NSlidefromSerafimBatzoglou
TheLocalAlignmentProblem
DanJurafsky
TheSmith-Watermanalgorithm
Idea:Ignorebadlyaligningregions
Modifica'onstoNeedleman-Wunsch:Ini9aliza9on:F(0, j) = 0!!!!F(i, 0) = 0!0!!Itera9on:F(i, j) = max F(i - 1, j) - d!!!!! F(i, j - 1) - d!!!!! F(i - 1, j - 1) + s(x
i , y jSlidefromSerafimBatzoglou
DanJurafsky
TheSmith-Watermanalgorithm
Termina9on:
1. Ifwewantthebestlocalalignment...
F OPT =max i,jF(i,j)FindF
OPT andtraceback2. Ifwewantalllocalalignmentsscoring>t
??Foralli,jfindF(i,j)>t,andtraceback?Complicatedbyoverlappinglocalalignments
SlidefromSerafimBatzoglou
DanJurafsky
Localalignmentexample
A !T!T!A!T!C!0!0!0!0!0!0!0!A!0!T!0!C!0!A!0!T!0!X = ATCAT!Y = ATTATC!DanJurafsky
Localalignmentexample
A!T!T!A!T!C!0!0!0!0!0!0!0!A!0!1!0!0!1!0!0!T!0!0!2!1!0!2!0!C!0!0!1!1!0!1!3!A!0!1!0!0!2!1!2!T!0!0!2!0!1!3!2!X = ATCAT!Y = ATTATC!
DanJurafsky
Localalignmentexample
A!T!T!A!T!C!0!0!0!0!0!0!0!A!0!1!0!0!1!0!0!T!0!0!2!1!0!2!0!C!0!0!1!1!0!1!3!A!0!1!0!0!2!1!2!T!0!0!2!0!1!3!2!X = ATCAT!Y = ATTATC!
DanJurafsky
Localalignmentexample
A!T!T!A!T!C!0!0!0!0!0!0!0!A!0!1!0!0!1!0!0!T!0!0!2!1!0!2!0!C!0!0!1!1!0!1!3!A!0!1!0!0!2!1!2!T!0!0!2!0!1!3!2!X = ATCAT!Y = ATTATC!