From 5d347dfc798e57592979720ab568a5acc7b63cfc Mon Sep 17 00:00:00 2001
From: Anton Nekrutenko
Date: Thu, 18 Apr 2024 07:53:07 -0400
Subject: [PATCH] Update alignment.md

---
 2024/alignment.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/2024/alignment.md b/2024/alignment.md
index 56e9835..f97866e 100644
--- a/2024/alignment.md
+++ b/2024/alignment.md
@@ -80,9 +80,9 @@ where $\delta(x,y) = 0$ if $x = y$ (nucleotides match) and $\delta(x,y) = 1$ if
 
 The take-home message here is that it takes a very long time to compute the edit distance between two sequences that are only **nine** nucleotides long! Why is this happening? Figure 1 below shows a small subset of the situations the algorithm evaluates for two very short strings $\texttt{TAG}$ and $\texttt{TAC}$:
 
-![](http://www.bx.psu.edu/~anton/bioinf-images/editDist.png)
+![image](https://github.com/nekrut/BMMB554/assets/4291636/399468e5-cc12-4a84-969e-ce4c1e5186a4)
 
-**Figure 1** | A fraction of situations evaluated by the naïve algorithm for computing the edit distance. Just like in the case of the change problem discussed in the previous lecture a lot of time is wasted on computing distances between suffixes that has been seen more than once (shown in red).
+**Figure 1** | A fraction of the situations evaluated by the naïve algorithm for computing the edit distance. Just as in the change problem discussed in the previous lecture, a lot of time is wasted on computing distances between suffixes that have been seen more than once (shown in red). To understand the magnitude of this problem, let's look at a slightly modified version of the previous Python code below. All we do here is keep track of how many times a particular pair of suffixes (in this case $\texttt{AC}$ and $\texttt{AC}$) is seen by the program. The number is staggering: 48,639. So this algorithm is **extremely** wasteful.
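
For readers following along, here is a minimal sketch of the counting experiment the new caption describes. This is an illustration, not the lecture's actual script: it assumes the standard naive recursion over suffixes implied by the $\delta(x,y)$ recurrence above, and the two nine-nucleotide strings (and therefore the exact counts it prints) are made up for the example.

```python
from collections import Counter

seen = Counter()  # tallies how often each pair of suffixes is evaluated

def edit_distance(a, b):
    """Naive recursive edit distance between strings a and b."""
    seen[(a, b)] += 1                  # record a visit to this suffix pair
    if len(a) == 0:
        return len(b)                  # a exhausted: insert the rest of b
    if len(b) == 0:
        return len(a)                  # b exhausted: delete the rest of a
    delta = 0 if a[0] == b[0] else 1   # the delta(x, y) term of the recurrence
    return min(
        edit_distance(a[1:], b[1:]) + delta,  # match / substitution
        edit_distance(a[1:], b) + 1,          # deletion
        edit_distance(a, b[1:]) + 1,          # insertion
    )

# Two made-up nine-nucleotide strings, both ending in AC so that the
# suffix pair ("AC", "AC") recurs many times.
print(edit_distance("TACGTTCAC", "TAGCTGCAC"))
print(seen[("AC", "AC")])  # number of times this one pair was recomputed
```

Caching the answer for each suffix pair (for example with `functools.lru_cache`, or by filling a table bottom-up) turns every repeated visit into a constant-time lookup, which is the dynamic-programming remedy this figure motivates.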