Sequence Alignment慕课视频播放-算法设计与分析-MOOC慕课视频教程-柠檬大学

我们再来看一下序列比对问题

给定两个字符串ocurrance和occurrence

从直觉来说

它们很相似

是否能够量化它们之间的相似程度呢

把这两个字符串进行比对

我们可以采用不同的方式

第一种比对方式

两个字符串有5对字母不同

称为mismatch

有1个字母没有获得配对

称为1个gap

第二种比对方式

有1个mismatch

1个gap

第三种比对方式

有0个mismatch

3个gap

序列比对问题

在众多领域中有着重要的应用

比如Unix中的diff命令

语音识别计算生物等等

我们看到对于两个字符串

有多种比对方式

如何给出一个合理的量化指标

来评价不同的比对呢

我们给出编辑距离的概念

我们引进两个参数

一个是gap的惩罚参数δ

一个是mismatch的惩罚参数αpq

那么总的费用

就是比对所产生的gap惩罚mismatch惩罚的总量

比如对如下的两个字符串进行比对

采用两种比对方式

第一种比对有5个mismatch

可以根据不同对之间的mismatch惩罚参数

算出目标值

按照第二种方式进行比对

会产生2个gap

1个mismatch

也可以算出对应的目标值

现在可以定义序列比对问题

给定两个字符串X=x_1x_2...x_m

和Y=y_1y_2...y_n

把它们进行比对

使得编辑距离最小

我们把集合M称为一个比对

它是一个有序对的集合

每对的两个元素分别来自X和Y

并且每个元素至多出现在1对中

并且对与对之间是没有交叉的

所谓的交叉是指

x_i-y_j x_i'-y_j'是比对中的两对

但是j>j'

根据编辑距离的定义

我们可以定义比对M的费用为

mismatch的费用

加上两组中没有获得配对儿的gap费用

比如我们给出两个序列

CTACCG和TACATG

进行如下方式的比对

那么会产生一对儿mismatch

C和A以及两个gap

定义OPT(i, j)

为把x1x2...xi和y1y2...yj

进行比对的最优目标值

那么原问题的最优值即为OPT(m, n)

还是分几种情况来讨论

最优解中

xi-yj配成一对

那么就会付出它们两个之间的

mismatch的费用

加上对余下的两个字符串进行比对

所产生的费用OPT(i-1, j-1)

再考虑

x_i没有获得配对

那么付出一个gap费用

再加上对余下的字符串

进行比对所产生的费用OPT(i-1, j)

如果y_j没有获得配对

那么就要付一个gap的费用δ

再加上对余下的字符串

进行比对所产生的费用OPT(i, j-1)

最优解一定是各种情况中的最优选择

我们可以建立如下的动态规划基本方程

当i=0或j=0时

目标值显然为jδ或iδ

一般情况

为上面讨论的各种情况的最小值

按照动态规划基本方程

我们可以写出算法的伪代码

这里不再累述

算法需要θ(mn)的时间和空间

算法设计与分析课程列表：

1 Introduction of Algorithm

-1.1 Introduction

--Introduction

-1.2 A First Problem: Stable Matching

--A First Problem: Stable Matching

-1.3 Gale-Shapley Algorithm

--Gale-Shapley Algorithm

-1.4 Understanding Gale-Shapley Algorithm

--Understanding Gale-Shapley Algorithm

-Homework1

-Lecture note 1

--Lecture note 1 Introduction of Algorithm

2 Basics of Algorithm Analysis

-2.1 Computational Tractability

--Computational Tractability

-2.2 Asymptotic Order of Growth

--Asymptotic Order of Growth

-2.3 A Survey of Common Running Times

--A Survey of Common Running Times

-Homework2

-Lecture note 2

--Lecture note 2 Basics of Algorithm Analysis

3 Graph

-3.1 Basic Definitions and Applications

--Basic Definitions and Applications

-3.2 Graph Traversal

--Graph Traversal

-3.3 Testing Bipartiteness

--Testing Bipartiteness

-3.4 Connectivity in Directed Graphs

--Connectivity in Directed Graphs

-3.5 DAG and Topological Ordering

--DAG and Topological Ordering

-Homework3

-Lecture note 3

--Lecture note 3 Graph

4 Greedy Algorithms

-4.1 Coin Changing

--Coin Changing

-4.2 Interval Scheduling

--Interval Scheduling

-4.3 Interval Partitioning

--Interval Partitioning

-4.4 Scheduling to Minimize Lateness

--Scheduling to Minimize Lateness

-4.5 Optimal Caching

--Optimal Caching

-4.6 Shortest Paths in a Graph

--Shortest Paths in a Graph

-4.7 Minimum Spanning Tree

--Minimum Spanning Tree

-4.8 Correctness of Algorithms

--Correctness of Algorithms

-4.9 Clustering

--Clustering

-Homework4

-Lecture note 4

--Lecture note 4 Greedy Algorithms

5 Divide and Conquer

-5.1 Mergesort

--Mergesort

-5.2 Counting Inversions

--Counting Inversions

-5.3 Closest Pair of Points

--Closest Pair of Points

-5.4 Integer Multiplication

--Integer Multiplication

-5.5 Matrix Multiplication

--Video

-5.6 Convolution and FFT

--Convolution and FFT

-5.7 FFT

--FFT

-5.8 Inverse DFT

--Inverse DFT

-Homework5

-Lecture note 5

--Lecture note 5 Divide and Conquer

6 Dynamic Programming

-6.1 Weighted Interval Scheduling

--Weighted Interval Scheduling

-6.2 Segmented Least Squares

--Segmented Least Squares

-6.3 Knapsack Problem

--Knapsack Problem

-6.4 RNA Secondary Structure

--RNA Secondary Structure

-6.5 Sequence Alignment

--Sequence Alignment

-6.6 Shortest Paths

--Shortest Paths

-Homework6

-Lecture note 6

--Lecture note 6 Dynamic Programming

7 Network Flow

-7.1 Flows and Cuts

--Flows and Cuts

-7.2 Minimum Cut and Maximum Flow

--Minimum Cut and Maximum Flow

-7.3 Ford-Fulkerson Algorithm

--Ford-Fulkerson Algorithm

-7.4 Choosing Good Augmenting Paths

--Choosing Good Augmenting Paths

-7.5 Bipartite Matching

--Bipartite Matching

-Homework7

-Lecture note 7

--Lecture note 7 Network Flow

8 NP and Computational Intractability

-8.1 Polynomial-Time Reductions

--Polynomial-Time Reductions

-8.2 Basic Reduction Strategies I

--Basic Reduction Strategies I

-8.3 Basic Reduction Strategies II

--Basic Reduction Strategies II

-8.4 Definition of NP

--Definition of NP

-8.5 Problems in NP

--Problems in NP

-8.6 NP-Completeness

--NP-Completeness

-8.7 Sequencing Problems

--Sequencing Problems

-8.8 Numerical Problems

--Numerical Problems

-8.9 co-NP and the Asymmetry of NP

--co-NP and the Asymmetry of NP

-Homework8

-Lecture note 8

--Lecture note 8 NP and Computational Intractability

9 Approximation Algorithms

-9.1 Load Balancing

--Load Balancing

-9.2 Center Selection

--Center Selection

-9.3 The Pricing Method: Vertex Cover

--The Pricing Method: Vertex Cover

-9.4 LP Rounding: Vertex Cover

--LP Rounding: Vertex Cover

-9.5 Knapsack Problem

--Knapsack Problem

-Homework9

-Lecture note 9

--Lecture note 9 Approximation Algorithms

10 Local Search

-10.1 Landscape of an Optimization Problem

--Landscape of an Optimization Problem

-10.2 Maximum Cut

--Maximum Cut

-10.3 Nash Equilibria

--Nash Equilibria

-10.4 Price of Stability

--Price of Stability

-Homework10

-Lecture note 10

--Lecture note 10 Local Search

11 Randomized Algorithms

-11.1 Contention Resolution

--Contention Resolution

-11.2 Linearity of Expectation

--Linearity of Expectation

-11.3 MAX 3-SAT

--MAX 3-SAT

-11.4 Chernoff Bounds

--Chernoff Bounds

-Homework11

-Lecture note 11

--Lecture note 11 Randomized Algorithms

Exam

-Exam

Sequence Alignment在线视频