Search

# GA-MSA

Last updated Jun 23, 2022

GA-MSA is an algorithm that tries to solve Multiple Sequence Alignment problem using Genetic Algorithm. This project was developed for my Bioinformatics class at GIK.

## # Approach

### # Crossover

• Select organisms from a Discrete Probability Distribution using Vose Alias method, where probabilities were calculated using their fitness score. Organisms with higher fitness score have a higher probability of being selected.
• About 50% of organisms are copied as it is, without any crossover, to be included in the next population.
• 50% of organisms are selected as parents, with a certain probability of applying either of the following:
1. Horizontal Recombination Probability of selection = 30%
2. Vertical Recombination Probability of selection = 20%
3. Randomly select either of the two parents. Probability of selection = 50%

#### # Horizontal Recombination

Randomly select each sequence from one of the parents. e.g.

 1 2 3 4  p_1 = [1, 2, 3, 4, 5] p_2 = [a, b, c, d, e] child = [a, 2, 3, d, 5] 

#### # Vertical Recombination

Randomly define a cut point in the sequence and build the offspring by copying the sequence from position 1 up to the cut point from one parent and from the cut point to the end of the sequence from the other parent.

Keep track of the gaps before split point to maintain the integrity of the sequence. If split point isn’t valid, select either of the parents as the offspring.

### # Mutation

Following four operators are randomly selected for mutation:

1. Gap Open: A position in the sequence is randomly selected and a block of gaps of variable size (1 - 20% of the max sequence length) is inserted into the sequence.
2. Gap Extension: Select a random block of gaps and extend it by one.
3. Gap Reduction: Select a random block of gaps and reduce it by one. Can act like a Gap Close operator in case of gap block of size 1.
4. None: No mutation is performed.