## Hidden stops in the complete genome of M. tuberculosis

I concatenate all the genes into a single vector and drop the zero entries:

>> genes=genoma.gene.sec(:);
>> genes=genes(genes~=0);

Compute the probabilities of hidden stop codons between each pair of amino acids:

>> [probstop_fr1_gen,probstop_fr2_gen]=gen2probstop(genes,codigo);
>> subplot(1,2,1)
>> imagesc(probstop_fr1_gen)
>> colorbar
>> subplot(1,2,2)
>> imagesc(probstop_fr2_gen)
>> colorbar

>> sum(probstop_fr1_gen(:))
ans =
16.8621
>> sum(probstop_fr2_gen(:))
ans =
19.1280
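
The helper `gen2probstop` is not shown here. As a rough sketch of the kind of counting it might perform, the code below assumes the sequence is an in-frame character string (the real `genes` vector may well be numeric), that the codon table is ordered T, C, A, G, and that each matrix entry is the fraction of occurrences of an amino-acid pair whose junction hides a stop. The toy sequence and every variable name are illustrative only.

```matlab
% Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (assumed ordering)
bases  = 'TCAG';
aa     = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
aalist = 'ACDEFGHIKLMNPQRSTVWY';
stops  = {'TAA','TAG','TGA'};
cidx   = @(p) 16*(p(1)-1) + 4*(p(2)-1) + p(3);   % codon index in 1..64

seq  = 'ATGTTAAGGCTGGATAAATAA';                  % toy sequence, not genome data
ncod = floor(numel(seq)/3);

npairs = zeros(20); hits1 = zeros(20); hits2 = zeros(20);
for k = 1:ncod-1
    c1 = seq(3*k-2:3*k);                         % current codon
    c2 = seq(3*k+1:3*k+3);                       % next codon
    [ok1, p1] = ismember(c1, bases);
    [ok2, p2] = ismember(c2, bases);
    if ~all(ok1) || ~all(ok2), continue; end     % skip unidentifiable codons
    a = find(aalist == aa(cidx(p1)));
    b = find(aalist == aa(cidx(p2)));
    if isempty(a) || isempty(b), continue; end   % skip pairs involving a stop codon
    npairs(a,b) = npairs(a,b) + 1;
    % frame shifted by +1: last two bases of c1 plus first base of c2
    hits1(a,b) = hits1(a,b) + any(strcmp([c1(2:3) c2(1)], stops));
    % frame shifted by +2: last base of c1 plus first two bases of c2
    hits2(a,b) = hits2(a,b) + any(strcmp([c1(3) c2(1:2)], stops));
end

seen = npairs > 0;
emp_fr1 = nan(20); emp_fr1(seen) = hits1(seen) ./ npairs(seen);
emp_fr2 = nan(20); emp_fr2(seen) = hits2(seen) ./ npairs(seen);
```

In this sketch, pairs that never occur are left as NaN rather than 0, so sums should be taken over `seen` entries only.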

I compare them with the theoretical ones:

>> [probstop_fr1,probstop_fr2]=paresaa2probs(codigo);
>> figure
>> subplot(1,2,1)
>> imagesc(probstop_fr1)
>> colorbar
>> subplot(1,2,2)
>> imagesc(probstop_fr2)
>> colorbar
>> sum(probstop_fr1(:))
ans =
18.7500
>> sum(probstop_fr2(:))
ans =
24.5000
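
The two sums above can be reproduced from the standard genetic code alone, if one assumes that synonymous codons are equally likely and that the matrices run over the 20 x 20 amino-acid pairs. The following is a sketch under those assumptions; the actual `paresaa2probs` is not shown and may be organized differently, and the T, C, A, G codon ordering and all variable names are choices made for this example.

```matlab
% Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
bases  = 'TCAG';
aa     = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
aalist = 'ACDEFGHIKLMNPQRSTVWY';
stops  = {'TAA','TAG','TGA'};

% Build the 64 codons as a 64x3 char matrix in the same order as aa
codons = repmat(' ', 64, 3);
k = 0;
for i = 1:4
    for j = 1:4
        for l = 1:4
            k = k + 1;
            codons(k,:) = bases([i j l]);
        end
    end
end

% S1(c1,c2) = 1 if codon c1 followed by c2 hides a stop in the +1 frame,
% S2(c1,c2) = 1 if it hides a stop in the +2 frame
S1 = zeros(64); S2 = zeros(64);
for c1 = 1:64
    for c2 = 1:64
        S1(c1,c2) = any(strcmp([codons(c1,2:3) codons(c2,1)], stops));
        S2(c1,c2) = any(strcmp([codons(c1,3) codons(c2,1:2)], stops));
    end
end

% P(a,c) = probability of codon c given amino acid a (uniform weights here)
w = ones(1,64);
P = zeros(20,64);
for a = 1:20
    idx = (aa == aalist(a));
    P(a,idx) = w(idx) / sum(w(idx));
end

theo_fr1 = P * S1 * P';          % 20x20 matrix, frame +1
theo_fr2 = P * S2 * P';          % 20x20 matrix, frame +2
fprintf('%.4f\n', sum(theo_fr1(:)));   % prints 18.7500
fprintf('%.4f\n', sum(theo_fr2(:)));   % prints 24.5000
```

Replacing `w` with a vector of observed codon frequencies (in the same codon order) would give a usage-weighted variant of the same calculation.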

So the probability in the genes is actually LOWER than expected (16.86 vs. 18.75 in frame 1, 19.13 vs. 24.50 in frame 2). Yuck.

Now I take the codon bias into account:

>> codones=gen2codones(genes);
>> hist(codones,1:65)

(Index 65 collects the unidentifiable codons, which are caused by sequencing errors.)
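
For illustration, here is one way codon indices of this kind could be produced from a nucleotide string. It is only a sketch: `gen2codones` is not shown, its index ordering is unknown, and the T, C, A, G ordering, the toy sequence, and the variable names below are assumptions.

```matlab
seq   = 'ATGGCNTTAAGGTGA';        % toy sequence; N marks an unreadable base
bases = 'TCAG';                   % assumed base ordering for the index
ncod  = floor(numel(seq)/3);

codones_demo = zeros(1, ncod);
for k = 1:ncod
    trip = seq(3*k-2:3*k);
    [tf, pos] = ismember(trip, bases);
    if all(tf)
        % map the triplet to an index in 1..64
        codones_demo(k) = 16*(pos(1)-1) + 4*(pos(2)-1) + pos(3);
    else
        % any triplet with an unknown base goes to the extra bin 65
        codones_demo(k) = 65;
    end
end

% Count occurrences per index, then normalize over the 64 real codons only
counts = accumarray(codones_demo(:), 1, [65 1])';
freq   = counts(1:64) / sum(counts(1:64));
```

For this toy input the indices are `[36 65 3 48 15]`, so one codon lands in bin 65 and is excluded from the normalized frequencies.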

>> probcodones=hist(codones,1:65);
>> probcodones=probcodones(1:64);
>> probcodones=probcodones/sum(probcodones);
>> [probstop_fr1,probstop_fr2]=paresaa2probs(codigo,probcodones);
>> close all
>> subplot(1,2,1)
>> imagesc(probstop_fr1)
>> colorbar
>> subplot(1,2,2)
>> imagesc(probstop_fr2)
>> colorbar
>> sum(probstop_fr1(:))
ans =
16.6937
>> sum(probstop_fr2(:))
ans =
18.4024

With the codon bias, the theoretical prediction is slightly lower than the experimental result, especially in frame 2 (frame -1). Let us look at the ratio of observed to predicted probability for each amino acid pair:

>> subplot(1,2,1)
>> imagesc(probstop_fr1_gen./probstop_fr1)
>> colorbar
>> subplot(1,2,2)
>> imagesc(probstop_fr2_gen./probstop_fr2)
>> colorbar
>> nanmean(nanmean(probstop_fr1_gen./probstop_fr1))
ans =
1.0142
>> nanmean(nanmean(probstop_fr2_gen./probstop_fr2))
ans =
1.0366
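
One caveat about this summary statistic: an element-wise ratio is NaN wherever both matrices are zero, and a mean of column means weights each column equally regardless of how many valid entries it has. The toy example below (made-up numbers, not genome data, and written without `nanmean` so that it needs no toolbox) shows how this can differ from the mean pooled over all valid entries.

```matlab
% Toy matrices chosen so that the ratios are exact in floating point
obs  = [0.5  0; 0.25 0.75];
theo = [0.25 0; 0.25 0.25];
ratio = obs ./ theo;                       % entry (1,2) is 0/0 = NaN

valid = ~isnan(ratio);
r0 = ratio; r0(~valid) = 0;

% Mean of per-column means, ignoring NaN (what nanmean(nanmean(ratio))
% computes when no column is entirely NaN)
colmeans = sum(r0, 1) ./ sum(valid, 1);
mean_of_colmeans = mean(colmeans);

% Mean over all valid entries pooled together
pooled = mean(ratio(valid));

fprintf('%.4f %.4f\n', mean_of_colmeans, pooled);   % prints 2.2500 2.0000
```

When NaN entries are rare and spread evenly, the two averages should be close, so this matters mainly if some amino-acid pairs have zero probability in both matrices.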

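Meh. On average, the observed probabilities are only about 1% (frame 1) and 4% (frame 2) above the codon-bias prediction.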