Analysis of hidden stops, separating the three stop codons

IMPORTANT: I change the programs gen2probstop.m and paresaa2probs.m

>> [probstop,casosfavorables,casostotales]=gen2probstop(genes,codigo);

First stop codon:

subplot(1,2,1)
imagesc(probstop{1}(:,:,1))
colorbar
subplot(1,2,2)
imagesc(probstop{2}(:,:,1))
colorbar

Second:
subplot(1,2,1)
imagesc(probstop{1}(:,:,2))
colorbar
subplot(1,2,2)
imagesc(probstop{2}(:,:,2))
colorbar

Third:

>> subplot(1,2,1)
imagesc(probstop{1}(:,:,3))
colorbar
subplot(1,2,2)
imagesc(probstop{2}(:,:,3))
colorbar

Now I compare with the theoretical probabilities:

>> codones=gen2codones(genes);
>> probcodones=hist(codones,1:65);
>> probcodones=probcodones(1:64);
>> probcodones=probcodones/sum(probcodones);
>> probstop_teor=paresaa2probs(codigo,probcodones);

Codón uaa:
>> c=1;
subplot(1,2,1)
imagesc(probstop{1}(:,:,c)./probstop_teor{1}(:,:,c))
colorbar
subplot(1,2,2)
imagesc(probstop{2}(:,:,c)./probstop_teor{2}(:,:,c))
colorbar

Codón uga:

c=2;
subplot(1,2,1)
imagesc(probstop{1}(:,:,c)./probstop_teor{1}(:,:,c))
colorbar
subplot(1,2,2)
imagesc(probstop{2}(:,:,c)./probstop_teor{2}(:,:,c))
colorbar

Codón uag:

c=3;
subplot(1,2,1)
imagesc(probstop{1}(:,:,c)./probstop_teor{1}(:,:,c))
colorbar
subplot(1,2,2)
imagesc(probstop{2}(:,:,c)./probstop_teor{2}(:,:,c))
colorbar

 

Advertisement

Probabilities of hidden stops in the intergenetic sequences

>> interg=genoma.intergenetica(:);
>> interg=interg(interg~=0);
>> rem(length(interg),3)
ans =
1
>> interg=interg(1:end-1);
>> rem(length(interg),3)
ans =
0
>> prob_noefecto_interg=gen2prob_noefecto(interg,codigo);

>> hist(log10(prob_noefecto_interg(:)))

Random, as expected.

Matrices:
>> subplot(1,2,1)
>> imagesc(prob_noefecto_interg(:,:,1))
>> colorbar
>> subplot(1,2,2)
>> imagesc(prob_noefecto_interg(:,:,2))
>> colorbar

>> subplot(1,2,1)
imagesc(log10(prob_noefecto_interg(:,:,1)))
colorbar
subplot(1,2,2)
imagesc(log10(prob_noefecto_interg(:,:,2)))
colorbar

Bayesian study of the probabilities of hidden stop codons

There was a mistake yesterdary in the normalization of probabilities. Corrected today.

First, comparison of theoretical probabilities and actual ones (this is not new):

>> load codigo_MycobTub
>> load MT_CDC1551
>> genes=genoma.gene.sec(:);
>> genes=genes(genes~=0);
>> codones=gen2codones(genes);
>> probcodones=hist(codones,1:65);
>> probcodones=probcodones(1:64);
>> probcodones=probcodones/sum(probcodones);
>> [probstop1_teor,probstop2_teor]=paresaa2probs(codigo,probcodones);
>> [probstop1,probstop2,casosfavorables,casostotales,n_saltados]=gen2probstop(genes,codigo);
>> subplot(1,2,1)
>> imagesc(probstop1./probstop1_teor)
>> colorbar
>> subplot(1,2,2)
>> imagesc(probstop2./probstop2_teor)
>> colorbar

I look for the probability, k, of finding a hidden stop codon between two given amino-acids. In the random case, k should be the proportion of synimous codons that give a hidden stop codon, and this may be weighted with the relative proportions of codons in the genome. Then, for the real genome, I count how many times I find the two aminoacids together, and in how many of those cases I find a hidden stop codon. With that information and Bayes, I can compute the probability of k. I do it, for example, for element (2,3) in frame 1:

>> load codigo_MycobTub
>> load MT_CDC1551
>> genes=genoma.gene.sec(:);
>> genes=genes(genes~=0);
>> codones=gen2codones(genes);
>> probcodones=hist(codones,1:65);
>> probcodones=probcodones(1:64);
>> probcodones=probcodones/sum(probcodones);
>> [probstop1_teor,probstop2_teor]=paresaa2probs(codigo,probcodones);
>> [probstop1,probstop2,casosfavorables,casostotales,n_saltados]=gen2probstop(genes,codigo);
>> probstop1_teor(2,3,1)
ans =
0.6051
>> probstop1(2,3,1)
ans =
0.6246
>> [probk,k]=cuentas2probk_bayes(casosfavorables(2,3,1),casostotales(2,3,1));
>> [m,ind]=min(abs(k-probstop1_teor(2,3,1)));
>> sum(probk(1:ind))*diff(k(1:2))
ans =
0.0169
>> plot(k,probk)

I find that, for this element, the probability in the genes is higher than the theoretical probability. And that this difference is significant. Now, I do it for all pairs of aminoacids, in both reading frames:

>> prob_noefecto=gen2prob_noefecto(genes,codigo);
>> subplot(1,2,1)
>> imagesc(prob_noefecto(:,:,1))
>> colorbar
>> subplot(1,2,2)
>> imagesc(prob_noefecto(:,:,2))
>> colorbar

In logarithmic scale:

>> subplot(1,2,1)
imagesc(log10(prob_noefecto(:,:,1)))
colorbar
subplot(1,2,2)
imagesc(log10(prob_noefecto(:,:,2)))
colorbar


Some elements are significant, and some are not. This is compatible with randomness, but there are more significant elements than one would expect at random:
>> hist(prob_noefecto(:),20)


>> hist(log10(prob_noefecto(:)))

>> sum(probs_todas<.05)/length(probs_todas)
ans =
0.3047
>> sum(probs_todas<.01)/length(probs_todas)
ans =
0.2109

So 30% have p<0.05, and 21% have p<0.01. This suggests that there is an effect, but it might also be that some assumptions in the significance calculation are wrong.

I check, by randomizing the order of the codons, and performing the test:

>> codones_perm=codones(randperm(length(codones)));
>> genes_perm=codones2gen(codones_perm);
>> prob_noefecto_perm=gen2prob_noefecto(genes_perm,codigo);
>> hist(prob_noefecto_perm(:),20)

>> hist(log10(prob_noefecto_perm(:)))

>> probs_todas_perm=prob_noefecto_perm(~isnan(prob_noefecto_perm));
>> sum(probs_todas_perm<.05)/length(probs_todas_perm)
ans =
0.0391
>> sum(probs_todas_perm<.01)/length(probs_todas_perm)
ans =
0.0156

This one works fine.

So I would say there is something non-random in the genome.

>> sum(probs_todas>.95)/length(probs_todas)
ans =
0.1406
>> sum(probs_todas>.99)/length(probs_todas)
ans =
0.0859

There are also more pairs of aminoacids with fewer hidden stops than expected…

Next steps:

– Look for something special in the pairs of aminoacids with significant proportion of hidden stop codons (both too many or too few). Any aminoacid specially present? Any stop codon is more likely than the other two?

– Do the study gene by gene, and see whether there are non-random differences among genes.