Finding the optimal code most similar to the real one. Optimal code of One In A Million

>> codigo_oneinamillion=PideCodigo(codigo)
>> muestracodigo_fig(codigo_oneinamillion)

>> codigo_oneinamillion=rmfield(codigo_oneinamillion,'aa2codon')
>> save codigo_oneinamillion codigo_oneinamillion

Now, look for the nearest ‘optimal’ code:

>> [codigo_final,errormin,errormedio]=acercacodigos(codigo,codigo_oneinamillion);
>> errormin
errormin =
0.7031
>> errormin*64
ans =
45

A bit nearer than yesterday.

>> sum(errormedio(:)==min(errormedio(:)))
ans =
4

And there are 4 permutations that match equally well (this also happened yesterday).

Optimal code similar to real code: First try

First, I generate 10^6 random codes, and choose the best one (alternatively, I could use one of the ‘optimal codes’ in the papers):

>> [codigo_opt,MS0_opt,MS_bases_opt,MS0,MS_bases]=optimizacodigo(codigo,10^6,1);
>> hist(MS0,100)

>> MS0_opt
MS0_opt =
4.4522
>> muestracodigo_fig(codigo_opt)

>> muestracodigo_fig(codigo)

>> save codigo_opt
>> [errormedio,matriz]=comparacodigos(codigo,codigo_opt);
>> errormedio
errormedio =
0.9531

Very different, as expected.

Among the codes equivalent to the optimal one, I look for the one most similar to the real genetic code:

>> [codigo_final,errormin,errormedio]=acercacodigos(codigo,codigo_opt);
>> errormin
errormin =
0.7656
>> errormin*64
ans =
49
>> muestracodigo_fig(codigo_final)

Not very similar: only 15 codons (64 - 49) match.
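The arithmetic above can be checked with a toy example. This is only a sketch: it assumes that the error returned by comparacodigos and acercacodigos is the fraction of the 64 codons assigned to different amino acids, which is what errormin*64 = 49 suggests. The names code_a, code_b, err_frac, n_mismatch and n_match are made up for illustration, and code_b is an artificial code, not an output of the functions above.

```matlab
% Toy comparison of two codon-to-amino-acid assignments stored as 64-character vectors.
% code_a is the standard genetic code (codons ordered TTT, TTC, TTA, TTG, TCT, ...);
% code_b is a made-up code that deliberately differs from code_a in its first 49 codons.
code_a = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
code_b = code_a;
code_b(1:49) = '#';                  % placeholder symbol that matches no amino acid

err_frac = mean(code_a ~= code_b);   % fraction of mismatching codons
n_mismatch = sum(code_a ~= code_b);  % number of mismatching codons
n_match = numel(code_a) - n_mismatch;
fprintf('error = %.4f, mismatches = %d, matches = %d\n', err_frac, n_mismatch, n_match);
```

With 49 mismatches the fraction is 49/64 = 0.765625, which is consistent with the 0.7656 displayed above.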

Now I plot the real code and the most similar optimal code again, this time with colours representing polar requirement:

>> muestracodigo_fig(codigo,1)
>> figure
>> muestracodigo_fig(codigo_final,1)

Preparation of class about the genetic code (II): Stop codons

Probability of not finding a stop codon after a frameshift, for random sequences (x is the number of codons read; 3 of the 64 codons are stops):

x=0:100;
plot(x,(61/64).^x,'LineWidth',2)
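Two summary numbers can be read off the same curve. A minimal sketch, assuming uniformly random and independent bases (so every codon is a stop with probability 3/64); the variable names p_sense, p_no_stop, x_half and mean_codons_to_stop are mine:

```matlab
% Probability that x consecutive random codons contain no stop codon
% (assumes uniformly random, independent bases, so 3 of 64 codons are stops).
p_sense = 61/64;
x = 0:100;
p_no_stop = p_sense.^x;

% Smallest x for which a stop codon is more likely than not to have appeared.
x_half = find(p_no_stop < 0.5, 1) - 1;   % subtract 1 because x starts at 0

% Mean number of codons read up to and including the first stop
% (mean of a geometric distribution with success probability 3/64).
mean_codons_to_stop = 1/(1 - p_sense);
fprintf('P(no stop) < 0.5 from x = %d codons; mean codons to first stop = %.2f\n', ...
        x_half, mean_codons_to_stop);
```

Under this model, the out-of-frame reading is expected to hit a stop after roughly 64/3 codons on average.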

Preparation of class about the genetic code

Replication of the calculation in Haig & Hurst (1991):

cd geneticcode

>> load codigo
>> MS0=NaN(10^5,1);
MS_bases=NaN(10^5,3);
for c=1:10^5
codigo_perm=permutacodigo(codigo);
[MS0(c),MS_bases(c,1:3)]=codigo2MS(codigo_perm.codon2aa,codigo.aa_prop(:,1));
if mod(c,10000)==0
fprintf('%g,',c)
end
end
10000,20000,30000,40000,50000,60000,70000,80000,90000,100000,>>
>> [MS0_real,MS_bases_real]=codigo2MS(codigo.codon2aa,codigo.aa_prop(:,1));
>> hist(MS0,100)
>> hold on
>> ejes=axis;
>> plot(MS0_real*[1 1],ejes(3:4),'k')

Global MS:

First base:

hist(MS_bases(:,1),100)
>> ejes=axis;
>> hold on
>> plot(MS_bases_real(1)*[1 1],ejes(3:4),'k')

Second base:

hist(MS_bases(:,2),100)
ejes=axis;
hold on
plot(MS_bases_real(2)*[1 1],ejes(3:4),'k')

Third base:

>> hist(MS_bases(:,3),100)
ejes=axis;
hold on
plot(MS_bases_real(3)*[1 1],ejes(3:4),'k')
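Besides drawing the real code as a vertical line on each histogram, the comparison can be summarised as an empirical one-sided p-value: the fraction of permuted codes whose MS is as low as or lower than that of the real code. The sketch below uses placeholder data only (MS0_demo and MS0_real_demo are made-up values, not results from the runs above) so that it runs on its own:

```matlab
% Placeholder data: NOT real results. MS0_demo stands in for the MS values of the
% permuted codes and MS0_real_demo for the MS of the real code.
MS0_demo = linspace(4, 14, 1001)';
MS0_real_demo = 5.005;

% One-sided empirical p-value: fraction of permuted codes at least as good
% (MS as low or lower) as the real code.
n_better = sum(MS0_demo <= MS0_real_demo);
p_emp = n_better / numel(MS0_demo);
fprintf('%d of %d permuted codes have MS <= real code (p = %.4f)\n', ...
        n_better, numel(MS0_demo), p_emp);
```

With the variables from the session, the same idea would read mean(MS0<=MS0_real) for the global MS and mean(MS_bases(:,k)<=MS_bases_real(k)) for base k.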

Proportion of single-point mutations that lead to synonymous codons (permutations do not maintain family boxes):

MS0_nobox=NaN(10^4,1);
MS_bases_nobox=NaN(10^4,3);
prop_syn=NaN(10^4,1);
for c=1:10^4
codigo_perm=permutacodigo_noboxes(codigo);
[MS0_nobox(c),MS_bases_nobox(c,1:3),prop_syn(c)]=codigo2MS(codigo_perm.codon2aa,codigo.aa_prop(:,1));
if mod(c,10000)==0
fprintf('%g,',c)
end
end
hist(prop_syn,50)
hold on
ejes=axis;
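[~,~,prop_syn_real]=codigo2MS(codigo.codon2aa,codigo.aa_prop(:,1)); % value for the real code; assumes the third output is the synonymous proportion, as in the loop above
plot(prop_syn_real*[1 1],ejes(3:4),'k')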
>> set(gca,'FontSize',15)
>> xlabel('Proportion of synonymous single-point mutations','FontSize',15)
>> ylabel('Number of cases','FontSize',15)
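For reference, here is one way the proportion of synonymous single-point mutations could be computed for the standard genetic code. This is a sketch and not necessarily what codigo2MS does: it assumes that stop is treated as just another symbol and that all 64 x 9 ordered codon pairs differing in one base are counted. The names aa, weights, base_idx, n_syn, n_total and prop_syn_example are mine.

```matlab
% Standard genetic code; codons ordered TTT, TTC, TTA, TTG, TCT, ... (base order T, C, A, G).
% '*' marks stop codons, treated here as one more symbol (an assumption).
aa = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
weights = [16 4 1];          % contribution of each codon position to the codon index
n_syn = 0;
n_total = 0;
for i = 0:63
    % Base (0..3) at each of the three codon positions.
    base_idx = [floor(i/16), mod(floor(i/4), 4), mod(i, 4)];
    for pos = 1:3
        for b = 0:3
            if b ~= base_idx(pos)
                % Index of the codon obtained by changing position pos to base b.
                j = i + (b - base_idx(pos))*weights(pos);
                n_total = n_total + 1;
                n_syn = n_syn + (aa(i+1) == aa(j+1));
            end
        end
    end
end
prop_syn_example = n_syn / n_total;
fprintf('%d of %d single-point mutations are synonymous (%.4f)\n', ...
        n_syn, n_total, prop_syn_example);
```

Excluding mutations to or from stop codons would change both the numerator and the denominator, so the value plotted above may differ depending on the convention used.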