>> fig_pap_elegans_16(1,zeros(100))
>> clear
>> load datos_celegans_struct
>> [pos,omega_general,costes,x]=coste2pos_restofijas(todas.A*.05,todas.M*1.5+todas.S,todas.f,todas.pos_real,1);
>> desv=abs(todas.pos_real-pos);
>> plot(desv,omega_general,'.')
To start with, there is only one outlier. Note that the new omega depends on the exponent, so this by itself favors exponent 1.
Bad news for the fit to the mean (without the outlier):
>> buenas=(desv<.2 | omega_general<20);
>> sum(~buenas)
ans =
1
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv(buenas),omega_general(buenas))
xi =
4.5068
b =
0.1092
And of course, again great news for the fit to the envelope (again, without the outlier):
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega_general(buenas),1,4);
>> plot(log10(desv_acum),log10(omega_media),'.-')
I am starting to believe the theory that Bayes is especially affected by the most deviated points, and that the envelope captures exactly that.
The problem: This plot does not look nice for the case alpha=beta=1/29.3.
>> load datos_celegans_struct
>> pos_cuad=coste2pos_cuad(todas.A/29.3,todas.M/29.3+todas.S,todas.f);
>> omega=sum([todas.A/29.3 todas.M/29.3+todas.S],2);
>> desv=abs(todas.pos_real-pos_cuad);
>> plot(desv,omega,'.')
It is not a problem for exponent 2, because the metric is bad there anyway. But it does not look nice for exponent 1, for which the metric is excellent:
>> [pos,omega_general,costes,x]=coste2pos_restofijas(todas.A/29.3,todas.M/29.3+todas.S,todas.f,todas.pos_real,1);
>> desv=abs(todas.pos_real-pos);
>> plot(desv,omega,'.')
The reason: omega is not valid in this case. For example (blue is the actual cost, red is the cost approximated by omega):
>> n=4;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega(n),'r'); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),'k');title(num2str(omega_general(n)))
>> n=27;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega(n),'r'); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),'k');title(num2str(omega_general(n)))
>> n=54;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega(n),'r'); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),'k');title(num2str(omega_general(n)))
And many more cases.
Solution: I define a “generalized omega”, which is simply the one that best fits the cost in the direction of the deviation. With this generalized omega the costs look like this:
>> n=4;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega_general(n),'r'); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),'k');title(num2str(omega_general(n)))
>> n=27;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega_general(n),'r'); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),'k');title(num2str(omega_general(n)))
>> n=54;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega_general(n),'r'); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),'k');title(num2str(omega_general(n)))
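My reading of this definition, as a sketch (the exact fitting procedure inside the MATLAB code is an assumption here): take the numerical cost curve of one neuron, keep the side of its minimum that points toward the real position, and find the slope omega that best fits cost ≈ omega·|x − pos| by least squares.

```python
import numpy as np

def generalized_omega(x, cost, pos, pos_real):
    """Sketch of a 'generalized omega': the slope that best fits the
    cost curve as omega*|x - pos|, using only the side of the minimum
    that points toward the real position (the direction of the
    deviation). An interpretation, not the actual MATLAB code."""
    side = x >= pos if pos_real >= pos else x <= pos
    dx = np.abs(x[side] - pos)
    c = cost[side] - np.min(cost[side])
    # least-squares slope of a line through the origin: c ~ omega*dx
    return np.sum(dx * c) / np.sum(dx * dx)
```

For a cost that is exactly linear on each side of the minimum, this simply returns the slope on the side of the deviation, which is the behavior the figures above show.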
With this generalized omega, the plot looks much better:
>> desv=abs(todas.pos_real-pos);
>> plot(desv,omega_general,'.')
In fact, the two outliers are AVAL and AVAR:
>> find(desv>.3 & omega_general>3)
ans =
54
55
And for these the cost is not so well approximated (see above the cost for neuron 54).
Of course, it can still improve.
Exponent from the mean (removing the two outliers):
>> buenas=(desv<.3 | omega_general<3);
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv(buenas),omega_general(buenas))
xi =
1.5765
b =
0.0995
Exponent from the envelope:
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega_general(buenas),1,4);
>> plot(log10(desv_acum),log10(omega_media),'.-')
Not a great fit, but again near the Bayesian parameter. If you ask me, the envelope is just fucking lucky.
With outliers:
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv,omega,1,7);
>> plot(log10(desv_acum),log10(omega_media),'.-')
We get exponent 1.
I reduce the probability, to leave the outliers out. I need to go down to 0.7, and it is a disaster:
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv,omega,.7,7);
>> plot(log10(desv_acum),log10(omega_media),'.-')
I remove the outliers:
>> buenas=omega<20 | desv<.2;
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega(buenas),1,7);
>> plot(log10(desv_acum),log10(omega_media),'.-')
It is heavily influenced by the last bin, which contains only one point. I reduce the number of bins:
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega(buenas),1,5);
>> plot(log10(desv_acum),log10(omega_media),'.-')
Still not very good. I use a non-equispaced binning:
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega(buenas),1,[0 10 20 40 80]);
>> plot(log10(desv_acum),log10(omega_media),'.-')
Another binning:
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega(buenas),1,[0 5 10 20 40 80]);
>> plot(log10(desv_acum),log10(omega_media),'.-')
Essentially the same…
I try yet another binning, which keeps the same number of data points in each bin (except perhaps the last one, which may not be completely full).
>> [desv_acum,omega_media]=omegadesv_acumuladas_binningpornumero(desv(buenas),omega(buenas),1,50);
>> plot(log10(desv_acum),log10(omega_media),'.-')
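If I read the helper correctly (a guess: it sorts the points by omega, cuts them into groups of a fixed size, and summarizes each group), the equal-count binning can be sketched as follows. The p-quantile summary is an assumption, motivated by the third argument behaving like a probability above.

```python
import numpy as np

def bins_by_count(desv, omega, p, n_per_bin):
    """Hypothetical sketch of equal-count binning: sort the points by
    omega, cut them into bins of n_per_bin points (the last bin may not
    be completely full), and summarize each bin by the p-quantile of
    the deviations and the mean omega. The MATLAB helper may differ."""
    order = np.argsort(omega)
    d = np.asarray(desv, float)[order]
    o = np.asarray(omega, float)[order]
    desv_q, omega_mean = [], []
    for i in range(0, len(d), n_per_bin):
        desv_q.append(np.quantile(d[i:i + n_per_bin], p))
        omega_mean.append(np.mean(o[i:i + n_per_bin]))
    return np.array(desv_q), np.array(omega_mean)
```

With p = 1 each bin reports its maximum deviation, i.e. the envelope; lowering p (as with 0.7 above) trims the most deviated points inside each bin.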
Using the center-of-mass position of each neuron with the rest fixed in their real positions does not change these results:
>> clear
>> load datos_celegans_struct
>> [pos,omega_general,costes,x]=coste2pos_restofijas(todas.A*.05,todas.M*1.5+todas.S,todas.f,todas.pos_real,1);
>> desv=abs(pos-todas.pos_real);
>> omega=sum([todas.A*.05 todas.M*1.5+todas.S],2);
>> plot(desv,omega,'.')
>> buenas=omega<20 | desv<.2;
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv(buenas),omega(buenas))
xi =
5.4148
b =
0.1103
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega(buenas),1,[0 5 10 20 40 80]);
>> plot(log10(desv_acum),log10(omega_media),'.-')
I have two algorithms. One minimizes the mean squared error in Deltax, and the other maximizes the log-likelihood assuming that the deviations are exponentially distributed (the first one is equivalent to assuming they are normally distributed around the mean; this is clearly not true, so the second should work better).
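The log-likelihood estimator can be sketched under an assumed model (the functional form inside omegadesv2bestfittingexponent_exp is an assumption here): deviations exponentially distributed with mean b·omega^(−1/xi), the scale a Boltzmann argument suggests for a cost omega·|Δx|^xi. For fixed xi the maximum-likelihood b is closed-form, so a one-dimensional grid search over xi suffices.

```python
import numpy as np

def fit_exponent_mle(desv, omega, xi_grid=None):
    """Hypothetical sketch: assume desv_i ~ Exponential with mean
    b * omega_i**(-1/xi) and maximize the log-likelihood.
    For fixed xi the MLE of b is b = mean(desv / omega**(-1/xi)),
    so only xi needs a grid search."""
    desv = np.asarray(desv, float)
    omega = np.asarray(omega, float)
    if xi_grid is None:
        xi_grid = np.arange(0.5, 10.0, 0.01)
    best_xi, best_b, best_nll = None, None, np.inf
    for xi in xi_grid:
        g = omega ** (-1.0 / xi)     # shape of the mean vs omega
        b = np.mean(desv / g)        # closed-form MLE of b given xi
        nll = np.sum(np.log(b * g) + desv / (b * g))
        if nll < best_nll:
            best_xi, best_b, best_nll = xi, b, nll
    return best_xi, best_b
```

The least-squares variant would instead minimize the sum of squared residuals desv − b·omega^(−1/xi), which implicitly assumes Gaussian scatter around the mean.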
I first calibrate the method with a synthetic elegans with exponent 2:
>> load('e:\hipertec\optimesto\PresentacionesPapers\PaperMolon\Figuras\elegans_sintetico_pot2_r06_todos.mat')
>> load datos_celegans_struct
>> pos_cuad=coste2pos_cuad(todas.A*.05,todas.M*1.5+todas.S,todas.f);
>> desv_sint=abs(pos_todos(:,1)-pos_cuad);
>> omega=sum([todas.A*.05 todas.M*1.5+todas.S],2);
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv_sint,omega)
xi =
1.9973
b =
0.3085
>> [xi,b]=omegadesv2bestfittingexponent(desv_sint,omega)
xi =
2.4142
b =
0.2777
The algorithm that assumes exponential distribution of deviations and finds maximum log-likelihood works better, as expected.
Now, with the real data from C. elegans:
>> desv=abs(todas.pos_real-pos_cuad);
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv,omega)
xi =
5.8572
b =
0.1184
>> [xi,b]=omegadesv2bestfittingexponent(desv,omega)
xi =
7.2291
b =
0.1122
Nearer to 2 with the exponential one, but still around 5.
Now I use the optimal positions for linear cost:
>> pos_lin=coste2pos_num_ruido(todas.A*.05,todas.M*1.5+todas.S,todas.f,1,0,1,0);
>> desv_lin=abs(todas.pos_real-pos_lin);
>> clf
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv_lin,omega)
xi =
5.2203
b =
0.1324
Still not working…
I remove the three outliers and try again (both with the optimal positions for quadratic cost and with those for linear cost):
>> buenas=omega<20 | desv<.2;
>> sum(~buenas)
ans =
3
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv(buenas),omega(buenas))
xi =
4.7116
b =
0.1235
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv_lin(buenas),omega(buenas))
xi =
4.1978
b =
0.1388
So 4 is the best one.
One possibility is to try other probability distributions that may describe the data better, and see whether that takes us nearer to 2.
From Remotón.
I run 1000 parallel simulated annealings, temperature between 5 and 0.1, 200000 steps:
>> coste=simulaelegans_simulatedannealing_multiple_parallel(todas.A/29.3,todas.M/29.3+todas.S,todas.f,.5,.01,[.1 5],2*10^5,1000,'Prueba');
>> [costemedio,varianza]=simulannealing2mediavarianza('Prueba');
>> T=5:(.1-5)/(length(varianza)-1):.1;
>> plot(T,costemedio)
>> plot(T,varianza)
>> plot(T,varianza./T.^2)
Clearly, I need to go to lower temperatures. Maybe we also need to cool more slowly.
Working from Remoton.
I find the cost of 1.2*10^8 random configurations.
>> generacostes_rand('Costes_rand',12,4)
>> [histog,bins]=costes_rand2histog('Costes_rand',180:.5:260);
>> plot(bins,histog)
>> pos2coste(todas.A/29.3,todas.M/29.3+todas.S,todas.f,todas.pos_real,.5)
ans =
119.6195
The actual configuration has a cost much lower than any we get randomly. Therefore, this is not a valid scheme to look for phase transitions in the temperature range relevant for the real worm. Nevertheless, I will study the phase transitions with these data:
>> [costemedio,varianza]=costes_rand2mediavarianza('Costes_rand',.1:.1:5,1);
>> T=.1:.1:5;
>> plot(T,costemedio)
>> plot(T,varianza)
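What I assume costes_rand2mediavarianza does (a guess at the method, not the actual MATLAB code): treat the randomly sampled configurations as uniform samples and reweight them with Boltzmann factors exp(−E/T) to estimate the thermal mean and variance of the cost; the specific heat is then Var/T².

```python
import numpy as np

def reweighted_mean_var(costs, T):
    """Estimate the Boltzmann mean and variance of the cost at
    temperature T from uniformly sampled configurations, by importance
    reweighting with exp(-E/T). A sketch of the assumed method."""
    E = np.asarray(costs, float)
    w = np.exp(-(E - E.min()) / T)   # shift by min(E) for stability
    w /= w.sum()
    mean = float(np.sum(w * E))
    var = float(np.sum(w * (E - mean) ** 2))
    return mean, var
```

The estimate is only trustworthy while the effective sample size is large; at low T the weights concentrate on the few lowest-cost samples, which matters for interpreting the plots below.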
There is this strange maximum. A possible explanation is that the effect of the exponential is to remove the peak of the cost distribution, producing something more uniform, which has larger variance. Not even I believe that, but anyway.
Specific heat:
>> plot(T,varianza./T.^2)
Uooooohhh!!!
I do not quite understand this. The cost histogram looks too boring to be hiding a phase transition. I run a control using normally distributed costs:
>> clear
>> costes=randn(1,10^7)*10;
>> save prueba1 costes
>> [costemedio,varianza]=costes_rand2mediavarianza('prueba',.1:.1:5,1);
>> T=.1:.1:5;
>> plot(T,costemedio)
>> plot(T,varianza)
>> plot(T,varianza./T.^2)
Veeeery similar. So I do not think there is anything special in the C. elegans cost.
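This fits a standard analytic result: tilting a Gaussian cost density $\mathcal{N}(\mu,\sigma^2)$ by the Boltzmann factor $e^{-E/T}$ gives another Gaussian, so

```latex
\langle E\rangle_T \;=\; \mu-\frac{\sigma^{2}}{T},\qquad
\operatorname{Var}_T(E)\;=\;\sigma^{2},\qquad
C(T)\;=\;\frac{\operatorname{Var}_T(E)}{T^{2}}\;=\;\frac{\sigma^{2}}{T^{2}}.
```

The exact specific heat therefore grows monotonically as T decreases, with no peak. The maximum in the plots is presumably a finite-sample artifact: at low T the Boltzmann weights concentrate on the few lowest-cost samples, so the empirical variance collapses below $\sigma^2$, which caps C(T) and creates an apparent peak.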
Why is this apparent phase transition taking place? An older post showed that local minima were needed to get a phase transition. But anyway, the cost histogram for a single well is very different from the one we get here:
>> x=-1:.01:1;
>> coste=(abs(x)).^2;
>> hist(coste)
Indeed, the effect of a local minimum is to create a peak in the histogram of costs:
>> x=-1:.00001:1;
>> coste=(abs(x-.5)).^.5;
>> subplot(1,2,1)
>> plot(x,coste)
>> subplot(1,2,2)
>> hist(coste)
>> x=-1:.00001:1;
>> coste=(abs(x-.5)).^.5+.7*(abs(x+.6)).^.5;
>> subplot(1,2,1)
>> plot(x,coste)
>> subplot(1,2,2)
>> hist(coste)
For maintenance, the final figure is the one in the previous post.
Summary of results:

                           | D=0.05 | D=0.1  | D=0.2       | D=0.3       | D=0.4
Delta for main plots       | 1.5    | 1.99   | 2.54        | 4.22        | 5.31
p_growth (main plots)      | 0.01   | 0.0058 | 2.0000e-004 | 4.8000e-004 | 6.2000e-004
p_ATP (main plots)         | NaN    | 0.1739 | 0.2399      | 0.0854      | 0.1211
max(p_growth) (all Deltas) | 0.0187 | 0.0383 | 0.0517      | 0.0024      | 0.0011
min(p_ATP) (all Deltas)    | NaN    | 0.0920 | 0.0302      | 0.0280      | 0.0245
Growth 0.05 (maybe not final, maybe will not be included):
>> fig_pap_coli_13(1,0,.05)
Growth 0.1, 0.2, 0.3 and 0.4:
>> fig_pap_coli_13(1,0,.1)
>> fig_pap_coli_13(1,0,.2)
>> fig_pap_coli_13(1,0,.3)
>> fig_pap_coli_13(1,0,.4)
Maintenance:
>> fig_pap_coli_mantenimiento_02(1)
Figures for growth 0.1, 0.2, 0.3 and 0.4, respectively:
>> fig_pap_coli_13(1,0,.1)
>> fig_pap_coli_13(1,0,.2)
>> fig_pap_coli_13(1,0,.3)
>> fig_pap_coli_13(1,0,.4)