Omega vs Deltax, for the parameters alpha=0.05 beta=1.5, with the new generalized omega

>> clear
>> load datos_celegans_struct
>> [pos,omega_general,costes,x]=coste2pos_restofijas(todas.A*.05,todas.M*1.5+todas.S,todas.f,todas.pos_real,1);
>> desv=abs(todas.pos_real-pos);
>> plot(desv,omega_general,’.’)

To start with, only one outlier. Note that the new omega depends on the exponent, so this by itself favors exponent 1.

Bad news for the fit to the mean (without the outlier):

>> buenas=(desv<.2 | omega_general<20);
>> sum(~buenas)
ans =
1
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv(buenas),omega_general(buenas))
xi =
4.5068
b =
0.1092

And of course, again great news for the fit to the envelope (again, without the outlier):

>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega_general(buenas),1,4);

>> plot(log10(desv_acum),log10(omega_media),’.-‘)

I am starting to believe the theory about Bayes being especially affected by the most deviated ones, and the envelope getting that.

 

Advertisement

omega vs Deltax for the anatomical alpha and beta

The problem: This plot does not look nice for the case alpha=beta=1/29.3.

load datos_celegans_struct
pos_cuad=coste2pos_cuad(todas.A/29.3,todas.M/29.3+todas.S,todas.f);
omega=sum([todas.A/29.3 todas.M/29.3+todas.S],2);
desv=abs(todas.pos_real-pos_cuad)
plot(desv,omega,’.’)

It is not a problem for exponent 2, because the metric is bad also. But it does not look nice for exponent one, for which the metric is excelent:

>> [pos,omega_general,costes,x]=coste2pos_restofijas(todas.A/29.3,todas.M/23.3+todas.S,todas.f,todas.pos_real,1);
>> desv=abs(todas.pos_real-pos);
>> plot(desv,omega,’.’)

The reason: omega is not valid in this case. For example (blue is the actual cost, red is the cost approximated by omega):

>> n=4;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega(n),’r’); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),’k’);title(num2str(omega_general(n)))

>> n=27;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega(n),’r’); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),’k’);title(num2str(omega_general(n)))


>> n=54;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega(n),’r’); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),’k’);title(num2str(omega_general(n)))

And many more cases.

Solution: I define a “generalized omega”, which is simply the one that best fits the cost in the direction of the deviation. With this generalized omega the costs look like this:

>> n=4;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega_general(n),’r’); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),’k’);title(num2str(omega_general(n)))

>> n=27;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega_general(n),’r’); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),’k’);title(num2str(omega_general(n)))

>> n=54;clf; plot(x,costes(n,:)); hold on; plot(x,abs(x-pos(n))*omega_general(n),’r’); ejes=axis; plot(todas.pos_real(n)*[1 1],ejes(3:4),’k’);title(num2str(omega_general(n)))

With this generalized omega, the plot looks much better:

>> desv=abs(todas.pos_real-pos);
>> plot(desv,omega_general,’.’)

In fact, the two outliers are AVAL and AVAR:

>> find(desv>.3 & omega_general>3)
ans =
54
55

And for these the cost is not so well approximated (see above the cost for neuron 54).

Of course, it can still improve.

Exponent from the mean (removing the two outliers):

>> buenas=(desv<.3 | omega_general<3);
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv(buenas),omega_general(buenas))
xi =
1.5765
b =
0.0995

Exponent from the envelope:
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega_general(buenas),1,4);

>> plot(log10(desv_acum),log10(omega_media),’.-‘)

Not a great fit, but again near the Bayesian parameter. If you ask me, the envelope is just fucking lucky.

 

Study of the cumulative distribution of deviations in omega-Deltax plots for C. elegans

With outliers:
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv,omega,1,7);

>> plot(log10(desv_acum),log10(omega_media),’.-‘)

We get exponent 1.

I reduce the probability, to let the outliers outside. In need to go to 0.7, and it is a disaster:

>> [desv_acum,omega_media]=omegadesv_acumuladas(desv,omega,.7,7);

>> plot(log10(desv_acum),log10(omega_media),’.-‘)

I remove the outliers:

>> buenas=omega<20 | desv<.2;
>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega(buenas),1,7);

>> plot(log10(desv_acum),log10(omega_media),’.-‘)

It is very influentiated by the last bin, which only contains one point. I reduce the binning:

>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega(buenas),1,5);

>> plot(log10(desv_acum),log10(omega_media),’.-‘)

Still not very good. I use a non-equispaced binning:

>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega(buenas),1,[0 10 20 40 80]);

>> plot(log10(desv_acum),log10(omega_media),’.-‘)

Another binning:

>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega(buenas),1,[0 5 10 20 40 80]);


>> plot(log10(desv_acum),log10(omega_media),’.-‘)

Essentially the same…

I do it with another binning, that keeps the same number of data in each bin (except perhaps for the last one, that may be not completely full).

>> [desv_acum,omega_media]=omegadesv_acumuladas_binningpornumero(desv(buenas),omega(buenas),1,50);

>> plot(log10(desv_acum),log10(omega_media),’.-‘)

Using the center-of-mass position of each neuron with the rest fixed in their real positions does not change these results:

>> clear
>> load datos_celegans_struct
>> [pos,omega_general,costes,x]=coste2pos_restofijas(todas.A*.05,todas.M*1.5+todas.S,todas.f,todas.pos_real,1);
>> desv=abs(pos-todas.pos_real);
>> omega=sum([todas.A*.05 todas.M*1.5+todas.S],2);
>> plot(desv,omega,’.’)

>> buenas=omega<20 | desv<.2;
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv(buenas),omega(buenas))
xi =
5.4148
b =
0.1103

>> [desv_acum,omega_media]=omegadesv_acumuladas(desv(buenas),omega(buenas),1,[0 5 10 20 40 80]);

>> plot(log10(desv_acum),log10(omega_media),’.-‘)

Study of the mean deviations in omega-Deltax plots for C. elegans

I have two algorithms. One minimizes mean squared error in Deltax, and the other one maximizes loglikelihood assuming that the deviations are exponentially distributed (the first one is equivalent to assume that they are normally distributed around the mean. This is clearly not true, so the second one should work better).

I first calibrate the method with a synthetic elegans with exponent 2:

>> load(‘e:\hipertec\optimesto\PresentacionesPapers\PaperMolon\Figuras\elegans_sintetico_pot2_r06_todos.mat’)
>> load datos_celegans_struct
>> pos_cuad=coste2pos_cuad(todas.A*.05,todas.M*1.5+todas.S,todas.f);
>> desv_sint=abs(pos_todos(:,1)-pos_cuad);
>> omega=sum([todas.A*.05 todas.M*1.5+todas.S],2);
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv_sint,omega)
xi =
1.9973
b =
0.3085

>> [xi,b]=omegadesv2bestfittingexponent(desv_sint,omega)
xi =
2.4142
b =
0.2777

The algorithm that assumes exponential distribution of deviations and finds maximum log-likelihood works better, as expected.

Now, with the real data from C. elegans:

>> desv=abs(todas.pos_real-pos_cuad);
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv,omega)
xi =
5.8572
b =
0.1184

>> [xi,b]=omegadesv2bestfittingexponent(desv,omega)
xi =
7.2291
b =
0.1122

Nearer with the exponential one, but around 5.

Now I use the optimal for linear cost:

>> pos_lin=coste2pos_num_ruido(todas.A*.05,todas.M*1.5+todas.S,todas.f,1,0,1,0);
>> desv_lin=abs(todas.pos_real-pos_lin);
>> clf
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv_lin,omega)
xi =
5.2203
b =
0.1324

Still not working…

I remove the three outliers, and try again (both with the optimum for quadratic cost, and for the optimum for linear cost):

>> buenas=omega<20 | desv<.2;
>> sum(~buenas)
ans =
3
>> [xi,b]=omegadesv2bestfittingexponent_exp(desv(buenas),omega(buenas))
xi =
4.7116
b =
0.1235

>> [xi,b]=omegadesv2bestfittingexponent_exp(desv_lin(buenas),omega(buenas))
xi =
4.1978
b =
0.1388

So 4 is the best one.

One possibility is to try assuming other probability distributions, that may describe better the data, and see whether that takes us nearer to 2.