Estimating the error in a Least-squares Numerical Fit

Callicious · #1 · Thread starter, 1 month ago
Problem
I'd like to add a disclaimer that this work isn't technically graded or anything else along those lines and is purely optional/vocational.

I have a set of data at some positions (x, y), the observed distribution of points, which is inherently noisy. Both x and y are observables with an error in observation, and for each point the errors dx and dy are known.

I have a set of theoretical curves that should match the data; call this the model, a function of a variable z, written model(z).

The goal is to find the theoretical curve that matches the data most effectively. Each curve is independent of the others; they're nonlinear and have no closed functional form. The curves are just data themselves: sets of (x, y) points that are not single-valued in either x or y.

I also wish to get an error estimate on how good the fit is, i.e. an error in z.

Progress
My code can already produce an estimate of z, and the estimate matches up well with what might be expected. However, I'm at a loss as to how to calculate the error in z.

The code is an attempt at least-squares fitting. For each value of z, the theoretical curve's point set ("data") is taken. For each of my observed data points r_i, I calculate the perpendicular distance to the theoretical curve, min‖r_i − data‖...

I then get the value of LS, which is given by:
[Attached image: unknown.png — the LS expression]
with min(...) being the geometric distance between data and theory. w_i is a weighting factor unique to point i, chosen to improve the fit based on the geometry, and σ² is the variance of the (unweighted) geometric distances.
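For concreteness, that cost might be sketched roughly as below. Everything here is an assumption about the setup rather than the actual code: the function name `geometric_ls`, the array shapes, and the brute-force nearest-point search standing in for the true perpendicular distance.

```python
import numpy as np

def geometric_ls(data_xy, model_xy, weights):
    """Weighted geometric least-squares cost between observed points and a
    model curve given only as a point set (no functional form).

    data_xy:  (N, 2) observed points
    model_xy: (M, 2) points sampled along the theoretical curve for one z
    weights:  (N,) per-point weights w_i
    """
    # Distance from every data point to every model point; the minimum over
    # the model points approximates the perpendicular distance to the curve
    # (good when the curve is densely sampled).
    diff = data_xy[:, None, :] - model_xy[None, :, :]     # (N, M, 2)
    dists = np.min(np.linalg.norm(diff, axis=2), axis=1)  # (N,)
    sigma2 = np.var(dists)  # variance of the unweighted distances
    return np.sum(weights * dists**2) / sigma2

# The best z is then the one whose curve minimises this cost, e.g.
# z_best = min(z_grid, key=lambda z: geometric_ls(data, model(z), w))
```

Note this blows up if all distances are identical (zero variance), so a small floor on sigma2 may be needed in practice.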

Question
Would anyone here happen to know how I could estimate an error in the variable z? Or maybe an idea of how I could reframe the problem so as to get an error? :-;
mqb2766 · #2 · 1 month ago
(Original post by Callicious)
[...]
Would anyone here happen to know how I could estimate an error in the variable z? Or maybe an idea of how I could solve this problem so as to get an error?
When the noise is additive iid .. on the output, you can get confidence intervals on the parameters (z?) without too much of a problem. When there is noise on the input, it's less of a straight regression scenario. What is the form of the model (how does z influence the output), and how much noise do you have on the input? Are the z's parameters, or some form of smoothing parameters, or ...
Last edited by mqb2766; 1 month ago
Callicious · #3 · Thread starter, 1 month ago
(Original post by mqb2766)
When the noise is additive iid .. on the output, you can get confidence intervals on the parameters (z?) without too much of a problem. When there is noise on the input, it's less of a straight regression scenario. What is the form of the model (how does z influence the output), and how much noise do you have on the input? Are the z's parameters, or some form of smoothing parameters, or ...
It's safe to assume the noise in (x, y) is additive (it lies within a ±, pretty much, or that assumption can safely be made, I suppose).

The model itself is the result of numerical simulations computed for each unique value of z; there isn't a discrete functional form for it.
[Attached image: 9.150_NGC7789_Cross_UBV.png]
Here's an example. The red is the data, the green is something that can be disregarded for this, and the blue is the model.
mqb2766 · #4 · 1 month ago
(Original post by Callicious)
It's safe to assume the noise in (x,y) is additive (it lies within a +- pretty much, or that assumption can be safely taken, I suppose)

The model itself is a result of numerical simulations computed for each unique value of z, there isn't some discrete functional form for it;
[Attached image: 9.150_NGC7789_Cross_UBV.png]
Here's an example. The red is the data, the green is something that can be disregarded for this, and the blue is the model.
I guess it helps to know how the response depends on z for you to talk about an error in its value. If not, you could always do some form of random sampling of z in order to assess the sensitivity and get some idea about the uncertainty.

I don't understand the graph or the form of the model; should the sum index be i or n?
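A sketch of that sensitivity idea, assuming the cost can be evaluated on a grid of z values. The Δcost = 1 rule used here is the usual chi-square heuristic, and it only gives a meaningful interval if the cost behaves approximately like a chi-square near its minimum; treat it as a rough assumption, not a guarantee.

```python
import numpy as np

def z_uncertainty(z_grid, costs):
    """Rough 1-sigma interval on z from a scan of the cost over z_grid.

    Assumes the cost behaves like a chi-square near the minimum, so the
    interval is taken where cost <= min(cost) + 1 (the delta-chi2 = 1
    rule). The interval is only as fine as the grid spacing.
    """
    z_grid = np.asarray(z_grid)
    costs = np.asarray(costs)
    i = np.argmin(costs)
    inside = z_grid[costs <= costs[i] + 1.0]  # all z within +1 of the minimum
    return z_grid[i], inside.min(), inside.max()
```

Scanning a grid like this also shows immediately whether the cost has one clean minimum in z or several, which matters before quoting any single error bar.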
Last edited by mqb2766; 1 month ago
Callicious · #5 · Thread starter, 1 month ago
(Original post by mqb2766)
I guess it helps to know how the response depends on z for you to talk about an error in its value. If not, you could always do some form of random sampling of z in order to assess the sensitivity and get some idea about the uncertainty.

I don't understand the graph or the form of the model; should the sum index be i or n?
The sum index should be i (i being the i-th data point, running from 0 to N; I've indexed it pythonically, sorry about that :c).

The blue line is the model that I'm geometrically fitting to the red data. The green is just some other catalogue that can be ignored.

I'm going to approach the problem differently with a different fitter, etc. (this type of fitting isn't well documented and there isn't much going for it yet), whereas the traditional method has a lot of context and sample code exists for it. Thanks for the insight, though; I'm going to have a go at estimating the sensitivity of the model to changes in z.
mqb2766 · #6 · 1 month ago
(Original post by Callicious)
The sum index should be i (i being the i'th datapoint, from 0 to N. I've done it pythonically, sorry about that :c)

The blue line is the model that I'm geometrically fitting to the red data. The green is just some other catalogue that can be ignored.

I'm going to approach the problem differently with a different fitter, etc. (this type of fitting isn't well documented and there isn't much going for it yet), whereas the traditional method has a lot of context and sample code exists for it. Thanks for the insight, though; I'm going to have a go at estimating the sensitivity of the model to changes in z.
I've no real trouble understanding the cost function, but I'm not really much wiser about the parameter(s) z and the model. As above, I'd probably go down some form of Monte Carlo approach if I could make few assumptions about the model. An OK overview is
http://www-personal.umd.umich.edu/~w...arloHOWTO.html
but this is from a very quick google. It looks a readable intro, but that is obviously subjective and not carefully reviewed.
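One Monte Carlo variant that uses the known per-point errors dx, dy directly might look like the sketch below. The fitter `fit_z`, the Gaussian error model, and the trial count are all assumptions for illustration, not anything from the thread.

```python
import numpy as np

def monte_carlo_z_error(fit_z, x, y, dx, dy, n_trials=500, seed=0):
    """Monte Carlo error on z using the known per-point errors dx, dy.

    fit_z(x, y) is assumed to be an existing fitter returning the
    best-fit z for one data set. Each trial perturbs the data within
    its quoted (assumed Gaussian) errors and refits; the spread of the
    refitted z values is the error estimate.
    """
    rng = np.random.default_rng(seed)
    zs = np.empty(n_trials)
    for k in range(n_trials):
        x_k = x + rng.normal(0.0, dx)  # perturb each x by its own error
        y_k = y + rng.normal(0.0, dy)  # likewise for y
        zs[k] = fit_z(x_k, y_k)
    return zs.mean(), zs.std(ddof=1)
```

This propagates the quoted observational errors through whatever fitter is already in use, at the price of n_trials refits; if the refits are slow, a coarser trial count still gives a usable order-of-magnitude error.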
Last edited by mqb2766; 1 month ago
Callicious · #7 · Thread starter, 1 month ago
(Original post by mqb2766)
I've no real trouble understanding the cost function, but I'm not really much wiser about the parameter(s) z and the model. As above, I'd probably go down some form of Monte Carlo approach if I could make few assumptions about the model. An OK overview is
http://www-personal.umd.umich.edu/~w...arloHOWTO.html
but this is from a very quick google. It looks a readable intro, but that is obviously subjective and not carefully reviewed.
Thanks! O.O

I'll have a crack at using Monte Carlo once I've finished the other approach I've been given to try.