# Estimating the error in a Least-squares Numerical Fit

#1
Problem
I'd like to add a disclaimer that this work isn't technically graded or anything else along those lines and is purely optional/vocational.

I have a set of data at positions (x,y), the observed distribution of points, which is inherently noisy. Both x and y carry an observational error, and for each point the errors dx and dy are known.

I have a set of theoretical curves that should match the data; call this the model, which is a function of a variable z. I'll write it model(z).

The goal is to find the theoretical curve that matches the data best. The curves are independent of one another, nonlinear, and have no functional form: each is just a set of (x,y) points in itself, and they are not single-valued functions in x or y.

I also wish to get an error estimate on how good the fit is, i.e. an estimated error in z.

Progress
My code can already produce an estimate of z, and the estimate is spot on and matches what would be expected. However, I'm at a loss for how to calculate the error in z.

The code is an attempt at least-squares fitting. For each value of z, the theoretical curve's point set ("data") is taken. For each of my observed datapoints r_i, I calculate the perpendicular distance to the theoretical curve, min(r_i - data).

I then get the value of LS, which (reconstructing the lost equation from the description below) is:

LS = (1/sigma^2) * sum_i w_i * [min(r_i - data)]^2

with min(...) being the geometric (perpendicular) distance between datapoint and theory curve. w_i is a weighting factor unique to each point i, picked to improve the fit based on the geometry, and sigma^2 is the variance of the (unweighted) geometric distances.
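A minimal sketch of this cost in Python (the function and variable names are my own, not the poster's; the curve is assumed to be densely sampled, so the nearest sample point stands in for the true perpendicular distance):

```python
def geometric_ls(points, curve, weights, sigma2):
    """Weighted geometric least-squares cost, as described above.

    points : list of (x, y) observed datapoints (the r_i)
    curve  : list of (x, y) points sampled from model(z)
    weights: per-point weights w_i
    sigma2 : variance of the (unweighted) geometric distances
    """
    total = 0.0
    for (x, y), w in zip(points, weights):
        # minimum squared distance to any sampled curve point
        # approximates the squared perpendicular distance
        d2 = min((x - cx) ** 2 + (y - cy) ** 2 for cx, cy in curve)
        total += w * d2
    return total / sigma2
```

The fitted z is then whichever candidate curve minimises this cost.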

Question
Would anyone here happen to know how I could estimate an error in the variable z? Or have an idea of how I could reformulate the problem so as to get an error?
#2
(Original post by Callicious)
When the noise is additive and iid on the output, you can get confidence intervals on the parameters (z?) without too much trouble. When there is noise on the input, it's less of a straightforward regression scenario. What is the form of the model (how does z influence the output), and how much noise is there on the input? Are the z's model parameters, some form of smoothing parameter, or something else?
#3
(Original post by mqb2766)
It's safe to assume the noise in (x,y) is additive (it lies within a known +- band, or at least that assumption can safely be made).

The model itself is the result of numerical simulations computed for each unique value of z; there isn't a closed functional form for it.

Here's an example. The red is the data, the green is something that can be disregarded for this, and the blue is the model.
#4
(Original post by Callicious)
I guess it helps to know how the response depends on z in order to talk about an error in its value. If not, you could always do some form of random sampling of z to assess the sensitivity and get some idea of the uncertainty.

I don't understand the graph or the form of the model; should the sum index be i or n?
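One concrete way to turn the "sample z and look at the sensitivity" idea into a number, sketched in Python (names are made up; this assumes the cost behaves like a chi-square near its minimum, so the delta=1 one-sigma rule is only valid under that assumption):

```python
def z_uncertainty(z_grid, ls_values, delta=1.0):
    """Crude uncertainty on z from a sampled cost curve.

    z_grid    : sampled z values, in increasing order
    ls_values : LS cost evaluated at each z in z_grid
    delta     : rise above the minimum defining the confidence
                region (delta=1 is the classic one-parameter
                one-sigma rule for a chi-square-like cost)

    Returns (z_best, z_err), where z_err is half the width of the
    region where ls <= min(ls) + delta.
    """
    ls_min = min(ls_values)
    z_best = z_grid[ls_values.index(ls_min)]
    inside = [z for z, ls in zip(z_grid, ls_values) if ls <= ls_min + delta]
    z_err = (max(inside) - min(inside)) / 2.0
    return z_best, z_err
```

For a well-behaved cost this reproduces the textbook result, e.g. a parabolic cost (z - 2)^2 gives z_best = 2 with z_err = 1.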
#5
(Original post by mqb2766)
The sum index should be i (i being the i'th datapoint, running from 0 to N; I've indexed it pythonically, sorry about that).

The blue line is the model that I'm geometrically fitting to the red data. The green is just some other catalogue that can be ignored.

I'm going to approach the problem differently with a different fitter (this type of fitting isn't well documented and there isn't much out there for it yet), whereas the traditional method has a lot of context and sample code available. Thanks for the insight though; I'm going to have a go at estimating the sensitivity of the model to changes in z.
#6
(Original post by Callicious)
I've no real trouble understanding the cost function, but I'm not much wiser about the parameter(s) z or the model. As above, I'd probably go down some form of Monte Carlo approach if I could make few assumptions about the model. An OK overview is
http://www-personal.umd.umich.edu/~w...arloHOWTO.html
but this came from a very quick Google search. It looks like a readable intro, though that is obviously subjective and I haven't carefully reviewed it.
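The Monte Carlo idea, sketched for this setting (a hypothetical illustration: fit_z stands in for the poster's actual fitter, and the toy model at the bottom exists only to make the example runnable):

```python
import math
import random

def monte_carlo_z_error(points, errors, fit_z, n_trials=500, seed=0):
    """Monte Carlo error on z: perturb each datapoint within its
    known observational errors (dx, dy), refit z on each perturbed
    dataset, and take the spread of the refitted values.

    points : list of (x, y) observed datapoints
    errors : list of (dx, dy) one-sigma errors per point
    fit_z  : callable mapping a list of (x, y) to a fitted z
    """
    rng = random.Random(seed)
    zs = []
    for _ in range(n_trials):
        perturbed = [(x + rng.gauss(0, dx), y + rng.gauss(0, dy))
                     for (x, y), (dx, dy) in zip(points, errors)]
        zs.append(fit_z(perturbed))
    mean = sum(zs) / len(zs)
    var = sum((z - mean) ** 2 for z in zs) / (len(zs) - 1)
    return math.sqrt(var)  # sample std of the refitted z values

# Toy stand-in fitter: pretend the model is a flat line y = z,
# so "fitting z" is just averaging the y values.
def toy_fit_z(pts):
    return sum(y for _, y in pts) / len(pts)
```

With 100 points of y-error 1, the toy fitter's z should scatter by about 1/sqrt(100) = 0.1, which is a quick sanity check on the machinery.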
#7
(Original post by mqb2766)
Thanks! O.O

I'll have a crack at using Monte-Carlo once I've finished up the other approach I've been given to try.