x Turn on thread page Beta
 You are Here: Home >< Maths

optimizing, in real life. watch

1. i have a project to do, wondering how people would approach it/whether there's some magic mathematical technique i don't know about.

I have a load of data. several thousand pieces. let's say for simplicity that each piece of data has three numerical properties describing it: A, B, C.

i want to find a set of a specific size N (of reasonably size) such that the sums or averages of A, B, C are in some way 'close' to some values I input, and also I get 'close' to a specified standard deviation for the set of one of the values, say A.

any ideas?

I had two, not sure how well they will work...

i) build the set up from a small N, at each step adding an extra piece of data to the set that doesn't pull us away from our target values.

ii) start with a random set of size N, switch elements in and out. if they result in an improvement, keep the switch. if not, move on.

neither is very clever, and both have obvious flaws..
2. If you know nothing about the data, simulated annealing is probably one of the better ways of getting an approximate solution. (It's basically a slightly more advanced version of (ii)).

If the data is reasonably uniformly spread, and N is 'small' relative to the size of the data, then you might be able to 'cheat'. For example, choose A1, A2 so that if you have N/2 items equalling A1 and N/2 items equalling A2, you get the desired standard deviation. Then pick N/2 items close to {A1, B, C} and N/2 close to {A2, B, C}.
3. OK, did a little experiment.

5000 random vectors (a, b, c) with a, b, c all uniformly distributed on [0,1], looking for a subset of size 50 satisfying and s.d. of a = 0.2.

Using simulated annealing with 10000000 iterations, I can find a subset which meets those constraints to around 1 part in 10000 in all 4 conditions. It takes under a second.
4. (Original post by DFranklin)
OK, did a little experiment.

5000 random vectors (a, b, c) with a, b, c all uniformly distributed on [0,1], looking for a subset of size 50 satisfying and s.d. of a = 0.2.

Using simulated annealing with 10000000 iterations, I can find a subset which meets those constraints to around 1 part in 10000 in all 4 conditions. It takes under a second.
under a second!???? given that my code I've written so far, which just 'cleans' the data (there are only a few thousand pieces out of tens of thousands of data points that i'm interested in) takes a good 15 minutes (the data is in excel, and is 200 cells deep...), i'm expecting my program to take hours. hmm. also because i am a very lazy programmer.

can i ask, using the terms that are in the wikipedia article, what you used as your measure of the 'energy'?

also, did you use the 'temperature' funtion given on wikipedia?
5. (Original post by Chewwy)
under a second!???? given that my code I've written so far, which just 'cleans' the data (there are only a few thousand pieces out of tens of thousands of data points that i'm interested in) takes a good 15 minutes (the data is in excel, and is 200 cells deep...), i'm expecting my program to take hours. hmm. also because i am a very lazy programmer.
Yeah well, I think we can say that Excel is not going to be competitive with C++ for this task.

can i ask, using the terms that are in the wikipedia article, what you used as your measure of the 'energy'?

also, did you use the 'temperature' funtion given on wikipedia?
What I did was 'dirtier' than that (though as far as I know, it works out roughly the same in the end).

My "scoring" function is just the sum of the errors for and the standard deviation.

I accept a change if "new_score < old_score + epsilon", where epsilon is 10 * exp(-Iter * .000002), where Iter is the number of iterations (trials) so far.

Which is somewhat "trial and error", but to be honest the exact numbers probably don't matter too much - the process seems to converge quite nicely.
6. (Original post by DFranklin)
Yeah well, I think we can say that Excel is not going to be competitive with C++ for this task.

What I did was 'dirtier' than that (though as far as I know, it works out roughly the same in the end).

My "scoring" function is just the sum of the errors for and the standard deviation.

I accept a change if "new_score < old_score + epsilon", where epsilon is 10 * exp(-Iter * .000002), where Iter is the number of iterations (trials) so far.

Which is somewhat "trial and error", but to be honest the exact numbers probably don't matter too much - the process seems to converge quite nicely.
cool. so, to further prod you on the intricacies of your code... what was your transition step thing? i.e. how did it decide which element out of the 50 to throw out each time, and which to switch it with? I was thinking about, say, switching the Mn (modM)th element on the nth iteration, where M is coprime to N, something like that. But my M is variable, so that means i need to write some 'find a coprime' code. oh, woe is me.

edit: maybe you just used M=1? oh yeah. for complicated reasons that could cost my employers billions of dollars, i can't do that.
7. I just pick a random element out of the 50, and replace it with a random element of the 5000.
(I also keep track of which elements I've already got, so as to make sure I don't end up with two elements the same).
8. http://www.amazon.co.uk/Programming-.../dp/0596529325
This book has a few very nice chapters on various optimisation techniques (simulated annealing, genetic algorithms etc), and uses python, which is an awesome language.

TSR Support Team

We have a brilliant team of more than 60 Support Team members looking after discussions on The Student Room, helping to make it a fun, safe and useful place to hang out.

This forum is supported by:
Updated: July 15, 2009
Today on TSR

Unconditional offer...

But going to get a U!

Poll
Useful resources

Maths Forum posting guidelines

Not sure where to post? Read the updated guidelines here

How to use LaTex

Writing equations the easy way

Study habits of A* students

Top tips from students who have already aced their exams

Chat with other maths applicants