# Quick question on the chain rule

#1
Hello. I've basically finished C2 and have started C3. I am starting with algebra/functions and differentiation. Would anybody know an explanation of why you can just multiply f'(g(x)) and g'(x) to get d[f(g(x))]/d[x]?

I have read an explanation as follows. To understand the chain rule, think about the definition of the derivative as a rate of change. d[f(g(x))]/d[x] basically means the rate of change of f(g(x)) with respect to x, and to calculate this we need to know two values:

1- How much f(g(x)) changes while g(x) changes = d[f(g(x))]/d[g(x)]
2- How much g(x) changes while x changes = d[g(x)]/d[x]
To calculate the rate of change of f(g(x)) with respect to x, you just need to multiply these two values together, because x changes g(x) and g(x) changes f(g(x)) (it should be obvious thinking about the definition of a function in mathematics).
Please could anybody elaborate on this and explain why?
Thanks

0
7 years ago
#2
https://proofwiki.org/wiki/Chain_Rul...lued_Functions

0
#3
(Original post by Wahrheit)
Unfortunately not, sorry. I was really looking for an understanding of why the definition of a function would help explain why you can just multiply (or some reason other than the fraction-cancelling one). I was looking for something a bit intuitive, or something that makes perfect sense. For example, if I were asking why nx^(n-1) worked, I would need somebody to explain concavity/first principles. Thank you anyway.
0
7 years ago
#4
Essentially, all the chain rule is saying is dy/dx = dy/du . du/dx, so just think of it in terms of that.
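
A rough sketch of what sits behind this picture, using finite increments. The symbols Δx, Δu and Δy are introduced here for illustration only: write u = g(x) and y = f(u), and let Δu and Δy be the changes in u and y caused by a small change Δx.

```latex
\[
\frac{\Delta y}{\Delta x}
  = \frac{\Delta y}{\Delta u}\cdot\frac{\Delta u}{\Delta x}
  \quad (\Delta u \neq 0),
\qquad\text{so}\qquad
\frac{dy}{dx}
  = \lim_{\Delta x \to 0}\frac{\Delta y}{\Delta x}
  = \frac{dy}{du}\cdot\frac{du}{dx}.
\]
```

The first identity really is fraction cancellation, because the increments are ordinary numbers. The limit step relies on Δu shrinking to 0 as Δx does (true because a differentiable g is continuous), and a complete proof also has to deal with the case Δu = 0, which this sketch skips.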
0
7 years ago
#5
(Original post by MathMeister)
Unfortunately not, sorry. I was really looking for an understanding of why the definition of a function would help explain why you can just multiply (or some reason other than the fraction-cancelling one). I was looking for something a bit intuitive, or something that makes perfect sense. For example, if I were asking why nx^(n-1) worked, I would need somebody to explain concavity/first principles. Thank you anyway.
When I was at the same stage in my mathematics, all I wanted to know was how to apply the chain rule correctly, not why it works.

The formal proofs for these require mathematical analysis, which is a branch of mathematics normally taught to first-year maths undergraduates ...
1
7 years ago
#6
(Original post by TeeEm)
When I was at the same stage in my mathematics, all I wanted to know was how to apply the chain rule correctly, not why it works.

The formal proofs for these require mathematical analysis, which is a branch of mathematics normally taught to first-year maths undergraduates ...
Yeah, sorry, I couldn't resist the opportunity to be mischievous by answering his question in the best yet least helpful way possible.
0
7 years ago
#7
(Original post by Wahrheit)
Yeah, sorry, I couldn't resist the opportunity to be mischievous by answering his question in the best yet least helpful way possible.
It is all good ...
0
7 years ago
#8
OK, so think of two functions like y = 5x and z = cos(y). You have to differentiate z = cos(y) with respect to x, where y = 5x, i.e. find d(cos(5x))/dx.

dz/dx = dz/dy . dy/dx, just by fraction cancellation, and you know how to work out dz/dy and dy/dx, so just multiply them and you're sorted!
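
As a quick sanity check of this example, here is a minimal Python sketch that multiplies dz/dy and dy/dx for z = cos(y), y = 5x and compares the product with a numerical slope estimate. The function names, the test point `x0` and the step `h` are all assumptions made for illustration.

```python
import math

def dz_dy(y):
    # z = cos(y), so dz/dy = -sin(y)
    return -math.sin(y)

def dy_dx(x):
    # y = 5x, so dy/dx = 5 for every x
    return 5.0

def chain_rule_derivative(x):
    # dz/dx = dz/dy (evaluated at y = 5x) multiplied by dy/dx
    return dz_dy(5 * x) * dy_dx(x)

def central_difference(func, x, h=1e-6):
    # Numerical estimate of the slope of func at x
    return (func(x + h) - func(x - h)) / (2 * h)

x0 = 0.3  # arbitrary test point, chosen only for illustration
exact = chain_rule_derivative(x0)
approx = central_difference(lambda x: math.cos(5 * x), x0)
print(exact, approx)
```

The two printed values should agree to several decimal places, which is what the rule d(cos(5x))/dx = -5 sin(5x) predicts.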
0
#9
(Original post by TeeEm)
The formal proofs for these require mathematical analysis, which is a branch of mathematics normally taught to first-year maths undergraduates ...
Is there nothing intuitive or simple? I've thought about it a lot and understand that g(x) maps x onto y (the range), and I suppose this forms the graph of g(x). Then these values get mapped onto the same x, and the y is given by f(g(x)). I'm not sure if my thoughts are even useful for understanding, and I've not really gotten very far. :/
1
7 years ago
#10
(Original post by MathMeister)
Is there nothing intuitive or simple? I've thought about it a lot and understand that g(x) maps x onto y (the range), and I suppose this forms the graph of g(x). Then these values get mapped onto the same x, and the y is given by f(g(x)). I'm not sure if my thoughts are even useful for understanding, and I've not really gotten very far. :/
My post above yours explains it pretty simply. It's all about cancellation. Very intuitive.
0
7 years ago
#11
(Original post by MathMeister)
Is there nothing intuitive or simple? I've thought about it a lot and understand that g(x) maps x onto y (the range), and I suppose this forms the graph of g(x). Then these values get mapped onto the same x, and the y is given by f(g(x)). I'm not sure if my thoughts are even useful for understanding, and I've not really gotten very far. :/
Also, f(g(x)) has a different range from g(x). If you are thinking of y as g(x), think of f(g(x)) as z.
0
7 years ago
#12
(Original post by MathMeister)
Is there nothing intuitive or simple? I've thought about it a lot and understand that g(x) maps x onto y (the range), and I suppose this forms the graph of g(x). Then these values get mapped onto the same x, and the y is given by f(g(x)). I'm not sure if my thoughts are even useful for understanding, and I've not really gotten very far. :/
To understand this stuff properly and formally, you will need a lot more than A level maths.
In my opinion, anybody who claims "it is obvious..." is either the next recipient of the Fields Medal or just shows total contempt for the majority of people trying to make sense of this. I learned the formal proof of the chain rule over 25 years ago, and I am not embarrassed to say I cannot replicate the proof without looking it up.
Learn and practise the technique at this stage, and if you still wonder about the why a year from now, consider a Maths degree ...
0
#13
(Original post by Wahrheit)
My post above yours explains it pretty simply. It's all about cancellation. Very intuitive.
Isn't there anything else? Explaining with cancellation is like explaining why the differential of x^n is nx^(n-1) without teaching first principles and concavity. Cancellation isn't actually what is happening here though :/
0
7 years ago
#14
(Original post by MathMeister)
...
http://kruel.co/math/chainrule.pdf

The first link in Google if you search for chain rule proof.
0
7 years ago
#15
(Original post by MathMeister)
Isn't there anything else? Explaining with cancellation is like explaining why the differential of x^n is nx^(n-1) without teaching first principles and concavity. Cancellation isn't actually what is happening here though :/
Well, it kind of is. Sometimes thinking about everything graphically doesn't actually help, especially in higher dimensions.
0
7 years ago
#16
(Original post by MathMeister)
Isn't there anything else? Explaining with cancellation is like explaining why the differential of x^n is nx^(n-1) without teaching first principles and concavity. Cancellation isn't actually what is happening here though :/
"Concavity" has nothing to do with explaining the differential of x^n - that just follows from the limit definition of the derivative and a straightforward application of the binomial theorem (assuming you're using the usual convention of n being a positive integer; things are more complicated otherwise!).

No offence, but you seem to be constantly looking for "intuitive" reasons why everything is true. There is nothing wrong with finding intuitive arguments when they work, but intuition can often lead you astray (especially where limits are concerned), and an awful lot of higher mathematics is about taking a purely abstract definition and seeing how far you can work with that definition.

MrM has given you a great link to a pdf which gives a reasonably concise demonstration of why the chain rule is true. Read it, file it and forget about it - just make sure you can apply the chain rule in practice
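
For reference, the binomial-theorem argument for x^n mentioned above can be sketched as follows, assuming n is a positive integer; the increment h is a symbol introduced here for illustration.

```latex
\[
\frac{(x+h)^n - x^n}{h}
  = \frac{1}{h}\left(\sum_{k=0}^{n}\binom{n}{k}x^{n-k}h^{k} - x^n\right)
  = n x^{n-1} + \binom{n}{2}x^{n-2}h + \cdots + h^{n-1}
  \;\longrightarrow\; n x^{n-1}
  \quad\text{as } h \to 0.
\]
```

Every term after the first still carries at least one factor of h, so those terms vanish in the limit and only nx^(n-1) survives.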
0
7 years ago
#17
(Original post by MathMeister)
Hello. I've basically finished C2 and have started C3. I am starting with algebra/functions and differentiation. Would anybody know an explanation of why you can just multiply f'(g(x)) and g'(x) to get d[f(g(x))]/d[x]?
Suppose $h(x) = f(g(x))$.

The derivative of $f$ is represented by the gradient of the tangent to its graph at some point on the x-axis, say $x = a$. Let's suppose that at $x = a$, the tangent has gradient 6.

This means that if we move a tiny distance of 1 (small) unit along the x-axis from $x = a$, we will move 6 (small) units up the y-axis along this tangent (think of rise = run * gradient - here the run is 1 unit, the rise is 6 units, on the tangent).

If we move a tiny distance of 2 (small) units along the x-axis from $x = a$, we will move 6*2 = 12 (small) units up the y-axis along this tangent.

If we move a tiny distance of 3 (small) units along the x-axis from $x = a$, we will move 6*3 = 18 (small) units up the y-axis along this tangent.

Now consider $f(g(x))$ again. Here the increase in the value of $f$'s argument is what comes out of $g$. Suppose that now we think of the graph of $g$ and its tangent at the point $x = b$, and let's say this tangent has gradient 3. Let's also say that $g(b) = a$.

Now when we move a tiny distance of 1 (small) unit along the x-axis from $x = b$ on the graph of $g$, we will move 3 (small) units up the y-axis along the tangent to $g$ at the point $x = b$.

But of course we have a composite function: when we change the value of $g$ by 3 (small) units, we feed that into $f$, and consequently, by the argument I gave above, we will now move up the y-axis on the graph of $f$ by 6 * 3 = 18 units, by moving 1 unit along the x-axis on the graph of $g$.

Note that since $g(b) = a$, the increase in value of the argument of $f$ is happening very close to $a$, and we know that at that point $f$ has gradient 6.

This merely says that if $h(x) = f(g(x))$, then $h'(b) = f'(g(b)) \, g'(b) = 6 \times 3 = 18$, as you already know.

Why did I insist on only moving by small units along the x-axis? That's because if you move too much, then you begin to move away from the point at which you drew the tangent, and you'll be working at a point with a different tangent. In fact, to make this argument work properly, you have to imagine that the 1 small unit is in fact infinitesimally small (or alternatively, and from a more modern point of view, that the result becomes more and more accurate the smaller you make your 1 small unit).

The whole explanation stems from the fact that for a straight line (like a tangent), rise = run * gradient, but in this case:

1. "rise" is "rise of $f(g(x))$"

2. "gradient" is the slope of the tangent to $f$ at $g(b)$

3. "run" is the output from $g$ - which in fact is the rise of the tangent to $g$ at $x = b$ when we move 1 small unit along the x-axis of its graph, and hence "run = 1 * gradient of $g$ = gradient of $g$".

Thus, in fact, we have, by substituting a bit:

$$\frac{d}{dx} f(g(x)) = f'(g(x)) \, g'(x)$$

which is the result that we wanted.
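
To see the "smaller units give a more accurate result" point numerically, here is a minimal Python sketch. It assumes two concrete functions that are not in the post and are chosen only so the gradients match the numbers used above: an inner function g(x) = 3x (gradient 3 everywhere) and an outer function f(u) = u^2 (gradient 6 at u = 3), evaluated at x = 1, where g(1) = 3.

```python
def g(x):
    # Inner function (hypothetical choice): gradient 3 everywhere
    return 3.0 * x

def f(u):
    # Outer function (hypothetical choice): gradient 2u, which is 6 at u = 3
    return u ** 2

def composite_slope(x, step):
    # Rise of f(g(x)) divided by the run `step` along the x-axis
    return (f(g(x + step)) - f(g(x))) / step

x0 = 1.0  # g(x0) = 3, the point where f has gradient 6
slopes = [(step, composite_slope(x0, step)) for step in (1.0, 0.1, 0.01, 0.001)]
for step, slope in slopes:
    print(step, slope)
```

For these particular functions the slope works out to 18 + 9 * step, so a large step overshoots badly while smaller steps close in on 6 * 3 = 18.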
0
7 years ago
#18
use small increments in each function

0
#19
(Original post by atsruser)
...
Thank you! It makes perfect sense now!
0
7 years ago
#20
(Original post by MathMeister)
Thank you! It makes perfect sense now!
My pleasure. I added a bit more explanation at the end to try to make it transparently clear.
0