The Student Room Group

KS plot

What's the intuition behind the uniform ticks? Why do they set $b_k=\frac{k-0.5}{n}$?
[Attachment: Image 12-06-2023 at 13.44.JPG]
Reply 1

Without understanding the isi... data too much, they've rescaled and ordered the z, so they're assuming the values are drawn from a uniform distribution. To visualize this, you can ask how different they are from an evenly sampled unit interval, in other words the b's. If the z samples are roughly evenly spaced then they'll lie on the 45 degree line. If not, they won't.
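A minimal sketch of that comparison, assuming the rescaled values z already lie in [0, 1]. The function names and the sample values are made up for illustration and are not from the posted figure:

```python
def uniform_ticks(n):
    """Plotting positions b_k = (k - 0.5) / n for k = 1..n (midpoints of n equal bins)."""
    return [(k - 0.5) / n for k in range(1, n + 1)]


def ks_plot_points(z):
    """Pair each tick b_k with the k-th smallest z, ready to plot against the 45 degree line."""
    z_sorted = sorted(z)
    return list(zip(uniform_ticks(len(z_sorted)), z_sorted))


# Hypothetical rescaled values, not taken from the thread's data.
z = [0.62, 0.08, 0.35, 0.91, 0.47]
points = ks_plot_points(z)

# Largest vertical distance between the points and the 45 degree line.
max_gap = max(abs(zk - bk) for bk, zk in points)
```

The ticks sit at the midpoints of n equal-width bins, so an evenly spread sample gives a small `max_gap` and a clumped one gives a large one.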
Reply 2
In their diagram, would the y-axis be empirical cdf and the x-axis be the sample?
Reply 3
Original post by Student 999
In their diagram, would the y-axis be empirical cdf and the x-axis be the theoretical cdf?

In a sense the diagonal line is the theoretical (expected) cdf of a uniform distribution, and the plot is similar to an empirical cdf
https://en.wikipedia.org/wiki/Empirical_distribution_function
using points instead of a staircase, with 95% confidence limits either side of the diagonal.

The x-axis is the data or sample values and the y-axis represents their expected value. So it's basically the usual cdf plot with the random variable values on the x-axis.
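As a rough sketch of how such confidence limits can be drawn: a common large-sample approximation puts the 95% band at about 1.36/sqrt(n) either side of the diagonal. Whether the posted plot uses exactly this constant is an assumption, and the function names here are hypothetical:

```python
import math


def ks_band_halfwidth(n, coeff=1.36):
    """Approximate 95% band half-width for n samples (large-sample KS approximation)."""
    return coeff / math.sqrt(n)


def outside_band(z, coeff=1.36):
    """Return the (b_k, z_k) points lying outside diagonal +/- half-width."""
    n = len(z)
    half = ks_band_halfwidth(n, coeff)
    flagged = []
    for k, zk in enumerate(sorted(z), start=1):
        bk = (k - 0.5) / n  # uniform tick for the k-th ordered value
        if abs(zk - bk) > half:
            flagged.append((bk, zk))
    return flagged
```

If `outside_band` returns an empty list, every point sits inside the band, which is the visual version of not rejecting uniformity at roughly the 5% level.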
Reply 4
Thanks, I'm getting confused with some plots. Some KS test plots show the theoretical cdf against the empirical cdf. How does that work? Suppose you have 10 data points 0.1, 0.11, ..., 0.2 and you want to test whether they're uniformly distributed. For the first data point, its empirical cdf would just be 1/10 or (1-0.5)/10, depending on how you define the empirical cdf. Would the theoretical cdf then be F(0.1), where F is the cdf of the uniform distribution?
Reply 5
That's basically what the posted plot is doing, as far as I understand you. The cdf of a uniform is a linear ramp, so the 45 degree diagonal, which is why the b's are uniformly spaced on [0,1]: they represent the (cumulative probability of the) expected value of the kth ordered sample.

For your example, so a uniform on [0.1,0.2] with, say, 10 samples, their expected values would be roughly 0.105, 0.115, ..., 0.195,
so the (scaled) b's in your plot.
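The arithmetic for that example can be checked with a short sketch. The helper names are hypothetical; the interval [0.1, 0.2] and n = 10 come from the question:

```python
def uniform_cdf(x, lo, hi):
    """cdf of a uniform distribution on [lo, hi]: a linear ramp from 0 to 1."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def scaled_ticks(n, lo, hi):
    """Ticks b_k = (k - 0.5) / n mapped from [0, 1] onto [lo, hi]."""
    return [lo + (hi - lo) * (k - 0.5) / n for k in range(1, n + 1)]


ticks = scaled_ticks(10, 0.1, 0.2)  # approximately 0.105, 0.115, ..., 0.195

# Pushing the scaled ticks back through the cdf recovers b_k = 0.05, 0.15, ..., 0.95.
recovered = [uniform_cdf(t, 0.1, 0.2) for t in ticks]
```

Strictly, the mean of the kth ordered value of n uniform draws on [0,1] is k/(n+1); (k-0.5)/n is a commonly used plotting position that is close to it.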
Reply 6
I'm still not really following, so I'll attach an example. Could you help clear up my confusion? Thanks.
[Attachment: Screenshot 2023-06-19 at 01.06.57.png]
The rescaled ISI in this case should theoretically follow an exponential distribution with unit parameter, as shown in figure 15A.
The first thing I'm confused about: the y-axis should be labelled 'empirical cdf' rather than 'cdf', shouldn't it?
How is 15B correct? The cdf of exp(1) is monotone increasing, so at first sight it makes no sense that y=0 for x=0 to 0.2.
Reply 7
Original post by Student 999
I'm still not really following, so I'll attach an example. Could you help clear up my confusion? Thanks.

The rescaled ISI in this case should theoretically follow an exponential distribution with unit parameter, as shown in figure 15A.
The first thing I'm confused about: the y-axis should be labelled 'empirical cdf' rather than 'cdf', shouldn't it?
What is figure 15B? What exactly is the 'model cdf'? Is it just the exponential cdf plotted against the empirical cdf derived in figure 15A?

There are two cdfs plotted in 15A, the model exponential (red dashed) and the empirical (blue). Calling the y-axis the cdf seems fine, as it's the cdf for either distribution. For a real empirical cdf I'd expect something like the graph in
https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test
The samples / empirical cdf look too good, but that's irrelevant for what the plot represents.

15B seems similar, though it's more unusual to plot it that way. I'm presuming they effectively sampled both curves in 15A, producing sets of points (x,ym) and (x,ye) for the model and empirical cdfs respectively, and then plotted (ye,ym) as a scatter/staircase relationship. You'd expect the empirical and model cdfs to be positively correlated, hence the 45 degree line.

To my mind, 15B is the "same" as the graph in the OP (ignoring the slightly different axis labelling, the scatter/staircase difference and the confidence interval lines). The random variable values are transformed so that they represent the cdf. Doing a nonlinear scaling (or, similarly, a "scatter plot" of cdf values) does make it easier to compare cumulative distributions which have some form of exponential scaling, which I guess is the purpose here.
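A sketch of that presumed construction, assuming the model is Exp(1) with cdf 1 - exp(-x) and the empirical cdf at the kth ordered value is taken as (k - 0.5)/n. This is a guess at how 15B was built, not something the figure confirms, and the function name is hypothetical:

```python
import math


def model_vs_empirical(taus):
    """For rescaled ISIs assumed Exp(1), return (empirical cdf, model cdf) pairs."""
    n = len(taus)
    pairs = []
    for k, tau in enumerate(sorted(taus), start=1):
        empirical = (k - 0.5) / n      # ye: empirical cdf at the k-th ordered ISI
        model = 1.0 - math.exp(-tau)   # ym: Exp(1) cdf evaluated at the same ISI
        pairs.append((empirical, model))
    return pairs


# Sanity check with ISIs placed exactly at the Exp(1) quantiles of the ticks:
# every pair then lands on the 45 degree line.
n = 10
exact_quantiles = [-math.log(1.0 - (k - 0.5) / n) for k in range(1, n + 1)]
on_diagonal = model_vs_empirical(exact_quantiles)
```

Real ISIs will scatter around the diagonal rather than sit on it. Applying the model cdf to each ISI turns the exponential test into the same uniform comparison as the plot in the OP.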
