Siegel-Tukey: a Non-parametric test for equality in variability (R code)

Daniel Malter just shared on the R mailing list (link to the thread) his code for performing the Siegel-Tukey (Nonparametric) test for equality in variability.
Excited about the find, I contacted Daniel asking if I could republish his code here, and he kindly replied “yes”.
From here on, I reproduce his note in full.

The R function can be downloaded from here
Corrections and remarks can be added in the comments below, or on the GitHub code page.

* * * *

Hi, I recently ran into the problem that I needed a Siegel-Tukey test for equal variability based on ranks. Maybe there is a package that has it implemented, but I could not find it. So I programmed an R function to do it. The Siegel-Tukey test requires recoding the ranks so that they express variability rather than ascending order. This is essentially what the code further below does. After the rank transformation, a regular Mann-Whitney U test is applied. The “manual” and code are pasted below.
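
As a rough illustration of the rank recoding described above (a sketch, not Daniel's actual implementation): for a sample that is already sorted, rank 1 goes to the smallest value, ranks 2 and 3 to the two largest, ranks 4 and 5 to the next two smallest, and so on, alternating between the two ends. The helper name st_raw_ranks is hypothetical, and ties are not handled here.

```r
# Hypothetical helper (not part of siegel.tukey): raw Siegel-Tukey ranks
# for n values that are already sorted in ascending order.
st_raw_ranks <- function(n) {
  ranks <- numeric(n)
  lo <- 1           # next unranked position at the low end
  hi <- n           # next unranked position at the high end
  from_low <- TRUE  # which end receives the next ranks
  take <- 1         # the very first step ranks a single value
  r <- 1            # next rank to hand out
  while (lo <= hi) {
    for (k in seq_len(take)) {
      if (lo > hi) break
      if (from_low) {
        ranks[lo] <- r
        lo <- lo + 1
      } else {
        ranks[hi] <- r
        hi <- hi - 1
      }
      r <- r + 1
    }
    from_low <- !from_low
    take <- 2       # every later step ranks two values from one end
  }
  ranks
}

st_raw_ranks(7)   # 1 4 5 7 6 3 2
```

Extreme observations end up with low ranks and central observations with high ranks, so a group with a small rank sum is the more spread-out one.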

Description:  Non-parametric Siegel-Tukey test for equality in variability. The null hypothesis is that the variability of x is equal between two groups. A rejection of the null indicates that variability differs between
the two groups.

Usage:

# Loading the function
source("http://www.r-statistics.com/wp-content/uploads/2012/01/source_https.r.txt") # Making sure we can source code from github
source_https("https://raw.github.com/talgalili/R-code-snippets/master/siegel.tukey.r")
# Using the function
siegel.tukey(x,y,id.col=FALSE,adjust.median=FALSE,rnd=8, ...)

Arguments:

x: a vector of data

y: Data of the second group (if id.col=FALSE) or group indicator (if id.col=TRUE). In the latter case, y MUST take 1 or 0 to indicate observations of group 1 and 0, respectively, and x must contain the data for both groups.

id.col: If FALSE (default), then x and y are the data vectors (columns) for group 1 and 0, respectively. If TRUE, then y is the group indicator.
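
The two input layouts carry the same information. The sketch below converts two separate data vectors into the stacked layout (one data vector plus a 0/1 indicator) using base R only. Which label each vector receives inside siegel.tukey is an assumption here; the labelling follows examples 3 and 5 further down, which use the same data in both layouts.

```r
# Two separate data vectors (the layout used when id.col = FALSE)
x <- c(33, 62, 84, 85, 88, 93, 97)
y <- c(4, 16, 48, 51, 66, 98)

# Stacked layout (the one used when id.col = TRUE).
# Assumption: x is labelled 0 and y is labelled 1, as in examples 3 and 5.
values <- c(x, y)
id <- rep(c(0, 1), times = c(length(x), length(y)))

# Splitting by the indicator recovers the original vectors
groups <- split(values, id)
groups
```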

adjust.median: Should between-group differences in medians be leveled before performing the test? In certain cases, the Siegel-Tukey test is susceptible to median differences and may indicate significant differences in variability that, in reality, stem from differences in medians.
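
The leveling can be pictured as centering each group on its own median before ranking. The sketch below is an assumption about what adjust.median=TRUE does, inferred from the output of example 4 (where both medians are reported as 0 after the adjustment); it is not a copy of the function's code.

```r
# Data from example 4, as two separate groups
x <- c(177, 200, 227, 230, 232, 268, 272, 297)
y <- c(47, 105, 126, 142, 158, 172, 197, 220, 225, 230, 262, 270)

# Center each group on its own median so that both medians become 0
x_adj <- x - median(x)   # median(x) is 231
y_adj <- y - median(y)   # median(y) is 184.5

median(x_adj)
median(y_adj)
range(y_adj)
```

After centering, only the spread around the median distinguishes the groups, which is what the rank transformation is meant to pick up.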

rnd: Should the data be rounded and, if so, to which decimal? The default (-1) uses the data as is. Otherwise, rnd must be a non-negative integer. Typically, this option is not needed. However, occasionally, differences in
the precision with which certain functions return values cause the merging of two data frames to fail within the siegel.tukey function. Only then is rounding necessary. This operation should not be performed if it affects
the ranks of observations.
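
A small base-R illustration of the kind of precision problem meant here (a generic floating-point example, not taken from the function itself): two values that print identically can fail an exact match, and rounding both sides to the same number of decimals repairs the match. The last lines show one way to check that rounding leaves the ranks untouched.

```r
a <- 0.1 + 0.2
b <- 0.3

a == b                                             # FALSE: the stored values differ slightly
exact_match   <- match(b, a)                       # NA: exact matching fails
rounded_match <- match(round(b, 8), round(a, 8))   # 1: matching works after rounding

# Check that rounding does not change the ranks of the data
x <- c(1.23456789012, 4.5, 2.75)
ranks_unchanged <- identical(rank(x), rank(round(x, 8)))
ranks_unchanged
```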

...: arguments passed on to the Wilcoxon test. See ?wilcox.test

Value: Among other output, the function returns rank sums for the two groups, the associated Wilcoxon’s W, and the p-value for a Wilcoxon test on tie-adjusted Siegel-Tukey ranks (i.e., it performs and returns a
Siegel-Tukey test). If significant, the group with the smaller rank sum has greater variability.
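
To make the returned quantities concrete, here is a minimal sketch that reproduces them by hand for the data of example 1, using base R only. The raw ranks are typed in directly rather than computed, and the variable names are illustrative; this is not the function's internal code.

```r
# Combined data from example 1, sorted, with group labels (0 = x, 1 = y)
sorted_x <- c(0, 0, 1, 4, 4, 5, 5, 6, 6, 9, 10, 10)
group    <- c(1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 1, 1)

# Raw Siegel-Tukey ranks for 12 sorted values (alternating between the ends)
raw_ranks <- c(1, 4, 5, 8, 9, 12, 11, 10, 7, 6, 3, 2)

# Tie adjustment: tied values share the mean of their raw ranks
adj_ranks <- ave(raw_ranks, sorted_x)

ranks0 <- adj_ranks[group == 0]
ranks1 <- adj_ranks[group == 1]

rank_sums  <- c(group0 = sum(ranks0),  group1 = sum(ranks1))
mean_ranks <- c(group0 = mean(ranks0), group1 = mean(ranks1))

# Wilcoxon rank sum test on the adjusted ranks; exact = FALSE because of ties
st_test <- wilcox.test(ranks0, ranks1, exact = FALSE)
st_test$statistic
st_test$p.value
```

Group 1 has the smaller rank sum here, so it is the group with the greater variability, which agrees with its values (0, 0, 1, 9, 10, 10) being far more spread out than those of group 0.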

References: Sidney Siegel and John Wilder Tukey (1960) “A nonparametric sum of ranks procedure for relative spread in unpaired samples.” Journal of the
American Statistical Association. See also, David J. Sheskin (2004) “Handbook of parametric and nonparametric statistical procedures.” 3rd
edition. Chapman and Hall/CRC. Boca Raton, FL.

Notes: The Siegel-Tukey test has relatively low power and may, under certain conditions, indicate significance due to differences in medians rather than
differences in variabilities (consider using the argument adjust.median).

Output (in this order)

1. Group medians
2. Wilcoxon-test for between-group differences in median (after the median
adjustment if specified)
3. Unique values of x and their tie-adjusted Siegel-Tukey ranks
4. Xs of group 0 and their tie-adjusted Siegel-Tukey ranks
5. Xs of group 1 and their tie-adjusted Siegel-Tukey ranks
6. Siegel-Tukey test (Wilcoxon test on tie-adjusted Siegel-Tukey ranks)

The R code:

Update: The R function was moved to GitHub, and a few mistakes found by sharp-eyed readers of this blog were corrected. The R function can be downloaded from here

Here is an example of its usage, and output:

 
######################
# Loading the functions
######################
 
source("http://www.r-statistics.com/wp-content/uploads/2012/01/source_https.r.txt") # Making sure we can source code from github
source_https("https://raw.github.com/talgalili/R-code-snippets/master/siegel.tukey.r")
 
######################
# Examples:
######################
 
### 1
x=c(4,4,5,5,6,6)
y=c(0,0,1,9,10,10)
siegel.tukey(x,y, F)
siegel.tukey(x,y) #same as above
 
### 2
# example for a non equal number of cases:
x=c(4,4,5,5,6,6)
y=c(0,0,1,9,10)
siegel.tukey(x,y,F)
 
### 3
x <- c(33, 62, 84, 85, 88, 93, 97, 4, 16, 48, 51, 66, 98)
id <- c(0,0,0,0,0,0,0,1,1,1,1,1,1)
siegel.tukey(x,id,T)
siegel.tukey(x~id) # from now on, this also works as a function...
siegel.tukey(x,id,T,adjust.median=F,exact=T)
 
### 4
x<-c(177,200,227,230,232,268,272,297,47,105,126,142,158,172,197,220,225,230,262,270)
id<-c(rep(0,8),rep(1,12))
siegel.tukey(x,id,T,adjust.median=T)
 
 
### 5
x=c(33,62,84,85,88,93,97)
y=c(4,16,48,51,66,98) 
siegel.tukey(x,y)
 
### 6
x<-c(0,0,1,4,4,5,5,6,6,9,10,10)
id<-c(0,0,0,1,1,1,1,1,1,0,0,0)
siegel.tukey(x,id,T)
 
### 7
x <- c(85,106,96, 105, 104, 108, 86)
id<-c(0,0,1,1,1,1,1)
siegel.tukey(x,id,T)

Here is the code’s output:

 
> 
> ### 1
> x=c(4,4,5,5,6,6)
> y=c(0,0,1,9,10,10)
> siegel.tukey(x,y, F)
 
Median of group 1 = 5
Median of group 2 = 5
 
Testing median differences... 
 
        Wilcoxon rank sum test with continuity correction
 
data:  data$x[data$y == 0] and data$x[data$y == 1] 
W = 18, p-value = 1
alternative hypothesis: true location shift is not equal to 0 
 
Performing Siegel-Tukey rank transformation... 
 
   sort.x sort.id unique.ranks
1       0       1          2.5
2       0       1          2.5
3       1       1          5.0
4       4       0          8.5
5       4       0          8.5
6       5       0         11.5
7       5       0         11.5
8       6       0          8.5
9       6       0          8.5
10      9       1          6.0
11     10       1          2.5
12     10       1          2.5
 
Performing Siegel-Tukey test...
 
Mean rank of group 0: 9.5
Mean rank of group 1: 3.5
 
        Wilcoxon rank sum test with continuity correction
 
data:  ranks0 and ranks1 
W = 36, p-value = 0.003601
alternative hypothesis: true location shift is not equal to 0 
 
Warning message:
In wilcox.test.default(data$x[data$y == 0], data$x[data$y == 1]) :
  cannot compute exact p-value with ties
> siegel.tukey(x,y) #same as above
 
Median of group 1 = 4
Median of group 2 = 5
 
Testing median differences... 
 
        Wilcoxon rank sum test with continuity correction
 
data:  data$x[data$y == 0] and data$x[data$y == 1] 
W = 0, p-value = 0.4795
alternative hypothesis: true location shift is not equal to 0 
 
Performing Siegel-Tukey rank transformation... 
 
  sort.x sort.id unique.ranks
1      4       0          2.5
2      4       0          2.5
3      5       1          5.5
4      5       9          5.5
5      6      10          2.5
6      6      10          2.5
 
Performing Siegel-Tukey test...
 
Mean rank of group 0: 2.5
Mean rank of group 1: 5.5
 
        Wilcoxon rank sum test with continuity correction
 
data:  ranks0 and ranks1 
W = 0, p-value = 0.4795
alternative hypothesis: true location shift is not equal to 0 
 
Warning message:
In wilcox.test.default(data$x[data$y == 0], data$x[data$y == 1]) :
  cannot compute exact p-value with ties
> 
> ### 2
> # example for a non equal number of cases:
> x=c(4,4,5,5,6,6)
> y=c(0,0,1,9,10)
> siegel.tukey(x,y,F)
 
Median of group 1 = 5
Median of group 2 = 1
 
Testing median differences... 
 
        Wilcoxon rank sum test with continuity correction
 
data:  data$x[data$y == 0] and data$x[data$y == 1] 
W = 18, p-value = 0.6451
alternative hypothesis: true location shift is not equal to 0 
 
Performing Siegel-Tukey rank transformation... 
 
   sort.x sort.id unique.ranks
1       0       1          2.5
2       0       1          2.5
3       1       1          5.0
4       4       0          8.5
5       4       0          8.5
6       5       0         10.5
7       5       0         10.5
8       6       0          6.5
9       6       0          6.5
10      9       1          3.0
11     10       1          2.0
 
Performing Siegel-Tukey test...
 
Mean rank of group 0: 8.5
Mean rank of group 1: 3
 
        Wilcoxon rank sum test with continuity correction
 
data:  ranks0 and ranks1 
W = 30, p-value = 0.007546
alternative hypothesis: true location shift is not equal to 0 
 
Warning message:
In wilcox.test.default(data$x[data$y == 0], data$x[data$y == 1]) :
  cannot compute exact p-value with ties
> 
> ### 3
> x <- c(33, 62, 84, 85, 88, 93, 97, 4, 16, 48, 51, 66, 98)
> id <- c(0,0,0,0,0,0,0,1,1,1,1,1,1)
> siegel.tukey(x,id,T)
 
Median of group 1 = 85
Median of group 2 = 49.5
 
Testing median differences... 
 
        Wilcoxon rank sum test
 
data:  data$x[data$y == 0] and data$x[data$y == 1] 
W = 31, p-value = 0.1807
alternative hypothesis: true location shift is not equal to 0 
 
Performing Siegel-Tukey rank transformation... 
 
   sort.x sort.id unique.ranks
1       4       1            1
2      16       1            4
3      33       0            5
4      48       1            8
5      51       1            9
6      62       0           12
7      66       1           13
8      84       0           11
9      85       0           10
10     88       0            7
11     93       0            6
12     97       0            3
13     98       1            2
 
Performing Siegel-Tukey test...
 
Mean rank of group 0: 7.714286
Mean rank of group 1: 6.166667
 
        Wilcoxon rank sum test with continuity correction
 
data:  ranks0 and ranks1 
W = 26, p-value = 0.5203
alternative hypothesis: true location shift is not equal to 0 
 
> siegel.tukey(x~id) # from now on, this also works as a function...
 
Median of group 1 = 85
Median of group 2 = 49.5
 
Testing median differences... 
 
        Wilcoxon rank sum test
 
data:  data$x[data$y == 0] and data$x[data$y == 1] 
W = 31, p-value = 0.1807
alternative hypothesis: true location shift is not equal to 0 
 
Performing Siegel-Tukey rank transformation... 
 
   sort.x sort.id unique.ranks
1       4       1            1
2      16       1            4
3      33       0            5
4      48       1            8
5      51       1            9
6      62       0           12
7      66       1           13
8      84       0           11
9      85       0           10
10     88       0            7
11     93       0            6
12     97       0            3
13     98       1            2
 
Performing Siegel-Tukey test...
 
Mean rank of group 0: 7.714286
Mean rank of group 1: 6.166667
 
        Wilcoxon rank sum test with continuity correction
 
data:  ranks0 and ranks1 
W = 26, p-value = 0.5203
alternative hypothesis: true location shift is not equal to 0 
 
> siegel.tukey(x,id,T,adjust.median=F,exact=T)
 
Median of group 1 = 85
Median of group 2 = 49.5
 
Testing median differences... 
 
        Wilcoxon rank sum test
 
data:  data$x[data$y == 0] and data$x[data$y == 1] 
W = 31, p-value = 0.1807
alternative hypothesis: true location shift is not equal to 0 
 
Performing Siegel-Tukey rank transformation... 
 
   sort.x sort.id unique.ranks
1       4       1            1
2      16       1            4
3      33       0            5
4      48       1            8
5      51       1            9
6      62       0           12
7      66       1           13
8      84       0           11
9      85       0           10
10     88       0            7
11     93       0            6
12     97       0            3
13     98       1            2
 
Performing Siegel-Tukey test...
 
Mean rank of group 0: 7.714286
Mean rank of group 1: 6.166667
 
        Wilcoxon rank sum test
 
data:  ranks0 and ranks1 
W = 26, p-value = 0.5338
alternative hypothesis: true location shift is not equal to 0 
 
> 
> ### 4
> x<-c(177,200,227,230,232,268,272,297,47,105,126,142,158,172,197,220,225,230,262,270)
> id<-c(rep(0,8),rep(1,12))
> siegel.tukey(x,id,T,adjust.median=T)
 
Adjusting medians...
 
Median of group 1 = 0
Median of group 2 = 0
 
Testing median differences... 
 
        Wilcoxon rank sum test
 
data:  data$x[data$y == 0] and data$x[data$y == 1] 
W = 52, p-value = 0.7921
alternative hypothesis: true location shift is not equal to 0 
 
Performing Siegel-Tukey rank transformation... 
 
   sort.x sort.id unique.ranks
1  -137.5       1            1
2   -79.5       1            4
3   -58.5       1            5
4   -54.0       0            8
5   -42.5       1            9
6   -31.0       0           12
7   -26.5       1           13
8   -12.5       1           16
9    -4.0       0           17
10   -1.0       0           20
11    1.0       0           19
12   12.5       1           18
13   35.5       1           15
14   37.0       0           14
15   40.5       1           11
16   41.0       0           10
17   45.5       1            7
18   66.0       0            6
19   77.5       1            3
20   85.5       1            2
 
Performing Siegel-Tukey test...
 
Mean rank of group 0: 13.25
Mean rank of group 1: 8.666667
 
        Wilcoxon rank sum test with continuity correction
 
data:  ranks0 and ranks1 
W = 70, p-value = 0.09716
alternative hypothesis: true location shift is not equal to 0 
 
> 
> 
> ### 5
> x=c(33,62,84,85,88,93,97)
> y=c(4,16,48,51,66,98) 
> siegel.tukey(x,y)
Error in data.frame(x, y) : 
  arguments imply differing number of rows: 7, 6
> 
> ### 6
> x<-c(0,0,1,4,4,5,5,6,6,9,10,10)
> id<-c(0,0,0,1,1,1,1,1,1,0,0,0)
> siegel.tukey(x,id,T)
 
Median of group 1 = 5
Median of group 2 = 5
 
Testing median differences... 
 
        Wilcoxon rank sum test with continuity correction
 
data:  data$x[data$y == 0] and data$x[data$y == 1] 
W = 18, p-value = 1
alternative hypothesis: true location shift is not equal to 0 
 
Performing Siegel-Tukey rank transformation... 
 
   sort.x sort.id unique.ranks
1       0       0          2.5
2       0       0          2.5
3       1       0          5.0
4       4       1          8.5
5       4       1          8.5
6       5       1         11.5
7       5       1         11.5
8       6       1          8.5
9       6       1          8.5
10      9       0          6.0
11     10       0          2.5
12     10       0          2.5
 
Performing Siegel-Tukey test...
 
Mean rank of group 0: 3.5
Mean rank of group 1: 9.5
 
        Wilcoxon rank sum test with continuity correction
 
data:  ranks0 and ranks1 
W = 0, p-value = 0.003601
alternative hypothesis: true location shift is not equal to 0 
 
Warning message:
In wilcox.test.default(data$x[data$y == 0], data$x[data$y == 1]) :
  cannot compute exact p-value with ties
> 
> ### 7
> x <- c(85,106,96, 105, 104, 108, 86)
> id<-c(0,0,1,1,1,1,1)
> siegel.tukey(x,id,T)
 
Median of group 1 = 95.5
Median of group 2 = 104
 
Testing median differences... 
 
        Wilcoxon rank sum test
 
data:  data$x[data$y == 0] and data$x[data$y == 1] 
W = 4, p-value = 0.8571
alternative hypothesis: true location shift is not equal to 0 
 
Performing Siegel-Tukey rank transformation... 
 
  sort.x sort.id unique.ranks
1     85       0            1
2     86       1            4
3     96       1            5
4    104       1            7
5    105       1            6
6    106       0            3
7    108       1            2
 
Performing Siegel-Tukey test...
 
Mean rank of group 0: 2
Mean rank of group 1: 4.8
 
        Wilcoxon rank sum test with continuity correction
 
data:  ranks0 and ranks1 
W = 1, p-value = 0.1752
alternative hypothesis: true location shift is not equal to 0
  • opossum

    The code for determining ranks is buggy. Check, e.g.,

    x1 <- c(85, 106)
    x2 <- c(96, 105, 104, 108, 86)
    iv <- rep(1:2, c(length(x1), length(x2)))
    siegel.tukey(c(x1, x2), iv, id.col=TRUE)

    The rank for the middle element (104) should be 7, but it's calculated as 8. If N is the length of the combined data, this works:

    TF <- rep(c(TRUE, FALSE, FALSE, TRUE), ceiling(N/4))
    up <- TF[1:min(N, length(TF))]
    Rup <- rank(X)[up]
    Rdown <- rev(rank(X)[!up])
    Rx <- c(Rup, Rdown)

    • http://www.talgalili.com Tal Galili

      Thank you for catching (and reporting!) this opossum.

      I’ve updated the code with your corrections.

  • melissa

    Hello,

    First of all, thank you for sharing your code.
    I have got some questions/notices concerning it:
    - It seems that the line:
    “print(wilcox.test(data$x[data$y==1],data$x[data$y==y]))” throws an error when working with data that contain decimals.

    (I just removed it and got no more errors with decimals.)

    - The adjustment of the medians does not seem to work.
    Here is an example showing the problem:
    ### adjust.median=F
    x<-c(177,200,227,230,232,268,272,297)
    y<-c(47,105,126,142,158,172,197,220,225,230,262,270)
    siegel.tukey(x,y)
    ## pval : 0.9385

    ### adjust.median=T
    x<-c(177,200,227,230,232,268,272,297)
    y<-c(47,105,126,142,158,172,197,220,225,230,262,270)
    siegel.tukey(x,y,adjust.median=T)
    ## pval : 0.9079

    ### by adjusting the medians beforehand

    x<-c(177,200,227,230,232,268,272,297)
    y<-c(47,105,126,142,158,172,197,220,225,230,262,270)
    medx<-median(x)
    medy<-median(y)
    if (medx>medy){
    x <-x -(medx-medy)
    }
    if (medx<medy){
    y <- y -(medy-medx)
    }
    siegel.tukey(x,y)
    ### pval: 0.09716
    The p-values are not the same (not even close, as you can see).

    Just for your information, these two vectors come from a book (I can give you the reference) and the p-value they found is 0.0976, by doing the median adjustment.

    For the moment, I do not know how to correct it (sorry for that, I will do this adjustment manually before), but I can in the future propose something if you are interested of course.
    By the way I’m just wondering whether this test is meaningful if we don’t adjust the medians… for me no but I may be wrong.

    Best regards,

    Mélissa

  • Daniel Malter

    My original function was suitable only if x and y were of the same length. Thanks for fixing this. However, the function as shown above returns incorrect ranks, as well. Take the example in Sheskin’s book:

    x=c(4,4,5,5,6,6)
    y=c(0,0,1,9,10,10)

    The ranks for

    c(0,0,1,4,4,5,5,6,6,9,10,10)
     
    should be

    1, 4, 5, 8, 9, 12, 11, 10, 7, 6, 3, 2

    so that the adjusted ranks would be

    2.5, 2.5, 5, 8.5, 8.5, 11.5, 11.5, 8.5, 8.5, 6, 2.5, 2.5

    However, currently the function returns:

    unique values of x   tie-adjusted Siegel-Tukey rank
                     0                             3.00
                     1                             4.50
                     4                             8.50
                     5                            11.50
                     6                             8.25
                     9                             6.50
                    10                             2.25

  • Daniel Malter

    The issues have been fixed. Tal will certainly upload the code I sent him soon.

  • Daniel Malter

    Original post of the corrected code: https://stat.ethz.ch/pipermail/r-help/2012-February/304958.html

  • Hurstrd198

    I found that by changing line 18 the code can handle x and y of different lengths.
    print(wilcox.test(data$x[data$y==1],data$x[data$y==y]))
    should be
    print(wilcox.test(data$x[data$y==1],data$x[data$y==2]))