Siegel-Tukey: a Non-parametric test for equality in variability (R code)

Daniel Malter just shared on the R mailing list (link to the thread) his code for performing the Siegel-Tukey (Nonparametric) test for equality in variability.
Excited about the find, I contacted Daniel asking if I could republish his code here, and he kindly replied “yes”.
From here on I copy his note in full.

p.s: (The R function can be downloaded from here)

* * * *

Hi, I recently ran into the problem that I needed a Siegel-Tukey test for equal variability based on ranks. Maybe there is a package that has it implemented, but I could not find it. So I programmed an R function to do it. The Siegel-Tukey test requires recoding the ranks so that they express variability rather than ascending order: the lowest ranks go to the most extreme values at either end of the sorted data, and the highest ranks go to the values in the middle. This is essentially what the code further below does. After the rank transformation, a regular Mann-Whitney U test is applied. The “manual” and code are pasted below.
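As a rough illustration of the rank recoding, here is a minimal sketch that is separate from Daniel's function. The helper name st_ranks is hypothetical. It assumes the common Siegel-Tukey convention: rank 1 goes to the smallest value, ranks 2 and 3 to the two largest, ranks 4 and 5 to the next two smallest, and so on. Ties are not handled here.

```r
# Hypothetical helper: Siegel-Tukey ranks for n values sorted in ascending order.
# Ranks are handed out alternately from the two ends of the sorted data:
# one value from the low end first, then two from the high end, two from the
# low end, and so on until every position has a rank.
st_ranks <- function(n) {
  ranks <- numeric(n)
  lo <- 1L          # next unranked position at the low end
  hi <- n           # next unranked position at the high end
  from_low <- TRUE  # which end receives the next rank
  batch <- 1L       # ranks left before switching ends (1 at first, then 2)
  for (r in seq_len(n)) {
    if (from_low) {
      ranks[lo] <- r
      lo <- lo + 1L
    } else {
      ranks[hi] <- r
      hi <- hi - 1L
    }
    batch <- batch - 1L
    if (batch == 0L) {
      from_low <- !from_low
      batch <- 2L
    }
  }
  ranks
}

st_ranks(12)
# 1 4 5 8 9 12 11 10 7 6 3 2
```

Extreme values at both ends receive low ranks and central values receive high ranks, so a group with more spread tends to collect the smaller rank sum.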

Description: Non-parametric Siegel-Tukey test for equality in variability. The null hypothesis is that the variability of x is equal between two groups. A rejection of the null indicates that variability differs between the two groups.

Usage:

siegel.tukey(x,y,id.col=FALSE,adjust.median=FALSE,rnd=-1, ...)

Arguments:

x: a vector of data

y: Data of the second group (if id.col=FALSE) or group indicator (if id.col=TRUE). In the latter case, y MUST take 1 or 2 to indicate observations of group 1 and 2, respectively, and x must contain the data for both groups.

id.col: If FALSE (default), then x and y are the data columns for group 1 and 2, respectively. If TRUE, then y is the group indicator.

adjust.median: Should between-group differences in medians be leveled before performing the test? In certain cases, the Siegel-Tukey test is susceptible to median differences and may indicate significant differences in variability that, in reality, stem from differences in medians.

rnd: Should the data be rounded and, if so, to which decimal? The default (-1) uses the data as is. Otherwise, rnd must be a non-negative integer. Typically, this option is not needed. However, occasionally, differences in the precision with which certain functions return values cause the merging of two data frames to fail within the siegel.tukey function. Only then is rounding necessary. This operation should not be performed if it affects the ranks of observations.

… arguments passed on to the Wilcoxon test. See ?wilcox.test
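The two input layouts selected by id.col can be sketched as follows. The variable names values and group are only illustrative, and the calls to siegel.tukey are left as comments because the function is defined further below.

```r
# Example data for the two groups
x <- c(4, 4, 5, 5, 6, 6)
y <- c(0, 0, 1, 9, 10, 10)

# Layout A (id.col = FALSE): one data vector per group
# siegel.tukey(x, y)

# Layout B (id.col = TRUE): all values stacked in one vector, plus an
# indicator vector that must contain 1 for group 1 and 2 for group 2
values <- c(x, y)
group  <- rep(c(1, 2), times = c(length(x), length(y)))
# siegel.tukey(values, group, id.col = TRUE)

# Both layouts describe the same data
identical(values[group == 1], x)
identical(values[group == 2], y)
```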

Value: Among other output, the function returns rank sums for the two groups, the associated Wilcoxon’s W, and the p-value for a Wilcoxon test on tie-adjusted Siegel-Tukey ranks (i.e., it performs and returns a Siegel-Tukey test). If significant, the group with the smaller rank sum has greater variability.
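To make the rank sums concrete, here is a small worked sketch in base R using the example data from this post. It is not Daniel's function: the raw Siegel-Tukey ranks for 12 sorted values are typed in by hand, and ave() is one assumed way of averaging ranks within ties.

```r
x <- c(4, 4, 5, 5, 6, 6)      # group 1
y <- c(0, 0, 1, 9, 10, 10)    # group 2

# Pool the data, keep a group label, and sort by value
pooled <- c(x, y)
group  <- rep(c(1, 2), c(length(x), length(y)))
ord    <- order(pooled)
pooled <- pooled[ord]
group  <- group[ord]

# Raw Siegel-Tukey ranks for n = 12 sorted values (written out by hand)
raw_ranks <- c(1, 4, 5, 8, 9, 12, 11, 10, 7, 6, 3, 2)

# Tie adjustment: every value gets the mean rank of its tie group
adj_ranks <- ave(raw_ranks, pooled)

# Rank sums per group: 57 for group 1 and 21 for group 2
rank_sums <- tapply(adj_ranks, group, sum)
rank_sums

# Wilcoxon rank sum test on the tie-adjusted ranks
res <- wilcox.test(adj_ranks[group == 1], adj_ranks[group == 2], exact = FALSE)
res$statistic   # W = 36
```

Group 2 has the smaller rank sum, which matches its visibly larger spread.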

References: Sidney Siegel and John Wilder Tukey (1960) “A nonparametric sum of ranks procedure for relative spread in unpaired samples.” Journal of the American Statistical Association. See also, David J. Sheskin (2004) “Handbook of parametric and nonparametric statistical procedures.” 3rd edition. Chapman and Hall/CRC. Boca Raton, FL.

Notes: The Siegel-Tukey test has relatively low power and may, under certain conditions, indicate significance due to differences in medians rather than differences in variabilities (consider using the argument adjust.median).
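The idea behind leveling the medians can be sketched like this. The sketch centers each group on its own median, which is an assumption made for illustration and only one of several ways to bring two medians into line; the values of the second group are hypothetical.

```r
g1 <- c(4, 4, 5, 5, 6, 6)
g2 <- c(10, 10, 11, 19, 20, 20)   # hypothetical group with a much higher median

# Center each group on its own median so that both medians become 0
g1_centered <- g1 - median(g1)
g2_centered <- g2 - median(g2)

c(median(g1_centered), median(g2_centered))

# A pure shift leaves the spread of each group untouched
c(diff(range(g2)), diff(range(g2_centered)))
```

After such a shift, a difference in the Siegel-Tukey rank sums is less likely to reflect a mere difference in location.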

Output (in this order)

1. Group medians
2. Wilcoxon test for between-group differences in median (after the median adjustment if specified)
3. Unique values of x and their tie-adjusted Siegel-Tukey ranks
4. Xs of group 1 and their tie-adjusted Siegel-Tukey ranks
5. Xs of group 2 and their tie-adjusted Siegel-Tukey ranks
6. Siegel-Tukey test (Wilcoxon test on tie-adjusted Siegel-Tukey ranks)

And here is the code:

siegel.tukey <- function(x, y, id.col = FALSE, adjust.median = FALSE, rnd = -1,
                         alternative = "two.sided", mu = 0, paired = FALSE,
                         exact = FALSE, correct = TRUE, conf.int = FALSE,
                         conf.level = 0.95) {
  # Build a two-column data frame: x holds the values, y the group label (1 or 2)
  if (!id.col) {
    data <- data.frame(c(x, y), rep(c(1, 2), c(length(x), length(y))))
  } else {
    data <- data.frame(x, y)
  }
  names(data) <- c("x", "y")
  if (rnd > -1) data$x <- round(data$x, rnd)

  # Optionally shift both groups to a common median. The difference is computed
  # once, before either group is changed, so that the two shifts are symmetric.
  if (adjust.median) {
    half.diff <- (median(data$x[data$y == 1]) - median(data$x[data$y == 2])) / 2
    data$x[data$y == 1] <- data$x[data$y == 1] - half.diff
    data$x[data$y == 2] <- data$x[data$y == 2] + half.diff
  }

  # Sort by the values that are actually tested; the ranks below are assigned
  # by position in this sorted order
  data <- data[order(data$x), ]

  cat("Median of group 1 = ", median(data$x[data$y == 1]), "\n")
  cat("Median of group 2 = ", median(data$x[data$y == 2]), "\n", "\n")
  cat("Test of median differences", "\n")
  print(wilcox.test(data$x[data$y == 1], data$x[data$y == 2]))

  # Siegel-Tukey ranks: rank 1 goes to the smallest value, ranks 2 and 3 to the
  # two largest, ranks 4 and 5 to the next two smallest, and so on.
  # rk.up holds the ranks of the lower half, rk.down those of the upper half.
  n <- length(data$x)
  a <- rep(seq(ceiling(n / 4)), each = 2)
  b <- rep(c(0, 1), ceiling(n / 4))
  rk.up <- c(1, (a * 4 + b))[1:ceiling(n / 2)]
  rk.down <- rev(c(a * 4 + b - 2)[1:floor(n / 2)])

  cat("Performing Siegel-Tukey rank transformation...", "\n", "\n")

  # Tie adjustment: every unique value gets the mean of the ranks it occupies
  rks <- c(rk.up, rk.down)
  unqs <- unique(sort(data$x))
  corr.rks <- tapply(rks, data$x, mean)
  rks.data <- data.frame(unqs, corr.rks)
  names(rks.data) <- c("unique values of x", "tie-adjusted Siegel-Tukey rank")
  print(rks.data, row.names = FALSE)
  names(rks.data) <- c("unqs", "corr.rks")
  data <- merge(data, rks.data, by.x = "x", by.y = "unqs")

  # Split the tie-adjusted ranks by group and print them
  rk1 <- data$corr.rks[data$y == 1]
  rk2 <- data$corr.rks[data$y == 2]
  cat("\n", "Tie-adjusted Siegel-Tukey ranks of group 1", "\n")
  group1 <- data.frame(data$x[data$y == 1], rk1)
  names(group1) <- c("x", "rank")
  print(group1, row.names = FALSE)
  cat("\n", "Tie-adjusted Siegel-Tukey ranks of group 2", "\n")
  group2 <- data.frame(data$x[data$y == 2], rk2)
  names(group2) <- c("x", "rank")
  print(group2, row.names = FALSE)
  cat("\n")

  cat("Siegel-Tukey test", "\n")
  cat("Siegel-Tukey rank transformation performed.", "Tie adjusted ranks computed.", "\n")
  if (adjust.median) {
    cat("Medians adjusted to equality.", "\n")
  } else {
    cat("Medians not adjusted.", "\n")
  }
  cat("Rank sum of group 1 =", sum(rk1), "    Rank sum of group 2 =", sum(rk2), "\n")

  # The Siegel-Tukey test is a Wilcoxon rank sum test on the tie-adjusted ranks
  print(wilcox.test(rk1, rk2, alternative = alternative, mu = mu, paired = paired,
                    exact = exact, correct = correct, conf.int = conf.int,
                    conf.level = conf.level))
}

# Example:

x <- c(4, 4, 5, 5, 6, 6)
y <- c(0, 0, 1, 9, 10, 10)

siegel.tukey(x, y)

Here is the code output:

Median of group 1 =  5
Median of group 2 =  5

Test of median differences

Wilcoxon rank sum test with continuity correction
data:  data$x[data$y == 1] and data$x[data$y == 2]
W = 18, p-value = 1
alternative hypothesis: true location shift is not equal to 0

Performing Siegel-Tukey rank transformation...

 unique values of x tie-adjusted Siegel-Tukey rank
                  0                            2.5
                  1                            5.0
                  4                            8.5
                  5                           11.5
                  6                            8.5
                  9                            6.0
                 10                            2.5

Tie-adjusted Siegel-Tukey ranks of group 1
 x rank
 4  8.5
 4  8.5
 5 11.5
 5 11.5
 6  8.5
 6  8.5

Tie-adjusted Siegel-Tukey ranks of group 2
  x rank
  0  2.5
  0  2.5
  1  5.0
  9  6.0
 10  2.5
 10  2.5

Siegel-Tukey test
Siegel-Tukey rank transformation performed. Tie adjusted ranks computed.
Medians not adjusted.
Rank sum of group 1 = 57     Rank sum of group 2 = 21

Wilcoxon rank sum test with continuity correction
data:  rk1 and rk2
W = 36, p-value = 0.003601
alternative hypothesis: true location shift is not equal to 0

Warning message:
In wilcox.test.default(data$x[data$y == 1], data$x[data$y == 2]) :
  cannot compute exact p-value with ties

