If I have a data frame which looks like this:
x y
13 a
14 b
15 c
15 c
14 b
and I wanted each group of equal rows to have a unique id, like this:
x y id
13 a 1
14 b 2
15 c 3
15 c 3
14 b 2
Is there any easy way of doing this?
Thanks
I have a bit of a concern with the paste0 approach. If your columns contained more complex data, you could end up with surprising results, e.g. imagine:
x y
ab c
a bc
Here both rows paste to "abc", so they would wrongly get the same id. One solution is to replace paste0(...) with paste(..., sep = "@"). Even so, no choice of sep is general enough to work with every kind of data, since there is always a non-zero chance that the data itself contains sep.
A more robust option is a split/transform/combine approach. You can certainly do it with base R, but plyr makes it a bit easier:
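To make the concern concrete, here is a minimal sketch of both failure modes. The data frames z and w are made-up examples, not data from the question:

```r
# Two distinct rows whose pasted keys coincide (hypothetical data)
z <- data.frame(x = c("ab", "a"), y = c("c", "bc"), stringsAsFactors = FALSE)

key0 <- paste0(z$x, z$y)                # no separator: both keys are identical
key_at <- paste(z$x, z$y, sep = "@")    # a separator tells these rows apart

# A separator only helps until the data itself contains it
w <- data.frame(x = c("a@", "a"), y = c("b", "@b"), stringsAsFactors = FALSE)
key_w <- paste(w$x, w$y, sep = "@")     # both keys are identical again

print(key0)
print(key_at)
print(key_w)
```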
library(plyr)
.idx <- 0L
ddply(df, colnames(df), transform, id = (.idx <<- .idx + 1L))
If this is too slow, I would recommend a data.table approach, as proposed here: data.table "key indices" or "group counter"
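For reference, a minimal sketch of what such a data.table version might look like, assuming the data.table package is installed and the columns are named x and y as in the question. It relies on data.table's .GRP symbol, which holds the running group counter inside a grouped call:

```r
library(data.table)

# Same data as in the question
dt <- data.table(x = c(13, 14, 15, 15, 14),
                 y = c("a", "b", "c", "c", "b"))

# .GRP is the group counter; with `by`, groups are numbered in order of
# first appearance and the original row order is kept
dt[, id := .GRP, by = .(x, y)]

print(dt)
```

No string key is built here, so the separator problem does not come up.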
This is the first thing I thought:
Make a new variable which combines the two columns by pasting their values into strings:
a <- paste0(z$x, z$y)  # z is your data.frame
Then make this a factor and bind it to your data frame:
cbind(z, id = factor(a, labels = seq_along(unique(a))))
EDIT: @flodel was concerned about using paste0. It's better to use ordinary paste, or interaction:
a <- interaction(z, drop = TRUE)
cbind(z, id = factor(a, labels = seq_along(unique(a))))
This assumes that you want x=ab, y=c and x=a, y=bc to be treated as different groups. If not, then use paste0.
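As a quick check of the difference, here is a small sketch comparing the two variants on the ambiguous rows. The data frame z below is a made-up example, and the variable names are only for illustration:

```r
# Hypothetical data: two distinct rows that paste0 cannot tell apart
z <- data.frame(x = c("ab", "a"), y = c("c", "bc"), stringsAsFactors = FALSE)

# interaction() keeps the columns separate, so the rows get different ids
a_int <- interaction(z, drop = TRUE)
id_int <- factor(a_int, labels = seq_along(unique(a_int)))

# paste0() merges both rows into the same key, so they share one id
a_p0 <- paste0(z$x, z$y)
id_p0 <- factor(a_p0, labels = seq_along(unique(a_p0)))

print(cbind(z, id_interaction = id_int, id_paste0 = id_p0))
```

Note that in both variants the ids follow the order of the factor levels, which may not be the order in which the groups first appear in the data.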