
If I have a data frame that looks like this:

x y
13 a
14 b
15 c
15 c
14 b

and I want each group of identical rows to share a unique id, like this:

x y id
13 a 1
14 b 2
15 c 3
15 c 3
14 b 2

Is there any easy way of doing this?

Thanks

Omar Wagih

2 Answers


I have a bit of a concern with the paste0 approach. If your columns contained more complex data, you could end up with surprising results, e.g. imagine:

 x  y
ab  c
 a bc

Here paste0 produces "abc" for both rows, so two different rows would end up with the same id. One solution is to replace paste0(...) with paste(..., sep = "@"). Even so, you cannot come up with a sep general enough to work with any type of data, as there is always a non-zero probability that sep itself appears somewhere in the data.
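To make the concern concrete, here is a small sketch of the collision. The data frames `z` and `z2` and the variable names are made up for illustration; only `paste0` and `paste` come from the discussion above.

```r
# Two distinct rows that become identical once their columns are concatenated
z <- data.frame(x = c("ab", "a"), y = c("c", "bc"), stringsAsFactors = FALSE)

no_sep   <- paste0(z$x, z$y)             # "abc" "abc": the rows look the same
with_sep <- paste(z$x, z$y, sep = "@")   # "ab@c" "a@bc": the rows stay distinct

# A separator only helps as long as it never occurs in the data itself
z2 <- data.frame(x = c("a@b", "a"), y = c("c", "b@c"), stringsAsFactors = FALSE)
still_clash <- paste(z2$x, z2$y, sep = "@")  # both rows give "a@b@c"
```

So a separator lowers the risk of a clash but does not remove it.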

A more robust option is a split/transform/combine approach. You can certainly do it with base R, but plyr makes it a bit easier:

library(plyr)
.idx <- 0L
ddply(df, colnames(df), transform, id = (.idx <<- .idx + 1L))    
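For readers who would rather stay in base R, one possible separator-free sketch is to give each column its own integer codes and combine those codes numerically instead of pasting strings. The helper name `row_group_id` is hypothetical, and this is only one way to do it under the assumption that the data frame has a modest number of columns and distinct values (the combined key is stored as a double and would lose precision if it grew very large).

```r
# Example data from the question
df <- data.frame(x = c(13, 14, 15, 15, 14),
                 y = c("a", "b", "c", "c", "b"),
                 stringsAsFactors = FALSE)

# Hypothetical helper: id of each distinct row, numbered by first appearance
row_group_id <- function(d) {
  key <- rep(0, nrow(d))
  for (col in d) {                       # iterate over the columns
    vals <- unique(col)
    code <- match(col, vals)             # integer code 1..k for this column
    key  <- key * length(vals) + (code - 1)  # mixed-radix combination of codes
  }
  match(key, unique(key))                # renumber the keys as 1, 2, 3, ...
}

df$id <- row_group_id(df)
```

Because no strings are concatenated, rows such as ("ab", "c") and ("a", "bc") get different ids, and the original row order is left untouched.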

If this is too slow, I would recommend a data.table approach, as proposed here: data.table "key indices" or "group counter"

flodel
  • Good point about `paste0`. I added a better solution, which is actually neater than the original answer. – Jouni Helske Mar 08 '13 at 21:39
  • @Hemmo. I think using `interaction` is equivalent to using `paste(..., sep = '.')`; theoretically, it suffers the same (unlikely) problem I was discussing. – flodel Mar 08 '13 at 21:43
  • Oh yes, you are right that they produce the same thing, but both work correctly in the situation you discussed, as you get `ab.c` and `a.bc`, which are distinct. And I guess that is what is wanted. `paste0` doesn't work properly there, though (it works if separation is not wanted). – Jouni Helske Mar 08 '13 at 21:48

This is the first thing I thought:

Make a new variable which just combines the two columns by pasting their values to strings:

a<-paste0(z$x,z$y) #z is your data.frame

Then turn this into a factor and bind it to your data frame:

cbind(z,id=factor(a,labels=1:length(unique(a))))

EDIT: @flodel was concerned about using paste0. It's better to use ordinary paste, or interaction:

a<-interaction(z,drop=TRUE)
cbind(z,id=factor(a,labels=1:length(unique(a))))

This assumes that you want to distinguish the row x=ab, y=c from the row x=a, y=bc. If not, then use paste0.

Jouni Helske