
I know how to do basic things in R, but I am still a newbie. This is probably a redundant question, but I don't know how to phrase it for Google so that I find the right hits.

So far my searches turn up questions like these:

Assign value to group based on condition in column

R - Group by variable and then assign a unique ID

I want to assign subgroups into groups, and create a new column out of them. I have data like the following:

dataframe:

ID    SubID    Values
1     15       0.5
1     15       0.2
2     13       0.1
2     13       0
1     14       0.3
1     14       0.3
2     10       0.2
2     10       1.6
6     31       0.7
6     31       1.0
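To make the example reproducible, the input table can be entered directly in R. This is a sketch that assumes the data frame is called `df` (the name the answers use) and that `ID` and `SubID` are integers, as the answers' printed output suggests.

```r
# The example table from the question, stored as a data frame named df.
# Integer columns are an assumption based on the <int> types printed in the answers.
df <- data.frame(
  ID     = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 6L, 6L),
  SubID  = c(15L, 15L, 13L, 13L, 14L, 14L, 10L, 10L, 31L, 31L),
  Values = c(0.5, 0.2, 0.1, 0, 0.3, 0.3, 0.2, 1.6, 0.7, 1.0)
)
str(df)
```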

desired new dataframe:

ID    SubID    Values   groups
1     15       0.5      2
1     15       0.2      2
2     13       0.1      2
2     13       0        2
1     14       0.3      1
1     14       0.3      1
2     10       0.2      1
2     10       1.6      1
6     31       0.7      1
6     31       1.0      1

I have tried the following in R, but I am not getting the desired results:

newdataframe$groups <- dataframe %>% group_indices(,dataframe$ID, dataframe$SubID)
newdataframe<- dataframe %>% group_by(ID, SubID) %>% mutate(groups=group_indices(,dataframe$ID, dataframe$SubID))

I am not sure how to express this in R. I want to group by ID and SubID, number the SubID subgroups within each ID, and reset the group count for each new ID.
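The difference between a numbering that never resets and one that restarts within each ID can be sketched in base R. This assumes the example data is in a data frame named `df`; the `paste()`/`factor()` global id is only an analogy for what `group_indices()` returns, not its exact numbering.

```r
# Example data (name df assumed); the Values column is not needed here.
df <- data.frame(
  ID    = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 6L, 6L),
  SubID = c(15L, 15L, 13L, 13L, 14L, 14L, 10L, 10L, 31L, 31L)
)

# Global numbering: one id per distinct (ID, SubID) pair, never reset.
global_ids <- as.integer(factor(paste(df$ID, df$SubID)))

# Per-ID numbering: the count restarts at 1 inside every ID.
per_id <- ave(df$SubID, df$ID, FUN = function(x) as.integer(factor(x)))

global_ids  # 2 2 4 4 1 1 3 3 5 5
per_id      # 2 2 2 2 1 1 1 1 1 1
```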

Any help would be really appreciated.

ENZSIO
  • I don't think you should be calling `group_indices` directly. Can you describe how you determine which rows go into which groups? – r2evans Sep 01 '19 at 04:49
  • Every ID has some number of SubIDs, and each SubID may occur more than once. I want to number the SubIDs within each ID so that I can compare all of the group 1s together, all of the group 2s together, etc. Also, why is it inappropriate to call group_indices directly? – ENZSIO Sep 01 '19 at 05:11
  • WHAT IS THE RULE FOR GROUPING??? Learn to explain in natural language. An example is often ambiguous, as yours most certainly is. – IRTFM Sep 01 '19 at 06:37
  • Grouping is on ID and SubID. I would like to rank the SubIDs (i.e., count the unique SubIDs and assign them as groups), so that the first SubID of each ID belongs to group 1, the second SubID of each ID belongs to group 2, etc. – ENZSIO Sep 01 '19 at 17:27
  • BTW: ENZSIO, I found a good counter-example to my statement of not calling `group_indices` directly (https://stackoverflow.com/a/57808024). I still think it wasn't necessarily right *here*, but now I'll question my assumption that its primary use was behind-the-scenes. Cheers. – r2evans Sep 05 '19 at 15:02
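"The first SubID of each ID" can be read two ways, which is the ambiguity the comments point at. A small base R sketch on the SubIDs of ID 1 from the example shows both readings; only the order-of-value reading reproduces the desired `groups` column (15 gets 2, 14 gets 1).

```r
# SubIDs of ID 1, in the order they appear in the example table.
sub_ids <- c(15L, 15L, 14L, 14L)

# Order of appearance: the SubID seen first gets 1.
by_appearance <- match(sub_ids, unique(sub_ids))

# Order of value: the smallest SubID gets 1.
by_value <- match(sub_ids, sort(unique(sub_ids)))

by_appearance  # 1 1 2 2
by_value       # 2 2 1 1
```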

2 Answers


Here is an alternative approach which uses the rleid() function from the data.table package. rleid() generates a run-length type id column.
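The idea behind a run-length id can be sketched in base R with a hypothetical helper `run_id()` (not data.table's actual implementation, and ignoring `NA` handling): the id goes up by one every time a value differs from the one before it. Because it counts runs rather than distinct values, unsorted input is numbered by order of appearance, and a value that reappears later starts a new run.

```r
# Hypothetical helper: a new id starts whenever x changes from the previous element.
run_id <- function(x) {
  if (length(x) == 0L) return(integer(0))
  as.integer(cumsum(c(TRUE, x[-1L] != x[-length(x)])))
}

run_id(c(10, 10, 13, 13))  # 1 1 2 2
run_id(c(15, 15, 14, 14))  # 1 1 2 2 (runs, not sorted values)
run_id(c(1, 2, 1))         # 1 2 3 (a repeated value opens a new run)
```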

According to the expected result, the OP expects SubID to be numbered by order of value, not by order of appearance. Therefore, we need to call arrange() first.

library(dplyr)
df %>% 
  group_by(ID) %>% 
  arrange(SubID) %>% 
  mutate(groups = data.table::rleid(SubID))
      ID SubID Values groups
   <int> <int>  <dbl>  <int>
 1     2    10    0.2      1
 2     2    10    1.6      1
 3     2    13    0.1      2
 4     2    13    0        2
 5     1    14    0.3      1
 6     1    14    0.3      1
 7     1    15    0.5      2
 8     1    15    0.2      2
 9     6    31    0.7      1
10     6    31    1        1

Note that the row order has changed.
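If the original row order matters, one alternative within dplyr (a swapped-in technique, not part of this answer) is `dense_rank()`, which gives tied values the same rank and leaves no gaps. Within each ID the smallest SubID gets 1, and no `arrange()` is needed. A sketch, assuming the example data is in `df`:

```r
library(dplyr)

# Example data (name df assumed).
df <- data.frame(
  ID     = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 6L, 6L),
  SubID  = c(15L, 15L, 13L, 13L, 14L, 14L, 10L, 10L, 31L, 31L),
  Values = c(0.5, 0.2, 0.1, 0, 0.3, 0.3, 0.2, 1.6, 0.7, 1.0)
)

# Rank SubID within each ID; rows are not reordered.
out <- df %>%
  group_by(ID) %>%
  mutate(groups = dense_rank(SubID)) %>%
  ungroup()
out
```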

BTW: With data.table, the code is less verbose and the original row order is maintained:

library(data.table)
setDT(df)[order(ID, SubID), groups := rleid(SubID), by = ID][]
    ID SubID Values groups
 1:  1    15    0.5      2
 2:  1    15    0.2      2
 3:  2    13    0.1      2
 4:  2    13    0.0      2
 5:  1    14    0.3      1
 6:  1    14    0.3      1
 7:  2    10    0.2      1
 8:  2    10    1.6      1
 9:  6    31    0.7      1
10:  6    31    1.0      1
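The data.table one-liner numbers the rows in sorted order and writes each result back to its original row. The same idea can be sketched in base R, assuming the example data is in `df`; this illustrates the logic only and is not how data.table implements `:=` internally.

```r
# Example data (name df assumed).
df <- data.frame(
  ID    = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 6L, 6L),
  SubID = c(15L, 15L, 13L, 13L, 14L, 14L, 10L, 10L, 31L, 31L)
)

# Row positions sorted by ID, then SubID.
o <- order(df$ID, df$SubID)
s_id  <- df$ID[o]
s_sub <- df$SubID[o]

# In sorted order, a new run starts at the first row of every ID
# and whenever SubID changes.
new_id  <- c(TRUE, s_id[-1L] != s_id[-length(s_id)])
new_run <- new_id | c(TRUE, s_sub[-1L] != s_sub[-length(s_sub)])

# Count runs separately within each ID, so every ID restarts at 1.
run_in_id <- ave(as.integer(new_run), s_id, FUN = cumsum)

# Write the numbers back to the original row positions.
df$groups <- NA_integer_
df$groups[o] <- run_in_id
df
```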
Uwe
  • Thanks Uwe! The first example partly worked, but assigned everything to 1 for some odd reason. The second example with data.table did the trick: setDT(df)[order(ID, SubID), groups := rleid(SubID), by = ID][]. I am going to read into the data.table documentation today and learn about it. Thank you :). – ENZSIO Sep 01 '19 at 17:37

There are multiple ways to do this. One way is to group_by ID and create a unique number for each SubID by converting it to a factor and then to an integer.
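Why the factor-then-integer conversion gives the desired numbers: `factor()` sorts the distinct values into levels, and `as.integer()` returns each element's level position, so equal SubIDs share a number and the smallest SubID gets 1. One caveat worth checking on real data: if SubID is stored as character, the levels are sorted as strings, so `"10"` comes before `"9"`.

```r
sub_ids <- c(15L, 15L, 14L, 14L)
levels(factor(sub_ids))      # "14" "15"
as.integer(factor(sub_ids))  # 2 2 1 1

# Character values are sorted as text, not as numbers.
as.integer(factor(c("9", "10")))  # 2 1
```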

library(dplyr)

df %>%
  group_by(ID) %>%
  mutate(groups = as.integer(factor(SubID)))

#     ID SubID Values groups
#   <int> <int>  <dbl>  <int>
# 1     1    15    0.5      2
# 2     1    15    0.2      2
# 3     2    13    0.1      2
# 4     2    13    0        2
# 5     1    14    0.3      1
# 6     1    14    0.3      1
# 7     2    10    0.2      1
# 8     2    10    1.6      1
# 9     6    31    0.7      1
#10     6    31    1        1

In base R, we can use ave with similar logic:

df$groups <- with(df, ave(SubID, ID, FUN = factor))
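A slightly more defensive variant of the `ave()` call, offered as a sketch rather than the answer's own code: `FUN` converts the factor to integer codes itself, so the result does not rely on how a factor is coerced when `ave()` writes it back. If SubID is character, `ave()` returns a character vector, so the result is converted to integer at the end. The columns `SubID_chr` and `groups_chr` are hypothetical and exist only to show the character case.

```r
# Example data (name df assumed).
df <- data.frame(
  ID    = c(1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 6L, 6L),
  SubID = c(15L, 15L, 13L, 13L, 14L, 14L, 10L, 10L, 31L, 31L)
)

# Integer SubID: FUN returns integer codes that restart within each ID.
df$groups <- with(df, ave(SubID, ID, FUN = function(x) as.integer(factor(x))))

# Character SubID (hypothetical column): ave() keeps the character type,
# so convert the codes back to integer afterwards.
df$SubID_chr <- as.character(df$SubID)
df$groups_chr <- as.integer(with(df, ave(SubID_chr, ID, FUN = function(x) as.integer(factor(x)))))
df
```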
Ronak Shah
  • df$groups <- with(df, ave(SubID, ID, FUN = factor)) worked for my data, but I am trying to reset the grouping number back to 1 for each ID. – ENZSIO Sep 01 '19 at 05:42
  • @ENZSIO It already resets the count for each `ID` to 1, right? – Ronak Shah Sep 01 '19 at 05:48
  • I tried it on my data, but it counts up instead of resetting (my IDs are numerical, and my SubIDs are ). The dataset I provided is an example (I cannot post any real data). – ENZSIO Sep 01 '19 at 05:52
  • @ENZSIO Can you try the `dplyr` alternative and check if it returns you the same result ? – Ronak Shah Sep 01 '19 at 06:12
  • I tried both. The first one created unique IDs for the groups. The alternative came closer, but still wasn't it. I would just like to compare the IDs' SubIDs belonging to group 1, or compare all group 2s. – ENZSIO Sep 01 '19 at 06:15
  • Sorry, I didn't get it. Did the `dplyr` alternative work or not? For `ave`, can you try `df$groups <- as.integer(with(df, ave(as.character(SubID), ID, FUN = factor)))` and see if this is what you want? – Ronak Shah Sep 01 '19 at 06:16
  • It's okay. The alternative did not work. I did learn a lot from you stepping in and answering. :) I will be sure to try yours out as well. – ENZSIO Sep 01 '19 at 17:44