I have an example project and need to search for strings using the stringr package. To avoid case mismatches, I started with str_to_lower(example$remarks), which converted all of the remarks to lower case. The remarks column describes residential properties.
I need to search for the word "shop". However, the word "shopping" is also in the remarks column and I don't want that word.
Some observations: a) have only the word "shop"; b) have only the word "shopping"; c) have neither "shop" nor "shopping"; d) have BOTH "shop" and "shopping".
When using str_detect(), I want a TRUE when the word "shop" is detected, but I DO NOT want a TRUE when the string "shop" only appears inside the word "shopping". Currently, if I run str_detect(example$remarks, "shop") I get a TRUE for both the words "shop" and "shopping". Effectively, I ONLY want a TRUE for the standalone 4-character string "shop"; if the characters "shop" are followed by any other characters, as in shop(ping), I want the code to skip that match and not return TRUE for it.
Also, if the remarks contain BOTH the words "shop" and "shopping", I would like the result to be TRUE only for detecting "shop" but not "shopping".
Ultimately, I'm hoping one line of code using str_detect() can give me the result of:
- If the remarks observation has only the word "shop" = TRUE
- If the remarks observation has only the word "shopping" = FALSE
- If the remarks observation has neither the word "shop" nor "shopping" = FALSE
- If the remarks observation has both the words "shop" AND "shopping" = TRUE for detecting ONLY the 4-character string "shop"; it DOES NOT output a TRUE because of the word "shopping".
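To make the four cases concrete, here is a minimal sketch using one remark per case, taken from the sample data. The pattern "\\bshop\\b" is an assumption on my part: it relies on the regex word boundary \b, which seems to describe what I want, but I have not confirmed it is the best approach for the full dataset.

```r
library(stringr)

# One remark per case, copied from the sample data
remarks <- c(
  "huge shop on property!",                          # only "shop"
  "close to shopping.",                              # only "shopping"
  "fixer! needs work.",                              # neither word
  "close to shopping and massive shop on property."  # both words
)

# Assumed pattern: \\b marks a word boundary on each side of "shop",
# so the "shop" inside "shopping" is not treated as a match
has_shop <- str_detect(remarks, "\\bshop\\b")

has_shop
# [1]  TRUE FALSE FALSE  TRUE
```

Note that with this pattern, variants such as "shops" or "workshop" would also return FALSE, since "shop" is not a standalone word there.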
I need all of the observations to remain in the dataset and cannot exclude any of them, because I need to create a new column, which I have labeled shop_YN, that gives a "Yes" for observations containing the standalone 4-character string "shop". Once I have the correct str_detect() code, I plan to wrap the results in mutate() and if_else() as follows (except I don't know what pattern to use inside str_detect() to get the results I need):
shop_YN <- example %>% mutate(shop_YN = if_else(str_detect(example$remarks, ), "Yes", "No"))
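As a rough sketch of how that mutate() call might look once a pattern is filled in: the tibble example_small below is a hypothetical three-row stand-in built from rows of the sample data, and the word-boundary pattern "\\bshop\\b" is an assumed candidate, not a confirmed solution. Inside mutate() the column can be referenced by its bare name, so example$ is not needed.

```r
library(dplyr)
library(stringr)

# Hypothetical three-row stand-in for `example`, using rows from the sample data
example_small <- tibble(
  price = c(240000, 241000, 255000),
  remarks = c(
    "huge shop on property!",
    "close to shopping.",
    "close to shopping and massive shop on property."
  )
)

# All rows are kept; shop_YN is "Yes" only when "shop" appears as a whole word
# (assumed pattern: "\\bshop\\b", i.e. "shop" with a word boundary on each side)
result <- example_small %>%
  mutate(shop_YN = if_else(str_detect(remarks, "\\bshop\\b"), "Yes", "No"))

result$shop_YN
# [1] "Yes" "No"  "Yes"
```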
Here is a sample of the data, produced with dput():
structure(list(price = c(195000, 213000, 215000, 240000, 241000,
250000, 255000, 256500, 260000, 263500, 265000, 277000, 280000,
280000, 150000), remarks = c("large home with a 1200 sf shop. great location close to shopping.",
"updated home close to shopping & schools.", "nice location. 2br home with updating.",
"huge shop on property!", "close to shopping.", "updated, clean, great location, garage.",
"close to shopping and massive shop on property.", "updated home near shopping, schools, restaurants.",
"large home with updated interior.", "close to schools, updated, stick-built shop 1500sf.",
"home and shop.", "near schools, shopping, restaurants. partially updated home.",
"located close to shopping. high quality home with shop in backyard.",
"brick 2-story. lots of shopping near by. detached garage and large shop in backyard.",
"fixer! needs work.")), row.names = c(NA, -15L), class = c("tbl_df",
"tbl", "data.frame"))