I am not sure whether this is an R/RStudio question per se or a Java issue related to my computer. I am having trouble getting the registered trademark symbol (R in a circle) to display in data I import from a database via SQL. However, when I export the same data to Excel (where the symbol is displayed correctly) and read the Excel file into R/RStudio, the symbol comes through without any problem. Based on my reading, this Unicode character, <U+00AE>, is in the Latin-1 Supplement block, so I don't know whether it's an issue with my RStudio settings.

Here is my session info.

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] DBI_1.1.0        odbc_1.2.2       feasts_0.1.6     fable_0.2.1      fabletools_0.2.0 tsibble_0.9.1   
 [7] lubridate_1.7.4  writexl_1.2      forcats_0.4.0    stringr_1.4.0    dplyr_1.0.0      purrr_0.3.3     
[13] readr_1.3.1      tidyr_1.1.0      tibble_3.0.3     ggplot2_3.3.0    tidyverse_1.3.0  readxl_1.3.1    

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0     haven_2.2.0          lattice_0.20-38      colorspace_1.4-1     vctrs_0.3.2         
 [6] generics_0.0.2       blob_1.2.1           utf8_1.1.4           rlang_0.4.7          pillar_1.4.6        
[11] glue_1.4.0           withr_2.1.2          bit64_0.9-7          dbplyr_1.4.2         modelr_0.1.5        
[16] distributional_0.1.0 lifecycle_0.2.0      munsell_0.5.0        anytime_0.3.7        gtable_0.3.0        
[21] progressr_0.6.0      cellranger_1.1.0     rvest_0.3.5          RDCOMClient_0.93-0   fansi_0.4.0         
[26] broom_0.5.4          Rcpp_1.0.3           scales_1.1.0         backports_1.1.5      jsonlite_1.6.1      
[31] bit_1.1-15.2         farver_2.0.1         fs_1.5.0             hms_0.5.2            digest_0.6.25       
[36] stringi_1.5.3        grid_3.6.1           cli_2.0.2            tools_3.6.1          magrittr_1.5        
[41] crayon_1.3.4         pkgconfig_2.0.3      ellipsis_0.3.0       xml2_1.3.2           reprex_0.3.0        
[46] assertthat_0.2.1     httr_1.4.1           rstudioapi_0.11      R6_2.4.1             nlme_3.1-140        
[51] compiler_3.6.1  

Here's a preview of selected columns of the file when I call head() in the console, anonymized to protect intellectual property.

     MFMCU MFLITM IMDSC1             MFAN8 IMUOM1   UMCONVF 
1 PBK0100 123456  product<U+00AE>   559286     DR        50         
LauraDR
  • Please [edit] your question and share a [mcve]. – JosefZ Jan 26 '21 at 18:52
  • if i shared the specific data that I am having problems with, i might be exposing intellectual property that does not belong to me. That is, the string contains the name of the product for which the company has a registered trademark and the data frame contains associated sales quantities. i can't reproduce this example because it only occurs when i pull in data from a company database via SQL, and not when I read the same data in via Excel. – LauraDR Jan 27 '21 at 01:46

1 Answer

After doing some more reading, this is the solution I found.

I realized I had a mixture of encodings in the relevant column:

library(stringi)  # provides stri_enc_mark()
table(stri_enc_mark(forecast.file$IMDSC1))

ASCII UTF-8 
  658   184

The items marked UTF-8 were the rows where R did not display the registered trademark character correctly.
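To find which rows carry a non-ASCII mark rather than just counting them, something like the following base-R sketch may help. The vector `desc` is a hypothetical stand-in for the IMDSC1 column. Note that base `Encoding()` reports "unknown" for plain ASCII strings, where `stri_enc_mark()` reports "ASCII".

```r
# Toy stand-in for the description column (hypothetical values, not real data).
desc <- c("plain product", "product\u00AE", "another product")

# Encoding() returns the declared encoding mark of each element.
marks <- Encoding(desc)
print(table(marks))

# Indices of the elements that carry a UTF-8 mark.
utf8_rows <- which(marks == "UTF-8")
print(desc[utf8_rows])
```

On the real data, the same indexing could be applied to `forecast.file$IMDSC1` to inspect only the affected descriptions.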

There are two ways to work around this. The first is to declare the column as Latin-1 and then convert it to UTF-8:

Encoding(forecast.file$IMDSC1) <- "latin1"
forecast.file$IMDSC1 <- iconv(forecast.file$IMDSC1, from = "latin1", to = "UTF-8")
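The declare-then-convert step can be tried on a toy vector before touching the real column. This is only a sketch: it assumes the underlying bytes really are Latin-1, and the values below are hypothetical, built from the raw byte 0xAE (the registered sign in Latin-1).

```r
# Build a toy value ending in the raw Latin-1 byte 0xAE (registered sign).
# These values are hypothetical stand-ins for the real column.
reg_latin1 <- rawToChar(c(charToRaw("product"), as.raw(0xAE)))
desc <- c("plain product", reg_latin1)

# Declare how the bytes should be interpreted, then convert to UTF-8.
Encoding(desc) <- "latin1"
desc_utf8 <- iconv(desc, from = "latin1", to = "UTF-8")

# Check the result: declared marks, UTF-8 validity, and the expected character.
print(Encoding(desc_utf8))
print(validUTF8(desc_utf8))
print(desc_utf8[2] == "product\u00AE")
```

If the bytes were in fact already UTF-8, re-declaring them as Latin-1 would produce mojibake instead, so it is worth checking a few converted values by eye.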

Alternatively, per the question "Force character vector encoding from "unknown" to "UTF-8" in R", you can write the data to a CSV file and read it back in, specifying Latin-1 as the encoding:

library(data.table)
fwrite(forecast.file, "temp.csv")
forecast.file <- fread("temp.csv", encoding = "Latin-1")
LauraDR