
I'm having some trouble setting readable tick marks on my axes. The problem is that my data are at different magnitudes, so I'm not really sure how to go about it.

My data include ~400 different products, with 3 or 4 variables each, from two machines. I've pre-processed it into a data.table and used gather to convert it to long form; that part is fine.

Overview: The data are discrete. Each X_________ on the x-axis represents a separate reading, plotted with its values from machine 1 and machine 2; the idea is to compare the two. The graphical format is perfect for my needs. I would just like to set the ticks at, say, every 10 products on the x-axis, and at reasonable values on the y-axis.

  • Y_1: from 150 to 250
  • Y_2: from say, 1.5* to 2.5
  • Y_3: from say, 0.8* to 2.3
  • Y_4: from say, 0.4* to 1.5

*Bottom value, rounded down

Here's the code I'm using so far

var.Parameter <- c("Var1", "Var2", "Var3", "Var4")

MProduct$Parameter <- factor(MProduct$Parameter,
                          labels = var.Parameter)
labels_x <- MProduct$Lot[seq(0, 1626, by= 20)]
labels_y <- MProduct$Value[seq(0, 1626, by= 15)]


plot.MProduct <- ggplot(MProduct, aes(x = Lot,
                                y = Value,
                                colour = V4)) +
  facet_grid(Parameter ~.,
            scales = "free_y") + 
  scale_x_discrete(breaks=labels_x) +
  scale_y_discrete(breaks=labels_y) +
  geom_point() +
  labs(title = "Product: Select  Trends | 2018",
       x = "Time (s)",
       y = "Value") +
  theme(axis.text.x = element_text (angle = 90,
                                    hjust = 1,
                                    vjust = 0.5)) 
 # ggsave("MProduct.png")
plot.MProduct

Mproduct.png

Does anyone know how to make this graph more readable? Setting labels/breaks manually greatly limits flexibility and readability. There should be an option to set it to every X ticks, right? Same with y.

I need to apply this as a function to multiple datasets, so I'm not very happy about having to specify the column length of the "gathered" dataset every time either, which in this case is 1626.

Since I'm here, I would also like to take the opportunity to ask about this code:

var.Parameter <- c("Var1", "Var2", "Var3", "Var4")

More often than not, I need to label my data in a specific order, which is not necessarily alphabetical. R, however, defaults to some kind of odd behaviour whereupon I have to plot and verify that the labels are indeed where they should be. Any clue how I could force them to be presented in order? As it is, my solution is to keep shifting their position in that line of code until it produces the graph correctly.

Many thanks.

Gregor Thomas
zirconium
  • Regarding your final comment about ordering data, please see the FAQ on [ordering bars in ggplot](https://stackoverflow.com/a/5210833/903061). It doesn't matter if you are using bars or dots or whatever, ggplot will use the order of the `levels` of your `factor`. You can set the order of those levels however you want. `reorder()` is a useful function for setting them in an order based on a function of a numeric column. – Gregor Thomas Oct 01 '18 at 16:06
  • It's very difficult to modify code and answer questions without any test data. Could you provide a little bit of example data? We don't need much, just 3-5 points in each of 2-3 facets would be plenty. `dput()` is a great function for sharing a copy/pasteable representation of data. – Gregor Thomas Oct 01 '18 at 16:09
  • Took me a while to figure out how to share data anonymously. Here you go: https://docs.google.com/spreadsheets/d/1QZhcsJe6HOgtsz4cWVwRBy0Iq7vNyJ1McR0sy5gs8uo/edit?usp=sharing - in two forms - long and wide. I started off with wide, and gathered it into long, which is what ggplot is feeding off of – zirconium Oct 01 '18 at 17:49
  • Thanks for putting that together, but I was completely serious when I said "*just 3-5 points in each of 2-3 facets would be plenty*". Next time save yourself a bunch of time and just create 6-10 rows of random data with the same structure. – Gregor Thomas Oct 01 '18 at 18:49
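
To make the point about `levels` in the comments above concrete, here is a minimal base R sketch. The parameter names are made up for illustration (they are not from the question's data). It shows one likely reason the labels seem to land in odd places: `labels =` on its own renames the default, sorted levels, whereas `levels =` sets the order explicitly.

```r
# Hypothetical parameter names, deliberately not in alphabetical order
param <- c("Zeta", "Alpha", "Mid", "Alpha", "Zeta")

# `labels =` alone keeps the default (sorted) levels and only renames them,
# so "Alpha" becomes "Var1" even though "Zeta" appears first in the data
relabelled <- factor(param, labels = c("Var1", "Var2", "Var3"))

# `levels =` fixes the order explicitly; ggplot facets and axes follow it
ordered_param <- factor(param, levels = c("Zeta", "Alpha", "Mid"))

# Both together: choose the order, then rename in that same order
ordered_labelled <- factor(param,
                           levels = c("Zeta", "Alpha", "Mid"),
                           labels = c("Var1", "Var2", "Var3"))

levels(ordered_param)
```

With the order pinned down by `levels =`, there should be no need to shuffle the label vector by trial and error.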

1 Answer

Okay. I'm going to ignore the y axis labels because the defaults seem to work just fine as long as you don't try to overwrite them with your custom labels_y thing. Just let the defaults do their work. For the X axis, we'll give a couple options:

(A) Label every N products on the x-axis. Looking at ?scale_x_discrete, we can set the labels to a function that takes all the levels of the factor and returns the labels we want. So we'll write a function that returns a function that returns every Nth label:

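# Function factory: returns a labeller that keeps every nth label
every_n_labeler = function(n = 3) {
  function (x) {
    # TRUE for positions 1, n + 1, 2n + 1, ...
    ind = (seq_along(x) - 1) %% n == 0
    x[!ind] = ""   # blank out all the other labels
    return(x)
  }
}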

Now let's use that as the labeler:

ggplot(df, aes(x = Lot,
               y = Value,
               colour = Machine)) +
  facet_grid(Parameter ~ .,
             scales = "free_y") +
  geom_point() +
  scale_x_discrete(labels = every_n_labeler(3)) +
  labs(title = "Product: Select  Trends | 2018",
       x = "Time (s)",
       y = "Value") +
  theme(axis.text.x = element_text (
    angle = 90,
    hjust = 1,
    vjust = 0.5
  )) 


You can change the every_n_labeler(3) to (10) to make it every 10th label.
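
As a possible variation on (A), the same idea can be applied to `breaks` instead of `labels`, so that the unlabelled ticks disappear rather than being drawn with blank text. This is only a sketch: `every_n_breaks` is a hypothetical helper name, and it relies on `scale_x_discrete()` accepting a function of the scale limits for `breaks`.

```r
# Returns a function that keeps every nth level and drops the rest
every_n_breaks <- function(n = 3) {
  function(x) {
    x[(seq_along(x) - 1) %% n == 0]
  }
}

# Hypothetical serial-number levels, just to show what the helper returns
lots <- paste0("X", 1:10)
every_n_breaks(3)(lots)

# Usage in a plot (needs ggplot2; `df` is the sample data from this answer):
# ggplot(df, aes(x = Lot, y = Value, colour = Machine)) +
#   geom_point() +
#   scale_x_discrete(breaks = every_n_breaks(10))
```

Either way, nothing depends on knowing the number of rows in advance, so the same call should work across datasets of different lengths.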

(B) Maybe more appropriate: it seems like your x-axis is actually numeric and just happens to have "X" in front of it. Let's convert it to numeric and let the defaults do the labeling work:

df$time = as.numeric(gsub(pattern = "X", replacement = "", x = df$Lot))

ggplot(df, aes(x = time,
               y = Value,
               colour = Machine)) +
  facet_grid(Parameter ~ .,
             scales = "free_y") +
  geom_point() +
  labs(title = "Product: Select  Trends | 2018",
       x = "Time (s)",
       y = "Value") +
  theme(axis.text.x = element_text (
    angle = 90,
    hjust = 1,
    vjust = 0.5
  )) 


With your full x range, I imagine that would look nice.

(C) But who wants to read those 9-digit numbers? You're labeling the x-axis as "Time (s)", which makes me think it's actually a time, measured in seconds from some start time. I'll make up that your start time is 2010-01-01 and convert these seconds to actual times, and then we get a nice date-time scale:

ggplot(df, aes(x = as.POSIXct(time, origin = "2010-01-01"),
               y = Value,
               colour = Machine)) +
  facet_grid(Parameter ~ .,
             scales = "free_y") +
  geom_point() +
  labs(title = "Product: Select  Trends | 2018",
       x = "Time (s)",
       y = "Value") +
  theme(axis.text.x = element_text (
    angle = 90,
    hjust = 1,
    vjust = 0.5
  )) 


If this is the real meaning behind your data, then using a date-time axis is a big step up for readability. (Again, notice that we are not specifying the breaks, the defaults work quite well.)
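
For reference, here is the seconds-to-date-time conversion from (C) on its own, outside the plot. The origin is still the made-up 2010-01-01 from above, and `tz = "UTC"` is an extra assumption added here so the result does not depend on the local time zone.

```r
# A few of the numeric values from the sample data, treated as seconds
secs <- c(180106482, 180126485, 180306523, 180526326)

# Seconds since the (made-up) origin, shown in UTC
times <- as.POSIXct(secs, origin = "2010-01-01", tz = "UTC")

class(times)
format(times, "%Y-%m-%d %H:%M:%S")
```

Doing the conversion once and storing it as a column keeps the `aes()` call shorter if the plot is wrapped in a function.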


Using this data (I subset your sample data down to 2 facets and used dput to make it copy/pasteable):

df = structure(list(Lot = structure(c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 
4L, 1L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L), .Label = c("X180106482", "X180126485", "X180306523", 
"X180526326"), class = "factor"), Value = c(201, 156, 253, 211, 
178, 202.5, 203.4, 204.3, 205.2, 2.02, 2.17, 1.23, 1.28, 1.54, 
1.28, 1.45, 1.61, 2.35, 1.34, 1.36, 1.67, 2.01, 2.06, 2.07, 2.19, 
1.44, 2.19), Parameter = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L), .Label = c("Var 1", "Var 2", "Var 3", "Var 4"
), class = "factor"), Machine = structure(c(2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L), .Label = c("Machine 1", "Machine 2"), class = "factor"), 
    time = c(180106482, 180126485, 180306523, 180526326, 180106482, 
    180126485, 180306523, 180526326, 180106482, 180106482, 180126485, 
    180306523, 180526326, 180106482, 180126485, 180306523, 180526326, 
    180106482, 180106482, 180126485, 180306523, 180526326, 180106482, 
    180126485, 180306523, 180526326, 180106482)), row.names = c(NA, 
-27L), class = "data.frame")
Gregor Thomas
  • Hey! Thanks, your fix for the x-axis definitely works! Alas, the x axis is not time, it's a product serial number. That's my bad for recycling old code. My y-axis does not behave, even when left as default. It's ok in this example as there's not a lot of data - in my real world example, there are >400 data points for each of them, as seen in the pasted image. Using my label thing or not doesn't make a difference - it comes out in that format. Unfortunately, I don't think I'd be able to employ the same tactic you used for the x-axis for it. Do you have any idea how to do it? – zirconium Oct 01 '18 at 19:28
  • Make sure the `class` of your y column is numeric - looks to me like it's a `factor` (or `character`) and is being treated as discrete when it should really be continuous. – Gregor Thomas Oct 01 '18 at 19:36
  • Hey thanks! Solved it :) The solution that worked involved (1) using the every_n_labeler(x) function for my x-axis and (2) converting my df$value (y) using as.numeric. My data look much more ordered and regular now. Many thanks, I'll mark it as solved. – zirconium Oct 02 '18 at 07:11
  • Note to self: for (A) you can do better specifying breaks instead of labels. Update as in [this answer](https://stackoverflow.com/a/52920047/903061). – Gregor Thomas Oct 21 '18 at 21:29
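
A small sketch of the y-axis fix discussed in these comments, with one caveat. The values below are made up for illustration. If the column is a `factor`, calling `as.numeric()` directly returns the internal level codes rather than the numbers, so it is usually safer to go through `as.character()` first. For a `character` column, plain `as.numeric()` is fine.

```r
# Hypothetical y values that were read in as a factor
value_fct <- factor(c("201", "1.5", "156", "2.02"))

class(value_fct)   # "factor", so ggplot treats it as discrete

# Direct conversion gives integer level codes, not the original values
wrong <- as.numeric(value_fct)

# Converting via character recovers the actual numbers
right <- as.numeric(as.character(value_fct))

is.numeric(right)
```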