class: center, middle, inverse, title-slide

# Taking visualizations to the next level with the scales package

### Dana Seidel (@dpseidel)

https://scales.r-lib.org/

---
class: middle, center, inverse

![](engine.jpg)

---
class: middle, center

## What do we mean by "scales" really??

<br/>

What is scaling? 📈

(1) converting data values to perceptual properties

➕

(2) the inverse process: making guides (legends and axes) to read the graph

<br/>
<br/>

Scaling and guides are often some of the most difficult parts of building any visualization.

---
class: center, middle

![](scaleshex.png)

The scales package provides the internal scaling infrastructure for [ggplot2](https://github.com/tidyverse/ggplot2) and exports standalone, **system-agnostic** functions.

The functions in the scales package aim to make five aspects of data scales easier to work with:

1. transformations
2. bounds and rescaling
3. breaks
4. labels
5. palettes

Learning a little about these functions can help transform your plots and make you a true visualization magician 🧙‍♂️

---
class: center, middle

## ⚠️ Notable changes in [scales 1.1.0](https://www.tidyverse.org/blog/2019/11/scales-1-1-0/)!

+ New naming convention for guide functions: `breaks_*`, `label_*`

E.g., `number` and `number_format` have been **"superseded"** by `label_number`
<br/>
This enables really easy tab completion! 👍

+ New `demo_*` functions for easy ggplot2 examples

---
class: center, middle, inverse

# Transformations & Bounds

---
class: center, middle

# Transformations

scales provides a number of common transformation functions (`*_trans`), each of which bundles the functions needed to perform a data transformation, format labels, and set correct breaks.

<br/>

For example:
<br/>
`log_trans`, `sqrt_trans`, `reverse_trans` power the `scale_*_log10`, `scale_*_sqrt`, `scale_*_reverse` functions in ggplot2.
<br/>
`scale_*_datetime`, `scale_*_date` are powered by the `time_trans` and `date_trans` functions.

---
class: left, middle

## Additional Transformations

- `asn_trans()` : Arc-sin square root transformation.
- `atanh_trans()` : Inverse hyperbolic tangent transformation.
- `boxcox_trans()` `modulus_trans()` : Box-Cox & modulus transformations.
- `exp_trans()` : Exponential transformation (inverse of log transformation).
- `pseudo_log_trans()` : Pseudo-log transformation.
- `probability_trans()` : Probability transformation.

and more...

---
class: middle, left

## Building your own transformations

Users can also define and apply their own custom transformation functions for repeated use.

```r
# use trans_new to build a new transformation
dollar_log <- trans_new(
  name = "dollar_log",
  # extract a single element from another trans
  trans = log10_trans()$trans,
  # or write your own custom functions
  inverse = function(x) 10^(x),
  breaks = breaks_log(),
  format = label_dollar()
)
```

---
class: center, middle

```r
ggplot(diamonds, aes(y = price, x = carat)) +
  geom_hex() +
  scale_y_continuous(trans = dollar_log) +
  scale_x_log10()
```

![](rstudioconf2020_files/figure-html/unnamed-chunk-1-1.png)

---
class: center, middle

## Rescaling data

scales provides a handful of functions for rescaling data to fit new ranges.
<br/>

The rescale functions can help rescale continuous vectors to new min, mid, or max values:

- `rescale` rescales to a new min and max
- `rescale_mid` rescales to a new mid, max, and min
- `rescale_max` rescales to a new maximum

```r
# rescale to new min and max
rescale(runif(5), to = c(0, 50))
```

```
## [1] 40.68297 23.86821 50.00000 0.00000 42.49868
```

---
class: center, middle

## Squish, Discard, Censor

- `squish` will squish your values into a specified range, respecting NAs
- `discard` will drop data outside a range, respecting NAs
- `censor` will return NAs for values outside a range

```r
squish(c(-1, 0.5, 1, 2, NA), range = c(0, 1))
```

```
## [1] 0.0 0.5 1.0 1.0 NA
```

```r
discard(c(-1, 0.5, 1, 2, NA), range = c(0, 1))
```

```
## [1] 0.5 1.0 NA
```

```r
censor(c(-1, 0.5, 1, 2, NA), range = c(0, 1))
```

```
## [1] NA 0.5 1.0 NA NA
```

---
class: center, middle

#### Squish can be really useful for setting the `oob` argument for a color scale with reduced limits.

```r
ggplot(iris, aes(x = Sepal.Length, y = Sepal.Width, colour = Sepal.Length)) +
  geom_point() +
  scale_color_continuous(limits = c(6, 8), oob = scales::squish)
```

![](rstudioconf2020_files/figure-html/unnamed-chunk-2-1.png)

---
class: center, middle, inverse

# Sensible breaks for continuous scales

---
class: left, middle

## Breaks

- `breaks_extended()` sets most breaks by default in ggplot2 using Wilkinson's algorithm
- `breaks_pretty()` uses R's default breaks algorithm
- `breaks_log()` is used to set breaks for log transformed axes with `log_trans()`.
- `breaks_width()` is used to set breaks by width, especially useful for date and date/time axes.

The output of each `breaks_*` function is a function. Each of these output functions takes a vector of limits and returns a vector of break points.
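
---
class: left, middle

A minimal sketch of the "function that returns a function" idea, assuming only that scales is installed. The variable names (`pretty_breaks_5`, `limits_a`, `limits_b`) are hypothetical and chosen for illustration.

```r
library(scales)

# Calling breaks_pretty() does not compute breaks yet;
# it returns a function (a closure) that remembers n = 5
pretty_breaks_5 <- breaks_pretty(n = 5)
is.function(pretty_breaks_5)

# The same closure can be reused on any vector of limits
limits_a <- c(74, 997)
limits_b <- c(0.2, 3.7)
breaks_a <- pretty_breaks_5(limits_a)
breaks_b <- pretty_breaks_5(limits_b)

# breaks_pretty() wraps base R's pretty(), so the two should agree
all.equal(unname(breaks_a), pretty(limits_a, n = 5))
```

Because the result is a plain function of the limits, it can be handed to the `breaks` argument of a ggplot2 scale, which then calls it with the limits it has calculated for the plot.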

---
class: left, middle

```r
breaks_extended(n = 8)(c(74, 997))
```

```
## [1] 0 100 200 300 400 500 600 700 800 900 1000
```

```r
breaks_log(8)(c(74, 997))
```

```
## [1] 70 100 200 300 400 500 700 1000
```

```r
breaks_width(width = 8)(c(7, 111))
```

```
## [1] 0 8 16 24 32 40 48 56 64 72 80 88 96 104 112
```

---
class: center, middle, inverse

# Effective Labels

---
class: left, middle

# Label Formatters:

- `label_number`: a generic number formatter that forces intuitive decimal display of numbers
- `label_dollar`, `label_percent`, `label_comma`
- `label_scientific`
- `label_date`, `label_time`: formatted dates and times.
- `label_ordinal`: add ordinal suffixes (-st, -nd, -rd, -th) to numbers according to languages (e.g. English, Spanish, French).
- `label_bytes`, `label_number_si`
- `label_parse`, `label_math`, `label_pvalue`
- `label_wrap`

For a full list and descriptions, see: [scales.r-lib.org/reference](https://scales.r-lib.org/reference)

---
class: center, middle

Like break functions, the output of all `label_*` functions is a function. These output functions are designed to take a numeric vector (traditionally, of breaks) and return a nicely formatted character vector.

<br/>

This makes it easy to pass them directly to the `labels` argument of a ggplot2 `scale_*` function, so that the plot's calculated breaks are automatically labelled in a consistent style.

```r
demo_continuous(txhousing$median, labels = label_dollar())
```

```
## scale_x_continuous(labels = label_dollar())
```

![](rstudioconf2020_files/figure-html/unnamed-chunk-4-1.png)

---
class: middle, center

## Changing defaults

Many of the `label_*` functions are built on the generic `label_number` function and let users customize label formats with a consistent set of arguments.

Users can specify a different rounding behavior (`accuracy`), change the `big.mark` or `decimal.mark` for international styling, add a `prefix` or a `suffix`, or `scale` numbers on the fly.
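
As a minimal sketch of international styling with these shared arguments (spelled `big.mark` and `decimal.mark` in the function signature), assuming scales is installed. The name `euro_fmt` and the input value are hypothetical, chosen only for illustration.

```r
library(scales)

# Hypothetical formatter: two decimals, "." as the thousands mark,
# "," as the decimal mark, and a trailing euro sign
euro_fmt <- label_number(
  accuracy = 0.01,
  big.mark = ".",
  decimal.mark = ",",
  suffix = " \u20ac"
)

# Like every label_*() result, euro_fmt is itself a function
euro_label <- euro_fmt(1234.5)
euro_label
```
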

```r
label_dollar(prefix = "", suffix = "\u20ac", scale = .91)(100)
```

```
## [1] "91€"
```

---
class: left, middle

```r
# filter() and %>% come from dplyr; date_decimal() comes from lubridate
filter(txhousing, city == "Houston") %>%
  ggplot() +
  geom_line(aes(y = sales, x = date_decimal(date))) +
  scale_x_datetime("Month-Year",
    breaks = breaks_width("2 year"),
    labels = label_date("%b %Y")
    # delightfully, this functionality is already implemented
    # in scale_*_date args: date_breaks, date_labels
  )
```

![](rstudioconf2020_files/figure-html/unnamed-chunk-6-1.png)

---
class: center, middle, inverse

# Palettes

---
class: center, middle

# Color palettes 🎨

scales provides a number of color palette functions that, given either the number of colors you want or a vector of gradient values, return a vector of colors as hex codes.

```r
viridis_pal()(4)
```

```
## [1] "#440154FF" "#31688EFF" "#35B779FF" "#FDE725FF"
```

```r
brewer_pal(type = "div", direction = -1)(4)
```

```
## [1] "#018571" "#80CDC1" "#DFC27D" "#A6611A"
```

```r
div_gradient_pal()(seq(0, 1, length.out = 4))
```

```
## [1] "#2B6788" "#99A8B4" "#BBA19A" "#90503F"
```

---
class: center, middle

# Hot tip 🔥

Ever wondered what colors are going to look like before you use them?

Try `scales::show_col()`

```r
show_col(viridis_pal()(4))
```

![](rstudioconf2020_files/figure-html/unnamed-chunk-7-1.png)

---
class: center, middle

## Non-color palettes

Often you want to be able to scale elements other than color, e.g. size, alpha, shape...

Of course, scales handles those too!

```r
your_data <- runif(5, 1, 20)
area_pal(range = c(1, 20))(your_data)
```

```
## [1] 61.47381 43.13051 34.18974 72.24327 54.83530
```

```r
shape_pal()(6)
```

```
## [1] 16 17 15 3 7 8
```

---
class: center, middle

## See these in action in `ggplot2`...

```r
# color examples...
scale_fill_brewer()
scale_color_grey()
scale_color_viridis_c()

# shape examples
scale_shape()
scale_shape_ordinal()

# implement them yourself with...
scale_color_manual()
scale_shape_manual()
scale_size_manual()

# use the values as they are from your data frame with
scale_color_identity()
scale_shape_identity()
scale_size_identity()
```

---
class: center, middle

## Or, apply them to base R!

Remember, while scales functions are primarily used under the hood in ggplot2, they are generic enough to be combined with any plotting system.

For example, use them in combination with `grDevices::palette()`, provided with base R, to affect your base plots...

---
class: center, middle

```r
palette(viridis_pal()(4))
plot(Sepal.Length ~ Sepal.Width, col = Species, pch = 20, data = iris)
```

![](rstudioconf2020_files/figure-html/unnamed-chunk-10-1.png)

---
class: center, middle

## Questions?

**Read more about scales at [scales.r-lib.org](https://scales.r-lib.org/)**

<br/>

Slides: [danaseidel.com/rstudioconf2020](https://danaseidel.com/rstudioconf2020)

Code: [github.com/dpseidel/rstudioconf2020](https://github.com/dpseidel/rstudioconf2020)

@dpseidel on [Twitter](https://twitter.com/dpseidel) and [GitHub](https://github.com/dpseidel)

<br/>

👏 Shout out to the wonderful `xaringan`, `xaringanthemer`, & `emo` packages which helped make these slides beautiful!

And a HUGE thank you to Hadley Wickham and the other tidyverse community developers who contributed so much to the scales 1.1.0 release! 🙏