The Wayback Machine - https://web.archive.org/web/20201024070636/https://github.com/stan-dev/bayesplot/issues/223
Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature request/suggestion - point estimate with interval labels to aid in plotting #223

Open
Jimbilben opened this issue Jun 18, 2020 · 2 comments
Labels

Comments

@Jimbilben
Copy link

@Jimbilben Jimbilben commented Jun 18, 2020

I've found it very useful to have easy access to a 'label' providing a chosen point estimate and density interval, which can be added for example to graphs as a means of providing extra, concise information about a parameter or effect that is depicted. E.g., adding label with the median and 95% HDI from the posterior, or whatever other point estimate/interval you choose, in the form: 5.67 [4.33 - 6.01].

The following R code takes a posterior distribution in the form of a vector and returns a summary. You can ignore most of the probably very inelegant coding, but the function inside of this function called 'point.uncertainty' could be a useful place to begin if you think this feature may be useful.

This requires dplyr and hdinterval packages to run.

summarise.me <- function(posterior, name = "summary", interval = .95, p.e = "median", u.e = "hdi") {
  
  hdi.bounds <- hdi(posterior, credMass = interval)
  hdi.lower <- hdi.bounds["lower"]
  hdi.upper <- hdi.bounds["upper"]
  
  eti.upper <- quantile(posterior, probs = interval + ((1 - interval) / 2) )
  eti.lower <- quantile(posterior, probs = (1 - interval) / 2)
    
  mean <- mean(posterior)
  median <- median(posterior)
  mode <- (density(as.numeric(posterior))
           $x[which.max(density(as.numeric(posterior))$y)])
  
  hdi.width <- hdi.upper - hdi.lower
  eti.width <- eti.upper - eti.lower
  
  posterior.summary <- 
    tibble(name = name,
           mean = mean,
           median = median,
           mode = mode,
           hdi.lower = hdi.lower,
           hdi.upper = hdi.upper,
           hdi.width = hdi.width,
           eti.lower = eti.lower,
           eti.upper = eti.upper,
           eti.width = eti.width,
           interval = interval)
  
  point.uncertainty <- function(summary, point.estimate = "mean", uncertainty = "hdi") {
    if (point.estimate == "mean" & uncertainty == "hdi")
    {
      paste(sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$mean)), 
            " [", 
            sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$hdi.lower)), 
            "-", 
            sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$hdi.upper)), 
            "]" , 
            sep = "")
    } else if (point.estimate == "mode" & uncertainty == "hdi")
    {
      paste(sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$mode)), 
            " [", 
            sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$hdi.lower)), 
            "-", 
            sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$hdi.upper)), 
            "]" , 
            sep = "")
    } else if (point.estimate == "median" & uncertainty == "hdi")
    {
      paste(sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$median)), 
            " [", 
            sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$hdi.lower)), 
            "-", 
            sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$hdi.upper)), 
            "]" , 
            sep = "")
    } else if (point.estimate == "mean" & uncertainty == "eti")
    {
      paste(sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$mean)), 
            " [", 
            sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$eti.lower)), 
            "-", 
            sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$eti.upper)), 
            "]" , 
            sep = "")
    } else if (point.estimate == "mode" & uncertainty == "eti")
    {
      paste(sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$mode)), 
            " [", 
            sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$eti.lower)), 
            "-", 
            sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$eti.upper)), 
            "]" , 
            sep = "")
    }else if (point.estimate == "median" & uncertainty == "eti")
  {
    paste(sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$median)), 
          " [", 
          sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$eti.lower)), 
          "-", 
          sub("^(-?)0.", "\\1.", sprintf("%.2f", summary$eti.upper)), 
          "]" , 
          sep = "")
  }

  }
  posterior.summary <- posterior.summary %>%
    mutate(label = point.uncertainty(posterior.summary, point.estimate = p.e, uncertainty = u.e))
  
  return(posterior.summary)
}
@jgabry jgabry added the feature label Jun 22, 2020
@jgabry
Copy link
Member

@jgabry jgabry commented Jun 22, 2020

Thanks for the feature request!

We have a function in bayesplot called mcmc_intervals_data() which does something similar to the summary that your function creates

library(bayesplot)
x <- example_mcmc_draws()
mcmc_intervals_data(x) # has some optional arguments to control interval width
# A tibble: 4 x 9
  parameter outer_width inner_width point_est      ll       l       m      h     hh
* <fct>           <dbl>       <dbl> <chr>       <dbl>   <dbl>   <dbl>  <dbl>  <dbl>
1 alpha             0.9         0.5 median    -42.7   -28.7   -18.5   -7.46   9.18 
2 sigma             0.9         0.5 median     17.1    17.6    18.0   18.4   19.0  
3 beta[1]           0.9         0.5 median     -0.165   0.136   0.358  0.601  0.916
4 beta[2]           0.9         0.5 median     -0.722  -0.578  -0.469 -0.362 -0.209

but we don't currently provide any way of creating plot labels from this summary table. We do provide other ggplot2 helper functions (https://mc-stan.org/bayesplot/reference/index.html#section-ggplot-helpers) so there's certainly a precedent for bayesplot incorporating something like this. Alternatively, it could be added as a column in the mcmc_intervals_data() data frame instead of as a separate function.

@Jimbilben
Copy link
Author

@Jimbilben Jimbilben commented Jun 24, 2020

One thing that would be nice about having it in the same output as other summary information is that when you wish to make plots, you already have it in the same tibble that you are summarising/visualising. I would at least find that more useful than a separate function, but up to you how/if you would think it best implemented!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
2 participants
You can’t perform that action at this time.