library(tidyverse)
library(bayesSurv)
library(GIGrvg)
library(statmod)
library(MCMCpack)
library(reshape2)
library(ggplot2)
library(latex2exp)
library(psych)
library(parallel)
source("https://raw.githubusercontent.com/poworoznek/sparse_bayesian_infinite_factor_models/master/demo_source.R")

Source Code

See https://github.com/poworoznek/sparse_bayesian_infinite_factor_models for code to run all chunks in this demo.

Generate the data

In our simulated study, we observe three families of chemicals that show high correlation within each family. We assume that the chemicals admit a latent representation with 5 factors. Let \(X_i = (x_{i1},\cdots,x_{ip})\) denote a vector of exposure measurements; we then have \[X_i = \Lambda \eta_i+ \epsilon_i, \quad \epsilon_i \sim N_p(0,\Psi),\] where \(\Lambda\) is the factor loadings matrix and \(\Psi\) is a diagonal matrix. In the next plot we show the simulated \(p \times k\) matrix of factor loadings \(\Lambda\). In particular, there are \(3\) “active” latent factors, each influencing one family of chemicals, and \(2\) noise factors.

set.seed(1)
k0 = 5   # number of latent factors
p = 20   # number of observed chemicals

# Start from near-zero loadings, then scatter a few strong entries
lambda = matrix(rnorm(p*k0, 0, 0.01), ncol = k0)
lambda[sample.int(p, 40, replace = TRUE) +
       p*(sample.int(k0, 40, replace = TRUE)-1)] = rnorm(40, 0, 1)
# Three "active" factors, each loading on one family of chemicals
lambda[1:7, 1] = rnorm(7, 2, 0.5)
lambda[8:14, 2] = rnorm(7, -2, 0.5)
lambda[15:20, 3] = rnorm(6, 2, 0.5)
# Two noise factors loading weakly on all chemicals
lambda[,4] = rnorm(p, 0, 0.5)
lambda[,5] = rnorm(p, 0, 0.5)
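
The plotting code for the figure referenced above is not shown in this chunk; a minimal sketch using the already-loaded reshape2, ggplot2, and latex2exp packages could look like this:

# Heat map of the simulated loadings matrix Lambda
melt(lambda, varnames = c("chemical", "factor")) %>%
  ggplot(aes(x = factor, y = chemical, fill = value)) +
  geom_tile() +
  scale_fill_gradient2() +
  labs(x = "factor", y = "chemical", fill = TeX("$\\lambda_{jh}$"))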

Now we generate the outcome according to a linear regression with pairwise interactions. We assume that the first family of chemicals has a positive effect on the health response and the second family a negative one, and we add interactions between chemicals both within and across families.

n = 500

# Generate exposures from the factor model X_i = Lambda eta_i + eps_i, with Psi = 3*I_p
X = matrix(rnorm(n*k0), n, k0) %*% t(lambda) + bayesSurv::rMVNorm(n, Sigma = 3*diag(p))

# True main effects and (symmetrized) pairwise-interaction matrix
beta_true = numeric(p)
beta_true[c(1,3,6,8,10,11)] = c(1, 1, 0.5, -1, -2, -0.5)
Omega_true = matrix(0, p, p)
Omega_true[1,2] = 1; Omega_true[5,2] = -1; Omega_true[10,8] = 1
Omega_true[11,5] = -2; Omega_true[1,1] = 0.5
Omega_true[2,3] = 0.5
Omega_true = Omega_true + t(Omega_true)

# Outcome: linear and quadratic terms plus Gaussian noise
y = X %*% beta_true + diag(X %*% Omega_true %*% t(X)) + rnorm(n, 0.5)

Bayesian Factor Model for INteractions (FIN)

Let \(y_i\) denote a continuous health response for individual \(i\). We propose a latent factor joint model, which includes shared factors in both the predictor and response components while assuming conditional independence. We include interactions among the latent variables in the response component. We also assume that, given the latent variables, the explanatory variables and the response are continuous and normally distributed. Finally, we assume that the data have been standardized prior to the analysis, so we omit the intercept: \[y_i = \eta_i^T \omega + \eta_i^T \Omega \eta_i +\epsilon_{y,i}, \quad \epsilon_{y,i} \sim N(0,\sigma^2), \\ X_i = \Lambda \eta_i+ \epsilon_i, \quad \epsilon_i \sim N_p(0,\Psi), \quad \eta_i \sim N_k(0,I),\] where \(\Psi = \mathrm{diag}(\sigma_1^2,\cdots, \sigma_p^2)\). We can show that the induced regression of \(y_i\) on \(X_i\) is indeed a quadratic regression: \[E(y_i | X_i) = tr (\Omega V)+(\omega^T A) X_i + X_i^T (A^T \Omega A) X_i,\] where \(V = (\Lambda^T \Psi^{-1} \Lambda + I)^{-1}\) and \(A = V \Lambda^T \Psi^{-1} = (\Lambda^T \Psi^{-1} \Lambda + I)^{-1} \Lambda^T \Psi^{-1}\).

Several priors have been proposed for the \(p \times k\) factor loadings matrix \(\Lambda\), including priors that allow the number of factors to be equal to \(+\infty\) (A. Bhattacharya and Dunson 2011; Legramanti, Durante, and Dunson 2019). We choose the Dirichlet-Laplace (DL) prior of A. K. Bhattacharya et al. (2015) row-wise for \(\Lambda\), corresponding to \[\lambda_{j,h} | \phi_j, \tau_j \sim DE(\phi_{j,h} \tau_j), \quad h = 1, \cdots, k, \\ \phi_j \sim Dir(a,\cdots,a), \quad \tau_j \sim Gamma(k a , 1/2).\] Through the choice of the DL prior, we induce near sparsity row-wise in the matrix \(\Lambda\): it is reasonable to assume that each variable loads on only a few factors. This does not imply that \(\Omega\) will be sparse too.
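
As a quick numerical check of the identities for \(V\) and \(A\) above, here is a sketch using the simulated truth (`omega_f` and `Omega_f` are illustrative factor-level coefficients, not quantities from the original demo):

# Induced regression coefficients from the factor-model parameters
Psi_inv = solve(3*diag(p))                      # Psi = 3*I_p was used to simulate X
V = solve(t(lambda) %*% Psi_inv %*% lambda + diag(k0))
A = V %*% t(lambda) %*% Psi_inv                 # k0 x p

omega_f = rnorm(k0)                             # illustrative factor-level omega
Omega_f = crossprod(matrix(rnorm(k0^2), k0))    # illustrative symmetric Omega

induced_main = drop(t(omega_f) %*% A)           # omega^T A: induced main effects
induced_int  = t(A) %*% Omega_f %*% A           # A^T Omega A: induced interactions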

Induced Main Effects and Interactions

FIN can be generalized to allow for higher order interactions, with increasing shrinkage as the order of interaction increases. In the next plots, we show the induced marginal priors for main effects, pairwise interactions and \(3^{rd}\) order interactions when \(p = 20\) and \(k = 5,10\). The green lines correspond to the \(0.25\) and \(0.75\) quantiles and the red lines to the \(0.05\) and \(0.95\) quantiles. For a fixed \(k\), there is increasing shrinkage towards zero with higher orders of interaction. However, we avoid assuming exact sparsity, i.e. coefficients exactly equal to zero, which is a standard assumption of other methods. Although most of the mass is concentrated around zero, the distributions have heavy tails. Indeed, the form of the priors resembles a mixture of two normal distributions with different variances, with a higher mixture weight placed on the normal distribution concentrated around zero as the order of interaction increases. Also, notice that the priors become flatter as we increase the number of latent factors \(k\).

# Monte Carlo draws from the induced priors on main effects and
# interactions under the DL prior (produces the plots described above)
generate_DL_reg(S = 5000, k = 5, p = 20,
                a_DL = 0.5, var_omega = 5,
                as = 1, bs = 0.3,
                trace = FALSE, every = 500)

Regression Example

# Gibbs sampler settings: 5000 iterations, 4000 burn-in, no thinning, k = 6 factors
nrun = 5000; burn = 4000; thin = 1; k = 6
gibbs = gibbs_DL(y, X, nrun, burn, thin = thin,
                 delta_rw = 0.04, epsilon_rw = 0.5,
                 a = 1, k = k)

Postprocessing

Our resulting samples of the factor loadings matrix are not directly summarizable. The original model statement allows post-multiplication of the factor loadings matrix \(\Lambda\) by an arbitrary orthogonal matrix without affecting the identifiable elements \(V\) and \(A\). Because of this, the posterior is a massively multimodal hypersurface, and the sampler can travel between modes as it pleases; without constraints we see the same sorts of issues as in standard Gaussian mixture modeling. Enforcing constraints such as a triangular \(\Lambda\), however, can massively limit the class of matrices \(V\) and \(A\) estimable by the method.
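
This invariance is easy to verify numerically: post-multiplying the simulated \(\Lambda\) by a random orthogonal matrix leaves \(\Lambda \Lambda^T\), and hence the induced regression, unchanged.

# Draw a random orthogonal Q via the QR decomposition of a Gaussian matrix
Q = qr.Q(qr(matrix(rnorm(k0^2), k0)))
# Lambda and Lambda Q imply the same marginal covariance structure
all.equal(lambda %*% t(lambda), (lambda %*% Q) %*% t(lambda %*% Q))  # TRUE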

To avoid enforcing constraints on the model, we instead rely on post-processing to return the samples of \(\Lambda\) to a summarizable form. The ways in which samples of \(\Lambda\) can change throughout the Markov chain fall into three categories: Rotational Ambiguity, Label Switching, and Sign Ambiguity.

For Rotational Ambiguity we rely on existing methods such as Varimax and the Orthogonal Procrustes algorithm to orthogonalize samples. Through these methods we limit the amount by which samples can vary in their rotations, as they are rotated to be as orthogonal as possible. Sampling noise will create and remove small amounts of linear dependence even without rotational ambiguity, so while rotating to enforce maximal orthogonality among the samples helps reduce ergodic bias, it is not guaranteed to rotationally align all samples.
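
As a minimal illustration of the rotation step, here is base R's stats::varimax applied to a single loadings matrix (the simulated truth; mcrotfact below applies a rotation of this kind to every posterior sample):

# Varimax rotation pushes the loadings toward a sparse, interpretable orientation
vm = varimax(lambda, normalize = FALSE)
lambda_vm = lambda %*% vm$rotmat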

For columnwise Label Switching we introduce a method that clusters the norms of the differences between a pivot matrix (a representative sample of \(\Lambda\)) and the samples of \(\Lambda\). This clustering massively reduces the computational load in an ordinarily very large solution space (of size \(k!\,m\) for \(m\) samples). These normed differences represent a loss that is minimized over the space of column permutations.

For Sign Ambiguity we augment the Label Switching method to minimize the previously defined loss over the set of all column permutations crossed with the set of all column sign switches. The resulting solution set has cardinality \(k!\,2^k m\), and by utilizing the clustering we largely reduce the \(m\) term.
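
A brute-force version of this minimization, as a sketch only (the helper names are hypothetical, the enumeration assumes the gtools package, and exhaustive search is feasible only for small \(k\); the clustering is what makes the real method scale):

# Score one candidate alignment: Frobenius norm between the pivot and the
# sample with columns permuted by `perm` and flipped by `signs`
align_loss = function(pivot, sample, perm, signs) {
  norm(pivot - sample[, perm] %*% diag(signs), type = "F")
}

# Exhaustive minimization over all k! x 2^k alignments (small k only)
best_alignment = function(pivot, sample) {
  k = ncol(pivot)
  perms = gtools::permutations(k, k)                       # all k! column orders
  signs = as.matrix(expand.grid(rep(list(c(-1, 1)), k)))   # all 2^k sign patterns
  best = list(loss = Inf)
  for (i in seq_len(nrow(perms))) for (j in seq_len(nrow(signs))) {
    l = align_loss(pivot, sample, perms[i, ], signs[j, ])
    if (l < best$loss)
      best = list(loss = l, perm = perms[i, ], signs = signs[j, ])
  }
  best
}

# e.g. best_alignment(lambda, lambda[, c(2,1,3,4,5)] %*% diag(c(-1,1,1,1,1)))
# recovers the column order and sign flip (loss near 0)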

# Extract the (nrun - burn) = 1000 post-burn-in samples of Lambda as a list
lambda_sample = gibbs$Lambda
lambda_sample = lapply(1:1000, function(ind) lambda_sample[ind,,])
# Naive ergodic mean, for comparison: biased by rotation/label/sign ambiguity
sample_mean = reduce(lambda_sample, `+`)/length(lambda_sample)
# Step 1: resolve rotational ambiguity via varimax
rotated = mcrotfact(lambda_sample, method = "varimax", file = FALSE)
# Step 2: resolve label switching and sign ambiguity via clustered alignment
aligned = clustalignplus(rotated$samples, itermax = 500)
## [1] "cluster successfully permuted"
## [1] 1 2 3 6 5 4
## [1] "cluster successfully permuted"
## [1]  6  2  3 -1  5  4
## [1] "cluster successfully permuted"
## [1]  6  2  3 -1  5  4
## [1] "cluster successfully permuted"
## [1]  6  2  3 -1  5  4
## [1] "cluster successfully permuted"
## [1]  6  2  3 -1  5  4
## [1] "cluster successfully permuted"
## [1]  6  2  3 -1  5  4
## [1] "cluster successfully permuted"
## [1]  6  2  3 -1  5  4
## [1] "cluster successfully permuted"
## [1]  6  2  3 -1  5  4
## [1] "cluster successfully permuted"
## [1]  6  2  3 -1  5  4
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1]  6  2  3  4  5 -1
## [1] "cluster successfully permuted"
## [1] 1 2 3 4 5 6
## [1] "cluster successfully permuted"
## [1]  1  2  3  4  5 -6
## [1] "cluster successfully permuted"
## [1]  1  2  3  4  5 -6
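
With rotation, column order, and signs now agreeing across draws, an ergodic average of the aligned samples becomes meaningful (assuming, as the call above suggests, that clustalignplus returns the aligned list of loadings matrices):

# Posterior mean of the aligned loadings; compare with the naive sample_mean above
aligned_mean = reduce(aligned, `+`)/length(aligned)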

Bhattacharya, A., and D. B. Dunson. 2011. “Sparse Bayesian Infinite Factor Models.” Biometrika 98 (2): 291–306. doi:10.1093/biomet/asr013.

Bhattacharya, Anirban Krishna, Debdeep Pati, Natesh S. Pillai, and David B. Dunson. 2015. “Dirichlet-Laplace Priors for Optimal Shrinkage.” Journal of the American Statistical Association 110 (512): 1479–90.

Legramanti, Sirio, Daniele Durante, and David B. Dunson. 2019. “Bayesian Cumulative Shrinkage for Infinite Factorizations.” arXiv preprint arXiv:1902.04349.