Factor Analysis is a dimensionality reduction technique commonly used in the neuroscience community to characterize the structure and dimensionality of covariability in neural population activity. I led a team of Computer Science and Data Science graduate students in developing a scalable, distributed implementation of Factor Analysis using the cluster-computing framework Apache Spark.
Our Spark implementation uses the distributed linear algebra framework Apache Mahout to accelerate the most computationally expensive steps of the Factor Analysis Expectation-Maximization (EM) algorithm, namely the operations that must be performed on large matrices. We demonstrate the speed, accuracy, and scientific utility of Scalable Factor Analysis by analyzing more than 10,000 samples of two-photon calcium imaging data recorded from more than 300 neurons, published by the Allen Brain Observatory.
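For readers unfamiliar with the EM algorithm for Factor Analysis, a minimal single-machine sketch in NumPy follows. It assumes the standard model x = Lz + mu + eps, with latent factors z ~ N(0, I) and diagonal noise covariance diag(psi), and uses the textbook covariance-based EM updates; it is not our Spark/Mahout code, and the function name `factor_analysis_em` and its parameters are hypothetical. In this formulation the only pass over all samples is forming the p×p sample covariance S, which is the kind of large-matrix step a distributed implementation would likely target; every later update works on p×p or p×k matrices.

```python
import numpy as np


def factor_analysis_em(X, k, n_iter=500, seed=0):
    """Hypothetical sketch: fit a k-factor model x = L z + mu + eps by EM.

    X: (n_samples, n_features) data matrix.
    Returns loadings L (p x k), diagonal noise variances psi (p,), mean mu (p,).
    """
    rng = np.random.default_rng(seed)
    n, p = X.shape
    mu = X.mean(axis=0)
    Xc = X - mu
    # Sample covariance: the only pass over all n samples in this sketch.
    S = Xc.T @ Xc / n
    # Random initial loadings scaled to the average feature variance.
    L = rng.standard_normal((p, k)) * np.sqrt(np.diag(S).mean() / k)
    psi = np.diag(S).copy()
    I_k = np.eye(k)
    for _ in range(n_iter):
        # E-step: beta = L^T C^{-1} maps an observation to E[z | x].
        C = L @ L.T + np.diag(psi)
        beta = np.linalg.solve(C, L).T  # valid because C is symmetric
        # Posterior second moment E[z z^T | x], averaged over samples.
        Ezz = I_k - beta @ L + beta @ S @ beta.T
        # M-step: L = S beta^T Ezz^{-1}, computed via a symmetric solve.
        L = np.linalg.solve(Ezz, beta @ S).T
        # Diagonal noise update, floored to keep variances positive.
        psi = np.maximum(np.diag(S - L @ beta @ S), 1e-6)
    return L, psi, mu
```

Because the loadings are only identified up to a rotation, a fitted model is best checked by comparing the implied covariance L Lᵀ + diag(psi) with the sample covariance rather than by comparing L directly.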