Manifold Factor Analysis: Nonlinear Dimension Reduction with Statistical Guarantees

Authors

  • Dr Abhishek Kumar, Department of Computer Science, S A Jain College, Ambala City, India

DOI:

https://doi.org/10.32628/CSEIT26121315

Keywords:

Nonlinear Factor Analysis, Structural Equation Modeling, Manifold Learning, Asymptotic Statistics, Simulation Study

Abstract

Classical factor analysis assumes observed variables are linear combinations of latent factors plus isotropic noise, implicitly restricting the latent structure to an affine subspace. This paper relaxes the linearity assumption by modeling the latent variable mapping as a smooth embedding from R^m to R^p, such that observations concentrate near an unknown m-dimensional submanifold of the observation space. We establish three main results. First, we prove local identifiability of the model parameters under a full-rank Jacobian condition and unit-variance latent normalization, resolving rotational indeterminacy in the nonlinear setting. Second, we derive a maximum marginal likelihood estimator and prove its √n-consistency and asymptotic normality under regularity conditions that accommodate heteroskedastic measurement errors. Third, we develop a two-stage procedure that combines kernel spectral analysis for intrinsic dimension estimation with likelihood-based manifold reconstruction, and prove that the estimated dimension is consistent under a spectral gap condition. Simulation studies demonstrate that the proposed estimator achieves 40–60% lower RMSE than linear factor analysis under quadratic and multiplicative nonlinearities, with coverage probabilities approaching the nominal 95% level at n = 1000. Empirical analyses of marketing, banking, and Parkinson's telemonitoring datasets show that the recovered latent manifolds have 1–2 fewer dimensions than linear PCA suggests, improve cross-validated prediction error by 15–30%, and yield clinically interpretable latent factors in the biomedical application.
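The first stage of the two-stage procedure described above can be illustrated with a minimal sketch: eigendecompose a centered kernel matrix and select the intrinsic dimension at the largest gap in the leading eigenvalues. The synthetic data-generating process, the Gaussian kernel, and the median-distance bandwidth below are illustrative assumptions, not the paper's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate n points near a 2-D nonlinear manifold embedded in R^5,
# with quadratic and multiplicative distortions of the latent factors
# (an illustrative setup, not the paper's simulation design).
n, m, p = 500, 2, 5
Z = rng.standard_normal((n, m))
X = np.column_stack([
    Z[:, 0], Z[:, 1],
    Z[:, 0] * Z[:, 1],           # multiplicative nonlinearity
    Z[:, 0] ** 2, Z[:, 1] ** 2,  # quadratic nonlinearity
]) + 0.05 * rng.standard_normal((n, p))

# Stage 1: kernel spectral analysis. Build a Gaussian kernel matrix
# with a median-distance bandwidth, double-center it, and inspect
# its leading eigenvalues.
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
K = np.exp(-sq / np.median(sq))
H = np.eye(n) - np.ones((n, n)) / n                   # centering matrix
evals = np.sort(np.linalg.eigvalsh(H @ K @ H))[::-1]  # descending order

# Spectral-gap rule: estimate the dimension at the largest drop
# among the leading eigenvalues.
lead = evals[:p]
gaps = lead[:-1] - lead[1:]
m_hat = int(np.argmax(gaps)) + 1
print("estimated intrinsic dimension:", m_hat)
```

The spectral-gap rule here is the generic version; the paper's consistency result attaches the rule to an explicit gap condition on the kernel spectrum.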

Downloads

Download data is not yet available.

References

Anderson, T. W., & Rubin, H. (1956). Statistical inference in factor analysis. Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 5, 111-150.

Bartholomew, D. J., Knott, M., & Moustaki, I. (2011). Latent Variable Models and Factor Analysis: A Unified Approach (3rd ed.). Wiley. DOI: https://doi.org/10.1002/9781119970583

Bi, Z., & Lafaye de Micheaux, P. (2025). Beyond PCA: Manifold dimension estimation via local graph structure. arXiv preprint arXiv:2510.15141. DOI: https://doi.org/10.2139/ssrn.6168352

Byrd, R. H., Lu, P., Nocedal, J., & Zhu, C. (1995). A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5), 1190-1208. DOI: https://doi.org/10.1137/0916069

Garg, S., & Dereziński, M. (2025). Faster low-rank approximation and kernel ridge regression via the block-Nyström method. Proceedings of the 38th Conference on Learning Theory (COLT), PMLR 291:2291-2325.

Gilbert, A. C., & O'Neill, T. (2025). Curvature-adjusted PCA for manifold dimension estimation. Journal of Machine Learning Research, 26, 1-37.

Gong, L., & Saxena, S. (2025). Learning mixtures of linear dynamical systems via hybrid tensor-EM method. arXiv preprint arXiv:2510.08553.

Grønneberg, S., & Foldnes, N. (2024). Non-parametric regression among factor scores: Motivation and diagnostics for nonlinear structural equation models. Psychometrika, 89(3), 822-850. DOI: https://doi.org/10.1007/s11336-024-09959-4

Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning (2nd ed.). Springer. DOI: https://doi.org/10.1007/978-0-387-84858-7

Hsieh, H.-L., & Shanechi, M. (2025). Probabilistic geometric principal component analysis with application to neural data. International Conference on Learning Representations (ICLR).

Jöreskog, K. G. (1969). A general approach to confirmatory maximum likelihood factor analysis. Psychometrika, 34(2), 183-202. DOI: https://doi.org/10.1007/BF02289343

Kano, Y., & Harada, K. (2025). A note on identification in nonlinear factor analysis with structured loadings. Psychometrika, 90(1), 112-134.

Kelava, A., Kohler, M., Krzyżak, A., & Schaffland, T. F. (2017). A new approach for estimating nonlinear structural equation models. Psychometrika, 82(1), 1-27.

Kim, J., & Lee, S. (2025). Hamiltonian Monte Carlo for constrained latent variable models. Journal of Computational and Graphical Statistics, 34(1), 78-95.

Klein, A. G., & Moosbrugger, H. (2000). Maximum likelihood estimation of latent interaction effects with the LMS method. Psychometrika, 65(4), 457-474. DOI: https://doi.org/10.1007/BF02296338

Lawrence, N. D. (2004). Gaussian process latent variable models for visualisation of high dimensional data. Advances in Neural Information Processing Systems, 16, 329-336.

Lee, J. M. (2013). Introduction to Smooth Manifolds (2nd ed.). Springer. DOI: https://doi.org/10.1007/978-1-4419-9982-5_1

Lepilov, M. (2024). Fast spectrum estimation of some kernel matrices. arXiv preprint arXiv:2411.00657.

McDonald, R. P. (1962). A general approach to nonlinear factor analysis. Psychometrika, 27(4), 397-415. DOI: https://doi.org/10.1007/BF02289646

Mukherjee, S., Aguilar, J. E., Zago, M., Claassen, M., & Bürkner, P.-C. (2025). Latent variable estimation with composite Hilbert space Gaussian processes. arXiv preprint arXiv:2505.12978.

Newey, W. K., & McFadden, D. (1994). Large sample estimation and hypothesis testing. In R. F. Engle & D. McFadden (Eds.), Handbook of Econometrics (Vol. 4, pp. 2111-2245). Elsevier. DOI: https://doi.org/10.1016/S1573-4412(05)80005-4

Ranganath, R., Tang, L., Charlin, L., & Blei, D. M. (2015). Deep exponential families. Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS).

Rothenberg, T. J. (1971). Identification in parametric models. Econometrica, 39(3), 577-591. DOI: https://doi.org/10.2307/1913267

Roweis, S. T., & Saul, L. K. (2000). Nonlinear dimensionality reduction by locally linear embedding. Science, 290(5500), 2323-2326. DOI: https://doi.org/10.1126/science.290.5500.2323

Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels. MIT Press. DOI: https://doi.org/10.7551/mitpress/4175.001.0001

Shukla, P. A. (2025). The spectral dimension of NTKs is constant: A theory of implicit regularization, finite-width stability, and scalable estimation. arXiv preprint arXiv:2512.00860.

Tenenbaum, J. B., de Silva, V., & Langford, J. C. (2000). A global geometric framework for nonlinear dimensionality reduction. Science, 290(5500), 2319-2323. DOI: https://doi.org/10.1126/science.290.5500.2319

Trillos, N. G., Gerlach, M., Hein, M., & Slepčev, D. (2024). Error estimates for spectral convergence of the graph Laplacian on manifolds with boundary. SIAM Journal on Mathematics of Data Science, 6(2), 450-478.

Trillos, N. G., Hoffmann, F., & Hosseini, B. (2019). Error estimates for spectral convergence of the graph Laplacian. SIAM Journal on Mathematics of Data Science, 1(4), 730-762.

van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press. DOI: https://doi.org/10.1017/CBO9780511802256

von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395-416. DOI: https://doi.org/10.1007/s11222-007-9033-z

Wang, Q., & Paynabar, K. (2025). Maximum covariance unfolding: A novel covariate-based manifold learning approach for point cloud regression. INFORMS Journal on Data Science. DOI: https://doi.org/10.1287/ijds.2024.0043

Zhang, L., & Tuerde, M. (2025). Bayesian analysis of nonlinear quantile structural equation model with possible non-ignorable missingness. Mathematics, 13(19), 3094. DOI: https://doi.org/10.3390/math13193094

Zhang, Y., Wang, X., & Shi, J. Q. (2025). Bayesian analysis of nonlinear structured latent factor models with a Gaussian process prior. Journal of Multivariate Analysis. DOI: https://doi.org/10.1016/j.jmva.2025.105577

Pal, P. K., Kataria, B., & Jangid, J. (2025). AI-Driven Multimodal Ensemble Framework for Accurate Hardware Failure Detection in Optical Embedded Systems: Eliminating Unnecessary RMAs. Preprints. DOI: https://doi.org/10.20944/preprints202512.1937.v1

FNU Pawan Kumar (2025). Developing SOA architecture web services for high throughput systems. International Journal of Science and Research Archive, 15(2), 1897–1906. DOI: https://doi.org/10.30574/ijsra.2025.15.2.1511

Mishra, C. (2025). PeopleSoft and cloud integration: Opportunities and challenges in the future of financial management systems. International Journal of Science and Research Archive, 16(2), 8–16. DOI: https://doi.org/10.30574/ijsra.2025.16.2.2271

Zheng, Y., Liu, Y., Yao, J., Hu, Y., & Zhang, K. (2025). Nonparametric factor analysis and beyond. Proceedings of the 28th International Conference on Artificial Intelligence and Statistics (AISTATS), PMLR 258:424-432.

Published

05-03-2026

Issue

Section

Research Articles

How to Cite

[1]
Dr Abhishek Kumar, “Manifold Factor Analysis: Nonlinear Dimension Reduction with Statistical Guarantees”, Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol, vol. 12, no. 2, pp. 01–09, Mar. 2026, doi: 10.32628/CSEIT26121315.