Topics in Random Matrices and Statistical Machine Learning

2018·Arxiv

Abstract

Abstract

This thesis consists of two independent parts: random matrices, which form the first one-third of this thesis, and machine learning, which constitutes the remaining part.

The classical Wishart matrix has been defined only for the values and 4 (corresponding to real, complex and quaternion cases respectively), where indicates the number of real matrices needed to define a particular type of Wishart matrix. The moments and inverse moments of Wishart matrices have their theoretical and practical importance. In the works of Graczyk, Letac and Massam (2003, 2004), Matsumoto (2012), Collins et al. (2014), a certain additional condition is assumed in order to derive a formula for finite inverse moments of Wishart matrices. Here, we address the necessity of this additional condition. In general, we consider the question of having finite inverse moments for two bigger classes of Wishart-type matrices: the -Laguerre matrices defined for continuous values of compound Wishart matrices for the values of (real) and 2 (complex).

We show that the c-th inverse moment of a -Laguerre matrix is finite if and only if . Moreover, we deduce that the c-th inverse moment of a compound Wishart matrix is finite if and only if . The definition of compound Wishart matrix in quaternion case () is not so coherent yet, so the condition for finiteness of inverse moments in this case is a future work.

The second part of the thesis is devoted to the subject of the universal consistency of the k-nearest neighbor rule in general metric spaces. The k-nearest neighbor rule is a well-known learning rule and one of the most important. Given a labeled sample, the k-nearest neighbor rule first find ‘data points in the sample, which are closest to x based on a distance function and then predicts the label of x as being the most commonly occurring label among the picked ‘k’ labels. There is an error if the predicted label is not same as the true label. A learning rule is universally weakly consistent if the expected (average) learning error converges to the smallest possible error for the given problem (known as the Bayes error).

According to the 2006 result of Cérou and Guyader, the k-nearest neighbor rule is universally weakly consistent in every metric space equipped with probability measure satisfying the strong differentiation property. A 1983 result announced by Preiss states necessary and sufficient condition for a metric space to satisfy the strong differentiation property for all finite Borel measures. This is the condition of being metrically sigma-finite dimensional in the sense of Nagata. Thus, in every sigma-finite dimensional metric space in the sense of Nagata, the k-nearest neighbor rule is universally weakly consistent.

The main aim of this part of the thesis is to prove the above result by direct means of statistical learning theory, bypassing the machinery of real analysis. Our proof is modeled on the classical proof by Charles Stone for the Euclidean space. However, the main tool of his proof, the geometric Stone lemma, only makes sense in the presence of the finite dimensional linear structure. The lemma gives an upper bound on the number of points in a sample for which a given point can serve as one of the k-nearest neighbors. We search for an analogue of the geometric Stone lemma for metrically (sigma) finite dimensional spaces in Nagata’s sense, making a number of interesting discoveries on the way. While in the absence of distance ties there is a straightforward analogue of the lemma, it is provably false in the presence of ties, and besides, we show that the distance ties in general metrically finite-dimensional (even zero-dimensional) spaces are unavoidable. At the same time, it turns out that the upper bound in the Stone lemma, although unbounded, grows slowly in n (as the n-th harmonic number), which allows to deduce the universal consistency.

Further, we establish strong consistency in a metrically finite dimensional space, under the additional condition of zero probability of ties. In the Euclidean case, the result is known in the general case, but historically, it was also first proved in the absence of ties. We leave the question of validity of the result in a metrically sigma-finite dimensional space as an open question.

Finally, we work out in detail the necessity part of the proof of the Preiss theorem above. The original note by Preiss only briefly outline the ideas of the proof in a few lines, and to work out sufficiency, Assouad and Quentin de Gromard had written a 61-page long article. The details of the necessity part appear in our thesis for the first time.

Declaration

I hereby declare the thesis entitled “Topics in Random Matrices and Statistical Machine Learning” has been undertaken by me and reflect my original work. All sources of knowledge used have been duly acknowledged. I declare that this thesis has never been submitted and/or published for any award to any other institution before.

Acknowledgments

This thesis is a result of support and guidance of many people. I would like to extend my sincere thanks to all of them. Firstly, I would like to express my sincere gratitude to my supervisors Dr. Benoît Collins and Dr. Vladimir G. Pestov for their guidance and encouragement. I greatly acknowledge the support of Dr. Collins during my PhD. I am very grateful to Dr. Pestov for his guidance and critical comments on this research work over emails and calls irrespective of the 12-hour time difference between Brazil and Kyoto. I sincerely acknowledge the help by Dr. Hiroshi Kokubu and the financial support of Kyoto Top Global Unit (KTGU) for Brazil overseas trip. I greatly acknowledge the hospitality of Department of Mathematics, Kyoto University and financial support of JICA-IITH Friendship program for giving me an opportunity to pursue a doctoral course at Kyoto University. I would like to thank my thesis committee members for their support and suggestions.

My special thanks goes to my best friend, Mr. Akshay Goel, for the umpteen number of discussions we had over varied topics of mathematics. I would like to thank my super-friends Ms. Jasmine Kaur and Ms. Akanksha Yadav for keeping with me in my good and bad days. I thank my fellow colleagues Gunjan, Prashant, Reddy, Mathieu, Felix for being there. Last but not the least, a very special thanks to my lovely family: mumy, papa and bhai (mother, father and elder brother). Their constant assurance and faith in me has made me come so far.

Dedication

to my family...

Bibliography

[1] Assouad, P., Quentin de Gromard, T.: Recouvrements, derivation des mesures et dimensions. Revista Matemática Iberoamericana 22(3), 893– 953 (2006)

[2] Besicovitch, A.: A general form of the covering principle and relative dif- ferentiation of additive functions. Proceedings of the Cambridge Philosophical Society 41, 103–110 (1945)

[3] Biau, G., Devroye, L., Lugosi, G.: Consistency of random forests and other averaging classifiers. Journal of Machine Learning Research 9, 2015–2033 (2008)

[4] Billingsley, P.: Convergence of probability measures, second edn. Wiley Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons Inc. (1999). A Wiley-Interscience Publication

[5] Billingsley, P.: Probability and Measure, anniversary edition edn. Wiley Series in Probability and Statistics. John Wiley & Sons Inc. (2012)

[6] Cérou, F., Guyader, A.: Nearest Neighbor Classification in infinite di- mension. ESAIM: Probability and Statistics 10, 340–355 (2006)

[7] Collins, B.: Moments and cumulants of polynomial random variables on unitarygroups, the itzykson-zuber integral, and free probability. International Mathematics Research Notices 2003(17), 953–982 (2003)

[8] Collins, B., Matsumoto, S., Saad, N.: Integration of invariant matrices and moments of inverses of Ginibre and Wishart matrices. Journal of Multivariate Analysis pp. 1–13 (2014)

[9] Couillet, R., Debbah, M.: Random Matrix Methods for Wireless Com- munications. Cambridge University Press (2011)

[10] Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Trans- actions on Information Theory 13, 21–27 (1967)

[11] Davies, R.O.: Measures not approximable or not specifiable by means of balls. Mathematika 18(2), 157–160 (1971)

[12] Devroye, L.: On the almost everywhere convergence of nonparametric regression function estimates. The Annals of Statistics 9(6), 1310–1319 (1981)

[13] Devroye, L., Györfi, L.: Nonparametric Density Estimation: The View. John Wiley & Sons (1985)

[14] Devroye, L., Györfi, L., Krzyzak, A., Lugosi, G.: On the strong universal consistency of nearest neighbor regression function estimates. The Annals of Statistics 22(3), 1371–1385 (1994)

[15] Devroye, L., Györfi, L., Lugosi, G.: A Probabilistic Theory of Pattern Recognition. Stochastic Modelling and Applied Probability, Springer (1996)

[16] Duan, H.H.: Applying Supervised Learning Algorithms and a New Fea- ture Selection Method to Predict Coronary Artery Disease. Masters thesis, University of Ottawa (2014)

[17] Dumitriu, I., Edelman, A.: Matrix models for beta ensembles. Journal of Mathematical Physics 43, 5830–5847 (2002)

[18] Engelking, R.: General topology, revised and completed edn. Sigma series in pure mathematics. Berlin: Heldermann (1989)

[19] Fitzpatrick, P.: Advanced Calculus, second edition edn. Wiley Series in Probability and Statistics. American Mathematical Society (2006)

[20] Fix, E., Hodges, J.L.: Discriminatory analysis. nonparametric discrim- ination: Consistency properties. Technical Report 4, Project Number 21-59-004 (1951)

[21] Folland, G.B.: Real Analysis: Modern Techniques and Their Applica- tions. Pure and Applied Mathematics: A Wiley Series of Texts, Monogrpahs and Tracts. Wiley (1999)

[22] Forrester, P.J.: Log-gases and random matrices. Princeton, NJ: Prince- ton University Press (2010)

[23] Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Birkhäuser Basel, Springer (2013)

[24] Goodman, N.R.: Statistical analysis based on a certain multivariate complex Gaussian distribution (An Introduction). The Annals of Mathematical Statistics 34(1), 152–177 (1963)

[25] Graczyk, P., Letac, G., Massam, H.: The complex Wishart distribu- tion and the symmetric group. The Annals of Statistics 31(1), 287–309 (2003)

[26] Hatko, S.: k-Nearest Neighbour Classification of Datasets with a Family of Distances. Masters thesis, University of Ottawa (2015)

[27] Horn, R.A., Johnson, C.R. (eds.): Matrix Analysis. Cambridge Univer- sity Press (1986)

[28] Kumari, S.: Finiteness of inverse moments of -laguerre matrices. Infinite Dimensional Analysis, Quantum Probability, and Related Topics (accepted, 2018)

[29] Letac, G., Massam, H.: All Invariant Moments of the Wishart Distribu- tion 31(2), 295–318 (2004)

[30] Liao, Z., Couillet, R.: Random matrices meet machine learning: A large dimensional analysis of LS-SVM. International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 2397–2401 (2017)

[31] Louart, C., Liao, Z., Couillet, R.: A random matrix approach to neural networks. The Annals of Applied Probability 28, 1190–1248 (2018)

[32] Mai, X., Couillet, R.: A random matrix analysis and improvement of semi-supervised learning for large dimensional data. Journal of Machine Learning Research (2017)

[33] Matsumoto, S.: General moments of the inverse real Wishart distri- bution and orthogonal Weingarten functions. Journal of Theoretical Probability 25(3), 798–822 (2012)

[34] Mattila, P.: Differentiation of measures in uniform spaces. Measure theory, Oberwolfach pp. 261–283 (1971)

[35] McDiarmid, C.: On the method of bounded differences. Cambridge University Press pp. 148–188 (1989)

[36] Mehta, M.L.: Random Matrices, 3rd edn. Elsevier, Academic Press, New York (2004)

[37] Mezzadri, F., Reynolds, A.K., Winn, B.: Moments of the eigenvalue densities and of the secular coefficients of -ensembles. Nonlinearity 30(3), 1034 (2017)

[38] Preiss, D.: Gaussian measures and the density theorem. Commenta- tiones Mathematicae Universitatis Carolinae 022(1), 181–193 (1981)

[39] Preiss, D.: Dimension of metrics and differentiation of measures. In General Topology and its Relations to Modern Analysis and Algebra V, Heldermann Verlag, Berlin pp. 565–568 (1983)

[40] Ross, S.M.: Introduction to Probability Models, Ninth Edition. Aca- demic Press, Inc. (2006)

[41] Speicher, R.: Combinatorial theory of the free product with amalgama- tion and operator-valued free probability theory. American Mathematical Society (1998)

[42] Stone, C.J.: Consistent nonparametric regression. The Annals of Statis- tics 5, 595–620 (1977)

[43] Tao, T.: Topics in Random Matrix Theory, Graduate studies in Math- ematics, vol. 132. American Mathematical Society (2013)

[44] Wishart, J.: The generalised product moment distribution in samples from a normal multivariate population. Biometrika 20A, 32–52 (1928)

[45] Zakai, A., Ritov, Y.: Consistency and localizability. Journal of Machine Learning Research 10, 827–856 (2009)

[46] Zhao, L.: Exponential bounds of mean error for the nearest neighbor estimates of regression functions. Journal of Multivariate Analysis pp. 168–178 (1987)

designed for accessibility and to further open science