
Research Interests

High-dimensional dependent data

Sports analytics

Statistical computing

Statistics education

Recent Articles

Homogeneity tests of covariance and change-points identification for high-dimensional functional data

Shawn Santo, Ping-Shou Zhong

May 2020

We consider inference problems for high-dimensional (HD) functional data with a dense number T of repeated measurements taken on a large number p of variables from a small number n of experimental units. The spatial and temporal dependence, the high dimensionality, and the dense number of repeated measurements all make theoretical study and computation challenging.
This paper has two aims. The first is to address the theoretical and computational challenges of detecting and identifying change points among covariance matrices from HD functional data. The second is to provide computationally efficient, tuning-free tools with guaranteed stochastic error control. Change point detection is formulated as a test of the homogeneity of covariance matrices. The weak convergence of the stochastic process formed by the test statistics is established under the "large p, large T, and small n" setting. Under mild conditions, our change point identification estimator is proven to be consistent for change points at any location in a sequence. Its rate of convergence depends on the data dimension, sample size, number of repeated measurements, and signal-to-noise ratio. We also show that our proposed algorithms can significantly reduce computation time and are applicable to real-world data, such as fMRI data with a large number of HD repeated measurements. Simulation results demonstrate both the finite sample performance and the computational efficiency of our proposed procedures: the empirical size of the test is well controlled at the nominal level, and the locations of multiple change points are accurately identified. An application to fMRI data demonstrates that our proposed methods can identify event boundaries in the preface of the movie Sherlock. Our proposed procedures are implemented in the R package TechPhD.

Homogeneity tests of covariance matrices with high-dimensional longitudinal data

Ping-Shou Zhong, Runze Li, Shawn Santo

Biometrika, May 2019

This paper deals with the detection and identification of change points among covariances of high-dimensional longitudinal data, where the number of features is greater than both the sample size and the number of repeated measurements. The proposed methods are applicable under general temporal-spatial dependence. A new test statistic is introduced for change point detection, and its asymptotic distribution is established.
If a change point is detected, an estimate of the location is provided. The rate of convergence of the estimator is shown to depend on the data dimension, sample size, and signal-to-noise ratio. Binary segmentation is used to estimate the locations of possibly multiple change points, and the corresponding estimator is shown to be consistent under mild conditions. Simulation studies provide the empirical size and power of the proposed test and the accuracy of the change point estimator. An application to a time-course microarray dataset identifies gene sets with significant gene interaction changes over time.
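To illustrate the binary segmentation idea, here is a minimal Python sketch under simplifying assumptions. It is not the estimator from the paper or from the TechPhD package. The paper's statistic is designed for the setting where p exceeds n and T and comes with an established asymptotic distribution. This sketch instead uses a plain weighted squared Frobenius distance between average sample covariances before and after a candidate split. The data layout (n, T, p), the functions time_covariances, split_statistic and binary_segmentation, and the parameters threshold and min_size are all hypothetical choices made for illustration. In particular, the fixed threshold stands in for a properly calibrated test.

```python
import numpy as np


def time_covariances(X):
    """Sample covariance at each time point, computed across the n units.

    Hypothetical layout: X has shape (n, T, p) for n units, T repeated
    measurements, and p variables. Returns an array of shape (T, p, p).
    """
    n = X.shape[0]
    centered = X - X.mean(axis=0, keepdims=True)
    # Sum over units: S[t] = centered[:, t, :].T @ centered[:, t, :] / (n - 1)
    return np.einsum("ntp,ntq->tpq", centered, centered) / (n - 1)


def split_statistic(S, k):
    """Weighted squared Frobenius distance between mean covariances before/after k.

    A simple plug-in discrepancy, not the paper's test statistic.
    """
    T = S.shape[0]
    before = S[:k].mean(axis=0)
    after = S[k:].mean(axis=0)
    weight = k * (T - k) / T**2
    return weight * np.sum((before - after) ** 2)


def binary_segmentation(S, threshold, offset=0, min_size=2):
    """Recursively split at the maximizing k while the statistic exceeds threshold.

    Returns sorted change-point indices: each index is the first time point
    of a new segment in the full series. threshold is a hypothetical fixed
    cutoff used in place of a calibrated test.
    """
    T = S.shape[0]
    if T < 2 * min_size:
        return []
    # Candidate splits leave at least min_size time points on each side.
    candidates = range(min_size, T - min_size + 1)
    stats = [split_statistic(S, k) for k in candidates]
    best = int(np.argmax(stats))
    if stats[best] <= threshold:
        return []
    k = candidates[best]
    left = binary_segmentation(S[:k], threshold, offset, min_size)
    right = binary_segmentation(S[k:], threshold, offset + k, min_size)
    return sorted(left + [offset + k] + right)


if __name__ == "__main__":
    # Simulated example: identity covariance before time 15,
    # equicorrelated covariance (rho = 0.5) from time 15 onward.
    rng = np.random.default_rng(0)
    n, T, p, tau, rho = 100, 30, 50, 15, 0.5
    sigma_after = (1 - rho) * np.eye(p) + rho * np.ones((p, p))
    X = rng.standard_normal((n, T, p))
    X[:, tau:, :] = X[:, tau:, :] @ np.linalg.cholesky(sigma_after).T
    print(binary_segmentation(time_covariances(X), threshold=50.0))
```

In this simulated setting the split at time 15 should dominate the statistic, and the recursion on each homogeneous half should stop because the within-segment statistics stay well below the threshold. The min_size guard prevents the recursion from proposing segments too short to yield a covariance estimate.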

Student Collaborations

Ana Belac

Independent Study, Spring 2020

An Analysis of How Golf Skills Affect PGA Tour Earnings

Michael Tan

Honors Thesis, Spring 2020

Investigating Underpricing in Venture-Backed IPOs Using Statistical Techniques