Samory K. Kpotufe (Columbia) - From Theory To Clustering
For Zoom information, subscribe to the Stats Seminars email list.
Clustering is a basic problem in data analysis, consisting of partitioning data into meaningful groups called clusters. Practical clustering procedures are expected to meet two criteria: flexibility in the shapes and number of clusters estimated, and efficient processing. While many practical procedures meet one or the other of these criteria in particular applications, general guarantees tend to hold only for theoretical procedures that are hard, if not impossible, to implement. A main aim of this talk is to address this gap.
We will discuss two recent approaches that compete with state-of-the-art procedures while relying on rigorous analyses of clustering. The first approach fits within the framework of density-based clustering, a family of flexible clustering approaches. It builds primarily on theoretical insights into nearest-neighbor graphs, a geometric data structure shown to encode local information about the data density. The second approach speeds up kernel k-means, a popular Hilbert space embedding and clustering method. This more efficient approach relies on a new interpretation – and alternative use – of kernel sketching as a geometry-preserving random projection in Hilbert space.
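To make the two ingredients above concrete, here is a minimal, self-contained Python sketch of simplified stand-ins for each: clustering via connected components of a mutual k-NN graph over high-density points, and kernel k-means approximated by k-means on a random-feature sketch. All function names and parameters (k, low_density_frac, sketch_dim, gamma) are illustrative assumptions, as is the choice of random Fourier features as the sketch; the talk's actual constructions and guarantees may differ.

import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components
from sklearn.cluster import KMeans
from sklearn.datasets import make_moons
from sklearn.kernel_approximation import RBFSampler
from sklearn.neighbors import NearestNeighbors


def knn_density_clusters(X, k=10, low_density_frac=0.3):
    """Toy density-based clustering on a mutual k-NN graph (illustrative only)."""
    # Distance to the k-th nearest neighbor is an inverse density proxy:
    # a small radius means high local density.
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    dist, idx = nn.kneighbors(X)          # column 0 is the point itself
    radius = dist[:, -1]
    dense = radius <= np.quantile(radius, 1 - low_density_frac)

    # Mutual k-NN graph restricted to high-density points.
    n = X.shape[0]
    rows = np.repeat(np.arange(n), k)
    cols = idx[:, 1:].ravel()
    keep = dense[rows] & dense[cols]
    g = csr_matrix((np.ones(keep.sum()), (rows[keep], cols[keep])), shape=(n, n))
    g = g.minimum(g.T).tocsr()            # keep only mutual (bidirectional) edges
    g.eliminate_zeros()

    # Clusters = connected components among high-density points;
    # low-density points are left unlabeled (-1).
    _, labels = connected_components(g, directed=False)
    labels[~dense] = -1
    return labels


def sketched_kernel_kmeans(X, n_clusters=2, sketch_dim=200, gamma=2.0, seed=0):
    """Approximate kernel k-means via a random-feature sketch (illustrative only)."""
    # Random Fourier features give a finite-dimensional map whose inner
    # products approximate the Gaussian kernel, so ordinary k-means on the
    # features approximates k-means in the reproducing-kernel Hilbert space.
    Z = RBFSampler(gamma=gamma, n_components=sketch_dim, random_state=seed).fit_transform(X)
    return KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit_predict(Z)


if __name__ == "__main__":
    X, _ = make_moons(n_samples=500, noise=0.05, random_state=0)
    print("density-based components:", np.unique(knn_density_clusters(X)))
    print("sketched kernel k-means:", np.unique(sketched_kernel_kmeans(X)))

Random Fourier features are only one standard form of kernel sketching (Nyström approximations are another); the point of the sketch is that the embedding dimension, not the sample size squared, controls the cost of the subsequent k-means step.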
Finally, we will present recent experimental results combining the benefits of both approaches in the IoT application domain.
The talk is based on various works with collaborators Sanjoy Dasgupta, Kamalika Chaudhuri, Ulrike von Luxburg, Heinrich Jiang, Bharath Sriperumbudur, Kun Yang, and Nick Feamster.
Host: Rebecca Willett
Samory K. Kpotufe
I work in statistical machine learning, with an emphasis on common nonparametric methods (e.g., kNN, trees, kernel averaging). I'm particularly interested in adaptivity, i.e., how to automatically leverage beneficial aspects of data rather than designing procedures separately for each scenario. This involves characterizing statistical limits under modern computational and data constraints, and identifying favorable aspects of data that help circumvent these limits.
Some specific interests: notions of intrinsic data dimension and the benefits (or lack thereof) of sparse or manifold representations; performance limits and adaptivity in active learning, transfer learning, and multi-task learning; and hyperparameter tuning and guarantees in density-based clustering.