Date & Time:
February 17, 2021, 4:00 pm – 5:00 pm (America/Chicago)
Location:
Zoom
Series:
Data Science Joint Seminar of the Departments of Statistics and Computer Science

From Theory To Clustering

For Zoom information, subscribe to the Stats Seminars email list.

Clustering is a basic problem in data analysis: partitioning data into meaningful groups called clusters. Practical clustering procedures are expected to meet two criteria: flexibility in the shapes and number of clusters they estimate, and efficient processing. While many practical procedures meet one or the other of these criteria in particular applications, general guarantees often hold only for theoretical procedures that are hard, if not impossible, to implement. A main aim of this talk is to address this gap.

We will discuss two recent approaches that compete with state-of-the-art procedures while resting on a rigorous analysis of clustering. The first approach fits within the framework of density-based clustering, a family of flexible clustering approaches. It builds primarily on theoretical insights into nearest-neighbor graphs, a geometric data structure shown to encode local information about the data density. The second approach speeds up kernel k-means, a popular method that embeds data in a Hilbert space and clusters it there. This more efficient approach relies on a new interpretation (and alternative use) of kernel sketching as a geometry-preserving random projection in Hilbert space.
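
The density-based approach rests on the idea that nearest-neighbor distances reflect local density. As a rough illustration only (a generic k-NN level-set procedure, not the specific algorithm presented in the talk), the Python sketch below treats a point as high-density when the distance to its k-th nearest neighbor is at most a threshold, links each retained point to its retained k nearest neighbors, and reports the connected components of that graph as clusters. The function name `knn_level_set_clusters` and its parameters `k` and `max_radius` are hypothetical, chosen for this example.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]


def knn_level_set_clusters(points: Sequence[Point], k: int, max_radius: float) -> List[int]:
    """Hypothetical helper: cluster points via a k-NN density level set.

    Returns one label per point; -1 marks points judged low-density.
    """
    n = len(points)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(points)")

    # Brute-force pairwise Euclidean distances (O(n^2), fine for a sketch).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # For each point: the indices of its k nearest other points and its k-NN radius.
    neighbors: List[Set[int]] = []
    knn_radius: List[float] = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        nearest = order[:k]
        neighbors.append(set(nearest))
        knn_radius.append(dist[i][nearest[-1]])

    # A small k-NN radius signals high local density; keep only those points.
    keep = [i for i in range(n) if knn_radius[i] <= max_radius]

    # Union-find over the symmetric k-NN graph restricted to the kept points.
    parent: Dict[int, int] = {i: i for i in keep}

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in keep:
        for j in neighbors[i]:
            if j in parent:  # edge only if both endpoints are high-density
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[ri] = rj

    # Number the connected components 0, 1, 2, ... in order of first appearance.
    labels = [-1] * n
    component_ids: Dict[int, int] = {}
    for i in keep:
        labels[i] = component_ids.setdefault(find(i), len(component_ids))
    return labels


if __name__ == "__main__":
    # Two tight groups of five points plus one far-away outlier.
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5),
           (5, 50)]
    print(knn_level_set_clusters(pts, k=2, max_radius=2.0))
    # prints [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, -1]
```

Lowering `max_radius` corresponds to a higher density level, and because the retained sets are nested, sweeping it traces out a hierarchy of clusters. Choosing `k` and the level in a principled way is exactly the kind of hyperparameter question that a rigorous analysis has to address.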

Finally, we will present recent experimental results that combine the benefits of both approaches in the Internet of Things (IoT) application domain.

The talk is based on joint work with collaborators Sanjoy Dasgupta, Kamalika Chaudhuri, Ulrike von Luxburg, Heinrich Jiang, Bharath Sriperumbudur, Kun Yang, and Nick Feamster.

Host: Rebecca Willett

Samory K. Kpotufe

Associate Professor, Columbia University

I work in statistical machine learning, with an emphasis on common nonparametric methods (e.g., kNN, trees, kernel averaging). I’m particularly interested in adaptivity, i.e., how to automatically leverage beneficial aspects of data as opposed to designing specifically for each scenario. This involves characterizing statistical limits, under modern computational and data constraints, and identifying favorable aspects of data that help circumvent these limits.

Some specific interests: notions of intrinsic data dimension; benefits (or lack thereof) of sparse or manifold representations; performance limits and adaptivity in active learning, transfer learning, and multi-task learning; hyperparameter tuning and guarantees in density-based clustering.
