Diving For Treasure in Complex Data: From Roman Urns to Alzheimer's

Marvin Weinstein, Theoretical Physicist, SLAC National Accelerator Laboratory, Stanford



"We're drowning in data".  How often have you seen or heard that phrase? Discovering hidden information in an ocean of high-dimensional, often noisy and low contrast data requires novel data-mining techniques.  Dynamic Quantum Clustering (DQC), a new, unbiased, visual data-mining technology, has been successfully applied to small datasets in many fields.  In this talk I show that DQC also works for large and noisy datasets on which other, more familiar methods fail. The first example shows the application of DQC to a dataset consisting of ~700,000 x-ray absorption spectra. We will see that DQC is remarkably successful at discovering and extracting both simple and topologically complex structures from a large, noisy dataset without making any a-priori assumptions. Other examples include earthquake data and SNP data for Alzheimer's patients. The success of these analyses bodes well for the application of DQC to diverse fields, including biology, genetics, epidemiology, geology, physics, chemistry, document classification, business and national security.