University of Wisconsin–Madison

Transforming Data Science at UW–Madison and Beyond

Virtually all aspects of society are in some way driven by data collected, stored and analyzed to glean insights and guide decision-making, and a new field — data science — has emerged to study and advance those activities. Data science builds on computer science, statistics, mathematics, e-science and other disciplines to develop principles, algorithms and best practices for the generation, processing and use of data.

While a large proportion of data science research has focused on the data analysis step, this project focuses expertise on improving data preparation: the complex extracting, cleaning, matching and integrating step that comes before data analysis can be carried out. Preparation has become a bottleneck preventing wide and effective adoption of data science techniques, and improvements would benefit almost any data science user.

This project seeks to make transformative impacts on the data preparation step by charting new research directions and translating them into tools that will be integrated with the widely adopted HTCondor software. The project turbocharges data science efforts at UW–Madison by working toward setting up a campus-wide Data Science Institute and extending the Center for High Throughput Computing with data science services. Partnering with the Core Computational Technology activity at MIR and WID, the project will enable campus scientists to perform data science tasks far more effectively.

To evaluate and drive the research and development work, the project will help the science policy research community clean and extend its large data repository, UMETRICS, which supports studies of the impact of investment in university research.

Principal Investigator

  • AnHai Doan
    Computer Sciences

Co-Principal Investigators

  • Brent Hueth
    Associate professor
    Agricultural and Applied Economics
  • Miron Livny
    Computer Sciences