research

Below is my (mostly) complete bibliography with links to articles, presentations, posters, videos, and suggested citations. Make sure to check my blog as well for additional informal discussion of some of this research.

Projects

Collaborative data science development

While the open-source model for software development has led to successful, large-scale collaborations in building software applications, chess engines, and scientific analyses, data science has not benefited from this development paradigm. In part, this is due to the divide between the development processes used by software engineers and those used by data scientists.

Ballet tries to address this disparity. It is a lightweight software framework that supports collaborative data science development by composing a data science pipeline from a collection of modular patches that can be written in parallel. Ballet provides the underlying functionality to support interactive development, test and merge high-quality contributions, and compose the accepted contributions into a single product.

I've led development of the core Ballet framework, the Assemblé development environment, the Ballet Bot, and a bunch of other software.

We've evaluated Ballet in an extensive case study of a personal income prediction project. Our preprint describes our ideas for collaborative data science development, the design of the framework, and the results of this evaluation.

Frameworks for AutoML

While developing and deploying ML systems in my research group, we realized that every project used a different set of libraries, chosen for the task at hand, that fit together more or less poorly. To address this, we redesigned our system-building approach around the concepts of ML primitives, ML pipelines, and AutoML components. The resulting software framework is used for everything from our entry to DARPA's Data-Driven Discovery of Models program to unsupervised time-series anomaly detection in satellite telemetry to ML on electronic health records. I designed the BTB library for model selection and hyperparameter tuning, to which many folks in the Data to AI Lab have also contributed. We describe the framework, some of the ML and AutoML systems we have built with it, and a thorough evaluation in this paper.

Systems for AutoML

I am a developer on the ATM project, a full-fledged open-source system for joint model selection and hyperparameter tuning for classification. ATM is one of the first projects from the research community to go beyond libraries for model selection or hyperparameter tuning and create a system with a database backend designed for ease of use and high performance. On top of this, we collaborated with the VisLab at HKUST to create a frontend for ATM that allows users to monitor and control an ongoing AutoML search process. This led to the ATMSeer system, which we describe in this paper.

Publications

Shubhra Kanti Karmaker (“Santu”), Md. Mahadi Hassan, Micah J. Smith, Lei Xu, Chengxiang Zhai, and Kalyan Veeramachaneni. "AutoML to Date and Beyond: Challenges and Opportunities." ACM Computing Surveys. 2021. (Also published at arXiv:2010.10777 [cs])

Micah J. Smith, Jürgen Cito, Kelvin Lu, and Kalyan Veeramachaneni. "Enabling Collaborative Data Science Development with the Ballet Framework." Proceedings of the ACM on Human-Computer Interaction. 2021. (Also published at arXiv:2012.07816 [cs])

Micah J. Smith. "Collaborative, Open, and Automated Data Science." Thesis. 2021.

Micah J. Smith, Jürgen Cito, and Kalyan Veeramachaneni. "Meeting in the Notebook: A Notebook-Based Environment for Micro-Submissions in Data Science Collaborations." arXiv:2103.15787 [cs]. 2021.

Micah J. Smith, Carles Sala, James Max Kanter, and Kalyan Veeramachaneni. "The Machine Learning Bazaar: Harnessing the ML Ecosystem for Effective System Development." Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data. 2020. (Also published at arXiv:1905.08942 [cs])

Dongyu Liu, Micah J. Smith, and Kalyan Veeramachaneni. "Understanding User-Bot Interactions for Small-Scale Automation in Open-Source Development." Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems. 2020.

Micah J. Smith, Kelvin Lu, and Kalyan Veeramachaneni. "Demonstration of Ballet: A Framework for Open-Source Collaborative Feature Engineering." Proceedings of the 3rd MLSys Conference. 2020.

Qianwen Wang, Yao Ming, Zhihua Jin, Qiaomu Shen, Dongyu Liu, Micah J. Smith, Kalyan Veeramachaneni, and Huamin Qu. "ATMSeer: Increasing Transparency and Controllability in Automated Machine Learning." Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems. 2019. (Also published at arXiv:1902.05009 [cs])

Micah J. Smith, Kelvin Lu, and Kalyan Veeramachaneni. "Ballet: A Lightweight Framework for Open-Source, Collaborative Feature Engineering." Workshop on Systems for Machine Learning and Open Source Software at NeurIPS 2018. 2018.

Micah J. Smith. "Scaling Collaborative Open Data Science." Thesis. 2018.

Micah J. Smith, Roy Wedge, and Kalyan Veeramachaneni. "FeatureHub: Towards Collaborative Data Science." 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA). 2017.

José Cambronero, John K. Feser, Micah J. Smith, and Samuel Madden. "Query Optimization for Dynamic Imputation." Proceedings of the VLDB Endowment. 2017.

Marco Del Negro, Gauti Eggertsson, Andrea Ferrero, and Nobuhiro Kiyotaki. "The Great Escape? A Quantitative Evaluation of the Fed's Liquidity Facilities." American Economic Review. 2017. (Substantial contribution)

Marco Del Negro, Marc Giannoni, and Micah J. Smith. "The Macro Effects of the Recent Swing in Financial Conditions." Report. 2016.

Marco Del Negro, Marc Giannoni, Pearl Li, Erica Moszkowski, and Micah J. Smith. "The FRBNY DSGE Model Meets Julia." Article. 2015.

Zac Cranko, Pearl Li, Spencer Lyon, Erica Moszkowski, Micah J. Smith, and Pablo Winant. "The DSGE MATLAB to Julia Transition: Improvements and Challenges." Article. 2015.

Marco Del Negro, Marc Giannoni, Erica Moszkowski, Sara Shahanaghi, and Micah J. Smith. "The FRBNY DSGE Model Forecast - November 2015." Article. 2015.

Andreas Fuster, Basit Zafar, and Micah J. Smith. "Just Released: 2015 SCE Housing Survey Shows Households Optimistic about Housing Market." Article. 2015.

Andreas Fuster, Basit Zafar, and Micah J. Smith. "Survey of Consumer Expectations: Housing Survey - 2015: Report." Article. 2015.

Marco Del Negro, Marc Giannoni, Matthew Cocci, Sara Shahanaghi, and Micah J. Smith. "Why are interest rates so low?" Article. 2015.

Marco Del Negro, Marc Giannoni, Matthew Cocci, Sara Shahanaghi, and Micah J. Smith. "The FRBNY DSGE Model Forecast - April 2015." Article. 2015.

Marco Del Negro, Marc Giannoni, and Christina Patterson. "The forward guidance puzzle." Report. 2012. (Substantial contribution)