I defended my thesis

Posted on Wed 21 July 2021 in life

I'm excited to announce that I successfully defended my PhD thesis today. My thesis is entitled "Collaborative, open, and automated data science" and will be submitted to the MIT Library.

I want to thank the members of my committee, Dr. Kalyan Veeramachaneni, Prof. Saman Amarasinghe, and Prof. Rob Miller, as …

Continue reading

Evaluating the impact of mentorship programs

Posted on Fri 04 June 2021 in life

One initiative that I have been working on for the past year at MIT is the EECS Graduate Application Assistance Program (EECS GAAP), both as an organizer and as a mentor. EECS GAAP is a program that pairs underrepresented applicants to the MIT EECS department with graduate student mentors.

This …

Continue reading

Preprint of Assemblé

Posted on Tue 30 March 2021 in research • Tagged with ballet

I'm excited to share that we have posted a preprint to arXiv of our paper, "Meeting in the notebook: a notebook-based environment for micro-submissions in data science collaborations." This preprint describes the design and implementation of Assemblé, a development environment for data science collaborations that is targeted to Ballet. This …

Continue reading

Presenting at LIDSCONF 2021

Posted on Mon 15 March 2021 in research • Tagged with ballet, mit

Last month was the 26th LIDS Student Conference, which was held virtually. I was happy to give a talk entitled "A New Approach to Collaborative Data Science with the Ballet Framework," which presented some of our work from our paper on Ballet. It was great to see the creativity with …

Continue reading

Sharing a preprint using acmart

Posted on Mon 08 February 2021 in research • Tagged with latex

Congratulations! You just submitted a paper to an ACM conference. While you are waiting to receive peer reviews, you may want to share a preprint with your colleagues.

Here's what you should not do: send the exact file that you created as part of your submission. This is still anonymized …

Continue reading

Preprint of Ballet

Posted on Fri 18 December 2020 in research • Tagged with ballet

I'm excited to share that we have posted a preprint to arXiv of our paper, "Enabling collaborative data science development with the Ballet framework." This preprint summarizes our work on the Ballet framework for collaborative, open-source data science development.

Though there is much potential in building predictive models in an …

Continue reading

The poor man's Google Sheets mail merge

Posted on Tue 01 December 2020 in life • Tagged with gsuite, email, scripting

If you're not in a corporate environment, you're probably not running many mail merges, in which you automatically send emails to many recipients with content customized for each one. I recently had to send about 150 emails as part of my work for a student organization I am involved in. Content …

Continue reading

Pipenv install with all the flags, explained

Posted on Thu 19 November 2020 in programming • Tagged with python, pipenv, docker

A common pattern for using pipenv in containers is to install as follows:

COPY Pipfile Pipfile.lock ./
RUN pip install --upgrade pip && \
    pip install pipenv && \
    pipenv install --system --deploy --ignore-pipfile && \
    pipenv --clear

What is going on here, exactly? How does one understand the intersection of the flags given to pipenv …

Continue reading

Inline markup within words in reStructuredText

Posted on Wed 12 August 2020 in programming • Tagged with rst, docs

I often write documentation with reStructuredText and Sphinx. Sometimes I want to refer to the plural of some programming concept, where the concept is in monospace font but the plural ending is in the normal font, e.g. lists, with "list" in monospace and the trailing "s" in normal font.

While this is easy to do in markdown:


It …

Continue reading

Invitation to dance: a status report on the Ballet project

Posted on Tue 30 June 2020 in research • Tagged with ballet, feature engineering, machine learning

At MIT, we recently marked the close of one of the most turbulent academic years on record, in which academic and research activities were significantly disrupted by the emergence of the COVID-19 pandemic, which has, by now, killed well over 400,000 people globally and 100,000 people in the …

Continue reading

Hidden Icon? files on macOS

Posted on Mon 15 June 2020 in programming • Tagged with macos

For a while I've tolerated files named Icon? that appear in my console ls and git status output on macOS as well as in some IDE file trees like in Atom:

$ \ls -Al
total 696
-rw-------  1 micahsmith  staff  179 Nov 12  2019 Experiment Datasets.gsheet
-rw-r--r--@ 1 micahsmith  staff …

Continue reading

MLSys 2020 Recap

Posted on Wed 04 March 2020 in research • Tagged with machine learning, mlsys

This week, I attended the third Conference on Machine Learning and Systems (MLSys) in Austin, TX. It was a great experience and I thought I would record some of my thoughts and observations from attending the conference.

Demonstration of Ballet

First off, the reason I was attending in the first …

Continue reading

Attending MLSys 2020

Posted on Sun 02 February 2020 in research

Our demonstration of Ballet was recently accepted at the Conference on ML and Systems (MLSys 2020). I will be attending the conference March 2-4 to present a live demonstration of our framework for real-time, collaborative feature engineering. I'm looking forward to attending the conference, visiting Austin, TX, and meeting some …

Continue reading

Dataclasses and mutable defaults

Posted on Tue 14 January 2020 in programming • Tagged with python

One common Python gotcha is the use of mutable objects as defaults for function keyword arguments. There are approximately one billion questions on SO about this, as well as nice discussions elsewhere. I came across a nice feature in Python's dataclasses library that addresses a similar problem.
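As a minimal sketch of both the gotcha and the dataclasses behavior (the names `append_item`, `Basket`, and `define_bad_dataclass` are made up for illustration and are not taken from the post):

```python
from dataclasses import dataclass, field


def append_item(item, items=[]):
    # The default list is created once, when the function is defined,
    # so every call that omits `items` shares the same list object.
    items.append(item)
    return items


first = append_item(1)
second = append_item(2)
shared = first is second  # both calls returned the very same list


@dataclass
class Basket:
    # default_factory builds a fresh list for each new instance.
    items: list = field(default_factory=list)


a, b = Basket(), Basket()
a.items.append("apple")  # leaves b.items untouched


def define_bad_dataclass():
    # dataclasses rejects a bare mutable default such as [] and
    # raises ValueError at class definition time.
    @dataclass
    class Bad:
        items: list = []

    return Bad
```

Calling `define_bad_dataclass()` raises `ValueError`, which is how the dataclasses module steers you toward `default_factory` instead of a shared default.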

Mutable defaults are bad

As …

Continue reading

Fixing invalid argument iwhite on vimdiff

Posted on Thu 07 November 2019 in programming • Tagged with vim, macos, catalina

[Update from 2020-03-27 below]

Vim's diff mode is a lightweight diffing tool that can be used at the command line, in particular with the git diff command. In a previous post, I wrote about configuring vimdiff as the git difftool for use with Matlab development (yes, way back when I …

Continue reading

Anonymize GitHub repos for double-blind submission

Posted on Wed 23 October 2019 in research • Tagged with latex

Many venues for submitting research in computer science and other academic fields follow a double-blind review process, in which the authors should be anonymous to the reviewers. This can cause conflicts with ideas from open science or with the evaluation of software contributions as any linked GitHub repositories can contain …

Continue reading

Find ports in use on macOS

Posted on Mon 16 September 2019 in programming • Tagged with macos, shell

How to find ports that are already "in use" on macOS:

sudo lsof -P -i TCP -s TCP:LISTEN

This is helpful if you are trying to figure out which process is using a port so that you can kill that process — for example, if you have a web server …
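If you only need a scripted yes/no answer for a single port, a rough Python sketch could look like the following. The helper name `is_port_listening` is hypothetical, it only checks IPv4 on the given host, and unlike lsof it does not tell you which process owns the port.

```python
import socket


def is_port_listening(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For example, `is_port_listening(8000)` should return True while a local development server is bound to port 8000.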

Continue reading

Trim graphics in LaTeX

Posted on Fri 13 September 2019 in programming • Tagged with latex

How to trim a figure in LaTeX.

You've already inserted the figure into a figure environment and now want to trim excess whitespace.

Start adjusting from here:

\fbox{\includegraphics[clip=true, trim={0 0 0 0}, width=\linewidth]{myfigure}}

fbox is a black box around the float content so you …

Continue reading

Using autoreload in IPython

Posted on Fri 13 September 2019 in programming • Tagged with python, ipython, jupyter

Using the autoreload extension.

%load_ext autoreload
%aimport mymodule
%autoreload 1

The autoreload command understands three levels:

  • 0 -> extension is disabled
  • 1 -> reload modules that were marked with %aimport
  • 2 -> reload everything
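For intuition, the reload step that the extension automates is roughly comparable to calling `importlib.reload` by hand after editing a file. The sketch below is an illustration under that assumption, not a description of how autoreload is implemented; the module name `scratch_module_demo` is made up.

```python
import importlib
import pathlib
import sys
import tempfile

# Skip bytecode caching so that quick successive edits are always picked up.
sys.dont_write_bytecode = True

# Create a throwaway module on disk (hypothetical name, for illustration).
workdir = pathlib.Path(tempfile.mkdtemp())
module_path = workdir / "scratch_module_demo.py"
module_path.write_text("def greet():\n    return 'hello'\n")
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()

import scratch_module_demo

before = scratch_module_demo.greet()

# Edit the source, then reload manually to pick up the new definition.
module_path.write_text("def greet():\n    return 'hello again'\n")
importlib.reload(scratch_module_demo)
after = scratch_module_demo.greet()
```

With `%autoreload` enabled in IPython, the manual `importlib.reload` call is what you no longer have to type.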

The easiest usage of autoreload is to not aimport anything and set %autoreload 2, which causes the extension …

Continue reading

Switching up blog content

Posted on Thu 12 September 2019 in general

It's been over a year and a half since I last posted here. I have some great ideas for really interesting posts to write, but somehow I never seem to find the time.

So I'll be switching the type of content I post in an effort to write more. I …

Continue reading

My year in books

Posted on Sun 31 December 2017 in life • Tagged with life, books

At the beginning of 2017, I made a resolution: I would read 17 books for pleasure over the course of the year. Now, this might seem like very few to my mother, who claims to read a new book every two days on her Kindle. And it might seem like …

Continue reading

Wonders of YouTube

Posted on Fri 19 May 2017 in life • Tagged with life, italian, youtube, spirits, mediums

Who was Eusapia Palladino?

Eusapia Palladino era una medium Italiana.

I remember saying these words over and over in my dorm room in Schapiro Hall during my third year of college. I was practicing for the midterm presentation in my Advanced Italian Conversation course, in which we had to make …

Continue reading

You need a doorbell

Posted on Fri 03 February 2017 in programming • Tagged with life, programming, python, twilio, flask

Sometimes a dumb technical approach can be a solution to a real world problem.

I live in a graduate residence that doesn't have a buzzer system. To be granted access, my guests have to text/call me directly and I have to walk downstairs to let them in. Though this …

Continue reading

My first crossword

Posted on Tue 24 January 2017 in crossword • Tagged with crossword

I made my first crossword! This is something that I've been working on very intermittently since 2011. Very intermittently, in the sense that I would make progress only when I was flying across the country on college breaks and, with no internet connection, sometimes didn't have anything better to …

Continue reading

I ran a race and made some graphs

Posted on Thu 17 November 2016 in dataviz • Tagged with dataviz, running, julia

The Cambridge Half Marathon was this past Sunday and I was able to race with a couple of friends. It was a gorgeous day and a very nice course. As always, I was inspired by the diversity of runners.

After finishing, I was happy to see that the full results were …

Continue reading

Why I'm bailing on Julia for machine learning

Posted on Fri 04 November 2016 in programming • Tagged with julia, python, ml

I'm bailing on Julia for machine learning — just for my one class, that is. Don't worry ~too much~!

I'm taking graduate machine learning (6.867) this semester at MIT. There are three homework assignments in the course that are structured as mini-projects, in which students implement canonical algorithms from scratch …

Continue reading

Eight days a week, revisited

Posted on Thu 02 June 2016 in economics • Tagged with life, economics, corporate drones, nundinum

If, as I investigated in a previous post, you were down with an eight-day week, then you were down with an idea that is possibly welfare-improving. Give yourself a pat on the back.

That's according to Maya Eden in a paper elegantly titled The Week. Dr. Eden presented her …

Continue reading

Using vimdiff with dumb paths

Posted on Tue 02 February 2016 in programming • Tagged with vim, git, matlab

I've been loving vimdiff as my git difftool for a while.

Vimdiff and Matlab packages

Vimdiff runs into some problems when working with Matlab code. Matlab considers directories with names beginning with + as "packages", a natural way to organize projects. However, this means that many paths relative to the git …

Continue reading

How Goldman stays cool

Posted on Mon 11 January 2016 in buildings • Tagged with buildings, ice, cubes, ice cubes

Goldman literally has a vault full of ice cubes.

I wasn't sure if I had imagined this fun fact or not, but sure enough, Goldman's FiDi headquarters is literally air conditioned in hot weather by the melting of massive ice blocks in their basement.

According to the WSJ, which covers …

Continue reading

Eight days a week

Posted on Sat 09 January 2016 in economics • Tagged with life, economics, corporate drones, nundinum

What would you give up for a three-day weekend?

The problem

How great are three-day weekends? For me, at least, they recall the days in undergrad in which Thursday at 2pm was the end of the work week. The rejuvenation of those three-day weekends is certainly a step above the …

Continue reading


Posted on Fri 08 January 2016 in life • Tagged with life

Welcome to my site. I'll be writing about various topics, such as the world to which we are all party.