The start of my sabbatical has given me a moment to reflect on my publications. But my CV only shows a list of neatly cataloged papers: title, authors, conference. Each one appearing no different from another. But how each paper ended up published is its own story, a story about people and opportunity.
Using notes from my research journal and conference records, I reassembled the "Behind the scenes" story for each of my full research papers: 15 papers as a student, and 15 papers after becoming a professor (excluding papers not led by my research group, as I feel it's not my place to tell those stories). This is as much a reflective exercise for myself as it is for an audience.
If you read this page in its entirety, it will take about 30 minutes. But you can skip any story, and it should still make sense. The first half are my student stories, and the second half are my professor stories, so you can even just read the half that's more interesting to you. This would probably be more enticing as a series of Tweets or a Substack newsletter, but I'd rather post it all at once.
I'd love to read the backstories of other people's publications too. So if you feel comfortable sharing, post yours and email me if you want it linked at the bottom of this page so we can start a collection. Anyways, here goes.
Bo was a friend who needed some programming help for an automated help generator as part of a research project, and off I went. What I thought would be a one-week task turned into three rewrites of an application, two studies, and one long faculty mentorship. Neither of us could predict that Bo would graduate before the main study began, so he did not even become a co-author of a project he started (though in hindsight, I could have just put him as second author).
Because I was learning as I went, I made all the novice mistakes from study design to paper structure. It was a long 3.5 years between when I got that first message and when it was published, after my graduation. And the acceptance decision came only after eking out a lucky coin-flip, as the initial metareview score was "3-Borderline".
But this paper was critical to getting me accepted to a Ph.D. program. Why do I think that? Well, I was rejected by every Ph.D. program I applied to before this publication (but that's another story). So with this paper published, I left my job, squeezed my belongings into my car, and drove up the I-5 highway from California to Seattle.
It was rejected from SIGIR on my first try, and I was starting to worry the topic would become stale, especially with the controversy about the original dataset that led to AOL closing their research department. So I was relieved this wrapped up quickly, and it felt good to have a paper under my belt in my first year.
It puzzles me that this is my most cited full paper, but I think it's due to its topic rather than because of its contribution; though when I checked recently, a few citations are from people using the source code even 12 years later. I guess there's something to be said about the compound interest for citations.
The gold standard for these internships was to do a paper from start to finish in the twelve weeks, and I was desperately trying to do paper-worthy work to prove myself. I carried out the analysis as best I could, but I struggled to do enough for a full paper by the end of the internship. Fortunately, Ryen finished up and expanded on it substantially after I left, so I'm grateful he didn't just give it up as unfinished.
To my complete surprise, this paper won the best paper award that year at SIGIR. Even after being nominated for the award, I felt it was so unlikely to win that I didn't attend the conference. In fact, at that time I thought my other related paper at that conference (the one below) was a better paper overall, but now I see that the best paper award committee probably felt that evaluation was a messier topic so wanted to reward that effort. This award boosted my confidence during my Ph.D. and opened some doors later, so I'm both lucky and thankful it happened.
We had a good time working out the math, mostly me asking questions while watching her think aloud on the whiteboards. I learned about different strategies for deriving proofs in a real setting, which was unlike problem sets, where you knew a proof existed and the problems were roughly the right difficulty for you to do. During this summer, I was in my best mathematical shape, as I could follow along enough to write out the solution and check for errors, whereas today it would take a while to even familiarize myself with the equations in the paper.
That idea clicked with me, so I helped out with a qualitative study to understand why. Gifford taught me most of what I know about grounded theory (and gave me a second lesson a few years ago while collaborating on a more recent 2018 paper).
At that time, I had in my mind that I was helping him rescue a rejected idea, but now in retrospect, it's clear that it was he who gave me the opportunity to help out on work he already had a vision for. I'm quite proud of this work, which is my third most-cited paper, but mainly because what we thought would become a phenomenon did indeed come to pass.
I aimed for a commit deadline during the second half of my internship, and the day it had to ship by 5pm, my code simply wouldn't pass the unit tests. I tried to force it by (naively) changing the unit tests, but that just broke other parts of the build process. It was 4pm and I even had to be somewhere else across the bridge in 30 minutes; I was desperate. I ran in the hallways panicking and found a software developer, Sarvesh, who took a look and gave me some tips. But by the time I had to leave, I still could not push the code. Sarvesh again came to my rescue and assured me, "you take off, I've got this" and sat down at my desk to fix the problem and ship my code while I drove out of the parking lot. Without his help, I would have had to ship it during the next cycle, and would not have left enough time to analyze the data before my internship was over. Not only that, but because it was during a corporate internship, I would have no rights to the intellectual property of any of the work.
But my 20 lines of JavaScript did ship and luckily my code was bug-free (after I pored over every line a hundred times), so this paper became the first paper I published at the main conference in my field; it was nominated for a best paper award, and became the foundational chapter in my dissertation. Not only that, but it was this work that led to a Google Research Grant, Facebook Ph.D. Fellowship, and Microsoft patent.
Her group was probably the closest to my core research interests, and it just happened that Ryen had moved to this group. So with this combination of good circumstances, I aimed to write three papers with Ryen and others to get enough material for my dissertation.
This first one was difficult for me: it used a technique I was unfamiliar with and required a lot of compute. My compute jobs would compete with higher priority jobs during the day, so they often timed out. So I had to stay many late evenings to kick off the 8-hour distributed processes that could only finish at night when the cluster was not in heavy use. My efforts paid off; reviewers were generally favorable and it was a clean accept.
Now I have to confess, this is the paper I am most skeptical about. I don't fully trust the model, even though I kept checking it over. The results showed a modest improvement, but I can't seem to shake the feeling that they were due to a secondary factor like a collinearity, or even worse that there was a calculation error somewhere in there. But now my code is probably gone, and it seems like others were able to show practical improvements using similar models.
I sweated over the rebuttal and promised changes, and luckily convinced the metareviewers to let this one slip through. But it actually turned out to be a fairly influential paper, with 223 citations as of today, and served as a baseline for some of my students' work later. So this was paper number two from that summer. I think it hit a trend of papers about user attention that came out the next few years, but this trend has dwindled since then.
While browser tabs did continue to be a phenomenon, this paper got fewer citations than a lighter short paper I wrote earlier on this topic. I think partly because the title was too clever to be easily recognizable.
But one day my opportunity to be a TA vanished (itself another story), and I begged Oren, a professor from the computer science department, for an office and funding. To my surprise, he agreed within hours of my email. So I started to learn about natural language processing and got to see how he ran his lab.
This resulting paper was a combination of his interests and mine. A couple of undergraduate students joined in under my supervision, which was also my first time mentoring students. The study almost did not happen, because I had trouble fitting the procedure under the rules of our human subjects guidance. But after last-minute discussions with Oren and an HCI professor, we found a way to thread the needle.
Its publication helped launch one of the undergraduate students into the Ph.D. program at MIT, and led to my interest in involving undergraduate students in research for many years to come.
I barely finished the final analysis in the paper the day that I had my farewell lunch. Reviewers loved the work more than I expected, and this paper led to a few opportunities later so I'm glad it worked out. It also brought closure to my Ph.D., as I ended up with no leftover working papers in the pipeline, hence the 2-year gap until my next paper.
Part 2: Papers as a faculty author
So fast forward past my move to Providence and a few false starts at unfinished projects. My first published paper as a faculty member came from a student referred by a professor at another university. The student was Eddie, an undergraduate student from UCLA who reached out to the professor about conducting research analyzing patterns in StarCraft replays. That professor thought I fit the topic better but cautioned, "he [I] probably doesn't have the bandwidth to supervise external students at this time". While that would probably be true now, back then I took the chance and steered him towards an adjacent investigation.
I invited Gifford (yes, my classmate from before) to help out, and the work from start to finish was about 8 months of intense analysis and figure-making. The paper ended up with two strong ratings (4.5/5, 5/5) and two unenthusiastic ratings (2.5/5, 3/5), so the compromise was that it ended up shepherded (a paper deemed borderline but asked to make specific changes to be acceptable) to guide us to "accept". This made us nervous for longer, but after this paper got in, I wrote to Eddie, "You've earned your golden ticket to grad school :-) congrats!" and he chose to do a Ph.D. at the University of Washington, my own alma mater.
The timing for this particular paper was a bit lucky because the reviewers nominated it for a best paper award, but our follow-up work was not as successful; we still had more to say on this topic, but met a lot of resistance in writing the sequels after years of trying to publish newer findings with only rejections.
This paper was the first of many product-style papers that have become the norm in our research group. The work was initially rejected at multiple conferences because while the overall system was effective and the functionality was novel, the technique was not innovative and the results were numerically worse than some of our competitors. I was frustrated about being compared against competitors who only reported data from the users for whom they got good results (even when they were upfront about omitting results from most of their users), while we were reporting full results from every user.
Anyways, it took over two years to build and publish, but I'm proud that the system in our paper is used by a sizable community. It has become part of a popular psychology library used for many research studies, and adopted by a few startups including one which bought a non-exclusive commercial license. We knew this work would have impact later, as Alexandra sent me one of my favorite acceptance notifications, "It got in!!!! I am going back to sleep, I'll email the rest of the authors tomorrow! :D Very excited, Alexandra" This paper was the foundation for her dissertation, and we are both still working on the project now seven years since it began.
We had a tough start and ate a few rejections at both UIST and CHI before we published the paper at the following UIST, 2 years after the initial idea. Even then, the paper almost didn't happen because the reviewers were skeptical (borderline ratings) but Nedi wrote a convincing rebuttal, as a metareviewer summarized, "I re-read the paper in light of the rebuttal. The proposed changes [...] pushed me into the slightly positive end of the spectrum. The submission was discussed at length at the PC meeting and received additional input from another PC member who reviewed the submission at a prior venue. The overall feeling is that this isn't a perfect paper, but it is a difficult area in which to do research and we do learn something from the submission."
Close call, but this became the foundation for the rest of Nedi's Ph.D. work. We were lucky to publish it sooner rather than later because I later found out there were other research groups working on similar ideas.
This paper set the standard that we would try to include undergraduate students in every paper, and so far that still holds true—100% of the papers from our group have included undergraduate authors.
We submitted to CHI 2017 and while two reviewers rated it highly (4.5/5 and 4/5), the third wrote a scathing review; the two metareviewers examined the paper closely, and ultimately decided to reject. It was a little frustrating to be so close, as this paper had the highest average rating of all the rejected papers that year. However, we revised it and ultimately published it at IMWUT the following year after a cycle of major revisions.
However, even after the data collection, there were a few snags. We learned that video frames did not inherently have timestamps associated with them, and it was nearly impossible to retroactively infer them to millisecond-level accuracy. While the dataset itself still felt like a strong contribution in the end, it was harder for other researchers to apply immediately so hasn't been as broadly used as I hoped. I still feel like this paper is a bit underrated today, and someone could write one or two other papers from the dataset we collected.
The paper was accepted on its second submission and was one of the rare times I've encountered an "accept" decision without having to do a major revision beforehand. But as a product it has been disappointing; we attempted to deploy it to usability professionals, the target audience, but it turns out that very few people were willing or able to 3D-print their own components. We learned from this so the project has not ended here, and we are nearly finished with a sequel, 5 years after the initial idea, to reach our original vision.
Many of the authors had never met each other, and it was a bittersweet moment for me when by pure coincidence, the original author, Eda, was standing in the hallway outside my office with the final author, Neilly, without knowing one another (which I immediately corrected by introducing them).
What was challenging about this paper was that the engineering work used existing known techniques, so we had to emphasize how the experience was a contribution on its own. This was hard to do in a study, as it wasn't about directly improving any specific aspect of life, but being able to experience it differently. I begged my old colleague Gifford to help in early 2018, and what put it over the finish line was careful mixed-methods descriptive writing based on the detailed analysis Gifford directed.
Reviews were mixed, but we thankfully had support from our metareviewer, "I am looking forward to hopefully a strong rebuttal so I can be your advocate at the UIST 2019 PC meeting." This encouragement was exactly what we needed in that moment.
The effort was worth it, because this system led to a few other projects, and serves as a foundational paper for Jing's Ph.D. What we are still trying to figure out today is how to deploy this as a product to regular people, as the hardware requirements again posed a barrier for adoption.
The paper was hard to publish, because unlike recruited and paid participants, our 5,000 app store installs (now 7,700) led to messy data: a lot of people never opened the app, or did so only once. Reviewers were unimpressed that thousands of installs only led to a couple hundred active participants, of whom only about a fifth tracked for enough nights to get useful information.
In the end, it was a close decision but the CHI program committee decided that it could be acceptable if shepherded, "I am still leaning positive given the difficulty of the method, the importance of the topic, and complexity of the project as a whole." Being on the program committee myself that year, I wondered to another faculty member why our papers always seem to only barely get in, and they responded matter-of-factly, "all accepted papers barely get in," referring to the declining average scores at CHI over the years.
There were many nervous moments leading up to each attempt. The tool failed the first semester or two that we tried it; the server would crash or some of the data would be lost or corrupted, and we would lose our chance to get data that semester.
Submission-wise, reviewers always wanted more, so the paper itself went through multiple different narratives before being accepted in a tough revise and resubmit cycle. The mood around this revise and resubmit is best portrayed by a reviewer, "I find this to be a mostly solidly executed project, but I don't see a substantial contribution to CSCW / creativity support tools here. I am not sure if this can be fixed in a revision cycle, but if the authors are keen, ..."
Well, we were keen.
However, the papers took a while and the one led by my group was rejected throughout 2018 and 2019. The hard part was that, while we were using a fairly unique approach, I didn't have much experience writing about this topic. Reviewers would have varying opinions of what needed to change, so the text would waffle back and forth; we finally arrived at a version of the paper that was satisfying enough, and it was accepted for publication in 2020.
About 20 people were involved at different points in time, but the study didn't start until 2017 so the paper itself was about 3 years in preparation. I feel like the overall goal of computational interventions for mental health is a good one, but it feels like we've only taken a small step.
Sochiatrist: Signals of Affect in Messaging Data. Talie Massachi, Grant Fong, Varun Mathur, Sachin Pendse, Gabriela Hoefer, Jessica Fu, Chong Wang, Nikita Ramoji, Nicole Nugent, Megan Ranney, Daniel Dickstein, Michael Armey, Ellie Pavlick, Jeff Huang. CSCW 2020.
Our first attempt to publish at CHI 2020 was rejected partly due to weak findings, but Nedi had already been preparing a second study to complement what we had. While it still required major edits, CHI 2021 accepted the paper after a straightforward rebuttal. Did I finally end a long streak of struggling to publish? We considered this our easiest paper, but it still took an 8-person team nearly 30 months.
But maybe publishing faster or publishing more is not what it's about. I care more about this project as an app that people can use, so I rebuilt it in the cross-platform Flutter framework, with hopes to use it as a foundation for later work. Hopefully the papers are just a milestone towards people making their lives better through self-experimentation.
After reviewing these notes, I'm a bit ambivalent. When I was a clueless student, I got lucky with my collaborators and acceptance decisions, which made all the difference. After becoming a professor, I had the experience, yet the papers took even longer to publish.
Part of why is the focus on systems papers, which are known to take 3-4 years, but shipping them as products has been an even longer 4+ year agenda. Worth it, sure, but we'll never be like most groups that publish 5+ papers a year.
But I also think about how my group has encountered a lot of rejection, and keeping up morale was sometimes difficult. Bad news injects doubt and discouragement into students' minds, who then have to rally the team to continue the work in hope that acceptance is just around the corner. This dissonance is hard to manage.
The other thing I noticed is that papers that get the most citations later often got poor reviews or multiple rejections. They're usually about a new phenomenon, but the novelty can always be recast as "old thing, but just on the web" or "mostly engineering work, with so-so results". With this in mind, I should probably be generous in my own interpretation of what's novel when I review papers.
In retrospect, I am grateful to many key collaborators for their extra help in those times, and that some conferences like UIST accepted papers despite some obvious flaw, because those papers ended up defining long-term research programs for multiple young researchers.
Thanks to Alexandra Papoutsaki, Bo Lu, Gifford Cheung, Tongyu Zhou, and Zainab Iftikhar for their comments on earlier drafts.