Software Agility or Facebook, Did You Break My App Again?

Last Thursday night, Bacon heard from a top3Clicks user that our app was broken. The user provided some specific repro cases. We checked them out, and sure enough the user was correct. Our app on Facebook was actually unusable, a true high-priority situation.

Thanks to our easy roll-forward/roll-back deployment mechanisms, I was able to quickly run through the last 4 builds, diff all the JS (in the actual cloud and in SVN) and determine that *we* didn’t break anything.

So I started searching the FB dev forums.

A few minutes later, I found it: an FB deployment on Thursday contained a critical bug. The bug was disastrous for our app and many others because it broke any use of JavaScript event listeners. Other developers posted repro cases, eventually getting down to this simple case:

"Just wanted to add that simply calling purgeEventListeners on an element is enough to notice that it doesn't work"

A case this simple is ample evidence that Facebook is doing a poor job of unit testing its changes. There are a number of ways to unit test JavaScript in general, including JsUnit.
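To give a feel for how small such a test can be, here is a minimal sketch of a regression test for the "purge, then fire" case. Everything in it is an assumption for illustration: FakeElement is a hypothetical stand-in I made up, not Facebook's implementation, and assertEqual is a tiny hand-rolled helper rather than JsUnit's API. Only the method name purgeEventListeners comes from the forum post.

```javascript
// Hypothetical stand-in for a wrapped DOM element. NOT Facebook's code;
// it only mimics the add/purge/fire behavior the forum repro relies on.
class FakeElement {
  constructor() {
    this.listeners = new Map(); // event type -> array of handlers
  }

  addEventListener(type, handler) {
    if (!this.listeners.has(type)) {
      this.listeners.set(type, []);
    }
    this.listeners.get(type).push(handler);
  }

  // Drop every handler registered for the given event type.
  purgeEventListeners(type) {
    this.listeners.delete(type);
  }

  // Invoke all handlers for the type; return how many were called.
  fire(type) {
    const handlers = (this.listeners.get(type) || []).slice();
    handlers.forEach((handler) => handler({ type }));
    return handlers.length;
  }
}

// Hand-rolled assertion helper (an assumption, not a JsUnit function).
function assertEqual(actual, expected, message) {
  if (actual !== expected) {
    throw new Error(`${message}: expected ${expected}, got ${actual}`);
  }
}

// The regression test: a purged listener must never fire again.
function testPurgeRemovesListeners() {
  const el = new FakeElement();
  let clicks = 0;
  el.addEventListener("click", () => { clicks += 1; });

  el.fire("click");
  assertEqual(clicks, 1, "listener fires before purge");

  el.purgeEventListeners("click");
  el.fire("click");
  assertEqual(clicks, 1, "listener must not fire after purge");
}

// Purging one event type must leave other types alone.
function testPurgeLeavesOtherTypes() {
  const el = new FakeElement();
  let hovers = 0;
  el.addEventListener("click", () => {});
  el.addEventListener("mouseover", () => { hovers += 1; });

  el.purgeEventListeners("click");
  el.fire("mouseover");
  assertEqual(hovers, 1, "other event types survive the purge");
}

testPurgeRemovesListeners();
testPurgeLeavesOtherTypes();
```

Against a real platform you would swap FakeElement for the actual element wrapper and run the same two checks before every deployment. The point is that the forum's repro case fits in a couple dozen lines of test code.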

In this day and age, there is no excuse for not requiring developers to create and maintain an adequate level of unit tests. In fact, the absence of unit tests is a reason *to not ship*. Quick iterative cycles are a challenge, but they should not lead to a culture where having your end users function as your QA is standard operating procedure. There is a wealth of information available on how to be nimble while producing high-quality output.

If anyone has links to FB presentations about developer quality processes or metrics, please send them my way.

Update:
The fix is now broken again. Guess they rolled back the change. Sheesh