With the Node.js Hammer, Everything Looks Like a Nail

Sunday, September 08, 2019

American psychologist Abraham Maslow is noted for what is often called the law of the instrument and his famous quote, "I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail."

This quote is frequently used today in a variety of contexts to speak to the natural tendency to rely (often too heavily) on a familiar tool to solve (often unfamiliar) problems. Of all of the technologies I've worked with, I've seen this "everything is a nail" mindset most often in the realm of Node.js.

To be clear, I am a JavaScript fan (in case you didn't know). Its ubiquity, however, naturally lends itself (and Node in particular) to being used in cases where it's clearly not the best option. The oft-cited example of "when not to use Node" is a CPU-bound task that is computationally heavy. In my experience, there are many other cases where Node isn't the best option.

Recently, I was presented with a "rush" project that had to be done quickly – and in Node: a console application that needed to call a third-party API via HTTP anywhere from hundreds to potentially tens of thousands of times (per run) and log the results to SQL Server. Sounds simple, right?

The exact use case was fairly specific: one API call served as a "handshake" which had to occur first, and the response was then used in a number of subsequent calls (potentially hundreds), which could occur in any order (and in parallel). Only once all of the responses had been received could the results then be logged to the database, as we first had to query the database for duplicate results (and thus wait for those responses as well).

The entire process above had to occur effectively as a single, atomic "chunk" due to sequencing requirements, but multiple such chunks could be done in parallel. With the API being the primary bottleneck (responses typically occurring on the order of 2-3 seconds each), Node sounded like a perfect fit – until it wasn't.
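The shape of one such chunk can be sketched roughly as follows. This is only an illustration of the sequencing described above: handshake, callApi, findExisting and saveResults are hypothetical stand-ins (the real API and database calls are not shown in this post), and a Set plays the part of the database.

```javascript
// Hypothetical stand-ins for the third-party API and the database.
async function handshake(chunkId) {
  return { token: `token-${chunkId}` };
}

async function callApi(session, item) {
  return { item, token: session.token };
}

async function findExisting(db, results) {
  // "db" is a Set of already-logged items in this sketch.
  return results.filter((r) => db.has(r.item));
}

async function saveResults(db, rows) {
  rows.forEach((r) => db.add(r.item));
  return rows.length;
}

// One atomic chunk: each step waits for the previous one to finish.
async function runChunk(db, chunkId, items) {
  // 1. The handshake has to complete before anything else.
  const session = await handshake(chunkId);

  // 2. The follow-up calls can run in any order and in parallel.
  const results = await Promise.all(items.map((item) => callApi(session, item)));

  // 3. Only once every response is in, check the database for duplicates.
  const existing = await findExisting(db, results);
  const seen = new Set(existing.map((r) => r.item));

  // 4. Log whatever is new and report how many rows were written.
  const fresh = results.filter((r) => !seen.has(r.item));
  return saveResults(db, fresh);
}
```

Several chunks could then be started together with something like Promise.all(chunks.map((c) => runChunk(db, c.id, c.items))), which is where the trouble described next comes in.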

The first trick was trying to make an asynchronous operation with a bunch of synchronous sub-steps (each consisting of potentially hundreds of async sub-sub-steps) actually work correctly. I was using the request module (everything wrapped with promises and using async/await), yet the first thing that I ran into was that requests were timing out. Unlike a case where one could simply put a few promises into an array and call Promise.all(), this was much more involved. Finally, I worked around this by adjusting the socket count allocated to the request library, which seemed to help things. (As a side note, Node's great ecosystem sometimes serves as one of its challenges: should I have been using a different library, such as request-promise or one of its several forks? You often don't know if a particular library is going to work until it doesn't work...)
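For reference, here is a minimal sketch of what a socket cap looks like using Node's built-in https.Agent. It is not the exact change made in this project: the number 25 and the buildRequestOptions helper are assumptions for illustration, and the request module had its own options for supplying an agent or pool settings.

```javascript
const https = require("https");

// Hypothetical cap; the actual value used in the project is not given here.
const MAX_SOCKETS = 25;

// One shared agent means one shared socket budget for every call that uses it.
const agent = new https.Agent({
  keepAlive: true,         // reuse connections instead of reopening them
  maxSockets: MAX_SOCKETS, // upper bound on concurrent sockets per host
});

// Hypothetical helper: builds options suitable for https.request().
function buildRequestOptions(url) {
  const { hostname, pathname, search } = new URL(url);
  return { hostname, path: pathname + search, method: "GET", agent };
}
```

Requests beyond the cap wait in the agent's queue until a socket frees up, so the cap trades some throughput for fewer simultaneous connections.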

Next, as the process ran, additional network errors popped up. After some more digging, I ended up at another Stack Overflow post and found a comment about DNS lookups and Node's thread pool. The suggested fix was to adjust the value of process.env.UV_THREADPOOL_SIZE, which I'd never even heard of until then. That fix (along with a few others suggested) seemed to help, so I marched on.
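As a rough sketch of that setting: libuv's thread pool defaults to four threads, and dns.lookup() calls run on it, so many simultaneous lookups can queue up behind each other. The value 16 below is an assumption for illustration, not the number used in this project.

```javascript
// Assumed value for illustration only.
const DESIRED_POOL_SIZE = 16;

// Leave an explicit setting from the launching environment alone.
if (!process.env.UV_THREADPOOL_SIZE) {
  process.env.UV_THREADPOOL_SIZE = String(DESIRED_POOL_SIZE);
}

// Do this at the very top of the entry file, before any work
// (DNS lookups, file system calls, crypto) reaches the thread pool.
const dns = require("dns");
```

Setting the variable in the environment that launches the process (for example, UV_THREADPOOL_SIZE=16 node app.js, where app.js is a placeholder name) is generally the safer route, since it does not depend on the assignment running before the pool is first used.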

Eventually, I got things to a point where trying to run even a few of the atomic chunks in parallel caused the API to slow down (throttling, I'm assuming) and ultimately start to reject requests. This complicated things because, as you may recall, each atomic chunk contained a variable number of sub-requests which could run async – but all had to complete prior to other synchronous tasks being run within that chunk. To make matters worse, if the number of concurrent async sub-requests was too large, the API would start rejecting requests. This was getting painful.
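One common way to handle this kind of limit is to cap how many sub-requests are in flight at once while still waiting for all of them before moving on. The sketch below is a generic pattern, not the code from this project: mapWithLimit and fakeApiCall are hypothetical names, and the fake call simply waits 10 ms while recording how many calls overlap.

```javascript
// Run worker(item) for every item, with at most `limit` calls in flight.
// Resolves with the results in the original order once all have finished.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0;

  // Each runner keeps claiming the next unprocessed index until none remain.
  async function runner() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i], i);
    }
  }

  const runnerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: runnerCount }, runner));
  return results;
}

// Hypothetical stand-in for an API call; tracks peak concurrency.
let inFlight = 0;
let peakInFlight = 0;

async function fakeApiCall(id) {
  inFlight++;
  peakInFlight = Math.max(peakInFlight, inFlight);
  await new Promise((resolve) => setTimeout(resolve, 10));
  inFlight--;
  return id * 2;
}
```

Calling mapWithLimit(ids, 3, fakeApiCall) never has more than three calls outstanding. Note that if one call rejects, the returned promise rejects right away while the other runners carry on in the background, so real code would likely want to catch errors inside the worker.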

Finally, I got things working, only to then find that even after all the API requests and database calls had completed, trying to close the mssql connection pool would throw an exception – "Connection is closed". Failing to close the connection pool, however, resulted in the Node process never exiting. It looked like – despite every request having received a response and every response having been committed to the database – there were still unresolved promises floating around somewhere from what I could tell. Ultimately, the only solution that worked here was waiting for 5-10 seconds prior to closing the connection pool, which seemed to work consistently and without error (but felt kludgy nonetheless).
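An alternative to a fixed delay, assuming the cause really was database work still pending at close time (which was never confirmed here), is to keep a handle on every write and close the pool only after all of them have settled. The sketch below uses a FakePool class as a stand-in rather than the real mssql API, and writeAllThenClose is a hypothetical helper.

```javascript
// Stand-in for a connection pool: queries take 10 ms and fail if the
// pool has been closed in the meantime.
class FakePool {
  constructor() {
    this.closed = false;
    this.active = 0;
  }

  async query(statement) {
    if (this.closed) throw new Error("Connection is closed");
    this.active++;
    await new Promise((resolve) => setTimeout(resolve, 10));
    this.active--;
    if (this.closed) throw new Error("Connection is closed");
    return statement;
  }

  async close() {
    this.closed = true;
  }
}

// Start every write, wait for all of them to settle, then close the pool.
async function writeAllThenClose(pool, statements) {
  const pending = statements.map((s) => pool.query(s));
  try {
    const outcomes = await Promise.allSettled(pending);
    return outcomes.filter((o) => o.status === "fulfilled").length;
  } finally {
    // Runs whether or not any write failed, so the process can exit.
    await pool.close();
  }
}
```

The point of the pattern is that nothing is left un-awaited: any promise that is started but never collected is exactly the kind of thing that can still be running when the pool goes away.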

I was done – and I was glad to be done, because the entire effort felt like an uphill battle against Node.

The entire exercise kind of bothered me on principle alone – I felt like I'd used the wrong tool for the job. Now, call it cosmic guidance or call it masochism, but for whatever reason, I then opted to write the exact same application in Python, which to me felt better suited to the task at hand.

The Python version ended up with half as many lines of code as the Node version, and it took me less than a third of the time to write. Now granted, some of the time saved was the result of not having to re-learn the nuances of the third-party API, but I'm confident that had I written the Python version first, it would have taken (at most) half the time that the Node version took me.

When all was said and done, I delivered both the Node and Python versions of the application, and I explained why I'd written the Python version. Ultimately, I was told, "Oh, we only asked for it to be written in Node because that's what we're using elsewhere. Python is fine, too."

The takeaway here is two-fold:

1. In my experience, it's easy to get a developer to agree that "the right tool should be used for the job"; it's sometimes harder, however, to get that same developer to agree that their "favorite hammer" may not be the right tool for the job.
2. Even in light of #1, rely on your developers to choose the right tools for the job – don't simply hand them a hammer unless there's a compelling reason to do so.