Quote:
Originally Posted by Grue
If/when we fix it I'll write up something but yeah. Still have some ideas to try tomorrow but I'm starting to think it's not a problem in our application at all but a server thing or something to do with Docker. Right now we do a rolling restart of all 6 of our nodes every 2 hours.
The biggest surprise was when I realized our "server.js" express file was requiring express, http-proxy, and body-parser without them being dependencies in package.json! Which I wasn't even aware could happen. I was really hoping that pinning them to the versions from a build of our app that worked (i.e. from checking out an old version and looking at yarn.lock) would fix it, but no luck.
~5-8 other engineers working on it for 2 days straight too. I thought the web was easy in 2018 -_-
How many requests/hour are causing this to happen, and have you narrowed it down to a particular request? By "restart our node" I'm guessing you mean your VM? I assume you've tried DTrace?
As far as the dependencies go, the only thing I can think of is that some other dependency pulled those in and your require statement is finding them there (which only started happening when npm flattened the dependencies). Although I can't imagine what would require express. Maybe a testing framework? I think superagent might. Are you running npm install --production, and do you have your devDependencies split out?
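If it helps, here's a rough sketch of how you could check for that kind of undeclared require. The function names (packageNameOf, findUndeclared) are just made up for illustration, and it only catches plain require('x') calls with a string literal, so treat it as a starting point rather than a real linter. It only looks at "dependencies", on the assumption that devDependencies won't be installed in a --production build.

```javascript
const { builtinModules } = require('module');

// Hypothetical helper: reduce a require() specifier to its package name.
// Returns null for relative/absolute paths and for Node built-ins.
function packageNameOf(specifier) {
  if (specifier.startsWith('.') || specifier.startsWith('/')) return null;
  if (specifier.startsWith('node:') || builtinModules.includes(specifier)) return null;
  const parts = specifier.split('/');
  // Scoped packages look like @scope/name, so keep the first two segments.
  return specifier.startsWith('@') ? parts.slice(0, 2).join('/') : parts[0];
}

// Hypothetical helper: list packages that `source` requires but that
// `pkg` (a parsed package.json object) does not list under "dependencies".
function findUndeclared(source, pkg) {
  const declared = new Set(Object.keys(pkg.dependencies || {}));
  const undeclared = new Set();
  const re = /require\(\s*['"]([^'"]+)['"]\s*\)/g;
  let match;
  while ((match = re.exec(source)) !== null) {
    const name = packageNameOf(match[1]);
    if (name && !declared.has(name)) undeclared.add(name);
  }
  return [...undeclared].sort();
}

// Possible usage, assuming server.js and package.json sit in the current directory:
//   const fs = require('fs');
//   const source = fs.readFileSync('server.js', 'utf8');
//   const pkg = JSON.parse(fs.readFileSync('package.json', 'utf8'));
//   console.log(findUndeclared(source, pkg));
```

For anything it flags, calling require.resolve('express') from server.js's directory will print the path Node actually loads, which should tell you whether it's a hoisted copy in the top-level node_modules.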
In my first real programming job we had the weirdest bug where about once every couple of weeks, one of the 4 servers would stop returning content for one of the reports. No error anywhere, just an empty report. Once it got like that you had to reboot WebLogic. Had to be some kind of rare deadlock or race condition. So it was basically impossible to replicate, and anything you tried had a two-week turnaround before you got feedback. We upgraded WebLogic, Java, Oracle. No help.
It bugged the **** out of me that we never figured it out. It was literally down to rewriting the entire backend as the only remaining option. After I left I talked to one of the devs a few years later. They said yeah - it's still doing it. Arrghghhhh. Hurts me just to think about now.
Maybe now with modern load-testing tools we could find a way to replicate it.
Last edited by suzzer99; 11-20-2018 at 07:32 PM.