Quote:
Originally Posted by RustyBrooks
Hm. I figure every endpoint is going to need a database connection, so you open one the first time the lambda needs one, and keep it open. Whatever request the lambda serves uses that connection. Connections are essentially anonymous, the "next" request doesn't need to use the same connection.
I guess it could be a problem if you have a *lot* of open lambdas? But most modern databases are OK with you holding open thousands of connections.
Yes, essentially that is the issue. Say Lambdas 1-100 land on Server 1 and, for simplicity, open 5 connections to the DB. Any further requests that come into Server 1 reuse those already-opened connections, i.e., they spend no time opening a new DB connection. The problem is that eventually Server 1 is no longer partitioned for your lambdas. Again for simplicity, say Lambdas 101-200 come in on Server 2, which now opens 5 more connections. Server 1 may never be used by you again (it still might be; it is entirely unpredictable, and there is no known way to target a particular server, you get them assigned seemingly at random). You now have 5 connections on Server 1 twisting in the wind: they never get an explicit call to shut down, so they stay open until a timeout is reached.*
So yes, now imagine that at the scale of a lot of requests. And again, you don't really know when or why the next lambda may decide to cold start on a new server. They try to keep you on a hot server, but it is effectively random because you are essentially borrowing empty server time.
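The cached-connection pattern being described looks roughly like this. A minimal sketch: `handler`, `get_connection`, and the module-level cache are my illustrative names, and sqlite3 stands in for a real RDS driver like pymysql just to keep it runnable.

```python
import sqlite3

# Module-level cache: survives across invocations while this container
# ("server" in the post's terms) stays warm. A cold start on a new
# container gets a fresh, empty cache and opens its own connections,
# leaving the old container's connections orphaned.
_connection = None

def get_connection():
    """Open the DB connection on first use, then reuse it."""
    global _connection
    if _connection is None:
        # In a real Lambda this would be e.g. a connect() call to RDS;
        # sqlite3 is used here only so the sketch is self-contained.
        _connection = sqlite3.connect(":memory:")
    return _connection

def handler(event, context):
    # Warm invocations reuse the cached connection; nothing ever
    # calls close(), which is exactly the problem described above.
    conn = get_connection()
    cur = conn.execute("SELECT 1")
    return cur.fetchone()[0]
```

Within one warm container this is a win (no per-request connect cost); the failure mode is that the platform, not you, decides when this container stops receiving traffic, and its connection just idles until the server-side timeout.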
Your point about a lot of requests is right, yes; that is when it becomes a huge issue. But there are a few problems there.
1. Even at lower request volume, you still have to pay for more connections than you are actually using. The lowest RDS plan gives you 100 connections. I have hit that limit at smaller scale, so I would have to pay for more even though the actual traffic to my site using the database doesn't justify it.
2. It is harder to manage a system with random open connections out in the wild. Even assuming you had unlimited connections, at some point you will probably want to debug just what the heck each connection is doing.**
3. The entire point of serverless is how easily it handles scaling, especially bursts of traffic. This connection behavior is the opposite. If I pay for 1000 connections, a burst of traffic can leave a lot of them orphaned and unusable, and I may still get connection errors. And I would wager that is a tougher class of scaling problem to debug.
I would also point out that in a microservice architecture, we have a ton of services living on 5-20 connections each, so problem 1 gets exacerbated pretty quickly. And in a large monolithic service with 1000s of connections, serverless doesn't really feel like a congruent use case anyway. So to me, there isn't really a correct API use case for serverless systems that rely on a DB connection pool.***
* You can set the timeout to a very short limit (we used 5 seconds in many cases), but that can cause errors or false warnings, so I am not really sure what is best there.
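For reference, on MySQL-flavored databases that server-side idle limit is the `wait_timeout` variable; setting it to the 5 seconds mentioned above would look something like this (a config sketch, not a recommendation):

```sql
-- Close connections that sit idle longer than 5 seconds.
-- Trade-off: clients that were legitimately idle will see
-- "server has gone away"-style errors on their next query.
SET GLOBAL wait_timeout = 5;
```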
** This is a weaker argument but I think it is a fair one. Visibility is a big deal
*** The way connection pooling interacts with server hopping in general pushes you toward "connectionless" database systems, which Amazon obviously supplies with DynamoDB and Elasticsearch cloud. These manage connections internally, but you talk to them from anywhere over HTTP.
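The appeal of the HTTP-style services is that every request is self-contained, so there is no pooled socket for a dying Lambda container to orphan. A toy illustration of that shape; everything here is hypothetical (`ConnectionlessStore`, `put_item`, `get_item` are stand-ins, where the real equivalent would be e.g. boto3 making signed HTTPS requests to DynamoDB):

```python
# Hypothetical "connectionless" client: each call is an independent
# request/response, like an HTTPS call to a managed service. Nothing
# long-lived is held by the caller between calls, so a container can
# vanish at any time without leaking a connection.
class ConnectionlessStore:
    def __init__(self):
        # Stands in for the managed service's own storage.
        self._items = {}

    def put_item(self, key, value):
        # Real equivalent: one signed HTTPS request, then done.
        self._items[key] = value

    def get_item(self, key):
        # Another independent request; no session state assumed.
        return self._items.get(key)

store = ConnectionlessStore()
store.put_item("user#1", {"name": "PJo336"})
```

The design point is simply that the service, not the caller, owns connection management; from the Lambda's perspective every invocation is stateless with respect to the database.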
Sorry for my lack of clarity, I'm a nerdy introverted programmer
Last edited by PJo336; 08-13-2018 at 11:53 PM.
Reason: ETA: I did the lambda jazz in 2017, maybe they have made things better or google cloud works better, idk