nestjs / typeorm

TypeORM module for Nest framework (node.js) 🍇

Home Page: https://nestjs.com

AlreadyHasActiveConnectionError in serverless (AWS Lambda) environment

IRCraziestTaxi opened this issue

I'm submitting a...


[ ] Regression 
[X] Bug report
[ ] Feature request
[ ] Documentation issue or request
[ ] Support request => Please do not submit support request here, instead post your question on Stack Overflow.

Current behavior

Even with keepConnectionAlive: true, AlreadyHasActiveConnectionError occasionally occurs in AWS Lambda.

Expected behavior

There needs to be a way to check for and reuse an existing TypeORM connection when bootstrapping the app. Currently, TypeOrmModule.forRoot[Async] always creates a new connection and there is no way to override that behavior, so we cannot check for and use an existing connection without abandoning TypeOrmModule, which means losing a lot of useful features like repository injection.

Minimal reproduction of the problem with instructions

I created this minimal repro, which is a heavily simplified version of the serverless API we are deploying. The bootstrapping code and deployment method are the same, but the repro contains only one entity and one repository, injected into a single service, whereas our production app has several repositories injected into several services.

As a result, it takes many more concurrent and repeated requests to trigger the AlreadyHasActiveConnectionError in the API deployed from the above repro than it does in our production app (is each repository creating a connection or something?), but I was eventually able to trigger it. In our production app it occurs fairly regularly despite far fewer and less frequent requests (as opposed to brute-force hammering it with a script). In fact, this app isn't even in "production" yet; it is still only in our test environment with few users.

What is the motivation / use case for changing the behavior?

We should be able to mitigate AlreadyHasActiveConnectionErrors when running in a serverless/Lambda environment; if keepConnectionAlive (which is barely documented at all and was not even explained in #61) cannot do that, then my first thought is we need a way to intercept/override TypeOrmModule.forRoot[Async]'s connection creation behavior by creating/returning one ourselves in a useFactory or useClass method. That would allow us to ensure TypeOrmModule does not create a new connection since we would be responsible for it, which seems to be necessary in cases such as this.

For what it's worth, I am aware we can do something similar to this - I have, in fact, done something similar in the past with an API that was not built using a framework as robust as Nest - but doing so means not using TypeOrmModule.forRoot[Async], which means we cannot use TypeOrmModule.forFeature and thus lose injection of repositories, which seems to me to be most of the entire point of using @nestjs/typeorm.

Environment


Nest version: 7.6.0

 
For Tooling issues:
- Node version: XX  
- Platform:  

Others:

"For what it's worth, I am aware we can do something similar to this"

This is exactly what we're doing internally when the keepConnectionAlive = true.

    if (options.keepConnectionAlive) {
        const connectionName = getConnectionName(options as ConnectionOptions);
        const manager = getConnectionManager();
        if (manager.has(connectionName)) {
            const connection = manager.get(connectionName);
            // Reuse the registered connection only once it is fully established.
            if (connection.isConnected) {
                return connection;
            }
        }
    }

"AlreadyHasActiveConnectionError occasionally occurs in AWS Lambda."

The keepConnectionAlive option was never really designed to cover so many scenarios; we introduced it to support HMR (for development purposes), which is why you may encounter such issues. My guess is that, with many concurrent requests, there is a small probability of a race condition in which the connection object has been created but the actual connection to the database is not yet established (manager.has(connectionName) returns true but connection.isConnected is false). To address this, we could add another flag to control this condition here:

if (connection.isConnected) {
(basically, skip the "isConnected" check and assume the connection will be established eventually).
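The race described above can be reproduced in miniature without a database. In the sketch below, `FakeConnection`, `FakeConnectionManager`, `getOrCreate`, and `simulateRace` are hypothetical names written for illustration, not TypeORM or @nestjs/typeorm APIs. The fakes model only the `has`/`get`/`isConnected` check from the keepConnectionAlive snippet, plus an assumed rule that creating a connection whose name is already registered and connected throws, while a registered but unconnected entry is replaced. When a second bootstrap starts before the first connection is established, the check falls through and a second connection object is created under the same name.

```typescript
// Hypothetical stand-in for a TypeORM Connection: only models isConnected.
class FakeConnection {
  isConnected = false;

  constructor(readonly name: string) {}

  // Simulates the asynchronous handshake with the database server.
  async connect(delayMs: number): Promise<FakeConnection> {
    await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    this.isConnected = true;
    return this;
  }
}

// Hypothetical stand-in for TypeORM's ConnectionManager.
class FakeConnectionManager {
  private readonly connections = new Map<string, FakeConnection>();

  has(name: string): boolean {
    return this.connections.has(name);
  }

  get(name: string): FakeConnection {
    const connection = this.connections.get(name);
    if (!connection) {
      throw new Error(`No connection named "${name}"`);
    }
    return connection;
  }

  // Assumed rule: re-creating a name that is registered AND connected fails;
  // a registered but unconnected entry is simply replaced.
  create(name: string): FakeConnection {
    if (this.connections.get(name)?.isConnected) {
      throw new Error(`AlreadyHasActiveConnectionError: "${name}"`);
    }
    const connection = new FakeConnection(name);
    this.connections.set(name, connection);
    return connection;
  }
}

// The keepConnectionAlive check, applied to the fakes.
async function getOrCreate(
  manager: FakeConnectionManager,
  name: string,
  delayMs: number,
): Promise<FakeConnection> {
  if (manager.has(name)) {
    const connection = manager.get(name);
    if (connection.isConnected) {
      return connection; // reuse only a fully established connection
    }
  }
  return manager.create(name).connect(delayMs);
}

// Two bootstraps overlap: the second starts before the first has connected.
async function simulateRace() {
  const manager = new FakeConnectionManager();
  const first = getOrCreate(manager, "default", 50);
  const second = getOrCreate(manager, "default", 10); // sees isConnected === false
  const [a, b] = await Promise.all([first, second]);
  const later = await getOrCreate(manager, "default", 10); // reuses the registered one
  return { a, b, later, manager };
}

simulateRace().then(({ a, b, later }) => {
  console.log(a === b); // false: two connection objects for "default"
  console.log(later === b); // true: only the second one is still registered
});
```

In this model the first connection object ends up connected but no longer registered, so nothing will reuse it, and any later direct `create("default")` call would throw because the registered one is connected.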

If you can fork this package and check if removing the if (connection.isConnected) condition fixes the issue for your project, that would be great.

Thanks for the clarification. That's good to know.

I will see about forking this package and trying out that change.

I forked the repository and added an awaitExistingConnection option (which, when set to true, bypasses the connection.isConnected check). Testing locally (using plain old npm run start:dev) worked fine, but in Lambda every request failed.

This error occurred on every request except for one (which resulted in a different error):

RepositoryNotFoundError: No repository for "User" was found. Looks like this entity is not registered in current "default" connection?
at new RepositoryNotFoundError (/var/task/node_modules/typeorm/error/RepositoryNotFoundError.js:12:28)
...

For the sake of full disclosure, this error also usually precedes the RepositoryNotFoundError:

Error: connect ETIMEDOUT
at PoolConnection.Connection._handleConnectTimeout (/var/task/node_modules/mysql/lib/Connection.js:409:13)
...

That error, however, seems to be resolved on a retry, because afterwards the app continues bootstrapping until it hits the RepositoryNotFoundError.

This was the lone error I got, on the second request; I could not reproduce it again:

Error: Connection lost: The server closed the connection.
at Protocol.end (/var/task/node_modules/mysql/lib/protocol/Protocol.js:112:13)
...

But in all cases, requests resulted in a 502 with one of the above errors, usually the RepositoryNotFoundError. So I assume skipping the check for connection.isConnected just won't work in Lambda for some reason. :/

Do you have any other ideas to try?

This may also be worth mentioning: initially, after changing the @nestjs/typeorm entry in package.json to:

"@nestjs/typeorm": "https://github.com/IRCraziestTaxi/typeorm/tarball/dc13663200ff52749f3526b5478dd37389b8fbcf",

and adding a postinstall script:

"postinstall": "cd ./node_modules/@nestjs/typeorm && npm install && npm run build",

for some reason I was getting an import error in Lambda saying that node_modules/@nestjs/typeorm/index.js could not be found. I'm not sure why, since I believe all sls deploy is supposed to do is bundle up the node_modules that are currently present...

I was able to move past that (onto the errors in the above comment) by modifying this portion of webpack.config.js:

    externals: [nodeExternals()],

to instead be:

    externals: [
        nodeExternals({
            // allowList necessary when using packages from a github tarball rather than npm published version.
            allowlist: ['@nestjs/typeorm'],
        }),
    ],

So... it could be possible that there is some disconnect that is not allowing TypeOrmModule to maintain the connection/repositories between forRootAsync and forFeature, but I'm not sure exactly what would be causing that or what the fix would be.

Perhaps it's not an issue, but I thought it was worth mentioning in case anybody else finds something suspect with that.

My guess is that, now that you've removed the connection.isConnected check, Nest assumes the connection is ready to be used even though some repositories/entities may not have been registered yet. Since TypeOrmModule is designed to be fully asynchronous, that can unfortunately happen.

The last thing I can think of is allowing a connectionFactory configuration option through which you could fully control the connection creation process (basically, implement any error handling, retrying, or connection reuse logic you need).
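As one example of reuse logic such a factory could contain, the window between creating a connection and finishing the handshake can be closed by caching the in-flight promise per connection name, so every concurrent caller awaits the same attempt. This is a swapped-in technique (promise memoization, sometimes called single-flight), not something @nestjs/typeorm or TypeORM provides; `fakeCreateConnection` below is a hypothetical stand-in that only simulates a slow connect, and `sharedConnectionFactory` and `ConnectionLike` are illustrative names.

```typescript
// Hypothetical minimal connection shape; a real factory would return a TypeORM Connection.
interface ConnectionLike {
  readonly id: number;
  isConnected: boolean;
}

let created = 0;

// Hypothetical stand-in for a real connect call: takes a while to finish.
async function fakeCreateConnection(): Promise<ConnectionLike> {
  created += 1;
  const connection: ConnectionLike = { id: created, isConnected: false };
  await new Promise<void>((resolve) => setTimeout(resolve, 20));
  connection.isConnected = true;
  return connection;
}

// One pending or settled promise per connection name, shared by every caller.
const inFlight = new Map<string, Promise<ConnectionLike>>();

function sharedConnectionFactory(name = "default"): Promise<ConnectionLike> {
  const cached = inFlight.get(name);
  if (cached) {
    return cached; // concurrent and later callers reuse the same attempt
  }
  const pending = fakeCreateConnection();
  inFlight.set(name, pending);
  // Forget a failed attempt (only if it is still the cached one) so the next caller retries.
  pending.catch(() => {
    if (inFlight.get(name) === pending) {
      inFlight.delete(name);
    }
  });
  return pending;
}

Promise.all([
  sharedConnectionFactory(),
  sharedConnectionFactory(),
  sharedConnectionFactory(),
]).then(([a, b, c]) => {
  console.log(a === b && b === c, created); // true 1
});
```

Because the cache is module-level state, it would also survive between warm invocations in the same Lambda container. A real version would additionally need to evict a cached connection that later drops (isConnected becoming false), which this sketch does not handle.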

Thanks, I'll look it over and see how clean the solution looks.

Update: I added a connectionFactory option and tried it out. You can check out the details in this branch of my fork if you want. Changes are in typeorm-options.interface.ts and in the forRootAsync and createConnectionFactory methods in typeorm-core.module.ts.

Providing my own connectionFactory seems to have worked; I hammered the Lambda API and got no errors in my logs. However, I am bothered by the fact that what I am doing:

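            connectionFactory: async (options) => {
                const manager = getConnectionManager();
                let connection: Connection | undefined;

                // Pick up the default connection if one is already registered.
                if (manager.has('default')) {
                    connection = manager.get('default');
                }

                // Create a new connection if none is registered or the registered one is not connected.
                if (!connection?.isConnected) {
                    connection = await createConnection(options);
                }

                return connection;
            },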

is more or less what the keepConnectionAlive option already does. The only difference is that, this way, the check happens inside the retry loop in createConnectionFactory. So I wondered if the keepConnectionAlive logic needs to be moved into the retry loop as well. However, if that were the case, I would expect to see some errors, along with retry messages, even with my connectionFactory solution, but my logs contained neither: no errors and no messages about database connection retries. So at first glance I'm not sure what to make of what's going on, but for some reason it does seem to work.
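Moving the reuse check into the retry loop means it runs again on every attempt instead of once before the first one. The sketch below shows that shape with hypothetical helpers (`withRetry`, `openConnection`, `registry`, `connectionFactory`); `withRetry` only loosely mirrors the retry loop in createConnectionFactory, and the first attempt is made to fail with a simulated ETIMEDOUT to exercise the retry path.

```typescript
interface ConnectionLike {
  readonly id: number;
  isConnected: boolean;
}

// Hypothetical registry standing in for TypeORM's connection manager.
const registry = new Map<string, ConnectionLike>();
let attempts = 0;

// Hypothetical connector: the first attempt times out, later ones succeed.
async function openConnection(name: string): Promise<ConnectionLike> {
  attempts += 1;
  if (attempts === 1) {
    throw new Error("connect ETIMEDOUT");
  }
  const connection: ConnectionLike = { id: attempts, isConnected: true };
  registry.set(name, connection);
  return connection;
}

// Generic retry helper: re-runs `attempt` until it succeeds or retries run out.
async function withRetry<T>(
  attempt: () => Promise<T>,
  retryAttempts: number,
  retryDelayMs: number,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i <= retryAttempts; i++) {
    try {
      return await attempt();
    } catch (error) {
      lastError = error;
      if (i < retryAttempts) {
        await new Promise<void>((resolve) => setTimeout(resolve, retryDelayMs));
      }
    }
  }
  throw lastError;
}

// The reuse check runs on every attempt, so a connection that another caller
// finished establishing during a retry delay is picked up instead of duplicated.
function connectionFactory(name = "default"): Promise<ConnectionLike> {
  return withRetry(
    async () => {
      const existing = registry.get(name);
      if (existing?.isConnected) {
        return existing;
      }
      return openConnection(name);
    },
    3, // retry attempts
    10, // delay between attempts in ms
  );
}

connectionFactory().then(async (first) => {
  const second = await connectionFactory();
  console.log(attempts, first === second); // 2 true
});
```

Here the second call returns the registered connection without opening another one, so the attempt counter stays at two.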

"Providing my own connectionFactory seems to have worked; I hammered the Lambda API and got no errors in my logs."

🎉

"So I wondered if the keepConnectionAlive logic needs to be moved into the retry loop as well."

This shouldn't be a breaking change, so I'm fine with it 👍 Would you like to create a PR for this? We could also keep the connectionFactory configuration option you added to your fork.

Sure, I'll test out moving keepConnectionAlive into the retry loop then wrap up tests and everything for a PR.

@kamilmysliwiec I am ready to create a PR for the changes I made, but I cannot check off this checklist item yet:

Docs have been added / updated (for bug fixes / features)

I didn't see anything in the contributing guidelines about how to update the docs. Can you point me in the right direction on that? Is there a guide somewhere regarding that? And does the change for updating the docs need to be complete/merged before creating the PR for the changes to this repository?

Thanks @IRCraziestTaxi! Let's track this here nestjs/docs.nestjs.com#1958 and here #934

I'm using the latest version of @nestjs/typeorm, but the error "Connection lost: The server closed the connection." still appears. Can you help me? @kamilmysliwiec

Environment

"@nestjs/common": "^8.4.3",
"@nestjs/core": "^8.4.3",
"@nestjs/jwt": "^8.0.0",
"@nestjs/passport": "^8.2.1",
"@nestjs/platform-express": "^8.4.3",
"@nestjs/typeorm": "^9.0.0",

Please, use our Discord channel (support) for such questions. We are using GitHub to track bugs, feature requests, and potential improvements.