dsherret / ts-morph

TypeScript Compiler API wrapper for static analysis and programmatic code changes.

Home Page:https://ts-morph.com

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

perf(Node.js): Major performance bottleneck in Node.js due to fs.statSync

maxpatiiuk opened this issue · comments

Describe the bug

Version: 20.0.0

ts-morph calls RealFileSystemHost.fileExistsSync and RealFileSystemHost.directoryExistsSync very often.

The current implementation of those functions relies on calling fs.statSync and try/catching the error if file doesn't exist.

However, due to the current nature of Node.js (nodejs/performance#40, nodejs/performance#39), creating exceptions is very expensive.

In my case, the codemod takes a total of 40s to run, of which 22s was spent in fs.statSync creating exceptions:

Screenshot 2023-12-01 at 08 24 32

Solutions:

  • Node.js has fs.existsSync() function that allows for checking for file existence without throwing exceptions
  • Node.js's fs.statSync() has a throwIfNoEntry argument (fs.statSync(file, { throwIfNoEntry: false });), which makes it return undefined rather than throw when file does not exist.

of course, the situation is a bit more complicated by the fact that node.js file system is abstracted away behind an adapter in ts-morph, and adapter has to be as small as possible. As a solution, I would suggest adding existsSync to the adapter, as an optional function (for backwards compatibility), and implementing it using node's existsSync - but in cases when custom adapter was used and fs.existsSync wasn't provided, fallback to current behavior with fs.statSync.


There is an excellent article on Node.js performance that describes the exact issue ts-morph is experiencing - https://marvinh.dev/blog/speeding-up-javascript-ecosystem-part-2/#:~:text=The%20cost%20of%20capturing%20stack%20traces


For node.js projects, until this bug is fixed, the workaround looks like this:

import { Project } from "ts-morph";
import { existsSync, statSync } from "node:fs";

const project = new Project();
const fileSystem = project.getFileSystem();
fileSystem.fileExistsSync = existsSync;
fileSystem.directoryExistsSync = (file) => {
  const stat = statSync(file, { throwIfNoEntry: false });
  return stat?.isDirectory() ?? false;
};

To Reproduce

By the nature of the performance issue, it requires a larger codebase to reproduce noticeably.

import { Project } from "ts-morph";

const project = new Project();
const sourceFile = project.addSourceFileAtPath("someFileWithImportsAndTypeScriptErrors.ts");
// This function in particular causes a lot of fs.existsSync calls:
sourceFile.getPreEmitDiagnostics();

Expected behavior

Checking for file existence should not be a major bottleneck.

Less critical, but out of 40s, 2s was spent in this function, thus if possible it should be optimized better too:

/**
* Gets the standardized absolute path.
* @param fileSystem - File system.
* @param fileOrDirPath - Path to standardize.
* @param relativeBase - Base path to be relative from.
*/
static getStandardizedAbsolutePath(fileSystem: FileSystemHost, fileOrDirPath: string, relativeBase?: string): StandardizedFilePath {
return FileUtils.standardizeSlashes(path.normalize(getAbsolutePath())) as StandardizedFilePath;
function getAbsolutePath() {
if (isAbsolutePath(fileOrDirPath))
return fileOrDirPath;
if (!fileOrDirPath.startsWith("./") && relativeBase != null)
return path.join(relativeBase, fileOrDirPath);
return path.join(fileSystem.getCurrentDirectory(), fileOrDirPath);
}
}

The fact that this function creates another function inside of it is a possible culprit

Screenshot 2023-12-01 at 09 11 04

For reference, this is how I captured and inspected the performance:

# I am running Node.js like this:
node --cpu-prof --cpu-prof-dir ~/Downloads/node --no-warnings=ExperimentalWarning --loader @swc-node/register/esm ./myScript.ts
# But if you are dealing with .js file, this would work too:
node --cpu-prof --cpu-prof-dir ~/Downloads/node ./myScript.js
# (The performance issue is independent of using a TypeScript loader)

That creates several .cpuprofile files in the ~/Downloads/node directory. Pick the largest among those, and open it in https://www.speedscope.app/. In that tool, use the "Sanditch" view, and sort by "Self" time to see functions which are good targets for further optimization.

@maxxxxxdlp thanks for benchmarking it! Feel free to submit a PR. I don't have time to spend on performance optimizations unfortunately (though I'll probably get to this eventually).

Actually, I'm going to consolidate the deno code to use the node code and in the meantime fix this perf issue.

Can confirm the performance issue with statSync/captureLargerStackTrace is resolved in the most recent release. Thanks for the quick fix!