*.tsx files are classified as XML
kachkaev opened this issue · comments
Steps to reproduce:
mkdir test-dir
cd test-dir
echo "export default function () { return <div>hello</div>; }" > ./MyComponent.jsx
echo "export default function () { return <div>hello</div>; }" > ./MyComponent.tsx
npx linguist-js@2.0.0 --analyze .
Current output:
Analysed 112 B from 2 files with linguist-js
Language analysis results:
1. XML 50.00% 56 B
2. JavaScript 50.00% 56 B
Total: 112 B
Expected output:
Analysed 112 B from 2 files with linguist-js
Language analysis results:
1. TypeScript 50.00% 56 B
2. JavaScript 50.00% 56 B
Total: 112 B
If we create empty files with
touch ./MyComponent.jsx
touch ./MyComponent.tsx
the output is:
Analysed 0 B from 2 files with linguist-js
Language analysis results:
1. XML NaN% 0 B
2. JavaScript NaN% 0 B
Total: 0 B
"tsx" is classified as both a TS and XML filename in GitHub-linguist's languages file.
Running linguist-js, the intermediary file classification is ["TypeScript", "XML"]. The program then runs through its heuristics etc., but they didn't find anything specific to one of the languages. In that situation it just defaults to the last language in the aforementioned list, so XML.
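The fallback described above can be sketched roughly as follows. This is only an illustration, not the linguist-js source: `Heuristic`, `pickLanguage`, and the React-import pattern are hypothetical names and values chosen for the example.

```typescript
// Hypothetical sketch of "try heuristics, else take the last candidate".
// None of these names come from linguist-js itself.
type Heuristic = { language: string; pattern: RegExp };

function pickLanguage(candidates: string[], content: string, heuristics: Heuristic[]): string {
  for (const h of heuristics) {
    // Only heuristics for languages that share this extension are relevant.
    if (candidates.indexOf(h.language) !== -1 && h.pattern.test(content)) {
      return h.language;
    }
  }
  // Nothing matched: default to the last language in the candidate list.
  return candidates[candidates.length - 1];
}

const tsxCandidates = ["TypeScript", "XML"];
const tsxHeuristics: Heuristic[] = [
  // Assumed pattern: a React import at the very start of the content.
  { language: "TypeScript", pattern: /^\s*import\s.+\sfrom\s+['"]react/ },
];

const exportOnly = "export default function () { return <div>hello</div>; }";
console.log(pickLanguage(tsxCandidates, exportOnly, tsxHeuristics)); // "XML"
console.log(pickLanguage(tsxCandidates, "", tsxHeuristics)); // "XML"
```

Under these assumptions, both the export-only file and the empty file end up as XML simply because XML is last in the list.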
In the first case it falls through to the heuristics, which match with the following:
As you can see, the heuristics only check for import keywords, not export, so the check misses and it falls back to picking XML out of the list by default.
In the second case there's no file content to match, so it just picks XML from the list by default as well. That can't be fixed at all. (The NaN percentage can be, though...)
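The NaN in the empty-file output comes from dividing 0 bytes by a 0-byte total. A minimal sketch of a guard, assuming a hypothetical `formatShare` helper (not a linguist-js function):

```typescript
// Hypothetical helper: format one language's share of the total size.
function formatShare(bytes: number, totalBytes: number): string {
  // 0 / 0 is NaN in JavaScript, so treat an empty total as 0%.
  const percent = totalBytes === 0 ? 0 : (bytes / totalBytes) * 100;
  return percent.toFixed(2) + "%";
}

console.log(formatShare(56, 112)); // "50.00%"
console.log(formatShare(0, 0)); // "0.00%"
console.log(((0 / 0) * 100).toFixed(2) + "%"); // "NaN%", the unguarded result
```

Whether an empty total should print as 0% or something else is a design choice; the sketch only shows where the NaN comes from.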
I see, the case with an empty file makes sense. What’s odd is that I have quite a few real files importing React which are still classified as XML. Here’s an MVE:
mkdir test-dir
cd test-dir
cat <<EOF > MyComponent1.tsx
import * as React from "react";
export default function () { return <div>hello</div>; }
EOF
cat <<EOF > MyComponent2.tsx
import React from "react";
export default function () { return <div>hello</div>; }
EOF
cat <<EOF > MyComponent3.tsx
const react = require("react");
export default function () { return <div>hello</div>; }
EOF
npx linguist-js@2.0.0 --analyze .
Analysed 256 B from 3 files with linguist-js
Language analysis results:
1. XML 100.00% 256 B
Total: 256 B
In that case it is a problem. I'll check whether the heuristics are being applied properly.
The heuristics don't actually check for CommonJS imports, only TS-native imports (see https://regexr.com/659pr), so even when this is fixed your MyComponent3 example won't be matched by the heuristic.
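To illustrate the difference between the two import styles, here is a small sketch. The pattern below is an assumption made for this example; the actual expression is the one behind the regexr link.

```typescript
// Assumed pattern (illustrative only): an ES-style import of React.
const esReactImport = /^\s*import\s.+\sfrom\s+['"]react/;

const firstLines: Array<[string, string]> = [
  ["MyComponent1.tsx", 'import * as React from "react";'],
  ["MyComponent2.tsx", 'import React from "react";'],
  ["MyComponent3.tsx", 'const react = require("react");'],
];

firstLines.forEach(([file, line]) => {
  // The CommonJS require() line has no leading `import`, so it never matches.
  console.log(file, esReactImport.test(line));
});
// MyComponent1.tsx true
// MyComponent2.tsx true
// MyComponent3.tsx false
```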
Fixed in 2.0.2 👍
I can confirm that I start seeing "TSX" for *.tsx files in 2.0.2.
There are still instances of XML though, and it seems to have to do with the location of the React import. The heuristics only apply to the first line.
mkdir test-dir
cd test-dir
cat <<EOF > MyComponent1.tsx
import * as React from "react";
export default function () { return <div>hello</div>; }
EOF
cat <<EOF > MyComponent2.tsx
import foo from "bar";
import * as React from "react";
export default function () { return <div>hello</div>; }
EOF
npx linguist-js@2.0.2 --analyze .
Analysed 199 B from 2 files with linguist-js
Language analysis results:
1. XML 55.78% 111 B
2. TSX 44.22% 88 B
Shall we reopen this issue or is it better to create a new one?
"The heuristics only applies to the first line."
That's a simple fix: I wasn't applying the /m flag to the RegEx when it contained ^.
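For reference, this is how `^` behaves with and without the `m` flag in JavaScript regular expressions. The pattern is a simplified stand-in chosen for the example, not the one linguist-js uses.

```typescript
// The second repro file: the React import is on line 2, not line 1.
const secondFile = [
  'import foo from "bar";',
  'import * as React from "react";',
  "export default function () { return <div>hello</div>; }",
].join("\n");

// Simplified stand-in pattern (an assumption for this example).
const withoutMultiline = /^import .* from "react"/;
const withMultiline = /^import .* from "react"/m;

// Without /m, ^ only matches at the very start of the string.
console.log(withoutMultiline.test(secondFile)); // false
// With /m, ^ also matches at the start of each line.
console.log(withMultiline.test(secondFile)); // true
```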
Fixed now, though it will now classify TSX as TypeScript unless --childLanguages is set