Support for online docs
ndayishimiyeeric opened this issue
Error when processing a document hosted by an online file provider (uploadthing)
To reproduce
- Set up a file-hosting provider (uploadthing)
- After the upload, process the file using the unstructured-js-client SDK
Error
⨯ Error: ENOENT: no such file or directory, open 'https://utfs.io/f/"key".pdf'
at Object.openSync (node:fs:581:18)
at Object.readFileSync (node:fs:457:35)
at processFile (./src/data/files.ts:35:52)
at async handler (./src/actions/file/upload/index.ts:42:51)
code
// Fails with ENOENT: fs.readFileSync treats the URL string as a local file path
const fsData = fs.readFileSync(url);
usClient.general
  .partition({
    files: {
      content: fsData,
      fileName: url,
    },
  })
  .then((res: PartitionResponse) => {
    if (res.statusCode === 200) {
      console.log("res", res);
      return res;
    }
  })
  .catch((err) => {
    console.log("err", err);
  });
Other options tried
- Loading the file with the LangChain blob loader, then passing the loaded content as the file content. This gives a TypeScript error:
Type 'string' is not assignable to type 'Uint8Array'.
Is there a way to read a hosted file?
readFileSync returns buffer data, I guess, so fetch your URL's contents as buffer data instead of using readFileSync.
See this answer:
https://stackoverflow.com/a/55665383/5748537
If you need to get a file from the web, you need to use the http/https API (specifically request or a similar library) to read the contents of the file or URL you want.
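A minimal sketch of that suggestion, assuming Node 18+ (where fetch is built in): download the hosted file and wrap its bytes in a Uint8Array, which is the type the TypeScript error above asks for. The helper names fetchFileForPartition and fileNameFromUrl and the "document" fallback name are hypothetical. The returned object has the same { content, fileName } shape as the files object in the partition call above, but check that shape against your unstructured-client version.

```typescript
// Hypothetical helper: read a hosted file into memory instead of using fs.readFileSync.
// Assumes Node 18+, where fetch is available globally.
async function fetchFileForPartition(
  url: string,
): Promise<{ content: Uint8Array; fileName: string }> {
  const res = await fetch(url);
  if (!res.ok) {
    throw new Error(`Download failed: ${res.status} ${res.statusText}`);
  }
  // arrayBuffer() returns the raw bytes; wrap them as a Uint8Array
  const content = new Uint8Array(await res.arrayBuffer());
  return { content, fileName: fileNameFromUrl(url) };
}

// Use the last URL path segment as the file name (e.g. "abc.pdf"),
// falling back to a placeholder when the path has no final segment.
function fileNameFromUrl(url: string, fallback = "document"): string {
  const last = new URL(url).pathname.split("/").pop();
  return last ? decodeURIComponent(last) : fallback;
}
```

Usage, with the client from the original snippet: const { content, fileName } = await fetchFileForPartition(url); then pass files: { content, fileName } to partition. Keeping the bytes in memory avoids touching the filesystem at all.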
Thanks @hiepxanh
I've found a stable solution using writeFile and unlink from fs/promises:
code snippet
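import axios from "axios";
import { writeFile, unlink } from "fs/promises";
// Older LangChain.js path; newer releases export it from @langchain/community
import { UnstructuredLoader } from "langchain/document_loaders/fs/unstructured";

// Download the hosted file as raw bytes instead of reading it with fs
const response = await axios.get(url, {
  responseType: "arraybuffer",
});
const randomName = Math.random().toString(36).substring(7);
const tmpPath = `/tmp/${randomName}.pdf`;
// Write the downloaded bytes (response.data), not the whole response object
await writeFile(tmpPath, Buffer.from(response.data));
const loader = new UnstructuredLoader(tmpPath, {
  // loader options for the LangChain UnstructuredLoader
});
const documents = await loader.load();
await unlink(tmpPath);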
Great, I think this is a good solution <3
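When a loader is given a file path, as in the UnstructuredLoader call in the temp-file workaround, the workaround can be wrapped so the temporary file is always removed, even when loading throws. A minimal sketch using only Node's standard library; the helper name withTempFile and the "upload-" prefix are hypothetical. It uses mkdtemp under os.tmpdir() rather than a hard-coded /tmp path with a random name, which avoids name collisions between concurrent uploads and also works outside Unix-like systems.

```typescript
import { mkdtemp, writeFile, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

// Hypothetical helper: write bytes into a fresh temp directory, call `fn` with the
// file path, and always delete the directory afterwards (even if `fn` throws).
async function withTempFile<T>(
  bytes: Uint8Array,
  fileName: string,
  fn: (path: string) => Promise<T>,
): Promise<T> {
  // mkdtemp appends random characters, so concurrent calls get distinct directories
  const dir = await mkdtemp(join(tmpdir(), "upload-"));
  const filePath = join(dir, fileName);
  try {
    await writeFile(filePath, bytes);
    return await fn(filePath);
  } finally {
    // recursive + force: remove the file and its directory, no error if already gone
    await rm(dir, { recursive: true, force: true });
  }
}
```

For example: const documents = await withTempFile(bytes, "file.pdf", (p) => new UnstructuredLoader(p, { /* options */ }).load()); The try/finally replaces the separate unlink call, so a failed load no longer leaves files behind.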