obi1kenobi / trustfall

A query engine for any combination of data sources. Query your files and APIs as if they were databases!

Reusable adapter for (local) file search

JulianWgs opened this issue · comments

Hi,

To learn Trustfall, I've implemented a small adapter for searching through local files. I'm very pleased with the result: these kinds of searches could previously only be done with one-off scripts, but now they're just a Trustfall query away.

This is my schema:

schema {
    query: RootSchemaQuery
}

type RootSchemaQuery {
    Path(path: String!): Path!
}

interface Path {
    path: String!

}

type Folder implements Path {
    path: String!
    children: [Path!]
}

type LocalGitRepository implements Folder & Path {
    path: String!
    children: [Path!]
    branches: [Branch!]
    remotes: [Remote]
}

type Branch {
   name: String!
}

type Remote {
   name: String!
   url: String
}

interface File implements Path {
    path: String!
    size: Int!
    extension: String!
}

type Textfile implements File & Path {
    path: String!
    size: Int!
    extension: String!
    text: String!
}

type Imagefile implements File & Path {
    path: String!
    size: Int!
    extension: String!
    height: Int!
    model: String
}

type PdfFile implements File & Path {
    path: String!
    size: Int!
    extension: String!
    text: String
    n_pages: Int
    pages: [PdfPage]
}

type PdfPage {
   page_number: Int!
   page_name: String!
   text: String
}
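The post does not show how the adapter decides which schema type a given path gets, so the following is only a minimal sketch of one way it might be done. The function name `schema_type`, the extension sets, and the "a folder with a `.git` directory is a repository" rule are all assumptions for illustration, not taken from the actual adapter.

```python
from pathlib import Path

# Hypothetical extension sets; the post does not say how file types are detected.
TEXT_EXTENSIONS = {".txt", ".md", ".py", ".rs"}
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg"}


def schema_type(path: Path) -> str:
    """Return the name of the most specific schema type for `path`."""
    if path.is_dir():
        # Assumption: a folder that contains a `.git` directory is a repository.
        return "LocalGitRepository" if (path / ".git").is_dir() else "Folder"
    suffix = path.suffix.lower()
    if suffix == ".pdf":
        return "PdfFile"
    if suffix in IMAGE_EXTENSIONS:
        return "Imagefile"
    if suffix in TEXT_EXTENSIONS:
        return "Textfile"
    # `File` is an interface in the schema, so a real adapter would need a
    # concrete fallback type; "File" here only means "nothing more specific".
    return "File"
```

The type name returned here is what a type coercion such as `... on PdfFile` would be checked against in a query.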

Here are some simple example queries using this schema (argument values omitted).

List all images that were created by Stable Diffusion:

query {
    Path(path: "/home/julian/Downloads/") {
        ... on Folder {
            children {
                ... on Imagefile {
                   path @output
                   height @output
                   model @output @filter(op: "regex", value:["$model"])
                }
            }
        }
    }
}

List all local-only repositories (those without a remote) that have a branch named master:

query {
    Path(path: "/home/julian/Documents/Coding") {
        ... on Folder {
            children @recurse(depth: 2) {
                ... on LocalGitRepository {
                   path @output
                   branches {
                       name @output @filter(op: "=", value: ["$branch_name"])
                   }
                   remotes @fold @transform(op: "count") @filter(op: "=", value: ["$zero"]) 
                }
            }
        }
    }
}
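For comparison, here is roughly what the repository query computes, written as the kind of one-off script it replaces. This is a sketch under several assumptions: branches are read from loose refs in `.git/refs/heads` (packed refs are ignored), remotes are detected from `[remote "..."]` sections in `.git/config`, and `@recurse(depth: 2)` is taken to cover the starting folder plus two levels below it. All function names are hypothetical.

```python
import re
from pathlib import Path
from typing import Iterator

# Matches section headers such as: [remote "origin"]
REMOTE_SECTION = re.compile(r'^\s*\[remote "([^"]+)"\]', re.MULTILINE)


def walk(folder: Path, depth: int) -> Iterator[Path]:
    """Yield `folder` itself, then its subfolders down to `depth` levels below."""
    yield folder
    if depth > 0:
        for child in sorted(folder.iterdir()):
            if child.is_dir() and child.name != ".git":
                yield from walk(child, depth - 1)


def branch_names(repo: Path) -> list[str]:
    """List branches stored as loose refs (packed refs are not considered)."""
    heads = repo / ".git" / "refs" / "heads"
    if not heads.is_dir():
        return []
    return sorted(p.relative_to(heads).as_posix() for p in heads.rglob("*") if p.is_file())


def remote_names(repo: Path) -> list[str]:
    """List remote names declared in the repository's config file."""
    config = repo / ".git" / "config"
    if not config.is_file():
        return []
    return REMOTE_SECTION.findall(config.read_text())


def local_only_repos(root: Path, branch_name: str, depth: int = 2) -> list[str]:
    """Paths of repositories that have `branch_name` and no remotes."""
    return [
        str(folder)
        for folder in walk(root, depth)
        if (folder / ".git").is_dir()
        and branch_name in branch_names(folder)
        and len(remote_names(folder)) == 0  # mirrors the folded count = $zero filter
    ]
```

Calling `local_only_repos(Path("/home/julian/Documents/Coding"), "master")` would correspond to the query above with `branch_name` set to `"master"` and `zero` set to `0`.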

Search through all PDF files in a folder recursively and list the matching page numbers:

query {
    Path(path: "/home/julian/Downloads/") {
        ... on Folder {
            children @recurse(depth: 1) {
                ... on PdfFile {
                   path @output
                   pages {
                       text @filter(op: "has_substring", value: ["$search_string"])
                       page_number @output
                       page_name  @output
                   }
                }
            }
        }
    }
}
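Each of the queries above refers to variables such as `$model`, `$branch_name`, `$zero`, and `$search_string`, whose values are passed alongside the query as arguments. The sketch below shows what those argument mappings might look like and how a query's variables can be checked against them before running it. The concrete values, the helper names, and the choice to treat unused arguments as an error are assumptions for illustration.

```python
import re

# A `$` followed by an identifier, as used inside the `value: [...]` lists.
VARIABLE = re.compile(r"\$([A-Za-z_][A-Za-z0-9_]*)")


def variables_used(query: str) -> set[str]:
    """Collect the names of all `$variables` that appear in the query text."""
    return set(VARIABLE.findall(query))


def check_arguments(query: str, args: dict[str, object]) -> None:
    """Raise if the arguments do not exactly cover the query's variables."""
    used = variables_used(query)
    missing = used - set(args)
    unused = set(args) - used
    if missing or unused:
        raise ValueError(f"missing: {sorted(missing)}, unused: {sorted(unused)}")


# Fragment of the repository query from the post.
REPO_QUERY = """
branches {
    name @output @filter(op: "=", value: ["$branch_name"])
}
remotes @fold @transform(op: "count") @filter(op: "=", value: ["$zero"])
"""

# Hypothetical argument values for the three example queries.
image_args = {"model": "stable.diffusion"}
repo_args = {"branch_name": "master", "zero": 0}
pdf_args = {"search_string": "invoice"}

check_arguments(REPO_QUERY, repo_args)  # passes without raising
```

Keeping values out of the query text means the same query can be reused with a different branch name or search string without editing it.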

With this schema I've tried to create a recognizable and reusable type system. I've seen some of your queries that search for files (I couldn't find them again), but they contained function names rather than technical terms (a Path can be either a file or a directory; a directory can be a git repo). How do you like it? Do you think this is a good starting point?

One future goal of this schema is to have an abstraction over versions of a file. When searching through a git repo, multiple versions of a file exist. The same is true for cloud storage solutions like Dropbox or filesystems like BTRFS. It would be really cool to search through all versions of a file with one query.

Where should these community adapters go? Do you want me to create a pull request for this repo or to create them in another repo? Can I use the name trustfall in the repo name?

I've currently implemented the adapter in Python, but I'm in the process of converting it to Rust. The end goal is to create a powerful desktop application for searching the filesystem.

Best regards & thanks for all the great work you've done.
Julian

This is great, and is a great example of exactly what Trustfall exists to make easy and pleasant! Thanks for sharing!

Logistically, it'll probably be easiest to keep community adapters in their own repos. If you send me a link, I'll start a "Community Adapters" section in the README and docs and I'd be happy to link to it there.

Thanks for asking about using the name. Feel free to include "trustfall" as part of the repo name, and consider using a README title like " adapter for Trustfall" ideally linking to the Trustfall repo. I'm hoping that cleanly balances between the clarity of the use case ("it's a Trustfall adapter! use it to query ") and the fact that the source code in that repo is its own thing, not affiliated with or endorsed by the Trustfall project itself. The "not affiliated" part is important to me because I'd love to avoid the sort of thing the author of curl has been dealing with: they get hate-mail from frustrated people who had a bad experience with some piece of software that uses curl under the hood.