SarthakMakhija / goselect

SQL like 'select' interface for files

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

goselect

Platform Build Status
Ubuntu latest GoSelectCI-ubuntu-latest
macOS 11 GoSelectCI-macos-11
macOS 12 GoSelectCI-macos-12
Windows latest GoSelectCI-windows-latest

codecov

goselect provides SQL like 'select' interface for files. This means one can execute a select query like:

select name, path, size from . where or(like(name, result.*), eq(isdir, true)) order by 3 desc

to get the file name, file path and size of all the files that are either directories or their names begin with result. The result will be ordered by size in descending order.

Why goselect

The project goselect was created to understand the following:

🔹 Parsing. The parsing pipeline typically involves a lexer, a parser and an AST.

🔹 Recursive descent parser

🔹 Representation of functions in the code. Functions like lower, upper and trim take a single parameter, functions like now and currentDate take zero parameters whereas functions like concat take a variable number of parameters.

🔹 Execution of scalar functions like lower, upper and substr. These functions are stateless and run for each row. They may take a parameter and return a value for each row. These functions can also involve nesting. For example, lower(substr(ext, 1)).

🔹 Execution of aggregation functions like average, min, max and countDistinct. These functions run over a collection of rows and return a final value. These functions can also involve nesting. For example, average(count()).

🔹 Execution of nested scalar and aggregation functions like countDistinct(lower(name)). Here, the function lower(name) runs for each row whereas countDistinct runs over a collection of rows.

Overview

Example queries

- select * from .
- select * from . limit 10
- select name, size from . where gt(size, 1024)
- select name, size from . where gt(size, 1024) order by 2 desc
- select name, size from . where gt(size, 1024) order by 2 
- select name, size from . where gt(size, 1024) order by 2 limit 10
- select name, size from . where or(like(name, result.*), eq(isdir, true)) order by 2 limit 10
- select name, size from ~/Documents where or(like(name, result.*), eq(isdir, true)) order by 2 limit 10

Feature overview

goselect provides various features including:

  1. Support for attribute aliases. For example, filename is same as fname
  2. Support for function aliases. For example, lower is same as low
  3. Support for various string scalar functions like lower, upper, concat, substr etc
  4. Support for various comparison scalar functions eq, le, lt, ge, gt etc
  5. Support for various date based scalar functions now, extract, parsedatetime, daysdifference etc
  6. Support for various composite scalar functions or, and, not etc
  7. Support for various mime type scalar functions isText, isPdf, isImage etc
  8. Support for various aggregate functions like count, countdistinct, average etc
  9. Support for exporting the results in table, json and html format
  10. Support for performing select in nested directories
  11. Support for skipping directories like .git & .github
  12. Support for executing queries with aliases. For example, goselect ex -q='select name from .' --createAlias=ls -n=false will save the query along with its alias in a text file in the current directory. In order to execute the query using an alias, run goselect ex --useAlias=ls -n=false
  13. Support for predefined query aliases

Differences between SQL select and goselect

Features that are different from SQL:

  1. goselect does not support 'group by'. All the aggregating functions return results that repeat for each row
  2. goselect does not support expressions like 1+2 or 1*2. Instead, goselect gives functions like 'add' and 'mul' etc to write such expressions
  3. goselect does not support expressions like name=sample.log in 'where' clause. Instead, various functions are given to represent such expressions. These functions include: eq, ne, lt, ge etc
  4. goselect has a weak grammar. For example, a query like:
select 1+2, name from /home/projects

will ignore 1+2 and return file names

  1. goselect's 'order by' clause supports only attribute positions. For example, a query like:
select name, size from /home/projects order by 1

will order the results by the first attribute name

Supported platforms

  • goselect has been tested on macOS Big Sur 11.4, macOS Monterey 12.6, Ubuntu 20.0.3 and Windows 10

Installation

MacOS

  1. Download the current release

    • wget -o - https://github.com/SarthakMakhija/goselect/releases/download/v0.0.7/goselect_0.0.7_Darwin_all.tar.gz
  2. Unzip the release in a directory

    • mkdir goselect && tar xvf goselect_0.0.7_Darwin_all.tar.gz -C goselect

Linux AMD64

  1. Download the current release

    • wget -o - https://github.com/SarthakMakhija/goselect/releases/download/v0.0.7/goselect_0.0.7_Linux_x86_64.tar.gz
  2. Unzip the release in a directory

    • mkdir goselect && tar xvf goselect_0.0.7_Linux_x86_64.tar.gz -C goselect

Windows AMD64

  1. Create a directory

    • mkdir goselect
  2. Switch to the newly created directory

    • cd goselect
  3. Download the current release

    • curl https://github.com/SarthakMakhija/goselect/releases/download/v0.0.7/goselect_0.0.7_windows_x86_64.tar.gz -o goselect.tar.gz
  4. Unzip the release in the current directory

    • tar -xvzf .\goselect.tar.gz

Changelog

Version 0.0.7

  1. Support for Windows operating system
  2. Introduction of saving query with aliases. For example, goselect ex -q='select name from .' --createAlias=ls -n=false will save the query along with its alias in a text file in the current directory. In order to execute the query using an alias, run goselect ex --useAlias=ls -n=false
  3. Introduction of listQueryAliases command. For example, goselect listQueryAliases will return a list of all the saved query aliases
  4. Support for predefined query aliases. Run goselect listQueryAliases to get a list of all the query aliases

Version 0.0.6

  1. Support for single and double quotes. For example, goselect ex -q='select name, extension from . where eq(name, "hello world.txt")'
  2. Support for groups in regular expressions. For example, goselect ex -q='select name, extension from . where like(name, "^([0-9]{4})-[0-9]{2}-[0-9]{2}-.*\.txt$")'
  3. Addition of a new function isArchive

Version 0.0.5

  1. Addition of a new function parseSize makes the comparison based on file size easier. Use, select * from . where gt(size, parseSize(15 Mb))
  2. Addition of 't' as a shorthand for the flag 'term' in describe command
  3. Addition of new flags 'minWidth' and 'maxWidth' for controlling the width of the attributes while exporting as table. Usage: goselect ex -q='select name, ext, size, fmtsize(size), isdir from .' --minWidth=10 --maxWidth=50, will put a minimum of 10 characters and maximum of 50 characters in each row.
  4. Table export does not capitalize the headers
  5. Removal of 'sorted' flag from listAttributes, listFunctions, listWhereClauseFunctions and listTimeFormats commands

Version 0.0.4

  1. Inference of input values as numeric, floating point, string and boolean
  2. Addition of functions: isImage, isText, isPdf, isAudio, isVideo to where clause
  3. Improvement in the performance of the function like
  4. Introduction of new functions: startsWith, endsWith and ifBlank
  5. Introduction of version command

Version 0.0.3

  1. Adoption of consistent file size units. File size is now reported in IEC units: (B, KiB, MiB, GiB, TiB, PiB, EiB)
  2. No extension is reported for hidden files
  3. * projection now returns name, extension, size and absolute path
  4. fmtsize is provided as a function that can be used to format the file size
  5. hsize (or human-readable size) is dropped from attribute list in favor of the function fmtsize. Use fmtsize(size) to get the formatted size

Version 0.0.2

  1. Support for getting the mime type from a file
  2. Support for various functions like isPdf, isVideo, isText, isAudio and isImage that work on the mime type

Example Queries for mime type

1. select name, size, mime from ~/Documents/ where eq(isImage(mime), true)  order by 2 desc limit 10
2. select name, size, mime from ~/Documents/ where eq(isPdf(mime), true)  order by 2 desc limit 10

Commands

Command Description Available since Example usage
listAttributes List all the attributes supported by goselect v0.0.1 goselect listAttributes
listFunctions List all the functions supported by goselect v0.0.1 goselect listFunctions
listWhereClauseFunctions List all the functions supported by goselect in 'where' clause v0.0.1 goselect listWhereClauseFunctions
listTimeFormats List the date/time formats supported by goselect v0.0.1 goselect listTimeFormats
describe Describe an attribute or a function v0.0.1 goselect describe --term=lower
execute Execute a select query v0.0.1 goselect execute -q='select * from .'
version Return the current version of goselect v0.0.4 goselect version
listQueryAliases List all the saved query aliases v0.0.7 goselect listQueryAliases

Queries in detail

Projections

  1. Use * as projection
goselect ex -q='select * from . '
  1. Select name, size, extension, absolute path using aliases
goselect ex -q='select name, size, ext, abspath from . '
  1. Select just the file name without its extension
goselect ex -q='select basename, extension from . '
  1. Determine if a file is a directory or not
goselect ex -q='select name, isdir, isfile from .'
  1. Select created time, access time and modified time of all the files
goselect ex -q='select name, ctime, atime, mtime from .'
  1. Select the mime type of all the files
goselect ex -q='select name, mime from .'
  1. Select file name in upper case and format the file size
goselect ex -q='select upper(name), size, formatsize(size) from .'
  1. Select the length of all the file names
goselect ex -q='select name, len(name) from .'
  1. Select the length of all the file extensions without including . in the length
goselect ex -q='select name, ext, len(substr(ext, 1)) from .'
  1. Select name and extension of all the files, but return NA if there is no extension
goselect ex -q='select name, ifblank(ext, NA) from .'
  1. Select trimmed file names
goselect ex -q='select trim(name), ltrim(name), rtrim(name) from .'
  1. Concat the basename, extension and the formatted file size
goselect ex -q='select concat(basename, ext, fmtsize(size)) from .'
  1. Concat the basename, extension and the formatted file size with a separator @
goselect ex -q='select concatWs(basename, ext, fmtsize(size), @) from .'
  1. Determine if a file name contains the string project
goselect ex -q='select name, contains(name, project) from .'
  1. Replace all the dots (.) in file names with a hash (#)
goselect ex -q='select name, replaceall(name, ., #) from .'
  1. Determine if a file name starts with a string go
goselect ex -q='select name, startswith(name, go) from .'
  1. Determine if a file name ends with a string go
goselect ex -q='select name, endswith(name, go) from .'
  1. Select the date from the access time of the files
goselect ex -q='select name, atime, extract(atime, date) from .'
  1. Select current month, current year, current date, day of the week and current day
goselect ex -q='select cmonth(), cyear(), cdate(), dayofweek(), cday() from . limit 1'
  1. Select file name and determine if the user can read, write and execute a file
goselect ex -q='select name, uread, uwrite, uexecute from .'

Order by and limit

  1. Order the results by size in descending order
goselect ex -q='select name, size, ext, abspath from . order by 2 desc'
  1. Order the results by size in descending order and limit the total number of results to 5
goselect ex -q='select name, size, ext, abspath from . order by 2 desc limit 5'

Aggregate functions

  1. Count all the entries in the current directory
goselect ex -q='select count() from .'
  1. Count distinct file extensions
goselect ex -q='select countDistinct(ext) from .'
  1. Select the sum of size of all the files and also return the average file size
goselect ex -q='select sum(size), avg(size) from .'
  1. Select the sum of size of all the files and also return the average file size, but format the sizes
goselect ex -q='select fmtsize(sum(size)), fmtsize(avg(size)) from .'
  1. Select the minimum and the maximum file size
goselect ex -q='select min(size), max(size) from .'
  1. Select the minimum and the maximum length of all the file names
goselect ex -q='select min(len(name)), max(len(name)) from .'

Where clause

  1. Select file name and extension of all the files containing the string go in their name
goselect ex -q='select name, extension from . where contains(name, go)'
  1. Select file name and extension of all the files if their names begin with READ, followed by numbers, followed by any number of characters
goselect ex -q='select name, extension from . where like(name, ^READ[0-9].*)'
  1. Select file name and extension of all the files if their names start with READ
goselect ex -q='select name, extension from . where startswith(name, READ)'
  1. Select file name and extension of all the files if their names end with go
goselect ex -q='select name, extension from . where endswith(name, go)'
  1. Select file name and extension of all the text files
goselect ex -q='select name, extension from . where isText(mime)'
  1. Select file name and extension of all the image files
goselect ex -q='select name, extension from . where isImage(mime)'
  1. Select file name and extension of all the audio files
goselect ex -q='select name, extension from . where isAudio(mime)'
  1. Select file name and extension of all the video files
goselect ex -q='select name, extension from . where isVideo(mime)'
  1. Select file name and extension of all the pdf files
goselect ex -q='select name, extension from . where isPdf(mime)'
  1. Select file name and extension of all the archive files (Version 0.0.6)
goselect ex -q='select name, extension from . where isArchive(mime)'
  1. Select file name, extension and size of all the files if their size is greater than 1 Mib
goselect ex -q='select name, extension, size, fmtsize(size) from . where gt(size, mul(1, 1024, 1024))'

1 Mib = 1 * 1024 * 1024 bytes

Or

goselect ex -q='select name, extension, size, fmtsize(size) from . where gt(size, 1048576)'

Or (Version 0.0.5)

goselect ex -q='select name, extension, size, fmtsize(size) from . where gt(size, parseSize(1 Mib))'
  1. Select file name, extension and size of all the files if their size is greater than or equal to 1 Mib
goselect ex -q='select name, extension, size, fmtsize(size) from . where gte(size, mul(1, 1024, 1024))'

Or

goselect ex -q='select name, extension, size, fmtsize(size) from . where gte(size, 1048576)'

Or (Version 0.0.5)

goselect ex -q='select name, extension, size, fmtsize(size) from . where gte(size, parseSize(1 Mib))'
  1. Select file name, extension and size of all the files if their size is less than 1 Mib
goselect ex -q='select name, extension, size, fmtsize(size) from . where lt(size, mul(1, 1024, 1024))'

Or

goselect ex -q='select name, extension, size, fmtsize(size) from . where lt(size, 1048576)'

Or (Version 0.0.5)

goselect ex -q='select name, extension, size, fmtsize(size) from . where lt(size, parseSize(1 Mib))'
  1. Select file name, extension and size of all the files if their size is less than or equal to 1 Mib
goselect ex -q='select name, extension, size, fmtsize(size) from . where lte(size, mul(1, 1024, 1024))'

Or

goselect ex -q='select name, extension, size, fmtsize(size) from . where lte(size, 1048576)'

Or (Version 0.0.5)

goselect ex -q='select name, extension, size, fmtsize(size) from . where lte(size, parseSize(1 Mib))'
  1. Select file name, extension and size of all the files if their size equal to 600 bytes
goselect ex -q='select name, extension, size, fmtsize(size) from . where eq(size, 600)'

Or (Version 0.0.5)

goselect ex -q='select name, extension, size, fmtsize(size) from . where eq(size, parseSize(600 B))'
  1. Select file name, extension and size of all the files if their extension is equal to .go
goselect ex -q='select name, extension, size, fmtsize(size) from . where eq(ext, .go)'
  1. Select file name, extension and size of all the files if their extension is not equal to .go
goselect ex -q='select name, extension, size, fmtsize(size) from . where ne(ext, .go)'
  1. Select file name and extension of all the text files if their names start with lowercase test
goselect ex -q='select name, extension from . where and(isText(mime), startsWith(lower(name), test))'
  1. Select file name of all the hidden files and directories
goselect ex -q='select name from . where or(eq(isDir, true), eq(isHidden, true))'

Or

goselect ex -q='select name from . where or(isDir, isHidden)'
  1. Select count of all the hidden files
goselect ex -q='select count(), name from . where eq(isHidden, true)'
  1. Select count of all the pdf files
goselect ex -q='select count() from . where eq(isPdf(mime), true)'
  1. Select sum of size of all the hidden files
goselect ex -q='select fmtsize(sum(size)), size from . where eq(isHidden, true)'
  1. Select all the files that are not hidden
goselect ex -q='select name, extension from . where not(isHidden, true)'

Or

goselect ex -q='select name, extension from . where eq(isHidden, false)'
  1. Select basename and extension of all the files with .log as the extension and their basename should have a date format: Year-Month-Date (for example, 2022-09-28)
goselect ex -q='select basename, ext from . where and(eq(ext, .log), like(basename, [0-9]{4}-[0-9]{2}-[0-9]{2}))'
  1. Select file name and extension of all the files that were accessed 2 or more days earlier
goselect ex -q='select name, ext, atime from . where gte(daysdiff(atime), 2)'

Default value of second parameter in daysdiff is now()
  1. Select file name and extension of all the files that were modified on or after 2022-09-22
goselect ex -q='select name, ext, mtime from . where gt(mtime, parseDateTime(2022-09-22, dt))'

Here, parseDateTime is given a date without timezone, that means while comparing mtime and the input value, timezone may play role. 
'2022-09-22' will have UTC as the timezone that might not be same the timezone of mtime.   

Using quotes in queries (Version 0.0.6)

  1. Select size and the formatted size of the file that has hello world.txt as the name
goselect ex -q='select size, formatSize(size) from . where eq(name, "hello world.txt")'
  1. Select size and the formatted size of the file that has hello world.txt as the name (query begins with double quotes)
goselect ex -q="select size, formatSize(size) from . where eq(name, 'hello world.txt')"
  1. Select size and the formatted size of the file that has "hello world.txt" as the name (file name contains double quotes)
goselect ex -q='select size, formatSize(size) from . where eq(name, \"hello world.txt\")'
  1. Select size and the formatted size of the file that has "hello world.txt" as the name (query begins with double quotes and the file name contains double quotes)
goselect ex -q="select size, formatSize(size) from . where eq(name, \\\"hello world.txt\\\")"

FAQs

  1. How do I get a list of all the supported attributes?

Use goselect listAttributes or goselect attrs to get all the supported attributes.

Usage

goselect listAttributes 
goselect attrs 
  1. How do I get a list of all the supported functions?

Use goselect listFunctions or goselect fns to get all the supported functions.

Usage

goselect listFunctions 
goselect fns 
  1. How do I get the description or the purpose of a function or an attribute?

Let's say that we want to get the description of fname. In order to do that, use goselect describe --term=fname. term flag can be given an attribute name or a function name.

Usage

goselect describe --term=fname 
goselect desc --term=path 
goselect desc --term=lower 
  1. Are there any functions to parse an input in date/time type?

Yes, goselect supports a function parsedatetime to parse an input string in time type. It takes 2 parameters, the first parameter is a string to be parsed and the second is the format identifier.

  1. How do I get the supported date/time formats and their identifiers?

Use goselect listTimeFormats or goselect fmts to get all the supported date/time formats along with their identifiers.

Usage

goselect listTimeFormats 
goselect fmts 
  1. Are the select queries case-sensitive?

No, the select queries are case-insensitive.

  1. Are all the functions supported in where clause?

No, goselect supports a set of functions that can be used in where clause. Use goselect listWhereClauseFunctions to get a list of all the functions that can be used in where clause.

Usage

goselect listWhereClauseFunctions 
goselect wherefns 
  1. Can I get some example queries that involve or/and functions?

Usage

1. goselect execute -q='select * from . where or(like(name, result.*), eq(ishidden, true))' 
2. goselect execute -q='select * from . where and(like(name, result.*), eq(isdir, false))' 
  • The first query returns all the files where name is like result.* and the file is a hidden file.
  • The second query returns all the files where name is like result.* and the file is not a directory.
  1. Can I get some example queries that involve date/time functions?

Usage

1. goselect execute -q='select atime, extract(atime, date) from .' 
2. goselect execute -q='select atime, extract(atime, date) from . where gt(daysdifference(atime), 2)'  
  • The first query extracts the date from the access time of a file
  • The second query returns all the files where difference (in days) between today and the access time of a file is greater than 2
  1. Can I get some example queries that aggregate functions?

Usage

1. goselect execute -q='select count(), min(len(name)) from .' 
2. goselect execute -q='select countdistinct(ext) from .'  
  • The first query returns the count of all the entries in the source directory along with the minimum file length
  • The second query returns all the distinct file extensions in the source directory
  1. What is a query alias (Version 0.0.7)?

Query alias allows saving and executing your favorite queries with an alternate name. For example, select name, extension, size from . where and(isPdf(mime), gt(size, parseSize(1 Mib))) query returns all the pdfs with size greater than 1Mib. Using query alias, you can save the query with an alternate name and execute it later using the saved name.

Usage

goselect ex -q='select name, extension, size from . where and(isPdf(mime), gt(size, parseSize(1 Mib)))' --createAlias=allPdfsGt1Mib
goselect ex --useAlias=allPdfsGt1Mib 
  1. Are the execution options like -n=false also saved with query alias?

No, query alias only saves the query and its alias

  1. Where is the query alias saved?

Query alias is saved in a human-readable file in the current directory.

  1. How do I get a list of all the saved query aliases?

Use listQueryAliases command with goselect

Usage

goselect listQueryAliases
  1. Are there any predefined query aliases?

Yes, goselect provides a set of predefined query aliases. Some examples include:

Alias Query
lsCurrent select * from .
lsCurrentOrderedBySizeDesc select name, ext, size, abspath from . order by 3 desc
lsCurrentFormattedSize select name, ext, size, fmtSize(size), abspath from . order by 3 desc
fileWithMaxSizeInCurrent select name, size, fmtSize(size) from . order by 2 desc limit 1

All the supported features

  • Support for where clause
    - [X] select * from /home/apps where eq(add(2,3), 5)
    - [X] select * from /home/apps where eq(lower(ext), .log)
    - [X] select * from /home/apps where ne(lower(ext), .log)
    - [X] `where` clause supports functions for comparison like `eq`, `le`, `lt`, `ge`, `gt`, `ne`, `contains`, `or`, `and`, `not` etc
  • Support for projections
    • projections with attribute name: name, size
    • projections with alias in attribute name: fName instead of name
    • projections with scalar functions: contains, lower
    • projections with alias in scalar functions: low instead of lower
    • projections with aggregate functions: min, max
    • projections with equivalent of expressions like add(1, 2)
  • Support for order by clause
    • order by with positions: order by 1
    • order by in descending order: order by 1 desc
    • order by in optional ascending order: order by 1 asc
  • Support for limit clause
    • limit clause with a value: limit 10
  • Support for various functions Functions
  • Support for formatting the results
    • Json formatter
    • Html formatter
    • Table formatter
  • Support for saving and executing queries using query alias
  • Support for exporting the formatted result
    • Console
    • File

Unsupported ideas

goselect does not support the following:

  1. Use of parentheses inside a function call. One can not use parentheses inside a function call. This means the following queries are treated invalid:
select * from . where eq(add(2, 3), (4))

select * from . where lt(size, (add(2, 3)))

If a file name contains parentheses, then one could use single or double quotes in the query to match such a file name.

select * from . where eq(name, "hello (1).txt")
  1. Use of expressions in order by. goselect supports attribute positions in order by. This means the following queries are treated invalid:
select name, extension from . order by name

select name, extension from . order by lower(name)

Screenshots

  • Limit clause Limit clause

  • Where clause Where clause

  • Export as json Json

  • Export as json to a file Json

  • Query alias Query alias

Planned changes

  1. Support for checking if a (text) file contains a specific term
  2. Caching the expression results. This is useful for cases like select lower(name) from . where eq(lower(name), sample). In this example, lower(name) need not be evaluated twice for a row
  3. Support for concurrent execution and streaming the results as soon as available. Will not work for order by and aggregate functions. It is applicable for queries that involve scalar functions without order by. It might make sense to use this feature where the number of files is too many, say more than 0.1 million
  4. Support installation using brew, apt, yum
  5. Support for between scalar function

About

SQL like 'select' interface for files

License:GNU General Public License v2.0


Languages

Language:Go 100.0%