panzi / punktum

dotenv implementation in Rust

punktum

Yet another dotenv implementation for Rust. Just for fun. Don't use it, it won't be maintained. You may fork it if you want to.

"Punkt" is the German word for "dot" and "Umgebung" means "environment".

Work in progress!

I'm trying to implement multiple dotenv dialects with mixed success. Also, so far I don't have any dependencies and would like to keep it that way, which might be a problem for certain dialects that use complex regular expressions.

This also comes with an executable that can be used as a program starter that sets the environment of a process. Also see that section for a description of configuration environment variables that are read by both the library and the executable.

Dialects

Of course no guarantee is made that anything actually works. This is based only on my limited manual testing.

| Dialect | Status | Description |
| --- | --- | --- |
| Punktum | Works | Crazy dialect I made up. More details below. |
| PythonDotenv | Works | Compatible with the python-dotenv pypi package. |
| PythonDotenvCLI | Works | Compatible with the dotenv-cli pypi package. This is different to the above! Not sure which one is commonly used, so I'm working on implementing both. |
| ComposeGo | Works | Compatible with compose-go/dotenv as used in docker-compose. Variable substitution is not 100% compatible yet, the punktum implementation of this dialect accepts things where compose-go/dotenv errors out. |
| GoDotenv | Works | Compatible with godotenv. This seems to be a predecessor to the above. |
| RubyDotenv | Works | Compatible with the dotenv Ruby gem. The two above each claim to be compatible with this, but clearly at least one of them is wrong. NOTE: Command $() support is deliberately not implemented. I deem running programs from a .env file to be dangerous. Use a shell script if you want to do that. |
| JavaScriptDotenv | Works | Compatible with the dotenv npm package. The NodeJS dialect is meant to be the same as this, but of course isn't. |
| NodeJS | Works | Compatible with NodeJS v22's built-in --env-file=... option. The parser changed between NodeJS versions. |
| JavaDotenv | Works | Compatible with java-dotenv. Yet again subtly different. |
| Dotenvy | Not Implemented | Probably won't implement dotenvy support, since it is already a Rust crate. And it is a good dialect with a sane parser and, at a glance, comprehensive looking tests. Use that! |
| Binary | Works | Another silly dialect I made up. Records are always just KEY=VALUE\0 (i.e. null terminated, since null cannot be in environment variables anyway). It ignores any encoding setting and only uses UTF-8. |

Note that Works means parsing files the same way. There might still be differences in other behavior, like if "not override" means it still can override variables defined in the .env file, or if it is only about inherited variables.

I might not implement any more dialects than I have right now.

This feature matrix only gives a rough overview. The exact syntax for escape sequences or variable substitution, the places where either can be used, and the algorithm for resolving variable substitutions differ between dialects.

Dialect Multiline Esc Seq " ' ` $variable $(command)
Punktum +
PythonDotenv +
PythonDotenvCLI
ComposeGo +
GoDotenv
RubyDotenv ⚠️
JavaScriptDotenv \"
Sub-dialect: dotenv-expand \" +
Sub-dialect: dotenvx \" +
NodeJS
JavaDotenv
Binary

Esc Seq: \" means that quotes can be escaped so the string doesn't end, but for some reason the backslash remains in the value.

$variable: ✅ + means that some extra syntax like ${name:-default} is supported.

$(command): ⚠️ means that the way command substitution is implemented can lead to command injections. Also note that command substitution isn't implemented by punktum.

Punktum Dialect

Details might change!

If DOTENV_CONFIG_STRICT is set to false (default is true) all sorts of syntax errors are forgiven. Even if there are encoding errors, parsing resumes on the next line. While the implementations of all dialects respect this setting to some degree, only Punktum resumes decoding on the next line, since it is implemented as a line-based parser that reads one line at a time from the file.

Examples

# comment line
VAR1=BAR # comment after the value
VAR2=BAR# no need for a space before the #
VAR3="BAR" # this comment is handled correctly even though it ends with "
VAR4="BAR" "BAZ" # produces: "BAR BAZ"

WHITESPACE1=  spaces around the value are ignored  
WHITESPACE2=  but  between  the  words  spaces  are  preserved

MULTILINE="
  a multiline comment
  spanning several
  lines
  # not a comment
"

VARIABLE_SUBSTITUTIONS1="
  only in unquoted and double quoted values
  normal: $VAR1
  in braces: X${VAR1}X
  ${VAR1:?error message if \$VAR1 is empty or not set}
  default value: ${VAR1:-$OTHER}
  for more see below
"

VARIABLE_SUBSTITUTIONS2=${FOO:-
  variable substitutions
  can of course also be
  multiline, even without
  double quotes
}

ESCAPES="
  only in double quoted values
  no newline: \
  newline: \n
  carriage return: \r
  tab: \t
  backslash: \\
  dollar: \$
  unicode: \u00e4
  for more see below
"

RAW_STRING1='
  these escapes are taken verbatim: \n \t \\
'

# to write a value with single quotes and otherwise as a raw string:
RAW_STRING2='You can'"'"'t fail!'

# explicitly import variables from the parent environment:
PATH
HOME
PWD
SHELL

# only then you can re-use them
SOME_PATH=$HOME/.env

# export keywords are ignored, the line is parsed as if there were no export:
export EXPORT_IGNORED=FOO BAR

Syntax Definition

DOS line endings (\r\n) are converted to Unix line endings (\n) before parsing, but single carriage returns (\r) are left as-is.

PUNKTUM       := { { WS } [ VAR_ASSIGN | VAR_IMPORT ] { WS } [ COMMENT ] ( "\n" | EOF ) }
VAR_ASSIGN    := NAME { WS } "=" { WS } [ VALUE ]
VAR_IMPORT    := NAME
NAME          := NAME_CHAR { NAME_CHAR }
NAME_CHAR     := "a"..."z" | "A"..."Z" | "0"..."9" | "_"
VALUE         := { DOUBLE_QUOTED | SINGLE_QUOTED | UNQUOTED }
DOUBLE_QUOTED := '"' { ESCAPE_SEQ | NOT('"' | "\" | "$") | VAR_SUBST } '"'
SINGLE_QUOTED := "'" { NOT("'") } "'"
UNQUOTED      := { NOT('"' | "'" | "$" | "\n" | "#") | VAR_SUBST }
VAR_SUBST     := "$" NAME | "${" NAME [ ( ":?" | "?" | ":-" | "-" | ":+" | "+" ) VALUE ] "}"
ESCAPE_SEQ    := "\" ( "\" | '"' | "'" | "$" | "r" | "n" | "t" | "f" | "b" | "\n" ) |
                 UTF16_ESC_SEQ | UTF32_ESC_SEQ
UTF16_ESC_SEQ := "\u" HEX*4
UTF32_ESC_SEQ := "\U" HEX*6
WS            := "\t" | "\x0C" | "\r" | " "
COMMENT       := "#" { NOT("\n") }

A single name without = imports the value from the parent environment. This way you can e.g. use the punktum command with the --replace option to create a whole new environment, but still explicitly use certain environment variables from the system environment.

A value consists of a sequence of quoted and unquoted strings.

If not quoted, spaces around a value are trimmed. A comment starts with # even if it touches a word on its left side.

Both single and double quoted strings can be multiline. Variables can be referenced in unquoted and double quoted strings. Escape sequences are only evaluated inside of double quoted strings. (Should they be also evaluated in unquoted values?)

Note that UTF-16 escape sequences need to encode valid surrogate pairs if they encode a large enough code-point. Invalid Unicode values are rejected as an error.
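The surrogate pair rule can be illustrated with the standard library alone. This is only a sketch and not necessarily how Punktum does it internally; the function name decode_utf16_escapes is made up for this example, and it assumes the \uXXXX escapes of one run were already parsed into 16-bit code units.

```rust
/// Decodes a run of UTF-16 code units (as written with consecutive
/// `\uXXXX` escapes) into a string.
/// Returns `None` if the run contains an unpaired surrogate.
fn decode_utf16_escapes(units: &[u16]) -> Option<String> {
    char::decode_utf16(units.iter().copied())
        .collect::<Result<String, _>>()
        .ok()
}
```

With this, the unit 0x00E4 decodes to "ä", the pair 0xD83D 0xDE00 decodes to a single code point outside the basic plane, and a lone 0xD83D is rejected.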

Variable Substitution Syntax

The variable substitution syntax is similar to the Unix shell. Variables are only read from the current environment, not the parent environment. You need to import them first to use them. (Should that be changed?)

| Syntax | Description |
| --- | --- |
| $VAR or ${VAR} | Empty string if unset. |
| ${VAR:?MESSAGE} | Error if $VAR is empty or unset. If provided MESSAGE will be printed as the error message. |
| ${VAR?MESSAGE} | Error if $VAR is unset. If provided MESSAGE will be printed as the error message. |
| ${VAR:-DEFAULT} | Use DEFAULT if $VAR is empty or unset. |
| ${VAR-DEFAULT} | Use DEFAULT if $VAR is unset. |
| ${VAR:+DEFAULT} | Use DEFAULT if $VAR is not empty. |
| ${VAR+DEFAULT} | Use DEFAULT if $VAR is set. |

The MESSAGE/DEFAULT part can be anything like in a value, only not a } outside of a quoted string. (Maybe I should add \{ and \} escapes?)
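The rules from the table can be written down as one small function. This is a sketch of the semantics only, not the actual parser: the names Subst and substitute are hypothetical, the MESSAGE/DEFAULT argument is assumed to be resolved already, and for the + operators it assumes (as in the Unix shell) that the result is the empty string when the condition doesn't hold.

```rust
/// Outcome of one variable substitution.
#[derive(Debug, PartialEq)]
enum Subst<'a> {
    Value(&'a str),
    Error(&'a str),
}

/// `value` is the variable's current value (`None` means unset),
/// `op` is the operator ("" for plain `$VAR`), and `arg` is the
/// already resolved MESSAGE or DEFAULT part.
fn substitute<'a>(value: Option<&'a str>, op: &str, arg: &'a str) -> Subst<'a> {
    let unset = value.is_none();
    let empty_or_unset = value.map_or(true, |v| v.is_empty());
    match op {
        ":?" if empty_or_unset => Subst::Error(arg),
        "?" if unset => Subst::Error(arg),
        ":-" if empty_or_unset => Subst::Value(arg),
        "-" if unset => Subst::Value(arg),
        // Assumption: like in the shell, nothing is substituted otherwise.
        ":+" => Subst::Value(if empty_or_unset { "" } else { arg }),
        "+" => Subst::Value(if unset { "" } else { arg }),
        // Plain reference, or an operator whose condition didn't trigger.
        _ => Subst::Value(value.unwrap_or("")),
    }
}
```

The difference between the variants with and without the colon only shows for a variable that is set but empty: substitute(Some(""), ":-", "d") yields the default, while substitute(Some(""), "-", "d") yields the empty value.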

Write a Punktum compatible file

If you want to write a .env file in the Punktum dialect containing arbitrary characters you can quote the values very easily like this:

var env = new Map();
// env is filled somehow...
for (const [key, value] of env) {
    console.log(`${key}='${value.replaceAll("'", "'\"'\"'")}'`);
}

Meaning, you put the value into single quotes and replace any ' in your value with '"'"'.

The keys need to be valid names as described above, though. This then happens to also be valid Unix shell syntax and I think also valid syntax for dotenvy. It isn't valid for many (any?) other dotenv implementations, since they only allow one single quoted string and not a sequence of quoted strings.
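The same quoting in Rust could look like the following sketch. The helper names (is_valid_name, quote_single, format_assignment) are made up for this example and are not part of the punktum library API; the name check follows the NAME rule of the grammar above.

```rust
/// NAME := NAME_CHAR { NAME_CHAR }, with NAME_CHAR being a-z, A-Z, 0-9 or _.
fn is_valid_name(name: &str) -> bool {
    !name.is_empty() && name.chars().all(|c| c.is_ascii_alphanumeric() || c == '_')
}

/// Wraps the value in single quotes and replaces every ' with '"'"'.
fn quote_single(value: &str) -> String {
    format!("'{}'", value.replace('\'', "'\"'\"'"))
}

/// Produces one KEY='VALUE' line, or `None` if the key is not a valid name.
fn format_assignment(key: &str, value: &str) -> Option<String> {
    if is_valid_name(key) {
        Some(format!("{key}={}", quote_single(value)))
    } else {
        None
    }
}
```

Returning None for an invalid key is just one way to handle it; a real writer might prefer an error type that names the offending key.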

The following also works for the Punktum dialect:

var env = new Map();
// env is filled somehow...
for (const [key, value] of env) {
    console.log(`${key}=${JSON.stringify(value).replaceAll('$', '\\u0024')}`);
}

This should also work with Python's dotenv-cli, but the other dialects don't support UTF-16 Unicode escape sequences (\u####).

Binary Dialect

The Binary dialect as an output-format can be used for things like this:

punktum --replace --file examples/vars.env --sorted --print-env --binary | while read -r -d "" line; do
    printf "%s=%q\n" "${line%%=*}" "${line#*=}"
done

I don't know why you'd want to do that, but you can!

Writing it is also simple:

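use std::collections::HashMap;
use std::io::Write;

let mut env: HashMap<String, String> = HashMap::new();
// env is filled somehow...
for (key, value) in &env {
    // every record is KEY=VALUE followed by a null byte
    write!(writer, "{key}={value}\0")?;
}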
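Reading the format back is about as simple. The following is a sketch of a reader for such null terminated records, not the parser that punktum ships; parse_binary is a hypothetical name, and skipping empty records (such as the one after the final terminator) is an assumption of this example.

```rust
/// Splits null terminated KEY=VALUE records into pairs.
/// Only UTF-8 is accepted, as in the Binary dialect.
fn parse_binary(data: &[u8]) -> Result<Vec<(String, String)>, String> {
    let mut vars = Vec::new();
    for record in data.split(|&b| b == 0) {
        // Assumption: empty records, e.g. after the last \0, are skipped.
        if record.is_empty() {
            continue;
        }
        let text = std::str::from_utf8(record).map_err(|e| e.to_string())?;
        // Only the first = separates key and value.
        let (key, value) = text
            .split_once('=')
            .ok_or_else(|| format!("missing '=' in record {text:?}"))?;
        vars.push((key.to_string(), value.to_string()));
    }
    Ok(vars)
}
```

Because the split happens at the first =, a value may itself contain = characters and newlines without any quoting.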

NodeJS Dialect

Based on the dotenv parser of NodeJS v22. After complaining about some of these quirks they said they'll fix it. Meaning once this is done this dialect needs to be adapted again. Making myself more work. 🤦

Quirks

This is meant to be compatible with the JavaScript Dotenv Dialect, but isn't.

While this dialect does support quoted values if there is any space between the = and " it will not parse it as a quoted value, meaning the quotes will be included in the output. I.e. this in .env:

FOO= "BAR"

Is equivalent to this in JSON:

{ "FOO": "\"BAR\"" }

This dialect supports strings quoted in double quotes ("), single quotes (') and back ticks (`). These strings can be multi-line, but only in double quoted strings \n will be translated to newlines.

If the second quote is missing only the current line is used as the value for the variable. Parsing of more variables continues in the next line!

Comments start with #. There doesn't need to be a space before the #.

Keys may contain anything except spaces ( ), including tabs and newlines. Meaning this:

FOO#=1
BAR
=2

Is equivalent with this JSON:

{ "FOO#": "1", "BAR\n": "2" }

Lines with syntax errors (i.e. no =) are silently ignored, but they will trip up the parser so that the following correct line is also ignored.

Leading export will be ignored. Yes, the export needs to be followed by a space. If it's a tab, it's used as part of the key.

Accepts \n, \r\n, and even singular \r as line separator by replacing /\r\n?/ with \n. Meaning if you have a single carriage return or DOS line ending in a quoted string it will be replaced by a single newline.
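Without a regular expression crate the same normalization can be done with two plain string replacements. This is a sketch of the technique; normalize_line_endings is a name chosen for this example.

```rust
/// Equivalent to replacing /\r\n?/ with "\n": first collapse DOS line
/// endings, then turn every remaining lone carriage return into a newline.
fn normalize_line_endings(input: &str) -> String {
    input.replace("\r\n", "\n").replace('\r', "\n")
}
```

The order of the two replacements matters. Replacing single \r first would turn every \r\n into two newlines.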

JavaScript Dotenv Dialect

Based on this version of main.js from the dotenv npm package.

There is a sub-dialect with the package dotenv-expand and another with the package @dotenvx/dotenvx. dotenv-expand builds on dotenv and adds variable expansion. Of course that also works in unexpected ways, see below. dotenvx builds on that and adds $(command) substitution, but not in a broken way like the Ruby dialect.

Quirks

This dialect supports strings quoted in double quotes ("), single quotes (') and back ticks (`). These strings can be multi-line, but only in double quoted strings \n and \r will be translated to newlines and carriage returns.

It doesn't process any other escape sequences, even though the regular expression used to match quoted strings implies the existence of \", \', and \` in the respective quoted strings. If a value does not match such a quoted string literal correctly it will be interpreted as an unquoted string and only be read to the end of the line.

However, later the decision on whether to replace \r and \n is made by simply checking if the first character of the matched string was a double quote, not if the double quote kind of regular expression had matched. Similarly the quotes around a value are stripped if the first and last character are matching quotes, again not if the regular expression (which contains the unprocessed escaped quote) had matched.

Because of how the regular expression parses quoted strings, an escaped quote (\") that is the last quote in a file is taken as the ending quote of a quoted value anyway.

Instead of = this dialect also accepts :, but only if there is no space between it and the variable name.

A comment starts with # even if it touches a word on its left side.

Because of how the regular expression works, there can be a newline between the variable name and =. Meaning this:

FOO
=BAR

Is equivalent to this JSON:

{ "FOO": "BAR" }

This also means that this is parsed the same, even though one might expect it to be the "variable is set" check syntax:

export FOO
=BAR

Lines with syntax errors (i.e. no =) are silently ignored, but in contrast to the NodeJS dialect it won't trip up the parser and the next line is correctly parsed (if it doesn't have a syntax error itself).

Accepts \n, \r\n, and even singular \r as line separator by replacing /\r\n?/ with \n. Meaning if you have a single carriage return or DOS line ending in a quoted string it will be replaced by a single newline.

Leading export will be ignored. The export and the following variable name can be separated by any kind of white space.

Accepts . and - in addition to a...z, A...Z, and 0...9 as part of variable names.

Dotenv-Expand Sub-Dialect

This is not (yet?) implemented by Punktum.

This adds variable substitution on top, but because it is not integrated in the parser it works differently than one might expect. It scans all variables that were defined in an environment and recursively resolves any found variable references. It also resolves references in variables that were defined outside the .env file, though it won't replace those variables with the resolved value. It will only use that altered value in variables defined in the .env file that reference that other variable.

This leads to e.g. the following behavior.

Pre-defined environment:

FOO='${BAR}'

Actual .env file:

BAR='this is bar'
FOO_ON_LINE2="$FOO"
BAR='replaced bar'
FOO_ON_LINE4="$FOO"

This will result in an environment equivalent to this JSON:

{
    "FOO": "${BAR}",
    "BAR": "replaced bar",
    "FOO_ON_LINE2": "replaced bar",
    "FOO_ON_LINE4": "replaced bar"
}

And if you add two more lines like this:

BAR='this is bar'
FOO_ON_LINE2="$FOO"
BAR='replaced bar'
FOO_ON_LINE4="$FOO"
FOO='replaced foo'
FOO_ON_LINE6="$FOO"

This will result in an environment equivalent to this JSON:

{
    "FOO": "${BAR}",
    "BAR": "replaced bar",
    "FOO_ON_LINE2": "replaced foo",
    "FOO_ON_LINE4": "replaced foo",
    "FOO_ON_LINE6": "replaced foo"
}

Also this means that this will give a maximum call stack exceeded error:

A=$B
B=$A

Further it supports ${FOO:-DEFAULT} and ${FOO-DEFAULT}, but handles both exactly the same. The default value will be used if $FOO is empty or unset. It does variable substitution in the default value, but starts to fail when the default value has too many nested default values, because the regular expression has a limited number of nested { } defined.

Like in the Ruby dialect { and } in variable substitution don't need to be balanced. ${FOO, $FOO}, ${FOO}, and $FOO all do the same. But more importantly the fallback is applied even if there are no braces! $FOO:-BAR will show BAR if $FOO is unset or empty.

Dotenvx Sub-Dialect

This sub-dialect adds $(command) substitution. While it does it in a separate phase to variable substitution, in contrast to the Ruby dialect it does it before variables are substituted and thus doesn't have a command injection vulnerability. So this part is fine.

Ruby Dotenv Dialect

Based on this version of parser.rb and substitution/variable.rb of the dotenv Ruby gem. Command substitution is deliberately not implemented.

Quirks

This dialect supports variable and command substitution. (The latter is deliberately not implemented in Punktum.) Command substitution is problematic on its own, but the way it is implemented in Ruby dotenv is especially problematic since it's done in two passes. First variable references like $FOO and ${BAR} are substituted. Then in the resulting string commands like $(rm -rf /) are substituted. This means if any of the variables contain command syntax in literal form it will be executed in the command pass. It can even be split over multiple variables. E.g. this .env file:

FOO='$'
BAR='(date)'
BAZ="$FOO$BAR"

Used like this:

dotenv -f test.env ruby -e 'puts ENV.select {|k| %w(FOO BAR BAZ).include? k}'

Will give output like this:

{"FOO"=>"$", "BAR"=>"(date)", "BAZ"=>"Fr 14 Jun 2024 17:49:33 CEST"}

Personally I consider this a code execution vulnerability. It is not unthinkable that an environment variable used in a substitution contains a string controlled by a user who injects a command this way. See this bug report.
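The problem with the two passes can be reproduced without running anything. The following sketch is not the Ruby implementation and not part of Punktum; substitute_variables and find_commands are hypothetical helpers. The first only handles the $NAME form, the second merely lists what a command pass would execute.

```rust
use std::collections::HashMap;

/// First pass: replaces `$NAME` references (braces left out for brevity).
/// Unknown variables become the empty string.
fn substitute_variables(template: &str, env: &HashMap<&str, &str>) -> String {
    let mut out = String::new();
    let mut rest = template;
    while let Some(pos) = rest.find('$') {
        out.push_str(&rest[..pos]);
        let after = &rest[pos + 1..];
        // Length of the variable name following the dollar sign.
        let len = after
            .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
            .unwrap_or(after.len());
        if len == 0 {
            out.push('$'); // a dollar sign without a name stays literal
        } else {
            out.push_str(env.get(&after[..len]).copied().unwrap_or(""));
        }
        rest = &after[len..];
    }
    out.push_str(rest);
    out
}

/// Stand-in for the second pass: returns the commands found in `$(...)`
/// instead of running them.
fn find_commands(value: &str) -> Vec<&str> {
    let mut commands = Vec::new();
    let mut rest = value;
    while let Some(start) = rest.find("$(") {
        let after = &rest[start + 2..];
        match after.find(')') {
            Some(end) => {
                commands.push(&after[..end]);
                rest = &after[end + 1..];
            }
            None => break,
        }
    }
    commands
}
```

With FOO set to $ and BAR set to (date), the first pass turns $FOO$BAR into $(date), and only then does the second pass see a command that was never written as one in the file.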

Another minor quirk is that { and } in variable substitution don't need to be balanced. ${FOO, $FOO}, ${FOO}, and $FOO all do the same.

The dotenv file is parsed with a regular expression. Anything not matching is simply silently ignored.

The regular expression handles escapes when tokenizing the file, but if the quoted string part of the regular expression fails the no-quote part will still match. It is not checked how the value was matched, only whether it starts and ends with the same kind of quote in order to determine how to process the value.

Quoted strings can be multiline. In double and single quoted strings any backslashes of escape sequences are simply removed. Meaning \n becomes n. However, if the environment variable DOTENV_LINEBREAK_MODE is set to legacy (either in the currently created environment or, if it is unset there, in the system environment) then \n and \r in double quoted strings are replaced with newlines and carriage returns.

Because of how the regular expression parses quoted strings, an escaped quote (\") that is the last quote in a file is taken as the ending quote of a quoted value anyway.

Variable and command substitution is performed in double quoted and non-quoted strings.

Lines in the form of export FOO BAR BAZ are interpreted as checking if the listed keys exist in the environment. If not an error is raised.

Instead of = this dialect also accepts :, but only if there is no space between it and the variable name.

Accepts \n, \r\n, and even singular \r as line separator by replacing /\r\n?/ with \n. Meaning if you have a single carriage return or DOS line ending in a quoted string it will be replaced by a single newline.

Accepts . in addition to a...z, A...Z, 0...9, and _ as part of variable names.

Python Dotenv-CLI Dialect

Based on this version of core.py of the dotenv-cli pypi package.

Quirks

This dialect uses Python str functions for many things, and as such grammar rules often derive from that. I.e. lines are split on \r\n, single \n, and on single \r.

It only supports single line variables, because it parses one line at a time.

Comments start with # and must be on their own line (though they can be preceded by white space)! Anything non-white space after a = is always part of the variable value.

A key can contain anything except for = and white space around it will be stripped. But you can do:

 FOO  BAR! = BAZ 

Equivalent to this in JSON:

{ "FOO  BAR!": "BAZ" }

If a key starts with export (yes, including the single space) this prefix and any remaining leading white space are stripped. This of course means that export =foo will lead to an empty string, which is an OS level error for environment variable names, while export=foo is perfectly fine. For any other variable name spaces between it and the = are insignificant.
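The key handling described so far can be modelled roughly like this. It is a sketch written from the description above, not a port of core.py: parse_line is a hypothetical name, Rust's trim() is not guaranteed to strip exactly the same set of characters as Python's str.strip(), and returning None for a line without = is an assumption of this example.

```rust
/// Splits one line into key and value the way described above.
/// Returns `None` for blank lines, comment lines and lines without `=`.
fn parse_line(line: &str) -> Option<(String, String)> {
    let line = line.trim();
    if line.is_empty() || line.starts_with('#') {
        return None;
    }
    // "export" is only a prefix if it is followed by a single space.
    let line = line.strip_prefix("export ").unwrap_or(line);
    // Everything before the first = is the key, whatever it contains.
    let (key, value) = line.split_once('=')?;
    Some((key.trim().to_string(), value.trim().to_string()))
}
```

This reproduces the cases from the text: the line with FOO  BAR! keeps the inner spaces of the key, export =foo gives an empty key, and export=foo gives a variable named export.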

Values are also stripped of white space. If the remaining string starts or ends with the same kind of quotes (either " or ') those quotes are removed. If it's a double quoted string escape sequences are processed using this Python code:

value = bytes(value, "utf-8").decode("unicode-escape")

Meaning which escape sequences are supported is defined by Python and might change in a future Python release!

The Python documentation says about this encoding:

Encoding suitable as the contents of a Unicode literal in ASCII-encoded Python source code, except that quotes are not escaped. Decode from Latin-1 source code. Beware that Python source code actually uses UTF-8 by default.

This leads to typical "UTF-8 interpreted as ISO-8859-1" errors for every double quoted string! Completely breaks UTF-8 support for double quoted strings. This dotenv file:

FOO=ä
BAR="ä"
BAZ='ä'

Is equivalent to this JSON:

{ "FOO": "ä", "BAR": "Ã¤", "BAZ": "ä" }
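The effect is easy to reproduce: take the UTF-8 bytes of the value and read every byte as one ISO-8859-1 character. A small sketch of that (utf8_bytes_as_latin1 is a name made up for this example):

```rust
/// Re-interprets the UTF-8 bytes of `s` as ISO-8859-1 (Latin-1), where
/// every byte maps to the Unicode code point with the same number.
fn utf8_bytes_as_latin1(s: &str) -> String {
    s.bytes().map(char::from).collect()
}
```

The letter ä (U+00E4) is encoded as the two bytes 0xC3 0xA4 in UTF-8, which read as Latin-1 are the two characters Ã (U+00C3) and ¤ (U+00A4). Pure ASCII values are not affected.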

For what escape sequences are actually supported see the Python documentation.

NOTE: The Punktum implementation of this dialect doesn't do that. It treats the string as the Unicode that it is.

NOTE: The Punktum implementation of this dialect doesn't implement named Unicode escape sequences (\N{name}).

Python Dotenv Dialect

Based on this version of parser.py of the python-dotenv pypi package.

Quirks

Because of how the regular expression parses quoted strings, an escaped quote (\") that is the last quote in a file is taken as the ending quote of a quoted value anyway.

A variable name without = deletes the variable from the newly constructed environment. It doesn't affect the parent environment, though, since that is merged after the new environment is constructed.

The Punktum implementation merges in the parent environment before parsing the .env file and thus doesn't know at that point if the variable comes from the parent and just always deletes it.

# needs to be separated from an unquoted value by white-space to be read as the start of a comment.

ComposeGo Dialect

Based on these versions of parser.go and template.go of compose-go, the Go implementations of docker-compose.

Quirks

It uses IsLetter() and IsNumber() from the unicode package, meaning variable names can be any Unicode code point from the Letter (L) and Number (N) categories, plus ., -, _, [, and ]. Meaning e.g. this would be a valid variable name: .ᾖⅧ²⅛ However, a source comment above the usage of these functions claims:

// variable name should match [A-Za-z0-9_.-]

I use Rust's char::is_alphanumeric() to implement this, which should do the same, apart from differences in which version of the Unicode standard each language currently implements.

Variable names in variable substitution however only match [_a-z][_a-z0-9]*, but compiled with the i (ignore case) flag. Yes, really only ASCII letters and numbers (and _) this time, and this time it has to start with letters (or _).
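The two different name rules can be put side by side like this. It is a sketch following the description above and not the code from the punktum source; both function names are made up, and it assumes that a variable definition puts no extra restriction on the first character.

```rust
/// Name as accepted in a variable definition: Unicode letters and
/// numbers plus . - _ [ and ].
fn is_compose_go_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_alphanumeric() || matches!(c, '.' | '-' | '_' | '[' | ']'))
}

/// Name as accepted in a substitution: [_a-z][_a-z0-9]* ignoring case,
/// taken here as ASCII only, as described above.
fn is_compose_go_substitution_name(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(c) if c == '_' || c.is_ascii_alphabetic() => {
            chars.all(|c| c == '_' || c.is_ascii_alphanumeric())
        }
        _ => false,
    }
}
```

A consequence is that a variable can be defined under a name that can never be referenced in a substitution, e.g. any name containing a dot or a non-ASCII letter.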

Also accepts : instead of =, which it calls "yaml-style value declaration".

Comments start with #, but when they're not in their own line they need to be separated by a space ( ) from the preceding value.

Similar to the Python version this uses a library function to parse escape sequences (strconv.UnquoteChar(), source), but only passes the escape sequences \a, \b, \c (this seems to be a bug, since no such escape sequence is implemented by that function), \f, \n, \r, \t, \v, \\ and octal escape sequences to that function. Further it manually also parses \$ and manually requires octal escape sequences to always be prefixed by \0.

This dialect considers the following code points as (inline) whitespace:

| C | Unicode | Description |
| --- | --- | --- |
| \t | U+0009 | horizontal tab |
| \v | U+000B | vertical tab |
| \f | U+000C | form feed |
| \r | U+000D | carriage return |
| ' ' | U+0020 | space |
| \x85 | U+0085 | next line |
| \xA0 | U+00A0 | no-break space |
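Expressed as a Rust predicate this set looks as follows. The function names are chosen for this example only.

```rust
/// The code points listed above. Note that newline (\n) is not included.
fn is_inline_whitespace(c: char) -> bool {
    matches!(
        c,
        '\t' | '\u{000B}' | '\u{000C}' | '\r' | ' ' | '\u{0085}' | '\u{00A0}'
    )
}

/// Trims exactly that set from both ends of a string.
fn trim_inline_whitespace(s: &str) -> &str {
    s.trim_matches(is_inline_whitespace)
}
```

This is a smaller set than what str::trim() removes, since that also strips newlines and further Unicode white space.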

This dialect supports single quoted ('), double quoted ("), and unquoted values. In double quoted and unquoted values variable substitution is performed.

This substitution syntax supports fallback and error messages similar to bash. I need to investigate in more detail how variable substitution in the fallback/message part is performed. Given that the whole variable is parsed with a simple regular expression I think it's not possible for it to have nested braced variable references in that part. But it seems to support simple un-braced references. It might be that because of greedy .* expressions it matches too much and doesn't even support two braced variables in one string? Need to test that.

Strips export if followed by white space from the start of parsed lines. Meaning it doesn't support variables named export. (Need to test that.)

Supports explicitly inheriting (importing) variables from the parent environment by naming them in a line without =:

# This reads the variable FOO from the parent (system) environment and adds
# it to the newly constructed environment:
FOO

Note that there cannot be a comment after the variable name. Only white space and then the new line. Because it actually checks for the new line (\n) character there is also buggy behavior if it's the last line and it is not new line terminated. In that case it will think that the variable name is the empty string and the value is the variable name.

I.e. if the above example file is not newline terminated the environment will be this in JSON:

{ "": "FOO" }

The regular expression for octal escape sequences matches too much (0\d{0,3} instead of 0[0-7]{0,3}, although that is still too much), replaces a \0 prefix with just \, and then if the unquoting of the escape sequence fails it inserts the manipulated match.

Meaning this value: "\079"
Gives this string: "\\79"
While it should give: "\x079" (bytes: [ 0x07, 0x39 ])

I.e. these are two bugs: using the manipulated match when unquoting fails, and matching too much and thus failing valid octal escape sequences.

Also the used strconv.UnquoteChar() function wants octal escape sequences to be exactly 3 octal numbers long, meaning the regular expression should actually be 0[0-7]{3}, or the match needs to be 0-padded to 3 characters long. The way it is now any shorter octal escape sequences are an error and the (buggy) fallback mechanism is applied.

Further the regular expression also matches \c, which I can't find in the Go spec. Though because strconv.UnquoteChar() fails for that the fallback is applied.

The punktum implementation of this dialect is bug-compatible to this octal escape sequence handling logic.

The variable substitution syntax is basically the same as in Punktum, except that the default value/message part may not contain a newline character, even when the value is in quotes. This is because the used regular expression uses .* for this, which doesn't match newline characters.

Because regular expressions can't handle recursive syntax they try to fix up mis-parsed substitutions somehow, which leads to what is in my opinion unexpected behavior. Examples:

FOO="${BAR:-{{}
}"
# FOO contains "{{\n}"

FOO="${BAR:-{baz}
bla}"
# FOO contains "{baz\nbla}"

FOO="${BAR:-${BAZ}
}"
# 2024/07/02 04:36:20 Invalid template: "${BAZ"

FOO="${BAR:-
}"
# 2024/07/02 04:34:52 Invalid template: "${BAR:-\n}"

GoDotenv Dialect

Based on this version of parser.go from godotenv.

This seems like a predecessor to the ComposeGo dialect.

Quirks

There are many things that aren't handled, or aren't handled correctly, by this that are better handled by the docker-compose version. Both suffer from problems that arise from variable substitution being distinct from string literal and escape sequence parsing and from cheaping out by using regular expressions.

This dialect supports single quoted ('), double quoted ("), and unquoted values. Single quoted strings may contain \' and \\, double quoted values may contain \", \\, \n, and \r, which is evaluated appropriately.

It supports the same (inline) white space as the ComposeGo dialect.

It uses IsLetter() and IsNumber() from the unicode package, meaning variable names can be any Unicode code point from the Letter (L) and Number (N) categories, plus . and _. Meaning e.g. this would be a valid variable name: .ᾖⅧ²⅛ However, a source comment above the usage of these functions claims:

// variable name should match [A-Za-z0-9_.]

I use Rust's char::is_alphanumeric() to implement this, which should do the same, apart from differences in which version of the Unicode standard each language currently implements.

Variable names in variable substitution however only match: [A-Z0-9_]+ Yes, only upper case ASCII letters!

Java Dotenv Dialect

Based on this version of DotenvParser.java.

Quirks

Supports double " and single ' quoted strings in the regular expression, but then only removes the surrounding quotes from double quoted strings. Also there are no escape sequences, i.e. quoted strings may not include the quote, and there are no multiline strings.

It checks if a string is quoted by checking if it starts and ends with " and if yes slices off the first and last character of the value. But if the value was a single " this will lead to a StringIndexOutOfBoundsException.

Given that when the quoted regular expression fails it just matches any characters not including # (or \n of course) and then checks whether the string starts and ends with " anyway, the quoted string regular expression doesn't even matter.
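A quote stripping that can't run out of bounds needs the leading and the trailing quote to be two different characters. One possible way to write that in Rust is sketched below; strip_double_quotes is a hypothetical helper, and leaving a lone " untouched is an assumption of this example, not necessarily what the Punktum implementation of this dialect does.

```rust
/// Removes one pair of surrounding double quotes, if present.
/// A value that is only a single `"` is returned unchanged.
fn strip_double_quotes(value: &str) -> &str {
    value
        .strip_prefix('"')
        .and_then(|rest| rest.strip_suffix('"'))
        .unwrap_or(value)
}
```

Since the suffix is only looked for in what remains after the prefix was removed, a single quote character can never be counted as both the start and the end.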

You cannot define a variable multiple times. That will crash with:

java.lang.IllegalStateException: Duplicate key VARNAME (attempted merging values VALUE1 and VALUE2)

In the Punktum implementation of this dialect you can.

punktum Executable

Punktum comes as a library and as a binary.

NOTE: On Windows (or any non-Unix operating system supported by Rust) there is no exec() available. Meaning there is no way to replace the currently executing program with another. So instead the command is spawned as a sub-process and its exit code is passed through at the end. However, forwarding things like Ctrl+C (or killing sub-processes when the parent exits) is not straightforward under Windows. This would need to be implemented with a lot of custom unsafe code calling Win32 functions, so I didn't do it. This means if you kill the punktum process the child process will keep running. I think. I haven't tested it under Windows, I use Linux.
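The difference between the two platforms can be sketched with the standard library like this. It only illustrates the general technique and is not copied from the punktum source; build_command and run are hypothetical names, and the fallback exit code of 1 for a child without an exit code is an assumption.

```rust
use std::collections::HashMap;
use std::process::Command;

/// Prepares the command with the loaded environment. With `replace`
/// the inherited environment is dropped first.
fn build_command(
    program: &str,
    args: &[&str],
    env: &HashMap<String, String>,
    replace: bool,
) -> Command {
    let mut cmd = Command::new(program);
    cmd.args(args);
    if replace {
        cmd.env_clear();
    }
    cmd.envs(env);
    cmd
}

/// Unix: replace the current process. exec() only returns on failure.
#[cfg(unix)]
#[allow(dead_code)]
fn run(mut cmd: Command) -> std::io::Error {
    use std::os::unix::process::CommandExt;
    cmd.exec()
}

/// Elsewhere: run a sub-process, wait for it and pass its exit code on.
#[cfg(not(unix))]
#[allow(dead_code)]
fn run(mut cmd: Command) -> std::io::Error {
    match cmd.status() {
        // Assumption: use exit code 1 if the child has no exit code.
        Ok(status) => std::process::exit(status.code().unwrap_or(1)),
        Err(err) => err,
    }
}
```

In both variants run() only comes back to the caller with an error when the command could not be started. Nothing in the non-Unix variant forwards signals or cleans up the child, which is exactly the limitation described above.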

Usage of the binary:

usage: punktum [--file=PATH...] [--replace] [--] command [args...]
       punktum [--file=PATH...] [--replace] --print-env [--sorted] [--export] [--binary]
       punktum [--help] [--version]

Punktum executes a given command with environment variables loaded from a .env file.

Positional arguments:
  command                   Program to execute.

Optional arguments:
  -h, --help                Print this help message and exit.
  -v, --version             Print program's version and exit.
  -f PATH, --file=PATH      File to use instead of ".env"
                            This option can be passed multiple times.
                            All files are loaded in order.
                            Pass "-" to read from stdin.
  -r, --replace             Completely replace the environment with the one loaded
                            from the .env file.
  -p, --print-env           Instead of running a command print the built environment
                            in a syntax compatible to Punktum and bash.
      --sorted              Sort printed environment variables for reproducible output.
      --export              Add "export " prefix to every printed environment variable.
      --strict=bool         Overwrite DOTENV_CONFIG_STRICT
      --debug=bool          Overwrite DOTENV_CONFIG_DEBUG
      --override=bool       Overwrite DOTENV_CONFIG_OVERRIDE
      --encoding=ENCODING   Overwrite DOTENV_CONFIG_ENCODING
      --dialect=DIALECT     Overwrite DOTENV_CONFIG_DIALECT

Environemnt variables:
  DOTENV_CONFIG_PATH=FILE
    File to use instead of ".env".
    This can be overwritten with --file.
    [default: ".env"]

  DOTENV_CONFIG_STRICT=true|false
    Stop and return an error if any problem is encounterd,
    like a file is not found, an encoding error, or a syntax error.
    This can be overwritten with --strict.
    [default: true]

  DOTENV_CONFIG_DEBUG=true|false
    Write debug messages to stderr if there are any problems.
    This can be overwritten with --debug.
    [default: false]

  DOTENV_CONFIG_OVERRIDE=true|false
    Replace existing environment variables.
    This can be overwritten with --override.
    [default: false]

  DOTENV_CONFIG_ENCODING=ENCODING
    Encoding of ".env" file.
    This can be overwritten with --encoding.
    [default: UTF-8]

    Supported values:
    - ASCII
    - ISO-8859-1  (alias: Latin1)
    - UTF-8       (default)
    - UTF-16BE
    - UTF-16LE
    - UTF-32BE
    - UTF-32LE

  DOTENV_CONFIG_DIALECT=DIALECT
    Dialect for the parser to use.
    This can be overwritten with --dialect.
    [default: Punktum]

    Supported values:
    - Punktum (default)
    - NodeJS
    - JavaScriptDotenv
    - PythonDotenv
    - PythonDotenvCLI
    - ComposeGo
    - GoDotenv
    - RubyDotenv
    - JavaDotenv
    - Binary

  DOTENV_LINEBREAK_MODE=legacy
    RubyDotenv dialect-only. If this environment variable is set to "legacy"
    "\n" and "\r" in unquoted values and double quoted values are replaced
    with actual newline and carrige return characters.


License: MIT License

