scy / timesheet.txt

A plain-text timesheet file format and tools for it.

How should hashtags be parsed?

scy opened this issue

Basically, I want to support hashtags in the description text. These should be used for specifying a project, subproject or really anything that you'd like to group or filter by afterwards.

(Note that full-text searches will be supported, too, but in my experience hashtags improve things nevertheless, since they are explicit tags.)

Example input: debug #CLI parser #tstxt

Example output: { 'tags': [ 'cli', 'tstxt' ] }
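The input/output pair above could be produced roughly like this. This is only a minimal sketch: the `extract_tags` name and the regular expression are assumptions for illustration, not part of any existing timesheet.txt tool, and the sketch does not yet decide whether a tag's word stays in the description.

```python
import re

# Assumed pattern: a single '#' directly followed by word characters,
# and not preceded by another '#' or by a word character.
TAG_RE = re.compile(r"(?<![#\w])#(\w+)")


def extract_tags(description):
    """Return the lower-cased hashtags in order of first appearance."""
    tags = []
    for match in TAG_RE.finditer(description):
        tag = match.group(1).lower()
        if tag not in tags:
            tags.append(tag)
    return {"tags": tags}


print(extract_tags("debug #CLI parser #tstxt"))
# {'tags': ['cli', 'tstxt']}
```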

There's one complicated decision to be made though:

Sometimes, hashtags are used mid-sentence with the implication that their word actually belongs there and should still appear in the resulting description text (like #CLI in the example above), but sometimes they are simply used as metadata and should be removed from the description (like #tstxt in the example). I see two ways of dealing with this:

  1. If the hashtag contains a capital letter, leave it in the description, with the # character removed. This would lead to a description of debug CLI parser in the example above. On the other hand, it's awkward if the word in question is supposed to be all lower-case. Converting the word to all lower-case after extraction doesn't make sense either, because that would lead to something like debug cli parser in the example, which just looks sloppy.
  2. Add a special syntax for one of the two cases, like a double hash character. But should the default be to keep the tag in the description, or to remove it? What's the "usual" use case? debug ##CLI parser #tstxt or debug #CLI parser ##tstxt?
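To make the drawback of option 1 concrete, here is a rough sketch of the capital-letter heuristic. The function name `split_by_case` is made up for this illustration, and the sketch splits on whitespace only, so punctuation attached to a tag is not handled.

```python
def split_by_case(text):
    """Option 1: keep a tag's word in the description only if it has a capital."""
    words, tags = [], []
    for token in text.split():
        is_tag = token.startswith("#") and not token.startswith("##") and len(token) > 1
        if not is_tag:
            words.append(token)
            continue
        name = token[1:]
        tags.append(name.lower())
        if name != name.lower():  # the tag contains at least one capital letter
            words.append(name)
    return " ".join(words), tags


print(split_by_case("debug #CLI parser #tstxt"))
# ('debug CLI parser', ['cli', 'tstxt'])

# The awkward case: an all-lower-case word that belongs in the sentence is dropped.
print(split_by_case("fix #parser bug"))
# ('fix bug', ['parser'])
```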

I will not support something like "hashtags at the end should be removed", because that prevents you from ending a sentence with a hashtagged word.

My 2 cents:
I think distinguishing via lower/upper case is a bad idea for the reasons you mentioned.

If you want to go for the 2nd option, I think I would use the ## in the description and # for 'actual' hashtags to be omitted.

A third option could be using ## (or some other token like a semicolon) to indicate the end of the description and to separate the description field from the list of tags, e.g.
debug #CLI parser ## tstxt foo bar or debug #CLI parser; tstxt foo bar
This would require that the tags to be omitted from reporting always appear at the end of the line, after the separator token.
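A possible reading of this third option, sketched in Python. The name `parse_entry_text` is hypothetical, and two details are assumptions rather than anything decided in this thread: the separator must be surrounded by spaces, and inline tags stay in the description with their # removed.

```python
import re

# Assumed pattern for inline tags: a single '#' followed by word characters.
INLINE_TAG_RE = re.compile(r"(?<![#\w])#(\w+)")


def parse_entry_text(text):
    """Split 'description ## tag tag ...' into (description, tags)."""
    head, _, tail = text.partition(" ## ")
    inline = [m.group(1).lower() for m in INLINE_TAG_RE.finditer(head)]
    trailing = [t.lower() for t in tail.split()]
    # Keep the word of each inline tag, dropping only the '#'.
    description = INLINE_TAG_RE.sub(r"\1", head).strip()
    return description, inline + trailing


print(parse_entry_text("debug #CLI parser ## tstxt foo bar"))
# ('debug CLI parser', ['cli', 'tstxt', 'foo', 'bar'])
```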

I kinda like the third option, thanks! I prefer ## over ;, because semicolons are way too common, and also, # is already the character associated with tags, so ## feels rather natural.

The only ugliness with this is that # on its own (if at the start of a line, followed by a space, or at the end of the line) starts a comment, and I want to keep it that way. It works like this:

# this whole line is a comment
#so is this, because the hash sign is at the start of the line
    #whitespace before the comment is permitted too
1234  some entry # this is a comment
1325  another entry ## these are tags
1414  third entry ## these are tags again # and this is a comment
1531  fourth entry # this is a comment but ## these are words in the comment instead of tags
1612  another entry #this is not a comment but the tag "this" followed by more description
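The rules shown in these examples can be sketched as a small line splitter. This is an illustration only: `split_line` and both patterns are assumptions, the leading time field is simply left in the body, and the sketch assumes that a comment-starting # inside a line must be preceded by whitespace.

```python
import re

# A lone '#' that is preceded by whitespace (or line start) and followed by
# whitespace (or line end) starts a comment.
COMMENT_RE = re.compile(r"(?<!\S)#(?=\s|$)")
# A '##' with the same surroundings starts the trailing tag list.
TAG_LIST_RE = re.compile(r"(?<!\S)##(?=\s|$)")


def split_line(line):
    """Split one line into body text, trailing tag list, and comment."""
    line = line.rstrip("\n")
    stripped = line.lstrip()
    if stripped.startswith("#"):
        # A hash sign at the start of the line makes the whole line a comment.
        return {"body": "", "tag_list": [], "comment": stripped.lstrip("#").strip()}

    body, comment = line, None
    match = COMMENT_RE.search(body)
    if match:
        # The comment wins: any '##' after it is just words in the comment.
        body, comment = body[:match.start()], body[match.end():].strip()

    tag_list = []
    match = TAG_LIST_RE.search(body)
    if match:
        body, tag_list = body[:match.start()], body[match.end():].split()

    return {"body": body.strip(), "tag_list": tag_list, "comment": comment}


print(split_line("1414  third entry ## these are tags again # and this is a comment"))
```

With this sketch, #this in the last example line is neither a comment nor a tag-list marker, so it stays in the body for a later inline-tag pass.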

I could also allow ## to start the list of tags without requiring a space after it:

1234  another #entry with an inline tag ##and four additional ones

But imho this looks messy and its only advantage would be to save a single space.

If you require a space after ##, then a spaceless ##word stays plain text, which means it can be used to reference a tag without actually tagging the entry:

0001  something #tagged
0002  this is specifically not ##tagged as such

It's basically a commented-out tag, which would otherwise require less obvious options like #!prefixes or new syntax like \#.
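A short sketch of how that "freebie" could fall out of the matching rule. The pattern and the `inline_tags` name are assumptions for illustration: a tag is a single # followed by word characters, so a ## prefix never matches.

```python
import re

# Assumed inline-tag pattern: '#' not preceded by '#' or a word character,
# directly followed by word characters. '##word' fails on both hash signs.
INLINE_TAG_RE = re.compile(r"(?<![#\w])#(\w+)")


def inline_tags(text):
    """Return the lower-cased inline tags found in an entry's text."""
    return [m.group(1).lower() for m in INLINE_TAG_RE.finditer(text)]


print(inline_tags("something #tagged"))
# ['tagged']
print(inline_tags("this is specifically not ##tagged as such"))
# []
```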

Hm. I understand what you mean, and I'm leaning towards requiring a space after ## anyway, but I fail to see a use case for your example. Why would I want to use a hashtag without also wanting to tag the respective entry? Can you give me a practical example?

One case I've had was something like:

1234  unofficial meeting with some person from ##companyname

to identify and link the company I know them from, but without actually tagging it as it's unrelated to the particular meeting/entry.

If you require whitespace it's a freebie anyway, since it will be tagged differently even if not explicitly supported.

jrnl.sh, for instance, discusses the use of # from a CLI program: it is often interpreted as a comment by the shell and requires cumbersome escaping. To be honest, I don't think I will need comments myself (at least not inline comments), but I can see that they could be useful for a very terse invoicing system. If not, then the tags could be replaced by something like :tag: (as done in vimwiki). On a related note, I think tags could be used to mark timespans as billable or not billable without introducing additional entities (:$:).

> jrnl.sh, for instance, discusses the use of # from a CLI program: it is often interpreted as a comment by the shell and requires cumbersome escaping.

I'm aware of that. The timesheet.txt format is intended to be edited mainly in an editor. I'm looking forward to tools/scripts that allow command-line based adding of entries, and they are free to use -t list,of,tags or @tag1 @tag2 or whatever they like. However, since the concept of "hashtags" is so ubiquitous and intuitive to people at this point, I've decided to stick with it rather than switch to some other syntax just because of shell escaping.
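As an illustration of such a tool, here is a hypothetical command-line helper that takes tags via -t list,of,tags and writes them in the file's own syntax, so the user never has to escape # in the shell. Everything here is an assumption: no such tool exists in this thread, and the trailing "## tag tag" form is the variant discussed above, not a finalized part of the format.

```python
import argparse


def format_entry(time, description, tags):
    """Build one timesheet line: time, two spaces, description, optional tag list."""
    line = f"{time}  {description}"
    if tags:
        line += " ## " + " ".join(tags)
    return line


def main(argv=None):
    parser = argparse.ArgumentParser(description="Format a timesheet.txt entry (sketch).")
    parser.add_argument("time", help="start time, e.g. 1234")
    parser.add_argument("description", help="entry description")
    parser.add_argument("-t", "--tags", default="", help="comma-separated list of tags")
    args = parser.parse_args(argv)
    tags = [tag for tag in args.tags.split(",") if tag]
    return format_entry(args.time, args.description, tags)


# Equivalent to running the script with: -t tstxt,cli 1234 "debug parser"
print(main(["-t", "tstxt,cli", "1234", "debug parser"]))
# 1234  debug parser ## tstxt cli
```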

> On a related note, I think tags could be used to mark timespans as billable or not billable without introducing additional entities (:$:).

Yes, the billable flag could also be implemented by tags alone. I don't have a clear opinion on whether it should though. I might just add an issue for discussing it.