claustromaniac / Compare-UserJS

PowerShell script for comparing user.js (or prefs.js) files.

add flag for marking duplicate entries with [d]

claustromaniac opened this issue · comments

...or the number of instances instead of [d]

still haven't played with this yet (soz, am held up with so much stuff to see and people to do .. or is that stuff to do and people to see... it's so confusing!)

Am thinking (in our case) about the parrots - while it kinda sucks to list 31 instances, I would rather have that than just a count (because how hard is it to count them yourself, and most would only be a single double-up, maybe three in the worst case). Ideally, the more info in the output, the less you need to refer back to the original input.

If I did a compare using my actual js, which includes overrides, this would be handy. Here's what I'm thinking:

   <snip>..
    4 prefs with matching values but inactive in pk.js
   17 prefs with both mismatching values and states
 ----
  670 combined unique prefs

    2 duplicated prefs in ghacks.js
    0 duplicated prefs in pk.js

 Reference: [i] = inactive pref (commented-out)
------------------------------------------------------------------------------
 The following 2 prefs have duplicate entries (in order) in ghacks

 [i] browser.tabs.warnOnClose                                       false
     browser.tabs.warnOnClose                                       false
     ---
     security.OCSP.require                                       true
     security.OCSP.require                                       false

------------------------------------------------------------------------------
 The following 0 prefs have duplicate entries (in order) in pk

This would be added to the end of the output, and is separate from all the "diffs". The 0 prefs section is not needed; it was just to show that it would be a separate section if there were any duplicates in pk.

Sup, πŸ‘– ?

When I came up with this the other day I thought that, while the script doesn't list all the instances of each pref (only the last active, or the last inactive if no active one is available), it could still be useful to know which of those turn up more than once in the source. Additionally, it would be pretty trivial to implement; most of the necessary code is already there. However, I reckon this falls slightly out of the scope of this thing, so I'm not so sure about it.

Maybe I could add an -extended parameter or something for stuff like this.
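
Purely as illustration, a minimal sketch of what an opt-in switch could look like (the parameter names here are made up for the example, not necessarily what Compare-UserJS actually declares):

    # Hypothetical sketch only: an opt-in switch for extra report sections.
    param (
        [string]$filePath_A,
        [string]$filePath_B,
        [switch]$extended    # only emit the duplicate-entries section when passed
    )

    if ($extended) {
        # ...build and append the duplicate-entries section here...
    }

Invocation would then be something like .\Compare-UserJS.ps1 A.js B.js -extended.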

it would be pretty trivial to implement; most of the necessary code is already there

Forgot to mention that listing all of the duplicate prefs and their values is not as trivial to implement. It's definitely easy, but it would take a lot more code for that sole purpose.

Sup, 🐈 ?

I'm just expanding on what I thought you were trying to accomplish. In all my 83 years working with data in Information Systems, I've learned that any reporting must do three things: be accurate, timely, and relevant. It's the relevant part I'm thinking about here. The extra data must be useful to an end user.

Adding a [d] next to a pref would be good enough to indicate a possible problem, but how do you indicate which file has the duplicate when a pref only has one line, or how do you indicate when it's duplicated in both? An extra section at the end is the best solution: it saves scrolling/searching, and the end user is (generally) only concerned with their own file, not the other.

It's definitely easy, but it would take a lot more code for that sole purpose.

So do it. I know you love your code to be uber compact (I remember the days of the 5K challenge .. that's right, building entire multi-paged websites in under 5kb - wish I could find a link, it was the late 90s - the winning entry one year had a catalog of goods, shopping basket, and checkout and everything!). But all things considered, whether the code here is 1kb or 1.5kb I don't think it really matters in today's age.

Up to you of course.

The extra data must be useful to an end user.

Well, personally, I would find a simple flag useful enough. Like you said, most prefs won't be duplicated typically, so it would be nice to have the script tell me which ones are, so I don't have to figure that out on my own (which could take a while). Knowing that, I can look for the values of the duplicates in the source files if I want.

At least, that was my first take on it, but then you made this compelling argument:

how do you indicate which file has the duplicate when a pref only has one line, or how do you indicate when it's duplicated in both.

I could still make the flags themselves more descriptive as an alternative, specifying how many duplicates are in each file. I could use a reference like... [A:2] [B:4], but it may not look good.

As for me loving the code being compact, I don't really care so much about it as I do about performance. Ideally I want the script to not take forever when parsing huge files on a crappy computer, for example. That's the key factor for me when deciding whether to see this one idea as a little bonus or as a big extra that falls slightly out of the original scope. If the script already does most of the work and I just need to add a few lines to get the number of duplicates for each pref, that's a little bonus. If it instead has to go out of its way to get the specific prefnames and values of each duplicate pref and list them separately, that's a way bigger bonus.

In any case, I can always make it an optional feature. I'm first going to see how well I can implement your idea, and then I'll decide that part.

I don't think it should be optional - information is power. Just output it IMO.

Descriptors like [A:2] [B:4] suck IMO. Firstly, you have the [i] indicator using up the left-hand column, so unless you make that wider (which is not really ideal IMO), the indicators would have to go at the end of the line, and that makes them hard to visually parse. If you do it, do it right: a complete separate section at the end, so the end user does not have to look anything up in external files to get the info.

PS: I know how I would have coded this back in the day, and I can see no overhead (don't get me wrong, in my day we used different tools). As you parse the file, you're already getting info on each pref name and whether or not it already exists (last active state, etc.), so it should be simple enough to build two arrays (one per file) as you parse anyway, and just sort and output those at the end - something like the sketch below. I really can't see the overhead.
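
Something along those lines is easy enough to sketch in PowerShell (a rough illustration only; the regex is a naive stand-in for the script's real parsing and ignores commented-out and multi-line declarations):

    # Sketch: collect every pref name seen in a file, then report the ones
    # that turn up more than once.
    function Get-DuplicatePrefs {
        param ([string]$Path)

        $names = foreach ($line in Get-Content $Path) {
            if ($line -match '(?:user_|sticky_)?pref\s*\(\s*["'']([^"'']+)["'']') { $Matches[1] }
        }
        $names | Group-Object | Where-Object { $_.Count -gt 1 } |
            Sort-Object Name | Select-Object Name, Count
    }

    Get-DuplicatePrefs -Path '.\ghacks.js'
    Get-DuplicatePrefs -Path '.\pk.js'

Group-Object does the sort-and-count work at the end, which is essentially the "two arrays, sort them, output them" idea.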

PS: just to be clear, my thumbs up on OP was for the marking of duplicates, not counting the duplicates. Showing there are duplicates can highlight potential issues for end users

it should be simple enough to build two arrays (one per file) as you parse the file ... I really can't see the overhead

Currently the script uses the most efficient method I could come up with, using hash tables (which is a fancy name for arrays with key-value pairs), which reduces the number of loops considerably.

In a single loop (without nested loops at all) it parses all of the active prefs in the file and builds the array with the information that will be used to produce the output later. It does the same for the inactive prefs declared behind // and the ones between /*...*/. That's 3 loops + 1 loop for comparing the values/states and producing the report. It gets the whole shit done in a total of 4 loops. To add the information about duplicate entries the way you describe, I have to add at least 2 extra loops just for that.
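
Roughly, the difference is that a bare count can ride along in a loop that is already running, whereas listing every duplicate entry with its values needs its own passes. A minimal sketch of the ride-along version (placeholder regex and file name, not the script's actual code):

    # Sketch: one pass over the matches; the last declaration of a pref wins,
    # and a per-name counter is bumped along the way for (almost) nothing.
    $content = Get-Content '.\ghacks.js' -Raw
    $regex   = '(?:user_|sticky_)?pref\s*\(\s*["'']([^"'']+)["'']\s*,\s*(.+?)\s*\)'

    $values = @{}
    $counts = @{}
    foreach ($m in [regex]::Matches($content, $regex)) {
        $name = $m.Groups[1].Value
        $values[$name] = $m.Groups[2].Value    # overwrites earlier entries, so duplicates disappear
        if ($counts.ContainsKey($name)) { $counts[$name]++ } else { $counts[$name] = 1 }
    }

    # The cheap extra: anything counted more than once has duplicate entries.
    $counts.GetEnumerator() | Where-Object { $_.Value -gt 1 } |
        ForEach-Object { '[d] {0} (x{1})' -f $_.Key, $_.Value }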

It's not tragic, but there will be an overhead. I will have to test and see. It's probably not that bad, but I'm thinking about those crazy-long user.js files with 3K+ lines when I worry about performance. Maybe I can still optimize it a bit more in other parts, though.
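
For what it's worth, checking whether the extra loops actually hurt on a huge file is a one-liner (the file names, and the assumption that the script takes two positional path arguments, are just for the example):

    # Rough timing check on a large input; paths and invocation are examples only.
    Measure-Command { .\Compare-UserJS.ps1 '.\huge_A.js' '.\huge_B.js' } |
        Select-Object -ExpandProperty TotalMilliseconds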

Correction: Pretty sure I can write this shit without any additional loops.
Correction2: Hmm... I would have to change the structure a lot. It's not worth it. Will use extra loops.

I have the utmost faith in you 🐈

It gets the whole shit done in a total of 4 loops

That's loopy man :trollface: ... one loop to rule 'em all

That's loopy man :trollface: ... one loop to rule 'em all

I know you're just messing around, but just FYI, it's done in 4 loops to...

  1. make the code shorter and more readable.
  2. parse more complex shit.

If it did everything in a single big loop, it wouldn't be able to parse shit like this:

user_pref  /* :trollface: */   ("example", // I like to make a mess of my
          'valueeeee\'eeee'   // config files :trollface:
) /* don't you like to make a mess of your config files too?
 I mean, simply typing a one-liner user_pref("prefname", "value"); is just too boring.
*/ ; user_pref("example2",true);sticky_pref("example3", false);// pref('example4',"https://thisis\"chaos.com/*/*/**"); // pref("example5", true);

(Yes, it can parse that just fine. Do try it out!)
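
For contrast, the obvious naive approach (strip /*...*/ in one go, then match prefs) is easy to write but falls apart on exactly that kind of input. A sketch of it, for illustration only:

    # Naive sketch, NOT how the script works: strip block comments in one pass,
    # then match pref calls in what's left.
    $raw = Get-Content '.\messy.js' -Raw

    # Problem: this also strips the /*...*/-shaped run inside the quoted URL
    # of example4 above, because it has no idea what is or isn't a string.
    $noBlockComments = $raw -replace '/\*[\s\S]*?\*/', ''

    [regex]::Matches($noBlockComments, '(?:user_|sticky_)?pref\s*\(\s*["'']([^"'']+)["'']') |
        ForEach-Object { $_.Groups[1].Value }

Handling the // prefs and the /*...*/ prefs in their own passes sidesteps that, which is where the extra loops come from.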

I found a couple of minor bugs while I was implementing this, one of which was a PITA to debug. Thanks for the suggestions, Pants!

awesome work 🐈 .. I pinky swear I will get around to using and testing and playing with this soon