ciaranm / detectindent

Vim script for automatically detecting indent settings

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Ignore multiline comment blocks in HTML and JS

ghostwords opened this issue · comments

Anything between should be ignored when determining indentation.

/* */ blocks should also be ignored in JavaScript files.

For example, a file starting with the following contents

<!doctype html>
<!--
 * line starts with a space, several lines like it might follow
-->
<html>
<head>
<script type="text/javascript">
function func() {
  if (true) { // line starts with a tab, and all lines following start with tabs

is incorrectly parsed by Detect Indent:

Detected spaces and tabs; has_leading_tabs: 1, has_leading_spaces: 1, shortest_leading_spaces_run: 1, shortest_leading_spaces_idx: 3, longest_leading_spaces_run: 1
Initial buffer settings changed: tabstop changed from 4 to 2, shiftwidth changed from 4 to 1

I expect tabs since the entire file past the comment block on top uses tabs.

Also, is it necessary to check for filetype (HasCStyleComments) when checking for comment lines? Perhaps filetype checks should be removed. I could have a long multiline JavaScript comment with /* and */ in an html file that could throw off Detect Indent, for example.

This one's really fiddly. We've got to be careful not to screw up common cases in order to get particular situations to work -- it's not realistically possible to get every single file right.

Can you produce a candidate patch that we can try against a bunch of files? I'd like to avoid getting anything wrong that is currently detected correctly.

Are you referring to my latter comment regarding filetype checks? How about the main issue, ignoring multiline HTML comments?

No, even HTML comments are a nuisance. Don't people stick those things after <script> and the like?

They could be anywhere on the page, yes. But it's better to ignore them wrongly than to choose incorrect indentation, no?

I'm not convinced that ignoring them won't break some things that are currently correctly detected. Script blocks with a comment are likely to contain sensible indentation, currently.

Can you give some examples of what ignoring HTML comments might break?

I'm thinking files that are mostly just a commented out script block.

The thing with the comment skipping... It shouldn't really be there at all. But for Cish languages, particularly when using Doxygen or Javadoc, you get a lot of things like this at the top of a file:

/**
 * Silly single line indent to make the * line up
 */

I'd really rather reduce the number of special tricks, not increase it... Is what you're doing with HTML really that common?

Yes, absolutely.

Every file in several projects I work on contains a comment block at the top. Sometimes the comment block is a comment, and sometimes it is a /* */ comment in a JavaScript file (which actually trips up Detect Indent as well now that I check).

Are files that are mostly commented-out script blocks common in your experience? They are not in mine. Couldn't you have a case for them as well, if this situation needed fixing?

These bugs are deal breakers for me, unfortunately, and I don't know if I have the time to fix them myself in the near future.

It's not whether there's a comment block there. It's whether there's a comment block there with bogus indenting. The only time I've seen that happen is with aligned stars for C-like comments.

Right, all the comment blocks I deal with here are either

/*!
 * Description paragraph.
 *
 * Copyright line.
 */

in JS or the same but with in HTML.

There is nothing bogus about them.

I get it when there are *s there, since that's for alignment. But why the odd indenting for HTML comments?

For consistency with the JavaScript header comments, so that the comment format is almost entirely the same. If you look at it critically, the JavaScript commenting situation is a clearer case in need of fixing, but I argue both JS and HTML comments are valid issues with Detect Indent.

That seems to be a fairly convoluted set of circumstances that would have to come together to trigger this. Can you find out whether odd indenting for HTML comments is wide-spread, as opposed to just an organisational oddity? I don't do much work with HTML, but I've not seen it in what I have done.

C-style comment headers blocks in JavaScript files:

Let's start with most watched/forked JS repository on GitHub, Bootstrap:

https://github.com/twitter/bootstrap/blob/master/js/bootstrap-carousel.js for instance.

Unfortunately, "<!--" and "/*" are hard to search for across GitHub source code.

Ok, that's a sign we should give JavaScript files the multiline C commenting hack. What about HTML?

A suggestion: Ignore all whitespace which is inside a comment block. My vimscript is pretty weak, but you could probably cover most cases with something like

synIDattr(synID($LINENUMBER, 0, 1), 'name') =~ /Comment$/

The obvious problem is that this might be very slow. Needs benchmarking, and perhaps an option for the vimrc.

I'm definitely not going that route. It's simply too slow, fiddly and unreliable.

It's also wrong, since comments are not in general a problem. The comment heuristics are there to deal with a limited, common set of special cases. If they're to be extended, it's only to cover similar common issues. The goal is not to handle every single oddity.

I'm running into this issue a lot, where a 2-space indented JS file gets converted to 1-space because of comment blocks. Is there no way to fix that?