ziglang / zig

General-purpose programming language and toolchain for maintaining robust, optimal, and reusable software.

Home Page:https://ziglang.org

The Hard Tabs Issue

basmith opened this issue · comments

Hi,

(This report is based on the v0.1.1 Win64 binary artifact from the Zig website.)

I noticed that if I create a Zig source file in Windows with a native editor (eg Notepad), the compiler complains about line endings:

$ zig build-exe hello.zig
':\code\zig\first\hello.zig:1:30: error: invalid character: '
const io = @import("std").io;
                             ^

If I manually kill the newlines (resulting in the code being all on one line) it compiles.

I tried using Vim in a Cygwin shell and the file it wrote also compiled without complaint (presumably Unix-style newlines, as Notepad renders that file on one line while Vim looks correct).

You need to configure your editor to use unix line endings to write Zig code. Additionally, you need to configure your editor to use spaces instead of hard tabs for indentation.

notepad.exe has neither of these features, and it can't even comprehend unix line endings. This has been a long standing bug/missing feature in the windows default plain text editor. Notepad is in fact so deficient as a text editor, that literally every single other text editor in popular use today can comprehend unix line endings. Notepad is the worst text editor in popular use, and has been for decades. Zig will not bend to accommodate Microsoft's gross incompetence or nefarious stunts in their inability or unwillingness to provide a decent default text editor to their paying consumer base. Notepad is the problem here, not Zig. (It's not just me. Here's other angry people complaining about Notepad.)

The rationale for only supporting unix line endings and no hard tabs is part of the "only one obvious way to do things" philosophy. From a practical perspective, never having windows line endings makes it easier to write tools that read zig source files. For example, a tool that searches for "\n\n// TODO" and replaces it with something else that includes newlines: it's much easier to do this without worrying about newline style. Furthermore, git and svn have strange features that convert newline styles at odd times, and now all that's irrelevant for Zig.
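To make the search-and-replace example concrete, here is a minimal sketch in Python. The function names and the `// FIXME` replacement text are made up for illustration; they are not part of any Zig tooling. The first function assumes LF-only source, while the second shows what the same tool might need if LF, CRLF, and lone CR all had to be accepted.

```python
import re


def retag_todos(source: str) -> str:
    """LF-only input: a plain substring replacement is enough."""
    return source.replace("\n\n// TODO", "\n\n// FIXME")


# Hypothetical variant that tolerates LF, CRLF, and lone CR newlines.
# The (?!\n) lookahead stops a single CRLF from being counted as
# two newlines (a CR followed by an LF).
ANY_NEWLINE_TODO = re.compile(r"((?:\r\n|\r(?!\n)|\n){2})// TODO")


def retag_todos_any_newline(source: str) -> str:
    """Same replacement, keeping whatever newline style was matched."""
    return ANY_NEWLINE_TODO.sub(r"\1// FIXME", source)
```

With CRLF input, `retag_todos` silently finds nothing, and the tolerant version needs a lookahead that is easy to forget. That is the kind of variability the LF-only rule is meant to remove.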

Variable newline style and variable indentation style are features that Zig does not support.

Is this documented anywhere, or are users just expected to run into cryptic errors like this? Like a CR character doesn't even print properly in a terminal.

I'm not sure if Zig does this on purpose or it was overlooked, but as far as I'm concerned this is a good feature. Different styles of indentation and line endings cause endless headaches when working on a collaborative project, for example using source control software such as git, etc.

I know notepad is the default text editor in windows, but nearly all developers use something else to write code, such as notepad++ or visual studio.

VS Code is also a good option. It's developed by Microsoft, and it's free.

Nim does it right:

Any of the standard platform line termination sequences can be used - the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform.

Multiline string should insert LF newlines, as in C. If someone wants CR he could add it via \r.

Correct newlines are not just a problem with stupid Notepad: if one copy-pastes an example from a web page, he gets CR/LF too. Imagine someone failing at Hello World.

if one copy-pastes an example from a web page, he gets CR/LF too.

Pasting into what editor does that?

@thejoshwolfe: Notepad, Sublime Text 2 and probably anything else. I do not know of a Windows editor that uses Unix line endings by default and converts to this style automatically.

@PavelVozenilek Are you saying that when an editor is configured to use unix line endings or is editing a file that already has consistent unix line endings, then pasting from a webbrowser inserts the wrong kind of line ending? Or are you just saying that windows line endings are typically the default line ending style before you configure it in your editor?

@thejoshwolfe: the editors I know (VC++, Sublime Text 2, Notepad ...) do not have a configuration option to force Unix line endings everywhere from now on. At best, one switches it manually, file by file.

Programmable editors like vim can, probably, but I tend to avoid tools smarter than me.

I do not understand why this is even a problem. Line ending chaos is real, it won't go away, pragmatic solution (accept all) is easy and then the mess disappears from the view of ordinary user.

@thejoshwolfe: I think it is the browser that does this, not the editor. HTML is defined as using \r\n not \n. Most browsers let you get away with it on input, but when you copy and paste I think it recreates "correct" HTML from the DOM. Not sure about this, but I have run into the problem consistently.

I think @PavelVozenilek has a point. Every useful editor can manage to translate the line endings just fine but few allow you to do it at a project level and change everything automatically. However, the two main platforms, Windows and Mac, do not use the line ending convention that Zig uses. I happen to use Linux, but that is a minority platform.

I also tend to like the use of tools like go-fmt simply because it completely eliminates an entire class of bike-shedding. I've wasted too much time fighting about formats over the years. It is not a winnable war unless you create something like go-fmt.

I just did some experimentation on my Windows machine. Here's what I found:

Eclipse and Notepad++ normalize line endings when you paste text into a file. Each file is determined to be in a particular style, and anything you put into the file through typing or pasting gets normalized to that style.

In Visual Studio, when you press enter, it uses the newline style of the lines around your cursor. When you paste code with CRLF line endings into any file, you get CRLF line endings for the text that you're pasting without affecting the surrounding text. If you save the file, it saves with mixed line endings without warning you. (You can convert line endings while saving through the "Advanced Save Options".) Visual Studio has no option to automatically normalize line endings on paste or on save. If you want normalized line endings, you gotta do it every time you save.

When you open a file in Visual Studio that has mixed line endings, you get a dialog that prompts you to normalize all the line endings to one style or another.

This is not a bug thread for Visual Studio, but that is a Visual Studio bug. Why would they let you create mixed line endings without warning, but then warn you when you open a file that has mixed line endings? This leads to a "best practice" where you should close and reopen all your files before making a commit to make sure you're not committing files that will produce warnings, which is just silly. This is a bug/missing feature in Visual Studio.
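A pre-commit check for mixed line endings does not need the editor's cooperation. The following is a small Python sketch, not something Visual Studio or Zig provides; the function names are invented for this example.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, lone LF, and lone CR sequences in raw file bytes."""
    crlf = data.count(b"\r\n")
    return {
        "CRLF": crlf,
        "LF": data.count(b"\n") - crlf,  # LFs that are not part of a CRLF
        "CR": data.count(b"\r") - crlf,  # CRs that are not part of a CRLF
    }


def has_mixed_line_endings(data: bytes) -> bool:
    """True when more than one newline style appears in the data."""
    return sum(1 for n in count_line_endings(data).values() if n) > 1
```

The input is read as bytes (for example via `open(path, "rb").read()`) so that Python's own newline translation does not hide the problem.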

I don't know about Sublime Text; it's not free.

Meanwhile in Linux, copying text out of a web browser always seems to result in unix line endings, not windows line endings. I don't know where you're getting the idea that HTML uses windows line endings; I don't see it in the spec. Maybe you mean HTTP headers? There are parts of the HTML spec that talk about normalizing to CRLF, but I can't figure out how to observe that as an end user. I tried copy-paste and drag-drop text from a Google search page and from the textarea editor I'm typing this in right now, but I always got unix line endings (tested in Chrome).

The so-called "Mac"-style line ending actually refers to "classic" Mac OS (Mac OS 9 and earlier), which was superseded by Mac OS X in 2001. Modern Mac uses Unix line endings, just like Linux.

I do not understand why this is even a problem. Line ending chaos is real, it won't go away, pragmatic solution (accept all) is easy and then the mess disappears from the view of ordinary user.

This kind of reasoning leads to JavaScript's automatic semicolon insertion. This kind of reasoning has proven to be very successful at getting widespread adoption. This kind of reasoning is also contrary to the Zen of Zig. In Zig, the code author is required to do more work so that code readers are required to do less work.

I also tend to like the use of tools like go-fmt simply because it completely eliminates an entire class of bike-shedding. I've wasted too much time fighting about formats over the years. It is not a winnable war unless you create something like go-fmt.

Some kind of zig-fmt tool is definitely within the scope of what we want to create. Some plans so far are to make the Zig compiler outright reject any source files that would be modified by zig-fmt. This not only establishes a clear precedent, but forces everyone to use it, or else your code won't compile, not even in debug mode. Whole classes of bikeshedding are gone with this strategy, and all working Zig code has a consistent style. This is already partly the case as we're discussing here, although there's no zig-fmt to fix these problems for you yet. But since the only formatting that's currently rejected is '\r' and '\t' characters, you can pretty trivially clean these up. (You could use this tool, for example.)
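Since only '\r' and '\t' are rejected, the cleanup can be very short. Here is one possible sketch in Python; it is not the tool linked in the comment above, and the function name and the default width of 4 are assumptions made for the example.

```python
def normalize_source(text: str, indent_width: int = 4) -> str:
    """Convert CRLF and lone CR to LF, then expand hard tabs to spaces."""
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # str.expandtabs pads each tab to the next multiple of indent_width
    # and restarts its column count after every newline.
    return text.expandtabs(indent_width)
```

When reading the file in Python, open it with `newline=""` so the original line endings reach the function untouched. Note that this sketch also expands tabs that appear inside string literals and comments, which may or may not be what you want.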

@thejoshwolfe, thanks for running the experiments on cut-and-paste from browsers. Interesting that you did not get the CRLF combos. It has been a while since I cared to check and I tend to set all my editor tools to use LF only on save. As you note, Visual Studio is perhaps not the example of what to do :-)

Not sure how I feel about the idea of having the compiler reject code that is not in the One True Format(tm). While I like the idea that all Zig code would be formatted the same, that might be a little too draconian. For Python this almost works because indentation matters and if someone enters code using both tabs and spaces the meaning is ambiguous.

I think your example of JavaScript's semicolon insertion is taking this a bit far. The semicolon insertion (IMO) is an abomination because it can be wrong and change the intended meaning of the code. I do not see the same thing with handling CRLF, LF or CR as white space.

If format is so important that you would want to make it enforced by the compiler, then perhaps the syntax should be closer to Python? I mean this in all seriousness. I think Guido van Rossum did something really interesting when he decided to make the visual layout elements of Python have meaning at the language level. Python code is not formatted all the same, but even without a python-fmt tool, the code from different projects has more formatting similarities than code in most other languages. I think van Rossum made a mistake in allowing tabs.

If format is so important that you would want to make it enforced by the compiler, then perhaps the syntax should be closer to Python? I mean this in all seriousness. I think Guido van Rossum did something really interesting when he decided to make the visual layout elements of Python have meaning at the language level.

Yes. My idea is to have both C-like curly braces and Pythonic indentation, and they must agree. Curly braces are arguably easier for tools to understand, and indentation is absolutely easier for humans to understand, so I want both. Curly braces enable things that you can't do with just indentation. And as for the compiler enforcing indentation rules, come on, you should always get indentation right; no excuse for wrong indentation; it's not that hard, and it makes a huge difference for readability.

A neat advantage of having strict indentation rules and curly brace block scopes is that you can have better compile errors for unbalanced curly braces, which is something that is especially chaotic in C and Java.

fn SomeClassThing(comptime T: type) -> type {
    struct {
        const Self = this;
        field: T,
        fn method(self: &const Self) {
            {var i = u32(0); while (i < self.field.len) {
                self.field.something(i);
            }
        } // ERROR: missing '}', or wrong indentation
        // At this point, the compiler can trust the indentation
        // rather than the curly braces for parsing the rest of the file.
    }
}

In practice, indentation tends to be more correct than curly brace balance. This is especially relevant for IDEs where the tooling is trying to follow along with you as you type. Unbalanced parentheses, quotes, curly braces, etc. are very common while you're in the midst of typing code. By contrast, wrong indentation is much less common. Usually the indentation is wrong if you paste/move a bunch of code at once, and in that case, you can have an IDE hotkey to trust the curly braces and fix the indentation; then everything's back in agreement.

Generally there are two facets to code formatting: readable for tools and readable for humans. C leans toward readable for tools (curly braces, etc.); Python leans toward readable for humans (indentation, etc.); Zig wants to have it both ways, and so has two sets of formatting rules that must be in agreement for your code to compile. (As a reminder, this is an informal plan for a future version of Zig, not status quo.)
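As a rough illustration of "two sets of rules that must agree", here is a Python sketch of a checker that compares each line's indentation with the current curly-brace depth. Everything in it is an assumption made for the example: the function name, the rule of exactly 4 spaces per level, and the simplification that braces never appear inside strings or comments. It is not how the Zig compiler works or is planned to work.

```python
def check_indentation(source: str, indent_width: int = 4):
    """Return (line_number, expected, actual) for every line whose
    leading spaces disagree with the curly-brace nesting depth."""
    mismatches = []
    depth = 0
    for number, line in enumerate(source.split("\n"), start=1):
        stripped = line.strip()
        if not stripped:
            continue  # blank lines carry no indentation information
        # A line that starts with '}' sits at the level it closes back to.
        closes_first = 1 if stripped.startswith("}") else 0
        expected = max(depth - closes_first, 0) * indent_width
        actual = len(line) - len(line.lstrip(" "))
        if actual != expected:
            mismatches.append((number, expected, actual))
        depth = max(depth + stripped.count("{") - stripped.count("}"), 0)
    return mismatches
```

For a function whose inner block is missing its closing brace, the first report lands on the line where the indentation drops back, which is where a human would look as well.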

Related is #114.

I think van Rossum made a mistake in allowing tabs.

Absolutely agree. It's horrifying how ugly you can make "correct" indentation in Python by mixing spaces and tabs, even in the same line. What a mess.

Not sure how I feel about the idea of having the compiler reject code that is not in the One True Format(tm). While I like the idea that all Zig code would be formatted the same, that might be a little too draconian.

I have high hopes for this strategy. We've already seen some people scared away by Zig's decision to not support hard tabs, which is a shame. But on the plus side, all Zig code will be consistent with this kind of design philosophy.

@thejoshwolfe, doesn't the use of both curly braces and indentation violate the DRY principle? If one of them is wrong, which one? I think this will add to the cognitive load of the programmer before he or she even thinks about the logic of the code itself.

One of the things I like about Python is that it showed you can have both human friendly and machine friendly syntax at the same time. Parsing Python is not markedly harder than parsing a brace-heavy language. Tooling has become intelligent enough that pleasing the human far outweighs pleasing the machine.

If Zig is to become a useful replacement for C, and I think it has many parts that are very positive, putting too many barriers in the way of adoption could be a problem. The balance that the Go creators did with go-fmt ended up being a pretty good one. Use of go-fmt is not actually required, but your code is going to be heavily criticized and not reused if it isn't used.

I think use of an enforced indentation scheme and providing a tool like zig-fmt would go a very long way to stopping the bike-shedding and help a lot in making all code heavily reusable.

For instance you could simply mandate that all indentation is three spaces per indent level. Fine, 99% of all editors can handle that right now. Mandating that you must have curly braces and that the indentation of the code must also match is not something existing editors are going to help with.

That said, using indentation as a hint that the programmer missed a curly brace? That would be a good thing. I think some editors may do that now. We catch the misaligned indentation by eye easier than the missing curly braces.

Obviously this is all IMO!

doesn't the use of both curly braces and indentation violate the DRY principle?

Yes, and I think this is a good time to violate that principle. DRY taken to the extreme leads to Haskell's complete type inference, which is very hard to read. Information duplication is only a problem because it's more work to do, which Zig is ok with forcing on authors, and because it can create conflicting information:

If one of them is wrong, which one?

When you're trying to compile your code, probably the indentation is right (still a compile error though). When you're trying to autoformat your code, probably the curly braces are right.

I think this will add to the cognitive load of the programmer before he or she even thinks about the logic of the code itself.

It doesn't seem like much to ask of a programmer to get their indentation right before trying to compile their code. I'm always careful to keep my indentation correct, even when it's not a compile error, because it makes the code easier to read. An error for incorrect indentation would add 0 cognitive load for me, but if you're not used to being careful to keep your indentation correct, perhaps have your zig compile command preceded by a zig-fmt command. This would be similar to Eclipse JDT's option to run the autoformatter while saving Java source files.

Parsing Python is not markedly harder than parsing a brace-heavy language.

Maybe I'm just bad at it, but I find writing indentation-scoped parsers to be much harder than start/end token-scoped parsers.

Tooling has become intelligent enough that pleasing the human far outweighs pleasing the machine.

I still want to consider people creating new tools. There are lots of cases where you'll want to make a machine that reads Zig code, e.g. custom linters, syntax highlighters, even a one-off sed command to do some refactoring. The more constrained the syntax is, the easier it is to write these tools.

Mandating that you must have curly braces and that the indentation of the code must also match is not something existing editors are going to help with.

Vim can already do this. The = command indents your code based on curly-brace matching even without any installed zig syntax highlighting. It doesn't behave quite correctly in all cases, but it helps.

I don't think curly-braces-to-indentation is an outrageous feature to expect editors to have. And again, I don't think indentation is very difficult to get right manually in the first place.

As an example of how easy Zig is to comprehend with tools, here's a perl one-liner that deletes the content of all the free-form text you can find in Zig code (// comments, "strings", \\ strings, 'characters'). After doing this substitution, every { character is part of the structural syntax.

perl -p -e 's/(["'"'"'])([^\\]|\\.)*?\1/$1$1/g; s/(\/\/|\\\\).*/$1/g'

Even if you don't understand that mess, do notice how short it is. You can't get anything near that simple for C/C++/Java/JavaScript/C# (due to multiline comments), Python (due to multiline string literals), JavaScript/Ruby (due to template strings), PHP/Perl (I don't even want to know), etc. This tokenization simplicity is one reason why Zig does not support /* */-style comments. The tokenizer state is always reset on a newline.
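For readers who do not speak perl, the same two substitutions can be written out in Python. This is a line-for-line port of the one-liner above, applied to one line at a time; the names `QUOTED`, `LINE_TAIL`, and `strip_free_text` are invented for the example.

```python
import re

# A quote character, then the shortest run of non-backslash characters
# or backslash escapes, up to the matching closing quote.
QUOTED = re.compile(r"""(["'])(?:[^\\]|\\.)*?\1""")

# A '//' comment marker or a '\\' multiline-string marker, followed by
# the rest of the line.
LINE_TAIL = re.compile(r"(//|\\\\).*")


def strip_free_text(line: str) -> str:
    """Empty out string/char literals, then cut comment and \\\\ tails."""
    line = QUOTED.sub(r"\1\1", line)
    return LINE_TAIL.sub(r"\1", line)
```

Because the function only ever needs a single line, it can be mapped over `source.split("\n")` with no state carried between lines.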

And by newline, I mean '\n', not /\r\n?|\n/. Bringing this rant back to the original topic, Zig source code is meant to be easy to understand by tools, because it's all in a consistent format. The more formatting variability that's allowed, the harder it is to write tools to read it. There should only be one way to do newlines in Zig source code, so that tools don't need to worry about that variability.

EDIT: Just for fun, here's some code in Chrome's debugger console that tries to understand JavaScript source code using simple regex. JavaScript is way too complex for that to work, and you can observe lots of misbehavior in that area if you poke at it long enough. This serves as just one counterexample to the "Tooling has become intelligent enough" idea, fwiw.

What is the use case for tools massaging source code? Qt does it because C++ is lacking usable metaprogramming, but it is hated and very clumsy to use within IDE.

If one-true-newline-rules-them-all is really that important feature then I suggest to switch to CR/LF everywhere. Number of Windows programmers dwarfs the others, and they are not used to accommodate to other platforms.

Even if you don't understand that mess, do notice how short it is.

That's not a feature by Zig's standards though :)

The biggest reason to enforce an indentation and line endings is that it eliminates energy spent on debating what the standard should be, since the standard is enforced by the compiler. It's unfortunate that a set of people will have to configure their editors beyond the defaults, but that is necessary for one or the other standard to be selected.

It's not my intention to shut down any discussion, but I would posit the thought in everybody's heads who is involved in this thread: is this how you want to spend your time, discussing whitespace? Or do you want to challenge yourself, and switch over to figuring out some of the more fundamental engineering problems that this project is trying to tackle?

Pros for CRLF: Notepad support. Visual Studio users can be sloppy. You usually don't need to change your native-Windows editor's configuration from the default.

Pros for LF: Easier to write tools that scan for LF than for tools that scan for CRLF. Easier to write tools that produce '\n' instead of "\r\n". sed -i support (always outputs LF regardless of input style). diff looks cleaner, including git diff (metadata always in LF even if the + and - lines end with CRLF).

This is just a start, but the pattern is that LF is more friendly to programmers, and CRLF is more friendly to windows users who don't know any better. In other words, LF is better for advanced users, and CRLF is better for adoption. As an advanced user, I vote for favoring advanced users.

Number of Windows programmers dwarfs the others, and they are not used to accommodate to other platforms.

The number of bad programmers dwarfs the others too, and I'm not sure I want to cater to bad programmers. Sure it's better for adoption, but compromising to increase adoption is not in line with Zig's goals.

The issue with errors like this is that when a new user like me downloads Zig, starts coding in Visual Studio Code (Windows), and gets this error, the result is confusing. I spent the first 10 minutes trying out other examples, only to run into the same issue. I still did not figure out it was a file issue. My first idea? Must be a bug in Zig...

In simple terms, the error message is inadequate and needs to be much more clear.
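One way to see what a clearer diagnostic could look like is to name the offending character instead of echoing it to the terminal. The Python sketch below is purely illustrative: the function name and the message wording are made up here and do not reflect the compiler's actual messages.

```python
# Hypothetical descriptions; the real compiler's wording may differ.
FORBIDDEN = {
    "\r": "carriage return (\\r)",
    "\t": "hard tab (\\t)",
}


def find_forbidden_chars(source: str):
    """Return (line, column, description) for each '\\r' or '\\t',
    using 1-based positions as compiler messages usually do."""
    findings = []
    for line_no, line in enumerate(source.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ch in FORBIDDEN:
                findings.append((line_no, col_no, FORBIDDEN[ch]))
    return findings
```

Run on the first line of the original report saved with CRLF endings, this points at line 1, column 30, the same position the compiler printed, but with a name attached instead of a raw CR that scrambles the output.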

Made some improvements (in the above pull request) to these error messages that should handle the most obvious cases and hopefully help a user diagnose exactly where the problem is a little better.

Open to any wording changes or extra special cases if they are considered noteworthy. Regardless of the stance on line endings, hopefully this helps.

[Three screenshots of the improved error messages, dated 2017-10-26; images not preserved.]

Reasons Zig should support tabs in source code:

  • People with small screens exist (yes, still).
  • Partially-sighted users that use large font sizes exist, and they tend to reduce tab widths in order to see source code without scrolling their editor.
  • People that write obnoxiously long if statement chains exist.
  • It reduces the filesize of source code, sometimes quite significantly.
  • Some people have editors that are not set up (or cannot be set up) to interpret four spaces as tabs, making editing code annoying.
  • If you're a soft-tab user, your editor is clearly configured in a way that allows you to ignore the difference between tabs and spaces. Working on hard-tab code is not a problem, since your editor should automatically detect the indentation style and alter its behaviour accordingly.
  • People use a variety of soft-tab widths. I've seen 0, 1, 2, 3, 4 and 8 spaces all used in different projects. Hard tabs remove this problem and place control of the appearance of the code in the hands of who should have control: the person reading it. Forcing people to read code in this manner is akin to forcing people to use a specific typeface or font colour when viewing the code.
  • Some people just prefer to use tabs.

Tabs are useful in cases where text-aligned whitespace is not required. Spaces are useful in cases where text-aligned whitespace is required. There is a strong distinction between them, and they have independent uses. To smother this by pretending that there exists no distinction between them and forcing developers to use one or the other totally circumvents the whole purpose for them being distinct characters in the first place.

Maintaining this position will lose you a lot of potential users, including me. Good luck with growing your reach while you enforce such arbitrary rules.

I propose the following actions are taken:

  1. Allow tabs in Zig source code.
  2. Add a point to the style guide explaining the difference between soft and hard indentation, and the circumstances in which each is appropriate to use.

Here is an example of the correct use of tabs vs spaces.

The tabs are used to indent code that does not require alignment.
The spaces are used to indent code that does require alignment (in this case, because someLongFunction and some_long_value should be aligned).

[Screenshot, 2018-02-02: code indented with tabs and aligned with spaces; image not preserved.]

The decision to ban hard tabs in zig is founded on two main principles:

  1. There should only be one obvious way to do things.
  2. Hard tabs are harder and more complicated to use than just spaces for everything.

If there's only one obvious way to do things, then we need to get everyone to use either soft tabs always or hard tab indentation always. Letting users use whichever they prefer leads to nightmares, such as: what if a single source file has a mix of hard and soft tabs; should that be an error? what if a source directory has a mix of hard and soft tabs; should that be an error? what if an app uses hard tabs and links against a library with soft tabs; should that be an error? And in the case where the user can use whichever they prefer, you will always have the problem that copypasting code from one location will sometimes need to be reformatted to fit the indentation style in another location (such as from stack overflow into your codebase); if you don't check for that, then you get a mix of hard and soft tabs in a single line.

The solution to these nightmares is that everyone has to use the one true official indentation style. The only question that remains is should the official indentation style be hard or soft tabs.

Hard tab indentation, if done properly, can have some nice features. You've outlined a number of positive points in favor of hard tabs, and they're all somewhat compelling.

People with small screens exist (yes, still).

My general response to the points about fitting code into small screens is that hard tabs only solve part of the problem; they're not a proper solution. Hard tabs help a little, but other factors seem like they would matter much more, like whether you wrap your lines at 80 columns or 120 columns or never. I've even seen people criticize using long descriptive variable names because it causes lines to overflow the width of their screen. Not only are hard tabs not a proper solution, but they work against some proper solutions. Like if you did want to wrap all your lines at 80 columns, that concept is undefined if some of those "columns" are occupied by hard tabs; how many "columns" does each hard tab use? The question defeats the purpose of hard tabs in the first place.

It reduces the filesize of source code, sometimes quite significantly.

Yep. I have to agree. My experimentation showed about 88% (plain) and 96% (compressed) size ratio when switching to hard tabs. Hard tabs result in smaller file sizes in almost all cases.

Some people have editors that are not set up (or cannot be set up) to interpret four spaces as tabs, making editing code annoying.

Not quite sure I understand this. Interpreting spaces as tabs? You mean when you press the Tab key, it inserts spaces instead of a hard tab? I can think of one editor that is lacking this feature, and it's Notepad.exe, but that editor is so incompetent, I shudder to even classify it as a text editor. You can't write Zig in Notepad.exe for numerous reasons. Notepad.exe is not supported; it's a terrible piece of software. I'm aware that this compromises Zig's adoption slightly.

It's often true that editors are configured by default to use hard tabs instead of spaces. This is a reality that every programmer, every programming project, and every programming language needs to be ready to deal with. If the programmer and the project turn a blind eye to the issue, as most programming languages do, then the result is a mix of hard and soft tabs in source files creating indentation chaos. It surprises me that so many programmers shrug off this chaos and just carry on with broken indentation. This scenario is unacceptable to me and my projects, and it's currently unacceptable to the zig programming language.

At some point, someone needs to fix the chaos caused by a misconfigured/unconfigured editor, and Zig is enforcing that that must happen before your code will compile. There are lots of solutions to this problem. You could use this tool, for example.

People use a variety of soft-tab widths.

Yeah, this brings up a violation of "one obvious way to do things" on the soft tabs side. If soft tabs are how you indent your code, how many spaces per indentation level should Zig authors use? The answer is 4. This is not currently enforced by the compiler, but if it were, then it would resolve the "one obvious way to do things" violation.

Hard-tabs ... place control of the appearance of the code in the hands of who should have control: the person reading it. Forcing people to read code in this manner is akin to forcing people to use a specific typeface or font colour when viewing the code.

I have to agree that you could decouple the display of the code and the meaning of the code in this way. Hard tabs give you this feature.

If we require that everyone uses hard tabs, we get all the benefits you've outlined above.

The biggest argument against hard tabs is that they're more complicated than spaces:

  • With hard tabs, there's a strategy for how to indent vs vertical align, as you've pointed out. With spaces, just make it look the way you want, and it's correct. Spaces are simple.
  • All source files contain spaces. If your source file also contains hard tabs, then there's multiple kinds of horizontal whitespace. You might want to enable the "show whitespace" option in your editor so you can follow what you're looking at. With no hard tabs, there's only one kind of horizontal whitespace, so any spacing you see, you know what it is. (The slight exception is you still can't see spaces at the ends of lines, but that's out of scope here.)
  • If there's ever a tab after non-tab characters, then there's this weird algorithm for how wide the tab is. It's not just 4 or 8 or whatever you have configured; it's enough space to reach the next multiple of your configured tab width number of columns measured from the start of the line. Now, if you're using tabs properly, you'll never need to think about this, but still. Tabs are complicated.
  • There are cases where proper use of tabs vs spaces is hard to determine by machine, and so the compiler could let through improper use of tabs vs spaces. This could result in code that appears to be formatted correctly by the author, but will appear to be formatted incorrectly by another viewer with differently configured tab width. The way you avoid this accident is you, as an author, enable "show whitespace" to be careful, or you write a linting tool, or you have code reviews to catch this, or ... Or you just don't have hard tabs, and this never happens.
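The "weird algorithm" for a tab that follows non-tab characters is easy to state in code. Here is a small Python sketch; the function name is made up, and columns are counted from 0 at the start of the line.

```python
def tab_advance(column: int, tab_size: int = 4) -> int:
    """Number of columns a tab occupies when it starts at `column`:
    just enough to reach the next multiple of tab_size."""
    return tab_size - (column % tab_size)


# Python's built-in str.expandtabs follows the same rule.
assert "ab\tc".expandtabs(4) == "ab" + " " * tab_advance(2, 4) + "c"
assert "abc\td".expandtabs(8) == "abc" + " " * tab_advance(3, 8) + "d"
```

So the same tab character is 4 columns wide at column 0, 2 wide at column 2, and 1 wide at column 3, and all of those numbers change again when the reader configures a different tab size.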

Spaces are simple. Tabs are complicated.

The most important thing is that we all do it the same way, and spaces are simpler and easier to use than hard tabs. Tabs have more features, but it's not worth it.

You raise some interesting issues. To address them:

Letting users use whichever they prefer leads to nightmares, such as: what if a single source file has a mix of hard and soft tabs; should that be an error? what if a source directory has a mix of hard and soft tabs; should that be an error? what if an app uses hard tabs and links against a library with soft tabs; should that be an error?

Why should these things be an error? This seems like pedantry for the sake of pedantry, rather than any hard, technical reasoning behind it. At the very most, the compiler should spit out a warning. It doesn't need to do more than that.

The solution to these nightmares is that everyone has to use the one true official indentation style.

This is not a nightmare. None of these situations are nightmares. Since you insist that people should be using modern editors, you will know that modern editors automatically format pasted code to fit with the existing style of the source file, circumventing this problem.

I find your use of the phrase "one true" troubling. As if things like personal preference can be standardised. As the Unicode committee have discovered, attempting to tame the complex preferences of humanity is impossible. Enforcing an indentation standard will not make everybody use that indentation standard. It will simply turn people away from the language.

Besides, this will never be a significant problem for maintainers. The to-be zigfmt tool would include all the necessary ability to format any amount of code any way the user likes. There is even talk of adding a --fmt flag to the compiler that automatically formats Zig as required by the standard.

With hard tabs, there's a strategy for how to indent vs vertical align, as you've pointed out. With spaces, just make it look the way you want, and it's correct. Spaces are simple.

If people are looking for simplicity that hides the true nature of a problem, they shouldn't be using Zig. As has already been expressed by @andrewrk and others, Zig is a language that makes clear the true nature of any given problem space. Since there is a distinct difference between the use of tabs and spaces in a text file, it only seems sensible to not hide this distinction behind a restrictive, compiler-enforced rule such as this.

With no hard tabs, there's only one kind of horizontal whitespace, so any spacing you see, you know what it is. (The slight exception is you still can't see spaces at the ends of lines, but that's out of scope here.)

This isn't really a problem at all. As I say, there are legitimate and distinct reasons for using both tabs and spaces, so having both is only sensible. If the programmer really cannot stand having both in a text file (I'd question the capabilities of a developer if the distinction between tabs and spaces has them confused), then zigfmt or the --fmt flag will solve the problem for them quickly and easily.

If there's ever a tab after non-tab characters, then there's this weird algorithm for how wide the tab is.

No there isn't. Hard tabs should not appear after non-tab characters in code. End of, no exceptions. It's really very simple.

Spaces are simple. Tabs are complicated.

Python is simple. Zig is complicated. Not because complexity exists for the sake of itself, but because it is required to correctly and fully articulate the problem space. Similarly, the use of hard tabs for non-text-aligned code is the correct solution, even if it is not the simplest hack.

Instead of all this, I suggest a sensible compromise between both stances

  1. Correct use of tabs and spaces should be noted in the style guidelines
  2. Zig should fail to compile code that exhibits tab characters appearing after non-tab characters on any given line
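
To show how small the check in point 2 would be, here is a rough sketch in Python (not the compiler's code; `misplaced_tabs` is a hypothetical name). It assumes the rule means "tabs may only appear in the leading run of a line":

```python
def misplaced_tabs(source: str) -> list:
    """Return 1-based (line, column) pairs for every tab that follows a non-tab character."""
    hits = []
    for lineno, line in enumerate(source.split("\n"), start=1):
        # Length of the leading run of tabs; anything after it must be tab-free.
        indent = len(line) - len(line.lstrip("\t"))
        for col, ch in enumerate(line[indent:], start=indent + 1):
            if ch == "\t":
                hits.append((lineno, col))
    return hits


sample = "\tok()\n\tx =\t1\n  \ty"
print(misplaced_tabs(sample))  # [(2, 5), (3, 3)]
```

Note that a tab preceded by spaces (line 3 of the sample) is also rejected under this reading of the rule.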

As I've said before: not allowing tabs will push away many potential users. The kind of people pedantic enough to use tabs and spaces correctly are also the kind of people this community should be trying to welcome. Providing them with a compilation error when they try to use their preferred - and completely justified - style is obnoxious and sends a bad message.

Without any intention of being rude, I personally cannot stand not making use of tabs. Therefore, I will be continuing to maintain a tab-permitting fork of the project for my own uses.

I'm putting this issue back on the table.

@zesterer Considering the case for usage of tabs, as you have specified, how does one type the following 2 lines?

    const tmp_path = try allocator.alloc(u8, dest_path.len +
                                             base64.Base64Encoder.calcSize(rand_buf.len));

For the 2nd line, one could press tab to get a \t to the correct indentation level. Next, we want 40 spaces before the word base64.

With spaces, here's how you'd accomplish this, starting from the cursor at the end of line 1, with any text editor which is replacing the tab key with 4 spaces:

  • hit tab 10 times

With tabs,

  • press space bar 40 times

Maybe you prefer, as I do, to indent like this instead:

    const tmp_path = try allocator.alloc(u8, dest_path.len +
        base64.Base64Encoder.calcSize(rand_buf.len));

Assuming your text editor does not understand the syntax of your language, with spaces, you have to press tab once, with tabs you have to press space 4 times.

Am I missing something? Or do you have to press a lot of spacebar for lines that wrap?

Also, how do you know how long your lines are? Can teams who use tabs agree on a maximum line width? Or do they have to give that up?

Considering the case for usage of tabs, as you have specified, how does one type the following 2 lines?

Personally, I'd format that code like the following:

const tmp_path = try allocator.alloc(
	u8,
	dest_path.len +
		base64.Base64Encoder.calcSize(rand_buf.len)
);

Advantages of this include:

  • It makes clear that alloc has two arguments (as shown by the number of items that are singularly indented) meaning you don't accidentally miss the tiny u8 when refactoring
  • It makes clear that base64.Base64Enc... is a component of the dest_path.len argument rather than a distinct argument in its own right, since it has an additional indentation
  • An absurd amount of spacebar/tab pressing is not required to smartly and understandably organise the code
  • It's clear that this is just one function call, since the trailing ); is immediately visible to anybody reading the code

Personally, I'm not afraid to use additional lines if doing so aids readability, as is the case in the example above. The primary reason we use syntax instead of writing machine code in hex is because it aids readability, so I don't see the use of additional lines as a problem.

Also, how do you know how long your lines are? Can teams who use tabs agree on a maximum line width? Or do they have to give that up?

Line length is still determined by the line character (column) count. This is for 3 reasons:

  • Indentation isn't really a part of the code: it's just a way of indicating which parts are distinct from others. Tab indentation width can be altered at will anyway, so counting it as if it were fixed screen real estate isn't useful
  • Indentation is local: a line of code is likely to have a similar indentation to the line before it, particularly if the lines are relevant to one another. For this reason, scrolling the text editor if one has a small screen is less of an issue, since you're looking at large blocks of code rather than single lines
  • Column counts are simple to measure. Everything/everyone understands it: it's just the number of characters in a line.

For examples of me using the aforementioned style in C++, see this and this. I'm sure you will agree: it's quite elegant and makes the code much easier to interpret.

Personally, I'd format that code like the following:

I agree with all of those statements about how to format code. It looks like your answer to my question is: use formatting conventions that allow you to indent with a full indentation amount instead of aligning on a particular character.

Does your suggested formatting convention never have any spaces in between a tab and the start of the code? E.g. is there ever a case for a tab followed by a space?

Indentation isn't really a part of the code: it's just a way of indicating which parts are distinct from others. Tab indentation width can be altered at will anyway, so counting it as if it were fixed screen real estate isn't useful

I don't think this is fair to say. There objectively exists the problem of code wrapping on one person's screen, and not wrapping on another person's screen. If one person prefers 8 space tabs and another prefers 1 space tabs, no matter the screen widths they each have, there will be a conflict where code looks fine on one screen and wraps on the other.

It will then become a project-specific argument about how many spaces tabs should be, which defeats the benefit of tabs allowing each user to choose the width.

Counting a tab as a width of 1 column has one positive effect, which is that if every user's editor is at least (max_indentation_count * preferred_tab_size) characters wide, then code will not wrap for them. This provides a range of minimum screen widths whose upper bound is inversely proportional to their preferred tab size.

But this benefit is soured by the fact that it makes a formatting convention difficult to specify.
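
A worked example of the wrapping problem, as a sketch only. It assumes the line limit counts each leading tab as one character, and the 80-character limit and 4 levels of indentation are made-up numbers; `rendered_width` is a hypothetical helper:

```python
def rendered_width(char_count: int, indent_tabs: int, tab_width: int) -> int:
    """On-screen width of a line whose length limit counts each leading tab as one character."""
    return char_count + indent_tabs * (tab_width - 1)


# The same 80-character line, indented with 4 tabs, at different tab widths:
for tab_width in (1, 2, 4, 8):
    print(tab_width, rendered_width(80, 4, tab_width))
# 1 80
# 2 84
# 4 92
# 8 108
```

So a line that satisfies the limit can still be 28 columns wider for the reader with 8-column tabs than for the reader with 1-column tabs.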

Zig has the opportunity to prevent a huge amount of bickering about this issue, and let people focus on their code.

Minimize energy spent on coding style.

The question is, what is the best way to accomplish this?

I'm not convinced that allowing hard tabs does.

Minimize energy spent on coding style.

I'm not at all convinced that this is a desirable objective. Style is important. It makes code easier to write, easier to read and easier to maintain. It's not an annoyance that should be overlooked.

E.g. is there ever a case for a tab followed by a space?

Additional spaces are useful when aligning parts of function declarations and comments such as this. Other than that, my personal style doesn't tend to align text in a manner that would require spaces after tabs. Of course, this changes depending on who you ask - particularly in commercial settings - meaning that disallowing non-standard style at a compiler level will likely limit the industry uptake of Zig.

Zig has the opportunity to prevent a huge amount of bickering about this issue, and let people focus on their code.

I really don't think the problem is as bad as you say. The people that bicker are also the people that are the most set in their ways and are least likely to change no matter how hard you try by enforcing style through the compiler. It's not a thing I find to be particularly contentious amongst developers unless they're forced to not use their own preferred style.

As previously stated, indentation style is trivial to change with tools that are often built in to editors by default. It doesn't alter the semantic meaning of the code, and besides: style isn't something that should be rigidly enforced. Often, good style doesn't fit a set of rules and the best style is the one that makes a specific piece of code readable under specific circumstances.

Zen guidelines are a far better fit to this problem than compiler errors.

Alright. I've heard the arguments, I've considered carefully, and I've reevaluated my position. And it is still that hard tabs are not allowed in valid zig source. Even if the compiler allowed hard tabs, there would still be the question of zig fmt, which is going to be opt-in, but have no configuration options. (Note that zig fmt will in fact allow hard tabs, windows line endings, and a few more mistakes that have unambiguous corrections).

Hard tabs allow users to abstract the concept of indentation at the syntax level. Zig does not consider this abstraction valuable enough to justify the headaches that mixing hard tabs and spaces causes developers trying to work together across different platforms.

It is a premise that Zig, and tools that parse Zig code, know the column index and actual display column of every byte of source code. This premise will not be broken by hard tabs. It is a premise that Zig source code that one looks at on their own screen will match what another person sees on their screen, excepting for fancy IDE features that are out of scope of this issue. The Zig project recognizes that these premises have the downsides mentioned above in this issue, namely that text editors without syntax awareness cannot modify indentation widths to a particular user's preference, that some text editors make it difficult to avoid hard tabs, and that file sizes of Zig sources may be a few percent larger.

Exactly 1 ascii control code is recognized, and that is the newline character.

Minimize energy spent on coding style.

This is in the Zig Zen and it is here to stay. Zig wants programmers to focus on the semantics of their code and tolerate differences in style as much as possible.

You will notice that I set an example for this, in that I never comment on style or naming in a pull request. At most I will merge it in a branch, make the edits that I prefer myself, and then merge the pull request into master.

In order to facilitate this Zen, I am hereby redirecting discussion of hard tabs away from the official IRC channel, Reddit, and email list. All discussion about hard tabs is to take place here, on this issue. I will not lock the issue just as I will not lock my mind. If someone convinces me to change things, we will change things. So far I am not convinced. Nobody gets in trouble if they violate this new rule; I will simply politely request that they redirect their comments to this issue, so that users can focus on what's more important: writing robust, optimal, clear code.

I wish Zig all the best, but this decision will result in constant flame wars in the future, because despite all its merit it breaks one fundamental rule above all: KISS. If properly copy-pasting "Hello World" depends on the system or editor you use, that is a fundamental KISS flaw. An in-house formatting tool is the way to go (not only for Go); it resolves and balances well all the inconveniences of code formatting diversity.

As a side note: enforcing code style in the compiler is the last thing that will convince any hardcore C programmer who is considering a switch; they are even proud of the IOCCC https://www.ioccc.org/years.html. For the demanding C++ community there is already the Rust option, which Zig seems to be inspired by, and if you check most of the already well-established community projects, the code formatting convention is fairly consistent between them without any enforcement.

diff --git a/src/tokenizer.cpp b/src/tokenizer.cpp
index badbd695..e829226b 100644
--- a/src/tokenizer.cpp
+++ b/src/tokenizer.cpp
@@ -17,6 +17,7 @@
 
 #define WHITESPACE \
          ' ': \
+    case '\t': \
     case '\n'
 
 #define DIGIT_NON_ZERO \

Applying this patch on master (commit b11c5d8) seems to enable compiling the hello world example replacing the indenting spaces with tabs. Not sure if there are any other side effects, YMMV.

Fork is here: https://github.com/zovt/zig

Some thoughts on hard tabs and tooling:

  • If a tool needs to align horizontally in a monospace environment, replace any non-whitespace character with a single space, and leave other whitespace alone. (See my fork for an example with the compiler)
  • Alternatively, we can say that tools should replace tabs with N spaces for display or alignment purposes

Is there a use case that you can think of where the above would be insufficient?
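
The first suggestion above can be sketched in a few lines of Python (an illustration, not the code from the fork; `caret_line` is a hypothetical name). It assumes every non-whitespace character occupies one column:

```python
def caret_line(source_line: str, error_index: int) -> str:
    """Build an underline that points at source_line[error_index].

    Non-whitespace characters before the error become spaces, while tabs and
    other whitespace are copied unchanged, so the caret lines up at any tab width.
    """
    prefix = source_line[:error_index]
    padding = "".join(ch if ch.isspace() else " " for ch in prefix)
    return padding + "^"


line = "\tconst x = 1 +;"
idx = line.index(";")
print(line)
print(caret_line(line, idx))

# The caret stays under the ';' however the viewer renders tabs.
for width in (2, 4, 8):
    assert caret_line(line, idx).expandtabs(width).index("^") == len(line[:idx].expandtabs(width))
```

Because the tab is passed through rather than measured, the tool never needs to know the viewer's tab width.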

Thoughts on style and hard tabs in general:

  • Making the use of certain whitespace characters a compiler error seems weird
  • Using zig fmt as the canonical zig style seems better, much like gofmt
  • Each programming project should have some style guide that prevents bickering about style, but that shouldn't be enforced at the language level

@thejoshwolfe @andrewrk thoughts on the above? My fork seems to work properly with tabs in the stage1 compiler, and I'll be updating the self-hosting compiler to use similar logic. Happy to submit a PR if you change your mind about this issue.

Is there a use case that you can think of where the above would be insufficient?

multi-byte utf8 sequences make it nontrivial to measure horizontal space in a text file. there are also multi-codepoint graphemes and wide characters in Unicode.

Using zig fmt as the canonical zig style seems better, much like gofmt

zig fmt works today. The std library is formatted with it. It's self hosted (not in stage 1). If it doesn't support fixing hard tabs yet, it will.

multi-byte utf8 sequences make it nontrivial to measure horizontal space in a text file. there are also multi-codepoint graphemes and wide characters in Unicode.

how does using strictly spaces make dealing with this any easier?

Oh sorry. I understand what you're saying now. Your proposals would be sufficient.

Making the use of certain whitespace characters a compiler error seems weird

Java classifies form feed as whitespace. In Go, a form feed is a compiler error.

Zig is even more restrictive than those languages on which ascii control codes are allowed.

Sure, but why does it need to be more restrictive? There seems to be very little appeal here other than the personal preference of the language devs

it's not a big deal but it's just weird. If you want to go this way then the compiler should have an option to automatically rewrite source files that contain tabs to spaces so that people can still use ziglang without having to switch their editing software. That's weird too (possibly dangerous) but at least it's convenient.

If you want to go this way then the compiler should have an option to automatically rewrite source files that contain tabs to spaces so that people can still use ziglang without having to switch their editing software.

That's coming soon. We have zig fmt in the self-hosted compiler, and it's planned to support hard tabs.

Column counts are simple to measure. Everything/everyone understands it: it's just the number of characters in a line.

This is actually quite hard to define once you allow non-Latin characters.
The width of Unicode characters is quite hard to calculate; the 'standard' solution of wcwidth is inconsistent, and often incorrect (due to characters with variable widths, and the most common implementation hard-coding an outdated version of Unicode)!

I don't think column counts can be consistent in a unicode world, and should not count as a reason against tab-based indentation.
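
To illustrate why "the number of characters in a line" is ambiguous, here is a small Python sketch that counts the same string three ways. The cell estimate is a deliberately crude heuristic (combining marks take zero cells, East Asian Wide/Fullwidth characters take two); it is not wcwidth and ignores emoji sequences, and `measurements` is a hypothetical name:

```python
import unicodedata


def measurements(text: str) -> dict:
    """Count a string as UTF-8 bytes, code points, and estimated terminal cells."""
    cells = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # combining marks are drawn on top of the previous character
        # East Asian Wide/Fullwidth characters usually take two cells.
        cells += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return {"bytes": len(text.encode("utf-8")), "code_points": len(text), "cells": cells}


print(measurements("abc"))           # {'bytes': 3, 'code_points': 3, 'cells': 3}
print(measurements("e\u0301"))       # {'bytes': 3, 'code_points': 2, 'cells': 1}
print(measurements("\u6f22\u5b57"))  # {'bytes': 6, 'code_points': 2, 'cells': 4}
```

Only for plain ASCII do the three counts agree, which is the case a "column limit" silently assumes.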

Regarding zig fmt and "only one obvious way to do things":

Anyone who has written 4x4 matrices for 3D graphics (e.g. here) aligns the columns visually. In the case of an identity matrix...

const myMatrix = {1.0, 0.0, 0.0, 0.0,
                  0.0, 1.0, 0.0, 0.0,
                  0.0, 0.0, 1.0, 0.0,
                  0.0, 0.0, 0.0, 1.0};

Or in a transformed matrix...

const transformed = {1.25,   1.0,  1.0,   0.0,
                     0.095,  1.0,  0.0,   1.0,
                     1.05,   3.5,  10.24, 8.5,
                     105.25, 35.2, 92.1,  1.0};

If there were "only one obvious way to do things," I would be forced to live with exactly four spaces at the beginning of each line, and a single space between array elements, thus giving the following result...

const transformed = {1.25, 1.0, 1.0, 0.0,
    0.095, 1.0, 0.0, 1.0,
    1.05, 3.5, 10.24, 8.5,
    105.25, 35.2, 92.1, 1.0};

... or worse, no newlines between elements.

const transformed = {1.25, 1.0, 1.0, 0.0, 0.095, 1.0, 0.0, 1.0, 1.05, 3.5, 10.24, 8.5, 105.25, 35.2, 92.1, 1.0};

"Only one obvious way to do things" is a noble and worthwhile goal from a functional standpoint. The enforcement of indentation tabs/spaces is equivalent to telling a painter how to hold his paintbrush. A system that forces its formatting and indentation rules upon me makes the above scenario frustratingly difficult, and when I find out that an automated system has removed my tedious alignment I will get red in the face and toss the laptop off the table. Yes, tediously aligning those matrix elements in the above example does suck: I did not enjoy it, but it's "the right way" for the tools I have available to me. A better method of text input, a better human interface than a keyboard, and a better method of rendering text that automatically breaks my matrix into a 4x4 grid without requiring my tedious spacing are all steps in the right direction, but all well beyond the scope of "programming language." In a world where I want to nicely space out my matrices and use zig fmt, can someone explain how the two could coexist?

In Andrew's YouTube talk about Zig he asked for feedback about what is and isn't working for game developers: this is a big one for me.

This is the output of the current zig fmt:

const myMatrix = []f32{
    1.0, 0.0, 0.0, 0.0,
    0.0, 1.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0,
    0.0, 0.0, 0.0, 1.0,
};

const transformed = []f32{
    1.25, 1.0, 1.0, 0.0,
    0.095, 1.0, 0.0, 1.0,
    1.05, 3.5, 10.24, 8.5,
    105.25, 35.2, 92.1, 1.0,
};

And if you really wanted column alignment, I guess you could pad with zeros instead of spaces, but I don't think this looks very good:

const transformed2 = []f32{
    001.250, 01.0, 01.00, 0.0,
    000.095, 01.0, 00.00, 1.0,
    001.050, 03.5, 10.24, 8.5,
    105.250, 35.2, 92.10, 1.0,
};

One of the advantages of not doing any column alignment, like go fmt does, is that when you add or remove items from a list, the line-based diff (like git diff and most other version control) will show only the lines that changed, instead of showing that the whole table changed due to formatting. This wouldn't really apply to these fixed-size matrices we're discussing here, but it does apply to lists in general.
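
The diff argument can be checked with a small experiment. This Python sketch uses the standard `difflib` module on a made-up list of assignments (`changed_lines` and the sample data are hypothetical): adding one long entry to a column-aligned list touches every line, while the unaligned version only gains one line.

```python
import difflib


def changed_lines(before: list, after: list) -> int:
    """Number of added or removed lines in a unified diff of the two versions."""
    diff = difflib.unified_diff(before, after, lineterm="")
    return sum(
        1
        for d in diff
        if d.startswith(("+", "-")) and not d.startswith(("+++", "---"))
    )


plain_before = ["a = 1,", "bb = 2,"]
plain_after = ["a = 1,", "bb = 2,", "cccc = 3,"]

aligned_before = ["a  = 1,", "bb = 2,"]
aligned_after = ["a    = 1,", "bb   = 2,", "cccc = 3,"]

print(changed_lines(plain_before, plain_after))      # 1
print(changed_lines(aligned_before, aligned_after))  # 5
```

In the aligned case the two existing lines are each reported as removed and re-added, even though only their padding changed.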

The formatted output for the transformed matrix is unacceptable to me, and I agree that the transformed2 does not look good either.

git diff showing more-perfect output is of much lower importance to me than being able to quickly scan and reason about the transform of a particular matrix in my code. The git diff having the nicest possible output is valuable every once in a while when inspecting diffs, but matrices having the nicest possible layout is valuable all the time as I continuously re-read the code I've written.

And if you really wanted column alignment, I guess you could pad with zeros instead of spaces

you can use d21a192

// zig fmt: off
foo_array = {...}
// zig fmt: on

I even remember that fmt behaves differently when you put something at the end of the line, or something like that. I'm not sure right now, but there are some knobs to turn.

@winduptoy I think that

const transformed = {1.25,   1.0,  1.0,   0.0,
                     0.095,  1.0,  0.0,   1.0,
                     1.05,   3.5,  10.24, 8.5,
                     105.25, 35.2, 92.1,  1.0};

is a good use case to argue in favor of alignment in zig fmt. Can you open a separate issue? I don't think it's related to The Hard Tabs Issue.

Can you open a separate issue?

I'm doing it.

I opened a separate issue for this here: #1793

@andrewrk My intention was to argue in favor of "don't dictate my indentation at all." Solving the issue of "the one true way to enter computer code" is a little lofty without fundamentally altering the way that we interact with a computer and redesigning the medium of code from monospace text to something different.

You asked the community "is this how you want to spend your time, discussing whitespace? Or do you want to challenge yourself, and switch over to figuring out some of the more fundamental engineering problems that this project is trying to tackle?" to which I answer, "Yes! Of course! So please take your hand off my paintbrush and let me get to work. Wouldn't you like to do the same rather than forever having half of the programming population find issue and friction with this decision?"

That's a reasonable stance. zig fmt removes lots of freedom from the programmer, and that has its downsides. The tradeoff is that you get consistency between programmers. If everyone is forced to conform to one standard, not everyone will be happy with it, but at least we can try to make it pleasant for as many people as possible. If you have proposals for how everyone's zig code should be formatted, please open issues to argue for them. If you want every zig programmer to be able to format their own code differently, then you're arguing against the purpose of zig fmt (and go fmt).

@andrewrk Will tabs be allowed at some point?

Any updates on this? It's extremely annoying to have the compiler enforce a coding style on you. Even more so than dealing with Rust's borrow checker.

zig fmt fixes whitespace now. I suggest configuring your editor to run that on save. https://github.com/ziglang/zig/wiki/FAQ#why-does-zig-force-me-to-use-spaces-instead-of-tabs

I'd have my two cents to add regarding this topic but if zig fmt can now handle it correctly, it should be far less invasive. Thank you for the quick response.

The biggest reason to enforce an indentation and line endings is that it eliminates energy spent on debating what the standard should be, since the standard is enforced by the compiler.

Just a casual observation, but making this preference a syntax error is a sure fire way to guarantee the bikeshedding about it never ends, at best.

The biggest reason to enforce an indentation and line endings is that it eliminates energy spent on debating what the standard should be, since the standard is enforced by the compiler.

Just a casual observation, but making this preference a syntax error is a sure fire way to guarantee the bikeshedding about it never ends, at best.

I don't think you can ever end the bike-shedding. The difference is that since Zig is choosing to only support one format, developers no longer have a decision to make or debate on a project-by-project basis. The bike-shedding is now centralized :)

They do have a decision to not use the language tho x)

Spaces-only is a real usability issue. Spaces are highly programmer-unfriendly and only work in some way with fancy editor configurations. Let's try to compare in an objective way:

Tabs

advantages

  • user-friendly: tab-width can be configured and adapted to screen width, media type (LaTeX documentation, mobile phone), and personal preferences with a single config setting
  • works in every (even the simplest) editor
  • fewer key presses
  • simple conversion from tabs to spaces possible if needed

disadvantages

  • might not work in some browser input fields
  • maximum line width is tab width dependent

Spaces

advantages

  • maximum line length enforceable

disadvantages

  • accessibility nightmare
  • larger files
  • almost always the wrong indentation width, users prefer 1,2,3,4,8 spaces
  • needs agreement or style guide
    • really annoying when you switch between projects with the same editor
  • has to be reformatted for different media
  • usually leads to mixtures of tabs and spaces as tabs are re-configured to soft tabs etc. in broken editors
  • diffs are harder to read (debatable, see below)
  • conversion from spaces to tabs much harder and usually requires tooling (think tabs for indentation spaces for alignment)

To be honest, I never saw a convincing argument for spaces. It just makes no sense to not use a key that was designed exactly for that and mimic tabs with soft tabs and alternatives. "Looks everywhere the same" is exactly what you don't want and what brought us the indentation mess.

I'd highly appreciate it if that decision would be reconsidered.

To be honest, I never saw a convincing argument for spaces.

To play devil's advocate (I think tabs should be supported), there are two IMO somewhat convincing arguments against them, and so for spaces:

  1. Inside the lexer/parser/compiler, it's impossible to accurately identify the exact location of a token/node/error. You have to either arbitrarily pick a tab width to use or assume it's one character, and report errors at the offsets using the inaccurate location. For example if there's a syntax error at the front of a line indented with a tab, is the error at column 2 because it's the second char, or 4 or 8 or other guessed tab width? The only way to fix it is to add another option to the tool like editors have to say how many characters a tab is considered to be.
  2. Even if you follow the better practice of using tabs for indentation and spaces for alignment, the alignment can still get messed up in certain cases. For instance, trailing comments that are aligned with spaces across lines at different tab indentation levels will only line up when viewed with the same tab-width setting they were aligned with. For example, this code was aligned in an editor configured for a 2-char tab width:
void foo() {      // this function is silly
	if (1)          // as is this condition
		printf("hi"); // but at least it's friendly
}
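
The misalignment in that snippet can be measured rather than eyeballed. This Python sketch rebuilds the three commented lines (with the padding written out explicitly) and reports the 0-based display column where each `//` starts at different tab widths; `comment_columns` is a hypothetical helper:

```python
SNIPPET = [
    "void foo() {" + " " * 6 + "// this function is silly",
    "\tif (1)" + " " * 10 + "// as is this condition",
    '\t\tprintf("hi");' + " " + "// but at least it's friendly",
]


def comment_columns(lines: list, tab_width: int) -> list:
    """0-based display column where `//` starts on each line at the given tab width."""
    return [line.expandtabs(tab_width).index("//") for line in lines]


for width in (2, 4, 8):
    print(width, comment_columns(SNIPPET, width))
# 2 [18, 18, 18]
# 4 [18, 20, 22]
# 8 [18, 24, 30]
```

The comments only share a column at the tab width the author used; each extra indentation level shifts the comment by the difference in tab width.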

A middle ground would be to add a warning flag like -Wtabs, either enabled or disabled by default, so that each project could choose their own preference/convention. IMO, that's overkill though and a better approach would be to just put the switch-case for tab (back?) in the lexer and add a FAQ answering "don't use tabs" for the question "why are my diagnostic message locations inaccurate?".

is the error at column 2 because it's the second char, or 4 or 8 or other guessed tab width

2 of course (as other compilers do and IDEs expect as error location). Even for fancy arrows in the error message it's simple: copy the affected line, cut at error and replace all non-white-space code points with space.

if you have aligned with spaces trailing comments

This only happens when block comments span different indentation levels. This is a code smell and breaks with every refactoring. (You don't want to check if comments are aligned in every location when you rename a variable, right?) If you're creating ASCII art or quines - fine, but don't use it in real-world code. go fmt for instance would break those too.

And again, this happens with spaces too: try to integrate such sections into documents with different indentation requirements...

[tabs] work in every (even the simplest) editor

nope. textarea inputs in web browsers don't support tabs by default, including the github comment editor where i'm typing this right now. It's actually spaces that are supported in every text editor.

[spaces make] diffs are harder to read

nope. it's tabs that are rendered strangely when you prefix every line with a + or -.

...key that was designed exactly for that...

nope. the tab key and character were originally designed to align the cells of a table, not indent structured code. The original purpose of the tab character was to appear in the middle of a line, which is today considered bad practice (at least before the rise of elastic tabstops).

inputs in web browsers

Because it's a browser and not an editor. A proper in-browser code editor supports tabs (and monospace) - see GitHub's online editor. There are many plugins, key combinations etc. to solve this if one really wants to write code in a browser?! I'd consider an editor that is not able to even work with the ASCII character set (minus weird control characters) as broken.

it's tabs that are rendered strangely when you prefix every line with a + or -

I see what you mean, although it's not rendered "strangely" - it stops exactly at the same level. I thought more about small indentations (1-2 spaces) where you can't make out the indentation level. With tabs you can just pipe it through less -x16 and it gets much clearer.

align the cells of a table

That's exactly aligning at fixed indentation levels... (tabs on typewriters were also used to indent paragraphs or lists, not just tables).

I think people are not aware of zig's stance on hard tabs. I updated the wiki page to make it more clear:

Why does Zig force me to use spaces instead of tabs?

see also Why does zig fmt have no configuration options?

less key presses
simple conversion from tabs to spaces possible if needed

Neither of those is strictly true. First off, grammatically speaking the word less does not fit there, and should be replaced with fewer; secondly and more practically, you can use the tab key to insert spaces even in most plain text editors that I've used.

Conversion from tabs to spaces is possible, but the inverse is true as well: the text editor I use literally has the options to go back and forth right next to each other. Also, it's trivial with any good find+replace system to go in either direction.

works in every (even the simplest) editor

Yeah, and? That's not an advantage. To qualify as an advantage, it can't be in the center of a venn diagram, it has to be on a specific side. Both tabs and spaces work in every editor.

larger files

3 bytes per indentation level is not nearly large enough to be a serious concern. You might as well complain that comments require two characters, or that Zig has no multiline comments and therefore a ten-line comment requires twenty characters instead of e.g. four. It's just not a real issue regardless.

usually leads to mixtures of tabs and spaces as tabs are re-configured to soft tabs etc. in broken editors

That's not a disadvantage of spaces; again, that argument could easily be made in either direction. Furthermore, zig uses spaces right now, and you'd be hard pressed to claim that this is an issue with any zig code at the moment.

conversion from spaces to tabs much harder and usually requires tooling (think tabs for indentation spaces for alignment)

Firstly, this is only an issue if you care about supporting both spaces and tabs. Secondly, it's not even true. I've literally done it dozens of times in the past.

needs agreement or style guide

Which exists, that's literally the point of having it be compiler enforced.

There's also many more advantages and disadvantages missing for both sides of the argument in that post.

grammatically

No grammar policing please. Plenty of people around here have English as a second language and teaching English is off topic. Just try to understand intent.

grammatically speaking the word less does not fit there, and should be replaced with fewer

Thanks, fixed.

secondly and more practically, you can use the tab key to insert spaces even in most plain text editors that I've used.

This misuse led to mixes of tabs/spaces in many code bases.

Conversion from tabs to spaces is possible, but the inverse is true as well: the text editor I use literally has the options to go back and forth right next to each other. Also, it's trivial with any good find+replace system to go in either direction.

That's wrong. You can't go from spaces to tabs without a syntax-aware formatter. Find/replace is simply not capable of distinguishing between indentation and alignment.
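To make the indentation-versus-alignment point concrete, here is a rough sketch of what a plain find/replace does. The helper `naive_spaces_to_tabs` and the sample snippet are made up for illustration, and a width of 4 is assumed.

```python
import re


def naive_spaces_to_tabs(text, width=4):
    """Replace each full group of `width` leading spaces with one tab, line by line."""
    def convert(match):
        spaces = len(match.group(0))
        return "\t" * (spaces // width) + " " * (spaces % width)

    return re.sub(r"^ +", convert, text, flags=re.MULTILINE)


source = "\n".join([
    "fn main() void {",
    "    const total = add(first,",
    " " * 22 + "second);",  # 4 spaces of indentation + 18 spaces of alignment
    "}",
])
converted = naive_spaces_to_tabs(source)
print(converted)
```

The continuation line comes out as five tabs plus two spaces, although only the first four spaces were indentation. At any tab width other than 4, `second` no longer lines up under `first`. The replace has no way to know where indentation stops and alignment starts.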

Yeah, and? That's not an advantage. To qualify as an advantage, it can't be in the center of a venn diagram, it has to be on a specific side. Both tabs and spaces work in every editor.

I wouldn't consider it "working" from a usability perspective when I have to press space 24 times to reach the 3rd indentation level (or try to hit it with auto-repeat).

3 bytes per indentation level is not nearly large enough to be a serious concern

I prefer a tab to be 8 spaces wide on most screens...

That's not a disadvantage of spaces; again, that argument could easily be made in either direction.

No. Press a space/tab - get a space/tab. Everything else is just a misconfiguration to cope with usability issues of spaces and leads to mixed up indentations (see above).

Furthermore, zig uses spaces right now, and you'd be hard pressed to claim that this is an issue with any zig code at the moment.

It is a usability and more importantly an accessibility issue (think of the limited space on a braille display or the inflexibility to change indentation width).

needs agreement or style guide

Which exists, that's literally the point of having it be compiler enforced.

The compiler doesn't enforce anything. You can indent your code in any way as long as you use spaces. You have to agree on 1/2/3/4/8 spaces for indentation per project. With tabs that's not a problem at all.

There's also many more advantages and disadvantages missing for both sides of the argument in that post.

Please list the most important ones.

@pixelherodev the tabs vs spaces thing isn't about aesthetics; it's very important for accessibility. Tabs are better for accessibility because some users need huge font sizes to be able to see, and in that case they need to adjust their tab width, because at larger font sizes it becomes harder to even see the spaces.
Just because browsers and some tools don't render tabs correctly doesn't mean we should say FU to people with eyesight problems. I think you might regret making that argument some years down the hill when that computer screen finishes its job :D

Check out this post, which goes into the issue in a bit more detail.

I wanted to experiment with this language, but it imposes no tabs, no Windows newlines, and other coding style rules in general, which will cause me to add 2 more days for a medium size project just to fix things that shouldn't need fixing, since they don't affect the reliability and correctness of the compiled code at all (I am fine with Rust's borrow checker). So I'm not going for this anymore. (I would make a "fork" of this language without these... "political" issues, as I shall call them even if they are not related to actual politics.)

I probably will work with 4 spaces per soft tab/indent level. It's fine. But I want my editor (usually an IDE) to be able to give me the proper spaces, automatically convert all preexisting tabs to spaces at a 4-wide alignment (I agree that mixed tabs and spaces are not a good idea), and to be able to delete an indent level with a single backspace character instead of 4 (assuming my style). The only context where I use a different style is Linux kernel, which has its own coding style, imposed things AND the statement that you may break the rules where it makes sense.

And this is the most important factor in imposing the rules at compile time (with an error; warnings may be fine as long as you can locally override them) -- you will be unable to break the rules where it makes sense to do so.

add 2 more days for a medium size project

I find this hard to believe.

But I want my editor (usually an IDE) to be able to give me the proper spaces, automatically convert all preexisting tabs to spaces at a 4-wide alignment (I agree that mixed tabs and spaces are not a good idea), and to be able to delete an indent level with a single backspace character instead of 4 (assuming my style).

I use vscode, and it does this. (bottom right - Spaces: 4, UTF-8, LF)

you will be unable to break the rules where it makes sense to do so.

Zig fmt is optional, and can be turned off for top-level declarations with // zig fmt: off. The only hard rules are:

  • No Tabs
  • No Windows newlines
  • UTF-8 (or something that works as a subset of it, like ascii)

and I'm not sure where you'd want to break these rules. Zig fmt is also fairly lenient.
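For anyone who wants to check a file against those three rules before handing it to the compiler, a rough sketch follows. It only mirrors the list as stated in this comment and is not the compiler's actual check; the name `check_source_bytes` and the wording of the messages are invented for the example.

```python
def check_source_bytes(data):
    """Return a list of violated rules for the raw bytes of a source file."""
    problems = []
    if b"\t" in data:
        problems.append("tab character")
    if b"\r" in data:
        problems.append("carriage return")
    try:
        # Plain ASCII passes too, since it is a subset of UTF-8.
        data.decode("utf-8")
    except UnicodeDecodeError:
        problems.append("invalid UTF-8")
    return problems


print(check_source_bytes(b"const a = 1;\n"))
print(check_source_bytes(b"\tconst a = 1;\r\n"))
```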

On the UTF-8 one I fully agree with the premise; it's non-controversial. Skipping the BOM might be a good feature though, which can be done before giving the characters to the tokenizer (the BOM is a valid character, U+FEFF, encoded in UTF-8 as the bytes EF BB BF, which can be conditionally skipped if it's the first one). You can even deny overlong forms of characters (ASCII characters should always be 1 byte), that too makes sense. I won't insist on this.

On Windows newlines, I mostly agree, though again simply skipping the character before the tokenizer (and a stray \r that isn't followed by \n would therefore not be considered a newline) -- so it isn't even part of strings unless escaped in the \r form -- might be an easy solution. Most tools can skip \r on their own as well and, if not, you could run dos2unix on said file anyway. Again, you could run dos2unix on .zig files before compiling or as an added build step so I won't insist either.
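The preprocessing suggested in the last two paragraphs could look roughly like this. It is a sketch of the proposal, not of anything Zig does; `normalize_for_tokenizer` is a hypothetical name. A `\r` that is not followed by `\n` is deliberately left in place, as the comment describes.

```python
BOM = b"\xef\xbb\xbf"  # U+FEFF encoded as UTF-8


def normalize_for_tokenizer(data):
    """Drop a leading BOM and turn CRLF into LF before tokenizing."""
    if data.startswith(BOM):
        data = data[len(BOM):]
    # Only "\r\n" pairs are rewritten; a stray "\r" survives unchanged
    # and can still be rejected later.
    return data.replace(b"\r\n", b"\n")


print(normalize_for_tokenizer(b"\xef\xbb\xbfconst a = 1;\r\n"))
```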

On the no tabs one it's a bit more complicated. I'd argue that you should default to no tabs BUT allow support for them in larger projects (not single-file projects) by having some sort of configuration parameter or command line switch to allow tabs (and their width), though only at the beginning of the line (tabs following non-tab characters on the same line can be forbidden just fine). For example build.zig could get by with no tabs at all, and it could have one configuration option that tabs are x spaces wide (which "zig fmt" would also obey). Also preferring 3 spaces per indent level, that's a bit odd (you're the first that I've seen with such a preference, being used to 4 spaces in most projects and 8-wide tabs on the Linux kernel specifically). I'm not sure there are tools that could do this preprocessing either so that we can still fit within our own coding style specifications.
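As for the preprocessing itself, a small script would be enough in principle. The sketch below expands only the tabs at the start of each line, using a configurable width. `expand_leading_tabs` and the default width of 4 are assumptions for illustration, and tabs after the first non-tab character are left alone so that a later check can still reject them.

```python
def expand_leading_tabs(text, width=4):
    """Replace each tab at the start of a line with `width` spaces."""
    out = []
    for line in text.split("\n"):
        body = line.lstrip("\t")
        depth = len(line) - len(body)  # number of leading tabs
        out.append(" " * (width * depth) + body)
    return "\n".join(out)


print(expand_leading_tabs("fn f() void {\n\treturn;\n}"))
```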

@exoticus Thanks for the link.

Accessibility sways me instantly. Tabs win IMO.

I don't think you can ever end the bike-shedding. The difference is that since Zig is choosing to only support one format, developers no longer have a decision to make or debate on a project-by-project basis. The bike-shedding is now centralized :)

The difference is that developers just won't bother to use Zig)))
I understand about returns, they're not a visible change, but tabs and 4 spaces:
I use an editor which renders the tab character as a vertical box-drawing line, showing the "body" of the function. I can't do that with 4 spaces. Only solution for me: accommodate to the language and feel pain using 4 tabs or just not use the language since I didn't really learn it.
And the irony was that Zig was supposed to make C painless, by making me suffer from 4 spaces.
Ok, I'm sorry for the rudeness, I'm still wildly interested in Zig, but forcing you to use certain kinds of tabs and a specific type of returns? In my opinion, a language like Zig shouldn't care about whitespace at all!

This thread has nothing useful left to offer. Here's the FAQ entry pasted:

Why does zig fmt use spaces instead of tabs?

Because no human and no contemporary code editor is capable of handling tabs correctly. Humans tend to mix tabs and spaces on accident, and editors don't have a way to "indent with tabs, align with spaces" without pressing the space bar many times, leading programmers to use tabs for alignment as well as indentation.

Tabs would be better than spaces for indentation because they take up fewer bytes. But in practice, what ends up happening is incorrectly mixed tabs and spaces. In order to simplify everything, tabs are not allowed. Spaces are necessary; we can't ban spaces. But tabs are not strictly needed, so the null hypothesis is to not have them.

Maybe someday, we'll switch to tabs for indentation, spaces for alignment and make it a compile error if they are incorrectly mixed. But if we did that today, writing Zig code would be too hard. For now your options are to configure your editor to insert spaces when you press the tab key, or configure your editor to run zig fmt on save (recommended).

What will make it into the final language specification? It isn't decided yet and it doesn't really matter. Just run zig fmt on save.
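The "tabs for indentation, spaces for alignment" rule mentioned in the FAQ entry can be approximated with a per-line pattern: any tabs come first, then any spaces, and no tab appears after that. The sketch below is an assumption about what such a check might look like, not a planned Zig feature, and `badly_mixed_lines` is a made-up name.

```python
import re

# Zero or more tabs, then zero or more spaces, then text containing no tabs.
_WELL_FORMED = re.compile(r"^\t* *(?:[^\t ][^\t]*)?$")


def badly_mixed_lines(text):
    """Return the 1-based numbers of lines that mix tabs and spaces incorrectly."""
    return [
        number
        for number, line in enumerate(text.split("\n"), start=1)
        if not _WELL_FORMED.match(line)
    ]


sample = "fn f() void {\n\tfoo(a,\n\t    b);\n \tbar();\n\tx =\t1;\n}"
print(badly_mixed_lines(sample))
```

A pattern like this catches a tab after a space and a tab in the middle of a line. It cannot tell whether the number of leading tabs matches the real nesting depth; that part needs a syntax-aware tool such as a formatter.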