Shen-Language / shen-sbcl

SBCL port of the Shen language.

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

REPL on windows doesn't recognize newlines

tizoc opened this issue · comments

@rkoeninger when you run this code in SBCL, whats the output? (you will see a bunch of warnings, ignore those, what is important is the result of the call to read-byte)

(setq *standard-two-way* (make-two-way-stream *standard-input* *standard-output*))
(setq *stinput* *standard-two-way*)
(read-byte *stinput* nil -1)

@rkoeninger have you discovered anything new related to this? Also, you don't have this problem in the CLisp port, right?

@tizoc I'm going to look at this next.

And no, the CLisp port doesn't have this problem.

Ok, great. Btw, on my system the result of (read-byte *stinput* nil -1) then <enter> is 10. If it is different in your system then we have found the problem.

Well, it can be different, any combination of 13 and 10 is expected.

@tizoc If I start sbcl and enter (read-byte *STANDARD-INPUT* nil -1) on the repl and hit <enter>, nothing happens, it just goes to the next line.

If it type a and then hit <enter>, it prints 97, as expected.

@rkoeninger you hit enter to complete the expression or you mean enter after the expression has been submitted to the repl?

It has to be: (read-byte *STANDARD-INPUT* nil -1)<enter><enter>

Btw are you using any kind of alternative terminal program?

If it turns out the output is correct, here is another thing to try:

(define toplineread_loop
  -1 [] -> (exit 0)
  Byte _ -> (error "line read aborted")  where (= Byte (hat))
  Byte Bytes -> (let Line (compile (/. X (<st_input> X)) Bytes (/. E nextline))
                     It (record-it Bytes)
                     _ (output "~%Line: ~R~%Bytes: ~R~%" Line Bytes) \\ ++LINE ADDED++
                  (if (or (= Line nextline) (empty? Line))
                      (toplineread_loop (read-byte (stinput))
                                        (append Bytes [Byte]))
                      (@p Line Bytes)))
      where (element? Byte [(newline) (carriage-return)])
  Byte Bytes -> (toplineread_loop (read-byte (stinput))
                                  (if (= Byte -1)
                                      Bytes
                                      (append Bytes [Byte]))))

If the reason is not that the newline character is not recognized correctly, then it has to do with the parser not being able to handle it (probably because of some extraneous byte that gets inserted in the input).

I've been using ConEmu. The (READ-BYTE *STANDARD-INPUT* NIL -1) returns 10 for me on the standard Windows cmd terminal.

But the Shen REPL still doesn't handle newlines even when run from cmd.

Ok, good, just eliminating possibilities.

I wrote your above code in the kl because I didn't want to re-build the shen:

(defun shen.toplineread_loop (V3817 V3818)
    (cond
        ((and (= -1 V3817) (= () V3818))
         (exit 0))
        ((= V3817 (shen.hat))
         (simple-error "line read aborted"))
        ((element? V3817 (cons (shen.newline) (cons (shen.carriage-return) ())))
         (let Line (compile (lambda X (shen.<st_input> X)) V3818 (lambda E shen.nextline))
         (let It (shen.record-it V3818)
 \\ ADDED LINE BELOW
         (let V1234234 (do (print "Line: ") (do (print Line) (do (print "Bytes: ") (print V3818))))
         (if (or (= Line shen.nextline) (empty? Line))
           (shen.toplineread_loop (read-byte (stinput)) (append V3818 (cons V3817 ())))
           (@p Line V3818))))))
        (true
         (shen.toplineread_loop
           (read-byte (stinput))
           (if (= V3817 -1) V3818 (append V3818 (cons V3817 ())))))))

I start the shen repl, type (+ 1 2) and hit <enter>, it spits out "Line: "shen.nextline"Bytes: "[40 0 43 0 32 0 49 0 32 0 50 0 41 0] and doesn't print the sum or do anything else, just like before.

Using cmd just in case.

In the CLisp port, the above version of shen.toplineread_loop spits out "Line: "[[+ 1 2]]"Bytes: "[40 43 32 49 32 50 41].

Well, there is the problem. It was not that the bytes had the value multiplied by 10, it is that between reads a NULL byte is inserted. This is what causes the parser to fail.

iirc you were running SBCL 1.3.15, I just checked the release notes of 1.3.16 and I don't see anything related to I/O bug fixes.

In my own install this:

(list (read-byte *stinput* nil -1) (read-byte *stinput* nil -1))<enter><enter><enter>

results in:

(10 10)

No weird bytes added in between.

Not sure what causes it, but we are getting closer.

I'm now thinking that your SBCL install may be using a multibyte STDIN, could that be it? Trying to find if there is a way to control that in SBCL.

@rkoeninger whats your output for this?

(stream-external-format *standard-input*)

mine is:

(:UTF-8 :REPLACEMENT #\REPLACEMENT_CHARACTER)

(:UCS-2LE :REPLACEMENT #\REPLACEMENT_CHARACTER). I figured it was 16-bit.

Can I insert something like (setf *stream-external-format* *standard-input* :utf-8) somewhere?

Do you launch Shen from the console?

What I found (on random threads by Python and Go programmers having encoding issues in the console) is that you can change the encoding to UTF-8 by doing:

chcp 65001

So doing so before launching Shen may help.

I don't know if the encoding of STDIN can be changed after the process has started (or how it can be done in SBCL)

I don't think chcp 65001 helped.

I'm not a CommonLisp person, but I'm imaging wrapping the stdin stream in another stream that translates to a particular encoding.

Or having a different version of read-byte that reads two bytes, ignoring the second.

Does this make sense?

(IF (FIND :WIN32 *FEATURES*)) ; or identify a 16-bit encoding
  (DEFUN read-byte (S) (LET ((B (READ-BYTE S NIL -1))) (READ-BYTE S NIL -1) B)))
  (DEFUN read-byte (S) (READ-BYTE S NIL -1)))

Thats not going to work, because it only "fixes" the problem for the console input, breaking it for everything else (including STDIN when it is not the terminal). It may also work in your environment but cause problems for other users.

Btw, the chcp command would be for launching SBCL, not Shen. I'm guessing here, but I think whatever SBCL decides is the port encoding at first, will be inherited by the Shen binary, and not change between runs, thats why the binary provided by Mark works fine for you, while the one you are compiling yourself doesn't (the port code didn't change anything related IO, the only relevant thing changing between a binary and the other is the environment in which it was compiled).

So, somehow, the SBCL input encoding has to change before generating the Shen binary, not after.

I ran chcp before build, then tried running shen.exe.

There's no way to say something like (setq *stinput* (with-format :ascii *standard-input*))?

I don't know, I tried to find a way to do it and the best thing I found was the chcp command that fixed the issue to other people for what they were doing. If I have time during the weekend I will investigate it more.

It is also not entirely clear to me how the terminal decides which encoding to send to the program, the older Shen.exe, and your CLisp compiled one do not have this problem, their STDIN is in a single-byte encoding. And what is more important, the old Shen.exe uses almost the same code, what is different is the environment in which it was compiled (that is, whatever version of Windows and SBCL Mark used at the time), there was no code to set the encoding of STDIN.

@rkoeninger can you try obtaining the SBCL version from the old Shen binary? Being that it was compiled without the changes that get rid of the SBCL flags, --version should still work on it (or maybe it uses /version on windows, no idea).

If we know the version, we can then see if something changed between SBCL versions regarding terminal I/O. It will also let you test the current code compiled with that older version of SBCL to see how it behaves.

Also, when running that Shen binary, whats the output of: (STREAM-EXTERNAL-FORMAT (value *stinput*))

the sbcl that comes in the 19.2 download is version 1.1.1. When I try to run it, I get:

Expression unexpectedly false: win32-os.c:465
 ... VirtualAlloc(addr, len, (mem_info.State == MEM_RESERVE)? MEM_COMMIT: MEM_RESERVE, PAGE_EXECUTE_READWRITE)
     ===> returned #X00000000,
     (in thread 00000000) ... Win32 thinks:
     ===> code 487, message => Attempt to access invalid address.

 ... CRT thinks:
     ===> code 0, message => No error
ensure_space: failed to validate 943718400 bytes at 0x22300000
(hint: Try "ulimit -a"; maybe you should increase memory limits.)

(STREAM-EXTERNAL-FORMAT (value *stinput*)) gives [UCS-2LE REPLACEMENT �funex1345�]. I embedded that in the kl.

Well, thats very interesting, it means that the encoding of STDIN is the same as the one compiled with a newer SBCL version (that throws a big part of my theory out of the window, what still stands is the difference in SBCL versions used to compile it). I will investigate later and see if I find what changed between 1.1.1 and 1.3.x.

Oh, sorry the [UCS-2LE...] output was from the version i've been trying to build. I guess that didn't tell us anything we didn't know earlier.

I don't know what the pre-built shen.exe would say.

Trying to run the shen.exe that comes with the standard download gives me the ugly error above.

...i thought it worked before? I'll have to try it on my other machine, or on something that doesn't have sbcl installed separately.

I found this: https://bugs.launchpad.net/sbcl/+bug/1660906

It is a fix introduced 1.3.15 with the description bug fix: better console IO on Windows.

I don't think that is the issue, leaving it here just for the record.

Aaaaand, in version 1.1.2: enhancement: Console I/O streams use UCS-2.

This is why the old binary works and the new one doesn't.

I guess this is the code that does that:

https://github.com/sbcl/sbcl/blob/202e00628820daf9a6d141daea222273cb252f6d/src/code/fd-stream.lisp#L2557-L2566

I don't know what the solution is yet but I'm starting to get a better idea of how to approach it.

@rkoeninger just pushed a new branch named fix-win-console-io, can you try that one?

If that doesn't work we have to figure out another way to make a separate stream for STDIN. I know binding it to the file descriptor with id 0 will work on unix-like systems, not sure if it will work on windows.

It built, and when I ran shen.exe:

couldn't read from #<SB-SYS:FD-STREAM for "descriptor 0" {10001F0FE3}>: Access is denied.

And then it crashed.

I found something very weird. If I type anything else other than <enter> as the first character, it breaks in the same way you just saw (but without crashing). If I instead start the interaction with an <enter> (without typing anything else before), then the REPL works fine.

For me, shen.exe fails before I have the opportunity to provide any input.

If I type (READ-BYTE (SB-SYS:MAKE-FD-STREAM 0 :INPUT T :ELEMENT-TYPE '(UNSIGNED-BYTE 1) :EXTERNAL-FORMAT :UTF-8)) NIL -1) on the sbcl repl, i get the Access Denied and another repl prompt.

This library seems to have the ability to wrap a stream in another stream with a different format:

http://weitz.de/flexi-streams/#make-flexi-stream

It's built on gray streams:

https://github.com/sbcl/sbcl/blob/master/src/pcl/gray-streams.lisp

@rkoeninger I made some changes, one that fixes the intermittent io errors I was having on OSX when using the new branch, another one that may fix the crash you were having on windows, give it another try when you get a chance.

And just in case, don't think I'm not considered your suggestions, it is just that I'm trying to solve this at the right level, wrapping the streams will cause different issues because doing so just masks the problem instead of solving it, and as a result we may end having different problems (like for example, what happens when STDIN is not the console, but a redirected file).

fix-win-console-io no longer builds on my windows machine:

; caught ERROR:
;   during macroexpansion of
;   (EXTERN-ALIEN "_get_osfhandle" (FUNCTION INT INT-PTR)). Use *BREAK-ON-SIGNALS*
;   to intercept.
;
;    unknown alien type: COMMON-LISP-USER::INT-PTR

@rkoeninger mmm.... it is supposed to be defined on windows, maybe it needs some prefix (one I don't know of).

Does it work if you change INT-PTR to (SIGNED 64) (this is assuming your windows is 64 bits, if not, use 32 instead).

In the SBCL REPL, does this work? (sb-win32:get-osfhandle 0)

Edited: the prefix is sb-win32 instead of sb!win32

(SIGNED 64) allows it to build, but when I run shen.exe, i get copies of bad thing to be a type specifier: "EXCEPTION_ACCESS_VIOLATION" flying down the screen.

(sb!win32:get-osfhandle 0) raises Package SB!WIN32 does not exist..

@rkoeninger I borrowed a machine with windows, I can now try stuff without bothering you (thanks for your patience btw). I explained the situation in SBCL's irc, but the answers are not very helpful (it is basically, "well, shen shouldn't be doing that"), I'm not even being told if what I'm trying to do is possible at all or plain nonsense.

@rkoeninger just pushed a new branch fix-win-console-io-again. I don't like that fix at all, but it seems to work. Ctrl+C and Ctrl+D don't work like in other OSes, but I don't know if thats because of how the windows terminal works or something else (this is not something that older versions of Shen handled).

When I try to build the fix-win-console-io-again branch, I get:

Heap exhausted during garbage collection: 0 bytes available, 16 requested.
 Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB   LUB  !move  Alloc  Waste   Trig    WP  GCs Mem-age
   0:     0     0     0     0     0     0     0     0     0        0     0 10737418    0   0  0.0000
   1: 18423     0     0     0  9852     0     0     0     4 322732016 98320 10737418    0   0  1.0000
   2: 32459 32458     0     0 15639   118    12     0    48 516265008 453584  2000000 15604   0  0.0339
   3:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0.0000
   4:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0.0000
   5:     0     0     0     0     0     0     0     0     0        0     0  2000000    0   0  0.0000
   6:     0     0     0     0  1107   354     0     0     0 47874048     0  2000000 1062   0  0.0000
   Total bytes allocated    = 1073189920
   Dynamic-space-size bytes = 1073741824
GC control variables:
   *GC-INHIBIT* = true
   *GC-PENDING* = true
   *STOP-FOR-GC-PENDING* = false
fatal error encountered in SBCL pid 14784(tid 00000000011F71B0):
Heap exhausted, game over.

Welcome to LDB, a low-level debugger for the Lisp runtime environment.
ldb> quit
Really quit? [y] y

Sorry, I messed something up when writing the code again on my machine (what I did in the windows machine worked, I just could not commit from there so I just rewrote the code here, but made a mistake).

Already pushed the fixed version.

With your latest change, it builds and the shen repl handles input as expected.

I can even do Ctrl + D and hit Enter and it exits the shen repl.

Also, the test suite passes just fine (~5.1s), so I guess file reads are still good.

Good job, man! Thanks for all your time and effort on this!

I can even do Ctrl + D and hit Enter and it exits the shen repl.

Oh, thats great. The "windows" machine I was testing on was a VirtualBox VM running on someone else's Linux computer, I guess thats why those keys were being weird when I tested.

Btw, don't merge anything yet, I'm thinking of a cleaner solution that doesn't have to special-case windows or check for *stinput*, but it requires a small change in the kernel. I have to go now, but once it is done I will let you know so that you can give it a try and tell me your opinion.

This should be fixed now.