effekt-lang / effekt

A language with lexical effect handlers and lightweight effect polymorphism

Home Page:https://effekt-lang.org

String-related problems on the LLVM backend

jiribenes opened this issue

Warning

I've split this into two separate issues: #503 and #504

Description

The following (already quite minimised [!]) program is supposed to:

  1. read a file
  2. split it into lines
  3. for each line, tell me the positions of the first and last occurrences of a given digit in the line

The program somewhat heavily uses strings and I/O, and acts wildly differently between the JS and the LLVM backends:

The buggy program

import io/files
import io/error
import io

def processLine(line: String) = {
    println(line)
    val numbers: List[Int] = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    numbers.foreach { n =>
      val numberStr = n.show
      val firstSeen = line.indexOf(numberStr)
      val lastSeen = line.lastIndexOf(numberStr)
      println("n: " ++ n.show ++ " ('" ++ numberStr ++ "') is seen: first: " ++ firstSeen.show ++ " & last: " ++ lastSeen.show)
    }
}

def processAll(lines: String) = {
  lines.split("\n").foreach { line => processLine(line) }
}

def main() = {
  // processAll("kjrqmzv9mmtxhgvsevenhvq7\nfour2tszbgmxpbvninebxns6nineqbqzgjpmpqr")

  eventloop(box {
    with on[IOError].panic;
    with filesystem;

    val contents = do readFile("tiny.input")
    processAll(contents)
  })
}

where tiny.input is just

kjrqmzv9mmtxhgvsevenhvq7
four2tszbgmxpbvninebxns6nineqbqzgjpmpqr

The various behaviours

1. On the JS backend, I can uncomment the first processAll and everything just works :)

🔍 Expected output from the JS backend
kjrqmzv9mmtxhgvsevenhvq7
n: 1 ('1') is seen: first: None() & last: None()
n: 2 ('2') is seen: first: None() & last: None()
n: 3 ('3') is seen: first: None() & last: None()
n: 4 ('4') is seen: first: None() & last: None()
n: 5 ('5') is seen: first: None() & last: None()
n: 6 ('6') is seen: first: None() & last: None()
n: 7 ('7') is seen: first: Some(23) & last: Some(23)
n: 8 ('8') is seen: first: None() & last: None()
n: 9 ('9') is seen: first: Some(7) & last: Some(7)
four2tszbgmxpbvninebxns6nineqbqzgjpmpqr
n: 1 ('1') is seen: first: None() & last: None()
n: 2 ('2') is seen: first: Some(4) & last: Some(4)
n: 3 ('3') is seen: first: None() & last: None()
n: 4 ('4') is seen: first: None() & last: None()
n: 5 ('5') is seen: first: None() & last: None()
n: 6 ('6') is seen: first: Some(23) & last: Some(23)
n: 7 ('7') is seen: first: None() & last: None()
n: 8 ('8') is seen: first: None() & last: None()
n: 9 ('9') is seen: first: None() & last: None()
kjrqmzv9mmtxhgvsevenhvq7
n: 1 ('1') is seen: first: None() & last: None()
n: 2 ('2') is seen: first: None() & last: None()
n: 3 ('3') is seen: first: None() & last: None()
n: 4 ('4') is seen: first: None() & last: None()
n: 5 ('5') is seen: first: None() & last: None()
n: 6 ('6') is seen: first: None() & last: None()
n: 7 ('7') is seen: first: Some(23) & last: Some(23)
n: 8 ('8') is seen: first: None() & last: None()
n: 9 ('9') is seen: first: Some(7) & last: Some(7)
four2tszbgmxpbvninebxns6nineqbqzgjpmpqr
n: 1 ('1') is seen: first: None() & last: None()
n: 2 ('2') is seen: first: Some(4) & last: Some(4)
n: 3 ('3') is seen: first: None() & last: None()
n: 4 ('4') is seen: first: None() & last: None()
n: 5 ('5') is seen: first: None() & last: None()
n: 6 ('6') is seen: first: Some(23) & last: Some(23)
n: 7 ('7') is seen: first: None() & last: None()
n: 8 ('8') is seen: first: None() & last: None()
n: 9 ('9') is seen: first: None() & last: None()

2. On the LLVM backend, if I keep the first processAll commented out

Then I get only one output for the whole string combined for some reason?

EDIT: It is simply that .split("\n") doesn't quite work for strings passed in from the outside: processLine gets called only once.

Update: When tracing it, it seems like string::isSubstringAt is at fault, but only for the strings from the outside world.

kjrqmzv9mmtxhgvsevenhvq7
four2tszbgmxpbvninebxns6nineqbqzgjpmpqr
n: 1 ('1') is seen: first: None() & last: None()
n: 2 ('2') is seen: first: Some(29) & last: Some(29)
n: 3 ('3') is seen: first: None() & last: None()
n: 4 ('4') is seen: first: None() & last: None()
n: 5 ('5') is seen: first: None() & last: None()
n: 6 ('6') is seen: first: Some(48) & last: Some(48)
n: 7 ('7') is seen: first: Some(23) & last: Some(23)
n: 8 ('8') is seen: first: None() & last: None()
n: 9 ('9') is seen: first: Some(7) & last: Some(7)

3. On the LLVM backend, if I uncomment the first processAll

Then I get a hard crash with exit code 134 (but the first processAll part works correctly, everything is split as it should, no more overflows!)

kjrqmzv9mmtxhgvsevenhvq7
n: 1 ('1') is seen: first: None() & last: None()
n: 2 ('2') is seen: first: None() & last: None()
n: 3 ('3') is seen: first: None() & last: None()
n: 4 ('4') is seen: first: None() & last: None()
n: 5 ('5') is seen: first: None() & last: None()
n: 6 ('6') is seen: first: None() & last: None()
n: 7 ('7') is seen: first: Some(23) & last: Some(23)
n: 8 ('8') is seen: first: None() & last: None()
n: 9 ('9') is seen: first: Some(7) & last: Some(7)
four2tszbgmxpbvninebxns6nineqbqzgjpmpqr
n: 1 ('1') is seen: first: None() & last: None()
n: 2 ('2') is seen: first: Some(4) & last: Some(4)
n: 3 ('3') is seen: first: None() & last: None()
n: 4 ('4') is seen: first: None() & last: None()
n: 5 ('5') is seen: first: None() & last: None()
n: 6 ('6') is seen: first: Some(23) & last: Some(23)
n: 7 ('7') is seen: first: None() & last: None()
n: 8 ('8') is seen: first: None() & last: None()
n: 9 ('9') is seen: first: None() & last: None()
[error] Process exited with non-zero exit code 134.

EDIT: Investigated a bit closer, here's the sanitiser trace:

🔍 Sanitiser output
repro(11106,0x100d18580) malloc: nano zone abandoned due to inability to preallocate reserved vm space.
[...]
=================================================================
==10874==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x619000000080 in thread T0
    #0 0x1033592c0 in wrap_free+0x90 (libclang_rt.asan_osx_dynamic.dylib:arm64+0x4d2c0) (BuildId: 7615c595d022355bb91dd63f44086bc832000000200000000100000000000b00)
    #1 0x102eed76c in topLevel+0x38 (repro:arm64+0x10000976c) (BuildId: 59fdb92c81573573a0ba6a4222172fd232000000200000000100000000000b00)
    #2 0x10305feb4 in uv_run+0x25c (libuv.1.dylib:arm64+0x7eb4) (BuildId: 208160f0746f3fb2a43dfcdb736ed89c32000000200000000100000000000b00)
    #3 0x102eefdc0 in k_300+0x1c (repro:arm64+0x10000bdc0) (BuildId: 59fdb92c81573573a0ba6a4222172fd232000000200000000100000000000b00)
    #4 0x10325d088 in start+0x204 (dyld:arm64+0x5088) (BuildId: 38ee9fe9b66d30668c5c6ddf0d6944c632000000200000000100000000060c00)
    #5 0x2d157ffffffffffc  (<unknown module>)

Address 0x619000000080 is a wild pointer inside of access range of size 0x000000000001.
SUMMARY: AddressSanitizer: bad-free (libclang_rt.asan_osx_dynamic.dylib:arm64+0x4d2c0) (BuildId: 7615c595d022355bb91dd63f44086bc832000000200000000100000000000b00) in wrap_free+0x90
==10874==ABORTING

where k_300 is:

define fastcc void @k_300(%Env %env, %Sp %sp) {
    entry:
        %x_8415p_301 = getelementptr {%Pos}, %Env %env, i64 0, i32 0
        %x_8415 = load %Pos, ptr %x_8415p_301
        %x_8414 = call %Pos @start_3001()
        %x_8413 = call %Pos @stop_3002()
        %tmp411540_6285p_302 = getelementptr {%Pos}, %Env %env, i64 0, i32 0
        store %Pos %x_8413, ptr %tmp411540_6285p_302
        %sp_304 = getelementptr %FrameHeader, %Sp %sp, i64 -1
        %retadrp_305 = getelementptr %FrameHeader, %Sp %sp_304, i64 0, i32 0
        %f_303 = load %RetAdr, ptr %retadrp_305
        tail call fastcc void %f_303(%Env %env, %Sp %sp_304)
        ret void
}

# OPT:
define fastcc void @k_300(ptr %env, ptr %sp) {
entry:
  %loop.i = tail call ptr @uv_default_loop()
  %run_result.i = tail call i32 @uv_run(ptr %loop.i, i32 0)
  %loop.i2 = tail call ptr @uv_default_loop()
  tail call void @uv_stop(ptr %loop.i2)
  tail call void @uv_loop_close(ptr %loop.i2)
  %sp_304 = getelementptr %FrameHeader, ptr %sp, i64 -1
  tail call void @llvm.memset.p0.i64(ptr noundef nonnull align 8 dereferenceable(16) %env, i8 0, i64 16, i1 false)
  %f_303 = load ptr, ptr %sp_304, align 8
  tail call fastcc void %f_303(ptr nonnull %env, ptr nonnull %sp_304)
  ret void
}

Update: Of course, if you move the processAll call inside the eventloop, it "just" returns the wrong result (case 2.), so the core of 3. is just "doing things outside of the eventloop".

Issue 2. is hard to reproduce on the LLVM REPL, as it works for normal strings:

> def foo(s: String): Unit = s.split("\n").foreach { x => println(x) }
foo: String => Unit
> foo("a\nb")
a
b
()

but when I try to pipe strings gotten from files into it, I run into a hard crash (possibly a different one than in 3.)

> def getInput(s: String) = { var out in global = ""; eventloop(box { with on[IOError].panic; with filesystem; out = do readFile(s) }); out }
getInput: String => String
> getInput("tiny.input")
[error] Process exited with non-zero exit code 134.
🔍 Here's a more detailed sanitiser trace
=================================================================
==11270==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x619000000080 in thread T0
    #0 0x1049712c0 in wrap_free+0x90 (libclang_rt.asan_osx_dynamic.dylib:arm64+0x4d2c0) (BuildId: 7615c595d022355bb91dd63f44086bc832000000200000000100000000000b00)
    #1 0x10455b2f0 in topLevel+0x40 (interactive:arm64+0x1000072f0) (BuildId: ac687bd67e75387bb75bb5090faec84932000000200000000100000000000b00)
    #2 0x1046bfeb4 in uv_run+0x25c (libuv.1.dylib:arm64+0x7eb4) (BuildId: 208160f0746f3fb2a43dfcdb736ed89c32000000200000000100000000000b00)
    #3 0x10455d9bc in k_332+0x1c (interactive:arm64+0x1000099bc) (BuildId: ac687bd67e75387bb75bb5090faec84932000000200000000100000000000b00)
    #4 0x10477d088 in start+0x204 (dyld:arm64+0x5088) (BuildId: 38ee9fe9b66d30668c5c6ddf0d6944c632000000200000000100000000060c00)
    #5 0x304dfffffffffffc  (<unknown module>)

Address 0x619000000080 is a wild pointer inside of access range of size 0x000000000001.
SUMMARY: AddressSanitizer: bad-free (libclang_rt.asan_osx_dynamic.dylib:arm64+0x4d2c0) (BuildId: 7615c595d022355bb91dd63f44086bc832000000200000000100000000000b00) in wrap_free+0x90
==11270==ABORTING

which seems like the very same bug as 3. (the same bad-free on the same address in topLevel), if my eyes don't deceive me.

When formatted nicely, getInput looks like:

def getInput(s: String) = { 
  var out in global = "";
  eventloop(box { 
    with on[IOError].panic;
    with filesystem;
    
    // escape hatch, trying to return a string
    out = do readFile(s)
  })
    
  out
}

On the JS backend, it returns only an empty string (which makes sense as out has not been written into yet), so this is probably not the way to go about making a small testcase...

I added a sanitiser trace to 3. by applying the following patch:

diff --git c/effekt/jvm/src/main/scala/effekt/Runner.scala i/effekt/jvm/src/main/scala/effekt/Runner.scala
index 5eb85f65..d5f52921 100644
--- c/effekt/jvm/src/main/scala/effekt/Runner.scala
+++ i/effekt/jvm/src/main/scala/effekt/Runner.scala
@@ -209,7 +209,7 @@ object LLVMRunner extends Runner[String] {
   override def prelude: List[String] = List("effekt", "option", "list", "result", "exception", "string") // "array", "ref")
 
 
-  lazy val gccCmd = discoverExecutable(List("cc", "clang", "gcc"), List("--version"))
+  lazy val gccCmd = discoverExecutable(List("clang"), List("--version"))
   lazy val llcCmd = discoverExecutable(List("llc", "llc-15", "llc-16"), List("--version"))
   lazy val optCmd = discoverExecutable(List("opt", "opt-15", "opt-16"), List("--version"))
 
@@ -267,7 +267,7 @@ object LLVMRunner extends Runner[String] {
 
     val gccMainFile = (C.config.libPath / ".." / "llvm" / "main.c").unixPath
     val executableFile = basePath
-    val gccArgs = Seq(gcc, gccMainFile, "-o", executableFile, objPath) ++ libuvArgs
+    val gccArgs = Seq(gcc, "-fsanitize=address,undefined,leak,integer", gccMainFile, "-o", executableFile, objPath) ++ libuvArgs
     exec(gccArgs: _*)
 
     executableFile

Here are some more details: I tried printing the frees in @topLevel:

<called from `runPos`>
Freeing base: 0x144d2c800
Freeing base.0: 0x0
Freeing base.1: 0x619000000080
Freeing base.2: 0x0
Freeing env: 0x154d34800

<called from `run`> 
Freeing base: 0x144d2c800
Freeing base.0: 0x0
Freeing base.1: 0x619000000080
<uhoh, 💥>

It turns out that 2. is "just" the LLVM backend mishandling escapes in string literals:

def main() = {
  eventloop(box {
    with on[IOError].panic;
    with filesystem;

    def analyse(str: String) = {
      println("Analysing: '" ++ str ++ "'")
      each (0, str.length) { i =>
        with on[OutOfBounds].panic;
        val c = str.charAt(i)
        println("char " ++ c.show ++ " (" ++ c.toInt.show ++ ")")
      }
    }

    val longstr = "kjrqmzv9mmtxhgvsevenhvq7\nfour2tszbgmxpbvninebxns6nineqbqzgjpmpqr"
    analyse(longstr)

    val contents = do readFile("day1-tiny.input") // same contents as `longstr`, just from a file
    analyse(contents)
  })
}

If you look into the log, you'll see:

Analysing: 'kjrqmzv9mmtxhgvsevenhvq7\nfour2tszbgmxpbvninebxns6nineqbqzgjpmpqr'
char k (107)
[...]
char \ (92)
char n (110)
[...]
Analysing: 'kjrqmzv9mmtxhgvsevenhvq7
four2tszbgmxpbvninebxns6nineqbqzgjpmpqr'
char k (107)
[...]
char 
 (10)
[...]

So actually the string read from a file is correct, the string literal isn't.
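To double-check that this explanation accounts for the numbers in case 2., here is a small model of the behaviour. It is written in Python rather than Effekt so that the bug itself cannot interfere, and it is only a sketch: it assumes that the LLVM backend leaves "\n" in a string literal as the two characters backslash and n, as the log above suggests. The names from_file, literal, separator and first_index are made up for this illustration.

```python
# Model of the LLVM-backend behaviour described above; this is not Effekt code.
LINE1 = "kjrqmzv9mmtxhgvsevenhvq7"
LINE2 = "four2tszbgmxpbvninebxns6nineqbqzgjpmpqr"

# String as read from tiny.input: a real newline (code point 10).
from_file = LINE1 + "\n" + LINE2

# Assumption: an unprocessed escape, i.e. backslash (92) followed by 'n' (110).
literal = LINE1 + "\\n" + LINE2

# The "\n" in .split("\n") is itself a string literal, so it is modelled the same way.
separator = "\\n"

def first_index(haystack, needle):
    """Return the first index of needle in haystack, or None if it is absent."""
    i = haystack.find(needle)
    return None if i < 0 else i

file_parts = from_file.split(separator)
literal_parts = literal.split(separator)

print(len(file_parts), first_index(from_file, "2"), first_index(from_file, "6"))
# prints: 1 29 48
print(len(literal_parts))
# prints: 2
```

Under that assumption the file contents are never split (one piece), and the digits 2 and 6 land at 29 and 48, exactly as in the output of case 2. The literal input does get split into two lines, but only because the input and the separator are broken in the same way, which would explain why the first processAll in case 3. looked correct.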