orthecreedence / cl-async

Asynchronous IO library for Common Lisp.

Fix no-overlap test

orthecreedence opened this issue

It's failing on Linux, and it pretty much renders cl-async useless for TCP communication. Need to figure out when this happened, although the last time it passed was many, many commits ago.

Strange, it doesn't fail on my machine. Does *buffer-writes* affect it?

Nope, that was my first guess. I dug in a bit last night and it looks like it's some kind of test timeout issue, not that the data is actually "crossing streams." So maybe the content mixing is a drakma-async bug, or maybe it's still a cl-async bug in a different place. Need to keep digging!

Let me know if there is anything I can do to help. This is a major blocker for me and I'm glad to help in any way I can :)

Have you noticed this with any other projects or just drakma-async? Also, any test case you can give me would be really helpful.

Sure, I will try the exact same workflow with carrier and see. I use drakma-async with the older cl-async that uses libevent, and it doesn't have the issue, though. If carrier shows the same issue I will try to get a simple repro case for you.

Sorry for the delay, going to try with carrier today and put together a repro with it. :)

Yeah, I replaced the drakma-async call with a carrier call and it does the same thing. It's like switching to libuv added an issue where multiple cl-async requests for data (using drakma or carrier) cross streams and the content from multiple requests gets mixed together.

Actually, upon tracking this down more, it looks like it might be wookie that is causing the mixing when it sends the responses. That would also explain why it happens with both drakma and carrier, since the problem is in neither of those but rather in how the responses are sent back to the client.

Still tracking down though, will keep you posted.

Yeah, it is definitely wookie's response handling. I took an original js file from the webserver, proxied it through wookie using drakma, dumped the result to a file right before calling wookie:send-response, and diffed that file with the original: zero diff. Then I took what wookie gave me in the browser and it was different; it's like the js file got scrambled up a little.
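(For reference, the dump right before wookie:send-response was just a plain write of the body octets to disk, roughly like the sketch below; the path and the body variable name are made up for illustration.)

;; Minimal sketch of dumping the proxied body to a file before sending it,
;; so it can be diffed against the original. BODY is assumed to be an
;; (unsigned-byte 8) vector; the path is arbitrary.
(with-open-file (out #p"/tmp/pre-send-response.js"
                     :direction :output
                     :element-type '(unsigned-byte 8)
                     :if-exists :supersede)
  (write-sequence body out))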

Okay, using the latest libuv and all the latest master code, I don't get overlap anymore under load. But instead I get some sort of strange encoding issue. Take a look at the diff highlighted in red at the bottom of the screenshot. What's weird is it only does it sometimes (like 2/3 of the time).
[screenshot omitted: screenshot_2015-02-21-16-07-39, diff with differing bytes highlighted in red]

Okay, I did a site with multiple js files and they are completely mixing together. So it's not just one stream stomping on itself, it's multiple streams mixing together like I originally thought :(

Is there some global used for writing to the stream, or maybe the streams themselves are shared somehow?

Okay, I finally found a good way to debug this, so I found the issue, fixed it locally, and it seems to work. Will make a pull request shortly. Basically, output-buffer is being stomped on inside write-to-uvstream: while libuv is processing the buffer, another request comes in and alters output-buffer. I just did a (copy-seq buffer-c) and it seemed to fix it. Not sure if there is a way to lock it so that copies don't have to be made, but it fixes the issue.

(defun write-to-uvstream (uvstream data &key start end)
  "Util function to write data directly to a uv stream object."
  (do-chunk-data data *output-buffer*
    (lambda (buffer-c bufsize)
      (let ((req (uv:alloc-req :write))
            ;; local fix: copy the chunk so libuv isn't handed a pointer into
            ;; the shared output buffer, which another request may overwrite
            ;; before this write finishes
            (buf (uv:alloc-uv-buf (static-vectors:static-vector-pointer (copy-seq buffer-c)) bufsize)))
        (let ((res (uv:uv-write req uvstream buf 1 (cffi:callback streamish-write-cb))))
          (uv:free-uv-buf buf)
          (unless (zerop res)
            (let ((streamish (getf (deref-data-from-pointer uvstream) :streamish)))
              (uv:free-req req)
              (error (errno-event streamish res))))
          (attach-data-to-pointer req uvstream))))
    :start start
    :end end))
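(A variant of the same idea would be to copy each chunk into its own freshly allocated static vector rather than calling copy-seq inline. This is only a sketch, assuming the static-vectors make-static-vector/free-static-vector API and that the caller frees the copy once libuv's write callback has fired:)

;; Hypothetical helper, not part of cl-async: copy BUFSIZE bytes of the
;; current chunk into a new static vector so libuv is handed memory that
;; no later request can overwrite. The caller must free the result with
;; static-vectors:free-static-vector after the write completes.
(defun copy-chunk-to-static-vector (buffer-c bufsize)
  (let ((copy (static-vectors:make-static-vector
               bufsize :element-type '(unsigned-byte 8))))
    (replace copy buffer-c :end2 bufsize)
    copy))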

Nick, thanks a ton for tracking this down. I suspected it had something to do with buffering but wasn't completely sure. Are you by any chance running Wookie in multiple threads? Otherwise locking shouldn't be an issue.

I'll look into this code and see what I can do. Sorry for the month-long hiatus, things got really busy at work the past month and I could barely look at a screen at the end of each day, much less code =].

Not running in multi-threaded mode, just normally :)

This is really weird, I'm not able to reproduce at all on win or linux. Going to keep plugging away though.

I start up wookie and proxy a web page that has a bunch of css, js, images, etc., and the file contents get crossed as shown above. This is done in a pre-route filter. If I do a buffer copy as shown in the pull request above, the issue goes away. Hope that info helps you repro a little easier. If I do a few requests, especially in serial, there is no issue at all. It's only when doing a bunch of parallel requests for differing files (like when proxying a webpage) that it happens. I think it's when, say, 10 requests come in to wookie and each request makes a carrier/drakma request for the original content; those requests' responses come back in chunks (some from fileA, some from fileB), and those chunks each get written to the output buffer as they come in, and so get mixed up. If that makes sense?

I'm doing simultaneous carrier requests (20 of them per run) for a number of JS files served by wookie and comparing the md5s of the files coming back to the known md5 sums of the files (precomputed). I haven't gotten one mismatch, after testing about 50 times.
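(If it helps anyone reproduce this, the comparison step is just a digest check along the lines of the sketch below; the helper name is made up, and it assumes ironclad for MD5 and that the response body arrives as an octet vector.)

;; Hypothetical helper for the test: compare the MD5 of a response body
;; (an (unsigned-byte 8) vector) against a precomputed hex digest.
(defun body-matches-p (body expected-md5-hex)
  (string-equal expected-md5-hex
                (ironclad:byte-array-to-hex-string
                 (ironclad:digest-sequence :md5 body))))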

Can you make sure you're loading fast-io from master?

So for example: proxying content with wookie. Two requests come in at the same time, one for js, one for css. The js wookie request makes a request for the original js file via carrier, and at the same time the css wookie request does the same via carrier (as they are both non-blocking, so wookie switches between the two wookie requests). The js carrier request gets some data back and writes to the buffer, then the css carrier request gets some data back and writes to the buffer, then the js carrier request gets some more data and writes to the output buffer (the data is now mixed), and the mixed response is sent as the response to the original js wookie request. Not sure if that helps or makes it more confusing.

But basically, when accepting lots of requests at the same time with wookie, and those requests themselves make async network requests, the output buffer seems to get trampled.
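(A toy model of that failure mode, just to illustrate the aliasing, not cl-async's actual code: two "responses" that queue references to one shared buffer end up holding whatever was written last, while per-chunk copies keep each response intact.)

;; Toy model: an "event loop" alternates chunk writes from request A and
;; request B through one shared buffer. Without copies, both queues end
;; up aliasing the same buffer and hold only the last chunk written.
(defun run-toy (copy-chunks-p)
  (let ((shared (make-string 1))
        (queued-a '())
        (queued-b '()))
    (flet ((write-chunk (chunk enqueue)
             (replace shared chunk)
             (funcall enqueue (if copy-chunks-p (copy-seq shared) shared))))
      (write-chunk "a" (lambda (c) (push c queued-a)))
      (write-chunk "b" (lambda (c) (push c queued-b)))
      (write-chunk "A" (lambda (c) (push c queued-a)))
      (write-chunk "B" (lambda (c) (push c queued-b))))
    (values (apply #'concatenate 'string (nreverse queued-a))
            (apply #'concatenate 'string (nreverse queued-b)))))

;; (run-toy nil) => "BB", "BB"  ; scrambled: every entry aliases SHARED
;; (run-toy t)   => "aA", "bB"  ; each response keeps its own chunks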

Yeah, I'm using the latest master of fast-io and fast-http.

It's not when wookie itself is serving the files. It's when wookie requests themselves make outbound async requests.

Request 1 a.js => wookie => carrier request to get a.js from a different server.

Request 2 b.css => wookie => carrier request to get b.css from a different server.

Under load, Request 1 gives back a result with a.js and b.css mixed together.

Hope that helps :)

Ah ok, thanks for describing the proxying, I didn't get that the first time.

Either way, I'm really surprised output-buffer is getting overwritten because it's only ever used synchronously. Let me do some more testing with the proxying aspect.

Try to make sure the resources you proxy have some fairly large chunked and non-chunked responses too (lots of the files I proxy are larger and might contribute to the issue). :)

Thanks again for looking into it. Let me know if I can give any more info to help you repo the issue.

How large are we talking? I'm mostly doing <= 200K (just javascript files) but they are definitely over the 64K buffer limit (they should trigger multiple chunks/writes).
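(For instance, with a 64K output buffer a ~200K file should go out in several writes:)

;; a ~200K file split into 64K chunks needs 4 writes
(ceiling (* 200 1024) (* 64 1024)) ; => 4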

I rewrote the test to proxy back to an apache process, and getting the same result (all the MD5s are coming back fine). You're on linux/sbcl right? I'm doing linux/sbcl as well, not seeing the overlap =[. Any other details that might help?

I can bring my server up with master wookie (not my patched version) and give you a URI you can hit, so you can see the effect maybe? I'm running on Ubuntu 14.10 and SBCL 1.2.8 and the latest Quicklisp, with the master repos in local-projects.

Out of curiosity, what libuv are you using? I just upgraded to 1.4.2 in case that was the problem, can you do the same (you might have to compile from source)?

I'm using 1.4.0. I can try 1.4.2 after you take a look at the two uris above. Let me know and I will take the uris down and upgrade libuv :)

1.4.0 is probably fine, I was on 1.0.0-rc2

See it now =]. Had to load a 3.5mb file.

Okay great! So you don't need the test uris above anymore? Safe to take down?

Yeah, please take them down. However, I'm noticing a failure only on large files, consistently. So any 3mb file fails, whether other requests are coming through or not. Then when I curl that file through the wookie proxy, the MD5 matches.

Are you seeing this on random files or different ones per-request, or is there any kind of consistency?

It seems to vary, as you can see in the first uri above. If you full-refresh a few times, each time it affects different files (sometimes even working perfectly with no issues at all). If you then take a file that got mixed and ask for just that file in the browser alone, it's perfectly fine.

The second uri (patched) works every time perfectly.

Ok, thanks. I'll keep working to reproduce.

Thanks, let me know if I can help in any way, or if you accept the pull request (in which case I will switch back to master instead of my fork).

Thanks again :)

Ok, can you pull cl-async and wookie (both master)? I implemented your copying concept (I think copying the output buffer is the only way). Also pushed some somewhat unrelated fixes to Wookie. Oh, and cross your fingers =]

Sure thing, I will test it out tonight. Thanks!!!

This is working! Thanks :)