scramjetorg / scramjet

Public tracker for Scramjet Cloud Platform, a platform that brings together data from many environments.

Home Page:https://www.scramjet.org


Batch issues yet again.

MichalCz opened this issue · comments

Hi, thanks for the great library!

This issue happened again in a somewhat strange scenario.

I have two functions like the one below. One works as expected, but this one stalls at the first `.batch()` call: no error, no return value, the process just times out.


```typescript
res.setHeader('Content-Type', 'application/json; charset=utf-8')
StringStream
  .from(async function* () {
    const response = await axios.get(csv_url, {
      responseType: 'stream'
    })
    yield* response.data.pipe(await stripBomStream())
  }, { maxParallel: 4 })
  .CSVParse({ skipEmptyLines: true, header: true })
  .filter((item: any) => (item.Class_Id && parseInt(item.Class_Id) !== 0 && item.Class_Discount > 0))
  .map(async (item: any) => {
    const master_name = master_subgroup.find(
      (master_item: any) => (parseInt(master_item.id) === parseInt(item.Class_Id)))

    if (!master_name.name || master_name.name === "") return {}
    all_data_count++
    batch_data_count++
    console.log("pass 3")
    const class_obj = {
      id: item.Class_Id,
      name: master_name.name,
      discount_rate: parseFloat(item.Class_Discount),
      parent_id: master_name.parent_id,
      status: "active"
    }
    return class_obj
  })
  .batch(500)  // <-- process stalls here
  .map(async (items: any) => {
    console.log("All Items is : " + items.length)
    const start_at = (batch_count * round_count) + 1
    round_count += 1
    const end_at = (start_at + items.length) - 1
    const range = start_at + "-" + end_at

    const unique_items = discount.objectToArray(discount.arrayToObject(items, "id"))
    const complete_subgroup = unique_items.filter((item: any) => {
      return Object.keys(item).length > 0
    })
    console.log("Master group is : " + JSON.stringify(complete_subgroup))

    if (complete_subgroup.length > 0) {
      const discount_obj = { subgroup: complete_subgroup }
      console.log(discount_obj)
      console.log("before send")
      const update_result = await discount.update(req.params.uid, discount_obj)

      console.log("before result")
      console.log(update_result.success)
      console.log("before return")
      if (update_result.success) {
        return JSON.stringify({
          rows: range,
          received: items.length,
          accepted: unique_items.length - 1,
          status: "import_successful",
          message: "ข้อมูลได้รับการบันทึกแล้ว"
        })
      } else {
        return JSON.stringify({
          rows: range,
          received: items.length,
          accepted: 0,
          status: "problem_with_database",
          message: "ไม่สามารถบันทึกข้อมูล ช่วงข้อมูลที่: " + range,
        })
      }
    } else {
      return JSON.stringify({
        rows: range,
        received: items.length,
        accepted: 0,
        status: "data_id_not_exits",
        message: "ช่วงข้อมูลที่: " + range + " รหัสไม่มีอยู่จริงในระบบ ",
      })
    }
  })
  .catch((error: any) => {
    console.log("catch block 1" + error.stack)
    if (error.code === "ERR_SCRAMJET_EXTERNAL") {
      res.statusCode = error.cause.response.status
      const error_obj = makeErr("can_not_reach_product_api",
        "ไม่สามารถติดต่อ Product API ได้ กรุณาตรวจสอบ",
        error.cause.response.status,
        process.env.PRODUCT_API_URL)
      res.send(JSON.stringify(error_obj))
      return
    } else {
      res.statusCode = 422
      const error_obj = makeErr(error.code, error.message, 422)
      res.send(JSON.stringify(error_obj))
      return
    }
  })
  .batch(10000)
  .stringify((resp: any) => {
    res.statusCode = 200
    return "[" + resp + "]"
  })
  .catch((error: any) => {
    console.log("catch block 3")
    if (error.code === "ERR_SCRAMJET_EXTERNAL") {
      res.statusCode = error.cause.response.status
      const error_obj = makeErr(error.cause.response.data.code,
        error.cause.response.data.message,
        error.cause.response.status)
      return JSON.stringify(error_obj)
    } else {
      res.statusCode = 500
      const error_obj = makeErr(error.code, error.message, 500)
      return JSON.stringify(error_obj)
    }
  })
  .toStringStream()
  .pipe(res)
```
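For readers unfamiliar with `.batch(n)`: conceptually it buffers incoming items and emits them as arrays of `n`, flushing any remainder when the stream ends. The sketch below is not scramjet's implementation, just a plain async-generator illustration of the semantics the pipeline above relies on:

```javascript
// Conceptual sketch of a batch(n) operator: buffer incoming items
// and emit arrays of `size`, flushing the final partial batch.
// This is an illustration, NOT scramjet's actual implementation.
async function* batch(source, size) {
  let buf = [];
  for await (const item of source) {
    buf.push(item);
    if (buf.length === size) {
      yield buf;
      buf = [];
    }
  }
  if (buf.length > 0) yield buf; // flush the remainder
}

async function demo() {
  async function* items() {
    for (let i = 1; i <= 7; i++) yield i;
  }
  const out = [];
  for await (const group of batch(items(), 3)) out.push(group);
  return out; // [[1, 2, 3], [4, 5, 6], [7]]
}
```

If an upstream stage never signals end-of-stream, a batch operator like this never flushes, which is one way such a pipeline can stall without any error.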


Originally posted by @KwangGan in #2 (comment)

@KwangGan could you post a sample CSV link here so we can try to reproduce the problem?

Two things may be important:

  1. The CSV should include similar data - e.g. if there are Thai characters, it would be good to provide a similar example.
  2. The length of the CSV - if it is smaller than the batch size, we should keep it that way.
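For illustration, a minimal CSV matching the columns the pipeline actually reads (`Class_Id` and `Class_Discount` are taken from the code above; the values are hypothetical) could look like:

```csv
Class_Id,Class_Discount
101,5.5
102,0
0,2.5
103,1.25
```

With `header: true` and `skipEmptyLines: true`, only rows with a non-zero `Class_Id` and a positive `Class_Discount` (here 101 and 103) would survive the `.filter()` step.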

Here's the overall bugfixing procedure:

  1. Isolate the test case reproducing the issue
  2. Add the test case to scramjet/test/methods (or if that's multi-method we'll find a better place)
  3. Fix the issue.
  4. Confirm issue resolution on your end.
  5. Bugfix release.

Thank you for reopening the issue.

  1. I tried decreasing the batch size to 1 and increasing it to 500, but no luck.

  2. This is my CSV file; it contains only English characters.
    https://ybin.me/p/b2a673d1480d1c94#WGKBUy47K9F/ZaRyg5/xEd581xCW9hy+E6VkQOfmCKE=

  3. This is my code, using multer_s3 and Express.js to handle the request and upload.
    With this code I managed to upload the file, pass its URL to axios, pipe the stream data into StringStream, and convert everything to new JSON objects - and then the process stalled at the batch function.

https://ybin.me/p/4f62a44fa86cb9a7#W2Asc7kv9vX2JQw2D8TqUikJPgwMRiB4Sjyhz73fUxs=

  4. Here is my package.json:
    https://ybin.me/p/c96db83f4852f288#LDqRVel2AMDFcBhcbxZG0oqHQv5pc820fNhP1XYF6Dk=

Node.js 10.15

Best Regards

My solution for now is to downgrade to v4.23, and everything works as expected :D

Ok, good. I'll check the differences over the weekend and try to figure out what went wrong.

4.23 may rely on a slightly older papaparse, but that should not affect the parsing. There may also be some things missing around `rate` and `exec`, but I don't see you using those.

@KwangGan I have tested batch in your scenario and I don't think it's our case (I've added some special tests that should cover the case).

I believe your code may be affected by a change in `pull`, which now handles `function*` slightly differently. Can you try a slightly different way of getting the stream?

```typescript
StringStream
  .from(async function () {
    const response = await axios.get(csv_url, {
      responseType: 'stream'
    })
    return response.data.pipe(stripBomStream())
  }, { maxParallel: 4 })
```

I'm not sure if this helps but the answer will lead me somewhere...

@KwangGan could you create some kind of repo (or a fork of scramjet) with a test for your scenario?


This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.