scramjetorg / scramjet

Public tracker for Scramjet Cloud Platform, a platform that brings data from many environments together.

Home Page:https://www.scramjet.org


'Sequence unpack failed' issue

tomkeee opened this issue

I am trying to run a sequence that includes the files below. Unfortunately, whenever I try to start my sequence (si seq start <instance id>) I get an error: { message: 'Sequence unpack failed', exitcode: 10 }

main.py

from scramjet.streams import Stream
import time
import asyncio

result = 0 

def count(x):
    result += 1

async def run(context,input):
    start = time.time()
    with open(input) as file_in:
        x = (Stream
            .read_from(file_in)
            .map(count)
        )
    execution_time = time.time() - start
    return f"The number of interpreted lines: {result}\nExecution time {execution_time}"

package.json

{
    "name": "@scramjet/big-input-file",
    "version": "0.22.0",
    "lang": "python",
    "main": "./main.py",
    "author": "XYZ",
    "license": "GPL-3.0",
    "engines": {
        "python3": "3.9.0"
    },
    "scripts": {
        "build:refapps": "yarn build:refapps:only",
        "build:refapps:only": "mkdir -p dist/__pypackages__/ && cp *.py dist/ && pip3 install -t dist/__pypackages__/ -r requirements.txt",
        "postbuild:refapps": "yarn prepack && yarn packseq",
        "packseq": "PACKAGES_DIR=python node ../../scripts/packsequence.js",
        "prepack": "PACKAGES_DIR=python node ../../scripts/publish.js",
        "clean": "rm -rf ./dist"
    }
}

requirements.txt
scramjet-framework-py

Hi @tomkeee, thanks for reporting this.
How did you pack and send your sequence to our platform?
If you used our CLI, which version (si -v)? Did you use si sequence deploy, or si seq pack <package> & si seq send <package>?

Hi @daro1337,
I used si seq pack <package> & si seq send <package>
CLI version is 0.28.3

@tomkeee Did you try to pack and send the sequence again?
I have successfully run the sequence from the code snippet above, but there were some code errors:

2022-09-16T12:51:51.416Z DEBUG Host Request [
  'date: 2022-09-16T12:51:46.814Z, method: POST, url: /api/v1/sequence/83840e85-c1be-42ce-8906-c19523c368f7/start, status: 200'
]
    return future.result()
  File "//runner.py", line 54, in main
    await self.run_instance(config, args)
  File "//runner.py", line 215, in run_instance
    result = await result
  File "/package/main.py", line 12, in run
    with open(input) as file_in:
TypeError: expected str, bytes or os.PathLike object, not Stream

Can you inspect your package.tar.gz archive?

$ tar -ztvf issue125.tar.gz 
-rw-rw-r-- daro/1000       411 2022-09-16 14:45 main.py
-rw-rw-r-- daro/1000       672 2022-09-16 14:46 package.json
-rw-rw-r-- daro/1000        22 2022-09-16 14:46 requirements.txt

issue125.tar.gz
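
For anyone who prefers to run the same check from Python, here is a minimal sketch using only the standard library. It assumes the archive layout shown in the listing above (files at the archive root) and checks that the file named under "main" in package.json is actually packed. inspect_package is a hypothetical helper name, not part of the Scramjet CLI.

```python
import json
import posixpath
import tarfile


def inspect_package(archive_path):
    """Return (sorted member names, True if package.json's "main" file is packed)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # Normalise names so "./main.py" and "main.py" compare equal.
        members = {
            posixpath.normpath(m.name): m
            for m in tar.getmembers()
            if m.isfile()
        }
        manifest_member = members.get("package.json")
        if manifest_member is None:
            # Assumption: package.json sits at the archive root.
            return sorted(members), False
        manifest = json.load(tar.extractfile(manifest_member))
    main = posixpath.normpath(manifest.get("main", ""))
    return sorted(members), main in members
```

Calling inspect_package("issue125.tar.gz") on an archive like the one listed above would return the three file names and True if main.py is present.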

@daro1337
I have no issues with sending the sequence (it actually works). The problem is that I am not able to run the sequence. It fails at si seq start <instance-id>. I tried to use it on my local machine (code below) and it worked as it should:

from scramjet.streams import Stream
import time
import asyncio

result = 0 

def count(x):
    global result
    result += 1

async def run(context,input):
    start = time.time()
    with open(input) as file_in:
        
        data = file_in.read()
        for i in data:
            count(i)
        global result
        print(f"result from file.read(): {result}")
        result = 0

        x = (Stream
            .read_from(file_in)
            .map(count)
        )
    execution_time = time.time() - start
    return f"The number of characters: {result}\nExecution time {execution_time}"

res = asyncio.run(run({}, "new.csv"))
print(res)

The result is:

result from file.read(): 327
The number of characters: 0
Execution time 0.00022649765014648438

I am trying to figure out what goes wrong when starting the sequence on the platform.
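
A note on the local output above, independent of the platform error: the second count is 0 at least partly because file_in.read() has already consumed the file, so anything reading from file_in afterwards sees no data. The sketch below shows that plain-Python behaviour with io.StringIO standing in for the opened file (an assumption made so the snippet is self-contained). Whether the stream pipeline in the script is ever consumed at all is a separate question this sketch does not address.

```python
import io

# Stand-in for `open("new.csv")`; the contents are made up for illustration.
file_in = io.StringIO("a,b\n1,2\n")

first = file_in.read()    # reads everything and leaves the position at end-of-file
second = list(file_in)    # iterating now yields no lines

file_in.seek(0)           # rewind to the start
third = list(file_in)     # the lines are available again
```

Rewinding with seek(0), or not calling read() first, avoids handing an exhausted file object to the next reader.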

@daro1337
The same issue occurs even when I try to start a sequence with the very basic code provided in your GitHub repo (shown below).

hello.py

from scramjet.streams import Stream

def run(context, input):
    return Stream.read_from(input).map(lambda s: f"Hello {s}!")

package.json

{
    "name": "@scramjet/python-hello-python",
    "version": "0.22.0",
    "lang": "python",
    "main": "./hello.py",
    "author": "Jan Warchoł <open-source@scramjet.org>",
    "license": "GPL-3.0",
    "repository": {
        "type": "git",
        "url": "https://github.com/scramjetorg/transform-hub.git"
    },
    "engines": {
        "python3": "3.9.0"
    },
    "scripts": {
        "build:refapps": "yarn build:refapps:only",
        "build:refapps:only": "mkdir -p dist/__pypackages__/ && cp *.py dist/ && pip3 install -t dist/__pypackages__/ -r requirements.txt",
        "postbuild:refapps": "yarn prepack && yarn packseq",
        "packseq": "PACKAGES_DIR=python node ../../scripts/packsequence.js",
        "prepack": "PACKAGES_DIR=python node ../../scripts/publish.js",
        "clean": "rm -rf ./dist"
    }
}

requirements.txt
scramjet-framework-py

Hi @tomkeee,
I reproduced your case based on our refapp hello. Could you please follow the steps below and let us know if you still get the same error? These steps are exactly what I did and the app worked just fine.

  1. Clone the reference-apps repo:
git clone git@github.com:scramjetorg/reference-apps.git
  2. Go to the hello dir:
cd python/hello
  3. Install requirements:
pip3 install -t __pypackages__/ -r requirements.txt

After this step you should see __pypackages__ in the app directory with its dependencies.

  4. Now leave the hello dir:
cd ../
  5. Make the .tar.gz package:
si seq pack hello
  6. Send hello.tar.gz to the hub (make sure your CLI config is set for beta panel, token added, env set to production, etc.):
si seq send hello.tar.gz

The output you should see in the console:

$ si seq send hello.tar.gz

{"_id":"46b9fd3d-4762-4ce6-8b28-c579a9d595f7","host":{"apiBase":"https://api.beta.scramjet.cloud/api/v1/space/org-a10d5cb5-abc4-42c8-8327-4ca53c2e2a05-manager/api/v1/sth/sth-0/api/v1","client":{"apiBase":"https://api.beta.scramjet.cloud/api/v1/space/org-a10d5cb5-abc4-42c8-8327-4ca53c2e2a05-manager/api/v1/sth/sth-0/api/v1","log":{}}},"sequenceURL":"sequence/46b9fd3d-4762-4ce6-8b28-c579a9d595f7"}
  7. Start the sequence:
si seq start -

The output you should see in the console:

$ si seq start -

{"host":{"apiBase":"https://api.beta.scramjet.cloud/api/v1/space/org-a10d5cb5-abc4-42c8-8327-4ca53c2e2a05-manager/api/v1/sth/sth-0/api/v1","client":{"apiBase":"https://api.beta.scramjet.cloud/api/v1/space/org-a10d5cb5-abc4-42c8-8327-4ca53c2e2a05-manager/api/v1/sth/sth-0/api/v1","log":{}}},"_id":"1145e550-6c5b-430d-b62b-f00bffc5f9f9","instanceURL":"instance/1145e550-6c5b-430d-b62b-f00bffc5f9f9"}
  8. Get the instance's output stream:
si inst output -

The request stays open; the instance awaits some input.
To deliver input data:

  9. Open another terminal, copy the instance's ID from the output above (step 7), and use it in the command:
si inst input 1145e550-6c5b-430d-b62b-f00bffc5f9f9

Hit enter.
  10. Type in some data, for example your nick "tomkeee", and hit enter. You should see Hello tomkeee in the first terminal.

have fun! 🤞🏼 and please let us know if it worked this time, thank you!

@tomkeee I checked this simple hello sample on our platform, and from my perspective it's working. Can you describe exactly how you are running it?

In your sample, don't pack your input file with the sequence. You can send this file directly to the input of the instance with our CLI:
si inst input <instance_id> <input_file>. As far as I know, we have some limit on the size of the sequence itself.
Second, you don't have to open the file with Python's open and then hook it into our framework. You can directly call
x = Stream.read_from(input). We expect the run() function to return a Stream, not a str, so if you want to see a str on the output, you can use the same structure as in the hello sample you mentioned:
return Stream.read_from(input).map(lambda s: f"Hello {s}!").

Please note that the map function takes all input data and maps every chunk with the lambda function here, so the output consists of str values. Your count function does not return anything, so it won't work. If you don't want to modify the stream data, you can use the .each method.
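
To illustrate the point about return values, here is a minimal plain-Python sketch. It uses the built-in map as a stand-in for Stream.map (an assumption made so the snippet runs without the framework), and count_and_pass is a hypothetical helper that is not part of the original code.

```python
# Made-up input chunks for illustration.
lines = ["a,1\n", "b,2\n", "c,3\n"]

result = 0


def count(line):
    # Without `global`, `result += 1` would raise UnboundLocalError.
    global result
    result += 1
    # No return statement, so every mapped value becomes None.


def count_and_pass(line):
    global result
    result += 1
    return line  # hand the chunk on unchanged


mapped_none = list(map(count, lines))          # a list of None values
mapped_ok = list(map(count_and_pass, lines))   # the original chunks
```

A mapper that only counts replaces every chunk with None, which is why a side-effect-only function fits a method like .each better than .map.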

Hi @a-tylenda,

I tried it today and it worked successfully 🎉 (I can see that there were some fixes on the platform, so maybe I just had bad timing and was starting the sequences during the maintenance 🤔 ).

Hi @tomkeee - that's great to hear. I'll pin this thread for reference. :)