How to download data?

LighthouseInTheSea opened this issue 3 years ago

When I call the following code:

```python
doc_download = doccano_client.get_doc_download(2, 'json')
print(doc_download.text)
```

the printed output is doccano's HTML page instead of the exported data:

```html
<!doctype html>
<title>doccano - doccano</title>
Loading...
```

Operating System: Windows
Python Version: 3.10.2
Package Version: doccano(1.5.5) doccano-client(1.0.3)
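The output above is not an export at all: it is the HTML shell of doccano's web front end (`<title>doccano - doccano</title>`, `Loading...`), which suggests the server did not treat the request as a download. A minimal sketch of a guard that flags such a response before it is saved or parsed; the name `looks_like_frontend_page` is hypothetical, and the check is only a heuristic based on the `Content-Type` header and the first bytes of the body.

```python
def looks_like_frontend_page(content_type: str, body: bytes) -> bool:
    """Heuristically detect an HTML page returned in place of an export file.

    Hypothetical helper: it assumes a real export is never served as text/html
    and that an HTML page starts with a doctype or an <html> tag.
    """
    if "text/html" in (content_type or "").lower():
        return True
    # Ignore leading whitespace and compare the start of the body case-insensitively.
    head = body.lstrip()[:15].lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")
```

With a `requests` response such as `doc_download`, it can be called as `looks_like_frontend_page(doc_download.headers.get("Content-Type", ""), doc_download.content)`.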

Hiroki Nakayama commented 2 years ago

fixed #59

海中灯塔 · Answer 1 · Fri Mar 11 2022 11:21:34 GMT+0800 (China Standard Time)

```python
return self.get_file(
    "v1/projects/{project_id}/docs/download".format(project_id=project_id),
    params={"q": file_format, "onlyApproved": str(only_approved).lower()},
    headers=headers,
)
```

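The client requests `v1/projects/{project_id}/docs/download`, but the download address on my server is:

http://xxx.xx.xxx.xx:xxxx/v1/projects/13/download

Is the client requesting the wrong address?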
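The client method quoted above requests `v1/projects/{project_id}/docs/download` with a `q` format parameter, whereas the address in this reply (and the export code later in this thread) uses `v1/projects/{project_id}/download`, with a `taskId` query parameter when fetching the finished file. A small sketch that builds both URLs with the standard library so they can be compared side by side; `BASE_URL` is a placeholder, the function names are hypothetical, and which path a given server accepts depends on its doccano version (an assumption, not verified here).

```python
from urllib.parse import urlencode, urljoin

BASE_URL = "http://localhost:8000/"  # placeholder: replace with your doccano server


def old_client_url(project_id: int, file_format: str, only_approved: bool) -> str:
    # Path and query string built by the client method quoted above.
    query = urlencode({"q": file_format, "onlyApproved": str(only_approved).lower()})
    return urljoin(BASE_URL, f"v1/projects/{project_id}/docs/download") + "?" + query


def task_download_url(project_id: int, task_id: str) -> str:
    # Path used by the export code later in this thread to fetch a finished task.
    query = urlencode({"taskId": task_id})
    return urljoin(BASE_URL, f"v1/projects/{project_id}/download") + "?" + query
```

Keep the trailing slash on the base URL: `urljoin` replaces the last path segment of a base that does not end in `/`.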

Pedro Queirós · Answer 2 · Mon Apr 11 2022 17:18:13 GMT+0800 (China Standard Time)

For anyone struggling with the same problem: I copied some code from another issue and added the step that saves the exported zip file. It's all a bit obscure, so I'm reposting it here:

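```python
import time

# Assumes `doccano_client` (an authenticated doccano-client instance) and
# `doccano_client_url` (the server's base URL, ending in '/') already exist.
def export_project(project_id, save_path):
    # Start an export task; the server replies with the task's ID.
    result = doccano_client.post(f'{doccano_client_url}v1/projects/{project_id}/download',
                                 json={'exportApproved': False, 'format': 'JSONL'})
    task_id = result['task_id']
    # Poll once a second until the export task has finished.
    while True:
        result = doccano_client.get(f'{doccano_client_url}v1/tasks/status/{task_id}')
        if result['ready']:
            break
        time.sleep(1)
    # Fetch the finished export (a zip archive) and stream it to disk.
    result = doccano_client.get_file(f'{doccano_client_url}v1/projects/{project_id}/download?taskId={task_id}')
    with open(save_path, 'wb') as f:
        for chunk in result.iter_content(chunk_size=8192):
            f.write(chunk)
```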
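Pedro's function follows a three-step protocol: start an export task with a POST, poll `v1/tasks/status/{task_id}` until `ready` is true, then download the file with the `taskId` query parameter. A minimal sketch of the same flow that takes the HTTP session as an argument instead of relying on globals, and gives up after a timeout instead of polling forever. Assumptions: `session` is an already-authenticated `requests.Session`-like object (logging in is not covered here), `base_url` ends in `/`, and the names `export_project_zip`, `poll_interval` and `timeout` are hypothetical. Unlike `doccano_client.get`/`post` above, whose results the original code indexes directly as dicts, a plain session returns response objects, so the sketch calls `.json()` explicitly.

```python
import time


def export_project_zip(session, base_url: str, project_id: int, save_path: str,
                       export_format: str = "JSONL", export_approved: bool = False,
                       poll_interval: float = 1.0, timeout: float = 300.0) -> str:
    """Start an export task, wait for it, and stream the archive to `save_path`.

    Hypothetical helper; endpoints and payload keys are those used in the thread.
    Returns the export task ID.
    """
    # 1. Ask the server to start an export task.
    resp = session.post(f"{base_url}v1/projects/{project_id}/download",
                        json={"exportApproved": export_approved, "format": export_format})
    resp.raise_for_status()
    task_id = resp.json()["task_id"]

    # 2. Poll the task status until it is ready, giving up after `timeout` seconds.
    deadline = time.monotonic() + timeout
    while True:
        status = session.get(f"{base_url}v1/tasks/status/{task_id}")
        status.raise_for_status()
        if status.json()["ready"]:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"export task {task_id} not ready after {timeout} s")
        time.sleep(poll_interval)

    # 3. Download the finished file in chunks so large exports are not held in memory.
    file_resp = session.get(f"{base_url}v1/projects/{project_id}/download",
                            params={"taskId": task_id}, stream=True)
    file_resp.raise_for_status()
    with open(save_path, "wb") as f:
        for chunk in file_resp.iter_content(chunk_size=8192):
            f.write(chunk)
    return task_id
```

For example, `export_project_zip(session, "http://localhost:8000/", 13, "project13.zip")` with a logged-in `requests.Session()`. Passing `taskId` through `params=` lets `requests` encode the query string instead of formatting it by hand.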

William · Answer 3 · Wed Apr 13 2022 18:56:30 GMT+0800 (China Standard Time)

@PedroMTQ This is great, thanks.

David Engelmann · Answer 4 · Thu Apr 28 2022 22:42:46 GMT+0800 (China Standard Time)

I slightly modified the end of @PedroMTQ's function above so that it doesn't write to disk: it returns the results as a list of parsed JSON objects, one per line of the exported JSONL file. I also use the `baseurl` attribute of the `doccano_client` object instead of the `doccano_client_url` variable.

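```python
import json
from io import BytesIO
from zipfile import ZipFile

...
# Download the finished export and unzip it in memory instead of saving it.
result = doccano_client.get_file(f'{doccano_client.baseurl}v1/projects/{project_id}/download?taskId={task_id}')
file_like_object = BytesIO(result.content)
zipfile_obj = ZipFile(file_like_object)
# Read the first file in the archive and parse each JSONL line into a dict.
data = zipfile_obj.open(zipfile_obj.namelist()[0]).read().splitlines()
data = [json.loads(line) for line in data]
return data
```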
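David's version parses only `namelist()[0]`. A minimal sketch of the same in-memory parsing that reads every file in the archive, in case an export contains more than one (an assumption; the thread only shows the first member being read), and skips blank lines, which `json.loads` would reject. The name `read_jsonl_export` is hypothetical, and the members are assumed to be UTF-8 JSON Lines files.

```python
import io
import json
import zipfile


def read_jsonl_export(archive: bytes) -> list:
    """Parse every file in a zipped JSONL export into a list of dicts."""
    records = []
    with zipfile.ZipFile(io.BytesIO(archive)) as zf:
        for info in zf.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            # Decode the member as UTF-8 text and stream it line by line.
            with io.TextIOWrapper(zf.open(info), encoding="utf-8") as member:
                for line in member:
                    if line.strip():
                        records.append(json.loads(line))
    return records
```

It can be fed the downloaded bytes directly, e.g. `read_jsonl_export(result.content)` with the `result` from `doccano_client.get_file(...)`.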

Cheers!

Peter McCabe · Answer 5 · Thu Aug 18 2022 23:11:20 GMT+0800 (China Standard Time)

Is there any way to download the documents together with their metadata?