sentinelsat / sentinelsat

Search and download Copernicus Sentinel satellite images

Home Page: https://sentinelsat.readthedocs.io


Maximum number of concurrent flows achieved

aRay1010 opened this issue · comments

I am using a Python script to query and download S2 L2A products (tiles). I need to download multiple tiles, and I am using a path filter to download only selected bands.
I have created the path filters as below:

from sentinelsat import make_path_filter

path_filter1 = make_path_filter("*_B0[24]_10m.jp2")
path_filter2 = make_path_filter("*_B0[24]_20m.jp2")

api.download(id, directory_path=download_directory, nodefilter=path_filter1)
api.download(id, directory_path=download_directory, nodefilter=path_filter2)

When I download one tile it works, but when I request multiple tiles I get "An exception occured while creating a stream: Maximum number of 4 concurrent flows achieved by the user".
Also, I am downloading serially, and for a few requests I started getting 504 Gateway Time-out. I do not understand how I am reaching 4 concurrent requests. Is there a limit on requests within a given timeframe?
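
Not a fix, just a workaround sketch, under a couple of assumptions: that make_path_filter accepts the same fnmatch-style globs already used above (so the two band filters can be combined into a single pattern, halving the download calls per product), and that backing off and retrying is acceptable for the workflow. The retry settings and placeholder UUIDs are illustrative, not something sentinelsat prescribes.

import time

from sentinelsat import SentinelAPI, SentinelAPIError, make_path_filter

api = SentinelAPI("your_username", "your_password", "https://apihub.copernicus.eu/apihub/")

# One combined fnmatch pattern for B02/B04 at 10 m and 20 m, so each product
# needs only a single node-filtered download call instead of two.
band_filter = make_path_filter("*_B0[24]_[12]0m.jp2")


def download_with_retry(api, product_id, directory, nodefilter, retries=5, wait=60):
    """Retry a node-filtered download when the hub is temporarily overloaded."""
    for attempt in range(1, retries + 1):
        try:
            return api.download(product_id, directory_path=directory, nodefilter=nodefilter)
        except SentinelAPIError as err:
            # Intended to cover both the "Maximum number of ... concurrent flows"
            # response and the 504 Gateway Time-out; give up after the last attempt.
            if attempt == retries:
                raise
            print(f"Attempt {attempt} failed ({err}); retrying in {wait} s")
            time.sleep(wait)


for product_id in ["<uuid-1>", "<uuid-2>"]:  # placeholder product UUIDs
    download_with_retry(api, product_id, "downloads", band_filter)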

commented

A complete minimal working example would be great to reproduce and debug this efficiently.

I will try to upload a working example, although the original workflow runs on AWS Lambda.

I have the same problem here. Moreover, this appears to be a new problem which I did not have some time ago (end of last year). I checked both sentinelsat 1.1.1 (released this year) and 1.1.0 (which I most likely used before), and the problem is present with both versions.

Here is a minimal code example. On my device, it produced a ServerError: 504 Gateway Time-out on the first try. Sometimes the error is raised, sometimes not. It is very unpredictable, but not rare, so after a few tries it is usually raised. If somebody can reproduce the error with this code, please let me know.

Also note that this error is very rare, maybe even non-existent, if the nodefilter is removed. For instance, removing the nodefilter parameter in the code example resulted in a download without error on my device.

import datetime
from pathlib import Path

import sentinelsat

wkt_valencia = "POLYGON ((-0.72499 39.2492, -0.109267 39.2492, -0.109267 39.6981, -0.72499 39.6981, -0.72499 39.2492))"
date_now = datetime.datetime.utcnow()
date_with_delta_ago = date_now - datetime.timedelta(days=5)
date = (date_with_delta_ago, date_now)

client = sentinelsat.SentinelAPI(
    api_url="https://apihub.copernicus.eu/apihub/",
    user="your_username",      # replace with your credentials
    password="your_password",
    timeout=(None, None),
)
products = client.query(
    area=wkt_valencia,
    area_relation="Intersects",
    date=date,
    producttype="S2MSI2A",
)
product_id = list(products)[0]
tempdir = Path("test_downloads/sen2")
tempdir.mkdir(exist_ok=True, parents=True)
client.download(
    id=product_id,
    directory_path=tempdir.as_posix(),
    nodefilter=sentinelsat.make_path_filter("*B12_20M.jp2"),
)

True, the error is frequently observed when a nodefilter is used; when the nodefilter is removed, the download completes without error. Thanks @diddy449 for sharing the minimal working code.

commented

The detail that the issue only occurs when using a nodefilter could be important. @valgur knows the nodefilter logic best - does anything come to mind as to why that would cause a timeout or "maximum number of concurrent flows" issue? Do we make additional requests (I guess one for loading the manifest)? Otherwise, it could be that the server side started handling requests for specific nodes differently. I remember those being awfully slow at some point...?
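
One rough way to answer the "do we make additional requests" question is to instrument the HTTP session and count what a node-filtered download actually hits. This sketch assumes the client keeps its requests.Session in api.session (true in current sentinelsat releases, though not a documented guarantee); the product UUID is a placeholder.

import sentinelsat

api = sentinelsat.SentinelAPI("your_username", "your_password", "https://apihub.copernicus.eu/apihub/")

requested_urls = []


def record_url(response, *args, **kwargs):
    # requests fires response hooks for every completed request, so this
    # collects each URL touched by the download (manifest, nodes, ...).
    requested_urls.append(response.request.url)


api.session.hooks["response"].append(record_url)

api.download(
    "<product-uuid>",  # placeholder
    directory_path="test_downloads",
    nodefilter=sentinelsat.make_path_filter("*_B02_10m.jp2"),
)

print(f"{len(requested_urls)} requests made:")
for url in requested_urls:
    print(" ", url)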

One more piece of information that might be helpful: the error can be consistent, but it depends on the area. For instance, within a timeframe (say, of at least 30 minutes) there can always be an error for one area but never for another. Checking later that day, this can change.

Also, this might be a coincidence, but since @j08lue mentioned slowness being an issue at some point: last week the download was very slow (100-300 kB/s) and it had nothing to do with my network connection. Now it is back to over 1 MB/s. But the error is still present.

Hi everyone,
Does anyone have an alternative to using the nodefilter function? I am also getting the same issue: the download works without nodefilter, but with it I get the following error: ServerError: 504 Gateway Time-out.

However, after some time I was able to download specific bands using nodefilter; I don't know how or why it works. I am attaching my code; if anyone can figure it out, that would be helpful.

Thank you
Girish

from sentinelsat import SentinelAPI
from sentinelsat import geojson_to_wkt
from sentinelsat import make_path_filter
from datetime import date
import os

# set up API connection
api = SentinelAPI('username', 'password', 'https://scihub.copernicus.eu/dhus')

# define search parameters
tile_id = 'T43QEU'  # tile ID of interest (not used in the query below)
start_date = '20181210'
end_date = '20181220'
platform_name = 'Sentinel-2'
product_type = 'S2MSI2A'
band_ids = [3, 8]  # bands of interest (B03, B08), selected via the path filter below

# define the area of interest as a GeoJSON point and convert it to WKT
geojson = {
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {},
      "geometry": {
        "coordinates": [
          75.79552580417214,
          16.342775502579258
        ],
        "type": "Point"
      }
    }
  ]
}
wkt_polygon = geojson_to_wkt(geojson)

# search for products
products = api.query(wkt_polygon,
                     date=(start_date, end_date),
                     platformname=platform_name,
                     producttype=product_type)

for key, value in products.items():
    print(f"{key}: {value}")

print('\n',type(products),'\n\n')
# print('\n',products,'\n\n')
# download products
for product_id, product_dict in products.items():
    print('\n', type(product_dict), '\n\n')
    print('\n', product_id, '\n\n')
    product_title = product_dict['title']
    
    # create directory for product if it doesn't exist
    output_dir = f'/home/girish/Desktop/almatti_dam/Sentinel_data/Bands_products/{product_title}'
    if not os.path.exists(output_dir):
        os.makedirs(output_dir)


    band_filter = make_path_filter('*_B0[38]_10m.jp2')
    # api.download_all(product_id, output_dir, nodefilter=band_filter)
    product_info = api.get_product_odata(product_id)
    is_online = product_info['Online']
    # is_online = api.is_online(product_id)  # equivalent alternative

    if is_online:
        print(f'Product {product_id} is online. Starting download.')
        api.download(product_id, output_dir, nodefilter=band_filter)
    else:
        print(f'Product {product_id} is not online.')
        # trigger_offline_retrieval only accepts the product id, not a
        # directory or nodefilter; download once the product comes online.
        api.trigger_offline_retrieval(product_id)

Note: products shown as offline in the Copernicus Open Access Hub (generally the older images) take almost an hour after the first run before they can be downloaded, while online (current) images can be downloaded on the first run.
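
For the offline products, here is a minimal sketch of the usual trigger-and-poll pattern; the wait interval and the placeholder UUID are assumptions, and trigger_offline_retrieval only takes the product id.

import time

from sentinelsat import SentinelAPI, make_path_filter

api = SentinelAPI("your_username", "your_password", "https://scihub.copernicus.eu/dhus")
band_filter = make_path_filter("*_B0[38]_10m.jp2")
product_id = "<product-uuid>"  # placeholder

if not api.is_online(product_id):
    # Ask the long-term archive to restore the product; this call only takes the id.
    api.trigger_offline_retrieval(product_id)
    # Poll until the product comes online (often up to an hour, per the note above).
    while not api.is_online(product_id):
        time.sleep(600)  # check every 10 minutes

api.download(product_id, directory_path="downloads", nodefilter=band_filter)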

commented

The transient nature of this problem and its relation to online/offline status make this a really nasty issue.

Since the Copernicus Open Access Hub / SciHub API is going away soon, I would not waste a lot more time on this but instead try to solve your use case with the new Copernicus Data Space Ecosystem or other repositories of Sentinel-2 L2A scenes.

For example, with https://registry.opendata.aws/sentinel-2/ you just need an AWS account and S3 commands to pull the items you need directly from S3. You can still use Sentinelsat to identify the scenes you need, if you want, but it is better to use the STAC v1.0.0 endpoint. It is fast and reliable, and the few bucks you have to pay for the data transfer (requester pays) are probably well worth your time. 🤷
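
For reference, a sketch of that alternative path: query scenes with a STAC client and fetch only the bands you need from the AWS-hosted collection. The package (pystac-client), the Earth Search endpoint, the collection name and the asset keys ("red", "nir") are assumptions about that service rather than anything from sentinelsat; the point is taken from the GeoJSON above and the date range is a placeholder.

import requests
from pystac_client import Client

# Earth Search STAC API over the AWS-hosted Sentinel-2 L2A collection (assumed endpoint).
catalog = Client.open("https://earth-search.aws.element84.com/v1")

search = catalog.search(
    collections=["sentinel-2-l2a"],
    intersects={"type": "Point", "coordinates": [75.7955258, 16.3427755]},
    datetime="2023-04-01/2023-04-15",  # placeholder date range
    max_items=1,
)

for item in search.items():
    for asset_key in ("red", "nir"):  # asset keys may differ per catalog
        href = item.assets[asset_key].href
        filename = f"{item.id}_{asset_key}.tif"
        with requests.get(href, stream=True, timeout=120) as resp:
            resp.raise_for_status()
            with open(filename, "wb") as out:
                for chunk in resp.iter_content(chunk_size=1 << 20):
                    out.write(chunk)
        print("downloaded", filename)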