sentinel-hub / sentinel2-cloud-detector

Sentinel Hub Cloud Detector for Sentinel-2 images in Python

Sentinel2 input example

Fahdben opened this issue · comments

Hi, thanks for this Python package!
Would it be possible to add an example of how to prepare and load the Sentinel-2 bands (inputs) into the ndarray format (without using the WMS service)?

Hello,

Our sentinelhub Python package also supports obtaining the original ESA data stored on AWS. These examples show how to load the data into numpy arrays. A disadvantage of this approach is that you can only obtain band data for entire S-2 tiles, and different band images will have different spatial resolutions.
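Roughly, such a tile download could look like the sketch below. The tile name and date are just examples, and the exact argument names may differ between sentinelhub versions:

# Sketch: download raw band files of one S-2 tile from AWS with the
# sentinelhub package and read them into numpy arrays.
# Tile name, date and folder are illustrative; argument names may differ
# between sentinelhub versions.
from sentinelhub import AwsTileRequest

band_names = ['B01', 'B02', 'B04', 'B05', 'B08', 'B8A', 'B09', 'B10', 'B11', 'B12']

request = AwsTileRequest(
    tile='T32VNJ',          # example tile name
    time='2017-04-03',      # example sensing date
    aws_index=0,
    bands=band_names,
    data_folder='./aws_data')

data = request.get_data(save_data=True)  # list of numpy arrays, one per band
for name, band in zip(band_names, data):
    print(name, band.shape)  # bands come at their different native resolutions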

The other option is to create a free trial account for the Sentinel Hub services. That will provide you with an instance ID, which can then be used for running this package or for obtaining data with the sentinelhub package via the WMS/WCS services.
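The WMS route is roughly as follows (a sketch only; the layer name, instance ID, bounding box and date are placeholders, and API details vary between sentinelhub versions):

# Sketch: fetch data via Sentinel Hub WMS with the sentinelhub package.
# 'MY-LAYER' and the instance ID must come from your Sentinel Hub
# configuration; bbox and date are illustrative.
from sentinelhub import WmsRequest, BBox, CRS, MimeType

bbox = BBox(bbox=[14.0, 45.9, 14.2, 46.1], crs=CRS.WGS84)
request = WmsRequest(layer='MY-LAYER',
                     bbox=bbox,
                     time='2018-03-17',
                     width=512,
                     height=512,
                     image_format=MimeType.TIFF,
                     instance_id='my-instance-id')

images = request.get_data()  # list of numpy arrays, one per acquisition date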

Thanks for the reply!

I wanted to try with an L1C package downloaded from the Open Access Hub and created the following script:

from s2cloudless import S2PixelCloudDetector
import numpy as np
import rasterio
from rasterio.warp import reproject, Resampling

DATA_DIR = "/data/test"
PREFIX = "T32VNJ_20170403T104021"

# The ten bands that s2cloudless expects, in this exact order.
BAND_NAMES = ["B01", "B02", "B04", "B05", "B08", "B8A", "B09", "B10", "B11", "B12"]

# Use the 60m band B01 as the reference grid; all other bands are
# resampled (bilinear) onto this grid so every array has the same shape.
with rasterio.open(f"{DATA_DIR}/{PREFIX}_B01.jp2") as ref:
    ref_shape = (ref.height, ref.width)
    ref_transform = ref.transform
    ref_crs = ref.crs

band_arrays = []
for name in BAND_NAMES:
    with rasterio.open(f"{DATA_DIR}/{PREFIX}_{name}.jp2") as src:
        data = src.read(1)
        if data.shape != ref_shape:
            # Each band gets its own destination array; reusing a single
            # buffer for all bands would leave every band pointing at the
            # same data.
            resampled = np.empty(ref_shape, dtype=data.dtype)
            reproject(
                data, resampled,
                src_transform=src.transform,
                dst_transform=ref_transform,
                src_crs=src.crs,
                dst_crs=ref_crs,
                resampling=Resampling.bilinear)
            data = resampled
    # Convert digital numbers to reflectance.
    band_arrays.append(data / 10000.0)
    print(name, data.shape)

# Shape expected by s2cloudless: (n_images, height, width, n_bands).
bands = np.array([np.dstack(band_arrays)])
print(bands.shape)

cloud_detector = S2PixelCloudDetector(threshold=0.4, average_over=4, dilation_size=2)
cloud_probs = cloud_detector.get_cloud_probability_maps(bands)
mask = cloud_detector.get_cloud_masks(bands).astype(rasterio.uint8)

with rasterio.open(f"{DATA_DIR}/clouds.tif", "w", driver="GTiff", compress="lzw",
                   height=mask.shape[1], width=mask.shape[2], count=1,
                   dtype=rasterio.uint8, crs=ref_crs, transform=ref_transform) as dest:
    dest.write(mask)
with rasterio.open(f"{DATA_DIR}/clouds_prob.tif", "w", driver="GTiff", compress="lzw",
                   height=cloud_probs.shape[1], width=cloud_probs.shape[2], count=1,
                   dtype=cloud_probs.dtype, crs=ref_crs, transform=ref_transform) as dest:
    dest.write(cloud_probs)

I don't get the /10000.0 part, but it's needed to make it work.

Could you comment on why the values are scaled like this?

The values you get after dividing by 10000 are called reflectances and have a physical meaning. They are supposed to lie in the interval [0, 1] (which is, however, not always the case).

ESA multiplies them by 10000 and stores them as integers in order to decrease the file sizes.
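In code it is just a rescaling of the stored digital numbers, e.g. (a minimal sketch):

import numpy as np

dn = np.array([[0, 1500, 10000, 12000]], dtype=np.uint16)  # stored L1C digital numbers
reflectance = dn.astype(np.float32) / 10000.0               # nominal range [0, 1]
print(reflectance)  # [[0.   0.15 1.   1.2 ]] -- values above 1 can occur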

Thanks @AleksMat

Looks like my script is then working.

Do you have any recommendation on what resolution I should run this at? In the example I downsampled everything to 60m. I was not 100% impressed with the results I got. Should I try upsampling everything to 10m instead?

Are there other things I should pay attention to?

I read your blog posts and wanted to ask: if you later get better results from more training data, will you simply update the Python package so that, if I used this in an implementation, I would get the benefits of your work?

The reason I am looking at your work is that I have been using Sen2Cor and I have found it to produce too many false positives on snow surrounded by clouds. I was hoping to get a better cloud detection algorithm and remove those false positives.

One small comment about the 10000 as well.

How does that relate to the band data downloaded from L1C packages? That is uint8 with values between 0 and 255. Do I still divide them by 10000?

Do you have any recommendation on what resolution I should run this at? In the example I downsampled everything to 60m. I was not 100% impressed with the results I got. Should I try upsampling everything to 10m instead?

We have run tests at several different resolutions, and it looks like 160m resolution has the best performance/cost (CPU seconds required at inference time) ratio. But this might be problem specific. I would suggest running cloud detection over the same scene at different resolutions, down- or upscaling the cloud probability maps or masks to some fixed resolution (e.g. 10m), and comparing the results. This way you can get better insight into how resolution impacts the final result.
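A rough sketch of such a comparison, assuming you have already prepared reflectance stacks bands_60m and bands_160m of shape (1, height, width, 10) for the same scene (hypothetical variable names, built as in the script above):

from s2cloudless import S2PixelCloudDetector
from scipy.ndimage import zoom

# bands_60m and bands_160m are hypothetical stacks of the same scene at 60m and 160m.
detector_60 = S2PixelCloudDetector(threshold=0.4, average_over=4, dilation_size=2)
detector_160 = S2PixelCloudDetector(threshold=0.4, average_over=2, dilation_size=1)

mask_60 = detector_60.get_cloud_masks(bands_60m)[0]
mask_160 = detector_160.get_cloud_masks(bands_160m)[0]

# Upscale the coarser mask to the 60m grid (nearest neighbour) and compare.
factors = (mask_60.shape[0] / mask_160.shape[0], mask_60.shape[1] / mask_160.shape[1])
mask_160_up = zoom(mask_160, factors, order=0)

# zoom can be off by a pixel, so crop both masks to the common extent.
h = min(mask_60.shape[0], mask_160_up.shape[0])
w = min(mask_60.shape[1], mask_160_up.shape[1])
agreement = (mask_60[:h, :w] == mask_160_up[:h, :w]).mean()
print(f"pixel agreement between 60m and 160m masks: {agreement:.3f}")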

Are there other things I should pay attention to?

There are three knobs that you can tune which impact the final cloud mask:

  • threshold: by increasing/decreasing the threshold value, both the cloud detection efficiency and the clear-sky false-positive rate (i.e. land being classified as clouds) go down/up. We find the default value 0.4 to work best on average, but it might be different for your scenes (areas of interest).
  • average_over and dilation_size: these two parameters depend on the resolution. At 10m resolution the recommended values are 22 and 11, respectively, and at 160m resolution the recommended values are 2 and 1. These two parameters affect the size of the buffer region around the clouds; see the sketch below.
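For example, resolution-matched detector configurations could look like this (a sketch using the recommended values above):

from s2cloudless import S2PixelCloudDetector

# Buffer parameters scale with resolution; the threshold stays at the
# default 0.4 unless your scenes call for something else.
detector_10m = S2PixelCloudDetector(threshold=0.4, average_over=22, dilation_size=11)
detector_160m = S2PixelCloudDetector(threshold=0.4, average_over=2, dilation_size=1)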

I was not 100% impressed with the results I got.

Can you send us the details of your area of interest (bounding box and time interval)? Any feedback, whether positive or negative, is very valuable to us. It can help us identify scenes (or geographic areas) with specific land cover where the classification is more often wrong, which can be addressed in future trainings.

Sure. I will experiment a bit with the resolution, and when I am done playing I will post the results: IDs, regions, and visual inspections, possibly with a comparison against Sen2Cor for the same location.

Expect it to take a few days before I am done playing.

I read your blog posts and wanted to ask: if you later get better results from more training data, will you simply update the Python package so that, if I used this in an implementation, I would get the benefits of your work?

More training data would definitely help. However, as pointed out in the blog post, training samples are not easy to acquire. But as noted above, feedback from users might help us identify problematic regions, which can be addressed in future trainings.

Hello,

I have used the code from @pksorensen (pasted above) and I received a different cloud mask than when I was using WMS requests.

I have used the parameters threshold=0.4, average_over=4, dilation_size=2.
All bands have been resampled to a 60m pixel size using bilinear resampling.

My run produces spurious masked areas. Do you have any idea what is wrong?
How do you resample all bands to one size?

Bbox of my area is: [18.04779052734375, 53.875202055794965, 18.865441745827482, 54.084779979324]
Time = ('2018-03-17')

Code from the example (WMS request): [screenshot]

My code: [screenshot]