capstone-coal / pycoal

Python toolkit for characterizing Coal and Open-pit surface mining impacts on American Lands

Home Page: http://capstone-coal.github.io/

Make Small Files Default for Examples

theoilie opened this issue

Running example_mineral.py took over 5 days to complete with 16GB of memory and a 3.4 GHz Intel i7 CPU that, according to Activity Monitor, ran at around 400% for most of the time. My output was the following:

(base) Theo-iMac:examples theo$ /Users/theo/anaconda3/bin/python3.7 example_mineral.py 
Saving /Users/theo/pycoal/examples/ang20150420t182050_corr_v1e_img_rgb.img
No overlap for target band 31 (0.687000 / -0.006435)
No overlap for target band 32 (0.664300 / -0.006565)
No overlap for target band 95 (1.268990 / -0.003520)
Saving /Users/theo/pycoal/examples/ang20150420t182050_corr_v1e_img_class.img
Traceback (most recent call last):
  File "example_mineral.py", line 167, in <module>
    sys.exit(main())
  File "example_mineral.py", line 142, in main
    raise e
  File "example_mineral.py", line 135, in main
    run_mineral(image, slib)
  File "example_mineral.py", line 87, in run_mineral
    mineral_classification.classify_image(input_filename, classified_filename)
  File "../pycoal/mineral.py", line 298, in classify_image
    self.algorithm(image_file_name, classified_file_name, **self.args)
  File "../pycoal/mineral.py", line 144, in SAM
    pycoal.mineral.MineralClassification.filter_classes(classified_file_name)
AttributeError: module 'pycoal' has no attribute 'mineral'

It encountered an error (likely due to some kind of path problem or incomplete installation), and I won't be able to test a fix without running the program for another 5 days. The error itself is not the real problem, though. The real problem is how long it takes to test a fix, and my proposed solution is to use much smaller examples by default. We could also have a flag that can optionally be switched on to use the full example (the currently used 18GB file).
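
A minimal sketch of what such a flag could look like, using only the standard-library argparse module. The file names and the option names (--input, --full) are hypothetical placeholders chosen for illustration; they are not taken from the pycoal examples.

```python
import argparse

# Hypothetical placeholders, not the paths used by the pycoal examples.
SMALL_INPUT = "small_scene_img.hdr"
FULL_INPUT = "full_scene_img.hdr"


def parse_args(argv=None):
    """Parse the example's command-line options (argv=None reads sys.argv)."""
    parser = argparse.ArgumentParser(description="Run a classification example.")
    parser.add_argument(
        "-i", "--input", default=None,
        help="explicit input header file; overrides --full",
    )
    parser.add_argument(
        "--full", action="store_true",
        help="use the full-size scene instead of the small default",
    )
    return parser.parse_args(argv)


def resolve_input(args):
    """Pick the input file: explicit path first, then --full, then the small default."""
    if args.input is not None:
        return args.input
    return FULL_INPUT if args.full else SMALL_INPUT
```

With this layout, running an example with no arguments would pick the small file, and passing --full would opt in to the large one, so the slow path is never taken by accident.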

A good small example (just for computing purposes, not necessarily for having meaningful data to analyze) is flight f100825t01p00r08 from 8/25/2010 UTC 18:55. It's about 1.11GB once unzipped. Direct download link: ftp://avoil:Gulf0il$pill@popo.jpl.nasa.gov/y10_data/f100825t01p00r08.tar.gz

I think that faster testing times will be crucial for any future contributions in the pixel processing parts of the program. Any thoughts on this? @lewismc

I COMPLETELY agree @Lactem thanks... +1 go ahead.

@aheermann is saying that on his laptop even the smaller example is taking about a full day. For even faster testing we could use this file (0.21 GB once unpacked):
ftp://avoil:Gulf0il$pill@popo.jpl.nasa.gov/y11_data/f110512t01p00r04.tar.gz

Files found from: https://aviris.jpl.nasa.gov/alt_locator/. Flight name for the linked data is: f180116t01p00r05.

Alternatively, flight f180201t01p00r05 from Hawaii is only 0.15GB when unpacked: ftp://avoil:Gulf0il$pill@popo.jpl.nasa.gov/y18_data/f180201t01p00r05.tar.gz

Whatever makes sense to run in a reasonable amount of time. For sure the classifications can take a long time. That justifies our current project :)

@lewismc I pushed a commit for this (link). It makes the input file configurable for the three main examples (mineral, mining, and environment). I want to make sure you think I'm on the right track with this before making similar changes to the rest of the examples.

Also, it seems that even this smaller file took 10 hours for me (iirc @aheermann said it took him 5 hours) when running mineral (mining and environment are very quick). This is the output from my log:

2019-09-26 13:56:31,118 Instantiated Mineral Classifier with following specification: -classifier function 'SAM'
2019-09-26 13:56:31,118 Starting generation of three-band RGB image from input file: 'f180201t01p00r05rdn_e/f180201t01p00r05rdn_e_sc01_ort_img.hdr' with following RGB values R: '680.0', G: '532.5', B: '472.5'
2019-09-26 13:58:48,046 Saving RGB image as 'f180201t01p00r05rdn_e_sc01_ort_img_rgb.hdr'
2019-09-26 13:58:48,062 Completed RGB image generation. Time elapsed: '0:02:16'
2019-09-26 13:58:48,066 Starting Mineral Classification for image 'f180201t01p00r05rdn_e/f180201t01p00r05rdn_e_sc01_ort_img.hdr', saving classified image to 'f180201t01p00r05rdn_e_sc01_ort_img_class.hdr'
2019-09-27 00:12:42,104 Completed Mineral Classification. Time elapsed: '10:13:54'

This is way too long.
We need to subset an image to something which can be completed in 30 minutes max. Even that is hellishly long for an example. Maybe we should make the unit test and the example the same small subset image? The algorithm seems to have slowed down quite a bit.
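
As a rough way to size such a subset, the sketch below computes a centered crop window from a measured runtime and a target runtime. It assumes classification time grows roughly in proportion to the number of pixels. That seems plausible for a per-pixel classifier but is not measured in this thread. The function name and the image dimensions in the usage note are hypothetical.

```python
import math


def subset_window(rows, cols, measured_hours, target_hours):
    """Return (row_start, row_stop, col_start, col_stop) for a centered crop.

    Assumption: runtime is proportional to pixel count, so keeping a
    fraction target/measured of the pixels should roughly hit the target.
    """
    if measured_hours <= 0 or target_hours <= 0:
        raise ValueError("runtimes must be positive")
    # Fraction of pixels to keep; never more than the whole image.
    keep = min(1.0, target_hours / measured_hours)
    # Scale both sides by sqrt(keep) so the aspect ratio is preserved.
    scale = math.sqrt(keep)
    sub_rows = max(1, int(rows * scale))
    sub_cols = max(1, int(cols * scale))
    # Center the window inside the original image.
    row_start = (rows - sub_rows) // 2
    col_start = (cols - sub_cols) // 2
    return row_start, row_start + sub_rows, col_start, col_start + sub_cols
```

For a hypothetical 1000 x 800 pixel scene that took 10 hours, subset_window(1000, 800, 10.0, 0.5) gives the bounds of a window holding about 5% of the pixels. Those bounds could then be used to slice the image array before saving the subset.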

@lewismc I pushed some code cleanup for the examples that I think will help before changing the images (https://github.com/capstone-coal/pycoal/tree/ISSUE-165). I can now work on making a subset of the image.

In other news, I ran the mineral example again and it only took 3 hours. My computer might have fallen asleep during the 10-hour run, so I installed something that should ensure it stays awake for future tests. I'll let you know later this week if I can get an example working in under 30 minutes.

Excellent, please create a pull request for ISSUE-165 branch. Good work.

An update on this as well please, @Lactem. Thank you.

For this one we have PR #171 awaiting review. Once that's merged we should be able to close this issue (it should actually close it automatically since I tagged it with the "closes" keyword). @lewismc