Nothing happens after first round of calculations in all-to-one mode
oskeng opened this issue · comments
When running in all-to-one mode, I get "Time taken to solve linear system" for all workers after the first round of calculations, but then nothing happens. The processes are, however, still running. See screenshot.
Pairwise works fine.
.ini as follows. Any help is very much appreciated.
[Circuitscape Mode]
data_type = raster
scenario = all-to-one
[Version]
version = 5.11.2
[Habitat raster or graph]
habitat_file = /home/oskeng/Dropbox/Jobb/MIUN/Projekt/Konnektivitet_Norrbotten/Analys/Circuitscape/Input/test/calc_resistance_test_20m.asc
habitat_map_is_resistances = true
[Connection Scheme for raster habitat data]
connect_four_neighbors_only = false
connect_using_avg_resistances = false
[Short circuit regions (aka polygons)]
use_polygons = false
polygon_file = False
[Options for advanced mode]
ground_file_is_resistances = false
source_file = (Browse for a current source file)
remove_src_or_gnd = keepall
ground_file = (Browse for a ground point file)
use_unit_currents = false
use_direct_grounds = false
[Mask file]
use_mask = false
mask_file = None
[Options for one-to-all and all-to-one modes]
use_variable_source_strengths = false
variable_source_file = None
[Options for pairwise and one-to-all and all-to-one modes]
included_pairs_file = (Browse for a file with pairs to include or exclude)
use_included_pairs = false
point_file = /home/oskeng/Dropbox/Jobb/MIUN/Projekt/Konnektivitet_Norrbotten/Analys/Circuitscape/Input/test/calc_focal_points_ras_test_20m.asc
[Calculation options]
solver = cg+amg
print_timings = True
parallelize = True
max_parallel = 14
[Output options]
write_cum_cur_map_only = true
log_transform_maps = false
output_file = /home/oskeng/Dropbox/Jobb/MIUN/Projekt/Konnektivitet_Norrbotten/Analys/Circuitscape/Output/test/all-to-one/out_test
write_max_cur_maps = false
write_volt_maps = false
set_null_currents_to_nodata = false
set_null_voltages_to_nodata = false
compress_grids = false
write_cur_maps = false
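When comparing a working pairwise run against a stalling all-to-one run, it can help to compare the handful of settings that differ between the two .ini files. The following is a minimal sketch using Python's standard `configparser`; it is not how Circuitscape itself reads the file, and the helper name `read_ini_settings` and the abbreviated sample text are assumptions made for illustration.

```python
import configparser


def read_ini_settings(text):
    """Flatten an .ini string into one {option: value} dict.

    Hypothetical helper for inspection only. Section names are dropped,
    so a repeated option name would be overwritten by the later one.
    """
    # interpolation=None keeps values such as paths containing '%' literal.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    settings = {}
    for section in parser.sections():
        for key, value in parser.items(section):
            settings[key] = value
    return settings


# Abbreviated copy of the .ini above (only a few sections kept).
SAMPLE_INI = """
[Circuitscape Mode]
data_type = raster
scenario = all-to-one
[Calculation options]
solver = cg+amg
print_timings = True
parallelize = True
max_parallel = 14
[Output options]
write_cum_cur_map_only = true
write_cur_maps = false
"""

settings = read_ini_settings(SAMPLE_INI)
for name in ("scenario", "solver", "parallelize", "max_parallel"):
    print(f"{name} = {settings[name]}")
```

All values come back as strings, so `max_parallel` would need an explicit `int(...)` before any numeric comparison.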
Could you send me the data files?
https://www.dropbox.com/s/bm4ixw9izrrcb5f/circuitscape_input.zip?dl=0
Edit: I used the "_20m" input
Many thanks!
Any feedback @ranjanan?
I'm in a bit of a hurry to decide on the simulation method and to prepare the data and .ini for a full-scale run on the supercomputer. Any help is greatly appreciated.
Well, I also tried full-scale simulations on the high-memory cluster but got exactly the same behaviour:
From worker 7: [ Info: 2022-11-20 18:09:55 : Solving point 5 of 500
From worker 7: [ Info: 2022-11-20 18:10:17 : Solver used: AMG accelerated by CG
From worker 5: [ Info: 2022-11-20 18:10:37 : Solving point 3 of 500
From worker 2: [ Info: 2022-11-20 18:10:43 : Solving point 2 of 500
From worker 4: [ Info: 2022-11-20 18:10:44 : Solving point 1 of 500
From worker 6: [ Info: 2022-11-20 18:10:56 : Solving point 4 of 500
From worker 5: [ Info: 2022-11-20 18:10:59 : Solver used: AMG accelerated by CG
From worker 8: [ Info: 2022-11-20 18:11:02 : Solving point 6 of 500
From worker 3: [ Info: 2022-11-20 18:11:08 : Solving point 7 of 500
From worker 4: [ Info: 2022-11-20 18:11:08 : Solver used: AMG accelerated by CG
From worker 2: [ Info: 2022-11-20 18:11:10 : Solver used: AMG accelerated by CG
From worker 6: [ Info: 2022-11-20 18:11:18 : Solver used: AMG accelerated by CG
From worker 8: [ Info: 2022-11-20 18:11:25 : Solver used: AMG accelerated by CG
From worker 3: [ Info: 2022-11-20 18:11:29 : Solver used: AMG accelerated by CG
From worker 7: [ Info: 2022-11-20 18:21:12 : Time taken to construct preconditioner = 453.981257612 seconds
From worker 5: [ Info: 2022-11-20 18:21:36 : Time taken to construct preconditioner = 442.207463115 seconds
From worker 4: [ Info: 2022-11-20 18:21:47 : Time taken to construct preconditioner = 437.223975284 seconds
From worker 3: [ Info: 2022-11-20 18:21:49 : Time taken to construct preconditioner = 439.184317842 seconds
From worker 6: [ Info: 2022-11-20 18:21:51 : Time taken to construct preconditioner = 438.728753843 seconds
From worker 2: [ Info: 2022-11-20 18:22:47 : Time taken to construct preconditioner = 490.79701734 seconds
From worker 8: [ Info: 2022-11-20 18:22:49 : Time taken to construct preconditioner = 486.515464396 seconds
From worker 4: [ Info: 2022-11-20 19:33:37 : Time taken to solve linear system = 4295.120061158 seconds
From worker 6: [ Info: 2022-11-20 19:33:53 : Time taken to solve linear system = 4312.302845471 seconds
From worker 5: [ Info: 2022-11-20 19:35:10 : Time taken to solve linear system = 4395.431408273 seconds
From worker 7: [ Info: 2022-11-20 19:35:48 : Time taken to solve linear system = 4457.293068107 seconds
From worker 2: [ Info: 2022-11-20 19:35:53 : Time taken to solve linear system = 4377.015238594 seconds
From worker 3: [ Info: 2022-11-20 19:36:01 : Time taken to solve linear system = 4438.088842233 seconds
From worker 8: [ Info: 2022-11-20 19:37:00 : Time taken to solve linear system = 4441.687107102 seconds
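To confirm from a long log which stage each worker reached last, the lines can be summarised per worker. The following is a rough sketch that assumes the `From worker N: [ Info: <timestamp> : <message>` layout shown in the output above; the function names `last_message_per_worker` and `solve_times` are hypothetical and not part of Circuitscape.

```python
import re

# Matches lines such as:
#   From worker 4: [ Info: 2022-11-20 19:33:37 : Time taken to solve ...
LOG_LINE = re.compile(
    r"From worker (?P<worker>\d+):\s+\[ Info: "
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) : (?P<msg>.*)"
)


def last_message_per_worker(lines):
    """Return {worker_id: (timestamp, message)} for each worker's final line."""
    last = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if match:
            worker = int(match.group("worker"))
            last[worker] = (match.group("ts"), match.group("msg").strip())
    return last


def solve_times(lines):
    """Return {worker_id: seconds} from 'Time taken to solve linear system' lines."""
    prefix = "Time taken to solve linear system"
    times = {}
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("msg").startswith(prefix):
            seconds = match.group("msg").split("=")[1].split()[0]
            times[int(match.group("worker"))] = float(seconds)
    return times


# Three lines copied from the log above.
SAMPLE_LOG = [
    "From worker 4: [ Info: 2022-11-20 18:10:44 : Solving point 1 of 500",
    "From worker 4: [ Info: 2022-11-20 19:33:37 : Time taken to solve linear system = 4295.120061158 seconds",
    "From worker 6: [ Info: 2022-11-20 19:33:53 : Time taken to solve linear system = 4312.302845471 seconds",
]

for worker, (ts, msg) in sorted(last_message_per_worker(SAMPLE_LOG).items()):
    print(worker, ts, msg)
```

If every worker's last message is the solve time and no later "Solving point" line follows, the run stalled after the first round, which is the behaviour described here.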
Then nothing happened until I canceled the job the day after:
[ Info: 2022-11-20 17:35:55 : Precision used: Double
[ Info: 2022-11-20 17:35:55 : Starting up Circuitscape to use 7 processes in parallel
[ Info: 2022-11-20 17:36:17 : Reading maps
[ Info: 2022-11-20 17:43:24 : Resistance/Conductance map has 277679571 nodes
[ Info: 2022-11-20 18:03:15 : There are 277679571 points and 1 connected components
slurmstepd: error: *** JOB 21086745 ON b-cn0549 CANCELLED AT 2022-11-21T14:18:07 ***
But even though "nothing happened" it used a steady 1.33 TiB of memory for ~20 hrs:
Absolutely lost here. Any idea, @ranjanan?
#373 does indeed fix this!