Jammy2211 / PyAutoGalaxy

PyAutoGalaxy: Open-Source Multiwavelength Galaxy Structure & Morphology

Home Page: https://pyautogalaxy.readthedocs.io/

Error when running search.fit

Conor-Larison opened this issue · comments

Hello, I am getting this error when running search.fit() in the introduction Python notebook in the autogalaxy workspace. It looks like the error is in autofit. For context, this was run in a fresh conda environment using Python 3.10.

2023-10-18 16:06:24,768 - autogalaxy.analysis.analysis - INFO - PRELOADS - Setting up preloads, may take a few minutes for fits using an inversion.
2023-10-18 16:06:25,454 - introduction - INFO - The output path of this fit is /Users/conor/autogalaxy_workspace/output/introduction/8b2bc1782c7e4c41686b415f10f06a12
2023-10-18 16:06:25,454 - introduction - INFO - Outputting pre-fit files (e.g. model.info, visualization).
2023-10-18 16:06:25,956 - introduction - INFO - Starting new Nautilus non-linear search (no previous samples found).
2023-10-18 16:06:25,957 - introduction - INFO - number of cores == 1
2023-10-18 16:06:25,957 - introduction - INFO - Creating multiprocessing Pool of size 1...
2023-10-18 16:06:25,958 - autofit.non_linear.parallel.sneaky - INFO - ... using multiprocessing
#########################

Exploration Phase

#########################

Adding Bound 1: done
Ellipsoids: 0
Neural Networks: 0
Filling Bound 1: 0%| | 0/400 [00:00<?, ?it/s]

RemoteTraceback Traceback (most recent call last)
RemoteTraceback:
"""
Traceback (most recent call last):
File "/Users/conor/opt/anaconda3/envs/autogalaxy/lib/python3.10/multiprocessing/pool.py", line 125, in worker
result = (True, func(*args, **kwds))
File "/Users/conor/opt/anaconda3/envs/autogalaxy/lib/python3.10/multiprocessing/pool.py", line 48, in mapstar
return list(map(*args))
File "/Users/conor/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/parallel/sneaky.py", line 384, in fitness_cache
return FunctionCache.fitness(x, *FunctionCache.fitness_args,
AttributeError: type object 'FunctionCache' has no attribute 'fitness'
"""

The above exception was the direct cause of the following exception:

AttributeError Traceback (most recent call last)
Cell In[10], line 1
----> 1 result = search.fit(model=model, analysis=analysis)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/search/abstract_search.py:527, in NonLinearSearch.fit(self, model, analysis, info, bypass_nuclear_if_on)
520 self.pre_fit_output(
521 analysis=analysis,
522 model=model,
523 info=info,
524 )
526 if not self.paths.is_complete:
--> 527 result = self.start_resume_fit(
528 analysis=analysis,
529 model=model,
530 )
531 else:
532 result = self.result_via_completed_fit(
533 analysis=analysis,
534 model=model,
535 )

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/search/abstract_search.py:643, in NonLinearSearch.start_resume_fit(self, analysis, model)
640 self.timer.start()
642 model.freeze()
--> 643 self._fit(
644 model=model,
645 analysis=analysis,
646 )
647 samples = self.perform_update(
648 model=model, analysis=analysis, during_analysis=False
649 )
651 result = analysis.make_result(
652 samples=samples,
653 )

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/search/nest/nautilus/search.py:137, in Nautilus._fit(self, model, analysis)
135 else:
136 if not self.using_mpi:
--> 137 self.fit_multiprocessing(fitness=fitness, model=model, analysis=analysis)
138 else:
139 self.fit_mpi(fitness=fitness, model=model, analysis=analysis)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/search/nest/nautilus/search.py:244, in Nautilus.fit_multiprocessing(self, fitness, model, analysis)
241 self.output_sampler_results(sampler=sampler)
242 self.perform_update(model=model, analysis=analysis, during_analysis=True)
--> 244 sampler.run(
245 **self.config_dict_run,
246 )
248 self.output_sampler_results(sampler=sampler)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/nautilus/sampler.py:418, in Sampler.run(self, f_live, n_shell, n_eff, discard_exploration, verbose)
415 while (self.live_evidence_fraction() > f_live or
416 len(self.bounds) == 0):
417 self.add_bound(verbose=verbose)
--> 418 self.fill_bound(verbose=verbose)
419 if self.filepath is not None:
420 self.write(self.filepath, overwrite=True)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/nautilus/sampler.py:961, in Sampler.fill_bound(self, verbose)
959 points, n_bound, idx_t = self.sample_shell(-1, shell_t)
960 assert len(points) + len(idx_t) == n_bound
--> 961 log_l, blobs = self.evaluate_likelihood(points)
962 self.points[-1].append(points)
963 self.log_l[-1].append(log_l)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/nautilus/sampler.py:760, in Sampler.evaluate_likelihood(self, points)
758 result = list(zip(*result))
759 elif self.pool_l is not None:
--> 760 result = list(self.pool_l.map(self.likelihood, args))
761 else:
762 result = list(map(self.likelihood, args))

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/site-packages/autofit/non_linear/parallel/sneaky.py:499, in SneakierPool.map(self, function, iterable)
483 def map(
484 self, function: Callable,
485 iterable: Iterable
486 ):
487 """
488 Map a function over an iterable using the map method
489 of the initialized pool.
(...)
497
498 """
--> 499 return self.pool.map(function, iterable)

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/multiprocessing/pool.py:367, in Pool.map(self, func, iterable, chunksize)
362 def map(self, func, iterable, chunksize=None):
363 '''
364 Apply func to each element in iterable, collecting the results
365 in a list that is returned.
366 '''
--> 367 return self._map_async(func, iterable, mapstar, chunksize).get()

File ~/opt/anaconda3/envs/autogalaxy/lib/python3.10/multiprocessing/pool.py:774, in ApplyResult.get(self, timeout)
772 return self._value
773 else:
--> 774 raise self._value

AttributeError: type object 'FunctionCache' has no attribute 'fitness'

pip install threadpoolctl==3.1.0

This is a common issue; I'll do a release to fix this requirement issue properly! But that pip command will sort it.

Hm, I am still getting the same error after running this pip line (restarted the kernel, used pip in the notebook itself, etc.).

OK yeah wrong error message lol.

The issue is that parallelization isn't supported in your Jupyter notebook. Are you on Windows?

For now, disabling parallel runs and using dynesty will fix it. So replace the Nautilus code with this:

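from os import path

import autofit as af

# Single-core dynesty search, used as a workaround while parallel runs fail.
search = af.DynestyStatic(
    path_prefix=path.join("searches"),
    name="DynestyStatic",
    nlive=50,
    sample="rwalk",
    force_x1_cpu=True,
    number_of_cores=1,
)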

Obviously replace things like name and path_prefix with what you're using.

I'll send full instructions tomorrow (it's late here in the UK) on how to fix the parallelization so you can get the speed up.

I'm on macOS. Perhaps Google Colab is the way to go for this.

I will try this fix and post an update tonight, really appreciate all the support. Cheers!

OK I think I know how to fix it but will post in the morning once I'm on my laptop!

Took about 23 minutes on my local machine but the above solution worked on the introduction material! Thank you so much again for the help, will check back in tomorrow morning.

Ok, it'll require a few back-and-forth experiments, as it's to do with understanding how new macOS parallelizes things, so let me know if you've got an hour free.

Basically, Jupyter notebook + parallelization often = crash.

First, can you run the notebook with number_of_cores=1, so I can understand whether the error occurs even with 1 core (it still calls Python multiprocessing in that case):

search = af.Nautilus(
    path_prefix=path.join("imaging", "modeling"),
    name="start_here",
    unique_tag=dataset_name,
    n_live=150,
    number_of_cores=1,
    iterations_per_update=10000,
)
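
Since a single-core run still goes through Python multiprocessing, it may help to report how the interpreter starts worker processes on the machine in question. The snippet below is only a diagnostic sketch using the standard library; it is not part of PyAutoGalaxy or autofit, the function name is made up for illustration, and the Jupyter check is a heuristic. On macOS with Python 3.8 or later the default start method is "spawn", whereas Linux has traditionally defaulted to "fork".

```python
import multiprocessing
import platform
import sys


def describe_multiprocessing_environment() -> dict:
    """Collect details worth quoting in a bug report about a failing Pool.

    Hypothetical helper for illustration; not an autofit function.
    """
    return {
        # "Darwin" on macOS, "Linux" on Linux, "Windows" on Windows.
        "platform": platform.system(),
        "python": platform.python_version(),
        # How worker processes are created: "spawn", "fork" or "forkserver".
        "start_method": multiprocessing.get_start_method(),
        # Heuristic only: ipykernel is normally imported inside a Jupyter kernel.
        "in_jupyter_kernel": "ipykernel" in sys.modules,
    }


info = describe_multiprocessing_environment()
for key, value in info.items():
    print(f"{key}: {value}")
```

Pasting this output alongside the traceback makes it easier to tell whether a failure is tied to the start method or to the notebook environment.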

Next, can you run the Python script version to see if that fixes it:

https://github.com/Jammy2211/autogalaxy_workspace/blob/release/scripts/imaging/modeling/start_here.py

Run it on the command line as python start_here.py

Finally, this script will hopefully fix it if the others don't:

https://github.com/Jammy2211/autogalaxy_workspace/blob/main/scripts/imaging/modeling/customize/parallel_bug_fix.py

If it's still broken, let me know and we can try some other things.

I'm not sure whether you can get multiprocessing to run in Jupyter Notebook cells, I will ask around.

"Took about 23 minutes on my local machine"

Happy to offer some support on run times. The tutorials are currently set up for things like nested sampling, which is slow but fits complex models very robustly (and provides things like the evidence). So the run times are going to be a lot longer than something like GALFIT, but the fits are a lot more robust.

Nautilus should give you a ~3x speed-up over dynesty, and with parallelization you should get another 3x... so hopefully you can break the 3-minute barrier lol.
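
As a rough check of that estimate, the arithmetic can be written out. This assumes the two factors of roughly 3 simply multiply and uses the 23-minute dynesty run reported earlier in the thread as the baseline; both factors are ballpark figures from this thread, not measurements.

```python
dynesty_minutes = 23.0    # run time reported earlier in the thread for the dynesty fit
nautilus_speedup = 3.0    # rough factor quoted in the thread, not a measurement
parallel_speedup = 3.0    # rough factor quoted in the thread, not a measurement

# Assumption: the two speed-ups are independent and multiply.
estimated_minutes = dynesty_minutes / (nautilus_speedup * parallel_speedup)
print(f"Estimated run time: {estimated_minutes:.1f} minutes")  # 23 / 9 is about 2.6
```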

Hey James, unfortunately this is still not working. start_here.py did not work, for reasons I believe the bug-fix script was meant to solve. The parallel_bug_fix.py script did seem to fix the error from the other script, but now I am getting the same error that I was getting in the Jupyter notebook. Should I email you to get added to the Slack?

2023-10-19 08:48:48,239 - autogalaxy.analysis.analysis - INFO - PRELOADS - Setting up preloads, may take a few minutes for fits using an inversion.
2023-10-19 08:48:48,251 - light[bulge_disk] - INFO - The output path of this fit is /Users/conor/autogalaxy_workspace/scripts/imaging/modeling/customize/output/imaging/modeling/simple/light[bulge_disk]/f256ec0321c48a10972066d9e08975fd
2023-10-19 08:48:48,252 - light[bulge_disk] - INFO - Outputting pre-fit files (e.g. model.info, visualization).
2023-10-19 08:48:48,645 - light[bulge_disk] - INFO - Starting new Nautilus non-linear search (no previous samples found).
2023-10-19 08:48:48,645 - light[bulge_disk] - INFO - number of cores == 4
2023-10-19 08:48:48,645 - light[bulge_disk] - INFO - Creating SneakierPool...
2023-10-19 08:48:48,645 - autofit.non_linear.parallel.sneaky - INFO - ... using multiprocessing
#########################

Exploration Phase

#########################

.
.
.

AttributeError: type object 'FunctionCache' has no attribute 'fitness'

Yeah, let's go to Slack.

Ok, pretty sure it was a bug in my source code implementation of Nautilus in autofit, dohhhhh.

Will release a new autogalaxy with a fix soon.
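
The thread does not say what the bug actually was, but the message itself only means that a class attribute named fitness had not been assigned in the process that tried to call it. The sketch below reproduces the same message with a stand-in class; the class body and the fitness function here are invented for illustration and are not autofit's implementation.

```python
class FunctionCache:
    """Hypothetical stand-in: a class used as a process-wide holder for a function."""


def fitness_cache(x):
    # Looks the function up on the class, like the call shown in the traceback.
    return FunctionCache.fitness(x)


try:
    fitness_cache(1.0)
except AttributeError as error:
    message = str(error)
    print(message)  # type object 'FunctionCache' has no attribute 'fitness'

# Once the attribute is assigned in the same process, the lookup succeeds.
FunctionCache.fitness = staticmethod(lambda x: -0.5 * x**2)
print(fitness_cache(2.0))  # -2.0
```

If the assignment happens in one process and the lookup in another, the second process sees the class without the attribute, which is one way such an error can show up inside a worker pool.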

Ok, I'll shoot you an email to join the Slack anyway.