SpiNNakerManchester / MarkovChainMonteCarlo

Markov Chain Monte Carlo Simulations on SpiNNaker

lighthouse.py failing in RadialPlacer

andrewgait opened this issue

Recent changes mean that lighthouse.py isn't working at the moment:

2017-11-21 10:13:20 ERROR: Error when calling pacman.operations.placer_algorithms.radial_placer.RadialPlacer.call with inputs {'machine': [Machine: max_x=1, max_y=1, n_chips=4], 'machine_graph': <pacman.model.graphs.machine.machine_graph.MachineGraph object at 0x7fa5c9d94958>}
Traceback (most recent call last):
File "/localhome/mbbssag3/spinnaker/git/MarkovChainMonteCarlo/mcmc_examples/lighthouse/lighthouse.py", line 67, in
degrees_of_freedom=3.0, seed=seed) # , n_chips=2) # n_chips=23*48)
File "/localhome/mbbssag3/spinnaker/git/MarkovChainMonteCarlo/mcmc/mcmc_framework.py", line 116, in run_mcmc
g.run(None)
File "/localhome/mbbssag3/spinnaker/git/SpiNNakerGraphFrontEnd/spinnaker_graph_front_end/init.py", line 131, in run
globals_variables.get_simulator().run(duration)
File "/localhome/mbbssag3/spinnaker/git/SpiNNakerGraphFrontEnd/spinnaker_graph_front_end/spinnaker.py", line 119, in run
AbstractSpinnakerBase.run(self, run_time)
File "/localhome/mbbssag3/spinnaker/git/SpiNNFrontEndCommon/spinn_front_end_common/interface/abstract_spinnaker_base.py", line 778, in run
self._run(run_time)
File "/localhome/mbbssag3/spinnaker/git/SpiNNFrontEndCommon/spinn_front_end_common/interface/abstract_spinnaker_base.py", line 872, in _run
self._do_mapping(run_time, n_machine_time_steps, total_run_time)
File "/localhome/mbbssag3/spinnaker/git/SpiNNFrontEndCommon/spinn_front_end_common/interface/abstract_spinnaker_base.py", line 1616, in _do_mapping
inputs, algorithms, outputs, "mapping", optional_algorithms)
File "/localhome/mbbssag3/spinnaker/git/SpiNNFrontEndCommon/spinn_front_end_common/interface/abstract_spinnaker_base.py", line 1127, in _run_algorithms
executor.execute_mapping()
File "/localhome/mbbssag3/spinnaker/git/PACMAN/pacman/executor/pacman_algorithm_executor.py", line 432, in execute_mapping
self._execute_mapping()
File "/localhome/mbbssag3/spinnaker/git/PACMAN/pacman/executor/pacman_algorithm_executor.py", line 448, in _execute_mapping
results = algorithm.call(self._internal_type_mapping)
File "/localhome/mbbssag3/spinnaker/git/PACMAN/pacman/executor/algorithm_classes/abstract_python_algorithm.py", line 44, in call
results = self.call_python(method_inputs)
File "/localhome/mbbssag3/spinnaker/git/PACMAN/pacman/executor/algorithm_classes/python_class_algorithm.py", line 56, in call_python
return method(**inputs)
File "/localhome/mbbssag3/spinnaker/git/PACMAN/pacman/operations/placer_algorithms/radial_placer.py", line 48, in call
vertices_on_same_chip)
File "/localhome/mbbssag3/spinnaker/git/PACMAN/pacman/operations/placer_algorithms/radial_placer.py", line 100, in _place_vertex
vertex.resources_required, vertex.constraints, chips)
File "/localhome/mbbssag3/spinnaker/git/PACMAN/pacman/utilities/utility_objs/resource_tracker.py", line 1002, in allocate_constrained_resources
ip_tags, reverse_ip_tags)
File "/localhome/mbbssag3/spinnaker/git/PACMAN/pacman/utilities/utility_objs/resource_tracker.py", line 1186, in allocate_resources
chips, board_address, ip_tags, reverse_ip_tags):
File "/localhome/mbbssag3/spinnaker/git/PACMAN/pacman/utilities/utility_objs/resource_tracker.py", line 379, in _get_usable_chips
"No valid chips found on the specified board")
pacman.exceptions.PacmanInvalidParameterException: ('chips and board_address', '[(0, 0)] and None', 'No valid chips found on the specified board')

Yes, this is the difficulty in working out how many cores we have when using Alan's model. It is hard to tell how many cores you will have left over on each chip, as you don't know what else will force its way in there...

# calculate total number of 'free' cores for the given board
# (i.e. does not include those busy with SARK or reinjection)
total_number_of_cores = \
    front_end.get_number_of_available_cores_on_machine()

You may need to pull the master branch of the GFE to get the updated version of this call. It's not perfect, as it does not take into account LPGs from live event connections, but it does account for the power monitor, extra_monitor etc.

This doesn't give me the number of cores free on a chip, though, which is what this code currently relies on. That is a limitation of this code, which makes use of a coordinator per board.

This looks like a common pattern emerging, however, so some support in the tools for it would be nice. This pattern is a "one per board" coordinator-type object with all vertices on that board connecting to it (possibly incoming, possibly outgoing). If there were some way to easily represent this at the GFE level, that would be a nice option...
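For illustration only, here is a minimal sketch of building that pattern by hand with today's tools. It assumes front_end.setup() and front_end.machine() behave as in the current graph front end, that the Machine object exposes ethernet_connected_chips (one per board), and that CoordinatorVertex is a hypothetical stand-in for the real per-board vertex:

import spinnaker_graph_front_end as front_end

# Hypothetical stand-in for the real per-board coordinator vertex
from my_vertices import CoordinatorVertex

front_end.setup()
machine = front_end.machine()

# One coordinator per board; each board is identified by its
# Ethernet-connected chip
coordinators = dict()
for ethernet_chip in machine.ethernet_connected_chips:
    coordinator = CoordinatorVertex(
        label="coordinator_{}_{}".format(ethernet_chip.x, ethernet_chip.y))
    front_end.add_machine_vertex_instance(coordinator)
    coordinators[ethernet_chip.x, ethernet_chip.y] = coordinator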

On a chip is a wee bit harder, given we've not run any of the pre-alloc functions by the time you call this. You could have a call like that, which gets the machine, runs all the pre-alloc stuff and then asks the resource manager how many cores you have. But the question is what we are trying to support here.

But note this is an issue between app space and machine space. You're trying to merge them to exploit the machine fully, which is fine... but it puts the extra functionality additions out of whack unless the top-level script also knows which ones you've asked for.

"This pattern is a "one per board" coordinator-type object with all vertices on that board connecting to it (possibly incoming, possibly outgoing)"
There's the speed-up stuff... but the live packet connection isn't always that format... and I don't know of any more.
What have you got that does that format?

I think you mentioned them all... If we can find a way to generalise what they do, we could expose this to the user. I don't think the current method exposes well, as you need an algorithm to add an edge to each. One idea is to create an edge to all of them for now, but then filter off those edges that are not on the same board before routing... Otherwise, a general way to say that this edge only goes to those that are on the same board...
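As a rough illustration of that "same board only" filter idea (a sketch under assumptions: Placements.get_placement_of_vertex, Machine.get_chip_at and the chip's nearest_ethernet_x/y attributes behave as described, and same_board_only is a hypothetical marker on the edge, not an existing PACMAN attribute):

def edges_to_filter(machine_graph, placements, machine):
    # Return the edges marked same-board-only whose endpoints were
    # placed on different boards (identified by nearest Ethernet chip)
    def board_of(vertex):
        placement = placements.get_placement_of_vertex(vertex)
        chip = machine.get_chip_at(placement.x, placement.y)
        return chip.nearest_ethernet_x, chip.nearest_ethernet_y

    return [
        edge for edge in machine_graph.edges
        if getattr(edge, "same_board_only", False)
        and board_of(edge.pre_vertex) != board_of(edge.post_vertex)]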

I don't see how that solves your issue about core allocation. That is going back to edge filtering for routing, but if you're letting the GFE graph do its own routing entries, you've bust the entire interface wide open.

Your direct issue is "how many cores do I have to work with?", which is either chip-, machine- or board-based. Setting up a ResourceTracker with the pre-allocated resources and handing that back, or a wrapped-up resource tracker, would solve your issue.

OK, so I don't need to do direct core allocation per chip if I can say:

  • Add one of these per board (and then I subtract that from the total number of cores)
  • Add an edge to each thing on the same board from this thing (I would guess there would also exist a scenario for the reverse)

At this point, I don't care where things get placed, so I won't add placement constraints, and your front_end.get_number_of_available_cores_on_machine() is enough (a rough sketch of that arithmetic follows).
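For concreteness, a minimal sketch of that budgeting arithmetic, assuming front_end.machine() returns the SpiNNMachine Machine object and that its ethernet_connected_chips lists one chip per board:

import spinnaker_graph_front_end as front_end

front_end.setup()
machine = front_end.machine()

# Free cores on the whole allocated machine, as reported by the GFE
# (monitor cores etc. already excluded)
total_cores = front_end.get_number_of_available_cores_on_machine()

# One coordinator/loader per board
n_boards = len(list(machine.ethernet_connected_chips))

# Remaining budget for the MCMC worker vertices
n_workers = total_cores - n_boards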

"Add one of these per board (and then I subtract that from the total number of cores)"
That's exactly what front_end.get_number_of_available_cores_on_machine() does right now.

"Add an edge to each thing on the same board from this thing (I would guess there would also exist a scenario for the reverse)"
This I cannot see a reason for; you're doing more work. Yes, for the speed-up it works, and that's what the FEC interface function insert_edges_to_extra_monitor_functionality does. But you can't do that for live event connections, because they aren't board-wide; they're population-wide.

This is also why front_end.get_number_of_available_cores_on_machine() isn't perfect: it can't know, at the point of calling, how many live event connections you're going to use. As a front-end user, you don't know how many cores there are until you've asked, because you're using that to build your application; it's a chicken-and-egg issue. That's why I'm not exactly happy with the "fit the application to use the entire machine" mentality in this case, but that's where we are, so that's why front_end.get_number_of_available_cores_on_machine() is the current hack.

"Add an edge to each thing on the same board from this thing (I would guess there would also exist a scenario for the reverse)"

"this i cannot see a reasoning for. your doing more work. yes for speed up it works. and that's what FEC interface function insert_edges_to_extra_monitor_functionality does. But you cant do that for live event connections, because its not board wide. its pop wide."

OK, this is drifting off topic. I need to be able to do this for the MCMC loader, because this is how it works: the same data is sent everywhere, but doing this from a single location was too much, as the confirmation packets overloaded the core. Doing it with one loader per board works just fine, so that is what this code was doing.

The solution for the current tools would then be to add around 720 MCMC workers connected to each MCMC loader, which is effectively the same thing. I don't know for sure that I will end up with the connections on each board, but that theoretically doesn't matter. In practice it probably will matter, though, as the connections will now cross boards where they didn't before. Only experimentation will tell if this is an issue.
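A minimal sketch of that workaround with the current tools (assuming add_machine_vertex_instance and add_machine_edge_instance behave as in the current GFE; LoaderVertex, WorkerVertex and the partition name are hypothetical stand-ins for the real MCMC classes):

import spinnaker_graph_front_end as front_end
from pacman.model.graphs.machine import MachineEdge

from my_vertices import LoaderVertex, WorkerVertex  # hypothetical

N_WORKERS_PER_LOADER = 720  # roughly one board's worth of worker cores

front_end.setup()

loader = LoaderVertex(label="mcmc_loader")
front_end.add_machine_vertex_instance(loader)

for i in range(N_WORKERS_PER_LOADER):
    worker = WorkerVertex(label="mcmc_worker_{}".format(i))
    front_end.add_machine_vertex_instance(worker)
    # The placer decides whether this edge stays on one board or crosses
    front_end.add_machine_edge_instance(
        MachineEdge(loader, worker), "MCMC_DATA")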

Just noticed that this issue was still open - I've come up with a solution for making the current toolchain and MCMC work together, so I'm going to close this.