STEllAR-GROUP / hpx

The C++ Standard Library for Parallelism and Concurrency

Home Page: https://hpx.stellar-group.org

HPX sets affinity wrong with multiple processes per node and LCI parcelport enabled

JiakunYan opened this issue

Expected Behavior

For example, when there are 64 cores/node, 4 processes per node, and 16 threads per process, HPX should assign one core per thread, making all processes use different cores.
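
The expected layout can be written down as simple arithmetic. The sketch below assumes a contiguous block layout, where the process with local rank r on a node gets cores r*16 through r*16+15. The function name `expected_cores` and the block assumption are illustrative only; the real assignment depends on how SLURM distributes tasks and on how HPX binds its worker threads.

```python
def expected_cores(local_rank, threads_per_process, cores_per_node):
    """Return the core indices one process would get under a contiguous block layout.

    Assumption: processes on a node receive consecutive, non-overlapping blocks.
    """
    first = local_rank * threads_per_process
    last = first + threads_per_process
    if last > cores_per_node:
        raise ValueError("node is oversubscribed")
    return list(range(first, last))


# 64 cores per node, 4 processes per node, 16 threads per process
layout = {rank: expected_cores(rank, 16, 64) for rank in range(4)}
for rank, cores in layout.items():
    print(f"local rank {rank}: cores {cores[0]}-{cores[-1]}")
```

Under this assumption the four blocks are disjoint and together cover all 64 cores of the node.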

Actual Behavior

When using the MPI parcelport, HPX works as expected. However, when using the LCI parcelport, all processes are mapped to the first 16 cores.

I don't think there is any code in the LCI parcelport that can affect affinity. Can you think of any possible reason for this? @hkaiser

Could it be caused by the process mask that is being set by SLURM? I'm not really sure.

Could you give me the output of --hpx:print-bind for both the MPI and the LCI parcelport?

The affinity code is not in just one spot. Let's see what's actually happening, then I might be able to narrow it down a bit.
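
One way to look at the process mask independently of HPX is to print, for each task launched by srun, the set of CPUs the operating system allows it to run on. The following is a minimal sketch, not part of HPX: it assumes Linux (where `os.sched_getaffinity` is available) and reads the `SLURM_PROCID` environment variable that SLURM sets for each task. The script name used in the usage note is hypothetical.

```python
import os
import socket


def allowed_cpus():
    """Return the sorted CPU ids this process may run on (empty list if unsupported)."""
    if not hasattr(os, "sched_getaffinity"):
        return []  # os.sched_getaffinity is not available on this platform
    return sorted(os.sched_getaffinity(0))


def describe():
    """Build a one-line summary of host, SLURM rank, and allowed CPUs."""
    rank = os.environ.get("SLURM_PROCID", "?")
    cpus = allowed_cpus()
    return f"{socket.gethostname()} rank {rank}: {len(cpus)} CPUs allowed: {cpus}"


if __name__ == "__main__":
    print(describe())
```

Saved as, say, show_mask.py and launched with the same srun options as the HPX application, this would show whether each task already receives a distinct mask from SLURM or whether all tasks see the same CPUs.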

The two commands shown here are run in a single SLURM interactive session with 2 nodes on Perlmutter.

Below is the output with the LCI parcelport. I set the progress_type to worker to exclude the effect of the lci-progress-pool, but the affinity issue is the same in both cases.

nid200433:~>srun -n 8 fibonacci_futures_distributed --hpx:print-bind --hpx:threads=16 --hpx:ini=hpx.parcel.lci.priority=1000 --hpx:ini=hpx.parcel.lci.progress_type=worker


locality: 7
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"
2: PU L#4(P#2), Core L#2(P#2), Socket L#0(P#0), on pool "default"
3: PU L#6(P#3), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#8(P#4), Core L#4(P#4), Socket L#0(P#0), on pool "default"
5: PU L#10(P#5), Core L#5(P#5), Socket L#0(P#0), on pool "default"
6: PU L#12(P#6), Core L#6(P#6), Socket L#0(P#0), on pool "default"
7: PU L#14(P#7), Core L#7(P#7), Socket L#0(P#0), on pool "default"
8: PU L#16(P#8), Core L#8(P#8), Socket L#0(P#0), on pool "default"
9: PU L#18(P#9), Core L#9(P#9), Socket L#0(P#0), on pool "default"
10: PU L#20(P#10), Core L#10(P#10), Socket L#0(P#0), on pool "default"
11: PU L#22(P#11), Core L#11(P#11), Socket L#0(P#0), on pool "default"
12: PU L#24(P#12), Core L#12(P#12), Socket L#0(P#0), on pool "default"
13: PU L#26(P#13), Core L#13(P#13), Socket L#0(P#0), on pool "default"
14: PU L#28(P#14), Core L#14(P#14), Socket L#0(P#0), on pool "default"
15: PU L#30(P#15), Core L#15(P#15), Socket L#0(P#0), on pool "default"


locality: 4
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"
2: PU L#4(P#2), Core L#2(P#2), Socket L#0(P#0), on pool "default"
3: PU L#6(P#3), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#8(P#4), Core L#4(P#4), Socket L#0(P#0), on pool "default"
5: PU L#10(P#5), Core L#5(P#5), Socket L#0(P#0), on pool "default"
6: PU L#12(P#6), Core L#6(P#6), Socket L#0(P#0), on pool "default"
7: PU L#14(P#7), Core L#7(P#7), Socket L#0(P#0), on pool "default"
8: PU L#16(P#8), Core L#8(P#8), Socket L#0(P#0), on pool "default"
9: PU L#18(P#9), Core L#9(P#9), Socket L#0(P#0), on pool "default"
10: PU L#20(P#10), Core L#10(P#10), Socket L#0(P#0), on pool "default"
11: PU L#22(P#11), Core L#11(P#11), Socket L#0(P#0), on pool "default"
12: PU L#24(P#12), Core L#12(P#12), Socket L#0(P#0), on pool "default"
13: PU L#26(P#13), Core L#13(P#13), Socket L#0(P#0), on pool "default"
14: PU L#28(P#14), Core L#14(P#14), Socket L#0(P#0), on pool "default"
15: PU L#30(P#15), Core L#15(P#15), Socket L#0(P#0), on pool "default"


locality: 5
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"
2: PU L#4(P#2), Core L#2(P#2), Socket L#0(P#0), on pool "default"
3: PU L#6(P#3), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#8(P#4), Core L#4(P#4), Socket L#0(P#0), on pool "default"
5: PU L#10(P#5), Core L#5(P#5), Socket L#0(P#0), on pool "default"
6: PU L#12(P#6), Core L#6(P#6), Socket L#0(P#0), on pool "default"
7: PU L#14(P#7), Core L#7(P#7), Socket L#0(P#0), on pool "default"
8: PU L#16(P#8), Core L#8(P#8), Socket L#0(P#0), on pool "default"
9: PU L#18(P#9), Core L#9(P#9), Socket L#0(P#0), on pool "default"
10: PU L#20(P#10), Core L#10(P#10), Socket L#0(P#0), on pool "default"
11: PU L#22(P#11), Core L#11(P#11), Socket L#0(P#0), on pool "default"
12: PU L#24(P#12), Core L#12(P#12), Socket L#0(P#0), on pool "default"
13: PU L#26(P#13), Core L#13(P#13), Socket L#0(P#0), on pool "default"
14: PU L#28(P#14), Core L#14(P#14), Socket L#0(P#0), on pool "default"
15: PU L#30(P#15), Core L#15(P#15), Socket L#0(P#0), on pool "default"


locality: 6
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"
2: PU L#4(P#2), Core L#2(P#2), Socket L#0(P#0), on pool "default"
3: PU L#6(P#3), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#8(P#4), Core L#4(P#4), Socket L#0(P#0), on pool "default"
5: PU L#10(P#5), Core L#5(P#5), Socket L#0(P#0), on pool "default"
6: PU L#12(P#6), Core L#6(P#6), Socket L#0(P#0), on pool "default"
7: PU L#14(P#7), Core L#7(P#7), Socket L#0(P#0), on pool "default"
8: PU L#16(P#8), Core L#8(P#8), Socket L#0(P#0), on pool "default"
9: PU L#18(P#9), Core L#9(P#9), Socket L#0(P#0), on pool "default"
10: PU L#20(P#10), Core L#10(P#10), Socket L#0(P#0), on pool "default"
11: PU L#22(P#11), Core L#11(P#11), Socket L#0(P#0), on pool "default"
12: PU L#24(P#12), Core L#12(P#12), Socket L#0(P#0), on pool "default"
13: PU L#26(P#13), Core L#13(P#13), Socket L#0(P#0), on pool "default"
14: PU L#28(P#14), Core L#14(P#14), Socket L#0(P#0), on pool "default"
15: PU L#30(P#15), Core L#15(P#15), Socket L#0(P#0), on pool "default"


locality: 3
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"
2: PU L#4(P#2), Core L#2(P#2), Socket L#0(P#0), on pool "default"
3: PU L#6(P#3), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#8(P#4), Core L#4(P#4), Socket L#0(P#0), on pool "default"
5: PU L#10(P#5), Core L#5(P#5), Socket L#0(P#0), on pool "default"
6: PU L#12(P#6), Core L#6(P#6), Socket L#0(P#0), on pool "default"
7: PU L#14(P#7), Core L#7(P#7), Socket L#0(P#0), on pool "default"
8: PU L#16(P#8), Core L#8(P#8), Socket L#0(P#0), on pool "default"
9: PU L#18(P#9), Core L#9(P#9), Socket L#0(P#0), on pool "default"
10: PU L#20(P#10), Core L#10(P#10), Socket L#0(P#0), on pool "default"
11: PU L#22(P#11), Core L#11(P#11), Socket L#0(P#0), on pool "default"
12: PU L#24(P#12), Core L#12(P#12), Socket L#0(P#0), on pool "default"
13: PU L#26(P#13), Core L#13(P#13), Socket L#0(P#0), on pool "default"
14: PU L#28(P#14), Core L#14(P#14), Socket L#0(P#0), on pool "default"
15: PU L#30(P#15), Core L#15(P#15), Socket L#0(P#0), on pool "default"


locality: 2
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"
2: PU L#4(P#2), Core L#2(P#2), Socket L#0(P#0), on pool "default"
3: PU L#6(P#3), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#8(P#4), Core L#4(P#4), Socket L#0(P#0), on pool "default"
5: PU L#10(P#5), Core L#5(P#5), Socket L#0(P#0), on pool "default"
6: PU L#12(P#6), Core L#6(P#6), Socket L#0(P#0), on pool "default"
7: PU L#14(P#7), Core L#7(P#7), Socket L#0(P#0), on pool "default"
8: PU L#16(P#8), Core L#8(P#8), Socket L#0(P#0), on pool "default"
9: PU L#18(P#9), Core L#9(P#9), Socket L#0(P#0), on pool "default"
10: PU L#20(P#10), Core L#10(P#10), Socket L#0(P#0), on pool "default"
11: PU L#22(P#11), Core L#11(P#11), Socket L#0(P#0), on pool "default"
12: PU L#24(P#12), Core L#12(P#12), Socket L#0(P#0), on pool "default"
13: PU L#26(P#13), Core L#13(P#13), Socket L#0(P#0), on pool "default"
14: PU L#28(P#14), Core L#14(P#14), Socket L#0(P#0), on pool "default"
15: PU L#30(P#15), Core L#15(P#15), Socket L#0(P#0), on pool "default"


locality: 1
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"
2: PU L#4(P#2), Core L#2(P#2), Socket L#0(P#0), on pool "default"
3: PU L#6(P#3), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#8(P#4), Core L#4(P#4), Socket L#0(P#0), on pool "default"
5: PU L#10(P#5), Core L#5(P#5), Socket L#0(P#0), on pool "default"
6: PU L#12(P#6), Core L#6(P#6), Socket L#0(P#0), on pool "default"
7: PU L#14(P#7), Core L#7(P#7), Socket L#0(P#0), on pool "default"
8: PU L#16(P#8), Core L#8(P#8), Socket L#0(P#0), on pool "default"
9: PU L#18(P#9), Core L#9(P#9), Socket L#0(P#0), on pool "default"
10: PU L#20(P#10), Core L#10(P#10), Socket L#0(P#0), on pool "default"
11: PU L#22(P#11), Core L#11(P#11), Socket L#0(P#0), on pool "default"
12: PU L#24(P#12), Core L#12(P#12), Socket L#0(P#0), on pool "default"
13: PU L#26(P#13), Core L#13(P#13), Socket L#0(P#0), on pool "default"
14: PU L#28(P#14), Core L#14(P#14), Socket L#0(P#0), on pool "default"
15: PU L#30(P#15), Core L#15(P#15), Socket L#0(P#0), on pool "default"


locality: 0
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"
2: PU L#4(P#2), Core L#2(P#2), Socket L#0(P#0), on pool "default"
3: PU L#6(P#3), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#8(P#4), Core L#4(P#4), Socket L#0(P#0), on pool "default"
5: PU L#10(P#5), Core L#5(P#5), Socket L#0(P#0), on pool "default"
6: PU L#12(P#6), Core L#6(P#6), Socket L#0(P#0), on pool "default"
7: PU L#14(P#7), Core L#7(P#7), Socket L#0(P#0), on pool "default"
8: PU L#16(P#8), Core L#8(P#8), Socket L#0(P#0), on pool "default"
9: PU L#18(P#9), Core L#9(P#9), Socket L#0(P#0), on pool "default"
10: PU L#20(P#10), Core L#10(P#10), Socket L#0(P#0), on pool "default"
11: PU L#22(P#11), Core L#11(P#11), Socket L#0(P#0), on pool "default"
12: PU L#24(P#12), Core L#12(P#12), Socket L#0(P#0), on pool "default"
13: PU L#26(P#13), Core L#13(P#13), Socket L#0(P#0), on pool "default"
14: PU L#28(P#14), Core L#14(P#14), Socket L#0(P#0), on pool "default"
15: PU L#30(P#15), Core L#15(P#15), Socket L#0(P#0), on pool "default"
fibonacci_serial(10) == 55,elapsed time:,541,[s]
fibonacci_future(10) == 55,elapsed time:,1330950718,[s],56
serial-count,{0000000100000000, 0000000000000000},1
serial-count,{0000000200000000, 0000000000000000},0
serial-count,{0000000300000000, 0000000000000000},0
serial-count,{0000000400000000, 0000000000000000},0
serial-count,{0000000500000000, 0000000000000000},0
serial-count,{0000000600000000, 0000000000000000},0
serial-count,{0000000700000000, 0000000000000000},0
serial-count,{0000000800000000, 0000000000000000},0

Below is the output with the MPI parcelport.

nid200433:~> srun -n 8 fibonacci_futures_distributed --hpx:print-bind --hpx:threads=16 --hpx:ini=hpx.parcel.mpi.priority=1000


locality: 6
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"
2: PU L#4(P#2), Core L#2(P#2), Socket L#0(P#0), on pool "default"
3: PU L#6(P#3), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#8(P#4), Core L#4(P#4), Socket L#0(P#0), on pool "default"
5: PU L#10(P#5), Core L#5(P#5), Socket L#0(P#0), on pool "default"
6: PU L#12(P#6), Core L#6(P#6), Socket L#0(P#0), on pool "default"
7: PU L#14(P#7), Core L#7(P#7), Socket L#0(P#0), on pool "default"
8: PU L#16(P#8), Core L#8(P#8), Socket L#0(P#0), on pool "default"
9: PU L#18(P#9), Core L#9(P#9), Socket L#0(P#0), on pool "default"
10: PU L#20(P#10), Core L#10(P#10), Socket L#0(P#0), on pool "default"
11: PU L#22(P#11), Core L#11(P#11), Socket L#0(P#0), on pool "default"
12: PU L#24(P#12), Core L#12(P#12), Socket L#0(P#0), on pool "default"
13: PU L#26(P#13), Core L#13(P#13), Socket L#0(P#0), on pool "default"
14: PU L#28(P#14), Core L#14(P#14), Socket L#0(P#0), on pool "default"
15: PU L#30(P#15), Core L#15(P#15), Socket L#0(P#0), on pool "default"


locality: 1
0: PU L#64(P#32), Core L#32(P#32), Socket L#0(P#0), on pool "default"
1: PU L#66(P#33), Core L#33(P#33), Socket L#0(P#0), on pool "default"
2: PU L#68(P#34), Core L#34(P#34), Socket L#0(P#0), on pool "default"
3: PU L#70(P#35), Core L#35(P#35), Socket L#0(P#0), on pool "default"
4: PU L#72(P#36), Core L#36(P#36), Socket L#0(P#0), on pool "default"
5: PU L#74(P#37), Core L#37(P#37), Socket L#0(P#0), on pool "default"
6: PU L#76(P#38), Core L#38(P#38), Socket L#0(P#0), on pool "default"
7: PU L#78(P#39), Core L#39(P#39), Socket L#0(P#0), on pool "default"
8: PU L#80(P#40), Core L#40(P#40), Socket L#0(P#0), on pool "default"
9: PU L#82(P#41), Core L#41(P#41), Socket L#0(P#0), on pool "default"
10: PU L#84(P#42), Core L#42(P#42), Socket L#0(P#0), on pool "default"
11: PU L#86(P#43), Core L#43(P#43), Socket L#0(P#0), on pool "default"
12: PU L#88(P#44), Core L#44(P#44), Socket L#0(P#0), on pool "default"
13: PU L#90(P#45), Core L#45(P#45), Socket L#0(P#0), on pool "default"
14: PU L#92(P#46), Core L#46(P#46), Socket L#0(P#0), on pool "default"
15: PU L#94(P#47), Core L#47(P#47), Socket L#0(P#0), on pool "default"


locality: 5
0: PU L#96(P#48), Core L#48(P#48), Socket L#0(P#0), on pool "default"
1: PU L#98(P#49), Core L#49(P#49), Socket L#0(P#0), on pool "default"
2: PU L#100(P#50), Core L#50(P#50), Socket L#0(P#0), on pool "default"
3: PU L#102(P#51), Core L#51(P#51), Socket L#0(P#0), on pool "default"
4: PU L#104(P#52), Core L#52(P#52), Socket L#0(P#0), on pool "default"
5: PU L#106(P#53), Core L#53(P#53), Socket L#0(P#0), on pool "default"
6: PU L#108(P#54), Core L#54(P#54), Socket L#0(P#0), on pool "default"
7: PU L#110(P#55), Core L#55(P#55), Socket L#0(P#0), on pool "default"
8: PU L#112(P#56), Core L#56(P#56), Socket L#0(P#0), on pool "default"
9: PU L#114(P#57), Core L#57(P#57), Socket L#0(P#0), on pool "default"
10: PU L#116(P#58), Core L#58(P#58), Socket L#0(P#0), on pool "default"
11: PU L#118(P#59), Core L#59(P#59), Socket L#0(P#0), on pool "default"
12: PU L#120(P#60), Core L#60(P#60), Socket L#0(P#0), on pool "default"
13: PU L#122(P#61), Core L#61(P#61), Socket L#0(P#0), on pool "default"
14: PU L#124(P#62), Core L#62(P#62), Socket L#0(P#0), on pool "default"
15: PU L#126(P#63), Core L#63(P#63), Socket L#0(P#0), on pool "default"


locality: 4
0: PU L#64(P#32), Core L#32(P#32), Socket L#0(P#0), on pool "default"
1: PU L#66(P#33), Core L#33(P#33), Socket L#0(P#0), on pool "default"
2: PU L#68(P#34), Core L#34(P#34), Socket L#0(P#0), on pool "default"
3: PU L#70(P#35), Core L#35(P#35), Socket L#0(P#0), on pool "default"
4: PU L#72(P#36), Core L#36(P#36), Socket L#0(P#0), on pool "default"
5: PU L#74(P#37), Core L#37(P#37), Socket L#0(P#0), on pool "default"
6: PU L#76(P#38), Core L#38(P#38), Socket L#0(P#0), on pool "default"
7: PU L#78(P#39), Core L#39(P#39), Socket L#0(P#0), on pool "default"
8: PU L#80(P#40), Core L#40(P#40), Socket L#0(P#0), on pool "default"
9: PU L#82(P#41), Core L#41(P#41), Socket L#0(P#0), on pool "default"
10: PU L#84(P#42), Core L#42(P#42), Socket L#0(P#0), on pool "default"
11: PU L#86(P#43), Core L#43(P#43), Socket L#0(P#0), on pool "default"
12: PU L#88(P#44), Core L#44(P#44), Socket L#0(P#0), on pool "default"
13: PU L#90(P#45), Core L#45(P#45), Socket L#0(P#0), on pool "default"
14: PU L#92(P#46), Core L#46(P#46), Socket L#0(P#0), on pool "default"
15: PU L#94(P#47), Core L#47(P#47), Socket L#0(P#0), on pool "default"


locality: 2
0: PU L#96(P#48), Core L#48(P#48), Socket L#0(P#0), on pool "default"
1: PU L#98(P#49), Core L#49(P#49), Socket L#0(P#0), on pool "default"
2: PU L#100(P#50), Core L#50(P#50), Socket L#0(P#0), on pool "default"
3: PU L#102(P#51), Core L#51(P#51), Socket L#0(P#0), on pool "default"
4: PU L#104(P#52), Core L#52(P#52), Socket L#0(P#0), on pool "default"
5: PU L#106(P#53), Core L#53(P#53), Socket L#0(P#0), on pool "default"
6: PU L#108(P#54), Core L#54(P#54), Socket L#0(P#0), on pool "default"
7: PU L#110(P#55), Core L#55(P#55), Socket L#0(P#0), on pool "default"
8: PU L#112(P#56), Core L#56(P#56), Socket L#0(P#0), on pool "default"
9: PU L#114(P#57), Core L#57(P#57), Socket L#0(P#0), on pool "default"
10: PU L#116(P#58), Core L#58(P#58), Socket L#0(P#0), on pool "default"
11: PU L#118(P#59), Core L#59(P#59), Socket L#0(P#0), on pool "default"
12: PU L#120(P#60), Core L#60(P#60), Socket L#0(P#0), on pool "default"
13: PU L#122(P#61), Core L#61(P#61), Socket L#0(P#0), on pool "default"
14: PU L#124(P#62), Core L#62(P#62), Socket L#0(P#0), on pool "default"
15: PU L#126(P#63), Core L#63(P#63), Socket L#0(P#0), on pool "default"


locality: 7
0: PU L#32(P#16), Core L#16(P#16), Socket L#0(P#0), on pool "default"
1: PU L#34(P#17), Core L#17(P#17), Socket L#0(P#0), on pool "default"
2: PU L#36(P#18), Core L#18(P#18), Socket L#0(P#0), on pool "default"
3: PU L#38(P#19), Core L#19(P#19), Socket L#0(P#0), on pool "default"
4: PU L#40(P#20), Core L#20(P#20), Socket L#0(P#0), on pool "default"
5: PU L#42(P#21), Core L#21(P#21), Socket L#0(P#0), on pool "default"
6: PU L#44(P#22), Core L#22(P#22), Socket L#0(P#0), on pool "default"
7: PU L#46(P#23), Core L#23(P#23), Socket L#0(P#0), on pool "default"
8: PU L#48(P#24), Core L#24(P#24), Socket L#0(P#0), on pool "default"
9: PU L#50(P#25), Core L#25(P#25), Socket L#0(P#0), on pool "default"
10: PU L#52(P#26), Core L#26(P#26), Socket L#0(P#0), on pool "default"
11: PU L#54(P#27), Core L#27(P#27), Socket L#0(P#0), on pool "default"
12: PU L#56(P#28), Core L#28(P#28), Socket L#0(P#0), on pool "default"
13: PU L#58(P#29), Core L#29(P#29), Socket L#0(P#0), on pool "default"
14: PU L#60(P#30), Core L#30(P#30), Socket L#0(P#0), on pool "default"
15: PU L#62(P#31), Core L#31(P#31), Socket L#0(P#0), on pool "default"


locality: 3
0: PU L#32(P#16), Core L#16(P#16), Socket L#0(P#0), on pool "default"
1: PU L#34(P#17), Core L#17(P#17), Socket L#0(P#0), on pool "default"
2: PU L#36(P#18), Core L#18(P#18), Socket L#0(P#0), on pool "default"
3: PU L#38(P#19), Core L#19(P#19), Socket L#0(P#0), on pool "default"
4: PU L#40(P#20), Core L#20(P#20), Socket L#0(P#0), on pool "default"
5: PU L#42(P#21), Core L#21(P#21), Socket L#0(P#0), on pool "default"
6: PU L#44(P#22), Core L#22(P#22), Socket L#0(P#0), on pool "default"
7: PU L#46(P#23), Core L#23(P#23), Socket L#0(P#0), on pool "default"
8: PU L#48(P#24), Core L#24(P#24), Socket L#0(P#0), on pool "default"
9: PU L#50(P#25), Core L#25(P#25), Socket L#0(P#0), on pool "default"
10: PU L#52(P#26), Core L#26(P#26), Socket L#0(P#0), on pool "default"
11: PU L#54(P#27), Core L#27(P#27), Socket L#0(P#0), on pool "default"
12: PU L#56(P#28), Core L#28(P#28), Socket L#0(P#0), on pool "default"
13: PU L#58(P#29), Core L#29(P#29), Socket L#0(P#0), on pool "default"
14: PU L#60(P#30), Core L#30(P#30), Socket L#0(P#0), on pool "default"
15: PU L#62(P#31), Core L#31(P#31), Socket L#0(P#0), on pool "default"


locality: 0
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"
2: PU L#4(P#2), Core L#2(P#2), Socket L#0(P#0), on pool "default"
3: PU L#6(P#3), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#8(P#4), Core L#4(P#4), Socket L#0(P#0), on pool "default"
5: PU L#10(P#5), Core L#5(P#5), Socket L#0(P#0), on pool "default"
6: PU L#12(P#6), Core L#6(P#6), Socket L#0(P#0), on pool "default"
7: PU L#14(P#7), Core L#7(P#7), Socket L#0(P#0), on pool "default"
8: PU L#16(P#8), Core L#8(P#8), Socket L#0(P#0), on pool "default"
9: PU L#18(P#9), Core L#9(P#9), Socket L#0(P#0), on pool "default"
10: PU L#20(P#10), Core L#10(P#10), Socket L#0(P#0), on pool "default"
11: PU L#22(P#11), Core L#11(P#11), Socket L#0(P#0), on pool "default"
12: PU L#24(P#12), Core L#12(P#12), Socket L#0(P#0), on pool "default"
13: PU L#26(P#13), Core L#13(P#13), Socket L#0(P#0), on pool "default"
14: PU L#28(P#14), Core L#14(P#14), Socket L#0(P#0), on pool "default"
15: PU L#30(P#15), Core L#15(P#15), Socket L#0(P#0), on pool "default"
fibonacci_serial(10) == 55,elapsed time:,491,[s]
fibonacci_future(10) == 55,elapsed time:,1610536221,[s],58
serial-count,{0000000100000000, 0000000000000000},1
serial-count,{0000000200000000, 0000000000000000},0
serial-count,{0000000300000000, 0000000000000000},0
serial-count,{0000000400000000, 0000000000000000},0
serial-count,{0000000500000000, 0000000000000000},0
serial-count,{0000000600000000, 0000000000000000},0
serial-count,{0000000700000000, 0000000000000000},0
serial-count,{0000000800000000, 0000000000000000},0
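
Comparing the two listings by eye is tedious, so here is a small sketch that groups localities by the set of cores they are bound to. It assumes the line format shown in the --hpx:print-bind output above; the function names are made up for illustration, and the `sample` string is an abbreviated excerpt (two threads per locality instead of 16) of the LCI output.

```python
import re
from collections import defaultdict

LOCALITY = re.compile(r"^locality:\s*(\d+)")
BINDING = re.compile(r"^\s*\d+: PU L#\d+\(P#\d+\), Core L#(\d+)\(P#\d+\)")


def cores_by_locality(text):
    """Map each locality id to the set of logical core indices it is bound to."""
    result = defaultdict(set)
    current = None
    for line in text.splitlines():
        m = LOCALITY.match(line)
        if m:
            current = int(m.group(1))
            continue
        m = BINDING.match(line)
        if m and current is not None:
            result[current].add(int(m.group(1)))
    return dict(result)


def group_by_core_set(bindings):
    """Group localities that are bound to exactly the same set of cores."""
    groups = defaultdict(list)
    for locality, cores in bindings.items():
        groups[frozenset(cores)].append(locality)
    return {cores: sorted(ids) for cores, ids in groups.items()}


# Abbreviated excerpt of the LCI output above (assumption: same line format).
sample = """\
locality: 7
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"

locality: 4
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#2(P#1), Core L#1(P#1), Socket L#0(P#0), on pool "default"
"""

groups = group_by_core_set(cores_by_locality(sample))
for cores, localities in groups.items():
    print(sorted(cores), "->", localities)
```

With 8 localities on 2 nodes, a correct binding should give four distinct core sets, each shared by two localities (presumably one per node, since the listing does not include host names). That matches the MPI listing, whereas the LCI listing gives a single core set shared by all eight localities.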

@hkaiser I can reproduce the same issue with the current HPX master branch on Rostam.

[hpx-master-lci] [jiakun@rostam1 ~]$ srun -n 8 --mpi=pmix fibonacci_futures_distributed --hpx:print-bind --hpx:threads=10 --hpx:ini=hpx.parcel.mpi.priority=1000


locality: 1
0: PU L#10(P#20), Core L#10(P#16), Socket L#0(P#0), on pool "default"
1: PU L#11(P#22), Core L#11(P#20), Socket L#0(P#0), on pool "default"
2: PU L#12(P#24), Core L#12(P#17), Socket L#0(P#0), on pool "default"
3: PU L#13(P#26), Core L#13(P#19), Socket L#0(P#0), on pool "default"
4: PU L#14(P#28), Core L#14(P#18), Socket L#0(P#0), on pool "default"
5: PU L#15(P#30), Core L#15(P#28), Socket L#0(P#0), on pool "default"
6: PU L#16(P#32), Core L#16(P#24), Socket L#0(P#0), on pool "default"
7: PU L#17(P#34), Core L#17(P#27), Socket L#0(P#0), on pool "default"
8: PU L#18(P#36), Core L#18(P#25), Socket L#0(P#0), on pool "default"
9: PU L#19(P#38), Core L#19(P#26), Socket L#0(P#0), on pool "default"


locality: 5
0: PU L#10(P#20), Core L#10(P#16), Socket L#0(P#0), on pool "default"
1: PU L#11(P#22), Core L#11(P#20), Socket L#0(P#0), on pool "default"
2: PU L#12(P#24), Core L#12(P#17), Socket L#0(P#0), on pool "default"
3: PU L#13(P#26), Core L#13(P#19), Socket L#0(P#0), on pool "default"
4: PU L#14(P#28), Core L#14(P#18), Socket L#0(P#0), on pool "default"
5: PU L#15(P#30), Core L#15(P#28), Socket L#0(P#0), on pool "default"
6: PU L#16(P#32), Core L#16(P#24), Socket L#0(P#0), on pool "default"
7: PU L#17(P#34), Core L#17(P#27), Socket L#0(P#0), on pool "default"
8: PU L#18(P#36), Core L#18(P#25), Socket L#0(P#0), on pool "default"
9: PU L#19(P#38), Core L#19(P#26), Socket L#0(P#0), on pool "default"


locality: 4
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#1(P#2), Core L#1(P#4), Socket L#0(P#0), on pool "default"
2: PU L#2(P#4), Core L#2(P#1), Socket L#0(P#0), on pool "default"
3: PU L#3(P#6), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#4(P#8), Core L#4(P#2), Socket L#0(P#0), on pool "default"
5: PU L#5(P#10), Core L#5(P#12), Socket L#0(P#0), on pool "default"
6: PU L#6(P#12), Core L#6(P#8), Socket L#0(P#0), on pool "default"
7: PU L#7(P#14), Core L#7(P#11), Socket L#0(P#0), on pool "default"
8: PU L#8(P#16), Core L#8(P#9), Socket L#0(P#0), on pool "default"
9: PU L#9(P#18), Core L#9(P#10), Socket L#0(P#0), on pool "default"


locality: 2
0: PU L#20(P#1), Core L#20(P#0), Socket L#1(P#1), on pool "default"
1: PU L#21(P#3), Core L#21(P#4), Socket L#1(P#1), on pool "default"
2: PU L#22(P#5), Core L#22(P#1), Socket L#1(P#1), on pool "default"
3: PU L#23(P#7), Core L#23(P#3), Socket L#1(P#1), on pool "default"
4: PU L#24(P#9), Core L#24(P#2), Socket L#1(P#1), on pool "default"
5: PU L#25(P#11), Core L#25(P#12), Socket L#1(P#1), on pool "default"
6: PU L#26(P#13), Core L#26(P#8), Socket L#1(P#1), on pool "default"
7: PU L#27(P#15), Core L#27(P#11), Socket L#1(P#1), on pool "default"
8: PU L#28(P#17), Core L#28(P#9), Socket L#1(P#1), on pool "default"
9: PU L#29(P#19), Core L#29(P#10), Socket L#1(P#1), on pool "default"


locality: 7
0: PU L#30(P#21), Core L#30(P#16), Socket L#1(P#1), on pool "default"
1: PU L#31(P#23), Core L#31(P#20), Socket L#1(P#1), on pool "default"
2: PU L#32(P#25), Core L#32(P#17), Socket L#1(P#1), on pool "default"
3: PU L#33(P#27), Core L#33(P#19), Socket L#1(P#1), on pool "default"
4: PU L#34(P#29), Core L#34(P#18), Socket L#1(P#1), on pool "default"
5: PU L#35(P#31), Core L#35(P#28), Socket L#1(P#1), on pool "default"
6: PU L#36(P#33), Core L#36(P#24), Socket L#1(P#1), on pool "default"
7: PU L#37(P#35), Core L#37(P#27), Socket L#1(P#1), on pool "default"
8: PU L#38(P#37), Core L#38(P#25), Socket L#1(P#1), on pool "default"
9: PU L#39(P#39), Core L#39(P#26), Socket L#1(P#1), on pool "default"


locality: 6
0: PU L#20(P#1), Core L#20(P#0), Socket L#1(P#1), on pool "default"
1: PU L#21(P#3), Core L#21(P#4), Socket L#1(P#1), on pool "default"
2: PU L#22(P#5), Core L#22(P#1), Socket L#1(P#1), on pool "default"
3: PU L#23(P#7), Core L#23(P#3), Socket L#1(P#1), on pool "default"
4: PU L#24(P#9), Core L#24(P#2), Socket L#1(P#1), on pool "default"
5: PU L#25(P#11), Core L#25(P#12), Socket L#1(P#1), on pool "default"
6: PU L#26(P#13), Core L#26(P#8), Socket L#1(P#1), on pool "default"
7: PU L#27(P#15), Core L#27(P#11), Socket L#1(P#1), on pool "default"
8: PU L#28(P#17), Core L#28(P#9), Socket L#1(P#1), on pool "default"
9: PU L#29(P#19), Core L#29(P#10), Socket L#1(P#1), on pool "default"


locality: 3
0: PU L#30(P#21), Core L#30(P#16), Socket L#1(P#1), on pool "default"
1: PU L#31(P#23), Core L#31(P#20), Socket L#1(P#1), on pool "default"
2: PU L#32(P#25), Core L#32(P#17), Socket L#1(P#1), on pool "default"
3: PU L#33(P#27), Core L#33(P#19), Socket L#1(P#1), on pool "default"
4: PU L#34(P#29), Core L#34(P#18), Socket L#1(P#1), on pool "default"
5: PU L#35(P#31), Core L#35(P#28), Socket L#1(P#1), on pool "default"
6: PU L#36(P#33), Core L#36(P#24), Socket L#1(P#1), on pool "default"
7: PU L#37(P#35), Core L#37(P#27), Socket L#1(P#1), on pool "default"
8: PU L#38(P#37), Core L#38(P#25), Socket L#1(P#1), on pool "default"
9: PU L#39(P#39), Core L#39(P#26), Socket L#1(P#1), on pool "default"


locality: 0
0: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "default"
1: PU L#1(P#2), Core L#1(P#4), Socket L#0(P#0), on pool "default"
2: PU L#2(P#4), Core L#2(P#1), Socket L#0(P#0), on pool "default"
3: PU L#3(P#6), Core L#3(P#3), Socket L#0(P#0), on pool "default"
4: PU L#4(P#8), Core L#4(P#2), Socket L#0(P#0), on pool "default"
5: PU L#5(P#10), Core L#5(P#12), Socket L#0(P#0), on pool "default"
6: PU L#6(P#12), Core L#6(P#8), Socket L#0(P#0), on pool "default"
7: PU L#7(P#14), Core L#7(P#11), Socket L#0(P#0), on pool "default"
8: PU L#8(P#16), Core L#8(P#9), Socket L#0(P#0), on pool "default"
9: PU L#9(P#18), Core L#9(P#10), Socket L#0(P#0), on pool "default"
fibonacci_serial(10) == 55,elapsed time:,650,[s]
fibonacci_future(10) == 55,elapsed time:,5549434,[s],57
serial-count,{0000000100000000, 0000000000000000},1
serial-count,{0000000200000000, 0000000000000000},0
serial-count,{0000000300000000, 0000000000000000},0
serial-count,{0000000400000000, 0000000000000000},0
serial-count,{0000000500000000, 0000000000000000},0
serial-count,{0000000600000000, 0000000000000000},0
serial-count,{0000000700000000, 0000000000000000},0
serial-count,{0000000800000000, 0000000000000000},0
[hpx-master-lci] [jiakun@rostam1 ~]$ srun -n 8 --mpi=pmix fibonacci_futures_distributed --hpx:print-bind --hpx:threads=10 --hpx:ini=hpx.parcel.lci.priority=1000


locality: 5
0: PU L#1(P#2), Core L#1(P#4), Socket L#0(P#0), on pool "default"
1: PU L#2(P#4), Core L#2(P#1), Socket L#0(P#0), on pool "default"
2: PU L#3(P#6), Core L#3(P#3), Socket L#0(P#0), on pool "default"
3: PU L#4(P#8), Core L#4(P#2), Socket L#0(P#0), on pool "default"
4: PU L#5(P#10), Core L#5(P#12), Socket L#0(P#0), on pool "default"
5: PU L#6(P#12), Core L#6(P#8), Socket L#0(P#0), on pool "default"
6: PU L#7(P#14), Core L#7(P#11), Socket L#0(P#0), on pool "default"
7: PU L#8(P#16), Core L#8(P#9), Socket L#0(P#0), on pool "default"
8: PU L#9(P#18), Core L#9(P#10), Socket L#0(P#0), on pool "default"
9: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "lci-progress-pool"


locality: 4
0: PU L#1(P#2), Core L#1(P#4), Socket L#0(P#0), on pool "default"
1: PU L#2(P#4), Core L#2(P#1), Socket L#0(P#0), on pool "default"
2: PU L#3(P#6), Core L#3(P#3), Socket L#0(P#0), on pool "default"
3: PU L#4(P#8), Core L#4(P#2), Socket L#0(P#0), on pool "default"
4: PU L#5(P#10), Core L#5(P#12), Socket L#0(P#0), on pool "default"
5: PU L#6(P#12), Core L#6(P#8), Socket L#0(P#0), on pool "default"
6: PU L#7(P#14), Core L#7(P#11), Socket L#0(P#0), on pool "default"
7: PU L#8(P#16), Core L#8(P#9), Socket L#0(P#0), on pool "default"
8: PU L#9(P#18), Core L#9(P#10), Socket L#0(P#0), on pool "default"
9: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "lci-progress-pool"


locality: 6
0: PU L#1(P#2), Core L#1(P#4), Socket L#0(P#0), on pool "default"
1: PU L#2(P#4), Core L#2(P#1), Socket L#0(P#0), on pool "default"
2: PU L#3(P#6), Core L#3(P#3), Socket L#0(P#0), on pool "default"
3: PU L#4(P#8), Core L#4(P#2), Socket L#0(P#0), on pool "default"
4: PU L#5(P#10), Core L#5(P#12), Socket L#0(P#0), on pool "default"
5: PU L#6(P#12), Core L#6(P#8), Socket L#0(P#0), on pool "default"
6: PU L#7(P#14), Core L#7(P#11), Socket L#0(P#0), on pool "default"
7: PU L#8(P#16), Core L#8(P#9), Socket L#0(P#0), on pool "default"
8: PU L#9(P#18), Core L#9(P#10), Socket L#0(P#0), on pool "default"
9: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "lci-progress-pool"


locality: 3
0: PU L#1(P#2), Core L#1(P#4), Socket L#0(P#0), on pool "default"
1: PU L#2(P#4), Core L#2(P#1), Socket L#0(P#0), on pool "default"
2: PU L#3(P#6), Core L#3(P#3), Socket L#0(P#0), on pool "default"
3: PU L#4(P#8), Core L#4(P#2), Socket L#0(P#0), on pool "default"
4: PU L#5(P#10), Core L#5(P#12), Socket L#0(P#0), on pool "default"
5: PU L#6(P#12), Core L#6(P#8), Socket L#0(P#0), on pool "default"
6: PU L#7(P#14), Core L#7(P#11), Socket L#0(P#0), on pool "default"
7: PU L#8(P#16), Core L#8(P#9), Socket L#0(P#0), on pool "default"
8: PU L#9(P#18), Core L#9(P#10), Socket L#0(P#0), on pool "default"
9: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "lci-progress-pool"


locality: 1
0: PU L#1(P#2), Core L#1(P#4), Socket L#0(P#0), on pool "default"
1: PU L#2(P#4), Core L#2(P#1), Socket L#0(P#0), on pool "default"
2: PU L#3(P#6), Core L#3(P#3), Socket L#0(P#0), on pool "default"
3: PU L#4(P#8), Core L#4(P#2), Socket L#0(P#0), on pool "default"
4: PU L#5(P#10), Core L#5(P#12), Socket L#0(P#0), on pool "default"
5: PU L#6(P#12), Core L#6(P#8), Socket L#0(P#0), on pool "default"
6: PU L#7(P#14), Core L#7(P#11), Socket L#0(P#0), on pool "default"
7: PU L#8(P#16), Core L#8(P#9), Socket L#0(P#0), on pool "default"
8: PU L#9(P#18), Core L#9(P#10), Socket L#0(P#0), on pool "default"
9: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "lci-progress-pool"


locality: 2
0: PU L#1(P#2), Core L#1(P#4), Socket L#0(P#0), on pool "default"
1: PU L#2(P#4), Core L#2(P#1), Socket L#0(P#0), on pool "default"
2: PU L#3(P#6), Core L#3(P#3), Socket L#0(P#0), on pool "default"
3: PU L#4(P#8), Core L#4(P#2), Socket L#0(P#0), on pool "default"
4: PU L#5(P#10), Core L#5(P#12), Socket L#0(P#0), on pool "default"
5: PU L#6(P#12), Core L#6(P#8), Socket L#0(P#0), on pool "default"
6: PU L#7(P#14), Core L#7(P#11), Socket L#0(P#0), on pool "default"
7: PU L#8(P#16), Core L#8(P#9), Socket L#0(P#0), on pool "default"
8: PU L#9(P#18), Core L#9(P#10), Socket L#0(P#0), on pool "default"
9: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "lci-progress-pool"


locality: 7
0: PU L#1(P#2), Core L#1(P#4), Socket L#0(P#0), on pool "default"
1: PU L#2(P#4), Core L#2(P#1), Socket L#0(P#0), on pool "default"
2: PU L#3(P#6), Core L#3(P#3), Socket L#0(P#0), on pool "default"
3: PU L#4(P#8), Core L#4(P#2), Socket L#0(P#0), on pool "default"
4: PU L#5(P#10), Core L#5(P#12), Socket L#0(P#0), on pool "default"
5: PU L#6(P#12), Core L#6(P#8), Socket L#0(P#0), on pool "default"
6: PU L#7(P#14), Core L#7(P#11), Socket L#0(P#0), on pool "default"
7: PU L#8(P#16), Core L#8(P#9), Socket L#0(P#0), on pool "default"
8: PU L#9(P#18), Core L#9(P#10), Socket L#0(P#0), on pool "default"
9: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "lci-progress-pool"


locality: 0
0: PU L#1(P#2), Core L#1(P#4), Socket L#0(P#0), on pool "default"
1: PU L#2(P#4), Core L#2(P#1), Socket L#0(P#0), on pool "default"
2: PU L#3(P#6), Core L#3(P#3), Socket L#0(P#0), on pool "default"
3: PU L#4(P#8), Core L#4(P#2), Socket L#0(P#0), on pool "default"
4: PU L#5(P#10), Core L#5(P#12), Socket L#0(P#0), on pool "default"
5: PU L#6(P#12), Core L#6(P#8), Socket L#0(P#0), on pool "default"
6: PU L#7(P#14), Core L#7(P#11), Socket L#0(P#0), on pool "default"
7: PU L#8(P#16), Core L#8(P#9), Socket L#0(P#0), on pool "default"
8: PU L#9(P#18), Core L#9(P#10), Socket L#0(P#0), on pool "default"
9: PU L#0(P#0), Core L#0(P#0), Socket L#0(P#0), on pool "lci-progress-pool"
fibonacci_serial(10) == 55,elapsed time:,719,[s]
fibonacci_future(10) == 55,elapsed time:,454013811,[s],59
serial-count,{0000000100000000, 0000000000000000},1
serial-count,{0000000200000000, 0000000000000000},0
serial-count,{0000000300000000, 0000000000000000},0
serial-count,{0000000400000000, 0000000000000000},0
serial-count,{0000000500000000, 0000000000000000},0
serial-count,{0000000600000000, 0000000000000000},0
serial-count,{0000000700000000, 0000000000000000},0
serial-count,{0000000800000000, 0000000000000000},0