PLOT: v1+ (and if possible for older versions too, mainly 0.4+) avg-mteps

jowens opened this issue 4 years ago

avg-mteps
Not interested in any parameters used here, just plain ol' mteps and runtime for every dataset for every primitive. Tooltips are appreciated.
Note for pagerank, the time would be divided by max-iter.

Muhammad Osama commented 4 years ago

Yes!

John Owens · Answer 1 · Sat May 09 2020 08:40:04 GMT+0800 (China Standard Time)

Same thing as #46, but MTEPS. Please tell me how to fix or tell me to push to the docs @neoblizz.

Note the JSON has no MTEPS for pr or tc; we have no data there.

[Plots: SSSP, BC]

Muhammad Osama · Answer 2 · Sat May 09 2020 13:41:58 GMT+0800 (China Standard Time)

Can you use the following for pagerank mteps? I will fix it in code:

m_teps = (double)num_edges / (elapsed * 1000.0);
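
For the plotting side, the same formula can be sketched in Python. This is a minimal sketch, not the Gunrock code: it assumes `elapsed` is in milliseconds (which is what the `1000.0` factor implies), and the function and parameter names are made up for illustration.

```python
def mteps(num_edges: int, elapsed_ms: float) -> float:
    """Millions of traversed edges per second, with elapsed time in milliseconds.

    Assumption: elapsed_ms is a runtime in milliseconds, as the 1000.0
    factor in the C expression suggests.
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    # edges / (ms / 1000) is edges per second; dividing by 1e6 gives MTEPS.
    # The two factors combine into edges / (ms * 1000).
    return num_edges / (elapsed_ms * 1000.0)
```

For example, `mteps(2_000_000, 1000.0)` is 2 million edges in one second, so 2.0 MTEPS.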

Muhammad Osama · Answer 3 · Sat May 09 2020 13:43:28 GMT+0800 (China Standard Time)

Also, a few things:

Does this also include Chuck's changes? So, best performance INCLUDING v1.1.x results.
Is there a way to show this as a really long bar-chart of some sort? Or have horizontal-spacing between the GPU markers so we can see what the difference is easily?

John Owens · Answer 4 · Sat May 09 2020 13:45:09 GMT+0800 (China Standard Time)

This is

roots = [
    "../gunrock-output/v1-0-0/sssp",
    "../gunrock-output/v1-0-0/bc",
    "../gunrock-output/v1-0-0/tc",
    "../gunrock-output/v1-0-0/pr",
    "../gunrock-output/v1-0-0/bfs",
]

only. If you want another directory (@crozhon's) lemme know.
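
For anyone following along, here is a rough sketch of how a `roots` list like this could be turned into a list of JSON files to plot. It is not the actual plotting script: the helper name `collect_json` is hypothetical, and it assumes each run is stored as a `.json` file somewhere under its root directory.

```python
from pathlib import Path


def collect_json(roots):
    """Return every .json file found under the given root directories.

    Assumption: one JSON file per run, possibly in subdirectories.
    Roots that do not exist are skipped rather than treated as errors.
    """
    files = []
    for root in roots:
        root_path = Path(root)
        if not root_path.is_dir():
            continue
        # Sort within each root so the result order is stable across runs.
        files.extend(sorted(root_path.rglob("*.json")))
    return files
```

Adding another output directory then only means appending its path to `roots`.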

John Owens · Answer 5 · Sat May 09 2020 13:45:42 GMT+0800 (China Standard Time)

I can fix PR MTEPS.

John Owens · Answer 6 · Sat May 09 2020 13:46:22 GMT+0800 (China Standard Time)

We could do a bar chart but it would be really really wide. I'll think about that!

Muhammad Osama · Answer 7 · Sat May 09 2020 13:47:39 GMT+0800 (China Standard Time)

> We could do a bar chart but it would be really really wide. I'll think about that!

I figured, but if there's a better way to add some separation to datasets and have mini plots in that giant plot. Anything that shows the difference between the GPUs is nice.

Muhammad Osama · Answer 8 · Sat May 09 2020 13:49:15 GMT+0800 (China Standard Time)

> This is
>
> roots = [
>     "../gunrock-output/v1-0-0/sssp",
>     "../gunrock-output/v1-0-0/bc",
>     "../gunrock-output/v1-0-0/tc",
>     "../gunrock-output/v1-0-0/pr",
>     "../gunrock-output/v1-0-0/bfs",
> ]
>
> only. If you want another directory (@crozhon's) lemme know.

Please include these:

gunrock-output/launch_bounds_comparison/*
gunrock-output/cuda_arch_comparison/*

John Owens · Answer 9 · Sat May 09 2020 13:50:34 GMT+0800 (China Standard Time)

OK. I'll do that for all of them. But I'm pretty sure it's not a complete set of runs in the way that the v1-0-0 ones are.

Chuck Rozhon · Answer 10 · Sat May 09 2020 13:55:01 GMT+0800 (China Standard Time)

Yes, they are not complete sets. But everything in 'cuda_arch_comparison' is in the dev branch.

Muhammad Osama · Answer 11 · Sat May 09 2020 20:56:09 GMT+0800 (China Standard Time)

That's fine, we can eventually get to a complete set after all your changes are pushed @crozhon. I would just like to see the entire v1.x.x runs anyways.

John Owens · Answer 12 · Sun May 10 2020 01:34:15 GMT+0800 (China Standard Time)

@neoblizz My current recipe is "copy edges-queued into edges-visited, compute m-teps, then normalize runtime by the number of iterations". I assume edges-queued is global (across entire computation), not per iteration, so I need to do the m-teps compute before I normalize runtime. Tell me yea or nay.
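
Spelled out as code, the recipe looks roughly like the sketch below. The keys `edges-queued`, `edges-visited`, and `m-teps` are the names used in this thread; `elapsed` (assumed to be milliseconds) and `max-iter` as the iteration count are assumptions about the JSON layout, and the function name is made up.

```python
def fix_pagerank_record(record: dict) -> dict:
    """Return a copy of a PR record with MTEPS filled in and runtime per iteration.

    Assumptions: record["elapsed"] is the total runtime in milliseconds and
    record["max-iter"] is the number of iterations; both key names are guesses.
    """
    fixed = dict(record)
    # Step 1: copy the global edges-queued total into edges-visited.
    fixed["edges-visited"] = fixed["edges-queued"]
    # Step 2: compute MTEPS from the total runtime, before normalizing it,
    # because edges-queued covers the whole computation.
    fixed["m-teps"] = fixed["edges-visited"] / (fixed["elapsed"] * 1000.0)
    # Step 3: only now divide the runtime by the number of iterations.
    fixed["elapsed"] = fixed["elapsed"] / fixed["max-iter"]
    return fixed
```

Swapping steps 2 and 3 would divide a whole-run edge count by a per-iteration runtime and inflate MTEPS by a factor of the iteration count.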

John Owens · Answer 13 · Sun May 10 2020 05:57:49 GMT+0800 (China Standard Time)

So I am hoping this is good enough and I don't have to turn this into a bar. We now only have one dot per GPU. It is feasible to jitter these in x but it's more involved in terms of the programming.
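
If we do end up spreading the dots in x, one simple option is a fixed per-GPU offset rather than random jitter. The sketch below is only an illustration of that idea, independent of whatever plotting library is in use; the function name, the `width` parameter, and the GPU names in the example are all hypothetical.

```python
def jitter_offsets(gpus, width=0.6):
    """Map each GPU name to a fixed x offset in [-width/2, +width/2].

    This is a deterministic spread (sometimes called "dodge"), not random
    jitter: the same GPU always gets the same offset, so its dots line up
    across datasets.
    """
    names = sorted(set(gpus))
    if len(names) == 1:
        # A single GPU stays centered on the dataset's x position.
        return {names[0]: 0.0}
    # Evenly space the GPUs across the available width.
    step = width / (len(names) - 1)
    return {name: -width / 2 + i * step for i, name in enumerate(names)}
```

Each point would then be drawn at its dataset's x position plus the offset for its GPU, e.g. `jitter_offsets(["V100", "K40", "P100"], width=1.0)` spaces three GPUs at -0.5, 0.0, and 0.5.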

[Plots: PR, BC, SSSP]

Muhammad Osama · Answer 14 · Sun May 10 2020 08:02:02 GMT+0800 (China Standard Time)

> @neoblizz My current recipe is "copy edges-queued into edges-visited, compute m-teps, then normalize runtime by the number of iterations". I assume edges-queued is global (across entire computation), not per iteration, so I need to do the m-teps compute before I normalize runtime. Tell me yea or nay.

That's correct.

This looks good. Just missing BFS now.

John Owens · Answer 15 · Sun May 10 2020 08:49:25 GMT+0800 (China Standard Time)

Oh yeah forgot (BFS).

John Owens · Answer 16 · Sun May 10 2020 14:23:03 GMT+0800 (China Standard Time)

[Plots: DOBFS, BFS]

Muhammad Osama · Answer 17 · Sun May 10 2020 14:32:39 GMT+0800 (China Standard Time)

Also can go live! 🎉

John Owens · Answer 18 · Sun Jun 07 2020 06:57:59 GMT+0800 (China Standard Time)

Are you cool with what we have now?