YaleDHLab / pix-plot

A WebGL viewer for UMAP or TSNE-clustered images

Home Page: https://s3-us-west-2.amazonaws.com/lab-apps/pix-plot/index.html

How best to visually compare models?

dale-wahl opened this issue

I love both the pipeline and visualization. I'm trying to test out different parameters to the UMAP model and compare results, but am having some difficulties. The n_neighbors parameter returns an error:

(pix_env) re-byodm-145-109-99-47:pix-plot dale$ pixplot --images "media/*.jpg" --n_neighbors 30

...

2021-07-06 15:26:02.274952: Vectorized 3000/3000 images
2021-07-06 15:26:02.760305: Creating UMAP layout
Traceback (most recent call last):
  File "/Users/dale/Anaconda/anaconda3/envs/pix_env/bin/pixplot", line 8, in <module>
    sys.exit(parse())
  File "/Users/dale/Anaconda/anaconda3/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 1430, in parse
    process_images(**config)
  File "/Users/dale/Anaconda/anaconda3/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 133, in process_images
    get_manifest(**kwargs)
  File "/Users/dale/Anaconda/anaconda3/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 403, in get_manifest
    layouts = get_layouts(**kwargs)
  File "/Users/dale/Anaconda/anaconda3/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 533, in get_layouts
    umap = get_umap_layout(**kwargs)
  File "/Users/dale/Anaconda/anaconda3/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 617, in get_umap_layout
    relations=[relations_dict.copy() for _ in params[:-1]]
  File "/Users/dale/Anaconda/anaconda3/envs/pix_env/lib/python3.7/site-packages/umap/aligned_umap.py", line 318, in fit
    relations = expand_relations(self.dict_relations_, window_size)
  File "/Users/dale/Anaconda/anaconda3/envs/pix_env/lib/python3.7/site-packages/umap/aligned_umap.py", line 47, in expand_relations
    + [max(d.values()) for d in relation_dicts]
ValueError: max() arg is an empty sequence

The min_dist parameter seems to work (though the readme lists it as min_distance).

I got around this problem by editing the values in the config starting at line 90, and I can see that multiple models are created in the data folder, but I am only able to visualize one of them. It appears to be the first one (looking at the code, it refers to the first variant in a couple of places: layouts['umap']['variants'][0]). What do I need to update in order to visualize the models that were already created? I cannot simply switch the variant (layouts['umap']['variants'][1]) without rerunning the full program, which takes a while and appears to rebuild the models.

I am thinking that what I want to do is generate a different manifest.json file for each model so that I can swap them out, but I am wondering why we are creating multiple models if I cannot visualize them. I did notice that some of your examples using the PixPlot 2.0 codebase have a dropdown enabling a user to switch models. Am I missing something? Is 2.0 available somewhere?

Thank you all again for this super helpful tool.

Thanks for your kind words @dale-wahl, and sorry for the snag! Previously the pipeline assumed there were multiple UMAP layouts to construct and so used AlignedUMAP. We just changed it so that a single UMAP layout (e.g. with --n_neighbors 30 and --min_dist 0.1) should now work.
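One plausible reading of the first traceback, offered as a sketch rather than a confirmed diagnosis: with a single parameter combination, the list comprehension over params[:-1] yields no relation dicts, so the max() call inside expand_relations receives an empty list. The snippet below reproduces that pattern in plain Python. The values of params and relations_dict are stand-ins, and largest_index is a hypothetical simplification of the failing expression, not a copy of the library's code.

```python
# Stand-in for a single (n_neighbors, min_dist) combination.
params = [(30, 0.1)]

# Toy identity mapping between consecutive layouts.
relations_dict = {i: i for i in range(5)}

# Same shape as the call in the traceback: one relation per pair of layouts.
relations = [relations_dict.copy() for _ in params[:-1]]


def largest_index(relation_dicts):
    """Hypothetical simplification of the expression that failed."""
    return max(
        [max(d.keys()) for d in relation_dicts]
        + [max(d.values()) for d in relation_dicts]
    )


def describe(relation_dicts):
    """Return the largest index, or the error message if max() fails."""
    try:
        return largest_index(relation_dicts)
    except ValueError as err:
        return str(err)


single = describe(relations)         # one combination, so no relations
double = describe([relations_dict])  # two combinations, so one relation
print(single)
print(double)
```

With one combination the relations list is empty and max() raises ValueError; with two or more there is at least one dict to inspect.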

If you have a moment, could you please try updating your pixplot installation by running:

pip uninstall pixplot
pip install https://github.com/yaledhlab/pix-plot/archive/master.zip

If you then try your command again, it'd be great to hear how it goes!

Thank you, duhaime. That allows me to use my own inputs for n_neighbors and min_dist much more easily!

I see that you changed the default parameters so that only one layout will be created. I actually would like to create multiple layouts and compare them to identify which parameters worked "best". What do you think the easiest way to swap out layouts would be?

So the good news is, you can pass an array to either or both of the hyperparameters. This might look like:

pixplot \
  --images "datasets/oslomini/images/*" \
  --metadata "datasets/oslomini/metadata/metadata.csv" \
  --n_neighbors 2 50 \
  --min_dist 0.001 1.0

When one or both of the hyperparameters is given more than one value, the code will automagically trigger Aligned UMAP (to minimize the difference between the projections), and additionally the GUI will expose one or two sliders so you can animate between the states.
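To get a feel for how many layouts a given command requests, here is a minimal sketch. It assumes the values are combined as a Cartesian product, which is a guess based on the "one or two sliders" description and not something confirmed from PixPlot's code. The helper names layout_grid and needs_alignment are hypothetical.

```python
from itertools import product


def layout_grid(n_neighbors, min_dist):
    """Return every (n_neighbors, min_dist) pair (assumed Cartesian product)."""
    return list(product(n_neighbors, min_dist))


def needs_alignment(grid):
    """More than one combination means the layouts have to be aligned."""
    return len(grid) > 1


# Values taken from the example command above.
grid = layout_grid([2, 50], [0.001, 1.0])
for n, d in grid:
    print(f"n_neighbors={n} min_dist={d}")
print("aligned:", needs_alignment(grid))
```

Under that assumption the layout count is the product of the two list lengths, which is worth checking before launching a long run.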

My one tip would be to consider the extra time required to run Aligned UMAP multiple times. I ran a pixplot of 31,000 images with 3 values for n_neighbors, and it took about 4 hours to run on a ThreadRipper Pro 3975WX (32 cores). The vast majority of this time was Aligned UMAP, which is luckily multi-threaded. Here's a link to that demonstration -- click the gear icon at the top right to get the advanced area where you can adjust the n_neighbors: http://pixplot.yale.edu/builds/0109/oslo/

Oh, I love that build! I need to get my metadata properly organized; the category feature is very nice and useful.

I do not have access to so many cores, so will tread carefully.

I also noticed it was defaulting to kmeans, and it appears that hdbscan wasn't in the setup.py file. I've installed it now and am interested in how that affects the hotspots.

@dale-wahl One of our plans this summer is to update the PixPlot documentation to make features like the clustering method options more clear to users. There was a period of time when installing HDBScan was tricky on Windows, so we started defaulting to K-means...
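A fallback of this kind can be sketched in a few lines. This is not PixPlot's actual selection logic; pick_clustering_method is a hypothetical helper that only checks whether the preferred package can be imported.

```python
import importlib.util


def pick_clustering_method(preferred="hdbscan", fallback="kmeans"):
    """Return `preferred` if its package is importable, else `fallback`."""
    if importlib.util.find_spec(preferred) is not None:
        return preferred
    return fallback


method = pick_clustering_method()
print(f"clustering with: {method}")
```

Printing the chosen method makes a silent fallback visible, which would have flagged the missing hdbscan install earlier.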

Is it safe to close this issue out?

I have not done too much experimenting yet, but so far have not gotten it to run with multiple parameters. I hit this:

2021-07-09 18:53:58.139354: Vectorized 57366/57366 images
2021-07-09 18:54:13.654045: Creating UMAP layout
Traceback (most recent call last):
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/numba/core/caching.py", line 487, in save
    data_name = overloads[key]
KeyError: ((array(int64, 2d, C), array(float32, 2d, C), array(float32, 2d, C), type(CPUDispatcher(<function squared_euclidean at 0x7fe2363919e0>)), array(int64, 1d, C), float64), ('x86_64-unknown-linux-gnu', 'sandybridge', '+64bit,-adx,+aes,-avx,-avx2,-avx512bf16,-avx512bitalg,-avx512bw,-avx512cd,-avx512dq,-avx512er,-avx512f,-avx512ifma,-avx512pf,-avx512vbmi,-avx512vbmi2,-avx512vl,-avx512vnni,-avx512vpopcntdq,-bmi,-bmi2,-cldemote,-clflushopt,-clwb,-clzero,+cmov,+cx16,+cx8,-enqcmd,-f16c,-fma,-fma4,-fsgsbase,+fxsr,-gfni,-invpcid,-lwp,-lzcnt,+mmx,-movbe,-movdir64b,-movdiri,-mwaitx,+pclmul,-pconfig,-pku,+popcnt,-prefetchwt1,-prfchw,-ptwrite,-rdpid,-rdrnd,-rdseed,-rtm,+sahf,-sgx,-sha,-shstk,+sse,+sse2,+sse3,+sse4.1,+sse4.2,-sse4a,+ssse3,-tbm,-vaes,-vpclmulqdq,-waitpkg,-wbnoinvd,-xop,+xsave,-xsavec,+xsaveopt,-xsaves'), ('1566b624ec4710ff21277d163ff1d5d780943ac8328dcb5962340daf38986bdc', 'e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/dale/.conda/envs/pix_env/bin/pixplot", line 8, in <module>
    sys.exit(parse())
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 1467, in parse
    process_images(**config)
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 133, in process_images
    get_manifest(**kwargs)
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 403, in get_manifest
    layouts = get_layouts(**kwargs)
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 533, in get_layouts
    umap = get_umap_layout(**kwargs)
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 581, in get_umap_layout
    return process_multi_layout_umap(w, **kwargs)
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 658, in process_multi_layout_umap
    save_model(model, model_path)
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/pixplot/pixplot.py", line 684, in save_model
    pickle.dump(all_params, open(path, 'wb'))
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/pynndescent/pynndescent_.py", line 902, in __getstate__
    self._init_search_graph()
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/pynndescent/pynndescent_.py", line 1001, in _init_search_graph
    self.diversify_prob,
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/numba/core/dispatcher.py", line 434, in _compile_for_args
    raise e
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/numba/core/dispatcher.py", line 367, in _compile_for_args
    return self.compile(tuple(argtypes))
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/numba/core/compiler_lock.py", line 32, in _acquire_compile_lock
    return func(*args, **kwargs)  
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/numba/core/dispatcher.py", line 825, in compile
    self._cache.save_overload(sig, cres)
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/numba/core/caching.py", line 671, in save_overload
    self._save_overload(sig, data)
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/numba/core/caching.py", line 681, in _save_overload
    self._cache_file.save(key, data)
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/numba/core/caching.py", line 496, in save
    self._save_index(overloads)   
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/numba/core/caching.py", line 542, in _save_index
    data = self._dump(data)
  File "/home/dale/.conda/envs/pix_env/lib/python3.7/site-packages/numba/core/caching.py", line 570, in _dump
    return pickle.dumps(obj, protocol=-1)
TypeError: can't pickle weakref objects

I will mess with it some more next week. I just wanted to mention it since you asked whether the issue could be closed, and technically I'm still having trouble getting multiple models in the same run.

Ah, interesting @dale-wahl, we haven't seen that yet. It looks like your machine won't load cached UMAP models. We'll get this sorted.

In the interim, if you wanted, you could rm -rf output and then run your command again--it should proceed just fine (if not please let us know)!

Hello, I also hit the same problem with one parameter. Have you resolved it? @dale-wahl

@JanySunny @dale-wahl I just pushed a commit that will attempt to handle the case where model persistence fails. If you have a second, could I please ask you to:

pip uninstall pixplot
pip install https://github.com/yaledhlab/pix-plot/archive/master.zip

then try your command again? We'd be grateful if you could follow up and let us know whether that commit allows you to progress!
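For readers curious what "handling the case where model persistence fails" can look like, here is a minimal sketch. It is not the actual commit: this save_model is a hypothetical stand-in, and the weakref-holding dict only imitates the kind of object that triggered the TypeError in the traceback.

```python
import os
import pickle
import tempfile
import weakref


def save_model(model, path):
    """Try to pickle `model` to `path`; return True on success, False otherwise."""
    try:
        with open(path, "wb") as handle:
            pickle.dump(model, handle)
        return True
    except (TypeError, AttributeError, pickle.PicklingError) as err:
        # Remove the partial file so a later run does not try to load it.
        if os.path.exists(path):
            os.remove(path)
        print(f"Could not persist model ({err}); continuing without a cache")
        return False


class Anchor:
    """Toy object that can be weakly referenced."""


anchor = Anchor()
unpicklable = {"ref": weakref.ref(anchor)}  # weak references cannot be pickled

with tempfile.TemporaryDirectory() as tmp:
    ok = save_model({"n_neighbors": 30}, os.path.join(tmp, "good.pkl"))
    failed = save_model(unpicklable, os.path.join(tmp, "bad.pkl"))
    leftover = os.path.exists(os.path.join(tmp, "bad.pkl"))
```

The trade-off in this sketch is that a failed save costs a recomputation on the next run instead of aborting the current one.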

Hey @duhaime, sorry for the delay. I just tested out the multiple parameters for n_neighbors and min_dist and that worked like a charm! Thanks!

I noticed that you removed both the map and metatags views. We had some students use PixPlot over the last couple of weeks and found the category and metatags to be very useful in visualizing their data. I'll have to figure out how to get that back if I want to use this update. I did want to mention that they seem to be named in reverse in the UI (the categories are associated with the tag column of the metadata csv and the metadata view was associated with the category column). It also appears that images do not load when selecting them individually in this latest update.

I built a docker wrapper that can run pixplot so that I could easily deploy it. I also added Flask so that I could add an API and send commands to pixplot easily while keeping a simple viewer to see the results. I'd be happy to push that as a separate branch if you are interested. Regardless, this issue has been resolved so thank you very much!
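As an illustration of how an API layer might hand parameters to the command line, here is a minimal sketch. It is not taken from the Docker + Flask wrapper: build_pixplot_command and the request dict are hypothetical, and only the flags that appear earlier in this thread are used.

```python
import shlex


def build_pixplot_command(images, n_neighbors=None, min_dist=None, metadata=None):
    """Assemble an argv list for the pixplot CLI from request-style parameters."""
    command = ["pixplot", "--images", images]
    if metadata:
        command += ["--metadata", metadata]
    if n_neighbors:
        command += ["--n_neighbors", *map(str, n_neighbors)]
    if min_dist:
        command += ["--min_dist", *map(str, min_dist)]
    return command


# Hypothetical request payload, e.g. decoded from a JSON body.
request = {"images": "media/*.jpg", "n_neighbors": [2, 50], "min_dist": [0.001, 1.0]}
command = build_pixplot_command(**request)

# Shell-quoted rendering, useful for logging what would be run.
print(" ".join(shlex.quote(part) for part in command))
```

Passing the list to subprocess.run without a shell keeps the glob pattern literal, which matches the quoted --images usage shown in the commands above.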

Thanks for your follow-up @dale-wahl! I'm glad this issue seems resolved.

I just pushed a branch #219 that should allow one to display geographic and other conditional layouts again.

I also pushed another branch #220 that should display the tags, and another to fix the min_dist argument in the readme. Thanks for spotting all of these oversights!

It'd be great to see your Docker + Flask branch. We've been keen on trying to keep the application serverless for ease of deployment, but for certain circumstances a server can be helpful indeed.

Please send a flare if you spot other oversights in the application or documentation! Your notes have been very helpful.

Thanks duhaime. I merged those into my Docker branch and all seems to work! The photos are even on top of the map this time haha.

If you want to check that out, I had to post it to my own repo: https://github.com/digitalmethodsinitiative/dmi_pix_plot
The Docker_README.md should help you run it. The API ought to be expanded to allow someone to easily handle a folder of photos, but at the moment I don't think that will matter for my purposes since my photos will already be accessible in the container.

Thanks very much @dale-wahl! We're planning a big documentation overhaul this summer, and if it'd be okay, it'd be great to include some notes on your Docker container as part of that work.

In the meantime, is it safe to close this issue out?