thygate / stable-diffusion-webui-depthmap-script

High Resolution Depth Maps for Stable Diffusion WebUI


[Feature Request] Port stereoscope functionality to ComfyUI

Dobidop opened this issue

I have been trying to port the stereoscope functionality for way too many hours now, since I don't have a single clue what I'm doing.

I tried importing the functions from stereoimage_generation.py into a ComfyUI node and feeding in the depth map and the corresponding image. I made some progress, but I'm most likely on completely the wrong path:
[image: screenshot of the current (incorrect) output]

As such, I'd like to request that the stereoscope functionality be ported to a node for ComfyUI, since I simply cannot do this myself.

Hi! It's great that you tried! I'm no expert in ComfyUI, but I hope it all works out and you achieve your goal :)

Copying functions is indeed the correct approach, but please make sure to include a comment linking the code to this repo and commit, so the connection is not lost over time and any changes to this repo can be backported.

Looking at the image, I suspect the inputs are not quite correct. You could try debugging these functions to see what format the inputs are in, and then ensure the format is the same on the ComfyUI side.
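For what it's worth, a plausible mismatch (the shapes and value ranges below are my assumption about ComfyUI's IMAGE type, not something verified against this script) is that ComfyUI passes images as float32 tensors of shape [B, H, W, C] with values in 0..1, while PIL/numpy-based code usually expects uint8 HxWxC arrays. A minimal conversion sketch:

```python
import numpy as np
import torch

def comfy_image_to_uint8(image: torch.Tensor) -> np.ndarray:
    """ComfyUI IMAGE tensors (assumed): float32, [B, H, W, C], values in 0..1.
    PIL/numpy code typically wants a uint8 HxWxC array instead."""
    arr = image[0].cpu().numpy()  # drop the batch dimension
    return (np.clip(arr, 0.0, 1.0) * 255.0).round().astype(np.uint8)

def comfy_depth_to_2d(depth: torch.Tensor) -> np.ndarray:
    """Collapse a depth map passed as an IMAGE down to a single 2-D channel."""
    return comfy_image_to_uint8(depth)[:, :, 0]  # one channel is enough
```

Both helper names are hypothetical; the point is only that shape, dtype, and value range all have to match what the copied functions expect.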

Speaking of backporting... The stereoimage generation implementation in this repo is not entirely correct; I will probably fix it at some point. It would be nice to carry over those changes when that happens.

Good luck!

Thanks, but I've tried debugging this. I'm simply not smart enough, and I don't have the knowledge to fix it.

Just curious, what is incorrect with the stereo implementation in the repo?

It's so annoying, because it feels like I'm just lacking the knowledge and it would be fixed by two lines of correct code: one when passing the image and depth map to the create-stereo function, and one when converting the result back to the correct format for the node.

To save someone a minute of coding when creating the ComfyUI node, here is what I used for GenerateStereo.py to produce the image above:

import torch
import numpy as np 
from PIL import Image
import stereoimage_generation as sig

class StereoImageNode:
    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "image": ("IMAGE",),
                "depth_map": ("IMAGE",),
                "modes": (["left-right", "right-left", "top-bottom", "bottom-top", "red-cyan-anaglyph"],),
                "fill_technique": (['none', 'naive', 'naive_interpolating', 'polylines_soft','polylines_sharp'],),
            },
            "optional": {
                "divergence": ("FLOAT", {"default": 2.5, "min": 0.05, "max": 15, "step": 0.01}), 
                "separation": ("FLOAT", {"default": 0, "min": -5, "max": 5, "step": 0.01}),
                "stereo_balance": ("FLOAT", {"default": 0, "min": -0.95, "max": 0.95, "step": 0.05}),
                "stereo_offset_exponent": ("FLOAT", {"default": 2, "min": 2, "max": 2, "step": 1})
            }
        }
    
    RETURN_TYPES = ("IMAGE",) 
    FUNCTION = "generate"

    def generate(self, image, depth_map, divergence, separation, modes, 
                 stereo_balance, stereo_offset_exponent, fill_technique):
                 
        print(f"000Check depth map shape: {depth_map.shape}")

        depth_map_2d = depth_map[:, :, :, 0]  # Keep this line
        
        print(f"00Check original image shape: {image.shape}")
        print(f"00Check depth map shape: {depth_map_2d.shape}")
        
        results = sig.create_stereoimages(image, depth_map_2d, divergence, separation,  
                                           [modes], stereo_balance, stereo_offset_exponent, fill_technique)
        

        # Check the type of the results object itself
        print(f"0Type of results: {type(results)}")

        # If results is a PyTorch tensor, print its shape
        if isinstance(results, torch.Tensor):
            print(f"1Shape of results as PyTorch tensor: {results.shape}")
        else:
            print("1results is not a PyTorch tensor.")

        #results_with_batch = [np.expand_dims(r, axis=0) for r in results]

        # If results is a numpy array or a PyTorch tensor, print its shape
        if isinstance(results, np.ndarray):
            print(f"3Shape of results as numpy array: {results.shape}")
        elif 'torch' in str(type(results)):  # more general check for a PyTorch tensor
            print(f"3Shape of results as PyTorch tensor: {results.shape}")
        else:
            print("3results is not a numpy array or PyTorch tensor.")

        # Convert each result to a uint8 tensor
        results_tensor = [torch.from_numpy(r.astype(np.uint8)) for r in results]


        return (results_tensor,)

        
NODE_CLASS_MAPPINGS = {
    "StereoImageNode": StereoImageNode
}

NODE_DISPLAY_NAME_MAPPINGS = {
    "StereoImageNode": "Stereo Image Node"
}

And this is the modification to stereoimage_generation.py (which really shouldn't be needed if I had provided it with the correct format):

            raise Exception('Unknown mode')
            
    #print(results)
    # If results[0] is a numpy array or a PyTorch tensor, print its shape
    if isinstance(results[0], np.ndarray):
        print(f"2Shape of results[0] as numpy array: {results[0].shape}")
    elif 'torch' in str(type(results[0])):  # Checking for a PyTorch tensor in a more general way
        print(f"2Shape of results[0] as PyTorch tensor: {results[0].shape}")
    else:
        print("2results[0] is not a numpy array or PyTorch tensor.")
        
    #results = [np.asarray(r, dtype=np.uint8) for r in results]
    return results



def apply_stereo_divergence(original_image, depth, divergence, separation, stereo_offset_exponent, fill_technique):
    # # Explicitly remove the batch dimension and ensure numpy array format
    original_image = np.squeeze(original_image)

    depth = np.squeeze(depth)
        
    print(f"Data type of original_image: {original_image.dtype}")
    print(f"Data type of depth: {depth.dtype}")
    print(f"Post-processing pixel range: min={original_image.min()}, max={original_image.max()}")

    print(f"Final check original image shape: {original_image.shape}")
    print(f"Final check depth map shape: {depth.shape}")
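On the return side, assuming the copied create_stereoimages hands back PIL images (or uint8 numpy arrays) — an assumption I haven't verified against this repo — a sketch of the reverse conversion to a ComfyUI IMAGE tensor might look like this (to_comfy_image is a hypothetical helper name):

```python
import numpy as np
import torch
from PIL import Image

def to_comfy_image(result) -> torch.Tensor:
    """Convert a PIL image or uint8 HxWxC numpy array back into what ComfyUI
    expects (assumed): float32, shape [1, H, W, C], values in 0..1."""
    if isinstance(result, Image.Image):
        result = np.asarray(result.convert("RGB"))  # PIL -> uint8 HxWxC
    arr = result.astype(np.float32) / 255.0         # rescale to 0..1
    return torch.from_numpy(arr).unsqueeze(0)       # add back the batch dim
```

If this assumption holds, the node's generate() would return a tuple of such tensors instead of a list of raw uint8 tensors.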

The most annoying bugs are usually like this...
Sometimes we win, but sometimes we lose; such is life :)
Thank you for posting the code here; it might be useful if somebody attempts this one more time.
Actually, it would be cool to have all the features ported, and maybe even make this extension multi-platform, that is, working on both A1111 and ComfyUI. But that's more work, of course.