UltraStar-Deluxe / USDX

I'm not even sure which screens this applies to. Off the top of my head, at least song selection, but maybe also during singing.

Currently, pressing K toggles something somewhere, but there's no visual feedback whatsoever. I'm not proposing to add that visual feedback; I'd rather remove whatever K does altogether, because:

- @MairusuPawa, myself and many others have never heard it do anything useful. It just completely garbles the audio.
- It never gets toggled intentionally -- always accidentally, and then it's no fun figuring out which key was pressed.

Has anyone successfully used this feature at any point in the past?

I've looked into the code; this is the bit that appears to be the 'implementation' of automatic voice removal:

USDX/src/base/UMusic.pas

Lines 1058 to 1081 in 82d41ac

    
    procedure TVoiceRemoval.Callback(Buffer: PByteArray; BufSize: integer);
    var
      FrameIndex, FrameSize: integer;
      Value: integer;
      Sample: PPCMStereoSample;
    begin
      FrameSize := 2 * SizeOf(SmallInt);
      for FrameIndex := 0 to (BufSize div FrameSize)-1 do
      begin
        Sample := PPCMStereoSample(Buffer);
        // channel difference
        Value := Sample[0] - Sample[1];
        // clip
        if (Value > High(SmallInt)) then
          Value := High(SmallInt)
        else if (Value < Low(SmallInt)) then
          Value := Low(SmallInt);
        // assign result
        Sample[0] := Value;
        Sample[1] := Value;
        // increase to next frame
        Inc(PByte(Buffer), FrameSize);
      end;
    end;

I'm not too well read on this kind of stuff, but am I correct in saying that this is entirely based on channel difference, which in turn means it will only work if one channel is the normal track and the other is only the vocals? And not even vice versa?
And that the result will always be mono?

If the answer to all of these questions is "yes", then I would like to go ahead and delete it. Is this some kind of special but widely used audio format for mono karaoke cabinets or something? I assume we have more people who would just have a duplicate txt for the stereo instrumental?

AFAIK it uses the difference between the left and the right channel. Thus the result will always be mono and it will not work if you don't have a stereo signal to begin with.

»If you subtract the channels (which can be accomplished by inverting one signal and then adding them together), the vocal signals are cancelled. This procedure often has the most profound effect on the lead singer and not the background vocals or instruments.« (found this by googling).
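To make the cancellation concrete, here is a minimal sketch of the same per-frame arithmetic as `TVoiceRemoval.Callback`. It is not the USDX code: it is written in Python only so it runs standalone, and the function name `remove_center` and the sample values are made up for illustration. It assumes 16-bit stereo frames given as `(left, right)` pairs.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def remove_center(frames):
    """Replace each (left, right) frame with the clipped channel difference.

    Hypothetical helper mirroring the arithmetic of the Pascal callback:
    anything identical in both channels cancels, and both output channels
    carry the same value, so the result is mono.
    """
    out = []
    for left, right in frames:
        diff = max(INT16_MIN, min(INT16_MAX, left - right))
        out.append((diff, diff))
    return out


# Made-up test signal: the vocal is centre-panned (same in both channels),
# the guitar is only in the left channel.
vocal = [1000, -2000, 3000]
guitar = [500, 400, -300]
stereo = [(v + g, v) for v, g in zip(vocal, guitar)]

print(remove_center(stereo))  # [(500, 500), (400, 400), (-300, -300)]

# A mono file played as stereo has identical channels, so everything cancels.
mono = [(v, v) for v in vocal]
print(remove_center(mono))  # [(0, 0), (0, 0), (0, 0)]
```

In this toy case only the left-only guitar survives. The same arithmetic also removes any instrument that is mixed dead centre, and leaves behind whatever part of the vocal differs between the channels (stereo reverb, for example), which would be consistent with the garbled results described above.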

I only tried it when it was added and was never satisfied with the result, so I haven't really used the feature, even though I think it was a nice idea to try (it is better than nothing if you prefer karaoke over sing-along for some songs).

With the rise of AI-based audio separation, I think it is safe to say that there are superior ways to implement this (one of them potentially being the Syncer additionally creating an instrumental version if wanted; I have already looked into this a bit).

This is how karaoke machines (for instance, what you'd find in Japanese parlors in the 90s) used to work indeed.

👍 for removing that code completely.

I did some looking into this. Apparently there's also a whole underlying bit with functions like AddSoundEffect, which from what I can see would all end up being unused. Would it be okay to completely remove that kind of infrastructure (are we ever going to do anything with it?), or would it be better to leave that in and only remove the bits specifically related to the voice removal?

Delete 'K' karaoke mode