ufal / whisper_streaming

Whisper realtime streaming for long speech-to-text transcription and translation


Client sample in Python

donaldos opened this issue

I want to develop a client module in Python, so I wrote the sample code below.

```python
import socket

def send_file_over_socket(file_path, host, port, chunk_size):
    try:
        with open(file_path, 'rb') as file:
            # The with-block closes the socket even if sending fails
            with socket.create_connection((host, port)) as client_socket:
                while True:
                    data = file.read(chunk_size)
                    if not data:
                        break
                    # sendall() keeps sending until the whole chunk is written;
                    # send() may transmit only part of it
                    client_socket.sendall(data)
        print("sending complete")
    except Exception as e:
        print(f"Error on sending data: {e}")

if __name__ == '__main__':
    file_path = '/Users/Thankyou/Desktop/data/readalong.wav'  # wave path
    host = 'XXX.XXX.XXX.XXX'
    port = XXXXXX  # port
    chunk_size = 32044  # chunk_size

    send_file_over_socket(file_path, host, port, chunk_size)
```

And I got the following error:
```
From cffi callback <function SoundFile._init_virtual_io.<locals>.vio_read at 0x7f897e1703a0>:
Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/soundfile.py", line 1214, in vio_read
    data_read = file.readinto(buf)
  File "/opt/conda/lib/python3.8/site-packages/soundfile.py", line 713, in __getattr__
    raise AttributeError(
AttributeError: 'SoundFile' object has no attribute 'readinto'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/conda/lib/python3.8/site-packages/soundfile.py", line 1219, in vio_read
    buf[0:data_read] = data
ValueError: right operand length must match slice length
Traceback (most recent call last):
  File "whisper_online_server.py", line 223, in <module>
    proc.process()
  File "whisper_online_server.py", line 187, in process
    a = self.receive_audio_chunk()
  File "whisper_online_server.py", line 145, in receive_audio_chunk
    audio, _ = librosa.load(sf,sr=SAMPLING_RATE)
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/audio.py", line 165, in load
    raise (exc)
  File "/opt/conda/lib/python3.8/site-packages/librosa/core/audio.py", line 146, in load
    with sf.SoundFile(path) as sf_desc:
  File "/opt/conda/lib/python3.8/site-packages/soundfile.py", line 629, in __init__
    self._file = self._open(file, mode_int, closefd)
  File "/opt/conda/lib/python3.8/site-packages/soundfile.py", line 1183, in _open
    _error_check(_snd.sf_error(file_ptr),
  File "/opt/conda/lib/python3.8/site-packages/soundfile.py", line 1357, in _error_check
    raise RuntimeError(prefix + _ffi.string(err_str).decode('utf-8', 'replace'))
RuntimeError: Error opening SoundFile(<_io.BytesIO object at 0x7f898182bd10>, mode='r', samplerate=16000, channels=1, format='RAW', subtype='PCM_16', endian='LITTLE'): File contains data in an unknown format.
whisper-server-INFO: killing process 60900
```

Please let me know your ideas.

Thank you.

Hi,
I'm sorry, but generally I'm not available for advice regarding your code. The issue tracker is primarily for reporting issues with the code in this repo, or for related topics such as feature requests.

If you want to send a file to the server to simulate real-time processing, I suggest making a simple script that emits the audio in real time, e.g. 1 second of audio every 1 second, and then piping its output to a netcat client.
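A minimal sketch of such a script, under stated assumptions: the input is a WAV file already encoded as 16 kHz, mono, 16-bit PCM, which matches the raw format the server decodes in the traceback above (`samplerate=16000, channels=1, format='RAW', subtype='PCM_16', endian='LITTLE'`). The script name `emit_realtime.py` and the function `emit_realtime` are hypothetical. It uses the standard `wave` module to drop the WAV header and writes one second of raw samples to stdout per second:

```python
import sys
import time
import wave

# Assumed input format, matching what the server decodes: 16 kHz, mono, 16-bit PCM.
EXPECTED_RATE = 16000
EXPECTED_CHANNELS = 1
EXPECTED_SAMPWIDTH = 2  # bytes per sample (16-bit)


def emit_realtime(path, out, chunk_seconds=1.0, sleep=time.sleep):
    """Write the raw PCM frames of a WAV file to `out`, paced at real-time speed."""
    with wave.open(path, "rb") as wf:
        fmt = (wf.getframerate(), wf.getnchannels(), wf.getsampwidth())
        if fmt != (EXPECTED_RATE, EXPECTED_CHANNELS, EXPECTED_SAMPWIDTH):
            raise ValueError(f"expected 16 kHz mono 16-bit WAV, got {fmt}")
        bytes_per_second = EXPECTED_RATE * EXPECTED_CHANNELS * EXPECTED_SAMPWIDTH
        frames_per_chunk = int(EXPECTED_RATE * chunk_seconds)
        while True:
            # readframes() returns sample bytes only; the WAV header is not included
            chunk = wf.readframes(frames_per_chunk)
            if not chunk:
                break
            out.write(chunk)
            out.flush()
            # Wait as long as the chunk lasts (the last chunk may be shorter)
            sleep(len(chunk) / bytes_per_second)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        emit_realtime(sys.argv[1], sys.stdout.buffer)
    else:
        print(f"usage: python {sys.argv[0]} FILE.wav | nc HOST PORT", file=sys.stderr)
```

Usage would then be something like `python emit_realtime.py readalong.wav | nc <host> <port>`. Sleeping for each chunk's own duration keeps the stream close to real-time speed; the `sleep` parameter exists only so the pacing can be swapped out, e.g. in tests.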

@donaldos I implemented the same thing as a client-side Python script. You can use it as is or modify it for your use case:

```python
import codecs
import socket
import threading

import pyaudio

# Fill in the server's host and port.
server_address = ('<server_address>', <port>)

p = pyaudio.PyAudio()
# Capture 16 kHz, mono, 16-bit PCM, the raw format the server decodes.
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.connect(server_address)

def is_allowed_char(char):
    return char.isalnum() or char in [' ', '.', ',', '!', '?', ':', ';', '-', "'", '"']

def send_audio():
    # Blocking-mode capture: stream.read() returns 1024 frames of raw PCM at a time.
    while True:
        data = stream.read(1024)
        sock.sendall(data)

def receive_response():
    # An incremental decoder keeps a multi-byte UTF-8 character that is split
    # across two recv() calls from raising UnicodeDecodeError.
    decoder = codecs.getincrementaldecoder('utf-8')()
    while True:
        response = sock.recv(1024)
        if not response:  # the server closed the connection
            break
        decoded_response = decoder.decode(response)
        cleaned_response = ''.join(char for char in decoded_response if is_allowed_char(char))
        if cleaned_response:
            with open('response.txt', 'a') as f:
                f.write(cleaned_response)
            print(cleaned_response)

# Daemon threads do not keep the process alive once the main thread exits.
send_thread = threading.Thread(target=send_audio, daemon=True)
receive_thread = threading.Thread(target=receive_response, daemon=True)
send_thread.start()
receive_thread.start()

try:
    receive_thread.join()  # returns once the server closes the connection
except KeyboardInterrupt:  # or stop the client with Ctrl+C
    pass
finally:
    stream.stop_stream()
    stream.close()
    p.terminate()
    sock.close()
```
Please note that this can handle only one client-server connection at a time. I am trying to make an API to overcome this. You can comment here or ping me if you have a similar use case.