jtara1 / dictation

self-hosted dictation for speech to text anywhere on linux as exec & declarative build

Dictation

Dictate speech to text, and pause or resume dictation with the press of a hotkey. Everything runs locally. This code helps build nerd-dictation, download and load the model, add hotkeys, and run nerd-dictation. It was built for and tested on x86_64-linux with X11.

Packages several things:

  • nerd-dictation
  • nerd-dictation-model
  • linux desktop app (inputs.dictation.nixosModules.default)
    • hotkeys.py
      • toggle-typing.sh

$ nix flake show github:jtara1/dictation
github:jtara1/dictation/8dafb6e2a7aecf33f6698f0f1e64587ebf1c6695
├───nixosModules
│   └───default: NixOS module
└───packages
    └───x86_64-linux
        ├───nerd-dictation: package 'nerd-dictation'
        └───nerd-dictation-model: package 'nerd-dictation-model'

Requirements

None if using or building through nix.

Each executable has its own dependencies, but these are installed through the nix derivations defined in this repo.

Some of the bigger models take several GB on disk and ~5 GB of memory.

By default, nerd-dictation-model (and the linux desktop app) uses a bigger model.

For other models, see https://alphacephei.com/vosk/models and reference my model.nix.

Install

Flake

In your system flake:

  inputs = {
    # ...
    dictation.url = "github:jtara1/dictation";
    # if nixpkgs is defined here,
    dictation.inputs.nixpkgs.follows = "nixpkgs"; # where nixpkgs is your var for nixos nixpkgs in inputs
    # if home-manager is defined here,
    dictation.inputs.home-manager.follows = "home-manager"; # where home-manager is your var for home-manager in inputs
  };

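  # ...
  outputs = { nixpkgs, ... }@inputs: {
    # "my-host" is a placeholder for your NixOS configuration name
    nixosConfigurations.my-host = nixpkgs.lib.nixosSystem {
      # ...
      modules = [
        # ...
        inputs.dictation.nixosModules.default
      ];
    };
  };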

Other

Download an executable release for your system.

Usage

Linux Desktop App

Search for and open the Dictation desktop application. This opens a terminal running hotkeys.py, where you can monitor the logs.

Direct Usage

Everything is layered so you can choose your entrypoint.

Make the 2 files executable (chmod +x myfile) and check their shebang lines.

cp hotkeys.py ..
cd ..
./hotkeys.py
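
The chmod and shebang check described above can be sketched as a small helper. This is only an illustration, not part of the repo: shebang_of and prepare_scripts are hypothetical names, and it assumes the 2 files are hotkeys.py and toggle-typing.sh.

```shell
# Print the interpreter line of a script, or fail if it has none.
shebang_of() {
  first_line="$(head -n 1 "$1")" || return 1
  case "$first_line" in
    '#!'*) printf '%s\n' "$first_line" ;;
    *) echo "$1: no shebang line" >&2; return 1 ;;
  esac
}

# Mark each given script executable and show which interpreter it asks for.
prepare_scripts() {
  for f in "$@"; do
    chmod +x "$f" && shebang_of "$f" || return 1
  done
}
```

For example, prepare_scripts hotkeys.py toggle-typing.sh prints one shebang line per file, so you can confirm that the interpreters they name exist on your system.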

If you don't want the global hotkeys, you can call toggle-typing.sh directly:

./toggle-typing.sh # start
./toggle-typing.sh # pause
./toggle-typing.sh # resume
# ...
./toggle-typing.sh end # end

Requires: bash, nerd-dictation, GNU screen.
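
Before running toggle-typing.sh outside of nix, it may help to confirm that these commands are on your PATH. A minimal sketch, assuming the executables are named bash, nerd-dictation, and screen (missing_commands is a hypothetical helper, not something the repo provides):

```shell
# Print each command from the argument list that is not found on PATH.
missing_commands() {
  for cmd in "$@"; do
    command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
  done
}
```

Running missing_commands bash nerd-dictation screen prints nothing when all three are available, and otherwise prints one missing command per line.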

If you kill its screen session directly, the memory allocated for the model is not released, which results in a memory leak. End dictation with ./toggle-typing.sh end instead.

Nix Run

Optionally build the model derivation, then link it:

# build the model derivation and look up its store path
nix build 'github:jtara1/dictation#nerd-dictation-model'
src=$(nix path-info 'github:jtara1/dictation#nerd-dictation-model')
dst=~/.config/nerd-dictation

# link the model into the nerd-dictation config directory
mkdir -p "$dst"
ln -sfn "$src/model" "$dst/model"

Alternatively, you can download, unpack, and move the model in place yourself.
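
The manual route could look roughly like the sketch below. It is not taken from this repo: install_model and model_name_from_url are hypothetical helpers, the default MODEL_URL (a small English model from the Vosk models page) is only an example, and the sketch assumes the archive unpacks to a directory named after the zip file and that curl and unzip are installed.

```shell
# Example model only; pick any archive from https://alphacephei.com/vosk/models
MODEL_URL="${MODEL_URL:-https://alphacephei.com/vosk/models/vosk-model-small-en-us-0.15.zip}"
MODEL_DST="${MODEL_DST:-$HOME/.config/nerd-dictation}"

# Derive the unpacked directory name from the archive URL
# (assumption: the zip unpacks to a directory with the same base name).
model_name_from_url() {
  basename "$1" .zip
}

# Download the archive, unpack it, and move it to <dst>/model.
# Refuses to overwrite an existing model directory or symlink.
install_model() {
  url="$1"; dst="$2"
  name="$(model_name_from_url "$url")"
  if [ -e "$dst/model" ] || [ -L "$dst/model" ]; then
    echo "$dst/model already exists; remove it first" >&2
    return 1
  fi
  tmp="$(mktemp -d)" || return 1
  curl -fL -o "$tmp/model.zip" "$url" \
    && unzip -q "$tmp/model.zip" -d "$tmp" \
    && mkdir -p "$dst" \
    && mv "$tmp/$name" "$dst/model"
  status=$?
  rm -rf "$tmp"
  return "$status"
}
```

Nothing is downloaded until you call install_model "$MODEL_URL" "$MODEL_DST". The guard against an existing model is there so that a symlink created by the nix build steps is not silently replaced.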

Run nerd-dictation

nix run 'github:jtara1/dictation#nerd-dictation'

Hotkeys

Once hotkeys.py is running, these hotkeys are available:

Hotkey            Description
ctrl + shift + ]  load the model and start dictation, or pause or resume dictation
ctrl + shift + [  end dictation, deallocating the memory taken by the model and libs

Default hotkeys are subject to change.

TODO

These are improvements I thought of. I'm not necessarily planning on doing these.

dictation/

  • bundle my nix-build derivation to serve an executable for non-Nix systems
  • test and fix usage on headless linux (TTY) - switch to using ydotool
  • test 1.0 release exec on fresh install of x86_64-linux (X11)
  • download and bundle a better model in the build of the store derivation
  • fix .desktop application - it's not linking to toggle-typing.sh correctly
  • absolute path for nerd-dictation and ./venv created by nix-shell python.nix and bundle nerd-dictation with nix module default.nix
  • python.nix (nix-shell) could build from my own derivation for nerd-dictation instead
  • add build attribute to let the user choose which model to download and use
  • avoid multiple processes running hotkeys.py - toggle-typing.sh should be multi-process safe as it should refer to the same screen session
  • add build attributes to define custom hotkeys
  • define word mapping for common words and phrases that aren't normally spoken (like "nix")
  • debug why nerd-dictation transcribes silence as "the" for me when idle (used an improved VOSK model)
  • better default hotkeys
  • package as a nix flake
  • self-hosted server to run nerd-dictation with pulseaudio socket protocol changed to tcp?
  • other projects and APIs that offer speech to text?

References

This software is directly built on top of other software including:

License: Apache License 2.0


Languages: Nix 67.3%, Python 17.6%, Shell 15.1%