molsonkiko / NPP_protein_lexer

A simple tool for applying colors to proteins in Notepad++

Notepad++ Protein Lexer

It can be helpful when looking at a protein sequence to be able to quickly identify regions that are hydrophilic, acidic, basic, lipophilic, and so on.

Some online tools like CLUSTAL Omega have good colorizing, but wouldn't it be nice to have this kind of highlighting in Notepad++?

A multiple sequence alignment file after colorizing with protein_lexer.py

EnhanceAnyLexer version

This should be preferred if you have Notepad++ 8.4.3 or later (any version where EnhanceAnyLexer can be installed).

  1. Install the EnhanceAnyLexer plugin.
  2. Add protein_lexer_udl.xml into the userDefineLang folder as a child of the NotepadPlus element.
  3. Add the contents of protein_lexer.ini to EnhanceAnyLexerConfig.ini.
  4. Now this installation of Notepad++ will have colored proteins!

PythonScript version

I've created a script for the PythonScript plugin that colorizes protein files. See the PythonScript plugin's page for instructions on installing the plugin.

Once PythonScript is installed in Notepad++, download protein_lexer.py from this repository and drop it into the plugins/PythonScript/scripts subfolder of your Notepad++ installation directory.

If you only want to colorize a file occasionally, rather than running the script at every startup, run it from the Plugins->PythonScript->Scripts drop-down menu whenever you open a protein file.

To run the script at startup, open plugins/PythonScript/scripts/setup.py and add two lines that import protein_lexer. Then go to Plugins->PythonScript->Configuration... from the main menu and set the Initialisation combo box to ATSTARTUP.

Change settings to load protein lexer at startup

Once the script runs, it will automatically colorize fasta and clustal_num files whenever they are opened in the editor.

You can add more file extensions and customize the colors for each type of amino acid by editing protein_lexer.py.

The styles are the all-caps constants near the top of the file, each a tuple of three ints (red, green, blue), e.g.

ACID_STYLE = (0xbe, 0, 0) # red

and the file extensions are just above that.

By default the colors for amino acids are as follows:

  • Acid (D, E): red
  • Amphiphilic (A, C, G, Y): black
  • Base (K, R): blue
  • Cyclic (P): green
  • Hydrophilic (N, Q, S, T): cyan
  • Lipophilic (I, L, M, F, W, V): grey
  • Any characters other than the standard one-letter codes will also be colored black.
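The mapping above can be sketched as a plain lookup table. This is only an illustration, not the code in protein_lexer.py: ACID_STYLE is the one value quoted in this README, while every other constant name, every other RGB value, the `style_for` helper, and the case-sensitive lookup are assumptions made for the example.

```python
# Sketch of the default color scheme as a lookup table.
# Only ACID_STYLE's value is taken from the README; all other
# names and RGB values are assumed for illustration.
ACID_STYLE = (0xbe, 0, 0)               # red (quoted in the README)
BASE_STYLE = (0, 0, 0xbe)               # blue (assumed value)
CYCLIC_STYLE = (0, 0x80, 0)             # green (assumed value)
HYDROPHILIC_STYLE = (0, 0xa0, 0xa0)     # cyan (assumed value)
LIPOPHILIC_STYLE = (0x80, 0x80, 0x80)   # grey (assumed value)
DEFAULT_STYLE = (0, 0, 0)               # black

# One entry per residue class, keyed by its one-letter codes.
CLASS_STYLES = {
    "DE": ACID_STYLE,
    "ACGY": DEFAULT_STYLE,       # amphiphilic residues are black
    "KR": BASE_STYLE,
    "P": CYCLIC_STYLE,
    "NQST": HYDROPHILIC_STYLE,
    "ILMFWV": LIPOPHILIC_STYLE,
}

# Flatten to one entry per residue so each lookup is a single dict access.
STYLE_BY_RESIDUE = {
    residue: style
    for letters, style in CLASS_STYLES.items()
    for residue in letters
}


def style_for(char):
    """Return the RGB tuple for one character; unlisted characters are black."""
    return STYLE_BY_RESIDUE.get(char, DEFAULT_STYLE)
```

With this table, `style_for("D")` gives the red tuple, and a gap character such as `"-"` falls back to black.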

TODO

  1. The lexer is quite slow: there is a noticeable delay when lexing even a 5 KB file. It is faster when loading a file than when lexing an already-open file, though. Not sure if there's any way to fix this.
  2. Consider only applying styles to blocks of several (say 8+) contiguous uppercase letters. That might reduce performance, though.
  3. In some more recent versions of Notepad++ (probably 8.4.6 and later), the little colored swatch at the side of a line that indicates a saved or unsaved change since the file was opened expands to cover the entire line once the plugin has been run (see below). Not sure how to fix this.

Annoying orange line for unsaved changes after plugin runs

For reference, it should look like this:

How unsaved changes should look
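One possible way to approach the second TODO item is to find runs of contiguous uppercase letters with a regular expression and style only those spans. The sketch below is not part of protein_lexer.py; `MIN_RUN`, `RUN_PATTERN`, and `sequence_spans` are hypothetical names, and it only computes offsets rather than applying any styling in the editor.

```python
import re

# Threshold suggested in the TODO item ("say 8+"); hypothetical name.
MIN_RUN = 8

# Matches MIN_RUN or more contiguous uppercase ASCII letters.
RUN_PATTERN = re.compile(r"[A-Z]{%d,}" % MIN_RUN)


def sequence_spans(text):
    """Return (start, end) offsets of each long run of uppercase letters."""
    return [match.span() for match in RUN_PATTERN.finditer(text)]


if __name__ == "__main__":
    sample = ">sp|P1 test\nMKVLAAGIVG\n"
    # The header line has no long uppercase run, so only the
    # sequence line is reported.
    print(sequence_spans(sample))
```

Short uppercase tokens in header lines, such as accession prefixes, would be skipped under this rule. Whether the extra regex pass helps or hurts overall speed would need to be measured.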
