argent-smith / AutoFenc.vim

Tries to automatically detect file encoding

Home Page:http://www.vim.org/scripts/script.php?script_id=2721

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

This is a mirror of http://www.vim.org/scripts/script.php?script_id=2721

This script tries to automatically detect and set file encoding when
opening a file in Vim. It does this in several possible ways (according
to the configuration) in this order (when a method fails, it tries
the following one):
  (1) detection of BOM (byte-order-mark) at the beginning of the file,
      only for some multibyte encodings
  (2) HTML way of encoding detection (via <meta> tag), only for HTML based
      file types
  (3) XML way of encoding detection (via <?xml ... ?> declaration), only
      for XML based file types
  (4) CSS way of encoding detection (via @charset 'at-rule'), only for
      CSS files
  (5) checks whether the encoding is specified in a comment (like
      '# Encoding: latin2'), for all file types
  (6) tries to detect the encoding via specified external program
      (the default one is enca), for all file types

If the autodetection fails, it's up to Vim (and your configuration)
to set the encoding.

Configuration options for this plugin (you can set them in your $HOME/.vimrc):
 - g:autofenc_enable (0 or 1, default 1)
     Enables/disables this plugin.
 - g:autofenc_emit_messages (0 or 1, default 0)
     Emits messages about the detected/used encoding upon opening a file.
 - g:autofenc_max_file_size (number >= 0, default 10485760)
     If the size of a file is higher than this value (in bytes), then
     the autodetection will not be performed.
 - g:autofenc_disable_for_files_matching (regular expression, see below)
     If the file (with complete path) matches this regular expression,
     then the autodetection will not be performed. It is by default set
     to disable autodetection for non-local files (e.g. accessed via ftp,
     scp etc., because the script can't handle some kind of autodetection
     for these files). The regular expression is matched case-sensitively.
 - g:autofenc_disable_for_file_types (list of strings, default [])
     If the file type matches some of the filetypes specified in this list,
     then the autodetection will not be performed. Comparison is done
     case-sensitively.
 - g:autofenc_autodetect_bom (0 or 1, default 0 if 'ucs-bom' is in
                              'fileencodings', 1 otherwise)
     Enables/disables detection of encoding by BOM.
 - g:autofenc_autodetect_html (0 or 1, default 1)
     Enables/disables detection of encoding for HTML based documents.
 - g:autofenc_autodetect_xml (0 or 1, default 1)
     Enables/disables detection of encoding for XML based documents.
 - g:autofenc_autodetect_css (0 or 1, default 1)
     Enables/disables detection of encoding for CSS documents.
 - g:autofenc_autodetect_comment (0 or 1, default 1)
     Enables/disables detection of encoding in comments.
 - g:autofenc_autodetect_num_of_lines (number >= 0, default 5)
     How many lines from the beginning and from the end of the file should
     be searched for the possible encoding declaration.
 - g:autofenc_autodetect_ext_prog (0 or 1, default 1)
     Enables/disables detection of encoding via external program
     (see additional settings below).
 - g:autofenc_ext_prog_path (string, default 'enca')
     Path to the external program. It can be either relative or absolute.
     The external program can take any number of arguments, but
     the last one must be a path to the file for which the encoding
     is to be detected (it will be supplied by this plugin).
     Output of the program must be the name of encoding in which the file
     is saved (string on a single line).
 - g:autofenc_ext_prog_args (string, default '-i -L czech')
     Additional program arguments (can be none, i.e. '').
 - g:autofenc_ext_prog_unknown_fenc (string, default '???')
     If the output of the external program is this string, then it means
     that the file encoding was not detected successfully. The string must
     be case-sensitive.

Requirements:
 - filetype plugin must be enabled (a line like 'filetype plugin on' must
   be in your $HOME/.vimrc [*nix] or %UserProfile%\_vimrc [MS Windows])

Notes:
  This script is by all means NOT perfect, but it works for me and suits my
  needs very well, so it might be also useful for you. Your feedback,
  opinion, suggestions, bug reports, patches, simply anything you have
  to say is welcomed!

  There are similar plugins to this one, so if you don't like this one,
  you can test these:
   - FencView.vim (http://www.vim.org/scripts/script.php?script_id=1708)
     Mainly supports detection of encodings for asian languages.
   - MultiEnc.vim (http://www.vim.org/scripts/script.php?script_id=1806)
     Obsolete, merged with the previous one.
   - charset.vim (http://www.vim.org/scripts/script.php?script_id=199)
     Not very complete/correct and last update in 2002.
   - http://vim.wikia.com/wiki/Detect_encoding_from_the_charset_specified_in_HTML_files
     Same basic ideas but only for HTML files.
  Let me know if there are others and I'll add them here.

Changelog:
1.3.1 (2011-07-23) Thanks to Benjamin Fritz for the updates in this version.
  - Fixed the plugin behavior when reloading a file with different settings.

1.3 (2011-04-22) Thanks to Benjamin Fritz for the updates in this version.
   - Added support for HTML version 5 encoding detection.
   - The script now dies gracefully in old Vims.
   - 'g:autofenc_autodetect_comment_num_of_lines' renamed to 'g:autofenc_autodetect_num_of_lines'

1.2.1 (2011-04-13)
   - Fixed a typo in a variable name (this resulted in an error in some
     occasions). Thanks to Charles Lee for pointing this bug out.

1.2 (2011-03-31) Thanks to Benjamin Fritz for the updates in this version.
  - TOhtml's IANA name/Vim encoding conversion functions are now used.
  - Changed BOM detection so it does not duplicate a check Vim already did by
    default (i.e. default to off if ucs-bom is in the 'fileencodings').
  - Put autocmds in the AutoFenc augroup for easier handling.
  - Made autocmd nested so we don't need to worry about restoring everything
    that other autocmds may set (e.g. syntax).
  - Jumplist or cursor position during detection are not affected.
  - The g:autofenc_autodetect_comment_num_of_lines option is now used also in
    HTML/XML/CSS detection routines (previously only used for encoding
    specified in comments).
  - Improved HTML charset line regex.
  - Added an option (g:autofenc_emit_messages) to emit messages about the
    detected/used encoding upon opening a file.

1.1.1 (2009-10-03)
  - Fixed the comment encoding detection function (the encoding was not
    detected if there were some alphanumeric characters before
    the "encoding" string, like in "# vim:fileencoding=<encoding-name>").

1.1 (2009-08-16)
  - Added three configuration possibilites to disable autodetection for
    specific files (based on file size, file type and file path).
    See script description for more info.

1.0.2 (2009-08-11)
  - Fixed the XML encoding detection function.
  - Minor code and documentation fixes.

1.0.1 (2009-08-02)
  - Encoding autodetection is now performed only if the opened file
    exists (is stored somewhere). So, for example, the autodetection
    is now not performed when a new file is opened.
  - Correctly works with .viminfo, where the last cursor position
    in the file is stored when exiting the file. In the previous version
    of this script, this information was sometimes ignored and the cursor
    was initially on the very last line in a file. If the user does not
    use this .viminfo feature (or he does not use .viminfo at all),
    then the cursor will be initially placed on the very first line.
  - (Hopefully) fixed the implementation of the function which sets
    the detected encoding.

1.0 (2009-07-26)
  - Initial release version of this script.

About

Tries to automatically detect file encoding

http://www.vim.org/scripts/script.php?script_id=2721


Languages

Language:Vim Script 100.0%