mattduck / kilo

My implementation of the kilo text editor

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Disclaimer

This project is something fun that I did at the start of 2020. It might be useful to help understand how you can build new features on top of Kilo, but please read the changes with scepticism as they haven’t been thoroughly tested. In particular, the basic undo/redo implementation that I started writing is full of memory leaks and makes this code unusable as a real editor (see #1 for more details).

Kilo

This is my implementation of the Kilo text editor, written by following Build your own text editor.

It was initially written as an org-mode file, as an exercise for me to learn a bit more about writing terminal applications with C, and to see whether the literate programming approach with org-mode is useful.

Overall I think embedding the code in this file actually made it harder to keep the overall structure in my head as I went, because I was only operating on individuals parts at a time. Next time I will just write notes separately.

I’ve now renamed the org-mode version to kilo-org.c. For future edits I’ll work on kilo.c directly.

New features

I’ve extended kilo.c with a few things that I’m used to from vim/emacs:

  • Splitting user input into normal and insert modes.
  • Word-based cursor movement that is normally found with w/W/b/B
  • A new prompt to simulate :wq and :q!.
  • Standard cursor movement with hjkl, ^/$, C-f/C-b, gg and G.
  • Using dd to remove lines, and J to join lines.
  • Adding the jj and jk bindings that I use in insert mode to exit to normal mode (which means waiting for a follow-up key to j, and inserting it into the row if it doesn’t come after a set timeout).

Compile with org-mode

This just concatenates all the C snippets to kilo.c, and then runs make.

(interactive)
(setq-local org-confirm-babel-evaluate nil)
(org-babel-tangle nil "kilo-org.c" "c")
(compile "make")

Code

Feature test macros

There are various macros that you can define that control what features are available to the compiler. There is more info in the GNU libc documentation. Some are added in step 59, to remove a warning about implicit declaration of getline().

# define _DEFAULT_SOURCE
# define _BSD_SOURCE
# define _GNU_SOURCE

Includes

#include <ctype.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdarg.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <time.h>
#include <termios.h>
#include <unistd.h>

Constants

#define KILO_VERSION "0.0.1"
#define KILO_TAB_STOP 4
#define KILO_QUIT_TIMES 2

Some of these macros (like CTRL_KEY below) take a parameter, similar to functions. The main advantage of doing this is that the preprocessor replaces the template so there’s no stack or function call needed. There are downsides too: if you have a lot of macros it can increase the binary size, and they’re limited because they’re not functions - you can’t return a parameter, you can’t do recursion, etc.

In ASCII, the CTRL character strips bits 5 and 6 from whatever key you press. For example, h is 01101000, and C-h is 00001000. We define this below:

#define CTRL_KEY(k) ((k) & 0x1F)
enum editorKey {
                BACKSPACE = 127,
                ARROW_LEFT = 1000,
                ARROW_RIGHT,
                ARROW_UP,
                ARROW_DOWN,
                DEL_KEY,
                HOME_KEY,
                END_KEY,
                PAGE_UP,
                PAGE_DOWN
};

enum editorHighlight {
                      HL_NORMAL = 0,
                      HL_COMMENT,
                      HL_MLCOMMENT,
                      HL_KEYWORD1,
                      HL_KEYWORD2,
                      HL_STRING,
                      HL_NUMBER,
                      HL_MATCH
};
#define HL_HIGHLIGHT_NUMBERS (1<<0)
#define HL_HIGHLIGHT_STRINGS (1<<1)

State

The global editor state is stored in editorConfig. This stores data like the cursor position, screen offset, size of the terminal, whether the buffer has been modified, the associated filename, etc. It also contains some setup and teardown data (like the properties of the user’s terminal),

erow represents a single line of text. User input results in a lot of mutation of editorConfig, particularly the rows.

editorSyntax OTOH just contains information associated with a particular filetype, and is not affected by user input. The buffer can be associated with a single editorSyntax struct.

struct editorSyntax {
  char *filetype;
  char **filematch;
  char **keywords;
  char *singleline_comment_start;
  char *multiline_comment_start;
  char *multiline_comment_end;
  int flags;
};

typedef struct erow {
  int idx;  // which row in the buffer it represents
  int size;  // the row length, excluding the null byte at the end.
  char *chars;  // the characters in the line
  int rsize; // the length of the "rendered" line, where eg. \t will expand to n spaces
  char *render;  // the "rendered" characters in the line
  unsigned char *hl;  // the highlight property of a character
  int hl_open_comment;  // whether this line begins or is part of a multiline comment
} erow;

struct editorConfig {
  int cx, cy;  // cursor
  int rx;  // render index, as some chars are multi-width (eg. tabs)
  int rowoff; // file offset
  int coloff; // same as above
  int screenrows; // size of the terminal
  int screencols; // size of the terminal
  int numrows;  // size of the buffer
  erow *row;  // current row
  int dirty;  // is modified?
  char *filename;  // name of file linked to the buffer
  char statusmsg[80];  // status message displayed on at bottom of buffer
  time_t statusmsg_time;  // how long ago status message was written
  struct editorSyntax *syntax;  // the syntax rules that apply to the buffer
  struct termios orig_termios;  // the terminal state taken at startup; used to restore on exit
};

struct editorConfig E;  // the global state

Filetypes

The tutorial specifies an entry for C:

char *C_HL_extensions[] = { ".c", ".h", ".cpp", NULL };
char *C_HL_keywords[] = {
  "switch", "if", "while", "for", "break", "continue", "return", "else",
  "struct", "union", "typedef", "static", "enum", "class", "case",
  "int|", "long|", "double|", "float|", "char|", "unsigned|", "signed|",
  "void|", NULL
};

struct editorSyntax HLDB[] = {
                              {"c",
                               C_HL_extensions,
                               C_HL_keywords,
                               "//", "/*", "*/",
                               HL_HIGHLIGHT_NUMBERS | HL_HIGHLIGHT_STRINGS
                              },
};

#define HLDB_ENTRIES (sizeof(HLDB) / sizeof(HLDB[0]))

Exiting

Most C library functions that fail set the global errno. perror() looks at this and prints a descriptive message for it - for example, “inappropriate ioctl for device”.

void die(const char *s) {
  write(STDOUT_FILENO, "\x1b[2J", 4);  // clear screen
  write(STDOUT_FILENO, "\x1b[H", 3);  // reposition cursor
  perror(s);
  exit(1);
}

Prototypes

C compiles in a single pass, so you can’t always call functions that aren’t defined yet. We can define the signature though. These are the few functions that are required:

void editorSetStatusMessage(const char *fmt, ...);
void editorRefreshScreen();
char *editorPrompt(char *prompt, void (*callback)(char *, int));

Append buffer

Rather than calling write() regularly to modify the terminal output, we instead buffer everything in abuf, and only write to the terminal once our update is complete. This reduces the number of updates, can prevent screen flickering, etc.

struct abuf {
  char *b;
  int len;
};

#define ABUF_INIT {NULL, 0}  // Represents an empty buffer

void abAppend(struct abuf *ab, const char *s, int len) {
  // Get a block of memory that is the size of the current string, plus the
  // string we're appending.
  char *new = realloc(ab->b, ab->len + len);

  if (new == NULL) return;
  memcpy(&new[ab->len], s, len);  // copy "s" after the current data
  ab->b = new;
  ab->len += len;
}

void abFree(struct abuf *ab) {
  free(ab->b);
}

Terminal

There are a few functions here that just get information from the terminal. editorReadKey() translates ANSI codes into an editorKey() enum:

int editorReadKey() {
  int nread;
  char c;
  // read() returns the number of bytes read
  while ((nread = read(STDIN_FILENO, &c, 1)) != 1) {
    if (nread == -1 && errno != EAGAIN) die("read");
  }

  if (c == '\x1b') {
    char seq[3];
    if (read(STDIN_FILENO, &seq[0], 1) != 1) return '\x1b';
    if (read(STDIN_FILENO, &seq[1], 1) != 1) return '\x1b';
    if (seq[0] == '[') {

      // Page up / down, which are represented by \x1b[5~ and \x1b[6~
      if (seq[1] >= '0' && seq[1] <= '9') {
        if (read(STDIN_FILENO, &seq[2], 1) != 1) return '\x1b';
        if (seq[2] == '~') {
          switch (seq[1]) {
          case '1': return HOME_KEY;
          case '3': return DEL_KEY;
          case '4': return END_KEY;
          case '5': return PAGE_UP;
          case '6': return PAGE_DOWN;
          case '7': return HOME_KEY;
          case '8': return END_KEY;
          }
        }
      } else {

        // Arrows
        switch (seq[1]) {
        case 'A': return ARROW_UP;
        case 'B': return ARROW_DOWN;
        case 'C': return ARROW_RIGHT;
        case 'D': return ARROW_LEFT;
        case 'H': return HOME_KEY;
        case 'F': return END_KEY;
        }
      }
    } else if (seq[0] == '0') {
      switch (seq[1]) {
      case 'H': return HOME_KEY;
      case 'F': return END_KEY;
      }
    }
    return '\x1b';
  } else {
    return c;
  }
}

Control characters are prefixed by ESC. If we read ESC, immediately read two more bytes into seq. If the reads timeout, then assume the user just pressed escape.

getCursorPosition below doesn’t really need to exist for me. It is only used in getWindowSize if TIOCGWINSZ isn’t supported by the terminal.

int getCursorPosition (int *rows, int *cols) {
  char buf[32];
  unsigned int i = 0;
  // 6n (in the line below) asks for the cursor position. 6 is a function that
  // queries for terminal status info.
  if (write(STDOUT_FILENO, "\x1b[6n", 4) != 4) return -1;
  while (i < sizeof(buf) -1){
    if (read(STDIN_FILENO, &buf[i], 1) != 1) break;
    if (buf[i] == 'R') break;
    i++;
  }
  buf[i] = '\0';  // printf expects strings to end with a 0 byte

  if (buf[0] != '\x1b' || buf[1] != '[') return -1;

  // sscanf will parse out two integers ("%d;%d") and put them into rows/cols.
  if (sscanf(&buf[2], "%d;%d", rows, cols) != 2) return -1;

  printf("\r\n&buf[1]: '%s'\r\n", &buf[1]);
  editorReadKey();
  return -1;
}
int getWindowSize(int *rows, int *cols) {
  struct winsize ws;
  if (ioctl(STDOUT_FILENO, TIOCGWINSZ, &ws) == -1 || ws.ws_col == 0) {
  // ~C~ is cursor forward, and ~B~ is cursor down. We assume that 999 is a large
  // enough value to position to the bottom right.
    if (write(STDOUT_FILENO, "\x1b[999C\x1b[999B", 12) != 12) return -1;
    return getCursorPosition(rows, cols);
  } else {
    *cols = ws.ws_col;
    *rows = ws.ws_row;
    return 0;
  }
}

TIOCGWINSZ tells the terminal to return the window size. We check for 0 in the column value because “apparently” that’s a possible outcome.

Raw mode

struct termios orig_termios;

void disableRawMode() {
  if (tcsetattr(STDIN_FILENO, TCSAFLUSH, &E.orig_termios) == -1) die("tcsetattr");
}

void enableRawMode() {
  if (tcgetattr(STDIN_FILENO, &E.orig_termios) == -1) die("tcgetatr");
  atexit(disableRawMode);

  struct termios raw = E.orig_termios;
  raw.c_iflag &= ~(BRKINT | ICRNL | INPCK | ISTRIP | IXON);
  raw.c_oflag &= ~(OPOST);
  raw.c_cflag |= ~(CS8);
  raw.c_lflag &= ~(ECHO | ICANON | IEXTEN | ISIG);

  raw.c_cc[VMIN] = 0;
  raw.c_cc[VTIME] = 1;  // 100ms
  if (tcsetattr(STDIN_FILENO, TCSAFLUSH, &raw) == -1) die("tcsetattr");
}
  • TCSAFLUSH specifies when to apply the setattr change.
  • ECHO is a bitflag - &= ~~(ECHO) flips the echo bit off (00000000000000000000000000001000). We also do this to the ICANON flag, which disables canonical mode, making us read one byte at a time rather than reading the whole line when enter is pressed.

    IEXTEN controls C-v, and ISIG controls the C-c and C-z signals.

    IXON controls C-s and C-q, and ICRNL controls a feature where \r (character 13) is turned into a newline (character 10).

    OPOST controls some output processing. The main thing we want to disable here (and possibly the only thing enabled by default) is the output translation of \n into \r\n. The terminal requires these as distinct characters to begin a new line.

  • The CS8 line is not a flag, it’s a bit mask with multiple bits. Here we set the character size (CS) to 8 bits per byte. This is often a default.
  • c_lflag stores “local” flags, which is apparently a dumping ground for a few miscellaneous things. There are also iflag (input), oflag (output) and clfag (control flags).
  • c_cc stands for “control characters”. VMIN sets the minimum number of bytes of input needed before read() can return - we use 0 so that read() will return as soon as there’s any input to read. VTIME is the timeout value in 10ths of a second.

Syntax highlighting

This is one of the bigger features. editorUpdateSyntax operates on a single row, setting each column of the hl array according to that column’s syntax property. When following the steps, we initially only supported syntax state within a single line. Afterwards the multi-line feature was added.

This implementation could easily get unwieldy if you wanted to add support for more syntax features, because there’s a lot of state to keep track of in the main loop.

int is_separator(int c) {
  return isspace(c) || c == '\0' || strchr(",.()+-/*=~%<>[];", c) != NULL;
}

void editorUpdateSyntax(erow *row) {
  // The hl array is the same size as the render array
  row->hl = realloc(row->hl, row->rsize);
  memset(row->hl, HL_NORMAL, row->rsize);

  if (E.syntax == NULL) return;

  char **keywords = E.syntax->keywords;

  char *scs = E.syntax->singleline_comment_start;
  char *mcs = E.syntax->multiline_comment_start;
  char *mce = E.syntax->multiline_comment_end;

  int scs_len = scs ? strlen(scs) : 0;
  int mcs_len = mcs ? strlen(mcs) : 0;
  int mce_len = mce ? strlen(mce) : 0;

  int prev_sep = 1; // beginning of line can be considered a separator
  int in_string = 0;  // we store the string char in here so we know when it closes
  int in_comment = (row->idx > 0 && E.row[row->idx - 1].hl_open_comment);

  int i = 0;
  while (i < row->size) {
    char c = row->render[i];
    unsigned char prev_hl = (i > 0) ? row->hl[i - 1] : HL_NORMAL;

    // single line comments
    if (scs_len && !in_string && !in_comment) {
      if (!strncmp(&row->render[i], scs, scs_len)) {
          memset(&row->hl[i], HL_COMMENT, row->rsize - i);
          break;
      }
    }

    // multiline comments
    if (mcs_len && mce_len && !in_string){
      if (in_comment) {
        row->hl[i] = HL_MLCOMMENT; // highlight
        if (!strncmp(&row->render[i], mce, mce_len)) { // match end?
          memset(&row->hl[i], HL_MLCOMMENT, mce_len);  // highlight end token
          i += mce_len;
          in_comment = 0;
          prev_sep = 1;
          continue;
        } else {
          i++;
          continue;
        }
      } else if (!strncmp(&row->render[i], mcs, mcs_len)) { // match multiline start?
        memset(&row->hl[i], HL_MLCOMMENT, mcs_len);  // highlight the start token
        i += mcs_len;
        in_comment = 1;
        continue;
      }
    }

    if (E.syntax->flags & HL_HIGHLIGHT_STRINGS) {
      if (in_string) {
        row->hl[i] = HL_STRING;
        // backslashes should keep this as a string
        if (c == '\\' && i + 1 < row->rsize) {
          row->hl[i+1] = HL_STRING;
          i += 2;
          continue;
        }

        if (c == in_string) in_string = 0;  // this is the closing quote
        i ++;
        prev_sep = 1;
        continue;
      } else {
        if (c == '"' || c == '\''){
          in_string = c;
          row->hl[i] = HL_STRING;
          i++;
          continue;
        }
      }
    }

    if (E.syntax->flags & HL_HIGHLIGHT_NUMBERS) {
      if ((isdigit(c) && (prev_sep || prev_hl == HL_NUMBER)) ||
          (c == '.' && prev_hl == HL_NUMBER)) {  // support if number is a decimal
        row->hl[i] = HL_NUMBER;
        i ++;
        prev_sep = 0;  // it wasn't a separator because we know it was number
        continue;
      }
    }

    if (prev_sep) {
      int j;
      for (j = 0; keywords[j]; j++) {
        int klen = strlen(keywords[j]);
        int kw2 = keywords[j][klen - 1] == '|';
        if (kw2) klen--;

        if (!strncmp(&row->render[i], keywords[j], klen) &&
            is_separator(row->render[i + klen])) {
          memset(&row->hl[i], kw2 ? HL_KEYWORD2 : HL_KEYWORD1, klen);
          i += klen;
          break;
        }
      }
      if (keywords[j] != NULL) {
        prev_sep = 0;
        continue;
      }
    }

    prev_sep = is_separator(c);
    i++;
  }

  // set hl_open_comment appropriately
  int changed = (row->hl_open_comment != in_comment);
  row->hl_open_comment = in_comment;
  if (changed && row->idx + 1 < E.numrows)
    // Recursive iteration over the rest of the file as the highlighting may
    // have changed.
    editorUpdateSyntax(&E.row[row->idx + 1]);
}

int editorSyntaxToColor(int hl) {
  switch (hl) {
  case HL_COMMENT:
  case HL_MLCOMMENT: return 36;
  case HL_KEYWORD1: return 33;
  case HL_KEYWORD2: return 32;
  case HL_STRING: return 35;
  case HL_NUMBER: return 31;
  case HL_MATCH: return 34;
  default: return 37;
  }
}

void editorSelectSyntaxHighlight() {
  /*Sets E.syntax based on E.filename */
  E.syntax = NULL;
  if (E.filename == NULL) return;
  char *ext = strchr(E.filename, '.');
  for (unsigned int j = 0; j < HLDB_ENTRIES; j++) {
    struct editorSyntax *s = &HLDB[j];
    unsigned int i = 0;
    while (s->filematch[i]){
      int is_ext = (s->filematch[i][0] == '.');
      if ((is_ext && !strcmp(ext, s->filematch[i])) ||
          (!is_ext && strstr(E.filename, s->filematch[i]))) {
        E.syntax = s;

        int filerow;
        for (filerow = 0; filerow < E.numrows; filerow++) {
          editorUpdateSyntax(&E.row[filerow]);
        }

      }
      i++;
    }
  }
}

Row operations

These functions operate on rows - eg. to insert a row in the buffer, or insert a character into a row. They do not operate on the cursor position or the file offset.

Translation between Cx<->Rx below is quite simple because there is only one character supported (tab). Having to hard-code every translation isn’t ideal though.

int editorRowCxToRx(erow *row, int cx) {
  int rx = 0;
  int j;
  for (j=0; j<cx; j++) {
    if (row->chars[j] == '\t')
      rx += (KILO_TAB_STOP - 1) - (rx % KILO_TAB_STOP);
    rx++;
  }
  return rx;
}

int editorRowRxToCx(erow *row, int rx) {
  // For a given row, converts the given rx value to the corresponding cx
  int cur_rx = 0;
  int cx;
  for (cx = 0; cx < row->size; cx++) {
    if (row->chars[cx] == '\t')
      cur_rx += (KILO_TAB_STOP - 1) - (cur_rx % KILO_TAB_STOP);
    cur_rx++;
    if (cur_rx > rx) return cx;
  }
  return cx;
}
void editorUpdateRow(erow *row) {
  int tabs = 0;
  int j;
  for (j = 0; j < row->size; j++) {
    if (row->chars[j] == '\t') tabs++;
  }

  free(row->render);
  row->render = malloc(row->size + tabs*(KILO_TAB_STOP - 1) + 1);

  int idx =0;
  for (j = 0; j < row->size; j++) {
    if (row->chars[j] == '\t') {
      // insert spaces until the next % 8 is hit.
      row->render[idx++] = ' ';
      while (idx % KILO_TAB_STOP != 0) row->render[idx++] = ' ';
    } else {
      // Print the character
      row->render[idx++] = row->chars[j];
    }
  }
  row->render[idx] = '\0';
  row->rsize = idx; // idx contains the number of characters we copied into row->render

  editorUpdateSyntax(row);
}

void editorInsertRow(int at, char *s, size_t len) {
  if (at < 0 || at > E.numrows) return;

  E.row = realloc(E.row, sizeof(erow) * (E.numrows + 1));
  memmove(&E.row[at + 1], &E.row[at], sizeof(erow) * (E.numrows - at));
  for (int j = at + 1; j <= E.numrows; j++) E.row[j].idx++;

  E.row[at].idx = at;

  E.row[at].size = len;
  E.row[at].chars = malloc(len + 1);
  memcpy(E.row[at].chars, s, len);
  E.row[at].chars[len] = '\0';

  E.row[at].rsize = 0;
  E.row[at].render = NULL;
  E.row[at].hl = NULL;
  E.row[at].hl_open_comment = 0;
  editorUpdateRow(&E.row[at]);

  E.numrows++;
  E.dirty++;
}

void editorFreeRow(erow *row) {
  free(row->render);
  free(row->chars);
  free(row->hl);
}

void editorDelRow(int at) {
  if (at < 0 || at >= E.numrows) return;
  editorFreeRow(&E.row[at]);
  memmove(&E.row[at], &E.row[at + 1], sizeof(erow) * (E.numrows - at - 1));
  for (int j = at; j < E.numrows - 1; j++) E.row[j].idx--;
  E.numrows--;
  E.dirty++;
}

void editorRowInsertChar(erow *row, int at, int c) {
  if (at < 0 || at > row->size) at = row->size; // bounds
  row->chars = realloc(row->chars, row->size + 2); // the new character + null byte
  // shift later chars along
  memmove(&row->chars[at + 1], &row->chars[at], row->size - at + 1);
  row->size++;
  row->chars[at] = c;
  editorUpdateRow(row);
  E.dirty++;
}

void editorRowAppendString(erow *row, char *s, size_t len) {
  row->chars = realloc(row->chars, row->size + len + 1);
  memcpy(&row->chars[row->size], s, len);
  row->size += len;
  row->chars[row->size] = '\0';
  editorUpdateRow(row);
  E.dirty++;
}

void editorRowDelChar(erow *row, int at) {
  if (at < 0 || at >= row->size) return;
  memmove(&row->chars[at], &row->chars[at + 1], row->size - at);
  row->size--;
  editorUpdateRow(row);
  E.dirty++;
}

Editor operations

These are more user-focused operations that can perform row operations but also managed the cursor at the same time. They do not manage the file offset though.

void editorInsertChar(int c){
  if (E.cy == E.numrows) { // the cursor is on the tilde after the last line
    editorInsertRow(E.numrows, "", 0);
  }
  editorRowInsertChar(&E.row[E.cy], E.cx, c);
  E.cx++;
}

void editorInsertNewline() {
  if (E.cx == 0) {
    editorInsertRow(E.cy, "", 0);
  } else {
    erow *row = &E.row[E.cy];
    editorInsertRow(E.cy + 1, &row->chars[E.cx], row->size - E.cx);
    row = &E.row[E.cy];
    row->size = E.cx;
    row->chars[row->size] = '\0';
    editorUpdateRow(row);
  }
  E.cy++;
  E.cx=0;
}

void editorDelChar() {
  if (E.cy == E.numrows) return;
  if (E.cx == 0 && E.cy == 0) return;

  erow *row = &E.row[E.cy];
  if (E.cx > 0) {
    editorRowDelChar(row, E.cx -1);
    E.cx--;
  } else {
    E.cx = E.row[E.cy - 1].size;
    editorRowAppendString(&E.row[E.cy - 1], row->chars, row->size);
    editorDelRow(E.cy);
    E.cy--;
  }
}

File I/O

char *editorRowsToString(int *buflen) {
  int totlen = 0;
  int j;
  for (j=0; j < E.numrows; j++)
    totlen += E.row[j].size + 1; // + 1 for newline
  *buflen = totlen; // so the caller can inspect how long the string is

  char *buf = malloc(totlen);
  char *p = buf;
  for (j=0; j<E.numrows; j++) {
    memcpy(p, E.row[j].chars, E.row[j].size);
    p += E.row[j].size;
    *p = '\n';
    p++;
  }

  return buf;
}
void editorOpen(char *filename) {
  free(E.filename);
  E.filename = strdup(filename); // copies the given string to new memory loc.

  editorSelectSyntaxHighlight();

  FILE *fp = fopen(filename, "r");
  if (!fp) die("fopen");

  char *line = NULL;
  size_t linecap = 0;
  ssize_t linelen;
  while ((linelen = getline(&line, &linecap, fp)) != -1) { // iterate over lines
    while (linelen > 0 && (line[linelen -1] == '\n' || line[linelen -1] == '\r'))
      linelen--;
    editorInsertRow(E.numrows, line, linelen);
  }
  free(line);
  fclose(fp);
  E.dirty = 0;
}

void editorSave() {
  if (E.filename == NULL) {
    E.filename = editorPrompt("Save as: %s (ESC to cancel)", NULL);
    if (E.filename == NULL) {
      editorSetStatusMessage("Save aborted");
      return;
    }
    editorSelectSyntaxHighlight();
  }

  int len;
  char *buf = editorRowsToString(&len);

  int fd = open(E.filename, O_RDWR | O_CREAT, 0644);
  if (fd != -1) {
    if (ftruncate(fd, len) != -1) {
      if (write(fd, buf, len) == len) {
        close(fd);
        free(buf);
        E.dirty = 0;
        editorSetStatusMessage("%d bytes written to disk", len);
        return;
      }
    }
    close(fd);
  }
  free(buf);
  editorSetStatusMessage("Can't save! I/O error: %s", strerror(errno));
}
  • getline() can be used to read lines from a file when we don’t know how much memory to allocate for each line. It allocates memory for the next line it reads, and sets the second argument to point to that memory. You can then feed it the pointer back, to try to reuse the memory next time you use getline().
  • We strip out the newline and CR before copying it into erow - we know that every erow represents a single line of text, so we don’t need to actually store those characters at the end.

Search

Search is implemented using the prompt. It loops through all the rows in the file, uses strstr() to see if there is a substring match, and then if so scrolls and moves the cursor to the row.

void editorFindCallback(char *query, int key) {
  static int last_match = -1;
  static int direction = 1;

  static int saved_hl_line;
  static char *saved_hl = NULL;

  if (saved_hl) {
    memcpy(E.row[saved_hl_line].hl, saved_hl, E.row[saved_hl_line].rsize);
    free(saved_hl);
    saved_hl = NULL;
  }

  if (key == '\r' || key == '\x1b') {
    last_match = -1;
    direction = 1;
    return;
  } else if (key == ARROW_RIGHT || key == ARROW_DOWN) {
    direction = 1;
  } else if (key == ARROW_LEFT || key == ARROW_UP) {
    direction = -1;
  } else {
    last_match = -1;
    direction = 1;
  }

  if (last_match == -1) direction = 1;
  int current = last_match;
  int i;
  for (i = 0; i < E.numrows; i++) {
    current += direction;

    // loops around the file
    if (current == -1) current = E.numrows - 1;
    else if (current == E.numrows) current = 0;

    erow *row = &E.row[current];
    char *match = strstr(row->render, query);
    if (match) {
      last_match = current;
      E.cy = current;
      E.cx = editorRowRxToCx(row, match - row->render);
      E.rowoff = E.numrows;

      saved_hl_line = current;
      saved_hl = malloc(row->rsize);
      memcpy(saved_hl, row->hl, row->rsize);
      memset(&row->hl[match - row->render], HL_MATCH, strlen(query));
      break;
    }
  }
}

void editorFind(){
  int saved_cx = E.cx;
  int saved_cy = E.cy;
  int saved_coloff = E.coloff;
  int saved_rowoff = E.rowoff;

  char *query = editorPrompt("Search: %s (ESC/Arrows/Enter)", editorFindCallback);
  if (query) {
    free(query);
  } else { // NULL query means they pressed ESC.
    E.cx = saved_cx;
    E.cy = saved_cy;
    E.coloff = saved_coloff;
    E.rowoff = saved_rowoff;
  }
}

Output

There are a few functions here that handle drawing the terminal output, scrolling, refreshing the screen, drawing the status bar, etc.

void editorScroll() {
  E.rx = 0;
  if (E.cy < E.numrows) {
    E.rx = editorRowCxToRx(&E.row[E.cy], E.cx);
  }
  if (E.cy < E.rowoff) { // is the cursor above the visible window?
    E.rowoff = E.cy;
  }
  if (E.cy >= E.rowoff + E.screenrows) {
    E.rowoff = E.cy - E.screenrows + 1;
  }
  if (E.rx < E.coloff) {
    E.coloff = E.rx;
  }
  if (E.rx >= E.coloff + E.screencols) {
    E.coloff = E.rx - E.screencols + 1;
  }
}
void editorDrawRows(struct abuf *ab) {
  int y;
  for (y = 0; y < E.screenrows; y++) {
    int filerow = y + E.rowoff;
    if (filerow >= E.numrows) {
      // Draw things that come after the rows
      if (E.numrows == 0 && y == E.screenrows / 3) {
        char welcome[80];
        int welcomelen = snprintf(welcome, sizeof(welcome),
                                  "Kilo editor -- version %s", KILO_VERSION);
        if (welcomelen > E.screencols) welcomelen = E.screencols;
        // Add spaces for padding to center the welcome message
        int padding = (E.screencols - welcomelen) / 2;
        if (padding) {
          abAppend(ab, "~", 1);
          padding--;
        }
        while (padding--) abAppend(ab, " ", 1);
        abAppend(ab, welcome, welcomelen);
      } else {
        abAppend(ab, "~", 1);
      }
    } else {
      // Draw the row
      int len = E.row[filerow].rsize - E.coloff;
      if (len < 0) len = 0;
      if (len > E.screencols) len = E.screencols;  // Truncate the len
      char *c = &E.row[filerow].render[E.coloff];
      unsigned char *hl = &E.row[filerow].hl[E.coloff];
      int j;
      int current_color = -1; // keep track of colour to keep number of resets down
      for (j=0; j<len; j++){
        // control characters
        if (iscntrl(c[j])) {
          char sym = (c[j] <= 26) ? '@' + c[j] : '?';
          abAppend(ab, "\x1b[7m", 4); // invert colours
          abAppend(ab, &sym, 1);
          abAppend(ab, "\x1b[m", 3);  // reset
          if (current_color != -1) {
            char buf[16];
            int clen = snprintf(buf, sizeof(buf), "\x1b[%dm", current_color);
            abAppend(ab, buf, clen);
          }

        } else if (hl[j] == HL_NORMAL) {
          if (current_color != -1) {
            abAppend(ab, "\x1b[39m", 5);
            current_color = -1;
          }
          abAppend(ab, &c[j], 1);
        } else {
          int color = editorSyntaxToColor(hl[j]);
          if (color != current_color) {
            current_color = color;
            char buf[16];
            int clen = snprintf(buf, sizeof(buf), "\x1b[%dm", color);
            abAppend(ab, buf, clen);
          }
          abAppend(ab, &c[j], 1);
        }
      }
      abAppend(ab, "\x1b[39m", 5); // reset at end of line
    }
    abAppend(ab, "\x1b[K", 3);  // clear the rest of the row before drawing
    abAppend(ab, "\r\n", 2);  // this means there's always an empty row at the
                              // bottom of the screen
  }
}

filerow above represents the offset row, whereas y represents the absolute row.

void editorDrawStatusBar(struct abuf *ab) {
  abAppend(ab, "\x1b[7m", 4);
  char status[80], rstatus[80];
  int len = snprintf(status, sizeof(status), "%.20s - %d lines %s",
                     E.filename ? E.filename : "[No Name]", E.numrows,
                     E.dirty ? "(modified)" : "");
  int rlen = snprintf(rstatus, sizeof(rstatus), "%s | %d/%d",
                      E.syntax ? E.syntax->filetype : "no ft", E.cy + 1, E.numrows);
  if (len > E.screencols) len = E.screencols; // bounds
  abAppend(ab, status, len);
  while (len < E.screencols) {
    if (E.screencols - len == rlen) { // The starting column index to start
                                      // printing rstatus
      abAppend(ab, rstatus, rlen);
      break;
    } else {
      abAppend(ab, " ", 1);
      len++;
    }
  }
  abAppend(ab, "\x1b[m", 3);
  abAppend(ab, "\r\n", 2);
}

void editorDrawMessageBar(struct abuf *ab) {
  abAppend(ab, "\x1b[K", 3);
  int msglen = strlen(E.statusmsg);
  if (msglen > E.screencols) msglen = E.screencols; // bounds
  if (msglen && time(NULL) - E.statusmsg_time < 5)
    abAppend(ab, E.statusmsg, msglen);
}
void editorRefreshScreen() {
  editorScroll();

  struct abuf ab = ABUF_INIT;
  abAppend(&ab, "\x1b[?25l", 6);  // hide cursor
  abAppend(&ab, "\x1b[H", 3);  // reposition cursor
  editorDrawRows(&ab);
  editorDrawStatusBar(&ab);
  editorDrawMessageBar(&ab);

  // Move the cursor
  char buf[32];
  // The ~[H~ escape sequence moves the cursor to the position given by the
  // coordinates. The +1 is to convert because the terminal uses 1-indexed values.
  snprintf(buf, sizeof(buf), "\x1b[%d;%dH", (E.cy - E.rowoff) + 1, (E.rx - E.coloff) + 1);
  abAppend(&ab, buf, strlen(buf));

  abAppend(&ab, "\x1b[?25h", 6);  // show cursor
  write(STDOUT_FILENO, ab.b, ab.len);
  abFree(&ab);
}

Below, the ... takes a varying number of arguments. Between va_start() and va_end() you can use va_arg() to get the next argument. va_start() needs to know the last argument before the variable arguments list starts, so it can know the address of the next arguments. In our case we don’t use va_arg(), but instead just pass ap to vsnprintf, which can format the string with a varying number of arguments.

void editorSetStatusMessage(const char *fmt, ...) {
  va_list ap;
  va_start(ap, fmt);
  vsnprintf(E.statusmsg, sizeof(E.statusmsg), fmt, ap);
  va_end(ap);
  E.statusmsg_time = time(NULL);
}

Input

These are the main user input functions. editorPrompt is similar to the main loop - it waits for user input and then runs a callback function on RET. editorProcessKeypress is basically a big case statement that checks the key enum and performs appropriate operations.

char *editorPrompt(char *prompt, void (*callback)(char *, int)) {
  size_t bufsize = 128;
  char *buf = malloc(bufsize);

  size_t buflen = 0;
  buf[0] = '\0';

  while (1) {
    editorSetStatusMessage(prompt, buf);
    editorRefreshScreen();

    int c = editorReadKey();
    if (c == DEL_KEY || c == CTRL_KEY('h') || c == BACKSPACE) {
      if (buflen !=0) buf[--buflen] = '\0';
    } else if (c == '\x1b') {
      editorSetStatusMessage("");
      if (callback) callback(buf, c);
      free(buf);
      return NULL;
    } else if (c == '\r') {
      if (buflen != 0) {
        // clear status message, return the user input
        editorSetStatusMessage("");
        if (callback) callback(buf, c);
        return buf;
      }
    } else if (!iscntrl(c) && c < 128) {
      if (buflen == bufsize - 1) {
        bufsize *= 2; // dynamically increase memory as user input grows
        buf = realloc(buf, bufsize);
      }
      buf[buflen++] = c;
      buf[buflen] = '\0';
    }
    if (callback) callback(buf, c);
  }
}

void editorMoveCursor(int key) {
  erow *row = (E.cy >= E.numrows) ? NULL : &E.row[E.cy]; // get current row

  switch (key) {
  case ARROW_LEFT:
    if (E.cx != 0) {
      E.cx--;
    } else if (E.cy > 0) {
        // Move to the row above
        E.cy--;
        E.cx = E.row[E.cy].size;
    }
    break;
  case ARROW_RIGHT:
    if (row && E.cx < row->size) { // limit horizontal scrolling by column width
      E.cx++;
    } else if (row && E.cx == row->size) {
      // Move to the row below
      E.cy++;
      E.cx = 0;
    }
    break;
  case ARROW_UP:
    if (E.cy != 0) {
      E.cy--;
    }
    break;
  case ARROW_DOWN:
    if (E.cy != E.numrows - 1) {  // Allow advancing past the screen, but not the file.
      E.cy++;
    }
    break;
  }

  // Limit the cursor to the end of the row. Fixes the case where
  // different rows have different widths and you move to the row above/below.
  row = (E.cy >= E.numrows) ? NULL : &E.row[E.cy];
  int rowlen = row ? row->size : 0;
  if (E.cx > rowlen) {
    E.cx = rowlen;
  }

}
void editorProcessKeypress() {
  static int quit_times = KILO_QUIT_TIMES;

  int c = editorReadKey();
  switch (c) {
  case '\r':
    editorInsertNewline();
    break;
  case CTRL_KEY('q'):
    if (E.dirty && quit_times > 0){
      editorSetStatusMessage("Warning! File has unsaved changes. "
                             "Press C-q %d more times to quit.", quit_times);
      quit_times --;
      return;
    }
    write(STDOUT_FILENO, "\x1b[2J", 4);  // clear screen
    write(STDOUT_FILENO, "\x1b[H", 3);  // reposition cursor
    exit(0);
    break;
  case CTRL_KEY('s'):
    editorSave();
    break;
  case HOME_KEY:
    E.cx = 0;
    break;
  case END_KEY:
    if (E.cy < E.numrows)
      E.cx = E.row[E.cy].size;  // move to end of the line
    break;
  case CTRL_KEY('f'):
    editorFind();
    break;
  case BACKSPACE:
  case CTRL_KEY('h'): // legacy - C-h produces "8", which used to represent backspace
  case DEL_KEY:
    if (c == DEL_KEY) editorMoveCursor(ARROW_RIGHT);
    editorDelChar();
    break;
  case PAGE_UP:
  case PAGE_DOWN:
    {

      // Set cursor y position to simulate scrolling the page
      if (c == PAGE_UP) {
        E.cy = E.rowoff;
      } else if (c == PAGE_DOWN) {
        E.cy = E.rowoff + E.screenrows - 1;
        if (E.cy > E.numrows) E.cy = E.numrows; // cap to end of file
      }

      // move the cursor
      int times = E.screenrows;
      while (times--)
        editorMoveCursor(c == PAGE_UP ? ARROW_UP : ARROW_DOWN);
    }
    break;
  case ARROW_UP:
  case ARROW_DOWN:
  case ARROW_LEFT:
  case ARROW_RIGHT:
    editorMoveCursor(c);
    break;

  // C-l traditionally refreshes the screen. don't do anything as we refresh by
  // default after each keypress.
  case CTRL_KEY('l'):
  case '\x1b':
    break;

  default:
    editorInsertChar(c);
    break;
  }

  quit_times = KILO_QUIT_TIMES;  // reset to 3
}

Main

The entry point. initEditor() initialises all the fields in the E struct. main() handles arguments and enters the main loop.

void initEditor () {
  E.cx = 0;  // horizontal cursor
  E.cy = 0;  // vertical cursor
  E.rx = 0;  // cursor index
  E.rowoff = 0;
  E.coloff = 0;
  E.numrows = 0;
  E.row = NULL;
  E.dirty = 0;
  E.filename = NULL;
  E.statusmsg[0] = '\0';
  E.statusmsg_time = 0;
  E.syntax = NULL;
  if (getWindowSize(&E.screenrows, &E.screencols) == -1) die("getWindowSize");
  E.screenrows -= 2;  // For the status bar and message bar
}
int main(int argc, char *argv[]) {
  enableRawMode();
  initEditor();

  if (argc >= 2) {
    editorOpen(argv[1]);
  }

  editorSetStatusMessage("HELP: C-Q: quit | C-S: save | C-f: find");

  while (1) {
    editorRefreshScreen();
    editorProcessKeypress();
  }
  return 0;
}

Log

Notes that I’m writing as I go.

Raw mode

By default the terminal starts in canonical/cooked mode, which captures a lot of user input rather than passing it straight to the program. Input is only sent to the program when you hit enter, and various keys have special terminal behaviour, like C-c and C-z.

Interestingly you can “break” your terminal by running Step 5, which sets some termios flags, and it has to be reset by the reset trick.

Step 15 disables various flags that nowadays are usually disabled by default (but it’s still good practice to disable them to enable “raw mode”).

C-s and C-q

C-s stops data from being transmitted to the terminal, and C-q resumes it. I haven’t used these before. Then can be disabled with the IXON termios flag.

EAGAIN

EAGAIN is returned by read() on timeout in Cygwin, instead of just returning 0. I’m not using Cygwin so I suspect it’s safe to remove that part.

VT100 escape sequences

In an escape sequence like \x1b[2J, J is the function and 2 is an argument to it. I hadn’t thought about this before - I think I had just treated “2J” as a whole.

The m command controls text attributes like bold (1), underscore (4), blink (5) and inverted colours (7).

ncurses uses the terminfo database to figure out the capabilities of a terminal and what the escape sequences for that terminal are. In our case we’re just hardcoding the VT100 sequences.

Home and End

Home and End can have multiple representations depending on the OS, which is why they’re added in multiple places in editorReadyKey() in step 52.

Hide the cursor when drawing

This is standard practice - the cursor might jump around the screen if we’re writing to it. This can be controlled with ?25h and ?25l, at least in later VT models.

Enums

If you set the first constant in an enum (as we do in step 48), then the remaining constants are incremented automatically.

Saving the file

A safer way to write the file would be to write it to a temporary file, ensure it succeeds safely, and then rename it to the desired location. This is mentioned in step 106.

openemacs

There’s a fork of the project that implements some emacs-like features (eg. the movement bindings).

About

My implementation of the kilo text editor


Languages

Language:C 99.0%Language:Shell 0.6%Language:Makefile 0.4%