lukpueh / code-style-guidelines

Code Style Guidelines

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Code Style Guidelines

These guidelines provide examples of what to do (or not to do) when writing Python code for the projects of the Secure Systems Lab in the NYU Tandon School of Engineering. These guidelines are based upon @JustinCappos's experiences working with Stork, Guido van Rossum's Python style guidelines, and the experiences and suggestions of team members. Please give @JustinCappos feedback if there is anything you'd like to change.

Table of Contents

One of Guido's key insights in building Python is that code is read much more often than it is written. The guidelines provided here are intended to improve the readability of code and make it consistent across the wide spectrum of Python code. readability is the primary goal of the code written. The other features that your code must have are security, correctness, and robustness (notice that performance is not listed). The purpose of this document is to help to improve the readability of Secure Systems Lab code (which @JustinCappos believes strongly impacts the security, correctness, and robustness).

A style guide is about consistency. Consistency within a project is very important and since we're writing basically all of the code ourselves, there should be little reason for inconsistency.

There is only one good reason to break a rule in the style guideline and that's when applying the rule would make the code less readable, even for someone who is used to reading code that follows the rules.

Code lay-out

Indentation

Use 2 spaces per indentation level.

Tabs

Never use tabs to indent project code. There should be no tabs in Secure Systems Lab code.

Maximum Line Length

Try to limit most lines to a maximum of 79 characters. This certainly should be done for comments, but use common sense when applying this rule to code! I've seen developers who have a highly indented loop wrap a relatively short line across three separate lines to try to avoid going over 80 characters (don't do it!).

The preferred way of wrapping long lines is by using Python's implied line continuation inside parentheses, brackets and braces. If necessary, you can add an extra pair of parentheses around an expression, but sometimes using a backslash looks better. Make sure to indent the continued line appropriately; double-indentation (4 spaces) is prefered for line continuation, to distinguish it from functional indentation. The preferred place to break around a binary operator is after the operator, not before it. Some examples:

class Rectangle(Blob):
  def __init__(self, width, height,
    color='black', emphasis=None, highlight=0):
    if width == 0 and height == 0 and \
        color == 'red' and emphasis == 'strong' or \
        highlight > 100:
      raise ValueError("sorry, you lose")
    if width == 0 and height == 0 and (color == 'red' or
        emphasis is None):
      raise ValueError("I don't think so -- values are %s, %s" %
          (width, height))
    Blob.__init__(self, width, height,
        color, emphasis, highlight)

Blank Lines

Separate top-level function and class definitions with at least 5 blank lines.

Method definitions inside a class are separated by at least 2 blank lines.

Use blank lines in functions to indicate logical sections and help to offset comments.

Nesting

Where possible, for readability, avoid employing excessive levels of nesting and if and else clauses separated by a lot of code.

No:

if bizbaz:
  bizbaz.frobnicate(flimflam)
  print('Frobnication result: ' + repr(bizbaz))
  ...
  ...
  ...
else:
  raise YamException(...)

Yes:

  if not bizbaz:
    raise YamException(...)

  bizbaz.frobnicate(flimflam)
  print('Frobnication result: ' + repr(bizbaz))
  ...
  ...
  ...

If the not-bizbaz clause isn't going to return or raise an exception right away and you therefore need both if and else, it's slightly preferable to keep the shorter clause first, so that the else isn't too far from the if.

Imports

Imports should usually be on separate lines, e.g.:

Yes:

import os
import sys

No:

import sys, os

It is preferable to import an entire module than items from the module

Yes:

import subprocess

No:

from subprocess import Popen, PIPE

Definitely Not:

from subprocess import *

An exception to the last rule is when you must import items in a specific way for repy portability.

Imports are always put at the top of the file, just after any module comments and docstrings, and before module globals and constants.

Always use the absolute package path for all imports.

Avoid circular imports. This is where a.py imports b.py and then either b.py or a path of imports from b.py imports a.py. This does really odd things to Python, in particular if you perform any actions during import.

Try to avoid performing actions on module import. This doesn't play well with circular imports and is non-intuitive for most programmers.

Whitespace in Expressions and Statements

Pet Peeves

Avoid extraneous whitespace in the following situations:

Immediately inside parentheses, brackets or braces.

Yes:

spam(ham[1], {eggs: 2})

No:

spam( ham[ 1 ], { eggs: 2 } )

Immediately before a comma, semicolon, or colon:

Yes:

if x == 4: print x, y; x, y = y, x

No:

if x == 4 : print x , y ; x , y = y , x

Immediately before the open parenthesis that starts the argument list of a function call:

Yes:

spam(1)

No:

spam (1)

Immediately before the open bracket that starts an indexing or slicing:

Yes:

dict['key'] = list[index]

No:

dict ['key'] = list [index]

More than one space around an assignment (or other) operator to align it with another.

Yes:

x = 1
y = 2
long_variable = 3

No:

x             = 1
y             = 2
long_variable = 3

Other Recommendations

Always surround these binary operators with a single space on either side: assignment (=), augmented assignment (+=, -= etc.), comparisons (==, <, >, !=, <>, <=, >=, in, not in, is, is not), Booleans (and, or, not).

Use spaces around arithmetic operators:

Yes:

i = i + 1
submitted += 1
x = x * 2 - 1
hypot2 = x * x + y * y
c = (a + b) * (a - b)

Maybe not:

i=i+1
submitted +=1
x = x*2 - 1
hypot2 = x*x + y*y
c = (a+b) * (a-b)

Compound statements (multiple statements on the same line) are not allowed.

Yes:

if foo == 'blah':
  do_blah_thing()
do_one()
do_two()
do_three()

No:

if foo == 'blah': do_blah_thing()
do_one(); do_two(); do_three()

Definitely not:

if foo == 'blah': do_blah_thing()
else: do_non_blah_thing()

try: something()
finally: cleanup()

do_one(); do_two(); do_three(long, argument,
    list, like, this)

if foo == 'blah': one(); two(); three()

Comments

Comments that contradict the code are worse than no comments. Always make a priority of keeping the comments up-to-date when the code changes!

Comments should be complete sentences. If a comment is a phrase or sentence, its first word should be capitalized, unless it is an identifier that begins with a lower case letter (never alter the case of identifiers!).

If a comment is short, the period at the end can be omitted. Block comments generally consist of one or more paragraphs built out of complete sentences, and each sentence should end in a period.

You should use two spaces after a sentence-ending period.

When writing English, Strunk and White apply. This means you shouldn't have spelling or grammar errors. It also helps to make the project's code look more professional.

Block Comments

Block comments generally apply to some (or all) code that follows them, and are indented to the same level as that code. Each line of a block comment starts with a # and a single space (unless it is indented text inside the comment).

Paragraphs inside a block comment are separated by a line containing a single #.

Inline Comments

Use inline comments very sparingly.

An inline comment is a comment on the same line as a statement. Inline comments should be separated by at least two spaces from the statement. They should start with a # and a single space.

Inline comments are unnecessary and in fact distracting if they state the obvious. Don't do this:

x = x + 1                 # Increment x

But sometimes, this is useful:

x = x + 1                 # Compensate for border

Purpose of Comments

Comments should give the reader the context for why you are performing a specific action and indicate authorship and mindset.

It is very important that when you change code you didn't write you comment the change. I've spent days looking for bugs that people introduced because they changed code they thought they understood. Any time you make changes to a module or function you didn't write, use your initials in a comment describing the scope and purpose of the change.

Use the comment to explain what you are thinking when you make the change.

old code:

for line in file("foo"):
  print line.split()[1]

new code:

for line in file("foo"):
  # JAC: Need to check if there is a second word, if not then we
  # can skip the line because it isn't relevant to the output.   
  # Prior code threw an IndexError in this case.
  if len(line.split()) > 2:
    print line.split()[1]

Now when someone reads the new code they can understand why you changed what you did. If the original author reads your updated code, they can tell if you misunderstood their code and easily fix it (but if you misunderstood the code, this is a hint it needs better comments).

If you spend time struggling to understand something in code, it is useful to put a comment indicating you find it unclear but you think this is what is intended. This helps us to locate portions of the code that need to be cleaned up and also indicates "how" to clean the code.

Comments should describe why you are performing an action, not what action you are performing.

No:

i = i + 2     # add 2 to i

Yes:

i = i + 2     # skipping even numbers since even numbers > 2 are not prime

An exception to this rule is if you need to do some "magic operation" in the code.

# reuse the socket if it's recently been closed but is available 
socketobj.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

Hints for writing good comments

It is helpful many times to write comments as questions

# Is the file correctly signed?
if (foo.bar(fn)):

The "right time" to write comments is as you write the code (sometimes I even write them before the code). You will never have a better understanding of the code than when you write it.

If you are making assumptions then you should check that your assumptions are valid. If you can't test your assumptions then at least comment your code

total = 0
for item in list:
  # The list should contain only integers
  total = total + item

Comment quantity

There is no hard and fast rule for the number of comments a file should have. However, one way to check it is to read only the comments (not the code). If you could re-construct the code using only the comments, you are likely at the right level of comments.

Note that "well commented" modules may have more lines of comments than lines of code!

Documentation Strings

Conventions for writing good documentation strings (a.k.a. "docstrings") are immortalized in PEP 257.

Write docstrings for all public modules, functions, classes, and methods. Docstrings are not necessary for non-public methods, but you should have a comment that describes what the method does. This comment should appear after the def line.

Note that most importantly, the """ that ends a multiline docstring should be on a line by itself, and preferably preceded by a blank line, e.g.:

"""Return a foobang

Optional plotz says to frobnicate the bizbaz first.

"""

For one liner docstrings, it's okay to keep the closing """ on the same line.

Example documentation strings for files, modules, and classes.

Each module should have a header block that specifies its file name and explains the purpose (and any caveats or issues) of the module.

"""
<Program Name>
  storkreport.py

<Purpose>
  Provides Error Reporting/Logging Utility Functions.

  The functions are named loosely based on the syslog levels.  However,
  because syslog has 7 priority levels, and stork only has 4, the
  send_error and send_out_* series only contain debug, info, warning, and
  err (notice, alert, and emergency were omitted).  send_syslog supports
  all seven levels, however.

  Current syslog implementation assumes local syslog (no remote servers),
  and a facility of LOG_USER (defaults from the syslog module).

  On July 10th, 2007, Justin is doing a pretty substantial rewrite.   Blame
  him for any resulting cruft.
"""

Documentation strings are needed for every function in a module that will be called from other modules. It is recommended to create comment blocks for long functions even if they are private to a module. There are 5 main parts of a function comment block: purpose, arguments, exceptions, side effects, and return value. The purpose is to specify everything someone who calls your function needs to know (so they don't need to look at the code). For example:

def redirect_stdout(stream):
  """
  <Purpose>
    Redirects the standard output stream (stdout) to a new file stream.
    If this is the first time that output has been redirected, the
    original stdout stream will be saved for use with the restore_stdout
    function.

  <Arguments>
    stream:
           The new file stream for stdout.

  <Exceptions>
    TypeError on bad parameters.

  <Side Effects>
    Changes sys.stdout.

  <Returns>
    None.
  """

In general the use of classes is discouraged (link to below), but in the cases it is clear classes should be used, use the following format:

class single_conn(Thread):
  """
  <Purpose>
    Wraps up the client side of arizonacomm into a single class.
  <Side Effects>
    Defaults to running itself in a thread.
  <Example Use>
    # open a connection
    connection = single_conn()
    # write information to the connected party
    connection.send("hello")
    # and disconnect without checking if they have said anything
    connection.disconnect()
  """   

Classes that are used for exceptions can be written more succinctly (since the purpose, side effects, example use are obvious).

class UserError(Exception):
  """This exception indicates the user provided us with bad input"""

Naming Conventions

Descriptive Naming Styles

Use descriptive variable and function names.

Put at least one adjective in each variable name like slicename or 'inputfileobj or deststring. Also make sure that the type of the variable is readable by looking at the name. Also, the word "file" is ambiguous, use fileobj or filename (fn is an acceptable abbreviation for filename).

These rules are especially important for function arguments or variables that have long lifespans.

Yes:

inputfileobj = file(inputfn)   # fn is an acceptable abbreviation for file name
deststring = "abc" + currentstring
slicename = "uw_seattle"

No:

inobj = file(filename)
dest = "abc" + current
name = "uw_seattle"

Definitely not:

i = i + 1

The only case where it is okay to use single letter variable names is for arguments passed into a constructor or using e for an exception. For example:

class foo:
  ... # doc string omitted
  itemcount = None
  maxitemcount = None
  itemlist = None
  def __init__(self, i, m):
    self.itemcount = i
    self.maxitemcount = m
    self.itemlist = []

  def popanitem(self):
    try:
      return self.itemlist.pop()
    except KeyError, e:
      # Raise an error, the list is empty...
      raise KeyError("Can't pop an empty foo")

General Variable Names

Should be the same as function names, always use lower case, using an underscore if a separation is needed, such as: popanitem or pop_an_item

Constants

Variables which are set once and never change value should be all upper case, if a seperator is needed use an underscore.

Ignored Values

The variable name junk should be used in cases where a returned valued must be stored but the variable itself is unneeded.

Yes:

junk, surname = get_full_name()
print("Hello, Mr. " + surname)

No:

for _ in range(10):     
  do_something()

Yes:

for i in range(10):
  print(i)

No:

firstname, surname = get_full_name()
print("Hello, Mr. " + surname)

The majority of tools for static code analysis complain if a variable is assigned but not used thereafter. By convention, unused values are assigned to i, j, k, and _ (underscore character) and are understood as such. However, we think junk is more readable than _ and easier for newcomers to understand.

Exception Names

Because exceptions should be classes, the class naming convention applies here. However, you should use the suffix "Error" on your exception names (if the exception actually is an error).

Global Variable Names

(Let's hope that these variables are meant for use inside one module only.) The conventions are about the same as those for functions.

If you can do it without globals in an intelligent way, then don't use globals. When you must use globals then explain why you need to use globals. An example that uses globals in a good way is a program that keeps a cache.

Function Names

Function names should be lowercase, with words separated by underscores as necessary to improve readability.

mixedCase is allowed only in contexts where that's already the prevailing style (see for example the method names in threading.py), to retain backwards compatibility.

Function and method arguments

Always use self for the first argument to instance methods.

If a function argument's name clashes with a reserved keyword, it is generally better to append a single trailing underscore rather than use an abbreviation or spelling corruption. Thus print_ is better than prnt. (Perhaps better is to avoid such clashes by using a synonym.)

Programming Recommendations

Comparisons to singletons like None should always be done with is or is not, never the equality operators.

Also, beware of writing if x when you really mean if x is not None -- e.g. when testing whether a variable or argument that defaults to None was set to some other value. The other value might have a type (such as a container) that could be false in a boolean context!

Use class-based exceptions

String exceptions in new code are forbidden, because this language feature is being removed in Python 2.6.

Modules or packages should define their own domain-specific base exception class, which should be subclassed from the built-in Exception class. Always include a class docstring. E.g.:

class MessageError(Exception):
  """Base class for errors in the email package."""

Class naming conventions apply here, although you should add the suffix "Error" to your exception classes, if the exception is an error. Non-error exceptions need no special suffix.

When raising an exception, use raise ValueError('message') instead of the older form raise ValueError, 'message'.

The paren-using form is preferred because when the exception arguments are long or include string formatting, you don't need to use line continuation characters thanks to the containing parentheses.

When catching exceptions, mention specific exceptions whenever possible instead of using a bare except: clause.

For example, use:

try:
  import platform_specific_module
except ImportError:
  platform_specific_module = None 

A bare except: clause will catch SystemExit and KeyboardInterrupt exceptions, making it harder to interrupt a program with Control-C, and can disguise other problems. If you want to catch all exceptions that signal program errors, use except Exception:.

A good rule of thumb is to limit use of bare except clauses to two cases:

  1. If the exception handler will be printing out or logging the traceback; at least the user will be aware that an error has occurred.

  2. If the code needs to do some cleanup work, but then lets the exception propagate upwards with raise. try...finally is a better way to handle this case.

Additionally, for all try/except clauses, limit the try clause to the absolute minimum amount of code necessary. Again, this avoids masking bugs.

Yes:

try:
  value = collection[key]
except KeyError:
  return key_not_found(key)
else:
  return handle_value(value)

No:

try:
  # Too broad!
  return handle_value(collection[key])
except KeyError:
  # Will also catch KeyError raised by handle_value()
  return key_not_found(key)

Use assert sparingly.

Assert has some tricky semantics because it is a statement, not a function. Thus you cannot call assert with a string to raise in the logical way

assert(x = y, 'x and y must be equal')   # BAD CODE

The above code will actually always be true because the tuple (bool, 'string') is truthy.

You should use assertions only when it is truly and internal error (i.e. somewhere it should be impossible to reach. Even so, it is fine to log / exitall instead.

Use string methods instead of the string module.

String methods are always much faster and share the same API with unicode strings. Override this rule if backward compatibility with Pythons older than 2.0 is required.

Use ''.startswith() and ''.endswith() instead of string slicing to check for prefixes or suffixes.

startswith() and endswith() are cleaner and less error prone. For example:

Yes:

if foo.startswith('bar'):

No:

if foo[:3] == 'bar':

Object type comparisons should always use isinstance() instead of comparing types directly.

Yes:

if isinstance(obj, int):

No:

if type(obj) is type(1):

When checking if an object is a string, keep in mind that it might be a unicode string too! In Python 2.3, str and unicode have a common base class, basestring, so you can do:

if isinstance(obj, basestring):

The types module has the StringTypes type defined for that purpose, e.g.:

import types
if isinstance(obj, types.StringTypes):

For sequences, (strings, lists, tuples), use the fact that empty sequences are false.

Yes:

if not seq:
if seq:

No:

if len(seq)
if not len(seq)

Don't write string literals that rely on significant trailing whitespace. Such trailing whitespace is visually indistinguishable and some editors (or more recently, reindent.py) will trim them.

Don't compare boolean values to True or False using ==

Yes:

if greeting:

No:

if greeting == True:

Worse:

if greeting is True:

However, there are a few cases where a function may return True, False, or other values. In these cases, checking if a value with an unknown type is True or False, is allowed.

Use the % string formatting operator only when necessary

The string formatting operator % provides a convenient way to do printf-like substitution of variables in strings. It's better in many cases to put variables in line when you don't need the more advanced options (like fixed length fields). The issue is that the string formatting operator makes understanding the output difficult in many cases. For example in the following example, it's difficult to read the code and determine what the resulting output will look like:

print "python %s %s %s/vesselinfodir/ %s/ > /tmp/carter.out 2> /tmp/carter.err"%(carter_script, dist_char, prefix,prefix)

A better way to write this:

print "python "+carter_script+" "+dist_char+" "+prefix+"/vesselinfodir/ "+prefix+"/ > /tmp/carter.out 2> /tmp/carter.err"

Use else statements for handling errors, not for normal flow in most cases.

For example, if you have a function that is supposed to take a positive integer argument, don't do the following:

if x > 0:
   ...
else: # x must be 0
   ...

It may be the case that x is negative, a different type, etc. Instead do the following:

if x > 0:
   ...
elif x == 0: # x must be 0
   ...
else:
   raise InternalError('Expected x to be a positive integer, not "'+str(x)+'"')

Potpourri

Unix not Windows style text files

If you develop on Windows make sure you use dos2unix before checking in.

Avoid objects

Objects are a terrific programming tool when used correctly, but make code nearly unreadable when used poorly. Unfortunately, even experienced developers have a hard time knowing how to correctly use objects. About 80% of the code one could write is equally good with and without objects. 10% is easier with objects and 10% is easier without them. Secure Systems Lab code should use objects only in very rare cases. The 90% of the code where objects can be avoided without significant impact, should be written without objects.

No lambda functions or lisp-esque code

Don't use lambda functions. Don't use map, flatten, etc. It's best not to use these functions and to write longer code that produces the same result instead because using these functions makes your code very dense and difficult to read.

Do not use syntax like:

must_testy = [must_test(line) for line in changed.split("\n")]

Write test cases

Since you're going to test your module anyways before integrating it into the rest of the project (right?), it makes sense to do it in a reusable way. This way when you make changes later, you can immediately test to see if anything is broken.

Write 90% tests. The test should make sure that it will catch 90% of the potential problems with the function. Avoid writing 50% tests (that only check a common case or two) and avoid writing 99.9% tests because it will consume too much of your time.

All "external" functions must be tested. Internal functions (starting with __) are up to your discretion.

Never use short circuit evaluation to produce output

No:

if synched or askokcancel("Change slice without synching?", "You
have changes that are not synched with the repository. These changes
will be lost if you change the slice right now. \n\nClick OK to change
slice without synching.", default=CANCEL):
  # then clause
else:
  # else clause

Yes:

if not synched:
  if askokcancel("Change slice without synching?", "You have
      changes that are not synched with the repository. These changes will
      be lost if you change the slice right now. \n\nClick OK to change
      slice without synching.", default=CANCEL):
    #then clause
  else:
    #else clause
else:
  #then clause

or if the then and else clauses are large write:

if not synched:
  isok = askokcancel("Change slice without synching?", "You have
      changes that are not synched with the repository. These changes will
      be lost if you change the slice right now. \n\nClick OK to change
      slice without synching.", default=CANCEL):
  else:
    isok = True

if isok:
  #then clause
else:
  #else clause

Writing the code the first way is hard for someone to read. The second two ways make it more clear that you are relying on the short circuit evaluation to perform or avoid the evaluation of the askokcancel function.

Don't use os.popen or os.system

These are deprecated and we've found bugs in them. Don't use them! Use subprocess instead.

Don't use subprocess.Popen with a string argument

(See also SeattleTestbed/attic#508)

Call subprocess.Popen with a first argument that looks like ['command', 'arg1', 'arg2', ...]. Do not use 'command arg1 arg2 ...'!

Avoid changing directory

(See also SeattleTestbed/attic#487)

Changing the current directory can cause multithreaded programs to break in scary ways and is usually an indication of bad programming style. Don't do it!

Don't use mutable objects as argument defaults

(See also SeattleTestbed/attic#828)

If mutable objects are modified in the function (or returned and modified outside of the function), the changes persist to future function calls. If you want the default of an empty list or dict, for example, use a default of None and then check for this value in the function and assign an empty list/dict inside the function.

About

Code Style Guidelines