unazed / python-micro--optimizations

a handful of optimizations, tricks and cool things i've learned while doing python

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

python-micro--optimizations

a handful of optimizations, tricks and cool things i've learned while doing python

This exists for the people that want to micro-optimize code and have the consequence of possibly more unreadable, unmanageable and possibly incompatible (yet faster) code.

return statements in single else clauses at the end of functions

def f(x):
  if x:
    return x
  else:
    return f(x+1)

Consider the following case; the else syntax is redundant as the prior return already breaks the flow of the program therefore there is no point for the redundant else clause. Surprisingly, even though this is an obvious simplification, I've seen it multiple times so make sure you don't mistakenly make it. Although, it is more of a design optimization than a performance optimization.

def f(x):
  if x:
    return x
  return f(x+1)

when you should be using if not x, if x, if x == 1, if x == 0, if x is None or if x == None

if not x evaluates to True for when: x -> 0

The condition does not evaluate to True for negatives or any positives -- nor any special float values like inf or nan

if x evaluates to True for when: 1 <= x <= inf or -inf <= x < 0

The condition does not evaluate to True just for 1, so keep in mind when doing if x (where x is arbitrary) that you're not assuming x to be 1 (unless within logical reason) and treating it as so.

if x == 1 evaluates to True for when: x -> 1

When doing any form of code-review, don't note something like this as a request for simplification to something like if x - since as explained above - unless the conditions exempt this malpractice, you cannot always assume that x is 1, but rather between -inf, -1 and 1, inf.

if x == 0 evaluates to True for when: x -> 0

Unlike previous explanations, this can be utmostly simplified to if not x.

if x is None evaluates to True for when: x -> NoneType

This is the preferred form of checking whether arbitrary x is None as it is faster (since it doesn't call an underlying __eq__).

if x == None evaluates to True for when: x -> NoneType

This method calls the underlying (yet seemingly unexistent/unimplemented/hidden -- clarify?) __eq__ function, which provides call overhead and execution overhead.

 unazed@unazed  ~  python3.7 -m timeit --setup "a = None" -c "a is None" 
10000000 loops, best of 5: 27.9 nsec per loop
 unazed@unazed  ~  python3.7 -m timeit --setup "a = None" -c "a == None" 
10000000 loops, best of 5: 35.9 nsec per loop

 unazed@unazed  ~  python3.6 -m timeit --setup "a = None" -c "a is None"
10000000 loops, best of 3: 0.0245 usec per loop
 unazed@unazed  ~  python3.6 -m timeit --setup "a = None" -c "a == None"
10000000 loops, best of 3: 0.0352 usec per loop

 unazed@unazed  ~  python3.2 -m timeit --setup "a = None" -c "a is None" 
10000000 loops, best of 3: 0.0283 usec per loop               
 unazed@unazed  ~  python3.2 -m timeit --setup "a = None" -c "a == None" 
10000000 loops, best of 3: 0.0362 usec per loop

 unazed@unazed  ~  python2.7 -m timeit --setup "a = None" -c "a is None" 
10000000 loops, best of 3: 0.0326 usec per loop
 unazed@unazed  ~  python2.7 -m timeit --setup "a = None" -c "a == None" 
10000000 loops, best of 3: 0.0547 usec per loop

 unazed@unazed  ~  python2.0
...
>>> start = time.time(); good(None); print("Took: %f seconds" % (time.time()-start))
# GOOD
Took: 0.001744 seconds
>>> start = time.time(); bad(None); print("Took: %f seconds" % (time.time()-start))
# BAD
Took: 0.001774 seconds

 unazed@unazed  ~  python1.6
... 
>>> start = time.time(); good(None); print("Took: %f" % (time.time()-start))
Took: 0.001356
>>> start = time.time(); bad(None); print("Took: %f" % (time.time()-start))
Took: 0.001371

Python 3.7.0a2, 3.6.3, 3.2.3, 2.7.14, 2.0.1, 1.6.1 showed performance differences for x == None versus x is None.

NOTE: x == 0 and not x have different execution speeds as well:

pure python is faster than regular expressions, so optimize where you can

import re
import sys


pattern = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")
match = pattern.match(sys.argv[1])

if not match:
  sys.exit("Failed to match against %r." % sys.argv[1])

for octet in match.groups():
  if not (0 <= int(octet) <= 255):
    sys.exit("Invalid octet %r." % octet)
print("Looks OK.")

Here is a simple IP address validator. For 1,000,000 cycles against 127.0.0.1 it took the logic 4.677 seconds to finish, however with a pure Python version:

import sys

try:
  groups = [int(i) for i in sys.argv[1].split('.')]
except ValueError:
  sys.exit("Failed to match.")

for octet in groups:
  if not (0 <= octet <= 255):
    sys.exit("Invalid octet %d." % octet)
print("Looks OK.")

Run against 1,000,000 cycles with 127.0.0.1 as input, the code took only 2.7 seconds to run.

RegEx: Took: 4.677459 seconds Pure: Took: 2.743565 seconds

NOTE: I didn't use map(...) since (a) it wouldn't catch any errors until converted to a list (b) when it is converted to a list, it'll remove the point of any speed provided by the iterator (c) a list comprehension is faster (d) a list comprehension is more Pythonic.

% formatting is still faster than str.format

 unazed@unazed  ~  python3.6 -m timeit -c "'%s %s%c' % ('hello', 'world', '!')"
100000000 loops, best of 3: 0.0164 usec per loop
 unazed@unazed  ~  python3.6 -m timeit -c "'{} {}{}'.format('hello', 'world', '!')"                                                                          
1000000 loops, best of 3: 0.395 usec per loop
 unazed@unazed  ~  python3.6 -m timeit -c "'{0} {1}{2}'.format('hello', 'world', '!')"                                               
1000000 loops, best of 3: 0.431 usec per loop

say no more (jokingly stated, please do use f-strings or str.format when possible as %-formatting is soon to-be deprecated)

NOTE: But to make it fair; here's the test results that came back for f strings, str.format and % formatting:

% formatting beats f strings roughly by 0.0001 seconds, and all of those beat str.format by 0.0002-0.0003-ish seconds

sys.exit, os._exit, exit, quit and SystemExit

sys.exit, exit and quit do all the same underlying task, the former sys.exit is preferred however opposed to the other two. However when you don't already have sys imported you might instead raise SystemExit which doesn't require importing the sys library.

os._exit is a bit bad, it doesn't call any clean-up procedures unlike the other three choices and just terminates the process.

while 1: instead of while True:

In Python 2: True is a global variable, so for every reference you make, it equates to a globals() variable retrieval which isn't very fast, in Python 3: True is a keyword, so it's a built in piece of syntax which doesn't need to be loaded, it's the constant integer equivalent of 1 and so False for 0.

Also, in Python you might notice that integers aren't globals nor locals - they're CONSTs, so they're faster to load than locals and globals, you might also know (if you've read what I've said above) that 1 is the same as True, so you can replace the two and it'll equalize performance over all Python versions 2 to 3.

implicit better than explicit?

  1. Always remember that open(...)'s default access mode is r/read, so there's no need to go write open("file", 'r')- rather - open("file")... oh and use context managers.

  2. socket.socket(...) (as of Python 3) doesn't need to take family and type parameters, calling socket.socket() will automatically create a AF_INET, SOCK_STREAM socket.

  3. You can replace trivial accounts of str.split(' ') with str.split(), although they are not technically the same as str.split(' ') splits on every space (0x20 ASCII) whereas str.split() splits on any account of whitespace (as found in strings.whitespace)-- it is, however, uncommon that you have to split specifically on the 0x20 character.

seq.copy(), seq[:] or list(seq)?

Note: list.copy() doesn't exist on Python 2

 unazed@unazed  ~  python3.6 -m timeit -n 10000000 --setup "seq = [1, 2, 3]" -c "seq.copy()" 
10000000 loops, best of 3: 0.128 usec per loop
 unazed@unazed  ~  python3.6 -m timeit -n 10000000 --setup "seq = [1, 2, 3]" -c "seq[:]"    
10000000 loops, best of 3: 0.112 usec per loop
 unazed@unazed  ~  python3.6 -m timeit -n 10000000 --setup "seq = [1, 2, 3]" -c "list(seq)"
10000000 loops, best of 3: 0.226 usec per loop

The literal notation seq[:] is faster than seq.copy() which are both faster than list(seq), but this is only for a three element list - here follows testing for lists with 256 Python 3.6.3 integers which all have sizes of 48 bytes, therefore making the list approximately, perfectly, 12KB.

 unazed@unazed  ~  python3.6 -m timeit -n 10000000 --setup "seq = [1<<160]*256" -c "seq.copy()" 
10000000 loops, best of 3: 1.36 usec per loop
 unazed@unazed  ~  python3.6 -m timeit -n 10000000 --setup "seq = [1<<160]*256" -c "seq[:]"
10000000 loops, best of 3: 1.35 usec per loop
 unazed@unazed  ~  python3.6 -m timeit -n 10000000 --setup "seq = [1<<160]*256" -c "list(seq)"
10000000 loops, best of 3: 1.39 usec per loop

The operations which are the fastest follow the same sequence as given for the smaller, three element list above. First seq[:], secondly seq.copy() and lastly list(seq).

Ahead of this, there'll be a table for different sized lists with the same 48 byte integer just a different magnitude. Also as a note, the cycles to be run will be smaller else it'll take hours for a single operation to execute; so the cycle size will be 100000.

Operation Amount of Elements Time Taken
seq.copy() 512 3.08 usec
seq[:] 512 3.06 usec
list(seq) 512 3.05 usec
seq.copy() 1024 5.94 usec
seq[:] 1024 5.93 usec
list(seq) 1024 5.85 usec
seq.copy() 2048 11.8 usec
seq[:] 2048 11.8 usec
list(seq) 2048 11.7 usec
seq.copy() 4192 19.7 usec
seq[:] 4192 19.7 usec
list(seq) 4192 18.6 usec
seq.copy() 8192 49.4 usec
seq[:] 8192 50.1 usec
list(seq) 8192 46.9 usec
seq.copy() 16384 100 usec
seq[:] 16384 100 usec
list(seq) 16384 93.6 usec
seq.copy() 32768 156 usec
seq[:] 32768 156 usec
list(seq) 32768 142 usec
seq.copy() 65536 315 usec
seq[:] 65536 313 usec
list(seq) 65536 286 usec

Now it's either that I'm drugged or there is actually a speed gain using list(seq) on bigger lists but I'm actually perplexed myself about this table, so on doing tests for smaller-sized lists ranging from values between the range of 2 and 256 assuming log_2 n ∈ Z with a cycle size of 1000000.

Operation Amount of Elements Time Taken
seq.copy() 2 0.129 usec
seq[:] 2 0.124 usec
list(seq) 2 0.228 usec
seq.copy() 4 0.131 usec
seq[:] 4 0.137 usec
list(seq) 4 0.236 usec
seq.copy() 8 0.150 usec
seq[:] 8 0.141 usec
list(seq) 8 0.237 usec
seq.copy() 16 0.189 usec
seq[:] 16 0.178 usec
list(seq) 16 0.271 usec
seq.copy() 32 0.263 usec
seq[:] 32 0.244 usec
list(seq) 32 0.382 usec
seq.copy() 64 0.433 usec
seq[:] 64 0.408 usec
list(seq) 64 0.518 usec
seq.copy() 128 0.781 usec
seq[:] 128 0.768 usec
list(seq) 128 0.835 usec
seq.copy() 256 1.35 usec
seq[:] 256 1.34 usec
list(seq) 256 1.39 usec

So this table agrees with the fact list(seq) is slower, so with this one can deduce that list(seq) is slower for all lists of magnitude under (thereabout) 1024.

But I don't want a rough assumption, so I'll test for all values between 512 and 1024 exclusively. All in steps of 32 because I don't want the table to be humongous.

Operation Amount of Elements Time Taken
seq.copy() 544 2.65 usec
seq[:] 544 2.65 usec
list(seq) 544 2.65 usec
seq.copy() 576 2.79 usec
seq[:] 576 2.80 usec
list(seq) 576 2.77 usec
seq.copy() 608 2.94 usec
seq[:] 608 3.00 usec
list(seq) 608 2.92 usec
seq.copy() 640 3.12 usec
seq[:] 640 3.11 usec
list(seq) 640 3.05 usec
seq.copy() 672 3.24 usec
seq[:] 672 3.27 usec
list(seq) 672 3.21 usec
seq.copy() 704 3.42 usec
seq[:] 704 3.38 usec
list(seq) 704 3.36 usec

Now, I cut the list early because (a) I'm not going to wait 20 minutes for the rest of it to complete and (b) I've found where the list(seq) speed intersects the other two operations' speeds. As you can see, for the first three records, seq.copy() and seq[:] draw in speed and list(seq) is 0.01 usec faster, then for the next record: seq.copy() is second, seq{:] is last, and list(seq) overtakes first, then finally the third record shows seq.copy() coming second again, seq[:] coming last and list(seq) staying in first for speed.

CONCLUSION:

seq.copy() for all sequences with magnitude below 544, will be second fastest to seq[:] and the slowest method will be list(seq); however, for all sequences with a bigger magnitude, list(seq) will be fastest and both seq[:]and seq.copy() will have a seemingly random disparity.

NOTE: Element size doesn't seem to affect the operations' speed.

should i use json, xml, yaml, csv or create my own format?

This presents many of the same issues that you would deal with when you're creating another form of implementation for something already standardized in many ways. Firstly, you're going to make many, many mistakes in the design and most likely not consider the many external use-cases it could propose, thus making the implementation worthless as a whole because unless you can find other places where the certain design can be integrated without any tweaking -- there's no generality therefore there's no real reusability.

JSON is a versatile, nice-to-read and flexible notation. It has the direct relation with Python thereby that the dict type follows a more open-ended support for objects.

XML is a markup language which is just a tad bit more explicit than something like JSON; but in my personal opinion it looks a bit bloated and monotone with the tags and it's less readable than JSON. Python has a third-party library for parsing XML, but you'll have to download it from PyPI.

YAML is a greatly user-friendly format which isn't really as common as any of the other two formats, but it's still comparatively different.

CSV is just a BTEC way of doing any of the above in a lazy fashion by just separating values (that hopefully don't have commas already) with a comma delimiter. Although as this format is very simple, it is as well extensible to different delimiters like perhaps \x00 or \xFF for user data.

For your own format, as I've said in the first part of my explanation; you have to make sure that your format can be applied anywhere without any possible incompatibilities, also you should never tie a metaphorical knot on it; if you leave it open-ended it'll be able to be put under one's completely different interpretation thus customizable and plugged into a different environment. You can see this in place for the four formats listed above as JSON, it's simple and maps x: to y where x is typically hashable. In XML, it's bare-bone but with the implementation of the human - it allows for creating theoretical contexts with a really simple system. But I can't deny it's not much more extensible than JSON. YAML and CSV both share the above traits interlinking JSON and XML, thus it should show a common pattern between these widely used notations. And should show how notations are generalized.

The only different thing about all of these ways of displaying information, is the way that it's written. Some are nicer to look at, some are nicer to write and some are nicer to parse.

len(list) == x

A faster alternative is to do list[x-1:], because of how Python handles slices, this won't cause an error like list[x-1] would (the -1 is for zero-indexing):

 unazed@unazed  ~  python3.6 -m timeit -n 20000000 --setup "a = [1, 2, 3]" -c "a[3:]"                                             
20000000 loops, best of 3: 0.0992 usec per loop
 unazed@unazed  ~  python3.6 -m timeit -n 20000000 --setup "a = [1, 2, 3]" -c "len(a) == 4"
20000000 loops, best of 3: 0.105 usec per loop

another way of getting the last element in a list

So, there's two [three] main ways of getting the last element in a list:

  • list[-1]
  • for when list = [1, 2, 3], list[2] (slower than list[-1]?)
  • list(reversed(list))[0] (/s)

But, believe it or not, there's a more case-specific and yet faster way of getting the last item without having to call any underlying/hidden nor explicit functions.

a = [...]

for i in a:
  ...
last_item = i

Logical enough. Python doesn't have block scopes, so this'd work as intended.

should i use bare sockets or should i create my own interface to bare sockets?

NOTE: Although this may seem oddly specific, this applies to the general concept of creating wrappers for lower level libraries.

Using sockets is cool, but so is using sockets and making your own interface. Repetitive code isn't very nice, but the overhead with the function calls isn't efficient. Shorter, concise and readable code is beautiful - but having ability to maximize your code's utility is always nice.

Speaking perfomance-wise, an OOP interface will always be slower than the raw, unabstracted interface to the library. When you're creating an interface the only things that you're preventing are semantical mistakes, but all you're really doing is squeezing a bunch of liquids into a container. Sure, they'll stay consistent wherever you pour them; no mistake about that, but any mistakes in the actual 'squeezing' bit, perhaps missing some functionality or failing to generalize it; then you'll just mix one container of incompatible liquid with some reactive surface which'll create a huge mess and practically create the inverse effect that you want to actually achieve.

Bare sockets are faster and more customizable; but their prologues are repeated so many times and equally so their epilogues.

import socket
import sys


sockfd = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sockfd.settimeout(2)
try:
  sockfd.connect((host, port))
except socket.timeout:
  print("couldn't connect to %s:%s" % (host, port))
  sys.exit(1)
...
sockfd.close()

6 references to socket, one import from sys, 4 references to sockfd and the overhead of try: ... except (,): ....

# newsocket.py

import socket


class Socket(object):
  def __init__(self, host, port, connect=True, timeout=2):
    self.host = host
    self.ip = ip
    self.timeout = timeout
    self._socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    if connect:
      self.connect()
      
  def connect(self):
    self._socket.settimeout(self.timeout)
    try:
      self._socket.connect((self.host, int(self.ip)))
    except (socket.timeout, ValueError):
      return False
    return True
  
  def close(self):
    self._socket.close()
    

# main.py

from newsocket import Socket

sockfd = Socket(host, port, False)

if sockfd.connect():
  print("Connected")
else:
  print("couldn't connect to %s:%s" % (host, port))
  
sockfd.close()

2 references to Socket, 3 references to sockfd and not a single sys.exit. A drastic improvement from the linear paradigm shown in the other example. But, if you have any sense you'd notice that, well clearly there's waaay more references to Socket and there is still a try: ... except (,): ... embedded within, just not seen in main.py. Well that's the point of an interface, and that displays the only real features that it provides, an interface will never be faster, besides fixing any possible mistakes that could've been made in the task of rewriting code.

list conversion. best methods?

So there's a few ways of converting non-list types into lists either for mutability - or, space-consumption:

  • [*tuple]
  • [i for i in tuple]
  • list(tuple)

Here are results for all of them with differing list magnitudes:

Unsurprisingly, [i for i in tup] was the slowest way as it created a redundant variable i, and the two other methods seem to carry on with the same gradient.

Disassembly for [i for i in tup]

  1           0 LOAD_CONST               0 (<code object <listcomp> at 0x7f7f4488b660, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<listcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_NAME                0 (tup)
              8 GET_ITER
             10 CALL_FUNCTION            1
             12 RETURN_VALUE

Disassembly of <code object <listcomp> at 0x7f7f4488b660, file "<dis>", line 1>:
  1           0 BUILD_LIST               0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                 8 (to 14)
              6 STORE_FAST               1 (i)
              8 LOAD_FAST                1 (i)
             10 LIST_APPEND              2
             12 JUMP_ABSOLUTE            4
        >>   14 RETURN_VALUE

Disassembly for [*tup]

  1           0 LOAD_FAST                0 (tup)
              2 BUILD_LIST_UNPACK        1
              4 RETURN_VALUE

Disassembly for list(tup)

  1           0 LOAD_GLOBAL              0 (list)
              2 LOAD_FAST                1 (tup)
              4 CALL_FUNCTION            1
              6 RETURN_VALUE

The overhead for the list comprehension is self-explanatory.

Theoretically, [*tup] would have been the fastest because there isn't any explicit function call nor is there a global which has to be loaded. It is also the nicest looking which is a benefit.

sometimes generators are worse to use

Recently I had argued that ''.join(f(x) for x in seq) is not only prettier, but faster than ''.join([f(x) for x in seq]). My reason being that the generator (f(x) for x in seq) would generate as each value is required, rather than fully evaluating the comprehension and creating a, possibly, huge data-set and passing it to the str.join; and so the inefficiency being that all the overhead on the CPU and RAM happens in one go, however it's apparent that the next(gen) overhead that occurs during iteration of a generator is much more inefficient.

smh.

About

a handful of optimizations, tricks and cool things i've learned while doing python