ToTable
Contents
About
See the sphinx version of totable help.
Pythonic access to tokyo cabinet table database api. (NOTE: The original cython code was from pykesto.) The aims is to provide a simple syntax to load and query data in a table. Most of the work is handled by the Col query interface. e.g.
>>> from totable import ToTable, Col >>> tbl = ToTable('t.tct', 'w') >>> result = tbl.select(Col('age') > 18, Col('name').startswith('T'))
to allow querying columns with numbers and letters transparently. Even though tokyo cabinet stores all values as strings. And more syntatic sugar below.
Install
first, install Tokyo-Cabinet source, then, from a the directory containing this file:
# requires cython for now. $ cython src/ctotable.pyx $ python setup.py build_ext -i # test $ PYTHONPATH=. python totable/tests/test_totable.py # install $ sudo python setup.py install
Example Use
Make some fake data. Note it works just like a DBM or dictionary, except that the values themselves are dictionaries.
>>> from totable import ToTable, Col >>> tbl = ToTable('doctest.tct', 'w') >>> fnames = ['fred', 'jane', 'john', 'mark', 'bill', 'ted', 'ann'] >>> lnames = ['smith', 'cox', 'kit', 'ttt', 'zzz', 'ark', 'ddk'] >>> for i in range(len(fnames)): ... tbl[str(i)] = {'fname': fnames[i], 'lname': lnames[i], ... 'age': str((10 + i) * 2)} ... tbl[str(i + len(fnames))] = {'fname': fnames[i], ... 'lname': lnames[len(lnames) - i - 1], ... 'age': str((30 + i) * 2)} >>> len(tbl) 14
Col
Col, as sent to the select method makes it easy to do queries on a database the format is Col(colname) == 'Fred' where colname is one of the keys in the dictionary items in the database. or can use kwargs to select()
>>> tbl.select(lname='cox') [('1', {'lname': 'cox', 'age': '22', 'fname': 'jane'}), ('12', {'lname': 'cox', 'age': '70', 'fname': 'ted'})]
though using Col gives more power
startswith
>>> results = tbl.select(Col('fname').startswith('j')) >>> [d['fname'] + ' ' + d['lname'] for k, d in results] ['jane cox', 'jane ark', 'john kit', 'john zzz']
endswith
#and combine queries by sending them in together. >>> results = tbl.select(Col('fname').startswith('j'), Col('lname').endswith('k')) >>> [d['fname'] + ' ' + d['lname'] for k, d in results] ['jane ark']
like
this works like an sql query with '%' on either end. (dont attach those values to the query!). so to get everyone with and 'e' in their firstname...
>>> r = tbl.select(Col('fname').like('e')) >>> sorted(set([v['fname'] for k, v in r])) ['fred', 'jane', 'ted']
in_list
return row that exactly match 1 of the values in the list.
>>> r = tbl.select(Col('fname').in_list(['ted', 'fred'])) >>> sorted(set([v['fname'] for k, v in r])) ['fred', 'ted'] >>> r = tbl.select(Col('age').in_list([20, 70])) >>> sorted(set([v['age'] for k, v in r])) ['20', '70']
between
use for number querying between a min and max. includes the endpoints.
>>> r = tbl.select(Col('age').between(68, 70)) >>> [v['age'] for k, v in r] ['68', '70']
numeric queries (richcmp)
in TC, everything is stored as strings, but you can force number based comparisons with ToTable by using (you guessed it) a number. Or using a string for non-numeric comparisons.
>>> results = tbl.select(Col('age') > 68) >>> [d['age'] for k, d in results] ['70', '72']
combining queries
just add multiple Col() arguments to the select() call and they will be essentially and'ed together.
>>> results = tbl.select(Col('age') > 68, Col('age') < 72) >>> [d['age'] for k, d in results] ['70']
Negate(~)
for example get everything that's not a given value...
>>> results = tbl.select(~Col('age') <= 68) >>> [d['age'] for k, d in results] ['70', '72'] #all rows where fname is not 'jane' >>> results = tbl.select(~Col('fname') != 'jane') >>> 'jane' in [d['fname'] for k, d in results] False
Regular Expression Matching
supports normal regular expression characters "[ $ ^ | " , etc.
>>> results = tbl.select(Col('fname').matches("a")) >>> sorted(set([d['fname'] for k, d in results])) ['ann', 'jane', 'mark'] >>> results = tbl.select(Col('fname').matches("^a")) >>> sorted(set([d['fname'] for k, d in results])) ['ann']
Offset/Limit
just like SQL, yo.
>>> results = tbl.select(Col('age') < 68, limit=1) >>> len(results) 1
order
currently only works for string keys. use '-' for descending and '+' for ascending
>>> [v['fname'] for k, v in tbl.select(lname='cox', order='-fname')] ['ted', 'jane'] # ascending >>> [v['fname'] for k, v in tbl.select(lname='cox', order='+fname')] ['jane', 'ted']
values
TC is a key-value store, but it also acts as a table. it may be convenient to get just the values as you'd expect from a database table. Note in all examples above, the 'k'ey is not used, only the value dictionary. This can be made simpler with 'values_only'. When 'values_only' is True, some python call overhead is removed as well.
- ::
>>> tbl.select(Col('fname').matches("^a"), values_only=True) [{'lname': 'ddk', 'age': '32', 'fname': 'ann'}, {'lname': 'smith', 'age': '72', 'fname': 'ann'}]
Schemaless
since it's schemaless, you can add anything
>>> tbl['weird'] = {"val": "hello"} >>> tbl['weird'] {'val': 'hello'}
delete
delete as expected for a dictionary interface.
>>> del tbl['weird'] >>> print tbl.get('weird') None
put
encapsulates put, putkeep and putcat with a mode kwarg that takes 'p' or 'k' or 'c' respectively.
>>> tbl.put('a', {'a': '1'}, mode='p') >>> tbl.put('a', {'a': '2'}, mode='k') 'keep' >>> assert tbl['a'] == {'a': '1'} >>> tbl.put('b', {'a': '3'}, mode='k') 'put' >>> tbl.put('a', {'b': '99'}, 'c') >>> assert tbl['a'] == {'a': '1', 'b': '99'}
Performance Tuning
Tokyo Cabinet allows you to tune or optimize a table. the available parameters are:
- bnum specifies the number of elements of the bucket array. Suggested size of 'bnum' is about from 0.5 to 4 times of the number of all records to be stored. default is about 132K.
- apow specifies the size of record alignment by power of 2. The default value is 4 standing for 2^4=16.
- fpow specifies the maximum number of elements of the free block pool by power of 2. The default value is 10 standing for 2^10=1024.
- opts specifies options by bitwise-or (|):
- 'TDBTLARGE' must be specified to use a database larger than 2GB. (you must also specify a config flag when compiling the TC library to enable this)
- 'TDBTDEFLATE' use Deflate encoding.
- 'TDBTBZIP' use BZIP2 encoding.
- 'TDBTTCBS' use TCBS encoding.
The other parameters: cache and mmap_size are explained below.
tune
The arguments can be sent to the constructor.
>>> import totable >>> t = ToTable("some.tct", 'w', bnum=1234, fpow=6, \ ... opts=totable.TDBTLARGE | totable.TDBTBZIP) >>> t.close()
optimize
optimize is called on an database opened with mode='w'. if no arguments are specified, it will automatically adjust 'bnum' (only) according to the number of elements in the table.
>>> t = ToTable("some.tct", 'w') # ... add some records ... >>> t.optimize() True
mmap_size
mmap_size is the size of mapped memory. default is 67,108,864 (64MB) set in the constructor. this is xmsiz in TC parlance.
>>> t.close() >>> t = ToTable("some.tct", 'w', mmap_size=128 * 1e6) # ~128MB.
cache
TC also allows setting various caching parameters. * rcnum is the max number of records to be cached. default is 0 * lcnum is the max number of leaf-nodes to be cached. default is 4096 * ncnum is the max number of non-leaf nodes cached. default is 512 these also must be set in the constructor.
>>> t.close() >>> t = ToTable("some.tct", 'w', rcnum=1e7, lcnum=32768)
index
create or delete a 's'tring or 'd'ecimal index on a column for faster queries.
# create a decimal index on the number column 'age'. >>> tbl.create_index('age', 'd') True # create a 'string index on the string column 'fname'. >>> tbl.create_index('fname', 's') True # remove the index. >>> tbl.delete_index('fname') True # optimize the index >>> tbl.optimize_index('age') True
clear
remove all records from the db.
>>> len(tbl) 16 >>> tbl.clear() >>> len(tbl) 0
transaction
do stuff in a transaction. a rollback() is performed on any exceptions.
>>> try: ... with transaction(tbl): ... tbl['zzz'] = {'a': '4'} ... 1/0 ... except: pass >>> 'zzz' in tbl False
See Also
- tc nice c-python bindings for all of the tokyo cabinet db types including the table
- pykesto the project from which this library is taken. aims to provide transactions on top of tokyo cabinet .
- to help out, see TODO list at top of ctcable.pyx
- tokyo cabinet database api http://1978th.net/tokyocabinet/spex-en.html#tctdbapi