Rinocloud python is an easy way to organise scientific datasets in python. The library is open source (MIT License), so you are free to modify it to suit your needs.
Typically, when doing simulations, experiments or modelling, you have a bunch of parameters, which we call metadata, and an actual dataset, which is a big list of numbers. This library aims to make taking care of the metadata and dataset easy.
So far the library has mainly been developed and tested on Python 2.7. We will add proper Python 3 support soon.
pip install rinocloud
import rinocloud as rino
rino.api_key = '<Your API Token Here>'
With numpy
import numpy as np
import rinocloud as rino
rino.api_key = "<your api key>"
rino.set_local_path('data/', create_dir=True)
r = rino.Object()
r.set_name("file.txt")
r.slope = 2
r.constant = 3
# we will write x and y to a file and upload it, so we don't want them uploaded as metadata
# so we mark them with an underscore
r._x = np.linspace(0, 10, 11)
r._y = r._x * r.slope + r.constant
np.savetxt(r.filepath, np.vstack((r._x, r._y)).T)
r.save_local_metadata()
r.upload()
Pure python, without numpy
import rinocloud as rino
rino.api_key = "<your api key>"
rino.set_local_path('data/', create_dir=True)
r = rino.Object()
r.set_name("file.txt")
r.slope = 2
r.constant = 3
r._x = range(10)
# y = slope * x + constant
r._y = [i * r.slope + r.constant for i in r._x]
with open(r.filepath, 'w') as outfile:
    for (i, j) in zip(r._x, r._y):
        outfile.write("%1.2lf, %1.2lf\n" % (i, j))
r.save_local_metadata()
r.upload()
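The file written by the pure Python example is plain comma-separated text, so it can be read back with the standard library alone. The sketch below writes the same kind of file to a temporary directory and parses it again. `write_xy` and `read_xy` are hypothetical helper names used for illustration; they are not part of rinocloud.

```python
import os
import tempfile


def write_xy(path, xs, ys):
    """Write paired values as 'x, y' lines, matching the format used above."""
    with open(path, "w") as outfile:
        for x, y in zip(xs, ys):
            outfile.write("%1.2f, %1.2f\n" % (x, y))


def read_xy(path):
    """Read the 'x, y' lines back into two lists of floats."""
    xs, ys = [], []
    with open(path) as infile:
        for line in infile:
            if not line.strip():
                continue  # skip blank lines
            x_text, y_text = line.split(",")
            xs.append(float(x_text))
            ys.append(float(y_text))
    return xs, ys


slope, constant = 2, 3
xs = list(range(10))
ys = [x * slope + constant for x in xs]  # y = slope * x + constant

# A temporary folder stands in for the 'data/' directory (assumption).
path = os.path.join(tempfile.mkdtemp(), "file.txt")
write_xy(path, xs, ys)
read_xs, read_ys = read_xy(path)
```

Because the values are written with two decimal places, anything that needs more precision would need a different format string.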
q = rinocloud.Query()
# searches data where x is "A22" and y is greater than 2.2
q.filter(x="A22", y__gt=2.2)
list_of_objects = q.query()
c = rinocloud.Collection()
c.set_name("folder", create_dir=True)
o1 = rinocloud.Object()
o1.set_name("a.txt")
o2 = rinocloud.Object()
o2.set_name("b.txt")
c.add([o1, o2])
# collections can have queryable metadata too
c.x = 3
c.sample_id = "A3303"
with open(o1.filepath, 'w') as out:
    out.write('1\n2\n3')
with open(o2.filepath, 'w') as out:
    out.write('1\n2\n3')
c.upload()
There are a lot of query types; have a look at the documentation for more.
The Rinocloud library uses objects to store the metadata associated with a file. To create an object called 'obj', enter:
obj = rinocloud.Object()
You can set the path that rinocloud will look in, and save files to, with
rinocloud.set_local_path('data/'[, create_dir=False])
If create_dir=True, the directory is created if it doesn't exist. create_dir defaults to False.
You can set the name of the object with
obj.set_name("file.txt")
print(obj.filepath)
>>> file.txt
rinocloud.set_local_path('data/', create_dir=True)
obj.set_name("file.txt")
print(obj.filepath)
>>> data/file.txt
You can input metadata as keyword arguments on creation:
obj = rino.Object(key1=value1, key2=value2)
or to set the values after creation:
obj.key1 = value1
obj.key2 = value2
You can use obj.filepath to save whatever you want locally, and obj.save_local_metadata() to save the metadata to a JSON file:
np.savetxt(obj.filepath, [1,2,3,4,5])
obj.save_local_metadata()
If we were saving into a folder called data/ (set by using rinocloud.set_local_path('data/')), we would then have a folder structure of
data/
    file.txt # raw data
    file.txt.json # holds metadata
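As a rough illustration of this layout, the sketch below writes a data file and a JSON sidecar next to it using only the standard library. It is not how rinocloud is implemented: `Record` and `public_metadata` are hypothetical names, and the exact JSON that save_local_metadata() produces may differ. It does follow the convention from the examples above that attributes marked with a leading underscore are kept out of the metadata.

```python
import json
import os
import tempfile


class Record(object):
    """Hypothetical stand-in for an object carrying metadata attributes."""
    pass


def public_metadata(obj):
    """Collect attributes whose names do not start with an underscore."""
    return {k: v for k, v in vars(obj).items() if not k.startswith("_")}


r = Record()
r.slope = 2
r.constant = 3
r._x = [0, 1, 2]  # raw data, kept out of the metadata

folder = tempfile.mkdtemp()  # stands in for 'data/' (assumption)
data_path = os.path.join(folder, "file.txt")
meta_path = data_path + ".json"

with open(data_path, "w") as out:  # raw data
    out.write("\n".join(str(v) for v in r._x))

with open(meta_path, "w") as out:  # metadata sidecar
    json.dump(public_metadata(r), out, sort_keys=True)

with open(meta_path) as infile:
    saved = json.load(infile)
```

Keeping the metadata in a separate small file means it can be inspected or searched without opening the (possibly large) data file.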
obj = rino.Object()
obj.set_name("file.txt", overwrite=True)
obj.import_local_metadata()
We need to pass overwrite=True, since rinocloud will refuse to edit existing files otherwise.
To upload an object to a specific folder, you must set the '_parent' variable to be the object ID of the folder. So
obj._parent = 10
would save the object to the folder with ID 10 inside Rinocloud, if you choose to upload the data to Rinocloud.
You can upload the file and its associated metadata to Rinocloud, where you can later share and discuss it with your team. To do this, simply call the upload method:
obj.upload()
>>> [==================== ] 456/500 -- 00:12:32
This also updates the object metadata to include the data returned from Rinocloud, such as the creation time and object ID.
Metadata can be added to an object after it has been uploaded. e.g.
obj.upload()
obj.new_key = 'value4'
obj.update()
It will be updated on Rinocloud.
Provided that the object ID is specified, you can download the file from Rinocloud. This is done using the download method:
obj.download()
The downloaded file can be renamed by passing a new file name to the download method. To fetch an object by its ID and then download its file:
r = rino.Object(id=3397)
r.get([truncate_metadata=True])
r.download()
>>> [==================== ] 544/785 -- 00:32:45
You can add a bunch of objects to a collection and upload them all at the same time. In the Rinocloud web UI, the collection will be turned into a folder with child files.
c = rinocloud.Collection()
c.set_name("folder", create_dir=True)
o1 = rinocloud.Object()
o1.set_name("a.txt")
o2 = rinocloud.Object()
o2.set_name("b.txt")
c.add([o1, o2])
c.upload()
Collections have fully queryable metadata
c = rinocloud.Collection()
c.set_name("folder")
c.x = 4
c.sample_id = "A3303"
c.upload()
for obj in collection:
    print(obj)
When getting an object, if the metadata is really large and you don't need to download it, use truncate_metadata and it will truncate the metadata into a small string if it's over 300kB.
It will download as whatever the filename is in Rinocloud, or it will increment the filename if a local file already exists.
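The "increment the filename" behaviour can be pictured with a small standard-library sketch. The naming scheme used here (file.txt, file_1.txt, file_2.txt, and so on) is an assumption chosen for illustration; the names rinocloud actually generates may look different, and `next_free_path` is a hypothetical helper.

```python
import os
import tempfile


def next_free_path(path):
    """Return path if unused, else add _1, _2, ... before the extension."""
    if not os.path.exists(path):
        return path
    root, ext = os.path.splitext(path)
    counter = 1
    while os.path.exists("%s_%d%s" % (root, counter, ext)):
        counter += 1
    return "%s_%d%s" % (root, counter, ext)


folder = tempfile.mkdtemp()
target = os.path.join(folder, "file.txt")

first = next_free_path(target)   # nothing exists yet
open(first, "w").close()
second = next_free_path(target)  # file.txt is now taken
open(second, "w").close()
third = next_free_path(target)   # file.txt and file_1.txt are taken
```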
Rinocloud also contains tools for querying. You can query one or more metadata fields across all objects saved to Rinocloud.
To make a query, you must create a query object:
qobj = rinocloud.Query()
Trigger the query by adding filters and calling .query():
list_of_objects = rinocloud.Query().filter(x=3).query()
To sort the results of a query, add a call to .sort('value to sort by').
For example,
list_of_objects = rinocloud.Query().filter(x=3).sort("x").query()
sorts by ascending x, and
list_of_objects = rinocloud.Query().filter(x=3).sort("-x").query()
sorts by descending x.
Sorting also works for all alphanumeric fields in the object, or in the object's metadata (but only for top-level metadata fields).
If you sort by some value, only data where that value exists will be returned.
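To see what these rules mean in practice, here is a local imitation on a list of plain dictionaries. It only mimics the behaviour described above (a leading '-' for descending order, and records without the sort field being left out). `sort_records` is a hypothetical helper, and no request is sent to Rinocloud.

```python
def sort_records(records, key):
    """Sort dicts by key; a leading '-' means descending order."""
    descending = key.startswith("-")
    field = key[1:] if descending else key
    # Only records that actually have the field are kept.
    present = [rec for rec in records if field in rec]
    return sorted(present, key=lambda rec: rec[field], reverse=descending)


records = [{"x": 3, "name": "a"}, {"name": "b"}, {"x": 1, "name": "c"}]
ascending = sort_records(records, "x")
descending = sort_records(records, "-x")
```

Note that record "b" has no x field, so it drops out of both results.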
You can add filters to the query object. Rinocloud allows you to filter using the following operators:
'eq' - equal to,
'neq' - not equal to,
'lt' - less than,
'lte' - less than or equal to,
'gt' - greater than,
'gte' - greater than or equal to,
'ne' - not equal to,
'in' - in array,
'nin' - not in array,
'exists' - whether the metadata field exists,
'string_contains' - checks if the name or notes field contains a certain substring.
'or' - allows multiple possibilities to be specified
Filters are specified by adding a double underscore followed by the operator to the metadata key. For example, to show only objects with 'key1' having a value greater than 21, you would enter:
rinocloud.Query().filter(key1__gt=21)
It is possible to add multiple filters in one call. To see only results where 'key2' is not one of the values in the array [1,2,3] and where a metadata field called 'key3' exists, you would enter:
rinocloud.Query().filter(key2__nin=[1,2,3], key3__exists=True)
To use the 'or' filter, add '__or' to the end of the key. For example:
rinocloud.Query().filter(key1__or=6, key2__or=True)
This would return results where 'key1' is 6 or 'key2' is True.
You can also access sub-objects. Suppose the metadata of some file looked like this:
{
    "x": {
        "y": 3
    }
}
You could search for this by using
rinocloud.Query().filter(x__y__eq=3)
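The double-underscore convention can be made concrete with a short parser. This is only a sketch of how such keys could be split into a field path and an operator; it is not rinocloud's own code. `parse_filter_key` and `parse_filters` are hypothetical names, and treating a key without an operator as 'eq' is an assumption based on the filter(x=3) examples above.

```python
# Operator names taken from the list above.
OPERATORS = {"eq", "neq", "lt", "lte", "gt", "gte", "ne", "in", "nin",
             "exists", "string_contains", "or"}


def parse_filter_key(key):
    """Split 'x__y__gt' into (['x', 'y'], 'gt'); default operator is 'eq'."""
    parts = key.split("__")
    if len(parts) > 1 and parts[-1] in OPERATORS:
        return parts[:-1], parts[-1]
    return parts, "eq"


def parse_filters(**filters):
    """Map each keyword to a (field path, operator, value) tuple."""
    return {key: parse_filter_key(key) + (value,)
            for key, value in filters.items()}


parsed = parse_filters(key1__gt=21, x__y__eq=3, x=3)
```

One consequence of this scheme is that a metadata field named after an operator (for example a field called "in") would be ambiguous as the last path segment.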
Once the filters are specified, the query is made by entering:
rinocloud.Query().filter(<your filter args>).query([truncate_metadata=True, limit=20, offset=0])
- If truncate_metadata is True, then objects with more than 300kB of metadata will have their metadata truncated.
- limit is the maximum number of objects you want returned.
- offset is the index in the results at which limit starts counting. It can be used for pagination of results.
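A common way to use limit and offset together is to walk through the results one page at a time. The sketch below shows that loop against a fake in-memory backend, so it runs without an API key. `fetch_all` and `fake_query` are hypothetical names; `fake_query` merely stands in for a real call such as .query(limit=..., offset=...).

```python
def fetch_all(fetch_page, page_size=20):
    """Call fetch_page(limit, offset) repeatedly until a short page arrives."""
    results = []
    offset = 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        results.extend(page)
        if len(page) < page_size:
            break  # fewer items than requested: this was the last page
        offset += page_size
    return results


# Fake backend standing in for a real query: 45 matching "objects".
DATA = list(range(45))
calls = []


def fake_query(limit, offset):
    """Record the call and return the requested slice of DATA."""
    calls.append((limit, offset))
    return DATA[offset:offset + limit]


everything = fetch_all(fake_query, page_size=20)
```

If the total number of results is an exact multiple of the page size, this loop makes one extra request that returns an empty page before stopping.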
More filters can be added and the query can be made again if needed. Making a query returns a list of Rinocloud objects.
You can also count how many objects match a query:
rinocloud.Query().filter(<your filter args>).count()
Batch operations are designed to streamline working with multiple objects. You can perform the download, upload, get and update methods on a list of objects using:
rino.batch.download(<list of objects>)
rino.batch.upload(<list of objects>)
rino.batch.get(<list of objects>)
rino.batch.update(<list of objects>)
Clone the repo and type:
python setup.py develop
This will install the package into a directory for development.
We have basic tests working now for python2.7 and python3.5.
To run the tests in either interpreter just use
python setup.py test
or, if python 3 is named python3
python3 setup.py test
If you have both 2.7 and 3.5 installed, you can use tox (install with pip install tox). Then, inside this repo, just type:
tox