jefferis / razip

Efficient Random Access to R Objects Stored in Large Zip Files

Home Page: https://jefferis.github.io/razip

razip

The goal of razip is to provide efficient random access to the contents of large zip files by caching zip file offsets in memory. Contents can then be read directly into memory and optionally unserialised. The main intended use case is storing collections of tens of thousands of serialised R objects (e.g. the neurons of nat neuronlists) in single zip files that may be gigabytes in size, while still allowing efficient read access times (on the order of 5 ms).
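The offset-caching idea can be illustrated with a minimal base R sketch. This is not razip's implementation and it does not parse the zip format: it assumes a hypothetical flat container file in which serialised objects are simply concatenated, and it keeps a small in-memory index of byte offsets and sizes. The names `container`, `index` and `read_entry` are invented for this illustration.

```r
# Hypothetical illustration: a flat container file, NOT a real zip archive
# and NOT how razip is implemented.
objs <- list(a = 1:10, b = letters, c = list(x = pi, y = "neuron"))

# Write the serialised objects back to back, recording where each one starts.
container <- tempfile(fileext = ".bin")
index <- data.frame(name = names(objs), offset = NA_real_, size = NA_real_)
con <- file(container, "wb")
pos <- 0
for (i in seq_along(objs)) {
  bytes <- serialize(objs[[i]], connection = NULL)  # raw vector
  writeBin(bytes, con)
  index$offset[i] <- pos
  index$size[i] <- length(bytes)
  pos <- pos + length(bytes)
}
close(con)

# Random access: look up the cached offset, seek there, read exactly
# `size` bytes into memory, and optionally unserialise them.
read_entry <- function(path, index, name, unserialise = TRUE) {
  row <- index[match(name, index$name), ]
  stopifnot(!is.na(row$offset))  # fail if the name is not in the index
  con <- file(path, "rb")
  on.exit(close(con))
  seek(con, where = row$offset, origin = "start")
  bytes <- readBin(con, what = "raw", n = row$size)
  if (unserialise) unserialize(bytes) else bytes
}

b <- read_entry(container, index, "b")
```

Because the index lives in memory, each read costs one seek and one read of the requested bytes, regardless of how many entries the file holds. A real zip file additionally needs its directory parsed and, for compressed entries, a decompression step.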

Installation

You can install the development version of razip from GitHub:

remotes::install_github("jefferis/razip")

You also need to ensure that you have the latest version of the zip package installed:

remotes::install_github("r-lib/zip")

Example

This basic example opens a large zip file of serialised neurons and benchmarks random reads of one object and of five objects:

library(razip)
# The zip file was written with:
# write.neurons(nl, "~/Desktop/flywire_neurons_flow_FlyWireqs.zip", format='qs')
raz=RAZip$new("~/Desktop/flywire_neurons_flow_FlyWireqs.zip")
raz
# table of zip entries (includes a filename column)
zl=raz$ziplist()
# time a random read of one object, then of five objects
bench::mark(s1=raz$get(sample(zl$filename, 1)), check = FALSE)
bench::mark(s5=raz$mget(sample(zl$filename, 5)), check = FALSE)

License: GNU General Public License v3.0
