This is a testing structure for doing performance experiments with various I/O packages from within a multi-threaded program. The program mimics behaviors common for HEP data processing frameworks.
The CMakeLists.txt uses 3rd party packages from /cvmfs which are identical to the packages used by the CMS experiment's CMSSW_11_2_0_pre6 software release. No CMS code is needed to build the code from this repository. CMS header files and shared libraries are needed if data files containing CMS data are being read.
To build
$ git clone <root_serialization URL>
$ mkdir build-root-serialization
$ cd build-root-serialization
$ cmake <path to root_serialization> \
-DROOT_DIR=path_to_ROOT_cmake_targets \
[-DTBB_DIR=path_to_tbb_cmake_targets] \
[-DZSTD_DIR=path_to_zstd_cmake_targets] \
[-DLZ4_DIR=path_to_lz4_cmake_targets]
$ make [-j N]
This will create the executable threaded_io_test.
If no cmake target exists for your installation of lz4, you can replace the cmake command above with
$ cmake <path to root_serialization> \
-DCMAKE_PREFIX_PATH=path_to_lz4_install \
-DROOT_DIR=path_to_ROOT_cmake_targets \
[-DTBB_DIR=path_to_tbb_cmake_targets] \
[-DZSTD_DIR=path_to_zstd_cmake_targets] \
[-DLZ4_DIR=path_to_lz4_install]
$ make [-j N]
where path_to_lz4_install is the path to the folder holding the include and lib directories containing the files for lz4.
Note that CMake can use some of the information from ROOT's CMake configuration to infer the build information for TBB, zstd, and lz4. If the ROOT runtime is set up, one can use -DROOT_DIR=$ROOTSYS/cmake.
The testing structure has 2 customizable component types:
- Sources: These supply the event data products used for testing.
- Outputers: These read the event data products. Some Outputers also write that data out for storage.
In order to mimic the time taken to process event data, the testing structure has Waiters. A Waiter reads one event data product and, based on a property of that data product, calls sleep for a length proportional to that property.
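The behavior of a Waiter can be sketched as follows. This is an illustrative Python sketch, not the repository's C++ code: the function names, the `scale_us` parameter (a proportionality constant in microseconds per unit), and the choice of the product's length as the "property" are all assumptions made for the example.

```python
import time

def waiter_sleep_seconds(property_value, scale_us):
    """Convert a data-product property into a sleep length in seconds.

    scale_us is a hypothetical proportionality constant, in microseconds
    per unit of the property.
    """
    return property_value * scale_us / 1_000_000

def run_waiter(product, scale_us):
    # Assumption for this sketch: the "property" is the size of the product.
    duration = waiter_sleep_seconds(len(product), scale_us)
    if duration > 0:
        time.sleep(duration)
    return duration
```

With a scale of 0 the Waiter returns immediately, which mirrors the idea that the sleep length is strictly proportional to the property.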
The testing structure can process multiple events concurrently. Each concurrent event has its own set of Waiters, one for each data product. The processing of an event is handled by a Lane. Concurrent events are then done by having multiple Lanes. The Waiters within a Lane are considered to be completely independent and can be run concurrently.
All Lanes share the same Outputer. Therefore an Outputer is required to be thread safe.
For the moment, each Lane also has its own copy of the Source. That is likely to be changed in the future to better mimic the behavior of actual HEP processing frameworks.
The way data is processed is as follows:
- When a Lane is no longer processing an event it requests a new one from the system. The system advances an atomic counter and tells the Lane to use the event associated with that index.
- The Lane then passes the event index to the Source and asks it to asynchronously retrieve the event data products.
- Once the Source has retrieved the data products, it signals to the Waiters to run asynchronously.
- Once each Waiter has finished, the system signals to the Outputer that the particular event data product is available for the Outputer. The Outputer can then asynchronously process that data product.
- Once the Outputer has finished with all the data products, the system signals the Outputer that the event has finished. The Outputer can then do additional asynchronous work.
- When the Outputer finishes its end of event work, the Lane is considered to be done with that event and the cycle repeats.
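The cycle above can be sketched in Python as shown below. This is a simplified illustration under stated assumptions, not the repository's implementation: all class and function names are hypothetical, a lock-protected integer stands in for the atomic counter, and the Source and Waiters run synchronously inside each Lane rather than asynchronously.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class EventCounter:
    """Thread-safe counter standing in for the system's atomic counter."""
    def __init__(self, max_events):
        self._next = 0
        self._max = max_events
        self._lock = threading.Lock()

    def next_index(self):
        with self._lock:
            if self._next >= self._max:
                return None          # no events left
            index = self._next
            self._next += 1
            return index

class RecordingOutputer:
    """Shared by all Lanes, so it guards its state with a lock."""
    def __init__(self):
        self._lock = threading.Lock()
        self.finished = []

    def product_ready(self, index, name, value):
        pass  # a real Outputer would serialize or inspect the product here

    def event_finished(self, index):
        with self._lock:
            self.finished.append(index)

def run_lane(counter, source, waiters, outputer):
    """One Lane: repeat the cycle until no events remain."""
    while (index := counter.next_index()) is not None:
        products = source(index)                 # retrieve the data products
        for name, value in products.items():
            waiters[name](value)                 # mimic processing time
            outputer.product_ready(index, name, value)
        outputer.event_finished(index)           # end-of-event work

def process(n_lanes, max_events):
    counter = EventCounter(max_events)
    outputer = RecordingOutputer()
    source = lambda index: {"ints": [index], "floats": [float(index)]}
    waiters = {"ints": lambda v: None, "floats": lambda v: None}
    with ThreadPoolExecutor(max_workers=n_lanes) as pool:
        for _ in range(n_lanes):
            pool.submit(run_lane, counter, source, waiters, outputer)
    return sorted(outputer.finished)
```

Calling `process(3, 10)` runs three Lanes over ten events. Because every Lane draws its index from the same counter, each event is handled by exactly one Lane.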
The threaded_io_test takes one required and up to 5 optional command line arguments:
threaded_io_test <Source configuration> [# threads[/useIMT]] [# concurrent events] [time scale factor] [max # events] [<Outputer configuration>]
- <Source configuration>: which Source to use and any additional information needed to configure it. Options are described below.
- [# threads[/useIMT]]: number of threads to use in the job. Optionally, add '/useIMT' directly after the number of threads to turn on ROOT's implicit multithreading (IMT).
- [# concurrent events]: number of concurrent events (that is, Lanes) to use. Best if the number of concurrent events is less than or equal to the number of threads.
- [time scale factor]: used to convert the property of the event data products into microseconds used for the sleep call. A value of 0 means no sleeping. A value less than 0 prohibits the creation of the objects which do the sleep.
- [max # events]: maximum number of events to process in the job.
- [<Outputer configuration>]: optional, used to specify which Outputer to use and any additional information needed to configure it. The exact options are described below.
EmptySource: Does not generate any event data products. Specify by just using its name, e.g.
> threaded_io_test EmptySource 1 1 0 10
TestProductsSource: Generates a fixed set of data products for C++ types known by ROOT. Specify by just using its name, e.g.
> threaded_io_test TestProductsSource 1 1 0 10
ReplicatedRootSource: Reads a standard ROOT file. Each concurrent Event has its own replica of the Source to avoid the need for cross-Event synchronization. In addition to its name, one needs to give the file to read, e.g.
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10
SerialRootSource: Reads a standard ROOT file. All concurrent Events share the same Source. Access to the Source is serialized for thread-safety. In addition to its name, one needs to give the file to read, e.g.
> threaded_io_test SerialRootSource=test.root 1 1 0 1000
RepeatingRootSource: Reads the first N events from a standard ROOT file at construction time. The deserialized data products are held in memory. Going from event to event is just a switch of the memory addresses to be used. In addition to its name, one needs to give the file to read and, optionally, the number of events to read (default is 10) and a single TBranch to read, e.g.
> threaded_io_test RepeatingRootSource=test.root 1 1 0 1000
or
> threaded_io_test RepeatingRootSource=test.root:repeat=5 1 1 0 1000
or
> threaded_io_test RepeatingRootSource=test.root:repeat=5:branchToRead=ints 1 1 0 1000
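The configuration strings in these examples appear to follow a `Name=token:token:...` pattern. The Python sketch below splits such a string into its parts. The grammar is inferred from the examples only, and `parse_configuration` is a hypothetical helper, not a function from the repository. It would not handle file names that themselves contain ':' or '='.

```python
def parse_configuration(text):
    """Split 'Name=first:key=value:flag' into a name and a list of options.

    Each option is returned as a (key, value) pair; a token without '='
    (such as a file name or a bare flag) gets the value None.
    """
    name, _, rest = text.partition("=")
    options = []
    for token in filter(None, rest.split(":")):
        key, sep, value = token.partition("=")
        options.append((key, value if sep else None))
    return name, options
```

Applied to `RepeatingRootSource=test.root:repeat=5:branchToRead=ints`, this gives the name `RepeatingRootSource`, the bare token `test.root`, and the two key/value options `repeat` and `branchToRead`.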
ReplicatedPDSSource: Reads a packed data streams format file. Each concurrent Event has its own replica of the Source to avoid the need for cross-Event synchronization. In addition to its name, one needs to give the file to read, e.g.
> threaded_io_test ReplicatedPDSSource=test.pds 1 1 0 10
SharedPDSSource: Reads a packed data streams format file. The Source is shared between the concurrent Events. Reads from the file are serialized for thread-safety, while decompressing the Event and the object deserialization can proceed concurrently. In addition to its name, one needs to give the file to read, e.g.
> threaded_io_test SharedPDSSource=test.pds 1 1 0 10
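The split between a serialized read and concurrent decompression can be illustrated with the Python sketch below. It is only a sketch under assumptions: `SharedSource` and `read_all` are hypothetical names, an in-memory stream stands in for the file, and zlib from the Python standard library stands in for whatever compression the PDS file actually uses.

```python
import io
import threading
import zlib
from concurrent.futures import ThreadPoolExecutor

class SharedSource:
    """Serializes stream reads; decompression happens outside the lock."""
    def __init__(self, stream, record_sizes):
        self._stream = stream
        self._sizes = iter(record_sizes)
        self._lock = threading.Lock()

    def next_event(self):
        with self._lock:                      # only the read is serialized
            size = next(self._sizes, None)
            if size is None:
                return None
            compressed = self._stream.read(size)
        return zlib.decompress(compressed)    # not under the lock

def read_all(events, n_threads=4):
    """Compress events into one stream, then read them back from several threads."""
    records = [zlib.compress(e) for e in events]
    source = SharedSource(io.BytesIO(b"".join(records)), [len(r) for r in records])

    def drain():
        out = []
        while (event := source.next_event()) is not None:
            out.append(event)
        return out

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [pool.submit(drain) for _ in range(n_threads)]
        return sorted(e for f in futures for e in f.result())
```

Keeping the lock scope to the read alone means a slow decompression in one thread does not block other threads from fetching their next record.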
SharedRootEventSource: Reads a ROOT file which only has 2 TBranches in the Events TTree. One branch holds the EventIdentifier. The other holds a (possibly pre-compressed) buffer of all the pre-object serialized data products in the event and a vector of offsets into that buffer for the beginning of each data product's serialization. The Source is shared between the concurrent Events. Reads from the file are serialized for thread-safety and decompressing the Event happens at that time as well. The object deserialization can proceed concurrently. In addition to its name, one needs to give the file to read, e.g.
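The buffer-plus-offsets layout described above can be sketched in Python as follows. The helper names are hypothetical, and the sketch assumes that each offset marks where a product starts and that the last product runs to the end of the buffer. The actual on-disk layout may record the boundaries differently.

```python
def pack_products(serialized_products):
    """Concatenate serialized products and record where each one begins."""
    buffer = bytearray()
    offsets = []
    for blob in serialized_products:
        offsets.append(len(buffer))
        buffer.extend(blob)
    return bytes(buffer), offsets

def unpack_products(buffer, offsets):
    """Slice the buffer back into per-product byte strings."""
    ends = offsets[1:] + [len(buffer)]
    return [buffer[start:end] for start, end in zip(offsets, ends)]
```

Once the buffer has been read (and decompressed if needed), each slice can be handed to a separate task for deserialization, which is what allows that step to proceed concurrently.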
> threaded_io_test SharedRootEventSource=test.eroot 1 1 0 10
SharedRootBatchEventsSource: This is similar to SharedRootEventSource except that each entry in the Events TTree is for a batch of Events. The Events TTree again holds only 2 TBranches. One branch holds a std::vector<EventIdentifier>. The other holds a (possibly pre-compressed) buffer of all the pre-object serialized data products for all the events in the batch and a vector of offsets into that buffer for the beginning of each data product's serialization. The Source is shared between the concurrent Events. Reads from the file are serialized for thread-safety and decompressing the Event happens at that time as well. The object deserialization can proceed concurrently. In addition to its name, one needs to give the file to read, e.g.
> threaded_io_test SharedRootBatchEventsSource=test.eroot 1 1 0 10
DummyOutputer: Does no work. If no Outputer is given, this is the one used. Specify by just using its name and, optionally, the option label 'useProductReady', e.g.
> threaded_io_test EmptySource 1 1 0 10 DummyOutputer
or
> threaded_io_test EmptySource 1 1 0 10 DummyOutputer=useProductReady
TextDumpOutputer: Dumps the name and sizes for each data product. Specify by its name and the following optional parameters:
- perEvent: print names and sizes for each event. On by default. Allowed values are perEvent=t, perEvent=f, and perEvent, which is the same as perEvent=t.
- summary: at end of job, print names and average size per event. Off by default. Allowed values are summary=t, summary=f, and summary, which is the same as summary=t.
> threaded_io_test EmptySource 1 1 0 10 TextDumpOutputer
or
> threaded_io_test EmptySource 1 1 0 10 TextDumpOutputer=perEvent=f:summary=t
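The three spellings of a boolean option (`perEvent`, `perEvent=t`, `perEvent=f`) can be resolved as in the Python sketch below. `parse_flag` is a hypothetical helper written for illustration, and raising an error on any other value is an assumption of the sketch rather than documented behavior.

```python
def parse_flag(options, name, default):
    """Resolve a boolean option from tokens like 'name', 'name=t', 'name=f'."""
    for token in options:
        key, sep, value = token.partition("=")
        if key != name:
            continue
        if not sep or value == "t":
            return True          # bare 'name' is the same as 'name=t'
        if value == "f":
            return False
        raise ValueError(f"{name} must be 't' or 'f', got {value!r}")
    return default               # option not given: fall back to the default
```

For the configuration `TextDumpOutputer=perEvent=f:summary=t`, the option tokens are `perEvent=f` and `summary=t`, so per-event printing is switched off and the summary is switched on.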
SerializeOutputer: Uses ROOT to serialize the event data products but does not store them. It prints timing statistics about the serialization. Specify by just using its name and an optional 'verbose' parameter, e.g.
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 SerializeOutputer
or
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 SerializeOutputer=verbose
TestProductsOutputer: Checks that the data products match what is expected from TestProductsSource or files containing those same data products. If the results are unexpected, the program will abort. Specify by just using its name, e.g.
> threaded_io_test TestProductsSource 1 1 0 10 TestProductsOutputer
RootOutputer: Writes the event data products into a ROOT file. Specify both the name of the Outputer and the file to write, as well as any of the following optional parameters:
- splitLevel: split level for the branches, default 99
- compressionLevel: compression level 0-9, default 9
- compressionAlgorithm: name of compression algorithm. Allowed values "", "ZLIB", "LZMA", "LZ4"
- basketSize: default size of all baskets, default size 16384
- treeMaxVirtualSize: Size of ROOT TTree TBasket cache. Use ROOT default if value is <0. Default -1.
- autoFlush: passed value to TTree SetAutoFlush. Use of the default value -1 means no call is made.
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 RootOutputer=test.root
or
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 RootOutputer=test.root:splitLevel=1
PDSOutputer: Writes the event data products into a PDS file. Specify both the name of the Outputer and the file to write as well as compression options:
- compressionLevel: compression level. Allowed values depend on the algorithm. For now ZSTD is the only one and allows values 0 - 19 (negative values and values 20-22 are possible but not considered good choices by the zstandard authors)
- compressionAlgorithm: name of compression algorithm. Allowed values "", "None", "ZSTD", "LZ4"
- serializationAlgorithm: name of a serialization algorithm. Allowed values "", "ROOT", "ROOTUnrolled" or "Unrolled". The default is "ROOT" (which is the same as ""). Both unrolled names correspond to the same algorithm.
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 PDSOutputer=test.pds
HDFOutputer: Writes the event data products into an HDF file. Specify both the name of the Outputer and the file to write as well as the number of events to batch together when writing:
- batchSize: number of events to batch together before writing out to the file. Default is 2.
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 HDFOutputer=test.hdf
or
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 HDFOutputer=test.hdf:batchSize=10
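The effect of a batchSize option can be illustrated with the Python sketch below. `BatchingWriter` is a hypothetical class, not the Outputer's actual implementation, and flushing a partial final batch at the end of the job is an assumption of the sketch. It is also not thread safe as written; an Outputer shared by several Lanes would need to guard the pending list with a lock.

```python
class BatchingWriter:
    """Collects events and hands them to `write` in groups of batch_size."""
    def __init__(self, write, batch_size=2):
        self._write = write
        self._batch_size = batch_size
        self._pending = []

    def add(self, event):
        self._pending.append(event)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self):
        # Also meant to be called at end of job so a partial batch is not lost.
        if self._pending:
            self._write(list(self._pending))
            self._pending.clear()
```

A larger batch size means fewer, larger writes to the file, at the cost of holding more events in memory before each write.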
Writes the event data products into an HDF file where all data products for a batch of events are stored in a single dataset where the data products for all the events in the batch have been pre-object serialized into a std::vector<char>. Specify both the name of the Outputer and the file to write, as well as any of the following optional parameters:
- hdfchunkSize: HDF chunk size value to use for dataset. Default is 10485760.
- batchSize: number of events to batch together when storing, default 1
- compressionLevel: compression level. Allowed values depend on the algorithm. For now ZSTD is the only one and allows values 0 - 19 (negative values and values 20-22 are possible but not considered good choices by the zstandard authors)
- compressionAlgorithm: name of compression algorithm. Allowed values "", "None", "ZSTD", "LZ4"
- compressionChoice: what to compress. Allowed values "None", "Events", "Batch", "Both". Default is "Events".
- serializationAlgorithm: name of a serialization algorithm. Allowed values "", "ROOT", "ROOTUnrolled" or "Unrolled". The default is "ROOT" (which is the same as ""). Both unrolled names correspond to the same algorithm.
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 RootBatchEventsOutputer=test.root
or
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 RootBatchEventsOutputer=test.root:batchSize=4
RootEventOutputer: Writes the event data products into a ROOT file where all data products for an event are stored in a single TBranch where the data products have been pre-object serialized into a std::vector<char>. Specify both the name of the Outputer and the file to write, as well as any of the following optional parameters:
- tfileCompressionLevel: compression level to be used by ROOT 0-9, default 0
- tfileCompressionAlgorithm: name of compression algorithm to be used by ROOT. Allowed values "", "ZLIB", "LZMA", "LZ4"
- treeMaxVirtualSize: Size of ROOT TTree TBasket cache. Use ROOT default if value is <0. Default -1.
- autoFlush: passed value to TTree SetAutoFlush. Use of the default value -1 means no call is made.
- compressionLevel: compression level. Allowed values depend on the algorithm. For now ZSTD is the only one and allows values 0 - 19 (negative values and values 20-22 are possible but not considered good choices by the zstandard authors)
- compressionAlgorithm: name of compression algorithm. Allowed values "", "None", "ZSTD", "LZ4"
- serializationAlgorithm: name of a serialization algorithm. Allowed values "", "ROOT", "ROOTUnrolled" or "Unrolled". The default is "ROOT" (which is the same as ""). Both unrolled names correspond to the same algorithm.
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 RootEventOutputer=test.root
or
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 RootEventOutputer=test.root:compressionAlgorithm=LZ4
RootBatchEventsOutputer: Writes the event data products into a ROOT file where all data products for a batch of events are stored in a single TBranch where the data products for all the events in the batch have been pre-object serialized into a std::vector<char>. Specify both the name of the Outputer and the file to write, as well as any of the following optional parameters:
- batchSize: number of events to batch together when storing, default 1
- tfileCompressionLevel: compression level to be used by ROOT 0-9, default 0
- tfileCompressionAlgorithm: name of compression algorithm to be used by ROOT. Allowed values "", "ZLIB", "LZMA", "LZ4"
- treeMaxVirtualSize: Size of ROOT TTree TBasket cache. Use ROOT default if value is <0. Default -1.
- autoFlush: passed value to TTree SetAutoFlush. Use of the default value -1 means no call is made.
- compressionLevel: compression level. Allowed values depend on the algorithm. For now ZSTD is the only one and allows values 0 - 19 (negative values and values 20-22 are possible but not considered good choices by the zstandard authors)
- compressionAlgorithm: name of compression algorithm. Allowed values "", "None", "ZSTD", "LZ4"
- serializationAlgorithm: name of a serialization algorithm. Allowed values "", "ROOT", "ROOTUnrolled" or "Unrolled". The default is "ROOT" (which is the same as ""). Both unrolled names correspond to the same algorithm.
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 RootBatchEventsOutputer=test.root
or
> threaded_io_test ReplicatedRootSource=test.root 1 1 0 10 RootBatchEventsOutputer=test.root:batchSize=4
The unroll_test executable is meant to allow testing of the unrolled serialization process and allow comparison of object serialization sizes with respect to ROOT's standard serialization. The executable takes the following command line arguments
unroll_test [-g] [-s] [list of class names]
- -g : turns on ROOT verbose debugging output
- -s : skips running the built in test cases
- [list of class names] : names of C++ classes with ROOT dictionaries. The executable will perform serialization/deserialization on default-constructed instances of these classes and report the bytes needed for storage.