Streaming forward-only reading API ~lazy_read for rows / cells?
natalie-o-perret opened this issue
Hi
I've noticed that when using `lazy_read`, I first need to call `read_sheet` to serialize the whole worksheet (deserialize might be a tad more accurate, since technically this is XML => Rust data structure, unless I'm missing the rationale/intent behind the naming) before I can actually read its content.
The issue is that `read_sheet` cannot yield cell values as a forward-only reading process, so on big files with, say, one huge worksheet, handling thousands of rows can take a (very) long time:
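```rust
use std::time::Instant;
use umya_spreadsheet::reader;

fn main() {
    // reader
    let start = Instant::now(); // stand-in for the `tic()` timing helper used originally
    let path = std::path::Path::new("C:/Users/natalie-perret/Desktop/file.xlsx");
    let mut book = reader::xlsx::lazy_read(path).unwrap();
    let sheet_count = book.get_sheet_count();
    println!("Sheet Count: {:?}", sheet_count);
    // The whole worksheet is deserialized here before any cell can be read.
    let sheet1 = book.read_sheet();
    // ...
}
```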
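For illustration, a minimal sketch of what forward-only reading could look like, assuming access to the raw worksheet XML (for example `xl/worksheets/sheet1.xml` inside the .xlsx archive) through any `std::io::BufRead`. `RowStream` is a hypothetical name, not part of umya-spreadsheet or calamine. It pulls one tag at a time with `read_until(b'>')` and yields each `<row>` as soon as its closing tag is seen, so memory use is bounded by the largest single row rather than the whole sheet. It is deliberately simplistic: it returns the raw `<v>` text (shared-string cells, `t="s"`, yield an index into the shared strings table, not the text), and it ignores inline strings, `r` cell references and XML entities. It is no substitute for a real XML parser.

```rust
use std::io::{self, BufRead};

/// Hypothetical forward-only row reader over a worksheet's `<sheetData>` XML.
/// Each call to `next` reads just far enough to complete one `<row>`.
pub struct RowStream<R: BufRead> {
    reader: R,
    buf: Vec<u8>,
}

impl<R: BufRead> RowStream<R> {
    pub fn new(reader: R) -> Self {
        RowStream { reader, buf: Vec::new() }
    }
}

/// True for `<row ...>`, `<row>` and `<row/>`, but not for e.g. `<rowBreaks>`.
fn is_row_start(tag: &str) -> bool {
    tag.strip_prefix("<row").map_or(false, |rest| {
        rest.starts_with(|c: char| c.is_whitespace() || c == '>' || c == '/')
    })
}

impl<R: BufRead> Iterator for RowStream<R> {
    /// Raw `<v>` contents of each cell in the row, in document order.
    type Item = io::Result<Vec<String>>;

    fn next(&mut self) -> Option<Self::Item> {
        let mut row: Option<Vec<String>> = None;
        loop {
            self.buf.clear();
            // Read up to and including the next '>': any text, then one tag.
            match self.reader.read_until(b'>', &mut self.buf) {
                Ok(0) => return None, // end of input
                Ok(_) => {}
                Err(e) => return Some(Err(e)),
            }
            let chunk = String::from_utf8_lossy(&self.buf);
            let tag = chunk.trim();
            if is_row_start(tag) {
                if tag.ends_with("/>") {
                    return Some(Ok(Vec::new())); // self-closing, empty row
                }
                row = Some(Vec::new());
            } else if tag == "</row>" {
                if let Some(done) = row.take() {
                    return Some(Ok(done)); // row complete: hand it to the caller
                }
            } else if let Some(value) = chunk.strip_suffix("</v>") {
                // Text immediately before `</v>` is the cell's raw value.
                if let Some(current) = row.as_mut() {
                    current.push(value.to_string());
                }
            }
        }
    }
}
```

Because `RowStream` is a plain `Iterator`, a caller can `take(n)` or `break` out of a `for` loop, and the rest of the sheet is never read.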
Are there plans to support a forward-only streaming API?
Something akin to calamine:
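```rust
use calamine::{open_workbook, Reader, Xlsx};

fn test_calamine_lib(path: &str) {
    let mut excel: Xlsx<_> = open_workbook(path).unwrap();
    // Iterate over every row of "Sheet1", printing the row and its first cell.
    if let Some(Ok(r)) = excel.worksheet_range("Sheet1") {
        for row in r.rows() {
            println!("row={:?}, row[0]={:?}", row, row[0]);
        }
    }
}
```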
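Part of the appeal of a forward-only API is that the caller decides how much to read. A minimal sketch, using a hypothetical helper `find_row` (not part of either library) that accepts any iterator of rows, such as one a streaming reader might return: `Iterator::find` stops pulling rows at the first match, so rows after it are never requested, read or parsed.

```rust
/// Hypothetical helper: returns the first row whose first cell equals `key`.
/// Rows are pulled lazily; iteration stops at the first match.
pub fn find_row<I>(rows: I, key: &str) -> Option<Vec<String>>
where
    I: IntoIterator<Item = Vec<String>>,
{
    rows.into_iter()
        .find(|row| row.first().map(String::as_str) == Some(key))
}
```

With an eager API the whole sheet has to be materialized before this search can even start; with a lazy iterator the cost is proportional to how far into the sheet the match is.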
@natalie-o-perret
Thank you for your suggestion.
We had not planned to support reading cell values from files opened with `lazy_read`, but it is probably technically possible.
We will try to implement it in the next version.
> (deserialize might be a tad more accurate, because by all technicalities xml => rust data structure, unless I'm not getting the rationale / intent right behind the naming)
You are right, this is deserialization.
I hadn't paid attention to it until now. I will quietly fix it.