How to get all CZI metadata?
tryphoncosinus opened this issue · comments
Hello,
Thank you for your efficient program.
I need many other metadata fields from a CZI file. Is it possible to retrieve all the metadata with MEXlibCZI, at least as an XML string? I know how to process an XML string, either with the function xmlstring2struct() (see the code below) or with DOM tools.
I currently get the metadata with the following solution. It works, but it is really slow; your GetInfo option is much faster. It would be better to get all the metadata directly from MEXlibCZI.
Thank you.
function fileinfo = czifinfo( filename, varargin )
% CZIFINFO returns information of Zeiss CZI file pushing all metadata to fileinfo structure output
% Copyright Chao-Yuan Yeh, 2016
% Lighted by TryphonCosinus, 2022
import matlab.io.xml.dom.*
fID = fopen(filename);
count = 0;   % number of attachment segments seen
flag = 1;
% Walk the file segment by segment until no further segment can be read
while flag
    segHeader = readSegHeader(fID);
    if segHeader.allocSize
        if strfind(segHeader.SID, 'ZISRAWSUBBLOCK')
            % image sub-blocks are skipped
        elseif strfind(segHeader.SID, 'ZISRAWFILE')
            cinfo.fileHeader = readFILEHeader(fID);
        elseif strfind(segHeader.SID, 'ZISRAWATTACH')
            count = count + 1;
        end
        % fseek returns 0 on success and -1 on failure, so flag becomes 1 or 0
        flag = fseek(fID, segHeader.currPos + segHeader.allocSize, 'bof') + 1;
    else
        flag = 0;
    end
end
% The XML size is stored right after the 32-byte segment header
fseek(fID, cinfo.fileHeader.mDataPos + 32, 'bof');
XmlSize = uint32(fread(fID, 1, '*uint32'));
% The XML text itself starts 288 bytes into the metadata segment
fseek(fID, cinfo.fileHeader.mDataPos + 288, 'bof');
% Characters are decoded with the encoding fopen associated with fID, which may not be UTF-8
cinfo.metadataXML = fread(fID, XmlSize, '*char')';
fclose(fID);
doc = parseString(Parser,cinfo.metadataXML);
fileinfo = xmlstring2struct(doc);
end
function segHeader = readSegHeader(fID)
segHeader.SID = fread(fID, 16, '*char')';
segHeader.allocSize = fread(fID, 1, '*uint64');
fseek(fID, 8, 'cof');
segHeader.currPos = ftell(fID);
end
function fileHeader = readFILEHeader(fID)
fileHeader.major = fread(fID, 1, '*uint32');
fileHeader.minor = fread(fID, 1, '*uint32');
fseek(fID, 8, 'cof');
fileHeader.primaryFileGuid = fread(fID, 2, '*uint64');
fileHeader.fileGuid = fread(fID, 2, '*uint64');
fileHeader.filePart = fread(fID, 1, '*uint32');
fileHeader.dirPos = fread(fID, 1, '*uint64');
fileHeader.mDataPos = fread(fID, 1, '*uint64');
fseek(fID, 4, 'cof');
fileHeader.attDirPos = fread(fID, 1, '*uint64');
end
% xmlstring2struct converts a parsed XML document (a DOM tree) into a MATLAB structure
function theStruct = xmlstring2struct(tree)
% Copyright 2003-2007 The MathWorks, Inc.
% Recurse over child nodes
% This could run into problems with very deeply nested trees...
try
    theStruct = parseChildNodes(tree);
catch err
    error('czifinfo:XMLParseError', ...
        'Unable to convert the XML document to a structure: %s', err.message);
end
end
function nodeStruct = makeStructFromNode(theNode)
nodeStruct = struct('Name',char(theNode.getNodeName),...
    'Attributes',parseAttributes(theNode),'Data','',...
    'Children',parseChildNodes(theNode));
if any(strcmp(methods(theNode),'getData'))
    nodeStruct.Data = char(theNode.getData);
else
    nodeStruct.Data = '';
end
end
function attributes = parseAttributes(theNode)
% Create attributes struct
attributes = [];
if theNode.hasAttributes
    theAttributes = theNode.getAttributes;
    numAttributes = theAttributes.getLength;
    allocCell = cell(1,numAttributes);
    attributes = struct('Name',allocCell,'Value',allocCell);
    for count = 1:numAttributes
        attrib = theAttributes.item(count-1);
        attributes(count).Name = char(attrib.getName);
        attributes(count).Value = char(attrib.getValue);
    end
end
end
function children = parseChildNodes(theNode)
% Recurse over node children
children = [];
if theNode.hasChildNodes
    childNodes = theNode.getChildNodes;
    numChildNodes = childNodes.getLength;
    allocCell = cell(1,numChildNodes);
    children = struct('Name',allocCell,'Attributes',allocCell,...
        'Data',allocCell,'Children',allocCell);
    for count = 1:numChildNodes
        theChild = childNodes.item(count-1);
        children(count) = makeStructFromNode(theChild);
    end
end
end
% End CZI file info process
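The structure built by xmlstring2struct nests every XML node as Name / Attributes / Data / Children, so looking up one element means walking Children recursively. A minimal sketch of such a lookup follows. findNodes is a hypothetical helper, not part of the code above, and the node names 'Distance' and 'Value' are made-up stand-ins rather than names taken from a real CZI file. Local functions inside a script need MATLAB R2016b or later.

```matlab
% Hand-built stand-in with the same four fields that xmlstring2struct produces.
textNode  = struct('Name', '#text',    'Attributes', [], 'Data', '1.5e-07', 'Children', []);
valueNode = struct('Name', 'Value',    'Attributes', [], 'Data', '',        'Children', textNode);
rootNode  = struct('Name', 'Distance', 'Attributes', [], 'Data', '',        'Children', valueNode);

% Collect every node called 'Value'; the element text lives in its '#text' child.
hits = findNodes(rootNode, 'Value');
valueText = hits(1).Children(1).Data;

function hits = findNodes(nodes, name)
% FINDNODES (hypothetical helper) returns all nodes called NAME, at any depth.
emptyCell = cell(1, 0);
hits = struct('Name', emptyCell, 'Attributes', emptyCell, ...
    'Data', emptyCell, 'Children', emptyCell);
for k = 1:numel(nodes)
    if strcmp(nodes(k).Name, name)
        hits(end+1) = nodes(k); %#ok<AGROW>
    end
    if ~isempty(nodes(k).Children)
        % Descend into the children and append whatever matches there.
        hits = [hits, findNodes(nodes(k).Children, name)]; %#ok<AGROW>
    end
end
end
```

With real data the first argument would be the fileinfo output of czifinfo, which is a struct array of the same shape.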
Does this work for your case, or is it at least helpful?
Hello,
Yes, that helped me.
However, I am stuck on this error when trying to read the XML string:
SAXParseException Invalid byte 1 of 1-byte UTF-8 sequence
I noticed that the XML string begins with <?xml version="1.0"?>
without specifying the encoding.
I plan to run my program, which uses MEXlibCZI, under Linux, Windows, and macOS.
I am still trying to figure out the problem, which is why I am late in wrapping this up.
Thank you.
I found two differences between the XML string generated by MEXlibCZI and the one produced by the code I presented (the content of cinfo.metadataXML):
The special characters 'µ' and '²' in xmlstr=MEXlibCZI('GetMetadataXml',h) are coded as 'Âµ' and 'Â²' (char(194) followed by the character) in cinfo.metadataXML. This probably extends to other special characters that I have not seen yet. To work around the issue, I used these inelegant statements in MATLAB (2021b):
xmlstr = replace(xmlstr,'µ',strcat(char(194),'µ'));
xmlstr = replace(xmlstr,'²',strcat(char(194),'²'));
Now the following code works as it should:
inputObject = java.io.StringBufferInputStream(xmlstr);
sxml = xmlread(inputObject);
Hope this helps. Thank you again.
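The replace statements above prepend char(194) to each special character. That matches how UTF-8 stores them: 'µ' is the byte pair 194, 181 and '²' is 194, 178. A minimal sketch of the effect, using a hand-written byte pair rather than a real CZI file. Whether this is exactly what happens inside fread depends on the encoding that fopen associates with the file, so treat it as an illustration only.

```matlab
% UTF-8 encoding of the micro sign (U+00B5) is the byte pair C2 B5.
bytes = uint8([194 181]);

% Treating every byte as its own character yields two characters: 'Â' then 'µ'.
perByte = char(bytes);

% Decoding the same bytes as UTF-8 yields the single character 'µ'.
decoded = native2unicode(bytes, 'UTF-8');

% Encoding that character back to UTF-8 reproduces the original bytes.
roundTrip = unicode2native(decoded, 'UTF-8');
```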
Hmm, yes, you are right, it seems the encoding is somehow messed up here. I am sorry, I have no idea what is going wrong. The string appears to have the "correct" content within MATLAB.
What works for me is this:
h=MEXlibCZI('Open','G:\Data\TestData\Example_TMA1_Zeb1_SPRR2_Ck19_S100-1-1-1-1.czi');
xmlstr=MEXlibCZI('GetMetadataXml',h);
sxml = xmlread(org.xml.sax.InputSource(java.io.StringReader(xmlstr)));
where I got the idea from here.
If you have any insight into what's wrong here (and whether MEXlibCZI is doing something wrong), I'd be glad to know.
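One possible explanation, offered as a sketch rather than a confirmed diagnosis of MEXlibCZI: java.io.StringBufferInputStream is a deprecated class that keeps only the low eight bits of each character. A correctly decoded 'µ' (code point 181) would then reach the parser as the single byte 0xB5, which is not valid UTF-8 on its own. A java.io.StringReader hands the parser characters instead of bytes, so no decoding step is involved. The example below uses a hand-written XML string, not real CZI metadata, and the variable names are illustrative. It also shows an alternative that encodes the string to UTF-8 bytes explicitly before parsing.

```matlab
% Small stand-in for the metadata string; char(181) is the micro sign.
xmlstr = ['<?xml version="1.0"?><Scaling unit="' char(181) 'm"/>'];

% Character-based input: the parser receives characters, no byte decoding.
src = org.xml.sax.InputSource(java.io.StringReader(xmlstr));
docA = xmlread(src);
unitA = char(docA.getDocumentElement.getAttribute('unit'));

% Byte-based input: encode to UTF-8 first so the parser's default decoding matches.
utf8Bytes = unicode2native(xmlstr, 'UTF-8');
docB = xmlread(java.io.ByteArrayInputStream(utf8Bytes));
unitB = char(docB.getDocumentElement.getAttribute('unit'));
```

Both routes should give the same attribute value, which suggests the string content itself is fine and only the byte conversion on the way into the parser matters.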
I tried your solution successfully, and I can now drop the 'replace' statements.
Solved. Thank you for this help.