ptahmose / MEXlibCZI

read CZI-documents from MATLAB

How to get all czi metadata ?

tryphoncosinus opened this issue

Hello,

Thank you for your efficient program.

I need much more of the metadata in a CZI file. Is it possible to retrieve all of the metadata with MEXlibCZI, at least as an XML string? I know how to process an XML string, either with the function xmlstring2struct() (see the code below) or with DOM tools.

At the moment I get the metadata with the following solution. It works, but it is really slow; your GetInfo option is much faster. It would be better to get all of the metadata directly from MEXlibCZI.

Thank you.

function fileinfo = czifinfo(filename, varargin)
% CZIFINFO returns the metadata of a Zeiss CZI file as the structure fileinfo

%   Copyright Chao-Yuan Yeh, 2016
%   Lighted by TryphonCosinus, 2022

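import matlab.io.xml.dom.*
fID = fopen(filename);
if fID < 0
    error('czifinfo:fileOpen', 'Cannot open file: %s', filename);
end
count = 0;   % number of attachment segments (counted, not used further)
flag = 1;
% Walk the file segment by segment; only the file header segment is read.
while flag
    segHeader = readSegHeader(fID);
    if segHeader.allocSize
        if contains(segHeader.SID, 'ZISRAWFILE')
            cinfo.fileHeader = readFILEHeader(fID);
        elseif contains(segHeader.SID, 'ZISRAWATTACH')
            count = count + 1;
        end
        % Jump to the next segment. fseek returns 0 on success and -1 on
        % failure, so flag becomes 1 or 0.
        flag = fseek(fID, segHeader.currPos + segHeader.allocSize, 'bof') + 1;
    else
        flag = 0;
    end
end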

% The metadata segment starts at mDataPos. Skip its 32-byte segment header
% to read the XML size, then seek to offset 288 (32 + 256-byte metadata
% header), where the XML itself begins.
fseek(fID, cinfo.fileHeader.mDataPos + 32, 'bof');
XmlSize = fread(fID, 1, 'uint32');
fseek(fID, cinfo.fileHeader.mDataPos + 288, 'bof');
cinfo.metadataXML = fread(fID, XmlSize, '*char')';
fclose(fID);
doc = parseString(Parser,cinfo.metadataXML);
fileinfo = xmlstring2struct(doc);

end
function segHeader = readSegHeader(fID)
% Read the segment ID and allocated size, then skip the 8-byte used-size field.
segHeader.SID = fread(fID, 16, '*char')';
segHeader.allocSize = fread(fID, 1, '*uint64');
fseek(fID, 8, 'cof');
segHeader.currPos = ftell(fID);
end

function fileHeader = readFILEHeader(fID)
fileHeader.major = fread(fID, 1, '*uint32');
fileHeader.minor = fread(fID, 1, '*uint32');
fseek(fID, 8, 'cof');
fileHeader.primaryFileGuid = fread(fID, 2, '*uint64');
fileHeader.fileGuid = fread(fID, 2, '*uint64');
fileHeader.filePart = fread(fID, 1, '*uint32');
fileHeader.dirPos = fread(fID, 1, '*uint64');
fileHeader.mDataPos = fread(fID, 1, '*uint64');
fseek(fID, 4, 'cof');
fileHeader.attDirPos  = fread(fID, 1, '*uint64');
end

% xmlstring2struct converts a parsed XML document (DOM) into a MATLAB structure
function theStruct = xmlstring2struct(tree)
%   Copyright 2003-2007 The MathWorks, Inc.
% Recurse over child nodes
% This could run into problems with very deeply nested trees...
try
    theStruct = parseChildNodes(tree);
catch err
    error('czifinfo:XMLParseError', ...
        'Could not convert the XML document to a struct: %s', err.message);
end
end
function nodeStruct = makeStructFromNode(theNode)

nodeStruct = struct('Name',char(theNode.getNodeName),...
    'Attributes',parseAttributes(theNode),'Data','',...
    'Children',parseChildNodes(theNode));

if any(strcmp(methods(theNode),'getData'))
   nodeStruct.Data = char(theNode.getData); 
else
    nodeStruct.Data = '';
end
end
function attributes = parseAttributes(theNode)
% Create attributes struct
attributes = [];
if theNode.hasAttributes
    theAttributes = theNode.getAttributes;
    numAttributes = theAttributes.getLength;
    allocCell = cell(1,numAttributes);
    attributes = struct('Name',allocCell,'Value',allocCell);
    for count = 1:numAttributes
        attrib = theAttributes.item(count-1);
        attributes(count).Name = char(attrib.getName);
        attributes(count).Value = char(attrib.getValue);
    end
end
end
function children = parseChildNodes(theNode)
% Recurse over node children
children = [];
if theNode.hasChildNodes
    childNodes = theNode.getChildNodes;
    numChildNodes = childNodes.getLength;
    allocCell = cell(1,numChildNodes);
    children = struct('Name',allocCell,'Attributes',allocCell,...
                                 'Data',allocCell,'Children',allocCell);
    for count = 1:numChildNodes
        theChild = childNodes.item(count-1);
        children(count) = makeStructFromNode(theChild);
    end
end
end
% End CZI file info process

I'd guess the command "GetMetadataXml" should work for you, something like this:
[image not preserved]

Does this work for your case, or is it at least helpful?

Hello,

Yes, that helped.
However, I am stuck with this error when trying to read the XML string:
SAXParseException Invalid byte 1 of 1-byte UTF-8 sequence

I noticed that the XML string begins with <?xml version="1.0"?> without specifying the encoding.
I plan to run my program, which uses MEXlibCZI, on Linux, Windows, and macOS.

I am still trying to figure out the problem, which is why I have been slow to conclude here.

Thank you.

I found two differences between the XML string generated by MEXlibCZI and the one from the code I presented (the content of cinfo.metadataXML):
the special characters 'µ' and '²' in xmlstr=MEXlibCZI('GetMetadataXml',h) are coded as 'Âµ' and 'Â²' in cinfo.metadataXML. This may extend to other special characters that I have not seen yet. To work around the issue, I used these inelegant statements in MATLAB (2021b):

xmlstr = replace(xmlstr,'µ',strcat(char(194),'µ'));
xmlstr = replace(xmlstr,'²',strcat(char(194),'²'));

Now the following code works as it should:

inputObject = java.io.StringBufferInputStream(xmlstr);
sxml = xmlread(inputObject);

Hope this helps. Thank you again.
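
A likely reason the char(194) prefix helps: 'µ' is U+00B5, and its UTF-8 encoding is the byte pair 194, 181. char(194) is 'Â', so a string in which every byte has become one character shows 'Âµ' where the decoded string shows 'µ'. The sketch below checks that arithmetic with unicode2native and native2unicode. The variable names are illustrative, and the idea that cinfo.metadataXML holds one character per byte is an assumption based on the observation above, not something verified against the file.

```matlab
% 'µ' is code point 181 (U+00B5). Built with char() so the example does
% not depend on the encoding of the source file.
micro = char(181);

% UTF-8 encodes it as two bytes: 194 181 (0xC2 0xB5).
utf8Bytes = unicode2native(micro, 'UTF-8');

% Treating each byte as its own character gives two characters,
% char(194) followed by char(181), which display as 'Âµ'.
rawChars = char(utf8Bytes);

% Decoding the same bytes as UTF-8 recovers the single character.
decoded = native2unicode(utf8Bytes, 'UTF-8');
```

Under that assumption, the replace statements rebuild the byte-per-character form that a byte-oriented reader then decodes as UTF-8.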

Hmm, yes, you are right, it seems somehow the encoding is messed up here. I am sorry, I have no idea what's wrong here. It seems that the string has the "correct" content within Matlab.
What works for me is this

h=MEXlibCZI('Open','G:\Data\TestData\Example_TMA1_Zeb1_SPRR2_Ck19_S100-1-1-1-1.czi');
xmlstr=MEXlibCZI('GetMetadataXml',h);
sxml = xmlread(org.xml.sax.InputSource(java.io.StringReader(xmlstr)));

where I got the idea from here.

If you have some insight into what's wrong here (and whether MEXlibCZI is doing something wrong), I'd be glad to know.
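
One plausible explanation, offered as a sketch rather than a confirmed diagnosis of MEXlibCZI: java.io.StringBufferInputStream turns each character into a single byte (it keeps only the low 8 bits), so 'µ' reaches the parser as the lone byte 181. That is not valid UTF-8, which is the encoding an XML parser assumes for a byte stream when the declaration names none. java.io.StringReader hands the parser characters instead, so no byte decoding takes place. The XML string below is a made-up stand-in, not real CZI metadata.

```matlab
% Made-up one-element document containing 'µ' (char(181)).
xmlstr = ['<?xml version="1.0"?><Unit>' char(181) 'm</Unit>'];

% Byte-stream path: each character is reduced to one byte.
stream = java.io.StringBufferInputStream(char(181));
nBytes = stream.available();    % bytes available for the single character
firstByte = stream.read();      % low 8 bits of 'µ'

% Character path: the parser receives characters, so nothing is decoded.
source = org.xml.sax.InputSource(java.io.StringReader(xmlstr));
doc = xmlread(source);
root = doc.getDocumentElement();
unitText = char(root.getTextContent());
```

If this reading is right, the string returned by GetMetadataXml is already correctly decoded, and the error comes from pushing it through a byte stream on the MATLAB side.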

I tried your solution successfully and I can get rid of the 'replace' statements.

Solved. Thank you for this help.