Hirevo / mega-rs

An API client library for interacting with MEGA from Rust

Home Page: https://crates.io/crates/mega

FileAttributes is incomplete

Xyah3PBeHB opened this issue

mega-rs/src/utils.rs

Lines 22 to 27 in 89a9753

pub(crate) struct FileAttributes {
    #[serde(rename = "n")]
    pub name: String,
    #[serde(rename = "c", skip_serializing_if = "Option::is_none")]
    pub c: Option<String>,
}

FileAttributes is incomplete, and some attributes will be lost when deserialized and reserialized. Even if you don't need them now, you can use #[serde(flatten)] to capture them first.

    #[serde(flatten)]
    other: serde_json::Value,

https://github.com/meganz/webclient/blob/d7f2a8e053f32858016d6711c5dbbeace7858c3d/nodedec.js#L340-L508

In addition, FileNode should contain the mtime attribute, and the FileFingerprint/checksum should be generated and stored in the FileAttributes when uploading the file.

https://github.com/meganz/sdk/blob/7e2bb9e05804a773f4f719b9de6476e43379c830/src/node.cpp#L927-L970

Hello, thanks for filing this issue.

I think it would indeed be a good call to add that #[serde(flatten)] field, even if only for forward compatibility.

Regarding the mtime attribute in FileNode, I've looked through some files I have stored in MEGA which were uploaded using both MEGAcmd (MEGA's official CLI, on version 1.6.1.3) and the web client (simply drag-and-dropped on mega.nz), and I couldn't find an mtime field anywhere in the (pre-deserialized) JSON output from the FetchNodes request (message type f).
Do you know which tool or method I can use to create a file with such a field myself (for testing correctness and compatibility)?

Regarding the FileFingerprint/checksum, I'll look into implementing the checksum generation and comparison logic, but I am not sure what the library should do when a checksum turns out to be invalid.
Maybe we could simply not do anything with it when fetching nodes and just expose a check_attributes_integrity method, so that users can run this check on the nodes whose checksum they care to validate?

Thanks for your quick and detailed reply.

I wasn't precise enough and you misunderstood my advice. FileNode means a mega::Node whose NodeKind is NodeKind::File. I think such a Node should provide an API similar to mega::Node::created_at to get the modified_at (the mtime attribute I mentioned earlier), although sometimes it is not available.

This value can be obtained in the following two ways:

  1. The t/mtime attribute of FileAttributes (legacy).
  2. The c/"checksum" attribute of FileAttributes which is also known as Fingerprint/FileFingerprint.

https://github.com/meganz/webclient/blob/d7f2a8e053f32858016d6711c5dbbeace7858c3d/nodedec.js#L432-L446
https://github.com/meganz/sdk/blob/7e2bb9e05804a773f4f719b9de6476e43379c830/src/node.cpp#L949-L955
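The two sources can be combined into a single optional getter. The following is only a minimal sketch: `RawAttributes`, `fingerprint_mtime` and `legacy_t` are hypothetical names, not the crate's actual types, and preferring the fingerprint over the legacy attribute is an assumption.

```rust
/// Hypothetical, already-decoded view of a file node's attributes.
struct RawAttributes {
    /// Modification time recovered from the `c` fingerprint, if it decoded.
    fingerprint_mtime: Option<i64>,
    /// Legacy `t` attribute, if present.
    legacy_t: Option<i64>,
}

/// Returns the modification time as a Unix timestamp in seconds,
/// or `None` when neither source is available.
fn modified_at(attrs: &RawAttributes) -> Option<i64> {
    // Assumption: the fingerprint takes precedence over the legacy field.
    attrs.fingerprint_mtime.or(attrs.legacy_t)
}
```

Returning an `Option` keeps the "sometimes it is not available" case visible to callers instead of substituting a placeholder date.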

I have encountered files for which neither of the above two methods yields a value. Maybe someone used an imperfect third-party MEGA library (rclone uses go-mega) to upload the file, or something went wrong when uploading with the official tool.

Regarding the checksum generation implementation, I personally think it is better to provide a separate method, as you said, to verify whether a given reader: impl AsyncRead matches the checksum of a certain Node. But the FileFingerprint/"checksum"/"hash" defined by MEGA is a combination of the real file checksum and the mtime. Maybe this needs to be explained in the docs.

https://github.com/meganz/webclient/blob/d7f2a8e053f32858016d6711c5dbbeace7858c3d/js/transfers/utils.js#L363
https://github.com/meganz/sdk/blob/7e2bb9e05804a773f4f719b9de6476e43379c830/src/filefingerprint.cpp#L121

The real checksum is a sparse CRC32 hash for most files (size > 8192). This is a good trade-off in some usage scenarios compared to MAC checks, which require AES encryption of the full file. When MEGAcmd/MEGAsync downloads a file and finds a file with the same name and the same size in the target path, it compares the FileFingerprint (including the mtime) to decide whether to skip the download.
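That skip decision can be sketched as follows. This is an illustration only: `Fingerprint`, `FileInfo` and `can_skip_download` are hypothetical names, the 16-byte checksum width is an assumption, and the checksum computation itself is left out.

```rust
/// Hypothetical fingerprint: the sparse checksum plus the modification time.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Fingerprint {
    /// Sparse checksum bytes (assumed to be 16 bytes wide here).
    checksum: [u8; 16],
    /// Modification time as a Unix timestamp in seconds.
    modified_at: i64,
}

/// Hypothetical description of a file, local or remote.
#[derive(Debug, Clone)]
struct FileInfo {
    name: String,
    size: u64,
    fingerprint: Fingerprint,
}

/// Returns true when the local file can be treated as identical to the
/// remote one: same name, same size, same fingerprint (mtime included).
fn can_skip_download(local: &FileInfo, remote: &FileInfo) -> bool {
    local.name == remote.name
        && local.size == remote.size
        && local.fingerprint == remote.fingerprint
}
```

Because the mtime is part of the fingerprint, a file with identical bytes but a different modification time does not count as a match in this sketch.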

Thank you for reading my long-winded reply, I hope there is no misexpression this time.

Thank you very much for the numerous pointers you've given me on how these checksums work.

I'm sorry for the slight delay, it's been a bit tricky to wrap my head around the exact mechanics that needed to be added in to support this properly.

I've now been able to implement that checksum generation logic into the library, as part of PR #2.

In that PR, the checksums are now automatically computed on file uploads and serialized into the c attribute, along with the updated modification date (which I also now write into the t attribute).

The checksum and modification date are also now extracted upon node fetching and exposed to users through the Node::checksum and Node::modified_at getter methods.

The checksum is ignored on file download since it seems redundant with the existing MAC method.

I also exposed a standalone function that lets users compute similar checksums for their own readers (mega::compute_sparse_checksum), which should allow people to easily compare checksums with those of remote MEGA nodes and make decisions in a similar fashion to MEGAcmd/MEGAsync, as you described.
I added a new example that makes use of this function to make such a quick comparison with a remote node.

I hope that this addresses everything you had in mind. Feel free to give feedback if you see something strange in the implementation or if something you wanted to see added is missing.

Thank you for your efforts to fulfill my request.

I checked your PR #2 and there are two points I would like to discuss with you.

  1. The t attribute is a legacy design; it should only be read for compatibility and not written when creating a new node.

https://github.com/meganz/webclient/blob/a49c766f993cfec42fe117d8bca343371cf8bb2c/js/transfers/upload2.js#L671-L683
https://github.com/meganz/webclient/blob/a49c766f993cfec42fe117d8bca343371cf8bb2c/nodedec.js#L361-L362

  2. Node::modified_at should represent the modification time of the unencrypted data, which is similar to the modification time of the file. Therefore, it should not change when a node is moved or renamed. For the same reason, NodeFingerprint::modified_at should be set to the modification time of the file being uploaded. (Maybe add a timestamp: Option<i64> or timestamp: Option<DateTime<Utc>> argument to Client::upload_node, or add a new method Client::upload_node_with_timestamp.)

https://github.com/meganz/webclient/blob/a49c766f993cfec42fe117d8bca343371cf8bb2c/js/transfers/utils.js#L415
https://developer.mozilla.org/en-US/docs/Web/API/File/lastModified

Thanks for the review of my implementation, this is very valuable feedback so thank you for taking the time to actually do that.

I agree with your proposition to allow setting the modification date more directly, so I've added a new last_modified: LastModified argument to Client::upload_node, which I think may be a bit clearer to read than a bare timestamp or Option<DateTime<Utc>>.
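To illustrate why a dedicated type can read more clearly than a bare timestamp, here is a sketch of what such an argument could look like. The variants `Now` and `Set` and the `resolve` helper are assumptions made for this example, not necessarily the crate's definitions, and a plain `i64` stands in for `DateTime<Utc>` to keep it dependency-free.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical stand-in for the `LastModified` argument.
#[derive(Debug, Clone, Copy)]
enum LastModified {
    /// Use the time at which the upload happens.
    Now,
    /// Use an explicit Unix timestamp in seconds,
    /// for example the local file's modification time.
    Set(i64),
}

/// Resolves the argument to a Unix timestamp in seconds.
fn resolve(last_modified: LastModified) -> i64 {
    match last_modified {
        LastModified::Now => SystemTime::now()
            .duration_since(UNIX_EPOCH)
            .map(|elapsed| elapsed.as_secs() as i64)
            // A clock set before the Unix epoch falls back to 0 in this sketch.
            .unwrap_or(0),
        LastModified::Set(timestamp) => timestamp,
    }
}
```

At the call site, `LastModified::Now` states the intent directly, whereas `None` in an `Option` argument would leave the reader guessing between "now" and "no date".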

I've fixed some of the bugs you mentioned, namely that renaming a node used to update its last modification date.

Regarding the legacy t attribute, I have to say I am a bit confused as to why MEGA themselves would explicitly set t to 0 instead of the actual timestamp, even though it is a legacy thing.
Wouldn't that mean that some legacy clients would interpret this as a last modification date of 1970-01-01 00:00:00 UTC?

Actually, I think I got a bit confused earlier.

I thought that the t: 0 assignment in js/transfers/upload2.js was equivalent to setting NodeAttributes::modified_at to Some(0) in this project, but it is actually them setting the UploadAttributes::kind field to NodeKind::File (whose discriminant is indeed 0), so my bad for misreading this.

But I am really confused as to why omitting the t field (by setting NodeAttributes::modified_at to None) when creating a folder causes the MEGA webclient to display 'MALFORMED ATTRIBUTES' for that folder, despite the webclient itself seemingly doing exactly that and not encountering this issue.

Well, in any case, this does not seem to be a major issue, as folders still don't have a meaningful modification date (the webclient does not display one for folders in its UI, where files would have one), so it doesn't seem to be a blocker in any way.

I just checked the implementation from the perspective of the user who put forward the demand. You who can fully implement the entire system are the most valuable.

Now that you have figured out the truth about t: 0, maybe you can change the remaining modified_at: Some(0), to modified_at: None,?

I think the only time information MEGA intends a folder to carry is its creation time, which is set when the node is created through the API.

root@localhost:~# mega-ls --help
Usage: ls [-halRr] [--show-handles] [--tree] [--versions] [remotepath] [--use-pcre] [--show-creation-time] [--time-format=FORMAT]
Lists files in a remote path
 remotepath can be a pattern (Perl Compatible Regular Expressions with "--use-pcre"
   or wildcarded expresions with ? or * like f*00?.txt)
 Also, constructions like /PATTERN1/PATTERN2/PATTERN3 are allowed

Options:
 -R|-r  List folders recursively
 --tree Prints tree-like exit (implies -r)
 --show-handles Prints files/folders handles (H:XXXXXXXX). You can address a file/folder by its handle
 -l     Print summary (--tree has no effect)
         SUMMARY contents:
           FLAGS: Indicate type/status of an element:
             xxxx
             |||+---- Sharing status: (s)hared, (i)n share or not shared(-)
             ||+----- if exported, whether it is (p)ermanent or (t)temporal
             |+------ e/- wheter node is (e)xported
             +-------- Type(d=folder,-=file,r=root,i=inbox,b=rubbish,x=unsupported)
           VERS: Number of versions in a file
           SIZE: Size of the file in bytes:
           DATE: Modification date for files and creation date for folders (in UTC time):
           NAME: name of the node
 -h     Show human readable sizes in summary
 -a     Include extra information
         If this flag is repeated (e.g: -aa) more info will appear
         (public links, expiration dates, ...)
 --versions     show historical versions
        You can delete all versions of a file with "deleteversions"
 --show-creation-time   show creation time instead of modification time for files
 --time-format=FORMAT   show time in available formats. Examples:
               RFC2822:  Example: Fri, 06 Apr 2018 13:05:37 +0200
               ISO6081:  Example: 2018-04-06
               ISO6081_WITH_TIME:  Example: 2018-04-06T13:05:37
               SHORT:  Example: 06Apr2018 13:05:37
               SHORT_UTC:  Example: 06Apr2018 13:05:37
               CUSTOM. e.g: --time-format="%Y %b":  Example: 2018 Apr
                 You can use any strftime compliant format: http://www.cplusplus.com/reference/ctime/strftime/
 --use-pcre     use PCRE expressions

https://github.com/meganz/sdk/blob/679e829457af5647d485d95a5202125cdaf9c2c5/include/megaapi.h#L766-L782

I just found that the root cause of the issue I had with the MALFORMED_ATTRIBUTES was completely unrelated to modified_at being None, it is simply that the example case I used ended up having an attributes object whose length is exactly a multiple of 16 after being encoded into MEGA's attribute format.

And a bug in the padding routine I wrote to pad buffers to multiples of 16 (as MEGA expects) mistakenly added 15 more bytes if it was already a multiple of 16 in length, leading to the padding being completely wrong and MEGA displaying MALFORMED_ATTRIBUTES.
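For reference, a padding routine that leaves already-aligned buffers untouched can be sketched as below. This is not the crate's actual code: the function names are made up for the example, and zero-byte padding is an assumption.

```rust
/// Block size that attribute buffers must be a multiple of.
const BLOCK_SIZE: usize = 16;

/// Returns the smallest multiple of `BLOCK_SIZE` that is >= `len`.
fn padded_len(len: usize) -> usize {
    match len % BLOCK_SIZE {
        // Already aligned: no padding must be added.
        0 => len,
        remainder => len + (BLOCK_SIZE - remainder),
    }
}

/// Pads `buf` in place with zero bytes (assumed padding value)
/// up to the next multiple of `BLOCK_SIZE`.
fn pad_to_block(buf: &mut Vec<u8>) {
    let target = padded_len(buf.len());
    buf.resize(target, 0);
}
```

The aligned case (`len % 16 == 0`) is the one worth covering explicitly in tests, since it only shows up for inputs whose encoded length happens to land on a block boundary.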

I really hope no one has run into this issue with their own files yet, or at least that no one has lost any data due to my unfortunate mistake.
I've now backported the fix on top of the v0.4.0 release commit and put out a v0.4.1 release to address it early.

Now, always setting modified_at to None works in all cases in my testing. I've committed this change to the PR, so the code no longer contains Some(0) or any other value.

PR #3 has now been merged!

@Xyah3PBeHB Thank you for filing this issue, for your help reverse-engineering MEGA's SDK and for your input on the API design!