uriparser / uriparser

Strictly RFC 3986 compliant URI parsing and handling library written in C89; moved from SourceForge to GitHub

Home Page: https://uriparser.github.io/

Forcing URI object to own memory

veselov opened this issue

commented

I find it a bit tedious dealing with memory when using uriparser.
It seems that most of the time Uri objects point to memory they don't own, which means that the caller must keep track of the underlying strings along with the Uri objects, for as long as any Uri object that uses such a string is "alive".

Is there an official way to make a Uri object take ownership of the memory (i.e. allocate the memory and store all the string data there) after it has been created? My biggest grief is with uriAddBaseUriA. It seems that the result references memory from the Uri objects that were given to it as arguments, which means that I can't dispose of them until I'm done with the result.

I understand that I can convert the URL to a string, and then parse it again, but that seems like a lot of work for the task. I see that normalizing a Uri may make it own the memory, but I'd need something more sure than "may"...

Thank you.

uriparser allocates as little memory as possible (and is used on embedded devices). So with regard to design, sparing use of memory is a feature that saves resources and adds flexibility, even if it feels like a bug, an inconvenience, to you.

Right now I would guess your best option is to add convenience wrappers or helpers to turn shallow objects into deep objects, maybe out-of-place rather than in-place, for clarity and safety, as well as freeing counterparts. Are you aware of any other ways to address this inconvenience?

commented

I understand the idea behind trying to minimize copying memory, and am sure it's appreciated for various uses.

I have already added convenience wrappers, effectively carrying around a (pointer to a) structure that contains the string that the Uri object is built around, and the Uri object itself.

The problem is with functions like uriAddBaseUriA: once I have populated a Uri object using such a function, I don't really have an easy way of knowing which memory the object references. Note that, IMHO, this is different from calling a function like uriParse..., because in that case I know which memory the Uri object will reference (and it's a single contiguous piece of memory, which obviously makes things easier).

In a sense, a Uri object holds the memory it was created from hostage. Imagine a URL that is parsed into a Uri object. This Uri object is then passed around, and other Uri objects are created from it and used in various places. The original URL string has to be kept around until every Uri object derived from it has been released.

Without writing code that uses knowledge about internal structure of Uri objects, the only other way of splitting off memory is by calling uriToString, and then uriParse... again. That sounds (at least sounded when I realized all this) like a bit too much - having to fully re-parse the URL.

What I think the library could provide to make things simpler is a function that takes a pointer to a Uri object and:

  • does nothing if the Uri object already owns its memory; otherwise
  • allocates memory to hold all of the string data it references
  • copies the referenced string data into the allocated memory
  • repoints all of its internal pointers at the allocated memory

After calling this function, one would know that any memory that was used to create the object in the first place can be released.
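
The steps listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not uriparser code: `TextRange`, `MiniUri`, `mini_uri_make_owner` and `mini_uri_free` are hypothetical stand-ins with only three components, whereas a real Uri object has more members (path segments, query, fragment, host data) that would need the same treatment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, loosely modeled on first/afterLast text ranges. */
typedef struct {
    const char *first;      /* first character, or NULL if absent */
    const char *afterLast;  /* one past the last character */
} TextRange;

typedef struct {
    TextRange scheme, host, path;
    char *storage;  /* block owned by this object, or NULL */
    int owner;      /* non-zero once the object owns its text */
} MiniUri;

static size_t range_len(const TextRange *r) {
    return (r->first == NULL) ? 0 : (size_t)(r->afterLast - r->first);
}

/* Copy one range into dest and repoint it; returns the next free byte. */
static char *move_range(TextRange *r, char *dest) {
    size_t len = range_len(r);
    if (r->first == NULL) return dest;  /* absent component stays absent */
    memcpy(dest, r->first, len);
    r->first = dest;
    r->afterLast = dest + len;
    return dest + len;
}

/* Returns 1 on success, 0 if allocation fails (uri is left untouched). */
int mini_uri_make_owner(MiniUri *uri) {
    size_t total;
    char *block, *cursor;
    assert(uri != NULL);
    if (uri->owner) return 1;  /* already owns its memory: no-op */
    total = range_len(&uri->scheme) + range_len(&uri->host)
          + range_len(&uri->path);
    block = malloc(total > 0 ? total : 1);
    if (block == NULL) return 0;
    cursor = move_range(&uri->scheme, block);
    cursor = move_range(&uri->host, cursor);
    (void)move_range(&uri->path, cursor);
    uri->storage = block;
    uri->owner = 1;
    return 1;
}

/* Freeing counterpart: only releases memory the object actually owns. */
void mini_uri_free(MiniUri *uri) {
    if (uri->owner) free(uri->storage);
    uri->storage = NULL;
    uri->owner = 0;
}
```

After a successful `mini_uri_make_owner` call, the buffer that the ranges originally pointed into can be overwritten or freed without affecting the object.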

You may be able to abuse uriNormalizeSyntaxExA for what you want, if there is at least one part of the URI where you can tolerate normalization, e.g. the scheme. You would pass URI_NORMALIZE_SCHEME for mask and after a successful call, .owner will be URI_TRUE and the UriUriA instance should own all its memory.

commented

Understood, thank you.

Do you think reparsing the URL is safe enough, though? Is there any reason why converting it into a string and back to a Uri object would cause the resulting Uri object not to equal the initial one?

My reason for caution on using normalization, is that I think that theoretically if normalization doesn't change anything, you might optimize it to not allocate own memory, and re-parsing is a safer bet.

> Understood, thank you.
>
> Do you think reparsing the URL is safe enough, though? Any reason converting it into a string, and back to Uri object will cause the resulting Uri object not equal the initial one?

If you do no manual modification to the UriUriA instance after parsing, there is little point in re-parsing over keeping the original string. If you do manual modification, you need to do that in a way that cooperates with the eventual uriFreeUriMembersA on that instance, e.g. keeping .owner in mind. I'm not really sure which case the re-parsing hack would help with. Could you elaborate?

> My reason for caution on using normalization, is that I think that theoretically if normalization doesn't change anything, you might optimize it to not allocate own memory, and re-parsing is a safer bet.

The normalize hack would work for 0.9.3 because of:

  • (1)
    /* Dup all not duped yet */
    if ((outMask == NULL) && !uri->owner) {
        if (!URI_FUNC(MakeOwner)(uri, &doneMask, memory)) {
            URI_FUNC(PreventLeakage)(uri, doneMask, memory);
            return URI_ERROR_MALLOC;
        }
        uri->owner = URI_TRUE;
    }
  • (2)
    if (!URI_FUNC(LowercaseMalloc)(&(uri->scheme.first), &(uri->scheme.afterLast), memory)) {
        URI_FUNC(PreventLeakage)(uri, doneMask, memory);
        return URI_ERROR_MALLOC;
    }
    doneMask |= URI_NORMALIZE_SCHEME;
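
Snippet (2) is what makes the scheme owned: the range is replaced by a lowercased heap copy. A rough sketch of that idea follows. `lowercase_into_copy` is a hypothetical helper written for illustration, not the library's internal `LowercaseMalloc`, and it ignores uriparser's memory manager and wide-character support.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: replace the range [*first, *afterLast) with a
 * lowercased heap copy. Returns 1 on success, 0 if the range is absent
 * or allocation fails (the range is left unchanged in both cases).
 * On success the caller owns the new buffer and must free it. */
int lowercase_into_copy(const char **first, const char **afterLast) {
    size_t len, i;
    char *copy;
    assert(first != NULL && afterLast != NULL);
    if (*first == NULL || *afterLast == NULL) {
        return 0;
    }
    len = (size_t)(*afterLast - *first);
    copy = malloc(len > 0 ? len : 1);
    if (copy == NULL) {
        return 0;
    }
    for (i = 0; i < len; i++) {
        copy[i] = (char)tolower((unsigned char)(*first)[i]);
    }
    /* Repoint the range at the new, owned buffer. */
    *first = copy;
    *afterLast = copy + len;
    return 1;
}
```

Because the copy is made even when the text is already lowercase, the range no longer depends on the original string after a successful call.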

I get your point that a sentence like "If necessary the URI becomes owner of all memory" is not something to rely upon if you care about the owner part. Would exposing code (1) in a new API function help your case? You could then use uriNormalizeSyntaxExA with <=0.9.3 and the new function with >=0.9.4.

commented

> I'm not really sure what case the re-parsing hack would help

I just had it in mind as a guaranteed mechanism to copy memory (though at the cost of reparse). And yes, you've pointed to exactly the piece of language that made me cautious about using normalization. Thank you for your thorough explanation of the normalization function, I'll make use of it for now.

> Would exposing code (1) in a new API function help your case?

Absolutely :) And I think considering the memory model, this functionality should come in handy for, say, anybody who'd like to wrap the UriUri objects into any kind of higher level objects.