Forcing URI object to own memory
veselov opened this issue
I find it a bit tedious dealing with memory when using uriparse.
It seems that most of the time Uri objects point to memory they don't own. This means that the caller must keep track of the source strings along with the Uri objects, for as long as any Uri object that uses a given string is "alive".
Is there an official way to make a Uri object take ownership of the memory (i.e. allocate the memory and store all the string data there) after it's been created? My utmost grief is with uriAddBaseUriA. It seems that the result references memory from the Uri objects that were given to it as arguments, which means that I can't dispose of them until I'm done with the result.
I understand that I can convert the URL to a string, and then parse it again, but that seems like a lot of work for the task. I see that normalizing a Uri may make it own the memory, but I'd need something more sure than "may"...
Thank you.
uriparser allocates as little memory as possible (and is used on embedded devices). So with regard to design, sparing use of memory is a feature (it saves resources and adds flexibility), even if it feels like a bug (an inconvenience) to you.
Right now I would guess your best option is to add convenience wrappers or helpers that turn shallow objects into deep objects, maybe out-of-place rather than in-place for clarity and safety, as well as freeing counterparts. Are you aware of any other ways to address this inconvenience?
I understand the idea behind trying to minimize copying memory, and am sure it's appreciated for various uses.
I have already added convenience wrappers, effectively carrying around a (pointer to a) structure that contains the string that the Uri object is built around, and the Uri object itself.
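A minimal sketch of what such a wrapper could look like, for illustration only. It deliberately does not use uriparser itself: TextRange, ShallowUri, OwnedUri, toy_parse, owned_uri_create and owned_uri_destroy are all hypothetical names, and toy_parse is a stand-in that merely splits the text at the first ':'. With the real library, the private copy would be the string handed to the parsing function and the embedded object would be a UriUriA.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a text range: two pointers into a buffer
 * that is owned by somebody else. */
typedef struct {
    const char *first;
    const char *afterLast;
} TextRange;

/* Hypothetical, heavily simplified "URI": scheme plus everything else. */
typedef struct {
    TextRange scheme;
    TextRange rest;
} ShallowUri;

/* The wrapper: the backing string and the object pointing into it
 * travel together, so they share one lifetime. */
typedef struct {
    char *text;      /* private copy of the source string */
    ShallowUri uri;  /* ranges point into `text` */
} OwnedUri;

/* Toy parser (assumption: not a real URI parser): split at the first ':'. */
static int toy_parse(ShallowUri *uri, const char *text) {
    const char *colon = strchr(text, ':');
    if (colon == NULL) {
        return -1;
    }
    uri->scheme.first = text;
    uri->scheme.afterLast = colon;
    uri->rest.first = colon + 1;
    uri->rest.afterLast = text + strlen(text);
    return 0;
}

/* Copies `text` first, then parses the copy, so the caller may release
 * or overwrite `text` immediately afterwards. Returns NULL on failure. */
OwnedUri *owned_uri_create(const char *text) {
    size_t len = strlen(text);
    OwnedUri *owned = malloc(sizeof *owned);
    if (owned == NULL) {
        return NULL;
    }
    owned->text = malloc(len + 1);
    if (owned->text == NULL) {
        free(owned);
        return NULL;
    }
    memcpy(owned->text, text, len + 1);
    if (toy_parse(&owned->uri, owned->text) != 0) {
        free(owned->text);
        free(owned);
        return NULL;
    }
    return owned;
}

/* Releases the backing string together with the object that used it. */
void owned_uri_destroy(OwnedUri *owned) {
    if (owned != NULL) {
        free(owned->text);
        free(owned);
    }
}
```

This works as long as every object has exactly one backing string. It does not cover an object whose ranges point into several other objects' strings, which is the case described next.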
The problem is with functions like uriAddBaseUriA: once I have populated a Uri object using such a function, I don't have an easy way of knowing which memory this object references. Note that, IMHO, this is different from calling a function like uriParse..., because in that case I know which memory the Uri object would reference (and it's also a single contiguous piece, which obviously makes things easier).
In a sense, a Uri object holds the memory it was created from hostage. Imagine a URL that is parsed into a Uri object. This Uri object is then passed around, other Uri objects are created from it and used in various places, but the original URL string has to be kept around until every Uri object derived from it has been released.
Without writing code that uses knowledge about the internal structure of Uri objects, the only other way of splitting off memory is by calling uriToString, and then uriParse... again. That sounds (at least it sounded so when I realized all this) like a bit too much: having to fully re-parse the URL.
What I think the library could provide to make things simpler is a function that takes a Uri object pointer and:
- is a no-op if the Uri object already owns its memory
- otherwise allocates memory to hold all of the string data it references
- copies the string data it's referencing to the allocated memory
- moves all of its internal pointers to point to the allocated memory
So after calling this function, one would know that they can do away with any memory that they fed to create that object in the first place.
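The four steps above can be sketched on a simplified model. This is not uriparser code: ModelUri, TextRange, model_make_owner and model_free are hypothetical names, the model has only three components, and packing everything into a single allocated block is a choice made for this sketch, not a statement about how the library lays out owned memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical text range: [first, afterLast) inside some buffer. */
typedef struct {
    const char *first;
    const char *afterLast;
} TextRange;

/* Hypothetical three-component model of a URI object. */
typedef struct {
    TextRange scheme;
    TextRange host;
    TextRange path;
    int owner;      /* nonzero once the object owns its text */
    char *storage;  /* the owned block; NULL while the object is shallow */
} ModelUri;

static size_t range_len(const TextRange *range) {
    return range->first != NULL ? (size_t)(range->afterLast - range->first) : 0;
}

/* Copies one range into `dest`, repoints the range at the copy and
 * returns the position just past the copied bytes. */
static char *copy_range(TextRange *range, char *dest) {
    size_t len = range_len(range);
    if (range->first == NULL) {
        return dest;  /* an absent component stays absent */
    }
    memcpy(dest, range->first, len);
    range->first = dest;
    range->afterLast = dest + len;
    return dest + len;
}

/* Returns 0 on success, -1 if the allocation fails. */
int model_make_owner(ModelUri *uri) {
    size_t total;
    char *block;
    char *cursor;

    if (uri->owner) {
        return 0;  /* step 1: no-op, memory is already owned */
    }
    total = range_len(&uri->scheme) + range_len(&uri->host)
          + range_len(&uri->path);
    block = malloc(total + 1);  /* step 2: +1 avoids a zero-size request */
    if (block == NULL) {
        return -1;  /* the object is left untouched on failure */
    }
    cursor = block;
    cursor = copy_range(&uri->scheme, cursor);  /* steps 3 and 4 */
    cursor = copy_range(&uri->host, cursor);
    cursor = copy_range(&uri->path, cursor);
    *cursor = '\0';
    uri->storage = block;
    uri->owner = 1;
    return 0;
}

/* Freeing counterpart: releases memory only if the object owns it. */
void model_free(ModelUri *uri) {
    if (uri->owner) {
        free(uri->storage);
        uri->storage = NULL;
        uri->owner = 0;
    }
}
```

Because the ranges are repointed only after the allocation has succeeded, a failed call leaves the object shallow but still valid, so the caller can keep using it with the original strings.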
You may be able to abuse uriNormalizeSyntaxExA for what you want, if there is at least one part of the URI where you can tolerate normalization, e.g. the scheme. You would pass URI_NORMALIZE_SCHEME for mask, and after a successful call, .owner will be URI_TRUE and the UriUriA instance should own all its memory.
Understood, thank you.
Do you think reparsing the URL is safe enough, though? Is there any reason why converting it into a string and back to a Uri object would cause the resulting Uri object to not equal the initial one?
My reason for caution about using normalization is that, theoretically, if normalization doesn't change anything, you might optimize it to not allocate its own memory, so re-parsing is a safer bet.
Do you think reparsing the URL is safe enough, though? Is there any reason why converting it into a string and back to a Uri object would cause the resulting Uri object to not equal the initial one?
If you do no manual modification to the UriUriA instance after parsing, there is little point in re-parsing over keeping the original string. If you do manual modification, you need to do that in a way that cooperates with the eventual uriFreeUriMembersA on that instance, e.g. keeping .owner in mind. I'm not really sure what case the re-parsing hack would help with. Could you elaborate?
My reason for caution about using normalization is that, theoretically, if normalization doesn't change anything, you might optimize it to not allocate its own memory, so re-parsing is a safer bet.
The normalize hack would work for 0.9.3 because of
I get your point that a sentence like "If necessary the URI becomes owner of all memory" is not something to rely upon if you care about the owner part. Would exposing code (1) in a new API function help your case? You could then use uriNormalizeSyntaxExA with <=0.9.3 and the new function with >=0.9.4.
I'm not really sure what case the re-parsing hack would help
I just had it in mind as a guaranteed mechanism to copy memory (though at the cost of reparse). And yes, you've pointed to exactly the piece of language that made me cautious about using normalization. Thank you for your thorough explanation of the normalization function, I'll make use of it for now.
Would exposing code (1) in a new API function help your case?
Absolutely :) And I think, considering the memory model, this functionality should come in handy for, say, anybody who'd like to wrap the UriUri objects into any kind of higher-level objects.