Character iterating?

Question

codecat opened this issue 8 years ago

Melissa commented 8 years ago

What do you suggest is the best way of iterating over codepoints using this library?

Neil Henning · Answer 1 · Sun Jul 10 2016 00:13:21 GMT+0800 (China Standard Time)

Heyo @angelog!

So at present I've just done it manually when I've needed to - but I agree that is not the ideal approach for everyone.

I could foresee adding a function (something like):

void* some_utf8_str = ...;
long codepoint;
some_utf8_str = utf8codepoint(some_utf8_str, &codepoint);

And you could then iterate until codepoint was the null terminator ('\0'). Would that be of use to you?

Melissa · Answer 2 · Sun Jul 10 2016 05:26:35 GMT+0800 (China Standard Time)

That would be a helpful addition to the library, yeah.

I'm curious, how exactly were you doing it manually?

Neil Henning · Answer 3 · Sun Jul 10 2016 17:10:49 GMT+0800 (China Standard Time)

Basically the byte length of a UTF-8 codepoint is encoded by the pattern of the leading bits of its first byte. I was creating a long codepoint by concatenating the payload bits of the bytes together.
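
For readers who want to see what that manual approach can look like, here is a minimal sketch. It is not the library's implementation: decode_codepoint and count_codepoints are hypothetical names made up for illustration, and the code assumes the input is valid, null-terminated UTF-8 (it does no validation of lead or continuation bytes).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the library: decodes the codepoint that
   starts at s, stores it in *out, and returns a pointer to the next one.
   Assumes valid, null-terminated UTF-8; malformed input is not detected. */
static const unsigned char *decode_codepoint(const unsigned char *s, long *out) {
    assert(s != NULL && out != NULL);
    if ((s[0] & 0x80) == 0x00) {        /* 0xxxxxxx: 1 byte */
        *out = s[0];
        return s + 1;
    } else if ((s[0] & 0xE0) == 0xC0) { /* 110xxxxx: 2 bytes */
        *out = ((long)(s[0] & 0x1F) << 6) | (long)(s[1] & 0x3F);
        return s + 2;
    } else if ((s[0] & 0xF0) == 0xE0) { /* 1110xxxx: 3 bytes */
        *out = ((long)(s[0] & 0x0F) << 12) | ((long)(s[1] & 0x3F) << 6) |
               (long)(s[2] & 0x3F);
        return s + 3;
    } else {                            /* assumed 11110xxx: 4 bytes */
        *out = ((long)(s[0] & 0x07) << 18) | ((long)(s[1] & 0x3F) << 12) |
               ((long)(s[2] & 0x3F) << 6) | (long)(s[3] & 0x3F);
        return s + 4;
    }
}

/* Iterates until the decoded codepoint is the null terminator and
   returns how many codepoints came before it. */
static int count_codepoints(const char *str) {
    const unsigned char *p = (const unsigned char *)str;
    long codepoint;
    int count = 0;
    for (p = decode_codepoint(p, &codepoint); codepoint != 0;
         p = decode_codepoint(p, &codepoint)) {
        count++;
    }
    return count;
}
```

The loop in count_codepoints has the same shape as the iteration described earlier in the thread: decode, advance the pointer, and stop once the codepoint is '\0'.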

I think having a function to do this makes a lot of sense though, I'll work on it!

Melissa · Answer 4 · Sun Jul 10 2016 17:16:52 GMT+0800 (China Standard Time)

Ah, yeah it doesn't sound too practical to do it manually. Thanks! 👍

Neil Henning · Answer 5 · Mon Jul 11 2016 05:01:46 GMT+0800 (China Standard Time)

Hey @angelog can you check out pull request #21 for me please? I've included an example of how to use it in the pull request too 😄

Neil Henning · Answer 6 · Mon Jul 11 2016 16:54:35 GMT+0800 (China Standard Time)

I've merged #21, solving this issue.

Melissa · Answer 7 · Mon Jul 11 2016 19:21:40 GMT+0800 (China Standard Time)

I will play around with it later tonight. Thank you! 👍

Melissa · Answer 8 · Thu Aug 11 2016 05:46:15 GMT+0800 (China Standard Time)

Sorry I didn't reply to this earlier, I was pretty busy. Tried it last night, works wonderfully! Thank you :)