emojicode / emojicode

😀😜🔂 World’s only programming language that’s bursting with emojis

Home Page: https://emojicode.org

Syntax question: How are tokens separated?

danielb987 opened this issue · comments

I'm interested in the formal syntax of Emojicode.

http://www.emojicode.org/docs/reference/syntax.html

Are tokens separated by white space or could for example an emoji be immediately followed by an identifier? Could a number be immediately followed by an identifier? And so on.

If tokens are separated by white space, which characters are considered white space? Linefeed (10), carriage return (13), space (32), tab (9)?

Regards,
Daniel

commented

Tokens are not per se separated by whitespace. The tokenizer ignores whitespace if it hasn't determined a token type yet. If a token is being parsed (e.g. a number) and a whitespace character occurs that syntactically cannot be part of the token, the token is terminated. The tokenizer then continues with this whitespace as if it had not determined a token type.

E.g. 432😀 is an integer token followed by an identifier, and 😏493903@var is an identifier, an integer and a variable.
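The behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the Emojicode tokenizer: the token kinds, the `tokenize`, `is_whitespace` and `is_name_char` names, and the rule for what may continue a variable name are all assumptions made for the example. The whitespace set mirrors the `isWhitespace` function quoted in this thread.

```python
# Code points treated as whitespace, mirroring the isWhitespace function
# quoted in this thread.
WHITESPACE = (
    set(range(0x09, 0x0E))
    | {0x20, 0x85, 0xA0, 0x1680}
    | set(range(0x2000, 0x200B))
    | {0x2028, 0x2029, 0x202F, 0x205F, 0x3000, 0xFE0F}
)


def is_whitespace(ch):
    return ord(ch) in WHITESPACE


def is_name_char(ch):
    # Hypothetical rule: ASCII letters, digits and underscore continue a name.
    return ch.isascii() and (ch.isalnum() or ch == "_")


def tokenize(source):
    """Split source into (kind, text) pairs; whitespace only ends a token."""
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if is_whitespace(ch):
            i += 1  # no token in progress: whitespace is simply skipped
        elif ch.isascii() and ch.isdigit():
            start = i
            while i < len(source) and source[i].isascii() and source[i].isdigit():
                i += 1
            tokens.append(("integer", source[start:i]))
        elif ch == "@" or (ch.isascii() and ch.isalpha()):
            start = i
            i += 1
            while i < len(source) and is_name_char(source[i]):
                i += 1
            tokens.append(("variable", source[start:i]))
        else:
            # Assumption: anything else is a one-code-point emoji identifier.
            tokens.append(("identifier", ch))
            i += 1
    return tokens


print(tokenize("432😀"))         # [('integer', '432'), ('identifier', '😀')]
print(tokenize("😏493903@var"))  # [('identifier', '😏'), ('integer', '493903'), ('variable', '@var')]
```

A token ends as soon as the next character cannot belong to it, so `tokenize("432 😀")` and `tokenize("432😀")` give the same result in this sketch.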

All Unicode whitespace characters are considered whitespace. See http://www.unicode.org/Public/6.3.0/ucd/PropList.txt and the check used by the tokenizer:

/// @returns True if @c c is a whitespace character. See http://www.unicode.org/Public/6.3.0/ucd/PropList.txt
inline bool isWhitespace(char32_t c) {
    return (0x9 <= c && c <= 0xD) || c == 0x20 || c == 0x85 || c == 0xA0 || c == 0x1680 || (0x2000 <= c && c <= 0x200A)
        || c == 0x2028 || c == 0x2029 || c == 0x202F || c == 0x205F || c == 0x3000
        || c == 0xFE0F;  // variation selector-16: not Unicode whitespace, but skipped the same way
}

Thanks. May two emojis come immediately after each other and are they in that case considered as one token, or may for example an emoji for a for loop be immediately followed by an emoji for a symbol?

Examples: May 😀 🔤a is bigger than b🔤 be written as 😀🔤a is bigger than b🔤
or may 🍊 ▶️ a b 🍇 be written as 🍊▶️ a b 🍇

I look at this code:
🐇 🔶🎅🎁 🍇

🍉
And the first line has two white spaces. Are these white spaces mandatory in order to separate the tokens?

commented

Consecutive emojis are never considered one token. There is no difference between writing 🐇 🔶🎅🎁 🍇 and 🐇🔶🎅🎁🍇.

Note, however, that if your system is not properly displaying emojis you might see ZWJ emoji sequences, flags or emojis with gender/skin color as multiple symbols, e.g. 👱🏽‍♂️, 👩‍💻, 👨‍💻, 🙆‍♂️, 👩‍👧‍👦, 👨‍👩‍👧‍👧. By definition each of them is a single emoji and treated accordingly by the tokenizer.
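To see why such an emoji can show up as several symbols, it helps to look at its code points. The Python sketch below is only an illustration and not how the Emojicode tokenizer is implemented: `group_emoji` is a hypothetical helper that keeps ZWJ sequences, variation selectors and skin tone modifiers together. It deliberately ignores flags, which are pairs of regional indicator symbols.

```python
ZWJ = 0x200D                          # zero width joiner
VS16 = 0xFE0F                         # variation selector-16 (emoji presentation)
SKIN_TONES = range(0x1F3FB, 0x1F400)  # modifiers U+1F3FB..U+1F3FF


def group_emoji(text):
    """Group code points so that modifier and ZWJ sequences stay together."""
    groups = []
    join_next = False  # True right after a ZWJ: the next code point is glued on
    for ch in text:
        cp = ord(ch)
        if groups and (join_next or cp in (ZWJ, VS16) or cp in SKIN_TONES):
            groups[-1] += ch
        else:
            groups.append(ch)
        join_next = cp == ZWJ
    return groups


programmer = "\U0001F469\u200D\U0001F4BB"  # 👩‍💻
print([f"U+{ord(c):04X}" for c in programmer])  # ['U+1F469', 'U+200D', 'U+1F4BB']
print(len(group_emoji(programmer)))               # 1
```

The string holds three code points, but grouping on the joiner yields a single unit, which matches the "single emoji" reading described above.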

Emojicode version 0.5: variableInitAndScoping.emojic

What does this line do?
😀 🍪🔤i=🔤 🔡i 10🍪

😀 prints the string to the standard output.
🍪 concatenates strings.
🔤i=🔤 is one of the strings to be concatenated.

But what does 🔡i 10 do? Convert i to a string with a radix of 10? 🔡 is the string class, but I don't get the syntax here.

commented

It's an ordinary method call on i, which seems to be an integer: http://www.emojicode.org/docs/packages/s/1f682.html#m🔡
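Read in prefix order, 🔡i 10 is the method 🔡 called on i with the argument 10. A rough Python analogue of the whole line is sketched below. It assumes, as the question guesses, that the argument is the base of the string conversion; the thread itself only links to the method documentation. `to_string` is a hypothetical helper for illustration, not an Emojicode API.

```python
DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"


def to_string(value, base):
    """Return the representation of an integer in the given base (2 to 36)."""
    if not 2 <= base <= len(DIGITS):
        raise ValueError("base must be between 2 and 36")
    if value == 0:
        return "0"
    sign = "-" if value < 0 else ""
    value = abs(value)
    digits = []
    while value:
        value, remainder = divmod(value, base)
        digits.append(DIGITS[remainder])
    return sign + "".join(reversed(digits))


i = 255
print("i=" + to_string(i, 10))  # i=255
print("i=" + to_string(i, 16))  # i=ff
```

Here the string concatenation plays the role of 🍪 and `print` the role of 😀.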