dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

Home Page:https://docs.microsoft.com/dotnet/core/

[API Proposal]: System.String.Repeat (instance method)

tats-u opened this issue · comments

Background and motivation

At least Java/JS/Python/Ruby/Perl/PHP/Rust/Go/Kotlin/Swift can build a string by repeating a given string a specified number of times with a single method, function, or operator.

System.out.println("👍".repeat(10));
console.log("👍".repeat(10));
print("👍" * 10)
puts "👍" * 10
print "👍" x 10;
echo str_repeat("👍", 10);
println!("{}", "👍".repeat(10));
fmt.Println(strings.Repeat("👍", 10))
println("👍".repeat(10))
print(String(repeating: "👍", count: 10))

This emoji takes 2 chars in C# because it lies outside the BMP and is encoded as a UTF-16 surrogate pair.

For example, recheck, a ReDoS checker, uses .repeat(n) to describe attack strings for regex.

However, unlike the languages above, C# has no such convenience method.

API Proposal

namespace System;

[Serializable]
[NonVersionable] // This only applies to field layout
[TypeForwardedFrom("mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")]
public sealed partial class String
    : IComparable,
          IEnumerable,
          IConvertible,
          IEnumerable<char>,
          IComparable<string?>,
          IEquatable<string?>,
          ICloneable,
          ISpanParsable<string>
{
    public string Repeat(int n);
}

We can implement it using FastAllocateString, CopyStringContent, and System.Numerics.BitOperations.LeadingZeroCount.

MSB
1 [👍] +1
↓  👍[👍] ×2
0  👍 👍 +0
↓  👍 👍[👍 👍]  ×2
1  👍 👍 👍 👍[👍] +1
↓  👍 👍 👍 👍 👍[👍 👍 👍 👍 👍] ×2
0  👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 +0
LSB
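
The diagram above walks through n = 10 (binary 1010) from the most significant bit to the least significant one: every step doubles what has been written so far, and a 1 bit appends one more copy. The following is only a minimal sketch of that idea, not the proposed runtime implementation. It assumes a hypothetical helper named RepeatByDoubling and uses a plain char[] instead of the internal FastAllocateString, so it allocates twice (the buffer and the final string).

```csharp
using System;
using System.Numerics;

public static class RepeatSketch
{
    // Hypothetical helper for illustration only; not part of the proposed API.
    public static string RepeatByDoubling(string s, int n)
    {
        if (n < 0) throw new ArgumentOutOfRangeException(nameof(n));
        if (n == 0 || s.Length == 0) return "";

        var buffer = new char[checked(s.Length * n)];

        // The most significant bit of n is always 1: start with one copy.
        s.CopyTo(0, buffer, 0, s.Length);
        int filled = s.Length;

        // Position of the most significant set bit of n.
        int msb = 31 - BitOperations.LeadingZeroCount((uint)n);

        // Walk the remaining bits from high to low.
        for (int bit = msb - 1; bit >= 0; bit--)
        {
            // x2: append a copy of everything written so far.
            Array.Copy(buffer, 0, buffer, filled, filled);
            filled *= 2;

            // +1: the current bit is set, so append one more copy of s.
            if (((n >> bit) & 1) != 0)
            {
                s.CopyTo(0, buffer, filled, s.Length);
                filled += s.Length;
            }
        }

        return new string(buffer);
    }
}
```

For example, RepeatSketch.RepeatByDoubling("ab", 5) returns "ababababab" after one initial copy, two doublings, and one extra append, so the number of copy calls grows with log n rather than n.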

API Usage

Console.WriteLine("✨️".Repeat(20));

Alternative Designs

Console.WriteLine(string.Concat(Enumerable.Repeat("✨️", 20)));
Console.WriteLine(new string('#', 20));

The former uses LINQ, which is slower and harder to discover.
The latter is limited to a single BMP char.
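
To make the limits of the two alternatives concrete, here is a small sketch. The class and method names (AlternativesDemo, RepeatWithLinq, Demo) are made up for illustration; the calls themselves are the existing string.Concat, Enumerable.Repeat, and new string(char, int) APIs.

```csharp
using System;
using System.Linq;

public static class AlternativesDemo
{
    // The LINQ-based alternative: works for any string, including surrogate pairs.
    public static string RepeatWithLinq(string s, int n) =>
        string.Concat(Enumerable.Repeat(s, n));

    public static void Demo()
    {
        // U+1F44D (thumbs up) is outside the BMP, so it is stored as two chars.
        string thumbsUp = "\U0001F44D";
        Console.WriteLine(thumbsUp.Length);                    // 2

        // new string(char, int) accepts exactly one char, so it cannot
        // repeat a surrogate pair or any multi-char string.
        Console.WriteLine(new string('#', 20).Length);         // 20

        // Three repetitions of a two-char string give six chars.
        Console.WriteLine(RepeatWithLinq(thumbsUp, 3).Length); // 6
    }
}
```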

Risks

Runtime/SDK size increase

Why not just write a trivial extension method here?

As you can imagine, LINQ + string.Concat (ValueStringBuilder) isn't so terrible in most cases (except for those who are extremely wary of extra allocations), because it is unlikely to be executed millions of times.
However, Java, our long-time rival, has already implemented it. I agree there may be no reason to develop it actively, but I think there's no reason to reject it either.

I'll run a microbenchmark comparing the LINQ implementation with a single-allocation implementation if I have time.

Why not just write a trivial extension method here?

If many people have already written one, would it be so bad for the runtime to provide it?
Developers who don't come from C/C++ will expect C# to have it, too.

This isn't a high priority, because everyone other than newbies and game developers can make do with LINQ, as you say. At worst, you can put this in Any Time.

I forgot that REPL users will have to copy and paste or retype the boilerplate static string Repeat(this string s, int n) => string.Concat(Enumerable.Repeat(s, n)); in every session.
That is a bit unfriendly and irritating.

Java's trackers:

Since Java 8, before .repeat was added in Java 11, the same could be done with Stream, which is a little more verbose than the C# LINQ implementation:

IntStream.range(0,10).mapToObj(i -> "👍").collect(Collectors.joining())

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

You can use the more performant APIs that exist nowadays, like string.Create.
Here is an example of a simple extension method you can write for your projects:

public static string Repeat(this string str, int count)
{
	return string.Create(str.Length * count, str, (span, value) =>
	{
		for (int i = 0; i < count; i++)
		{
			value.AsSpan().CopyTo(span);
			span = span.Slice(value.Length);
		}
	});
}

@KeterSCP With the algorithm I suggested, the copied span can double in size on each iteration.
I'll add your code to a microbenchmark.
I also want to know how much the lambda hurts performance. (Update: it may not matter thanks to guarded devirtualization.)


I expected a use case in PowerShell, but its * operator can already repeat the left operand string, as in Python/Ruby.

It turned out to be more complex than I expected.
This is just a draft and can be optimized further.
I suspect the code above is faster, at least in simple cases, because of the overhead of this version, but only a microbenchmark will tell.

public static class E
{
    public static string Repeat(this string str, int count) =>
        count switch
        {
            < 0 => throw new ArgumentOutOfRangeException(nameof(count)),
            0 => "",
            1 => str,
            _
                => string.Create(
                    checked(str.Length * count),
                    (str, count),
                    static (outSpan, state) =>
                    {
                        var inSpan = state.Item1.AsSpan();
                        inSpan.CopyTo(outSpan);
                        var len = inSpan.Length;
                        var firstSpan = outSpan[..len];
                        var copyFromSpan = firstSpan;
                        var copyToSpan = outSpan[len..];
                        for (
                            var bit =
                                1
                                << (
                                    30
                                    - System.Numerics.BitOperations.LeadingZeroCount(
                                        (uint)state.Item2
                                    )
                                );
                            bit != 0;
                            bit >>= 1
                        )
                        {
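                            // Double: append a copy of everything written so far.
                            copyFromSpan.CopyTo(copyToSpan);
                            var oldLength = copyFromSpan.Length;
                            copyToSpan = copyToSpan[oldLength..];
                            copyFromSpan = outSpan[..(oldLength << 1)];
                            if ((bit & state.Item2) != 0)
                            {
                                // Bit is set: append one more copy of the input.
                                firstSpan.CopyTo(copyToSpan);
                                copyToSpan = copyToSpan[len..];
                                // Include the extra copy in the next doubling.
                                copyFromSpan = outSpan[..(copyFromSpan.Length + len)];
                            }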
                        }
                    }
                )
        };
}

IL size in lambda: 210

I removed n (the captured count) from the lambda in the code above:

public static class E
{
    public static string Repeat(this string str, int count)
    {
        return count switch
        {
            < 0 => throw new ArgumentOutOfRangeException(nameof(count)),
            0 => "",
            1 => str,
            _
                => string.Create(
                    checked(str.Length * count),
                    str,
                    static (outSpan, input) =>
                    {
                        var inSpan = input.AsSpan();
                        for (; outSpan.Length != 0; outSpan = outSpan.Slice(inSpan.Length))
                        {
                            inSpan.CopyTo(outSpan);
                        }
                    }
                )
        };
    }
}

IL size in lambda: 48→43

In every project (or solution), we have to rewrite such an implementation (even the LINQ one is complex enough for beginners) or look for a NuGet package (is there one?), all for a method that many other (newer) languages already provide.

FWIW, where Repeat is the log n copies algorithm and Repeat2 is the n copies algorithm:

Runtime=.NET 8.0

| Method              | Mean        | Error      | StdDev     | Gen0   | Allocated |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_1CharX1000   |   323.86 ns |   6.416 ns |   8.114 ns | 0.4835 |    2024 B |
| Repeat2_1CharX1000  | 4,345.55 ns |  80.400 ns |  78.963 ns | 0.4807 |    2024 B |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_1CharX100    |    83.14 ns |   1.540 ns |   1.441 ns | 0.0535 |     224 B |
| Repeat2_1CharX100   |   456.18 ns |   8.921 ns |  10.955 ns | 0.0534 |     224 B |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_1CharX10     |    45.06 ns |   0.955 ns |   1.100 ns | 0.0114 |      48 B |
| Repeat2_1CharX10    |    59.01 ns |   1.238 ns |   1.474 ns | 0.0114 |      48 B |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_10CharX1000  | 1,795.62 ns |  35.437 ns |  71.585 ns | 4.7607 |   20024 B |
| Repeat2_10CharX1000 | 6,158.26 ns | 119.666 ns | 199.936 ns | 4.7607 |   20024 B |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_10CharX100   |   268.02 ns |   5.308 ns |   5.900 ns | 0.4835 |    2024 B |
| Repeat2_10CharX100  |   625.81 ns |  11.624 ns |  10.304 ns | 0.4835 |    2024 B |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_10CharX10    |    65.35 ns |   1.366 ns |   1.278 ns | 0.0535 |     224 B |
| Repeat2_10CharX10   |    77.00 ns |   1.570 ns |   1.392 ns | 0.0535 |     224 B |