dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

Home Page: https://docs.microsoft.com/dotnet/core/

[API Proposal]: System.String.Repeat (instance method)

tats-u opened this issue

Background and motivation

At least Java/JS/Python/Ruby/Perl/PHP/Rust/Go/Kotlin/Swift can build a string that repeats a given string a specified number of times with a single method, function, or operator.

System.out.println("👍".repeat(10));
console.log("👍".repeat(10));
print("👍" * 10)
puts "👍" * 10
print "👍" x 10;
echo str_repeat("👍", 10);
println!("{}", "👍".repeat(10));
fmt.Println(strings.Repeat("👍", 10))
println("👍".repeat(10))
print(String(repeating: "👍", count: 10))

This emoji takes 2 chars (UTF-16 code units) in C#.

For example, recheck, a ReDoS checker, uses .repeat(n) to describe attack strings for regex.

However, unlike the languages above, C# has no such convenience method.

API Proposal

namespace System;

[Serializable]
[NonVersionable] // This only applies to field layout
[TypeForwardedFrom("mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")]
public sealed partial class String
    : IComparable,
          IEnumerable,
          IConvertible,
          IEnumerable<char>,
          IComparable<string?>,
          IEquatable<string?>,
          ICloneable,
          ISpanParsable<string>
{
    public string Repeat(int n);
}

We can implement it using FastAllocateString, CopyStringContent, and System.Numerics.BitOperations.LeadingZeroCount.

MSB
1 [👍] +1
↓  👍[👍] ×2
0  👍 👍 +0
↓  👍 👍[👍 👍]  ×2
1  👍 👍 👍 👍[👍] +1
↓  👍 👍 👍 👍 👍[👍 👍 👍 👍 👍] ×2
0  👍 👍 👍 👍 👍 👍 👍 👍 👍 👍 +0
LSB
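
The diagram above walks the bits of the count (10 = 0b1010) from the most significant to the least significant, doubling at each step and adding one copy where the bit is set. A minimal sketch of that bookkeeping follows; it only tracks how many copies exist after each step and does not build a string. The names `RepeatTrace` and `CopiesPerStep` are hypothetical and used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Numerics;

public static class RepeatTrace
{
    // Hypothetical helper: returns the number of copies present after each
    // step of the bit walk (start with 1, then "x2" and an optional "+1" per bit).
    public static List<int> CopiesPerStep(int count)
    {
        if (count <= 0) throw new ArgumentOutOfRangeException(nameof(count));

        var steps = new List<int> { 1 };   // top bit: start with a single copy
        int copies = 1;
        int top = 31 - BitOperations.LeadingZeroCount((uint)count); // index of the highest set bit

        for (int bit = top - 1; bit >= 0; bit--)
        {
            copies *= 2;                   // x2: duplicate everything written so far
            steps.Add(copies);
            if ((count & (1 << bit)) != 0)
            {
                copies += 1;               // +1: append one more copy of the original
                steps.Add(copies);
            }
        }
        return steps;
    }
}
```

For a count of 10 this yields the sequence 1, 2, 4, 5, 10, which matches the rows of the diagram where the number of copies changes.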

API Usage

Console.WriteLine("✨️".Repeat(20));

Alternative Designs

Console.WriteLine(string.Concat(Enumerable.Repeat("✨️", 20)));
Console.WriteLine(new string('#', 20));

The former relies on LINQ, which is slower and harder to discover.
The latter is limited to a single BMP char.
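
To make the limitation of the two alternatives concrete, here is a small sketch. The wrapper names `RepeatWithLinq` and `RepeatChar` are hypothetical; they only wrap the two existing calls shown above.

```csharp
using System;
using System.Linq;

public static class AlternativeDesigns
{
    // LINQ-based alternative: accepts any string, including one that
    // contains surrogate pairs such as an emoji.
    public static string RepeatWithLinq(string s, int count) =>
        string.Concat(Enumerable.Repeat(s, count));

    // Constructor-based alternative: accepts exactly one char,
    // i.e. a single UTF-16 code unit.
    public static string RepeatChar(char c, int count) => new string(c, count);
}
```

`"👍".Length` is 2 because the emoji lies outside the BMP and is stored as a surrogate pair, so `RepeatChar('👍', 3)` does not even compile (the literal does not fit in one `char`), while `RepeatWithLinq("👍", 3)` returns a string of length 6.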

Risks

Runtime/SDK size increase

Why not just write a trivial extension method here?

As you can imagine, LINQ + string.Concat (ValueStringBuilder) isn't so terrible in most cases (except for those who are extremely wary of extra allocations), because it is unlikely to be executed millions of times or more.
However, Java, our long-time rival, has already implemented it. I agree with you that there is no reason to develop it actively, but I also see no reason to reject it.

I'll run a microbenchmark comparing LINQ with a single-allocation implementation if I have time.

Why not just write a trivial extension method here?

If many people have already written one, wouldn't it be reasonable for the runtime to provide it?
Developers who don't come from C/C++ will expect C# to have it, too.

This isn't a high priority, because everyone except newcomers and game developers can make do with LINQ, as you say. At worst, you can put this into Any Time.

I forgot that REPL users have to copy and paste, or retype from memory, the boilerplate static string Repeat(this string s, int n) => string.Concat(Enumerable.Repeat(s, n)); in every session.
That is a somewhat unfriendly and irritating experience.

Java's trackers:

Since Java 8, before .repeat was added in Java 11, Java has been able to do this with Stream, which is a little more verbose than the C# LINQ version:

IntStream.range(0,10).mapToObj(i -> "👍").collect(Collectors.joining())

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

You can use the more performant APIs that exist nowadays, such as string.Create.
Here is an example of a simple extension method you can write for your projects:

public static string Repeat(this string str, int count)
{
	return string.Create(str.Length * count, str, (span, value) =>
	{
		for (int i = 0; i < count; i++)
		{
			value.AsSpan().CopyTo(span);
			span = span.Slice(value.Length);
		}
	});
}

@KeterSCP With my suggested algorithm, we can double the size of the copied span on each iteration.
I'll add your code to a microbenchmark.
I also want to know how much the lambda costs in performance. (Update: it may not matter, thanks to guarded devirtualization.)


I expected a use case in PowerShell, but its * operator already repeats the left-operand string, as in Python/Ruby.

It turned out to be more complex than I expected.
This is just a draft and can be optimized further.
I suspect the code above is faster, at least in simple cases, because of the overhead of this version, but only a microbenchmark can tell.

public static class E
{
    public static string Repeat(this string str, int count) =>
        count switch
        {
            < 0 => throw new ArgumentOutOfRangeException(nameof(count)),
            0 => "",
            1 => str,
            _
                => string.Create(
                    checked(str.Length * count),
                    (str, count),
                    static (outSpan, state) =>
                    {
                        var inSpan = state.Item1.AsSpan();
                        inSpan.CopyTo(outSpan);
                        var len = inSpan.Length;
                        var firstSpan = outSpan[..len];
                        var copyFromSpan = firstSpan;
                        var copyToSpan = outSpan[len..];
                        for (
                            var bit =
                                1
                                << (
                                    30
                                    - System.Numerics.BitOperations.LeadingZeroCount(
                                        (uint)state.Item2
                                    )
                                );
                            bit != 0;
                            bit >>= 1
                        )
                        {
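                            // Copy everything written so far (including any +1 copy added in the
                            // previous iteration) so that the number of copies really doubles.
                            copyFromSpan = outSpan[..^copyToSpan.Length];
                            copyFromSpan.CopyTo(copyToSpan);
                            copyToSpan = copyToSpan[copyFromSpan.Length..];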
                            if ((bit & state.Item2) != 0)
                            {
                                firstSpan.CopyTo(copyToSpan);
                                copyToSpan = copyToSpan[len..];
                            }
                        }
                    }
                )
        };
}

IL size in lambda: 210

I removed the loop counter from @KeterSCP's code above:

public static class E
{
    public static string Repeat(this string str, int count)
    {
        return count switch
        {
            < 0 => throw new ArgumentOutOfRangeException(nameof(count)),
            0 => "",
            1 => str,
            _
                => string.Create(
                    checked(str.Length * count),
                    str,
                    static (outSpan, input) =>
                    {
                        var inSpan = input.AsSpan();
                        for (; outSpan.Length != 0; outSpan = outSpan.Slice(inSpan.Length))
                        {
                            inSpan.CopyTo(outSpan);
                        }
                    }
                )
        };
    }
}

IL size in lambda: 48 → 43

In every project (solution), we have to rewrite such an implementation from memory (even the LINQ one is complex enough for beginners) or search for a NuGet package (is there one?), all for a single method that many other (newer) languages already provide.

FWIW, where Repeat is the log n copies algorithm and Repeat2 is the n copies algorithm:

Runtime=.NET 8.0

| Method              | Mean        | Error      | StdDev     | Gen0   | Allocated |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_1CharX1000   |   323.86 ns |   6.416 ns |   8.114 ns | 0.4835 |    2024 B |
| Repeat2_1CharX1000  | 4,345.55 ns |  80.400 ns |  78.963 ns | 0.4807 |    2024 B |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_1CharX100    |    83.14 ns |   1.540 ns |   1.441 ns | 0.0535 |     224 B |
| Repeat2_1CharX100   |   456.18 ns |   8.921 ns |  10.955 ns | 0.0534 |     224 B |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_1CharX10     |    45.06 ns |   0.955 ns |   1.100 ns | 0.0114 |      48 B |
| Repeat2_1CharX10    |    59.01 ns |   1.238 ns |   1.474 ns | 0.0114 |      48 B |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_10CharX1000  | 1,795.62 ns |  35.437 ns |  71.585 ns | 4.7607 |   20024 B |
| Repeat2_10CharX1000 | 6,158.26 ns | 119.666 ns | 199.936 ns | 4.7607 |   20024 B |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_10CharX100   |   268.02 ns |   5.308 ns |   5.900 ns | 0.4835 |    2024 B |
| Repeat2_10CharX100  |   625.81 ns |  11.624 ns |  10.304 ns | 0.4835 |    2024 B |
|-------------------- |------------:|-----------:|-----------:|-------:|----------:|
| Repeat_10CharX10    |    65.35 ns |   1.366 ns |   1.278 ns | 0.0535 |     224 B |
| Repeat2_10CharX10   |    77.00 ns |   1.570 ns |   1.392 ns | 0.0535 |     224 B |

@mikernet Oh, thank you for the benchmark.
Am I right that Repeat_* is my more complex implementation and Repeat2_* is the more naive one based on @KeterSCP's?
If so, that's great news.

However, I still need to find cases that disadvantage the more complex one and run my own benchmark on them, with a Mann-Whitney test, regardless of the result.


I took into account cases where the input string consists of a single character:

// WTFPLv2; feel free to combine this code into your product without asking permission or any conditions
public static class E
{
    public static string Repeat(this string str, int count) =>
        (count, str.Length) switch
        {
            (< 0, _) => throw new ArgumentOutOfRangeException(nameof(count)),
            (0, _) or (_, 0) => "",
            (1, _) => str,
            (_, 1) => new string(str[0], count),
            _
                => string.Create(
                    checked(str.Length * count),
                    (str, count),
                    static (outSpan, state) =>
                    {
                        var inSpan = state.Item1.AsSpan();
                        inSpan.CopyTo(outSpan);
                        var len = inSpan.Length;
                        var firstSpan = outSpan[..len];
                        var copyFromSpan = firstSpan;
                        var copyToSpan = outSpan[len..];
                        for (
                            var bit =
                                1
                                << (
                                    30
                                    - System.Numerics.BitOperations.LeadingZeroCount(
                                        (uint)state.Item2
                                    )
                                );
                            bit != 0;
                            bit >>= 1
                        )
                        {
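                            // Copy everything written so far (including any +1 copy added in the
                            // previous iteration) so that the number of copies really doubles.
                            copyFromSpan = outSpan[..^copyToSpan.Length];
                            copyFromSpan.CopyTo(copyToSpan);
                            copyToSpan = copyToSpan[copyFromSpan.Length..];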
                            if ((bit & state.Item2) != 0)
                            {
                                firstSpan.CopyTo(copyToSpan);
                                copyToSpan = copyToSpan[len..];
                            }
                        }
                    }
                )
        };
}

I looked into how PowerShell and IronPython implement the * operator.

PowerShell's implementation: https://github.com/PowerShell/PowerShell/blob/74d8bdba443895ea84da1ea6c0d26dac05f08d7b/src/System.Management.Automation/engine/runtime/Operations/StringOps.cs#L53-L62

IronPython's implementation: https://github.com/IronLanguages/ironpython3/blob/master/Src/IronPython/Runtime/Operations/StringOps.cs#L1395-L1399
StringBuilder.Insert: https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs,4fc7620ab6118309,references
ReplaceInPlaceAtChunk https://source.dot.net/#System.Private.CoreLib/src/libraries/System.Private.CoreLib/src/System/Text/StringBuilder.cs,b15916a476938198,references

Unfortunately, both call Span.CopyTo count times, presumably because the operation is used relatively rarely.
I wish .NET had provided a faster implementation that they could call in a single line.

Am I right that Repeat_* is my more complex implementation and Repeat2_* is the more naive one based on @KeterSCP's?

Yup. 1Char was a single character str parameter, 10Char was a 10 character str parameter. X10 was a count of 10, X100 was a count of 100, etc.

@mikernet I'm glad to hear that. Thank you.

Benchmarks of the three implementations above, under the constraint of not modifying the runtime.

  • RepeatWithCounter: @KeterSCP's implementation
  • RepeatNoCounter: my more naive implementation, based on the one above
  • RepeatDoubleBlockSize: my more complex implementation

RepeatWithCounter has no advantage over the other two. RepeatNoCounter is the fastest only when the iteration count is very small.
If we had to choose only one, we would choose RepeatDoubleBlockSize.
I wonder whether we should combine my two implementations by switching on count.
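
One way the combination suggested above could look is sketched below. This is only an illustration, not the implementation from the thread: the name `RepeatHybrid` and the cut-over value `DoublingThreshold = 8` are assumptions (the value is a guess from the benchmark rows where the two approaches are roughly tied around a count of 7), and the doubling branch uses a simpler "copy the written prefix and clamp the last copy" loop rather than the bit-walking version.

```csharp
using System;

public static class HybridRepeat
{
    // Assumed cut-over point; it would need to be tuned with real benchmarks.
    private const int DoublingThreshold = 8;

    public static string RepeatHybrid(this string str, int count)
    {
        ArgumentNullException.ThrowIfNull(str);
        if (count < 0) throw new ArgumentOutOfRangeException(nameof(count));
        if (count == 0 || str.Length == 0) return "";
        if (count == 1) return str;

        return string.Create(checked(str.Length * count), str, static (outSpan, input) =>
        {
            var inSpan = input.AsSpan();

            if (outSpan.Length / inSpan.Length < DoublingThreshold)
            {
                // Few repetitions: copy the input once per repetition.
                for (; outSpan.Length != 0; outSpan = outSpan[inSpan.Length..])
                {
                    inSpan.CopyTo(outSpan);
                }
                return;
            }

            // Many repetitions: copy the already written prefix onto the rest,
            // doubling the written length until the output is full.
            inSpan.CopyTo(outSpan);
            int written = inSpan.Length;
            while (written < outSpan.Length)
            {
                int chunk = Math.Min(written, outSpan.Length - written);
                outSpan[..chunk].CopyTo(outSpan[written..]);
                written += chunk;
            }
        });
    }
}
```

Because the written prefix is always a whole number of repetitions, copying its first `chunk` characters continues the pattern correctly even when the final copy is clamped.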

Iteration counts have the form $2^n - 1$, which forces RepeatDoubleBlockSize to perform the extra single copy for every bit.
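
To see why counts of the form $2^n - 1$ are the worst case for the bit-walking approach, the number of copy operations can be estimated as below. The helper names are hypothetical, and the formula assumes one initial copy, one copy per doubling step, and one extra copy per set bit below the top bit, as in the diagram in the proposal.

```csharp
using System;
using System.Numerics;

public static class CopyCallCount
{
    // Hypothetical helper: estimated number of CopyTo calls for the
    // bit-walking approach, for count >= 1.
    public static int ForDoubling(int count)
    {
        if (count < 1) throw new ArgumentOutOfRangeException(nameof(count));
        uint n = (uint)count;
        int doublings = BitOperations.Log2(n);            // one doubling per bit below the top bit
        int extraCopies = BitOperations.PopCount(n) - 1;  // one "+1" copy per set bit below the top bit
        return 1 + doublings + extraCopies;
    }

    // The straightforward loop copies the input once per repetition.
    public static int ForNaiveLoop(int count) => count;
}
```

Under these assumptions, a count of 1023 needs 19 copy operations with doubling versus 1023 with the plain loop, while 1024 (a single set bit) needs only 11.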


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3593/23H2/2023Update/SunValley3)
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK 8.0.300-preview.24203.14
  [Host]     : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  DefaultJob : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2


| Method | Input | Count | Mean | Error | StdDev | Ratio | MannWhitney(5%) | RatioSD | Gen0 | Gen1 | Gen2 | Allocated | Alloc Ratio |
|---|---|---:|---:|---:|---:|---:|---|---:|---:|---:|---:|---:|---:|
| RepeatWithCounter | 👍 | 3 | 22.93 ns | 0.236 ns | 0.197 ns | 1.56 | Slower | 0.02 | 0.0204 | - | - | 128 B | 3.20 |
| RepeatNoCounter | 👍 | 3 | 14.67 ns | 0.112 ns | 0.100 ns | 1.00 | Base | 0.00 | 0.0063 | - | - | 40 B | 1.00 |
| RepeatDoubleBlockSize | 👍 | 3 | 18.13 ns | 0.186 ns | 0.174 ns | 1.24 | Slower | 0.01 | 0.0063 | - | - | 40 B | 1.00 |
| RepeatWithCounter | 👍 | 7 | 33.49 ns | 0.254 ns | 0.226 ns | 1.30 | Slower | 0.01 | 0.0229 | - | - | 144 B | 2.57 |
| RepeatNoCounter | 👍 | 7 | 25.76 ns | 0.188 ns | 0.167 ns | 1.00 | Base | 0.00 | 0.0089 | - | - | 56 B | 1.00 |
| RepeatDoubleBlockSize | 👍 | 7 | 25.84 ns | 0.153 ns | 0.143 ns | 1.00 | Same | 0.01 | 0.0089 | - | - | 56 B | 1.00 |
| RepeatWithCounter | 👍 | 15 | 55.19 ns | 0.145 ns | 0.129 ns | 1.16 | Slower | 0.01 | 0.0280 | - | - | 176 B | 2.00 |
| RepeatNoCounter | 👍 | 15 | 47.54 ns | 0.257 ns | 0.240 ns | 1.00 | Base | 0.00 | 0.0140 | - | - | 88 B | 1.00 |
| RepeatDoubleBlockSize | 👍 | 15 | 34.24 ns | 0.106 ns | 0.100 ns | 0.72 | Faster | 0.00 | 0.0140 | - | - | 88 B | 1.00 |
| RepeatWithCounter | 👍 | 1023 | 2,842.29 ns | 7.374 ns | 6.158 ns | 1.02 | Same | 0.00 | 0.6676 | - | - | 4208 B | 1.02 |
| RepeatNoCounter | 👍 | 1023 | 2,799.93 ns | 15.562 ns | 13.796 ns | 1.00 | Base | 0.00 | 0.6561 | - | - | 4120 B | 1.00 |
| RepeatDoubleBlockSize | 👍 | 1023 | 236.11 ns | 3.028 ns | 2.684 ns | 0.08 | Faster | 0.00 | 0.6561 | - | - | 4120 B | 1.00 |
| RepeatWithCounter | 👍 | 16383 | 43,518.69 ns | 125.063 ns | 116.984 ns | 1.00 | Same | 0.00 | 10.3760 | - | - | 65648 B | 1.00 |
| RepeatNoCounter | 👍 | 16383 | 43,718.76 ns | 144.522 ns | 135.186 ns | 1.00 | Base | 0.00 | 10.3760 | - | - | 65560 B | 1.00 |
| RepeatDoubleBlockSize | 👍 | 16383 | 2,696.70 ns | 53.694 ns | 52.735 ns | 0.06 | Faster | 0.00 | 10.4141 | - | - | 65560 B | 1.00 |
| RepeatWithCounter | 👨‍(...)🏿👍 [127] | 3 | 55.48 ns | 0.203 ns | 0.180 ns | 1.16 | Slower | 0.01 | 0.1390 | - | - | 872 B | 1.11 |
| RepeatNoCounter | 👨‍(...)🏿👍 [127] | 3 | 47.82 ns | 0.514 ns | 0.455 ns | 1.00 | Base | 0.00 | 0.1249 | - | - | 784 B | 1.00 |
| RepeatDoubleBlockSize | 👨‍(...)🏿👍 [127] | 3 | 51.91 ns | 0.139 ns | 0.124 ns | 1.09 | Slower | 0.01 | 0.1249 | - | - | 784 B | 1.00 |
| RepeatWithCounter | 👨‍(...)🏿👍 [127] | 7 | 111.86 ns | 0.539 ns | 0.478 ns | 1.06 | Slower | 0.00 | 0.3009 | - | - | 1888 B | 1.05 |
| RepeatNoCounter | 👨‍(...)🏿👍 [127] | 7 | 105.56 ns | 0.391 ns | 0.347 ns | 1.00 | Base | 0.00 | 0.2869 | - | - | 1800 B | 1.00 |
| RepeatDoubleBlockSize | 👨‍(...)🏿👍 [127] | 7 | 103.10 ns | 1.126 ns | 1.054 ns | 0.98 | Same | 0.01 | 0.2869 | - | - | 1800 B | 1.00 |
| RepeatWithCounter | 👨‍(...)🏿👍 [127] | 15 | 221.60 ns | 0.708 ns | 0.662 ns | 1.04 | Same | 0.01 | 0.6249 | - | - | 3920 B | 1.02 |
| RepeatNoCounter | 👨‍(...)🏿👍 [127] | 15 | 213.74 ns | 1.812 ns | 1.606 ns | 1.00 | Base | 0.00 | 0.6108 | - | - | 3832 B | 1.00 |
| RepeatDoubleBlockSize | 👨‍(...)🏿👍 [127] | 15 | 193.22 ns | 0.850 ns | 0.754 ns | 0.90 | Faster | 0.01 | 0.6108 | - | - | 3832 B | 1.00 |
| RepeatWithCounter | 👨‍(...)🏿👍 [127] | 1023 | 90,454.42 ns | 1,804.951 ns | 2,346.946 ns | 1.00 | Same | 0.03 | 76.9043 | 76.9043 | 76.9043 | 259978 B | 1.00 |
| RepeatNoCounter | 👨‍(...)🏿👍 [127] | 1023 | 89,836.23 ns | 1,737.684 ns | 1,859.302 ns | 1.00 | Base | 0.00 | 76.9043 | 76.9043 | 76.9043 | 259890 B | 1.00 |
| RepeatDoubleBlockSize | 👨‍(...)🏿👍 [127] | 1023 | 54,875.29 ns | 1,043.264 ns | 1,071.355 ns | 0.61 | Faster | 0.01 | 76.9043 | 76.9043 | 76.9043 | 259890 B | 1.00 |
| RepeatWithCounter | 👨‍(...)🏿👍 [127] | 16383 | 1,410,675.18 ns | 28,087.310 ns | 44,549.450 ns | 1.00 | Same | 0.02 | 998.0469 | 998.0469 | 998.0469 | 4161728 B | 1.00 |
| RepeatNoCounter | 👨‍(...)🏿👍 [127] | 16383 | 1,404,378.14 ns | 28,022.675 ns | 43,627.929 ns | 1.00 | Base | 0.00 | 998.0469 | 998.0469 | 998.0469 | 4161640 B | 1.00 |
| RepeatDoubleBlockSize | 👨‍(...)🏿👍 [127] | 16383 | 810,295.29 ns | 15,833.747 ns | 15,550.857 ns | 0.58 | Faster | 0.02 | 999.0234 | 999.0234 | 999.0234 | 4161640 B | 1.00 |
| RepeatWithCounter | 👨‍(...)‍🧑 [128] | 3 | 56.98 ns | 0.318 ns | 0.266 ns | 1.20 | Slower | 0.01 | 0.1402 | - | - | 880 B | 1.11 |
| RepeatNoCounter | 👨‍(...)‍🧑 [128] | 3 | 47.63 ns | 0.326 ns | 0.289 ns | 1.00 | Base | 0.00 | 0.1262 | - | - | 792 B | 1.00 |
| RepeatDoubleBlockSize | 👨‍(...)‍🧑 [128] | 3 | 52.64 ns | 0.226 ns | 0.189 ns | 1.11 | Slower | 0.01 | 0.1262 | - | - | 792 B | 1.00 |
| RepeatWithCounter | 👨‍(...)‍🧑 [128] | 7 | 115.35 ns | 0.525 ns | 0.466 ns | 1.10 | Slower | 0.01 | 0.3034 | - | - | 1904 B | 1.05 |
| RepeatNoCounter | 👨‍(...)‍🧑 [128] | 7 | 104.63 ns | 0.963 ns | 0.854 ns | 1.00 | Base | 0.00 | 0.2894 | - | - | 1816 B | 1.00 |
| RepeatDoubleBlockSize | 👨‍(...)‍🧑 [128] | 7 | 103.67 ns | 1.521 ns | 1.349 ns | 0.99 | Same | 0.01 | 0.2894 | - | - | 1816 B | 1.00 |
| RepeatWithCounter | 👨‍(...)‍🧑 [128] | 15 | 236.88 ns | 1.065 ns | 0.889 ns | 1.06 | Slower | 0.01 | 0.6294 | - | - | 3952 B | 1.02 |
| RepeatNoCounter | 👨‍(...)‍🧑 [128] | 15 | 222.94 ns | 1.305 ns | 1.090 ns | 1.00 | Base | 0.00 | 0.6156 | - | - | 3864 B | 1.00 |
| RepeatDoubleBlockSize | 👨‍(...)‍🧑 [128] | 15 | 194.35 ns | 0.754 ns | 0.669 ns | 0.87 | Faster | 0.01 | 0.6156 | - | - | 3864 B | 1.00 |
| RepeatWithCounter | 👨‍(...)‍🧑 [128] | 1023 | 90,920.08 ns | 1,794.115 ns | 2,332.856 ns | 1.00 | Same | 0.04 | 76.9043 | 76.9043 | 76.9043 | 262026 B | 1.00 |
| RepeatNoCounter | 👨‍(...)‍🧑 [128] | 1023 | 91,311.83 ns | 1,652.045 ns | 2,369.313 ns | 1.00 | Base | 0.00 | 76.9043 | 76.9043 | 76.9043 | 261938 B | 1.00 |
| RepeatDoubleBlockSize | 👨‍(...)‍🧑 [128] | 1023 | 55,103.75 ns | 1,097.718 ns | 1,174.546 ns | 0.60 | Faster | 0.02 | 76.9043 | 76.9043 | 76.9043 | 261938 B | 1.00 |
| RepeatWithCounter | 👨‍(...)‍🧑 [128] | 16383 | 1,421,301.32 ns | 28,223.169 ns | 44,764.937 ns | 0.99 | Same | 0.04 | 998.0469 | 998.0469 | 998.0469 | 4194496 B | 1.00 |
| RepeatNoCounter | 👨‍(...)‍🧑 [128] | 16383 | 1,429,694.28 ns | 28,501.615 ns | 39,013.333 ns | 1.00 | Base | 0.00 | 998.0469 | 998.0469 | 998.0469 | 4194408 B | 1.00 |
| RepeatDoubleBlockSize | 👨‍(...)‍🧑 [128] | 16383 | 794,637.41 ns | 15,844.339 ns | 17,610.941 ns | 0.56 | Faster | 0.02 | 999.0234 | 999.0234 | 999.0234 | 4194408 B | 1.00 |

My faster one (RepeatDoubleBlockSize) vs LINQ:

The naive LINQ implementation (see "Alternative Designs") can be more than 30x slower than the fastest implementation that does not modify the runtime!
On average, though, LINQ was faster than I expected.


BenchmarkDotNet v0.13.12, Windows 11 (10.0.22631.3593/23H2/2023Update/SunValley3)
Intel Core i7-9700 CPU 3.00GHz, 1 CPU, 8 logical and 8 physical cores
.NET SDK 8.0.300-preview.24203.14
  [Host]     : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2
  Job-DBCPAT : .NET 8.0.5 (8.0.524.21615), X64 RyuJIT AVX2

MaxIterationCount=40  

| Method | Input | Count | Mean | Error | StdDev | Ratio | MannWhitney(5%) | RatioSD |
|---|---|---:|---:|---:|---:|---:|---|---:|
| StringCreateFastest | 👍 | 3 | 18.82 ns | 0.408 ns | 0.382 ns | 1.00 | Base | 0.00 |
| Linq | 👍 | 3 | 36.84 ns | 0.426 ns | 0.356 ns | 1.95 | Slower | 0.04 |
| StringCreateFastest | 👍 | 16383 | 2,710.49 ns | 46.661 ns | 45.828 ns | 1.00 | Base | 0.00 |
| Linq | 👍 | 16383 | 94,059.70 ns | 1,139.643 ns | 1,010.264 ns | 34.76 | Slower | 0.65 |
| StringCreateFastest | 👨‍(...)‍🧑 [128] | 3 | 53.35 ns | 1.088 ns | 1.069 ns | 1.00 | Base | 0.00 |
| Linq | 👨‍(...)‍🧑 [128] | 3 | 123.64 ns | 2.620 ns | 4.590 ns | 2.25 | Slower | 0.10 |
| StringCreateFastest | 👨‍(...)‍🧑 [128] | 16383 | 816,125.05 ns | 8,544.552 ns | 7,992.579 ns | 1.00 | Base | 0.00 |
| Linq | 👨‍(...)‍🧑 [128] | 16383 | 1,285,999.49 ns | 24,760.555 ns | 29,475.679 ns | 1.58 | Slower | 0.04 |

Repo: https://github.com/tats-u/StringRepeatBench

I tried to optimize the following code, but I couldn't make it faster, either statistically or on average.

                            if ((bit & state.Item2) != 0)
                            {
                                firstSpan.CopyTo(copyToSpan);
                                copyToSpan = copyToSpan[len..];
                            }