subspan creation shouldn't check that size is != dynamic_extent, causes code bloat
JosephBialekMsft opened this issue
Currently the subspan code checks that size != dynamic_extent.
This happens in the following two places:
constexpr explicit extent_type(size_type size) : size_(size)
{
Expects(size != dynamic_extent);
}
and
template <class OtherExtentType>
constexpr storage_type(KnownNotNull data, OtherExtentType ext)
: ExtentType(ext), data_(data.p)
{
Expects(ExtentType::size() != dynamic_extent);
}
The result is that the following simple test case has an unneeded cmp/branch.
#include <windows.h>
#include <stdio.h>
#include <gsl/gsl>
#include <stdlib.h>
__declspec(noinline)
void
PrintSpan(
gsl::span<const BYTE> MySpan)
{
for (auto c : MySpan)
{
printf("%i", c);
}
}
__declspec(noinline)
void
DoStuff(
gsl::span<const BYTE> MySpan, ULONG Offset)
{
MySpan = MySpan.subspan(8 + Offset);
PrintSpan(MySpan);
}
int main()
{
size_t bufSize = rand() / 1000;
BYTE* buf = (BYTE*)malloc(bufSize);
gsl::span<const BYTE> MySpan{ buf, bufSize };
DoStuff(MySpan, rand() / 50);
}
asm:
__declspec(noinline)
void
DoStuff(
gsl::span<const BYTE> MySpan, ULONG Offset)
{
00007FF78B9F10D0 sub rsp,38h
MySpan = MySpan.subspan(8 + Offset);
00007FF78B9F10D4 mov rax,qword ptr [rcx] // load size from span
00007FF78B9F10D7 add edx,8 // add 8 to Offset
00007FF78B9F10DA cmp rax,rdx // compare size to the total offset
00007FF78B9F10DD jb DoStuff+43h (07FF78B9F1113h) // jump to fast-fail code if size is too small
00007FF78B9F10DF sub rax,rdx // adjust size
00007FF78B9F10E2 mov qword ptr [rsp+20h],rax // store size to stack temporary span
**00007FF78B9F10E7 cmp rax,0FFFFFFFFFFFFFFFFh // compare size to dynamic_extent
00007FF78B9F10EB je DoStuff+43h (07FF78B9F1113h) // jump to fast-fail code if size == dynamic_extent, an impossible condition**
00007FF78B9F10ED mov rcx,qword ptr [rcx+8] // load pointer from span
00007FF78B9F10F1 add rcx,rdx // adjust pointer
00007FF78B9F10F4 mov qword ptr [rsp+28h],rcx // save pointer
PrintSpan(MySpan);
00007FF78B9F10F9 lea rcx,[rsp+20h]
00007FF78B9F10FE movaps xmm0,xmmword ptr [rsp+20h] // completely useless load/store pair, but this is the optimizer's fault
00007FF78B9F1103 movdqa xmmword ptr [rsp+20h],xmm0
00007FF78B9F1109 call PrintSpan (07FF78B9F1080h)
}
00007FF78B9F110E add rsp,38h
00007FF78B9F1112 ret
MySpan = MySpan.subspan(8 + Offset);
00007FF78B9F1113 call gsl::details::terminate (07FF78B9F1070h)
Note that we already have optimizations in gsl::span such as the KnownNotNull optimization that allows us to omit null pointer checks when creating subspans since we can assume the existing span is already valid. I haven't fully investigated how dynamic_extent is used and where it is actually needed and not needed, but perhaps a similar optimization can be used to eliminate it from cases where it is not needed.
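For illustration, here is a rough sketch of what a KnownNotNull-style tag for sizes might look like. This is not GSL code: kDynamicExtent, KnownValidSize, DynamicSize and drop_front are hypothetical names, and the sketch throws where GSL's Expects would terminate. The idea is that a size derived from an already-valid span (size - offset, with offset <= size) can never equal dynamic_extent, so the derived path can skip the second comparison.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for gsl::dynamic_extent (the maximum size_t value).
constexpr std::size_t kDynamicExtent = static_cast<std::size_t>(-1);

// Hypothetical tag wrapper: the caller promises the size was already validated,
// in the same spirit as KnownNotNull for pointers.
struct KnownValidSize
{
    std::size_t value;
};

// Simplified model of a dynamic extent holder (assumption, not GSL's extent_type).
class DynamicSize
{
public:
    // Checked path: used when the size comes from outside an existing span.
    explicit DynamicSize(std::size_t size) : size_(size)
    {
        if (size == kDynamicExtent) throw std::logic_error("size == dynamic_extent");
    }

    // Unchecked path: used when the size is derived from an already-valid size.
    explicit DynamicSize(KnownValidSize size) noexcept : size_(size.value) {}

    std::size_t size() const noexcept { return size_; }

private:
    std::size_t size_;
};

// Models the size computation of subspan(offset): one bounds check, and no
// comparison against kDynamicExtent, because size - offset <= size.
inline DynamicSize drop_front(const DynamicSize& current, std::size_t offset)
{
    if (offset > current.size()) throw std::out_of_range("offset exceeds size");
    return DynamicSize{KnownValidSize{current.size() - offset}};
}
```

With something along these lines, only the offset check would remain on the subspan path, assuming the compiler inlines the unchecked constructor.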
Hi @JosephBialekMsft
Thanks for noticing this. Something akin to KnownNotNull does seem like a potential solution. I will bring this up at the next maintainers' sync.
Dmitry
Looking at this again, it seems like both checks for Expects(ExtentType::size() != dynamic_extent); in storage_type are always useless. storage_type<ExtentType> is only ever created with ExtentType == extent_type<Extent>, where Extent has type std::size_t and is the extent of the span.
Looking at extent_type<std::size_t Ext>::size():
- if Ext != dynamic_extent, then size() always returns Ext, and therefore size() != dynamic_extent
- if Ext == dynamic_extent, then size() returns extent_type<dynamic_extent>::size_. size_ can only be set via one of two constructors:
  - constexpr explicit extent_type(size_type size), which already does the check in question
  - constexpr explicit extent_type(extent_type<Other> ext) : size_(ext.size()), which simply relies on the other extent's size() method
So there is no way for ExtentType::size() == dynamic_extent.
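The case analysis above can be sketched with a simplified model of extent_type. This is an approximation for illustration, not the actual GSL source: the sketch namespace is an assumption, constexpr is dropped on the dynamic specialization, and the check throws instead of calling Expects so that it can be exercised in a test.

```cpp
#include <cstddef>
#include <stdexcept>

namespace sketch {

// Stand-in for gsl::dynamic_extent.
constexpr std::size_t dynamic_extent = static_cast<std::size_t>(-1);

// Static extent: size() is the compile-time constant Ext. This primary
// template is never instantiated with Ext == dynamic_extent because the
// specialization below takes over, so size() != dynamic_extent always holds.
template <std::size_t Ext>
class extent_type
{
public:
    using size_type = std::size_t;
    constexpr extent_type() noexcept = default;
    constexpr size_type size() const noexcept { return Ext; }
};

// Dynamic extent: size_ is written only by the two constructors below.
template <>
class extent_type<dynamic_extent>
{
public:
    using size_type = std::size_t;

    // Constructor 1: performs the check itself.
    explicit extent_type(size_type size) : size_(size)
    {
        if (size == dynamic_extent) throw std::logic_error("size == dynamic_extent");
    }

    // Constructor 2: copies the size of another extent, whose size()
    // already satisfies the invariant by the same argument.
    template <std::size_t Other>
    explicit extent_type(extent_type<Other> ext) : size_(ext.size())
    {
    }

    size_type size() const noexcept { return size_; }

private:
    size_type size_;
};

// The static case can be confirmed at compile time.
static_assert(extent_type<4>{}.size() != dynamic_extent,
              "a static extent never reports dynamic_extent");

} // namespace sketch
```

Since every path that writes size_ either performs the check or copies from an extent that already satisfies it, a further check in storage_type has nothing left to catch in this model.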
I'll make a PR to remove this check, hopefully I haven't overlooked anything...
The duplicate check in storage_type has now been completely removed. Hopefully this removes the unnecessary branching.