ofiwg / libfabric

Open Fabric Interfaces

Home Page: http://libfabric.org/

`fi_errno` codes depend on implementation-defined macros

darrylabbate opened this issue · comments

Many of the core error codes defined in fi_errno.h are mapped directly to their system-level counterparts defined in errno.h. However, many of these are only available on Linux (e.g. EREMOTEIO, EHOSTDOWN). This leaves many of the core error codes unusable on other platforms.

Is there an intended workaround here? I don't think it's intended for providers to check the validity of an underlying core error code (e.g. #ifdef EREMOTEIO ...), nor is it desirable to add extra logic for each supported platform (e.g. #ifdef _WIN32 ...).

I don't believe there's a strict requirement for core error codes to map to something defined in errno.h since there are proprietary core error codes >= 256:

enum {
    FI_EOTHER      = FI_ERRNO_OFFSET, /* Unspecified error */
    FI_ETOOSMALL   = 257, /* Provided buffer is too small */
    FI_EOPBADSTATE = 258, /* Operation not permitted in current state */
    FI_EAVAIL      = 259, /* Error available */
    FI_EBADFLAGS   = 260, /* Flags not supported */
    FI_ENOEQ       = 261, /* Missing or unavailable event queue */
    FI_EDOMAIN     = 262, /* Invalid resource domain */
    FI_ENOCQ       = 263, /* Missing or unavailable completion queue */
    FI_ECRC        = 264, /* CRC error */
    FI_ETRUNC      = 265, /* Truncation error */
    FI_ENOKEY      = 266, /* Required key not available */
    FI_ENOAV       = 267, /* Missing or unavailable address vector */
    FI_EOVERRUN    = 268, /* Queue has been overrun */
    FI_ENORX       = 269, /* Receiver not ready, no receive buffers available */
    FI_ENOMR       = 270, /* No more memory registrations available */
    FI_ERRNO_MAX
};

AFAICT, the only real benefit of the current implementation is the ability to reuse error messages from strerror(3). I'm concerned that coupling the core codes to system-level error codes will cause further cross-platform compatibility issues and establish a de facto standard (i.e. that FI_Exxx == Exxx is a safe assumption).
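To make the strerror(3) reuse concrete, here is a minimal sketch of how such a lookup could be split at the 256 boundary. Everything named demo_* or DEMO_* is a hypothetical stand-in and not libfabric's actual implementation; only the message strings are copied from the enum comments above.

```c
#include <string.h>

/* Hypothetical stand-ins for illustration; not libfabric's definitions. */
#define DEMO_ERRNO_OFFSET 256

enum {
    DEMO_EOTHER = DEMO_ERRNO_OFFSET, /* 256 */
    DEMO_ETOOSMALL,                  /* 257 */
    DEMO_EOPBADSTATE,                /* 258 */
    DEMO_ERRNO_MAX
};

/* Messages for the codes that have no errno.h counterpart. */
static const char *const demo_errstr[] = {
    [DEMO_EOTHER - DEMO_ERRNO_OFFSET]      = "Unspecified error",
    [DEMO_ETOOSMALL - DEMO_ERRNO_OFFSET]   = "Provided buffer is too small",
    [DEMO_EOPBADSTATE - DEMO_ERRNO_OFFSET] = "Operation not permitted in current state",
};

const char *demo_strerror(int errnum)
{
    /* Codes below the offset are assumed to be plain errno values. */
    if (errnum >= 0 && errnum < DEMO_ERRNO_OFFSET)
        return strerror(errnum);

    /* Codes at or above the offset come from the local table. */
    if (errnum >= DEMO_ERRNO_OFFSET && errnum < DEMO_ERRNO_MAX)
        return demo_errstr[errnum - DEMO_ERRNO_OFFSET];

    /* Anything else falls back to the generic message. */
    return demo_errstr[DEMO_EOTHER - DEMO_ERRNO_OFFSET];
}
```

Under this scheme the text for any code below 256 is whatever the platform's strerror() returns for that number, which is why a mismatch between an FI_Exxx value and the local Exxx value would surface as a wrong or missing message.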

The design is intentional. It not only allows reusing strerror(), but also avoids unnecessary error code translation when returning lower-level failures as OFI errors.
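As a rough illustration of the "no translation" point, a provider-style helper can hand a system failure straight back to its caller as a negative errno value. The function demo_probe_file below is hypothetical and exists only for this sketch; it also assumes a platform where fopen() sets errno on failure, which POSIX specifies but ISO C does not.

```c
#include <errno.h>
#include <stdio.h>

/*
 * Hypothetical helper for illustration only.
 * Returns 0 on success or a negative errno value on failure.
 */
int demo_probe_file(const char *path)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -errno; /* passed through unchanged, no lookup table needed */

    fclose(f);
    return 0;
}
```

Because the shared codes are numerically identical to the system ones, the caller can compare the result against the FI_Exxx or the Exxx spelling interchangeably, which is exactly the coupling questioned earlier in this thread.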

On platforms where some of the error code definitions are missing, the missing codes should be defined in the corresponding osd.h. See include/windows/osd.h for some examples.

I see. So in the case where EREMOTEIO (121) conflicts with ENOLINK, should it just be defined as the next available "high number"?

/* Visual Studio doesn't have these, so just choose some high numbers */
#ifndef ESOCKTNOSUPPORT
# define ESOCKTNOSUPPORT 240 /* Socket type not supported */
#endif
#ifndef ESHUTDOWN
# define ESHUTDOWN 241 /* Can't send after socket shutdown */
#endif
#ifndef ETOOMANYREFS
# define ETOOMANYREFS 242 /* Too many references: can't splice */
#endif
#ifndef EHOSTDOWN
# define EHOSTDOWN 243 /* Host is down */
#endif
#ifndef EUSERS
# define EUSERS 244 /* Too many users (for UFS) */
#endif
#ifndef EDQUOT
# define EDQUOT 245 /* Disc quota exceeded */
#endif
#ifndef ESTALE
# define ESTALE 246 /* Stale NFS file handle */
#endif
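
One possible way to guard against the kind of collision raised in the question is to pair the fallback definition with compile-time checks. This is only a sketch: the value 247 is an assumption picked to continue the sequence above, not a value taken from libfabric, and demo_remote_io_code is a hypothetical accessor added so the result can be inspected. It requires a C11 compiler for _Static_assert.

```c
#include <errno.h>

/* Hypothetical fallback in the style of the osd.h excerpt above.
 * 247 is an illustrative assumption, not libfabric's actual value. */
#ifndef EREMOTEIO
# define EREMOTEIO 247 /* Remote I/O error */
#endif

/* Fail the build if the chosen value collides with an existing code
 * or strays into the range reserved for codes >= 256. */
_Static_assert(EREMOTEIO != ENOLINK, "EREMOTEIO collides with ENOLINK");
_Static_assert(EREMOTEIO < 256, "EREMOTEIO must stay below the 256 offset");

/* Hypothetical accessor so callers and tests can inspect the value. */
int demo_remote_io_code(void)
{
    return EREMOTEIO;
}
```

On a platform that already defines EREMOTEIO the fallback is skipped and the checks simply confirm that the system value is usable.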