Apkawa / xlsx2html

A simple export from xlsx format to HTML tables that keeps cell formatting

Cells with custom date format not converted correctly or throw error

gorj-tessella opened this issue

When I attempted to process a sheet containing a date in the custom format "d-mmm-yyyy", it failed with "ValueError: Invalid length for field 'mmm'". The error was raised by babel.

On closer inspection, I found that pyxl was using the cell's number_format property, which was "d-mmm-yyyy". Running the date conversion function changed nothing, whereas Babel needs the format converted to "d-MMM-yyyy". normalize_date_format does not do this; it only makes the following replacements:

DATE_REPLACES = {
'DD': 'dd',
'YYYY': 'yyyy',
'YY': 'yy',
}
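
To illustrate why the custom format passes through untouched, here is a minimal sketch. It assumes the table is applied as plain substring replacement; apply_date_replaces is a hypothetical stand-in, and the real normalize_date_format in dt.py may work differently.

```python
# Replacement table quoted in the issue.
DATE_REPLACES = {
    'DD': 'dd',
    'YYYY': 'yyyy',
    'YY': 'yy',
}


def apply_date_replaces(fmt):
    """Hypothetical stand-in for normalize_date_format: plain substring replacement."""
    for old, new in DATE_REPLACES.items():
        fmt = fmt.replace(old, new)
    return fmt


print(apply_date_replaces("DD/MM/YYYY"))  # upper-case tokens are rewritten
print(apply_date_replaces("d-mmm-yyyy"))  # no key matches, so the string is unchanged
```

Every key in the table is upper case, so a lowercase custom format such as "d-mmm-yyyy" never matches and reaches Babel with "mmm" intact.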

The root issues are as follows:

  • dt.py expects date tokens to be written in upper case. That may be true for built-in formats, but it isn't for custom formats. Here, "mmm" needed to be converted to "MMM".
  • Custom date/time formats in Excel are frustratingly ambiguous. Specifically, both month and minute are written with a lowercase "m". "m" is only interpreted as minute if it follows "hh" or precedes "ss".
  • openpyxl doesn't provide a way to get a cell's value as formatted, though it really should.
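
The month/minute rule above can be sketched as a small converter from an Excel format to the upper-case month pattern Babel expects. This is only an illustration under stated assumptions: excel_to_ldml is a hypothetical helper, not part of xlsx2html or openpyxl. It only looks at the directly adjacent token, and it ignores quoted literals, escapes and a lone "y".

```python
import re

# Date/time tokens; everything between them is copied through as literal text.
TOKEN_RE = re.compile(r"AM/PM|A/P|y+|m+|d+|h+|s+", re.IGNORECASE)


def excel_to_ldml(fmt):
    """Hypothetical helper: rewrite an Excel date format as an LDML-style pattern."""
    tokens = list(TOKEN_RE.finditer(fmt))
    kinds = [t.group(0).lower() for t in tokens]
    has_ampm = any(k in ("am/pm", "a/p") for k in kinds)
    out, pos = [], 0
    for i, t in enumerate(tokens):
        tok = kinds[i]
        out.append(fmt[pos:t.start()])
        pos = t.end()
        if tok[0] == "m":
            # "m"/"mm" means minutes only right after hours or right before seconds.
            after_h = i > 0 and kinds[i - 1][0] == "h"
            before_s = i + 1 < len(kinds) and kinds[i + 1][0] == "s"
            if len(tok) <= 2 and (after_h or before_s):
                out.append(tok)          # minutes stay lower case
            else:
                out.append(tok.upper())  # months become upper case
        elif tok[0] == "h":
            # Without an AM/PM marker Excel hours are 24-hour, which is "H" in LDML.
            out.append(tok if has_ampm else tok.upper())
        elif tok in ("am/pm", "a/p"):
            out.append("a")
        elif tok[0] == "d" and len(tok) >= 3:
            # Three or more d's are weekday names: "EEE" (short) or "EEEE" (full).
            out.append("EEE" if len(tok) == 3 else "EEEE")
        else:
            out.append(tok)              # y, d, dd and s keep their lower-case form
    out.append(fmt[pos:])
    return "".join(out)


print(excel_to_ldml("d-mmm-yyyy"))
print(excel_to_ldml("mm/dd/yyyy hh:mm AM/PM"))
```

With this sketch, "d-mmm-yyyy" becomes "d-MMM-yyyy", while the "mm" that follows "hh" in the second format stays lower case as minutes.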

Reference:

The following code should be sufficient to format an openpyxl datetime value using an Excel custom format, based on Excel's definitions.

import re

RE_DATE_TOK = re.compile(r"y+|m+|d+|h+|s+|\.0+|AM/PM|A/P")
WEEK_DAYS = [
    'Monday',
    'Tuesday',
    'Wednesday',
    'Thursday',
    'Friday',
    'Saturday',
    'Sunday'
]
YEAR_MONTHS = [
    'January',
    'February',
    'March',
    'April',
    'May',
    'June',
    'July',
    'August',
    'September',
    'October',
    'November',
    'December'
]


def excel_format_custom_datetime(fmt, value):
    """Format a datetime with an Excel custom format string (US English names only)."""
    def zpad(v, tok):
        v = str(v)
        i = len(tok) - len(v)
        if i > 0:
            v = ('0' * i) + v
        return v

    has_ap = False
    is_minute = set()
    must_minute = False
    ms = list(RE_DATE_TOK.finditer(fmt))
    for i, m in enumerate(ms):
        tok = m.group(0)
        if tok in ['AM/PM', 'A/P']:
            has_ap = True
        elif tok[0] == 'h':
            # First m after h is always minute
            must_minute = True
        elif must_minute and tok in ['m', 'mm']:
            is_minute.add(i)
            must_minute = False
        elif tok[0] == 's':
            last_i = i - 1
            if last_i < 0:
                must_minute = True
            elif last_i not in is_minute:
                if ms[last_i].group(0) in ['m', 'mm']:
                    # m right before s is always minute
                    is_minute.add(last_i)
                elif not len(is_minute):
                    # if no previous m, first m after s is always minute
                    must_minute = True

    parts = []
    pos = 0
    for i, m in enumerate(ms):
        tok = m.group(0)
        start, end = m.span(0)
        parts.append(fmt[pos:start])
        if tok[0] == 'h':
            tok = tok[:2]
            v = value.hour
            if has_ap:
                v = v % 12
                if v == 0:
                    v = 12
            tok = zpad(v, tok)
        elif tok[0] == 'm':
            if len(tok) > 5:
                tok = tok[:4]  # Defaults to MMMM
            if tok == 'mmm':
                tok = YEAR_MONTHS[value.month - 1][:3]
            elif tok == 'mmmm':
                tok = YEAR_MONTHS[value.month - 1]
            elif tok == 'mmmmm':
                tok = YEAR_MONTHS[value.month - 1][0]
            elif i in is_minute:
                tok = zpad(value.minute, tok)
            else:
                tok = zpad(value.month, tok)
        elif tok[0] == 's':
            tok = tok[:2]
            tok = zpad(value.second, tok)
        elif tok[:2] == '.0':
            digits = len(tok) - 1
            v = value.microsecond / 1000000.0
            # Render the fraction with `digits` decimals, then drop the leading "0"
            v = ("{:." + str(digits) + "f}").format(v)[1:]
            tok = v
        elif tok == 'AM/PM':
            tok = 'AM' if (value.hour < 12) else 'PM'
        elif tok == 'A/P':
            tok = 'A' if (value.hour < 12) else 'P'
        elif tok[0] == 'y':
            if len(tok) <= 2:
                tok = str(value.year)[-2:]
            else:
                tok = str(value.year)
        elif tok[0] == 'd':
            if len(tok) <= 2:
                tok = zpad(value.day, tok)
            elif tok == 'ddd':
                tok = WEEK_DAYS[value.weekday()][:3]
            else:
                tok = WEEK_DAYS[value.weekday()]
        else:
            raise ValueError(f'Unhandled datetime token {tok}')
        parts.append(tok)
        pos = end
    parts.append(fmt[pos:])

    return ''.join(parts)

Great job! I will review this and add your solution as soon as I have time.
The current implementation is incomplete and only covers my own use cases.