brendonh / pyth

Python text markup and conversion

Import field text from RTFs

b-ad opened this issue

The code currently skips all non-photo fields, including text boxes that contain text, drop-down menus with an item selected, and checkboxes that are checked.

I tried to fit this into the code's current structure by adding something to the handle_field method, but I couldn't figure it out. So instead I wrote a pre-processing function that finds those types of fields and replaces each one with the text that was supposed to be there: the entered text for a text box, the selected item for a drop-down list (or the default, if no selection was recorded), and "Yes" or "No" for a checkbox. When you then run the result through the converter, those values come out as plain text.

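This requires the third-party regex module rather than the standard-library re, because the patterns use recursive subpattern calls such as (?1) to match nested braces, which re does not support.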
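For comparison, balanced groups can also be located without a recursive pattern by counting brace depth. The following is only a minimal sketch using the standard library, not the approach taken in the function in this issue; the name `find_field_groups` is hypothetical. Unlike the regex version, it treats a backslash-escaped `\{` or `\}` as literal text rather than as a group delimiter.

```python
def find_field_groups(rtf):
    """Return every outermost {\\field ...} group in rtf, braces included."""
    groups = []
    pos = 0
    while True:
        start = rtf.find("{\\field", pos)
        if start == -1:
            break
        depth = 0
        i = start
        while i < len(rtf):
            ch = rtf[i]
            if ch == "\\":
                # Skip the backslash and the character it introduces,
                # so escaped braces (\{ and \}) do not change the depth.
                i += 2
                continue
            if ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
                if depth == 0:
                    groups.append(rtf[start:i + 1])
                    break
            i += 1
        else:
            # Ran off the end without closing the group: stop scanning.
            break
        pos = i + 1
    return groups


# Made-up RTF fragment, for illustration only.
sample = r"{\rtf1 before {\field{\*\fldinst{ FORMTEXT }}{\fldrslt {hello}}} after}"
fields = find_field_groups(sample)
```

Fields nested inside another field are returned as part of their enclosing group, not as separate entries.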

I'm not submitting this as a pull request because I'm not sure where you'd want to include this sort of pre-processing. But if you want to do so, here is the function:



import regex

def flattenrtffields(rawrtf):

    #get all "fields" including nested
    fieldsearch=regex.compile(r"{\\field[^{]*?({(?>[^{}]+|(?1))*})({(?>[^{}]+|(?1))*})}")
    m = fieldsearch.finditer(rawrtf)
    if m:

      textboxes,drops,checks=[],[],[]
      checkboxoptions=["No","Yes"]

      #Make lists of the kinds of fields to flatten
      for field in m:
        if "FORMTEXT" in field[0]:
          textboxes.append(field[0])
        elif "FORMDROPDOWN" in field[0]:
          drops.append(field[0])
        elif "FORMCHECKBOX" in field[0]:
          checks.append(field[0])
        else:
          pass

      #deal with textboxes
      for textbox in textboxes:
        try:
          result = regex.search(r"fldrslt ({(?>[^{}]+|(?1))*})}",textbox)[1]
          if result:
            rawrtf=rawrtf.replace(textbox,result)
        except TypeError:
          #no \fldrslt group was found; leave this field unchanged
          pass

      #deal with dropdownlists
      for drop in drops:
        try:
          ddresult = regex.search(r"fftype2.*ffres([0-9]*)",drop)[1]
          if ddresult=="25":
            ddresult=regex.search(r"ffdefres([0-9]*)",drop)[1]
          #collect the list entries and pick the selected one
          ddlist = regex.findall(r"ffl ([^}]*)}",drop)
          rawrtf=rawrtf.replace(drop,"{\\rtlch "+ddlist[int(ddresult)]+"}")
        except (TypeError, ValueError, IndexError):
          pass

      #deal with checkboxes
      for check in checks:
        try:
          result = regex.search(r"fftype1.*ffres([0-9]*)",check)[1]
          if result=="25":
            result=regex.search(r"ffdefres([0-9]*)",check)[1]
          rawrtf=rawrtf.replace(check,"{\\rtlch "+checkboxoptions[int(result)]+"}")
        except (TypeError, ValueError, IndexError):
          pass

    return rawrtf
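
The checkbox rule used in the function (read `\ffres`, and fall back to `\ffdefres` when the result is 25) can be tried in isolation. This is a minimal sketch under the same assumptions as the function; `checkbox_text` is a hypothetical helper and the RTF fragments are made up for illustration. These simple patterns need no recursion, so the standard-library `re` is enough here.

```python
import re

CHECKBOX_TEXT = ["No", "Yes"]


def checkbox_text(field_rtf):
    """Return "Yes" or "No" for a FORMCHECKBOX field, or None if unknown."""
    res = re.search(r"\\ffres(\d+)", field_rtf)
    if res is None:
        return None
    value = int(res.group(1))
    if value == 25:
        # 25 is treated as "no explicit result": use the default instead.
        default = re.search(r"\\ffdefres(\d+)", field_rtf)
        if default is None:
            return None
        value = int(default.group(1))
    return CHECKBOX_TEXT[value] if value in (0, 1) else None


checked = checkbox_text(r"{\*\formfield{\fftype1\ffres1\ffdefres0}}")
fallback = checkbox_text(r"{\*\formfield{\fftype1\ffres25\ffdefres0}}")
```

Returning None for anything unexpected lets the caller leave the field untouched instead of guessing.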