pvl should be able to fail gracefully when a label doesn't parse correctly

Question

godber opened this issue 9 years ago · comments

pvl should be able to fail gracefully when a label doesn't parse correctly. Here is a sample product that fails:

http://pds-imaging.jpl.nasa.gov/data/mer1-m-pancam-5-anaglyph-ops-v1.0/mer1po_0xxx/data/sol0072/rdr/1p134581832rsl0932p2415r2m1.img

For elements that don't parse, it should just carry on and return what it can of the label. I am not quite sure what to do with the failed element though.

The example above fails because it lacks a value altogether (note: the ^M characters are carriage returns as displayed by Vim; they are fine) ...

GROUP                             = DERIVED_IMAGE_PARMS^M
  RADIANCE_OFFSET                 = 0.0 <WATT*M**-2*SR**-1*NM**-1>^M
  RADIANCE_SCALING_FACTOR         = 1.656e-05 <WATT*M**-2*SR**-1*NM**-1>^M
  RADIOMETRIC_CORRECTION_TYPE     = MIPLRAD2^M
  DERIVED_IMAGE_TYPE              = ^M
END_GROUP                         = DERIVED_IMAGE_PARMS^M

Trevor Olson · Answer 1 · Thu Jul 16 2015 11:01:47 GMT+0800 (China Standard Time)

How do you envision this being parsed? The only tool I can get working on my Mac is the Java-based label parser. It seems to just treat it as a syntax error:

$ java -jar parser.jar ~/Downloads/1p134581832rsl0932p2415r2m1.img 
SyntaxValidation
ODL Parser: /Users/wto/Downloads/1p134581832rsl0932p2415r2m1.img:459:1: unexpected token: END_GROUP
ODL Parser: /Users/wto/Downloads/1p134581832rsl0932p2415r2m1.img:461:1: expecting ASSIGNMENT_OPERATOR, found '/* CAMERA_MODEL DATA ELEMENTS */'
ODL Parser: /Users/wto/Downloads/1p134581832rsl0932p2415r2m1.img:485:1: expecting "END_GROUP", found 'END'
ODL Parser: /Users/wto/Downloads/1p134581832rsl0932p2415r2m1.img:486:1: expecting "end", found 'null'
Parse errors:4 warnings:0

Austin Godber · Answer 2 · Thu Jul 16 2015 19:55:08 GMT+0800 (China Standard Time)

Even if the label is bad, I'd still like to see the image, right? As long as the parser can extract the bare minimum necessary to show the image, we should offer a mode where the user gets back the parts of the label that are valid. We should probably inform the user that the label contains an error. Maybe add a keyword like permissive=True ... or, vice versa, strict=True.

Austin Godber · Answer 3 · Thu Jul 16 2015 20:14:53 GMT+0800 (China Standard Time)

I think by default pvl should accept parse errors because most users just want to see the image, MAYBE get physical values for the pixels ... to do that they will need maybe a dozen valid label entries. This means we should offer a strict=True mode where pvl does raise an exception when an error is encountered. In the default case a list of errors can be accumulated in an .errors attribute (ideally).

I do find myself asking the question, "Is there anyone who actually cares if a label is valid?" If the PDS doesn't validate them as part of the archival process, then who actually cares? Just the software engineers who write the code probably :).

Austin Godber · Answer 4 · Thu Jul 16 2015 20:47:15 GMT+0800 (China Standard Time)

Well, it looks like everything starts off with parse_block. It could be a matter of catching errors raised below that point, appending them to the .errors list, and moving on to the next statement. Though it's not yet clear to me how we'd know where the next statement is. Moving on to the next line is a good candidate, though I don't understand how to do that yet either.
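
A rough sketch of that catch-and-continue idea, assuming one statement per line (which sidesteps the "where is the next statement" problem rather than solving it). The names ParsedLabel, parse_statement and parse_statements are made up for illustration and are not pvl's real API:

```python
class ParsedLabel(dict):
    """Parsed entries plus the errors collected while parsing (hypothetical)."""

    def __init__(self):
        super().__init__()
        self.errors = []  # (line_number, message) tuples


def parse_statement(line):
    """Stand-in for a real per-statement parser; raises on bad input."""
    key, sep, value = line.partition("=")
    if not sep or not key.strip() or not value.strip():
        raise ValueError("cannot parse statement: %r" % line)
    return key.strip(), value.strip()


def parse_statements(lines, strict=False):
    """Parse each line; re-raise in strict mode, otherwise record and go on."""
    label = ParsedLabel()
    for number, line in enumerate(lines, start=1):
        try:
            key, value = parse_statement(line)
        except ValueError as exc:
            if strict:
                raise
            label.errors.append((number, str(exc)))
            continue
        label[key] = value
    return label
```

With the default strict=False, a line such as "DERIVED_IMAGE_TYPE = " ends up in .errors and the remaining entries are still returned; with strict=True the first bad line raises.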

Trevor Olson · Answer 5 · Thu Jul 16 2015 21:15:03 GMT+0800 (China Standard Time)

Yeah, my main concern is what the expected parse is, and how to implement it in a way that will not lead to unexpected behavior that is a nightmare to debug. Looking through the PDS toolkits, it seems that a lot of the image viewers simply use regex to extract the values they need instead of parsing the whole label. I'm also curious how the IDL parser treats this label.
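
For reference, the regex approach those viewers take might look roughly like this. It is only a sketch: the keyword list is illustrative, and extract_keywords is a made-up helper, not something from pvl or the PDS toolkits:

```python
import re

# Keywords an image viewer typically needs; this list is illustrative only.
WANTED = ("LINES", "LINE_SAMPLES", "SAMPLE_BITS")


def extract_keywords(label_text, keywords=WANTED):
    """Return {keyword: raw string value} for each keyword found."""
    found = {}
    for keyword in keywords:
        # Match "KEYWORD = value" on a single line, ignoring indentation
        # and a trailing carriage return.
        pattern = (
            r"^[ \t]*%s[ \t]*=[ \t]*(\S[^\r\n]*?)[ \t]*\r?$" % re.escape(keyword)
        )
        match = re.search(pattern, label_text, flags=re.MULTILINE)
        if match:
            found[keyword] = match.group(1)
    return found
```

Because each keyword is looked up independently, a broken statement elsewhere in the label does not affect the values that are found. The cost is that values spanning several lines are not handled.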

My suggestion is that we create a version of the PDSDecoder that treats newlines as statement terminators and simply considers this line to be an assignment of an empty string or None. I would like to do that after I've at least merged the new test suite and some of the encoder fixes into this repo.

Austin Godber · Answer 6 · Thu Jul 16 2015 21:32:46 GMT+0800 (China Standard Time)

I have no issues waiting. The IDL parser is forgiving and accepts all sorts of garbage.

So users would have to choose which PDSDecoder they would use? Who would use the "stricter" version?

And thinking about it, statements can span multiple lines in the cases of long strings and sets.

Trevor Olson · Answer 7 · Thu Jul 16 2015 22:33:03 GMT+0800 (China Standard Time)

Yeah, I was thinking a newline would only be a statement delimiter if it directly follows an assignment operator. This would not preclude long strings, lists, and sets, as long as they begin on the same line as the assignment. I'm thinking this should just be the behaviour of the decoder, as it is how I think most people interpret labels in their head. At some point, I think we should include "strict" parsers for each dialect (pvl, pds3, and cube), but I think we should default to the most permissive as long as it does not create unexpected behavior.

Trevor Olson · Answer 8 · Thu Jul 16 2015 22:34:52 GMT+0800 (China Standard Time)

So basically this would be valid:

foo = bar
baz = 
answer = 42

But this would become invalid:

foo = 
    bar
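
A minimal sketch of that rule, under the assumption that a value may continue onto later lines only when it opens with a quote, parenthesis, or brace on the assignment line. split_statements is a hypothetical helper, not part of pvl:

```python
def split_statements(text):
    """Group lines into statements using the 'newline after =' rule.

    Returns (statements, errors): statements is a list of (key, value) pairs
    with value None for an empty assignment; errors lists bad line numbers.
    """
    closers = {'"': '"', "(": ")", "{": "}"}
    statements, errors = [], []
    open_key = open_value = closer = open_line = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        if closer is not None:
            # Still inside a string/list/set that began on an assignment line.
            open_value += " " + line
            if line.endswith(closer):
                statements.append((open_key, open_value))
                closer = None
            continue
        key, sep, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if not sep or not key:
            errors.append(number)  # e.g. a stray "bar" continuation line
        elif not value:
            statements.append((key, None))  # newline right after "=" ends it
        else:
            end = closers.get(value[0])
            if end and not (len(value) > 1 and value.endswith(end)):
                open_key, open_value, closer, open_line = key, value, end, number
            else:
                statements.append((key, value))
    if closer is not None:
        errors.append(open_line)  # value opened but never closed
    return statements, errors
```

In this sketch the first example yields baz as None with no errors, while in the second example foo becomes None and the dangling bar line is reported as an error.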

Trevor Olson · Answer 9 · Thu Jul 16 2015 22:38:31 GMT+0800 (China Standard Time)

It seems the IDL parser works line by line which is why you see this sort of behaviour.

macgyver603 · Answer 10 · Sat Jul 18 2015 05:53:23 GMT+0800 (China Standard Time)

Austin wanted me to add the file names of products in the get_mission_data that have invalid labels:
2nn043ilf06cyp00p1817l000m1.img
128078.img
134600.img

Trevor Olson · Answer 11 · Sat Jul 18 2015 06:14:20 GMT+0800 (China Standard Time)

What's wrong with 128078.img and 134600.img? They seem to parse fine with pvl for me.

macgyver603 · Answer 12 · Sat Jul 18 2015 06:42:03 GMT+0800 (China Standard Time)

From the stacktrace it looked like a pvl issue, but now I see that the error is actually with PDS3Image.open as shown below.

import planetaryimage

b = planetaryimage.PDS3Image.open('tests/mission_data/128078.img')

Trevor Olson · Answer 13 · Sat Jul 18 2015 06:55:55 GMT+0800 (China Standard Time)

Ah, that makes sense.

Trevor Olson · Answer 14 · Sat Jul 18 2015 06:57:41 GMT+0800 (China Standard Time)

Out of curiosity, what created the malformed labels on 1p134581832rsl0932p2415r2m1.img and 2nn043ilf06cyp00p1817l000m1.img? We might want to submit a bug report there.

Austin Godber · Answer 15 · Sat Jul 18 2015 07:11:48 GMT+0800 (China Standard Time)

JPL created them:

 PRODUCER_INSTITUTION_NAME        = "MULTIMISSION IMAGE PROCESSING SUBSYSTEM
                                     , JET PROPULSION LAB"

Which is why I don't think bad labels should be treated as an edge case of any sort. If JPL creates bad labels and the PDS is willing to publish them in the archive ... then it's going to be a common occurrence.

I am really wondering how many invalid labels are published in the archive. Millions?

I also suspect that these might be lower-tier products that simply aren't validated. Like maybe these are tactical or best-effort products, so the bar for admission is lower. We'd have to look at the SIS for details.

Trevor Olson · Answer 16 · Sat Jul 18 2015 07:18:11 GMT+0800 (China Standard Time)

Silly JPL. Too busy being cool and going to Pluto, I guess.

Austin Godber · Answer 17 · Sat Jul 18 2015 07:26:17 GMT+0800 (China Standard Time)

Inspired by @wtolson's examples ... example 1:

foo = bar
baz = 
answer = 42

example 2:

foo =  bar
baz = 
       clown
answer = 42

Both are invalid and both would return:

{
'foo': 'bar',
'answer': 42
}

But the pvl object returned would include valid = False and a list of errors: errors = [ <line 2> ] for example 1 and errors = [ <line 2>, <line 3> ] for example 2. Maybe the valid attribute is redundant, since a valid label could be represented by an empty errors list.
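
To make the idea concrete, here is a small line-based sketch in which valid is derived from errors instead of being stored separately. LenientLabel and parse_lenient are invented names for illustration, and the int conversion is only there to match the 42 in the examples:

```python
class LenientLabel(dict):
    """Parsed entries plus the line numbers that failed (hypothetical type)."""

    def __init__(self):
        super().__init__()
        self.errors = []

    @property
    def valid(self):
        # Derived, so it can never disagree with the errors list.
        return not self.errors


def _convert(value):
    """Turn integer-looking values into int; leave everything else as str."""
    try:
        return int(value)
    except ValueError:
        return value


def parse_lenient(text):
    """Keep every 'key = value' line; record the number of any other line."""
    label = LenientLabel()
    for number, raw in enumerate(text.splitlines(), start=1):
        if not raw.strip():
            continue
        key, sep, value = raw.partition("=")
        key, value = key.strip(), value.strip()
        if sep and key and value:
            label[key] = _convert(value)
        else:
            label.errors.append(number)
    return label
```

Run on the two examples, this returns the same dictionary for both, with errors of [2] and [2, 3] respectively.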

I still don't understand the code well enough to really answer yet.

Austin Godber · Answer 18 · Thu Dec 31 2015 07:59:44 GMT+0800 (China Standard Time)

One option would be to fall back to a simpler line based parser in the event of an error.
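
Something along these lines, as a sketch only. The two parsers below are tiny stand-ins so the example runs on its own; neither is pvl's real parser, and real code would catch whatever exception type the full parser actually raises:

```python
def load_with_fallback(text, primary, fallback):
    """Try the full parser first; on failure, retry with the simpler one.

    Returns (label, error): error is None when the primary parser succeeded.
    """
    try:
        return primary(text), None
    except ValueError as exc:
        return fallback(text), exc


def full_parser(text):
    """Stand-in for a grammar-based parser: any missing value is fatal."""
    label = {}
    for line in text.splitlines():
        key, _, value = line.partition("=")
        if not value.strip():
            raise ValueError("missing value for %r" % key.strip())
        label[key.strip()] = value.strip()
    return label


def line_parser(text):
    """Stand-in for a simple line-based parser: skip what it cannot read."""
    label = {}
    for line in text.splitlines():
        key, _, value = line.partition("=")
        if key.strip() and value.strip():
            label[key.strip()] = value.strip()
    return label
```

The caller gets the best available label plus the original error, so a well-formed label still goes through the full parser unchanged.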

I've spent enough additional time looking at the code at this point to realize there is no trivial fix ... not even one with unfortunate consequences. Or at least not that I could see. We need some sort of backtracking capability.

Austin Godber · Answer 19 · Tue Feb 16 2016 01:17:14 GMT+0800 (China Standard Time)

CC @jjjakubo