planetarypy / pvl

Python implementation of PVL (Parameter Value Language)

Inserting key/value pairs in PDS label

andywinhold opened this issue

Is there a way to insert desired key/value lines at a specific point in a PDS label?

ex:
from

PDS_VERSION_ID = 
DD_VERSION_ID = 
LABEL_REVISION_NOTE =
^IMAGE

to

PDS_VERSION_ID = 
DD_VERSION_ID = 
LABEL_REVISION_NOTE =
DATA_SET_ID =
PRODUCT_ID = 
INSTRUMENT_HOST_NAME = 
^IMAGE

There's nothing built in; however, you could use label.items() to build a modified copy of the label:

>>> import pvl
>>> label = pvl.loads("""
...     PDS_VERSION_ID = a 
...     DD_VERSION_ID = b
...     LABEL_REVISION_NOTE = c
...     ^IMAGE = d
... """)
>>> items = label.items()
>>> extra = [
...     ('LABEL_REVISION_NOTE', 'more'),
...     ('DATA_SET_ID', 'more'),
...     ('PRODUCT_ID', 'more'),
...     ('INSTRUMENT_HOST_NAME', 'more'),
... ]
>>> new_label = pvl.PVLModule(items[:3] + extra + items[3:])
>>> print(pvl.dumps(new_label))
PDS_VERSION_ID = a
DD_VERSION_ID = b
LABEL_REVISION_NOTE = c
LABEL_REVISION_NOTE = more
DATA_SET_ID = more
PRODUCT_ID = more
INSTRUMENT_HOST_NAME = more
^IMAGE = d
END

The IDL code handled this situation by providing insert_before() and insert_after() functions. So it could be possible to implement something that works like this:

label.insert_after('LABEL_REVISION_NOTE', extra)

There are issues here with disambiguating duplicate keys (if that's technically even allowed) and with addressing nested keys.
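To make the proposal concrete, here is a minimal sketch of what an insert_after() could do, written as a standalone function over a plain list of (key, value) tuples. It is not part of pvl; the function name follows the proposal above, and the choices to match only the first occurrence of a key and to ignore nested keys are assumptions made for illustration.

```python
def insert_after(items, key, extra):
    """Return a new list with the `extra` pairs placed after the first match.

    `items` and `extra` are lists of (key, value) tuples. Only top-level
    keys are searched, and only the first occurrence of `key` is used.
    """
    for position, (name, _value) in enumerate(items):
        if name == key:
            return items[:position + 1] + list(extra) + items[position + 1:]
    raise KeyError(key)


items = [
    ('PDS_VERSION_ID', 'a'),
    ('DD_VERSION_ID', 'b'),
    ('LABEL_REVISION_NOTE', 'c'),
    ('^IMAGE', 'd'),
]
extra = [
    ('DATA_SET_ID', 'more'),
    ('PRODUCT_ID', 'more'),
    ('INSTRUMENT_HOST_NAME', 'more'),
]

new_items = insert_after(items, 'LABEL_REVISION_NOTE', extra)
for name, value in new_items:
    print(name, '=', value)
```

The function returns a new list rather than changing `items` in place, so the result could be handed to a constructor the same way the slicing example above does.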

Edited - I accidentally said items instead of label, changed it to label.

It might be more pythonic to implement some of the list interface (i.e. label.insert(index, object), label.index(object, [start, [end]]) etc.).

Also slicing might be nice.

@wtolson, in the case of label.insert(index, object) what would index be?

@godber for the builtin list, index is the int position to insert the object:

>>> l = [1, 2, 3]
>>> l.insert(1, 42)
>>> print(l)
[1, 42, 2, 3]

While it would be good to at least replicate this, it might also be interesting to overload it for a str index, similar to your insert_after()/insert_before(). The only issue I see is that it may be confusing if there are multiple items with the same key.

@wtolson and @godber I really appreciate the help.

I like the original approach, but if I'm right, pvl.loads() requires that you know the initial string, especially the values a, b, c and d (in this case).
I'm using this in tandem with planetaryimage's PDS3Image(), which populates an initial label for you. Those initial label values vary from product to product.
For example:

IMAGE
    MAXIMUM = ___
    MEAN =  ___
    MEDIAN = ___

So the pvl.loads() string would have to be adjusted for each new product. Or so I understand it.

@pbvarga1 suggested another approach (for this case): instantiating the PDS3Image() class inside my code and making edits to the pvl module created in PDS3Image._create_label().

To handle the ambiguity of multiple items with the same key, we could use an optional argument to indicate before/after the first, second, third, ... or all instances of the string index. The default, I think, would be before/after the first instance. For example,

# Default is after the first instance
label.insert_after('LABEL_REVISION_NOTE', extra)
# After the third (second?) instance
label.insert_after('LABEL_REVISION_NOTE', extra, instance=2)
# After every instance
label.insert_after('LABEL_REVISION_NOTE', extra, every_instance=True)

I'm not sure about the argument names, or whether it would be better to use index counting or cardinal counting. Index counting would be more pythonic, but cardinal would be more... human? Either way, I think this will help quell the ambiguities.
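As a rough sketch of how the `instance` and `every_instance` arguments might behave, the function below works on a plain list of (key, value) tuples rather than on a real pvl label. The argument names come from the proposal above; the zero-based counting (so `instance=0` is the first occurrence) is an assumption, since that question is still open.

```python
def insert_after(items, key, extra, instance=0, every_instance=False):
    """Return a new list with `extra` inserted after matching keys.

    `instance` is zero-based (an assumption): 0 is the first occurrence
    of `key`. If `every_instance` is true, `instance` is ignored and
    `extra` is inserted after every occurrence.
    """
    matches = [i for i, (name, _value) in enumerate(items) if name == key]
    if not matches:
        raise KeyError(key)
    # Indexing raises IndexError if there are fewer occurrences than asked for.
    targets = matches if every_instance else [matches[instance]]
    result = list(items)
    # Work from the end so earlier positions are not shifted by inserts.
    for position in reversed(targets):
        result[position + 1:position + 1] = extra
    return result


items = [
    ('LABEL_REVISION_NOTE', 'c'),
    ('LABEL_REVISION_NOTE', 'more'),
    ('^IMAGE', 'd'),
]
extra = [('DATA_SET_ID', 'x')]

first = insert_after(items, 'LABEL_REVISION_NOTE', extra)
second = insert_after(items, 'LABEL_REVISION_NOTE', extra, instance=1)
every = insert_after(items, 'LABEL_REVISION_NOTE', extra, every_instance=True)
print([name for name, _ in every])
```

Switching to one-based ("human") counting would only change the `matches[instance]` lookup to `matches[instance - 1]`.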

@wtolson On Python 3, label.items() returns an ItemsView, which is not subscriptable:

In [1]: import pvl

In [2]: label = pvl.loads("""
   ...:     foo = bar
   ...:     monty = python
   ...: """)

In [3]: items = label.items()

In [4]: items[0]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-95f461411437> in <module>()
----> 1 items[0]

TypeError: 'ItemsView' object does not support indexing

So either ItemsView will have to change, or there needs to be a different way to access self.__items.
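Until that is resolved, one possible workaround is to copy the view into a list before slicing. The sketch below uses a standard-library OrderedDict as a stand-in for the label, on the assumption that label.items() can still be iterated in label order even though it cannot be indexed. Unlike a label, an OrderedDict cannot hold duplicate keys, so this only illustrates the indexing behaviour.

```python
from collections import OrderedDict

# Stand-in for a parsed label; a real pvl label is assumed to iterate
# over its items in the same order.
label = OrderedDict([('foo', 'bar'), ('monty', 'python')])

view = label.items()
try:
    view[0]
except TypeError as error:
    # Views returned by items() on Python 3 do not support indexing.
    print('TypeError:', error)

# Copying the view into a list restores indexing and slicing.
items = list(label.items())
print(items[0])
combined = items[:1] + [('spam', 'eggs')] + items[1:]
print(combined)
```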

@pbvarga1 TypeError: 'ItemsView' object does not support indexing should probably be a separate ticket.

@andywinhold maybe this whole issue should be on PDS3Image instead. Higher level insert() capability could be added there. Though maybe pvl should at least support insert() by integer.

If I recall correctly, there is some fundamental issue with not being able to edit the label on the PDS3Image object.

@godber I'll look at PDS3Image and see what I can come up with. It's good to know there may already be an issue with editing.

@pbvarga1 did you overlook this suggestion?

#23 (comment)

Or do you disagree with it?

@godber Overlooked. I was thinking about insertion relative to an item's surroundings. I think index insertion is useful if you care about where the item is and not what it is around. I can implement it in #28.

Yeah, I agree that from the user perspective, they are most often thinking in terms of "after label X" or "before label Y". In fact, from the user perspective, it's hard to actually even know the numerical index. In a 400 line PDS label, just figuring out the integer index is a hassle.

Ok, so design-wise, this has me thinking the following things:

  • implementing insert() with the pythonic interface seems like a sensible thing to do
  • users still need a method of doing inserts in a relative manner.

So, assuming insert() is implemented, would we then provide the higher-level functionality by

  • implementing insert_before() and insert_after() methods on top of insert()?
  • adding optional keyword arguments to insert()?
  • adding the higher-level functionality EXTERNAL to pvl, perhaps in PlanetaryImage?

I think there is real need for these relative insert capabilities directly in pvl. @wtolson you have any spare cycles to comment?

Also, @pbvarga1, I suspect you could look at your existing PR and see that with some refactoring you have already (accidentally?) implemented insert() ... I have NOT actually looked at the code.

I like having insert_after and insert_before be their own methods. It will make code that uses these methods easier to read, and the methods easier to remember how to use. Then insert only deals with integer indexes.

However, I could overload insert(key, object) by including boolean keywords like relative=False and before=True. Then if relative is True or key is a str, we use insert_before. Otherwise we assume key is an int and insert in place. The code in the PR could pretty easily be refactored to include insert like this.
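The two designs can be compared side by side in a small sketch. The `PairList` class below is hypothetical and is not the code in the PR; it only shows insert_before() and insert_after() layered on an integer-only insert(), plus an overloaded variant (named `insert_overloaded` here so both can coexist). The sketch drops the `relative` keyword and dispatches on the type of `key` alone, which is a simplifying assumption.

```python
class PairList:
    """Hypothetical ordered container of (key, value) pairs; not part of pvl."""

    def __init__(self, pairs=()):
        self._pairs = list(pairs)

    def keys(self):
        return [name for name, _value in self._pairs]

    def index(self, key):
        """Return the position of the first pair whose key matches."""
        for position, (name, _value) in enumerate(self._pairs):
            if name == key:
                return position
        raise KeyError(key)

    def insert(self, index, pair):
        """Integer-only insert, mirroring list.insert()."""
        self._pairs.insert(index, pair)

    def insert_before(self, key, pair):
        self.insert(self.index(key), pair)

    def insert_after(self, key, pair):
        self.insert(self.index(key) + 1, pair)

    def insert_overloaded(self, key, pair, before=True):
        """Alternative design: one method accepting an int or a str key."""
        if isinstance(key, int):
            self.insert(key, pair)
        elif before:
            self.insert_before(key, pair)
        else:
            self.insert_after(key, pair)


label = PairList([('PDS_VERSION_ID', 'a'), ('^IMAGE', 'd')])
label.insert_after('PDS_VERSION_ID', ('DATA_SET_ID', 'x'))
label.insert_before('^IMAGE', ('PRODUCT_ID', 'y'))
label.insert_overloaded(0, ('FIRST', 'z'))
label.insert_overloaded('^IMAGE', ('LAST', 'w'), before=False)
print(label.keys())
```

With separate methods the call site states its intent in the method name, while the overloaded form keeps the API smaller at the cost of a type check on every call.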