CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning

Pointer-generator behavior with features

kylebgorman opened this issue · comments

Reporting some funky behavior with the pointer-generator with features:

  1. With --arch pointer_generator_lstm and features enabled (i.e., --features_col 3 or something other than the default 0), I get the following report:
Model: pointer-generator
Encoder: LSTM
Decoder: attentive LSTM
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]

  | Name                   | Type                  | Params
-----------------------------------------------------------------
0 | loss_func              | NLLLoss               | 0     
1 | dropout_layer          | Dropout               | 0     
2 | source_encoder         | LSTMEncoder           | 333 K 
3 | decoder                | LSTMAttentiveDecoder  | 283 K 
4 | classifier             | Linear                | 12.0 K
5 | log_softmax            | LogSoftmax            | 0     
6 | generation_probability | GenerationProbability | 601   
-----------------------------------------------------------------

Since it doesn't list a feature encoder in either place, this makes me think it has ignored features even though I have explicitly requested them.

  2. With --arch pointer_generator_lstm --source_encoder lstm --features_encoder lstm (which should be the same thing) we get a crash:
  File "/home/kbg/.miniconda3/lib/python3.10/site-packages/yoyodyne/models/pointer_generator.py", line 370, in forward
    return predictions
UnboundLocalError: local variable 'predictions' referenced before assignment

The reason for this should be clear from the code: the branch that begins at line 345 doesn't define predictions.
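
For illustration, this is the usual Python failure pattern (a hypothetical reduction, not the actual yoyodyne source): a variable assigned in only one branch but returned unconditionally.

# Hypothetical reduction of the bug, not the actual yoyodyne source:
# `predictions` is assigned only in the no-features branch, so the
# features branch falls through to the return and raises UnboundLocalError.
def forward(batch, has_features: bool):
    if not has_features:
        predictions = [0.0] * len(batch)  # Assigned here only.
    else:
        _scores = [0.0] * len(batch)  # Never assigns `predictions`.
    return predictions  # UnboundLocalError when has_features is True.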

  3. --arch pointer_generator_lstm --source_encoder lstm --features_encoder linear works, though after hill climbing for a while both losses go nan (e.g., on our Polish data).

  4. Same story as (3) with --arch pointer_generator_lstm --source_encoder transformer --features_encoder linear.

I am assigning this to @bonham79; I think the fix will be quite small.

commented

I am testing pointer generator for #80 and sneaking the predictions fix in there (unless it turns out to be more complicated). However, I will comment that it is very unintuitive that setting the features column, but no features encoder tells the model to simply ignore features.

We should raise an Exception that a specific features encoder needs to be set, if a features column is requested. Or else, we should set a default features encoder for each architecture.
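
For concreteness, here is a minimal sketch of the suggested check (the function and argument names are hypothetical, not the actual yoyodyne interface):

# Hypothetical sketch of the suggested validation; `features_col` and
# `features_encoder` stand in for the parsed command-line arguments.
def validate_features_config(features_col: int, features_encoder: str | None) -> None:
    # Features were requested but no encoder was given to consume them.
    if features_col != 0 and features_encoder is None:
        raise ValueError(
            f"--features_col {features_col} was requested but no "
            "--features_encoder was specified; this architecture would "
            "silently ignore the features."
        )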

I am testing pointer generator for #80 and sneaking the predictions fix in there (unless it turns out to be more complicated).

+1.

However, I will comment that it is very unintuitive that setting the features column, but no features encoder tells the model to simply ignore features.

Yeah, got a different way to do that?

We should raise an Exception that a specific features encoder needs to be set, if a features column is requested. Or else, we should set a default features encoder for each architecture.

I think that is the current state of affairs: each model handles its own features and has a default behavior, but you can specify a different one and it might work. (I submit it should work for all of them.)

commented

I am testing pointer generator for #80 and sneaking the predictions fix in there (unless it turns out to be more complicated).

+1.

However, I will comment that it is very unintuitive that setting the features column, but no features encoder tells the model to simply ignore features.

Yeah, got a different way to do that?

Just the below suggestion in the original message.

We should raise an Exception that a specific features encoder needs to be set, if a features column is requested. Or else, we should set a default features encoder for each architecture.

I think that is the current state of affairs: each model handles its own features and has a default behavior, but you can specify a different one and it might work. (I submit it should work for all of them.)

I did not look closely, but it seems that this is only the default behavior due essentially to chance: most of our models encode features as extra symbols in the input sequence. For models that do not do this (e.g. pointer generator), the default (AFAIK) is to ignore the features. It should probably raise an exception if a features-col is provided.
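
For reference, "encode features as extra symbols in the input sequence" means roughly the following (a hypothetical sketch; the bracketing convention and helper name are made up):

# Hypothetical sketch: features are appended to the source symbols so that a
# single source encoder sees both.
def concatenate_source_and_features(source: list[str], features: list[str]) -> list[str]:
    # E.g. source ["k", "o", "t"] with features ["NOM", "PL"] becomes
    # ["k", "o", "t", "[NOM]", "[PL]"].
    return source + [f"[{f}]" for f in features]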

commented

Actually while I am here I will remark that:

  3. --arch pointer_generator_lstm --source_encoder lstm --features_encoder linear works, though after hill climbing for a while both losses go nan (e.g., on our Polish data).
  4. Same story as (3) with --arch pointer_generator_lstm --source_encoder transformer --features_encoder linear.

sound concerning and are worth investigating as well...

I did not look closely, but it seems that this is only the default behavior due essentially to chance: most of our models encode features as extra symbols in the input sequence. For models that do not do this (e.g. pointer generator), the default (AFAIK) is to ignore the features. It should probably raise an exception if a features-col is provided.

Okay, so, to summarize (tell me if I've got it wrong). If a non-zero feature column is specified:

  • --arch attentive_lstm just works: it concatenates the features
  • --arch transducer raises an exception because no feature encoder is provided
  • --arch attentive_lstm --source_encoder lstm --features_encoder lstm has separate source and feature encoders (even though it would normally have concatenated)
  • --arch transducer --source_encoder lstm --features_encoder lstm has separate source and feature encoders

and so on?

Hmm, I think I got some wires crossed when updating features. For the moment just adding the flag seems the best call, though in the long term I think I'm just going to make the default a concat operation. (That way the behavior is the same throughout when you pass the flags.)

Sorry, what I meant was: concat is the default feature encoder for other models, and PT and Transducer have a separate default.

Since it's not obvious to me what the defaults should be for the non-concatenating models, we should at least consider just throwing an exception instead. E.g., what should the default feature encoder be here?

yoyodyne-train --features_col 3 --arch pointer_generator_lstm --source_encoder transformer ...

commented

It sounds like we need to test exhaustively to know for sure what the default is in every case. What I did observe, after fixing the indentation error in PointerGeneratorLSTM so that using it with a features encoder works, was:

  1. yoyodyne-train --features_col 3 --arch pointer_generator_lstm completely ignores the features
  2. yoyodyne-train --features_col 3 --arch pointer_generator_lstm --source_encoder lstm ... seems to work

IMO 1) is confusing behavior, and if the features are explicitly requested, we should either raise an Exception if we do not know what to do with them, or encode them by default (presumably with the same encoder as the requested architecture).
yoyodyne-train --features_col 3 --arch attentive_lstm seemed to concatenate the features to the source tokens, which seems like reasonable default behavior.

While we are on this topic, iirc --features_col 0 is the default behavior, and it could potentially also be worth raising a warning if there are unused columns in the data file, if we do not already, but this is not a concern for this issue.

  1. yoyodyne-train --features_col 3 --arch pointer_generator_lstm completely ignores the features

IMO 1) is confusing behavior, and if the features are explicitly requested, we should either raise an Exception if we do not know what to do with them, or encode them by default (presumably the the same encoder of the requested architecture).

Yep. I think I'd prefer an exception.

yoyodyne-train --features_col 3 --arch attentive_lstm seemed to concatenate the features to the source tokens, which seems like reasonable default behavior.

OK.

While we are on this topic, iirc --features_col 0 is the default behavior, and it could potentially also be worth raising a warning if there are unused columns in the data file, if we do not already, but this is not a concern for this issue.

I am lukewarm on this because the implementation sounds weirdly complex to get right, but I see the point.

commented

More notes: I am struggling through this a bit as I try to implement the transformer version of this, and my main bottleneck right now is figuring out how to modify the general controller code for getting the features.

In the train script, we only consider features as separate for specific architectures.
https://github.com/CUNY-CL/yoyodyne/blob/master/yoyodyne/train.py#L107
https://github.com/CUNY-CL/yoyodyne/blob/master/yoyodyne/train.py#L161

I believe that this means no other architecture can ever make use of a separate feature encoder even if it is requested.

Furthermore, it is pretty buried in the code that you need to hardcode architectures at all in order to get a separate features encoding for a new model you are implementing. Digging through the giant stack of errors that are thrown when trying to implement a model with separate features is basically no help, and before coming across this, encoder_cls just seemed to magically not get set, haha.

I think this is all solved if we implement a more elegant solution to inferring what to do with feature configurations from the command line. I will note that even if that is solved, adjusting the code to pass features conditionally through the abstraction in our models is kind of a headache, but I don't know an easy solution to that given our current design pattern.
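
To make the complaint concrete, the hard-coded dispatch being described looks roughly like this (a simplified sketch, not the actual train.py code; the architecture names in the allow-list are just examples):

# Simplified sketch of a hard-coded allow-list: only these architectures ever
# get a separate features encoder, so requesting one for any other
# architecture is silently a no-op.
SEPARATE_FEATURES_ARCHS = frozenset({"pointer_generator_lstm", "transducer"})


def get_features_encoder_cls(arch: str, features_encoder_arch: str | None):
    if arch in SEPARATE_FEATURES_ARCHS and features_encoder_arch is not None:
        return features_encoder_arch  # Placeholder for a real class lookup.
    return None  # All other architectures drop the request.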

More notes: I am struggling through this a bit as I try to implement the transformer version of this, and my main bottleneck right now is figuring out how to modify the general controller code for getting the features.

In the train script, we only consider features as separate for specific architectures. https://github.com/CUNY-CL/yoyodyne/blob/master/yoyodyne/train.py#L107 https://github.com/CUNY-CL/yoyodyne/blob/master/yoyodyne/train.py#L161

I believe that this means no other architecture can ever make use of a separate feature encoder even if it is requested.

Following so far.

I think this is all solved if we implement a more elegant solution to inferring what to do with feature configurations from the command line. I will note that even if that is solved, adjusting the code to pass features conditionally through the abstraction in our models is kind of a headache, but I don't know an easy solution to that given our current design pattern.

Yes, I agree.

I think this is all solved if we implement a more elegant solution to inferring what to do with feature configurations from the command line. I will note that even if that is solved, adjusting the code to pass features conditionally through the abstraction in our models is kind of a headache, but I don't know an easy solution to that given our current design pattern.

I think this works well as an initial stopgap. A later PR will make it so that all architectures can use a features encoder; hard-coding which architectures get feature encoders just limits the number of test cases we need at the moment.

Status update: I think that (1) is still true; (2-4) are no longer crashes but maybe someone will see a nan from them.

Is this still open after #172? @michaelpginn @Adamits

commented

@kylebgorman Yes, #172 just makes pg-transformer work with the predict.py script. I can take this issue over, and try to fix the interface so things are more intuitive. I will try to write a proposal by Friday.

I also do not see anything explaining why you were getting nan losses with pg-lstm. Do you recall if we ever worked on that? I can try to reproduce when I come back to this.

I also do not see anything explaining why you were getting nan losses with pg-lstm. Do you recall if we ever worked on that? I can try to reproduce when I come back to this.

Let's both try to replicate. I was doing --arch pointer_generator_lstm --source_encoder lstm --features_encoder linear on our Polish abstractness data.

I also do not see anything explaining why you were getting nan losses with pg-lstm. Do you recall if we ever worked on that? I can try to reproduce when I come back to this.

Let's both try to replicate. I was doing --arch pointer_generator_lstm --source_encoder lstm --features_encoder linear on our Polish abstractness data.

Was not able to replicate. (I got a checkpoint with accuracy .922.) You?

commented

I ran this:

#!/bin/bash

set -eou pipefail

readonly DATA=../../abstractness/data
readonly ARCH=pointer_generator_lstm
# TODO: Set according to language.
readonly LANGUAGE=pol
readonly MODEL_DIR="../models/${ARCH}"

readonly TRAIN="${DATA}/${LANGUAGE}/${LANGUAGE}_train.tsv"
cat "${DATA}/${LANGUAGE}/${LANGUAGE}_"[0-7]".tsv" > "${TRAIN}"
trap "rm -f ${TRAIN}" EXIT
readonly VAL="${DATA}/${LANGUAGE}/${LANGUAGE}_8.tsv"


yoyodyne-train \
    --experiment "${LANGUAGE}" \
    --train "${TRAIN}" \
    --val "${VAL}" \
    --model_dir "${MODEL_DIR}" \
    --features_col 3 \
    --arch "${ARCH}" \
    --batch_size 400 \
    --hidden_size 512 \
    --embedding_size 256 \
    --source_encoder transformer \
    --features_encoder linear \
    --decoder_layers 2 \
    --encoder_layers 2 \
    --dropout .3 \
    --beta2 .98 \
    --max_epochs 20 \
    --check_val_every_n_epoch 2 \
    --seed 49

and got
val loss: 0.12661617994308472
val accuracy: 0.8870481848716736
train loss: 0.04390707239508629

Yeah, so it sounds like (3-4) are solved.

I think this is all taken care of so I'm going to close. We can reopen if it arises again.

commented

I was thinking the needed fixes to the modules interface were related to this issue. Maybe I can open a new issue specifically about better defaults and raising errors. I probably will not get to that until tomorrow.

I was thinking the needed fixes to the modules interface were related to this issue. Maybe I can open a new issue specifically about better defaults and raising errors. I probably will not get to that until tomorrow.

Are you thinking like, dealing with preconditions that models have? My thought re: that, was that models (not modules) should have a callback interface which is provided with the index/indexes, and can either do nothing or run a test. In the pointer-generator case it should test that the source/features share symbols with the target (and this in turn needs to be independent of whether we are sharing embeddings or not). IDK what else it would do.

commented

Are you thinking like, dealing with preconditions that models have? My thought re: that, was that models (not modules) should have a callback interface which is provided with the index/indexes, and can either do nothing or run a test. In the pointer-generator case it should test that the source/features share symbols with the target (and this in turn needs to be independent of whether we are sharing embeddings or not). IDK what else it would do.

This sounds like a nice way of doing it. I think your general concept that these types of checks should be defined on the model (rather than any of our yoyodyne data modules, or train.py, or modules) is probably the best design pattern here.
I want to run checks on the requested architecture though. E.g., if we have requested features but we a) are not concatenating them to the input string and b) have no features encoder, this should log a warning. Or alternatively, we should default ptr-gen models to have a feature encoder. I think there were some other things I thought of in the past that I need to brainstorm again.

Are you thinking like, dealing with preconditions that models have? My thought re: that, was that models (not modules) should have a callback interface which is provided with the index/indexes, and can either do nothing or run a test. In the pointer-generator case it should test that the source/features share symbols with the target (and this in turn needs to be independent of whether we are sharing embeddings or not). IDK what else it would do.

This sounds like a nice way of doing it. I think your general concept that these types of checks should be defined on the model (rather than any of our yoyodyne data modules, or train.py, or modules) is probably the best design pattern here.

My thought exactly. Stuff in train.py is going to be hard to incorporate if you're using this like a library. And right now, it really is not set up to be used like a library (that's what I'm complaining about in #151 for instance).

The more I think about it, the set of checks is not open-ended, nor do the checks need to be Turing-complete, so the interface might be: each model registers zero or more of a fixed set of checks, and then we just run through that list and call them.

In fact there may be only one, or a very small number, of such checks we need right now, in which case the YAGNI solution is to have each model define a zero-place boolean method check_vocabulary or something (terrible name, please fix) which returns True iff you need to do the vocabulary check described above. And then these methods are called when the superclass initialization is run.
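
A minimal sketch of what that could look like (class and attribute names are hypothetical, not the existing yoyodyne classes):

# Hypothetical sketch of the proposed hook: the base class calls the check
# during initialization; most models opt out, the pointer-generator opts in.
class BaseModel:
    def __init__(self, index):
        if self.check_vocabulary():
            self._validate_vocabulary(index)

    def check_vocabulary(self) -> bool:
        return False  # Default: no vocabulary precondition.

    def _validate_vocabulary(self, index) -> None:
        raise NotImplementedError


class PointerGeneratorModel(BaseModel):
    def check_vocabulary(self) -> bool:
        # Copying requires source/features symbols to exist in the target
        # vocabulary, regardless of whether embeddings are shared.
        return True

    def _validate_vocabulary(self, index) -> None:
        # `index.source_symbols` and `index.target_symbols` are assumed sets.
        missing = index.source_symbols - index.target_symbols
        if missing:
            raise ValueError(
                f"Source symbols missing from target vocabulary: {sorted(missing)}"
            )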

I want to run checks on the requested architecture though. E.g., if we have requested features but we a) are not concatenating them to the input string and b) have no features encoder, this should log a warning. Or alternatively, we should default ptr-gen models to have a feature encoder. I think there were some other things I thought of in the past that I need to brainstorm again.

If the former strategy why not make it an error?