MAIF / melusine

📧 Melusine: Use python to automatize your email processing workflow

Home Page: https://maif.github.io/melusine


Issue with attachment type metadata

Maxime-POULAIN-Verlingue opened this issue

Hey!

We have a problem with the attachment type in the metadata. As the screenshot below shows, only two values remain after applying our metadata pipeline: 0 when the email has an attachment and 1 when it has none. The screenshot is an extract of the DataFrame called df_email.

dfemails_error_metadata

Here is how we create our pipeline and apply it to our emails:

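from sklearn.pipeline import Pipeline
from melusine.prepare_email.metadata_engineering import MetaExtension, MetaDate, MetaAttachmentType, Dummifier

Metadatapipeline = Pipeline([
    ('MetaExtension', MetaExtension()),
    ('MetaDate', MetaDate()),
    ('MetaAttachmentType', MetaAttachmentType()),
    ('Dummifier', Dummifier(columns_to_dummify=['extension', 'attachment_type', 'dayofweek', 'hour', 'min'])),
])
df_meta = Metadatapipeline.fit_transform(df_emails)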

This is the function in melusine/prepare_email/metadata_engineering.py that is supposed to extract the type of the attachment file:
image
image

We added some print statements to understand the problem. As you can see, when the email has at least one attachment, x is a str, and when it has none, x is NaN.
When the function handles an email with an attachment, row["attachment"] is a str rather than a list, for example "['image002.png', 'image003.jpg']". The for loop therefore iterates over the string one character at a time instead of over the file names. This seems to be the cause of our issue.
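
The behaviour described above can be reproduced without Melusine. The following is a minimal sketch in plain Python; the variable names are illustrative and are not taken from the Melusine source.

```python
# Value of row["attachment"] as reported in this issue: a str, not a list.
attachment = "['image002.png', 'image003.jpg']"

# Iterating over a str yields single characters, not file names.
first_items = [item for item in attachment][:4]
print(first_items)  # ['[', "'", 'i', 'm']

# Every item seen by the loop is exactly one character long,
# so none of them can carry a file extension such as ".png".
all_single_chars = all(len(item) == 1 for item in attachment)
print(all_single_chars)  # True
```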

To fix this problem, we made the following change:
image
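
The change itself is only available as a screenshot. As an illustration of one way such a value could be handled (not necessarily what the screenshot contains), the sketch below converts the raw cell value into a list of file names before looping over it. `parse_attachments` is a hypothetical helper, not a Melusine function, and it assumes the string is a Python list literal like the example above.

```python
import ast
import math


def parse_attachments(value):
    """Return a list of attachment file names from a raw cell value.

    Hypothetical helper for illustration; not part of Melusine.
    """
    # Emails without attachments show up as NaN (a float) in the DataFrame.
    if isinstance(value, float) and math.isnan(value):
        return []
    # Already a list: nothing to parse.
    if isinstance(value, list):
        return value
    if isinstance(value, str):
        try:
            # Safely evaluate a stringified list such as "['a.png', 'b.jpg']".
            parsed = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Not a Python literal: treat the string as a single file name.
            return [value]
        if isinstance(parsed, (list, tuple)):
            return list(parsed)
        return [value]
    return []


print(parse_attachments("['image002.png', 'image003.jpg']"))
```

With this kind of normalisation, a loop over the result sees whole file names rather than single characters.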

This seems to solve our issue:
image
image

Python version: 3.8.12

Melusine version: 2.3.1

Operating System: Windows

Hi Maxime,

We are currently refactoring Melusine, so it might be too early to integrate this change.
We will keep your suggestion in mind but put it on hold for the moment.

Best regards

Hey!

I downloaded the latest version of Melusine (2.3.4) and it seems I no longer have this issue with the new version.
I am closing this issue.

Best regards