has2k1 / plotnine

A Grammar of Graphics for Python

Home Page: https://plotnine.org

dynamic labels i.e. labs(fill=lambda x: x['example_col'].iloc[0])

Nephdz opened this issue.

Is there a way to use a lambda function (or something similar) to dynamically set the labels for all parts of the plot to a value taken from the dataframe? I'm thinking of something like this:

(
    df
        .loc[lambda x: x['filter_col']==999]
        .pipe(ggplot)
            + geom_map(aes(fill='bins'))
            + scale_fill_manual(labels=lambda y: [np.log(x) for x in y], values=[to_hex(c) for c in colors])
            # desired syntax: labs() would evaluate each lambda on the plot's data
            + labs(fill=lambda x: f"legend label string: {x['example_col'].iloc[0]}", title=lambda x: '{} | {} | {}'.format(x['A'].iloc[0], x['B'].iloc[0], "C"))
)

I'm no expert in R, but I think ggplot lets you create an entire column of strings like f"legend label string: {x['example_col'].iloc[0]}", call it fill_column, and then do + labs(fill=.$fill_column).

I am not sure what you are trying to do, but the value of every parameter in the labs() call has to be a string. It does not make sense to have anything else. I don't know about R.

I'm trying to replicate method chaining from dplyr/tidyverse. Without any method chaining, it would look something like this:

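import numpy as np
from matplotlib.colors import to_hex
from plotnine import aes, geom_map, ggplot, labs, scale_fill_manual

# `colors` is assumed to be defined elsewhere (a list of matplotlib colors)
df = df.loc[lambda x: x['filter_col']==999]
legend_label = f"legend label string: {df['example_col'].iloc[0]}"
title_label = '{} | {} | {}'.format(df['A'].iloc[0], df['B'].iloc[0], "C")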
(
  ggplot(df)
    + geom_map(aes(fill='bins'))
    + scale_fill_manual(labels=lambda y: [np.log(x) for x in y],values=[to_hex(c) for c in colors])
    + labs(fill=legend_label,title=title_label)
)

but this is much less elegant than the first example. In R, you can access the underlying data that was fed into ggplot using .$. Ideally, I could do something like this:

(
    df
        .loc[lambda x: x['filter_col']==999]
        .assign(
            legend_title=lambda x: f"legend label string: {x['example_col'].iloc[0]}",
            title=lambda x: '{} | {} | {}'.format(x['A'].iloc[0],x['B'].iloc[0],"C")
        )
        .pipe(ggplot)
            + geom_map(aes(fill='bins'))
            + scale_fill_manual(labels=lambda y: [np.log(x) for x in y],values=[to_hex(c) for c in colors])
            + labs(fill='legend_title',title='title')
)

and plotnine would be able to see that 'legend_title' and 'title' are references to columns in the underlying dataframe and use their unique values. If this is impossible, I can probably use seaborn for this kind of customization and plotnine for everything else. Plotnine is definitely preferred; it's a great plotting library.

Okay, I get it. It is not possible.

Using my libraries dppd/dppd_plotnine, you can probably achieve something similar.

It's not going to be a literal R translation, though.

I think it would read like this:

import numpy as np
from matplotlib.colors import to_hex
import dppd
import dppd_plotnine  # needed for the .p9() verb

dp, X = dppd.dppd()

(
    dp(df)
    .filter_by(X['filter_col'] == 999)
    .p9()
    .add_map(x='x', y='y', fill='bins')  # aes = name, non_aes = _name
    # I think labs is not wrapped in dppd_plotnine
    # (it would be trivial to add, though),
    # but putting the name on the scale will work.
    .scale_fill_manual(
        # I'm just assuming that y & colors are from the df;
        # after .p9(), the dataframe is in X.data
        labels=lambda y: [np.log(x) for x in X.data['y']],
        values=[to_hex(c) for c in X.data['colors']],
        name=f"legend label string: {X.data['example_col'].iloc[0]}",
    )
    .title('{} | {} | {}'.format(X.data['A'].iloc[0], X.data['B'].iloc[0], "C"))
).pd

(Also, using a full column of strings just to pick a single one for the legend title is some madness ;) )
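Since the legend title only needs one value, a small helper can pull that value straight from the filtered data instead of building a whole column of identical strings, and it can fail loudly when the subset is not homogeneous (plain .iloc[0] silently takes whatever the first row holds). A minimal pandas sketch; `single_value` is a hypothetical name, not part of plotnine or dppd:

```python
import pandas as pd


def single_value(values: pd.Series) -> str:
    """Hypothetical helper: return the one distinct value of a column as a string.

    Raises ValueError if the column is empty or holds more than one distinct
    value, instead of silently using the first row.
    """
    distinct = values.unique()
    if len(distinct) != 1:
        raise ValueError(
            f"expected exactly one distinct value in {values.name!r}, got {len(distinct)}"
        )
    return str(distinct[0])


# Made-up data: after filtering, every row shares the same example_col value.
data = pd.DataFrame({"example_col": ["foo", "foo", "foo"]})
legend_title = f"legend label string: {single_value(data['example_col'])}"
```

The returned string can then be passed to labs(fill=...) or to a scale's name argument, both of which expect plain strings.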