rubik

A set of tools for data wrangling and data processing, designed to be used with the Python library Pandas. These tools let the user give a Pandas DataFrame any kind of complex structure, arranging columns and rows as if they were part of a Rubik's cube.

Share rubik with all your panda friends!

List of Content

Test and use rubik

Install rubik

To install rubik from the terminal, first create a virtual environment venv, then use the pip install command:

python3 -m venv venv
source venv/bin/activate

pip install git+https://github.com/josemariasosa/rubik

Check version

To make sure the installation was correct, check Rubik's version using the following command from the terminal:

python -c 'import rubik; print(rubik.__version__)'
# 2.1.0

Use rubik in your scripts

Import the module using the rk alias for rubik.

import rubik as rk
import pandas as pd

Available Functions:

This is a list of the functions, each with a very simple example of its use.

0. The header

import pandas as pd
from operator import itemgetter
pd.set_option('display.min_rows', 30)
pd.set_option('display.max_rows', 60)
pd.set_option('display.max_columns', 20)

Since Pandas version 0.23, the user must explicitly specify the number of columns printed to the standard output. In the header above, the set_option method sets that number to 20 right after the Pandas library is loaded.

1. The `rk.fillna_list` Function

rk.fillna_list(data_frame, column_name)

From any column in a DataFrame, replace the NaN values with empty lists.

1.1 Arguments:

data_frame  - The DataFrame we are going to work with.

column_name - A String with the column name we are going to modify.

1.2 Example:

The original table is:

Entry	Id	Roles
0	user-123	NaN
1	user-452	[1]
2	user-21	[5, 2]
3	user-621	NaN
4	user-5512	[3, 4]
5	user-25	[1, 2, 3]

The new table is:

Entry	Id	Roles
0	user-123	[ ]
1	user-452	[1]
2	user-21	[5, 2]
3	user-621	[ ]
4	user-5512	[3, 4]
5	user-25	[1, 2, 3]

The code is:

new = rk.fillna_list(original, 'Roles')
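
For readers who want to see the idea without installing rubik, here is a minimal sketch of the same behavior in plain Pandas. The helper name fillna_list_sketch is hypothetical and this is not rubik's actual implementation; it only reproduces the tables above on a shorter frame.

```python
import math

import pandas as pd

# Small frame in the spirit of the example above (NaN marks a missing list).
original = pd.DataFrame({
    "Id": ["user-123", "user-452", "user-21"],
    "Roles": [float("nan"), [1], [5, 2]],
})


def fillna_list_sketch(data_frame, column_name):
    """Hypothetical helper: return a copy with NaN replaced by empty lists."""
    result = data_frame.copy()
    result[column_name] = result[column_name].apply(
        lambda value: [] if isinstance(value, float) and math.isnan(value) else value
    )
    return result


new = fillna_list_sketch(original, "Roles")
print(new["Roles"].tolist())
# [[], [1], [5, 2]]
```

A plain fillna call is not used here because Pandas does not accept a list as the fill value, which is why an element-wise apply is the usual workaround.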

2. The `rk.concat_to_list` Function

rk.concat_to_list(data_frame, column_list, column_new_name)

Concatenate multiple columns of a data frame into a single list.

2.1 Arguments:

data_frame      - The DataFrame we are going to work with.

column_list     - A List with the column names we are going to work with.

column_new_name - A String with the column name we are going to create.

2.2 Example:

The original table is:

Entry	Id	Role 1	Role 2
0	user-123	1	2
1	user-452	1	3
2	user-21	5	2
3	user-621	3	1
4	user-5512	3	4
5	user-25	1	3

The new table is:

Entry	Id	Roles
0	user-123	[1, 2]
1	user-452	[1, 3]
2	user-21	[5, 2]
3	user-621	[3, 1]
4	user-5512	[3, 4]
5	user-25	[1, 3]

The code is:

new = rk.concat_to_list(original, ['Role 1', 'Role 2'], 'Roles')
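
The row-wise packing can be sketched in plain Pandas as follows. The helper concat_to_list_sketch is a hypothetical name, not part of rubik, and dropping the source columns is an assumption taken from the new table above.

```python
import pandas as pd

original = pd.DataFrame({
    "Id": ["user-123", "user-452", "user-21"],
    "Role 1": [1, 1, 5],
    "Role 2": [2, 3, 2],
})


def concat_to_list_sketch(data_frame, column_list, column_new_name):
    """Hypothetical helper: pack the values of column_list into one list per row."""
    # Assumption: the source columns are dropped, as in the example table.
    result = data_frame.drop(columns=column_list)
    # .tolist() on the 2-D array yields one Python list per row.
    result[column_new_name] = pd.Series(
        data_frame[column_list].to_numpy().tolist(), index=data_frame.index
    )
    return result


new = concat_to_list_sketch(original, ["Role 1", "Role 2"], "Roles")
print(new["Roles"].tolist())
# [[1, 2], [1, 3], [5, 2]]
```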

3. The `rk.ungroup_list` Function

rk.ungroup_list(data_frame, column_name)

This function unnests a 'Series of Lists' in a Pandas data frame.

⚡️ Note that the number of rows for the result may increase.

3.1 Arguments:

data_frame  - The DataFrame we are going to work with.

column_name - A String with the column name we are going to modify.

3.2 Example:

The original table is:

Entry	Id	Roles
0	user-123	[1, 2]
1	user-452	[5, 7]
2	user-21	[3]

The new table is:

Entry	Id	Roles
0	user-123	1
0	user-123	2
1	user-452	5
1	user-452	7
2	user-21	3

The code is:

new = rk.ungroup_list(original, 'Roles')
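
For comparison, recent Pandas versions (0.25 and later) ship a built-in method with a similar effect, DataFrame.explode. The snippet below is only an illustration of the unnesting idea with plain Pandas; whether rubik uses explode internally is not stated here.

```python
import pandas as pd

original = pd.DataFrame({
    "Id": ["user-123", "user-452", "user-21"],
    "Roles": [[1, 2], [5, 7], [3]],
})

# explode emits one row per list element and repeats the original
# index label, like the Entry column in the tables above.
new = original.explode("Roles")

print(new.index.tolist())
# [0, 0, 1, 1, 2]
print(new["Roles"].tolist())
# [1, 2, 5, 7, 3]
```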

4. The `rk.ungroup_dict` Function

rk.ungroup_dict(data_frame, column_name, prefix=False)

This function flattens a data frame with dictionaries in a column.

4.1 Arguments:

data_frame  - The DataFrame we are going to work with.

column_name - A String with the column name we are going to modify.

prefix      - Use the prefix argument as follows:
    - False: default, regular behavior, column names are the dict keys.
    - True: Use as prefix the original column name followed by an underscore.
    - String: The user can give any prefix.

4.2 Example:

The original table is:

Entry	Id	Roles
0	user-123	{"main": 1, "secondary": 2}
1	user-452	{"main": 3, "secondary": 1}
2	user-21	{"main": 7}
3	user-621	{"main": 2, "secondary": 6}
4	user-5512	{"main": 7, "secondary": 5}
5	user-25	{"main": 3}

The new table is:

Entry	Id	main	secondary
0	user-123	1	2
1	user-452	3	1
2	user-21	7	NaN
3	user-621	2	6
4	user-5512	7	5
5	user-25	3	NaN

The code is:

new = rk.ungroup_dict(original, 'Roles')
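
To make the three prefix modes concrete, here is a rough sketch in plain Pandas. The helper ungroup_dict_sketch is hypothetical and not rubik's code. Two assumptions are made: a String prefix is prepended verbatim (no underscore is added), and every row holds a dictionary (NaN rows are not handled).

```python
import pandas as pd

original = pd.DataFrame({
    "Id": ["user-123", "user-452", "user-21"],
    "Roles": [{"main": 1, "secondary": 2}, {"main": 3, "secondary": 1}, {"main": 7}],
})


def ungroup_dict_sketch(data_frame, column_name, prefix=False):
    """Hypothetical helper: one new column per dictionary key."""
    # Missing keys become NaN, as in the new table above.
    expanded = pd.DataFrame(data_frame[column_name].tolist(), index=data_frame.index)
    if prefix is True:
        expanded = expanded.add_prefix(f"{column_name}_")
    elif prefix:
        # Assumption: a String prefix is used exactly as given.
        expanded = expanded.add_prefix(str(prefix))
    return data_frame.drop(columns=column_name).join(expanded)


plain = ungroup_dict_sketch(original, "Roles")
named = ungroup_dict_sketch(original, "Roles", prefix=True)
custom = ungroup_dict_sketch(original, "Roles", prefix="r.")

print(list(plain.columns))   # ['Id', 'main', 'secondary']
print(list(named.columns))   # ['Id', 'Roles_main', 'Roles_secondary']
print(list(custom.columns))  # ['Id', 'r.main', 'r.secondary']
```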

5. The `rk.list_to_columns` Function

rk.list_to_columns(data_frame, column_name)

This function creates multiple columns from a single column with lists.

5.1 Arguments:

data_frame  - The DataFrame we are going to work with.

column_name - A String with the column name we are going to modify.

5.2 Example:

The original table is:

Entry	Id	Roles
0	user-123	[1, 2]
1	user-452	[1, 3]
2	user-21	[5, 2]
3	user-621	[3, 1]
4	user-5512	[3, 4]
5	user-25	[1, 3]

The new table is:

Entry	Id	Roles_1	Roles_2
0	user-123	1	2
1	user-452	1	3
2	user-21	5	2
3	user-621	3	1
4	user-5512	3	4
5	user-25	1	3

The code is:

new = rk.list_to_columns(original, 'Roles')
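
A possible plain-Pandas equivalent is sketched below. The helper list_to_columns_sketch is a hypothetical name; the 1-based suffixes (Roles_1, Roles_2) are copied from the new table above, and the rest is an assumption rather than rubik's implementation.

```python
import pandas as pd

original = pd.DataFrame({
    "Id": ["user-123", "user-452", "user-21"],
    "Roles": [[1, 2], [1, 3], [5, 2]],
})


def list_to_columns_sketch(data_frame, column_name):
    """Hypothetical helper: spread list positions into numbered columns."""
    expanded = pd.DataFrame(data_frame[column_name].tolist(), index=data_frame.index)
    # Name the columns <column_name>_1, <column_name>_2, ... as in the example.
    expanded.columns = [f"{column_name}_{i + 1}" for i in range(expanded.shape[1])]
    return data_frame.drop(columns=column_name).join(expanded)


new = list_to_columns_sketch(original, "Roles")
print(list(new.columns))
# ['Id', 'Roles_1', 'Roles_2']
```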

6. The `rk.groupto_list` Function

rk.groupto_list(data_frame, column_list, column_name)

Group a variable (column_name) into a single list with respect to a group of variables (column_list).

6.1 Arguments:

data_frame  - The DataFrame we are going to work with.

column_list - A List with the column names we are going to work with as pivot columns.

column_name - A String with the column name we are going to modify.

6.2 Example:

The original table is:

Entry	Id	Roles
0	user-123	1
0	user-123	2
1	user-452	5
1	user-452	7
2	user-21	3

The new table is:

Entry	Id	Roles
0	user-123	[1, 2]
1	user-452	[5, 7]
2	user-21	[3]

The code is:

new = rk.groupto_list(original, ['Entry', 'Id'], 'Roles')
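
The same grouping can be expressed directly with a Pandas groupby. This is only a sketch of the idea, not rubik's implementation; sort=False is an assumption made here so that groups keep their order of first appearance.

```python
import pandas as pd

original = pd.DataFrame({
    "Entry": [0, 0, 1, 1, 2],
    "Id": ["user-123", "user-123", "user-452", "user-452", "user-21"],
    "Roles": [1, 2, 5, 7, 3],
})

# Collect the Roles of each (Entry, Id) group into a Python list.
new = (
    original.groupby(["Entry", "Id"], sort=False)["Roles"]
    .apply(list)
    .reset_index()
)

print(new["Roles"].tolist())
# [[1, 2], [5, 7], [3]]
```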

7. The `rk.groupto_tuple` Function

rk.groupto_tuple(data_frame, column_list, column_name)

Group a variable (column_name) into a tuple with respect to a group of variables (column_list).

7.1 Arguments:

data_frame  - The DataFrame we are going to work with.

column_list - A List with the column names we are going to work with as pivot columns.

column_name - A String with the column name we are going to modify.

7.2 Example:

The original table is:

Entry	Id	Roles
0	user-123	1
0	user-123	2
1	user-452	5
1	user-452	7
2	user-21	3

The new table is:

Entry	Id	Roles
0	user-123	(1, 2)
1	user-452	(5, 7)
2	user-21	(3, )

The code is:

new = rk.groupto_tuple(original, ['Entry', 'Id'], 'Roles')

8. The `rk.groupto_sorted_tuple` Function

rk.groupto_sorted_tuple(data_frame, column_list, column_name, n=0)

Group a variable (column_name) into a single sorted tuple with respect to a group of variables (column_list). When the grouped values are themselves tuples, they are sorted by the element at index n (the first element by default).

8.1 Arguments:

data_frame  - The DataFrame we are going to work with.

column_list - A List with the column names we are going to work with.

column_name - A String with the column name we are going to modify.

n           - An integer with the index of the sorting value (default n = 0).

8.2 Example:

The original table is:

Entry	Id	Roles
0	user-123	2
0	user-123	1
1	user-452	7
1	user-452	5
2	user-21	3

The new table is:

Entry	Id	Roles
0	user-123	(1, 2)
1	user-452	(5, 7)
2	user-21	(3, )

The code is:

new = rk.groupto_sorted_tuple(original, ['Entry', 'Id'], 'Roles')
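
For the scalar case shown in the tables above, the behavior can be approximated with a groupby followed by sorted. This is a sketch under assumptions: it does not model the n argument, and it is not taken from rubik's source.

```python
import pandas as pd

original = pd.DataFrame({
    "Entry": [0, 0, 1, 1, 2],
    "Id": ["user-123", "user-123", "user-452", "user-452", "user-21"],
    "Roles": [2, 1, 7, 5, 3],
})

# Sort the values of each group, then freeze them into a tuple.
new = (
    original.groupby(["Entry", "Id"], sort=False)["Roles"]
    .apply(lambda values: tuple(sorted(values)))
    .reset_index()
)

print(new["Roles"].tolist())
# [(1, 2), (5, 7), (3,)]
```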

9. The `rk.groupto_dict` Function

rk.groupto_dict(data_frame, column_list, column_new_name)

Generate a new column with dictionaries holding the values of other columns.

9.1 Arguments:

data_frame      - The DataFrame we are going to work with.

column_list     - A List with the column names we are going to work with.

column_new_name - A String with the column name we are going to create.

9.2 Example:

The original table is:

Entry	Id	main	secondary
0	user-123	1	2
1	user-452	3	1
2	user-21	7	3
3	user-621	2	6
4	user-5512	7	5
5	user-25	3	3

The new table is:

Entry	Id	Roles
0	user-123	{"main": 1, "secondary": 2}
1	user-452	{"main": 3, "secondary": 1}
2	user-21	{"main": 7, "secondary": 3}
3	user-621	{"main": 2, "secondary": 6}
4	user-5512	{"main": 7, "secondary": 5}
5	user-25	{"main": 3, "secondary": 3}

The code is:

new = rk.groupto_dict(original, ['main', 'secondary'], 'Roles')
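
One way to picture this operation in plain Pandas is DataFrame.to_dict with orient="records", which builds one dictionary per row. The helper groupto_dict_sketch below is hypothetical, and dropping the source columns is an assumption based on the new table above.

```python
import pandas as pd

original = pd.DataFrame({
    "Id": ["user-123", "user-452", "user-21"],
    "main": [1, 3, 7],
    "secondary": [2, 1, 3],
})


def groupto_dict_sketch(data_frame, column_list, column_new_name):
    """Hypothetical helper: one {column: value} dictionary per row."""
    result = data_frame.drop(columns=column_list)
    records = data_frame[column_list].to_dict(orient="records")
    result[column_new_name] = pd.Series(records, index=data_frame.index)
    return result


new = groupto_dict_sketch(original, ["main", "secondary"], "Roles")
print(new["Roles"].tolist())
# [{'main': 1, 'secondary': 2}, {'main': 3, 'secondary': 1}, {'main': 7, 'secondary': 3}]
```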

10. The `rk.groupto_set` Function

rk.groupto_set(data_frame, column_list, column_name)

Group a variable (column_name) into a single set with respect to a group of variables (column_list).

The returned object type is List, not an actual set.

10.1 Arguments:

data_frame  - The DataFrame we are going to work with.

column_list - A List with the column names we are going to work with.

column_name - A String with the column name we are going to modify.

10.2 Example:

The original table is:

Entry	Id	Roles
0	user-123	2
0	user-123	1
1	user-452	7
1	user-452	7
2	user-21	3

The new table is:

Entry	Id	Roles
0	user-123	[2, 1]
1	user-452	[7]
2	user-21	[3]

The code is:

new = rk.groupto_set(original, ['Entry', 'Id'], 'Roles')
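
Since the result is a list of unique values rather than a real set, a sketch in plain Pandas could deduplicate with dict.fromkeys. Keeping the order of first appearance is an assumption read off the new table above ([2, 1] rather than [1, 2]); this is not rubik's implementation.

```python
import pandas as pd

original = pd.DataFrame({
    "Entry": [0, 0, 1, 1, 2],
    "Id": ["user-123", "user-123", "user-452", "user-452", "user-21"],
    "Roles": [2, 1, 7, 7, 3],
})

# dict.fromkeys drops duplicates while preserving first-seen order.
new = (
    original.groupby(["Entry", "Id"], sort=False)["Roles"]
    .apply(lambda values: list(dict.fromkeys(values)))
    .reset_index()
)

print(new["Roles"].tolist())
# [[2, 1], [7], [3]]
```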

11. The `rk.groupto_sorted_set` Function

rk.groupto_sorted_set(data_frame, column_list, column_name)

Group a variable (column_name) into a sorted set with respect to a group of variables (column_list).

The returned object type is List, not an actual set.

11.1 Arguments:

data_frame  - The DataFrame we are going to work with.

column_list - A List with the column names we are going to work with.

column_name - A String with the column name we are going to modify.

11.2 Example:

The original table is:

Entry	Id	Roles
0	user-123	2
0	user-123	1
1	user-452	7
1	user-452	7
2	user-21	3

The new table is:

Entry	Id	Roles
0	user-123	[1, 2]
1	user-452	[7]
2	user-21	[3]

The code is:

new = rk.groupto_sorted_set(original, ['Entry', 'Id'], 'Roles')

12. The `rk.extend_column` Function

rk.extend_column(data_frame, col_name_1, col_name_2, col_new_name)

Combine two Pandas Series whose elements are lists into a single column of lists.

12.1 Arguments:

data_frame  - The DataFrame we are going to work with.

col_name_1 - A String with the column name we are going to modify.

col_name_2 - A String with the column name we are going to modify.

col_new_name - A String with the column name we are going to create.

12.2 Example:

The original table is:

Entry	Id	Roles1	Roles2
0	user-123	[1, 2]	[ ]
1	user-452	[3, 1]	[2]
2	user-21	[7]	[5, 4]
3	user-621	[2, 6]	[ ]
4	user-5512	[7, 5]	[1]
5	user-25	[3]	[4, 5]

The new table is:

Entry	Id	Roles
0	user-123	[1, 2]
1	user-452	[3, 1, 2]
2	user-21	[7, 5, 4]
3	user-621	[2, 6]
4	user-5512	[7, 5, 1]
5	user-25	[3, 4, 5]

The code is:

new = rk.extend_column(original, 'Roles1', 'Roles2', 'Roles')
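
The row-by-row concatenation can be sketched without rubik as shown below. The helper extend_column_sketch is a hypothetical name, and dropping the two source columns is an assumption taken from the new table above.

```python
import pandas as pd

original = pd.DataFrame({
    "Id": ["user-123", "user-452", "user-21"],
    "Roles1": [[1, 2], [3, 1], [7]],
    "Roles2": [[], [2], [5, 4]],
})


def extend_column_sketch(data_frame, col_name_1, col_name_2, col_new_name):
    """Hypothetical helper: concatenate two list columns row by row."""
    result = data_frame.drop(columns=[col_name_1, col_name_2])
    # Build new lists so the lists stored in the original frame stay untouched.
    merged = [
        list(first) + list(second)
        for first, second in zip(data_frame[col_name_1], data_frame[col_name_2])
    ]
    result[col_new_name] = pd.Series(merged, index=data_frame.index)
    return result


new = extend_column_sketch(original, "Roles1", "Roles2", "Roles")
print(new["Roles"].tolist())
# [[1, 2], [3, 1, 2], [7, 5, 4]]
```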

13. The `rk.table` Function

rk.table(_list)

This function works like table() in R. It returns a data frame with the frequency of the elements in a given list.

The response is a Pandas DataFrame.

13.1 Arguments:

_list  - The List we are going to work with.

13.2 Example:

The original list is:

[100, 103, 555, 102, 100, 100, 100, 102, 103, 103]

The new table is:

value	freq
100	4
103	3
102	2
555	1

The code is:

original = [100, 103, 555, 102, 100, 100, 100, 102, 103, 103]

new = rk.table(original)
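
A frequency table like this one can also be built with collections.Counter from the standard library. The helper table_sketch is hypothetical; the column names value and freq are copied from the new table above, and the descending order by frequency is an assumption based on that same table.

```python
from collections import Counter

import pandas as pd


def table_sketch(_list):
    """Hypothetical helper: frequency of each element, most common first."""
    # most_common() returns (value, count) pairs sorted by count, descending.
    return pd.DataFrame(Counter(_list).most_common(), columns=["value", "freq"])


original = [100, 103, 555, 102, 100, 100, 100, 102, 103, 103]
new = table_sketch(original)

print(new["value"].tolist())  # [100, 103, 102, 555]
print(new["freq"].tolist())   # [4, 3, 2, 1]
```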

14. The `rk.flat_list` Function

rk.flat_list(_list)

Flatten a list with nested lists.

14.1 Arguments:

_list  - The List we are going to work with.

14.2 Example:

The original list is:

[[100,[103, [555]]], 102]

The new list is:

[100, 103, 555, 102]

The code is:

original = [[100,[103, [555]]], 102]

new = rk.flat_list(original)

print(new)
# [100, 103, 555, 102]
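
The flattening idea itself is a short recursion. The function below, flat_list_sketch, is a hypothetical stand-in written for illustration and only treats list instances as nested; rubik's own flat_list may handle more cases.

```python
def flat_list_sketch(_list):
    """Hypothetical helper: recursively flatten nested lists."""
    flat = []
    for item in _list:
        if isinstance(item, list):
            # Descend into the nested list and keep its elements in order.
            flat.extend(flat_list_sketch(item))
        else:
            flat.append(item)
    return flat


original = [[100, [103, [555]]], 102]
new = flat_list_sketch(original)

print(new)
# [100, 103, 555, 102]
```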

15. The `rk.chunkify` Function

rk.chunkify(chunk_this_list, chunk_size)

Split a list into smaller chunks of a given size.

15.1 Arguments:

chunk_this_list  - The List we are going to work with.

chunk_size       - An integer with the number of elements in a chunk.

15.2 Example:

The original list is:

[100, 103, 555, 102, 100, 100, 100, 102, 103]

The new list is:

[[100, 103], [555, 102], [100, 100], [100, 102], [103]]

The code is:

original = [100, 103, 555, 102, 100, 100, 100, 102, 103]

new = rk.chunkify(original, 2)

print(new)
# [[100, 103], [555, 102], [100, 100], [100, 102], [103]]
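
Chunking a list comes down to slicing it in steps of chunk_size. The function chunkify_sketch below is a hypothetical illustration for plain lists only; it is not rubik's implementation.

```python
def chunkify_sketch(chunk_this_list, chunk_size):
    """Hypothetical helper: split a list into consecutive chunks."""
    return [
        chunk_this_list[start:start + chunk_size]
        for start in range(0, len(chunk_this_list), chunk_size)
    ]


# Nine elements, so the last chunk holds a single leftover element.
original = [100, 103, 555, 102, 100, 100, 100, 102, 103]
new = chunkify_sketch(original, 2)

print(new)
# [[100, 103], [555, 102], [100, 100], [100, 102], [103]]
```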

16. The `rk.fillna_dict` Function

rk.fillna_dict(data_frame, column_name)

From any column in a DataFrame, replace the NaN values with empty dictionaries.

16.1 Arguments:

data_frame  - The DataFrame we are going to work with.

column_name - A String with the column name we are going to modify.

16.2 Example:

The original table is:

Entry	Id	Roles
0	user-123	NaN
1	user-452	NaN
2	user-21	{'r': 1}
3	user-621	NaN
4	user-5512	{'r': 2}
5	user-25	NaN

The new table is:

Entry	Id	Roles
0	user-123	{ }
1	user-452	{ }
2	user-21	{'r': 1}
3	user-621	{ }
4	user-5512	{'r': 2}
5	user-25	{ }

The code is:

new = rk.fillna_dict(original, 'Roles')

Get the code of the latest version here.

Versions:

version - 2.2.3 'Pareciera ser todo más oscuro acá abajo.'
1. flat_list is more compatible with Pandas.
2. I removed the versions funny names from the code.
version - 2.2.2 'My guitar is not too loud!'
1. Fixing edge case for the flat_list function.
version - 2.2.1 'Never stop until the cube is done.'
1. Fixing edge case for the ungroup_dict function using math. https://docs.python.org/3/library/math.html#math.isnan
2. New function. fillna_dict.
version - 2.2 'Pandemic leisure.'
1. Updating function. For ungroup_dict, the user may use a prefix for the new columns that will be created.
version - 2.1 'This is the end of a decade.'
1. (deleted) New function. Expand a column with a list, into multiple columns.
2. Updating function. chunkify now accepts a list or a DataFrame.
version - 2.0. 'PyCon Latam 2019 - Puerto Vallarta.'
1. New function names. Again! In compliance with PEP8.
2. Create the rubik Package for git.
3. pip install git+https://github.com/josemariasosa/rubik
version - 1.3.2. 'New job. New opportunities.'
1. Displaying a DataFrame in the standard output in a pretty way.
  - Once display.max_rows is exceeded, the display.min_rows option determines how many rows are shown in the truncated repr.
version - 1.3.1. 'Just a little bit higher. Not too much.'
1. Standardizing names and the format.
version - 1.3. 'I should not be high in classes.'
1. Improvements in the flatDict function.
  - Avoid crashing names with the dictionary keys.
2. Adding the chunkify function.
version - 0. 'I am not the original one, but I'm old, thought.'
