eleluqrey / pandas_apply_lambda

Little exercise with endless possibilities...probably the most important challenge!!!

Pandas Apply Lambda

The ultimate tool when dealing with a Pandas DataFrame!!!


Using the apply function with lambda

The general syntax is:

df.apply(lambda x: func(x['col1'], x['col2']), axis=1)

This will allow you to create pretty much any logic, I promise!!!
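As a minimal sketch of that syntax, the snippet below applies a two-argument function row by row. The toy DataFrame, its column names and the `func` helper are all made up for illustration; any function of scalar values would work in its place.

```python
import pandas as pd

# Toy data; 'col1' and 'col2' are placeholder column names.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": [10, 20, 30]})


def func(a, b):
    """Hypothetical helper: combine two scalar values from the same row."""
    return a + b


# axis=1 hands each row to the lambda as a Series,
# so x['col1'] and x['col2'] are single values, not whole columns.
df["result"] = df.apply(lambda x: func(x["col1"], x["col2"]), axis=1)
print(df)
```

Without `axis=1`, `apply` would pass whole columns to the lambda instead of rows, and `x['col1']` would fail with a `KeyError`.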

๐Ÿ“ Data

To perform the challenges you will use the dataset /data/input/IMDB-Movie-Data.csv.

๐Ÿผ Challenge 1. Using a single argument

We want to create bins of movies according to the number of votes they've received. To do that, we will create a new column named 'bin' which will tag every movie as follows:

  • From 0 to 999 ==> 'cat_1'
  • From 1000 to 9999 ==> 'cat_2'
  • From 10000 to 99999 ==> 'cat_3'
  • From 100000 to 999999 ==> 'cat_4'
  • 1000000 or more ==> 'cat_5'
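One possible way to approach the binning, shown on toy data rather than the real CSV. The column name `Votes` is an assumption (check it against the file header), `votes_to_bin` is a hypothetical helper, and the sketch assumes that a movie with exactly 1000000 votes belongs in 'cat_5'.

```python
import pandas as pd

# Toy data; 'Votes' is an assumed column name.
df = pd.DataFrame({"Votes": [500, 1000, 54321, 999999, 1500000]})


def votes_to_bin(votes):
    """Hypothetical helper: map a vote count to its category label."""
    if votes < 1_000:
        return "cat_1"
    if votes < 10_000:
        return "cat_2"
    if votes < 100_000:
        return "cat_3"
    if votes < 1_000_000:
        return "cat_4"
    return "cat_5"  # assumption: exactly 1000000 votes also lands here


# With a single argument, apply on the column itself is enough (no axis needed).
df["bin"] = df["Votes"].apply(lambda v: votes_to_bin(v))
print(df)
```

Calling `apply` on one column passes each value directly to the lambda, which is why this challenge does not need `axis=1`.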

๐Ÿผ ๐Ÿผ Challenge 2. Using two arguments

We want to know the revenue per minute of runtime for every movie.
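A rough sketch of the two-argument case on toy data. The column names `Revenue (Millions)` and `Runtime (Minutes)` are assumptions about the CSV header, and the missing revenue value is included only to show how gaps propagate.

```python
import pandas as pd

# Toy data; both column names are assumptions about the real CSV.
df = pd.DataFrame({
    "Revenue (Millions)": [300.0, 120.0, None],
    "Runtime (Minutes)": [120, 100, 90],
})

# Each row is passed to the lambda, which divides one column by the other.
df["revenue_per_minute"] = df.apply(
    lambda x: x["Revenue (Millions)"] / x["Runtime (Minutes)"], axis=1
)
print(df)
```

A missing revenue stays missing (`NaN`) in the new column, so no special handling is needed for this step.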

๐Ÿผ ๐Ÿผ ๐Ÿผ Challenge 3. A bit more complicated

We want to create a new rating where we add 1 point if the genre is thriller but subtract 1 point if the genre is comedy.
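One way this could look, assuming the genre is stored as a comma-separated string such as "Action,Thriller" in a column named `Genre`, with the score in a column named `Rating`. Both names, the storage format and the `adjusted_rating` helper are assumptions. Under this reading, a movie tagged as both thriller and comedy ends up unchanged.

```python
import pandas as pd

# Toy data; column names and the comma-separated genre format are assumptions.
df = pd.DataFrame({
    "Genre": ["Action,Thriller", "Comedy", "Drama", "Comedy,Thriller"],
    "Rating": [7.0, 6.5, 8.0, 5.0],
})


def adjusted_rating(genre, rating):
    """Hypothetical helper: +1 for thriller, -1 for comedy."""
    genres = [g.strip().lower() for g in genre.split(",")]
    if "thriller" in genres:
        rating += 1
    if "comedy" in genres:
        rating -= 1
    return rating


df["new_rating"] = df.apply(
    lambda x: adjusted_rating(x["Genre"], x["Rating"]), axis=1
)
print(df)
```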

๐Ÿผ ๐Ÿผ ๐Ÿผ ๐Ÿผ Challenge 4. A bit too weird...

We want to know whether the integer part of the following number is prime: the sum of the ASCII values of every character in the movie title, divided by the number of votes (remember that prime numbers are integers).
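A possible sketch of this check, reading the task as "sum the character codes of the title, divide by the votes, truncate, test for primality". The column names `Title` and `Votes` and both helpers are assumptions. `ord` returns the Unicode code point, which matches the ASCII value for plain ASCII titles, and the sketch assumes every movie has at least one vote.

```python
import pandas as pd


def is_prime(n):
    """Trial-division primality test for a non-negative integer."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def title_score_is_prime(title, votes):
    """Hypothetical helper: integer part of (character-code sum / votes)."""
    code_sum = sum(ord(ch) for ch in title)
    return is_prime(int(code_sum / votes))  # assumes votes > 0


# Toy data; 'Title' and 'Votes' are assumed column names.
df = pd.DataFrame({"Title": ["AB", "abc", "Zz"], "Votes": [10, 42, 50]})

df["prime_title"] = df.apply(
    lambda x: title_score_is_prime(x["Title"], x["Votes"]), axis=1
)
print(df)
```

Here "AB" gives 131 / 10, integer part 13 (prime); "abc" gives 294 / 42 = 7 (prime); "Zz" gives 212 / 50, integer part 4 (not prime). On real data, vote counts far larger than the title sum would make the integer part 0, which is not prime.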

๐Ÿผ ๐Ÿผ ๐Ÿผ ๐Ÿผ ๐Ÿผ Challenge 5. And finally some fantasy

Feel free to propose your own ranking based on aggregations of at least 3 columns of the dataset.


๐Ÿผ ๐Ÿผ ๐Ÿผ ๐Ÿผ ๐Ÿผ ๐Ÿผ Bonus challenge. Freaky bonus

We want to know which movies might have hidden patterns in their description. One way to find out is to look for movies where the sum of all numeric values in the SHA256 hash of the description string lies between their revenue and their number of votes.
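The wording leaves room for interpretation, so the sketch below picks one reading: take the hexadecimal SHA256 digest of the description, add up only the characters that are decimal digits, and check whether that sum falls between revenue and votes, in whichever order they come, with inclusive bounds. The column names, both helpers and the choice to skip the letters a-f are assumptions.

```python
import hashlib

import pandas as pd


def hash_digit_sum(text):
    """Sum the decimal digits that appear in the SHA256 hex digest of text."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return sum(int(ch) for ch in digest if ch.isdigit())


def between(value, a, b):
    """Hypothetical helper: True if value lies between a and b, inclusive."""
    return min(a, b) <= value <= max(a, b)


# Toy data; all three column names are assumptions about the real CSV.
df = pd.DataFrame({
    "Description": ["A heist goes wrong.", "Two friends travel north."],
    "Revenue (Millions)": [0.0, 600.0],
    "Votes": [1000, 1000],
})

df["hidden_pattern"] = df.apply(
    lambda x: between(
        hash_digit_sum(x["Description"]), x["Revenue (Millions)"], x["Votes"]
    ),
    axis=1,
)
print(df)
```

A SHA256 hex digest has 64 characters, so under this reading the digit sum can never exceed 64 * 9 = 576. That is why the first toy row always matches and the second never does, regardless of the description text.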

