kirudang / R_vs_Python

Compare R and Python

This shows several differences in syntax between R and Python.

Critical differences

This section highlights differences between Python and R that could result in inadvertent errors if the wrong convention is used (i.e. the code may still run but would produce a wrong result).

Topic R Python
General purposes R was developed specifically for statistical computing and data analysis. Python was developed as a general-purpose programming language.
Boolean

R: TRUE or T; FALSE or F

Python: True; False

Array indexing Starts at 1 Starts at 0
Indentation Has no impact on code – is purely cosmetic Has a specific meaning in the code. Reducing the indentation level indicates the end of a block of code.
Length of a string

R: nchar(x)

Do not use length(x), which returns the number of elements in the vector (1 for a single string).

Python: len(x)
Return statements in functions R: If no return statement is specified, the function returns the value of the last expression evaluated. Python: A return statement must be specified if we want the function to return an output; otherwise it returns None.
Interpretation of “=”

R: The “=” sign creates an independent copy of the object. For example, if we do data_2 = data_1 and perform some manipulations on data_2, then data_1 will be unchanged.

Python: The “=” sign binds a new name to the original object; it does not copy it, so the two names do not behave independently. For example, if we do data_2 = data_1 and modify data_2 in place (a mutable object such as a list or a data frame), the same changes are visible through data_1.

To make an independent copy of a dataset, use data_2 = data_1.copy() instead.
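Two of the pitfalls above (the implicit None and the shared object behind “=”) can be checked with a minimal sketch. The function and variable names here are made up for illustration, and a plain list stands in for a dataset.

```python
def add_no_return(a, b):
    total = a + b  # computed, but never returned


def add_with_return(a, b):
    return a + b


missing = add_no_return(1, 2)    # None, because there is no return statement
present = add_with_return(1, 2)  # 3

data_1 = [1, 2, 3]
alias = data_1               # a second name for the same list
independent = data_1.copy()  # a separate (shallow) copy

alias.append(4)         # also visible through data_1
independent.append(99)  # does not affect data_1
```

Note that .copy() on a list is a shallow copy: nested objects inside it are still shared.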

Structural differences

This section highlights differences in Python and R that represent significant differences in the way the code is structured, but which are unlikely to cause non-obvious errors (i.e. if the wrong approach is used then the code would not run).

Topic R Python
Code blocks

R: Are enclosed in braces { and }

example = function(x){
  some code
  some more code
  return(something)
}

Python: Begin with a line ending with a colon. On the next line, the indentation level increases by one. The code block ends when the indentation level returns to where it was at the start of the code block.

def example(x):
    some code
    some more code
    return something

more code that is not part of the function definition

Common ways to create unlabelled sequences of objects

R: These are called vectors. Use the “c” function to create one, e.g. c(1, 2, 3)

c here stands for combine.

Elements in an R vector must all be of the same type.

Python: This is called a list. Use square brackets with elements separated by commas, e.g. [1, 2, 3]

Elements in a Python list can be of mixed type.

We can also create a tuple using round parentheses, but a tuple cannot be changed after being created. Example: (1, 2, 3)
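The difference between a list and a tuple can be seen in a short sketch (the variable names are illustrative only): a list accepts mixed types and can be changed in place, while assigning to a tuple element raises a TypeError.

```python
mixed = [1, "two", 3.0]  # a list may hold values of different types
mixed[0] = "one"         # lists can be modified after creation

point = (1, 2, 3)        # a tuple is fixed once created
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False  # item assignment is not supported for tuples
```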

Common ways to create labelled sequences of objects

R: This is called a list. Use the “list” function to create one, separating key-value pairs with an equal sign, e.g. l = list('a' = 1, 'b' = 2, 'c' = 3)

Access elements with the $ symbol, e.g. l$a is 1

Python: This is called a dictionary. Use braces to create one, separating the key-value pairs with commas, e.g. d = {'a':1, 'b':2, 'c':3}

Access elements using square brackets, e.g. d['a'] is 1

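Applying a function across all elements of an array R: Use the lapply function. Python: Use the list comprehension syntax, e.g. [formula for x in list if condition]
Loop R: for(i in 1:10) {...} Python: for i in range(10): ... (note that 1:10 runs from 1 to 10, while range(10) runs from 0 to 9; use range(1, 11) for the same values)
Conditional statement R: if(x > 3) {...} Python: if x > 3: ...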
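The Python side of the three rows above can be sketched as follows. The data and names are invented for illustration; the point is the comprehension syntax and the fact that range() starts at 0 and excludes its end value.

```python
values = [1, 2, 3, 4, 5, 6]

# List comprehension: apply a formula to each element that meets a condition
squares_of_evens = [x ** 2 for x in values if x % 2 == 0]

zero_based = list(range(10))    # 0, 1, ..., 9
one_based = list(range(1, 11))  # 1, 2, ..., 10, the same values as 1:10 in R

# A loop with a conditional; indentation marks the blocks
labels = []
for x in values:
    if x > 3:
        labels.append("big")
    else:
        labels.append("small")
```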
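Call a function

R: function(data)

Python: function(data) or data.function()

In Python, there are more ways to call a function; calling it as a method on the data object (object-oriented style) is very common.

For example, with a NumPy array or pandas Series named data:

numpy.mean(data)

but we can also call:

data.mean()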
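The two calling styles can be compared using only the standard library. This is a small sketch with made-up data; statistics.mean stands in for the function style, and the str and list methods stand in for the method style.

```python
import statistics

data = [2, 4, 6, 8]

# Function style: the data is passed as an argument
average = statistics.mean(data)
ordered_copy = sorted(data, reverse=True)  # returns a new list

# Method style: the function is looked up on the object itself
text = "r versus python"
upper_text = text.upper()
data.sort(reverse=True)  # sorts the list in place and returns None
```

A common source of confusion is that sorted(data) returns a new list while data.sort() modifies the existing one.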

Access a column in a data frame

R:

1. Using the $ operator: data_frame$column

2. Using square brackets []: data_frame[, "column"]

Python:

1. Using square brackets []: data_frame["column"]

2. Using the dot notation: data_frame.column (only works if the column name does not contain any spaces or special characters)

Minor Differences

These are differences in naming or notational conventions that don’t cause major changes in the structure of the code, but which might require changing the name of a keyword or function. Items in this list will cause an obvious error (e.g. the code won’t run) if the wrong convention is used.

Topic R Python
Concatenating strings

R: Use paste() or paste0(). cat() also joins its arguments, but it prints the result instead of returning a string.

cat("Hello,", "world!")

Note: paste0() is similar to paste(), but it does not add any separator between the strings, while the separator in paste() can be set with sep. For example, paste("Line 1", "Line 2", sep = "\n") puts a line break between the two strings.

Python: Use “+”

"Hello, " + "World!"

Other options: format(), join()

concatenated_string = "{}{}".format(string1, string2)

concatenated_string = "_".join([string1, string2])
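The Python options above all build the same string, which a short sketch can confirm. The variable names are illustrative, and the f-string line is an extra option (Python 3.6+) not listed in the original table.

```python
string1 = "Hello"
string2 = "World"

plus = string1 + ", " + string2 + "!"           # the + operator
formatted = "{}, {}!".format(string1, string2)  # str.format()
joined = ", ".join([string1, string2]) + "!"    # separator.join(list)
fstring = f"{string1}, {string2}!"              # f-string
```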

Displaying text R: Use “print”, which displays a single object (e.g. one string) at a time. Python: Use the print function. This can handle a sequence of strings / variables and will print them all out with a space between them.
Exponentiation R: Can use a ** b or a^b Python: Use a ** b
Modular arithmetic R: Use a %% b Python: Use a % b
Integer division, discarding remainder R: Use a %/% b Python: Use a // b
Determine type of a variable R: Use typeof(x) Python: Use type(x)
Change type of a variable R: General format of the function is “as.type()”. Example: as.integer(x) Python: General format of the function is “type()”. Example: int(x)
Boolean variables R: Use all-caps, TRUE and FALSE Python: Capitalize only the first letter, True and False
Boolean operators

R: Use the symbols &, |, !

For containment in a vector use %in%

Python: Use the words and, or, not

For containment in a list use in
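The Python operators from the rows above can be tried out in a few lines. This is only a sketch with arbitrary values.

```python
a, b = 7, 2

power = a ** b     # exponentiation
remainder = a % b  # modular arithmetic
quotient = a // b  # integer division, discarding the remainder

as_int = int("42")   # change type: the conversion function is named after the type
kind = type(as_int)  # determine type

both = (a > 3) and (b > 3)
either = (a > 3) or (b > 3)
negated = not (a > 3)
contained = b in [1, 2, 3]  # containment, like %in% in R
```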

Install package install.packages('name') pip install name
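Importing additional functionality

R: These are called packages in R

Use library(package)

In R, once a package is attached with library(), all of its exported functions are available by name.

Python: These are called packages or modules in Python

Use import package, or from package import module

from sklearn import metrics

Note: In Python, import package makes the contents available under the package name (e.g. package.function), while from package import name brings only that one name into the current namespace. To bring every public name into the namespace at once, use the syntax below. This is generally discouraged because it can silently overwrite names that already exist.

from package import *

For example:

from pandas import *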
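The import styles can be compared using standard-library modules, so the sketch runs without installing anything. The alias st is an arbitrary choice for illustration.

```python
import math              # names are reached through the module: math.sqrt
from math import sqrt    # only sqrt is brought into the current namespace
import statistics as st  # import under a shorter alias

root_a = math.sqrt(16)
root_b = sqrt(16)
avg = st.mean([1, 2, 3])
```

The same patterns apply to third-party packages, e.g. import pandas as pd.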

Comment out (keyboard shortcut)

R: Ctrl + Shift + C

Python:

· Windows: Ctrl + 3

· Mac: Cmd + 3

Create a function

R: Use function()

function_name <- function(arg1, arg2){
  return(value)
}

Python: Use def

def function_name(arg1, arg2):
    return value

Lambda functions

R: There is no lambda keyword.

Anonymous functions are still written with the function() keyword, e.g. function(x) x^2.

Python: Lambda functions are anonymous functions, meaning they are functions without a name. They are used to perform a small task or calculation and are often used in combination with other functions like filter(), map() or reduce() (reduce lives in the functools module in Python 3). The syntax of a lambda function in Python is:

lambda arguments: expression

f = lambda x: x**2

print(f(5)) # 25
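The combination with map(), filter() and reduce() mentioned above might look like the following sketch. The input lists are invented, and the sorted() line is an additional common use not named in the original.

```python
from functools import reduce  # reduce is not a built-in in Python 3

numbers = [1, 2, 3, 4, 5]

squares = list(map(lambda x: x ** 2, numbers))        # apply to every element
evens = list(filter(lambda x: x % 2 == 0, numbers))   # keep matching elements
total = reduce(lambda acc, x: acc + x, numbers)       # fold into one value

# Lambdas are also handy as a sort key
by_length = sorted(["pear", "fig", "banana"], key=lambda s: len(s))
```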

Condition ifelse()

R: R offers ifelse(test, yes, no), which works elementwise on vectors.

Python: There is no built-in ifelse(). Use a conditional expression instead:

result = x if x > y else y
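A conditional expression handles one value at a time. To get something closer to R's elementwise ifelse() on a plain list, one option is to put the expression inside a list comprehension, as in this sketch (the scores and the pass mark of 50 are made up).

```python
x, y = 5, 8
larger = x if x > y else y  # single-value conditional expression

scores = [35, 72, 50, 91]
# Elementwise version: the conditional expression is evaluated per element
results = ["pass" if s >= 50 else "fail" for s in scores]
```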

Call for help

R: Type in the console:

?function_name

??function_name → searches the help pages of all installed packages, which helps find the package that contains the function.

For example:

?mutate

Python:

help(function_name)

For example:

help(len)

Note: Python also has dir(), a function that returns a list of the valid attributes and methods of an object. For example, dir(list) returns a list of the attributes and methods available for the built-in list type.
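Because dir() returns an ordinary list of names, it can be filtered like any other list. The sketch below also shows that the text displayed by help() comes from an object's docstring; the function describe is hypothetical and exists only for this example.

```python
def describe(value):
    """Return a short description of value."""
    return f"{type(value).__name__}: {value!r}"


list_members = dir(list)  # names of attributes and methods of list
has_append = "append" in list_members

# Names starting with an underscore are internal; the rest are the public methods
public_names = [name for name in list_members if not name.startswith("_")]

doc_text = describe.__doc__  # the docstring that help(describe) would display
```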

Check available built-in functions R: ls("package:base") returns a character vector of all the objects in the base package; help(package = "base") shows an index of them with brief descriptions. Python: dir(__builtins__)
Unequal R: != Python: != (the <> operator existed in Python 2 but was removed in Python 3)

Some differences in Data Manipulation

These are differences in fundamental data cleaning steps when working with data frames.

In R, base R and dplyr are the two main tools for data frame manipulation, while in Python they are the core language and pandas.

Topic R Python
Check structure str(df) df.info()
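Data dimension

R: dim(df)

Python: df.shape

Note: No parentheses here, because shape is an attribute, not a method.

Variables of data frame

R: colnames(df)

Python: df.columns

Note: No parentheses here, because columns is an attribute, not a method.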

Drop columns

R:

Single:

df$column <- NULL

Multiple:

· By index

df[ , -c(column_index_1, column_index_2)]

· By name

df[ , !names(df) %in% c("column_name1", "column_name2")]

Python:

Single:

del df["column_name"]

or use the code below with a single column name.

Multiple:

df.drop(columns=["column_1", "column_2"], inplace=True)

Check unique values

R: unique(df$column_name)

Python: df["column_name"].unique() or df["column_name"].value_counts()

For value_counts(), we can pass the argument normalize=True to get the proportion of each value in the column instead of the count.

Check duplicated observations summary(duplicated(df)) df.duplicated().sum()
Drop duplicated values

R: df[!duplicated(df), ]

or you can use dplyr as follows:

df %>% distinct()

Python: df.drop_duplicates(inplace=True)
Check missing values

R:

· Single column: is.na(df$column_name)

· Multiple columns: sapply(df, function(x) sum(is.na(x)))

Python: df.isnull().sum()
Drop NA new_df <- na.omit(df) new_df = df.dropna()
Fill NA

R: We can use the replace_na() function from tidyr, or na.fill() from the zoo library,

or we can use base R like this:

df[is.na(df$col), "col"] <- value

Python: Specify the value (and, optionally, the column), then run:

df.fillna(value)

df.column_name.fillna(value)

Note: Both calls return a new object and leave df unchanged, so assign the result back, e.g. df = df.fillna(value).
