hoat23 / DataAnalytics

Data Analytics

Rewrite SQL queries in Python with Pandas

SELECT, DISTINCT, COUNT, LIMIT

| SQL | Python |
| --- | --- |
| `SELECT name FROM titanic_test_data` | `titanic_df["name"]` |
| `SELECT * FROM titanic_test_data LIMIT 5` | `titanic_df.head(5)` |
| `SELECT DISTINCT age FROM titanic_test_data` | `titanic_df["age"].unique()` |
| `SELECT COUNT(DISTINCT age) FROM titanic_test_data` | `titanic_df["age"].nunique()` |
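
The rows above can be tried on a small frame. This is a minimal sketch: the six rows below are made-up stand-ins for `titanic_test_data`, not the real dataset. It also shows one caveat: SQL's `COUNT(DISTINCT age)` skips NULLs, and so does `nunique()` by default, whereas `unique()` keeps `NaN` as a value.

```python
import pandas as pd

# Hypothetical sample rows standing in for titanic_test_data.
titanic_df = pd.DataFrame({
    "name": ["Allen", "Braund", "Cumings", "Dooley", "Emir", "Fox"],
    "age": [29.0, 22.0, 38.0, 22.0, float("nan"), 29.0],
    "pclass": [1, 3, 1, 3, 2, 1],
    "gender": ["female", "male", "female", "male", "male", "male"],
    "fare": [211.5, 7.25, 71.5, 7.75, 13.0, 26.5],
})

names = titanic_df["name"]                  # SELECT name
first_five = titanic_df.head(5)             # SELECT * ... LIMIT 5
distinct_ages = titanic_df["age"].unique()  # SELECT DISTINCT age (NaN is kept)
n_distinct = titanic_df["age"].nunique()    # COUNT(DISTINCT age), NaN ignored
n_with_nan = len(distinct_ages)             # counts NaN as one extra value
```

On this sample `n_distinct` and `n_with_nan` differ by one because of the single missing age.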

SELECT, WHERE, OR, AND, IN (SELECT with conditions)

| SQL | Python |
| --- | --- |
| `SELECT * FROM titanic_test_data WHERE pclass = 1` | `titanic_df[titanic_df.pclass == 1]` |
| `SELECT * FROM titanic_test_data WHERE pclass = 1 OR pclass = 2` | `titanic_df[(titanic_df.pclass == 1) \| (titanic_df.pclass == 2)]` |
| `SELECT * FROM titanic_test_data WHERE pclass IN (1,2)` | `titanic_df[titanic_df.pclass.isin([1,2])]` |
| `SELECT name FROM titanic_test_data WHERE pclass = 1 AND gender = "male"` | `titanic_df[(titanic_df.pclass == 1) & (titanic_df.gender == "male")]["name"]` |
| `SELECT name, age FROM titanic_test_data WHERE pclass NOT IN (1,2)` | `titanic_df[~titanic_df.pclass.isin([1,2])][["name","age"]]` |
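
A runnable sketch of the filters above, again on made-up sample rows rather than the real table. Each comparison is wrapped in parentheses because `|` and `&` bind more tightly than `==` in Python. The `.loc[mask, columns]` and `query()` forms are alternatives added here for illustration; they are not taken from the table.

```python
import pandas as pd

# Hypothetical sample rows standing in for titanic_test_data.
titanic_df = pd.DataFrame({
    "name": ["Allen", "Braund", "Cumings", "Dooley", "Emir", "Fox"],
    "age": [29.0, 22.0, 38.0, 22.0, float("nan"), 29.0],
    "pclass": [1, 3, 1, 3, 2, 1],
    "gender": ["female", "male", "female", "male", "male", "male"],
})

# WHERE pclass = 1
first_class = titanic_df[titanic_df["pclass"] == 1]

# WHERE pclass = 1 OR pclass = 2, and the equivalent IN (1, 2)
first_or_second = titanic_df[(titanic_df["pclass"] == 1) | (titanic_df["pclass"] == 2)]
same_with_isin = titanic_df[titanic_df["pclass"].isin([1, 2])]

# SELECT name ... WHERE pclass = 1 AND gender = "male"
is_first_class_man = (titanic_df["pclass"] == 1) & (titanic_df["gender"] == "male")
first_class_men = titanic_df.loc[is_first_class_man, "name"]

# SELECT name, age ... WHERE pclass NOT IN (1, 2)
third_class = titanic_df.loc[~titanic_df["pclass"].isin([1, 2]), ["name", "age"]]

# The same AND filter written as a query string
via_query = titanic_df.query("pclass == 1 and gender == 'male'")["name"]
```

Selecting rows and columns in one `.loc` call avoids chaining two indexing steps, which matters mainly when the result is assigned to later.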

GROUP BY, ORDER BY, COUNT

| SQL | Python |
| --- | --- |
| `SELECT pclass, gender, COUNT(*) FROM titanic_test_data GROUP BY 1,2` | `titanic_df.groupby(["pclass","gender"]).size()` |
| `SELECT pclass, gender, COUNT(*) FROM titanic_test_data GROUP BY 1,2 ORDER BY 3 DESC` | `titanic_df.groupby(["pclass","gender"]).size().sort_values(ascending=False)` |
| `SELECT name, pclass, gender FROM titanic_test_data ORDER BY 1, 2 DESC` | `titanic_df.sort_values(["name","pclass"], ascending=[True,False])[["name","pclass","gender"]]` |
| `SELECT pclass, gender, SUM(fare) FROM titanic_test_data GROUP BY 1,2` | `titanic_df.groupby(["pclass","gender"])["fare"].sum()` |
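
A small sketch of the grouping and ordering patterns, using made-up sample rows. Selecting `"fare"` before calling `sum()` restricts the aggregation to that column, so text columns such as `name` are never summed. The `reset_index(name="count")` step is an addition for illustration: it turns the grouped result into a flat table shaped like the SQL output.

```python
import pandas as pd

# Hypothetical sample rows standing in for titanic_test_data.
titanic_df = pd.DataFrame({
    "name": ["Allen", "Braund", "Cumings", "Dooley", "Emir", "Fox"],
    "pclass": [1, 3, 1, 3, 2, 1],
    "gender": ["female", "male", "female", "male", "male", "male"],
    "fare": [211.5, 7.25, 71.5, 7.75, 13.0, 26.5],
})

# SELECT pclass, gender, COUNT(*) ... GROUP BY 1, 2
counts = titanic_df.groupby(["pclass", "gender"]).size()

# ... ORDER BY 3 DESC
counts_desc = counts.sort_values(ascending=False)

# Flat table with pclass, gender and count columns, like the SQL result set
flat_counts = counts.reset_index(name="count")

# SELECT name, pclass, gender ... ORDER BY 1, 2 DESC
ordered = titanic_df.sort_values(
    ["name", "pclass"], ascending=[True, False]
)[["name", "pclass", "gender"]]

# SELECT pclass, gender, SUM(fare) ... GROUP BY 1, 2
fare_sum = titanic_df.groupby(["pclass", "gender"])["fare"].sum()
```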

MIN, MAX, MEAN, MEDIAN

| SQL | Python |
| --- | --- |
| `SELECT MIN(age) AS min, MAX(age) AS max, AVG(age) AS mean, APPROX_QUANTILES(age, 100)[OFFSET(50)] AS median FROM titanic_test_data` | `titanic_df.agg({'age': ['min', 'max', 'mean', 'median']})` |
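
The aggregation above can be checked on a few made-up ages. This is a sketch under two assumptions: the sample values are invented, and the SQL side is taken to be BigQuery, where `APPROX_QUANTILES` returns an approximate median. The pandas `median` is exact, so the two can differ slightly on large tables. Missing ages are skipped by all four pandas aggregates.

```python
import pandas as pd

# Hypothetical ages standing in for titanic_test_data.age (one value missing).
titanic_df = pd.DataFrame({
    "age": [29.0, 22.0, 38.0, 22.0, float("nan"), 29.0],
})

# One row per statistic, one column per aggregated field
stats = titanic_df.agg({"age": ["min", "max", "mean", "median"]})

age_min = stats.loc["min", "age"]
age_max = stats.loc["max", "age"]
age_mean = stats.loc["mean", "age"]
age_median = stats.loc["median", "age"]

# The median is also the 0.5 quantile
age_q50 = titanic_df["age"].quantile(0.5)
```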

SQL-Elasticsearch in PowerShell

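# The DSN must already be configured as an ODBC data source on this machine
$connectstring = "DSN=Local Elasticsearch;"
$sql = "SELECT * FROM library"

# Open the ODBC connection and prepare the query
$conn = New-Object System.Data.Odbc.OdbcConnection($connectstring)
$conn.Open()
$cmd = New-Object System.Data.Odbc.OdbcCommand($sql, $conn)
# Load the result set into an in-memory DataTable
$da = New-Object System.Data.Odbc.OdbcDataAdapter($cmd)
$dt = New-Object System.Data.DataTable
$null = $da.Fill($dt)   # discard the returned row count
$conn.Close()
# Print the rows
$dt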
