aayush1036 / data_analysis

Repository from GitHub: https://github.com/aayush1036/data_analysis

This is the GitHub repository for the blog post on Pandas vs. SQL for data analysis.

Read the blog at https://aayushmaan1306.medium.com/pandas-vs-sql-for-data-analysis-5a5cd8dc81d5

Website: https://aayush1036.github.io/profile_website/

The blog is also published in the Analytics Vidhya publication.

Final Comparison

Pandas is the better choice for complex data manipulation. Its .transform() function lets you apply any transformation function to the data, and creating columns and applying functions is easier than in SQL. Pandas also works with visualization libraries such as matplotlib, seaborn, plotly, and altair, which help you gain better insight into the data and make the analysis more presentable.
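As a rough illustration of the .transform() point, here is a minimal sketch. The DataFrame, its column names, and its values are made up for this example and are not taken from the blog's dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({
    "department": ["sales", "sales", "hr", "hr"],
    "salary": [100.0, 200.0, 300.0, 500.0],
})

# transform() returns a result aligned with the original rows,
# so it can be assigned directly as a new column.
df["dept_mean"] = df.groupby("department")["salary"].transform("mean")

# A custom function works as well: each salary's deviation
# from the mean of its own department.
df["deviation"] = df.groupby("department")["salary"].transform(
    lambda s: s - s.mean()
)

print(df)
```

Because the result keeps the shape of the original column, no separate join back to the table is needed, which is the kind of step that usually takes a window function or a subquery in SQL.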

SQL is the better choice for querying large amounts of data because of its speed, and for establishing relations between tables because of its RDBMS model. Adding primary and foreign key constraints is much easier in SQL, and these constraints restrict what data can be entered into the tables. To visualize data stored in SQL, you would have to use other software, such as Tableau, and link it to the SQL database.
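To show how key constraints restrict data entry, here is a small sketch using Python's built-in sqlite3 module with an in-memory database. The table and column names are hypothetical, and SQLite is only an assumption for the example; the blog may use a different database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: each employee must belong to an existing department.
conn.execute(
    "CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
)
conn.execute(
    """CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES departments(id)
    )"""
)

conn.execute("INSERT INTO departments (id, name) VALUES (1, 'sales')")
conn.execute(
    "INSERT INTO employees (id, name, department_id) VALUES (1, 'Asha', 1)"
)

# Department 99 does not exist, so the foreign key constraint rejects the row.
try:
    conn.execute(
        "INSERT INTO employees (id, name, department_id) VALUES (2, 'Ravi', 99)"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employees").fetchone()[0]
print(rejected, employee_count)
```

The invalid row never reaches the table, so the check lives in the schema itself rather than in application code.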

Languages

Language: Jupyter Notebook 100.0%