Clueless-Community / scrape-up

A web-scraping-based python package that enables you to scrape data from various platforms like GitHub, Twitter, Instagram, or any useful website.

Home Page: https://pypi.org/project/scrape-up/

Feat: Add Module for Myntra

aRUsh-codes opened this issue · comments

Describe the feature

As part of GSSoC'24, I'd like to add a Myntra module with the following functionality:
https://www.myntra.com/

Add features like

  • product page
  • product details
  • Search for
    • categories
    • genders
    • brands

Add ScreenShots

Screenshot (130)

Record

  • I agree to follow this project's Code of Conduct
  • I'm a GSSoC'24 contributor
  • I want to work on this issue

Go ahead @aRUsh-codes

Note

  • Please create a separate module for this, as in the folder and project structure (if it is already created, just add your features as functions in the same module).
  • Do not use the `selenium` web driver, as it is not compatible with all devices and cloud platforms.
  • Before making any changes, please check whether the module you want to add already exists. If it does, add your functionality as a method only; otherwise, make a separate module and class for it.

All the best 👨‍💻

@nikhil25803 Thank you for assigning me. Could you also add the appropriate level label to this issue?

Hi @nikhil25803
After careful consideration and trying out different approaches, I don't think Myntra's data can be scraped, and I suggest closing this issue.
Here are my findings:

  • Myntra loads its data into the page dynamically using JavaScript, which makes it hard to capture using only BeautifulSoup and requests. (Selenium would solve this problem, but we can't use it either.)
  • I tried taking advantage of the fact that Myntra serves its data in JSON format from a separate gateway URL (e.g. https://www.myntra.com/gateway/v2/search/men-sweaters?rows=50&o=49&plaEnabled=false&xdEnabled=false&pincode=110001), but even that didn't work because Myntra is strict about authentication. A request would need solid authentication details such as product_id, name, and description in the case of products, which users cannot feasibly be expected to provide.
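For anyone who wants to reproduce the second finding, here is a minimal sketch of how the gateway URL quoted above can be built and probed. It uses the standard library's `urllib` instead of `requests` so it runs without extra dependencies. The helper names `build_gateway_search_url` and `probe_gateway`, the default parameter values, and the `User-Agent` header are assumptions for illustration; only the URL shape comes from the example in this comment.

```python
import json
import urllib.error
import urllib.request
from urllib.parse import urlencode

# Base path taken from the example URL in this thread.
GATEWAY_SEARCH_BASE = "https://www.myntra.com/gateway/v2/search/"


def build_gateway_search_url(query, rows=50, offset=49, pincode="110001"):
    """Build a gateway search URL shaped like the one quoted in the thread."""
    params = {
        "rows": rows,
        "o": offset,
        "plaEnabled": "false",
        "xdEnabled": "false",
        "pincode": pincode,
    }
    return f"{GATEWAY_SEARCH_BASE}{query}?{urlencode(params)}"


def probe_gateway(url, timeout=10):
    """Return (status_code, parsed_json_or_None).

    HTTP error statuses (e.g. 401/403) are returned rather than raised.
    Network-level failures (urllib.error.URLError) still propagate.
    """
    # The User-Agent value is an assumption, not a documented requirement.
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            status = response.status
            body = response.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as error:
        return error.code, None
    try:
        return status, json.loads(body)
    except json.JSONDecodeError:
        # The server answered, but not with JSON (e.g. an HTML block page).
        return status, None
```

Calling `probe_gateway(build_gateway_search_url("men-sweaters"))` should show whether the endpoint returns usable JSON or rejects the request. I'm not asserting any particular status code here, since that depends on the site's current behavior.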

Screenshot (135)

Image Source


The official documentation and the robots.txt file suggest the same strictness.
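As a rough way to check what a robots.txt file permits, the standard library's `urllib.robotparser` can evaluate a URL against a set of rules. The sketch below parses rules supplied as a list of lines so it works offline. The helper name `is_path_allowed` and the `SAMPLE_RULES` list are hypothetical, and the sample rules are NOT copied from Myntra's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_path_allowed(robots_lines, url, user_agent="*"):
    """Return True if the given robots.txt lines allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only; not Myntra's real robots.txt.
SAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /gateway/",
]

print(is_path_allowed(SAMPLE_RULES, "https://www.example.com/gateway/v2/search/x"))
print(is_path_allowed(SAMPLE_RULES, "https://www.example.com/shirts"))
```

To check the live file instead, `RobotFileParser` also provides `set_url()` and `read()`, which fetch and parse the robots.txt at a given URL before `can_fetch()` is called.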

If anyone else finds (or knows of) a way, I would be happy to learn more.