yogeshwaran01 / instagramy

Python Package for Instagram Users, Posts, and Hashtag data.

Home Page: https://pypi.org/project/instagramy

Post Text

yeamusic21 opened this issue

commented

Would it be possible to include the post text when scraping user posts? I'm interested in grabbing the number of hashtags used over the last X (say 12) posts. For example, say over the last 12 posts, on average there are 5 hashtags used per post. If the Post object included the post text, I think I could get what I'm looking for. This isn't really an issue, more of a request or suggestion for your consideration.

Thank you for your suggestion. You can get the caption of a post like this:

>>> user = InstagramUser('github', sessionid=id)
>>> sample_post = user.posts[0]
>>> sample_post.caption

I will try to add the hashtags of the post in the next release.

Thank You

commented

I will try to add the hashtags of the post in the next release.

Amazing! You rock @yogeshwaran01 👍 💯 🥇

@yeamusic21 You can get the hashtags and text of the post with the following script:

>>> from instagramy import InstagramPost

>>> post = InstagramPost('CNQDEkxr8eM', sessionid=sessionid)
>>> data = post.post_data
>>> post_text = data['edge_media_to_caption']['edges'][0]['node']['text']
# If this raises a KeyError or IndexError, the post has no caption text
>>> post_text
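
For anyone who wants to avoid the exception on posts without a caption, here is a minimal sketch of a defensive lookup. It assumes `post_data` is a plain dict shaped like the one in the snippet above; the helper name `caption_text` is hypothetical and not part of instagramy.

```python
def caption_text(post_data: dict) -> str:
    """Return the caption text from a post_data dict, or "" if there is none.

    Assumes the nested layout used in this thread:
    post_data['edge_media_to_caption']['edges'][0]['node']['text']
    """
    try:
        return post_data["edge_media_to_caption"]["edges"][0]["node"]["text"]
    except (KeyError, IndexError):
        # Missing key or empty 'edges' list: treat as "no caption"
        return ""


# Hand-written sample dicts for illustration, not real Instagram responses
with_caption = {
    "edge_media_to_caption": {"edges": [{"node": {"text": "Hello #world"}}]}
}
without_caption = {"edge_media_to_caption": {"edges": []}}

print(caption_text(with_caption))
print(repr(caption_text(without_caption)))
```

With the real package, this would presumably be called as `caption_text(post.post_data)`.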
commented

@yogeshwaran01 Wow! Awesome! I will try this out and let you know how it goes. 👍 💯

commented

@yogeshwaran01 - Errors out with sessionid :-(

>>> from instagramy import InstagramUser, InstagramPost
>>> import re
>>> import os
>>> import json
>>> import pickle
>>> from time import sleep
>>> postID = 'CM3a6y5B4ju'
>>> post2 = InstagramPost(postID, sessionid=os.environ.get("INSTAGRAM_SESSIONID"))
Traceback (most recent call last):
  File "C:\Users\Redacted\Desktop\Personal\Redacted\Redacted\Redacted\lib\site-packages\instagramy\InstagramPost.py", line 52, in __init__
    self.post_data = data["entry_data"]["PostPage"][0]["graphql"][
KeyError: 'graphql'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Redacted\Desktop\Personal\Redacted\Redacted\Redacted\lib\site-packages\instagramy\InstagramPost.py", line 56, in __init__
    raise RedirectionError
instagramy.core.exceptions.RedirectionError: Instagram Redirects you to login page, Try After Sometime or Reboot your PC Provide the sessionid to Login
>>>

Works with the following revisions:

  • no session id
  • [0] added after ['edges']
>>> from instagramy import InstagramPost
>>> import re
>>> import os
>>> import json
>>> import pickle
>>> from time import sleep
>>> postID = 'CM3a6y5B4ju'
>>> post2 = InstagramPost(postID)
>>> pdata = post2.post_data
>>> len(re.findall(r"#\w+", pdata['edge_media_to_caption']['edges'][0]['node']['text']))
7
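
To get from a single count to the "average hashtags over the last 12 posts" figure asked about at the top of the thread, something like the following sketch could work. It operates on plain caption strings, so it does not depend on instagramy at all; the function names are hypothetical, and it assumes the captions are ordered newest first.

```python
import re

# Same pattern as in the session above: '#' followed by word characters
HASHTAG_RE = re.compile(r"#\w+")


def count_hashtags(caption: str) -> int:
    """Count hashtag-like tokens in one caption."""
    return len(HASHTAG_RE.findall(caption))


def average_hashtags(captions, last_n=12) -> float:
    """Average hashtag count over the first `last_n` captions.

    Assumes `captions` is ordered newest first; returns 0.0 for no captions.
    """
    recent = list(captions)[:last_n]
    if not recent:
        return 0.0
    return sum(count_hashtags(c) for c in recent) / len(recent)


# Made-up captions for illustration
captions = [
    "Launch day! #python #opensource",
    "No tags here",
    "#one #two #three #four",
]
print(average_hashtags(captions))
```

The captions themselves would come from whatever lookup works for you, for example the `['edge_media_to_caption']['edges'][0]['node']['text']` path shown above.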

@yeamusic21, yes, you're right. I forgot to add [0]; it is now updated.

instagramy.core.exceptions.RedirectionError is raised when the sessionid is incorrect or when Instagram has changed the sessionid.
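
Since the request in this thread succeeded without a sessionid, one possible workaround is to retry anonymously when the login attempt fails. This is only a sketch: `load_post` is a hypothetical helper, and the post class and exception type are passed in as parameters so the example runs without network access.

```python
def load_post(post_cls, post_id, sessionid=None, redirect_error=Exception):
    """Try to build a post with a sessionid, then fall back to no sessionid.

    post_cls       -- callable such as InstagramPost (passed in, not imported here)
    redirect_error -- exception type that signals a rejected sessionid
    """
    if sessionid:
        try:
            return post_cls(post_id, sessionid=sessionid)
        except redirect_error:
            # Session rejected or expired: fall through to the anonymous request
            pass
    return post_cls(post_id)


# With the real package this would presumably look like:
#   from instagramy import InstagramPost
#   from instagramy.core.exceptions import RedirectionError
#   post = load_post(InstagramPost, 'CM3a6y5B4ju',
#                    sessionid=os.environ.get("INSTAGRAM_SESSIONID"),
#                    redirect_error=RedirectionError)
```

Note that the anonymous request can still be redirected to the login page, in which case the exception propagates to the caller.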

commented

@yogeshwaran01 I'm using Python 3.7.1 and it seems like my code is different than your latest code.

My Instagramy code in instagramy/InstagramPost.py:

    def __init__(self, post_id: str, sessionid=None):
        self.post_id = post_id
        self.url = f"https://www.instagram.com/p/{post_id}/"
        self.sessionid = sessionid
        data = self.get_json()
        try:
            self.post_data = data["entry_data"]["PostPage"][0]["graphql"][
                "shortcode_media"
            ]
        except KeyError:
            raise RedirectionError

Latest Instagramy code in instagramy/InstagramPost.py:

    def __init__(self, post_id: str, sessionid=None, from_cache=False):
        self.post_id = post_id
        self.url = f"https://www.instagram.com/p/{post_id}/"
        self.sessionid = sessionid
        cache = Cache("post")
        if from_cache:
            if cache.is_exists(post_id):
                self.post_data = cache.read_cache(post_id)
            else:
                data = self.get_json()
                cache.make_cache(
                    post_id,
                    data["entry_data"]["PostPage"][0]["graphql"]["shortcode_media"],
                )
                self.post_data = data["entry_data"]["PostPage"][0]["graphql"]["shortcode_media"]
        else:
            data = self.get_json()
            cache.make_cache(
                post_id, data["entry_data"]["PostPage"][0]["graphql"]["shortcode_media"]
            )
            try:
                self.post_data = data["entry_data"]["PostPage"][0]["graphql"][
                    "shortcode_media"
                ]
            except KeyError:
                raise RedirectionError
        if sessionid:
            try:
                self.viewer = Viewer(data=data["config"]["viewer"])
            except UnboundLocalError:
                self.viewer = None
        else:
            self.viewer = None

@yeamusic21, yes, instagramy got a new release today, version 4.3, which adds a caching feature. Please check it out.
Update the package to the latest version:

pip install instagramy --upgrade
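
For readers curious what the `from_cache` branch in the code quoted earlier is doing, here is a simplified, standalone sketch of the same read-or-fetch idea. `DictCache` and `get_post_data` are hypothetical stand-ins written for this illustration; only the method names `is_exists`, `read_cache`, and `make_cache` are borrowed from the quoted snippet, and the real `Cache` class may behave differently.

```python
class DictCache:
    """In-memory stand-in for a cache, for illustration only."""

    def __init__(self):
        self._store = {}

    def is_exists(self, key):
        return key in self._store

    def read_cache(self, key):
        return self._store[key]

    def make_cache(self, key, data):
        self._store[key] = data


def get_post_data(post_id, fetch, cache, from_cache=False):
    """Return cached data when allowed and available, else fetch and cache it."""
    if from_cache and cache.is_exists(post_id):
        return cache.read_cache(post_id)
    data = fetch(post_id)  # stands in for the network request
    cache.make_cache(post_id, data)
    return data


fetch_calls = []


def fake_fetch(post_id):
    # Records each call so repeated lookups can be observed
    fetch_calls.append(post_id)
    return {"shortcode": post_id}


cache = DictCache()
first = get_post_data("CM3a6y5B4ju", fake_fetch, cache, from_cache=True)
second = get_post_data("CM3a6y5B4ju", fake_fetch, cache, from_cache=True)
print(len(fetch_calls))  # the second call is served from the cache
```

The practical upshot is that repeated lookups of the same post can skip the request to Instagram, which may also reduce how often you run into the login redirect.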