opsdisk / yagooglesearch

Yet another googlesearch - A Python library for executing intelligent, realistic-looking, and tunable Google searches.


How do I only get the main links

batman-do opened this issue.

@opsdisk How do I get only the main links, and not the additional links attached to each main link (the ones ending in #...)?


Hi @batman-do - can you clarify your question for me? Are the additional links the ones on a page, like "about", "contact", etc.?

Passing verbose_output=True might give you what you want. You may have to experiment to see what the data looks like here: https://github.com/opsdisk/yagooglesearch/blob/master/yagooglesearch/__init__.py#L514

There may be an additional links attribute you can extract.

@opsdisk What I mean is that when I fetch, for example, the top 10 results, I only want the main links, not the secondary links.

Main link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/
Additional link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/#ftoc-heading-8

and so on.

  • What parameters do I have to add to eliminate this problem, or do I still have to fetch a larger top-N and use regex, etc. to remove the extra links?

So you get both of these results back from yagooglesearch, and you only want the "main" one, not the "additional" link?

Main link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/
Additional link: https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/#ftoc-heading-8

I'd recommend filtering them out with regex after they are all collected. Using https://github.com/opsdisk/yagooglesearch#usage as an example, add some logic (e.g. a regex) in the for loop to remove the results with URL anchors that you don't want.
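A minimal sketch of that post-processing step, assuming you already have a plain list of URL strings (for example, what the search returns with verbose_output=False). Instead of a hand-written regex, it swaps in the standard library's `urllib.parse.urldefrag` to strip the `#...` anchor, then de-duplicates while keeping the original ranking order. The function name `keep_main_links` is hypothetical, not part of yagooglesearch.

```python
from urllib.parse import urldefrag


def keep_main_links(urls):
    """Hypothetical helper: strip #anchors and drop duplicate URLs, preserving order."""
    seen = set()
    main_links = []
    for url in urls:
        # urldefrag splits "https://a/b/#x" into ("https://a/b/", "x");
        # URLs without a fragment come back unchanged.
        base, _fragment = urldefrag(url)
        if base not in seen:
            seen.add(base)
            main_links.append(base)
    return main_links


results = [
    "https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/",
    "https://bepos.io/blogs/email-ban-hang-chuyen-nghiep/#ftoc-heading-8",
]
print(keep_main_links(results))
```

Stripping the anchor (rather than discarding the whole URL) means that if Google returns only an anchored link for a page, you still keep that page's main URL. Because duplicates are collapsed, you may need to request more results than you ultimately want to end up with a full top 10.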

@opsdisk Thank you for the reply, I understand now :))