client.get_recipes() only returns 10 results instead of all results

Question

joeld1 opened this issue 5 months ago

I have more than 10 recipes on MyFitnessPal, but calling client.get_recipes() returns only the first 10. Is there a way to paginate and get all recipes instead of just the first 10?

Thank you!

JD · Answer 1 · Sat Jun 01 2024 18:38:44 GMT+0800 (China Standard Time)

Pagination no longer works for the recipe_parser path (i.e. https://www.myfitnesspal.com/recipe_parser?page=1&sort_order=recent) because there isn't a page indicator.

I've edited get_recipes in client.py so that, when it can't tell from the page whether to paginate, it checks whether the next page contains data.

Here is the refactored method:

    def get_recipes_from_page(self, page_count, recipes_dict):
        """Add the recipes listed on one page to recipes_dict and return the parsed page."""
        RECIPES_PATH = f"recipe_parser?page={page_count}&sort_order=recent"
        recipes_url = parse.urljoin(self.BASE_URL_SECURE, RECIPES_PATH)
        document = self._get_document_for_url(recipes_url)
        recipes = document.xpath(
            "//*[@id='main']/ul[1]/li"
        )  # get all items in the recipe list
        for recipe_info in recipes:
            # The same anchor element carries both the recipe URL and its title
            recipe_link = recipe_info.xpath("./div[2]/h2/span[1]/a")[0]
            recipe_id = recipe_link.attrib["href"].split("/")[-1]
            recipes_dict[recipe_id] = recipe_link.attrib["title"]
        return document


    def get_recipes(self) -> Dict[int, str]:
        """Returns a dictionary with all saved recipes.

        Recipe ID will be used as dictionary key, recipe title as dictionary value.
        """
        recipes_dict = {}

        page_count = 1
        has_next_page = True
        while has_next_page:
            document = self.get_recipes_from_page(page_count, recipes_dict)

            # Check for Pagination
            pagination_links = document.xpath('//*[@id="main"]/ul[2]/a')
            if pagination_links:
                if page_count == 1:
                    # If pagination exists and this is page 1, there has to be a second page,
                    # but only one href, to the next page (and none to the previous)
                    page_count += 1
                elif len(pagination_links) > 1:
                    # If there are two links, one to the previous page and one to the next
                    page_count += 1
                else:
                    # Only one link means it is the last page
                    has_next_page = False
            else:
                # No pagination links found: probe the next page directly
                # and continue only if it actually contains recipes.
                tmp_dict = {}
                self.get_recipes_from_page(page_count + 1, tmp_dict)
                if tmp_dict:
                    # Increment page_count in order to get the next page
                    page_count += 1
                else:
                    # The next page is empty, so this was the last page
                    # (recipes_dict is empty here if there are no recipes at all)
                    has_next_page = False
        return recipes_dict
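
For readers who want to see the "probe the next page" idea on its own, here is a minimal sketch that runs without the library or a network connection. It is not the code from the pull request: collect_all_pages and make_fake_fetcher are hypothetical names, and the fake fetcher assumes 10 recipes per page, as the thread suggests. The loop stops as soon as a page adds no new recipe IDs, which also covers a site that repeats its last page for out-of-range page numbers.

```python
from typing import Callable, Dict


def collect_all_pages(fetch_page: Callable[[int], Dict[str, str]]) -> Dict[str, str]:
    """Fetch pages 1, 2, 3, ... until a page contributes nothing new."""
    collected: Dict[str, str] = {}
    page = 1
    while True:
        items = fetch_page(page)
        # Keep only IDs we have not seen yet
        new_items = {k: v for k, v in items.items() if k not in collected}
        if not new_items:
            # Empty page, or a repeat of an earlier page: we are done
            break
        collected.update(new_items)
        page += 1
    return collected


def make_fake_fetcher(total: int, page_size: int = 10) -> Callable[[int], Dict[str, str]]:
    """Hypothetical stand-in for the real page request (assumes 10 recipes per page)."""

    def fetch_page(page: int) -> Dict[str, str]:
        start = (page - 1) * page_size
        end = min(start + page_size, total)
        return {str(i): f"Recipe {i}" for i in range(start + 1, end + 1)}

    return fetch_page


recipes = collect_all_pages(make_fake_fetcher(28))
print(len(recipes))  # 28
```

Because each page is fetched only once, this variant avoids requesting the probed page a second time; the trade-off is one extra request past the last page.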

Hannah · Answer 2 · Wed Aug 21 2024 10:08:53 GMT+0800 (China Standard Time)

@joeld1 amazing! Would you mind putting in a PR with this change?

JD · Answer 3 · Thu Aug 22 2024 18:41:08 GMT+0800 (China Standard Time)

No problem!

JD · Answer 4 · Sat Sep 07 2024 06:36:07 GMT+0800 (China Standard Time)

Hello @hannahburkhardt, I just uploaded my pull request containing the refactored method:

#185

JD · Answer 5 · Mon Sep 09 2024 07:01:44 GMT+0800 (China Standard Time)

Hello @hannahburkhardt,

I found another edge case and updated the code to handle it.

The edge case is as follows:

pagination_links does exist and there are 28 recipes, but only 2 pagination links (i.e. page 1, page 2) are present
-- only 20 recipes are returned because only 2 pagination links (i.e. page 1, page 2) were discovered

My latest push should now handle this edge case, where not all pagination links are shown.
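
The arithmetic behind this edge case can be checked with a short sketch. The page size of 10 is an assumption inferred from the numbers in this thread, and pages_required and recipes_reached are hypothetical helper names, not part of the library.

```python
import math

PAGE_SIZE = 10  # assumption: 10 recipes per page, inferred from the thread


def pages_required(total_recipes: int, page_size: int = PAGE_SIZE) -> int:
    """Number of pages needed to list every recipe."""
    return math.ceil(total_recipes / page_size)


def recipes_reached(total_recipes: int, pages_visited: int, page_size: int = PAGE_SIZE) -> int:
    """Number of recipes seen after visiting the first pages_visited pages."""
    return min(total_recipes, pages_visited * page_size)


total = 28
visible_links = 2  # only "page 1" and "page 2" are shown

print(pages_required(total))                          # 3
print(recipes_reached(total, visible_links))          # 20
print(total - recipes_reached(total, visible_links))  # 8
```

Trusting the 2 visible links stops after page 2 and misses the 8 recipes on page 3, which is why the link count alone cannot decide when to stop.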