client.get_recipes() only returns 10 results instead of all results
joeld1 opened this issue · comments
I have more than 10 recipes on MyFitnessPal, but calling client.get_recipes() only returns the first 10. Is there a way to paginate and get all recipes instead of just the first 10?
Thank you!
Pagination no longer works for the recipe_parser path (e.g. https://www.myfitnesspal.com/recipe_parser?page=1&sort_order=recent) because the page no longer has a page indicator.
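For reference, here is a minimal sketch of how the URL for a given page can be built. The helper name recipes_url is hypothetical, and the base URL is an assumption taken from the link above; the library itself uses self.BASE_URL_SECURE.

```python
from urllib import parse

# Assumption: base URL inferred from the recipe_parser link quoted above.
BASE_URL_SECURE = "https://www.myfitnesspal.com/"


def recipes_url(page: int) -> str:
    """Build the recipe_parser URL for a 1-based page number (hypothetical helper)."""
    query = parse.urlencode({"page": page, "sort_order": "recent"})
    return parse.urljoin(BASE_URL_SECURE, f"recipe_parser?{query}")


print(recipes_url(2))
```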
I've edited get_recipes in client.py to check whether the next page contains data when deciding whether to paginate.
Here is the refactored method:
# Assumes client.py already imports `parse` (from urllib) and `Dict` (from typing).

def get_recipes_from_page(self, page_count, recipes_dict):
    """Fetch one page of recipes, add them to recipes_dict, and return the parsed page."""
    RECIPES_PATH = f"recipe_parser?page={page_count}&sort_order=recent"
    recipes_url = parse.urljoin(self.BASE_URL_SECURE, RECIPES_PATH)
    document = self._get_document_for_url(recipes_url)
    # Get all items in the recipe list
    recipes = document.xpath("//*[@id='main']/ul[1]/li")
    for recipe_info in recipes:
        # The link element carries both the recipe path and its title
        link = recipe_info.xpath("./div[2]/h2/span[1]/a")[0]
        recipe_path = link.attrib["href"]
        recipe_id = recipe_path.split("/")[-1]
        recipe_title = link.attrib["title"]
        recipes_dict[recipe_id] = recipe_title
    return document

def get_recipes(self) -> Dict[int, str]:
    """Returns a dictionary with all saved recipes.

    Recipe ID will be used as dictionary key, recipe title as dictionary value.
    """
    recipes_dict = {}
    page_count = 1
    has_next_page = True
    while has_next_page:
        document = self.get_recipes_from_page(page_count, recipes_dict)
        # Check for pagination
        pagination_links = document.xpath('//*[@id="main"]/ul[2]/a')
        if pagination_links:
            if page_count == 1:
                # If pagination exists and this is page 1, there has to be a second
                # page, but only one href to the next (none to the previous)
                page_count += 1
            elif len(pagination_links) > 1:
                # Two links: one to the previous page and one to the next
                page_count += 1
            else:
                # Only one link means it is the last page
                has_next_page = False
        else:
            # We can't tell whether pagination exists, so probe the next page
            # with a throwaway dict and see if it contains any recipes
            tmp_dict = {}
            document = self.get_recipes_from_page(page_count + 1, tmp_dict)
            if tmp_dict:
                # Increment page_count so the next iteration collects that page
                page_count += 1
            else:
                # No more recipes; if recipes_dict is empty here, there are none at all
                has_next_page = False
    return recipes_dict
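The fallback branch above boils down to "keep requesting pages until one comes back with nothing new". Below is a minimal, self-contained sketch of that idea, not the code from the pull request. The names collect_all_pages, fake_fetch_page, FAKE_RECIPES, and max_pages are all hypothetical, and the fake fetcher stands in for the real HTTP request and XPath parsing, assuming 10 recipes per page.

```python
from typing import Callable, Dict

PAGE_SIZE = 10  # assumption: the site lists 10 recipes per page


def collect_all_pages(
    fetch_page: Callable[[int], Dict[str, str]], max_pages: int = 1000
) -> Dict[str, str]:
    """Merge pages 1, 2, ... until a page is empty or adds no new IDs."""
    collected: Dict[str, str] = {}
    for page in range(1, max_pages + 1):
        items = fetch_page(page)
        # Stop on an empty page, or when the server repeats IDs we already have
        if not items or items.keys() <= collected.keys():
            break
        collected.update(items)
    return collected


# Stand-in data: 28 fake recipes, keyed by string ID
FAKE_RECIPES = {str(i): f"Recipe {i}" for i in range(1, 29)}


def fake_fetch_page(page: int) -> Dict[str, str]:
    """Return the slice of FAKE_RECIPES that would appear on the given page."""
    ids = sorted(FAKE_RECIPES, key=int)
    start = (page - 1) * PAGE_SIZE
    return {rid: FAKE_RECIPES[rid] for rid in ids[start:start + PAGE_SIZE]}


all_recipes = collect_all_pages(fake_fetch_page)
print(len(all_recipes))
```

The max_pages cap and the "no new IDs" check are defensive choices for the case where an out-of-range page returns the last page again instead of an empty list; whether the site actually behaves that way is not confirmed here.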
No problem!
Hello @hannahburkhardt, I just uploaded my pull request containing the refactored method.
Hello @hannahburkhardt,
I found another edge case, so I updated the code to handle it.
The edge case is as follows:
pagination_links does exist, 28 recipes exist, and only 2 pagination links (i.e. page 1, page 2) are present. Only 20 recipes are returned because we only discovered 2 pagination links (i.e. page 1, page 2).
My latest push should now handle this edge case, where not all pagination links are shown.
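To make the arithmetic of this edge case concrete, here is a small worked example. It assumes 10 recipes per page, which matches the numbers in this thread; the variable names are only illustrative.

```python
import math

PAGE_SIZE = 10          # assumption: 10 recipes per page
total_recipes = 28      # recipes saved in the account
visible_links = [1, 2]  # page numbers exposed by the pagination links

# Pages actually needed to list every recipe
pages_needed = math.ceil(total_recipes / PAGE_SIZE)

# Recipes reachable if we only trust the visible pagination links
reachable = min(total_recipes, max(visible_links) * PAGE_SIZE)

# Recipes that are silently dropped
missed = total_recipes - reachable

print(pages_needed, reachable, missed)
```

With these numbers a third page is needed, so relying on the visible links alone leaves the last 8 recipes out.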