colonelpanic8 / okcupyd

A library that enables programmatic interaction with okcupid.com, using okcupid.com's private JSON API and HTML scraping when necessary.

Populate some user info from the search output to avoid unnecessary profile requests

mconigliaro opened this issue

It looks like some user info is already available during a search (i.e. username, age, location, match_percentage, enemy_percentage) without making a request for the whole profile, but I didn't see a way to prevent the profile request. For example, in the code below...

profiles = user.search(...)
for p in profiles:

It looks like by the time p is assigned, the profile request has already been made. It would be nice if some of the profile data was populated from the search output and the rest was loaded on demand. That way, your script could keep a list of recently visited users and avoid requesting those profiles again so you don't end up looking like a stalker. Haha. Of course, #31 would help with this too.
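The "recently visited" idea could be sketched roughly as follows. This assumes that reading `username` from a search result does not itself trigger a profile request (which is what this issue asks for); `SearchResult` and `unvisited` are hypothetical names used only for illustration, not part of okcupyd.

```python
from collections import namedtuple

# Hypothetical stand-in for a search result; only `username` is needed here.
SearchResult = namedtuple('SearchResult', ['username'])


def unvisited(profiles, visited):
    """Yield profiles whose username is not yet in `visited`.

    Each yielded username is added to `visited`, so the same user is
    never yielded twice, even across separate searches.
    """
    for profile in profiles:
        if profile.username in visited:
            continue
        visited.add(profile.username)
        yield profile


visited = {'alice'}  # e.g. loaded from a file between runs
results = [SearchResult('alice'), SearchResult('bob'), SearchResult('bob')]
for p in unvisited(results, visited):
    print(p.username)  # only users not seen before reach this point
```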

There is actually some fanciness going on with the way attributes are retrieved on profile objects: the profile request is not made when the profile object is created, but when any attribute that requires the profile to be fetched is first accessed. This happens through the profile_tree/_profile_response cached_properties, which at least prevents multiple requests from being made.
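For readers unfamiliar with the pattern, here is a minimal sketch of lazy fetching through a cached property. It assumes Python 3.8+ for `functools.cached_property`; `LazyProfile`, `fake_fetch` and the dict it returns are hypothetical stand-ins, not okcupyd's actual classes or responses.

```python
from functools import cached_property


class LazyProfile:
    """Hypothetical stand-in for a profile object (not okcupyd's real class)."""

    def __init__(self, username, fetch_page):
        self.username = username
        self._fetch_page = fetch_page  # assumed callable: username -> dict

    @cached_property
    def _profile_response(self):
        # Runs on first access only; the result is then cached on the instance.
        return self._fetch_page(self.username)

    @property
    def age(self):
        # Accessing this attribute is what triggers the (single) request.
        return self._profile_response['age']

    @property
    def location(self):
        return self._profile_response['location']


calls = []  # records every simulated request


def fake_fetch(username):
    calls.append(username)
    return {'age': 30, 'location': 'Portland, OR'}


profile = LazyProfile('example_user', fake_fetch)
assert calls == []                # creating the object makes no request
profile.age
profile.location
assert calls == ['example_user']  # both attributes share one request
```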

If you look back far enough in the commit history (see ce473a2, for example), you will see that okcupyd actually used to populate some of the profile info in exactly the way you are suggesting. At some point I decided to remove this in favor of the simpler approach of getting all of the information from the profile page.
Your suggestion has obvious merits (fewer requests, not looking like a creep), but actually implementing what you want might be a bit more complicated than you imagine. You see, Profile objects can be created in many places, and it is important that
a) Profile objects continue to work when they are created outside of search
and
b) The only parameters that are ever required for profile creation are a logged in session and a username

What needs to be done is
a) Make it so that Profile accepts keyword arguments to eagerly populate its variables
b) Use a slightly more sophisticated Processor object in the SearchFetchable function that passes these keyword arguments to the Profile constructors

Actually, as I mentioned, search used to do exactly what you wanted, so you could reuse the following code:

class MatchCardExtractor(object):

    def __init__(self, div):
        self._div = div

    @property
    def username(self):
        return xpb.div.with_class('username').get_text_(self._div).strip()

    @property
    def age(self):
        return int(xpb.span.with_class('age').get_text_(self._div))

    @property
    def location(self):
        return helpers.replace_chars(
            xpb.span.with_class('location').get_text_(self._div)
        )

    _match_percentage_xpb = xpb.div.with_classes('percentage_wrapper', 'match').span.with_classes('percentage')

    @property
    def match_percentage(self):
        try:
            return int(self._match_percentage_xpb.get_text_(self._div).strip('%'))
        except ValueError:
            return 0

    _enemy_percentage_xpb = xpb.div.with_classes('percentage_wrapper', 'enemy').span.with_classes('percentage')

    @property
    def enemy_percentage(self):
        try:
            return int(self._enemy_percentage_xpb.get_text_(self._div).strip('%'))
        except ValueError:
            return 0

    @property
    def contacted(self):
        return bool(xpb.div.with_class('fancydate').apply_(self._div))

    @property
    def as_dict(self):
        return {
            'username': self.username,
            'age': self.age,
            'location': self.location,
            'match_percentage': self.match_percentage,
            'enemy_percentage': self.enemy_percentage,
            # NOTE: the `id` and `rating` properties are not shown in this excerpt.
            'id': self.id,
            'rating': self.rating,
            'contacted': self.contacted
        }

b) might just end up looking like

    session = session or Session.login()
    def build_profile_from_match_card_div(match_card_div):
        match_card_extractor = MatchCardExtractor(match_card_div)
        return Profile(session, match_card_extractor.username, **match_card_extractor.as_dict)
    return util.Fetchable.fetch_marshall(
        SearchHTMLFetcher(session, **kwargs),
        util.SimpleProcessor(
            session,
            build_profile_from_match_card_div,
            xpb.div.with_classes('match_card')
        )
    )

a) is also pretty easy... basically just take **kwargs and shove that shit into self.__dict__ in the constructor of Profile (obviously some validation and stuff would be nice).
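A rough sketch of what a) could look like, with a little validation. Everything here is an assumption for illustration: this `Profile` is not okcupyd's actual class, and `_eager_fields`, `FakeSession` and `fetch_profile` are hypothetical names. Fields passed as keyword arguments are stored directly on the instance; anything missing falls back to a single profile request on first access.

```python
class Profile:
    """Hypothetical sketch; not okcupyd's actual Profile class."""

    # Assumed whitelist of fields that a search result can supply.
    _eager_fields = frozenset(
        ['age', 'location', 'match_percentage', 'enemy_percentage', 'contacted']
    )

    def __init__(self, session, username, **kwargs):
        unknown = set(kwargs) - self._eager_fields
        if unknown:
            raise TypeError('unexpected fields: ' + ', '.join(sorted(unknown)))
        self._session = session
        self._response = None  # filled in by the first lazy lookup
        self.username = username
        self.__dict__.update(kwargs)  # eagerly populated values

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for fields that were
        # not supplied eagerly.
        if name not in self._eager_fields:
            raise AttributeError(name)
        if self._response is None:
            self._response = self._session.fetch_profile(self.username)
        return self._response[name]


class FakeSession:
    """Hypothetical session that counts simulated profile requests."""

    def __init__(self):
        self.requests = 0

    def fetch_profile(self, username):
        self.requests += 1
        return {'age': 30, 'location': 'Portland, OR', 'match_percentage': 90,
                'enemy_percentage': 5, 'contacted': False}


session = FakeSession()
from_search = Profile(session, 'example_user', age=30, match_percentage=90)
from_search.age        # served from the search data, no request
from_search.location   # not supplied, so the profile is fetched once
from_search.contacted  # reuses the same response
```

Requirement b) from earlier still holds in this sketch: `Profile(session, 'example_user')` with no keyword arguments works and simply loads everything on demand. Note that with this signature, unpacking a dict that also contains a `'username'` key would clash with the positional argument, so that key would need to be dropped first.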

Do you have any interest in trying to add this functionality? It actually seems pretty easy to do and I'd really appreciate a pull request.

Oh, you implemented it already? I'll give it a try when I have a free moment.

This is fixed in d84655e and v0.8.15.