karussell / snacktory

Readability clone in Java

Fetch content from Twitter URLs?

rubdottocom opened this issue

Hi!
I'm trying to fetch content from the URLs inside a tweet.

When I share a tweet from the official Twitter Android app, Twitter only passes my app a text like "read this tweet from @user at http://twitter.com/status/8341234812634".

So I fetch this URL, hoping to get the real tweet text along with the real URL that I want to fetch.
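
A minimal sketch of the first step described above: pulling the status URL out of the text that the Twitter app shares. The class name `SharedTextUrls`, the method `extractUrls`, and the loose URL pattern are assumptions made for illustration; none of them are part of snacktory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of snacktory.
final class SharedTextUrls {
    // Deliberately loose: an http(s) scheme followed by any non-whitespace characters.
    private static final Pattern URL = Pattern.compile("https?://\\S+");

    // Returns every http(s) URL found in the shared text, in order of appearance.
    static List<String> extractUrls(String sharedText) {
        List<String> urls = new ArrayList<>();
        Matcher m = URL.matcher(sharedText);
        while (m.find()) {
            // Strip trailing quotes and punctuation that belong to the sentence, not the URL.
            urls.add(m.group().replaceAll("[\"'.,;:!?)]+$", ""));
        }
        return urls;
    }
}
```

Calling `SharedTextUrls.extractUrls(...)` on the shared text gives the status URL that can then be passed to `fetchAsString`. The pattern is intentionally simple and may need tightening for real input.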

However, when I do that, Twitter returns a sort of warning saying that I must accept the use of cookies: "To bring you Twitter, we and our partners use cookies on our and other websites. Cookies help personalize Twitter content, tailor Twitter Ads, measure their performance and provide you with a better, faster, safer Twitter experience. By using our services, you agree to our Cookie Use. Close".

I tried setting a "User-Agent" header and a cookie configuration on the HttpURLConnection before fetching from Twitter, without success.

Do you know how I can achieve that?

This is my current code (somewhat dirty; I intend to send you a fix once it works).

public String fetchAsString(String urlAsString, int timeout, boolean includeSomeGooseOptions)
        throws MalformedURLException, IOException {
    HttpURLConnection hConn = createUrlConnection(urlAsString, timeout, includeSomeGooseOptions);
    hConn.setInstanceFollowRedirects(true);

    // Start "hack"
    hConn.setRequestProperty("User-Agent", "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9.2.2) Gecko/20100316 Firefox/3.6.2");
    Log.d("EXTRACT", hConn.getRequestProperty("User-Agent"));
    CookieManager cookieManager = new CookieManager();
    CookieHandler.setDefault(cookieManager);

    HttpCookie cookie = new HttpCookie("lang", "en");
    cookie.setDomain("twitter.com");
    cookie.setPath("/");
    cookie.setVersion(0);
    try {
        cookieManager.getCookieStore().add(new URI("http://twitter.com/"), cookie);
    } catch (URISyntaxException e) {
        e.printStackTrace();
    }
    // End "hack"

    String encoding = hConn.getContentEncoding();
    InputStream is;
    if (encoding != null && encoding.equalsIgnoreCase("gzip")) {
        is = new GZIPInputStream(hConn.getInputStream());
    } else if (encoding != null && encoding.equalsIgnoreCase("deflate")) {
        is = new InflaterInputStream(hConn.getInputStream(), new Inflater(true));
    } else {
        is = hConn.getInputStream();
    }

    String enc = Converter.extractEncoding(hConn.getContentType());
    String res = createConverter(urlAsString).streamToString(is, enc);
    if (logger.isDebugEnabled())
        logger.debug(res.length() + " FetchAsString:" + urlAsString);
    return res;
}
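
One thing worth checking in the code above: the `CookieManager` is installed after the connection object has already been created. Depending on the `HttpURLConnection` implementation, that may be too late for the handler to apply to this request. The sketch below installs the handler and seeds the cookie first, so the connection can be opened afterwards. The class `CookieSetup` and its methods are hypothetical names, and whether Twitter then serves the tweet instead of the cookie notice is untested.

```java
import java.net.CookieHandler;
import java.net.CookieManager;
import java.net.CookiePolicy;
import java.net.HttpCookie;
import java.net.URI;

// Hypothetical helper, not part of snacktory.
final class CookieSetup {
    // Installs a process-wide cookie handler that accepts all cookies.
    // Call this before any connection is opened.
    static CookieManager installCookieManager() {
        // A null store makes CookieManager use its default in-memory store.
        CookieManager manager = new CookieManager(null, CookiePolicy.ACCEPT_ALL);
        CookieHandler.setDefault(manager);
        return manager;
    }

    // Pre-populates the store with a version 0 (Netscape-style) cookie for the given site.
    static void seedCookie(CookieManager manager, URI site, String name, String value) {
        HttpCookie cookie = new HttpCookie(name, value);
        cookie.setDomain(site.getHost());
        cookie.setPath("/");
        cookie.setVersion(0);
        manager.getCookieStore().add(site, cookie);
    }
}
```

A possible call order would be `installCookieManager()`, then `seedCookie(manager, URI.create("http://twitter.com/"), "lang", "en")`, and only then `createUrlConnection(...)`. Note that `CookieHandler.setDefault` is global, so it replaces any handler that was installed earlier.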

Why not use the official Twitter API? I think they don't like scraping ;)

BTW: I would personally also be interested in scraping Twitter ;)
BTW2: normally snacktory already accepts all cookies. See HtmlFetcher:

static {
    SHelper.enableCookieMgmt();
    SHelper.enableUserAgentOverwrite();
    SHelper.enableAnySSL();
}

Well... I don't want the user to have to authenticate with Twitter in my app, so I'm receiving content from other apps through the Share option of the Android system.

I'll investigate further, thanks

I fear you'll need some JavaScript hacks. Alternatively, investigate how blind people, text browsers like Lynx, or browsers with JavaScript disabled can access Twitter. Also, RSS does not seem to work anymore: https://twitter.com/timetabling.rss