ecologylab / BigSemanticsService

Provides a RESTful service for BigSemantics. Supports thin desktop, mobile and cloud clients.

Service Fails on this url http://dl.acm.org/citation.cfm?id=297827

brownzach125 opened this issue

The service fails to return (or just takes an unseemly amount of time; I haven't waited around) on this URL:
http://dl.acm.org/citation.cfm?id=297827
The actual request:
http://ecology-service.cse.tamu.edu/BigSemanticsService/metadata.xml?url=http://dl.acm.org/citation.cfm?id=297827

I looked into this. The problem is that this paper has >2000 citations in ACM and it takes a long time to download the HTML and run extraction.

Now I think it is cached there, so the request actually works. In the future, we might need to actively crawl and cache pages like this, to make sure the performance is not too bad.
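The "actively crawl and cache" idea above could look roughly like the sketch below. It is only an illustration, not how BigSemanticsService actually caches: `MetadataCache`, `warm`, the TTL value, and the `extract` callback are all hypothetical names standing in for whatever downloads a page and runs extraction.

```python
import time
from typing import Callable, Dict, Iterable, Tuple


class MetadataCache:
    """Illustrative in-memory cache of extraction results, keyed by URL."""

    def __init__(self, extract: Callable[[str], str],
                 ttl_seconds: float = 86400.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._extract = extract      # hypothetical: downloads the page and extracts metadata
        self._ttl = ttl_seconds      # assumed freshness window (one day by default)
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, url: str) -> str:
        """Return cached metadata if still fresh, otherwise extract and store it."""
        entry = self._entries.get(url)
        if entry is not None and self._clock() - entry[0] < self._ttl:
            return entry[1]
        result = self._extract(url)  # the slow path: download plus extraction
        self._entries[url] = (self._clock(), result)
        return result

    def warm(self, urls: Iterable[str]) -> None:
        """Pre-populate the cache for pages known to be slow."""
        for url in urls:
            self.get(url)
```

A background job could call `warm` with a list of known-slow pages (such as the ACM citation page in this issue), so that the slow path has already run by the time a client asks for them.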

Is there any way we can examine the Content-Length header at download time
and use this info to make the service more fault tolerant?

andruid

(In reply to the comment by Yin Qu (屈垠) of Fri, Jan 16, 2015 at 2:42 PM.)


I don't think there is any error happening for this case; it just takes a long time.

The actual HTTP connection for this case uses chunked transfer encoding, so we don't know the total size beforehand.
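One way to cover both situations, a declared Content-Length and a chunked response with no declared size, is to check the header when it is present and also cap the number of bytes actually read. The sketch below is a hedged illustration in Python rather than the service's own code; `read_bounded`, `PageTooLarge`, and the `max_bytes` limit are hypothetical names chosen for this example.

```python
import io
from typing import BinaryIO, Mapping


class PageTooLarge(Exception):
    """Raised when a response is, or turns out to be, larger than allowed."""


def read_bounded(headers: Mapping[str, str], body: BinaryIO,
                 max_bytes: int, chunk_size: int = 8192) -> bytes:
    """Read a response body, refusing anything larger than max_bytes."""
    # Header names are case-insensitive, so compare them in lower case.
    declared = next((value for name, value in headers.items()
                     if name.lower() == "content-length"), None)
    # Fast rejection: only possible when the server declares a size.
    if declared is not None and declared.strip().isdigit():
        if int(declared) > max_bytes:
            raise PageTooLarge(f"declared size {declared} exceeds {max_bytes}")

    # Chunked responses declare no size, so also count bytes as they arrive.
    received = bytearray()
    while True:
        chunk = body.read(chunk_size)
        if not chunk:
            break
        received.extend(chunk)
        if len(received) > max_bytes:
            raise PageTooLarge(f"body exceeded {max_bytes} bytes while streaming")
    return bytes(received)


# Example with an in-memory body standing in for a network stream.
small_page = read_bounded({}, io.BytesIO(b"x" * 10), max_bytes=100)
```

A size cap only bounds the download; a page that is slow rather than large would still need a separate time limit on the request, and a caller that catches `PageTooLarge` could hand the URL to a background crawl instead of failing outright.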