seomoz / reppy

Modern robots.txt Parser for Python


Segmentation fault with malformed URLs

ivan-is opened this issue

Code sample for reproducing this error:

In [1]: from reppy.robots import Robots

In [2]: Robots.robots_url('http://:::cnn.com/47_50_95.html')

terminate called after throwing an instance of 'std::invalid_argument'
  what():  stoi
Aborted (core dumped)

Environment:

pip freeze | grep reppy
reppy==0.4.3

cat /etc/redhat-release 
Fedora release 24 (Twenty Four)

This is actually a bug in url-cpp, which is why it crashes the process instead of raising a Python exception.

I guess my question is what the interpretation of this URL should be.

I don't think we need to interpret it magically, but not causing a core dump would be good.

That's fair. That's also something that could change in our reppy cythonization -- to catch that exception and raise something sensible.
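Something like the following is what "sensible" could look like from the Python side. This is a sketch of the intended behavior, not the current API; it assumes the std::invalid_argument from url-cpp gets translated into a Python ValueError, which is Cython's default mapping for that exception type.

from reppy.robots import Robots

# Desired behavior once the C++ exception is translated (assumption):
# the malformed URL raises ValueError instead of aborting the process.
try:
    Robots.robots_url('http://:::cnn.com/47_50_95.html')
except ValueError as exc:
    print('rejected malformed URL:', exc)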

I think url-cpp raising a new std::invalid_argument is sane, but letting the stoi exception bubble up is lame on my part.

In this case, it's not actually bubbling up the exception; terminate is being called, which means an exception is being thrown during stack unwinding.

seomoz/url-cpp#37 addresses this issue, but the fix will need to be merged there and then pulled into rep-cpp and reppy before this issue can be resolved.
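Until the fix propagates through that chain, a caller-side guard is one way to avoid the crash; a plain try/except around the call does not help because the abort happens inside the C++ extension. Below is a minimal sketch assuming Python 3's urllib.parse; safe_robots_url is a hypothetical helper, not part of reppy.

from urllib.parse import urlsplit
from reppy.robots import Robots

def safe_robots_url(url):
    # Hypothetical guard: reject URLs whose authority cannot be parsed
    # before handing them to the C++ layer.
    parts = urlsplit(url)
    try:
        # For 'http://:::cnn.com/...' the parsed port is '::cnn.com',
        # so accessing .port raises ValueError.
        parts.port
    except ValueError:
        raise ValueError('malformed URL: %r' % url)
    if not parts.hostname:
        raise ValueError('URL has no hostname: %r' % url)
    return Robots.robots_url(url)

print(safe_robots_url('http://cnn.com/47_50_95.html'))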