seomoz / reppy

Modern robots.txt Parser for Python


Segmentation fault with malformed URLs

ivan-is opened this issue

Code sample for reproducing this error:

In [1]: from reppy.robots import Robots

In [2]: Robots.robots_url('http://:::cnn.com/47_50_95.html')

terminate called after throwing an instance of 'std::invalid_argument'
  what():  stoi
Aborted (core dumped)

Environment:

pip freeze | grep reppy
reppy==0.4.3

cat /etc/redhat-release 
Fedora release 24 (Twenty Four)

This is actually a bug in url-cpp, which is why it crashes the process instead of raising a Python exception.

I guess my question is what the interpretation of this URL should be.

I don't think we need to interpret it magically, but not causing a core dump would be good.

That's fair. That's also something that could change in our reppy cythonization -- to catch that exception and raise something sensible.
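Something like the following is what "sensible" could look like from the Python side. This is a sketch of the intended behavior, not the current API; it assumes the std::invalid_argument from url-cpp gets translated into a Python ValueError, which is Cython's default mapping for that exception type.

from reppy.robots import Robots

# Desired behavior once the C++ exception is translated (assumption):
# the malformed URL raises ValueError instead of aborting the process.
try:
    Robots.robots_url('http://:::cnn.com/47_50_95.html')
except ValueError as exc:
    print('rejected malformed URL:', exc)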

I think url-cpp raising a new std::invalid_argument is sane, but letting the stoi exception bubble up is lame on my part.

In this case, it's not actually bubbling up the exception; terminate is being called, which means an exception is being thrown during stack unwinding.

seomoz/url-cpp#37 addresses this issue, but the fix will need to be merged there and then pulled into rep-cpp and reppy before this issue can be resolved.
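Until the fix propagates through that chain, a caller-side guard is one way to avoid the crash; a plain try/except around the call does not help because the abort happens inside the C++ extension. Below is a minimal sketch assuming Python 3's urllib.parse; safe_robots_url is a hypothetical helper, not part of reppy.

from urllib.parse import urlsplit
from reppy.robots import Robots

def safe_robots_url(url):
    # Hypothetical guard: reject URLs whose authority cannot be parsed
    # before handing them to the C++ layer.
    parts = urlsplit(url)
    try:
        # For 'http://:::cnn.com/...' the parsed port is '::cnn.com',
        # so accessing .port raises ValueError.
        parts.port
    except ValueError:
        raise ValueError('malformed URL: %r' % url)
    if not parts.hostname:
        raise ValueError('URL has no hostname: %r' % url)
    return Robots.robots_url(url)

print(safe_robots_url('http://cnn.com/47_50_95.html'))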