kermitt2 / entity-fishing

A machine learning tool for fishing entities

Home Page: http://nerd.readthedocs.io/

Need help with checkout, corrupt data?

Slashdacoda opened this issue

Hello @Slashdacoda !

I've just tried and have seen no problem:

lopez@work:~/tmp$ git clone https://github.com/kermitt2/entity-fishing
Cloning into 'entity-fishing'...
remote: Enumerating objects: 511, done.
remote: Counting objects: 100% (511/511), done.
remote: Compressing objects: 100% (291/291), done.
remote: Total 14526 (delta 222), reused 375 (delta 148), pack-reused 14015
Receiving objects: 100% (14526/14526), 611.58 MiB | 1.98 MiB/s, done.
Resolving deltas: 100% (6480/6480), done.
Updating files: 100% (2798/2798), done.
lopez@work:~/tmp$ 

What's your OS and version of git?

You can always try with the zip, you might be luckier, e.g.:

wget https://github.com/kermitt2/entity-fishing/archive/refs/heads/master.zip

Hey @kermitt2

Windows 10 Pro
[screenshot]

A bit of troubleshooting:

After updating Git to 2.31.1 (https://gitforwindows.org), I still get the same error in Git Bash:
[screenshot]

In Windows Terminal:
[screenshot]

After installing Cygwin 2.905 (64-bit):
[screenshot]

I think this is a Windows/filesystem-related problem: https://brendanforster.com/notes/fixing-invalid-git-paths-on-windows/

It looks like a problem with some character in a path. In my case the message is:
[screenshot]

The fix probably involves a character in one of the paths, maybe:

https://github.com/kermitt2/entity-fishing/blob/master/data/corpus/corpus-long/wikipedia/RawText/Alfred_Conkling_Coxe%2C_Sr.

Why is there a %2C (an encoded comma) in a filename?
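For what it's worth, %2C is just the percent-encoded form of a comma as it appears in the GitHub URL; the file name itself contains a literal comma. A minimal check with Python's standard library, using the file name from the link above:

```python
from urllib.parse import unquote

# File name exactly as it appears at the end of the GitHub URL.
encoded = "Alfred_Conkling_Coxe%2C_Sr."

# unquote() turns percent-escapes such as %2C back into the raw character.
decoded = unquote(encoded)
print(decoded)
```

Note that the decoded name still ends with a dot, which may matter more on Windows than the comma does.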

Update: on another PC with Windows 10 it works, that's weird ^^

Nevertheless, I will try these steps to fix my environment: https://brendanforster.com/notes/fixing-invalid-git-paths-on-windows/

OK, after all, the problem seems to be the trailing dot at the end of the file names.

My environment can't find the two files that follow this naming scheme. I figured it out when recommitting the renamed file:

[screenshot]

A possible solution is to rename the files so that they no longer end with a dot. Following this proposal, I ask myself:

  1. Is it enough to rename the files, or do we have to change things in other parts of the project?
  2. Why does only my environment have problems with this naming scheme ("blalb.c." gets identified as a C file)?
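To see which files in a checkout follow this naming scheme, a small scan like the one below might help. It is only a sketch: the helper names `is_windows_unsafe` and `find_unsafe_names` are made up for illustration, and the rule it applies (a name ending in a dot or a space) is the usual Win32 naming restriction rather than anything specific to this repository. It would need to run on a system where the checkout succeeds, such as Linux or macOS.

```python
import os
from typing import List


def is_windows_unsafe(name: str) -> bool:
    """Return True if a file name ends with a dot or a space.

    Win32 path handling normally strips these trailing characters,
    so such names tend to break a regular checkout on Windows.
    """
    return name.endswith((".", " "))


def find_unsafe_names(root: str) -> List[str]:
    """Walk `root` and collect relative paths of files with unsafe names."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if is_windows_unsafe(name):
                full = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full, root))
    return sorted(hits)


if __name__ == "__main__":
    # Corpus directory mentioned in this thread, relative to the repo root.
    for path in find_unsafe_names("data/corpus/corpus-long/wikipedia/RawText"):
        print(path)
```

If the directory does not exist, the scan simply prints nothing.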

Hello !

The data you are pointing to come from an external evaluation corpus "Wikipedia" created by:

Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, and Hang Li. Learning to rank: from pairwise approach to listwise approach. In Zoubin Ghahramani, editor, Machine Learning, Proceedings of the Twenty-Fourth International Conference (ICML 2007), Corvallis, Oregon, USA, June 20-24, 2007, volume 227 of ACM International Conference Proceeding Series, pages 129–136. ACM. DOI <https://doi.org/10.1145/1273496.1273513>.

and they use the Wikipedia article name as file name - bad practice for file portability, but it's not our choice.

The file names are referenced in data/corpus/corpus-long/wikipedia/wikipedia.xml, in the @docName attribute, and that's it.

I guess there is no problem with renaming these files (this corpus is not very useful beyond comparison with old systems, and is not updated), just be sure to rename them also in the corresponding wikipedia.xml for consistency... PR welcome! :)
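A rough sketch of what such a rename could look like, in case it helps with a PR. Assumptions, none of which are checked against the actual corpus: each file is referenced as a `docName="..."` attribute whose value is exactly the file name, the attribute uses standard XML escaping, and the function name `rename_and_update` is hypothetical. It should be run on a system that can see the original files (for example Linux or macOS).

```python
import os
from typing import Dict
from xml.sax.saxutils import quoteattr


def rename_and_update(raw_dir: str, xml_path: str) -> Dict[str, str]:
    """Strip trailing dots from file names and patch docName to match."""
    renamed = {}
    for name in sorted(os.listdir(raw_dir)):
        new_name = name.rstrip(".")
        if not new_name or new_name == name:
            continue  # nothing to strip, or the name is only dots
        target = os.path.join(raw_dir, new_name)
        if os.path.exists(target):
            continue  # do not overwrite an existing file
        os.rename(os.path.join(raw_dir, name), target)
        renamed[name] = new_name

    with open(xml_path, encoding="utf-8") as f:
        xml = f.read()
    # Plain text replacement keeps the rest of the file byte-for-byte;
    # quoteattr() adds the quotes and escapes characters such as '&'.
    for old, new in renamed.items():
        xml = xml.replace("docName=" + quoteattr(old),
                          "docName=" + quoteattr(new))
    with open(xml_path, "w", encoding="utf-8") as f:
        f.write(xml)
    return renamed
```

The returned mapping of old to new names can be reviewed before committing, and any name that was skipped because of a collision has to be handled by hand.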

The checkout problem should be fixed, thx for the support @kermitt2