bundesAPI / handelsregister

Free queries from 01.08.2022

wirthual opened this issue

As per their website:

With the coming into effect of the Law on the Implementation of the Digitalization Guidelines (DiRUG) on 01.08.2022, access to all register content in the trade, cooperative, association and partnership register as well as to any electronically available documents through the Common Register Portal of federal states is provided free of charge starting from 01.08.2022. After that date, no registration and no log-in is required any more.

Do they offer an API description? 😅

I have querying working using MechanicalSoup now. It took a bit of prodding with their weird JavaScript form.

@alper sounds awesome. It would be cool if you could document it.

Got the stub in #7.

Next up: grab all the relevant documents belonging to a specific company and see if the people can be parsed out?

(Used mechanize after all because it's pretty solid and familiar once it works.)

Maybe going to use Selenium after all because this is the post payload for getting one of the documents:

ergebnissForm=ergebnissForm&javax.faces.ViewState=-8635335262319402326%3A6636106239244724446&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10&ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade=ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade

and it's triggered in JavaScript by this Jakarta Server Faces application.
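To make the payload above easier to read, here is a small sketch that decodes it with Python's standard library. The payload string is copied from the comment; the remarks in the code comments about what each field means are assumptions based on how Jakarta Server Faces forms usually look, not something confirmed for this portal.

```python
from urllib.parse import parse_qsl

# POST body copied from the comment above, split across lines for readability.
payload = (
    "ergebnissForm=ergebnissForm"
    "&javax.faces.ViewState=-8635335262319402326%3A6636106239244724446"
    "&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10"
    "&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10"
    "&ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade="
    "ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade"
)

# parse_qsl keeps the field order and any repeated names, and undoes the
# percent-encoding (%3A becomes ":").
fields = parse_qsl(payload)

for name, value in fields:
    print(f"{name} = {value}")

# Assumption: javax.faces.ViewState is a per-session token and "j_idt164" is
# an auto-generated component id, so both would have to be read from the
# current page rather than hard-coded when replaying such a request.
```

If those assumptions hold, replaying the request with a plain HTTP client would mean scraping both values from the result page first, which may be part of why driving a real browser looked simpler.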

I still have to try it out. It could be that this thing has a <noscript> fallback.

I ported it to Selenium and can download the PDF files now. I'll polish it and make sure you can get all the PDFs for a given company.

@alper Does it also work without active JavaScript?
Can you provide your source code?

I'll post it after one more iteration.

It seems nothing here works without JavaScript.

Is this still necessary? I got it to work in headless and download all the straightforward documents for an entity.

This can be cleaned up, the documents moved into a permanent location, and the whole thing run in batch, but Selenium/geckodriver is kinda unreliable.

It's in my fork here: https://github.com/alper/handelsregister/blob/main/sel.py

How do I run it in headless mode? The README in your fork only describes how to use the regular handelsregister.py, not sel.py.

I thought I would be able to download .pdf files with sel.py, but I can't find any information about how to download them.

I think it does but I haven't used it for a while and it's grossly untested. It definitely won't work to just get a bunch of PDFs without a lot of handling.
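For the headless and PDF download questions above, here is a minimal sketch of how a headless Firefox session that saves PDFs to disk could be configured with Selenium 4. It is not taken from sel.py: the helper names `firefox_download_prefs` and `make_headless_firefox` and the download directory are hypothetical, and it assumes geckodriver and Firefox are installed.

```python
from pathlib import Path


def firefox_download_prefs(download_dir):
    """Firefox preferences that save PDFs to download_dir without a dialog."""
    return {
        # 2 = use the custom directory given in browser.download.dir
        "browser.download.folderList": 2,
        "browser.download.dir": str(Path(download_dir).resolve()),
        # Do not ask what to do with PDF responses, just save them
        "browser.helperApps.neverAsk.saveToDisk": "application/pdf",
        # Disable the built-in PDF viewer so PDFs are downloaded, not displayed
        "pdfjs.disabled": True,
    }


def make_headless_firefox(download_dir):
    """Start a headless Firefox driver (hypothetical helper, not from sel.py)."""
    # Imported here so the preference helper can be used without Selenium.
    from selenium import webdriver
    from selenium.webdriver.firefox.options import Options

    options = Options()
    options.add_argument("-headless")  # run without opening a browser window
    for name, value in firefox_download_prefs(download_dir).items():
        options.set_preference(name, value)
    return webdriver.Firefox(options=options)
```

A session would then be created with something like `driver = make_headless_firefox("downloads")` and closed with `driver.quit()`. Navigating the search form and clicking the document links still has to be scripted on top of this.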

Hi @alper, I was trying to run sel.py in Colab. I get an error at the following line: https://github.com/alper/handelsregister/blob/e6cea7d92041e4a28c323ea390c9bdb5bbab7a1d/sel.py#L65
The error trace is below.
Do you know what could be wrong here?

Registerportal | Advanced search
<selenium.webdriver.remote.webelement.WebElement (session="60091ef69448ee4dc5d60e9b753fa24e", element="3b7448c5-f072-4e11-af93-25c85be00d4d")>
<selenium.webdriver.remote.webelement.WebElement (session="60091ef69448ee4dc5d60e9b753fa24e", element="105dcd63-80e1-4bf5-9a25-1cedbd5785e7")>
---------------------------------------------------------------------------
ElementClickInterceptedException          Traceback (most recent call last)
<ipython-input-43-c212176fe302> in <cell line: 21>()
     19 search_button = driver.find_element(By.XPATH, "//button[@id='form:btnSuche']")
     20 print(search_button)
---> 21 print(search_button.click())
     22 #document_list = ['AD','CD','HD',# 'DK',# 'UT'# 'VÖ','SI']

3 frames
/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
    243                 alert_text = value["alert"].get("text")
    244             raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 245         raise exception_class(message, screen, stacktrace)

ElementClickInterceptedException: Message: element click intercepted: Element <button id="form:btnSuche" name="form:btnSuche" class="ui-button ui-widget ui-state-default ui-corner-all ui-button-text-only searchButton" onclick="PrimeFaces.bcn(this,event,[function(event){PF('btnSuche').disable()},function(event){PrimeFaces.ab({s:&quot;form:btnSuche&quot;,f:&quot;form&quot;,u:&quot;form&quot;});return false;}]);" type="submit" role="button" aria-disabled="false">...</button> is not clickable at point (767, 1065). Other element would receive the click: <a href="#page-wrapper">...</a>
  (Session info: headless chrome=90.0.4430.212)
Stacktrace:
#0 0x56b032b607f9 <unknown>
#1 0x56b032b003b3 <unknown>
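The trace says the search button sits at point (767, 1065) and that an `<a href="#page-wrapper">` element would receive the click instead, which suggests the button is off-screen or covered in the headless window. One common workaround, sketched here as an assumption and not verified against the portal, is to scroll the element into view and trigger the click through JavaScript. The helper name `js_click` is hypothetical; `execute_script` is the standard Selenium WebDriver method.

```python
def js_click(driver, element):
    """Scroll element into view, then click it via JavaScript.

    A JavaScript click is dispatched directly on the element, so an
    overlapping element cannot intercept it the way it can intercept a
    native Selenium click.
    """
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    driver.execute_script("arguments[0].click();", element)
```

With the code from the trace this would be called as `js_click(driver, search_button)` in place of `search_button.click()`. Enlarging the headless window, for example with `driver.set_window_size(1920, 1200)` before loading the page, may also be enough on its own.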