The Unternehmensvertrag / Umwandlungsvertrag / sonstiger Vertrag vom 21.02.2023 of DB Fernverkehr Aktiengesellschaft (HRB 83173), which moved the sale of long-distance train tickets from DB Vertrieb to DB Fernverkehr, has an interesting list attached: lots of domains DB registered for selling tickets, customer information, marketing campaigns and other weird stuff.
Because open data in the transport sector is always a good idea, and DB actually provides some on their Open Data Portal, let's republish the list of domains in a machine-readable format: CSV.
To extract domains.csv yourself, use the Makefile contained in this repo. For it to work, you need to install the dependencies and download HE-Frankfurt_am_Main_HRB_83173+Unternehmensvertrag_-_Umwandlungsvertrag_-20230411101101.pdf from the Handelsregister first:
Use the Normale Suche to search for db fernverkehr ag and select alle Schlagwörter enthalten. In the results, look for Hessen Amtsgericht Frankfurt am Main HRB 83173, DB Fernverkehr Aktiengesellschaft and choose DK (document view). Open the tree like this:
Dokumente zum Rechtsträger
└─ Dokumente zur Registernummer
   └─ Weitere Urkunden / Unterlagen
      └─ Unternehmensvertrag / Umwandlungsvertrag / sonstiger Vertrag
         ├─ Unternehmensvertrag / Umwandlungsvertrag / sonstiger Vertrag vom 23.03.2023
         ├─ Unternehmensvertrag / Umwandlungsvertrag / sonstiger Vertrag vom 21.02.2023
         ├─ ...
Choose Unternehmensvertrag / Umwandlungsvertrag / sonstiger Vertrag vom 21.02.2023, select PDF and download. Remove the -202... (timestamp) from the filename.
Needed dependencies:
- pdftk
- https://github.com/ocrmypdf/OCRmyPDF
- http://tabula.technology (will be downloaded by the Makefile)
- https://github.com/BurntSushi/xsv
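Before running make, it can help to confirm that the command-line tools are on your PATH. The following is a minimal Python sketch, not part of this repo. It assumes the executables are called pdftk, ocrmypdf and xsv (guessed from the project names, so adjust them if your system differs) and leaves out tabula because the Makefile downloads it. missing_tools is a hypothetical helper name.

```python
import shutil

# Executable names are assumed from the project names; adjust them if your
# distribution installs the tools under different names.
ASSUMED_TOOLS = ["pdftk", "ocrmypdf", "xsv"]


def missing_tools(tools=ASSUMED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All assumed dependencies found.")
```

An empty result only means the executables were found; it says nothing about whether their versions work with the Makefile.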
Run make domains.csv. The resulting file has already gone through some automatic cleanup steps, but still needs some manual cleanup.
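Part of the remaining cleanup could be scripted. The following is a minimal Python sketch, not the cleanup the Makefile performs. It assumes domains.csv holds one domain per row in its first column, which is not confirmed here, so check the file first. clean_domains, clean_csv and the regular expression are illustrative names and choices.

```python
import csv
import re

# Loose shape check for a hostname: dot-separated labels of letters, digits
# and hyphens, ending in an alphabetic TLD. This is not a full validator.
DOMAIN_RE = re.compile(r"^(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$")


def clean_domains(values):
    """Normalize, filter and de-duplicate domain strings, keeping order."""
    seen = set()
    cleaned = []
    for value in values:
        # Assumption: whitespace inside a cell is an OCR artifact, so it is
        # removed entirely; then lowercase and drop any trailing dot.
        domain = re.sub(r"\s+", "", value).lower().rstrip(".")
        if DOMAIN_RE.match(domain) and domain not in seen:
            seen.add(domain)
            cleaned.append(domain)
    return cleaned


def clean_csv(path):
    """Read the first column of the CSV at `path` and return cleaned domains."""
    with open(path, newline="", encoding="utf-8") as handle:
        return clean_domains(row[0] for row in csv.reader(handle) if row)
```

Entries that fail the shape check are dropped silently. That includes internationalized names with non-ASCII characters and anything the OCR mangled badly, so compare the output against the input rather than trusting it blindly.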
Thanks to
- @gglnx for finding the PDF and tweeting about the bahn.fail domain contained therein
- ocrmypdf for helping redo the broken OCR in the PDF
- tabula for making it easy to extract the data
- xsv for helping wrangle csv files
- handelsregister.de for providing a public download - even if the interface is literally UX hell and doesn't support permanent document identifiers or direct links.
Because it is simply a list of factual data, CC-0, I guess.