prakharrathi25 / mindfire-quest

Our work for the Mindfire Quest with Swiss Re

mindfire-quest

Mindfire Quest

Unveil the obscure network of company and location data, using smart algorithms and data wrangling

Mission Statement

By transforming Swiss Re into a truly tech- and data-led risk knowledge company, we aim to make incremental progress in understanding and qualifying risks. Reflecting real-world evidence in a systematic, digital way (for example, knowing where companies have their offices, facilities, factories and warehouses) is essential for performing risk assessment and resilience services in a data-driven way. While meta information on companies does exist, an extended view of their locations has not yet been achieved at scale. This quest aims to bridge this major information gap by applying advanced artificial intelligence techniques.

The quest's mission is to build relationships between company and location data. In particular, it will focus on:

  • Defining the many ways in which a company and a location (for example a factory, warehouse or sales point) can be interlinked
  • Identifying free or paid sources of information whose data would support applying the ontology to actual companies and buildings
  • Populating a first applied ontology with information retrieved from commercially free data
  • Where possible, retrieving unique identifiers that allow Swiss Re to map the results to its internal data at a later stage
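One way to make the first focus area concrete is to store the ontology as typed (company, relationship, location) triples. The sketch below is only an illustration under assumptions, not the project's actual schema: the relationship types, the class names and the optional `duns` and `plus_code` fields are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class LinkType(Enum):
    """Hypothetical ways in which a company and a location can be interlinked."""
    HEADQUARTERS = "headquarters"
    OFFICE = "office"
    FACTORY = "factory"
    WAREHOUSE = "warehouse"
    SALES_POINT = "sales_point"
    REGISTERED_ADDRESS = "registered_address"  # e.g. an offshore or virtual address


@dataclass(frozen=True)
class Company:
    name: str
    duns: Optional[str] = None  # unique identifier, when one can be retrieved


@dataclass(frozen=True)
class Location:
    lat: float
    lng: float
    address: Optional[str] = None
    plus_code: Optional[str] = None


@dataclass
class Ontology:
    # Each entry is one (company, relationship, location) triple
    links: list[tuple[Company, LinkType, Location]] = field(default_factory=list)

    def add(self, company: Company, link: LinkType, location: Location) -> None:
        self.links.append((company, link, location))

    def locations_of(self, company: Company, link: Optional[LinkType] = None) -> list[Location]:
        """All locations of a company, optionally filtered by relationship type."""
        return [loc for c, l, loc in self.links
                if c == company and (link is None or l == link)]
```

Frozen dataclasses compare and hash by value, so the same company scraped from two sources is treated as one entity as long as its fields agree.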

Methods

The methods we applied are described below:

  1. NLP Ontology: Dun & Bradstreet (DNB) and its DUNS numbers; the Panama, Paradise and Pandora Papers for offshore and virtual addresses; or the Global Cement Directory report.

  2. Top View: semantic image search using, e.g., CLIP (from OpenAI) applied to satellite and terrain view repositories of North America, Africa, etc.
     a) Library used: CLIP Notebook
     b) Library used: Google MUM Multi-Modal T5
     c) Library used: CV models on TensorFlow or SageMaker...

  3. Sat View: text extraction of company names from signs on buildings or building entrances, potentially using inclined satellite views (cement factory company_detection from image).

  4. Google Plus Codes, SearchOnL: coordinate numbering systems for buildings.
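Plus Codes (Open Location Code) are computed directly from coordinates, so they can serve as a join key between detected buildings and company records. The following is a minimal sketch of the standard 10-digit encoding, written from the public algorithm; `encode_plus_code` is a hypothetical helper, and Google's official Open Location Code library would be preferable in practice (it also handles longer and shortened codes).

```python
import math

# Open Location Code alphabet: 20 symbols, chosen to avoid vowels and look-alike characters
OLC_ALPHABET = "23456789CFGHJMPQRVWX"
PAIR_PRECISION = 8000             # cells per degree at 10 digits (1/8000 deg, roughly 14 m)
LAT_CELLS = 180 * PAIR_PRECISION  # number of latitude cells
LNG_CELLS = 360 * PAIR_PRECISION  # number of longitude cells


def encode_plus_code(lat: float, lng: float) -> str:
    """Encode a WGS84 coordinate as a 10-digit global Plus Code."""
    # Clamp latitude and wrap longitude into [-180, 180)
    lat = min(max(lat, -90.0), 90.0)
    lng = ((lng + 180.0) % 360.0) - 180.0
    # Shift to non-negative ranges and convert to integer cell indices;
    # rounding guards against floating-point error, clamping keeps lat=90 valid
    lat_int = min(math.floor(round((lat + 90.0) * PAIR_PRECISION, 6)), LAT_CELLS - 1)
    lng_int = min(math.floor(round((lng + 180.0) * PAIR_PRECISION, 6)), LNG_CELLS - 1)

    # Write each index as 5 base-20 digits, most significant first
    lat_digits, lng_digits = [], []
    for _ in range(5):
        lat_int, lat_d = divmod(lat_int, 20)
        lng_int, lng_d = divmod(lng_int, 20)
        lat_digits.append(lat_d)
        lng_digits.append(lng_d)
    lat_digits.reverse()
    lng_digits.reverse()

    # Interleave latitude and longitude digits pairwise, then insert '+' after 8 characters
    code = "".join(OLC_ALPHABET[a] + OLC_ALPHABET[b] for a, b in zip(lat_digits, lng_digits))
    return code[:8] + "+" + code[8:]
```

For example, `encode_plus_code(47.365590, 8.524997)` returns `"8FVC9G8F+6X"`, a cell of roughly 14 m by 14 m in Zurich.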

In sprint 2, we developed a hybrid of all the above methods.

Here is the workflow we envision for the solution:

🎯 Ways to Implement

White-Label Path

This is the path we consider feasible at the moment. It involves the following steps:

  1. Building detection: YOLOR or a similar object-detection algorithm to identify buildings and sites in satellite and terrain images for higher accuracy, or transfer learning by feeding the detected bounding boxes to CLIP. This has already been done, with results uploaded for over 1 billion buildings.

  2. Top view: use the CLIP notebook to search the satellite images that contain buildings (cross-check against, and subtract, the sites already listed in the NLP Ontology).

  3. Ground view: text and/or logo detection.

  4. Geolocation: a Google Earth Engine (GEE) script to convert coordinates to street addresses or Plus Codes.

  5. Optional addition: an address normalization fuzzy logic tool.
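The core of the CLIP-based search in step 2 is ranking images by how close their embeddings are to the embedding of a text query such as "a cement factory". A minimal sketch of that ranking step, assuming the embeddings have already been computed (for example with OpenAI's CLIP text and image encoders); `rank_images_by_text` and its parameters are illustrative and not taken from the project's notebook.

```python
import numpy as np


def rank_images_by_text(text_embedding: np.ndarray,
                        image_embeddings: np.ndarray,
                        top_k: int = 5) -> list[tuple[int, float]]:
    """Return (image_index, cosine_similarity) pairs, best match first.

    text_embedding:   shape (d,), e.g. the text encoding of "a cement factory"
    image_embeddings: shape (n, d), e.g. the image encodings of n satellite tiles
    """
    # L2-normalize so that a dot product equals cosine similarity
    text = text_embedding / np.linalg.norm(text_embedding)
    images = image_embeddings / np.linalg.norm(image_embeddings, axis=1, keepdims=True)
    scores = images @ text                  # one similarity score per image
    order = np.argsort(-scores)[:top_k]     # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]
```

The top-ranked tiles are candidates only; the cross-check against sites already in the NLP Ontology would then remove known locations before manual or automated verification.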

How to Implement
  • Step 1: "T2 notebook" for implementing any object detection algorithm from TensorFlow Hub. The "Open buildings region notebook" serves as an example of building detection and coordinate extraction for two-thirds of the African continent; GitHub repositories cover the USA and Canada.

  • Step 2: "CLIP Cement Factory Beyond Tags - Semantic Search on images with OpenAI notebook": a notebook for identifying potential company candidates.

  • Step 3: "Company_detection_from_image notebook": an AWS cement factory detection notebook for uniquely identifying the language and company from text in satellite or street-view images of company buildings, entrances or outdoor objects, such as branded trucks or containers on ships or in harbors.

  • Step 4: GEE link (Open Buildings).

  • Step 5: address normalization fuzzy logic tool.
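The address normalization step is not specified further here. One plausible shape for such a fuzzy logic tool is to normalize both addresses (case, punctuation, abbreviations) and then compare them with a string-similarity score. The sketch below uses Python's standard `difflib.SequenceMatcher`; the abbreviation table, the helper names and the 0.9 threshold are assumptions that would need tuning on real data.

```python
import re
from difflib import SequenceMatcher

# Hypothetical abbreviation table; a real tool would need a larger, per-country list
# (note that "st" can also mean "saint", which a simple lookup cannot disambiguate)
ABBREVIATIONS = {
    "st": "street", "rd": "road", "ave": "avenue", "blvd": "boulevard",
    "ltd": "limited", "inc": "incorporated",
}


def normalize_address(address: str) -> str:
    """Lower-case, split into word tokens (dropping punctuation), expand abbreviations."""
    tokens = re.findall(r"\w+", address.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def address_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two normalized addresses."""
    return SequenceMatcher(None, normalize_address(a), normalize_address(b)).ratio()


def same_address(a: str, b: str, threshold: float = 0.9) -> bool:
    """Treat two addresses as the same if their similarity reaches the threshold."""
    return address_similarity(a, b) >= threshold
```

For example, `normalize_address("12 Main St.")` gives `"12 main street"`, so `"12 Main St."` and `"12, main street"` are considered the same address.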

Happy Path

If the required software and resources become available, this is the path we think would make our task easier.

Step 1: Google MUM (to be released soon)

Step 2: "CLIP Cement Factory Beyond Tags - Semantic Search on images with OpenAI notebook": a notebook for identifying potential company candidates

Step 3: "Company_detection_from_image notebook": an AWS cement factory detection notebook for uniquely identifying the language and company from text in satellite or street-view images of company buildings, entrances or outdoor objects, such as branded trucks or containers on ships or in harbors.

Step 4: GEE link (Open Buildings)

Step 5: address normalization fuzzy logic tool
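Once text has been read from a sign, truck or container (Step 3), it still has to be linked to a company. A minimal sketch, assuming the OCR output has the shape returned by Amazon Rekognition's `DetectText` API (a `TextDetections` list whose items carry `DetectedText`, `Type` and `Confidence`), which the AWS notebook may or may not use, and assuming a list of candidate company names is available, for example from the NLP Ontology. `match_companies` and its thresholds are hypothetical.

```python
from difflib import SequenceMatcher


def match_companies(detect_text_response: dict,
                    known_companies: list[str],
                    min_confidence: float = 80.0,
                    min_similarity: float = 0.8) -> list[tuple[str, str, float]]:
    """Match OCR'd text lines against known company names.

    Returns (detected_text, company_name, similarity) tuples, best match first.
    """
    matches = []
    for det in detect_text_response.get("TextDetections", []):
        # Use whole lines only, and skip detections with low OCR confidence (0-100 scale)
        if det.get("Type") != "LINE" or det.get("Confidence", 0.0) < min_confidence:
            continue
        text = det["DetectedText"].lower()
        for company in known_companies:
            # Fuzzy comparison tolerates small OCR errors such as dropped letters
            score = SequenceMatcher(None, text, company.lower()).ratio()
            if score >= min_similarity:
                matches.append((det["DetectedText"], company, score))
    return sorted(matches, key=lambda m: m[2], reverse=True)
```

Word-level detections are skipped because they duplicate the line-level text; a multi-word brand name is usually only complete at the line level.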

Data

The data was collected from the DnB website using a web scraper and can be found in data/dnb-single-page.csv. It is a snapshot of a single page of the DnB database. Overall, the database has over 3000 entries.
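For a first look at the scraped snapshot, it can help to load the file and drop exact duplicate rows, which scrapers commonly produce. A minimal standard-library sketch: the path comes from this README, but the file's column names are not documented here, so the function treats every column generically, and `load_snapshot` is a hypothetical helper.

```python
import csv
from pathlib import Path
from typing import Union


def load_snapshot(path: Union[str, Path]) -> list[dict[str, str]]:
    """Load a scraped CSV snapshot, trimming whitespace and dropping exact duplicate rows."""
    seen = set()
    rows = []
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Skip overflow fields (key None) and turn missing values into empty strings
            cleaned = {k.strip(): (v or "").strip() for k, v in row.items() if k is not None}
            key = tuple(sorted(cleaned.items()))
            if key not in seen:  # keep only the first copy of each identical row
                seen.add(key)
                rows.append(cleaned)
    return rows
```

Usage, assuming the repository layout described above: `rows = load_snapshot("data/dnb-single-page.csv")`.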

We collected data on a variety of companies.

⚒️ Contributors

The team members undertaking this project are:


  • Susanne Kühne
  • Marco Fernandez
  • Stephanie Boyle
  • Ojasvi Gupta
  • Prakhar Rathi


Languages

Jupyter Notebook: 100.0%