osint reversing socmint unofficial-apis undocumented-api

⏮️ Awesome unofficial API reversing

Resources for reverse engineering web and native apps (especially mobile) for the specific purpose of discovering and using “unofficial APIs” (aka "undocumented APIs") for OSINT / SOCMINT and other purposes.

Before considering unofficial APIs, it is often worth checking whether the target platform publishes an official public and/or free API that will get you what you need. There are some great repos on GitHub listing public and free APIs.

Additionally, it may not be necessary to reverse an undocumented API yourself if someone has already published a comprehensive unofficial client. Many of these can also be found on GitHub.

📣 PSA for UK counterterrorism organisations

Thank you for your service.

A public API that may prove especially helpful for supporting your work targeting XRW individuals and groups is Open Measures, formerly known as the Social Media Analysis Toolkit (SMAT).

On "rolling your own" client to ingest SOCMINT directly from unofficial APIs: extremists and terrorists (in general, not just the XRW) typically don't frequent the biggest mainstream platforms (with one exception), so nearly all of the places they hang out online are trivial to grab data from reliably using the information below.

🧠 Pre-requisite knowledge

Provided you know how to use a computer and access the internet, you don't really need to know much else to begin exploring the world of unofficial APIs. It's also not necessary to be able to read code, program, or use specialist tools when you first start out. Knowledge and skills will be picked up gradually and easily over time if you find it interesting.

Here's an entirely optional (and opinionated) list of pages on Wikipedia that give a pretty good crash course in concepts and jargon:

Afterwards, it's worth reading TCP/IP Chapter 9, Section 4 TCP/IP Key Applications and Application Protocols, more specifically the content in this section relating to HTTP.

Although reversing unofficial APIs is demonstrably not API hacking, documentation from Portswigger on API testing and GraphQL, as well as OWASP and the Portswigger mapping to OWASP, are informative.

👀 Discovery

Reconnaissance and enumeration phase.

General

Note that many of the tools in this discovery section imply a black-box scenario. If the target platform is fully or even partly open source, it may be possible to enumerate unofficial API endpoints simply by reading the source code. There may also be official documentation publicly available that will help with this and serve as an aide-mémoire when you come to the "fuzzing" and "collection" phases of the process.

🛠️ Tools

Wireshark
Portswigger Burp Suite
OWASP ZAP
weAudit VSCode extension for taking notes while reading an open-source codebase

📖 Reading

📼 Watch

soxoj's Hardcore OSINT : Reversing social media mechanisms

Web apps (inc. PWAs)

🛠️ Tools

Any of the General tools for discovery, plus:

Browser devtools
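
Browser devtools can export the Network tab as a HAR file, which is plain JSON. The following is a minimal sketch, not a definitive tool, of summarising such an export to see which endpoints returned JSON. The function name `summarise_har` and the `SAMPLE_HAR` data are hypothetical and only for illustration; the field names (`log.entries[].request`, `response.content.mimeType`) follow the HAR format.

```python
from collections import Counter
from urllib.parse import urlsplit


def summarise_har(har: dict) -> Counter:
    """Count (method, host, path) triples for responses that look like JSON."""
    seen = Counter()
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        content = entry.get("response", {}).get("content", {})
        if "json" not in content.get("mimeType", "").lower():
            continue  # skip HTML, images, scripts and other non-JSON traffic
        parts = urlsplit(request.get("url", ""))
        # Drop the query string so paginated calls collapse into one endpoint.
        seen[(request.get("method", "GET"), parts.netloc, parts.path)] += 1
    return seen


# Hypothetical, hand-written fragment standing in for a real HAR export.
SAMPLE_HAR = {
    "log": {
        "entries": [
            {"request": {"method": "GET", "url": "https://api.example.com/v1/posts?cursor=abc"},
             "response": {"content": {"mimeType": "application/json; charset=utf-8"}}},
            {"request": {"method": "GET", "url": "https://api.example.com/v1/posts?cursor=def"},
             "response": {"content": {"mimeType": "application/json"}}},
            {"request": {"method": "GET", "url": "https://example.com/static/app.js"},
             "response": {"content": {"mimeType": "application/javascript"}}},
        ]
    }
}

if __name__ == "__main__":
    for (method, host, path), count in summarise_har(SAMPLE_HAR).most_common():
        print(f"{count:>3}  {method:<6} {host}{path}")
```

For a real export, load the file with `json.load` and pass the resulting dictionary to `summarise_har`.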

📖 Reading

Native apps (inc. mobile apps)

🛠️ Tools

Any of the General tools for discovery, plus:

📖 Reading

📚 Documentation

If you are mapping several endpoints, and especially if you are attempting a comprehensive unofficial client build, it is worth documenting in detail what you find during the discovery phase. You may wish to consider keeping track of requests and responses in full (including payloads, headers, etc.). There are also tools used by developers of official APIs to document their implementation that work just as well for reverse engineers of unofficial APIs.
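
One lightweight way to keep track of requests and responses in full is to append one JSON record per exchange to a notes file. This is a sketch under assumptions: the `Exchange` class, its field names, and the example URL are hypothetical, chosen for illustration rather than taken from any existing tool.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class Exchange:
    """One observed request/response pair (field names are illustrative)."""
    method: str
    url: str
    request_headers: dict = field(default_factory=dict)
    request_body: Optional[str] = None
    status: Optional[int] = None
    response_headers: dict = field(default_factory=dict)
    response_body: Optional[str] = None
    notes: str = ""


def to_json_line(exchange: Exchange) -> str:
    """Serialise an exchange as a single line, suitable for a .jsonl file."""
    return json.dumps(asdict(exchange), sort_keys=True)


def from_json_line(line: str) -> Exchange:
    """Rebuild an exchange from a line written by to_json_line."""
    return Exchange(**json.loads(line))


if __name__ == "__main__":
    observed = Exchange(
        method="GET",
        url="https://api.example.com/v1/posts?cursor=abc",  # hypothetical endpoint
        request_headers={"Accept": "application/json"},
        status=200,
        response_headers={"Content-Type": "application/json"},
        response_body='{"posts": [], "next_cursor": null}',
        notes="cursor parameter appears to drive pagination",
    )
    print(to_json_line(observed))
```

A line-per-record file is easy to diff and grep, and can be converted to a more formal API description later if a full client build goes ahead.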

🧪 Fuzzing

Automated and manual testing phase to check everything works.

curl
Postman
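
When testing by hand it can help to turn a request noted during discovery into a curl command that can be pasted into a terminal and then varied. A minimal sketch, assuming the request is already known as a method, URL, headers, and body; the helper name `to_curl` and the example endpoint are hypothetical.

```python
import shlex
from typing import Optional


def to_curl(method: str, url: str, headers: Optional[dict] = None,
            body: Optional[str] = None) -> str:
    """Build a shell-safe curl command line for one request."""
    parts = ["curl", "-sS", "-X", method.upper(), url]
    for name, value in (headers or {}).items():
        parts += ["-H", f"{name}: {value}"]
    if body is not None:
        parts += ["--data-raw", body]
    # shlex.join quotes every argument, so URLs with '?' or '&' stay intact.
    return shlex.join(parts)


if __name__ == "__main__":
    print(to_curl(
        "post",
        "https://api.example.com/v1/search?limit=10",  # hypothetical endpoint
        headers={"Content-Type": "application/json"},
        body='{"query": "test"}',
    ))
```

Changing one header or parameter at a time and comparing responses is a simple way to find out which parts of a request the server actually requires.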

🧑‍🌾 Collection

Requests made and data returned from API for your specific purpose.

🐍 Python

requests
BeautifulSoup4
asyncio
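
As a rough illustration of how asyncio fits into collection, the sketch below fetches several JSON endpoints with a cap on concurrency. It is one possible arrangement, not a recommended client: the standard library's `urllib` stands in for `requests` to keep the block self-contained, and the names `fetch_json`, `collect`, and the example URLs are hypothetical.

```python
import asyncio
import json
import urllib.request


def fetch_json(url: str, timeout: float = 10.0) -> dict:
    """Blocking GET that decodes a JSON response body (stdlib only)."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


async def collect(urls, fetch=fetch_json, max_concurrency: int = 3) -> dict:
    """Fetch every URL, running at most max_concurrency requests at once.

    Returns a dict mapping each URL to its decoded body, or to the
    exception raised while fetching it.
    """
    semaphore = asyncio.Semaphore(max_concurrency)

    async def one(url):
        async with semaphore:
            try:
                # Run the blocking fetch in a worker thread.
                return url, await asyncio.to_thread(fetch, url)
            except Exception as exc:  # keep going if one endpoint fails
                return url, exc

    return dict(await asyncio.gather(*(one(url) for url in urls)))


if __name__ == "__main__":
    # Stub fetcher so the demo runs offline; pass fetch=fetch_json for real calls.
    def fake_fetch(url):
        return {"url": url, "items": []}

    pages = [f"https://api.example.com/v1/posts?page={n}" for n in range(1, 4)]
    for url, result in asyncio.run(collect(pages, fetch=fake_fetch)).items():
        print(url, "->", result)
```

Keeping `max_concurrency` low is a deliberate choice here: it limits load on the target service.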

🤷‍♂️ Don't know how to code?

Jan Lauridtsen's 2024 SANS OSINT Summit lecture, Uncovering the invisible gold mines: How to dump raw data from TikTok, covers dumping raw data from apps built with a React frontend. Jan has also published a GitHub repo for this talk.

🏃‍♂️ Evading detection

There are various techniques that can be used to prevent a platform detecting and blocking an unofficial API client:

Browser emulation: a client that behaves like a real browser arouses less suspicion
User agent switching, provided your client does not emit a contradictory and/or unique fingerprint as a result
- Note: anti-fingerprinting measures beyond user agent switching quickly get complex, and there is a continual arms race between fingerprinters and anti-fingerprinters
IP rotation: works better with residential IP addresses than with the data centre IP addresses typically provided by VPNs or proxies, though residential IPs are likely to be quite expensive. Using AWS Lambda for easy-mode IP rotation, or even an array of dongles to get mobile / cell IPs, are also inventive strategies
Cloudflare evasion: of the several methods available, trawling Shodan and Censys for origin server IPs behind Cloudflare (exposed as a result of misconfiguration) is usually the most productive. Do not accidentally DoS or DDoS the service: this is likely to be illegal.
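
Two of the points above can be sketched briefly: switching user agents without emitting contradictory headers, and pacing requests so the service is never accidentally flooded. This is an illustrative sketch only. The profile contents are assumed example values that are likely out of date, and the names `PROFILES`, `pick_profile`, and `polite_delay` are hypothetical.

```python
import random

# Each profile is a whole set of headers that belong together, so switching
# user agent never leaves other headers contradicting it.
# Values are illustrative assumptions, not current browser fingerprints.
PROFILES = {
    "chrome-windows": {
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
            "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36"
        ),
        "Accept-Language": "en-GB,en;q=0.9",
        "Sec-CH-UA-Platform": '"Windows"',
    },
    "firefox-linux": {
        "User-Agent": (
            "Mozilla/5.0 (X11; Linux x86_64; rv:120.0) "
            "Gecko/20100101 Firefox/120.0"
        ),
        "Accept-Language": "en-GB,en;q=0.5",
    },
}


def pick_profile(rng=random) -> dict:
    """Choose one complete header profile, to be kept for a whole session."""
    name = rng.choice(sorted(PROFILES))
    return dict(PROFILES[name])  # copy so callers cannot mutate the original


def polite_delay(base: float = 1.0, spread: float = 2.0, rng=random) -> float:
    """Seconds to wait before the next request: a floor plus random jitter."""
    return base + rng.uniform(0.0, spread)


if __name__ == "__main__":
    headers = pick_profile()
    print(headers["User-Agent"])
    print(f"sleep for {polite_delay():.2f}s between requests")
```

Passing a seeded `random.Random` instance as `rng` makes runs repeatable while testing.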

As with anti-fingerprinting, block evasion in general is a complex and fast-changing area. The hard work is ensuring that the code you write today continues to work as fingerprinting and anti-bot measures evolve. This dynamism is compounded by target sites occasionally refactoring their APIs. Always stay on the right side of the law and don't do anything that you will be unable to defend (possibly in a court of law).

Examples

Rolstenhouse/unofficial-apis

OSINT / SOCMINT tools

Stories

The prologue of Darknet Diaries Episode 120: Voulnet
The main narrative of Darknet Diaries Episode 84: Jet-setters

Use in journalism and academic research

"Alternative frontends"

⁉️ Isn't this just that web scraping thing I've heard about but with more steps?

There is an undeniable degree of overlap between conventional web scraping and unofficial API reversing. Nevertheless, considering the latter to be merely a subset of web scraping ignores the fact that when people talk about "web scraping" they really mean "screen scraping": parsing and extracting data from the DOM of a rendered webpage (even using computer vision tech to grab imagery as well as text). Unofficial API reversing has the same objective (i.e., extracting data) but, beyond some common approaches, has its own set of methods. Conflating the two does neither the topic at hand nor web scraping any favours. Sometimes when you're reaching for an unofficial means of extracting data, web scraping will be the tool you need, and at other times unofficial APIs will be.
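
The difference in method can be shown with a toy comparison: the same two posts extracted once from rendered markup and once from a JSON body. Both inputs are invented for illustration, as are the names `PostParser`, `scrape`, and `from_api`; the standard library's `html.parser` stands in for a fuller scraping library.

```python
import json
from html.parser import HTMLParser

# Invented inputs representing the same data delivered two ways.
HTML_PAGE = (
    '<ul><li class="post" data-id="1">Hello</li>'
    '<li class="post" data-id="2">World</li></ul>'
)
API_BODY = '{"posts": [{"id": 1, "text": "Hello"}, {"id": 2, "text": "World"}]}'


class PostParser(HTMLParser):
    """Screen-scraping approach: walk the markup and pick out post items."""

    def __init__(self):
        super().__init__()
        self.posts = []
        self._current_id = None

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "li" and attributes.get("class") == "post":
            self._current_id = int(attributes["data-id"])

    def handle_data(self, data):
        # Text directly after a matching <li> is taken as the post body.
        if self._current_id is not None:
            self.posts.append({"id": self._current_id, "text": data})
            self._current_id = None


def scrape(html: str) -> list:
    parser = PostParser()
    parser.feed(html)
    return parser.posts


def from_api(body: str) -> list:
    """Unofficial API approach: the response is already structured data."""
    return json.loads(body)["posts"]


if __name__ == "__main__":
    print(scrape(HTML_PAGE))
    print(from_api(API_BODY))
```

The scraper depends on markup details such as class names, whereas the API route depends on the request being accepted and the response schema staying stable, which is why the two call for different methods.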

⚖️ Disclaimer

Reversing unofficial APIs and using them may violate terms of service depending on the platform and, in certain circumstances, the laws of your country. Legal risks are similar to those described in the Wikipedia entry on web scraping, though they will vary given how marked the difference in approach can be between unofficial API reversing and conventional web scraping. Legal threats around reverse engineering are also worth looking into. The information presented here is for educational purposes only: I am neither responsible nor liable for your actions. Do no evil.
