Resources for reverse engineering web and native apps (especially mobile) in order to discover and use "unofficial APIs" (also known as "undocumented APIs") for OSINT, SOCMINT and other purposes.
Before considering unofficial APIs, it is often worth checking whether the target platform publishes an official public and/or free API that will get you what you need. There are some great repos on GitHub listing public and free APIs.
Additionally, it may not be necessary to reverse an undocumented API yourself if someone has already published a comprehensive unofficial client. Many such clients can also be found on GitHub.
Thank you for your service.
A public API that may prove especially helpful for supporting your work targeting XRW individuals and groups is Open Measures, formerly known as the Social Media Analysis Toolkit (SMAT).
On "rolling your own" client to ingest SOCMINT directly from unofficial APIs: given that extremists and terrorists (in general, not just the XRW) typically don't frequent the biggest mainstream platforms (with one exception), nearly all of the places they hang out online are trivial to grab data from reliably using the information below.
Provided you know how to use a computer and access the internet, you don't really need to know much else to begin exploring the world of unofficial APIs. It's also not necessary to be able to read code, program, or use specialist tools when you first start out. Knowledge and skills will be picked up gradually and easily over time if you find it interesting.
Here's an entirely optional (and opinionated) list of pages on Wikipedia that give a pretty good crash course in concepts and jargon:
- Web 2.0
- Web service
- Request-response
- Client-server model
- Frontend and backend
- Web server
- XML
- XMLHttpRequest
- Web app
- Dynamic webpage
- Responsive web design
- Single-page application
- Progressive web app
- Mobile app
- API
- Web API
- OpenAPI
- JSON
- REST
- GraphQL
- Mashup
- Access token
- Synchronous
- Asynchronous
- Ajax
- Webhook
- WebSocket
Afterwards, it's worth reading TCP/IP Chapter 9, Section 4, "TCP/IP Key Applications and Application Protocols", specifically the content in this section relating to HTTP.
Although reversing unofficial APIs is demonstrably not API hacking, documentation from PortSwigger on API testing and GraphQL, as well as OWASP and the PortSwigger mapping to OWASP, are informative.
Reconnaissance and enumeration phase.
Note: many of the tools in this discovery section assume a black-box scenario. If the target platform is fully or even partly open source, it may be possible to enumerate unofficial API endpoints simply by reading the source code. There may also be official documentation publicly available that will help with this and serve as an aide-mémoire when you come to the "fuzzing" and "collection" phases of the process.
- Wireshark
- PortSwigger Burp Suite
- OWASP ZAP
- weAudit VSCode extension for taking notes while reading an open-source codebase
- API Discovery: 15 ways to find APIs
- Discover APIs section on unofficial APIs
- How To Find API Endpoints Of A Website: A Complete Guide
Any of the General tools for discovery, plus:
- Browser devtools
- Finding Undocumented APIs
- Scraping XHR
- How to use undocumented web APIs
- Computational research in the post-API age
- Web scraping 201: Finding the API
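One way to turn a browser devtools session into a list of candidate endpoints is to export the network log as a HAR file and filter it for JSON responses. What follows is a minimal sketch, assuming a standard HAR export (`log.entries[].request` and `.response`). The function name `list_json_endpoints` and the sample data are hypothetical, and filtering on the response MIME type is only one possible heuristic.

```python
from urllib.parse import urlsplit


def list_json_endpoints(har: dict) -> list[tuple[str, str, int]]:
    """Return sorted, de-duplicated (method, url-without-query, status)
    tuples for every HAR entry whose response MIME type mentions JSON."""
    seen = set()
    for entry in har.get("log", {}).get("entries", []):
        request = entry.get("request", {})
        response = entry.get("response", {})
        mime = response.get("content", {}).get("mimeType", "")
        if "json" not in mime.lower():
            continue  # skip HTML, images, scripts, fonts, etc.
        parts = urlsplit(request.get("url", ""))
        endpoint = f"{parts.scheme}://{parts.netloc}{parts.path}"
        seen.add((request.get("method", "GET"), endpoint, response.get("status", 0)))
    return sorted(seen)


# Hypothetical sample standing in for json.load() on an exported .har file.
SAMPLE_HAR = {
    "log": {
        "entries": [
            {
                "request": {"method": "GET", "url": "https://example.com/api/v1/posts?page=1"},
                "response": {"status": 200, "content": {"mimeType": "application/json"}},
            },
            {
                "request": {"method": "GET", "url": "https://example.com/static/app.js"},
                "response": {"status": 200, "content": {"mimeType": "application/javascript"}},
            },
        ]
    }
}

for method, endpoint, status in list_json_endpoints(SAMPLE_HAR):
    print(method, endpoint, status)
```

Query strings are dropped so that repeated calls to the same endpoint collapse into one row; keep them if the parameters themselves are what you are trying to map.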
Any of the General tools for discovery, plus:
- How to use Ghidra to reverse engineer mobile application
- Reverse engineering an Android application
- Can ZAP be used to test mobile apps?
- Use Burp Suite for mobile testing
If you are mapping several endpoints, and especially if you are attempting a comprehensive unofficial client build, it is worth documenting in detail what you find during the discovery phase. You may wish to consider keeping track of requests and responses in full (including payloads, headers, etc.). There are also tools used by developers of official APIs to document their implementation that work just as well for reverse engineers of unofficial APIs.
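If you have no dedicated documentation tool to hand, a plain JSON Lines file is a workable starting point for such notes. The sketch below is one possible layout, not a standard: the helper names (`make_record`, `append_record`, `load_records`) and the record fields are assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def make_record(method, url, request_headers, request_body,
                status, response_headers, response_body, note=""):
    """Bundle one observed request/response pair into a JSON-serialisable dict."""
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "request": {
            "method": method,
            "url": url,
            "headers": dict(request_headers),
            "body": request_body,
        },
        "response": {
            "status": status,
            "headers": dict(response_headers),
            "body": response_body,
        },
        "note": note,  # free-text observation, e.g. "cursor expires quickly"
    }


def append_record(path, record):
    """Append one record as a single line of JSON."""
    with Path(path).open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def load_records(path):
    """Read every record back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Captured headers often contain session cookies or tokens, so treat the notes file as sensitive and keep it out of any public repo.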
Automated and manual testing phase to check everything works.
- curl
- Postman
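When replaying requests observed during discovery, it can help to generate the curl command rather than type it by hand, so that headers and payloads are quoted safely for the shell. This is a minimal sketch; the helper name `to_curl` and the example URL are hypothetical, and only long-standing curl options (`-sS`, `-X`, `-H`, `--data-raw`) are used.

```python
import shlex


def to_curl(method, url, headers=None, body=None):
    """Build a shell-safe curl command line for one request."""
    parts = ["curl", "-sS", "-X", method.upper()]
    for name, value in (headers or {}).items():
        parts += ["-H", f"{name}: {value}"]
    if body is not None:
        parts += ["--data-raw", body]
    parts.append(url)
    # shlex.quote protects spaces, quotes, '?' and '&' from the shell
    return " ".join(shlex.quote(part) for part in parts)


print(to_curl(
    "post",
    "https://api.example.com/v1/search?lang=en",  # hypothetical endpoint
    headers={"Content-Type": "application/json"},
    body='{"query": "test"}',
))
```

Changing one header or parameter at a time and re-running the generated command is a simple way to learn which parts of a request the server actually checks.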
Requests made to, and data returned from, the API for your specific purpose.
- requests
- BeautifulSoup4
- asyncio
- Jan Lauridtsen's 2024 SANS OSINT Summit lecture "Uncovering the invisible gold mines: How to dump raw data from TikTok", on dumping raw data from apps built with a React frontend. Jan's GitHub repo accompanies the talk.
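A collection client often boils down to a loop that follows a pagination cursor until the API stops returning one. The sketch below uses only the standard library so that it runs without extra installs; `requests` would work just as well. Everything about the target is an assumption made for illustration: the `limit` and `cursor` query parameters and the `items` and `next_cursor` response fields are hypothetical and will differ per platform.

```python
import json
import time
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def http_get_json(url, headers):
    """Fetch a URL and decode the response body as JSON."""
    request = Request(url, headers=headers)
    with urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def collect(base_url, headers, fetch=http_get_json, delay=1.0, max_pages=100):
    """Yield items page by page, following a cursor until it runs out.

    `fetch` is injectable so the loop can be exercised offline.
    """
    cursor = None
    for _ in range(max_pages):
        params = {"limit": 50}          # hypothetical page-size parameter
        if cursor:
            params["cursor"] = cursor   # hypothetical cursor parameter
        page = fetch(f"{base_url}?{urlencode(params)}", headers)
        yield from page.get("items", [])
        cursor = page.get("next_cursor")
        if not cursor:
            break                       # no further pages
        time.sleep(delay)               # pause between requests
```

The `max_pages` cap and the `delay` between requests are there to keep a buggy cursor or an eager loop from hammering the service.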
There are various techniques that can be used to prevent a platform detecting and blocking an unofficial API client:
- Browser emulation arouses less suspicion
- User agent switching, provided your client does not emit a contradictory and / or unique fingerprint
- Note: anti-fingerprinting measures, beyond user agent switching, quickly get complex and there's a continual arms race between fingerprinters / anti-fingerprinters
- IP rotation: works better with residential IP addresses than with the data centre IP addresses typically provided by VPNs or proxies, though residential IPs are likely to be quite expensive. Using AWS Lambda for easy-mode IP rotation, or even an array of dongles to get mobile / cell IPs, are also inventive strategies
- Cloudflare evasion: there are several methods, including trawling Shodan and Censys for the IPs of origin servers behind Cloudflare (exposed as a result of misconfiguration) in order to bypass it; this is usually the most productive approach, though it is not the only one. Do not accidentally DoS or DDoS the service, as this is likely to be illegal.
As with anti-fingerprinting, block evasion in general is a complex and fast-changing area. The hard work will be ensuring the code you write today continues to work as fingerprinting and anti-bot measures evolve. This dynamism is compounded by target sites occasionally refactoring their APIs. Always stay on the right side of the law and don't do anything that you will be unable to defend (possibly in a court of law).
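The caveat about contradictory fingerprints in the user agent point above can be illustrated with a small consistency check: a `User-Agent` claiming Windows alongside a `Sec-CH-UA-Platform` header claiming macOS is an easy giveaway. This is only a sketch under stated assumptions. The two profiles are illustrative examples that will age, the helper names are hypothetical, and real fingerprinting looks at far more than two headers.

```python
import random

# Illustrative header sets only; real browsers send many more headers,
# and these version strings will go stale.
PROFILES = [
    {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Sec-CH-UA-Platform": '"Windows"',
    },
    {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
                      "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
        "Sec-CH-UA-Platform": '"macOS"',
    },
]

# Substring expected in the User-Agent for each claimed platform.
PLATFORM_MARKERS = {'"Windows"': "Windows NT", '"macOS"': "Macintosh"}


def is_consistent(profile):
    """True if the platform client hint agrees with the User-Agent string."""
    marker = PLATFORM_MARKERS.get(profile.get("Sec-CH-UA-Platform", ""))
    return marker is not None and marker in profile.get("User-Agent", "")


def pick_profile(profiles, rng=random):
    """Choose one self-consistent profile to use for a whole session."""
    candidates = [p for p in profiles if is_consistent(p)]
    if not candidates:
        raise ValueError("no self-consistent header profile available")
    return dict(rng.choice(candidates))
```

Picking one profile per session, rather than a fresh user agent per request, avoids the related tell of a single session appearing to change browser mid-conversation.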
- The prologue of Darknet Diaries Episode 120: Voulnet
- The main narrative of Darknet Diaries Episode 84: Jet-setters
- https://gizmodo.com/ring-s-hidden-data-let-us-map-amazons-sprawling-home-su-1840312279
- https://www.theguardian.com/us-news/2022/aug/25/porch-piracy-package-thefts-doorstep-delivery
- https://site.dcalacci.net/papers/ring-cscw-2021.pdf
- https://www.pnas.org/doi/full/10.1073/pnas.1717781115
- https://source.opennews.org/articles/freeing-plum-book/
There is an undeniable degree of overlap between conventional web scraping and unofficial API reversing. Nevertheless, considering the latter to be merely a subset of web scraping ignores the fact that when people talk about "web scraping" they really mean "screen scraping": parsing and extracting data from the DOM of a rendered webpage (even using computer vision tech to grab imagery as well as text). Unofficial API reversing has the same objective (i.e., extracting data) but has its own set of methods (beyond some common approaches), and conflating the two does neither the topic at hand nor web scraping any favours. Sometimes when you're reaching for an unofficial means of extracting data, web scraping will be the tool you need, and at other times unofficial APIs will be.
Reversing unofficial APIs and using them may violate terms of service depending on the platform and, in certain circumstances, the laws of your country. Legal risks are similar to those described on the Wikipedia entry on web scraping, though they will vary given how marked the difference can be in approach between unofficial API reversing and conventional web scraping. Legal threats around reverse engineering are worth looking into to supplement the aforementioned. The information presented here is for educational purposes only: I am neither responsible nor liable for your actions. Do no evil.