VirusTotal / vt-py

The official Python 3 client library for VirusTotal

Home Page: https://virustotal.github.io/vt-py/

[help] Pagination for endpoints that are not collections

tlansec opened this issue:

There are some URL endpoints in VT that are paginated and use a structure like this:

{
 ...
 "links": {
    "self": "somelink",
    "next": "somelink"
 }
}

It seems that the library's general approach for these is client.Iterator, but some of these endpoints don't yield a collection, e.g.

/files/sha256goeshere?relationships=embedded_urls

What is the right way of querying this endpoint for more than 20 elements using the library? An example file with >20 embedded URLs is:

a0b9ddaa108d8dd6faca8b661fc0890be5f8077a131a5585e386dd25801276b6

Hello @tlansec,

client.Iterator handles this for you by automatically fetching the next page of elements. You can configure how many elements the client retrieves in each call using the batch_size parameter. The following script may help you:

import vt
import os

FILE_SHA256 = 'a0b9ddaa108d8dd6faca8b661fc0890be5f8077a131a5585e386dd25801276b6'
APIKEY = os.getenv('VT_APIKEY')

with vt.Client(APIKEY) as c:
  for url in c.iterator(f'/files/{FILE_SHA256}/embedded_urls', batch_size=100):
    print(url)

Using the raw API, you need to paginate using the cursor returned in every response (vt-py does that for you).
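
For comparison, here is a minimal sketch of what paginating by hand involves, assuming each response carries a "data" list and a "links" section whose "next" URL embeds the cursor, as in the structure quoted at the top of this issue. The fetch callable and the fake pages are hypothetical stand-ins for an authenticated HTTP GET; they are not part of vt-py.

```python
from typing import Any, Callable, Dict, Iterator, Optional


def iter_items(fetch: Callable[[str], Dict[str, Any]], url: str) -> Iterator[Any]:
    """Yield every element of a paginated response by following links.next."""
    next_url: Optional[str] = url
    while next_url:
        page = fetch(next_url)
        yield from page.get("data", [])
        # The last page carries no "next" link, which ends the loop.
        next_url = page.get("links", {}).get("next")


# Offline stand-in for an HTTP GET returning decoded JSON (hypothetical data).
FAKE_PAGES = {
    "page1": {"data": ["a", "b"], "links": {"self": "page1", "next": "page2"}},
    "page2": {"data": ["c"], "links": {"self": "page2"}},
}

items = list(iter_items(FAKE_PAGES.__getitem__, "page1"))
print(items)
```

With vt-py this loop is unnecessary, since c.iterator does the equivalent work internally.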

Regards,
Marta

Hi Marta,

Thanks for your quick response. I was hoping for a solution like this. What about if the URL is of the format:

f'/files/{FILE_SHA256}?relationships=communicating_files,referrer_files,downloaded_files'

The benefit of building the URL this way is that I can fetch multiple relationships in a single API call. Is there a way to make this call using vt-py without paginating manually?

Cheers,
Tom

Hello Tom,

Yes, you can do that:

import vt
import os
from pprint import pprint

FILE_SHA256 = 'a0b9ddaa108d8dd6faca8b661fc0890be5f8077a131a5585e386dd25801276b6'
APIKEY = os.getenv('VT_APIKEY')

with vt.Client(APIKEY) as c:
  o = c.get_object(f'/files/{FILE_SHA256}', params={'relationships': 'embedded_urls,dropped_files'})
  pprint(o.relationships)

Output:

{'dropped_files': {'data': [],
                   'links': {'related': 'https://www.virustotal.com/api/v3/files/a0b9ddaa108d8dd6faca8b661fc0890be5f8077a131a5585e386dd25801276b6/dropped_files',
                             'self': 'https://www.virustotal.com/api/v3/files/a0b9ddaa108d8dd6faca8b661fc0890be5f8077a131a5585e386dd25801276b6/relationships/dropped_files?limit=20'}},
 'embedded_urls': {'data': [{'context_attributes': {'url': 'http://www.microsoft.com/truetype/0'},
                             'id': '03ad546e1696448bfc0a3cc935b2fcf5b75bc12737b54fdde09905455940d9be',
                             'type': 'url'},
                            {'context_attributes': {'url': 'http://schemas.xmlsoap.org/soap/encoding/'},
                             'id': '07b29293c80ee54bf56b955e8b93faa065d86db4bf57936ea7e955d125983ebd',
                             'type': 'url'},
                            {'context_attributes': {'url': 'https://client.api.ufiler.pro/api/v1/integrator/%25s/rev1'},
                             'id': '125aa0fff3f9792f95947f44c120037c2ea0e10ed039a728cf94719a87af972a',
                             'type': 'url'},
                            {'context_attributes': {'url': 'http://crl.usertrust.com/AddTrustExternalCARoot.crl05'},
                             'id': '14403aa5305f372da84aaa466016848d26e386c6f7f36f01944598a3ef8517b7',
                             'type': 'url'},
                            {'context_attributes': {'url': 'http://crl.globalsign.com/root-r3.crl0b'},
                             'id': '1ac83fa936ad64f2fec247d442eb0d85f418bbececc09fd69a8edceb7ce45c99',
                             'type': 'url'},
                            {'context_attributes': {'url': 'https://www.libtorrent.org/'},
                             'id': '1ec7f71d9eee9ea133d6ebad9f284e1defa497d01dc395f68b612f2c85a38aea',
                             'type': 'url'},
                            {'context_attributes': {'url': 'http://crl.usertrust.com/UTN-USERFirst-Object.crl05'},
                             'id': '2e8b0617d0225f0f73973635b9be52d93fceee0d8426e6270f149eb8e99c3a58',
                             'type': 'url'},
                            {'context_attributes': {'url': 'http://crl.usertrust.com/USERTrustRSACertificationAuthority.crl0v'},
                             'id': '32a7ad60603b92dae7b1848adfc63803fd6f6c6462c30f655e030d7b468e1a07',
                             'type': 'url'},
                            {'context_attributes': {'url': 'http://crt.sectigo.com/SectigoRSACodeSigningCA.crt0'},
                             'id': '35cca8f60af063241b3170e87bd7325f94304feeceac5bddbb8648ee2ec14fe5',
                             'type': 'url'},
                            {'context_attributes': {'url': 'https://www.verisign.com/CPS0b'},
                             'id': '39d3d07711638eed8b57d46b3f73cf4443ee3d70340bbf822ee86ea7e60764db',
                             'type': 'url'},
                            {'context_attributes': {'url': 'https://ufiler.pro/agreement'},
                             'id': '3eafbc81636747982873228417ba48dbb37fa0a50590273d203ce75ff5f2f3db',
                             'type': 'url'},
                            {'context_attributes': {'url': 'http://soft-for-you.ru/upload/7z1604.exe'},
                             'id': '3ed030b34c926b6010de97f6d649708a1986d21961b15a131986b5dcbf0b8a25',
                             'type': 'url'},
                            {'context_attributes': {'url': 'https://ufiler.pro/'},
                             'id': '46e9c937a81ce80d59812239038d2b123012856385cca5380635e31cffd0629a',
                             'type': 'url'},
                            {'context_attributes': {'url': 'https://www.verisign.com/repository/CPS'},
                             'id': '4760eeb45da523e71913f86b2e925af11f9ab019c1f00342c7d877649c06f34f',
                             'type': 'url'},
                            {'context_attributes': {'url': 'http://crl.sectigo.com/SectigoRSATimeStampingCA.crl0t'},
                             'id': '49272405d86c0f3e801af971c8711437ba612154ab7fe663b0fd8d2632bd78bb',
                             'type': 'url'},
                            {'context_attributes': {'url': 'http://secure.globalsign.com/cacert/gsextendcodesignsha2g3ocsp.crt0'},
                             'id': '49567f5211c71399e4d4678ed0903a10a99c54936615a07687bd1bbc1457ef1a',
                             'type': 'url'},
                            {'context_attributes': {'url': 'https://client.api.ufiler.pro/api/v1/integrator/tools/geo/rev1'},
                             'id': '4a4df74087fc7d630b0afcf8e9702311da022843e105e1703995513a05c98ec4',
                             'type': 'url'},
                            {'context_attributes': {'url': 'http://crl.sectigo.com/COMODOTimeStampingCA_2.crl0r'},
                             'id': '4c1b6f99f3d2806619e3ddf4949b2c3526530d1b69baeba2066a1fe2dcfd3d88',
                             'type': 'url'},
                            {'context_attributes': {'url': 'https://sectigo.com/CPS0C'},
                             'id': '4e6dbce1a595066a5358994599fd717231632b67962fba37052d757c7293dfc9',
                             'type': 'url'},
                            {'context_attributes': {'url': 'https://ufiler.pro/private'},
                             'id': '5e64ddcc0c1c84f619ffe9c660e1d7ca70be9ccc19c519fe3f5a30398072efb1',
                             'type': 'url'}],
                   'links': {'next': 'https://www.virustotal.com/api/v3/files/a0b9ddaa108d8dd6faca8b661fc0890be5f8077a131a5585e386dd25801276b6/relationships/embedded_urls?cursor=eyJsaW1pdCI6IDIwLCAib2Zmc2V0IjogMjB9&limit=20',
                             'related': 'https://www.virustotal.com/api/v3/files/a0b9ddaa108d8dd6faca8b661fc0890be5f8077a131a5585e386dd25801276b6/embedded_urls',
                             'self': 'https://www.virustotal.com/api/v3/files/a0b9ddaa108d8dd6faca8b661fc0890be5f8077a131a5585e386dd25801276b6/relationships/embedded_urls?limit=20'},
                   'meta': {'cursor': 'eyJsaW1pdCI6IDIwLCAib2Zmc2V0IjogMjB9'}}}

Drawbacks of this approach:

  • You will only get the first page for each relationship.
  • You will only get the descriptors and context attributes in each relationship (not the whole object).
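
To see at a glance which relationships were cut off at the first page, something like the following sketch may help. It assumes, based on the output above, that a relationship with more pages carries a cursor under "meta" while a complete one does not. The summarize_relationships helper and the sample data are hypothetical and only mimic the shape of o.relationships.

```python
from typing import Any, Dict, List, Tuple


def summarize_relationships(
    relationships: Dict[str, Dict[str, Any]]
) -> List[Tuple[str, int, bool]]:
    """Return (name, items_in_first_page, has_more_pages) per relationship."""
    summary = []
    for name, rel in sorted(relationships.items()):
        first_page = rel.get("data", [])
        # Assumption: a cursor in "meta" signals that further pages exist.
        has_more = bool(rel.get("meta", {}).get("cursor"))
        summary.append((name, len(first_page), has_more))
    return summary


# Trimmed-down, hypothetical stand-in for o.relationships.
sample = {
    "dropped_files": {"data": [], "links": {}},
    "embedded_urls": {
        "data": [{"id": "id-1", "type": "url"}, {"id": "id-2", "type": "url"}],
        "links": {},
        "meta": {"cursor": "some-cursor"},
    },
}

summary = summarize_relationships(sample)
print(summary)
```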

I hope it helps.

Regards,
Marta

In a world with unlimited API requests I would of course prefer the whole object, but I would rather make one API request than five, especially when there may not be any results at all.

It would be nice if there were a method in the client that dealt with the pagination for the user irrespective of the endpoint being queried.

The one thing I think may not be right is:

> You will only get the first page for each relationship

It should be possible to get the second page using the "next" link in the links section, e.g.

...
'links': {'next': 'https://www.virustotal.com/api/v3/files/a0b9ddaa108d8dd6faca8b661fc0890be5f8077a131a5585e386dd25801276b6/relationships/embedded_urls?cursor=eyJsaW1pdCI6IDIwLCAib2Zmc2V0IjogMjB9&limit=20',
...
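
As an illustration, the "next" link can be split into its path and query parameters with the standard library. This is only a sketch: the URL is copied from the output above, and the observation that this particular cursor decodes to a small JSON document is not documented API behaviour, so real code should treat the cursor as an opaque string.

```python
import base64
from urllib.parse import parse_qs, urlsplit

NEXT_LINK = (
    "https://www.virustotal.com/api/v3/files/"
    "a0b9ddaa108d8dd6faca8b661fc0890be5f8077a131a5585e386dd25801276b6"
    "/relationships/embedded_urls"
    "?cursor=eyJsaW1pdCI6IDIwLCAib2Zmc2V0IjogMjB9&limit=20"
)

parts = urlsplit(NEXT_LINK)
query = parse_qs(parts.query)
cursor = query["cursor"][0]
limit = int(query["limit"][0])

# Decoding is for illustration only; treat the cursor as opaque in real code.
decoded = base64.b64decode(cursor).decode("utf-8")

print(parts.path)
print(cursor, limit)
print(decoded)
```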

> It should be possible to get the second page using the "next" link in the links section, e.g.

Conceptually, you are fetching an object, not an iterator. Objects can't be iterated over, so you need to do it in two steps:

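  • Get the first page of each relationship.
  • Iterate over the remaining pages of each relationship.

# vt, pprint, FILE_SHA256 and APIKEY are defined as in the previous script.
with vt.Client(APIKEY) as c:
  # Step 1: one request returns the first page of every requested relationship.
  o = c.get_object(f'/files/{FILE_SHA256}', params={'relationships': 'embedded_urls,dropped_files'})
  pprint(o.relationships)
  for r_name, r_data in o.relationships.items():
    # A relationship without a cursor has no further pages.
    cursor = r_data.get('meta', {}).get('cursor')
    if not cursor:
      continue
    # Step 2: resume from the server cursor to fetch the remaining pages.
    for item in c.iterator(f'/files/{FILE_SHA256}/relationships/{r_name}', cursor=f'{cursor}-0'):
      print(item)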
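
The two steps could be wrapped in a small helper along these lines. This is a sketch under assumptions, not part of vt-py: collect_relationship_ids and StubClient are hypothetical names, the stub only imitates the two client methods used in the script above so that the example runs offline, and items yielded by the iterator are assumed to expose an id attribute.

```python
from types import SimpleNamespace
from typing import Any, Dict, List


def collect_relationship_ids(client: Any, file_id: str, names: List[str]) -> Dict[str, List[str]]:
    """Fetch the first page of each relationship in one call, then page the rest."""
    obj = client.get_object(f"/files/{file_id}", params={"relationships": ",".join(names)})
    result: Dict[str, List[str]] = {}
    for name, rel in obj.relationships.items():
        # First page: plain dict descriptors embedded in the object.
        ids = [d["id"] for d in rel.get("data", [])]
        cursor = rel.get("meta", {}).get("cursor")
        if cursor:
            # Remaining pages: resume from the server cursor, as in the script above.
            path = f"/files/{file_id}/relationships/{name}"
            ids.extend(item.id for item in client.iterator(path, cursor=f"{cursor}-0"))
        result[name] = ids
    return result


class StubClient:
    """Hypothetical offline stand-in for the two client methods used above."""

    def get_object(self, path, params=None):
        return SimpleNamespace(relationships={
            "dropped_files": {"data": []},
            "embedded_urls": {"data": [{"id": "u1"}, {"id": "u2"}], "meta": {"cursor": "abc"}},
        })

    def iterator(self, path, cursor=None):
        # Pretend the server returns one more item after the first page.
        return iter([SimpleNamespace(id="u3")])


ids = collect_relationship_ids(StubClient(), "FILE_ID", ["embedded_urls", "dropped_files"])
print(ids)
```

With a real client, the stub would be replaced by the vt.Client instance opened in the with block; relationships that fit in one page cost no extra requests.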

These concepts are also explained in our api docs: https://developers.virustotal.com/reference/relationships

I hope this helps.

Regards,
Marta

I get it now, thanks for taking the time to explain. I guess I have a design decision on my side to make about the approach to take.

I'm closing this issue since your question seems to be solved. Don't hesitate to reach out again if you need anything else.

Regards,
Marta