ZipFile / python-getdents

Python binding to the Linux syscall getdents64

Home Page: https://pypi.org/project/getdents/

Python getdents

Iterate large directories efficiently with Python.

About

python-getdents is a simple wrapper around the Linux system call getdents64 (see man getdents for details). More details on approach.
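
To make the idea concrete, here is a minimal pure-Python sketch of how a buffer filled by getdents64 can be decoded, based on the struct linux_dirent64 layout described in man getdents. It is not the library's implementation (the README does not show it); parse_dirents and pack_dirent are hypothetical names, and the demo buffer is synthetic rather than read from a real directory.

```python
import os
import struct

# Fixed-size header of struct linux_dirent64, as described in man getdents:
# d_ino (u64), d_off (s64), d_reclen (u16), d_type (u8); d_name follows it.
DIRENT64_HEADER = struct.Struct("=QqHB")


def parse_dirents(buf):
    """Yield (inode, type, name) for each record in a getdents64-style buffer."""
    pos = 0
    while pos + DIRENT64_HEADER.size <= len(buf):
        d_ino, _d_off, d_reclen, d_type = DIRENT64_HEADER.unpack_from(buf, pos)
        if d_reclen < DIRENT64_HEADER.size:
            raise ValueError("corrupt record length")
        # The name is NUL-terminated and may be followed by padding bytes.
        raw_name = buf[pos + DIRENT64_HEADER.size:pos + d_reclen]
        name = raw_name.split(b"\0", 1)[0]
        yield d_ino, d_type, os.fsdecode(name)
        pos += d_reclen  # d_reclen is the total length of this record


def pack_dirent(inode, d_type, name):
    """Build one synthetic record (hypothetical helper, for the demo only)."""
    raw_name = os.fsencode(name) + b"\0"
    reclen = DIRENT64_HEADER.size + len(raw_name)
    reclen += -reclen % 8  # assumption: pad each record to an 8-byte boundary
    padding = b"\0" * (reclen - DIRENT64_HEADER.size - len(raw_name))
    return DIRENT64_HEADER.pack(inode, 0, reclen, d_type) + raw_name + padding


# Two made-up records: a directory (type 4) and a regular file (type 8).
demo = pack_dirent(11, 4, "sub") + pack_dirent(12, 8, "a.txt")
for entry in parse_dirents(demo):
    print(entry)
```

A larger buffer holds more such records per system call, which is why the buffer size matters for very large directories.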

TODO

  • Verify that implementation works on platforms other than x86_64.

Install

pip install getdents

For development

python3 -m venv env
. env/bin/activate
pip install -e .[test]

Building Wheels

pip install cibuildwheel
cibuildwheel --platform linux --output-dir wheelhouse

Run tests

ulimit -v 33554432 && py.test tests/

Or

ulimit -v 33554432 && ./setup.py test

Usage

from getdents import getdents

for inode, type, name in getdents('/tmp', 32768):
    print(name)

Advanced

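import os
from getdents import *

# O_GETDENTS is provided by the library as the flags for os.open.
fd = os.open('/tmp', O_GETDENTS)

try:
    # Read entries through a 1 MiB buffer and print type, marker and name.
    for inode, type, name in getdents_raw(fd, 2**20):
        print({
                DT_BLK:     'blockdev',
                DT_CHR:     'chardev ',
                DT_DIR:     'dir     ',
                DT_FIFO:    'pipe    ',
                DT_LNK:     'symlink ',
                DT_REG:     'file    ',
                DT_SOCK:    'socket  ',
                DT_UNKNOWN: 'unknown ',
            }[type], {
                True:  'd',
                False: ' ',
            }[inode == 0],
            name,
        )
finally:
    # Close the descriptor even if iteration fails.
    os.close(fd)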
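
The tuples yielded above are plain (inode, type, name) triples, so they can be post-processed with ordinary Python. The following is a small sketch, not part of the library: summarize is a hypothetical helper, the DT_* values are copied from dirent.h so the block runs without getdents installed, and treating inode 0 as a deleted entry is an assumption based on the 'd' marker in the example above.

```python
from collections import Counter

# d_type values from <dirent.h>; the library exports constants with the
# same names, these local copies only keep the sketch self-contained.
DT_DIR, DT_REG, DT_LNK = 4, 8, 10


def summarize(entries):
    """Count entries by type, skipping those with inode 0 (assumed deleted)."""
    counts = Counter()
    for inode, d_type, _name in entries:
        if inode == 0:
            continue
        counts[d_type] += 1
    return counts


# Made-up sample data; with the library installed, `entries` could instead
# be the iterator returned by getdents_raw(fd, 2**20).
sample = [(11, DT_DIR, "sub"), (0, DT_REG, "gone"), (12, DT_REG, "a.txt")]
counts = summarize(sample)
print("files:", counts[DT_REG], "dirs:", counts[DT_DIR])
```

Because the helper consumes the iterator lazily, it does not need to hold the whole directory listing in memory.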

CLI

Usage

python-getdents [-h] [-b N] [-o NAME] PATH

Options

+--------------------------+-------------------------------------------------+
| Option                   | Description                                     |
+==========================+=================================================+
| -b N,                    | Buffer size (in bytes) to allocate when         |
| --buffer-size N          | iterating over directory. Default is 32768, the |
|                          | same value used by glibc, you probably want to  |
|                          | increase this value. Try starting with 16777216 |
|                          | (16 MiB). Best performance is achieved when     |
|                          | buffer size rounds to size of the file system   |
|                          | block.                                          |
+--------------------------+-------------------------------------------------+
| -o NAME,                 | Output format:                                  |
| --output-format NAME     |                                                 |
|                          | * plain (default) Print only names.             |
|                          | * csv Print as comma-separated values in        |
|                          |   order: inode, type, name.                     |
|                          | * csv-headers Same as csv, but print            |
|                          |   headers on the first line also.               |
|                          | * json output as JSON array.                    |
|                          | * json-stream output each directory entry       |
|                          |   as single json object separated by newline.   |
+--------------------------+-------------------------------------------------+
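
The buffer-size advice can be sketched in a few lines. This example assumes that "rounds to size of the file system block" means "is a multiple of the block size", and it uses os.statvfs from the standard library to look that size up; aligned_buffer_size is a hypothetical helper, not something the package provides.

```python
import os


def aligned_buffer_size(path, wanted):
    """Round `wanted` up to a multiple of the block size of path's file system."""
    block = os.statvfs(path).f_bsize
    if block <= 0:
        return wanted  # no usable block size reported, leave the value alone
    return -(-wanted // block) * block  # ceiling division, then scale back up


# Start from the 16 MiB suggested in the table and align it for "/".
print(aligned_buffer_size("/", 16777216))
```

The result could then be passed to the -b option or used as the buffer size argument of getdents.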

Exit codes

  • 3 - Requested buffer is too large.
  • 4 - PATH not found.
  • 5 - PATH is not a directory.
  • 6 - Not enough permissions to read contents of the PATH.
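
When the CLI is driven from a script, these codes can be turned back into messages. The sketch below assumes the python-getdents command is on PATH; explain_exit and run_getdents are hypothetical helper names, and only the four codes listed above are taken from this README.

```python
import subprocess

# Exit codes as listed above; anything else is reported generically.
EXIT_MESSAGES = {
    3: "Requested buffer is too large.",
    4: "PATH not found.",
    5: "PATH is not a directory.",
    6: "Not enough permissions to read contents of the PATH.",
}


def explain_exit(returncode):
    """Translate a python-getdents exit status into a readable message."""
    if returncode == 0:
        return "ok"
    return EXIT_MESSAGES.get(returncode, f"unexpected exit code {returncode}")


def run_getdents(path):
    """Run the CLI on `path`; return (names, message). Not called here."""
    proc = subprocess.run(
        ["python-getdents", path], capture_output=True, text=True
    )
    # The default plain format prints one name per line.
    return proc.stdout.splitlines(), explain_exit(proc.returncode)


print(explain_exit(5))
```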

Examples

python-getdents /path/to/large/dir
python -m getdents /path/to/large/dir
python-getdents /path/to/large/dir -o csv -b 16777216 > dir.csv

License: BSD 2-Clause "Simplified" License

