LangStream / documentation

LangStream handbook on how to create AI applications. Batteries included.

Home Page: https://docs.langstream.ai

webcrawler-source not consistent

ddieruf opened this issue

The example is:

user-agent: "langstream.ai-webcrawler/1.0"
bucketName: "{{{secrets.s3-credentials.bucket-name}}}"
endpoint: "{{{secrets.s3-credentials.endpoint}}}"
access-key: "{{{secrets.s3-credentials.access-key}}}"
secret-key: "{{{secrets.s3-credentials.secret}}}"
region: "{{{secrets.s3-credentials.region}}}"

But the documentation for the S3 credentials lists:

bucketName
endpoint
username
password

There is no mention of access-key, secret-key, or region.

Looking at the code, there is no "username" or "password" configuration key; the values are read from "access-key" and "secret-key" instead:

bucketName = configuration.getOrDefault("bucketName", "langstream-source").toString();
String endpoint = configuration.getOrDefault("endpoint", "http://minio-endpoint.-not-set:9090").toString();
String username = configuration.getOrDefault("access-key", "minioadmin").toString();
String password = configuration.getOrDefault("secret-key", "minioadmin").toString();
String region = configuration.getOrDefault("region", "").toString();
allowedDomains = Set.of(configuration.getOrDefault("allowed-domains", "")
        .toString().split(","));
seedUrls = Set.of(configuration.getOrDefault("seed-urls", "")
        .toString().split(","));
idleTime = Integer.parseInt(configuration.getOrDefault("idle-time", 1).toString());
maxUnflushedPages = Integer.parseInt(configuration.getOrDefault("max-unflushed-pages", 100).toString());
flushNext.set(maxUnflushedPages);
minTimeBetweenRequests = Integer.parseInt(configuration.getOrDefault("min-time-between-requests", 100).toString());
userAgent = configuration.getOrDefault("user-agent", "langstream.ai-webcrawler/1.0").toString();
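
To illustrate why the mismatch matters, here is a minimal sketch that reuses the same keys and defaults as the code quoted above. The class name `WebCrawlerConfigSketch`, the helper `resolveS3Settings`, and all sample values are hypothetical and exist only for this illustration; this is not the actual agent code. It suggests that a configuration written with the documented "username" and "password" keys would silently fall back to the "minioadmin" defaults.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class WebCrawlerConfigSketch {

    // Hypothetical helper: resolves the S3 settings using the same keys and
    // defaults as the snippet quoted in this issue.
    static Map<String, String> resolveS3Settings(Map<String, Object> configuration) {
        Map<String, String> resolved = new LinkedHashMap<>();
        resolved.put("bucketName",
                configuration.getOrDefault("bucketName", "langstream-source").toString());
        resolved.put("endpoint",
                configuration.getOrDefault("endpoint", "http://minio-endpoint.-not-set:9090").toString());
        // The credentials are looked up under "access-key" / "secret-key",
        // never under "username" / "password".
        resolved.put("access-key",
                configuration.getOrDefault("access-key", "minioadmin").toString());
        resolved.put("secret-key",
                configuration.getOrDefault("secret-key", "minioadmin").toString());
        resolved.put("region",
                configuration.getOrDefault("region", "").toString());
        return resolved;
    }

    public static void main(String[] args) {
        // Keys as listed in the documentation (sample values are made up).
        Map<String, Object> documented = Map.of(
                "bucketName", "my-bucket",
                "endpoint", "http://localhost:9000",
                "username", "alice",
                "password", "not-a-real-secret");

        // Keys as used in the example (sample values are made up).
        Map<String, Object> example = Map.of(
                "bucketName", "my-bucket",
                "endpoint", "http://localhost:9000",
                "access-key", "alice",
                "secret-key", "not-a-real-secret",
                "region", "us-east-1");

        System.out.println("documented keys -> " + resolveS3Settings(documented));
        System.out.println("example keys    -> " + resolveS3Settings(example));
    }
}
```

With the documented keys, the "username" and "password" entries are ignored because nothing reads them, so the documentation and the example should probably both be aligned on access-key, secret-key and region.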