webcrawler-source not consistent
ddieruf opened this issue
David Dieruf commented
The example is:
user-agent: "langstream.ai-webcrawler/1.0"
bucketName: "{{{secrets.s3-credentials.bucket-name}}}"
endpoint: "{{{secrets.s3-credentials.endpoint}}}"
access-key: "{{{secrets.s3-credentials.access-key}}}"
secret-key: "{{{secrets.s3-credentials.secret}}}"
region: "{{{secrets.s3-credentials.region}}}"
But the S3 credentials only mention:
bucketName
endpoint
username
password
There is no mention of access-key, secret-key, or region.
David Dieruf commented
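Looking at the code, there is no "username" or "password" configuration key. Those are only the names of the local variables that hold the "access-key" and "secret-key" values: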
bucketName = configuration.getOrDefault("bucketName", "langstream-source").toString();
String endpoint = configuration.getOrDefault("endpoint", "http://minio-endpoint.-not-set:9090").toString();
String username = configuration.getOrDefault("access-key", "minioadmin").toString();
String password = configuration.getOrDefault("secret-key", "minioadmin").toString();
String region = configuration.getOrDefault("region", "").toString();
allowedDomains = Set.of(configuration.getOrDefault("allowed-domains", "")
        .toString().split(","));
seedUrls = Set.of(configuration.getOrDefault("seed-urls", "")
        .toString().split(","));
idleTime = Integer.parseInt(configuration.getOrDefault("idle-time", 1).toString());
maxUnflushedPages = Integer.parseInt(configuration.getOrDefault("max-unflushed-pages", 100).toString());
flushNext.set(maxUnflushedPages);
minTimeBetweenRequests = Integer.parseInt(configuration.getOrDefault("min-time-between-requests", 100).toString());
userAgent = configuration.getOrDefault("user-agent", "langstream.ai-webcrawler/1.0").toString();
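To illustrate the effect of the mismatch, here is a minimal, self-contained sketch. It is not the project's code: the class WebCrawlerConfigCheck and the helpers unrecognizedKeys and resolveAccessKey are hypothetical, and the list of known keys is copied only from the snippet quoted above. It assumes a user follows the credentials description and sets "username" and "password".

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class WebCrawlerConfigCheck {

    // Keys that the quoted snippet reads (assumption: the snippet is complete).
    static final Set<String> KNOWN_KEYS = Set.of(
            "bucketName", "endpoint", "access-key", "secret-key", "region",
            "allowed-domains", "seed-urls", "idle-time", "max-unflushed-pages",
            "min-time-between-requests", "user-agent");

    // Hypothetical helper: supplied keys that the quoted code never reads.
    static Set<String> unrecognizedKeys(Map<String, Object> configuration) {
        Set<String> unknown = new TreeSet<>(configuration.keySet());
        unknown.removeAll(KNOWN_KEYS);
        return unknown;
    }

    // Mirrors the quoted lookup: only "access-key" is consulted, never "username".
    static String resolveAccessKey(Map<String, Object> configuration) {
        return configuration.getOrDefault("access-key", "minioadmin").toString();
    }

    public static void main(String[] args) {
        // Configuration written the way the credentials description suggests.
        Map<String, Object> configuration = new LinkedHashMap<>();
        configuration.put("bucketName", "my-bucket");
        configuration.put("username", "alice");
        configuration.put("password", "s3cret");

        // Prints [password, username]: both keys are silently ignored.
        System.out.println(unrecognizedKeys(configuration));
        // Prints minioadmin: the default is used instead of "alice".
        System.out.println(resolveAccessKey(configuration));
    }
}
```

Under these assumptions, a configuration that uses "username" and "password" would fall back to the default credentials without any warning, so the description probably needs to list access-key, secret-key, and region instead.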