API
redact
takes the following argument:
keywords/phrases - the keyword, phrases to be replaced
Stream - a stream of data
options - options to override the default behavior
Note: For now it the redact process is case sensitive.
Test Cases
Test are written in jest
, run:
npm run test
Running
Have a look at playground.js
, where you can hit the API redact
To run test/lint cases:
npm i
and then run respective scripts
Thoughts and design consideration
The problem is solved by the following approach:
- There are many ways to solve this problem, but looking at this problem, we need to make sure how it can be composed with many different systems in the future. It should be scalable and super fast.
- Keeping that in mind, I made the
redact
API to consume the input as aStream
. Why? BecauseStream
is the core of everything in Node.js likehttp
,file
,sockets
etc. So in future, I can make it as a lamda function, to process the incoming documents via http, queues etc (which areStreams
) - The API is Async, so that it can process without blocking the main thread.
- Also it tries to process the text in chunks. Hence, it can process large files without any memory issue. Imagine loading a 10gb text/file? The code will work without any issues.
- It's config driven, currently it takes a simple
option
, but in future theoption
can have many things like ignore spaces, case insensitive etc. - Since the solution is built on Streams, it can handle backpressure (esp on slow consumers) out of the box.
- We can package and ship it as a npm module too.
Simple eslint and jest test cases are covered too.
Note
I have simple validations on the inputs, but that is not enough. There are many use cases, which the input validator or the input parser have ignored (for now, due to time constraints). Just FYI..
Not all edge cases covered in the code, but we can extend it easily by adding many rules in the
validator.js