A node.js through stream that does basic streaming text search and replace and is chunk boundary friendly

eugeneware/replacestream

replacestream

A node.js transform stream for basic streaming text search and replace that is chunk boundary friendly.

Installation

Install via npm:

$ npm install replacestream

Examples

Search and replace over a test file

Say we want to do a search and replace over the following file:

// happybirthday.txt
Happy birthday to you!
Happy birthday to you!
Happy birthday to dear Liza!
Happy birthday to you!

var replaceStream = require('replacestream')
  , fs = require('fs')
  , path = require('path');

// Replace all the instances of 'birthday' with 'earthday'
fs.createReadStream(path.join(__dirname, 'happybirthday.txt'))
  .pipe(replaceStream('birthday', 'earthday'))
  .pipe(process.stdout);

Running this will print out:

$ node simple.js
Happy earthday to you!
Happy earthday to you!
Happy earthday to dear Liza!
Happy earthday to you!

You can also limit the number of replacements to the first n:

// Replace only the first 2 instances of 'birthday' with 'earthday'
fs.createReadStream(path.join(__dirname, 'happybirthday.txt'))
  .pipe(replaceStream('birthday', 'earthday', { limit: 2 } ))
  .pipe(process.stdout);

Which would output:

$ node simple.js
Happy earthday to you!
Happy earthday to you!
Happy birthday to dear Liza!
Happy birthday to you!
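
To picture what limit does, here is a minimal sketch that applies the same idea to a single in-memory string with the built-in String.prototype.replace. It is not how replacestream is implemented; replaceWithLimit is a hypothetical helper introduced only for illustration.

```javascript
// Hypothetical helper (not part of replacestream): replace at most `limit`
// matches of a global regex in one string by counting inside the callback.
function replaceWithLimit(text, regex, replacement, limit) {
  var count = 0;
  return text.replace(regex, function (match) {
    count += 1;
    // Once the limit is reached, return the match unchanged.
    // Note: `replacement` is inserted literally here (no $1 expansion).
    return count <= limit ? replacement : match;
  });
}

var limited = replaceWithLimit(
  'birthday birthday birthday', /birthday/g, 'earthday', 2);
console.log(limited); // earthday earthday birthday
```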

And you can also pass in a replacement function which will get called for each replacement:

// Replace the word 'Happy' with a different word each time
var words = ['Awesome', 'Good', 'Super', 'Joyous'];
function replaceFn(match) {
  return words.shift();
}
fs.createReadStream(path.join(__dirname, 'happybirthday.txt'))
  .pipe(replaceStream('Happy', replaceFn))
  .pipe(process.stdout);

Which would output:

$ node simple.js
Awesome birthday to you!
Good birthday to you!
Super birthday to dear Liza!
Joyous birthday to you!

Search and replace with Regular Expressions

Here's the same example, but with RegEx:

// happybirthday.txt
Happy birthday to you!
Happy birthday to you!
Happy birthday to dear Liza!
Happy birthday to you!

var replaceStream = require('replacestream')
  , fs = require('fs')
  , path = require('path');

// Replace any word that has an 'o' with 'oh'
fs.createReadStream(path.join(__dirname, 'happybirthday.txt'))
  .pipe(replaceStream(/\w*o\w*/g, 'oh'))
  .pipe(process.stdout);

Running this will print out:

$ node simple.js
Happy birthday oh oh!
Happy birthday oh oh!
Happy birthday oh dear Liza!
Happy birthday oh oh!

You can also insert captures using the $1 ($index) notation. This is similar to the built-in String.prototype.replace method.

// happybirthday.txt
Happy birthday to you!
Happy birthday to you!
Happy birthday to dear Liza!
Happy birthday to you!

var replaceStream = require('replacestream')
  , fs = require('fs')
  , path = require('path');

// Insert the captured groups ('dear' and 'Liza!') into the replacement string
fs.createReadStream(path.join(__dirname, 'happybirthday.txt'))
  .pipe(replaceStream(/(dear) (Liza!)/, 'my very good and $1 friend $2'))
  .pipe(process.stdout);

Running this will print:

$ node simple.js
Happy birthday to you!
Happy birthday to you!
Happy birthday to my very good and dear friend Liza!
Happy birthday to you!
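
For comparison, the same $1/$2 substitution can be tried on a single line with the built-in String.prototype.replace. This snippet uses only standard JavaScript and does not involve replacestream at all:

```javascript
// Built-in replace with capture references, applied to one line of the file.
var line = 'Happy birthday to dear Liza!';
var withCaptures = line.replace(
  /(dear) (Liza!)/, 'my very good and $1 friend $2');
console.log(withCaptures);
// Happy birthday to my very good and dear friend Liza!
```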

You can also pass in a replacement function. The function will be passed parameters just like String.prototype.replace (e.g. replaceFunction(match, p1, p2, offset, string)). Note that the string passed to the function is limited to the buffer the match was found in, not the entire stream.

function replaceFn() {
  return arguments[2] + ' to ' + arguments[1]
}
fs.createReadStream(path.join(__dirname, 'happybirthday.txt'))
  .pipe(replaceStream(/(birt\w*)\sto\s(you)/g, replaceFn))
  .pipe(process.stdout);

Which would output:

$ node simple.js
Happy you to birthday!
Happy you to birthday!
Happy birthday to dear Liza!
Happy you to birthday!
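
The argument order is easier to see with named parameters instead of arguments[n]. The sketch below runs the same regex through the built-in String.prototype.replace on one line; swap and seenOffsets are names made up for this illustration.

```javascript
// Same swap as above, with the String.prototype.replace parameters named.
var seenOffsets = [];
function swap(match, p1, p2, offset, string) {
  seenOffsets.push(offset); // index in `string` where the match starts
  return p2 + ' to ' + p1;  // p1 = 'birthday', p2 = 'you'
}

var swapped = 'Happy birthday to you!'
  .replace(/(birt\w*)\sto\s(you)/g, swap);
console.log(swapped);     // Happy you to birthday!
console.log(seenOffsets); // [ 6 ]
```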

Web server search and replace over a test file

Here's the same example, but kicked off from an HTTP server:

// server.js
var http = require('http')
  , fs = require('fs')
  , path = require('path')
  , replaceStream = require('replacestream');

var app = function (req, res) {
  if (req.url.match(/^\/happybirthday\.txt$/)) {
    res.writeHead(200, { 'Content-Type': 'text/plain' });
    fs.createReadStream(path.join(__dirname, 'happybirthday.txt'))
      .pipe(replaceStream('birthday', 'earthday'))
      .pipe(res);
  } else {
    res.writeHead(404, { 'Content-Type': 'text/plain' });
    res.end('File not found');
  }
};
var server = http.createServer(app).listen(3000);

When you request the file:

$ curl -i "http://localhost:3000/happybirthday.txt"
HTTP/1.1 200 OK
Content-Type: text/plain
Date: Mon, 08 Jul 2013 06:45:21 GMT
Connection: keep-alive
Transfer-Encoding: chunked

Happy earthday to you!
Happy earthday to you!
Happy earthday to dear Liza!
Happy earthday to you!

NB: If the readable stream that you're piping through replacestream is paused, you may have to call its .resume() method.

Configuration

Changing the encoding

You can also change the text encoding of the search and replace by setting an encoding property on the options object:

// Replace the first 2 instances of 'birthday' with 'earthday', using ASCII encoding
fs.createReadStream(path.join(__dirname, 'happybirthday.txt'))
  .pipe(replaceStream('birthday', 'earthday', { limit: 2, encoding: 'ascii' } ))
  .pipe(process.stdout);

By default the encoding will be set to 'utf8'.

List of options

| Option | Default | Description |
| --- | --- | --- |
| limit | Infinity | Sets a limit on the number of times the replacement will be made. This is forced to one when a regex without the global flag is provided. |
| encoding | utf8 | The text encoding used during search and replace. |
| maxMatchLen | 100 | When doing cross-chunk replacing, this sets the maximum length match that will be supported. |
| ignoreCase | true | When doing string match (not relevant for regex matching) whether to do a case insensitive search. |
| regExpOptions | undefined | (Deprecated) When provided, these flags will be used when creating the search regexes internally. This functionality is deprecated as the flags set on the regex provided are no longer mutated if this is not provided. |
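
The table says that string searches ignore case by default and mentions that search regexes are created internally. As a rough illustration only, a plain string can be turned into a case-insensitive regex like this. escapeRegExp and toSearchRegex are hypothetical helpers, and the library's actual internals may differ.

```javascript
// Hypothetical helpers (assumptions, not replacestream's code).
// Escape regex metacharacters so the string is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a global regex from a plain string, optionally ignoring case.
function toSearchRegex(search, ignoreCase) {
  return new RegExp(escapeRegExp(search), ignoreCase ? 'gi' : 'g');
}

var insensitive = 'A.B axb a.b'.replace(toSearchRegex('a.b', true), 'X');
var sensitive = 'A.B axb a.b'.replace(toSearchRegex('a.b', false), 'X');
console.log(insensitive); // X axb X
console.log(sensitive);   // A.B axb X
```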

FAQ

What does "chunk boundary friendly" mean?

It means that a replacement should happen even if the string to be replaced is split across streaming chunks of data. For example, say I do something like this:

fs.createReadStream(path.join(__dirname, 'happybirthday.txt'))
  .pipe(replaceStream('birthday', 'earthday'))
  .pipe(process.stdout);

Here I am trying to replace all instances of 'birthday' with 'earthday'. Suppose the first chunk of data available is 'happy birth' and the second chunk is 'day'. The replacement still happens, just as it would if a single chunk contained the entire string to be replaced (e.g. chunk1 = 'happy ', chunk2 = 'birthday').

Does that apply across more than 2 chunks? How does it work with regexes?

It does apply across multiple chunks. By default, however, the maximum match length (maxMatchLen) is set to 100 characters. You can increase this by adding maxMatchLen: x to your options:

replaceStream('hi', 'bye', { maxMatchLen: 1000 })

A string of up to maxMatchLen characters is kept in memory, so it shouldn't be set too high. maxMatchLen is what makes matching across chunks possible: the last maxMatchLen characters from the previous chunks are saved and prepended to the current chunk before a match is attempted.

As for regexes, it works exactly the same, except that you pass a regular expression into replaceStream:

replaceStream(/a+/, 'b')
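
The tail-buffering idea from this answer can be sketched without any stream machinery. The code below is a simplified illustration for fixed-string searches only, not replacestream's implementation: because the search string's length is known, it holds back search.length - 1 characters instead of a configurable maxMatchLen. createChunkReplacer is a hypothetical name.

```javascript
// Simplified sketch (assumption: fixed-string search, case sensitive).
function createChunkReplacer(search, replacement) {
  if (search.length === 0) throw new Error('search must not be empty');
  var tail = '';                // unemitted text carried between chunks
  var keep = search.length - 1; // longest prefix of a match that can dangle

  return {
    // Feed one chunk; returns the text that is safe to emit now.
    write: function (chunk) {
      var buffer = tail + chunk;
      var out = '';
      var pos = 0;
      var idx;
      // Replace every complete match found in the combined buffer.
      while ((idx = buffer.indexOf(search, pos)) !== -1) {
        out += buffer.slice(pos, idx) + replacement;
        pos = idx + search.length;
      }
      // Hold back the end of the unmatched remainder: it may be the
      // start of a match that finishes in the next chunk.
      var rest = buffer.slice(pos);
      var cut = Math.max(0, rest.length - keep);
      out += rest.slice(0, cut);
      tail = rest.slice(cut);
      return out;
    },
    // Flush whatever is still held back once the input is finished.
    end: function () {
      var out = tail;
      tail = '';
      return out;
    }
  };
}

var replacer = createChunkReplacer('birthday', 'earthday');
var joined = replacer.write('happy birth') + replacer.write('day') + replacer.end();
console.log(joined); // happy earthday
```

In a real transform stream, write would correspond to the per-chunk transform step and end to the final flush.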

Contributing

replacestream is an OPEN Open Source Project. This means that:

Individuals making significant and valuable contributions are given commit-access to the project to contribute as they see fit. This project is more like an open wiki than a standard guarded open source project.

See the CONTRIBUTING.md file for more details.

Contributors

replacestream is only possible due to the excellent work of the following contributors:

Eugene Ware (GitHub/eugeneware)
Ryan Mehta (GitHub/mehtaphysical)
Tim Chaplin (GitHub/tjchaplin)
Bryce Gibson (GitHub/bryce-gibson)
Romain (GitHub/Filirom1)
Shinnosuke Watanabe (GitHub/shinnn)
Steve Mao (GitHub/stevemao)
Martin Petluš (GitHub/martinpetlus)
