A JS implementation of the Unicode Line Breaking Algorithm (UAX #14)

WhimsicalCode/linebreak

linebreak

An implementation of the Unicode Line Breaking Algorithm (UAX #14)

Line breaking, also known as word wrapping, is the process of breaking a section of text into lines such that it will fit in the available width of a page, window or other display area. The Unicode Line Breaking Algorithm performs part of this process. Given an input text, it produces a set of positions called "break opportunities" that are appropriate points to begin a new line. The selection of actual line break positions from the set of break opportunities is not covered by the Unicode Line Breaking Algorithm, but is in the domain of higher-level software with knowledge of the available width and the display size of the text.
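The higher-level step can be illustrated with a minimal greedy (first-fit) wrapper that picks actual line ends from a list of break opportunities. This is a sketch under stated assumptions and is not part of this library. `wrapText` is a hypothetical name. Break opportunities are assumed to be `{ position, required }` objects sorted by position, matching the `bk.position` and `bk.required` fields used in this README's example. Width is measured in UTF-16 code units by default, which is only a crude stand-in for real text measurement. You can supply a better measure through the `measure` parameter.

```javascript
// Hypothetical helper for illustration; not part of the linebreak package.
// Greedy (first-fit) wrapping: walks the break opportunities in order and,
// when the text up to an opportunity no longer fits, ends the line at the
// previous opportunity. A segment wider than maxWidth ends up on its own line.
function wrapText(text, breaks, maxWidth, measure = (s) => s.length) {
  const lines = [];
  let lineStart = 0; // index where the current line begins
  let lastFit = -1; // most recent optional opportunity on this line (-1 if none)

  for (const { position, required } of breaks) {
    // Trailing whitespace does not count toward the width of a line.
    const candidate = text.slice(lineStart, position).trimEnd();
    if (measure(candidate) > maxWidth && lastFit > lineStart) {
      // Overflow: end the line at the previous opportunity.
      lines.push(text.slice(lineStart, lastFit).trimEnd());
      lineStart = lastFit;
    }
    if (required) {
      // Mandatory break (e.g. after a newline): always end the line here.
      lines.push(text.slice(lineStart, position).trimEnd());
      lineStart = position;
      lastFit = -1;
    } else {
      lastFit = position;
    }
  }
  // Any text after the final opportunity forms the last line.
  if (lineStart < text.length) {
    lines.push(text.slice(lineStart).trimEnd());
  }
  return lines;
}
```

For example, with opportunities at positions 4, 10, 16 and 19 of `'the quick brown fox'` and a `maxWidth` of 10, this returns `['the quick', 'brown fox']`. Greedy first-fit is the simplest strategy. Other line-fitting approaches give up some speed to produce more even line lengths.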

This is a JavaScript implementation of the Unicode Line Breaking Algorithm for Node.js (and, presumably, browsers). It currently supports Unicode version 13. PDFKit uses it to wrap text in PDF documents, but since the algorithm knows nothing about the visual appearance or layout of text, it can be used for other purposes as well.

Installation

You can install it via npm:

npm install linebreak

Example

const LineBreaker = require('linebreak');

const lorem = 'lorem ipsum...';
const breaker = new LineBreaker(lorem);
let last = 0;
let bk;

// nextBreak() returns the next break opportunity, or null once the text is exhausted
while ((bk = breaker.nextBreak())) {
  // get the string between the last break and this one
  const word = lorem.slice(last, bk.position);
  console.log(word);

  // you can also check bk.required to see if this was a required break
  // (for example, after a newline)
  if (bk.required) {
    console.log('\n\n');
  }

  last = bk.position;
}
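If you would rather collect every segment at once than handle them in a loop, you can wrap the loop from the example above in a small function. `collectSegments` is a hypothetical helper and is not part of the package. It relies only on what the example already uses: `nextBreak()` returns objects with `position` and `required`, and returns a falsy value once the text is exhausted. Because the breaker is passed in as a parameter, you can test the helper with a stub.

```javascript
// Hypothetical helper; not part of the linebreak package.
// Drains a breaker (any object whose nextBreak() returns { position, required }
// objects, then a falsy value) and returns the text between consecutive breaks.
function collectSegments(text, breaker) {
  const segments = [];
  let last = 0;
  let bk;
  while ((bk = breaker.nextBreak())) {
    segments.push({ text: text.slice(last, bk.position), required: bk.required });
    last = bk.position;
  }
  return segments;
}

// With the real package (assumes it has been installed via npm):
//   const LineBreaker = require('linebreak');
//   const text = 'Hello world';
//   console.log(collectSegments(text, new LineBreaker(text)));
```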

Development Notes

In order to use the library, you shouldn't need to know this, but if you're interested in contributing or fixing bugs, these things might be of interest.

  • The src/classes.js file is generated automatically by src/generate_data.js from LineBreak.txt in the Unicode database. You should rarely need to run the generator, but you might, for instance, if you want to change the Unicode version.

  • You can run the tests using npm test. They are written using mocha and generated from LineBreakTest.txt in the Unicode database. That file is included in the repository so the tests run faster. About 50 of the more than 7,600 tests are currently skipped due to implementation differences. Some of the tests appear to be wrong, or to use a different tailoring from the spec.
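Contributors who want a rough idea of what the data generator has to do can look at the sketch below, which reads LineBreak.txt into a lookup table. This is not the code in src/generate_data.js, and `parseLineBreakData` and `lookupClass` are hypothetical names. The sketch stores a plain sorted array of code point ranges and searches it with binary search. This approach is simple, but a real generator may use a more compact structure. It also ignores the range-specific defaults described in the data file's header comments and returns XX for any code point that is not listed.

```javascript
// Hypothetical sketch; not the repository's src/generate_data.js.
// Parses lines in the LineBreak.txt format, e.g.
//   0000..0008;CM     # Cc     [9] <control-0000>..<control-0008>
//   0009;BA           # Cc         <control-0009>
// into an array of { start, end, cls } ranges sorted by start.
function parseLineBreakData(source) {
  const ranges = [];
  for (const rawLine of source.split('\n')) {
    const line = rawLine.split('#')[0].trim(); // drop comments and '\r'
    if (line === '') continue;
    const [codePoints, cls] = line.split(';').map((s) => s.trim());
    const [startHex, endHex = startHex] = codePoints.split('..');
    ranges.push({ start: parseInt(startHex, 16), end: parseInt(endHex, 16), cls });
  }
  ranges.sort((a, b) => a.start - b.start);
  return ranges;
}

// Binary search over the sorted ranges. Unlisted code points get `fallback`
// (XX by default); the real file's per-range defaults are not handled here.
function lookupClass(ranges, codePoint, fallback = 'XX') {
  let lo = 0;
  let hi = ranges.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const r = ranges[mid];
    if (codePoint < r.start) hi = mid - 1;
    else if (codePoint > r.end) lo = mid + 1;
    else return r.cls;
  }
  return fallback;
}
```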
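Each test case in LineBreakTest.txt is a single line of break markers and hex code points. The hypothetical sketch below is not the repository's test generator. It turns one such line into the input string and the expected break positions. The positions are UTF-16 string indices, the same units `String.prototype.slice` uses, so they can be compared directly with the `position` values of the breaks the library reports. `parseTestLine` is an illustrative name.

```javascript
// Hypothetical sketch; not the repository's actual test generator.
// A LineBreakTest.txt data line alternates break markers and hex code points,
// followed by a comment, e.g.
//   × 0023 × 0020 ÷ 0023 ÷	#  × [0.3] NUMBER SIGN (AL) ...
// "÷" (U+00F7) marks a break opportunity; "×" (U+00D7) marks no break.
const BREAK = '\u00F7';

// Returns { text, breaks } where breaks are UTF-16 indices into text,
// or null for blank and comment-only lines.
function parseTestLine(line) {
  const data = line.split('#')[0].trim(); // drop the trailing comment
  if (data === '') return null;
  const tokens = data.split(/\s+/);
  let text = '';
  const breaks = [];
  // tokens[0] is the marker before the first character; it is always "×"
  // because UAX #14 never breaks at the start of text, so it is skipped.
  for (let i = 1; i < tokens.length; i += 2) {
    text += String.fromCodePoint(parseInt(tokens[i], 16));
    // The marker after a code point says whether a break follows it.
    if (tokens[i + 1] === BREAK) breaks.push(text.length);
  }
  return { text, breaks };
}
```

Code points outside the Basic Multilingual Plane take two UTF-16 units. For example, a line containing only `1F600` followed by `÷` gives a one-character string with an expected break at index 2.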

License

MIT
