Parse HTML character references.
- What is this?
- When should I use this?
- Install
- Use
- API
- Types
- Compatibility
- Security
- Related
- Contribute
- License
This is a small and powerful decoder of HTML character references (often called entities).
You can use this for spec-compliant decoding of character references. It’s small and fast enough to do that well. You can also use this when making a linter, because it emits distinct warnings, each with a reason explaining why and positional info on where it happened.
This package is ESM only. In Node.js (version 14.14+, 16.0+), install with npm:
npm install parse-entities
In Deno with esm.sh:
import {parseEntities} from 'https://esm.sh/parse-entities@3'
In browsers with esm.sh:
<script type="module">
import {parseEntities} from 'https://esm.sh/parse-entities@3?bundle'
</script>
import {parseEntities} from 'parse-entities'
console.log(parseEntities('alpha &amp bravo'))
// => alpha & bravo
console.log(parseEntities('charlie &copycat; delta'))
// => charlie ©cat; delta
console.log(parseEntities('echo &copy; foxtrot &#8800; golf &#x1D306; hotel'))
// => echo © foxtrot ≠ golf 𝌆 hotel
This package exports the identifier `parseEntities`. There is no default export.
`parseEntities(value[, options])`: parse HTML character references in `value`.
`options`: configuration (optional).
`options.additional`: additional character to accept (`string?`, default: `''`). This allows the given character to follow an ampersand without an error.
`options.attribute`: whether to parse `value` as an attribute value (`boolean?`, default: `false`). This results in slightly different behavior.
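One concrete difference comes from the HTML spec: inside an attribute value, a named reference that is not terminated by a semicolon is kept as literal text when the character right after the matched name is `=` or an ASCII alphanumeric. The function `keepsUnterminatedInAttribute` below is a hypothetical, minimal sketch of that single rule; it is not part of the parse-entities API, and the library may differ in details such as which warnings it emits.

```javascript
// Hypothetical helper sketching one rule of the HTML spec's named character
// reference state: inside an attribute value, an unterminated named reference
// followed by `=` or an ASCII alphanumeric is kept as literal text.
function keepsUnterminatedInAttribute(nextCharacter) {
  // End of the value: nothing follows, so the rule does not apply.
  if (nextCharacter === undefined || nextCharacter === '') return false
  return nextCharacter === '=' || /^[0-9A-Za-z]$/.test(nextCharacter)
}

// `title="a&amp=b"`: `=` follows `&amp`, so it stays `&amp`.
console.log(keepsUnterminatedInAttribute('=')) // true
// `title="a&amp b"`: a space follows, so `&amp` is decoded (with a parse error).
console.log(keepsUnterminatedInAttribute(' ')) // false
```

This is why URLs in attributes such as `href="?a=1&copy=2"` keep `&copy` intact under spec-compliant parsing.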
`options.nonTerminated`: whether to allow nonterminated references (`boolean`, default: `true`). For example, `&copycat` for `©cat`. This behavior is compliant with the spec but can lead to unexpected results.
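The `&copycat` case arises because the HTML spec lets a handful of legacy named references, such as `copy`, appear without a semicolon, and the decoder consumes the longest such name that prefixes the characters after the ampersand. The sketch below assumes a hypothetical four-entry table (`legacy`) and an illustrative function name (`decodeNonTerminated`); the real list of legacy references lives in wooorm/character-entities-legacy, and parse-entities' own matching may differ in detail.

```javascript
// Hypothetical subset of the legacy named references that may omit `;`.
// The real list is much longer (see wooorm/character-entities-legacy).
const legacy = {amp: '&', copy: '©', lt: '<', gt: '>'}

// Decode the characters after an ampersand by taking the longest legacy name
// that is a prefix and keeping the rest as text. Returns null if none match.
// Assumes `characters` is not itself a complete, known reference name.
function decodeNonTerminated(characters) {
  let best = ''
  for (const name of Object.keys(legacy)) {
    if (characters.startsWith(name) && name.length > best.length) best = name
  }
  if (!best) return null
  return legacy[best] + characters.slice(best.length)
}

console.log(decodeNonTerminated('copycat')) // ©cat
```

Setting `nonTerminated: false` is the way to opt out of this prefix behavior when it surprises users.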
`options.position`: starting position of `value` (`Position` or `Point`, optional). Useful when dealing with values nested in some sort of syntax tree. The default is `{line: 1, column: 1, offset: 0}`.
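When `value` is nested inside a larger document, its starting point can be computed by walking the text that precedes it. Below is a minimal sketch with a hypothetical `advancePoint` helper; it assumes columns and offsets count UTF-16 code units and that only `\n` ends a line, which may not match every tool's conventions.

```javascript
// Sketch: move a unist-style point ({line, column, offset}) past some text.
// Assumes columns and offsets count UTF-16 code units and only `\n` ends a line.
function advancePoint(point, text) {
  let {line, column, offset} = point
  for (let index = 0; index < text.length; index++) {
    if (text.charCodeAt(index) === 10 /* `\n` */) {
      line++
      column = 1
    } else {
      column++
    }
    offset++
  }
  return {line, column, offset}
}

// A value that starts right after `<p title="` on line 2 of a hypothetical document.
const documentStart = {line: 1, column: 1, offset: 0}
const start = advancePoint(documentStart, '<!doctype html>\n<p title="')
console.log(start) // { line: 2, column: 11, offset: 26 }
```

The resulting point could then be passed as `position`, so that reported points refer to the whole document rather than to `value` alone.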
`options.warning`: error handler (`Function?`).
`options.text`: text handler (`Function?`).
`options.reference`: reference handler (`Function?`).
`options.warningContext`: context used when calling `warning` (`*`, optional).
`options.textContext`: context used when calling `text` (`*`, optional).
`options.referenceContext`: context used when calling `reference` (`*`, optional).
Returns `string`: the decoded value.
`function warning(reason, point, code)`
Error handler.
- `this` (`*`): refers to `warningContext` when given to `parseEntities`
- `reason` (`string`): human-readable reason for emitting a parse error
- `point` (`Point`): place where the error occurred
- `code` (`number`): machine-readable code of the error
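The documented signature leaves formatting up to you. As a sketch, the handler below turns each warning into a `file:line:column` message stored on a context object; `report` and its `file` field are hypothetical, and the call at the end is hand-written with a made-up reason rather than produced by parse-entities.

```javascript
// Hypothetical context object; pass it as `warningContext` so `this` refers to it.
const report = {file: 'example.html', messages: []}

// Follows the documented signature: warning(reason, point, code).
// A regular function (not an arrow function) so that `this` can be bound.
function warning(reason, point, code) {
  this.messages.push(
    `${this.file}:${point.line}:${point.column}: ${reason} (code ${code})`
  )
}

// Hand-written call standing in for what parseEntities would do.
warning.call(report, 'example reason', {line: 1, column: 5, offset: 4}, 1)
console.log(report.messages[0]) // example.html:1:5: example reason (code 1)
```

With parse-entities, the equivalent wiring would be `parseEntities(value, {warning, warningContext: report})`.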
The following codes are used:

| Code | Example | Note |
| --- | --- | --- |
| 1 | `foo &amp bar` | Missing semicolon (named) |
| 2 | `foo &#123 bar` | Missing semicolon (numeric) |
| 3 | `Foo &bar baz` | Empty (named) |
| 4 | `Foo &#` | Empty (numeric) |
| 5 | `Foo &bar; baz` | Unknown (named) |
| 6 | `Foo &#128; baz` | Disallowed reference |
| 7 | `Foo &#xD800; baz` | Prohibited: outside permissible Unicode range |
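Codes 6 and 7 concern numeric references. The sketch below shows one way a numeric reference could be turned into output plus a code from the table: it parses decimal or `x`-prefixed hexadecimal digits, replaces surrogates and values above U+10FFFF with U+FFFD, and remaps disallowed values using a hypothetical two-entry subset (`remapped`) of the mapping in wooorm/character-reference-invalid. `decodeNumeric` is an illustrative name, and other checks the spec performs (null, noncharacters, controls) are omitted.

```javascript
// Hypothetical two-entry subset of the remapping in
// wooorm/character-reference-invalid (the real table is larger).
const remapped = {0x80: '€', 0x99: '™'}

// Illustrative name: decode the text after `&#` (without `;`), such as `128`
// or `x1D306`, into output plus a warning code from the table (or null).
function decodeNumeric(digits) {
  const hex = digits[0] === 'x' || digits[0] === 'X'
  const codePoint = parseInt(hex ? digits.slice(1) : digits, hex ? 16 : 10)

  // No digits at all: an empty numeric reference (code 4).
  if (Number.isNaN(codePoint)) return {value: null, code: 4}

  // Surrogates and values above U+10FFFF become U+FFFD (code 7).
  if ((codePoint >= 0xd800 && codePoint <= 0xdfff) || codePoint > 0x10ffff) {
    return {value: '\uFFFD', code: 7}
  }

  // Disallowed references that HTML remaps, e.g. 0x80 to the euro sign (code 6).
  if (codePoint in remapped) return {value: remapped[codePoint], code: 6}

  // Everything else is emitted as-is in this simplified sketch.
  return {value: String.fromCodePoint(codePoint), code: null}
}

console.log(decodeNumeric('128')) // { value: '€', code: 6 }
console.log(decodeNumeric('x1D306')) // { value: '𝌆', code: null }
```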
`function text(value, position)`
Text handler.
- `this` (`*`): refers to `textContext` when given to `parseEntities`
- `value` (`string`): string of content
- `position` (`Position`): place where `value` starts and ends
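To illustrate what `position` covers, the sketch below uses a hypothetical context object holding the original string (it would be passed as `textContext`) and checks that slicing that string between the start and end offsets yields `value`, which one would expect for plain text since it is not decoded. The call at the end is hand-written, not output from parse-entities.

```javascript
// Hypothetical context object, meant to be passed as `textContext`: it keeps
// the original string so the handler can compare positions against it.
const context = {source: 'alpha &amp bravo', spans: []}

// Follows the documented signature: text(value, position). A regular function
// so that `this` can refer to the context.
function text(value, position) {
  // Slice the original string between the reported start and end offsets.
  const raw = this.source.slice(position.start.offset, position.end.offset)
  this.spans.push({value, matchesSource: raw === value})
}

// Hand-written call for the text before the reference.
text.call(context, 'alpha ', {
  start: {line: 1, column: 1, offset: 0},
  end: {line: 1, column: 7, offset: 6}
})
console.log(context.spans) // [ { value: 'alpha ', matchesSource: true } ]
```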
`function reference(value, position, source)`
Character reference handler.
- `this` (`*`): refers to `referenceContext` when given to `parseEntities`
- `value` (`string`): decoded character reference
- `position` (`Position`): place where `source` starts and ends
- `source` (`string`): raw source of the character reference
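Because the handler receives both the decoded `value` and the raw `source`, a linter could use it to flag references that need not be escaped. The rule below (printable ASCII other than `&`, `<`, `>`, `"`, and `'`) is a hypothetical example, as are the name `findings` and the hand-written calls; the exact form of `source` (for instance, whether it includes the ampersand and semicolon) is assumed here.

```javascript
// Hypothetical linter-style collection of references whose decoded value is
// printable ASCII that is not markup-significant, so it could be written literally.
const findings = []

// Follows the documented signature: reference(value, position, source).
function reference(value, position, source) {
  const printableAscii = /^[\x20-\x7E]+$/.test(value)
  const markupSignificant = /[&<>"']/.test(value)
  if (printableAscii && !markupSignificant) {
    findings.push({
      source,
      value,
      line: position.start.line,
      column: position.start.column
    })
  }
}

// Hand-written calls standing in for what parseEntities would do.
reference('{', {
  start: {line: 1, column: 5, offset: 4},
  end: {line: 1, column: 11, offset: 10}
}, '&#123;')
reference('©', {
  start: {line: 1, column: 20, offset: 19},
  end: {line: 1, column: 26, offset: 25}
}, '&copy;')
console.log(findings) // only the `&#123;` reference is reported
```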
This package is fully typed with TypeScript. It exports the additional types `Options`, `WarningHandler`, `ReferenceHandler`, and `TextHandler`.
This package is at least compatible with all maintained versions of Node.js. As of now, that is Node.js 14.14+ and 16.0+. It also works in Deno and modern browsers.
This package is safe: it follows the HTML spec when parsing character references.
- wooorm/stringify-entities: encode HTML character references
- wooorm/character-entities: info on character references
- wooorm/character-entities-html4: info on HTML4 character references
- wooorm/character-entities-legacy: info on legacy character references
- wooorm/character-reference-invalid: info on invalid numeric character references
Yes please! See How to Contribute to Open Source.