PHP's utf8_encode in JavaScript

Here’s what our current JavaScript equivalent to PHP's utf8_encode looks like.

module.exports = function utf8_encode (argString) { // eslint-disable-line camelcase
  //  discuss at: https://locutus.io/php/utf8_encode/
  // original by: Webtoolkit.info (https://www.webtoolkit.info/)
  // improved by: Kevin van Zonneveld (https://kvz.io)
  // improved by: sowberry
  // improved by: Jack
  // improved by: Yves Sucaet
  // improved by: kirilloid
  // bugfixed by: Onno Marsman (https://twitter.com/onnomarsman)
  // bugfixed by: Onno Marsman (https://twitter.com/onnomarsman)
  // bugfixed by: Ulrich
  // bugfixed by: Rafał Kukawski (https://blog.kukawski.pl)
  // bugfixed by: kirilloid
  //   example 1: utf8_encode('Kevin van Zonneveld')
  //   returns 1: 'Kevin van Zonneveld'

  if (argString === null || typeof argString === 'undefined') {
    return ''
  }

  // .replace(/\r\n/g, "\n").replace(/\r/g, "\n");
  const string = (argString + '')
  let utftext = ''
  let start
  let end
  let stringl = 0

  start = end = 0
  stringl = string.length
  for (let n = 0; n < stringl; n++) {
    let c1 = string.charCodeAt(n)
    let enc = null

    if (c1 < 128) {
      end++
    } else if (c1 > 127 && c1 < 2048) {
      enc = String.fromCharCode(
        (c1 >> 6) | 192, (c1 & 63) | 128
      )
    } else if ((c1 & 0xF800) !== 0xD800) {
      enc = String.fromCharCode(
        (c1 >> 12) | 224, ((c1 >> 6) & 63) | 128, (c1 & 63) | 128
      )
    } else {
      // surrogate pairs
      if ((c1 & 0xFC00) !== 0xD800) {
        throw new RangeError('Unmatched trail surrogate at ' + n)
      }
      const c2 = string.charCodeAt(++n)
      if ((c2 & 0xFC00) !== 0xDC00) {
        throw new RangeError('Unmatched lead surrogate at ' + (n - 1))
      }
      c1 = ((c1 & 0x3FF) << 10) + (c2 & 0x3FF) + 0x10000
      enc = String.fromCharCode(
        (c1 >> 18) | 240, ((c1 >> 12) & 63) | 128, ((c1 >> 6) & 63) | 128, (c1 & 63) | 128
      )
    }
    if (enc !== null) {
      if (end > start) {
        utftext += string.slice(start, end)
      }
      utftext += enc
      start = end = n + 1
    }
  }

  if (end > start) {
    utftext += string.slice(start, stringl)
  }

  return utftext
}
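
The branches in the loop correspond to the 1-, 2-, 3- and 4-byte forms of UTF-8. As a rough sketch of that mapping, the snippet below uses a hypothetical helper, utf8ByteLength, which is not part of Locutus, and then works the 2-byte case by hand for U+00E9 with the same shifts and masks as the function above.

```javascript
// Hypothetical helper (not part of Locutus): number of UTF-8 bytes needed
// for one Unicode code point, mirroring the branch thresholds above.
function utf8ByteLength (codePoint) {
  if (codePoint < 0x80) return 1 // ASCII, copied through unchanged
  if (codePoint < 0x800) return 2 // lead byte 110xxxxx plus 1 continuation byte
  if (codePoint < 0x10000) return 3 // lead byte 1110xxxx plus 2 continuation bytes
  return 4 // lead byte 11110xxx plus 3 continuation bytes (surrogate pair input)
}

// Two-byte case worked by hand for U+00E9 (e with acute accent).
const codePoint = '\u00E9'.codePointAt(0) // 0xE9
const leadByte = (codePoint >> 6) | 0xC0 // top bits go into the lead byte: 0xC3
const continuationByte = (codePoint & 0x3F) | 0x80 // low 6 bits: 0xA9

console.log(utf8ByteLength(codePoint), leadByte.toString(16), continuationByte.toString(16))
```

The constants 192, 224 and 240 in the function are the decimal forms of the lead-byte prefixes 0xC0, 0xE0 and 0xF0, and 128 (0x80) marks a continuation byte.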

How to use

You can install via npm install locutus and require it via require('locutus/php/xml/utf8_encode'). You could also require the xml module in full so that you could access xml.utf8_encode instead.

If you intend to target the browser, you can then use a module bundler such as Parcel, webpack, Browserify, or rollup.js. This can be important because Locutus allows modern JavaScript in the source files, meaning it may not work in all browsers without a build/transpile step. Locutus does transpile all functions to ES5 before publishing to npm.

A community effort

Not unlike Wikipedia, Locutus is an ongoing community effort. Our philosophy follows The McDonald’s Theory. This means that we don't consider it to be a bad thing that many of our functions are first iterations, which may still have their fair share of issues. We hope that these flaws will inspire others to come up with better ideas.

This way of working also means that we don't offer any production guarantees, and we recommend using Locutus for inspiration and learning purposes only.

Examples

Please note that these examples are distilled from test cases that automatically verify our functions still work correctly. This could explain some quirky ones.

# | code | expected result
1 | utf8_encode('Kevin van Zonneveld') | 'Kevin van Zonneveld'
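
The example above only covers ASCII input, which passes through unchanged. To illustrate what non-ASCII input looks like, here is a minimal sketch that builds the same kind of result (one character per UTF-8 byte) with the built-in TextEncoder. The helper name utf8EncodeReference is hypothetical and not part of Locutus, and the sketch assumes a runtime where TextEncoder is available as a global. One known difference: for lone surrogates TextEncoder substitutes U+FFFD, whereas the function above throws a RangeError.

```javascript
// Hypothetical reference helper (not part of Locutus): returns a string
// with one character per UTF-8 byte, built with the standard TextEncoder.
function utf8EncodeReference (str) {
  const bytes = new TextEncoder().encode(String(str))
  let out = ''
  for (const byte of bytes) {
    out += String.fromCharCode(byte)
  }
  return out
}

const ascii = utf8EncodeReference('Kevin van Zonneveld') // ASCII is unchanged
const accented = utf8EncodeReference('\u00E9') // U+00E9 becomes bytes C3 A9
const euro = utf8EncodeReference('\u20AC') // U+20AC becomes bytes E2 82 AC
const emoji = utf8EncodeReference('\u{1F600}') // U+1F600 becomes bytes F0 9F 98 80

console.log(ascii, accented.length, euro.length, emoji.length)
```

For well-formed strings, a helper like this could serve as a cross-check when testing the function above.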
