PHP's is_unicode in JavaScript

Here’s what our current JavaScript equivalent to PHP's is_unicode looks like.

module.exports = function is_unicode (vr) { // eslint-disable-line camelcase
  //  discuss at: http://locutus.io/php/is_unicode/
  // original by: Brett Zamir (http://brett-zamir.me)
  //      note 1: Almost all strings in JavaScript should be Unicode
  //   example 1: is_unicode('We the peoples of the United Nations...!')
  //   returns 1: true

  if (typeof vr !== 'string') {
    return false
  }

  // If surrogates occur outside of high-low pairs, then this is not Unicode
  var arr = []
  var highSurrogate = '[\uD800-\uDBFF]'
  var lowSurrogate = '[\uDC00-\uDFFF]'
  var highSurrogateBeforeAny = new RegExp(highSurrogate + '([\\s\\S])', 'g')
  var lowSurrogateAfterAny = new RegExp('([\\s\\S])' + lowSurrogate, 'g')
  var singleLowSurrogate = new RegExp('^' + lowSurrogate + '$')
  var singleHighSurrogate = new RegExp('^' + highSurrogate + '$')

  while ((arr = highSurrogateBeforeAny.exec(vr)) !== null) {
    if (!arr[1] || !arr[1].match(singleLowSurrogate)) {
      // If high not followed by low surrogate
      return false
    }
  }
  while ((arr = lowSurrogateAfterAny.exec(vr)) !== null) {
    if (!arr[1] || !arr[1].match(singleHighSurrogate)) {
      // If low not preceded by high surrogate
      return false
    }
  }

  return true
}
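
The listing above treats a string as Unicode only when every surrogate sits in a high-low pair. As written, both patterns require a neighbouring character, so a lone high surrogate at the very end of the string, or a lone low surrogate at the very start, does not appear to be matched by either loop. The sketch below is one way to express the same rule by scanning UTF-16 code units directly. The name hasOnlyPairedSurrogates is hypothetical and not part of Locutus; this is an illustration under those assumptions, not a replacement for the library source.

```javascript
// Hypothetical helper, not part of Locutus: returns true when the string
// contains no surrogate code unit outside a high-low pair.
function hasOnlyPairedSurrogates (str) {
  if (typeof str !== 'string') {
    return false
  }
  for (var i = 0; i < str.length; i++) {
    var code = str.charCodeAt(i)
    if (code >= 0xD800 && code <= 0xDBFF) {
      // High surrogate: the next code unit must be a low surrogate.
      // charCodeAt returns NaN past the end, which fails the range test.
      var next = str.charCodeAt(i + 1)
      if (!(next >= 0xDC00 && next <= 0xDFFF)) {
        return false
      }
      i++ // skip the low surrogate that completes the pair
    } else if (code >= 0xDC00 && code <= 0xDFFF) {
      // Low surrogate that was not consumed as the second half of a pair
      return false
    }
  }
  return true
}

console.log(hasOnlyPairedSurrogates('We the peoples of the United Nations...!')) // true
console.log(hasOnlyPairedSurrogates('\uD83D\uDE00')) // true (a valid pair)
console.log(hasOnlyPairedSurrogates('abc\uD800')) // false (trailing lone high surrogate)
console.log(hasOnlyPairedSurrogates('\uDC00abc')) // false (leading lone low surrogate)
```

On runtimes that implement ES2024, the built-in String.prototype.isWellFormed() performs the same kind of lone-surrogate check for string values.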

How to use

You can install via npm install locutus and require it via require('locutus/php/var/is_unicode'). You could also require the var module in full so that you could access var.is_unicode instead.
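
As a rough illustration of the single-function require, the sketch below loads is_unicode using the path given above and calls it once with the documented example and once with a non-string. The try/catch guard and the local name isUnicode are assumptions added for this example so that it still runs, with a hint, when the package has not been installed.

```javascript
// Assumes `npm install locutus` has already been run in this project.
var isUnicode
try {
  // Module path taken from the text above
  isUnicode = require('locutus/php/var/is_unicode')
} catch (err) {
  // Package not available: leave isUnicode unset and report it below
  isUnicode = null
}

if (isUnicode) {
  // The documented example, expected to report true
  console.log(isUnicode('We the peoples of the United Nations...!'))
  // Non-string input takes the early `typeof` exit and reports false
  console.log(isUnicode(42))
} else {
  console.log('locutus is not installed; run `npm install locutus` first')
}
```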

If you intend to target the browser, you can then use a module bundler such as Browserify, webpack or rollup.js.

ES5/ES6

This function targets ES5, but as of Locutus 2.0.2 we also support ES6 functions. Locutus transpiles to ES5 before publishing to npm.

A community effort

Not unlike Wikipedia, Locutus is an ongoing community effort. Our philosophy follows The McDonald’s Theory. This means that we don't consider it to be a bad thing that many of our functions are first iterations, which may still have their fair share of issues. We hope that these flaws will inspire others to come up with better ideas.

This way of working also means that we don't offer any production guarantees, and we recommend using Locutus for inspiration and learning purposes only.

Notes

  • Almost all strings in JavaScript should be Unicode

Examples

Please note that these examples are distilled from test cases that automatically verify our functions still work correctly. This could explain some quirky ones.

# | code | expected result
1 | is_unicode('We the peoples of the United Nations...!') | true
