PHP's strlen in JavaScript

How to use

You can install via yarn add locutus and require this function via const strlen = require('locutus/php/strings/strlen').

It is important to use a bundler that supports tree-shaking so that you only ship the functions that you actually use to your browser, instead of all of Locutus, which is massive. Examples are: Parcel, webpack, or rollup.js. For server-side use this is typically less of a concern.

Examples

Please note that these examples are distilled from test cases that automatically verify our functions still work correctly. This could explain some quirky ones.

# | code | expected result
1 | strlen('Kevin van Zonneveld') | 19
2 | ini_set('unicode.semantics', 'on') strlen('A\ud87e\udc04Z') | 3
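The second example hinges on the difference between UTF-16 code units and code points. As a rough illustration using only built-in JavaScript (not Locutus itself; the variable names are chosen for this sketch), the sample string has four code units but three code points:

```javascript
// Plain JavaScript, no Locutus required; this only illustrates the two counts.
const sample = 'A\ud87e\udc04Z' // 'A', U+2F804 (one surrogate pair), 'Z'

// String.prototype.length counts UTF-16 code units.
const codeUnits = sample.length

// The string iterator walks code points, so spreading yields one entry per character.
const codePoints = [...sample].length

console.log(codeUnits, codePoints) // 4 3
```

With unicode.semantics set to 'on', the expected result in the table corresponds to the second count.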

Notes

  • This may look like overkill, but something like this is necessary to handle all Unicode characters faithfully and to match the character-counting behavior modeled here through the unicode.semantics setting, which counts characters rather than bytes. With the setting off, the function simply returns the JavaScript string length.
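To make the bytes-versus-characters distinction in the note concrete, here is a small sketch that measures the same string three ways. It uses only standard JavaScript globals. The byte count assumes UTF-8 encoding, which is an assumption made for illustration and not something Locutus computes.

```javascript
// Plain JavaScript sketch; TextEncoder is a standard global in Node.js and browsers.
const text = 'A\ud87e\udc04Z'

// Bytes when the string is encoded as UTF-8 (1 + 4 + 1).
const utf8Bytes = new TextEncoder().encode(text).length

// UTF-16 code units, which is what str.length reports.
const utf16Units = text.length

// One entry per code point, so a surrogate pair counts once.
const characters = Array.from(text).length

console.log({ utf8Bytes, utf16Units, characters })
// { utf8Bytes: 6, utf16Units: 4, characters: 3 }
```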

Here’s what our current JavaScript equivalent to PHP's strlen looks like.

module.exports = function strlen(string) {
  // discuss at: https://locutus.io/php/strlen/
  // original by: Kevin van Zonneveld (https://kvz.io)
  // improved by: Sakimori
  // improved by: Kevin van Zonneveld (https://kvz.io)
  // input by: Kirk Strobeck
  // bugfixed by: Onno Marsman (https://twitter.com/onnomarsman)
  // revised by: Brett Zamir (https://brett-zamir.me)
  // note 1: May look like overkill, but in order to be truly faithful to handling all Unicode
  // note 1: characters and to this function in PHP which does not count the number of bytes
  // note 1: but counts the number of characters, something like this is really necessary.
  // example 1: strlen('Kevin van Zonneveld')
  // returns 1: 19
  // example 2: ini_set('unicode.semantics', 'on')
  // example 2: strlen('A\ud87e\udc04Z')
  // returns 2: 3

  const str = string + ''

  const iniVal = (typeof require !== 'undefined' ? require('../info/ini_get')('unicode.semantics') : undefined) || 'off'
  if (iniVal === 'off') {
    return str.length
  }

  let i = 0
  let lgth = 0

  const getWholeChar = function (str, i) {
    const code = str.charCodeAt(i)
    let next = ''
    let prev = ''
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate (could change last hex to 0xDB7F to
      // treat high private surrogates as single characters)
      if (str.length <= i + 1) {
        throw new Error('High surrogate without following low surrogate')
      }
      next = str.charCodeAt(i + 1)
      if (next < 0xdc00 || next > 0xdfff) {
        throw new Error('High surrogate without following low surrogate')
      }
      return str.charAt(i) + str.charAt(i + 1)
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      // Low surrogate
      if (i === 0) {
        throw new Error('Low surrogate without preceding high surrogate')
      }
      prev = str.charCodeAt(i - 1)
      if (prev < 0xd800 || prev > 0xdbff) {
        // (could change last hex to 0xDB7F to treat high private surrogates
        // as single characters)
        throw new Error('Low surrogate without preceding high surrogate')
      }
      // We can pass over low surrogates now as the second
      // component in a pair which we have already processed
      return false
    }
    return str.charAt(i)
  }

  for (i = 0, lgth = 0; i < str.length; i++) {
    if (getWholeChar(str, i) === false) {
      continue
    }
    // Adapt this line at the top of any loop, passing in the whole string and
    // the current iteration and returning a variable to represent the individual character;
    // purpose is to treat the first part of a surrogate pair as the whole character and then
    // ignore the second part
    lgth++
  }

  return lgth
}
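The surrogate handling in the listing can be condensed into a single loop. The following is a minimal sketch under that reading, not the Locutus implementation: countCharactersStrict is a hypothetical name, it ignores the unicode.semantics setting entirely, and it always counts characters while rejecting unpaired surrogates with the same error messages.

```javascript
// Hypothetical helper, not part of Locutus: counts characters and
// throws on unpaired surrogates, mirroring the checks in the listing.
function countCharactersStrict(input) {
  const str = String(input)
  let count = 0
  for (let i = 0; i < str.length; i++) {
    const code = str.charCodeAt(i)
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate: the next code unit must be a low surrogate.
      const next = str.charCodeAt(i + 1) // NaN past the end of the string
      if (!(next >= 0xdc00 && next <= 0xdfff)) {
        throw new Error('High surrogate without following low surrogate')
      }
      i++ // skip the low surrogate that completes this pair
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      // A low surrogate reached here was not consumed as part of a pair.
      throw new Error('Low surrogate without preceding high surrogate')
    }
    count++
  }
  return count
}

console.log(countCharactersStrict('Kevin van Zonneveld')) // 19
console.log(countCharactersStrict('A\ud87e\udc04Z')) // 3
```

Skipping the low surrogate inside the high-surrogate branch replaces the `return false` path of getWholeChar, so each valid pair is counted exactly once.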

A community effort

Not unlike Wikipedia, Locutus is an ongoing community effort. Our philosophy follows The McDonald’s Theory. This means that we assimilate first iterations with imperfections, hoping for others to take issue with and improve them. This unorthodox approach has worked very well to foster fun and fruitful collaboration, but please be reminded to use our creations at your own risk. THE SOFTWARE IS PROVIDED "AS IS" has never been more true than for Locutus.
