PHP's strlen in TypeScript

Rosetta Stone: c/strlen · ruby/length · perl/length · lua/len · r/nchar · elixir/length · awk/length · tcl/length · powershell/length · rust/len

How to use

Install via yarn add locutus and import: import { strlen } from 'locutus/php/strings/strlen'.

Or with CommonJS: const { strlen } = require('locutus/php/strings/strlen')

Use a bundler that supports tree-shaking so you only ship the functions you actually use. Vite, webpack, Rollup, and Parcel all handle this. For server-side use this is less of a concern.

Examples

These examples are extracted from test cases that automatically verify our functions against their native counterparts.

#	code	expected result
1	`strlen('Kevin van Zonneveld')`	`19`
2	`ini_set('unicode.semantics', 'on'); strlen('A\ud87e\udc04Z')`	`3`
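
To see why example 2 yields 3 rather than 4, it may help to look at the UTF-16 code units in that string. The sketch below is illustrative only: `decodeSurrogatePair`, `exampleTwo`, `unitCount` and `middleCodePoint` are hypothetical names and not part of Locutus.

```typescript
// Combine a high and a low surrogate into the code point they encode.
function decodeSurrogatePair(high: number, low: number): number {
  return 0x10000 + ((high - 0xd800) << 10) + (low - 0xdc00)
}

// The string from example 2.
const exampleTwo = 'A\ud87e\udc04Z'

// .length counts UTF-16 code units: A, high surrogate, low surrogate, Z.
const unitCount = exampleTwo.length

// The two middle units together encode one supplementary character.
const middleCodePoint = decodeSurrogatePair(exampleTwo.charCodeAt(1), exampleTwo.charCodeAt(2))

console.log(unitCount) // 4
console.log(middleCodePoint.toString(16)) // "2f804"
```

The string therefore holds three characters (A, U+2F804, Z) stored in four code units, which is the difference the `unicode.semantics` setting exposes.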

Notes

This may look like overkill, but it is necessary in order to handle all Unicode characters faithfully and to match the PHP behavior this function targets, which (with unicode.semantics on) counts characters rather than bytes.
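
The distinction between bytes, code units and characters can be made concrete with a small sketch. It assumes UTF-8 as the byte encoding; `utf8ByteLength` and `codePointLength` are hypothetical helpers written for this illustration, and `codePointLength` is a simplified counter that, unlike the function on this page, does not validate surrogate pairs.

```typescript
// Count the UTF-8 bytes needed for a string by walking its UTF-16 code units.
function utf8ByteLength(text: string): number {
  let bytes = 0
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i)
    if (unit < 0x80) {
      bytes += 1
    } else if (unit < 0x800) {
      bytes += 2
    } else if (unit >= 0xd800 && unit <= 0xdbff && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1)
      if (next >= 0xdc00 && next <= 0xdfff) {
        bytes += 4 // a valid surrogate pair encodes one 4-byte character
        i++ // consume the low surrogate as well
      } else {
        bytes += 3 // lone high surrogate, sized like a 3-byte replacement
      }
    } else {
      bytes += 3
    }
  }
  return bytes
}

// Count characters (code points), assuming every surrogate pair is well formed.
function codePointLength(text: string): number {
  let count = 0
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i)
    // A low surrogate is the second half of a pair that was already counted.
    if (unit < 0xdc00 || unit > 0xdfff) {
      count++
    }
  }
  return count
}

const measured = 'A\ud87e\udc04Z'
console.log(utf8ByteLength(measured)) // 6
console.log(measured.length) // 4
console.log(codePointLength(measured)) // 3
```

For text made up only of ASCII characters, such as 'Kevin van Zonneveld', all three measures agree.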

Dependencies

This function uses the following Locutus functions:

ini_get (php/info)
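
As a rough illustration of how this dependency steers the result, the sketch below swaps the Locutus runtime state for a plain in-memory object. Everything in it (`iniStore`, `iniSetSketch`, `iniGetSketch`, `strlenSketch`) is a hypothetical stand-in rather than the library's API, and the character counting is simplified to skip pair validation.

```typescript
// Hypothetical in-memory stand-in for the ini store; not the Locutus runtime.
const iniStore: { [key: string]: string | undefined } = {}

function iniSetSketch(key: string, value: string): void {
  iniStore[key] = value
}

// Like ini_get on this page, an unset key yields the empty string.
function iniGetSketch(key: string): string {
  const value = iniStore[key]
  return value === undefined ? '' : value
}

function strlenSketch(text: string): number {
  // An empty result falls back to 'off', mirroring the function on this page.
  const mode = iniGetSketch('unicode.semantics') || 'off'
  if (mode === 'off') {
    return text.length // UTF-16 code units
  }
  // Simplified character count: every unit except low surrogates.
  let count = 0
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i)
    if (unit < 0xdc00 || unit > 0xdfff) {
      count++
    }
  }
  return count
}

const lengthWhenOff = strlenSketch('A\ud87e\udc04Z')
iniSetSketch('unicode.semantics', 'on')
const lengthWhenOn = strlenSketch('A\ud87e\udc04Z')

console.log(lengthWhenOff) // 4
console.log(lengthWhenOn) // 3
```

Because the setting is read on every call, the same string can report different lengths before and after it is changed.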

Here's what our current TypeScript equivalent to PHP's strlen looks like.

import { ini_get } from '../info/ini_get.ts'

export function strlen(string?: string): number {
  //  discuss at: https://locutus.io/php/strlen/
  // original by: Kevin van Zonneveld (https://kvz.io)
  // improved by: Sakimori
  // improved by: Kevin van Zonneveld (https://kvz.io)
  //    input by: Kirk Strobeck
  // bugfixed by: Onno Marsman (https://twitter.com/onnomarsman)
  //  revised by: Brett Zamir (https://brett-zamir.me)
  //      note 1: May look like overkill, but in order to be truly faithful to handling all Unicode
  //      note 1: characters and to this function in PHP which does not count the number of bytes
  //      note 1: but counts the number of characters, something like this is really necessary.
  //   example 1: strlen('Kevin van Zonneveld')
  //   returns 1: 19
  //   example 2: ini_set('unicode.semantics', 'on')
  //   example 2: strlen('A\ud87e\udc04Z')
  //   returns 2: 3

  if (typeof string === 'undefined') {
    throw new Error('strlen() expects exactly 1 argument, 0 given')
  }

  const str = string + ''

  const iniVal = ini_get('unicode.semantics') || 'off'
  if (iniVal === 'off') {
    return str.length
  }

  let i = 0
  let lgth = 0

  const getWholeChar = function (str: string, i: number): string | false {
    const code = str.charCodeAt(i)
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate (could change last hex to 0xDB7F to
      // treat high private surrogates as single characters)
      if (str.length <= i + 1) {
        throw new Error('High surrogate without following low surrogate')
      }
      const next = str.charCodeAt(i + 1)
      if (next < 0xdc00 || next > 0xdfff) {
        throw new Error('High surrogate without following low surrogate')
      }
      return str.charAt(i) + str.charAt(i + 1)
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      // Low surrogate
      if (i === 0) {
        throw new Error('Low surrogate without preceding high surrogate')
      }
      const prev = str.charCodeAt(i - 1)
      if (prev < 0xd800 || prev > 0xdbff) {
        // (could change last hex to 0xDB7F to treat high private surrogates
        // as single characters)
        throw new Error('Low surrogate without preceding high surrogate')
      }
      // We can pass over low surrogates now as the second
      // component in a pair which we have already processed
      return false
    }
    return str.charAt(i)
  }

  for (i = 0, lgth = 0; i < str.length; i++) {
    if (getWholeChar(str, i) === false) {
      continue
    }
    // Adapt this line at the top of any loop, passing in the whole string and
    // the current iteration and returning a variable to represent the individual character;
    // purpose is to treat the first part of a surrogate pair as the whole character and then
    // ignore the second part
    lgth++
  }

  return lgth
}

Here's the same function in plain JavaScript.

import { ini_get } from '../info/ini_get.ts'

export function strlen(string) {
  //  discuss at: https://locutus.io/php/strlen/
  // original by: Kevin van Zonneveld (https://kvz.io)
  // improved by: Sakimori
  // improved by: Kevin van Zonneveld (https://kvz.io)
  //    input by: Kirk Strobeck
  // bugfixed by: Onno Marsman (https://twitter.com/onnomarsman)
  //  revised by: Brett Zamir (https://brett-zamir.me)
  //      note 1: May look like overkill, but in order to be truly faithful to handling all Unicode
  //      note 1: characters and to this function in PHP which does not count the number of bytes
  //      note 1: but counts the number of characters, something like this is really necessary.
  //   example 1: strlen('Kevin van Zonneveld')
  //   returns 1: 19
  //   example 2: ini_set('unicode.semantics', 'on')
  //   example 2: strlen('A\ud87e\udc04Z')
  //   returns 2: 3

  if (typeof string === 'undefined') {
    throw new Error('strlen() expects exactly 1 argument, 0 given')
  }

  const str = string + ''

  const iniVal = ini_get('unicode.semantics') || 'off'
  if (iniVal === 'off') {
    return str.length
  }

  let i = 0
  let lgth = 0

  const getWholeChar = function (str, i) {
    const code = str.charCodeAt(i)
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate (could change last hex to 0xDB7F to
      // treat high private surrogates as single characters)
      if (str.length <= i + 1) {
        throw new Error('High surrogate without following low surrogate')
      }
      const next = str.charCodeAt(i + 1)
      if (next < 0xdc00 || next > 0xdfff) {
        throw new Error('High surrogate without following low surrogate')
      }
      return str.charAt(i) + str.charAt(i + 1)
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      // Low surrogate
      if (i === 0) {
        throw new Error('Low surrogate without preceding high surrogate')
      }
      const prev = str.charCodeAt(i - 1)
      if (prev < 0xd800 || prev > 0xdbff) {
        // (could change last hex to 0xDB7F to treat high private surrogates
        // as single characters)
        throw new Error('Low surrogate without preceding high surrogate')
      }
      // We can pass over low surrogates now as the second
      // component in a pair which we have already processed
      return false
    }
    return str.charAt(i)
  }

  for (i = 0, lgth = 0; i < str.length; i++) {
    if (getWholeChar(str, i) === false) {
      continue
    }
    // Adapt this line at the top of any loop, passing in the whole string and
    // the current iteration and returning a variable to represent the individual character;
    // purpose is to treat the first part of a surrogate pair as the whole character and then
    // ignore the second part
    lgth++
  }

  return lgth
}

Here's a standalone TypeScript version with its Locutus dependencies inlined.

// php/_helpers/_phpTypes (Locutus helper dependency)
type PhpNullish = null | undefined

type PhpInput = {} | PhpNullish

type PhpList<T = PhpInput> = T[]

type PhpAssoc<T = PhpInput> = { [key: string]: T }

type PhpArrayLike<T = PhpInput> = PhpList<T> | PhpAssoc<T>

function isPhpList<T = PhpInput>(value: PhpInput): value is PhpList<T> {
  return Array.isArray(value)
}

function isObjectLike(value: PhpInput): value is PhpArrayLike<PhpInput> {
  return typeof value === 'object' && value !== null
}

function isPhpAssocObject<T = PhpInput>(value: PhpInput): value is PhpAssoc<T> {
  return isObjectLike(value) && !isPhpList(value)
}

// php/_helpers/_phpRuntimeState (Locutus helper dependency)
interface IniEntry {
  local_value?: PhpInput
}

type LocaleEntry = PhpAssoc<PhpInput> & {
  sorting?: (left: PhpInput, right: PhpInput) => number
}

type LocaleCategoryMap = PhpAssoc<string | undefined>

interface LocutusRuntimeContainer {
  php?: PhpAssoc<PhpInput>
}

type GlobalWithLocutus = {
  $locutus?: LocutusRuntimeContainer
  [key: string]: PhpInput
}

interface PhpRuntimeState {
  ini: PhpAssoc<IniEntry | undefined>
  locales: PhpAssoc<LocaleEntry | undefined>
  localeCategories: LocaleCategoryMap
  pointers: PhpList<PhpInput>
  locale_default: string | undefined
}

const isIniBag = (value: PhpInput): value is PhpAssoc<IniEntry | undefined> =>
  isPhpAssocObject<IniEntry | undefined>(value)

const isLocaleBag = (value: PhpInput): value is PhpAssoc<LocaleEntry | undefined> =>
  isPhpAssocObject<LocaleEntry | undefined>(value)

const isLocaleCategoryBag = (value: PhpInput): value is LocaleCategoryMap => isPhpAssocObject<string | undefined>(value)

const globalContext: GlobalWithLocutus =
  typeof window === 'object' && window !== null ? window : typeof global === 'object' && global !== null ? global : {}

const ensurePhpRuntimeObject = (): PhpAssoc<PhpInput> => {
  let locutus = globalContext.$locutus
  if (typeof locutus !== 'object' || locutus === null) {
    locutus = {}
    globalContext.$locutus = locutus
  }

  let php = locutus.php
  if (typeof php !== 'object' || php === null) {
    php = {}
    locutus.php = php
  }

  return php
}

function ensurePhpRuntimeState(): PhpRuntimeState {
  const php = ensurePhpRuntimeObject()
  const iniValue = php.ini
  const localesValue = php.locales
  const localeCategoriesValue = php.localeCategories
  const pointersValue = php.pointers

  const ini = isIniBag(iniValue) ? iniValue : {}
  const locales = isLocaleBag(localesValue) ? localesValue : {}
  const localeCategories = isLocaleCategoryBag(localeCategoriesValue) ? localeCategoriesValue : {}
  const pointers: PhpList<PhpInput> = Array.isArray(pointersValue) ? pointersValue : []

  if (iniValue !== ini) {
    php.ini = ini
  }
  if (localesValue !== locales) {
    php.locales = locales
  }
  if (localeCategoriesValue !== localeCategories) {
    php.localeCategories = localeCategories
  }
  if (pointersValue !== pointers) {
    php.pointers = pointers
  }

  const localeDefaultValue = php.locale_default
  const localeDefault = typeof localeDefaultValue === 'string' ? localeDefaultValue : undefined

  return {
    ini,
    locales,
    localeCategories,
    pointers,
    locale_default: localeDefault,
  }
}

// php/info/ini_get (Locutus dependency module)
function ini_get(varname: string): string {
  //  discuss at: https://locutus.io/php/ini_get/
  // original by: Brett Zamir (https://brett-zamir.me)
  //      note 1: The ini values must be set by ini_set or manually within an ini file
  //   example 1: ini_set('date.timezone', 'Asia/Hong_Kong')
  //   example 1: ini_get('date.timezone')
  //   returns 1: 'Asia/Hong_Kong'

  const runtime = ensurePhpRuntimeState()
  const entry = runtime.ini[varname]

  if (entry && entry.local_value !== undefined) {
    if (entry.local_value === null) {
      return ''
    }
    return String(entry.local_value)
  }

  return ''
}

// php/strings/strlen (target function module)
function strlen(string?: string): number {
  //  discuss at: https://locutus.io/php/strlen/
  // original by: Kevin van Zonneveld (https://kvz.io)
  // improved by: Sakimori
  // improved by: Kevin van Zonneveld (https://kvz.io)
  //    input by: Kirk Strobeck
  // bugfixed by: Onno Marsman (https://twitter.com/onnomarsman)
  //  revised by: Brett Zamir (https://brett-zamir.me)
  //      note 1: May look like overkill, but in order to be truly faithful to handling all Unicode
  //      note 1: characters and to this function in PHP which does not count the number of bytes
  //      note 1: but counts the number of characters, something like this is really necessary.
  //   example 1: strlen('Kevin van Zonneveld')
  //   returns 1: 19
  //   example 2: ini_set('unicode.semantics', 'on')
  //   example 2: strlen('A\ud87e\udc04Z')
  //   returns 2: 3

  if (typeof string === 'undefined') {
    throw new Error('strlen() expects exactly 1 argument, 0 given')
  }

  const str = string + ''

  const iniVal = ini_get('unicode.semantics') || 'off'
  if (iniVal === 'off') {
    return str.length
  }

  let i = 0
  let lgth = 0

  const getWholeChar = function (str: string, i: number): string | false {
    const code = str.charCodeAt(i)
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate (could change last hex to 0xDB7F to
      // treat high private surrogates as single characters)
      if (str.length <= i + 1) {
        throw new Error('High surrogate without following low surrogate')
      }
      const next = str.charCodeAt(i + 1)
      if (next < 0xdc00 || next > 0xdfff) {
        throw new Error('High surrogate without following low surrogate')
      }
      return str.charAt(i) + str.charAt(i + 1)
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      // Low surrogate
      if (i === 0) {
        throw new Error('Low surrogate without preceding high surrogate')
      }
      const prev = str.charCodeAt(i - 1)
      if (prev < 0xd800 || prev > 0xdbff) {
        // (could change last hex to 0xDB7F to treat high private surrogates
        // as single characters)
        throw new Error('Low surrogate without preceding high surrogate')
      }
      // We can pass over low surrogates now as the second
      // component in a pair which we have already processed
      return false
    }
    return str.charAt(i)
  }

  for (i = 0, lgth = 0; i < str.length; i++) {
    if (getWholeChar(str, i) === false) {
      continue
    }
    // Adapt this line at the top of any loop, passing in the whole string and
    // the current iteration and returning a variable to represent the individual character;
    // purpose is to treat the first part of a surrogate pair as the whole character and then
    // ignore the second part
    lgth++
  }

  return lgth
}

Here's a standalone JavaScript version with its Locutus dependencies inlined.

// php/_helpers/_phpTypes (Locutus helper dependency)

function isObjectLike(value) {
  return typeof value === 'object' && value !== null
}

function isPhpAssocObject(value) {
  return isObjectLike(value) && !Array.isArray(value)
}

// php/_helpers/_phpRuntimeState (Locutus helper dependency)

const globalContext =
  typeof window === 'object' && window !== null ? window : typeof global === 'object' && global !== null ? global : {}

const ensurePhpRuntimeObject = () => {
  let locutus = globalContext.$locutus
  if (typeof locutus !== 'object' || locutus === null) {
    locutus = {}
    globalContext.$locutus = locutus
  }

  let php = locutus.php
  if (typeof php !== 'object' || php === null) {
    php = {}
    locutus.php = php
  }

  return php
}

function ensurePhpRuntimeState() {
  const php = ensurePhpRuntimeObject()
  const iniValue = php.ini
  const localesValue = php.locales
  const localeCategoriesValue = php.localeCategories
  const pointersValue = php.pointers

  const ini = isPhpAssocObject(iniValue) ? iniValue : {}
  const locales = isPhpAssocObject(localesValue) ? localesValue : {}
  const localeCategories = isPhpAssocObject(localeCategoriesValue) ? localeCategoriesValue : {}
  const pointers = Array.isArray(pointersValue) ? pointersValue : []

  if (iniValue !== ini) {
    php.ini = ini
  }
  if (localesValue !== locales) {
    php.locales = locales
  }
  if (localeCategoriesValue !== localeCategories) {
    php.localeCategories = localeCategories
  }
  if (pointersValue !== pointers) {
    php.pointers = pointers
  }

  const localeDefaultValue = php.locale_default
  const localeDefault = typeof localeDefaultValue === 'string' ? localeDefaultValue : undefined

  return {
    ini,
    locales,
    localeCategories,
    pointers,
    locale_default: localeDefault,
  }
}

// php/info/ini_get (Locutus dependency module)
function ini_get(varname) {
  //  discuss at: https://locutus.io/php/ini_get/
  // original by: Brett Zamir (https://brett-zamir.me)
  //      note 1: The ini values must be set by ini_set or manually within an ini file
  //   example 1: ini_set('date.timezone', 'Asia/Hong_Kong')
  //   example 1: ini_get('date.timezone')
  //   returns 1: 'Asia/Hong_Kong'

  const runtime = ensurePhpRuntimeState()
  const entry = runtime.ini[varname]

  if (entry && entry.local_value !== undefined) {
    if (entry.local_value === null) {
      return ''
    }
    return String(entry.local_value)
  }

  return ''
}

// php/strings/strlen (target function module)
function strlen(string) {
  //  discuss at: https://locutus.io/php/strlen/
  // original by: Kevin van Zonneveld (https://kvz.io)
  // improved by: Sakimori
  // improved by: Kevin van Zonneveld (https://kvz.io)
  //    input by: Kirk Strobeck
  // bugfixed by: Onno Marsman (https://twitter.com/onnomarsman)
  //  revised by: Brett Zamir (https://brett-zamir.me)
  //      note 1: May look like overkill, but in order to be truly faithful to handling all Unicode
  //      note 1: characters and to this function in PHP which does not count the number of bytes
  //      note 1: but counts the number of characters, something like this is really necessary.
  //   example 1: strlen('Kevin van Zonneveld')
  //   returns 1: 19
  //   example 2: ini_set('unicode.semantics', 'on')
  //   example 2: strlen('A\ud87e\udc04Z')
  //   returns 2: 3

  if (typeof string === 'undefined') {
    throw new Error('strlen() expects exactly 1 argument, 0 given')
  }

  const str = string + ''

  const iniVal = ini_get('unicode.semantics') || 'off'
  if (iniVal === 'off') {
    return str.length
  }

  let i = 0
  let lgth = 0

  const getWholeChar = function (str, i) {
    const code = str.charCodeAt(i)
    if (code >= 0xd800 && code <= 0xdbff) {
      // High surrogate (could change last hex to 0xDB7F to
      // treat high private surrogates as single characters)
      if (str.length <= i + 1) {
        throw new Error('High surrogate without following low surrogate')
      }
      const next = str.charCodeAt(i + 1)
      if (next < 0xdc00 || next > 0xdfff) {
        throw new Error('High surrogate without following low surrogate')
      }
      return str.charAt(i) + str.charAt(i + 1)
    } else if (code >= 0xdc00 && code <= 0xdfff) {
      // Low surrogate
      if (i === 0) {
        throw new Error('Low surrogate without preceding high surrogate')
      }
      const prev = str.charCodeAt(i - 1)
      if (prev < 0xd800 || prev > 0xdbff) {
        // (could change last hex to 0xDB7F to treat high private surrogates
        // as single characters)
        throw new Error('Low surrogate without preceding high surrogate')
      }
      // We can pass over low surrogates now as the second
      // component in a pair which we have already processed
      return false
    }
    return str.charAt(i)
  }

  for (i = 0, lgth = 0; i < str.length; i++) {
    if (getWholeChar(str, i) === false) {
      continue
    }
    // Adapt this line at the top of any loop, passing in the whole string and
    // the current iteration and returning a variable to represent the individual character;
    // purpose is to treat the first part of a surrogate pair as the whole character and then
    // ignore the second part
    lgth++
  }

  return lgth
}

Improve this function

Locutus is a community effort following The McDonald's Theory: we ship first iterations, hoping others will improve them. If you see something that could be better, we'd love your contribution.
