String Manipulation

string_utils.manipulation.camel_case_to_snake(input_string, separator='_')

Convert a camel case string into a snake case one. (The original string is returned if it is not a valid camel case string.)

Example:

>>> camel_case_to_snake('ThisIsACamelStringTest') # returns 'this_is_a_camel_string_test'
Parameters
  • input_string (str) – String to convert.

  • separator (str) – Sign to use as separator.

Returns

Converted string.
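
The conversion can be illustrated with a short regular-expression sketch. This is not the library's implementation: the name camel_to_snake_sketch is hypothetical, and the sketch skips the "valid camel case" check that makes camel_case_to_snake() return the original string unchanged.

```python
import re

def camel_to_snake_sketch(text, separator='_'):
    # Hypothetical helper: insert the separator before every uppercase
    # letter that is not at the start of the string, then lowercase all.
    # Unlike the documented function, no camel case validation is done.
    return re.sub(r'(?<!^)(?=[A-Z])', lambda _: separator, text).lower()

print(camel_to_snake_sketch('ThisIsACamelStringTest'))
# prints: this_is_a_camel_string_test
```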

string_utils.manipulation.snake_case_to_camel(input_string: str, upper_case_first: bool = True, separator: str = '_') → str

Convert a snake case string into a camel case one. (The original string is returned if it is not a valid snake case string.)

Example:

>>> snake_case_to_camel('the_snake_is_green') # returns 'TheSnakeIsGreen'
Parameters
  • input_string (str) – String to convert.

  • upper_case_first (bool) – True to turn the first letter into uppercase (default).

  • separator (str) – Sign to use as separator (default to “_”).

Returns

Converted string
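
A minimal sketch of the opposite conversion, showing how upper_case_first and separator interact. The helper name snake_to_camel_sketch is an assumption made for illustration, and the validation done by the real function is left out.

```python
def snake_to_camel_sketch(text, upper_case_first=True, separator='_'):
    # Hypothetical helper, not the library code: split on the separator,
    # capitalize each token and join the tokens back together.
    tokens = [token for token in text.split(separator) if token]
    if not tokens:
        return text
    camel = ''.join(token.capitalize() for token in tokens)
    if not upper_case_first:
        # Lowercase only the very first letter ("lower camel case").
        camel = camel[0].lower() + camel[1:]
    return camel

print(snake_to_camel_sketch('the_snake_is_green'))
# prints: TheSnakeIsGreen
print(snake_to_camel_sketch('the_snake_is_green', upper_case_first=False))
# prints: theSnakeIsGreen
```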

string_utils.manipulation.reverse(input_string: str) → str

Returns the string with its chars reversed.

Example:

>>> reverse('hello') # returns 'olleh'
Parameters

input_string (str) – String to reverse.

Returns

Reversed string.

string_utils.manipulation.shuffle(input_string: str) → str

Return a new string containing the same chars as the given one, but in a randomized order.

Example:

>>> shuffle('hello world') # possible output: 'l wodheorll'
Parameters

input_string (str) – String to shuffle

Returns

Shuffled string
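
Since the output is random, it cannot be compared against a fixed value. The sketch below (shuffle_sketch is a hypothetical stand-in built on the standard random module, not the library function) shows the property that can be relied on: the result holds exactly the same chars as the input.

```python
import random

def shuffle_sketch(text):
    # Hypothetical stand-in: shuffle a list of the chars in place,
    # then join them back into a string.
    chars = list(text)
    random.shuffle(chars)
    return ''.join(chars)

shuffled = shuffle_sketch('hello world')
# Same chars, possibly in a different order.
print(sorted(shuffled) == sorted('hello world'))
# prints: True
```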

string_utils.manipulation.strip_html(input_string: str, keep_tag_content: bool = False) → str

Remove HTML code contained in the given string.

Examples:

>>> strip_html('test: <a href="foo/bar">click here</a>') # returns 'test: '
>>> strip_html('test: <a href="foo/bar">click here</a>', keep_tag_content=True) # returns 'test: click here'
Parameters
  • input_string (str) – String to manipulate.

  • keep_tag_content (bool) – True to preserve tag content, False to remove tag and its content too (default).

Returns

String with html removed.
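
The effect of keep_tag_content can be sketched with two regular expressions. This is an assumption-laden illustration (strip_html_sketch is a hypothetical name, and regex-based stripping is not a robust way to parse arbitrary HTML), not the patterns the library actually uses.

```python
import re

# Matches a whole element: opening tag, content and matching closing tag.
ELEMENT_RE = re.compile(r'<([a-zA-Z][a-zA-Z0-9]*)\b[^>]*>.*?</\1\s*>', re.DOTALL)
# Matches any single tag (opening, closing or self-closing).
TAG_RE = re.compile(r'<[^>]+>')

def strip_html_sketch(text, keep_tag_content=False):
    # Hypothetical helper mirroring the documented behaviour.
    if not keep_tag_content:
        # Drop complete elements together with their content first.
        text = ELEMENT_RE.sub('', text)
    # Drop whatever tags are left, keeping the text around them.
    return TAG_RE.sub('', text)

sample = 'test: <a href="foo/bar">click here</a>'
print(repr(strip_html_sketch(sample)))
# prints: 'test: '
print(repr(strip_html_sketch(sample, keep_tag_content=True)))
# prints: 'test: click here'
```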

string_utils.manipulation.prettify(input_string: str) → str

Reformat a string by applying the following basic grammar and formatting rules:

  • String cannot start or end with spaces

  • The first letter in the string and the ones after a dot, an exclamation or a question mark must be uppercase

  • String cannot have multiple sequential spaces, empty lines or punctuation (except for “?”, “!” and “.”)

  • Arithmetic operators (+, -, /, *, =) must have one, and only one space before and after themselves

  • One, and only one space should follow a dot, a comma, an exclamation or a question mark

  • Text inside double quotes cannot start or end with spaces, but one, and only one space must come before and after the quotes (“foo" bar"baz” -> “foo "bar" baz”)

  • Text inside round brackets cannot start or end with spaces, but one, and only one space must come first and after brackets (“foo(bar )baz” -> “foo (bar) baz”)

  • Percentage sign (“%”) cannot be preceded by a space if there is a number before (“100 %” -> “100%”)

  • Saxon genitive is correct (“Dave’ s dog” -> “Dave’s dog”)

Examples:

>>> prettify(' unprettified string ,, like this one,will be"prettified" .it\' s awesome! ')
>>> # -> 'Unprettified string, like this one, will be "prettified". It's awesome!'
Parameters

input_string (str) – String to manipulate

Returns

Prettified string.
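
To make a few of the rules above concrete, here is a deliberately partial sketch. It covers only four of them (trimming, collapsing repeated spaces, the percentage rule and sentence capitalization); prettify_sketch is a hypothetical name and the real function applies many more rules than this.

```python
import re

def prettify_sketch(text):
    # Hypothetical, partial illustration of four of the listed rules.
    out = text.strip()                      # no leading/trailing spaces
    out = re.sub(r' {2,}', ' ', out)        # no multiple sequential spaces
    out = re.sub(r'(\d)\s+%', r'\1%', out)  # "100 %" -> "100%"
    # Uppercase the first letter and any letter following ". ", "! " or "? ".
    out = re.sub(
        r'(^|[.!?]\s+)([a-z])',
        lambda match: match.group(1) + match.group(2).upper(),
        out,
    )
    return out

print(prettify_sketch('  it costs  100 %. really? yes '))
# prints: It costs 100%. Really? Yes
```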

string_utils.manipulation.asciify(input_string: str) → str

Force string content to be ascii-only by translating all non-ascii chars into the closest possible representation (eg: ó -> o, Ë -> E, ç -> c…).

Bear in mind: Some chars may be lost if impossible to translate.

Example:

>>> asciify('èéùúòóäåëýñÅÀÁÇÌÍÑÓË') # returns 'eeuuooaaeynAAACIINOE'
Parameters

input_string (str) – String to convert

Returns

String containing only ASCII chars.
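
One common way to obtain this kind of translation with the standard library is Unicode decomposition followed by dropping everything that is not ASCII. The sketch below assumes that approach (asciify_sketch is a hypothetical name; the library may implement it differently) and also shows why some chars can be lost.

```python
import unicodedata

def asciify_sketch(text):
    # Hypothetical helper: NFKD splits accented chars into a base char
    # plus combining marks; encoding to ASCII with errors='ignore' then
    # drops the marks and any char with no ASCII counterpart.
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')

print(asciify_sketch('èéùúòóäåëýñÅÀÁÇÌÍÑÓË'))
# prints: eeuuooaaeynAAACIINOE
```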

string_utils.manipulation.slugify(input_string: str, separator: str = '-') → str

Converts a string into a “slug” using provided separator. The returned string has the following properties:

  • it has no spaces

  • all letters are in lower case

  • all punctuation signs and non alphanumeric chars are removed

  • words are divided using provided separator

  • all chars are encoded as ascii (by using asciify())

  • is safe for URL

Examples:

>>> slugify('Top 10 Reasons To Love Dogs!!!') # returns: 'top-10-reasons-to-love-dogs'
>>> slugify('Mönstér Mägnët') # returns 'monster-magnet'
Parameters
  • input_string (str) – String to convert.

  • separator (str) – Sign used to join string tokens (default to “-”).

Returns

Slug string
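
The listed properties can be reproduced in a few lines. The following is a sketch under stated assumptions, not the library's code: slugify_sketch is a hypothetical name, and the ASCII step is done inline with unicodedata instead of calling asciify().

```python
import re
import unicodedata

def slugify_sketch(text, separator='-'):
    # Hypothetical helper. Step 1: reduce the text to ASCII chars.
    ascii_text = (
        unicodedata.normalize('NFKD', text)
        .encode('ascii', 'ignore')
        .decode('ascii')
    )
    # Step 2: lowercase, then turn every run of non-alphanumeric chars
    # (spaces and punctuation included) into a single space.
    cleaned = re.sub(r'[^a-z0-9]+', ' ', ascii_text.lower())
    # Step 3: join the remaining words with the separator.
    return separator.join(cleaned.split())

print(slugify_sketch('Top 10 Reasons To Love Dogs!!!'))
# prints: top-10-reasons-to-love-dogs
print(slugify_sketch('Mönstér Mägnët'))
# prints: monster-magnet
```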

string_utils.manipulation.booleanize(input_string: str) → bool

Turns a string into a boolean based on its content (CASE INSENSITIVE).

A positive boolean (True) is returned if the string value is one of the following:

  • “true”

  • “1”

  • “yes”

  • “y”

Otherwise False is returned.

Examples:

>>> booleanize('true') # returns True
>>> booleanize('YES') # returns True
>>> booleanize('nope') # returns False
Parameters

input_string (str) – String to convert

Returns

True if the string contains a boolean-like positive value, False otherwise
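
The rule described above amounts to a case-insensitive membership test. A minimal sketch, using the hypothetical name booleanize_sketch rather than the library function:

```python
POSITIVE_VALUES = ('true', '1', 'yes', 'y')

def booleanize_sketch(text):
    # Hypothetical helper: lowercase the input so the comparison is
    # case insensitive, then check it against the positive values.
    return text.lower() in POSITIVE_VALUES

print(booleanize_sketch('true'))  # prints: True
print(booleanize_sketch('YES'))   # prints: True
print(booleanize_sketch('nope'))  # prints: False
```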

string_utils.manipulation.strip_margin(input_string: str) → str

Removes tab indentation from multi line strings (inspired by analogous Scala function).

Example:

>>> strip_margin('''
>>>                 line 1
>>>                 line 2
>>>                 line 3
>>> ''')
>>> # returns:
>>> '''
>>> line 1
>>> line 2
>>> line 3
>>> '''
Parameters

input_string (str) – String to format

Returns

A string without left margins
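
A rough equivalent can be written with a multi-line regular expression. This sketch assumes that "margin" means any leading spaces or tabs on each line; strip_margin_sketch is a hypothetical name and the library's exact rules may differ.

```python
import re

def strip_margin_sketch(text):
    # Hypothetical helper: with re.MULTILINE, "^" matches at the start
    # of every line, so leading spaces and tabs are removed line by line.
    return re.sub(r'^[ \t]+', '', text, flags=re.MULTILINE)

indented = '\n        line 1\n        line 2\n'
print(repr(strip_margin_sketch(indented)))
# prints: '\nline 1\nline 2\n'
```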

string_utils.manipulation.compress(input_string: str, encoding: str = 'utf-8', compression_level: int = 9) → str

Compress the given string by returning a shorter one that can be safely used in any context (like URL) and restored back to its original state using decompress().

Bear in mind: Besides the provided compression_level, the compression result (how much shorter the returned string actually is) depends on 2 factors:

  1. The amount of data (string size): short strings might not provide a significant compression result or even be longer than the given input string (this is due to the fact that some bytes have to be embedded into the compressed string in order to be able to restore it later on)

  2. The content type: random sequences of chars are very unlikely to be successfully compressed, while the best compression result is obtained when the string contains several recurring char sequences (like in the example).

Behind the scenes this method makes use of the standard Python’s zlib and base64 libraries.

Examples:

>>> n = 0 # <- ignore this, it's a fix for Pycharm (not fixable using ignore comments)
>>> # "original" will be a string with 169 chars:
>>> original = ' '.join(['word n{}'.format(n) for n in range(20)])
>>> # "compressed" will be a string of 88 chars
>>> compressed = compress(original)
Parameters
  • input_string (str) – String to compress (must not be empty or a ValueError will be raised).

  • encoding (str) – String encoding (default to “utf-8”).

  • compression_level (int) – A value between 0 (no compression) and 9 (best compression), default to 9.

Returns

Compressed string.

string_utils.manipulation.decompress(input_string: str, encoding: str = 'utf-8') → str

Restore a previously compressed string (obtained using compress()) back to its original state.

Parameters
  • input_string (str) – String to restore.

  • encoding (str) – Original string encoding.

Returns

Decompressed string.
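
The compress()/decompress() round trip can be sketched directly with zlib and base64, the two standard modules mentioned above. This is only an illustration: compress_sketch and decompress_sketch are hypothetical names, and the use of the URL-safe base64 alphabet is an assumption, so the output is not guaranteed to match what the library produces.

```python
import base64
import zlib

def compress_sketch(text, encoding='utf-8', compression_level=9):
    # Hypothetical helper: string -> bytes -> zlib -> URL-safe base64 text.
    if not text:
        raise ValueError('Input string cannot be empty')
    packed = zlib.compress(text.encode(encoding), compression_level)
    return base64.urlsafe_b64encode(packed).decode(encoding)

def decompress_sketch(text, encoding='utf-8'):
    # Reverse the steps: base64 text -> bytes -> zlib -> original string.
    packed = base64.urlsafe_b64decode(text)
    return zlib.decompress(packed).decode(encoding)

original = ' '.join('word n{}'.format(n) for n in range(20))
compressed = compress_sketch(original)
print(decompress_sketch(compressed) == original)
# prints: True
```

Very short inputs usually come out longer than they went in, because of the zlib header and the base64 overhead.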

string_utils.manipulation.roman_encode(input_number: Union[str, int]) → str

Convert the given number/string into a roman number.

The passed input must represent a positive integer in the range 1-3999 (inclusive).

Why this limit? You may be wondering:

  1. zero is forbidden since there is no related representation in roman numbers

  2. the upper bound 3999 is due to a limitation of the ascii charset (the highest-value sign displayable in ascii is “M”, which is equal to 1000; based on roman number rules we can therefore use M 3 times to reach 3000, but we can’t go any further in thousands without special “boxed chars”).

Examples:

>>> roman_encode(37) # returns 'XXXVII'
>>> roman_encode('2020') # returns 'MMXX'
Parameters

input_number (Union[str, int]) – An integer or a string to be converted.

Returns

Roman number string.
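
The encoding and its 1-3999 range check can be sketched with the classic greedy algorithm: repeatedly take the largest roman value that still fits. The name roman_encode_sketch is hypothetical and the library's own implementation may be structured differently.

```python
# Values in descending order, including the subtractive pairs (CM, XC, ...).
ROMAN_PAIRS = [
    (1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
    (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
    (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I'),
]

def roman_encode_sketch(number):
    # Hypothetical helper: accepts an int or a numeric string.
    value = int(number)
    if not 1 <= value <= 3999:
        raise ValueError('Input must be in the range 1-3999')
    parts = []
    for amount, symbol in ROMAN_PAIRS:
        # How many times does this symbol fit into what is left?
        count, value = divmod(value, amount)
        parts.append(symbol * count)
    return ''.join(parts)

print(roman_encode_sketch(37))      # prints: XXXVII
print(roman_encode_sketch('2020'))  # prints: MMXX
```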

string_utils.manipulation.roman_decode(input_string: str) → int

Decode a roman number string into an integer if the provided string is valid.

Example:

>>> roman_decode('VII') # returns 7
Parameters

input_string (str) – (Assumed) Roman number

Returns

Integer value
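
Decoding can be sketched by reading the signs from right to left and subtracting any sign that is smaller than the one after it. This is a simplified illustration: roman_decode_sketch is a hypothetical name and, unlike the documented function, it does not check that the input is a valid roman number.

```python
ROMAN_VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}

def roman_decode_sketch(roman):
    # Hypothetical helper without validation of the input string.
    total = 0
    previous = 0
    for sign in reversed(roman.upper()):
        current = ROMAN_VALUES[sign]
        if current < previous:
            # A smaller sign before a bigger one is subtracted (IV = 4).
            total -= current
        else:
            total += current
        previous = current
    return total

print(roman_decode_sketch('VII'))   # prints: 7
print(roman_decode_sketch('MMXX'))  # prints: 2020
```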