String Manipulation

string_utils.manipulation.camel_case_to_snake(input_string, separator='_')

Convert a camel case string into a snake case one. (The original string is returned if it is not a valid camel case string.)

Example:

>>> camel_case_to_snake('ThisIsACamelStringTest') # returns 'this_is_a_camel_string_test'

Parameters

input_string (str) – String to convert.
separator (str) – Sign to use as separator (defaults to “_”).

Returns

Converted string.
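
The conversion can be pictured as inserting the separator at every boundary between words and then lowercasing the result. The following is a minimal standard-library sketch of that idea, not the library's own code: the name camel_to_snake_sketch and the regular expression are illustrative assumptions, and the check for a valid camel case input is omitted.

```python
import re

# Hypothetical helper for illustration only; not part of string_utils.
# Matches a lowercase letter, or a run of uppercase letters, that is
# immediately followed by another uppercase letter (a word boundary).
_BOUNDARY_RE = re.compile(r'([a-z]|[A-Z]+)(?=[A-Z])')


def camel_to_snake_sketch(input_string: str, separator: str = '_') -> str:
    # Append the separator after each boundary match, then lowercase everything.
    with_separators = _BOUNDARY_RE.sub(lambda m: m.group(1) + separator, input_string)
    return with_separators.lower()


print(camel_to_snake_sketch('ThisIsACamelStringTest'))
print(camel_to_snake_sketch('ThisIsACamelStringTest', separator='-'))
```

Digits are left untouched by this pattern, so a string such as 'Test2Go' gets no separator around the digit.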

string_utils.manipulation.snake_case_to_camel(input_string: str, upper_case_first: bool = True, separator: str = '_') → str

Convert a snake case string into a camel case one. (The original string is returned if it is not a valid snake case string.)

Example:

>>> snake_case_to_camel('the_snake_is_green') # returns 'TheSnakeIsGreen'

Parameters

input_string (str) – String to convert.
upper_case_first (bool) – True to turn the first letter into uppercase (default).
separator (str) – Sign to use as separator (defaults to “_”).

Returns

Converted string
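
To make the role of upper_case_first and separator concrete, here is a minimal sketch of the conversion using only built-in string methods. It is an illustration under stated assumptions rather than the library's implementation: the name snake_to_camel_sketch is hypothetical, and the check for a valid snake case input is left out.

```python
def snake_to_camel_sketch(input_string: str, upper_case_first: bool = True, separator: str = '_') -> str:
    # Split on the separator and drop empty tokens (e.g. from doubled separators).
    tokens = [token for token in input_string.split(separator) if token]
    if not tokens:
        return input_string
    # Capitalize every token: first letter uppercase, the rest lowercase.
    tokens = [token.capitalize() for token in tokens]
    if not upper_case_first:
        # Lowercase the first token to get lowerCamelCase.
        tokens[0] = tokens[0].lower()
    return ''.join(tokens)


print(snake_to_camel_sketch('the_snake_is_green'))
print(snake_to_camel_sketch('the_snake_is_green', upper_case_first=False))
```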

string_utils.manipulation.reverse(input_string: str) → str

Returns the string with its chars reversed.

Example:

>>> reverse('hello') # returns 'olleh'

Parameters: input_string (str) – String to reverse.
Returns: Reversed string.

string_utils.manipulation.shuffle(input_string: str) → str

Return a new string containing the same chars as the given one, but in a randomized order.

Example:

>>> shuffle('hello world') # possible output: 'l wodheorll'

Parameters: input_string (str) – String to shuffle
Returns: Shuffled string
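
Because the output is random, the useful property to rely on is that the result is a permutation of the input: same chars, same length. A minimal sketch of one way to get that behavior with the standard random module follows; shuffle_sketch is a hypothetical name and this is not necessarily how the library does it.

```python
import random


def shuffle_sketch(input_string: str) -> str:
    # random.sample draws len(input_string) chars without replacement,
    # which yields a random permutation of the input.
    return ''.join(random.sample(input_string, len(input_string)))


shuffled = shuffle_sketch('hello world')
# The order varies from run to run, but the multiset of chars does not.
print(sorted(shuffled) == sorted('hello world'))
```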

string_utils.manipulation.strip_html(input_string: str, keep_tag_content: bool = False) → str

Remove the HTML code contained in the given string.

Examples:

>>> strip_html('test: <a href="foo/bar">click here</a>') # returns 'test: '
>>> strip_html('test: <a href="foo/bar">click here</a>', keep_tag_content=True) # returns 'test: click here'

Parameters

input_string (str) – String to manipulate.
keep_tag_content (bool) – True to preserve tag content, False to remove tag and its content too (default).

Returns

String with html removed.
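
The difference between the two modes of keep_tag_content can be sketched with two regular expressions. This is a rough illustration only: the patterns and the name strip_html_sketch are assumptions, and a regex is not a full HTML parser, so nested or malformed markup may not be handled well.

```python
import re

# Hypothetical patterns for illustration; not taken from string_utils.
# Any single tag, opening or closing.
_TAG_RE = re.compile(r'<[^>]+>')
# A paired tag together with what it encloses, or else any single tag.
_TAG_WITH_CONTENT_RE = re.compile(r'<(\w+)[^>]*>.*?</\1>|<[^>]+>', re.DOTALL)


def strip_html_sketch(input_string: str, keep_tag_content: bool = False) -> str:
    # keep_tag_content=True: drop only the tags themselves.
    # keep_tag_content=False: drop paired tags along with their content,
    # plus any leftover standalone tags.
    pattern = _TAG_RE if keep_tag_content else _TAG_WITH_CONTENT_RE
    return pattern.sub('', input_string)


print(repr(strip_html_sketch('test: <a href="foo/bar">click here</a>')))
print(repr(strip_html_sketch('test: <a href="foo/bar">click here</a>', keep_tag_content=True)))
```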

string_utils.manipulation.prettify(input_string: str) → str

Reformat a string by applying the following basic grammar and formatting rules:

String cannot start or end with spaces
The first letter in the string and the ones after a dot, an exclamation or a question mark must be uppercase
String cannot have multiple sequential spaces, empty lines or punctuation (except for “?”, “!” and “.”)
Arithmetic operators (+, -, /, *, =) must have one, and only one space before and after themselves
One, and only one space should follow a dot, a comma, an exclamation or a question mark
Text inside double quotes cannot start or end with spaces, but exactly one space must come before and after the quotes (foo" bar"baz -> foo "bar" baz)
Text inside round brackets cannot start or end with spaces, but exactly one space must come before and after the brackets (“foo(bar )baz” -> “foo (bar) baz”)
Percentage sign (“%”) cannot be preceded by a space if there is a number before (“100 %” -> “100%”)
The Saxon genitive is corrected (“Dave’ s dog” -> “Dave’s dog”)

Examples:

>>> prettify(' unprettified string ,, like this one,will be"prettified" .it\' s awesome! ')
>>> # -> 'Unprettified string, like this one, will be "prettified". It\'s awesome!'

Parameters: input_string (str) – String to manipulate.
Returns: Prettified string.

string_utils.manipulation.asciify(input_string: str) → str

Force string content to be ascii-only by translating all non-ascii chars into the closest possible representation (e.g. ó -> o, Ë -> E, ç -> c…).

Bear in mind: Some chars may be lost if impossible to translate.

Example:

>>> asciify('èéùúòóäåëýñÅÀÁÇÌÍÑÓË') # returns 'eeuuooaaeynAAACIINOE'

Parameters: input_string (str) – String to convert.
Returns: String containing only ascii chars.
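
A common way to get this kind of transliteration with the standard library is Unicode decomposition followed by an ASCII encode that ignores errors. The sketch below shows that technique and why some chars can be lost; asciify_sketch is a hypothetical name, and the library is not guaranteed to work exactly this way.

```python
import unicodedata


def asciify_sketch(input_string: str) -> str:
    # NFKD splits accented chars into a base char plus combining marks.
    decomposed = unicodedata.normalize('NFKD', input_string)
    # Encoding with errors='ignore' drops the combining marks and any
    # char that has no ASCII decomposition at all.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(asciify_sketch('èéùúòóäåëýñÅÀÁÇÌÍÑÓË'))
# 'ß' has no decomposition, so with this approach it is simply dropped.
print(asciify_sketch('straße'))
```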

string_utils.manipulation.slugify(input_string: str, separator: str = '-') → str

Converts a string into a “slug” using provided separator. The returned string has the following properties:

it has no spaces
all letters are in lower case
all punctuation signs and non alphanumeric chars are removed
words are divided using provided separator
all chars are encoded as ascii (by using asciify())
it is safe to use in URLs

Examples:

>>> slugify('Top 10 Reasons To Love Dogs!!!') # returns: 'top-10-reasons-to-love-dogs'
>>> slugify('Mönstér Mägnët') # returns 'monster-magnet'

Parameters

input_string (str) – String to convert.
separator (str) – Sign used to join string tokens (default to “-“).

Returns

Slug string
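
The properties listed above map onto a short pipeline: transliterate to ascii, lowercase, remove punctuation, then join the remaining words with the separator. A minimal standard-library sketch of that pipeline follows. The name slugify_sketch and the exact regular expression are assumptions, and the real function may treat edge cases (for instance hyphens or underscores already in the input) differently.

```python
import re
import unicodedata


def slugify_sketch(input_string: str, separator: str = '-') -> str:
    # Transliterate to ASCII (same idea as asciify) and lowercase.
    ascii_text = (
        unicodedata.normalize('NFKD', input_string)
        .encode('ascii', 'ignore')
        .decode('ascii')
        .lower()
    )
    # Remove everything that is not a letter, a digit or whitespace.
    cleaned = re.sub(r'[^a-z0-9\s]', '', ascii_text)
    # Split on whitespace runs and join the words with the separator.
    return separator.join(cleaned.split())


print(slugify_sketch('Top 10 Reasons To Love Dogs!!!'))
print(slugify_sketch('Mönstér Mägnët'))
```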

string_utils.manipulation.booleanize(input_string: str) → bool

Turns a string into a boolean based on its content (CASE INSENSITIVE).

A positive boolean (True) is returned if the string value is one of the following:

“true”
“1”
“yes”
“y”

Otherwise False is returned.

Examples:

>>> booleanize('true') # returns True
>>> booleanize('YES') # returns True
>>> booleanize('nope') # returns False

Parameters: input_string (str) – String to convert
Returns: True if the string contains a boolean-like positive value, False otherwise
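
The rule described above amounts to a case-insensitive membership test against the four positive values. A minimal sketch, with booleanize_sketch as a hypothetical name:

```python
def booleanize_sketch(input_string: str) -> bool:
    # Lowercase first so that the comparison is case-insensitive.
    return input_string.lower() in ('true', '1', 'yes', 'y')


print(booleanize_sketch('YES'))
print(booleanize_sketch('nope'))
```

Note that under this rule any unrecognized value, including 'false', '0' and the empty string, maps to False rather than raising an error.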

string_utils.manipulation.strip_margin(input_string: str) → str

Removes tab indentation from multi line strings (inspired by analogous Scala function).

Example:

>>> strip_margin('''
>>>                 line 1
>>>                 line 2
>>>                 line 3
>>> ''')
>>> # returns:
>>> '''
>>> line 1
>>> line 2
>>> line 3
>>> '''

Parameters: input_string (str) – String to format
Returns: A string without left margins
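
One way to picture the effect is a multi-line regular expression that deletes the leading whitespace of every line. The sketch below assumes that both tabs and spaces count as margin, which the description above does not state explicitly; strip_margin_sketch and the pattern are hypothetical.

```python
import re

# With re.MULTILINE, ^ matches at the start of every line, so this
# pattern captures each line's leading run of spaces and tabs.
_MARGIN_RE = re.compile(r'^[ \t]+', re.MULTILINE)


def strip_margin_sketch(input_string: str) -> str:
    # Delete the leading whitespace of each line; line breaks are kept.
    return _MARGIN_RE.sub('', input_string)


text = '''
        line 1
        line 2
        line 3
'''
print(strip_margin_sketch(text))
```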

string_utils.manipulation.compress(input_string: str, encoding: str = 'utf-8', compression_level: int = 9) → str

Compress the given string by returning a shorter one that can be safely used in any context (like URL) and restored back to its original state using decompress().

Bear in mind: besides the provided compression_level, the compression result (how much shorter the resulting string actually is) depends on 2 factors:

The amount of data (string size): short strings might not provide a significant compression result or even be longer than the given input string (this is due to the fact that some bytes have to be embedded into the compressed string in order to be able to restore it later on)
The content type: random sequences of chars are very unlikely to be successfully compressed, while the best compression result is obtained when the string contains several recurring char sequences (like in the example).

Behind the scenes this method makes use of Python’s standard zlib and base64 libraries.

Examples:

>>> n = 0 # <- ignore this, it's a fix for Pycharm (not fixable using ignore comments)
>>> # "original" will be a string with 169 chars:
>>> original = ' '.join(['word n{}'.format(n) for n in range(20)])
>>> # "compressed" will be a string of 88 chars
>>> compressed = compress(original)

Parameters

input_string (str) – String to compress (must not be empty or a ValueError will be raised).
encoding (str) – String encoding (default to “utf-8”).
compression_level (int) – A value between 0 (no compression) and 9 (best compression), default to 9.

Returns

Compressed string.

string_utils.manipulation.decompress(input_string: str, encoding: str = 'utf-8') → str

Restore a previously compressed string (obtained using compress()) back to its original state.

Parameters

input_string (str) – String to restore.
encoding (str) – Original string encoding.

Returns

Decompressed string.
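
Since compress() and decompress() are described as building on zlib and base64, the round trip can be sketched directly with those two modules. This is an illustration of the general technique, not the library's code: the function names are hypothetical, and the choice of the URL-safe base64 alphabet is an assumption based on the "safely used in any context (like URL)" wording.

```python
import base64
import zlib


def compress_sketch(input_string: str, encoding: str = 'utf-8', compression_level: int = 9) -> str:
    if not input_string:
        raise ValueError('Input string cannot be empty')
    # Text -> bytes -> deflated bytes -> URL-safe base64 text.
    raw = input_string.encode(encoding)
    packed = zlib.compress(raw, compression_level)
    return base64.urlsafe_b64encode(packed).decode('ascii')


def decompress_sketch(input_string: str, encoding: str = 'utf-8') -> str:
    # Reverse the steps: base64 text -> deflated bytes -> bytes -> text.
    packed = base64.urlsafe_b64decode(input_string)
    return zlib.decompress(packed).decode(encoding)


original = ' '.join('word n{}'.format(n) for n in range(20))
compressed = compress_sketch(original)
print(len(original), len(compressed))
print(decompress_sketch(compressed) == original)
```

The base64 step is what makes very short inputs grow: it adds roughly one third to the size of the deflated bytes, on top of zlib's own header and checksum.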

string_utils.manipulation.roman_encode(input_number: Union[str, int]) → str

Convert the given number/string into a roman number.

The passed input must represent a positive integer in the range 1-3999 (inclusive).

You may be wondering why this limit exists:

zero is forbidden since there is no related representation in roman numbers
the upper bound 3999 is due to a limitation of the ascii charset (the highest-value sign displayable in ascii is “M”, which is equal to 1000; therefore, based on roman numbers rules, we can use M 3 times to reach 3000, but we can’t go any further in thousands without special “boxed chars”).

Examples:

>>> roman_encode(37) # returns 'XXXVII'
>>> roman_encode('2020') # returns 'MMXX'

Parameters: input_number (Union[str, int]) – An integer or a string to be converted.
Returns: Roman number string.
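
The encoding itself can be done greedily: walk a table of values from largest to smallest, including the subtractive pairs such as CM and IV, and emit each symbol as many times as it fits. The sketch below shows that classic approach. The name roman_encode_sketch is hypothetical, and raising ValueError for out-of-range input is an assumption, not something stated above.

```python
from typing import Union

# Value/symbol pairs in descending order, including the subtractive forms.
_ROMAN_TABLE = [
    (1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
    (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
    (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I'),
]


def roman_encode_sketch(input_number: Union[str, int]) -> str:
    number = int(input_number)
    if not 1 <= number <= 3999:
        # Assumed error type for illustration.
        raise ValueError('Input must be in the range 1-3999')
    output = []
    for value, symbol in _ROMAN_TABLE:
        # Emit the symbol as many times as its value fits, keep the remainder.
        count, number = divmod(number, value)
        output.append(symbol * count)
    return ''.join(output)


print(roman_encode_sketch(37))
print(roman_encode_sketch('2020'))
```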

string_utils.manipulation.roman_decode(input_string: str) → int

Decode a roman number string into an integer if the provided string is valid.

Example:

>>> roman_decode('VII') # returns 7

Parameters: input_string (str) – (Assumed) Roman number
Returns: Integer value
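
Decoding follows from the subtractive rule of roman numbers: a sign that is smaller than the one to its right is subtracted, otherwise it is added. A minimal sketch of that rule follows. The name roman_decode_sketch is hypothetical and, unlike the function documented above, this sketch does not validate its input (an unknown char raises KeyError, and malformed numerals are not rejected).

```python
_ROMAN_VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}


def roman_decode_sketch(input_string: str) -> int:
    total = 0
    previous = 0
    # Walk the signs right to left: a sign smaller than the one to its
    # right is subtractive (the I in IV), otherwise it is additive.
    for char in reversed(input_string.upper()):
        value = _ROMAN_VALUES[char]
        if value < previous:
            total -= value
        else:
            total += value
        previous = value
    return total


print(roman_decode_sketch('VII'))
print(roman_decode_sketch('MCMXCIX'))
```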
