String Manipulation
-
string_utils.manipulation.camel_case_to_snake(input_string, separator='_')
Convert a camel case string into a snake case one. (The original string is returned if it is not a valid camel case string.)
Example:
>>> camel_case_to_snake('ThisIsACamelStringTest') # returns 'this_is_a_camel_string_test'
- Parameters
input_string (str) – String to convert.
separator (str) – Sign to use as separator.
- Returns
Converted string.
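The conversion can be approximated with a single regular expression. The following is a minimal sketch, not the library's implementation: `camel_to_snake_sketch` is a hypothetical name, and unlike the documented function it does not check that the input is valid camel case.

```python
import re

# Hypothetical helper for illustration only.
# A word boundary sits before an uppercase letter that follows a lowercase
# letter or digit, or between two uppercase letters when the second one
# starts a capitalised word (e.g. "A|Camel").
_BOUNDARY = re.compile(r'(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])')


def camel_to_snake_sketch(text: str, separator: str = '_') -> str:
    # Insert the separator at every boundary, then lowercase the result.
    return _BOUNDARY.sub(separator, text).lower()


print(camel_to_snake_sketch('ThisIsACamelStringTest'))
```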
-
string_utils.manipulation.snake_case_to_camel(input_string: str, upper_case_first: bool = True, separator: str = '_') → str
Convert a snake case string into a camel case one. (The original string is returned if it is not a valid snake case string.)
Example:
>>> snake_case_to_camel('the_snake_is_green') # returns 'TheSnakeIsGreen'
- Parameters
input_string (str) – String to convert.
upper_case_first (bool) – True to turn the first letter into uppercase (default).
separator (str) – Sign to use as separator (default to “_”).
- Returns
Converted string
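The effect of `upper_case_first` and `separator` can be illustrated with a small stand-in. This is a sketch under assumptions: `snake_to_camel_sketch` is a hypothetical name, and its only validation is a check that the separator is present.

```python
def snake_to_camel_sketch(text: str, upper_case_first: bool = True,
                          separator: str = '_') -> str:
    # Hypothetical helper for illustration only.
    # Without a separator there is nothing to convert: return the input as is.
    if separator not in text:
        return text
    # Drop empty tokens produced by leading, trailing or doubled separators.
    tokens = [token for token in text.split(separator) if token]
    if not tokens:
        return text
    head = tokens[0].capitalize() if upper_case_first else tokens[0].lower()
    return head + ''.join(token.capitalize() for token in tokens[1:])


print(snake_to_camel_sketch('the_snake_is_green'))
print(snake_to_camel_sketch('the_snake_is_green', upper_case_first=False))
```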
-
string_utils.manipulation.reverse(input_string: str) → str
Returns the string with its chars reversed.
Example:
>>> reverse('hello') # returns 'olleh'
- Parameters
input_string (str) – String to reverse.
- Returns
Reversed string.
-
string_utils.manipulation.shuffle(input_string: str) → str
Return a new string containing the same chars as the given one, but in a randomized order.
Example:
>>> shuffle('hello world') # possible output: 'l wodheorll'
- Parameters
input_string (str) – String to shuffle
- Returns
Shuffled string
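Because the output is random, it is easier to check its invariants than its exact value. The sketch below uses a hypothetical stand-in, `shuffle_sketch`, built on the standard `random` module; it is not the library's code.

```python
import random


def shuffle_sketch(text: str) -> str:
    # Hypothetical helper for illustration only.
    chars = list(text)      # strings are immutable, so work on a list
    random.shuffle(chars)   # shuffles the list in place
    return ''.join(chars)


result = shuffle_sketch('hello world')
# Same characters, same length; only the order may differ.
print(sorted(result) == sorted('hello world'))
```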
-
string_utils.manipulation.strip_html(input_string: str, keep_tag_content: bool = False) → str
Remove HTML code contained in the given string.
Examples:
>>> strip_html('test: <a href="foo/bar">click here</a>') # returns 'test: '
>>> strip_html('test: <a href="foo/bar">click here</a>', keep_tag_content=True) # returns 'test: click here'
- Parameters
input_string (str) – String to manipulate.
keep_tag_content (bool) – True to preserve tag content, False to remove tag and its content too (default).
- Returns
String with html removed.
-
string_utils.manipulation.prettify(input_string: str) → str
Reformat a string by applying the following basic grammar and formatting rules:
String cannot start or end with spaces
The first letter in the string and the ones after a dot, an exclamation or a question mark must be uppercase
String cannot have multiple sequential spaces, empty lines or punctuation (except for “?”, “!” and “.”)
Arithmetic operators (+, -, /, *, =) must have one, and only one space before and after themselves
One, and only one space should follow a dot, a comma, an exclamation or a question mark
Text inside double quotes cannot start or end with spaces, but one, and only one space must come before and after the quotes (“foo" bar"baz” -> “foo "bar" baz”)
Text inside round brackets cannot start or end with spaces, but one, and only one space must come before and after the brackets (“foo(bar )baz” -> “foo (bar) baz”)
Percentage sign (“%”) cannot be preceded by a space if there is a number before (“100 %” -> “100%”)
Saxon genitive is correct (“Dave’ s dog” -> “Dave’s dog”)
Examples:
>>> prettify(' unprettified string ,, like this one,will be"prettified" .it\' s awesome! ')
>>> # -> 'Unprettified string, like this one, will be "prettified". It\'s awesome!'
- Parameters
input_string (str) – String to manipulate.
- Returns
Prettified string.
-
string_utils.manipulation.asciify(input_string: str) → str
Force string content to be ASCII-only by translating all non-ASCII chars into the closest possible representation (e.g. ó -> o, Ë -> E, ç -> c…).
Bear in mind: Some chars may be lost if impossible to translate.
Example:
>>> asciify('èéùúòóäåëýñÅÀÁÇÌÍÑÓË') # returns 'eeuuooaaeynAAACIINOE'
- Parameters
input_string (str) – String to convert.
- Returns
ASCII-only string.
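One common way to get this behavior with the standard library is Unicode decomposition followed by a lossy ASCII encode. The sketch below assumes that approach; `asciify_sketch` is a hypothetical name and may differ from what the library does internally. It also shows how a char with no ASCII equivalent is lost.

```python
import unicodedata


def asciify_sketch(text: str) -> str:
    # Hypothetical helper for illustration only.
    # NFKD splits an accented letter into its base letter plus combining marks.
    decomposed = unicodedata.normalize('NFKD', text)
    # errors='ignore' drops the combining marks and every char that has
    # no ASCII decomposition at all.
    return decomposed.encode('ascii', 'ignore').decode('ascii')


print(asciify_sketch('èéùúòóäåëýñÅÀÁÇÌÍÑÓË'))  # eeuuooaaeynAAACIINOE
print(asciify_sketch('straße'))                # strae ("ß" cannot be decomposed)
```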
-
string_utils.manipulation.slugify(input_string: str, separator: str = '-') → str
Converts a string into a “slug” using the provided separator. The returned string has the following properties:
it has no spaces
all letters are in lower case
all punctuation signs and non alphanumeric chars are removed
words are divided using the provided separator
all chars are encoded as ascii (by using asciify())
it is safe to use in URLs
Examples:
>>> slugify('Top 10 Reasons To Love Dogs!!!') # returns: 'top-10-reasons-to-love-dogs'
>>> slugify('Mönstér Mägnët') # returns 'monster-magnet'
- Parameters
input_string (str) – String to convert.
separator (str) – Sign used to join string tokens (default to “-”).
- Returns
Slug string
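The listed properties can be reproduced in a few lines of standard-library code. This is a simplified sketch, not the library's implementation: `slugify_sketch` is a hypothetical name, and it assumes that every run of non-alphanumeric chars acts as a word boundary.

```python
import re
import unicodedata


def slugify_sketch(text: str, separator: str = '-') -> str:
    # Hypothetical helper for illustration only.
    # 1. Reduce the text to ASCII (accents are stripped, untranslatable chars dropped).
    ascii_text = (unicodedata.normalize('NFKD', text)
                  .encode('ascii', 'ignore')
                  .decode('ascii'))
    # 2. Lowercase, then split on every run of non-alphanumeric chars.
    tokens = re.split(r'[^a-z0-9]+', ascii_text.lower())
    # 3. Join the non-empty tokens with the separator.
    return separator.join(token for token in tokens if token)


print(slugify_sketch('Top 10 Reasons To Love Dogs!!!'))
print(slugify_sketch('Mönstér Mägnët'))
```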
-
string_utils.manipulation.booleanize(input_string: str) → bool
Turns a string into a boolean based on its content (CASE INSENSITIVE).
A positive boolean (True) is returned if the string value is one of the following:
“true”
“1”
“yes”
“y”
Otherwise False is returned.
Examples:
>>> booleanize('true') # returns True
>>> booleanize('YES') # returns True
>>> booleanize('nope') # returns False
- Parameters
input_string (str) – String to convert
- Returns
True if the string contains a boolean-like positive value, False otherwise.
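The rule above amounts to a case-insensitive membership test against the four accepted values. A minimal sketch, using the hypothetical name `booleanize_sketch` and assuming no whitespace trimming is performed:

```python
# The four positive values listed in the description.
_TRUTHY = {'true', '1', 'yes', 'y'}


def booleanize_sketch(text: str) -> bool:
    # Hypothetical helper for illustration only.
    # Lowercasing first makes the comparison case insensitive.
    return text.lower() in _TRUTHY


print(booleanize_sketch('true'), booleanize_sketch('YES'), booleanize_sketch('nope'))
```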
-
string_utils.manipulation.strip_margin(input_string: str) → str
Removes tab indentation from multi line strings (inspired by analogous Scala function).
Example:
>>> strip_margin('''
>>>         line 1
>>>         line 2
>>>         line 3
>>> ''')
>>> # returns:
>>> '''
>>> line 1
>>> line 2
>>> line 3
>>> '''
- Parameters
input_string (str) – String to format
- Returns
A string without left margins
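A multiline regular expression is enough to illustrate the idea. The sketch below is an assumption about the behavior, not the library's code: `strip_margin_sketch` is a hypothetical name, and it removes every leading space or tab on each line. This differs from `textwrap.dedent`, which removes only the indentation common to all lines.

```python
import re

# With re.MULTILINE, "^" matches at the start of every line.
_LEFT_MARGIN = re.compile(r'^[ \t]+', re.MULTILINE)


def strip_margin_sketch(text: str) -> str:
    # Hypothetical helper for illustration only.
    return _LEFT_MARGIN.sub('', text)


indented = '''
        line 1
        line 2
        line 3
'''
print(strip_margin_sketch(indented))
```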
-
string_utils.manipulation.compress(input_string: str, encoding: str = 'utf-8', compression_level: int = 9) → str
Compress the given string by returning a shorter one that can be safely used in any context (like URL) and restored back to its original state using decompress().
Bear in mind: Besides the provided compression_level, the compression result (how much the string is actually compressed by resulting into a shorter string) depends on 2 factors:
The amount of data (string size): short strings might not provide a significant compression result or even be longer than the given input string (this is due to the fact that some bytes have to be embedded into the compressed string in order to be able to restore it later on)
The content type: random sequences of chars are very unlikely to be successfully compressed, while the best compression result is obtained when the string contains several recurring char sequences (like in the example).
Behind the scenes this method makes use of Python’s standard zlib and base64 libraries.
Examples:
>>> n = 0 # <- ignore this, it's a fix for PyCharm (not fixable using ignore comments)
>>> # "original" will be a string with 169 chars:
>>> original = ' '.join(['word n{}'.format(n) for n in range(20)])
>>> # "compressed" will be a string of 88 chars
>>> compressed = compress(original)
- Parameters
input_string (str) – String to compress (must not be empty, otherwise a ValueError is raised).
encoding (str) – String encoding (default to “utf-8”).
compression_level (int) – A value between 0 (no compression) and 9 (best compression), default to 9.
- Returns
Compressed string.
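The effect of string size on the result can be observed directly with zlib and base64. The sketch below assumes a zlib-then-URL-safe-base64 pipeline; `packed_length` is a hypothetical helper, and the exact encoding used by compress() may differ.

```python
import base64
import zlib


def packed_length(text: str, level: int = 9) -> int:
    # Hypothetical helper: length of the text after zlib + URL-safe base64.
    raw = zlib.compress(text.encode('utf-8'), level)
    return len(base64.urlsafe_b64encode(raw))


repetitive = ' '.join('word n{}'.format(n) for n in range(20))  # 169 chars
short = 'hi'

# Recurring sequences compress well: the packed form is shorter than the input.
print(len(repetitive), packed_length(repetitive))
# A very short input grows, because of the zlib header/checksum and base64 overhead.
print(len(short), packed_length(short))
```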
-
string_utils.manipulation.decompress(input_string: str, encoding: str = 'utf-8') → str
Restore a previously compressed string (obtained using compress()) back to its original state.
- Parameters
input_string (str) – String to restore.
encoding (str) – Original string encoding.
- Returns
Decompressed string.
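The round trip between the two functions can be sketched with the standard library alone. `compress_sketch` and `decompress_sketch` are hypothetical stand-ins that assume zlib compression followed by URL-safe base64; they are not the library's implementation.

```python
import base64
import zlib


def compress_sketch(text: str, encoding: str = 'utf-8', compression_level: int = 9) -> str:
    # Hypothetical helper for illustration only.
    if not text:
        raise ValueError('input string cannot be empty')
    packed = zlib.compress(text.encode(encoding), compression_level)
    # URL-safe base64 output contains only ASCII letters, digits, "-", "_" and "=".
    return base64.urlsafe_b64encode(packed).decode('ascii')


def decompress_sketch(text: str, encoding: str = 'utf-8') -> str:
    # Reverse the two steps: base64-decode first, then inflate.
    packed = base64.urlsafe_b64decode(text)
    return zlib.decompress(packed).decode(encoding)


original = 'Mönstér Mägnët ' * 10
restored = decompress_sketch(compress_sketch(original))
print(restored == original)  # True
```

The same encoding has to be passed to both calls, since the decompressed bytes are decoded with it.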
-
string_utils.manipulation.roman_encode(input_number: Union[str, int]) → str
Convert the given number/string into a roman number.
The passed input must represent a positive integer in the range 1-3999 (inclusive).
You may be wondering why these limits exist:
zero is forbidden since it has no representation in roman numbers
the upper bound 3999 is due to a limitation of the ascii charset (the highest-value symbol displayable in ascii is “M”, which is equal to 1000; based on roman number rules we can use M three times to reach 3000, but we can’t go any further in thousands without special “boxed chars”).
Examples:
>>> roman_encode(37) # returns 'XXXVII'
>>> roman_encode('2020') # returns 'MMXX'
- Parameters
input_number (Union[str, int]) – An integer or a string to be converted.
- Returns
Roman number string.
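A common way to perform this conversion is a greedy walk over the symbol values, including the six subtractive pairs. The sketch below takes that approach; `roman_encode_sketch` is a hypothetical name, and the library may implement the conversion differently.

```python
# Values in descending order, including the subtractive pairs (CM, CD, XC, XL, IX, IV).
_ROMAN_PAIRS = [
    (1000, 'M'), (900, 'CM'), (500, 'D'), (400, 'CD'),
    (100, 'C'), (90, 'XC'), (50, 'L'), (40, 'XL'),
    (10, 'X'), (9, 'IX'), (5, 'V'), (4, 'IV'), (1, 'I'),
]


def roman_encode_sketch(number) -> str:
    # Hypothetical helper for illustration only.
    value = int(number)  # accepts both 37 and '37'
    if not 1 <= value <= 3999:
        raise ValueError('input must be in the range 1-3999')
    parts = []
    for amount, symbol in _ROMAN_PAIRS:
        # Take as many copies of the current symbol as fit, keep the remainder.
        count, value = divmod(value, amount)
        parts.append(symbol * count)
    return ''.join(parts)


print(roman_encode_sketch(37))      # XXXVII
print(roman_encode_sketch('2020'))  # MMXX
print(roman_encode_sketch(3999))    # MMMCMXCIX
```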
-
string_utils.manipulation.roman_decode(input_string: str) → int
Decode a roman number string into an integer if the provided string is valid.
Example:
>>> roman_decode('VII') # returns 7
- Parameters
input_string (str) – (Assumed) Roman number
- Returns
Integer value
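Decoding can be sketched as a right-to-left scan in which a symbol smaller than the largest one already seen is subtracted instead of added. `roman_decode_sketch` is a hypothetical name; unlike the documented function, it does not verify that the numeral is well formed and raises a KeyError on unknown symbols.

```python
_ROMAN_VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}


def roman_decode_sketch(text: str) -> int:
    # Hypothetical helper for illustration only.
    total = 0
    largest_seen = 0
    for symbol in reversed(text.upper()):
        value = _ROMAN_VALUES[symbol]
        if value < largest_seen:
            # Subtractive notation, e.g. the "I" in "IV" or the "C" in "CM".
            total -= value
        else:
            total += value
            largest_seen = value
    return total


print(roman_decode_sketch('VII'))      # 7
print(roman_decode_sketch('MCMXCIV'))  # 1994
```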