Text can contain characters that interfere with the functions used for text analysis. These include characters that merely look like spaces or line breaks, and characters that are not part of ordinary English text.
Technical details
Assuming that your text is in English, the following steps create a new variable with most of these characters removed, which will likely improve the automated analyses. The code uses a regular expression to replace each run of characters other than letters and digits (troublesome whitespace, punctuation and underscores) with a single regular space.
Method - JavaScript-Text variable
The following code can be used in a JavaScript - Text variable:
//change my_text_variable below to the name of your text variable to clean
var input = my_text_variable;
//replace troublesome characters with a regular space " "
input.replace(/[\W_]+/g," ");
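To see what the pattern matches, the following minimal sketch applies the same regular expression to a made-up sample string outside Displayr, for example in Node.js or a browser console. The function name cleanText and the sample string are hypothetical and are used only for illustration.

```javascript
// Hypothetical helper for illustration only; it is not part of Displayr.
function cleanText(text) {
  // \W matches any character other than A-Z, a-z, 0-9 and underscore.
  // Adding _ to the class means underscores are replaced as well.
  // The + collapses each run of such characters into one regular space.
  return text.replace(/[\W_]+/g, " ");
}

// Made-up sample containing a tab, a non-breaking space,
// an underscore, punctuation and a line break.
var sample = "Great\tservice,\u00A0would_recommend!\nThanks";

console.log(cleanText(sample)); // prints "Great service would recommend Thanks"
```

Because \W treats every character outside A-Z, a-z, 0-9 and underscore as a non-word character, accented letters are replaced too, which is why this approach assumes English text.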
Method - R-Text variable
See our Displayr documentation for more examples of replacing text in text variables. The following code can be used in an R-Text variable:
#change my_text_variable below to the name of your text variable to clean
input = my_text_variable
#replace troublesome characters with a regular space " "
#R patterns take no /.../g delimiters; perl = TRUE ensures \W is recognised inside [ ]
input = gsub("[\\W_]+"," ",input,perl = TRUE)