replaceText() Regex "Not Followed By"

April 22, 2024

Any ideas why this simple RegEx doesn't seem to be supported in a Google Docs script? foo(?!bar) I'm assuming that Google Apps Script uses the same RegEx as JavaScript. Is this

Solution 1:

As discussed in the comments on this question, this is a documented limitation: the replaceText() method doesn't support negative lookaheads such as foo(?!bar), and it doesn't fully support capture groups either. The documentation says:

A subset of the JavaScript regular expression features are not fully supported, such as capture groups and mode modifiers.

Serge suggested a work-around, "it should be possible to manipulate your document at a lower level (extracting text from paragraph etc) but it could rapidly become quite cumbersome."

Here's what that could look like. If you don't mind losing all formatting, this example will apply capture groups, RegExp flags (i for case-insensitivity) and negative lookaheads to change:

Little rabbit Foo Foo, running through the foobar.

to:

Little rabbit Fred Fred, running through the foobar.

Code:

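function myFunction() {
  var body = DocumentApp.getActiveDocument().getBody();
  var paragraphs = body.getParagraphs();
  for (var i=0; i<paragraphs.length; i++) {
    // Do the real replacement in plain JavaScript, where capture groups,
    // flags and lookaheads are available...
    var text = paragraphs[i].getText();
    // ...then swap the whole paragraph text for the result.
    paragraphs[i].replaceText(".*", text.replace(/(f)oo(?!bar)/gi, '$1red') );
  }
}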
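
The regular expression itself can be checked outside of Docs, since text.replace() is ordinary JavaScript. The sketch below is only a quick check under that assumption; the helper name fooToFred is made up for illustration and is not part of the original answer. It applies the same pattern to the sample sentence.

```javascript
// Hypothetical helper (not from the original answer): the same pattern that
// the loop above applies to each paragraph's text.
function fooToFred(text) {
  // (f)        capture group 1 keeps the original case of the first letter
  // oo(?!bar)  matches "oo" only when it is not followed by "bar"
  // g, i       replace every match, ignoring case
  return text.replace(/(f)oo(?!bar)/gi, '$1red');
}

var before = 'Little rabbit Foo Foo, running through the foobar.';
var after = fooToFred(before);
console.log(after);
```

Both occurrences of "Foo" become "Fred" with their capital letter preserved, while "foobar" is left alone because the lookahead rejects it.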

Solution 2:

You have a sequence which you can match with a regular expression, but that regular expression will also match one, or more, things which you do not desire to change. The generalized solution to this situation is to:

1. Change the text such that you have known sequences of characters which are definitely not used. Effectively, this gives you sequences of characters which you use as variables to hold the values you don't want to change. Personally, I would use: body.replaceText('Q','Qz'); which will make it such that there is no sequence in your document which matches /Q[^z]/. This results in you being able to use sequences like Qa to represent some text you don't want to change. I use Q because it has a low frequency of use in English. You can use any character. For efficiency, choose a character which results in a low number of changes within the text you are affecting.
2. Change the things you don't want to end up changing to one of the character sequences you now know are unused. For example: body.replaceText('foobar','Qa'); Repeat this for whatever additional items you don't want to end up changing.
3. Change the text you are really wanting to change. In this example: body.replaceText('foo','hello'.replace(/Q/g,'Qz')); Note that you need to apply to the new replacement text the first substitution which you used to open up known unused sequences.
4. Restore all of the things you did not want to change to their original state: body.replaceText('Qa','foobar');
5. Restore the text you used to open up unused character sequences: body.replaceText('Qz','Q');

All together that would be:

var body = DocumentApp.getActiveDocument().getBody();
body.replaceText('Q','Qz');      //Open up unused character sequences
body.replaceText('foobar','Qa'); //Save the things you don't want to change.
//In the general case, you need to apply to the new text the same substitution
//  which you used to open up unused character sequences.  If you don't you
//  may end up with those sequences being changed in the new text.
body.replaceText('foo','hello'.replace(/Q/g,'Qz')); //Make the change you desire.
body.replaceText('Qa','foobar'); //Restore the things you saved.
body.replaceText('Qz','Q');      //Restore the original sequence.

While solving the problem this way does not allow you to use all the features of JavaScript RegExp (e.g. capture groups, look-ahead assertions, and flags), it should preserve the formatting within your document.
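
To see why opening up the Q sequences matters, the same five substitutions can be tried on a plain string. This is only a sketch: replaceAllLiteral is a stand-in for body.replaceText() that I am assuming for the demo, and replaceFooKeepFoobar is a hypothetical name, so none of this touches a real document.

```javascript
// Literal, global find-and-replace on a plain string. It stands in for
// body.replaceText() in this demo (an assumption, not the Docs API).
function replaceAllLiteral(text, search, replacement) {
  return text.split(search).join(replacement);
}

// Hypothetical helper: replaces "foo" with `replacement`, but leaves
// "foobar" alone, using only lookahead-free substitutions.
function replaceFooKeepFoobar(text, replacement) {
  var t = replaceAllLiteral(text, 'Q', 'Qz');   // open up unused sequences
  t = replaceAllLiteral(t, 'foobar', 'Qa');     // save what must not change
  // Apply the same Q -> Qz substitution to the new text before inserting it.
  t = replaceAllLiteral(t, 'foo', replaceAllLiteral(replacement, 'Q', 'Qz'));
  t = replaceAllLiteral(t, 'Qa', 'foobar');     // restore what was saved
  return replaceAllLiteral(t, 'Qz', 'Q');       // restore the original Q
}

var result = replaceFooKeepFoobar('foo foobar Qa Quiz', 'hello');
console.log(result);
```

The input deliberately contains a literal "Qa". Because every Q is first turned into Qz, that "Qa" becomes "Qza" and is not mistaken for the placeholder when "Qa" is restored to "foobar".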

You can choose not to perform steps 1 and 5 above by picking a longer sequence of characters to use to represent the text which you do not want to match (e.g. kNoWn1UnUsEd). However, such a longer sequence is something that must be selected based on your knowledge of what already exists in the document. Doing that can save a couple of steps, but you either have to search for an unused string or accept that there is some probability that the string you use is already in the document, which would result in an undesired substitution.
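
If you go the longer-sequence route, searching for an unused string can be automated. The function below is a rough sketch: pickUnusedToken is a hypothetical name, and it assumes you pass in the document text yourself (for example from body.getText()). It only checks the text as it is right now, so it does not guard against the token appearing later through your own replacements.

```javascript
// Hypothetical helper: returns a token that does not occur in `text`.
// Starts from `base` and appends a counter until the token is unused.
// Letters and digits only, so the token is also safe to use as a pattern.
function pickUnusedToken(text, base) {
  var token = base;
  var n = 0;
  while (text.indexOf(token) !== -1) {
    n += 1;
    token = base + n;
  }
  return token;
}

var free = pickUnusedToken('Little rabbit Foo Foo', 'kNoWn1UnUsEd');
var taken = pickUnusedToken('x kNoWn1UnUsEd y kNoWn1UnUsEd1', 'kNoWn1UnUsEd');
console.log(free, taken);
```

The returned token could then take the place of Qa in the substitutions above, with the two Q-related substitutions left out.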

Solution 3:

I figured out a way to obtain most of the functionality of JS's str.replace(), including capture groups and smart replacers, in Apps Script without messing up the style. The trick is to use JavaScript's regex.exec() function and Apps Script's text.deleteText() and text.insertText() functions.

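function replaceText(body, regex, replacer, attribute){
  // regex must carry the g flag; otherwise exec() never advances regex.lastIndex.
  var content = body.getText();
  const text = body.editAsText();
  var match = "";
  while (true){
    content = body.getText();
    var oldLength = content.length;
    match = regex.exec(content);  // search resumes at regex.lastIndex
    if (match === null){
      break;
    }
    var start = match.index;
    var end = regex.lastIndex - 1;  // deleteText() takes an inclusive end offset
    text.deleteText(start, end);
    text.insertText(start, replacer(match, regex));
    var newLength = body.getText().length;
    var replacedLength = oldLength - newLength;  // how much shorter the text became
    var newEnd = end - replacedLength;  // last character of the inserted text
    text.setAttributes(start, newEnd, attribute);
    regex.lastIndex -= replacedLength;  // keep the search position in step with the edited text
  }
}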

Argument explanations:

body: the body of the document you want to operate on
regex: a normal JS regular expression object used as the search pattern
replacer: a function that returns the replacement string. It automatically receives two arguments: (I) match, the match object generated by regex.exec(), and (II) regex, the regular expression object used as the search pattern
attribute: an Apps Script attribute object. For example, if you want to apply bold style to the new strings that replace the old ones, you can create a boldStyle attribute object:

var boldStyle = {};
boldStyle[DocumentApp.Attribute.BOLD] = true;
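
Tying this back to the original question, here is a sketch of how the function above might be called for "foo not followed by bar". The names fredReplacer and replaceFooWithBoldFred are made up for illustration. The second function is only meant to run inside Apps Script, where DocumentApp and the replaceText() function defined above are available.

```javascript
// Hypothetical replacer: match[1] is the captured first letter ("f" or "F"),
// so the replacement keeps the original capitalisation.
function fredReplacer(match, regex) {
  return match[1] + 'red';
}

// Hypothetical Apps Script entry point. It relies on DocumentApp and on the
// replaceText(body, regex, replacer, attribute) function defined above.
function replaceFooWithBoldFred() {
  var body = DocumentApp.getActiveDocument().getBody();
  var boldStyle = {};
  boldStyle[DocumentApp.Attribute.BOLD] = true;
  // The g flag is needed so the search position advances between matches.
  replaceText(body, /(f)oo(?!bar)/gi, fredReplacer, boldStyle);
}
```

Keeping the replacer as a separate named function makes it easy to try on its own with the result of regex.exec() before running it against a document.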

Tips:

How can I use capture groups in replaceText()? You can access all capture groups from the replacer function, match[0] is the whole string matched, match[1] is the first capture group, match[2] is the second, etc.
How can I access the index and position of the match in replaceText()? You can access the start index of the match (match.index) and end index of the match (regex.lastIndex) from the replacer function.
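
The bookkeeping around match.index and regex.lastIndex is easier to follow on a plain string. The function below is a simplified analogue of the loop above, not the Apps Script version: replaceWithExec is a hypothetical name, it returns a new string instead of editing a document, and it applies no attributes.

```javascript
// Hypothetical plain-string analogue of the exec() loop above.
function replaceWithExec(content, regex, replacer) {
  if (!regex.global) {
    throw new Error('regex needs the g flag so that lastIndex advances');
  }
  regex.lastIndex = 0;
  var match;
  while ((match = regex.exec(content)) !== null) {
    var start = match.index;        // first character of the match
    var end = regex.lastIndex;      // one past the last character of the match
    var replacement = replacer(match, regex);
    content = content.slice(0, start) + replacement + content.slice(end);
    // Resume searching right after the inserted text, not after the old match.
    regex.lastIndex = start + replacement.length;
    if (match[0].length === 0) {
      regex.lastIndex += 1;         // step past empty matches to avoid looping forever
    }
  }
  return content;
}

var out = replaceWithExec('foo foobar Foo', /(f)oo(?!bar)/gi, function (m) {
  return m[1] + 'red';
});
console.log(out);
```

Setting lastIndex to start plus the replacement length has the same effect as subtracting the change in length from lastIndex, which is what the document version does.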

For more in-depth reference of JS RegExp, see this excellent tutorial from Javascript.info.

Example:

Here's an example use case for the replaceText() function above: a simple implementation of a Markdown to Google Docs conversion script:

function markdownToDocs() {
  const body = DocumentApp.getActiveDocument().getBody();

  // Use editAsText to obtain a single text element containing
  // all the characters in the document.
  const text = body.editAsText();

  // e.g. replace "**string**" with "string" (bolded)
  var boldStyle = {};
  boldStyle[DocumentApp.Attribute.BOLD] = true;
  replaceDeliminaters(body, "\\*\\*", boldStyle, false);

  // e.g. replace multiline "```line 1\nline 2\nline 3```" with "line 1\nline 2\nline 3" (with gray background highlight)
  var blockHighlightStyle = {};
  blockHighlightStyle[DocumentApp.Attribute.BACKGROUND_COLOR] = "#EEEEEE";
  replaceDeliminaters(body, "```", blockHighlightStyle, true);

  // e.g. replace inline "`console.log("hello world")`" with "console.log("hello world")" (in "Times New Roman" font and italic)
  var inlineStyle = {};
  inlineStyle[DocumentApp.Attribute.FONT_FAMILY] = "Times New Roman";
  inlineStyle[DocumentApp.Attribute.ITALIC] = true;
  replaceDeliminaters(body, "`", inlineStyle, false);

  // feel free to change all the styling and markdown delimiters as you wish.
}

// replace markdown delimiters like "**", "`", and "```"
function replaceDeliminaters(body, deliminator, attributes, multiline){
  var capture;
  if (multiline){
    capture = "([\\s\\S]+?)"; // capture newline characters as well
  } else{
    capture = "(.+?)"; // do not capture newline characters
  }
  const regex = new RegExp(deliminator + capture + deliminator, "g");
  const replacer = function(match, regex){
    return match[1]; // return the first capture group
  }
  replaceText(body, regex, replacer, attributes);
}
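
The difference between the two capture patterns can be checked on plain strings. This sketch only mirrors how the pattern is assembled above; buildDelimiterRegex is a hypothetical name, and the built-in string replace() stands in for the document-level replaceText().

```javascript
// Hypothetical helper that assembles the pattern the same way as above.
function buildDelimiterRegex(delimiter, multiline) {
  // [\s\S] matches any character including newlines; "." stops at a newline.
  // The trailing "?" makes the capture lazy, so it ends at the first closing delimiter.
  var capture = multiline ? '([\\s\\S]+?)' : '(.+?)';
  return new RegExp(delimiter + capture + delimiter, 'g');
}

var bold = buildDelimiterRegex('\\*\\*', false);
var fence = buildDelimiterRegex('```', true);

var boldResult = 'a **b** c **d**'.replace(bold, '$1');
var fenceResult = '```x\ny```'.replace(fence, '$1');
var untouched = '**a\nb**'.replace(bold, '$1');
console.log(boldResult, JSON.stringify(fenceResult), JSON.stringify(untouched));
```

The single-line pattern leaves "**a\nb**" unchanged because "." does not match a newline, which is why the fenced code block case asks for the multiline capture.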
