This is a convenience function to capture substrings from textual data. It uses str_match_all internally but, instead of returning every match and capture group, always returns a single part of the match, selected by the parameters capture_n and capture_bracket.
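To illustrate which "part of the match" is meant, the base-R snippet below lists every match of a pattern in a string, then the whole match and capture groups of one of those matches. It is an illustration only, not the package's code: it uses gregexpr() and regexec() instead of str_match_all, and the date string and pattern are made-up examples. capture_n picks one match; capture_bracket = 0 picks the whole match and capture_bracket = k picks the k-th parenthesized group.

```r
# Illustration only: base R stands in for stringr::str_match_all here.
x <- "released 2023-11-02, patched 2024-05-17"
pattern <- "([0-9]{4})-([0-9]{2})-([0-9]{2})"

# Every whole match in the string; capture_n chooses one of these.
all_matches <- regmatches(x, gregexpr(pattern, x))[[1]]
print(all_matches)    # -> "2023-11-02" "2024-05-17"

# Whole match plus capture groups of the second match (capture_n = 2).
# Element 1 corresponds to capture_bracket = 0, element k + 1 to group k.
groups <- regmatches(all_matches[2], regexec(pattern, all_matches[2]))[[1]]
print(groups)         # -> "2024-05-17" "2024" "05" "17"
print(groups[1 + 1])  # capture_bracket = 1 -> "2024"
```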

Usage

extract_substring(string, pattern, capture_n = 1, capture_bracket = 0,
  missing = NA_character_)

Arguments

string
a character vector of strings to extract substrings from.

pattern
a regular expression pattern to search for.

capture_n
within each string, which match of the pattern to extract. For example, if the pattern matches single words, capture_n = 2 extracts the second word.

capture_bracket
for the selected match, which capture group to extract, i.e. which parenthesized segment of the pattern. The default, capture_bracket = 0, extracts the whole match.

missing
the value to return where no substring can be extracted. A value can be missing because a string has fewer than capture_n matches or because the requested capture group is empty.

Value
A character vector of the same length as string, containing the extracted substrings.
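The behaviour described above can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's implementation: extract_substring_sketch is a hypothetical name, gregexpr(perl = TRUE) is swapped in for str_match_all so that no extra packages are needed, and treating empty matches and empty or unset capture groups as missing is an assumption about how the missing argument is applied.

```r
# Hypothetical sketch of extract_substring(); not the package's source.
# Assumption: base R gregexpr(perl = TRUE) replaces stringr::str_match_all.
extract_substring_sketch <- function(string, pattern, capture_n = 1,
                                     capture_bracket = 0,
                                     missing = NA_character_) {
  vapply(string, function(s) {
    if (is.na(s)) return(missing)
    # Start positions of all matches; match lengths and capture-group
    # positions are stored as attributes of the result.
    m <- gregexpr(pattern, s, perl = TRUE)[[1]]
    # gregexpr() returns -1 when the string has no match at all.
    if (m[1] == -1 || length(m) < capture_n) return(missing)
    if (capture_bracket == 0) {
      # Whole match number capture_n.
      start <- m[capture_n]
      len <- attr(m, "match.length")[capture_n]
    } else {
      cs <- attr(m, "capture.start")
      # The pattern has no capture group with this number.
      if (is.null(cs) || ncol(cs) < capture_bracket) return(missing)
      start <- cs[capture_n, capture_bracket]
      len <- attr(m, "capture.length")[capture_n, capture_bracket]
    }
    # Assumption: an empty match or an empty/unset group counts as missing.
    if (start < 1 || len <= 0) return(missing)
    substr(s, start, start + len - 1)
  }, character(1), USE.NAMES = FALSE)
}
```

For example, extract_substring_sketch("id=42; id=7", "id=([0-9]+)", capture_n = 2, capture_bracket = 1) returns "7", while a string with fewer than capture_n matches yields the missing value. Reading the group positions from the capture.start and capture.length attributes avoids running a second regular expression per string.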

See also

Other data extraction functions: extract_data, extract_word