This extracts words from text. By default, a word is a continuous sequence of letters and/or numbers. Arguments control whether the characters "_", "-", " ", and "." count as part of a word or act as word separators, and whether numbers are included.
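The base-R snippet below roughly illustrates the default word definition. It assumes that "letters and/or numbers" means runs of ASCII letters and digits; this is not taken from the package source. With that definition, "_", "-", " " and "." all split the text into separate words:

```r
x <- "sample_id-7 run.2b"
# assumed default word definition: runs of ASCII letters and digits
words <- regmatches(x, gregexpr("[a-zA-Z0-9]+", x))[[1]]
print(words)
```

Here `words` is `c("sample", "id", "7", "run", "2b")`.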

extract_word(string, capture_n = 1, include_numbers = TRUE,
  include_underscore = FALSE, include_dash = FALSE,
  include_space = FALSE, include_colon = FALSE,
  missing = NA_character_)

Arguments

string

character vector to extract words from

capture_n

which word to extract (1 for the first word, 2 for the second, and so on)

include_numbers

whether to include numbers (0-9) as part of the word (if FALSE, numbers will work as a word separator)

include_underscore

whether to include the underscore character (_) as part of a word (if FALSE, it will work as a word separator)

include_dash

whether to include the dash character (-) as part of a word (if FALSE, it will work as a word separator)

include_space

whether to include the space character ( ) as part of a word (if FALSE, it will work as a word separator)

include_colon

whether to include the period character (.) as part of a word (if FALSE, it will work as a word separator). Despite its name, this argument refers to the period (.), not the colon (:).

missing

what to return for missing values. Values can be missing either because the string contains fewer than capture_n words or because the captured value itself is empty.
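One way to picture how these arguments interact is to build a regular-expression character class from the include_* flags and then pick the capture_n-th match, falling back to missing when there are too few matches. The function below is a hypothetical base-R sketch under that assumption. The name extract_word_sketch is invented for illustration, and the package's actual implementation may differ, for example in how it treats non-ASCII letters.

```r
# Hypothetical sketch for illustration only; not the package's implementation.
extract_word_sketch <- function(string, capture_n = 1, include_numbers = TRUE,
                                include_underscore = FALSE, include_dash = FALSE,
                                include_space = FALSE, include_colon = FALSE,
                                missing = NA_character_) {
  # characters that count as part of a word (ASCII letters assumed)
  chars <- "a-zA-Z"
  if (include_numbers) chars <- paste0(chars, "0-9")
  if (include_underscore) chars <- paste0(chars, "_")
  if (include_space) chars <- paste0(chars, " ")
  # include_colon toggles the period (.), which is literal inside brackets
  if (include_colon) chars <- paste0(chars, ".")
  # the dash goes last so it is read as a literal "-", not a range
  if (include_dash) chars <- paste0(chars, "-")
  pattern <- paste0("[", chars, "]+")

  # all runs of word characters in each string
  matches <- regmatches(string, gregexpr(pattern, string))

  # take the capture_n-th word, or `missing` if there are too few words
  vapply(matches, function(m) {
    if (length(m) >= capture_n) m[[capture_n]] else missing
  }, character(1))
}
```

For instance, `extract_word_sketch("sample number16.2", capture_n = 2)` returns `"number16"`. Without `include_colon = TRUE`, the period splits `"number16.2"` into two words.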

See also

Other data extraction functions: extract_data, extract_substring

Examples

x_text <- extract_word(c("sample number16.2", "sample number7b"),
                       capture_n = 2, include_colon = TRUE)
# "number16.2" "number7b"
x_num <- readr::parse_number(x_text)
# 16.2 7.0