The following functions are intended to make it easy to extract relevant information from textual data. They are primarily meant for use in iso_mutate_file_info and inside the filtering conditions passed to iso_filter_files. However, they can also be used stand-alone and in regular mutate or filter calls on the data frames returned by the data retrieval functions (iso_get_raw_data, iso_get_file_info, iso_get_vendor_data_table, etc.). Note that all the parse_ functions are used in iso_parse_file_info for easy type conversions.
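As a minimal sketch of the stand-alone use in regular mutate and filter calls, the example below works on a made-up data frame rather than on real file info. The column names (file_id, weight) and their values are assumptions for illustration only, and parse_number is loaded directly from readr here.

```r
library(dplyr)
library(readr)

# Hypothetical stand-in for a file info data frame (values are made up)
info <- tibble(
  file_id = c("a", "b", "c"),
  weight  = c("0.52 mg", "1.10 mg", "0.98 mg")
)

# Turn the text column into a number, then filter on the numeric value
heavy <- info %>%
  mutate(weight_mg = parse_number(weight)) %>%
  filter(weight_mg > 0.9)
```

The same parse_number(weight) expression should carry over to iso_mutate_file_info and iso_filter_files, assuming the file info contains a comparable text column.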

Details

For simultaneous extraction of pure text data into multiple columns, please see the extract function from the tidyr package.
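A short sketch of what such a multi-column extraction might look like with tidyr::extract. The file naming scheme, the regular expression, and the new column names (type, number, run) are assumptions chosen for illustration.

```r
library(tidyr)

# Hypothetical file names that encode a type, a number, and a run
file_info <- data.frame(
  file_id = c("std_01_run3.dxf", "sample_12_run1.dxf"),
  stringsAsFactors = FALSE
)

# Each capture group in the regex fills one new column;
# convert = TRUE lets tidyr convert the captured text to suitable types
parsed <- extract(
  file_info, file_id,
  into = c("type", "number", "run"),
  regex = "^([a-z]+)_(\\d+)_run(\\d+)",
  remove = FALSE,
  convert = TRUE
)
```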

  • extract_substring is a generic convenience function to extract parts of textual data (based on regular expression matches). It can be combined with the parsing functions to turn extracted substrings into numerical or logical data.

  • extract_word is a more specific convenience function to extract the 1st/2nd/3rd word from textual data.

  • parse_number is a convenience function to extract a number even if it is surrounded by text (re-exported from the readr package).

  • parse_double parses text that holds double (decimal) numerical values with no extraneous text around them; use parse_number instead if the number is embedded in other text (re-exported from the readr package).

  • parse_integer parses text that holds integer (whole number) numerical values with no extraneous text around them; use parse_number instead if the number is embedded in other text (re-exported from the readr package).

  • parse_logical parses text that holds logical (boolean, i.e. TRUE/FALSE) values (re-exported from the readr package).

  • parse_datetime parses text that holds date and time information (re-exported from the readr package).
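The parsers listed above can be tried out on plain character vectors. The sketch below loads them directly from readr, and all input strings are made up. The last step uses base R's sub as a stand-in for a substring extraction followed by parsing; it is not the extract_substring function itself.

```r
library(readr)

# A number surrounded by other text
num <- parse_number("sample weight: 2.5 mg")

# Clean numeric text with nothing around it
dbl <- parse_double(c("1.5", "-0.25"))
int <- parse_integer(c("1", "42"))

# Logical values in several common spellings
lgl <- parse_logical(c("TRUE", "false", "T"))

# Date and time in ISO 8601 style
dt <- parse_datetime("2019-03-04 15:30:00")

# Extract a substring with a regular expression, then parse it
# (base R sub() used here for illustration only)
run_text <- sub("^.*run(\\d+).*$", "\\1", "std_01_run3.dxf")
run <- parse_integer(run_text)
```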

See also

Other data extraction functions: extract_substring, extract_word