gregexp {utilitiesR}    R Documentation

Extract all matches of a Perl regex from one or more strings.

Description

"All matches" means the regex is applied as if with Perl's 'g' switch: every match in each string is returned, not just the first. For example, the regex '([oe])' matches 'hello' twice, once at 'e' and once at 'o'.
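
As an illustration in base R only (a sketch of the idea, not this package's implementation), the difference between a first-match search and a 'g'-style search can be seen by comparing regexpr and gregexpr:

```r
x <- "hello"

# regexpr() stops at the first match, like a regex without the 'g' switch
first_match <- regmatches(x, regexpr("[oe]", x, perl = TRUE))

# gregexpr() keeps searching after each match, like a regex with 'g'
all_matches <- regmatches(x, gregexpr("[oe]", x, perl = TRUE))[[1]]

print(first_match)  # "e"
print(all_matches)  # "e" "o"
```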

Usage

  gregexp(pattern, text, use.names = TRUE, ...)

Arguments

use.names

whether to set the names of the output list to 'text'.

pattern

the regex (Perl syntax), containing at least one capturing group; groups may be named, e.g. (?<name>...).

text

the string or character vector to test the regex against.

...

further arguments passed on to regexpr or gregexpr (e.g. ignore.case).

Value

A list of the same length as 'text', with names equal to 'text' if use.names is TRUE. Each element is itself a list with one element per capturing group (named if the group is named); that element is a character vector of all matches for the group. If a capturing group does not match, out[[word]][[groupname]] has length 0. If the pattern contains no capturing groups, an error is thrown.
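
The per-string result described above can be approximated with base R alone. The sketch below is only an assumption about how such a structure might be built: capture_all is a hypothetical helper, not a function in utilitiesR, and it handles a single string. It relies on the "capture.start", "capture.length" and "capture.names" attributes that gregexpr attaches when perl = TRUE.

```r
# Hypothetical helper (not part of utilitiesR): returns a list with one
# character vector per capturing group, holding every match in `string`.
capture_all <- function(pattern, string) {
  m <- gregexpr(pattern, string, perl = TRUE)[[1]]
  starts  <- attr(m, "capture.start")   # matrix: one row per match, one column per group
  lengths <- attr(m, "capture.length")
  if (is.null(starts)) stop("pattern has no capturing groups")

  out <- lapply(seq_len(ncol(starts)), function(j) {
    if (m[1] == -1) return(character(0))  # no match at all: length-0 vector
    substring(string, starts[, j], starts[, j] + lengths[, j] - 1)
  })

  # Use group names where the pattern supplies them
  group_names <- attr(m, "capture.names")
  if (any(nzchar(group_names))) names(out) <- group_names
  out
}

res <- capture_all("(?<first>\\w+) and (?<second>\\w+)",
                   "apples and oranges or pears and bananas")
print(res$first)   # "apples" "pears"
print(res$second)  # "oranges" "bananas"

none <- capture_all("(?<first>\\w+) and (?<second>\\w+)", "nothing here")
print(length(none$first))  # 0
```

Applying such a helper over each element of 'text' (for example with lapply) would give a nested list of the shape described in this section.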

See Also

regexp, which returns matches in a more manageable form (a table) but extracts only the first match for each capturing group (i.e. no 'g' flag).

Examples

gregexp('([eo])', 'hello')
# list(hello=list(c('e', 'o')))

gregexp('Name=(?<name>[a-z]+), Surname=(?<surname>[a-z]+)',
        'Name=Jane, Surname=Doe\nName=John, Surname=Smith',
        ignore.case=TRUE,
        use.names=FALSE)
# list(list(name=c("Jane", "John"), surname=c("Doe", "Smith")))

text <- c('apples and oranges or pears and bananas', 'dogs and cats')
gregexp('(?<first>\\w+) and (?<second>\\w+)', text, use.names=FALSE)
# list(list(first=c('apples', 'pears'), second=c('oranges', 'bananas')),
#      list(first='dogs', second='cats'))

# same matches as above, but with names(out) == text
gregexp('(?<first>[a-z]+) and (?<second>[a-z]+)', text)

[Package utilitiesR version 2.0 Index]