SYNOPSIS
    xml_grep2 [*options*] *xpath_expression* [*FILE*...]

DESCRIPTION
    "xml_grep2" is a grep-like utility for XML files.

    It mimics grep as much as possible, with the major difference that the
    patterns are XPath expressions instead of regular expressions.

    When the result of the grep is a list of XML nodes (i.e. no option that
    causes the output to be plain text is used), the output is normally
    a single XML document: results are wrapped in a single root element
    ("xg2:result_set"). When several files are grepped, the results are
    grouped by file, wrapped in a single element ("xg2:file") with an
    attribute ("xg2:filename") giving the name of the file.
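
    The wrapping described above can be sketched in Python with the
    standard library. This is only an illustration of the shape of the
    result document, not the tool's own code: the helper name
    "wrap_results" and the sample file names are hypothetical, and the
    namespace URI is the one documented under the -n, --namespace option.

```python
import xml.etree.ElementTree as ET

# URI documented under -n, --namespace; "xg2" is the default prefix.
XG2_URI = "http://xmltwig.com/tools/xml_grep2/"


def wrap_results(results_by_file):
    """Wrap matched elements in a result_set root, grouped by file.

    Hypothetical helper: it mirrors the documented output shape only.
    """
    ET.register_namespace("xg2", XG2_URI)
    root = ET.Element(f"{{{XG2_URI}}}result_set")
    for filename, elements in results_by_file.items():
        file_el = ET.SubElement(root, f"{{{XG2_URI}}}file")
        file_el.set(f"{{{XG2_URI}}}filename", filename)
        file_el.extend(elements)
    return root


# Made-up matches standing in for the nodes selected in two files.
matches = {
    "a.xml": [ET.fromstring("<h1>First</h1>")],
    "b.xml": [ET.fromstring("<h1>Second</h1>"), ET.fromstring("<h1>Third</h1>")],
}
wrapped = wrap_results(matches)
print(ET.tostring(wrapped, encoding="unicode"))
```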

OPTIONS
    -c, --count
        Suppress normal output; instead print a count of matches for each
        input file.

    --help
        Display help message

    -f *NUM*, --format *NUM*
        Format of the output XML.

        The format parameter sets the indenting of the output. It is
        expected to be an integer and can take one of three values:

        If *NUM* is 0, then the document is dumped as it was originally
        parsed.

        If *NUM* is 1, libxml2 will add ignorable whitespace, so the
        content of the nodes is easier to read. Existing text nodes will
        not be altered.

        If *NUM* is 2 (or higher), libxml2 will act as if *NUM* were 1, but
        it will also add a leading and a trailing line break to each text
        node.

        libxml2 uses a hardcoded indentation of 2 space characters per
        indentation level. This value cannot be altered at runtime.

    -g, --generate-empty-set
        Generate an XML result (consisting of only the wrapper) even if no
        result has been found

    -H, --wrap, --with-filename
        Force results for each file to be wrapped, even if only 1 file is
        grepped.

        Results are normally wrapped by file only when 2 or more files are
        grepped

        When the "-t", "--text-only" option is used, print the filename for
        each match.

    -h, --nowrap, --no-filename
        Suppress the wrapping of results by file, even if more than one file
        is grepped.

        When the "-t", "--text-only" option is used, suppress the prefixing
        of filenames on output when multiple files are searched.

    --html
        Parses the input as HTML instead of XML

    -L, --files-without-matches
        Suppress normal output; instead print the name of each input file
        from which no output would normally have been printed. Note that the
        file still needs to be parsed and loaded.

    -l, --files-with-matches
        Suppress normal output; instead print the name of each input file
        from which output would normally have been printed. Note that the
        file still needs to be parsed and loaded.

    --label *LABEL*
        Display input actually coming from standard input as input coming
        from file *LABEL*. This is especially useful for tools like zgrep,
        e.g. gzip -cd foo.xml.gz | xml_grep2 --label=foo.xml something

    -M, --man
        Display long help message

    -m *NUM*, --max-count *NUM*
        Output only *NUM* matches. Note that the file still needs to be
        parsed and loaded.

    -N *PREFIX*=*URI*, --define-ns *PREFIX*=*URI*
        Defines a namespace mapping, that can then be used in the XPath
        query.

        This is the only way to query elements (or attributes) in the
        default namespace.

        "XML::LibXML::XPathContext" needs to be installed for this option to
        be available.

        Several -N, --define-ns options can be used
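
        Why a prefix mapping is needed for the default namespace can be
        illustrated with Python's standard library, which follows the same
        XPath rule: an unprefixed name in a path only matches elements in
        no namespace. This is a sketch, not the tool's implementation; the
        sample feed is made up, and the prefix "d" is arbitrary.

```python
import xml.etree.ElementTree as ET

# Made-up RSS 1.0 fragment: <item> and <title> are in the default namespace.
RSS = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://purl.org/rss/1.0/">
  <item><title>Hello</title></item>
  <item><title>World</title></item>
</rdf:RDF>"""

root = ET.fromstring(RSS)

# An unprefixed name only matches elements that are in no namespace.
unprefixed = root.findall(".//title")

# Binding a prefix to the namespace URI plays the role of -N d=URI.
ns = {"d": "http://purl.org/rss/1.0/"}
prefixed = root.findall(".//d:title", ns)
titles = [t.text for t in prefixed]

print(len(unprefixed), titles)
```

        The prefix used in the query does not have to appear in the
        document; only the URI it is bound to matters.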

    -n *STRING*, --namespace *STRING*
        Change the default namespace prefix used for wrapping results. The
        default is "xg2". Use an empty string "-n ''" to remove the
        namespace altogether.

        If a namespace (default or otherwise) is used, it is associated with
        the URI "http://xmltwig.com/tools/xml_grep2/"

    -o, --original-encoding
        Output results in the original encoding of the file. Without this
        option, all output is in UTF-8.

        The exception to this is when the -v, --invert-match option is used,
        in which case the original encoding is used.

        If the result is an XML document, then the encoding will be the
        encoding of the first document with hits.

        Note that when grepping files in various encodings, the result may
        well not be well-formed XML.

    -q, --quiet, --silent
        Quiet; do not write anything to standard output. Exit immediately
        with zero status if any match is found, even if an error was
        detected. Also see the -s or --no-messages option.

    -R, -r, --recursive
        Read all files under each directory, recursively

    --include *PATTERN*
        Recurse into directories, searching only files matching *PATTERN*.

    --exclude *PATTERN*
        Recurse into directories, skipping files matching *PATTERN*.
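
        The effect of -r combined with --include and --exclude can be
        sketched as follows. This is an illustration under assumptions, not
        the tool's code: it assumes that *PATTERN* is a shell-style glob
        matched against the file's base name (as in grep), and the helper
        name "select_files" and the sample tree are hypothetical.

```python
import fnmatch
import os
import tempfile


def select_files(top, include=None, exclude=None):
    """Walk `top` recursively and keep the files that pass both filters.

    Assumption: patterns are globs tested against the base name only.
    """
    selected = []
    for dirpath, _dirnames, filenames in os.walk(top):
        for name in filenames:
            if include and not fnmatch.fnmatch(name, include):
                continue
            if exclude and fnmatch.fnmatch(name, exclude):
                continue
            selected.append(os.path.join(dirpath, name))
    return sorted(selected)


# Build a small throwaway tree to run the selection against.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "2023"))
    for rel in ("a.xml", "notes.txt", os.path.join("2023", "b.xml")):
        open(os.path.join(top, rel), "w").close()
    found = [os.path.relpath(p, top) for p in select_files(top, include="*.xml")]

print(found)
```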

    -s, --no-messages
        Suppress error messages about nonexistent or unreadable files.

    -t, --text-only
        Return the result as text (using the XPath *value* of nodes).
        Results are stripped of newlines and output 1 per line.

        Results are in the original encoding for the document.
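
        As a rough illustration of what text output means, the XPath string
        value of an element is the concatenation of all the text it
        contains. The sketch below assumes that "stripped of newlines" means
        each newline character is simply dropped; the tool may treat them
        differently. The helper name "text_lines" and the sample data are
        hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up document; the first value contains an embedded newline.
doc = ET.fromstring(
    "<rows><RowAmount>12\n.50</RowAmount>"
    "<RowAmount><b>7</b>.00</RowAmount></rows>"
)


def text_lines(elements):
    """Return one string per element: its string value, newlines dropped.

    Assumption: newlines are removed rather than replaced by spaces.
    """
    return ["".join(el.itertext()).replace("\n", "") for el in elements]


lines = text_lines(doc.findall(".//RowAmount"))
print("\n".join(lines))
```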

    -V, --version
        Print the version number of xml_grep2 to standard error. This
        version number should be included in all bug reports (see below).

    -v, --invert-match
        Return the original document without nodes matching the pattern
        argument. Note that in this mode documents are output in their
        original encoding.
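
        The idea of inverting a match, i.e. keeping the document and
        dropping the matching nodes, can be sketched with Python's standard
        library. This is not how the tool is implemented; the helper name
        "remove_matching" and the sample document are hypothetical, and the
        match is restricted to a tag name plus one attribute value for
        brevity.

```python
import xml.etree.ElementTree as ET


def remove_matching(root, tag, attr, value):
    """Remove every descendant <tag> whose `attr` equals `value`, in place."""
    # Snapshot the elements first so removals do not disturb the iteration.
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag and child.get(attr) == value:
                parent.remove(child)
    return root


# Made-up document with matching nodes at two different depths.
doc = ET.fromstring(
    '<doc><p class="classified">secret</p><p>public</p>'
    '<div><p class="classified">x</p></div></doc>'
)
remove_matching(doc, "p", "class", "classified")
print(ET.tostring(doc, encoding="unicode"))
```

        Note that in this sketch any text directly following a removed
        element (its tail, in ElementTree terms) is removed with it.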

    -x, --no-xml-wrap
        Suppress the output of the XML wrapper around the XML result.

        Useful for example when returning a collection of attribute nodes.

Differences with grep
    There are some differences in behaviour with grep that are worth
    mentioning:

    files are always parsed and loaded in memory
        This is inevitable due to the random-access nature of XPath

    the file list is built before the grepping starts
        This means that warnings about permission problems are reported all
        at once before the results are output.

BUGS, TODO
    namespace problems
        When a namespace mapping is defined using the -N, --define-ns
        option, if this prefix is found in a document, even bound to a
        different namespace, it will match.

        When a prefix is defined using the -N, --define-ns option, if the
        prefix is found in a file, then the one defined on the command line
        will not match for this file

    Encoding
        Avoid outputting characters outside of the basic ASCII range as
        numerical entities

        Allow encoding conversions

    XML parsing errors
        Deal better with malformed XML, probably through an option to skip
        malformed XML files without dying

    Be more compatible with "grep"
        Do not build the list of files up front. Report bad links.

    package properly, more tests, more docs...

XPath
    see http://www.w3.org/TR/xpath/ for the spec

    see http://zvon.org/xxl/XPathTutorial/General/examples.html for a
    tutorial

EXAMPLES
    xml_grep2 //h1 index.xhtml
        Extract "h1" elements from "index.xhtml". Do not forget the "//" or
        you will not get any result.

    xml_grep2 '//h1|//h2' index.xhtml
        Extract "h1" and "h2" elements from "index.xhtml". The expression
        needs to be quoted because the "|" is special for the shell.

    xml_grep2 -t -h -r --include '*.xml' '//RowAmount' /invoices/
        Get the content (-t) of all "RowAmount" elements in ".xml" files in
        the "invoices" directory (and sub-directories)

        The result will be a text stream with 1 result per line. The -h
        option suppresses the display of the file name at the beginning of
        each line.

    xml_grep2 -t -r -h --include '*.xml' '//@AmountCurrencyIdentifier'
    /invoices/
        Get the value of all "AmountCurrencyIdentifier" attributes in ".xml"
        files in the "invoices" directory (and sub-directories). Piping this
        to "sort -u" will give you all the currencies used in the invoices.

    xml_grep2 -v '//p[@class="classified"]' secret.xml > pr.xml
        Remove all "p" elements in the "classified" class from the file
        "secret.xml"

    xml_grep2 -t -N d='http://purl.org/rss/1.0/' '//d:title'
    use.perl.org.rss.xml
        Extract the text of the titles from the RSS feed for use.perl.org

        As the title elements are in the default namespace, the only way to
        get them is to define a mapping between a prefix and the namespace
        URI, then to use it.

    GET http://xmltwig.com/index.html | ./xml_grep2 --html -t '//@href' |
    sort -u
        Get the list of links in a web page

REQUIREMENTS
    Perl 5

    libxml2

    XML::LibXML

    XML::LibXML::XPathContext for -N, --define-ns option

    Pod::Usage

    Getopt::Long

SEE ALSO
    "xml_grep", distributed with the XML::Twig Perl module offers a less
    powerful but often more memory efficient implementation of an XML
    grepper.

    "xsh" (http://xsh.sourceforge.net) is an XML shell also based on
    "libxml2" and "XML::LibXML".

    "XMLStarlet" (http://xmlstar.sourceforge.net/) is a set of tools to
    process XML written in C and also based on "libxml2"

LICENSE
    This library is free software; you can redistribute it and/or modify it
    under the same terms as Perl itself.

AUTHOR
    Michel Rodriguez <mirod@xmltwig.com>