Am I the last person to know of XMLStarlet? I got introduced to it by astronaught in #lugradio, who was asking how to use it to copy an XML document but exclude certain elements.
Given the document below, I want to duplicate it but only include chapter
elements with certain names, let’s say “intro”, “appendix1” and “appendix2”.
<doc>
  <header>…</header>
  <chapter name="intro">…</chapter>
  <chapter name="preface">…</chapter>
  <chapter name="one">…</chapter>
  <chapter name="two">…</chapter>
  <chapter name="three">…</chapter>
  <chapter name="four">…</chapter>
  <chapter name="appendix1">…</chapter>
  <chapter name="appendix2">…</chapter>
  <chapter name="notes">…</chapter>
  <footer>…</footer>
</doc>
I know enough of XSLT and XPath to get me by, and had an idea of how to do it going down that road. I didn’t actually produce a working XSLT‐based answer at the time, but here’s one:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity transform -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <!-- Only include certain chapter elements. -->
  <xsl:template match="chapter[@name != 'intro' and @name != 'appendix1' and @name != 'appendix2']"/>
</xsl:stylesheet>
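For anyone who wants to try the stylesheet, here is a rough sketch of applying it from the shell with XMLStarlet's tr command. The file names (doc.xml, filter.xsl, out.xml) and the shortened sample document are assumptions for illustration, and the "…" contents are stood in for by empty elements.

```shell
#!/bin/sh
# Sketch only: file names and the shortened sample document are assumptions.
workdir=$(mktemp -d)

cat > "$workdir/doc.xml" <<'EOF'
<doc>
  <header/>
  <chapter name="intro"/>
  <chapter name="preface"/>
  <chapter name="one"/>
  <chapter name="appendix1"/>
  <chapter name="appendix2"/>
  <chapter name="notes"/>
  <footer/>
</doc>
EOF

cat > "$workdir/filter.xsl" <<'EOF'
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
  <xsl:template match="chapter[@name != 'intro' and @name != 'appendix1' and @name != 'appendix2']"/>
</xsl:stylesheet>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # tr applies the stylesheet to the document and writes the result to stdout.
  xmlstarlet tr "$workdir/filter.xsl" "$workdir/doc.xml" > "$workdir/out.xml"
  cat "$workdir/out.xml"
else
  echo "xmlstarlet not found on PATH" >&2
fi
```

The empty chapter template wins over the identity template because a match pattern with a predicate gets a higher default priority than node(), so the unwanted chapters are silently dropped while everything else is copied through.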
I spent a lot of time trying to work out what this reasonably simple transform
would look like as an xmlstarlet sel command line. You can select the
chapter nodes with:
xmlstarlet sel -t -m "/doc/chapter[@name='intro' or @name='appendix1' or @name='appendix2']" -c .
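A quick sketch of what that sel command prints may show why it is a dead end for this job: -c . copies each matched chapter but nothing around it, so the doc wrapper, header and footer never reach the output. The file name and the shortened sample document are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: the file name and the shortened sample document are assumptions.
workdir=$(mktemp -d)

cat > "$workdir/doc.xml" <<'EOF'
<doc>
  <header/>
  <chapter name="intro"/>
  <chapter name="one"/>
  <chapter name="appendix1"/>
  <footer/>
</doc>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # -m iterates over the matching chapters, -c . copies each one to the output.
  selected=$(xmlstarlet sel -t -m "/doc/chapter[@name='intro' or @name='appendix1' or @name='appendix2']" -c . "$workdir/doc.xml")
  # Only the matched chapter elements are printed, with no enclosing <doc>.
  echo "$selected"
else
  echo "xmlstarlet not found on PATH" >&2
fi
```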
That didn’t get me anywhere, so I gave up and started playing with other
XMLStarlet commands. ed is what I really wanted:
xmlstarlet ed -d '/doc/chapter[@name != "intro" and @name != "appendix1" and @name != "appendix2"]'
Simple ☺
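To try the ed command end to end, something like the following sketch should do. The file names and the shortened sample document are assumptions for illustration, with empty elements standing in for the "…" contents.

```shell
#!/bin/sh
# Sketch only: file names and the shortened sample document are assumptions.
workdir=$(mktemp -d)

cat > "$workdir/doc.xml" <<'EOF'
<doc>
  <header/>
  <chapter name="intro"/>
  <chapter name="preface"/>
  <chapter name="one"/>
  <chapter name="appendix1"/>
  <chapter name="appendix2"/>
  <chapter name="notes"/>
  <footer/>
</doc>
EOF

if command -v xmlstarlet >/dev/null 2>&1; then
  # ed -d deletes every node matching the XPath and writes the edited
  # document to stdout; the input file itself is left untouched.
  xmlstarlet ed -d '/doc/chapter[@name != "intro" and @name != "appendix1" and @name != "appendix2"]' \
    "$workdir/doc.xml" > "$workdir/filtered.xml"

  # Count the chapters that survived the edit.
  kept=$(xmlstarlet sel -t -v 'count(/doc/chapter)' "$workdir/filtered.xml")
  echo "chapters kept: $kept"
else
  echo "xmlstarlet not found on PATH" >&2
fi
```

Because ed writes to stdout by default, redirecting to a new file gives the filtered copy without touching the original; the -L (--inplace) option is there if overwriting the input is what you want.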