Am I the last person to know of XMLStarlet? I got introduced to it by astronaught in #lugradio, who was asking how to use it to copy an XML document but exclude certain elements.

Given the document below, I want to duplicate it but only include chapter elements with certain names, let’s say “intro”, “appendix1” and “appendix2”.

<doc>
  <header>…</header>
  <chapter name="intro">…</chapter>
  <chapter name="preface">…</chapter>
  <chapter name="one">…</chapter>
  <chapter name="two">…</chapter>
  <chapter name="three">…</chapter>
  <chapter name="four">…</chapter>
  <chapter name="appendix1">…</chapter>
  <chapter name="appendix2">…</chapter>
  <chapter name="notes">…</chapter>
  <footer>…</footer>
</doc>

I know enough of XSLT and XPath to get me by, and had an idea of how to do it going down that road. I didn’t actually produce a working XSLT‐based answer at the time, but here’s one:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Identity transform -->
  <xsl:template match="node()|@*">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
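  <!-- Drop every chapter whose name is not intro, appendix1 or appendix2.
       An empty template outputs nothing for the nodes it matches.
       Note: a chapter with no name attribute is not matched, so it is kept. -->
  <xsl:template match="chapter[@name != 'intro' and @name != 'appendix1' and @name != 'appendix2']"/>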
</xsl:stylesheet>

I spent a lot of time trying to work out what this reasonably simple transform would look like as an xmlstarlet sel command line. You can select the chapter nodes with:

xmlstarlet sel -t -m "/doc/chapter[@name='intro' or @name='appendix1' or @name='appendix2']" -c .

That prints only the three matching chapters, without the enclosing doc, header or footer, so it didn’t get me anywhere. I gave up and started playing with other XMLStarlet commands. ed is what I really wanted:

xmlstarlet ed -d '/doc/chapter[@name != "intro" and @name != "appendix1" and @name != "appendix2"]'

Simple ☺
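
If you want to sanity-check the result without XMLStarlet to hand, here is a minimal Python sketch of the same "copy everything, drop the unwanted chapters" idea. It is only an illustration using the standard library's xml.etree.ElementTree, not part of the original discussion; the names DOC, KEEP and filter_chapters are made up for the example.

```python
import xml.etree.ElementTree as ET

# The sample document from the top of the post.
DOC = """<doc>
  <header>…</header>
  <chapter name="intro">…</chapter>
  <chapter name="preface">…</chapter>
  <chapter name="one">…</chapter>
  <chapter name="two">…</chapter>
  <chapter name="three">…</chapter>
  <chapter name="four">…</chapter>
  <chapter name="appendix1">…</chapter>
  <chapter name="appendix2">…</chapter>
  <chapter name="notes">…</chapter>
  <footer>…</footer>
</doc>"""

# Hypothetical keep list: the chapter names we want to survive.
KEEP = {"intro", "appendix1", "appendix2"}


def filter_chapters(xml_text, keep=KEEP):
    """Parse xml_text and remove direct chapter children not named in keep."""
    root = ET.fromstring(xml_text)
    # findall() returns a list, so removing children while looping is safe.
    for chapter in root.findall("chapter"):
        if chapter.get("name") not in keep:
            root.remove(chapter)
    return root


if __name__ == "__main__":
    root = filter_chapters(DOC)
    # Show which children are left, in document order.
    print([(el.tag, el.get("name")) for el in root])
    # The filtered document itself:
    print(ET.tostring(root, encoding="unicode"))
```

One difference worth knowing: this sketch also removes a chapter that has no name attribute at all, whereas the `@name != "…"` predicates above leave such a chapter in place, because in XPath 1.0 a comparison against a missing attribute is false. For the sample document, where every chapter is named, the result is the same.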