Normalizing Excel’s SpreadsheetML using XSLT

Spreadsheet tables are full of holes and spreadsheet processors such as OpenOffice or Excel have implemented hacks to avoid having to store empty cells.

In the case of Excel, that’s done using ss:Index and ss:MergeAcross attributes.

While these attributes are easy enough to understand, they add a great deal of complexity to XSLT transformations that need to access to a specific cell since you can’t any longer index directly your target.

The traditional way to work around this kind of issue is to pre-process your spreadsheet document to get an intermediary result that lets you index your target cells.

Having already encountered this issue with OpenOffice, I needed something to do the same with Excel when Google led me to a blog entry proposing a transformation something similar.

The transformation needed some adaptation to be usable as I wanted to use it, ie as a transformation that does not modify your SpreadsheetML document except for inserting an ss:Index attribute to every cell.

Here is the result of this adaptation:

This version is buggy. An updated one is available here

<?xml version="1.0"?>
<!--

Adapted from http://ewbi.blogs.com/develops/2004/12/normalize_excel.html

This product may incorporate intellectual property owned by Microsoft Corporation. The terms
and conditions upon which Microsoft is licensing such intellectual property may be found at
http://msdn.microsoft.com/library/en-us/odcXMLRef/html/odcXMLRefLegalNotice.asp.
-->
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns="urn:schemas-microsoft-com:office:spreadsheet"
    xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet">
    <xsl:output method="xml" indent="no" encoding="UTF-8"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="ss:Cell/@ss:Index"/>
    <xsl:template match="ss:Cell">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:variable name="prevCells" select="preceding-sibling::ss:Cell"/>
            <xsl:attribute name="ss:Index">
                <xsl:choose>
                    <xsl:when test="@ss:Index">
                        <xsl:value-of select="@ss:Index"/>
                    </xsl:when>
                    <xsl:when test="count($prevCells) = 0">
                        <xsl:value-of select="1"/>
                    </xsl:when>
                    <xsl:when test="count($prevCells[@ss:Index]) > 0">
                        <xsl:value-of select="($prevCells[@ss:Index][1]/@ss:Index) +
                            ((count($prevCells) + 1) -
                            (count($prevCells[@ss:Index][1]/preceding-sibling::ss:Cell)
                            + 1)) + sum($prevCells/@ss:MergeAcross)"/>
                    </xsl:when>
                    <xsl:otherwise>
                        <xsl:value-of select="count($prevCells) + 1 +
                            sum($prevCells/@ss:MergeAcross)"/>
                    </xsl:otherwise>
                </xsl:choose>
            </xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>
            

This version is buggy. An updated one is available here

Share and Enjoy:
  • Identi.ca
  • StumbleUpon
  • del.icio.us
  • Facebook
  • Twitter
  • Add to favorites

3 thoughts on “Normalizing Excel’s SpreadsheetML using XSLT”

  1. this code has the same problem as the source it is derived from, it will fail for all cells lacking an ss:Index that are preceded by two or more missing cells

    for instance, the input cells (two missing cells, each followed by a cell with an ss:Index)…

    1,2,3,4,5,6,,8,,10,11,12

    will result in output cells with indices…

    1,2,3,4,5,6,8,10,10,11

    i haven’t tested fully, but in practice i think the output index count is one too low for each one or more contiguous cells with an ss:Index attribute

    i just wish my xsl skills were sufficient to fix it!

  2. I have rendered some XML data into spreadsheetML however I am llooking for a way to include some vba/vb script when i open this file in Excel, does anyone know how to do this?

Leave a Reply

Your email address will not be published. Required fields are marked *

Enter your OpenID as your website to log and skip name and email validation and moderation!