
Counting without counting duplicates


Maunder


The desired output is the count of unique image file names in an XML document that must be transformed with an XSL stylesheet. The file names to be counted are found along two different paths: List A of "ImageName" is gleaned from "//OrdersByImages/ImageOrders", and List B of "ImageName" is gleaned from "//OrdersByImages/ImageOrders/ImageNodes/ImageNode".

The problem has at least two aspects. First, some file names in List B may duplicate file names in List A, inflating the count, e.g. img01.jpg is in both List A and List B. Second, some file names in List B may duplicate other file names in List B, also inflating the count, e.g. img01.jpg appears twice in List B. However many times a given name such as img01.jpg appears in List A and/or List B, the solution must count it only once.

Any relevant advice or direction will be greatly appreciated. Thanks in advance for your response.

Maunder mcross@mckennapro.com


  • 1 month later...

Using this XML (I guessed at your structure), I first created a node-set of all ImageName nodes, then simply used the preceding-sibling axis to check for duplicates. I'm sure this could have been much more elegant using the Muenchian method, but I will leave that to you.

XML I used:

<OrdersByImages>
  <ImageOrders>
    <ImageName>img01.jpg</ImageName>
    <ImageName>img02.jpg</ImageName>
    <ImageName>img03.jpg</ImageName>
    <ImageNodes><ImageName>img01.jpg</ImageName></ImageNodes>
    <ImageNodes><ImageName>img04.jpg</ImageName></ImageNodes>
    <ImageNodes><ImageName>img04.jpg</ImageName></ImageNodes>
    <ImageNodes><ImageName>img05.jpg</ImageName></ImageNodes>
  </ImageOrders>
</OrdersByImages>

Stylesheet

<?xml version="1.0"?>
<xsl:stylesheet
    version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:msxsl="urn:schemas-microsoft-com:xslt">

  <xsl:template match="/">

    <!-- *************************************** -->
    <!-- Put ALL ImageName nodes into a node-set -->
    <!-- *************************************** -->
    <xsl:variable name="ImageNames">
      <xsl:for-each select="//ImageName">
        <ImgName>
          <xsl:value-of select="." />
        </ImgName>
      </xsl:for-each>
    </xsl:variable>

    <!-- ******************************************************** -->
    <!-- Return only the unique names from that node-set (sorted) -->
    <!-- ******************************************************** -->
    <xsl:for-each select="msxsl:node-set($ImageNames)//ImgName[not(. = preceding-sibling::ImgName)]">
      <xsl:sort select="."/>
      <xsl:value-of select="."/><br/>
    </xsl:for-each>
    <hr/>
    The count is:
    <xsl:value-of select="count(msxsl:node-set($ImageNames)//ImgName[not(. = preceding-sibling::ImgName)])" />

  </xsl:template>
</xsl:stylesheet>

which produces this output

img01.jpg
img02.jpg
img03.jpg
img04.jpg
img05.jpg
--------------------------------------------------------------------------------
The count is: 5

You will note that I used the MSXML parser: msxsl:node-set() is a Microsoft extension function. You will need to change those references to the equivalent for whatever processor you are using.
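For reference, the Muenchian method mentioned above can be sketched roughly as follows. This is only an illustration against the same guessed XML structure, written in plain XSLT 1.0; the key name "img-by-name" and the variable name "unique" are made up for the example. It indexes every ImageName by its value and keeps only the first node for each value, so it needs no node-set extension.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Index every ImageName element by its string value.
       The key name "img-by-name" is an illustrative choice. -->
  <xsl:key name="img-by-name" match="ImageName" use="."/>

  <xsl:template match="/">
    <!-- Keep an ImageName only if it is the first node (in document
         order) that the key returns for its own value. -->
    <xsl:variable name="unique"
        select="//ImageName[generate-id() = generate-id(key('img-by-name', .)[1])]"/>

    <!-- List the distinct names, sorted -->
    <xsl:for-each select="$unique">
      <xsl:sort select="."/>
      <xsl:value-of select="."/><br/>
    </xsl:for-each>
    <hr/>
    The count is: <xsl:value-of select="count($unique)"/>
  </xsl:template>
</xsl:stylesheet>
```

Run against the sample XML in this post, it should list the same five names and report a count of 5. Because xsl:key, key() and generate-id() are all part of XSLT 1.0, this version should not be tied to any one processor.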


Thank you for responding to my quandary, aalbetski. Since posting it, I have devised and implemented the solution (Muenchianesque?) indicated below. Since this solution is not broken, I do not intend to fix it, but I do find your approach quite instructive and will keep it for future reference.

<!-- In my solution the following appears in the style sheet prior to the template declaration -->
<xsl:key name="ImgFiles"
    match="//ImageOrders/ImageName[substring-after(text(),'.')!='comp'] | //ImageOrders/ImageNodes/ImageNode/ImageName"
    use="text()"/>

<!-- Then the following is inserted in the style sheet where the count needs to appear -->
<xsl:number value="count(//ImageOrders/ImageName[substring-after(text(),'.')!='comp'])
    + count(//ImageOrders/ImageNodes/ImageNode/ImageName[text() != following::ImageOrders/ImageName])
    - count(//ImageOrders/ImageName[substring-after(text(),'.')!='comp'][text() = following::ImageOrders/ImageNodes/ImageNode/ImageName])"/>


I must say I prefer the second solution. After all, it's portable, unlike the first example.

But I've gotta ask... what is !='comp' used for? What does your XML look like? Maybe it will show the bigger picture. I would like to understand how you did that. It's a whole workaround for the distinct-values() function, which is not yet widely supported.
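For comparison, here is roughly what the distinct-values() route might look like where an XSLT 2.0 processor is available. This is a sketch only: it assumes composite names end in ".comp" in lower case, and it will not run on an XSLT 1.0 processor, since distinct-values() and ends-with() are XPath 2.0 functions.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- Count each distinct ImageName value once, skipping names
         that end in ".comp" (assumed to mark composites). -->
    <xsl:value-of
        select="count(distinct-values(//ImageName[not(ends-with(., '.comp'))]))"/>
  </xsl:template>
</xsl:stylesheet>
```

The comparison is case-sensitive under the default collation, so img01.jpg and img01.JPG would be counted as two different names.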


Dear boen_robot,

Yes, I did come up with this as a substitute for the distinct-values() function, which is not supported in my environment. Following is a 'moot court' example of the xml structure and xsl script for your consideration.

Here is the xml:

<?xml version="1.0" encoding="ISO-8859-1"?>
<!--The following line is not part of the original xml but facilitates previewing the solution during development. I use NoteTab Pro 4.95 for a text editor.-->
<?xml-stylesheet type="text/xsl" href="XSLtemp02.xsl"?>
<!--Following is a simplification of the original xml structure focused on the 'counting without duplicates' problem. Note that, in this case, the image nodes sections only appear where the image name has a comp extension, i.e. img20.comp. The comp extension is generated by the third-party software that produces the xml and refers to composite. Also, note that the images may be either jpg or tif and that such extensions may be either upper or lower case. Thus, I have used !='comp' to avoid counting composites.-->
<OrdersByImages>
  <ImageOrders><ImageName>img01.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img02.JPG</ImageName></ImageOrders>
  <ImageOrders><ImageName>img03.tif</ImageName></ImageOrders>
  <ImageOrders><ImageName>img04.JPG</ImageName></ImageOrders>
  <ImageOrders><ImageName>img05.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img06.TIF</ImageName></ImageOrders>
  <ImageOrders><ImageName>img07.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img08.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img09.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img10.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img11.jpg</ImageName></ImageOrders>
  <ImageOrders>
    <ImageName>img12.comp</ImageName>
    <ImageNodes>
      <ImageNode><ImageName>img12.jpg</ImageName></ImageNode>
    </ImageNodes>
  </ImageOrders>
  <ImageOrders><ImageName>img13.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img14.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img15.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img16.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img17.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img18.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img19.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img20.jpg</ImageName></ImageOrders>
  <ImageOrders>
    <ImageName>img20.comp</ImageName>
    <ImageNodes>
      <ImageNode><ImageName>img20.jpg</ImageName></ImageNode>
    </ImageNodes>
  </ImageOrders>
  <ImageOrders>
    <ImageName>img22.comp</ImageName>
    <ImageNodes>
      <ImageNode><ImageName>img21.jpg</ImageName></ImageNode>
    </ImageNodes>
  </ImageOrders>
  <ImageOrders>
    <ImageName>img21.comp</ImageName>
    <ImageNodes>
      <ImageNode><ImageName>img20.jpg</ImageName></ImageNode>
    </ImageNodes>
  </ImageOrders>
  <ImageOrders><ImageName>img23.jpg</ImageName></ImageOrders>
  <ImageOrders>
    <ImageName>img23.comp</ImageName>
    <ImageNodes>
      <ImageNode><ImageName>img23.tif</ImageName></ImageNode>
      <ImageNode><ImageName>img23.jpg</ImageName></ImageNode>
    </ImageNodes>
  </ImageOrders>
  <ImageOrders>
    <ImageName>img26.comp</ImageName>
    <ImageNodes>
      <ImageNode><ImageName>img26.jpg</ImageName></ImageNode>
    </ImageNodes>
  </ImageOrders>
  <ImageOrders><ImageName>img25.jpg</ImageName></ImageOrders>
  <ImageOrders><ImageName>img24.jpg</ImageName></ImageOrders>
  <ImageOrders>
    <ImageName>img27.comp</ImageName>
    <ImageNodes>
      <ImageNode><ImageName>img27.jpg</ImageName></ImageNode>
    </ImageNodes>
  </ImageOrders>
</OrdersByImages>

Here is the xsl:

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:key name="ImgFiles"
      match="//ImageOrders/ImageName[substring-after(text(),'.')!='comp'] | //ImageOrders/ImageNodes/ImageNode/ImageName"
      use="text()"/>

  <xsl:template match="/">
    <html>
      <body>
        <xsl:number value="count(//ImageOrders/ImageName[substring-after(text(),'.')!='comp'])
            + count(//ImageOrders/ImageNodes/ImageNode/ImageName[text() != following::ImageOrders/ImageName])
            - count(//ImageOrders/ImageName[substring-after(text(),'.')!='comp'][text() = following::ImageOrders/ImageNodes/ImageNode/ImageName])"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>

I hope this helps you and others.

Maunder
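As a point of comparison, the ImgFiles key declared in the stylesheet above is never actually looked up with key(). A fully Muenchian variant might use it along these lines. This is a sketch under the same assumptions as the post (XSLT 1.0, composites marked by a lower-case "comp" extension, case-sensitive names), not a replacement for the working solution; the variable name "candidates" is made up for the example.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Same idea as the ImgFiles key in the post: index the non-composite
       ImageName elements from both paths by their text. -->
  <xsl:key name="ImgFiles"
      match="ImageOrders/ImageName[substring-after(text(),'.')!='comp']
           | ImageNode/ImageName"
      use="text()"/>

  <xsl:template match="/">
    <!-- Every candidate name from List A (minus composites) and List B -->
    <xsl:variable name="candidates"
        select="//ImageOrders/ImageName[substring-after(text(),'.')!='comp']
              | //ImageOrders/ImageNodes/ImageNode/ImageName"/>
    <html>
      <body>
        <!-- Count a node only if it is the first one the key returns
             for its own text, so each distinct name counts once. -->
        <xsl:value-of
            select="count($candidates[generate-id() = generate-id(key('ImgFiles', text())[1])])"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

By a hand count, the sample xml above holds 27 distinct non-composite names (img23.jpg and img23.tif count separately, and img20.jpg counts once despite appearing three times), so that is the figure this sketch should produce.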