> Any fast way to do this?, I need to convert XHTML to XML, or "XMLization"
FirefoxRocks
post Sep 1 2007, 09:54 PM
Post #1


Invested Member
***

Group: Members
Posts: 806
Joined: 2-April 06
From: Ontario, Canada
Member No.: 3,811
Languages: XHTML, CSS, SVG, XML, XSLT, PHP, JavaScript



I had a collection of recipes that I stored in the Microsoft Word format for some time and then thought that storing them in XHTML format would save loads of space (at that time, I had floppies).

So I tried converting some of them and it was only 2-3kb per recipe instead of 20-30kb. What a huge amount of space saved!

Now that I read about XML, I think that it is appropriate for me to store my recipes in XML format instead of XHTML. The problem is I have a whole lot of XHTML files and I want to convert them to XML fast.

I still have a few Microsoft Word documents that I plan to convert to XML manually, but is there a way to XMLize the XHTML documents quickly? I'm guessing not, because XHTML elements are predefined, whereas XML elements are made up and then defined in a DTD, Schema, or whatever.

I'm guessing if it is possible, I will use XML DOM.
boen_robot
post Sep 1 2007, 10:54 PM
Post #2


XSLT senior
******

Group: Moderator
Posts: 5,186
Joined: 2-October 05
From: europe://Bulgaria/Plovdiv
Member No.: 70
Languages: (X)HTML, CSS, XML, XSLT, Schema, PHP, JavaScript (a little), other XML based...



XHTML is XML. You can use the same tools you use for XML on XHTML.

I mean, just imagine that you define your own XML markup language that looks like XHTML, but you specify your own DTD rather than the W3C one. If you've covered everything, you'll notice that your language is XHTML.

If you mean HTML, then you can convert it to XHTML by using tools like Tidy. Batch files or PHP bindings may help you do that on many files at once.
FirefoxRocks
post Sep 1 2007, 11:04 PM
Post #3


Ok, I thought that XML was used for storing, organizing and generally containing information while XHTML is used to present the information to people?
Synook
post Sep 2 2007, 06:28 AM
Post #4


53 79 6E 6F 6F 6B 0D 0A
******

Group: Moderator
Posts: 5,196
Joined: 14-July 07
From: Australia
Member No.: 15,617
Languages: (X)(HT)ML, CSS, PHP, SQL, JavaScript, Java (a bit)



XHTML is a form of XML (XHTML lol). XML is a generalized class of languages that use markup (with <tags>) to structure a document. There are many other markup languages besides XHTML that are used to present information, for example SVG and SMIL. So, there is no need to convert your XHTML files to XML because they already are XML :)

You could, however, as boen_robot hinted at, create a new markup language with a Document Type Definition (DTD) that is more suited to the type of data you are storing (which is why XML is extensible). Then, I suppose, you could write a script in any programming language with an XML parser (such as Java or PHP) that converts your XHTML documents to your new custom markup language. What sort of information do you have stored currently?
FirefoxRocks
post Sep 2 2007, 04:34 PM
Post #5


That is exactly what I need. "create a new markup language with a Document Type Definition (DTD) that is more suited to the type of data you are storing"

I will give an example of what I want to do:

Here are my current documents:
HTML
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
<head>
<title>Tea Biscuits</title>
<link rel="stylesheet" type="text/css" href="css.css" />
</head>
<body>
<h1>Tea Biscuits</h1>
<p class="intro">I have this on some of my XHTML documents, but not all of them.</p>
<h3>Ingredients</h3>
<ul>
<li>2 eggs</li>
<li>250 mL (1 cup) sweet cream</li>
<li>2.5 mL (&frac12; tsp) salt</li>
<li>10 mL (2 tsps) baking powder</li>
<li>330 mL (1 &frac34; cups) all-purpose flour</li>
</ul>
<h3>Procedure</h3>
<ol>
<li>Put unbeaten eggs in mixing bowl.</li>
<li>Add cream, salt, then flour which has been sifted with baking powder.</li>
<li>Stir well and drop on greased cookie sheet or bake in square 8x8&rdquo; pan.</li>
<li>For shortcake: bake in pie plate.</li>
</ol>
<p>Bake Time: 20 minutes at 375<sup>o</sup>F</p>
</body>
</html>


What I sort of want is this:

HTML
<?xml version="1.0" encoding="iso-8859-1"?>
<!DOCTYPE recipe SYSTEM "recipe.dtd">
<recipe>
<name>Tea Biscuits</name>
<type>Biscuits</type>
<category>-</category>
<intro>I have this on some of my XML documents, but not all of them.</intro>
<ingredient>
<amount>2</amount> eggs
</ingredient>
<ingredient>
<amount>250 mL (1 cup)</amount> sweet cream
</ingredient>
<ingredient>
<amount>2.5 mL (&frac12; tsp)</amount> salt
</ingredient>
<ingredient>
<amount>10 mL (2 tsps)</amount> baking powder
</ingredient>
<ingredient>
<amount>330 mL (1 &frac34; cups)</amount> all-purpose flour
</ingredient>
<procedure>
<step>Put unbeaten eggs in mixing bowl.</step>
<step>Add cream, salt, then flour which has been sifted with baking powder.</step>
<step>Stir well and drop on greased cookie sheet or bake in square 8x8&quot; pan.</step>
<step>For shortcake: bake in pie plate.</step>
</procedure>
<baketime>Bake Time: 20 minutes at 375<sup>o</sup>F</baketime>
</recipe>


Then I will use XSLT to transform the document to make it look nicer than the tree view in browsers ;)

So is there a fast way to do this, or no?
boen_robot
post Sep 2 2007, 05:30 PM
Post #6


There is - more XSLT or DOM - the same tools you'll use for the XML to XHTML transformation. However, if the final goal is to display it in the browser, the point is kind of lost, unless you want the data to be formatted differently (i.e. to have different markup).

Also, the XHTML generated by Word is not semantically rich enough for you to get your output (XML) that easily. Not with XSLT 1.0 anyway. The other way around would have been a breeze, but this is going to be tricky and quirky, especially if your recipes are not formatted consistently (e.g. if you have "2 loaves of bread" in one spot and "loaves of bread - 2" in another).

BTW, a DTD is not needed unless you want to use entities or are going to let others input recipes (for the second case, Schema is probably a better solution).
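
For hand-written recipes that all follow one layout, the DOM route can be sketched in a few lines. What follows is only a rough sketch in Python (picked because it runs from a command line without a web server), not a finished converter. It assumes the layout of the sample recipe posted earlier in the thread: an h1 for the name, an optional p with class "intro", a ul of ingredients that each start with the amount, and an ol of steps. The names xhtml_to_recipe, ENTITIES and AMOUNT are made up for illustration, and the type, category and baketime elements are left out because they cannot be read reliably from the XHTML.

```python
import re
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Named entities seen in the sample recipe, swapped for literal characters
# before parsing (assumption: extend this table to cover your own files).
ENTITIES = {"&frac12;": "\u00bd", "&frac34;": "\u00be", "&rdquo;": "\u201d"}

# Assumed ingredient shape: a leading amount such as "2", "250 mL" or
# "250 mL (1 cup)", followed by the ingredient name.
AMOUNT = re.compile(r"^(\d[\d.]*(?:\s*mL)?(?:\s*\([^)]*\))?)\s+(.+)$")


def xhtml_to_recipe(xhtml_text):
    """Build a <recipe> element from one XHTML recipe given as a string."""
    for entity, char in ENTITIES.items():
        xhtml_text = xhtml_text.replace(entity, char)
    body = ET.fromstring(xhtml_text).find(XHTML + "body")

    recipe = ET.Element("recipe")
    ET.SubElement(recipe, "name").text = body.findtext(XHTML + "h1")

    intro = body.find(XHTML + "p[@class='intro']")
    if intro is not None:
        ET.SubElement(recipe, "intro").text = intro.text

    for item in body.findall(XHTML + "ul/" + XHTML + "li"):
        text = (item.text or "").strip()
        ingredient = ET.SubElement(recipe, "ingredient")
        match = AMOUNT.match(text)
        if match:
            amount = ET.SubElement(ingredient, "amount")
            amount.text = match.group(1)
            amount.tail = " " + match.group(2)  # text after </amount>
        else:
            ingredient.text = text  # no recognizable amount: keep as-is

    procedure = ET.SubElement(recipe, "procedure")
    for item in body.findall(XHTML + "ol/" + XHTML + "li"):
        ET.SubElement(procedure, "step").text = (item.text or "").strip()
    return recipe
```

Calling ET.tostring(xhtml_to_recipe(text), encoding="unicode") on the contents of one file gives the new markup as a string. Any ingredient that does not start with an amount is copied unchanged, so inconsistent entries stay visible instead of being mangled.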
FirefoxRocks
post Sep 2 2007, 05:56 PM
Post #7


QUOTE (boen_robot @ Sep 2 2007, 12:30 PM) *
There is - more XSLT or DOM - the same tools you'll use for the XML to XHTML transformation. However, if the final goal is to display it in the browser, the point is kind of lost, unless you want the data to be formatted differently (i.e. to have different markup).


The final result was originally to print more than 1 recipe on a sheet of paper, but I don't know if that is possible from using an XSLT transformation.
And it was also to organize information so I can find stuff easier.

QUOTE (boen_robot @ Sep 2 2007, 12:30 PM) *
Also, the XHTML generated by Word is not semantically rich enough for you to get your output (XML) that easily. Not with XSLT 1.0 anyway. The other way around would have been a breeze, but this is going to be tricky and quirky, especially if your recipes are not formatted consistently (e.g. if you have "2 loaves of bread" in one spot and "loaves of bread - 2" in another).


I didn't use Microsoft Word to generate my XHTML, I did it manually. I think I have my recipes formatted consistently, I try to anyways.

QUOTE (boen_robot @ Sep 2 2007, 12:30 PM) *
BTW, a DTD is not needed unless you want to use entities or are going to let others input recipes (for the second case, Schema is probably a better solution).


I need to use entities for fractions and a few other things. I already finished the DTD portion of it.
boen_robot
post Sep 2 2007, 07:26 PM
Post #8


QUOTE (FirefoxRocks @ Sep 2 2007, 08:56 PM) *
The final result was originally to print more than 1 recipe on a sheet of paper, but I don't know if that is possible from using an XSLT transformation.
And it was also to organize information so I can find stuff easier.

OK. If that's the case, you can easily do that by manually creating, or better yet generating, a list of your recipes and their respective files (a sort of sitemap) as an XML file and passing that as input to XSLT. From then on, using XSLT's document() function, you can easily get only the contents of each <body/> into a new tree.

QUOTE (FirefoxRocks @ Sep 2 2007, 08:56 PM) *
I didn't use Microsoft Word to generate my XHTML, I did it manually. I think I have my recipes formatted consistently, I try to anyways.

Great. All the better.

QUOTE (FirefoxRocks @ Sep 2 2007, 08:56 PM) *
I need to use entities for fractions and a few other things. I already finished the DTD portion of it.

Again good. Just be sure to have the DTD locally (instead of using W3C's entities list directly), as you'll otherwise slow the process down a lot.
FirefoxRocks
post Sep 2 2007, 08:39 PM
Post #9


QUOTE (boen_robot @ Sep 2 2007, 02:26 PM) *
OK. If that's the case, you can easily do that by manually creating, or better yet generating, a list of your recipes and their respective files (a sort of sitemap) as an XML file and passing that as input to XSLT. From then on, using XSLT's document() function, you can easily get only the contents of each <body/> into a new tree.


Sorry, I don't understand. What do you mean by getting the contents of each <body/> into a new tree?

QUOTE (boen_robot @ Sep 2 2007, 02:26 PM) *
Great. All the better.

So does this mean I have to manually convert everything into XML?

QUOTE (boen_robot @ Sep 2 2007, 02:26 PM) *
Again good. Just be sure to have the DTD locally (instead of using W3C's entities list directly), as you'll otherwise slow the process down a lot.


My DTD looks like this, is this ok?
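CODE
<!ELEMENT recipe (name,type,category?,intro?,ingredient+,procedure,baketime?,servings?)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT type (#PCDATA)>
<!ELEMENT category (#PCDATA)>
<!ELEMENT intro (#PCDATA)>
<!ELEMENT ingredient (#PCDATA|amount)*>
<!ELEMENT amount (#PCDATA)>
<!ELEMENT procedure (step+)>
<!ELEMENT step (#PCDATA)>
<!-- baketime and servings appear in the recipe content model, so they need declarations before a document using them can validate -->
<!ELEMENT baketime (#PCDATA|sup)*>
<!-- sup is declared because the sample baketime contains a sup element -->
<!ELEMENT sup (#PCDATA)>
<!ELEMENT servings (#PCDATA)>
<!-- Character references keep the entities independent of the DTD file's encoding -->
<!ENTITY frac12 "&#189;">
<!ENTITY frac34 "&#190;">
<!ENTITY deg "&#176;">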
boen_robot
post Sep 2 2007, 09:27 PM
Post #10


QUOTE
My DTD looks like this, is this ok?

I have no idea. I don't know much DTD.

QUOTE
So does this mean I have to manually convert everything into XML?

No, as it is already XML. That's what I was trying to say with the first post.

QUOTE
Sorry, I don't understand. What do you mean by getting the contents of each <body/> into a new tree?

OK. Too much theory, not much code. Sorry about that.

If you have a single XML file of your own that represents a list of your recipes, say like this:
CODE
<recipies>
<recipe>recipe1.xhtml</recipe>
<recipe>recipe2.xhtml</recipe>
<!--All other recipies -->
</recipies>

You could then create an XSLT that would fetch the contents of the <body/> of each of those items in the list and assemble all of them into a new XHTML (tree) that will be presented to the user. A simple template for that would be:
CODE
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml" xmlns:h="http://www.w3.org/1999/xhtml" exclude-result-prefixes="h">
    <xsl:template match="/recipies">
        <html>
            <head>
                <title>Recipies</title>
            </head>
            <body>
                <h1>My recipies</h1>
                <xsl:for-each select="recipe">
                    <div>
                        <h2>
                            <xsl:value-of select="document(.)//h:title"/>
                        </h2>
                        <div>
                            <xsl:copy-of select="document(.)//h:body/*|document(.)//h:body/text()"/>
                        </div>
                    </div>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

You could easily generate a list of your files with PHP 5 and DOM. Here's a simple wrap-up for the above sample that assumes all of your recipes are in a single folder called "recipies" at your document root:
CODE
<?php
// Assumes the recipes live in a "recipies" folder under the document root.
$folder = $_SERVER['DOCUMENT_ROOT'] . '/recipies';

$dom = new DOMDocument('1.0', 'utf-8');
$recipies = $dom->appendChild($dom->createElement('recipies'));

foreach (scandir($folder) as $recipe) {
    // Skip ".", ".." and anything that is not an XHTML file.
    if (pathinfo($recipe, PATHINFO_EXTENSION) !== 'xhtml') {
        continue;
    }
    // createTextNode() escapes "&" and "<" in file names.
    $recipies->appendChild($dom->createElement('recipe'))
             ->appendChild($dom->createTextNode($recipe));
}

/* This is only if you want to see the result as generated. If you want, you could also do the XSLT transformation in PHP, as long as you have the XSL extension */
header('Content-Type: application/xml');
echo $dom->saveXML(); // saveXML() only returns the string, so it has to be echoed
?>
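
For anyone without an XSLT processor at hand, roughly the same assembly can be sketched with a DOM-style script. This is an illustrative Python sketch, not a replacement for the stylesheet above: combine_recipes and q are made-up names, it takes the file paths directly rather than reading the list file, and it assumes the recipe files parse as plain XML (ElementTree does not read the DTD, so named entities such as &frac12; would have to be replaced beforehand).

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/1999/xhtml"
ET.register_namespace("", NS)  # write XHTML elements without a prefix


def q(tag):
    """Return the namespace-qualified name ElementTree expects."""
    return "{%s}%s" % (NS, tag)


def combine_recipes(paths):
    """Copy the <body> children of every XHTML file into one new document."""
    html = ET.Element(q("html"))
    head = ET.SubElement(html, q("head"))
    ET.SubElement(head, q("title")).text = "Recipies"
    body = ET.SubElement(html, q("body"))
    ET.SubElement(body, q("h1")).text = "My recipies"
    for path in paths:
        source = ET.parse(path).getroot()
        section = ET.SubElement(body, q("div"))
        title = source.findtext(q("head") + "/" + q("title"))
        ET.SubElement(section, q("h2")).text = title
        # Child elements only; text placed directly inside <body> is dropped.
        section.extend(list(source.find(q("body"))))
    return html
```

The result can be written out with ET.ElementTree(combine_recipes(paths)).write("all.xhtml", encoding="utf-8", xml_declaration=True), where all.xhtml is just an example file name.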
FirefoxRocks
post Sep 2 2007, 09:33 PM
Post #11


QUOTE (boen_robot @ Sep 2 2007, 04:27 PM) *
If you have a single XML file of your own that represents a list of your recipes, say like this:
CODE
<recipies>
<recipe file="recipe1.xhtml"/>
<recipe file="recipe2.xhtml"/>
<!--All other recipies -->
</recipies>


Do you seriously mean I have to create that?!?

QUOTE (boen_robot @ Sep 2 2007, 04:27 PM) *
You could then create an XSLT that would fetch the contents of the <body/> of each of those items in the list and assemble all of them into a new XHTML (tree) that will be presented to the user. A simple template for that would be:
CODE
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.w3.org/1999/xhtml" xmlns:h="http://www.w3.org/1999/xhtml" exclude-result-prefixes="h">
    <xsl:template match="/recipies">
        <html>
            <head>
                <title>Recipies</title>
            </head>
            <body>
                <h1>My recipies</h1>
                <xsl:for-each select="recipe">
                    <div>
                        <h2>
                            <xsl:value-of select="document(@file)//h:title"/>
                        </h2>
                        <xsl:copy-of select="document(@file)//h:body/*|document(@file)//h:body/text()"/>
                    </div>
                </xsl:for-each>
            </body>
        </html>
    </xsl:template>
</xsl:stylesheet>

You could easily generate a list of your files with PHP and DOM.


Ok it looks simple enough but what is the h: namespace for? I don't understand too much about that part. Oh and by the way, you do realize that these files are on a USB drive/computer, not on a server with PHP.
boen_robot
post Sep 2 2007, 09:48 PM
Post #12


It's a namespace declaration. The "h" prefix is bound to the XHTML namespace URI, which is "http://www.w3.org/1999/xhtml". The prefix is needed because XPath 1.0 never applies the default namespace to names in an expression, so h:title and h:body are the only way to match the XHTML elements in your files. You can see the actual binding in the first line of the XSLT:
CODE
xmlns:h="http://www.w3.org/1999/xhtml"

The other part
CODE
xmlns="http://www.w3.org/1999/xhtml"

adds a default namespace for the output tree, which is again XHTML's namespace URI.
The last part
CODE
exclude-result-prefixes="h"

removes the declared prefix from the output XML file. Otherwise, the output XHTML file would needlessly (at least in this case) keep the "h" namespace binding.
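
The effect of the namespace can also be checked outside XSLT. A small Python sketch (the sample string and variable names are made up for illustration): a lookup for a plain title finds nothing in an XHTML document, while a lookup with a prefix bound to the XHTML namespace URI does.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Tea Biscuits</title></head><body/></html>"
)

# The prefix itself is arbitrary; only the URI it is bound to matters.
ns = {"h": "http://www.w3.org/1999/xhtml"}

unqualified = doc.find(".//title")      # no namespace: no match
qualified = doc.find(".//h:title", ns)  # prefix bound to the XHTML URI

print(unqualified)     # None
print(qualified.text)  # Tea Biscuits
```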

QUOTE
Oh and by the way, you do realize that these files are on a USB drive/computer, not on a server with PHP.

Crap. Well, I'm afraid you'll have to put them on a server with PHP then. Either that, or compile yourself a program in Java or C++. The choice is yours, and I think I know what you'll choose.
FirefoxRocks
post Sep 2 2007, 10:38 PM
Post #13


QUOTE (boen_robot @ Sep 2 2007, 04:48 PM) *
Crap. Well, I'm afraid you'll have to put them on a server with PHP then. Either that, or compile yourself a program in Java or C++. The choice is yours, and I think I know what you'll choose.


1. I can install PHP on a Windows 98 computer and do it from there, but do I need apache then?
2. Is there a very fast tutorial on compiling this sort of program using Visual Basic 2005 Express?
3. In MS-DOS, I can do a "dir *.html /p" command and it will list all of the html files on the specified directory. Can I use this to generate that XML file that I need?
boen_robot
post Sep 3 2007, 01:25 PM
Post #14


QUOTE (FirefoxRocks @ Sep 3 2007, 01:38 AM) *
1. I can install PHP on a Windows 98 computer and do it from there, but do I need apache then?
2. Is there a very fast tutorial on compiling this sort of program using Visual Basic 2005 Express?
3. In MS-DOS, I can do a "dir *.html /p" command and it will list all of the html files on the specified directory. Can I use this to generate that XML file that I need?

1. No. You could also use PHP from the command line. Compatibility with Windows 98 however is questionable.
2. Nope. At least I don't know of such.
3. The lists it generates are too verbose. You need only the names. From then on, it would have been easy to place tags around them.
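
Placing tags around the names can be scripted without a web server. A minimal sketch in Python, assuming the recipes sit in one folder and end in .xhtml; the function name build_recipe_list is made up here, and the element names follow the list format suggested earlier in the thread.

```python
import os
import xml.etree.ElementTree as ET


def build_recipe_list(folder, extension=".xhtml"):
    """Return the <recipies> list as a string, one <recipe> per matching file."""
    root = ET.Element("recipies")
    for name in sorted(os.listdir(folder)):
        # Keep only the recipe files; other files in the folder are ignored.
        if name.lower().endswith(extension):
            ET.SubElement(root, "recipe").text = name
    return ET.tostring(root, encoding="unicode")
```

Passing extension=".html" would match files named the way the dir command above lists them. Because the names go in through ElementTree, characters such as & in a file name are escaped automatically.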
kgun
post Sep 3 2007, 02:11 PM
Post #15


Newbie
*

Group: Members
Posts: 68
Joined: 2-August 07
From: Norway
Member No.: 16,044
Languages: English, Norwegian and some German



XHTML is not part of the XML family. XHTML stands for Extensible Hypertext Markup Language. You can think of it as a standard for HTML markup tags that follows all the well-formedness rules of XML.

Note that if you have access to Office 2007, its format is OpenXML. I have not tried it myself, but maybe you can open the files in Word 2007 and save them in Word 2007 (XML) format.

Otherwise, you should use XSLT / XSL-FO as explained above to make the transformation.

Generally I would also recommend using XML Schema instead of DTD or Relax NG since it is richer, though perhaps more difficult to learn.

In addition to the W3Schools tutorials on the XML family of technologies, here is a Norwegian paper explaining some of the members of the XML family. Even if you do not understand Norwegian, the figures and code examples (orange links) may be of value to you.

Related to these technologies, especially XLink, there is an important concept, transclusion, that you should be aware of.

"In computer science, transclusion is the inclusion of part of a document into another document by reference. It is a feature of substitution templates."
boen_robot
post Sep 3 2007, 03:01 PM
Post #16


QUOTE (kgun @ Sep 3 2007, 05:11 PM) *
XHTML is not part of the XML family. XHTML stands for Extensible Hypertext Markup Language. You can think of it as a standard for HTML markup tags that follows all the well-formedness rules of XML.

Any markup language that can be processed as XML is essentially XML based, and thus could be called part of the XML family.
QUOTE (kgun @ Sep 3 2007, 05:11 PM) *
Note that if you have access to Office 2007, its format is OpenXML. I have not tried it myself, but maybe you can open the files in Word 2007 and save them in Word 2007 (XML) format.

This has almost nothing to do with the topic at hand. OpenXML still stores a lot of presentational data, which increases file size a lot. In addition, processing it is extra hard since it's not pure XHTML to start with and is in an archive you need to extract before processing.
QUOTE (kgun @ Sep 3 2007, 05:11 PM) *
Otherwise, you should use XSLT / XSL-FO as explained above to make the transformation.

Using custom formats should only be done for new problems. You should be using standardised vocabularies when available.
kgun
post Sep 3 2007, 03:28 PM
Post #17


QUOTE (boen_robot @ Sep 3 2007, 03:01 PM) *
This has almost nothing to do with the topic at hand. OpenXML still stores a lot of presentational data, which increases file size a lot. In addition, processing it is extra hard since it's not pure XHTML to start with and is in an archive you need to extract before processing.

Note, the original post's heading was: "Any fast way to do this?, I need to convert XHTML to XML, or "XMLization""

My bolding.

Let us assume that the file can be imported into Office 2007 and saved as an XML document. Would that solve the problem? (I have not been able to try it, since I do not have Office 2007.)

If further processing is needed, let us call the saved file test.xml.

Note that XPath (at least version 1.0) can not select an arbitrary string that crosses several nodes, but XPointer can.

Then

$test = simplexml_load_file('test.xml');

loads the file / document into memory (XMLReader is a faster PHP stream parser).

Example:

This (pseudocode, since SimpleXML has no xpointer() method)

(string)$test->xpointer("string-range(//*,'tagname')");

will grab every element node with the name tagname as a string that can be manipulated using PHP's string functions like stripos, PHP's XML parsers like SimpleXML, or the more advanced DOM API. Maybe that has been explained already; if so, I am sorry for repeating it.

Hopefully the fast solution is that FirefoxRocks (side note: have you downloaded the latest version of Opera and tried View + Style, and what about browser security and mobility?) has access to Office 2007 and that this solves the problem as explained above.
boen_robot
post Sep 3 2007, 03:57 PM
Post #18


QUOTE (kgun @ Sep 3 2007, 06:28 PM) *
Note, the original post's heading was: "Any fast way to do this?, I need to convert XHTML to XML, or "XMLization""

My bolding.

Let us assume that the file can be imported into Office 2007 and saved as an XML document. Would that solve the problem? (I have not been able to try it, since I do not have Office 2007.)

Right. Remember that there are many files to be processed, not a single one. What you propose would have sounded good if the file was *.doc and was a single one. There is no need for any conversion when it's XHTML. Again, XHTML is an XML vocabulary.
QUOTE (kgun @ Sep 3 2007, 06:28 PM) *
If further processing is needed, let us call the saved file test.xml.

Note that XPath (at least version 1.0) can not select an arbitrary string that crosses several nodes, but XPointer can.

Then

$test = simplexml_load_file('test.xml');

loads the file / document into memory (XMLReader is a faster PHP stream parser).

Example:

This (pseudocode, since SimpleXML has no xpointer() method)

(string)$test->xpointer("string-range(//*,'tagname')");

will grab every element node with the name tagname as a string that can be manipulated using PHP's string functions like stripos, PHP's XML parsers like SimpleXML, or the more advanced DOM API. Maybe that has been explained already; if so, I am sorry for repeating it.

XPath 1.0 can't do THAT? Says who? I mean... in SimpleXML maybe, but I know it's possible in DOM like so:
CODE
$dom = new DOMDocument();
$dom->load('test.xml'); // load() is an instance method, so create the document first
$xpath = new DOMXPath($dom);
$nodes = $xpath->query('//*[.="tagname"]');

DOMXPath::query() returns a DOMNodeList containing all matching nodes. You can then use DOMNodeList::item() to fetch whichever one you desire and then use DOMElement::nodeValue to get the text and manipulate it with PHP's functions.
QUOTE (kgun @ Sep 3 2007, 06:28 PM) *
Hopefully the fast solution is that (s)he has access to Office 2007 and that this solves the problem as explained above.

The Windows 98 thing above means Office 2007 is not a viable solution, as Office 2007 runs only on Windows XP and Vista.
kgun
post Sep 3 2007, 04:07 PM
Post #19


QUOTE (kgun @ Sep 3 2007, 03:28 PM) *
Note that XPath (at least version 1.0) can not select an arbitrary string that crosses several nodes, but XPointer can.


My bolding (that is before any PHP parsing. Make it simple, as simple as possible but no simpler).

QUOTE (boen_robot @ Sep 3 2007, 03:57 PM) *
XPath 1.0 can't do THAT? Says who?


An Australian professor and an IT doctor from Switzerland, page 4.

They have written a related book, XPath, XLink, XPointer, and XML: A Practical Guide to Web Hyperlinking and Transclusion.

I agree with you that that is not proof, though, even if they ought to know what they are writing about.
boen_robot
post Sep 3 2007, 09:53 PM
Post #20


QUOTE (kgun @ Sep 3 2007, 07:07 PM) *
My bolding (that is before any PHP parsing. Make it simple, as simple as possible but no simpler).
An Australian professor and an IT doctor from Switzerland, page 4.

They have written a related book, XPath, XLink, XPointer, and XML: A Practical Guide to Web Hyperlinking and Transclusion.

I agree with you that that is not proof, though, even if they ought to know what they are writing about.

Ahh.... Now I understand what they really mean. All nodes that contain a certain string, not ones that have exactly that content. Well, again however, this is possible in pure XPath 1.0:
CODE
//para[contains(.,'tagname')]

Replacing the query above with this one will have the same effect as the XPointer you (and they) show. The first expression I showed won't cut it, but this one will.
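
The difference between the two expressions is easy to try out. Python's standard ElementTree supports only a subset of XPath (as far as I know it has no contains()), so this sketch imitates both predicates with plain string tests on each element's string value; the sample document and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    "<doc><para>tagname</para>"
    "<para>a <b>tag</b>name split across nodes</para>"
    "<para>nothing here</para></doc>"
)


def string_value(element):
    """Concatenate all text inside an element, like XPath's string value."""
    return "".join(element.itertext())


def paras_equal_to(needle):
    # Same idea as //para[.='tagname']
    return [p for p in root.iter("para") if string_value(p) == needle]


def paras_containing(needle):
    # Same idea as //para[contains(.,'tagname')]
    return [p for p in root.iter("para") if needle in string_value(p)]


print(len(paras_equal_to("tagname")))    # 1
print(len(paras_containing("tagname")))  # 2
```

The second para is matched only by the contains-style test, even though its text is split across a child element, because the test runs on the concatenated string value.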