
How important is schema?


aspnetguy


I have these files:

test.xml

<?xml version="1.0"?>
<root
  xmlns="testSchema"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="testSchema test.xsd">
  <item>
    <subitem></subitem>
  </item>
</root>

test.xsd

<?xml version="1.0"?>
<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="testSchema"
  xmlns="testSchema"
  elementFormDefault="qualified">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="xs:string">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="subitem" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

How do I check whether the XML document follows the schema?
Did I even create and implement the schema correctly?
In the W3Schools tutorial they used http://www.w3schools.com as the schema namespace. Does the namespace have to be a URL?


I think a namespace should be a URL. Technically speaking, though, it is only an identifier that triggers some special behavior (or the lack of it), so it does not have to be a valid, resolvable URL.

To check whether a document is valid against a Schema, you need to run the validation from a script (server-side or client-side). It is the validator's responsibility to produce the error messages, but it is up to you whether you show them, alter them as they come, do something on success or on failure, etc.

How important is it... hm... currently, it is not important at all if your application is the only one using the data. If other applications will be feeding your application their own XML documents, though, Schema is the best way to check whether your application can handle the other party's documents.

In the near future, when XSLT 2.0 arrives, Schema will also be one way to tell the XSLT processor what data type to expect on each node so that it can react properly. For example, if you define the element <date> with content of type xs:date, you enable the XSLT processor to perform date comparisons on this element. Even without a schema, XSLT 2.0 can convert an ordinary string to a date with the xs:date($XPath-expression-to-the-date-element) constructor function.

The most important use of Schema remains validating input that doesn't come from yours truly.
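On the point that a namespace is only an identifier: a namespace-aware parser treats the name as a plain string and never fetches it. Here is a minimal sketch using Python's standard xml.etree.ElementTree. The `docs` and `tags` names are made up for illustration; "testSchema" and the W3Schools URL come from the question.

```python
import xml.etree.ElementTree as ET

# Three documents that differ only in their namespace name.
docs = {
    "plain string": '<root xmlns="testSchema"/>',
    "URL": '<root xmlns="http://www.w3schools.com"/>',
    "URL with trailing slash": '<root xmlns="http://www.w3schools.com/"/>',
}

# ElementTree reports each element name as "{namespace}localname".
# Nothing is downloaded; the namespace is only carried along as text.
tags = {label: ET.fromstring(text).tag for label, text in docs.items()}

for label, tag in tags.items():
    print(f"{label}: {tag}")
```

All three parse fine, and the last two count as different namespaces because the strings differ by one character.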


  • 1 month later...
How important is schema?
Well, I'll start by defining a schema and its purpose (or better yet, let W3Schools do it):
An XML Schema:
* defines elements that can appear in a document
* defines attributes that can appear in a document
* defines which elements are child elements
* defines the order of child elements
* defines the number of child elements
* defines whether an element is empty or can include text
* defines data types for elements and attributes
* defines default and fixed values for elements and attributes
So basically, a schema defines the structure of a document. You must therefore consider the question: when is this objective relevant?

One common answer is that it is most likely to be relevant when two or more entities (programs) are likely to access the same data source. In that case, and especially when one or more of the entities is not under your control, it is of great benefit to have something that tells both programs how to structure the data, just like SQL syntax does, Access DB syntax does, TCP/IP does, etc. Hence the invention of the DTD and, following it, the Schema.

Put simply, a schema is needed when the data is likely to be accessed by anything outside of your control. If the XML file you're storing is simply a set of basic program preferences, you likely do not need a schema (I tend to make one anyway, since they are almost always very simple, and Visual Studio's auto-generated ones are usually good enough in that case). However, if your XML file stores data of consequence, such as a list of clients, you would greatly benefit from creating a schema, since your data would then be more readable and more maintainable. It would be easier to write future versions or integrate other products. On the flip side, it is also easier to make compatible competing products. Perhaps you might store your schema inside your program assembly, so as not to make it quite as public. Then again, XML was never the format of choice for storing sensitive data or detailed proprietary formats: while there are zillions of proprietary formats, almost none are based on XML. Even the new Office 2007 XML document formats are published openly <insert M$ + open source joke here>.

In addition, by the rules above, any online format requires a schema, since by its nature your client is out of your control (your client is usually a browser). At the moment, I'm not sure if any browsers are truly capable of using namespaces and getting useful schemas for validation from RDDL resources, but I think at some point they will be.

///////////////////////////////////////////////////////////////////////////////////////

A namespace DOES NOT have to be a URL... actually it has to be a URI*, which can technically be almost anything, including, most often, a URL. Further confusing the matter, while the namespace string MUST have URI form, it is NOT actually treated as a URI: it is only compared as a string and never has to be resolved. Why this is, I don't understand, but it is not likely relevant anyway.

If your namespace is a URL, the question then arises: what should be at the end of that URL? Well, the W3C didn't really think about that one when creating XML namespaces... and so, more recently, a format called RDDL, which is based on XHTML, was created to fill the gap. RDDL is the usual suggestion for what to put at the end of a namespace URL, at least right now. It is used to specify locatable URIs for various resources, listed by their types (one of which is an actual schema file). As mentioned above, I'm not sure if any browsers currently do anything intelligent with RDDL files, but, for lack of anything better, they are probably fine to use for now.

Keep in mind, however, that a URL is not the only option (this confuses many, and confused me until fairly recently). For example, the string "urn:isbn:0553294385" is a URI that is a URN** rather than a URL. An XML file such as:
<?xml version="1.0"?>
<root xmlns="urn:isbn:0553294385">
  <item>
    <subitem></subitem>
  </item>
</root>

is a document whose namespace, if fully resolved, turns out to refer to a specific edition of I, Robot, by Isaac Asimov, published by Bantam Spectra in paperback form. This, of course, is not particularly useful, as it is unlikely that books would require an XML namespace; but it is perfectly legal nonetheless.

///////////////////////////////////////////////////////////////////////////////////////

As for validation: the common way to do it is to load the document into a DOM (PHP 5 has one, Java has one, and .NET has one), then load its schema and check it that way. However, I assume from your question that you don't mean programmatic validation but surface validation, similar to the W3C validator for HTML. After a quick search, the site at http://www.xmlvalidation.com/ appears to offer this, but you will have to specify the XML schema location in your document (I forget how, but I believe the W3Schools tutorial covers it) and enable external validation on their form.

However, from your code:

<xs:element name="item" type="xs:string">
  <!--ERROR: you just defined the element item above to have the type xs:string. Now you are giving it-->
  <!--a new complex type. Elements can have only one type.-->
  <xs:complexType>
    <xs:sequence>
      <xs:element name="subitem" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

The fix is therefore:

<xs:element name="item">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="subitem" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>
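The DOM-style check described earlier (load the document, load the schema, validate) can be sketched in Python. This assumes the third-party lxml package is installed; the standard library has no XSD validator. The helpers `compile_schema` and `is_valid` and the sample `BAD_XML` document are hypothetical names for illustration, while the schema text is the one from this thread. The schema is passed in explicitly rather than found through xsi:schemaLocation.

```python
from lxml import etree  # third-party package, assumed installed (pip install lxml)

# The schema as originally posted: "item" has both type="xs:string" and a nested complexType.
BROKEN_XSD = b"""<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="testSchema" xmlns="testSchema"
           elementFormDefault="qualified">
  <xs:element name="root">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="item" type="xs:string">
          <xs:complexType>
            <xs:sequence>
              <xs:element name="subitem" type="xs:string"/>
            </xs:sequence>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>"""

# The fix: drop the type attribute from "item" so only the complexType remains.
FIXED_XSD = BROKEN_XSD.replace(b'name="item" type="xs:string"', b'name="item"')

GOOD_XML = b'<root xmlns="testSchema"><item><subitem></subitem></item></root>'
BAD_XML = b'<root xmlns="testSchema"><item></item></root>'  # subitem is missing


def compile_schema(xsd_bytes):
    """Return (schema, None), or (None, message) if the schema itself is invalid."""
    try:
        return etree.XMLSchema(etree.fromstring(xsd_bytes)), None
    except etree.XMLSchemaParseError as exc:
        return None, str(exc)


def is_valid(schema, xml_bytes):
    """Parse the document and report whether it conforms to the compiled schema."""
    return schema.validate(etree.fromstring(xml_bytes))


broken_schema, broken_error = compile_schema(BROKEN_XSD)
fixed_schema, fixed_error = compile_schema(FIXED_XSD)

print("original schema compiles:", broken_schema is not None)
print("fixed schema compiles:", fixed_schema is not None)
print("good document valid:", is_valid(fixed_schema, GOOD_XML))
print("bad document valid:", is_valid(fixed_schema, BAD_XML))
```

After a failed validation, `fixed_schema.error_log` holds the validator's messages, which you can show, rewrite, or ignore as you see fit.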

In addition, you specified that your schema applies to elements in the namespace 'testSchema' and that locally declared elements must be qualified (that is, in that namespace, whether through a prefix or a default namespace declaration):

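<!--targetNamespace: the elements declared by this schema belong to the namespace 'testSchema'-->
<!--xmlns="testSchema": default namespace for unprefixed names used inside the schema (for example in type or ref attributes); nothing in this schema relies on it-->
<!--elementFormDefault="qualified": locally declared elements (item, subitem) must also be in the target namespace-->
<xs:schema
  xmlns:xs="http://www.w3.org/2001/XMLSchema"
  targetNamespace="testSchema"
  xmlns="testSchema"
  elementFormDefault="qualified">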

Your XML file already satisfies this: the default namespace declaration xmlns="testSchema" on root puts root, item, and subitem in the namespace 'testSchema' even though they carry no prefix, so the corrected schema applies to all of them (remember, an XML file can be validated in whole or in part by more than one schema, usually set up so that each schema validates the parts in its own namespace). If you would rather make the qualification explicit, you might try a prefix instead:

<?xml version="1.0"?>
<!--xmlns:ts binds the prefix ts to the namespace 'testSchema'-->
<!--xsi:schemaLocation stays in the xsi namespace and pairs the namespace name with the schema file-->
<ts:root
  xmlns:ts="testSchema"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="testSchema test.xsd">
  <!--Notice all of the elements are in the namespace 'testSchema', as specified by the schema-->
  <ts:item>
    <ts:subitem></ts:subitem>
  </ts:item>
</ts:root>
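For what it's worth, a namespace-aware parser reports the same expanded names for the unprefixed (default namespace) form and the prefixed form. Here is a minimal sketch with Python's standard xml.etree.ElementTree. The helper `expanded_names` and the two constants are hypothetical names; the element names are the ones from this thread.

```python
import xml.etree.ElementTree as ET

# Same structure written two ways: default namespace vs. an explicit ts: prefix.
DEFAULT_NS_DOC = '<root xmlns="testSchema"><item><subitem/></item></root>'
PREFIXED_DOC = (
    '<ts:root xmlns:ts="testSchema">'
    "<ts:item><ts:subitem/></ts:item>"
    "</ts:root>"
)


def expanded_names(xml_text):
    """List every element name in document order as "{namespace}localname"."""
    return [element.tag for element in ET.fromstring(xml_text).iter()]


default_names = expanded_names(DEFAULT_NS_DOC)
prefixed_names = expanded_names(PREFIXED_DOC)

print(default_names)
print(default_names == prefixed_names)
```

The prefix is only shorthand. What a schema processor matches against is the namespace name plus the local name.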

If I get anything wrong, let me know, as I'm certainly no stranger to Schema syntax errors....

//////////////////////////////////////////////////////////////////////////////////////////////

Footnotes:

*In Namespaces in XML 1.1, namespace names can be IRIs as well as URIs. As to what an IRI is, don't ask me, because I do not know, and it is unlikely to be relevant even if I did.

**A URN is a naming indicator that can take many forms. URNs are designed to reference a specific item, whereas URLs are intended to reference the address of an instance of an item. For example, the URI "urn:isbn:0-553-29438-5" talks about an edition of I, Robot, but tells us nothing about where to find it. In contrast, the URI "http://www.w3cschools.com/" talks about a location on the HTTP network, in this case the www machine at the domain w3cschools.com. It suggests (but does not declare) that the content found at this location is HTML (since that is what HTTP is supposed to be for). It does not, however, describe anything about that resource. Knowing the syntax of ISBN, on the other hand, we can deduce that the content at the prior URI is a book or something similar, belongs to the English-language registration group (the leading 0), comes from publisher 553, which by reference to an ISBN database turns out to be Bantam Spectra (and the address of their HQ is given), and has title number 29438, which by the same database resolves to I, Robot. A URI can be a URN, a URL, or both, depending on the scheme (often mistakenly called a protocol) used in it.

//////////////////////////////////////////////////////////////////////////////////////////

And so ends my first post on the W3Schools forum. Hope I might be of help; ask again if not....
//Matt

PS: I do apologize for the length of the post... I have somewhat of an obsessive tendency towards completeness when describing standards, and much of this post is detailed description of various aspects of the XML standards. However, much of the completeness is in response to common questions I am often asked when describing the standards, and is intended to answer them without me having to reply. Not that answering questions is at all bad, nor is asking them, but by writing in the completeness I often save at least one question-response cycle, which means that the reader can get to coding more quickly, which is hopefully the goal (that's my goal when reading tutorials or forums...). If anyone objects, let me know and I'll try to cut it down next time....
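The scheme distinction from the URN footnote can be seen with Python's standard urllib.parse, which splits a URI into its parts without resolving anything. This is a minimal sketch: the `describe` helper is a hypothetical name, and the two URIs are the ones used in the footnote.

```python
from urllib.parse import urlsplit


def describe(uri):
    """Split a URI into scheme, network location, and path without any lookup."""
    parts = urlsplit(uri)
    return {"scheme": parts.scheme, "netloc": parts.netloc, "path": parts.path}


urn_parts = describe("urn:isbn:0-553-29438-5")
url_parts = describe("http://www.w3cschools.com/")

# The URN names a thing (no host to contact); the URL names a location.
print(urn_parts)
print(url_parts)
```

Both are URIs. Only the scheme tells you whether the rest is a name or an address.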


I like the explanations you made and certainly don't protest against them :) .

I especially enjoyed the URI/URL/URN/IRI explanations. I wasn't sure of the difference myself. The only thing I knew was that URL is a subset of URI, but I didn't realize what difference this makes.

Do try to shorten yourself though. I'm interested in all this, so I read it without any problems, with stops for thinking. For people who are just starting with Schema, the way aspnetguy is (or was?), you have to keep it really simple, so you don't bore the reader with too much info or (as more often happens) overwhelm them with too much info to grasp and think about.


"I like the explanations you made and certainly don't protest against them. I especially enjoyed the URI/URL/URN/IRI explanations. I wasn't sure of the difference myself. The only thing I knew was that URL is a subset of URI, but I didn't realize what difference this makes."
I get most of my information on this particular subject from Wikipedia, with some coming as well from the W3C documents (which I try to avoid reading where possible, since they are very hard to read, but are always fully informative). If you're interested in the subject, the Wikipedia article on URI is a very good place to start. It also covers IRI, which I did not...
"Do try to shorten yourself though. I'm interested in all this, so I read it without any problems, with stops for thinking. For people who are just starting with Schema, the way aspnetguy is (or was?), you have to keep it really simple, so you don't bore the reader with too much info or (as more often happens) overwhelm them with too much info to grasp and think about."
Thanks, I'll keep that in mind (hopefully my explanations will turn out at least slightly better than the W3C documents).
//Matt
