WEEK 3: Document Type Definition

 

Reading Assignments
BOOK PAGES
XML in easy steps, Chapter 3

LINKS TO XML RELATED SITES

  1. XML.COM
  2. MICROSOFT'S IE6 DEVELOPER SITE
  3. MSDN'S XML DEVELOPER SITE
  4. IBM's XML WEBSITE
  5. IBM'S ALPHAWORKS WEBSITE
  6. W3 Schools XML Tutorial
  7. Kickstart XML Tutorial

YOUR FIRST DTD

This is where the power of XML as a method of self-describing and validating data becomes more obvious. In building DTDs, and later schemas, you will create a "schema" for the data and data types in your project files. Creating DTDs and validating against them is, more or less, straightforward. We will use a couple of different validating sites to determine whether your files are "valid". Although XML Schema (XSD) is a W3C Recommendation, many authors take different approaches to creating these schemas, and DTDs are still widely used in business and technology as a standard way to ensure that XML documents are created consistently. This section normally takes more than a week to cover. Online students are encouraged to keep up with the reading and to have a good DTD in place (please see the assignment two project progress description).

  1. Your first DTD - DTDs describe the document both to the human eye and to parsers. A DTD begins by declaring the document type, "DOCTYPE". The Document Type Declaration is often confused with the Document Type Definition: the declaration (<!DOCTYPE ...>) contains, or points to, the definition, and the name it gives must match the root element of the XML document. In our case, the DOCTYPE is "address_book" and the root tag is <address_book>. The parser needs to know that the DTD belongs to the XML document, and it confirms this by matching the declared name against the root element.
  2. Declaring Elements - After declaring the document type, we need to tell the parser which elements are in our document. The first element in our document is the root element itself, address_book. We have also stated that the address_book tag will contain parsed character data (#PCDATA). Later we will add the elements that are enclosed within address_book. Please note that you need to view the source of this example in order to fully view the DTD (the DTD is "internal"; it is part of the XML document).
  3. Adding Elements - We will now add more elements to our document. Our address_book will contain a child tag called "record", which contains the name, address, and contact. Each of those elements will have child elements, and some of those will have attributes. Please note that you need to view the source of this example in order to fully view the DTD.
  4. More Elements to come - The next element we would like to add to our address book is a record number. "record" is going to be a child of the root tag, address_book. Please look inside the element declaration for the record tag and observe how we have listed the names of the tags that record accepts. One thing to note now is that "," is more significant than a simple separator between tag names. There are two ways of separating element names in a content model: "," and "|". The "," forces the sequence in which the elements must occur. In our example, the tag "address" must come after the tag "name". Please note that you need to view the source of this example in order to fully view the DTD.
  5. Choice of Elements - This is the same as the example above, except that we use the "|" delimiter to indicate that if one element is present, the other element(s) cannot be present. In general, use choice to force one attribute or another, but not one element or another. This gets into the whole argument of consistency from one record to another, versus from one document to another.
  6. ANY - This is one of those options that is practically absurd. You can use it, but it means that 'anything goes', so why bother? You still have to declare each and every element that appears in your document, but they can appear in any order and frequency, or not appear at all.
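As a minimal sketch of items 1 through 6, the internal DTD below declares the document type, uses "," to fix the order of a record's children, and uses "|" to offer a choice inside contact. It is simplified from the address book in these notes: name and address hold plain text here, and the sample names and values are made up.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The DOCTYPE name must match the root element, address_book -->
<!DOCTYPE address_book [
  <!ELEMENT address_book (record)>
  <!-- "," : name, then address, then contact, in exactly this order -->
  <!ELEMENT record (name, address, contact)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!-- "|" : a contact holds either a home_phone or an email_address, not both -->
  <!ELEMENT contact (home_phone | email_address)>
  <!ELEMENT home_phone (#PCDATA)>
  <!ELEMENT email_address (#PCDATA)>
]>
<address_book>
  <record>
    <name>Jane Doe</name>
    <address>12 Main Street</address>
    <contact>
      <email_address>jane@example.com</email_address>
    </contact>
  </record>
</address_book>
```

Swapping the order of name and address, or putting both a home_phone and an email_address inside contact, would make a validating parser reject the file. Replacing the contact declaration with <!ELEMENT contact ANY> would accept either child, both, or plain text, in any order, which is exactly why ANY tells the parser so little.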

?+*, ELEMENT CONTENT MODEL

  1. ?, OPTIONAL - Among the information you would want to give to human readers as well as parsers is how often a tag can appear. One of the most useful pieces of information is whether a tag is optional, that is, whether it may appear once or not at all. To signify this we use the "?" character. In our DTD, by following the element name "comments" with "?", we express that the comments element is optional. In this case we are also stating that the order in which "name", "number", and "hints" appear is not important, and that "hints" is also optional. Please note that you need to view the source of this example in order to fully view the DTD.
  2. +, REQUIRED AND MORE - The "+" sign indicates that the tag is used at least once. In this case, "record" is required in address_book, and there may be more than one record. Please note that you need to view the source of this example in order to fully view the DTD.
  3. *, ZERO AND MORE - The "*" sign indicates that the tag is used zero or more times. "*" is rather similar to "?", with one exception: "?" indicates zero or one occurrence, whereas "*" indicates zero or more occurrences. In this case we expect zero or more "comments". Please note that you need to view the source of this example in order to fully view the DTD.
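The sketch below puts all three occurrence indicators into one small internal DTD. It loosely follows the address book (a record holds a name and comments; a name holds first_name, last_name, and an optional nick_name), but the layout is simplified and the sample data is invented.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE address_book [
  <!-- "+" : one or more records -->
  <!ELEMENT address_book (record+)>
  <!-- "*" : zero or more comments after the name -->
  <!ELEMENT record (name, comments*)>
  <!-- "?" : nick_name may appear once or not at all -->
  <!ELEMENT name (first_name, last_name, nick_name?)>
  <!ELEMENT first_name (#PCDATA)>
  <!ELEMENT last_name (#PCDATA)>
  <!ELEMENT nick_name (#PCDATA)>
  <!ELEMENT comments (#PCDATA)>
]>
<address_book>
  <record>
    <name>
      <first_name>Jane</first_name>
      <last_name>Doe</last_name>
      <nick_name>JD</nick_name>
    </name>
    <comments>Prefers email.</comments>
    <comments>Met at the spring conference.</comments>
  </record>
  <record>
    <name>
      <first_name>John</first_name>
      <last_name>Smith</last_name>
    </name>
  </record>
</address_book>
```

Deleting the second record keeps the document valid; deleting both records does not, because "record+" demands at least one. A second nick_name inside a name would also be rejected, since "?" allows at most one.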

CHILDREN TAGS

  1. "record" - Our address_book will need a series of records, and every "record" tag will have several tags inside it. As you can see, we have told the parser that we do not care about the order of "name", "address", and "contact", but "comments" is required to come after them, and there must be at least one "record". Just for your information, the above content model is not done well, and it would make it very difficult for the parser to validate this document. A better content model would have been "(name, address, contact, comments)". Please note that you need to view the source of this example and of the examples below in order to fully view the DTD.
  2. Children of name - The element "name" will contain several elements. For now, each "name" will have a "last_name", "first_name", "middle_name", and "nick_name". Placing "?" after "nick_name" makes it optional, so a "name" may leave it out.
  3. Children of address - The element "address" will also contain several elements: "street", "street_detail", "city", "state", and "zipcode". For now, each address will have a "street_detail", which can be used for an apartment number or PO box.
  4. Children of contact - The element "contact" will also contain several elements: "home_phone", "work_phone", "cell_phone", "fax_number", and "email_address". You might also add "pager" to the content model, but most of us have thrown ours away :)
  5. Children of comments - The element "comments" will contain just one child element, "misc_comments", but this is where you might add a birthday, directions to the person's house, or other important information.
  6. External DTDs - You have begun to see that a DTD can grow into a lengthy document. This, and more importantly the ability to share DTDs between documents, is the reason for moving the DTD out of your document into an external file. Please note that you need to view the source of this example in order to fully view the DTD, as well as the address_book.dtd document.
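Here is a sketch of what an external address_book.dtd could look like, gathering the children described above into one file. The element names come from these notes and record uses the "better" content model suggested for it; which children are optional is an assumption of this sketch (every contact child is made optional, and nick_name is optional as described), so compare it with the actual address_book.dtd.

```xml
<!-- address_book.dtd: an external DTD that many XML files can share -->
<!ELEMENT address_book (record+)>
<!ELEMENT record (name, address, contact, comments)>

<!ELEMENT name (last_name, first_name, middle_name, nick_name?)>
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT nick_name (#PCDATA)>

<!ELEMENT address (street, street_detail, city, state, zipcode)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT street_detail (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zipcode (#PCDATA)>

<!-- Assumption: every kind of contact detail is optional -->
<!ELEMENT contact (home_phone?, work_phone?, cell_phone?, fax_number?, email_address?)>
<!ELEMENT home_phone (#PCDATA)>
<!ELEMENT work_phone (#PCDATA)>
<!ELEMENT cell_phone (#PCDATA)>
<!ELEMENT fax_number (#PCDATA)>
<!ELEMENT email_address (#PCDATA)>

<!ELEMENT comments (misc_comments)>
<!ELEMENT misc_comments (#PCDATA)>
```

An XML file then points to this file with <!DOCTYPE address_book SYSTEM "address_book.dtd"> instead of carrying an internal subset in square brackets. A relative path like this one is resolved relative to the XML file's own location, so keeping both files in the same folder is the simplest setup.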

ATTRIBUTES

  1. record attribute - As you might have already realized, we need a method for specifying the attribute of the element "record". By declaring the attribute's type as "CDATA", we tell the parser that its value is plain character data (any string), rather than a restricted type such as ID or a list of allowed values. Lastly, we indicate that this attribute is required with the keyword "#REQUIRED". You need to view the source of this example in order to fully view the DTD.
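A minimal sketch of an attribute declaration for record. The attribute name record_number is hypothetical (the course example may use a different name), and record is reduced to plain text to keep the example short.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE address_book [
  <!ELEMENT address_book (record+)>
  <!ELEMENT record (#PCDATA)>
  <!-- record_number (hypothetical name): CDATA means any string value,
       #REQUIRED means every record element must supply it -->
  <!ATTLIST record record_number CDATA #REQUIRED>
]>
<address_book>
  <record record_number="001">Jane Doe</record>
  <record record_number="002">John Smith</record>
</address_book>
```

A validating parser would reject a <record> written without record_number. Using #IMPLIED instead of #REQUIRED makes the attribute optional, and putting a quoted default such as "000" in place of the keyword supplies a value whenever it is omitted.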

ENTITIES

  1. Copyright - You are familiar with built-in entities such as "&lt;" for "<". In XML you can also create your own entities. In this example we declared an entity named "copyright" and then used it right below the "address_book" tag. The end result is that the parser replaces the reference &copyright; with the © symbol. Please note that you need to view the source of this example in order to fully view the DTD with the entity declaration for ©.
  2. ENTITIES for shortcuts - I put together a small file that shows how entities can be used to insert a short piece of text (my name, inserted by the entity &rdc; made from my initials) and an entire paragraph (the contents of rdc.txt, inserted through &rdc_txt;). See the files entity ex1.xml and rdc.txt. This can be a useful trick.
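A small sketch combining both ideas: an internal entity whose replacement text is written in the DTD, and an external parsed entity whose replacement text is read from a file. It assumes rdc.txt sits next to the XML file and contains plain text only; the notice element is invented for the example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE address_book [
  <!ELEMENT address_book (notice, record*)>
  <!ELEMENT notice (#PCDATA)>
  <!ELEMENT record (#PCDATA)>
  <!-- Internal entity: &#169; is the character reference for the copyright sign -->
  <!ENTITY copyright "&#169;">
  <!-- External parsed entity: its text is read from rdc.txt (assumed plain text) -->
  <!ENTITY rdc_txt SYSTEM "rdc.txt">
]>
<address_book>
  <notice>&copyright; Address Book. &rdc_txt;</notice>
</address_book>
```

If rdc.txt contained markup such as paragraph tags, the notice declaration would have to allow those elements as well. Some non-validating parsers and browsers do not load external entities at all, so the file's text may not appear in every tool.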

MIXED TEXT

  1. One of the more clever uses of XML is to add "meta information at a granular level" to unstructured text. This speaks to the major problem of adding contextual information to a text after it has been written, or more generally of 'future proofing' text information. Take a look at story.xml, story_dtd.xml, and story.dtd, and (looking ahead to XML Schema) at story_xsd.xml and story.xsd. Pay particular attention to the use of elements to mark up text within a text block. This is also known as 'mixed' content, but not in the sense we use in the 'mixed model'.
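As a rough illustration of marking up text inside a text block, a DTD declares such content with a mixed-content model. The element names below (story, paragraph, person, place) are invented for this sketch and are not taken from story.dtd.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE story [
  <!ELEMENT story (paragraph+)>
  <!-- Mixed content: #PCDATA comes first, the names are joined with "|",
       and the group ends with "*" so text and elements can interleave freely -->
  <!ELEMENT paragraph (#PCDATA | person | place)*>
  <!ELEMENT person (#PCDATA)>
  <!ELEMENT place (#PCDATA)>
]>
<story>
  <paragraph>Early one morning <person>Anna</person> left
  <place>Boston</place> for the coast.</paragraph>
</story>
```

The text still reads normally, but a program can now pick out every person or place in the story. A DTD can only say which elements may appear inside the text, not how many or in what order; that limitation is one reason to look ahead to XML Schema.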

ASSERTION NETWORKS

  1. You can create an 'assertion network' by linking elements through attributes of type ID, IDREF, and IDREFS in a DTD. Look closely at recipe.xml, recipe_dtd.xml, and recipe.dtd. These files link the ingredients and steps in recipe.xml, and the DTD validates the document against the assertion network. I will develop this idea further, but wanted to post these files for your use and awareness of this powerful feature.
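A minimal sketch of an assertion network in the spirit of the recipe files. The element and attribute names (ingredient, step, id, uses, after) are assumptions for illustration, not necessarily those used in recipe.dtd.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE recipe [
  <!ELEMENT recipe (ingredient+, step+)>
  <!ELEMENT ingredient (#PCDATA)>
  <!ELEMENT step (#PCDATA)>
  <!-- ID: each value must be unique within the whole document -->
  <!ATTLIST ingredient id ID #REQUIRED>
  <!-- IDREFS: space-separated list of IDs this step uses;
       IDREF: a single ID, here the step that must come first -->
  <!ATTLIST step
      id    ID     #REQUIRED
      uses  IDREFS #IMPLIED
      after IDREF  #IMPLIED>
]>
<recipe>
  <ingredient id="flour">2 cups flour</ingredient>
  <ingredient id="milk">1 cup milk</ingredient>
  <ingredient id="egg">1 egg</ingredient>
  <step id="s1" uses="flour milk">Whisk the flour into the milk.</step>
  <step id="s2" uses="egg" after="s1">Beat in the egg.</step>
</recipe>
```

A validating parser reports an error if, for example, uses="sugar" names an ID that does not exist, or if two elements share the same ID value. It does not check that a referenced ID belongs to an ingredient rather than a step; a DTD cannot express that kind of constraint.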

EXAMPLES

HOMEWORK

For homework, create a DTD for the nested and empty XML documents from your last tutorial. Please use a client validator (EditML Pro or XML Spy) to check your work. Examples of the nested, empty, and mixed file types are linked below.

Please email me the linking files (filename_dtd.xml) and the DTD files (filename.dtd) as attachments. Yahoo users: send multiple messages rather than zipping the files (two sets of two, three sets of two, or three sets of three, etc.).