WEEK 3: Document Type Definition
Reading Assignments
| BOOK | PAGES |
| XML in easy steps | CH 3 |
LINKS TO XML RELATED SITES
- XML.COM
- MICROSOFT'S IE6 DEVELOPER SITE
- MSDN'S XML DEVELOPER SITE
- IBM'S XML WEBSITE
- IBM'S ALPHAWORKS WEBSITE
- W3 Schools XML Tutorial
- Kickstart XML Tutorial
YOUR FIRST DTD
This is where the power of XML, as a method of self-describing and validating
data, becomes more obvious. In building DTDs, and later schemas, you will create
a "schema" for the data and data types in your project files. Creating and
validating with DTDs is, more or less, straightforward. We will use a couple of
different validating sites to determine whether your files are "valid". While
XSD has been approved by the W3C, many authors use different approaches to
creating these schemas. DTDs are still used by business and technology as
standards for ensuring that XML documents are created with consistency. This
section normally takes more than a week to cover. Online students are encouraged
to keep up with the reading and to have a good DTD (please see the assignment
two project progress description).
- Your first DTD
DTDs are used to describe the document to the human eye and to parsers as well.
DTDs begin by declaring the document type, "DOCTYPE". The Document Type
Declaration is often confused with the Document Type Definition. The document
type declaration is the statement that contains, or points to, the Document
Type Definition, and it requires that the DOCTYPE name match the root element
of the XML document. In our case, our DOCTYPE is "address_book" and the root
tag is <address_book>. The parser needs to know that the DTD matches the XML
document, and it does that by matching the root element.
- Declaring Elements
After declaring the document type, we need to tell the parser which elements
are in our documents. The first element in our document is the root element
itself, address_book. We have also stated that the address_book tag will contain
parsed character data (#PCDATA). We will later add elements to the DTD that
are enclosed within address_book. Please note that you need to view the source
of this example in order to fully view the DTD (the DTD is "internal";
it is part of the XML document).
- Adding Elements
We will now add more elements to our document. Our address_book will contain
a child tag called "record", which contains the name, address, and contact.
Each one of those elements will have child elements, and some of those will
have attributes. Please note that you need to view the source of this example
in order to fully view the DTD.
- More Elements to come
The next thing we would like to add to our address book is a record number.
"record" is going to be a child of the root tag, address_book. Please look
inside the element declaration for the record tag and observe how we have
listed the names of the tags that record accepts. One thing we have to know is
that "," has a bit more significance than just a tag name delimiter. There are
two ways of separating, or delimiting, the element names: "," and "|". The ","
forces the sequence in which the elements must occur. In our example the tag
"address" must come after the tag "name". Please note that you need to view
the source of this example in order to fully view the DTD.
- Choice of Elements - This is the same as the example above, except that we
are using the "|" delimiter to indicate that if one element is present, the
other element(s) cannot be present. In general, use choice to force one attribute
or another, but not one element or another. This gets into the whole argument
of consistency from one record to another record, versus one document compared
to another document.
- ANY
This is one of those options that is practically absurd. You can use it, but it
means that 'anything goes'. So why bother? You still have to define each and
every element that appears in your document, but they can appear in any order
and frequency, or not appear at all.
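
As a minimal sketch of the declarations described in this section, the internal
DTD below shows a DOCTYPE that matches the root element, a sequence built with
",", and a choice built with "|". The "phone" and "email" elements and all of
the sample data are assumptions made up for illustration; they are not taken
from the course example files.

```xml
<?xml version="1.0"?>
<!-- The DOCTYPE name must match the root element, address_book. -->
<!DOCTYPE address_book [
  <!ELEMENT address_book (record)>
  <!-- "," is a sequence: name, then address, then contact, in that order. -->
  <!ELEMENT record (name, address, contact)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!-- "|" is a choice: contact holds a phone or an email, not both.
       Both child names are hypothetical. -->
  <!ELEMENT contact (phone | email)>
  <!ELEMENT phone (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<address_book>
  <record>
    <name>Jane Doe</name>
    <address>12 Example Street</address>
    <contact><email>jane@example.com</email></contact>
  </record>
</address_book>
```

Swapping <address> and <name> in the record, or putting both a <phone> and an
<email> inside <contact>, should make a validating parser reject the file.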
?+*, ELEMENT CONTENT MODEL
- ?, OPTIONAL
Among the information that you would want to give to human readers as well
as parsers is the frequency with which a tag can appear. One of the useful
pieces of information we would like to give is whether a tag is optional or
not. To signify this we use the "?" character. In our DTD, if we follow the
element name "comments" with the "?", we are expressing that the comments
element is optional. In this case we are also stating that the appearance order
of "name", "number", and "hints" is not important and that "hints" is an
optional element. Please note that you need to view the source of this example
in order to fully view the DTD.
- +, REQUIRED AND MORE
The "+" sign indicates that the tag must be used one or more times. In this
case the "record" of the address_book is required, and there might be more than
one record. Please note that you need to view the source of this example in
order to fully view the DTD.
- *, ZERO AND MORE
The "*" sign indicates that the tag is used zero or more times. The "*" is
rather similar to "?", with one exception: "?" indicates zero or one
occurrence, whereas "*" indicates zero or more occurrences. In this case we
expect zero or more "comments". Please note that you need to view the source of
this example in order to fully view the DTD.
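
The three occurrence indicators above can be sketched in one small internal
DTD. This is only an illustration: it reuses the element names mentioned in
this section, but it keeps the children of "record" in a fixed sequence for
simplicity, which may differ from the course example, and the sample data is
invented.

```xml
<?xml version="1.0"?>
<!DOCTYPE address_book [
  <!-- "+": at least one record is required, and more may follow. -->
  <!ELEMENT address_book (record+)>
  <!-- "?": hints appears once or not at all.
       "*": comments appears any number of times, including zero. -->
  <!ELEMENT record (name, number, hints?, comments*)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT number (#PCDATA)>
  <!ELEMENT hints (#PCDATA)>
  <!ELEMENT comments (#PCDATA)>
]>
<address_book>
  <record>
    <name>Jane Doe</name>
    <number>1</number>
    <comments>Met at the conference.</comments>
    <comments>Prefers email.</comments>
  </record>
  <record>
    <name>John Roe</name>
    <number>2</number>
    <hints>Ask about the project.</hints>
  </record>
</address_book>
```

An element name with no indicator after it, such as "name" here, must appear
exactly once.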
CHILDREN TAGS
- "record"
Our address_book will need to have a series of records. Every "record" tag
will have several tags inside of it. As you can see, we have told the parser
that we do not care about the order of "name", "address", and "contact". But
"comments" is required to come after them, and there should be at least one
"record". Just for your information, the above content model is not done well
and it would make it very difficult for the parser to validate this document.
A better content model would have been: "(name, address, contact, comments)".
Please note that you need to view the source of this and the three examples
below in order to fully view the DTD.
- Children of name
The element "name" will contain several elements. For now each "name" will
have a "last_name", "first_name", "middle_name" and "nick_name". Using "?"
next to "nick_name" makes it optional, so it can be left out of a record.
- Children of address
The element "address" will also contain several elements. These include
"street", "street_detail", "city", "state", and "zipcode". For now each address
will have a "street_detail", which can be used for an apartment number or
PO box.
- Children of contact
The element "contact" will also contain several elements. These include
"home_phone", "work_phone", "cell_phone", "fax_number", and "email_address".
Please note that in a DTD you might add "pager" to the content model, but most
of us have thrown ours away :)).
- Children of comments
The element "comments" will contain just one child element, "misc_comments",
but here is where you might add birthday, directions to their house, or other
important information.
- External DTDs
You have begun to see that the DTD can grow and expand into a lengthy
document. This growth and, more importantly, the ability to share DTDs are
the reasons for externalizing the DTD of your document. Please note that you
need to view the source of this example in order to fully view the DTD, as
well as viewing the address_book.dtd document.
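
Pulling the children tags together, an external address_book.dtd might look
roughly like the sketch below. The element names come from the descriptions
above, but the order of the children inside each element, and the choice to
make everything except "nick_name" required, are assumptions; the actual
course file may differ.

```xml
<!-- address_book.dtd (sketch). An XML file would point at it with:
     <!DOCTYPE address_book SYSTEM "address_book.dtd">
     placed before the <address_book> root tag. -->
<!ELEMENT address_book (record+)>
<!ELEMENT record (name, address, contact, comments)>
<!ELEMENT name (last_name, first_name, middle_name, nick_name?)>
<!ELEMENT address (street, street_detail, city, state, zipcode)>
<!ELEMENT contact (home_phone, work_phone, cell_phone, fax_number, email_address)>
<!ELEMENT comments (misc_comments)>

<!-- Leaf elements hold plain text. -->
<!ELEMENT last_name (#PCDATA)>
<!ELEMENT first_name (#PCDATA)>
<!ELEMENT middle_name (#PCDATA)>
<!ELEMENT nick_name (#PCDATA)>
<!ELEMENT street (#PCDATA)>
<!ELEMENT street_detail (#PCDATA)>
<!ELEMENT city (#PCDATA)>
<!ELEMENT state (#PCDATA)>
<!ELEMENT zipcode (#PCDATA)>
<!ELEMENT home_phone (#PCDATA)>
<!ELEMENT work_phone (#PCDATA)>
<!ELEMENT cell_phone (#PCDATA)>
<!ELEMENT fax_number (#PCDATA)>
<!ELEMENT email_address (#PCDATA)>
<!ELEMENT misc_comments (#PCDATA)>
```

Because the declarations live in their own file, several XML documents can
share them by using the same DOCTYPE line.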
ATTRIBUTES
- record attribute
As you might have already realized, we need a method for specifying the
attribute for the element "record". By declaring the attribute as "CDATA", we
are telling the parser to treat the value assigned to it as plain character
data. Lastly, it might be better to indicate that this attribute is required
by using the keyword "#REQUIRED". You need to view the source of this example
in order to fully view the DTD.
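
A minimal sketch of an attribute declaration for "record" follows. The
attribute name "number" is a guess made for illustration, and "record" is
reduced to plain text to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE address_book [
  <!ELEMENT address_book (record+)>
  <!ELEMENT record (#PCDATA)>
  <!-- ATTLIST: element name, attribute name, type, default.
       The attribute name "number" is hypothetical. -->
  <!ATTLIST record number CDATA #REQUIRED>
]>
<address_book>
  <record number="1">Jane Doe</record>
  <record number="2">John Roe</record>
</address_book>
```

A <record> written without the attribute should fail validation because of
#REQUIRED; replacing it with #IMPLIED would make the attribute optional.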
ENTITIES
- Copyright
You are familiar with built-in entities like "&lt;" for "<". However, in XML
you can create your own entities as well. In this example we declared an entity
named "copyright" and then used it right below the "address_book" tag. The end
result is the substitution of © for the entity reference &copyright;. Please
note that you need to view the source of this example in order to fully view
the DTD with the entity declaration for "copyright".
- ENTITIES for short cuts - I put together a small file that shows how entities
can be used for inserting text (my name from a set of initials, &rdc;) and
for inserting an entire paragraph (rdc.txt from &rdc_txt;). See the file entity ex1.xml
and rdc.txt. This can be a useful trick.
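
The two kinds of entity described above can be sketched as follows. The entity
names come from the text, but the replacement strings are placeholders, and
the sketch assumes a text file named rdc.txt sits next to the XML file.

```xml
<?xml version="1.0"?>
<!DOCTYPE address_book [
  <!ELEMENT address_book (#PCDATA)>
  <!-- Internal entities: the replacement text is given in the declaration.
       &#169; is the character reference for the copyright sign. -->
  <!ENTITY copyright "&#169;">
  <!ENTITY rdc "Full Name Goes Here">
  <!-- External entity: the replacement text is read from another file. -->
  <!ENTITY rdc_txt SYSTEM "rdc.txt">
]>
<address_book>
  &copyright; &rdc;
  &rdc_txt;
</address_book>
```

Note that some non-validating parsers do not load external entities, so
&rdc_txt; may not be expanded in every tool.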
MIXED TEXT
- One of the more clever uses of XML is to provide "meta information at a
granular level" to unstructured text. This addresses the major problem of
adding contextual information to a text after it has been written, or the
general problem of getting text information 'future proofed'. Take a look at
story.xml, story_dtd.xml, story.dtd, and (looking ahead to XML Schema)
story_xsd.xml and story.xsd. Pay particular attention to the use of elements to
define text within a text block. This is also known as 'mixed', but not in the
sense that we use in the 'mixed model'.
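
As a rough sketch of marking up text inside a text block, a DTD can declare
mixed content, where character data and child elements are interleaved. The
element names "para", "person", and "place" are invented for this example and
are not taken from story.dtd.

```xml
<?xml version="1.0"?>
<!DOCTYPE story [
  <!ELEMENT story (para+)>
  <!-- Mixed content: #PCDATA must be listed first, the alternatives are
       separated by "|", and the group must end with "*". -->
  <!ELEMENT para (#PCDATA | person | place)*>
  <!ELEMENT person (#PCDATA)>
  <!ELEMENT place (#PCDATA)>
]>
<story>
  <para><person>Alice</person> walked to <place>the harbor</place> at dawn.</para>
</story>
```

The sentence still reads as ordinary text, while the tags record which words
name a person and which name a place.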
ASSERTION NETWORKS
- You can create an 'assertion network' by linking attributes using the ID,
IDREF, and IDREFS attribute types in DTDs. Look closely at recipe.xml,
recipe_dtd.xml, and recipe.dtd. These files link the ingredients and steps in
the recipe.xml file, and the DTD lets the parser validate those links. I will
develop this idea more, but wanted to post these files for your use and
awareness of this powerful function.
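
A small sketch of the idea, assuming a much simpler structure than the actual
recipe.dtd: each ingredient gets an ID, and each step lists the IDs it uses
through an IDREFS attribute. The attribute names "id" and "uses" and the
sample data are hypothetical.

```xml
<?xml version="1.0"?>
<!DOCTYPE recipe [
  <!ELEMENT recipe (ingredient+, step+)>
  <!ELEMENT ingredient (#PCDATA)>
  <!-- ID: the value must be unique within the document. -->
  <!ATTLIST ingredient id ID #REQUIRED>
  <!ELEMENT step (#PCDATA)>
  <!-- IDREFS: a space-separated list of ID values that must exist. -->
  <!ATTLIST step uses IDREFS #REQUIRED>
]>
<recipe>
  <ingredient id="flour">2 cups flour</ingredient>
  <ingredient id="water">1 cup water</ingredient>
  <step uses="flour water">Mix the flour and water.</step>
</recipe>
```

If a step referred to an ID that no ingredient declares, for example
uses="sugar", a validating parser should report an error.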
EXAMPLES
HOMEWORK
For homework, create a DTD for the nested and empty XML documents from your
last tutorial. Please use a client validator (EditML Pro or XML Spy) to check
your work. Examples of the nested, empty, and mixed file types are linked below.
Please email me the linking files (filename_dtd.xml) and DTD files
(filename.dtd) as attachments. Yahoo users, send multiple messages rather than
zipping (two sets of two, three sets of two, or three sets of three, etc.)