XML Training Services

SYS-ED Experience

Submit XML Questions


Technology Driven IT Training

Knowledge Base


The SYS-ED knowledge base is a service for answering questions, inclusive of the research and validation of the accuracy of information in the public domain. Citation of source documentation and examples are used to provide answers to the questions. Utilization and reliance on the answers, information, or other materials received through this website is done at your own risk.


I have been informed that there are alternatives to DTDs. What is a schema?


The W3C XML Schema recommendation provides a means of specifying formal data typing and validation of element content in terms of datatypes, so that document type designers can provide criteria for checking the data content of elements as well as the markup itself. Schemas are written in XML Document Syntax, like XML documents are, avoiding the need for processing software to be able to read XML Declaration Syntax (used for DTDs).

The term vocabulary is sometimes used to refer to DTDs and Schemas together. Schemas are aimed at e-commerce, data control, and database-style applications where character data content requires validation and where stricter data control is needed than is possible with DTDs; or where strong data typing is required. They are usually unnecessary for traditional text document publishing applications.

Unlike DTDs, Schemas cannot be specified in an XML Document Type Declaration. A document can instead point to a schema for its namespace, for example with the xsi:schemaLocation attribute, where Schema-aware software should pick it up, but this is optional.


What's an XML namespace?


A namespace is a collection of element and attribute names identified by a Uniform Resource Identifier reference. The reference may appear in the root element as a value of the xmlns attribute.

For example, the namespace reference for an XML document with a root element x might appear like this:

     <x xmlns="http://www.company.com/company-schema">

More than one namespace may appear in a single XML document, to allow a name to be used more than once. Each reference can declare a prefix to be used by each name, so the previous example might appear as:

     <x xmlns:spc="http://www.company.com/company-schema">

which would nominate the namespace for the ‘spc’ prefix:

     <spc:name>Mr. Big</spc:name>
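
As a rough illustration of how software sees the prefix, the sketch below parses a small document with Python's standard xml.etree.ElementTree module. The document itself is hypothetical; it reuses the namespace URI and the 'spc' prefix from the example above and adds an unprefixed name element for contrast. It assumes nothing beyond the Python standard library.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: same URI and 'spc' prefix as in the text above.
DOC = """<x xmlns:spc="http://www.company.com/company-schema">
  <spc:name>Mr. Big</spc:name>
  <name>unqualified</name>
</x>"""

# Prefix-to-URI map used only for lookups; the prefix chosen here does not
# have to match the one used in the document, only the URI matters.
NS = {"spc": "http://www.company.com/company-schema"}

root = ET.fromstring(DOC)
qualified = root.find("spc:name", NS)   # the name inside the namespace
unqualified = root.find("name")         # the name outside any namespace

print(qualified.tag)     # the parser expands the prefix to {URI}name
print(qualified.text)
print(unqualified.text)
```

The two name elements are kept apart because the parser identifies the first one by its namespace URI plus local name, not by the prefix.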


What's my information? DATA or TEXT?

Some important distinctions exist between the major classes of XML applications and the way in which they are used:

Two classes of applications usually are referred to as ‘document’ and ‘data’ applications, and this is reflected in the software, which is usually (but not always) aimed at one class or the other.

Document-style applications: These are in the nature of traditional publishers' work: text and images in a structured environment, with fonts and formatting. This includes Web pages as well as material destined for print, like books and magazines.
Data-style applications: These are found mostly in e-commerce and process or application control, with XML being used as a container for information being stored or passed between systems, usually unformatted and unseen by humans.

There is a third major area, web development, whose requirements are often hybrid, and span the features of both document and data applications because they contain partly static descriptive text and partly dynamic data.

While in theory it would be possible to use data-class software to write a novel, or document-class software to create invoices, it would probably be severely suboptimal. Because of the nature of the information used by the two classes, data-class applications tend to use schemas, and document-class applications tend to use DTDs, but there is a considerable degree of overlap.


Can XML use non-Latin characters?


Yes, the XML Specification explicitly requires that XML use ISO 10646, the international standard character repertoire which covers most known languages.

Unicode defines the same character repertoire, and the two standards are kept in correspondence with each other.

The specification states:

  • All XML processors must accept the UTF-8 and UTF-16 encodings of ISO 10646.
  • The encoding specification can refer to any character set that the software supports, but the XML Specification only requires that applications support UTF-8 and UTF-16.

Common encodings supported by software include:

US-ASCII: Character codes TAB, LF, CR, space, and the printable characters 33 to 126 (decimal) only. All other control characters are not permitted by XML.
ISO-8859-1: Western European Latin-1; this includes ASCII plus codes 128 to 255 (decimal). It covers most western European accented letters.
ISO-8859-2 to 15: These other parts of ISO-8859 cover different sets of Latin-based alphabetic and other symbols.
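
A minimal sketch of the two required encodings, assuming Python's standard xml.etree.ElementTree parser: the same non-Latin text is serialized once as UTF-8 and once as UTF-16, and both byte streams parse back to the same string. The sample text and the greeting element name are illustrative only.

```python
import xml.etree.ElementTree as ET

# Illustrative sample: Greek and Japanese text.
TEXT = "Ελληνικά 日本語"

def build(encoding: str) -> bytes:
    """Return a small XML document serialized in the given encoding."""
    doc = f'<?xml version="1.0" encoding="{encoding}"?><greeting>{TEXT}</greeting>'
    return doc.encode(encoding)

utf8_bytes = build("UTF-8")
utf16_bytes = build("UTF-16")   # Python prepends a byte order mark here

# The parser reads the encoding from the document itself.
utf8_text = ET.fromstring(utf8_bytes).text
utf16_text = ET.fromstring(utf16_bytes).text

print(len(utf8_bytes), len(utf16_bytes))   # same content, different byte counts
print(utf8_text == utf16_text == TEXT)
```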

What is a CDATA section in XML?

CDATA stands for Character DATA. CDATA sections provide the ability to escape blocks of text containing markup.

CDATA sections take the general form:

   <![CDATA[....put text containing markup here...]]>



For example, suppose the following line of text is to be printed:

     "The left angled bracket '<' and the ampersand '&' must be replaced by their entities &lt; and &amp; respectively".

Without CDATA, every markup character has to be escaped, so the line must be written as:

     "The left angled bracket '&lt;' and the ampersand '&amp;' must be replaced by their entities &amp;lt; and &amp;amp; respectively".

By escaping the text using CDATA, this can be simplified to:

    <![CDATA["The left angled bracket '<' and the ampersand '&' must be replaced by their entities &lt; and &amp; respectively".]]>
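
The equivalence of the two forms can be checked with a short sketch, here using Python's standard xml.etree.ElementTree parser. The note wrapper element is a hypothetical name added only so that each snippet is a complete document.

```python
import xml.etree.ElementTree as ET

# The text we actually want the reader to see.
SENTENCE = ("The left angled bracket '<' and the ampersand '&' must be "
            "replaced by their entities &lt; and &amp; respectively")

# Form 1: every markup character escaped by hand.
escaped = ("<note>The left angled bracket '&lt;' and the ampersand '&amp;' must be "
           "replaced by their entities &amp;lt; and &amp;amp; respectively</note>")

# Form 2: the raw sentence wrapped in a CDATA section.
cdata = "<note><![CDATA[" + SENTENCE + "]]></note>"

escaped_text = ET.fromstring(escaped).text
cdata_text = ET.fromstring(cdata).text

# Both forms deliver the same character data to the application.
print(escaped_text == cdata_text == SENTENCE)
```

One limit worth noting: a CDATA section cannot itself contain the sequence ]]>, because that sequence ends the section.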


What are the disadvantages of XML in terms of size and performance?


Despite the advantages, XML does sometimes cause a significant increase in data size and processing time. These disadvantages are the result of design decisions and tradeoffs made by XML's original designers. For example, to make XML fully internationalized, the designers chose to require Unicode support, which can increase the memory required for processing and storing information from XML documents. The designers also chose the robustness of redundant labels in start and end tags, increasing the amount of space XML requires in disk storage or the amount of bandwidth for moving it over a network. The most serious performance risk, however, is one that people do not often worry about: XML's ability to include external resources.

XML repeats every element and attribute name for every element and attribute instance; in fact, it repeats the element name twice for every nonempty instance, once in the start tag and once in the end tag. If a long XML document contains 20,000 nonempty elements named maintenance-entry, the 17-character string maintenance-entry will appear in the document 40,000 times, consuming between 680,000 and 2,720,000 bytes of storage space, depending on the character encoding (one to four bytes per character).
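
The arithmetic behind these figures can be reproduced with a short Python sketch. It counts only the bytes taken by the element name itself, ignoring angle brackets and content, and uses the UTF-8, UTF-16, and UTF-32 encodings as the assumed range of one to four bytes per character.

```python
NAME = "maintenance-entry"
ELEMENTS = 20_000
OCCURRENCES = ELEMENTS * 2   # one start tag and one end tag per element

def name_bytes(encoding: str) -> int:
    """Bytes consumed by all occurrences of NAME in the given encoding."""
    return len(NAME.encode(encoding)) * OCCURRENCES

smallest = name_bytes("utf-8")       # 1 byte per character for this name
middle = name_bytes("utf-16-be")     # 2 bytes per character, no byte order mark
largest = name_bytes("utf-32-be")    # 4 bytes per character, no byte order mark

print(len(NAME), OCCURRENCES)        # 17 40000
print(smallest, middle, largest)     # 680000 1360000 2720000
```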

For loosely structured XML, such as human-readable documents, this overhead is often not a problem, but for highly structured XML, such as a database dump, these repeated names represent a significant overhead. There is a temptation to use short, cryptic element and attribute names, such as c183, instead of workflow-approval, destroying XML's advantage of transparency. There is also a temptation to reduce the amount of tagging, using whitespace and line ends to delimit some fields. These solutions are not particularly good, but they do show the desperation people face when dealing with enormous XML data files.


What are the return values of XPath expressions?


An XPath expression returns a value whose type depends on the kind of expression:

Location expression: Returns a node or node set.
Boolean expression: Returns true or false.
Numeric expression: Returns a number.
String expression: Returns a Unicode string.


What is XML? Does my company need to modify our data and applications to it?


XML is an open standard; it is not defined by, or proprietary to, any one company, unlike products such as Windows from Microsoft, WebSphere from IBM, or the Oracle database engine from Oracle Corporation. The fundamental objective of the XML standard is to enable generic SGML to be served, received, and processed on the Web in the same way that HTML currently is. SGML is the Standard Generalized Markup Language, the international standard for defining descriptions of the structure of different types of electronic document. The XML specification can be downloaded.

XML provides this functionality:

  1. Makes it easy to use SGML on the web.
  2. Makes it easy to define document types.
  3. Makes it easy to author and manage SGML-defined documents.
  4. Allows for SGML-defined documents to be transmitted and shared across the Web.

Much of the data on the web will be XML.


I work for a small company and we are in the process of building our e-commerce web site. Should I use XML instead of HTML? And needless to say, I don't have a huge budget. What do you suggest for XML training?


When designing web content, use XML.

At the most fundamental level, XML is a format for storing data, while HTML is a means for displaying data.

XML is superior to HTML for these reasons:

  • XML allows authors and providers to design their own document markup.
  • With XML, document types can be tailored to an application.
  • With XML, information content can be richer, because the descriptive and hypertext linking abilities of XML are much greater than those available in HTML.
  • XML provides better facilities for browser presentation and performance, using CSS and XSLT stylesheets.
  • XML is more flexible than HTML and can be used by any XML software.

FYI, our HTML courses provide a demonstration and overview of XML. And our XML courses teach how to convert and retrofit HTML to XML.


Are any parts of an XML document case-sensitive?


All of an XML document is case-sensitive, both markup and text. This is significantly different from HTML and most other SGML applications.
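
A small sketch of what case sensitivity means in practice, assuming Python's standard xml.etree.ElementTree parser; the element names are made up for the illustration. Name and name are treated as two different elements, and an end tag whose case differs from its start tag is rejected.

```python
import xml.etree.ElementTree as ET

# 'Name' and 'name' are two distinct element types.
root = ET.fromstring("<doc><Name>A</Name><name>b</name></doc>")
tags = [child.tag for child in root]

def parses(text: str) -> bool:
    """Return True if the parser accepts the text as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# The end tag differs from the start tag only in case, so parsing fails.
mismatched_ok = parses("<Name>A</name>")

print(tags)
print(mismatched_ok)
print(root.find("NAME"))   # lookups are case-sensitive as well
```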

What is a DTD?


DTD stands for Document Type Definition. It is a formal description in XML Declaration Syntax of a particular type of document. It defines what names are to be used for the different types of elements, where they may occur, and how they all fit together.


Is it possible to get XML data into or out of a database, and how hard is it to implement?

Most database manufacturers provide XML import and export modules.
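
Where no vendor module is available, a hand-written round trip is usually short. The following is only a sketch under stated assumptions: it uses Python's standard sqlite3 and xml.etree.ElementTree modules, an in-memory database, and a hypothetical customers table with id and name columns. It does not represent any particular manufacturer's import or export module.

```python
import sqlite3
import xml.etree.ElementTree as ET

SCHEMA = "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)"  # hypothetical table

def export_rows(conn: sqlite3.Connection) -> str:
    """Serialize every row of the customers table as an XML string."""
    root = ET.Element("customers")
    for cid, name in conn.execute("SELECT id, name FROM customers ORDER BY id"):
        row = ET.SubElement(root, "customer", id=str(cid))
        ET.SubElement(row, "name").text = name   # special characters are escaped on output
    return ET.tostring(root, encoding="unicode")

def import_rows(conn: sqlite3.Connection, xml_text: str) -> None:
    """Insert one row per customer element found in the XML string."""
    root = ET.fromstring(xml_text)
    rows = [(int(c.get("id")), c.findtext("name")) for c in root.iter("customer")]
    conn.executemany("INSERT INTO customers (id, name) VALUES (?, ?)", rows)

source = sqlite3.connect(":memory:")
source.execute(SCHEMA)
source.executemany("INSERT INTO customers VALUES (?, ?)",
                   [(1, "Mr. Big"), (2, "A & B Ltd")])

xml_text = export_rows(source)

target = sqlite3.connect(":memory:")
target.execute(SCHEMA)
import_rows(target, xml_text)
copied = target.execute("SELECT id, name FROM customers ORDER BY id").fetchall()

print(xml_text)
print(copied)
```

Letting the XML library write the text, rather than concatenating strings, means characters such as the ampersand in the second row are escaped automatically.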


I have both heard and read about "well-formed" documents. What are they?


A well-formed XML document follows the basic syntax rules of XML, so any XML parser can read it without needing a DTD or schema. This makes it easy for a program to read, and ready for network delivery.

These properties are typical of a well-formed document:

  • All the start-tags and end-tags match up.
  • Empty elements use the special XML syntax (for example, <br/>).
  • All the attribute values are quoted.
  • All the entities are declared.
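
Each of the points above can be tried against a parser. The sketch below assumes Python's standard xml.etree.ElementTree module and treats a document as well-formed when the parser accepts it; the sample documents are invented for the illustration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts the text without a syntax error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# One acceptable document, then one violation per rule in the list above.
samples = {
    "good": '<p class="x">text<br/></p>',
    "mismatched tags": "<p>text</q>",
    "empty element without XML syntax": "<p>text<br></p>",
    "unquoted attribute": "<p class=x>text</p>",
    "undeclared entity": "<p>&nbsp;</p>",
}

results = {label: is_well_formed(doc) for label, doc in samples.items()}

for label, ok in results.items():
    print(f"{label}: {'well-formed' if ok else 'rejected'}")
```

Only the first sample is accepted; the other four are rejected by the parser before any application code sees the data.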