
Head first xml pdf


Lesson 2: Your First XML Document. HTML has a pre-defined "head" and "body" type structure: HTML wraps the entire page, and HEAD holds heading information, such as the page title. First, we'll create a folder to keep all of our XML stuff organized. Let's begin by analyzing the first level of XML: how it contains and manages data. XML is stricter than HTML, and writing it incorrectly can leave you scratching your head, wondering where you went wrong. For example, unlike HTML, every XML element must have both a start-tag and an end-tag.

Author: REGENA SAMANIEGO
Language: English, Spanish, Hindi
Country: Indonesia
Genre: Religion
Pages: 358
Published (Last): 23.11.2015
ISBN: 589-2-40998-988-6
PDF File Size: 11.11 MB
Distribution: Free* [*Registration Required]
Uploaded by: VELVET



XML stands for Extensible Markup Language and is a text-based markup language, pitched as "the cure for your data exchange, information integration" problems. Elements may have attributes (in the start tag) that have a name and a value. If the document contains an XML declaration, then it strictly needs to be the first thing in the document.

And all the code used in the book is available to customers in a downloadable archive. As always, you can download this excerpt as a PDF if you prefer. Who here has heard of XML? Okay, just about everybody. So, what is XML? A significant portion of the group leans forward eagerly, wanting to learn more.

If you click the minus sign, Internet Explorer will collapse all the child nodes belonging to that node, as shown in Figure 1. Figure 1. Collapsing nodes displaying in Internet Explorer. View larger image. The little plus sign next to the first product node indicates that the node has children. Clicking on the plus sign will expand any nodes under that particular node. In this way, you can easily display the parts of the document on which you want to focus. Now, open your XML document in any text editing tool and scroll down to the cost node of the second product.

Save your work and reload Internet Explorer. You should see an error message that looks like the one pictured in Figure 1. As you can see, Internet Explorer provides a rather verbose explanation of the error it ran into. Furthermore, it provides a nice visual of the offending line, with a little arrow pointing to the spot at which the parser thinks the problem arose. Even though the problem is really with the start tag, the arrow points to the end tag.

Because Internet Explorer uses a non-validating parser by default (remember, this means it only cares about well-formedness rules), it runs into problems at the end tag. You now have to backtrack to find out why that particular end tag caused such a problem.
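You can reproduce the same kind of backtracking outside the browser. The following is a minimal sketch using Python's standard-library parser, which, like the browser here, checks well-formedness only. The product listing and the mistyped `<Cost>` start tag are made up for illustration; they are not the book's actual example.

```python
import xml.etree.ElementTree as ET

# Hypothetical product listing. The <Cost> start tag does not match the
# </cost> end tag, because XML names are case-sensitive.
BROKEN = """<products>
  <product>
    <name>Widget</name>
    <Cost>3.00</cost>
  </product>
</products>"""

# Repairing the start tag makes the document well formed again.
FIXED = BROKEN.replace("<Cost>", "<cost>")


def check_well_formed(text):
    """Return (True, None) if text parses, else (False, (line, column))."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        # err.position is the (line, column) at which the parser gave up.
        return False, err.position
    return True, None


ok, where = check_well_formed(BROKEN)
print("broken:", ok, where)
print("fixed: ", check_well_formed(FIXED))
```

As with the browser's arrow, the reported position is where the parser noticed the mismatch (the end tag), not necessarily where the typo was made.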

Open your XML document in an editor once more, and fix the problem we introduced above. Save your work and reload your browser. You should see an error message similar to the one shown in Figure 1. At first glance, this error message seems a bit more obscure than the previous one.

However, look closely and what do you see? Firefox is a popular open-source browser, and at the time this book went to print the latest version was 1.

You can download a free copy from the Mozilla website. Okay, so both Internet Explorer and Firefox will check your XML for well-formedness, but you need to know for future reference how to check that an XML file is valid (i.e., that it adheres to a DTD). How do you do that? There are various well-known online validating XML parsers. All you have to do is visit the appropriate page, upload your document, and the parser will validate it.

Here is the most popular online parser. Sometimes, it may be impractical to use a Website to validate your XML because of issues relating to connectivity, privacy, or security. This checks for well-formedness if the document has no DTD, and for well-formedness and validity if a DTD is specified. Results of the validation will appear under the Results area, as illustrated in Figure 1. For most purposes, an online resource will do the job nicely.

If you work in a company that has an established software development group, chances are that one of the XML-savvy developers has already set up a good validating parser. This project will help ground your skills as you obtain firsthand experience with practical XML development techniques, issues, and processes.

It usually consists of the following components:. Before you build any kind of CMS, first you must gather information that defines the basic requirements for the project. The goal of the CMS is to make things easier for those who need to develop and run the site. And making things easier means having to do more homework beforehand! Although you may groan at the thought of this kind of exercise, a set of well-defined requirements can make the project run a lot more smoothly.

What kind of requirements do we need to gather? Essentially, requirements fall into three major categories: In the world of XML, each of these different types of content is, naturally enough, called a document type. You also have to know how each of these content types will break out into its separate components, or metadata.

Each article, for instance, will have various pieces of metadata, such as a headline, author name, and keywords, each of which the CMS needs to track. The final challenge — to define various types of metadata — can be a blessing in disguise.

In my experience, once people grasp the importance of metadata, they race off in every direction and collect every single piece of metadata they can find about a given content type. For example, the client might start to track the date on which an article is first drafted. Gathering metadata can be very tricky. At first glance, we could say that all of our articles should contain elements for author name and email address, and leave it at that.

However, we may later decide that we want site visitors to search or browse articles by author. In this case, it would make more sense to have a centralized list of authors, each with his or her own unique ID. Having a separate author listing would also allow us to easily set bylines for each author, in case someone decided they wanted to publish pieces under a pen name. It would also allow us to track author information across content types. Of course, agreeing on this approach means that we need to do other work later on, such as building administrative interfaces for author listings.
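To see why a centralized author list pays off, here is a small sketch in Python. Every element and attribute name in it (`authors`, `author`, `authorid`, `byline`) is an assumption made for illustration, as are the sample authors and headlines; the real CMS may name things differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical central author listing: one entry per author, keyed by ID.
AUTHORS = """<authors>
  <author id="a1"><name>Jane Doe</name><byline>J. D. Writer</byline></author>
  <author id="a2"><name>John Roe</name><byline>John Roe</byline></author>
</authors>"""

# Hypothetical articles that point at an author by ID instead of
# repeating the author's name and email address in every article.
ARTICLES = """<articles>
  <article id="1" authorid="a1"><headline>XML basics</headline></article>
  <article id="2" authorid="a2"><headline>Namespaces</headline></article>
  <article id="3" authorid="a1"><headline>XSLT templates</headline></article>
</articles>"""

authors = {a.get("id"): a for a in ET.fromstring(AUTHORS)}
articles = ET.fromstring(ARTICLES)


def headlines_by_author(author_id):
    """Browse articles by author: collect headlines for one author ID."""
    return [a.findtext("headline") for a in articles
            if a.get("authorid") == author_id]


def byline_for(article):
    """Look up the byline (possibly a pen name) in the central listing."""
    return authors[article.get("authorid")].findtext("byline")


print(headlines_by_author("a1"))
print(byline_for(articles[0]))
```

Changing a pen name now means editing one author entry rather than every article that author wrote.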

The other two are site functionality and site design. Every piece of metadata could potentially drive some kind of site behavior, but each piece of metadata also must be managed by the administration tools you set up. Site behavior should always be based on and driven by metadata. Typical site behavior for a CMS-powered Website includes browsing by content categories, browsing by author, searching on titles and keywords, dynamic news sidebars, and more.

Additionally, many XML- and database-powered sites feature homepages that boast dynamically updated content, such as Top Ten Downloads, latest news headlines, and so on.

Our CMS will need to have an administrative component for each content type. It will also have to administer pieces of information that have nothing to do with content types, such as which users are authorized to log in to the CMS, and the privileges each of them has. It goes without saying that your administrative interface has to be secure; otherwise, anyone could click to your CMS and start deleting content, making unauthorized changes to existing content, or adding new content that you may not want to have on your site.

A workflow is simply a set of rules that allow you to define who does what, when, and how. For example, your workflow might stipulate that a user with writer privileges may create an article, but that only a production editor can approve that content for publication on the site. In many cases, CMS workflows emulate actual workflows that exist in publication and marketing departments.
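The workflow rule just described can be sketched as a simple permission table. The role names, action names, and the `can` helper below are hypothetical; the only rule taken from the text is that writers may create articles while only production editors may approve them.

```python
# Hypothetical workflow table: which role may perform which action.
PERMISSIONS = {
    "writer": {"create"},
    "production_editor": {"approve"},
}


def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())


print(can("writer", "create"))              # True
print(can("writer", "approve"))             # False
print(can("production_editor", "approve"))  # True
```

Unknown roles get an empty permission set, so an unrecognized user can do nothing by default.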

We want to publish articles and news stories on our site. We definitely want to keep track of authors and site administrators, and we also want to build a search engine. Whenever I build an XML-powered application, I try to define the content types first, because I find that all the other elements cascade from there. The articles in our CMS will be the mainstay of our site. In addition to the article text, each of our articles will be endowed with the following pieces of metadata:.

Furthermore, because we need to identify each article in our system uniquely with an ID of some sort, it makes sense to add an id attribute to the root element that will contain this value. A unique identifier will ensure that no mistakes occur when we try to edit, delete, or view an existing article. Now, each of our articles will have an author, so we need to reserve a spot for that information.

Our article will need a headline, a short description, a publication date, and some keywords. The keyword listing can be handled in one of two ways. This approach will satisfy the structure nuts out there, but it turns out to be too complicated for the way we will eventually use these keywords. We also need to track status information on the article.
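Putting those pieces together, an article record might look something like the sketch below. The element names (`authorid`, `headline`, `description`, `pubdate`, `keywords`) and the choice to store keywords as one space-separated string are assumptions for illustration; only the `id` attribute on the root element and the list of metadata come from the text above.

```python
import xml.etree.ElementTree as ET


def build_article(article_id, author_id, headline, description,
                  pubdate, keywords):
    """Build a hypothetical article element carrying its metadata."""
    # The unique ID lives in an attribute on the root element.
    article = ET.Element("article", {"id": str(article_id)})
    ET.SubElement(article, "authorid").text = author_id
    ET.SubElement(article, "headline").text = headline
    ET.SubElement(article, "description").text = description
    ET.SubElement(article, "pubdate").text = pubdate
    # Assumed simple form: one element holding a space-separated list.
    ET.SubElement(article, "keywords").text = " ".join(keywords)
    return article


article = build_article(42, "a1", "XML basics",
                        "A short tour of elements and attributes.",
                        "2005-07-01", ["xml", "basics", "markup"])
xml_text = ET.tostring(article, encoding="unicode")
print(xml_text)
```

The more structured alternative would give each keyword its own child element, at the cost of more markup to manage.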

However, you probably already see that status is very similar to keyword listings in that it has the potential to belong to many different content types. As such, it makes sense to centralize this information. As most of our content will be displayed in a Web browser, it makes sense to use as many tags as possible that a browser like IE or Firefox can already understand.

But for the purposes of our article storage system, we want to treat all of the HTML tags and text that make up the document body as a simple text string, rather than having to handle every single HTML tag that could appear in the article body. My goal for that chapter was to show you how flexible XML really is. It is both a style sheet specification and a kind of programming language that allows you to transform an XML document into the format of your choice: XPath is a language for locating and processing nodes in an XML document.

Because each XML document is, by definition, a hierarchical structure, it becomes possible to navigate this structure in a logical, formal way. A document type definition (DTD) is a set of rules that governs the order in which your elements can be used, and the kind of information each can contain.
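To make the XPath-style navigation mentioned above a little more concrete, here is a short sketch. It uses the limited XPath subset built into Python's `xml.etree.ElementTree` rather than a full XPath engine, and the product catalog is a made-up example.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog used only to demonstrate path expressions.
CATALOG = """<products>
  <product id="p1"><name>Widget</name><cost>3.00</cost></product>
  <product id="p2"><name>Gadget</name><cost>7.50</cost></product>
</products>"""

root = ET.fromstring(CATALOG)

# Step down the hierarchy: every name that is a child of a product.
names = [n.text for n in root.findall("./product/name")]

# Use a predicate to pick one product by its id attribute.
gadget_cost = root.findtext("./product[@id='p2']/cost")

# Search at any depth below the root.
all_costs = [c.text for c in root.findall(".//cost")]

print(names, gadget_cost, all_costs)
```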

While a DTD can provide only general control over element ordering and containment, schemas are a lot more specific. They can, for example, allow elements to appear only a certain number of times, or require that elements contain specific types of data such as dates and numbers. Both technologies allow you to set rules for the contents of your XML documents.

If you need to share your XML documents with another group, or you must rely on receiving well-formed XML from someone else, these technologies can help ensure that your particular set of rules is properly followed.

The ability of XML to allow you to define your own elements provides flexibility and scope. XML namespaces attempt to keep different semantic usages of the same XML elements separate and unambiguous.

In our example, each person could define their own namespace and then prepend the name of their namespace to specific tags: No one in their right mind could reasonably expect them all to switch to XML overnight.

But we can expect that some of these pages — and a large percentage of the new pages that are being coded as you read this — will make the transition thanks to XHTML.

As you can see, the XML family of technologies is a pretty big group — those XML family reunions are undoubtedly interesting! Although this means that some ideas take quite a while to reach fruition, and tend to be built by committee, it also means that no single vendor is in total control of XML.

And this, as Martha Stewart might say, is a good thing. So, what do you say? Not sure?

Well, put bluntly, the Web has reached a point at which just about anything will fly when it comes to HTML documents. Take a look at the following snippet:. Believe it or not, that snippet will render without a problem in most Web browsers. And so will this:. But, exactly what does this mean?

Use this with CSS to minimize presentational clutter. XML Namespaces were invented to rectify a common problem: Imagine you were running a bookstore and had an inventory file called inventory. A human being could probably figure out that one title has nothing to do with the other, but an application that tried to sort it out would go nuts.

We need to have a way to distinguish between the two different semantic universes in which these identical terms exist. Your inventory file stores information about books on the shelf, but the sales file stores information about books that have been bought by customers.

In either situation, regardless of the chasm that lies between the contexts of these identical terms, we need a way to properly label each context. Namespaces to the rescue! To use and declare a namespace, we must first tie the namespace to a URI.

URIs can take the following forms: a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). For example, all published books have an ISBN.

However, armed with the ISBN, you could walk into the store, ask an employee to search for you, and they could take you right to the book (provided, of course, that it was in stock). We want to use our namespace throughout our XML documents, though, and the last thing we want to do is type out an entire URI every time we need to distinguish one context from another.

So, we define a prefix to represent our namespace to ease the strain on our typing fingers. The agreed way to do that is to prefix the namespace declaration with xmlns: followed by the prefix name. At this point, we have something useful.

If we needed to, we could add our prefix to appropriate elements to disambiguate (I love that term!) them. In most cases, placing your namespace declarations will be rather easy. Please note, however, that namespaces have scope. Namespaces affect the element in which they are declared, as well as all the child elements of that element.
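Here is a rough sketch of the bookstore situation with two namespaces in play. The URIs (`urn:example:inventory`, `urn:example:sales`), the prefixes, and the book titles are all invented for illustration. Both declarations sit on the root element, so they are in scope for every child.

```python
import xml.etree.ElementTree as ET

# Two hypothetical namespaces keep the two meanings of "title" apart.
DOC = """<store xmlns:inv="urn:example:inventory"
       xmlns:sales="urn:example:sales">
  <inv:book><inv:title>XML on the Shelf</inv:title></inv:book>
  <sales:book><sales:title>XML Sold Yesterday</sales:title></sales:book>
</store>"""

# Prefix-to-URI map used when searching; the prefixes are just shorthand.
NS = {"inv": "urn:example:inventory", "sales": "urn:example:sales"}

root = ET.fromstring(DOC)
shelf_titles = [t.text for t in root.findall(".//inv:title", NS)]
sold_titles = [t.text for t in root.findall(".//sales:title", NS)]

# Internally the parser replaces the prefix with the full URI.
first_tag = root[0].tag

print(shelf_titles, sold_titles, first_tag)
```

Notice that the application never sees the prefix itself, only the URI it stands for, which is why two documents can use different prefixes for the same namespace.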

However, if you want to limit your namespace scope to a certain part of a document, feel free to do so, remembering, of course, that this can get pretty tricky. It would become pretty tiresome to have to type a prefix for every single element in a document, which is where a non-prefixed (default) namespace comes in. On the other side of the coin, all XSLT elements must be given the xsl: prefix.

This document contains a root element, letter, that contains three other elements (to, from, and message), each of which contains text.

When you display your XML document, you should see something similar to Figure 2. Figure 2. As you can see, CSS did a marvelous job of rendering a nicely shaded box around the entire letter, setting fonts, and even displaying things like margins and padding. Strictly speaking, the CSS standard does allow for this sort of thing with the content property, which can produce generated text before and after document elements.

Think of it as a tool that you can use to transform your XML documents into other documents. Here are some of the possibilities:. XSLT is a rules-based, or functional language.

Because XSLT can be a little bewildering even for veteran programmers, the best way to tackle it is to walk through a series of examples. Keeping both these elements simple will give us the opportunity to step through the major concepts involved. They must therefore follow the rules that apply to all XML documents: The version attribute is required.

The xmlns: In our example, we will use an xsl prefix on all the stylesheet-related tags in our XSL documents to associate them with this namespace.

The next element will be the output element, which is used to define the type of output you want from the XSL file. Now we come to the heart of XSLT — the template and apply-templates elements.

Together, these two elements make the transformations happen. Put simply, the XSLT processor (for our immediate purposes, the browser) starts reading the input document, looking for elements that match any of the template elements in our style sheet.

When one is found, the contents of the corresponding template element tell the processor what to output before continuing its search. Where a template contains an apply-templates element, the XSLT processor will search for XML elements contained within the current element and apply the templates associated with them.
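If the matching process feels abstract, the toy Python sketch below imitates it. To be clear, this is not XSLT and not how a real XSLT processor is built; it is only an analogy for the template and apply-templates loop. The letter contents, the `TEMPLATES` table, and both function names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical input document: a letter with to, from, and message.
LETTER = ("<letter><to>Mom</to><from>Tom</from>"
          "<message>Happy Mother's Day</message></letter>")


def apply_templates(element, templates):
    """Analogue of apply-templates: process each child in document order."""
    return "".join(process(child, templates) for child in element)


def process(element, templates):
    """Find the template whose name matches this element and run it."""
    template = templates.get(element.tag)
    if template is None:
        # Rough stand-in for the implicit templates: pass text through
        # and keep descending into the children.
        return (element.text or "") + apply_templates(element, templates)
    return template(element, templates)


# Each "template" says what to output for one kind of element.
TEMPLATES = {
    "letter": lambda el, t: "<html><body>" + apply_templates(el, t) + "</body></html>",
    "to": lambda el, t: "<p>TO: " + el.text + "</p>",
    "from": lambda el, t: "<p>FROM: " + el.text + "</p>",
    "message": lambda el, t: "<p>MESSAGE: " + el.text + "</p>",
}

html = process(ET.fromstring(LETTER), TEMPLATES)
print(html)
```

The important part is the shape: the letter template emits its wrapper and hands control back to the matching loop for its children, just as a template containing apply-templates does.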


The first thing we want to do is match the letter element that contains the rest of our document. This is fairly straightforward. Were the match value simply letter, the template would match letter elements throughout the document.

By default, apply-templates will match not only elements, but text and even whitespace between the elements as well. XSLT processors have a set of default, or implicit templates, one of which simply outputs any text or whitespace it encounters. We do this with another XPath expression: Each of these templates matches one of the elements we expect to find inside the letter element: In each case, we output a text label e. The last thing we have to do in the XSL file is close off the stylesheet element that began the file:.

Left this way, the output would look something like this:

From there, you could make the leap to other wild cats, then to house cats and maybe even dogs (cats and dogs are both pets, after all).

Needless to say, computers are really bad at this game, which is a shame, as many computing tasks require semantic skill. However, even a cursory glance at the rest of the document reveals some very human errors. This last product listing also displays a price before the description, and the price is italicized instead of appearing in bold.

The computer would be able only to render the document to a browser with the styles associated with each tag. Notice that this new document contains absolutely no information about display. Essentially, XML allows you to separate information from presentation — just one of its many powerful abilities.

In the example above, we know that a product listing contains products, and that each product has a name, a description, a price, and a shipping cost. You could say, rightly, that each XML document is self-describing, and is readable by both humans and software. Now, everyone makes mistakes, and XML programmers are no exception. To ensure that everyone plays by the rules, you need a DTD (a document type definition), or schema.

Once you have a DTD in place, anyone who creates product listings for your application will have to follow the rules. I want to examine the contents of a typical XML file, character by character. The simplest XML elements contain an opening tag, a closing tag, and some content. In XML, content is usually parsed character data. Following the content is the closing tag, which exhibits the same spelling and capitalization as your opening tag, but with one tiny change: a forward slash appears right after the opening angle bracket. If you use attributes on any elements, then attribute values must be single- or double-quoted.

No longer can you get by with bare attribute values like you did in HTML! Also, if you nest your elements improperly (that is, you close an outer element before the inner element it contains), browsers will often let it slide in HTML. In XML, this improper nesting of elements would cause the program reading the document to raise an error.
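A quick way to convince yourself of both rules is to feed a few snippets to a parser. The sketch below uses Python's standard-library parser; the sample snippets are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical snippets: two follow the rules, two break them.
SAMPLES = {
    "quoted attribute": '<dvd id="1"><title>Sample</title></dvd>',
    "bare attribute": '<dvd id=1><title>Sample</title></dvd>',
    "proper nesting": "<p><b><i>text</i></b></p>",
    "improper nesting": "<p><b><i>text</b></i></p>",
}


def is_well_formed(text):
    """Return True if the parser accepts the text, False otherwise."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


results = {label: is_well_formed(text) for label, text in SAMPLES.items()}
for label, outcome in results.items():
    print(label, "->", "ok" if outcome else "error")
```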

As XML allows you to create any language you want, the inventors of XML had to institute a special rule, which happens to be closely related to the proper nesting rule. This is called an attribute. You can think of attributes as adjectives — they provide additional information about the element that may not make any sense as content.

What information should be contained in an attribute? What should appear between the tags of an element? Some developers including me! Another common rule of thumb is to consider the length of the data. Potentially large data should be placed inside a tag; shorter data can be placed in an attribute.

In other parts of our DVD listing, the information seems a little bare. One way to do so is with the addition of attributes:. It would be smarter, from an architectural point of view, to have a separate listing of actors with unique IDs to which you could link. Some XML elements are said to be empty — they contain no content whatsoever.

Familiar examples are the img and br elements in HTML. Remember that in XML all opening tags must be matched by a closing tag. For empty elements, you can use a single empty-element tag to replace this:. I mentioned entities earlier. An entity is a handy construct that, at its simplest, allows you to define special characters for insertion into your documents. XML, true to its extensible nature, allows you to create your own entities.
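The sketch below checks both ideas with Python's standard-library parser: the two spellings of an empty element are equivalent, and entities are expanded when the document is read. The `company` entity and its replacement text are hypothetical, declared here in an internal DTD subset purely for illustration.

```python
import xml.etree.ElementTree as ET

# An empty element written the long way and the short way.
long_form = ET.fromstring("<br></br>")
short_form = ET.fromstring("<br/>")
same = ET.tostring(long_form) == ET.tostring(short_form)

# Built-in entities stand in for characters that markup reserves.
escaped = ET.fromstring("<title>Fish &amp; Chips &lt;deluxe&gt;</title>").text

# A hypothetical custom entity, defined once and reused in the content.
CUSTOM = """<!DOCTYPE note [
  <!ENTITY company "Example Widgets, Inc.">
]>
<note>Copyright &company;</note>"""
custom_text = ET.fromstring(CUSTOM).text

print(same)
print(escaped)
print(custom_text)
```

Because the parser expands entities on your behalf, only parse entity-bearing documents from sources you trust.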

What a time-saver! XML documents are more than just a sequence of elements. This feature, combined with all that content encapsulated in opening and closing tags, takes all XML documents far past the realm of mere data and into the revered halls of information.

Data can comprise a string of characters or numbers, such as But the only way to turn this data into information and therefore make it useful is to add context to it — once you have context, you can be sure about what the data represents.

A Really, Really, Really Good Introduction to XML — SitePoint

When you take into account the second point (that an XML document is really a hierarchy of objects), all sorts of possibilities open up. Remember what we discussed before: that, in an XML document, one element contains all the others? Well, that root element becomes the root of our hierarchical tree. You can think of that tree as a family tree, with the root element having various children (in this case, product elements), and each of those having various children (name, description, and so on).

In turn, each product element has various siblings (other product elements) and a parent (the root), as shown in Figure 1. Because what we have is a tree, we should be able to travel up and down it, and from side to side, with relative ease. Before, we talked about transforming data into information by adding context. Earlier in this chapter, I made a point about XML allowing you to separate information from presentation.
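Traveling around the family tree can be sketched in a few lines of Python. The product data below is invented, and the element names follow the listing described earlier only loosely. Note that `xml.etree.ElementTree` elements do not remember their parent, so the sketch builds a small parent lookup first.

```python
import xml.etree.ElementTree as ET

# Hypothetical product listing used to walk the tree.
LISTING = """<products>
  <product id="p1"><name>Widget</name><description>Small</description>
    <price>3.00</price><shipping>1.00</shipping></product>
  <product id="p2"><name>Gadget</name><description>Large</description>
    <price>7.50</price><shipping>2.00</shipping></product>
</products>"""

root = ET.fromstring(LISTING)

# Map every element to its parent so we can travel up the tree.
parent_of = {child: parent for parent in root.iter() for child in parent}

first = root[0]

# Down: the children of the first product.
children = [child.tag for child in first]

# Up: the parent of the first product is the root element.
parent_tag = parent_of[first].tag

# Sideways: the other product elements under the same parent.
siblings = [el.get("id") for el in parent_of[first] if el is not first]

print(children, parent_tag, siblings)
```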

For example, if you stored your information in a word processing program, it would contain all kinds of information about the way it should appear on the printed page — lots of bolding, font sizes, and tables.

Unfortunately, if that document also had to be posted to the Web as an HTML document, someone would have to convert it (either manually or via software), clean it up, and test it. If yet another person wanted to take the same information and use it in a slide presentation, they might run the risk of using outdated information from the HTML version. As you can see, it can get pretty messy! If you made changes to the XML file, the other files would also change automatically once you passed the XML file through the process.

This notion, by the way, is an essential component of single-sourcing — i. As you can see, separating information from presentation makes your XML documents reusable, and can save hassles and headaches in environments in which a lot of information needs to be stored, processed, handled, and exchanged. That means the publisher can generate sample PDFs for its Website, make print-ready files for the printer, and potentially create ebooks in the future.

All formats will be generated from the same source, and all will be created using different style sheets to process the base XML files. One of the most powerful advantages of XML, of course, is that it allows you to define your own language. However, this most powerful feature also exposes a great weakness of XML. If all of us start defining our own languages, we run the risk of being unable to understand anything anyone else says.

A valid document, then, is nothing more than a well-formed document that adheres to its DTD. For the most part, you will only care that your documents are well formed. Well-formedness alone allows you to create ad hoc XML documents that can be generated, added to an application, and tested quickly. The first thing we want to do is to create an XML document.

If you have Internet Explorer 5 or higher installed on your machine, you can view your newly-created XML file. As Figure 1. Notice the little minus signs next to some of the XML nodes? A minus sign in front of a node indicates that the node contains other nodes. If you click the minus sign, Internet Explorer will collapse all the child nodes belonging to that node, as shown in Figure 1.


If you made changes to the XML file, the other files would also change automatically once you passed the XML file through the process. This notion, by the way, is an essential component of single-sourcing.

As you can see, separating information from presentation makes your XML documents reusable, and can save hassles and headaches in environments in which a lot of information needs to be stored, processed, handled, and exchanged. That means the publisher can generate sample PDFs for its Website, make print-ready files for the printer, and potentially create ebooks in the future. All formats will be generated from the same source, and all will be created using different style sheets to process the base XML files.

One of the most powerful advantages of XML, of course, is that it allows you to define your own language. However, this most powerful feature also exposes a great weakness of XML. If all of us start defining our own languages, we run the risk of being unable to understand anything anyone else says. A valid document, then, is nothing more than a well-formed document that adheres to its DTD. For the most part, you will only care that your documents are well formed. Well-formedness alone allows you to create ad hoc XML documents that can be generated, added to an application, and tested quickly.


The first thing we want to do is to create an XML document. If you have Internet Explorer 5 or higher installed on your machine, you can view your newly created XML file, as shown in Figure 1. Notice the little minus signs next to some of the XML nodes?

A minus sign in front of a node indicates that the node contains other nodes.


If you click the minus sign, Internet Explorer will collapse all the child nodes belonging to that node, as shown in Figure 1.

Figure 1. Collapsing nodes displaying in Internet Explorer.

The little plus sign next to the first product node indicates that the node has children.

Clicking on the plus sign will expand any nodes under that particular node. In this way, you can easily display the parts of the document on which you want to focus. Now, open your XML document in any text editing tool and scroll down to the cost node of the second product.

Save your work and reload Internet Explorer. You should see an error message that looks like the one pictured in Figure 1. As you can see, Internet Explorer provides a rather verbose explanation of the error it ran into. Furthermore, it provides a nice visual of the offending line, with a little arrow pointing to the spot at which the parser thinks the problem arose.

Even though the problem is really with the start tag, the arrow points to the end tag. Because Internet Explorer uses a non-validating parser by default (remember, this means it only cares about well-formedness rules), it runs into problems at the end tag. You now have to backtrack to find out why that particular end tag caused such a problem. Open your XML document in an editor once more, and fix the problem we introduced above.

Save your work and reload your browser. You should see an error message similar to the one shown in Figure 1. At first glance, this error message seems a bit more obscure than the previous one. However, look closely and what do you see? Firefox is a popular open-source browser, and at the time this book went to print the latest version was 1. You can download a free copy from the Mozilla website.
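The well-formedness check that both browsers perform can also be reproduced outside the browser. The following is a minimal sketch using Python's standard `xml.etree.ElementTree` parser, which, like the browsers' default behavior described above, checks only well-formedness. The product listing and the misspelled `<cots>` start tag are hypothetical stand-ins for the error introduced in the walkthrough, not the original file.

```python
import xml.etree.ElementTree as ET

# Hypothetical product listing: the <cost> start tag is misspelled as <cots>,
# so it no longer matches its </cost> end tag.
BROKEN = """<products>
  <product>
    <name>Widget</name>
    <cots>19.99</cost>
  </product>
</products>"""

# The same document with the start tag repaired.
FIXED = BROKEN.replace("<cots>", "<cost>")


def check_well_formed(xml_text):
    """Return (True, None) if xml_text parses, else (False, (line, column))."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        # err.position is a (line, column) tuple reported by the parser.
        return False, err.position
    return True, None


if __name__ == "__main__":
    print(check_well_formed(BROKEN))
    print(check_well_formed(FIXED))
```

As in the browser, the parser only notices the mistake when it reaches the end tag, so the reported position is where the mismatch was detected rather than where the typo was made.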

Okay, so both Internet Explorer and Firefox will check your XML for well-formedness, but you need to know, for future reference, how to check that an XML file is valid. How do you do that? There are various well-known online validating XML parsers. All you have to do is visit the appropriate page, upload your document, and the parser will validate it.

Here is the most popular online parser. Sometimes, it may be impractical to use a Website to validate your XML because of issues relating to connectivity, privacy, or security. This checks for well-formedness if the document has no DTD, and for well-formedness and validity if a DTD is specified. Results of the validation will appear under the Results area, as illustrated in Figure 1.

For most purposes, an online resource will do the job nicely. If you work in a company that has an established software development group, chances are that one of the XML-savvy developers has already set up a good validating parser. This project will help ground your skills as you obtain firsthand experience with practical XML development techniques, issues, and processes. It usually consists of the following components:

Before you build any kind of CMS, first you must gather information that defines the basic requirements for the project. The goal of the CMS is to make things easier for those who need to develop and run the site. And making things easier means having to do more homework beforehand!

Although you may groan at the thought of this kind of exercise, a set of well-defined requirements can make the project run a lot more smoothly. What kind of requirements do we need to gather? Essentially, requirements fall into three major categories, the first of which concerns the types of content the site will hold. In the world of XML, each of these different types of content is, naturally enough, called a document type.

You also have to know how each of these content types will break out into its separate components, or metadata. Each article, for instance, will have various pieces of metadata, such as a headline, author name, and keywords, each of which the CMS needs to track. The final challenge — to define various types of metadata — can be a blessing in disguise. In my experience, once people grasp the importance of metadata, they race off in every direction and collect every single piece of metadata they can find about a given content type.

For example, the client might start to track the date on which an article is first drafted. Gathering metadata can be very tricky. At first glance, we could say that all of our articles should contain elements for author name and email address, and leave it at that. However, we may later decide that we want site visitors to search or browse articles by author. In this case, it would make more sense to have a centralized list of authors, each with his or her own unique ID.

Having a separate author listing would also allow us to easily set bylines for each author, in case someone decided they wanted to publish pieces under a pen name. It would also allow us to track author information across content types. Of course, agreeing on this approach means that we need to do other work later on, such as building administrative interfaces for author listings.
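To make the centralized author listing concrete, here is a minimal sketch in Python. The element names (`authors`, `author`, `name`, `byline`, `authorid`) and the sample data are assumptions chosen for illustration; the idea from the text is only that each author has a unique ID and that an article refers to its author by that ID.

```python
import xml.etree.ElementTree as ET

# Hypothetical central author listing: one entry per author, keyed by id.
AUTHORS_XML = """<authors>
  <author id="1"><name>Ann Example</name><byline>A. E. Penname</byline></author>
  <author id="2"><name>Bob Sample</name><byline>Bob Sample</byline></author>
</authors>"""

# Hypothetical article that stores only a reference to its author.
ARTICLE_XML = """<article id="123">
  <authorid>1</authorid>
  <headline>Hello, XML</headline>
</article>"""


def load_authors(xml_text):
    """Map each author's id attribute to a dict of that author's details."""
    root = ET.fromstring(xml_text)
    return {
        a.get("id"): {"name": a.findtext("name"), "byline": a.findtext("byline")}
        for a in root.findall("author")
    }


def byline_for(article_xml, authors):
    """Resolve the article's author reference to the byline to display."""
    article = ET.fromstring(article_xml)
    return authors[article.findtext("authorid")]["byline"]


if __name__ == "__main__":
    authors = load_authors(AUTHORS_XML)
    print(byline_for(ARTICLE_XML, authors))
```

Because the article holds only the ID, changing a byline in the author listing changes it for every piece of content that refers to that author.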

The other two are site functionality and site design. Every piece of metadata could potentially drive some kind of site behavior, but each piece of metadata also must be managed by the administration tools you set up. Site behavior should always be based on and driven by metadata. Typical site behavior for a CMS-powered Website includes browsing by content categories, browsing by author, searching on titles and keywords, dynamic news sidebars, and more. Additionally, many XML- and database-powered sites feature homepages that boast dynamically updated content, such as Top Ten Downloads, latest news headlines, and so on.
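As a rough illustration of site behavior driven by metadata, the sketch below browses by category and searches headlines and keywords. The document layout, the element names, and the choice to store keywords as one space-separated string are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical article metadata; the element names are illustrative only.
ARTICLES_XML = """<articles>
  <article id="1">
    <headline>Getting Started with XML</headline>
    <category>tutorials</category>
    <keywords>xml basics</keywords>
  </article>
  <article id="2">
    <headline>Site Launch News</headline>
    <category>news</category>
    <keywords>launch site</keywords>
  </article>
</articles>"""


def browse_by_category(root, category):
    """Return the ids of all articles filed under the given category."""
    return [
        a.get("id")
        for a in root.findall("article")
        if a.findtext("category") == category
    ]


def search(root, term):
    """Case-insensitive substring search over headlines and keywords."""
    term = term.lower()
    hits = []
    for a in root.findall("article"):
        haystack = " ".join(
            [a.findtext("headline", ""), a.findtext("keywords", "")]
        ).lower()
        if term in haystack:
            hits.append(a.get("id"))
    return hits


if __name__ == "__main__":
    root = ET.fromstring(ARTICLES_XML)
    print(browse_by_category(root, "news"))
    print(search(root, "XML"))
```

Every behavior here depends on a piece of metadata being present, which is why each one also needs a home in the administration tools.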

Our CMS will need to have an administrative component for each content type. It will also have to administer pieces of information that have nothing to do with content types, such as which users are authorized to log in to the CMS, and the privileges each of them has.

It goes without saying that your administrative interface has to be secure; otherwise, anyone could click to your CMS and start deleting content, making unauthorized changes to existing content, or adding new content that you may not want to have on your site. A workflow is simply a set of rules that allows you to define who does what, when, and how.

For example, your workflow might stipulate that a user with writer privileges may create an article, but that only a production editor can approve that content for publication on the site. In many cases, CMS workflows emulate actual workflows that exist in publication and marketing departments.
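A workflow rule of this kind can be sketched as a small permission table. The role names, action names, and status values below are hypothetical; the only thing taken from the text is that writers may create articles while approval is reserved for an editor.

```python
# Hypothetical roles and the actions each one is allowed to perform.
PERMISSIONS = {
    "writer": {"create"},
    "editor": {"create", "approve"},
}


def can(role, action):
    """Return True if the given role is allowed to perform the action."""
    return action in PERMISSIONS.get(role, set())


def approve(article, role):
    """Return a copy of the article marked live, if the role may approve it."""
    if not can(role, "approve"):
        raise PermissionError(f"{role} may not approve articles")
    # Build a new dict so the caller's draft is left unchanged.
    return {**article, "status": "live"}


if __name__ == "__main__":
    draft = {"id": "1", "status": "in progress"}
    print(approve(draft, "editor"))
```

Keeping the rules in data rather than scattering them through the code makes it easier to mirror whatever workflow the publication department already follows.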

We want to publish articles and news stories on our site. We definitely want to keep track of authors and site administrators, and we also want to build a search engine. Whenever I build an XML-powered application, I try to define the content types first, because I find that all the other elements cascade from there. The articles in our CMS will be the mainstay of our site. In addition to the article text, each of our articles will be endowed with several pieces of metadata.

Furthermore, because we need to identify each article in our system uniquely with an ID of some sort, it makes sense to add an id attribute to the root element that will contain this value. A unique identifier will ensure that no mistakes occur when we try to edit, delete, or view an existing article.

Now, each of our articles will have an author, so we need to reserve a spot for that information. Our article will need a headline, a short description, a publication date, and some keywords. The keyword listing can be handled in one of two ways. This approach will satisfy the structure nuts out there, but it turns out to be too complicated for the way we will eventually use these keywords. We also need to track status information on the article. However, you probably already see that status is very similar to keyword listings in that it has the potential to belong to many different content types.
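Pulling the pieces listed so far together, an article document might be assembled as in the sketch below. The element names, the date format, the status value, and the use of a single space-separated keyword string are assumptions made for illustration; the text itself only fixes the `id` attribute on the root element and the kinds of metadata involved.

```python
import xml.etree.ElementTree as ET


def build_article(article_id, author_id, headline, description,
                  pubdate, keywords, status):
    """Build an <article> element carrying the metadata described above."""
    # The unique identifier lives in an id attribute on the root element.
    article = ET.Element("article", {"id": article_id})
    fields = [
        ("authorid", author_id),      # reference to a central author listing
        ("headline", headline),
        ("description", description),
        ("pubdate", pubdate),
        ("keywords", keywords),       # kept as plain text, not nested elements
        ("status", status),
    ]
    for tag, value in fields:
        ET.SubElement(article, tag).text = value
    return article


article = build_article(
    "123", "1", "Hello, XML", "A short introduction.",
    "2005-01-15", "xml intro", "live",
)

if __name__ == "__main__":
    print(ET.tostring(article, encoding="unicode"))
```

Storing the keywords as simple text trades structure for convenience, which matches the point above that a fully nested keyword listing is more than this application needs.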

As such, it makes sense to centralize this information. As most of our content will be displayed in a Web browser, it makes sense to use as many tags as possible that a browser like IE or Firefox can already understand.

But for the purposes of our article storage system, we want to treat all of the HTML tags and text that make up the document body as a simple text string, rather than having to handle every single HTML tag that could appear in the article body. My goal for that chapter was to show you how flexible XML really is.
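One common way to treat an HTML body as a plain string inside an XML document is a CDATA section; that choice is an assumption here, since the passage above does not name the mechanism. The sketch uses Python's standard `xml.dom.minidom` to write the section and `xml.etree.ElementTree` to read it back. The `article` and `body` element names are hypothetical.

```python
from xml.dom import minidom
import xml.etree.ElementTree as ET

# HTML markup that we want stored as text rather than parsed as XML elements.
BODY_HTML = "<p>Hello, <b>world</b>!</p>"


def article_with_body(article_id, body_html):
    """Serialize an article whose body is wrapped in a CDATA section."""
    doc = minidom.Document()
    article = doc.createElement("article")
    article.setAttribute("id", article_id)
    doc.appendChild(article)
    body = doc.createElement("body")
    article.appendChild(body)
    # The CDATA section tells the parser not to interpret the markup inside.
    body.appendChild(doc.createCDATASection(body_html))
    return doc.toxml()


xml_text = article_with_body("123", BODY_HTML)

# Reading the document back: the body has no child elements, only text.
body_el = ET.fromstring(xml_text).find("body")

if __name__ == "__main__":
    print(xml_text)
    print(body_el.text)
```

One limitation worth knowing: the sequence `]]>` cannot appear inside a CDATA section, so a body containing it would need to be split or escaped.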

XSLT is both a style sheet specification and a kind of programming language that allows you to transform an XML document into the format of your choice.

XPath is a language for locating and processing nodes in an XML document. Because each XML document is, by definition, a hierarchical structure, it becomes possible to navigate this structure in a logical, formal way.

A document type definition (DTD) is a set of rules that governs the order in which your elements can be used, and the kind of information each can contain.
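To give a feel for the XPath idea mentioned above, navigating the hierarchy with path expressions, here is a small sketch. It relies on the limited XPath subset supported by Python's standard `xml.etree.ElementTree`, not a full XPath engine, and the product document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical product listing used only to demonstrate path expressions.
DOC = """<products>
  <product id="p1"><name>Widget</name><cost>19.99</cost></product>
  <product id="p2"><name>Gadget</name><cost>24.50</cost></product>
</products>"""

root = ET.fromstring(DOC)

# Walk down the hierarchy: every <name> that is a child of a <product>.
names = [n.text for n in root.findall("./product/name")]

# Positional predicate: the <cost> of the second <product>.
second_cost = root.find("./product[2]/cost").text

# Attribute predicate: the <name> of the product whose id is "p1".
name_of_p1 = root.find(".//product[@id='p1']/name").text

if __name__ == "__main__":
    print(names, second_cost, name_of_p1)
```

Each expression reads like a path through the tree, which is exactly what the hierarchical structure of an XML document makes possible.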

While a DTD can provide only general control over element ordering and containment, schemas are a lot more specific. They can, for example, allow elements to appear only a certain number of times, or require that elements contain specific types of data such as dates and numbers. Both technologies allow you to set rules for the contents of your XML documents.

If you need to share your XML documents with another group, or you must rely on receiving well-formed XML from someone else, these technologies can help ensure that your particular set of rules is properly followed. The ability of XML to allow you to define your own elements provides flexibility and scope. XML namespaces attempt to keep different semantic usages of the same XML elements separate and unambiguous. In our example, each person could define their own namespace and then prepend the name of their namespace to specific tags.

No one in their right mind could reasonably expect them all to switch to XML overnight.

But we can expect that some of these pages — and a large percentage of the new pages that are being coded as you read this — will make the transition thanks to XHTML. As you can see, the XML family of technologies is a pretty big group — those XML family reunions are undoubtedly interesting! Although this means that some ideas take quite a while to reach fruition, and tend to be built by committee, it also means that no single vendor is in total control of XML.

And this, as Martha Stewart might say, is a good thing.

So, what do you say? Not sure? Well, put bluntly, the Web has reached a point at which just about anything will fly when it comes to HTML documents. Take a look at the following snippet. Believe it or not, that snippet will render without a problem in most Web browsers. And so will this. But, exactly what does this mean? Use this with CSS to minimize presentational clutter.

XML Namespaces were invented to rectify a common problem. Imagine you were running a bookstore and had an inventory file called inventory.

A human being could probably figure out that one title has nothing to do with the other, but an application that tried to sort it out would go nuts. We need to have a way to distinguish between the two different semantic universes in which these identical terms exist.

Your inventory file stores information about books on the shelf, but the sales file stores information about books that have been bought by customers. In either situation, regardless of the chasm that lies between the contexts of these identical terms, we need a way to properly label each context. Namespaces to the rescue! To use and declare a namespace, we must first tie the namespace to a URI. URIs can take the following forms: a Uniform Resource Locator (URL) or a Uniform Resource Name (URN). For example, all published books have an ISBN.

However, armed with the ISBN, you could walk into the store, ask an employee to search for you, and they could take you right to the book provided, of course, that it was in stock. We want to use our namespace throughout our XML documents, though, and the last thing we want to do is type out an entire URI every time we need to distinguish one context from another.

So, we define a prefix to represent our namespace to ease the strain on our typing fingers. The agreed way to do that is to prefix the namespace declaration with xmlns: and the name of our prefix. At this point, we have something useful. If we needed to, we could add our prefix to appropriate elements to disambiguate (I love that term!) them. In most cases, placing your namespace declarations will be rather easy. Please note, however, that namespaces have scope. Namespaces affect the element in which they are declared, as well as all the child elements of that element.

However, if you want to limit your namespace scope to a certain part of a document, feel free to do so — remembering, of course, that this can get pretty tricky.
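The prefix and scope rules above can be seen in a short sketch. The namespace URIs, the `inv` and `sales` prefixes, and the sample values are hypothetical; the point is only that two elements with the same local name, `title`, stay distinct once each is tied to its own namespace.

```python
import xml.etree.ElementTree as ET

# Both prefixes are declared on the root element, so they are in scope
# for the root and for every child element beneath it.
DOC = """<store xmlns:inv="http://example.com/ns/inventory"
       xmlns:sales="http://example.com/ns/sales">
  <inv:title>XML Basics</inv:title>
  <sales:title>Mr.</sales:title>
</store>"""

# Prefix-to-URI map used for lookups; the prefixes here need not match
# the ones in the document, only the URIs matter.
NS = {
    "inv": "http://example.com/ns/inventory",
    "sales": "http://example.com/ns/sales",
}

root = ET.fromstring(DOC)

inventory_title = root.find("inv:title", NS).text
sales_title = root.find("sales:title", NS).text

# Internally the parser replaces each prefix with its full URI.
expanded_tag = root.find("inv:title", NS).tag

if __name__ == "__main__":
    print(inventory_title, sales_title, expanded_tag)
```

The prefix is just shorthand: an application compares the full URI plus the local name, which is why the two `title` elements can no longer be confused.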

It would become pretty tiresome to have to type a prefix for every single element in a document. Notice the non-prefixed namespace. On the other side of the coin, all XSLT elements must be given the xsl: prefix.

This document contains a root element, letter, that contains three other elements (to, from, and message), each of which contains text.

When you display your XML document, you should see something similar to Figure 2. As you can see, CSS did a marvelous job of rendering a nicely shaded box around the entire letter, setting fonts, and even displaying things like margins and padding.

Strictly speaking, the CSS standard does allow for this sort of thing with the content property, which can produce generated text before and after document elements.

Think of it as a tool that you can use to transform your XML documents into other documents. Here are some of the possibilities. XSLT is a rules-based, or functional, language. Because XSLT can be a little bewildering even for veteran programmers, the best way to tackle it is to walk through a series of examples.

Keeping both these elements simple will give us the opportunity to step through the major concepts involved.


They must therefore follow the rules that apply to all XML documents. The version attribute is required. The xmlns: attribute declares the stylesheet namespace. In our example, we will use an xsl prefix on all the stylesheet-related tags in our XSL documents to associate them with this namespace.

The next element will be the output element, which is used to define the type of output you want from the XSL file.
