Dapper XML Format

The core Dapper output format was developed by Dapper and captures the content from a website in a nested hierarchy.  We'll use the following example to explain the XML:

./ishot-1.jpg

The root node of every Dapp XML document is named elements.  The elements node always has at least one child named dapper, which contains metadata about the Dapp and the specific execution.  Here are some of the nodes the dapper node may contain:

  • dappTitle: the title of the Dapp
  • dappName: the identifier of the Dapp (same as dappTitle, but with only letters and numbers)
  • url: a URL that was in the original sample set when the Dapp was created
  • applyToUrl: the URL on which the Dapp was just run
  • executionTime: the amount of time Dapper spent within the algorithms extracting content
  • ranAt: the time at which the Dapp was run (useful in debugging caching)
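
As a rough illustration of reading this metadata, here is a minimal Python sketch using the standard library's xml.etree.ElementTree.  The sample document, its values, and the helper name read_dapper_metadata are made up for this example and are not taken from a real Dapp run; the sketch only assumes what is described above, namely an elements root with a dapper child whose own children hold the metadata.

```python
import xml.etree.ElementTree as ET

# Hypothetical Dapp output, invented for illustration only.
# Real output may contain more nodes and different value formats.
SAMPLE = """<elements>
  <dapper>
    <dappTitle>My News Dapp</dappTitle>
    <dappName>MyNewsDapp</dappName>
    <applyToUrl>http://example.com/news</applyToUrl>
  </dapper>
</elements>"""


def read_dapper_metadata(xml_text):
    """Return the children of the dapper node as a {name: text} dict."""
    root = ET.fromstring(xml_text)
    if root.tag != "elements":
        raise ValueError("expected root node 'elements', got %r" % root.tag)
    dapper = root.find("dapper")
    if dapper is None:
        raise ValueError("no 'dapper' node found under 'elements'")
    # Each metadata item is a child node; missing text becomes "".
    return {child.tag: (child.text or "").strip() for child in dapper}


metadata = read_dapper_metadata(SAMPLE)
print(metadata["dappName"])
# Nodes are optional, so use .get() for anything that may be absent.
print(metadata.get("ranAt"))
```

Because the list above says only which nodes the dapper node may contain, the sketch treats every metadata node as optional rather than assuming any particular one is present.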

After the dapper node, there are two types of nodes you can encounter: "group" nodes and "field" nodes. 

Field nodes represent a field in the Dapp.  Each is named whatever the creator of the Dapp called it in the Dapp Factory, modified so that the name is safe for XML.  They contain content from the original website.  Each field node has the following attributes:

  • fieldName: the original name of the field, before it was made safe for XML
  • originalElement: the corresponding element in the HTML document
  • type: always set to "field" - useful for XQuery and for manipulating the DOM
  • href: the href of the element or any ancestor element in the HTML - if it is a link in the HTML, the destination of the link will be in this attribute (optional)
  • src: the src of the element or any ancestor element in the HTML - if it is an image in the HTML, the URL of the image will be in this attribute (optional)
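
The type attribute makes field nodes easy to select regardless of what they are named.  The following Python sketch shows one way to do that with the limited XPath support in xml.etree.ElementTree.  The sample XML, the node names Story, Headline and Thumb, the originalElement values, and the helper extract_fields are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical Dapp output; node names and attribute values are invented.
# The originalElement values in particular are a guess at the format.
SAMPLE = """<elements>
  <dapper><dappName>MyNewsDapp</dappName></dapper>
  <Story type="group" groupName="Story">
    <Headline type="field" fieldName="Headline" originalElement="a"
              href="http://example.com/story/1">First headline</Headline>
    <Thumb type="field" fieldName="Thumb" originalElement="img"
           src="http://example.com/1.jpg"></Thumb>
  </Story>
</elements>"""


def extract_fields(xml_text):
    """Collect every node whose type attribute is "field"."""
    root = ET.fromstring(xml_text)
    fields = []
    # Select by attribute, since node names vary from Dapp to Dapp.
    for node in root.findall(".//*[@type='field']"):
        fields.append({
            "fieldName": node.get("fieldName"),
            "text": (node.text or "").strip(),
            # href and src are optional, so these may be None.
            "href": node.get("href"),
            "src": node.get("src"),
        })
    return fields


fields = extract_fields(SAMPLE)
for field in fields:
    print(field["fieldName"], field["text"], field["href"], field["src"])
```

Reading fieldName instead of the node's tag recovers the name the Dapp creator originally chose, before it was made safe for XML.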

Group nodes group together one or more field nodes.  Grouping is determined manually by the creator of the Dapp in the Dapp Factory.  Like field nodes, group nodes are renamed so that their names are safe for XML.  Group nodes have the following attributes:

  • groupName: the original name of the group, before it was made safe for XML
  • type: always set to "group" - useful for XQuery and for manipulating the DOM
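
To show how groups and fields fit together, here is a small Python sketch that turns each group node into one record holding its fields.  The sample XML and the helper fields_by_group are hypothetical, and the sketch assumes that a group's fields are its direct children, which this page does not state explicitly.

```python
import xml.etree.ElementTree as ET

# Hypothetical Dapp output with two groups; all names are invented.
SAMPLE = """<elements>
  <dapper><dappName>MyNewsDapp</dappName></dapper>
  <Story type="group" groupName="Story">
    <Headline type="field" fieldName="Headline">First headline</Headline>
    <Author type="field" fieldName="Author">Alice</Author>
  </Story>
  <Story type="group" groupName="Story">
    <Headline type="field" fieldName="Headline">Second headline</Headline>
    <Author type="field" fieldName="Author">Bob</Author>
  </Story>
</elements>"""


def fields_by_group(xml_text):
    """Return one record per group node, with its fields keyed by fieldName."""
    root = ET.fromstring(xml_text)
    records = []
    for group in root.findall(".//*[@type='group']"):
        fields = {}
        # Assumption: fields are direct children of their group node.
        for child in group:
            if child.get("type") == "field":
                name = child.get("fieldName", child.tag)
                fields[name] = (child.text or "").strip()
        records.append({"groupName": group.get("groupName"), "fields": fields})
    return records


records = fields_by_group(SAMPLE)
for record in records:
    print(record["groupName"], record["fields"])
```

Each record corresponds to one repeated item on the original page, such as one story in a list of stories.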

