[Docutils-develop] docutils.dtd: id vs. ids

Discussion:

Guenter Milde

2017-01-01 22:24:12 UTC

Dear Docutils developers, dear David,

Testing docutils.dtd with ::

  xmllint --dtdvalid docutils.dtd standalone_rst_docutils_xml.xml

shows ca. 100 error messages like ::

  ...
  standalone_rst_docutils_xml.xml:1082: element footnote: validity error : IDREFS attribute backrefs references an unknown ID "id31"
  standalone_rst_docutils_xml.xml:154: element reference: validity error : IDREF attribute refid references an unknown ID "topics-sidebars-and-rubrics"
  ...

and concludes:

  Document standalone_rst_docutils_xml.xml does not validate against docutils.dtd

The problem is that XML has no datatype IDS for a list of ID
values (analogous to NMTOKEN/NMTOKENS). Hence, docutils.dtd uses ::

  " ids NMTOKENS #IMPLIED

OTOH, there are references to the ids ::

  " refid IDREF #IMPLIED ">

  " backrefs IDREFS #IMPLIED ">

However, xmllint does not know that the NMTOKENS are used as IDs and hence
reports validity errors.
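
To make the mismatch concrete, here is a minimal sketch of the check a validator would perform if it did treat every token of "ids" as a declared ID. It uses only the Python standard library; the sample fragment and the function name unresolved_references are invented for illustration and are not taken from the test file or from Docutils.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written Doctree-XML fragment (not from the test file).
SAMPLE = """\
<document source="sample.rst">
  <section ids="intro introduction" names="intro introduction">
    <title>Intro</title>
    <paragraph>See <reference refid="introduction">here</reference>
    and <reference refid="missing">there</reference>.</paragraph>
  </section>
  <footnote backrefs="id1 id2" ids="note" names="note"/>
</document>
"""


def unresolved_references(root):
    """Return sorted (tag, attribute, value) triples whose target is unknown.

    Every whitespace-separated token of an ``ids`` attribute counts as a
    declared ID; ``refid`` and ``backrefs`` values are looked up in that set.
    """
    declared = set()
    for element in root.iter():
        declared.update(element.get("ids", "").split())

    problems = []
    for element in root.iter():
        for attribute in ("refid", "backrefs"):
            for value in element.get(attribute, "").split():
                if value not in declared:
                    problems.append((element.tag, attribute, value))
    return sorted(problems)


if __name__ == "__main__":
    for tag, attribute, value in unresolved_references(ET.fromstring(SAMPLE)):
        print(f"element {tag}: {attribute} references unknown id {value!r}")
```

With this reading, refid="introduction" resolves even though it is the second token of "ids"; a DTD validator that sees "ids" as plain NMTOKENS has no declared IDs to match against at all.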

When (as a test) changing the datatype of ids to ID ::

  - " ids NMTOKENS #IMPLIED
  + " ids ID #IMPLIED

xmllint reports 22 errors ::

  ...
  standalone_rst_docutils_xml.xml:243: element reference: validity error : IDREF attribute refid references an unknown ID "subtitle"
  standalone_rst_docutils_xml.xml:1583: element system_message: validity error : IDREFS attribute backrefs references an unknown ID "id86"

The XML standard requires that ID values are unique within a document and
allows only one ID attribute per element, so an element cannot carry a
list of ids.

git blame says that id was changed to ids (and ID to NMTOKENS) in October
2005.

What was the reason for multiple ids on one element?

Can we avoid this?
How can we proceed?

1. normalize ids (say, keep the first one and update the references)

   a) during parsing
   b) in a transform

2. normalize ids just for XML output

3. don't care about xmllint

4. use "NMTOKEN/NMTOKENS" instead of "IDREF/IDREFS" in
   refid and backrefs? (no test for matching references and id uniqueness)

My preference would be to change "ids" to "id" and use one id per object
either during parsing or via a transform.
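
As a rough illustration of what "use the first id and change references" could mean for the XML output only (option 2), here is a sketch that works on an already serialized tree with the Python standard library. It is not a Docutils transform; the function name normalize_ids is made up, and it assumes refid and backrefs are the only attributes that point at ids.

```python
import xml.etree.ElementTree as ET


def normalize_ids(root):
    """Rename ``ids`` to a single-valued ``id`` and retarget references.

    The first token of each ``ids`` value is kept; ``refid`` and ``backrefs``
    values that name a dropped alias are rewritten to the kept id.
    Assumption: no other attribute refers to ids.
    """
    alias_to_kept = {}
    for element in root.iter():
        tokens = element.attrib.pop("ids", "").split()
        if tokens:
            element.set("id", tokens[0])
            for token in tokens:
                alias_to_kept[token] = tokens[0]

    for element in root.iter():
        for attribute in ("refid", "backrefs"):
            if attribute in element.attrib:
                values = element.attrib[attribute].split()
                element.set(
                    attribute,
                    " ".join(alias_to_kept.get(value, value) for value in values),
                )
    return root


if __name__ == "__main__":
    tree = ET.fromstring(
        '<document><section ids="a b"><reference refid="b"/></section></document>'
    )
    print(ET.tostring(normalize_ids(tree), encoding="unicode"))
```

References to ids that are not declared anywhere are left untouched, so a validator would still report them.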

Günter

------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, SlashDot.org! http://sdm.link/slashdot
_______________________________________________
Docutils-develop mailing list
Docutils-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/docutils-develop

Please use "Reply All" to reply to the list.

David Goodger

2017-01-02 19:36:15 UTC

Post by Guenter Milde
What was the reason for multiple ids on one element?

I believe it was a practical addition, as we needed it. Maybe because
of multiple hyperlink targets assigned to an object.

Post by Guenter Milde
Can we avoid this?
How can we proceed?
1. normalize ids (use the first and change references, say)
a) during parsing
b) in a transform
2. normalize ids just for XML output
3. don't care for xmllint
4. use "NMTOKEN/NMTOKENS" instead of "IDREF/IDREFS" in
refid and backrefs? (don't test for matching references and id uniqueness)
My preference would be to change "ids" to "id" and use one id per object
either during parsing or via a transform.

Why do you want to make a change? What's the purpose, the desired improvement?

I don't care about xmllint at all. The DTD was never meant to be
prescriptive, just descriptive. It merely describes the internal data
structure used by Docutils: the Doctree. It was never meant to be used
to validate documents (you may be the first to try). Doctree-XML
documents are relatively rare anyway.

You're letting the tail wag the dog. Don't do that.

If a validatable XML output format is needed, a Writer can be
implemented for that. It would have to have a slightly different DTD,
and that's OK.

David Goodger
<http://python.net/~goodger>

Guenter Milde

2017-01-05 13:37:44 UTC

Dear Docutils developers, dear David,