[Docutils-develop] a question about the languages supported vs. smart quotes
jfbu
2017-05-26 15:15:48 UTC
Hi,

Please excuse the naive question (perhaps I should have used
the user list).

In ``docutils.languages.get_language()``, the list of languages
for which a ``<language>.py`` module exists differs from the list
of languages for which the ``smartchars`` class in
``docutils.utils.smartquotes`` provides settings.

We currently have an issue at Sphinx,
https://github.com/sphinx-doc/sphinx/issues/3788, which was
initially related to the fact that Sphinx supports more languages
than those available in ``docutils/languages/``. As a result, the
attempt to use the smart quotes feature made
``docutils.languages.get_language('tr')`` emit a warning.

The currently envisioned fix skips the Docutils smart quotes
when the value returned by
``get_language('<language_code>')`` (called with no reporter,
so that no warning is issued) shows that the language is not supported
(for example ``tr``).

But this seems a pity because, for example, the development
version of Docutils has the ``tr`` smart quotes, and 0.13.1
already has the ``et`` smart quotes, although there is no ``et.py``
in ``docutils/languages/``.

Do you have any advice on how to benefit fully from
Docutils smart quotes without the limitation that ``get_language``
only recognizes a shorter list of language codes?
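
To make the question concrete, here is a minimal sketch of the two
checks kept separate. It does not use the Docutils API: the two sets
are illustrative stand-ins for what 0.13.1 ships, and the function
names are hypothetical.

```python
# Stand-in data (an assumption, not read from Docutils):
# codes that have a docutils/languages/<code>.py module ...
LOCALIZED = {"en", "fr"}
# ... and codes for which smart quote settings exist.
SMART_QUOTES = {"en", "fr", "et"}


def has_localization(code):
    """True if label translations exist for this language code."""
    return code in LOCALIZED


def has_smart_quotes(code):
    """True if smart quote settings exist for this language code."""
    return code in SMART_QUOTES


def plan(code):
    """Decide each feature independently instead of tying both
    to the presence of a localization module."""
    return {
        "labels": code if has_localization(code) else "en",
        "smart_quotes": has_smart_quotes(code),
    }


estonian = plan("et")
turkish = plan("tr")
print(estonian, turkish)
```

With these stand-in sets, ``et`` keeps its smart quotes although its
labels fall back to English, while ``tr`` gets neither.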

I apologize if my question displays a basic misunderstanding
on my part: I am not at all familiar with the Docutils source code.

.. and only a little with Sphinx ;-) by the way it looks as if
with Sphinx before 1.6.1, calls to Docutils ``get_language()``
always were with ``'en'``, as far as I understand. So the issue
did not arise so far.

Best
Jean-François
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Docutils-develop mailing list
Docutils-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/docutils-develop

Please use "Reply All" to reply to the list.
jfbu
2017-05-26 21:59:42 UTC
Hi Günter

thanks for the precision of your reply. It is late (for me) thus
Post by jfbu
.. and only a little with Sphinx ;-) by the way it looks as if
with Sphinx before 1.6.1, calls to Docutils ``get_language()``
always were with ``'en'``, as far as I understand. So the issue
did not arise so far.
This would be a true bug.
You may check with a French document: if a ``.. caution::`` directive comes out
as "Avertissement!" it's fine, if it comes out as "Caution" it's wrong.
In a Python 2.7 site-packages directory I directly modified
docutils/languages/__init__.py so that get_language now looks like this:

    COMPTE = 0

    def get_language(language_code, reporter=None):
        """Return module with language localizations.

        `language_code` is a "BCP 47" language tag.
        If there is no matching module, warn and fall back to English.
        """
        global COMPTE
        COMPTE = COMPTE+1
        print(language_code, reporter, COMPTE)
        [... original contents ...]

Then I ran Sphinx 1.5.6 (we are now at 1.6.1)
on a project whose language is Turkish and which contains
a ``.. caution::`` directive. The HTML output contains "Uyarı", which
I assume is Turkish for "Caution" ;-)

on the other hand the console output contains lines of the
following type

('en', <docutils.utils.Reporter instance at 0x109197878>, 1)
....
('en', <docutils.utils.Reporter instance at 0x109197878>, 32)
looking for now-outdated files... none found
pickling environment... done
checking consistency... done
preparing documents... done
('en', <docutils.utils.Reporter instance at 0x109356ab8>, 33)
...
('en', <docutils.utils.Reporter instance at 0x109371710>, 50)
('en', <docutils.utils.Reporter instance at 0x109360e18>, 51)
...
('en', <docutils.utils.Reporter instance at 0x109360e18>, 58)

Sorry to put so much Sphinx debugging output here, but it is only to
illustrate that, apparently, with Sphinx 1.5.6
``get_language`` was called only with ``'en'``. According
to your earlier explanations, this could be explained if Sphinx
relies entirely on its own localization, not on that of Docutils.
(I am supposed to know Sphinx, but I am still learning it and
I am familiar with bits and pieces only.)
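
The same kind of counting can be done without editing files in
site-packages. Below is a minimal sketch that wraps a function to
record its calls. The ``get_language`` here is a hypothetical stand-in,
not the Docutils function, and ``count_calls`` is a name made up for
the illustration.

```python
import functools


def count_calls(func):
    """Wrap func so that every call is recorded with its arguments."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls.append((args, kwargs))
        return func(*args, **kwargs)
    wrapper.calls = []  # one (args, kwargs) entry per call
    return wrapper


# Hypothetical stand-in for docutils.languages.get_language.
@count_calls
def get_language(language_code, reporter=None):
    return "localization module for %s" % language_code


get_language("en")
get_language("en", reporter=None)

# First positional argument of each recorded call.
codes_seen = [args[0] for args, kwargs in get_language.calls]
print(len(get_language.calls), codes_seen)
```

Assigning such a wrapper to a module attribute at run time would only
affect callers that look the name up on the module when they call it,
so the result would need checking against how Sphinx imports it.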

And testing with French I see the Caution is translated into Prudence,
(Avertissement is used for Warning, according to the fr/sphinx.po
file), despite the Docutils get_language having been called
only with 'en'.

In the Sphinx issue I linked to there was a problem with
many warnings related to smart quotes, but this happened
when the environment from a previous build, say with English,
was then reused for, say, Turkish (for which 0.13.1
does not have the smart quotes). Takeshi @tk0miya has since added
to his PR fixing the original issue a commit
that cleans up the pickled environment in such circumstances.

I tend to deduce from your precise explanations and the experiment
reported above that it should indeed be possible to access the Docutils
smart quotes facilities without restricting the language to those
for which Docutils already provides translations.

Currently the fix mentioned above drops smart quotes if get_language()
reports that the language has no Docutils-provided translations.

Thanks

Jean-François


jfbu
2017-05-27 07:30:43 UTC
Hi Günter
A warning for every block-level element by the smartquotes transition is
indeed unfriendly. This should not happen with the Docutils version,
maybe the "monkeypatch" left out the code that ensured these warnings are
only given once per document? Or maybe the user translated a larger
project and there is one warning per document?
1. I observed the same multiple warnings with only one source document.

2. looking at docutils/transforms/universal.py the last line

self.unsupported_languages = set() # reset

is included in the for loop

for node in self.document.traverse(nodes.TextElement):

I am wondering whether its indentation level should be reduced.

I have tested moving self.unsupported_languages = set() out of the
loop, and it has the expected effect of reducing the number of
SmartQuotes warnings to a single one.

(This was in the convoluted Sphinx 1.6.1 situation where a pickled
environment built for English is reused for a Turkish build.)
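
The effect of the indentation can be reproduced in isolation. The
following is a simplified model, not the Docutils transform itself:
``count_warnings`` and its parameters are hypothetical, and each list
entry stands for the language of one block-level text element.

```python
def count_warnings(block_languages, supported, reset_inside_loop):
    """Count "no smart quotes" warnings for one document.

    reset_inside_loop=True mimics the reset line being indented
    into the for loop; False mimics the reset after the loop.
    """
    unsupported_languages = set()
    warnings = 0
    for lang in block_languages:
        if lang not in supported:
            if lang not in unsupported_languages:
                warnings += 1  # warn only for a language not yet seen
            unsupported_languages.add(lang)
        if reset_inside_loop:
            # Forgets after every block what was already reported.
            unsupported_languages = set()
    return warnings


blocks = ["tr"] * 5  # five text blocks in an unsupported language
buggy = count_warnings(blocks, {"en"}, reset_inside_loop=True)
fixed = count_warnings(blocks, {"en"}, reset_inside_loop=False)
print(buggy, fixed)  # prints "5 1"
```

With the reset inside the loop the model warns once per block, and
with the reset outside it warns once per document.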

Best

Jean-François


Guenter Milde
2017-05-27 10:02:47 UTC
On 2017-05-27, jfbu wrote:

...
Post by jfbu
2. looking at docutils/transforms/universal.py the last line
self.unsupported_languages = set() # reset
is included in the for loop
I am wondering if indentation level should be reduced.
I have tested moving self.unsupported_languages = set() out of the
loop and it has the expected effect to reduce the number of SmartQuotes
warnings to only one.
You found a bug.

It is solved in my private working tree.

Thank you,

Günter


@Engelbert: Is it OK to commit the following patch to SVN?

Index: universal.py
===================================================================
--- universal.py (revision 8072)
+++ universal.py (working copy)
@@ -296,4 +296,4 @@
             for txtnode, newtext in zip(txtnodes, teacher):
                 txtnode.parent.replace(txtnode, nodes.Text(newtext))
 
-            self.unsupported_languages = set() # reset
+        self.unsupported_languages = set() # reset
engelbert gruber
2017-05-27 13:02:26 UTC
Post by Guenter Milde
@Engelbert: Is it OK to commit the following patch to SVN?
Of course it is.

And while I am at it, I will make 0.14a1 (release often :-)

cheers
