Have Transifex ignore empty msgids in .pot files


I’m using Transifex to translate JupyterBook content via .pot files made with Sphinx’s gettext builder. Briefly, for reasons I don’t fully understand, pages that are at the top level of a section get an extra entry with an empty msgid (see commit here for an example). Unfortunately, when reading in files via the GitHub integration, Transifex barfs on any file that contains one, and the file becomes untranslatable. It’s easy enough to remove these entries by hand when I’m making the .pot files manually, but I’d love to automate .pot file creation with a GitHub Action, and this would be a showstopper. I’m going to work the problem from the other end as well and try to keep the unnecessary empty msgids from being added in the first place, but a) is there an existing setting that makes Transifex skip these entries rather than lose the whole file, and/or b) is that something that can be pitched to the dev team? Thanks!

Hello Beth, this is Sandy from the Transifex support team.

Could you please share a screenshot of the error you see when uploading this file using our GitHub integration? Also, could you please try this command on your file and let me know the output? Thank you so much in advance! :slight_smile:

Hi Sandy,

Thanks for the quick response! See below. If msgfmt is calling it a fatal error, it’s not surprising to me that Transifex isn’t reading the file, although it seems like Transifex is parsing it OK, since its error says “empty string” and not “duplicate message”. So if it’s possible to add an “ignore empty msgid” toggle, that would work great for me, though I realize mine is likely to be a corner case.

> msgfmt -c _build/gettext/welcome.pot 
_build/gettext/welcome.pot:20: duplicate message definition...
_build/gettext/welcome.pot:8: ...this is the location of the first definition
msgfmt: found 1 fatal error

Contents of welcome.pot (lines 1-25)

# Copyright (C) 2023
# This file is distributed under the same license as the Python package.
#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: Python \n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2023-05-03 09:29-0400\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"

#: ../../welcome.md:1
msgid ""
msgstr ""

#: ../../welcome.md:1
msgid "Sample Preparation"
msgstr ""


Hello Beth,

Gettext adds an entry with an empty msgid for the header at the top of the file, which is expected. Further down, though, a second entry (the one for ../../welcome.md:1) also has an empty msgid; this second entry is the one the GitHub integration is barfing about.

This is also why you see the error “duplicate message definition…” when you run “msgfmt -c”: the file now has a duplicated key (msgid), once in the header and once in the later entry.

Answering your initial question:
a) Unfortunately, there is no way to make Transifex skip these entries, because the parser expects an empty msgid to be there for the header.
b) I recommend you remove the second empty-msgid entry or replace its msgid with another value.
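
For an automated workflow, option b) can be scripted as a post-processing step on the generated .pot files. What follows is a minimal sketch in plain Python, not a Transifex or Sphinx feature. The function names (`has_empty_msgid`, `strip_empty_msgids`, `clean_pot_file`) are made up for illustration. It assumes that entries are separated by blank lines and that the first entry with an empty msgid is the header, which matches the welcome.pot listing above.

```python
import re
from pathlib import Path


def has_empty_msgid(entry: str) -> bool:
    """Return True if this entry's msgid is the empty string."""
    lines = entry.splitlines()
    for i, line in enumerate(lines):
        if line.strip() == 'msgid ""':
            # A multi-line msgid also starts with `msgid ""`, but it is
            # followed by quoted continuation lines. Treat the msgid as
            # empty only when no such continuation follows.
            following = lines[i + 1].strip() if i + 1 < len(lines) else ""
            return not following.startswith('"')
    return False


def strip_empty_msgids(pot_text: str) -> str:
    """Drop every empty-msgid entry except the first one (the header)."""
    # Assumption: entries are separated by one or more blank lines.
    entries = re.split(r"\n(?:[ \t]*\n)+", pot_text.strip())
    kept = []
    header_seen = False
    for entry in entries:
        if has_empty_msgid(entry):
            if header_seen:
                continue  # duplicate empty msgid: skip the whole entry
            header_seen = True  # first empty msgid is taken to be the header
        kept.append(entry)
    return "\n\n".join(kept) + "\n"


def clean_pot_file(path: str) -> None:
    """Rewrite one .pot file in place without the extra empty entries."""
    pot_path = Path(path)
    cleaned = strip_empty_msgids(pot_path.read_text(encoding="utf-8"))
    pot_path.write_text(cleaned, encoding="utf-8")
```

A build step could call `clean_pot_file` on each file under `_build/gettext/` and then re-run `msgfmt -c` to confirm the duplicate-definition error is gone. Dropping the entry outright (rather than renaming its msgid) seems reasonable here, since an empty source string has nothing to translate.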

I hope my reply helps. Let me know if you have further questions.

Sandy :slight_smile: