TranslatorsGuide  
Guidelines for people translating the NLTK book
Updated Apr 3, 2012 by StevenBird1

If you have questions about any aspect of these guidelines, please contact Steven Bird.

1. Proposal

A proposal for a translation into a given language should be submitted to the nltk-translation mailing list. The proposal should identify the translation team, including the advisors, and describe the team's expertise in NLP. It should describe the steps that will be taken to ensure a good-quality translation, list any language resources that could be redistributed along with any steps already taken to obtain permission, and specify the timeframe for the translation work, including any intermediate milestones.

If a previous team has not completed the translation work, and has been inactive for a period of three months (according to the public repository), then another team can propose to take over the translation work.

Once the proposal is approved, the team will be given write access to the repository. The existence of the project, the membership of the team, and pointers to any translated materials will be posted on the NLTK website.

2. Translation team

Ideally there should be multiple translators who share in the task, and check each other's work. There should be at least two advisors who are experienced researchers in NLP and who have agreed to read the translation and give feedback. These people will be recognized as translators or advisors, but not co-authors.

3. Technical vocabulary

A table of key NLP terms and their translations should be maintained in a publicly accessible location. The translation should use this vocabulary consistently. Terminological questions should be resolved by the translation team. Unresolved issues should be recorded in notes to the vocabulary table.
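
One way to check that the vocabulary is being used consistently is to scan translated text for English terms that still appear verbatim. The following is only a minimal sketch: the `GLOSSARY` entries (German is used purely for illustration) and the function name `find_untranslated_terms` are hypothetical and are not part of NLTK or its build tools.

```python
import re

# Hypothetical vocabulary table: English term -> agreed translation.
# The entries below are illustrative placeholders, not an approved glossary.
GLOSSARY = {
    "tokenization": "Tokenisierung",
    "part-of-speech tag": "Wortart-Tag",
}


def find_untranslated_terms(translated_text, glossary):
    """Return the English glossary terms that still occur in the translated text."""
    leftovers = []
    for term in glossary:
        # Match the English term as a whole word or phrase, ignoring case.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, translated_text, flags=re.IGNORECASE):
            leftovers.append(term)
    return sorted(leftovers)
```

A team could run such a check over each chapter before committing it, and record any deliberate exceptions in the notes to the vocabulary table.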

4. Translator's preface, appendices, language HOWTO

The translated book should have a second preface, identifying the translators and their contributions. Each translated chapter may contain an appendix with extra information on how the topic of the chapter applies to the language in question. The appendix should identify the authors, and the appendices of different chapters might have different authorship. Topics covered in the appendix could include:

  • discussion of particular issues for NLP in the language
  • available resources such as annotated corpora and tools
  • selected publications describing NLP tasks involving the language

In addition, a language HOWTO can be provided, summarizing the application of NLTK to the language in a single place.

5. Language resources

Annotated corpora can be conveniently distributed using NLTK's corpus downloader. Where possible, obtain permission to redistribute existing corpora (or corpus samples) in the language, and add these to NLTK's corpus collection. Submit a ticket to NLTK's issue tracker, giving the URL of the corpus, stating its license, and describing any agreement with the provider of the data.

6. Discussion forum

A mailing list, nltk-xyz, should be established for discussions concerning the translation and the use of NLTK for the given language. Membership of the list will be open to anyone interested in the translation work or in the application of NLTK to the language.

7. Markup, typesetting, dissemination

The translation work should be hosted in the NLTK repository, in the directory trunk/nltk/doc/xy, where xy is the two-letter ISO 639 code for the language. The content should be authored using reStructuredText markup, built using NLTK's build tools, and disseminated via the NLTK website under the terms of the Creative Commons Attribution Non-Commercial License 3.0 (U.S.).
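
The directory convention above can be sketched as a small helper. This is an illustration only: the function name `translation_dir` is hypothetical, and the check covers just the shape of the code (two lowercase letters), not whether the code is actually assigned in ISO 639.

```python
import re
from pathlib import PurePosixPath

# Root of the translated documentation, as given in the guidelines.
DOC_ROOT = PurePosixPath("trunk/nltk/doc")


def translation_dir(lang_code):
    """Return the repository directory for a two-letter language code.

    Hypothetical helper: validates only that the code is two lowercase
    ASCII letters, not that it is a real ISO 639 code.
    """
    if not re.fullmatch(r"[a-z]{2}", lang_code):
        raise ValueError(f"expected a two-letter lowercase code, got {lang_code!r}")
    return DOC_ROOT / lang_code
```

For example, `translation_dir("de")` yields the path `trunk/nltk/doc/de`, while a three-letter code such as `"deu"` is rejected.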

8. Authorship

The authors of the NLTK book will still be identified as authors of the translated work. They will be free to seek advice from anyone concerning the translation, and will be free to re-assign the translation work to others. The translators will be identified in the book, and will be named as authors of the second preface, and of any appendices and HOWTOs.

9. Funding

Note that the NLTK Project is purely voluntary, and no funding is available to support translation work. If a team needs to apply for funding elsewhere, we would be happy to provide a letter of support for the application.
