9cd22e907ddc25e36dcd18c493631aa457acbd6d max Mon Jan 29 08:14:25 2024 -0800 updating python style guide
diff --git python/style.txt python/style.txt
index b55a218..3439c7c 100644
--- python/style.txt
+++ python/style.txt
@@ -1,112 +1,113 @@
 Style Guide for Python Code
 
-CODE CONVENTIONS
+The browser uses very few Python scripts. Most are one-shot scripts that were used when building
+a track. We archive them in this repo but rarely run them anymore.
 
-Follow the Python coding conventions laid out by the Python Style guide, except for the
-UCSC Genomics Group specific conventions outlined below.
-    http://www.python.org/dev/peps/pep-0008/
+CGIs in Python
+
+We have only one or two CGI scripts in Python (e.g. hgGeneGraph, and hgMirror, which
+runs only on GBIB), and they do not get much use. However, they do exist,
+and the pyLib directory contains hgLib3.py with ports of, e.g., the menu,
+bottleneck, cart parsing and CGI argument parsing code, often with the same function
+names as their kent C equivalents. So writing CGIs in Python is possible, as
+long as they are not computationally intensive. We do not use a special Python
+web server; so far we run Python CGIs the same way we run C programs. This costs us
+200 msec at startup, but makes managing our web servers much easier. For the
+two CGIs, that is certainly sufficient.
+
+PYTHON VERSIONS
+
+Python 2 is no longer used anywhere, and Python 3 is now required. Version
+incompatibility is a vexing problem in Python, sometimes even among the 3.x
+versions. You can usually work around it by sticking to the basic features of
+Python 3.6 or so and avoiding the most advanced features. Testing on a very
+recent Python version can help. hgLib3.py uses a single external package, the
+MySQL library, which comes with it. It should be possible to not
+
+CALLING C CODE
+
+It's possible to call C library functions directly from Python, but in practice
+we only run C binaries as separate processes (via exec()) because of memory management issues.
+If you find yourself doing a lot of that, it may be better to write C directly.
+
+CODE CONVENTIONS
 
 INDENTATION AND SPACING
 
 Each block of code is indented by 4 spaces from the previous block. Do not use tabs to
 separate blocks of code. The indentation convention differs from the C coding style found in
 src/README, which uses 4-base indents/8-base tabs.
 
 Common editor configurations for disallowing tabs are:
     vim: Add "set expandtab" to .vimrc
     emacs: Add "(setq-default indent-tabs-mode nil)" to .emacs
 
 Lines are no more than 100 characters wide.
 
 INTERPRETER DIRECTIVE
 
-The first line of any UCSC Python script should be:
-    #!/usr/bin/env python2.7
+The first line of any Python script should be:
+    #!/usr/bin/env python3
 
-This line will invoke python2.7 found in the user's PATH. It ensures that scripts developed
+This line invokes the python3 found in the user's PATH. It ensures that scripts developed
 by UCSC can be distributed and explicitly states which Python version was used to develop
 the scripts.
 
+The kent repo contains a few Python 2.7 scripts. These are mostly archived
+versions of scripts that are no longer run.
+
 NAMING CONVENTIONS
 
 Use mixedCase for symbol names: the leading character is not capitalized and all successive
 words are capitalized. (Classes are an exception, see below.) Non-UCSC Python code may follow
 other conventions and does not need to be adapted to these naming conventions.
 
 Abbreviations follow rules in src/README:
 
 Abbreviation of words is strongly discouraged. Words of five letters and less should
 generally not be abbreviated. If a word is abbreviated in general it is abbreviated
 to the first three letters:
     tabSeparatedFile -> tabSepFile
 In some cases, for local variables abbreviating to a single letter for each word is okay:
     tabSeparatedFile -> tsf
 In complex cases you may treat the abbreviation itself as a word, and only the first
 letter is capitalized:
     genscanTabSeparatedFile -> genscanTsf
 Numbers are considered words.
 You would represent "chromosome 22 annotations" as "chromosome22Annotations" or
 "chr22Ann." Note the capitalized 'A" after the 22.
 
-Packages and Modules
-
-In Python, a package is represented as a directory with an __init__.py file in it,
-and contains some number of modules, which are represented as files with a .py extension.
-A module may in turn contain any number of related classes and methods. This differs from Java,
-where one file correlates to one class: in Python it is correct to treat one module similar to
-a whole namespace in Java.
-
-In general try to keep modules on the order of 100's of lines.
-
-Internal packages and modules should have short names in mixedCase, with no spaces or underscores.
-A good example of this style is the ucscGb package:
-
-    ucscGb/
-        __init__.py
-        ra.py
-        cv.py
-        ...
-
-    For more information:
-        http://docs.python.org/tutorial/modules.html
-
 Imports
 
 The most correct way to import something in Python is by specifying its containing module:
     import os
     from ucscGb import ra
 
 Then, the qualified name can be used:
     someRa = ra.RaFile()
 
 Do not import as below, as this may cause local naming conflicts:
     from ucscGb.ra import RaFile
     from ucscGb.track import *
 
 Imports should follow the structure:
 
-    1. Each import should be on a separate line, unless modules are from the same package:
-        import os
-        import sys
-
-        from ucscGb import ra, track, qa
-
-    2. Imports should be at the top of the file. Each section should be separated by a blank line:
+    1. Imports should be at the top of the file. Each section should be separated by a blank line:
         a. standard library imports
         b. third party package/module imports
         c. local package/module imports
 
     For more information, see the "Imports" section:
         http://www.python.org/dev/peps/pep-0008/
 
 Classes
 
 CapitalCase names. Note the leading capital letter to distinguish between a ClassName and a
 functionName.
 Underscores are not used, except for private internal classes, where the name is preceded by
 double underscores which Python recognizes as private.
@@ -121,80 +122,41 @@ Functions
 
 mixedCase names. The leading character is not capitalized, but all successive words are
 capitalized.
 
 In general try to keep methods around 20 lines.
 
 Variables
 
 mixedCase names. Underscores are not used, except for private internal variables, where the
 name is preceded by double underscores which Python recognizes as private.
 
 COMMENTING
 
-Note: Still working out which automation document tool to use.
-
-Automated documentation is carried out using the Epydoc tool:
-    http://epydoc.sourceforge.net/
-
 Comments should follow the conventions:
 
     1. Every module should have a paragraph at the beginning. Single class modules may omit
     paragraph in favor of class comment.
 
     2. Use Python's docstring convention to embed comments, using """triple double quotes""":
         http://www.python.org/dev/peps/pep-0257/
 
-    3. Use Epytext markup language conventions when commenting:
-        http://epydoc.sourceforge.net/epytext.html
-
-    4. Use Epytext field tags to describe specific properties of objects:
-
-        Structure:
-
-        a. Fields must be placed at the end of a docstring.
-
-        b. Each field is distinguished by the following pattern:
-            @tag: body
-            @tag arg: body
-
-        c. All blocks pertaining to a field must have equal indentation
-        greater than or equal to field tag indentation.
-
-        d. Optional field tags to use:
-
-            i. @param
-                Description of parameter to a function
-
-            ii. @return
-                Description of a function's return value
-
-        def exampleFunction():
-            """
-            This paragraph describes the object.
-
-            @param inputFile: Input file name
-
-            @return: This is a description of the function's return value
-            """
-
-    For more information and supported fields, see:
-        http://epydoc.sourceforge.net/fields.html#fields
-
 TESTING
 
-Testing is carried out using the unittest module in Python:
+Testing can be carried out using the unittest module in Python:
     http://docs.python.org/library/unittest.html
 
 This module allows for self-running scripts, which are self-contained and should provide their
 own input and output directories and files. The scripts themselves are composed of one or more
 classes, all of which inherit from unittest.TestCase and contain one or more methods which use
 various asserts or failure checks to determine whether a test passes or not.
 
 Structure:
 
     1. At the start of a script import unittest module:
         import unittest
 
     2. A test case is created as a sub-class of unittest.TestCase:
         class TestSomeFunctions(unittest.TestCase):