Language support for threads was introduced at a late stage in Python's development. At that point, the majority of Python code was not thread-safe, so an approach based on coarse-grained locking was adopted. More specifically, a single global lock, known as the global interpreter lock (GIL), serializes the execution of byte codes. The GIL has remained a major bottleneck when executing multi-threaded Python applications, in particular on multi-processor architectures.
To illustrate the performance degradation imposed by Python's threading model, consider the simple application of multiplying two matrices. A master thread spawns a set of worker threads, which calculate rows in parallel. When all rows have been calculated, the application terminates. The results from a Python version of parallel matrix multiplication are shown in Table 1. The application was run on an 8-way SMP machine with 2 GB of RAM and 200 MHz Pentium Pro processors, running Linux 2.4.18 and Python 2.2. Each Python thread executes 100 byte codes between context switches.¹
The execution times are weighted so that the time used by the single-threaded implementation equals 100; a value above 100 therefore means the run was slower than the single-threaded one. The results in Table 1 show that performance does not increase as more worker threads are added (up to the number of CPUs in the system). Rather, performance decreases because of contention on the GIL. The degradation is most noticeable when a second thread is introduced, as there is no lock contention in a single-threaded application. Performance issues aside, threads in Python present an attractive programming model, since multiple threads can communicate through, for example, synchronized access to standard container types.
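Table 1: Weighted execution times of parallel matrix multiplication in Python (single-threaded time = 100).
| Number of workers    | 1     | 2     | 3     | 4     | 5     | 6     | 7     | 8     |
| Time used (weighted) | 100.0 | 100.7 | 108.1 | 108.7 | 109.1 | 109.2 | 109.2 | 109.4 |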
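The structure of the benchmark described above can be sketched as follows. This is only an illustrative reconstruction written for present-day Python 3, not the code used for the measurements: the function names (`multiply_row`, `worker`, `parallel_matmul`) and the round-robin assignment of rows to workers are assumptions. Note how the workers communicate simply by writing into a shared list.

```python
import threading


def multiply_row(a, b, result, i):
    """Compute row i of the product a * b and store it in result[i]."""
    inner = len(b)
    cols = len(b[0])
    result[i] = [
        sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)
    ]


def worker(a, b, result, rows):
    """Worker thread body: compute every row index in `rows`."""
    for i in rows:
        multiply_row(a, b, result, i)


def parallel_matmul(a, b, num_workers):
    """Master: spawn workers, hand out rows round-robin, wait for all."""
    n = len(a)
    result = [None] * n  # shared list; each worker fills disjoint slots
    threads = []
    for w in range(num_workers):
        rows = range(w, n, num_workers)  # assumed partitioning scheme
        t = threading.Thread(target=worker, args=(a, b, result, rows))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]], num_workers=2))
```

Because every worker must hold the GIL while executing byte codes, a pure-Python computation like this one would not be expected to speed up as `num_workers` grows, which is consistent with the trend in Table 1.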
A common workaround to achieve parallelism on multi-processor architectures is to use multiple processes instead of threads. In such multi-process applications, inter-process communication is usually performed through shared memory or through an ad-hoc messaging API built on, for example, pipes. One drawback of using multiple processes in Python is that programming the communication among processes can be more complex than communication between threads using standard container types.
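To make the added complexity concrete, the following sketch computes matrix rows in a child process and returns them over a pipe. It is a hypothetical illustration, not part of POSH or of the benchmark: the helper names (`send_msg`, `recv_msg`, `compute_rows_in_child`) and the length-prefixed pickle framing are assumptions, and the code relies on `os.fork`, so it only runs on Unix-like systems.

```python
import os
import pickle
import struct


def send_msg(fd, obj):
    """Pickle obj and write it to fd with a 4-byte length prefix."""
    data = pickle.dumps(obj)
    payload = struct.pack("!I", len(data)) + data
    while payload:
        written = os.write(fd, payload)
        payload = payload[written:]


def read_exact(fd, n):
    """Read exactly n bytes from fd, or raise EOFError."""
    chunks = []
    while n > 0:
        chunk = os.read(fd, n)
        if not chunk:
            raise EOFError("pipe closed before the full message arrived")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_msg(fd):
    """Read one length-prefixed pickled object from fd."""
    (length,) = struct.unpack("!I", read_exact(fd, 4))
    return pickle.loads(read_exact(fd, length))


def compute_rows_in_child(a, b, rows):
    """Fork a child that computes the given rows of a * b and pipes them back."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child process: compute, send the result, and exit without cleanup.
        os.close(read_fd)
        try:
            result = {
                i: [
                    sum(a[i][k] * b[k][j] for k in range(len(b)))
                    for j in range(len(b[0]))
                ]
                for i in rows
            }
            send_msg(write_fd, result)
        finally:
            os._exit(0)
    # Parent process: receive the result and reap the child.
    os.close(write_fd)
    try:
        return recv_msg(read_fd)
    finally:
        os.close(read_fd)
        os.waitpid(pid, 0)
```

Compared with threads filling a shared list, the processes here need explicit serialization, message framing, descriptor management and child reaping just to return a few rows.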
This paper presents POSH, an extension module for Python that attempts to address the above problems by enabling the placement of Python objects in shared memory. POSH allows multiple processes to share objects in much the same way that threads share standard Python objects. The following sections present the design, usage and performance evaluation of POSH.