Process Management in Python, Part 1: The Process Mediator
Introduction
This post is part 1 of a multi-part arc in which I intend to explore process management and multi-threaded application development in Python. My purpose in doing so is to share my experience and offer guidance and recommendations to Python enthusiasts interested in inter-process communication and multi-threaded application development.
Background
Recently I've been working on application prototypes using Python and have had the need to quickly create graphical user interfaces to facilitate research, development and testing. While some might expect such efforts to be fairly straightforward, I found working through the docs to be a somewhat obtuse experience -- particularly with regard to process management. So I thought I'd try to clarify some things here and offer up some patterns for process management in GUI development for anyone who might be interested.
First, What Exactly are Processes?
In modern operating systems, a process is a program in execution: an instance of a program. As such, it comprises a context that includes a program stack and memory holding its data. A key aspect of a computer process (one that seems easy to forget in the age of the graphical user interface) is that execution is always sequential.
Over the course of execution, a process can enter into any of a number of states as illustrated below.
From process launch to completion, the states may be described as follows:
- new: The process is just being created.
- running: The process is sequentially executing instructions.
- waiting: The process is waiting for an event (e.g., I/O or receipt of a signal).
- ready: The process is ready for some CPU time.
- terminated: The process is done (with or without error).
The important thing to understand here is that only one process can be running on a given processor at any particular instant in time. And, depending on the requirements of the program, an associated process may block -- that is, enter a waiting state -- from which it is released on the occurrence of an event or the receipt of some signal.
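To make these states a bit more concrete, here's a small sketch (using the subprocess module introduced in the next section) in which a parent observes a child moving from running to terminated; the one-line child program is just an illustrative placeholder.

import subprocess
import sys

# Spawn a child that sleeps briefly: from the parent's perspective it is
# running (or waiting on its sleep) until it terminates.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(1)"])

print(child.poll())   # None -> the child has not terminated yet
child.wait()          # the parent blocks (a waiting state) until the child exits
print(child.poll())   # 0 -> the child terminated without error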
Process Management in Python Applications
Python provides a number of modules intended to facilitate process management (and, relatedly, multi-threaded application development) -- one of which is multiprocessing. The multiprocessing module is very useful for optimizing code intended for execution in multi-processor environments, but it is overkill for my current needs, so it's out of scope for the present discussion. Instead, in this post I'm focusing on the subprocess module -- which provides a lower-level API for spawning new processes and connecting to their input/output/error pipes.
Inter-process communication is enabled through OS-level pipes. A pipe is essentially a queue of bytes shared between two processes: one process writes into the pipe and another reads from it. Unix and derivative operating systems (and Windows as well) define three standard streams: standard input, standard output, and standard error (termed stdin, stdout, and stderr, respectively). In advanced Python development there is often a need to create child processes and redirect these streams to enable communication with the parent.
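As a minimal illustration of this kind of redirection, the sketch below spawns a child Python process with its standard streams attached to pipes and exchanges a line of text with it; the inline child program is just a stand-in for a real script.

import subprocess
import sys

# Child program: read one line from stdin and echo it back on stdout.
child_code = "import sys; line = sys.stdin.readline(); sys.stdout.write('child saw: ' + line)"

proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdin=subprocess.PIPE,     # parent writes into the child's stdin pipe
    stdout=subprocess.PIPE,    # parent reads from the child's stdout pipe
    stderr=subprocess.STDOUT,  # merge stderr into the same pipe as stdout
    text=True
)

out, _ = proc.communicate("hello from the parent\n")
print(out)  # -> child saw: hello from the parent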
Mediating Inter-Process Communication
The problem with working with subprocesses is that -- while they facilitate re-use -- creating multiple interconnections to enable communication adds complexity that quickly becomes hard to manage and difficult to maintain. Multiprocessing also introduces dependencies among processes that often require coordination among the participants. For example, consider the implementation of a network protocol that requires a number of inter-process interactions to occur in a specific sequence.
Many issues arising in these sorts of scenarios can be prevented and/or addressed through the use of a mediator-type object.
The mediator should encapsulate the control and coordination of interactions among groups of processes, vastly simplifying (among other things):
- The management of shared resources,
- The tracking and enforcement of sequential dependencies, and
- The proper disposition of allocated system resources over the life-cycle of the application.
The mediator facilitates inter-process communication by reducing the overall number of connections among objects. Objects communicate through the mediator, obviating the need to maintain state in a distributed fashion across multiple process instances or to define direct communication protocols. This, in turn, enables the development of more atomic functions, lowering the potential for unwanted side-effects and eliminating whole classes of issues.
The following instance diagram illustrates the relationships among objects with a sample mediator, ProcessDirector.
Notice how the director object mediates communication between subprocesses and client code. Processes communicate with each other and with clients only indirectly, via the mediator. The client doesn't need to "know" any of the internal implementation details of the subprocesses, nor do the processes need to maintain any state related to one another; that is all the responsibility of the director. With the responsibility of directing the behavior of its aggregates scoped to one specific class (and potentially subclasses), the logic can be readily changed and/or replaced by extending or swapping out that single implementation.
The following class diagram highlights key collaborations and functionality in such a sub-system.
The diagram illustrates the aggregate relationship between the ProcessDirector class and its SubProcess instances. The director class can define logic to properly launch and dispose of child processes executing Python scripts, along with business logic to orchestrate sequential behavior. In addition, it serves as a composite whole, maintaining any shared resources (e.g., pipes, queues, streams, etc.) needed to enable communication.
Implementation
The following code example highlights some of the implementation details.
import queue
import subprocess
import sys
import time
from threading import Thread


class ProcessDirector:
    ''' Responsible for spawning and directing processes to execute python scripts. '''

    def __init__(self):
        self.processes = []
        self.t_queue = queue.Queue()

    def launch_process(self, python_script, cmd_ln_args):
        ''' Spawn a child process running the given script and start a reader thread. '''
        launch_sequence = [sys.executable, python_script] + cmd_ln_args
        proc = subprocess.Popen(
            launch_sequence,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            bufsize=1
        )
        self.processes.append(proc)
        t = Thread(target=self.read_proc_output, args=[proc.pid])
        t.daemon = True
        t.start()
        return proc.pid

    def terminate_children(self):
        ''' Terminate all child processes and release their resources. '''
        for i in range(len(self.processes) - 1, -1, -1):
            target = self.processes[i]
            target.terminate()
            target.wait()
            if target.returncode is not None:
                self.processes.pop(i)

    def read_proc_output(self, pid):
        '''
        Read-loop target for the reader thread. Reads stdout of the given child process.

        Arguments:
            pid: The pid (process id) of the process (should be obtained on launch...)
        '''
        proc = next((p for p in self.processes if p.pid == pid), None)
        if proc is None:
            return
        while True:
            # sleep. otherwise you work too hard and heat up the box...
            time.sleep(0.02)
            output = proc.stdout.readline()
            if not output:
                # an empty read means EOF once the child has exited
                if proc.poll() is not None:
                    break
                continue
            self.t_queue.put(output)

    def send_input(self, input_text):
        ''' Send input to the process. '''
        proc = self.processes[0]
        proc.stdin.write(input_text + '\n')
        proc.stdin.flush()

    def process_q(self):
        ''' Drain the thread-safe queue and return the accumulated output. '''
        output = ""
        while not self.t_queue.empty():
            output += self.t_queue.get()
        return output
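For completeness, here is a minimal sketch of the kind of child script the director might launch. The name echo_worker.py and its line-echoing behavior are hypothetical; the point is simply that the child reads stdin and writes (and flushes) stdout so the director's reader thread has something to collect.

# echo_worker.py (hypothetical child script)
# Reads lines from stdin and echoes them back on stdout, flushing so the
# parent's reader thread sees output promptly.
import sys

for line in sys.stdin:
    sys.stdout.write("echo: " + line)
    sys.stdout.flush()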
Analysis
The key points for the present analysis revolve around the creation and disposition of subprocesses and the release of their resources. Notice that the ProcessDirector ...
- Defines logic for launching and maintaining a list of subprocesses given (a) a script name and (b) command-line arguments (which it marshals into a 'launch sequence'),
- Provides a method to properly dispose of the subprocesses it creates -- releasing any system resources and ensuring that no orphaned child processes are left behind, and
- Implements business logic aimed at orchestrating communication between subprocesses and exposing the resulting information to client objects (for example, GUI components).
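Putting these pieces together, a client (say, a GUI controller) might drive the director roughly as follows, assuming the hypothetical echo_worker.py script sketched above.

import time

director = ProcessDirector()
pid = director.launch_process("echo_worker.py", [])  # launch a child and note its pid

director.send_input("ping")      # write a line to the child's stdin
time.sleep(0.1)                  # give the reader thread a moment to collect output
print(director.process_q())      # drain queued output, e.g. "echo: ping"

director.terminate_children()    # dispose of the child and release its resources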
Those details are the main focus of the present post. I'll provide more detail on inter-process communication using threads in the context of GUI development in subsequent additions to this arc.
Discussion
In this post I've explored the application of a mediator pattern to facilitate inter-process communication in Python. The approach involves the use of an object responsible for tracking process creation, disposition, and state over the life-cycle of an application.
This approach centralizes the control of subprocesses and facilitates the management of shared resources. It stands in contrast to federated approaches that rely on dependency injection. A full-blown comparison between the two approaches is out of scope for this post, but suffice it to say that, in my experience, systems that rely too heavily on dependency injection are very difficult to develop and maintain iteratively and incrementally.
The benefits of the centralized approach include:
- simplification of synchronization logic,
- facilitation of the enforcement of sequential dependencies, and
- ease of maintenance and code re-use.
The use-cases for the pattern are ubiquitous and include any application requiring task parallelization -- such as real-time data processing with end-user interaction, simulations, and applications requiring concurrent I/O operations. Similar patterns appear in Python's multiprocessing module with its manager objects. Working through this "homegrown" example helps one better understand the need for, and application of, these patterns.
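For comparison, a manager from the multiprocessing module centralizes shared state in much the same spirit; the sketch below is only meant to show the resemblance, not to serve as a drop-in replacement for the director above.

from multiprocessing import Manager, Process

def worker(shared_queue):
    # The child process communicates only through the manager-owned queue.
    shared_queue.put("result from worker")

if __name__ == "__main__":
    with Manager() as manager:
        q = manager.Queue()  # proxy to a queue owned by the manager process
        p = Process(target=worker, args=(q,))
        p.start()
        p.join()
        print(q.get())       # -> result from worker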
Summary
This is the first part of a multi-post arc exploring multiprocess communication in Python. In this post the focus was on the creation and management of subprocesses and communication between them using Python's subprocess module. In a subsequent post I'll dig a bit deeper into GUI application development using subprocesses and threads.