Show long running process output using servlets

Let's say that a process is launched from a servlet and its output must be displayed, which means sending it back to the browser. Using plain Java, and without going into details, the code looks something like this:

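import java.io.IOException;
import java.io.PrintWriter;
import javax.servlet.ServletException;   // jakarta.servlet.* on Jakarta EE 9+
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// method of an HttpServlet subclass
protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {

    resp.setContentType("text/plain;charset=UTF-8");
    PrintWriter out = resp.getWriter();

    final ProcessBuilder process = new ProcessBuilder("/bin/ping", "-c", "20", "localhost");
    try {
        final Process p = process.start();
        int c;
        // copy the command output to the response one byte at a time; it only
        // reaches the browser when the response buffer fills up or out is closed
        while ((c = p.getInputStream().read()) != -1) {
            out.print((char) c);
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        out.close();
    }
}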

There is more than one problem with this code:

- the output is shown only when the process completes;
- if the stop button is pressed in the browser, the process keeps executing and no new output is shown, which gets worse if the process runs for hours;
- printing the stack trace when an exception is thrown is not very useful.

Flushing the output after each print statement solves the first issue. The last one depends on the command and other factors, so it will not be covered in this post.
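A minimal sketch of the flushing fix, assuming the process writes UTF-8 text. The class name FlushingCopier is hypothetical. It reads the process output through a Reader instead of casting each byte to char (that cast only works for single-byte, ASCII output) and flushes after every character written.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical helper: copies process output to the client, flushing as it goes. */
final class FlushingCopier {

    private FlushingCopier() {
    }

    /**
     * Copies every character from processOutput to out and flushes after each
     * one, so the browser sees output while the process is still running.
     * Returns the number of characters copied.
     */
    static long copy(Reader processOutput, Writer out) {
        long count = 0;
        try {
            int c;
            while ((c = processOutput.read()) != -1) {
                out.write(c);
                out.flush(); // push this character now instead of at process completion
                count++;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return count;
    }
}
```

In doGet the while loop would become something like FlushingCopier.copy(new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8), out). Flushing per character is chatty on the wire. Flushing once per line or per small buffer is a reasonable compromise.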

Long running process issue

To begin with, the process that runs the servlet thread (the JVM) and the process running the launched command are not the same. The servlet thread holds the only reference to the process (Process p). When the stop button is pressed in the browser, the servlet thread ends, the reference is lost, and the process keeps executing. From then on there's no way to read the process output (actually it can be recovered at the OS level, but that is not a topic for this post). A simple solution is to save the process reference in a field for later use, in case the current thread is stopped and a new one wants to show the process output.

Stopping there would take all the fun away, so let's look at the problems/enhancements this solution poses and go beyond it! Using one field to save the current process gives access to only one process output per JVM (the last one saved). Each client's identity can be obtained from the session id (just an example chosen for this post), so let's use a map and save the process reference in it. The proposal is to use the HTTP session id as the key. If the consumer is closed (because of a TCP/IP time-out, the browser stop button being pressed, or something else), the client browser can connect again using the same id as before. Detecting this allows either running a new command or continuing to show the previously interrupted output.

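import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Map;
import javax.servlet.ServletException;   // jakarta.servlet.* on Jakarta EE 9+
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

// field and method of an HttpServlet subclass

// running process per HTTP session id
private final Map<String, Process> cmdStore = new HashMap<>();

protected void doGet(HttpServletRequest req, HttpServletResponse resp)
        throws ServletException, IOException {

    resp.setContentType("text/plain;charset=UTF-8");
    PrintWriter out = resp.getWriter();
    final HttpSession session = req.getSession();

    Process p;
    synchronized (cmdStore) {
        p = cmdStore.get(session.getId());
        if (p == null) {
            // first request for this session: start the command and remember it
            p = new ProcessBuilder("/bin/ping", "-c", "20", "localhost").start();
            cmdStore.put(session.getId(), p);
        }
        // otherwise keep showing the process started by an earlier request
    }

    try {
        InputStream in = p.getInputStream();
        int c;
        while ((c = in.read()) != -1) {
            out.print((char) c);
        }
        // output fully consumed: the session may now start a new command
        synchronized (cmdStore) {
            cmdStore.remove(session.getId());
        }
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        out.close();
    }
}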
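The synchronized blocks around cmdStore can also be expressed with java.util.concurrent.ConcurrentHashMap. That map replaces the single map-wide lock with finer-grained internal locking. This is a minimal sketch that assumes a hypothetical SessionStore wrapper, generic so it could hold a Process or a Cmd. computeIfAbsent atomically starts a command only if the session has none. The two-argument remove forgets a command only if the session still maps to that same instance.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Hypothetical replacement for the synchronized cmdStore map. */
final class SessionStore<T> {
    private final ConcurrentMap<String, T> store = new ConcurrentHashMap<>();

    /** Returns the command running for the session, starting one atomically if absent. */
    T getOrStart(String sessionId, Supplier<T> starter) {
        return store.computeIfAbsent(sessionId, id -> starter.get());
    }

    /** Forgets the command, but only if the session still maps to this exact one. */
    boolean finished(String sessionId, T command) {
        return store.remove(sessionId, command);
    }
}
```

The ConcurrentHashMap documentation asks for the mapping function to be short and simple. The function also cannot throw IOException, so starting a Process inside it means wrapping the checked exception, for example in UncheckedIOException.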

If the process is running, the servlet thread is suddenly stopped, and a new request arrives, only the output not yet consumed will be shown (the consumed part is already gone). The solution implemented here is to decouple the consumption of the process output (the while loop) from the servlet thread. A new thread that consumes the output and stores it for later consumption does the trick. This introduces a new problem: communication. How is the produced output delivered to the servlet thread that consumes it? The basic idea is to let the consumers subscribe to the generated output and let the producer notify each consumer when new output is generated.
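One simple way to sketch this decoupling, assuming UTF-8 output, is a hypothetical ProcessOutputLog. A producer thread drains the process into a shared buffer and keeps everything. Each servlet thread reads from its own offset. Notification here uses the object's monitor (wait/notifyAll) as a stand-in for an explicit subscriber list. The class and method names are illustrative, not the post's actual code.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical output store shared by one producer thread (reading the
 * process) and any number of servlet threads (showing it). Everything
 * produced is kept, so a reconnecting client can replay it from offset 0.
 */
final class ProcessOutputLog {
    private final StringBuilder output = new StringBuilder(); // guarded by this
    private boolean finished;                                  // guarded by this

    /** Called by the producer thread for every character read. */
    synchronized void append(char c) {
        output.append(c);
        notifyAll(); // wake every consumer waiting for new output
    }

    /** Called by the producer thread when the process output ends. */
    synchronized void finish() {
        finished = true;
        notifyAll();
    }

    /**
     * Blocks until there is output past {@code from} or the producer has
     * finished. Returns the new text, or null once everything was consumed.
     */
    synchronized String readFrom(int from) throws InterruptedException {
        while (output.length() <= from && !finished) {
            wait();
        }
        return output.length() > from ? output.substring(from) : null;
    }

    /** Starts a daemon thread that drains the process stream into this log. */
    Thread startProducer(InputStream processOutput) {
        Thread t = new Thread(() -> {
            try (Reader in = new InputStreamReader(processOutput, StandardCharsets.UTF_8)) {
                int c;
                while ((c = in.read()) != -1) {
                    append((char) c);
                }
            } catch (IOException e) {
                // assumption: a read error simply ends the output
            } finally {
                finish();
            }
        }, "process-output-producer");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

A servlet thread would loop on readFrom(offset), print and flush each chunk, and advance offset by the chunk length. A later request from the same session can start again at 0, or at wherever the previous one stopped, because the stored output is never discarded.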

Implementation

Showing all the code here would be pointless and would take up a lot of space, so I'll explain only the relevant classes and code snippets.

The class CmdExecutor implements the servlet that shows the output to the user and holds the code that saves the process reference for later use. Strictly speaking, the saved reference is to an instance of class Cmd. Cmd starts the process and also a new thread (see class Cmd.OutputExtractor) whose main task is to consume the process output, save it, and notify the subscribers. Finally, the class CmdOutputDisplay is used by CmdExecutor to display the process output.
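The post doesn't show these classes. What follows is only a hypothetical reconstruction of how Cmd and Cmd.OutputExtractor could fit together. Every method name (start, watch, subscribe, unsubscribe, awaitEnd) is an assumption, and the output is assumed to be UTF-8. Subscribers are plain Consumer<Character> callbacks. subscribe returns the output saved so far and registers the callback under the same lock, so a reconnecting display neither misses nor repeats characters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical reconstruction of Cmd: a process plus its OutputExtractor thread. */
final class Cmd {
    private final StringBuilder saved = new StringBuilder();                 // guarded by this
    private final List<Consumer<Character>> subscribers = new ArrayList<>(); // guarded by this
    private final OutputExtractor extractor;

    private Cmd(InputStream processOutput) {
        extractor = new OutputExtractor(processOutput);
    }

    /** Starts the command (stderr merged into stdout) and its extractor thread. */
    static Cmd start(String... command) throws IOException {
        Process p = new ProcessBuilder(command).redirectErrorStream(true).start();
        return watch(p.getInputStream());
    }

    /** Starts extracting an already available output stream. */
    static Cmd watch(InputStream processOutput) {
        Cmd cmd = new Cmd(processOutput);
        cmd.extractor.start(); // started only once Cmd is fully constructed
        return cmd;
    }

    /** Atomically returns the output produced so far and subscribes to what follows. */
    synchronized String subscribe(Consumer<Character> subscriber) {
        subscribers.add(subscriber);
        return saved.toString();
    }

    synchronized void unsubscribe(Consumer<Character> subscriber) {
        subscribers.remove(subscriber);
    }

    /** Blocks until the process output has ended. */
    void awaitEnd() throws InterruptedException {
        extractor.join();
    }

    private synchronized void publish(char c) {
        saved.append(c);
        for (Consumer<Character> s : subscribers) {
            s.accept(c); // runs under the lock, so subscribers must be quick
        }
    }

    /** Consumes the process output, saves it and notifies the subscribers. */
    private final class OutputExtractor extends Thread {
        private final InputStream processOutput;

        OutputExtractor(InputStream processOutput) {
            super("cmd-output-extractor");
            this.processOutput = processOutput;
            setDaemon(true);
        }

        @Override
        public void run() {
            try (Reader in = new InputStreamReader(processOutput, StandardCharsets.UTF_8)) {
                int c;
                while ((c = in.read()) != -1) {
                    publish((char) c);
                }
            } catch (IOException e) {
                // assumption: a read error simply ends the output
            }
        }
    }
}
```

In this sketch a CmdOutputDisplay could be one such subscriber. It would write the replayed text and then every new character to the servlet response, ideally through a queue, since the callback runs while Cmd holds its lock.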

Further research

The design involves one thread per servlet request, one for each output (CmdOutputDisplay), and one for each process output consumer (Cmd.OutputExtractor). On a high-traffic site this can mean a lot of resources and extra synchronization. It would be nice to look for a solution that relies less on threads and synchronization. Another topic that caught my attention was using some kind of blocking list to store the process output, that is, a data structure designed to work under heavy concurrent load. It will be fun to create a new one (stay tuned!). Notifying consumers line by line (instead of character by character) would also help if the process output is quite verbose.
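Both of these ideas, a blocking structure for the output and line-based notification, can be tried out with existing JDK pieces before building a new structure. This small sketch uses java.util.concurrent.LinkedBlockingQueue and BufferedReader.readLine. The LinePump name and the END sentinel are hypothetical, and the process is assumed to write UTF-8. A queue hands each line to a single taker, so every display would need its own queue, with the producer adding each line to all of them.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical line-based producer/consumer built on a blocking queue. */
final class LinePump {
    /** Sentinel marking the end of the output; compared by identity on purpose. */
    static final String END = new String("<end of output>");

    private LinePump() {
    }

    /** Starts a daemon thread that queues each output line, followed by END. */
    static BlockingQueue<String> start(InputStream processOutput) {
        BlockingQueue<String> lines = new LinkedBlockingQueue<>();
        Thread t = new Thread(() -> {
            try (BufferedReader in = new BufferedReader(
                    new InputStreamReader(processOutput, StandardCharsets.UTF_8))) {
                String line;
                while ((line = in.readLine()) != null) {
                    lines.add(line); // unbounded queue: add never blocks
                }
            } catch (IOException e) {
                // assumption: a read error simply ends the output
            } finally {
                lines.add(END);
            }
        }, "line-pump");
        t.setDaemon(true);
        t.start();
        return lines;
    }

    /** Consumer side: take() blocks until the next line arrives; stops at END. */
    static List<String> drain(BlockingQueue<String> lines) throws InterruptedException {
        List<String> out = new ArrayList<>();
        for (String line = lines.take(); line != END; line = lines.take()) {
            out.add(line);
        }
        return out;
    }
}
```

In a servlet the consumer would print and flush each line as it is taken instead of collecting the lines into a list.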

Final thoughts

Access to the cmdStore field is synchronized. A common practice to avoid synchronization is to use thread-local variables (variables local to each thread). That cannot be applied here because there is no guarantee that the same thread serves subsequent requests from the same browser. Along the same line of thought, if we stick with synchronized blocks, the first thing that comes to my mind is whether this is really a problem. On a high-traffic site the contention in each of these blocks might be huge, but I'm not sure how bad it is. It would be nice to do some performance testing and profiling. My guess is that the number of running processes or the memory consumed would become the limit before this contention does. That is just a guess, and it also depends on the process characteristics such as run time, memory consumption, etc.

Hope you enjoyed this post!
