Tracking down the ^Z problem in vim
Coders who use vim a lot may be familiar with an irritating race condition that appears on some systems: You save the source code, suspend vim using ^Z, and then immediately start typing a command (such as "make"), and sometimes the first few characters of the command are lost. The obvious workaround is to always wait for the shell prompt after typing ^Z, but this gets annoying after a while. I decided to dig deeper and see if I could find a quick fix.
Origin of the problem
My first thought was that this problem was due to asynchronous signal handling. The tty layer generates a SIGTSTP when ^Z is pressed, and if this signal is blocked during a save operation, further key presses will be queued up inside the tty. Then, after the job suspends, the shell might restore terminal modes using a call to tcsetattr with TCSAFLUSH, causing the queued data to be discarded. This would have called for a fix at the shell level. However, this turned out to be far from the truth.
The fact is that vim doesn't use the SIGTSTP mechanism at all, choosing instead to interpret ^Z as regular input. You can observe this by dumping the tty settings for a terminal running vim; you will find that isig is disabled.
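The same check can be made from a program instead of by reading stty output. The following is a minimal sketch, not anything taken from vim; the helper name isig_enabled is hypothetical, and it assumes you can open the terminal in question (or that it is the program's own standard input).

```c
#include <termios.h>
#include <unistd.h>

/* Returns 1 if the terminal on fd turns ^C, ^Z and friends into signals
 * (ISIG set), 0 if it hands them to the application as ordinary input,
 * and -1 if fd is not a terminal or the query fails. */
int isig_enabled(int fd)
{
    struct termios t;

    if (tcgetattr(fd, &t) == -1)
        return -1;
    return (t.c_lflag & ISIG) ? 1 : 0;
}
```

Called as isig_enabled(STDIN_FILENO) from a program sharing the terminal with a running vim, this would be expected to return 0, matching the disabled isig flag described above.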
Depending on the underlying filesystem and device drivers, saving a file to disk can easily take half a second. As strace will tell you, vim updates both a swapfile and the actual file, and makes sure to fsync both of them. Meanwhile, characters are received by the tty. On several occasions during the save operation, as well as after it has completed, vim performs a select followed by a read into a large (4 KB) input buffer. This will unfortunately grab not only the ^Z character, but also any subsequent characters typed during the save operation; characters that were intended for the shell.
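The effect is easy to reproduce outside vim. Here is a rough sketch of the same select-then-read pattern; the name greedy_read and the INBUF_SIZE constant are made up for illustration, and the code is not lifted from vim's source.

```c
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

#define INBUF_SIZE 4096  /* assumed size, matching the 4 KB mentioned above */

/* Poll fd without blocking and, if data is pending, read as much as fits.
 * Returns the number of bytes read, 0 if nothing was pending (or at end of
 * file), and -1 on error. */
ssize_t greedy_read(int fd, char *buf, size_t cap)
{
    fd_set rfds;
    struct timeval tv = { 0, 0 };
    int ready;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    ready = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ready <= 0)
        return ready;   /* 0: nothing pending, -1: select failed */
    return read(fd, buf, cap);
}
```

If the bytes ^Z, "make" and a newline are already queued when this runs, a single call returns all six of them. Nothing in the read interface lets the caller stop at the ^Z.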
As a side note, one of the fsync operations is ineffective, at least in vim 7.3. When vim needs to overwrite an existing file, it first renames the old file, then writes to a fresh inode, syncs it, and finally deletes the old file. Curiously, it does not call fsync on the directory that contains the saved file. Thus, if there is a power loss on a journaling file system, the new directory entry may never have reached the disk, and the file might end up containing the old data after all. However, this is not a problem for the swapfile.
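For reference, syncing a directory is just a matter of opening it and calling fsync on the descriptor. This is a minimal sketch of that missing step, with a hypothetical helper name (fsync_dir); it is not a patch for vim.

```c
#include <fcntl.h>
#include <unistd.h>

/* Flush a directory's entries (names added, renamed or removed) to disk.
 * Returns 0 on success and -1 on failure. */
int fsync_dir(const char *dirpath)
{
    int fd, ret;

    fd = open(dirpath, O_RDONLY);
    if (fd == -1)
        return -1;
    ret = fsync(fd);
    close(fd);
    return ret;
}
```

An editor would call this on the containing directory after the new file has been written, synced and given its final name.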
Some complications
Back to the problem at hand. Note that, since select and read are called multiple times, it won't be enough to patch the tty layer to cause read to return immediately when a ^Z has been typed. Furthermore, it might not scale well to have vim read one character at a time from the terminal, especially for large paste operations from X or screen. And we can't patch vim to stuff characters back into the tty after they have been read. While there is an ioctl for injecting characters into the input buffer (TIOCSTI), it appends them at the tail of the queue, behind anything typed in the meantime, so they would get pushed in from the wrong end.
Another idea would be for the tty to hold back anything typed after a ^Z, until we can somehow detect from the tty side that the ^Z has been carried out, e.g. by waiting for the foreground process group to change. But this is problematic when the application does not interpret the ^Z as a suspend character, such as when vim is in insert mode.
Getting closer
There is a much cruder approach, however: Simply remove the calls to fsync. This way, the save operation will complete quickly, and the race becomes all but impossible to lose, even for a really fast typist. Vim actually provides options for this, so there's no need to patch it. All we have to do is ":se nofsync" and ":se swapsync=".
But what about protection from data loss? Most systems will sync automatically every 30 seconds or so, and one might argue that this is good enough. But the fact of the matter is that our little workaround has had a negative effect on the overall robustness of the system. Can we do better?
Many shells offer a way of computing a dynamic prompt. We observe that whenever a job is suspended, we get a new prompt. Hence, if we instruct the shell to sync the filesystem as part of computing the new prompt, we get the robustness back. This could be as simple as:
In .bash_profile:
PROMPT_COMMAND=sync
In .zprofile:
precmd(){sync}
Unfortunately, this will slow down the system. We will synchronise too much, too often. We may invoke sync as a disowned background job using "(sync&)", in order to get to the prompt more quickly, but we are still synchronising the entire filesystem, not just the recently saved data.
The fix
A better solution would be to make vim call fsync asynchronously, e.g. from a child process. The following short C program is a wrapper library (for use with LD_PRELOAD) that does precisely that, and works really well in practice.
You will have to decide for yourself whether you prefer to use the wrapper, to patch vim to do something similar, or to simply turn off the sync options as outlined above.
lazysync.c:
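#include <sys/types.h>
#include <sys/wait.h>
#include <sys/syscall.h>
#include <unistd.h>

int fsync(int fd) {
	pid_t pid;

	pid = fork();
	if(pid == 0) {
		/* Child: hand the real fsync to a grandchild and exit at once. */
		if(fork() == 0) {
			syscall(__NR_fsync, fd);
		}
		_exit(0);
	} else if(pid > 0) {
		/* Parent: move the child out of vim's process group, then reap it. */
		setpgid(pid, pid);
		waitpid(pid, 0, 0);
	}
	return 0;
}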
To build:
gcc -o liblazysync.so -fPIC -shared lazysync.c -lc
Wrapper in ~/bin/vim:
#!/bin/sh
LD_PRELOAD=path/to/liblazysync.so exec /usr/bin/vim "$@"
The replacement fsync forks, and the child is placed in a new process group so it won't get suspended along with vim when the save completes. The child spawns a grandchild to make the actual call to fsync. This way, we can waitpid on the child before returning to the application, to avoid leaving lots of zombie processes around. Note that vim does not install a SIGCHLD handler.
In order to call fsync without just recursing back to the wrapper itself, the syscall interface is used. To my knowledge, this is Linux specific, but other operating systems may offer similar functionality.
Final words
The fix presented here doesn't actually remove the race condition; it just causes the save operation to complete faster than you can type. Occasionally, though in my experience infrequently, the save operation is delayed for other reasons. Then the problem remains.
For a proper solution, one would have to redesign the low-level input handling code in vim, possibly around SIGTSTP.
Posted Wednesday 18-Dec-2013 15:29
Discuss this page
Sun 22-Dec-2013 07:59
anyway, thanks for the tip as it happens so often here too ;-)
Sun 13-Apr-2014 16:13
^Z will suspend the process and allow it to be resumed later with "fg" (in Bash and similar shells).
This can matter if it takes a long time to start up Vim again (slow system, slow disk), or if you have a lot of windows or tabs you don't want to have to reconfigure.
Linus Åkesson
Sun 17-Aug-2014 21:33
I agree that it's probably not necessary to suspend vim just to run make. A better example would be running the actual program that's being developed. In my experience, sometimes you want to suspend the program, then browse the source code, and then resume the program again; sometimes you want to run the program with debug output piped to less, and then switch back and forth between the less process and the editor; and so on.
Thu 23-Dec-2021 22:03
TMux "fixed" quite some problems in my daily workflow to the good.
P.S. You have a talent to explain difficult thing easy! THANX for explaining your fixes this way.
vscd[N]