Linux Process Control

Published on 2016-07-19

Listing Processes

From the command line, the ps command is the oldest and most common command for listing processes currently running on your system. The top command provides a more screen-oriented approach to listing processes and can also be used to change the status of processes.

Listing processes with ps

The most common utility for checking running processes is the ps command. Use it to see which programs are running, the resources they are using, and who is running them. The following is an example of the ps command:

$ ps u
USER   PID %CPU %MEM  VSZ    RSS   TTY    STAT  START  TIME  COMMAND
jake   2147 0.0  0.7 1836   1020   tty1   S+    14:50  0:00  -bash
jake   2310 0.0  0.7 2592    912   tty1   R+    18:22  0:00  ps u

The USER column shows the name of the user who started the process. Each process is represented by a unique ID number referred to as a process ID (PID). You can use the PID if you ever need to kill a runaway process or send another kind of signal to a process. The %CPU and %MEM columns show the percentages of the processor and random access memory, respectively, that the process is consuming.

VSZ (virtual memory size) shows the total virtual memory allocated to the process, in kilobytes, and RSS (resident set size) shows how much of the process currently resides in physical memory, also in kilobytes. The two values usually differ because VSZ counts everything the process has mapped or reserved, whereas RSS counts only the pages actually held in RAM. RSS therefore excludes any part of the process that has been swapped out.

START shows the time the process began running, and TIME shows the cumulative CPU time used. (Many commands consume very little CPU time, as reflected by 0:00 for processes that haven't even used a whole second of CPU time.)
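
To see the difference between VSZ and RSS on a live process, the following sketch queries both values for the current shell. It assumes the procps version of ps found on most Linux distributions; the variable names are illustrative only.

```shell
#!/bin/bash
# Print VSZ and RSS (both in kilobytes) for the current shell.
# "$$" expands to the shell's PID; a trailing "=" after a column name
# suppresses that column's header, so only the number is printed.
vsz=$(ps -o vsz= -p "$$" | tr -d ' ')
rss=$(ps -o rss= -p "$$" | tr -d ' ')
echo "VSZ=${vsz} KB, RSS=${rss} KB"
```

RSS should come out smaller than VSZ, because only part of the allocated address space is resident in RAM at any moment.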

Many processes running on a computer are not associated with a terminal.

The ps command can be customized to display selected columns of information and to sort information by one of those columns. Using the -o option, you can use keywords to indicate the columns you want to list with ps. The next example lists every running process (-e) and then follows the -o option with every column of information I want to display:

The process ID (pid), username (user), user ID (uid), group name (group), group ID (gid), virtual memory allocated (vsz), resident memory used (rss), and the name of the command that was run (comm). The comm keyword shows only the command name; use args instead if you want the full command line. By default, output is sorted by process ID number.

$ ps -eo pid,user,uid,group,gid,vsz,rss,comm | less
  PID USER       UID GROUP      GID    VSZ   RSS COMMAND
    1 root         0 root         0  19324  1320 init
    2 root         0 root         0      0     0 kthreadd

If you want to sort by a specific column, you can use the --sort= option. For example, to see which processes are using the most memory, I sort by the rss field. By default, that sorts from lowest memory use to highest. Because I want to see the highest ones first, I put a hyphen in front of the field name (--sort=-rss).

$ ps -eo pid,user,group,gid,vsz,rss,comm --sort=-rss | less
  PID USER     GROUP      GID     VSZ    RSS COMMAND
12005 cnegus   cnegus   13597 1271008 522192 firefox
 5412 cnegus   cnegus   13597  949584 157268 thunderbird-bin
25870 cnegus   cnegus   13597 1332492 112952 swriter.bin
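
The same idea can be wrapped in a small helper for repeated use. This is only a sketch: top_by is a hypothetical function name, and the column list is one arbitrary choice. It assumes the procps ps, which supports --sort.

```shell
#!/bin/bash
# top_by KEY COUNT
#   KEY   - a ps sort key such as rss (memory) or pcpu (CPU percentage)
#   COUNT - how many processes to show
# The leading "-" in --sort requests descending order; head keeps the
# header line plus COUNT rows.
top_by() {
    ps -eo pid,user,pcpu,rss,comm --sort="-$1" | head -n "$(( $2 + 1 ))"
}

top_by rss 5     # five largest processes by resident memory
top_by pcpu 3    # three busiest processes by CPU percentage
```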

Listing and changing processes with top

The top command provides a screen-oriented means of displaying processes running on your system. With top, the default is to display processes based on how much CPU time they are currently consuming. However, you can sort by other columns as well. After you identify a misbehaving process, you can also use top to kill (completely end) or renice (reprioritize) that process.

To kill or renice processes owned by other users, you need to run top as the root user. If you just want to display processes, and possibly kill or change your own processes, you can do that as a regular user.

[Figure: an example of the top window]

The following list includes actions you can do with top to display information in different ways and modify running processes:

  • Press h to see help options, and then press any key to return to the top display.
  • Press M to sort by memory usage instead of CPU, and then press P to return to sorting by CPU.
  • Press the number 1 to toggle showing CPU usage of all your CPUs, if you have more than one CPU on your system.
  • Press R to reverse sort your output.
  • Press u and enter a username to display processes only for a particular user.

A common practice is to use top to find processes that are consuming too much memory or processing power and then act on those processes in some way. A process consuming too much CPU can be reniced to give it less priority to the processors. A process consuming too much memory can be killed. With top running, here's how to renice or kill a process:

  • Renicing a process: Note the process ID of the process you want to renice and press r. When the PID to renice: message appears, type the process ID of the process you want to renice. When prompted to Renice PID to value: type in a number from -20 to 19. Only the root user can enter a negative value.
  • Killing a process: Note the process ID of the process you want to kill and press k. Type 15 to terminate cleanly or 9 to just kill the process outright.

Managing Background and Foreground Processes

Starting background processes

If you have programs that you want to run while you continue to work in the shell, you can place the programs in the background. To place a program in the background at the time you run the program, type an ampersand (&) at the end of the command line, like this:

$ find /usr > /tmp/allusrfiles &
[3] 15971

The ampersand (&) runs that command line in the background. Notice that the job number, [3], and process ID number, 15971, are displayed when the command is launched. To check which commands you have running in the background, use the jobs command, as follows:

$ jobs
[1]  Stopped (tty output)  vi /tmp/myfile
[2]  Running        find /usr -print > /tmp/allusrfiles &
[3]  Running        nroff -man /usr/man2/* >/tmp/man2 &
[4]- Running        nroff -man /usr/man3/* >/tmp/man3 &
[5]+ Stopped        nroff -man /usr/man4/* >/tmp/man4

The jobs command accepts a few command-line parameters, as shown in the following table.

Parameter  Description
-l         List the PID of the process along with the job number.
-n         List only jobs that have changed their status since the last notification from the shell.
-p         List only the PIDs of the jobs.
-r         List only the running jobs.
-s         List only stopped jobs.

The first job shows a text-editing command (vi) that I placed in the background and stopped by pressing Ctrl+Z while I was editing. Job 2 shows the find command I just ran. Jobs 3 and 4 show nroff commands currently running in the background. Job 5 had been running in the shell (foreground) until I decided too many processes were running and pressed Ctrl+Z to stop job 5 until a few processes had completed.

The plus sign (+) next to number 5 marks it as the current job, the one most recently stopped or placed in the background. The minus sign (-) next to number 4 marks it as the previous job, the one that was current before job 5. Because job 1 requires terminal input, it cannot run in the background. As a result, it is Stopped until it is brought to the foreground again.
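
In a script, the jobs parameters are a handy way to get at the PID of a background job. The following is a minimal sketch that uses sleep as a stand-in for real work; the variable names are illustrative.

```shell
#!/bin/bash
# Start a background job; the shell stores its PID in $!.
sleep 30 &
bg_pid=$!

# jobs -l shows the job number, PID, state, and command line.
jobs -l

# jobs -p prints only the PID, which is convenient in scripts.
listed_pid=$(jobs -p)
echo "jobs -p reported PID $listed_pid"

# Clean up: terminate the job and reap it. wait returns the job's exit
# status, which is non-zero after a kill, hence the "|| true".
kill "$bg_pid"
wait "$bg_pid" 2>/dev/null || true
```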

Using foreground and background commands

Continuing with the example, you can bring any of the commands on the jobs list to the foreground. For example, to edit myfile again, type:

$ fg %1

As a result, the vi command opens again. All text is as it was when you stopped the vi job.

To refer to a background job (to cancel or bring it to the foreground), use a percent sign (%) followed by the job number. You can also use the following to refer to a background job:

  • %: Refers to the current job, the most recent command put into the background (indicated by the plus sign when you type the jobs command). For example, fg % brings that command to the foreground.
  • %string: Refers to a job where the command begins with a particular string of characters. The string must be unambiguous. (In other words, typing %vi when there are two vi commands in the background results in an error message.)
  • %?string: Refers to a job where the command line contains a string at any point. The string must be unambiguous or the match fails.
  • %-: Refers to the previous job, the one that was current before the most recent job (indicated by the minus sign when you type the jobs command).

If a command is stopped, you can start it running again in the background using the bg command. For example, take job 5 from the jobs list in the previous example:

[5]+ Stopped nroff -man /usr/man4/* >/tmp/man4

Type the following:

$ bg %5

After that, the job runs in the background. Its jobs entry appears as follows:

[5] Running nroff -man /usr/man4/* >/tmp/man4 &

Running Scripts Without a Console

There will be times when you want to start a shell script from a terminal session and then let the script run in background mode until it finishes, even if you exit the terminal session. You can do this by using the nohup command.

The nohup command runs another command while ignoring any SIGHUP signals that are sent to the process. This prevents the process from exiting when you exit your terminal session.
The format used for the nohup command is as follows:

$ nohup ./test1 &
[1] 19831
$ nohup: ignoring input and appending output to  ‘nohup.out’
$

As with a normal background process, the shell assigns the command a job number, and the Linux system assigns a PID number. The difference is that when you use the nohup command, the script ignores any SIGHUP signals sent by the terminal session if you close the session.

Because the nohup command disassociates the process from the terminal, the process loses the STDOUT and STDERR output links. To accommodate any output generated by the command, the nohup command automatically redirects STDOUT and STDERR messages to a file, called nohup.out.

The nohup.out file contains all of the output that would normally be sent to the terminal monitor. After the process finishes running, you can view the nohup.out file for the output results:

$ cat nohup.out
This is a test program
Loop #1
Loop #2
Loop #3
Loop #4
Loop #5
Loop #6
Loop #7
Loop #8
Loop #9
Loop #10
This is the end of the test program
$

The output appears in the nohup.out file just as if the process ran on the command line!
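
nohup falls back to nohup.out only when standard output is a terminal. If you would rather choose the log file yourself, redirect the streams explicitly. The following sketch assumes a throwaway directory from mktemp and an inline sh -c command standing in for a real script such as ./test1.

```shell
#!/bin/bash
workdir=$(mktemp -d)
log_file="$workdir/job.log"

# Redirect all three standard streams explicitly, so nohup has nothing
# left to redirect on its own and does not create nohup.out.
nohup sh -c 'echo "This is a test program"; sleep 1; echo "This is the end of the test program"' \
    > "$log_file" 2>&1 < /dev/null &
job_pid=$!

# Wait for the job here only so the log can be shown right away;
# in real use you would log out and read the file later.
wait "$job_pid"
cat "$log_file"
```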

Killing and Renicing Processes

Killing processes with kill and killall

Although usually used for ending a running process, the kill and killall commands can actually be used to send any valid signal to a running process. Besides telling a process to end, a signal might tell a process to reread configuration files, pause (stop), or continue after being paused, to name a few possibilities.

Signals are represented by both numbers and names. Signals that you might send most commonly from a command include SIGKILL (9), SIGTERM (15), and SIGHUP (1). The default signal is SIGTERM, which tries to terminate a process cleanly. To kill a process immediately, you can use SIGKILL. By convention, many daemons respond to SIGHUP by rereading their configuration files. SIGSTOP pauses a process, while SIGCONT continues a stopped process.

Different processes respond to different signals. Processes cannot block SIGKILL and SIGSTOP signals, however. The following table shows examples of some signals:

Signal   Number    Description
SIGHUP   1         Hang-up detected on controlling terminal or death of controlling process.
SIGINT   2         Interrupt from keyboard.
SIGQUIT  3         Quit from keyboard.
SIGABRT  6         Abort signal from abort(3).
SIGKILL  9         Kill signal.
SIGTERM  15        Termination signal.
SIGCONT  19,18,25  Continue if stopped.
SIGSTOP  17,19,23  Stop process.

Notice that there are multiple possible signal numbers for SIGCONT and SIGSTOP because different numbers are used in different computer architectures. For most x86 and PowerPC architectures, use the middle value. The first value usually works for Alpha and SPARC, while the last one is for the MIPS architecture.
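
Rather than memorizing which number applies to your architecture, you can ask the shell. The following sketch uses the -l option of the bash kill builtin to translate between signal names and numbers; the variable names are illustrative.

```shell
#!/bin/bash
# kill -l translates in both directions.
term_name=$(kill -l 15)       # signal number -> name (without the SIG prefix)
kill_number=$(kill -l KILL)   # signal name -> number
cont_number=$(kill -l CONT)   # architecture-dependent value
stop_number=$(kill -l STOP)   # architecture-dependent value

echo "Signal 15 is SIG${term_name}; SIGKILL is ${kill_number}"
echo "On this machine SIGCONT is ${cont_number} and SIGSTOP is ${stop_number}"
```

Using signal names instead of numbers in your own kill commands avoids the architecture question entirely.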

Using kill to signal processes by PID

Using commands such as ps and top, you can find processes you want to send a signal to. Then you can pass the process ID of that process as an argument to the kill command, along with the signal you want to send.

For example, you run the top command and see that the bigcommand process is consuming most of your processing power:

  PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
10432 chris  20   0  471m 121m  18m S 99.9  3.2  77:01.76 bigcommand

Here, the bigcommand process is consuming 99.9 percent of the CPU. You decide you want to kill it so that other processes have a shot at the CPU. If you use the process ID of the running bigcommand process, here are some examples of the kill command you can use to kill that process:

$ kill 10432
$ kill -15 10432
$ kill -SIGKILL 10432

The default signal sent by kill is 15 (SIGTERM), so the first two examples have exactly the same results. On occasion, a SIGTERM doesn't kill a process, so you may need a SIGKILL to kill it. Instead of -SIGKILL, you can use -9.

Another useful signal is SIGHUP. Some server processes, such as the httpd process, which provides web services, respond to a SIGHUP (1) signal by rereading their configuration files. The command service httpd reload (in RHEL 6) or systemctl reload httpd (RHEL 7) likewise tells the httpd processes running on your system that configuration files need to be read again. If the httpd process had a PID of 1833, you could use either of these commands to have it read configuration files again:

# kill -1 1833
# systemctl reload httpd
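
A common pattern is to try SIGTERM first and fall back to SIGKILL only if the process refuses to exit. The following is a sketch of that pattern, not a definitive recipe: a background sleep stands in for the runaway process, and the five-second grace period is an arbitrary assumption.

```shell
#!/bin/bash
# Stand-in for a runaway process (hypothetical; use a real PID in practice).
sleep 60 &
target=$!

kill -TERM "$target"    # ask politely first (same as plain "kill")

# Give the process up to five seconds to exit. "kill -0" sends no
# signal; it only tests whether the PID still exists.
for _ in 1 2 3 4 5; do
    kill -0 "$target" 2>/dev/null || break
    sleep 1
done

# Still alive? Escalate to SIGKILL, which cannot be caught or ignored.
if kill -0 "$target" 2>/dev/null; then
    kill -KILL "$target"
fi

# Collect the exit status: 128 plus the number of the fatal signal.
exit_status=0
wait "$target" 2>/dev/null || exit_status=$?
echo "exit status: $exit_status"
```

Because sleep does not catch SIGTERM, the escalation step is skipped here and the reported status is 143 (128 + 15).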

Generating Signals

The bash shell allows you to generate two basic Linux signals using key combinations on the keyboard. This feature comes in handy if you need to stop or pause a runaway program.

Interrupting a Process

The Ctrl+C key combination generates a SIGINT signal and sends it to the process currently running in the foreground of the shell. You can test this by running a command that normally takes a long time to finish, and pressing the Ctrl+C key combination:

$ sleep 100
^C
$

The Ctrl+C key combination terminates the current process running in the shell. The sleep command pauses the operation for the specified number of seconds. Normally, the command prompt wouldn't return until the timer has expired. By pressing the Ctrl+C key combination before the timer expires, you can cause the sleep command to terminate prematurely.

Pausing a Process

Instead of terminating a process, you can pause it in the middle of whatever it's doing. Sometimes this can be a dangerous thing (for example, if a script has a file lock open on a crucial system file), but often it allows you to peek inside what a script is doing without actually terminating the process.

The Ctrl+Z key combination generates a SIGTSTP signal, stopping the process running in the foreground of the shell. Stopping a process is different from terminating it, as stopping the process leaves the program still in memory and able to continue running from where it left off.

When you use the Ctrl+Z key combination, the shell informs you that the process has been stopped:

$ sleep 100
^Z
[1]+  Stopped                 sleep 100
$

The number in the square brackets is the job number assigned by the shell. The shell refers to each process running in the shell as a job and assigns each job a unique job number. It assigns the first process started job number 1, the second job number 2, and so on.

If you have a stopped job assigned to your shell session, bash will warn you if you try to exit the shell:

$ exit
logout
There are stopped jobs.
$

You can view the stopped job by using the ps command:

$ ps au
USER PID   %CPU %MEM   VSZ  RSS TTY   STAT  START   TIME COMMAND
rich 20560  0.0  1.2  2688 1624 pts/0  S    05:15   0:00 -bash
rich 20605  0.2  0.4  1564  552 pts/0  T    05:22   0:00 sleep 100
rich 20606  0.0  0.5  2584  740 pts/0  R    05:22   0:00 ps au
$

In the STAT column, the ps command shows the status of the stopped job as T. This indicates the command is either being traced or is stopped.

If you really want to exit the shell with the stopped job still active, just type the exit command again. The shell will exit, terminating the stopped job. Alternatively, now that you know the PID of the stopped job, you can use the kill command to send a SIGKILL signal to terminate it:

$ kill -9 20605
$
[1]+  Killed                  sleep 100
$

When you kill the job, initially you won't get any response. Each time the shell produces a prompt, it also displays the status of any jobs that have changed state, so the next time you do something that produces a prompt (such as pressing Enter), you'll see a message indicating that the job was killed.
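
The T state can also be reproduced from a script by sending the stop and continue signals with kill. This sketch uses SIGSTOP rather than the SIGTSTP that Ctrl+Z generates, because no terminal is involved. The short sleeps are an assumption to give the kernel time to update the process state, and the procps ps is assumed.

```shell
#!/bin/bash
sleep 30 &
pid=$!

kill -STOP "$pid"    # pause the process
sleep 0.2            # allow a moment for the state change
state_stopped=$(ps -o stat= -p "$pid" | tr -d ' ')

kill -CONT "$pid"    # let it continue where it left off
sleep 0.2
state_resumed=$(ps -o stat= -p "$pid" | tr -d ' ')

echo "after SIGSTOP: $state_stopped, after SIGCONT: $state_resumed"

# Clean up the test process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```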

Trapping Signals

Instead of leaving signals to their default handling, you can trap them when they appear and perform other commands. The trap command allows you to specify which Linux signals your shell script watches for and intercepts. If the script receives a signal listed in the trap command, the signal's default action is suppressed and the script runs the commands you specified instead.

The format of the trap command is:

trap commands signals

That's simple enough. On the trap command line, you just list the commands you want the shell to execute, along with a space-separated list of signals you want to trap. You can specify the signals either by their numeric value or by their Linux signal name.

Here's a simple example of using the trap command to intercept SIGINT and SIGTERM signals:

$
$ cat test1
#!/bin/bash
# testing signal trapping
#
# Print a message instead of terminating on Ctrl+C (SIGINT) or SIGTERM
trap "echo ' Sorry! I have trapped Ctrl-C'" SIGINT SIGTERM
echo This is a test program
count=1
while [ $count -le 10 ]
do
  echo "Loop #$count"
  sleep 5
  count=$(( count + 1 ))
done
echo This is the end of the test program
$

The trap command used in this example displays a simple text message each time it detects either the SIGINT or SIGTERM signal. Trapping these signals makes this script impervious to the user attempting to stop the program by using the bash shell keyboard Ctrl+C command:

$
$ ./test1
This is a test program
Loop #1
Loop #2
Loop #3
^C Sorry! I have trapped Ctrl-C
Loop #4
Loop #5
Loop #6
Loop #7
^C Sorry! I have trapped Ctrl-C
Loop #8
Loop #9
Loop #10
This is the end of the test program
$

Each time the Ctrl+C key combination was used, the script executed the echo statement specified in the trap command instead of letting the signal terminate the script.
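
You can check the same behavior without touching the keyboard by having a script signal itself. This is a minimal sketch: the child script is passed inline to bash -c, and it sends SIGTERM (one of the two trapped signals) to its own PID.

```shell
#!/bin/bash
# Run a small child script that traps SIGTERM, then signals itself.
# Inside the child, $$ is the child's own PID.
output=$(bash -c '
trap "echo Trapped SIGTERM, continuing" TERM
kill -TERM $$
echo "Still running after the signal"
')
echo "$output"
```

The trap command runs as soon as the kill command finishes, and the script then carries on with the next line instead of terminating.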

Trapping a Script Exit

Besides trapping signals in your shell script, you can also trap the exit of the shell script itself. This is a convenient way to perform commands just as the shell finishes its job.

To trap the shell script exiting, use EXIT in place of a signal name on the trap command. EXIT is not a real signal; it is a pseudo-signal that the shell raises when the script ends:

$ cat test2
#!/bin/bash
# trapping the script exit

trap "echo byebye" EXIT

count=1
while [ $count -le 5 ]
do
   echo "Loop #$count"
   sleep 3
   count=$(( count + 1 ))
done
$ 
$ ./test2
Loop #1
Loop #2
Loop #3
Loop #4
Loop #5
byebye
$

When the script gets to the normal exit point, the trap is triggered, and the shell executes the command you specify on the trap command line. The EXIT trap also works if you prematurely exit the script:

$ ./test2
Loop #1
Loop #2
^Cbyebye

$

When the Ctrl+C key combination is used to send a SIGINT signal, the script exits (because that signal isn't listed in the trap list), but before the script exits, the shell executes the trap command.
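
A typical practical use of the EXIT trap is cleaning up temporary files. The following sketch assumes files created with mktemp; the work is done in a subshell so that the EXIT trap fires when the subshell ends, and the marker file exists only so the result can be inspected afterward.

```shell
#!/bin/bash
marker=$(mktemp)    # records the scratch file's path for inspection later

(
    scratch=$(mktemp)
    # Remove the scratch file whenever this subshell exits.
    trap 'rm -f "$scratch"' EXIT
    echo "$scratch" > "$marker"
    echo "working with $scratch"
)

scratch_path=$(cat "$marker")
rm -f "$marker"

if [ -e "$scratch_path" ]; then
    echo "scratch file still exists"
else
    echo "scratch file was removed by the EXIT trap"
fi
```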

Removing a Trap

You can remove a set trap by using a dash as the command and a list of the signals you want to return to normal behavior:

$ cat test3
#!/bin/bash
# removing a set trap

trap "echo byebye" EXIT

count=1
while [ $count -le 5 ]
do
   echo "Loop #$count"
   sleep 3
   count=$(( count + 1 ))
done
trap - EXIT
echo "I just removed the trap"
$ 
$ ./test3
Loop #1
Loop #2
Loop #3
Loop #4
Loop #5
I just removed the trap
$

Once the trap is removed, the script no longer runs the trap command, and the signal or exit event goes back to its default handling. However, if a signal is received before the trap is removed, the script processes it per the trap command:

$ ./test3
Loop #1
Loop #2
^Cbyebye

$

In this example, a Ctrl+C key combination was used to terminate the script prematurely. Because the script was terminated before the trap was removed, the script executed the command specified in the trap.

Using killall to signal processes by name

With the killall command, you can signal processes by name instead of by process ID. The advantage is that you don't have to look up the process ID of the process you want to kill. The potential downside is that you can kill more processes than you mean to if you are not careful. (For example, typing killall bash may kill a bunch of shells that you don't mean to kill.)

Like the kill command, killall uses SIGTERM (signal 15) if you don't explicitly enter a signal number. Also as with kill, you can send any signal you like to the process you name with killall. For example, if you see a process called testme running on your system and you want to kill it, you can simply type the following:

$ killall -9 testme

The killall command can be particularly useful if you want to kill a bunch of commands of the same name.

Setting processor priority with nice and renice

When the Linux kernel tries to decide which running processes get access to the CPUs on your system, one of the things it takes into account is the nice value set on the process. Every process running on your system has a nice value between –20 and 19. By default, the nice value is set to 0. Here are a few facts about nice values:

  • The lower the nice value, the more access to the CPUs the process has. In other words, the nicer a process is, the less CPU attention it gets. So, a –20 nice value gets more attention than a process with a 19 nice value.
  • A regular user can set nice values only from 0 to 19. No negative values are allowed. So a regular user can't ask for a value that gives a process more attention than most processes get by default.
  • A regular user can set the nice value higher, not lower. So, for example, if a user sets the nice value on a process to 10, and then later wants to set it back to 5, that action will fail. Likewise, any attempt to set a negative value will fail.
  • A regular user can set the nice value only on the user's own processes.
  • The root user can set the nice value on any process to any valid value, up or down.

You can use the nice command to run a command with a particular nice value. After a process is running, you can change its nice value using the renice command, along with the process ID of the process. The following example starts updatedb with a nice value of 5:

# nice -n 5 updatedb &

The updatedb command is used to generate the locate database manually by gathering names of files throughout the file system. In this case, I just wanted updatedb to run in the background (&) and not interrupt work being done by other processes on the system. I ran the top command to make sure that the nice value was set properly:

PID USER        PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
20284 root      25   5 98.7m  932  644 D  2.7  0.0   0:00.96 updatedb

Notice that under the NI column, the nice value is set to 5. Because the command was run as the root user, the root user can lower the nice value later by using the renice command. (Remember that a regular user can't reduce the nice value or ever set it to a negative number.) Here's how you would change the nice value for the updatedb command just run to –5:

# renice -n -5 20284

If you ran the top command again, you might notice that the updatedb command is now at or near the top of the list of processes consuming CPU time because you gave it priority to get more CPU attention.
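
You can check the effect of nice without opening top. Run with no arguments, the GNU nice command prints the niceness it was started with. The sketch below relies on that behavior; note that the -n value is added to the current niceness rather than set as an absolute value.

```shell
#!/bin/bash
base=$(nice)                 # niceness of the current shell, normally 0
adjusted=$(nice -n 5 nice)   # inner nice reports the niceness it was started with
echo "shell niceness: $base, child niceness: $adjusted"
```

Because this only raises the nice value, it works for a regular user as well as for root.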

Limiting Processes with cgroups

You can use a feature like “nice” to give a single process more or less access to CPU time. Setting the nice value for one process, however, doesn't apply to child processes that a process might start up or any other related processes that are part of a larger service. In other words, “nice” doesn't limit the total amount of resources a particular user or application can consume from a Linux system.

As cloud computing takes hold, many Linux systems will be used more as hypervisors than as general-purpose computers. Their memory, processing power, and access to storage will become commodities to be shared by many users. In that model, more needs to be done to control the amount of system resources to which a particular user, application, container, or virtual machine running on a Linux system has access.

That's where cgroups come in.

Cgroups can be used to identify a process as a task, belonging to a particular control group. Tasks can be set up in a hierarchy where, for example, there may be a task called daemons that sets default limitations for all daemon server processes, then subtasks that may set specific limits on a web server daemon (httpd) or FTP service daemon (vsftpd).

As a task launches a process, other processes the initial process launches (called child processes) inherit the limitations set for the parent process. Those limitations might say that all the processes in a control group have access only to particular processors and certain sets of RAM. Or they may allow access only to up to 30 percent of the total processing power of a machine.

The types of resources that can be limited by cgroups include the following:

  • Storage (blkio)—Limits total input and output access to storage devices (such as hard disks, USB drives, and so on).
  • Processor scheduling (cpu)—Assigns the amount of access a cgroup has to be scheduled for processing power.
  • Process accounting (cpuacct)—Reports on CPU usage. This information can be leveraged to charge clients for the amount of processing power they use.
  • CPU assignment (cpuset)—On systems with multiple CPU cores, assigns a task to a particular set of processors and associated memory.
  • Device access (devices)—Allows tasks in a cgroup to open or create (mknod) selected device types.
  • Suspend/resume (freezer)—Suspends and resumes cgroup tasks.
  • Memory usage (memory)—Limits memory usage by task. It also creates reports on memory resources used.
  • Network bandwidth (net_cls)—Limits network access to selected cgroup tasks. This is done by tagging network packets to identify the cgroup task that originated the packet and having the Linux traffic controller monitor and restrict packets coming from each cgroup.
  • Network traffic (net_prio)—Sets priorities of network traffic coming from selected cgroups and lets administrators change these priorities on the fly.
  • Name spaces (ns)—Separates cgroups into namespaces, so processes in one cgroup can only see the namespaces associated with the cgroup. Namespaces can include separate process tables, mount tables, and network interfaces.

Creating and managing cgroups, at its most basic level, is generally not a job for new Linux system administrators. It can involve editing configuration files to create your own cgroups (/etc/cgconfig.conf) or limit particular users or groups (/etc/cgrules.conf). Or you can use the cgcreate command to create cgroups, which results in those groups being added to the /sys/fs/cgroup hierarchy. Setting up cgroups can be tricky and, if done improperly, can make your system unbootable.

The reason I introduce cgroups here is to help you understand some of the underlying features in Linux that you can use to limit and monitor resource usage. In the future, you will probably run into these features from controllers that manage your cloud infrastructure. You will be able to set rules like: “Allow the marketing department's virtual machines to consume up to 40 percent of the available memory” or “Pin the database application to a particular CPU and memory set.”
