GroupBlog

为什么实现一个自己 shell?

你是一个 Window/Android 用户，你可以直接用图形化桌面直接双击跑一个程序，没压力吧，轻松吧。图形化工具为我们屏蔽了底层进程调用的细节。或者你是一个 Linux 用户，你喜欢用 bash 执行 grep 命令来查找文件中的内容，喜欢用 git 来提交代码，喜欢用 ls 来查看目录内容，有时你还会去使用将多个命令一起使用：cat cat.txt | grep “smelly cat” | wc -c。作为一个 Linux Hacker，你理应当了解这些图形化工具背后的原理，利用它实现一些更好玩的事情。

Part 1

假设有以下文件 1.txt 记录着学长们悲惨的成绩：

zzw   环境编程    33
rzj   环境编程    55
lsh   网络编程    33
hzn   网络编程    55
zzy   数据结构    33
zt    计算机组成原理  55
lsh   计算机组成原理  55
zzy   计算机组成原理  55
xjj   数据结构    33

TASK

执行下面这行的命令的结果是什么？你能解释下原因么？

cat 1.txt | awk '{print $1}' | sort | uniq -c | sort -r -n | head -n 5
grep "rzj" > 2.txt < 1.txt
echo "the answer is 42" > 1.txt

先来学习下基础知识:

IPC 方法

Linux 环境下，进程地址空间相互独立，每个进程各自有不同的用户地址空间。任何一个进程的全局变量在另
一个进程中都看不到，所以进程和进程之间不能相互访问，要交换数据必须通过内核，在内核中开辟一块缓冲区，
进程 1 把数据从用户空间拷到内核缓冲区，进程 2 再从内核缓冲区把数据读走，内核提供的这种机制称为进程间通
信（IPC，InterProcess Communication）。

在进程间完成数据传递需要借助操作系统提供特殊的方法，如：文件、管道、信号、共享内存、消息队列、套
接字、命名管道等。随着计算机的蓬勃发展，一些方法由于自身设计缺陷被淘汰或者弃用。现今常用的进程间通信
方式有：
① 管道 (使用最简单)
② 信号 (开销最小)
③ 共享映射区 (无血缘关系)
④ 本地套接字 (最稳定)

管道

管道的概念：

管道是一种最基本的 IPC 机制，作用于有血缘关系的进程之间，完成数据传递。调用 pipe 系统函数即可创建一
个管道。有如下特质：

其本质是一个伪文件(实为内核缓冲区)
由两个文件描述符引用，一个表示读端，一个表示写端。
规定数据从管道的写端流入管道，从读端流出。
管道的原理: 管道实为内核使用环形队列机制，借助内核缓冲区(4k)实现。
管道的局限性：
① 数据不能进程自己写，自己读。
② 管道中数据不可反复读取。一旦读走，管道中不再存在。
③ 采用半双工通信方式，数据只能在单方向上流动。
常见的通信方式有，单工通信、半双工通信、全双工通信。

多管道通信

多管道通信
第一条

cat 1.txt | awk '{print $1}' | sort | uniq -c | sort -r -n | head -n 5

cat在man手册中的定义

concatenate files and print on the standard output

cat效果

awk在man手册中的定义

pattern scanning and processing language

加入awk后效果

sort在man手册中的定义

sort lines of text files

加入sort后效果

uniq在man手册中的定义

report or omit repeated lines

-c参数在man手册中的定义

prefix lines by the number of occurrences

加入uniq-c后效果

sort的-r参数

reverse the result of comparisons

sort的-n参数

compare according to string numerical value

加入sort -r -n后效果

head在man手册中的定义

output the first part of files

-n参数的定义

print the first NUM lines instead of  the  first  10;  
with  the leading '-', 
print all but the last NUM lines of each file

加入head -n 5后的效果

第二条

grep "rzj" > 2.txt < 1.txt

grep在man手册中的定义

print lines that match patterns

2.txt

echo在man手册中的定义

echo "the answer is 42" > 1.txt

1.txt

此处使用>输出重定向将echo的标准输出输入到了1.txt之中，覆写了1.txt的原有内容
(如果使用>>就不会覆写1.txt的内容，而是将echo的标准输出追加到1.txt的末尾)

Part 2

你的终端 bash 或者 zsh 是如何执行这些命令的呢？参考阅读资料调研 fork(), exec(), pipe(), mkfifo() 等进程 API。

TASK

打造一个绝无伦比的 xxx-super-shell (xxx 是你的名字)，它能实现下面这些功能：

实现管道 (也就是 |)
实现输入输出重定向(也就是 < > >>)
实现后台运行（也就是 & ）
实现 cd，要求支持能切换到绝对路径，相对路径和支持 cd -
屏蔽一些信号（如 ctrl + c 不能终止）
界面美观
开发过程记录、总结、发布在个人博客中
要求：

不得出现内存泄漏，内存越界等错误
学会如何使用 gdb 进行调试，使用 valgrind 等工具进行检测

Example

xxx@xxx ~ $ ./xxx-super-shell
xxx@xxx ~ $ echo ABCDEF
xxx@xxx ~ $ echo ABCDEF > ./1.txt
xxx@xxx ~ $ cat 1.txt
xxx@xxx ~ $ ls -t >> 1.txt
xxx@xxx ~ $ ls -a -l | grep abc | wc -l > 2.txt
xxx@xxx ~ $ python < ./1.py | wc -c
xxx@xxx ~ $ mkdir test_dir
xxx@xxx ~/test_dir $ cd test_dir
xxx@xxx ~ $ cd -
xxx@xxx ~/test_dir $ cd -
xxx@xxx ~ $ ./xxx-super-shell # shell 中嵌套 shell
xxx@xxx ~ $ exit
xxx@xxx ~ $ exit

注：示例请参考 Bash、Zsh 命令

语言要求

语言在 C、C++、Go、Rust 中任选

截止时间

2023-03-25 20：00（UTC+8）

知识要点

懂得如何使用 shell
理解 shell 原理
Linux系统编程：进程控制
gdb
valgrind

参考资料

man 手册.
MichaelKerrisk.Linux/UNIX系统编程手册[M].北京:人民邮电出版社.
W.RichardStevens.Stephen.UNIX环境高级编程[M].第3版.戚正伟,译.北京:人民邮电出版社.

以下是我实现的完整代码,多管道通信采用递归

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>
#include <dirent.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <pwd.h>
#include <signal.h>
#include <readline/readline.h> //用命令下载库,然后在makefile里面-lreadline
#include <readline/history.h>

#define PATH_MAX 4096
#define MAX_ARGS 64
#define MAX_LINE 1024
#define MAX_HISTORY 100
#define MAX_NAME 100

void prompt(void);
int parse_args(char *line, char **args);
void change_dir(char **args, char **history);
void execute(char **args, int args_cnt);
void div_by_pipe(int left, int right, char **args);
int find(char *command);
void redirect(int left, int right, char **args);
void sigint_handler(int signum);
void my_err(const char *err_string, int line);

int cd_cnt = 0;

int main(int argc, char *argv[])
{
   
    char **history = (char **)malloc(sizeof(char *) * MAX_HISTORY);
    for (int i = 0; i < MAX_HISTORY; i++)
    {
   
        history[i] = (char *)malloc(sizeof(char) * PATH_MAX);
    }

    strcpy(history[0], "/home/shawn/");
    int args_cnt;

    char **args = (char **)malloc(sizeof(char *) * MAX_ARGS);
    for (int i = 0; i < MAX_ARGS; i++)
    {
   
        args[i] = (char *)malloc(sizeof(char) * MAX_LINE);
    }

    signal(SIGINT, sigint_handler); // 信号处理函数,好nb

    while (1)
    {
   
        prompt(); // bug每次循环都要打印

        char *line = (char *)malloc(sizeof(char) * MAX_LINE);
        fgets(line, MAX_LINE, stdin);
        line[strlen(line) - 1] = '\0';

        args_cnt = parse_args(line, args);
        // printf("%d\n",args_cnt);//打印参数个数
        if (args_cnt == 0) // 处理直接enter
        {
   
            continue;
        }

        if (strcmp(args[0], "cd") == 0)
        {
   
            change_dir(args, history);
            continue;
        }

        if (strcmp(args[0], "exit") == 0)
        {
   
            break;
        }

        // args_cnt--; // bug
        execute(args, args_cnt);

        free(line);
        free(args);
    }

    return 0;
}

void prompt(void)
{
   
    char cusername[MAX_NAME];
    char at = '@';
    char hostname[MAX_NAME];
    char chostname[MAX_NAME + 50];
    char path[PATH_MAX];
    char cpath[PATH_MAX + 50];

    struct passwd *pw;
    pw = getpwuid(getuid());
    gethostname(hostname, sizeof(hostname));
    getcwd(path, sizeof(path));
    char m = '$';

    sprintf(cusername, "\033[1;32m%s\033[0m", pw->pw_name);
    sprintf(chostname, "\033[1;32m%s\033[0m", hostname);
    sprintf(cpath, "\033[1;34m%s\033[0m", path);

    printf("%s\033[1;32m%c\033[0m%s:%s%c", cusername, at, chostname, cpath, m); // bug可以直接打印有颜色的字符@

    return;
}

int parse_args(char *line, char **args)
{
   
    int args_cnt = 0;
    char *token;
    char *delim = " ";

    token = strtok(line, delim);

    while (token != NULL)
    {
   
        args[args_cnt++] = token; // 更先进的处理方式,不用在main()函数里args_cnt--并且能处理无命令直接enter的情况,这样可以直接反映参数个数
        token = strtok(NULL, delim);
    }

    args[args_cnt] = NULL;
    return args_cnt;
}

void change_dir(char **args, char **history)
{
   
    if (args[1] == NULL) // bug比如cd后啥也没输
    {
   
        return;
    }

    if (strcmp(args[1], "~") != 0 && strcmp(args[1], "-") != 0)
    {
   
        char buf[PATH_MAX];
        getcwd(buf, sizeof(buf));
        strcpy(history[++cd_cnt], buf);
    }

    if (strcmp(args[1], "~") == 0)
    {
   
        char buf[PATH_MAX];
        getcwd(buf, sizeof(buf));
        strcpy(history[++cd_cnt], buf);
        strcpy(args[1], "/home/shawn/");
    }

    if (strcmp(args[1], "-") == 0)
    {
   
        strcpy(args[1], history[cd_cnt]);
        cd_cnt--;
        char buf[PATH_MAX];
        getcwd(buf, sizeof(buf));
        strcpy(history[++cd_cnt], buf);
    }

    if (chdir(args[1]) == -1)
    {
   
        printf("chdir error\n"); // bug不能因为chdir错误就退出程序
    }

    return;
}

void execute(char **args, int args_cnt)
{
   
    pid_t pid;
    pid = fork();
    if (pid == -1)
    {
   
        my_err("fork", __LINE__);
    }

    if (pid == 0)
    {
   
        // int infd = dup(0);
        // int outfd = dup(1);
        div_by_pipe(0, args_cnt, args);
        // dup2(infd, STDIN_FILENO);
        // dup2(outfd, STDOUT_FILENO);
        exit(0);
    }
    else if (pid > 0)
    {
   
        waitpid(pid, NULL, 0);
    }
}

void div_by_pipe(int left, int right, char **args)
{
   
    int pipepos = -1;

    for (int i = left; i < right; i++)
    {
   
        if (strcmp(args[i], "|") == 0) // 大bug没有加=0
        {
   
            pipepos = i;
            break;
        }
    }

    if (pipepos == -1) // bug先于管道|后缺命令执行
    {
   
        redirect(left, right, args);
        return;
    }

    if (pipepos + 1 == right)
    {
   
        printf("behind the | need a command\n");
        return;
    }

    int fd[2];
    if (pipe(fd) == -1)
    {
   
        my_err("pipe", __LINE__);
    }

    pid_t pid = fork();
    if (pid == -1)
    {
   
        my_err("fork", __LINE__);
    }

    if (pid == 0)
    {
   
        close(fd[0]);
        dup2(fd[1], STDOUT_FILENO);
        // close(fd[1]);
        redirect(left, pipepos, args);
        exit(0);
    }
    else if (pid > 0)
    {
   
        waitpid(pid, NULL, 0);

        close(fd[1]);
        dup2(fd[0], STDIN_FILENO);
        // close(fd[0]);
        div_by_pipe(pipepos + 1, right, args);
    }
}

void redirect(int left, int right, char **args)
{
   
    int fd;
    int is_background = 0;

    if (!find(args[left]))
    {
   
        return;
    }

    for (int i = left; i < right; i++)
    {
   
        if (strcmp(args[i], "&") == 0)
        {
   
            is_background = 1;
            args[i] = NULL;
            break;
        }
    }

    for (int i = left; i < right; i++)
    {
   
        if (strcmp(args[i], "<") == 0)
        {
   
            fd = open(args[i + 1], O_RDONLY);
            dup2(fd, STDIN_FILENO);
            close(fd);
            args[i] = NULL;
            // args[i+1]=NULL;//处理<的同时处理文件
            i++;
        }
        if (strcmp(args[i], ">") == 0)
        {
   
            fd = open(args[i + 1], O_WRONLY | O_CREAT | O_TRUNC, 0644);
            dup2(fd, STDOUT_FILENO);
            close(fd);
            args[i] = NULL;
            // args[i+1]=NULL;
            i++;
        }
        if (strcmp(args[i], ">>") == 0)
        {
   
            fd = open(args[i + 1], O_WRONLY | O_CREAT | O_APPEND, 0644);
            dup2(fd, STDOUT_FILENO);
            close(fd);
            args[i] = NULL;
            // args[i+1]=NULL;
            i++;
        }
    }

    pid_t pid = fork();
    if (pid == -1)
    {
   
        my_err("fork", __LINE__);
    }

    if (pid == 0)
    {
   
        // execvp(args[left], args + left);//bug
        char *command[MAX_ARGS];
        for (int i = left; i < right; i++)
        {
   
            // strcpy(command[i], args[i]);//bug
            command[i] = args[i];
        }
        command[right] = NULL; // bug有效参数的个数j
        execvp(command[left], command + left);
    }
    else if (pid > 0)
    {
   
        if (is_background == 0)
        {
   
            waitpid(pid, NULL, 0);
        }
    }
}

int find(char *command)
{
   
    DIR *dir;
    struct dirent *sdp;
    char *path[] = {
   "/bin", "/usr/bin", "./", NULL};

    if (strncmp(command, "./", 2) == 0) // 类似于./a.out
    {
   
        command = command + 2; // 去除./,方便与d_name比较
    }

    for (int i = 0; path[i] != NULL; i++)
    {
   
        dir = opendir(path[i]);
        while ((sdp = readdir(dir)) != NULL)
        {
   
            if (strcmp(command, sdp->d_name) == 0)
            {
   
                closedir(dir);
                return 1;
            }
        }
    }

    closedir(dir);
    return 0;
}

// 信号处理函数
void sigint_handler(int signum)
{
   
    printf("\n");
    fflush(stdout);
}

void my_err(const char *err_string, int line)
{
   
    fprintf(stderr, "line:%d", __LINE__);
    perror(err_string);
    exit(1);
}