Question: What is the role of active_mm in Linux virtual memory management?

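How the question came up: while reading the Linux virtual memory management source, I came across copy_mm(), which copies an mm and assigns both tsk->mm and tsk->active_mm, so I went back over what each of the two fields is for.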

static int copy_mm(unsigned long clone_flags, struct task_struct * tsk)
{
	struct mm_struct * mm, *oldmm;
	int retval;

	tsk->min_flt = tsk->maj_flt = 0;   // reset the memory-management statistics kept in task_struct
	tsk->cmin_flt = tsk->cmaj_flt = 0;
	tsk->nswap = tsk->cnswap = 0;

	tsk->mm = NULL;
	tsk->active_mm = NULL;

	/*
	 * Are we cloning a kernel thread?
	 *
	 * We need to steal a active VM for that..
	 */
	oldmm = current->mm;   // the mm of the currently running process is the source of the copy
	if (!oldmm)   // current is a kernel thread with no mm, so return immediately
		return 0;

	if (clone_flags & CLONE_VM) {
		// CLONE_VM is set: the child shares the parent's mm
		atomic_inc(&oldmm->mm_users);   // increment mm_users
		mm = oldmm;
		goto good_mm;   // good_mm sets tsk->mm and tsk->active_mm, then returns success
	}

	retval = -ENOMEM;
	mm = allocate_mm();
	if (!mm)
		goto fail_nomem;

	/* Copy the current MM stuff.. */
	memcpy(mm, oldmm, sizeof(*mm));
	if (!mm_init(mm))
		goto fail_nomem;

	if (init_new_context(tsk,mm))
		goto free_pt;

	down_write(&oldmm->mmap_sem);
	retval = dup_mmap(mm);
	up_write(&oldmm->mmap_sem);

	if (retval)
		goto free_pt;

	/*
	 * child gets a private LDT (if there was an LDT in the parent)
	 */
	copy_segments(tsk, mm);

good_mm:
	tsk->mm = mm;
	tsk->active_mm = mm;
	return 0;

free_pt:
	mmput(mm);
fail_nomem:
	return retval;
}

Earlier notes:

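A process address space is described by an mm_struct, so each process has exactly one mm_struct, and that structure is shared by all threads running in the process's user space. Kernel threads have no need for an mm_struct, so their task_struct->mm field is always NULL.
TLB flushes performed on behalf of tasks that never access user space are wasted work, so Linux uses a technique called "lazy TLB" to avoid them. The task borrows the mm_struct of the previous task and stores it in task_struct->active_mm, which avoids the call to switch_mm() and the TLB flush that comes with it.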

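When lazy TLB mode is entered, enter_lazy_tlb() is called on SMP systems to ensure that the mm_struct is not shared between processors; on UP machines it is a no-op.
Lazy TLB is also used during process exit: start_lazy_tlb() is called while the exiting process waits to be reaped by its parent.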
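
The borrowing step described above can be sketched as a small user-space model. This is only a sketch under stated assumptions: the structures are cut down to the fields discussed here, switch_mm_model() and context_switch_model() are hypothetical stand-ins written for this note (not the kernel's own functions), and locking, per-CPU state and enter_lazy_tlb() are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Cut-down stand-ins for the kernel structures (assumption: only the
 * fields discussed in these notes are modelled). */
struct mm_struct {
	int mm_count;                 /* lazy references, plus one for the real users */
};

struct task_struct {
	struct mm_struct *mm;         /* NULL for a kernel thread */
	struct mm_struct *active_mm;  /* address space currently loaded for this task */
};

static int tlb_switches;          /* number of simulated address-space switches */

/* Hypothetical stand-in for switch_mm(): only count real switches. */
static void switch_mm_model(struct mm_struct *prev, struct mm_struct *next)
{
	if (prev != next)
		tlb_switches++;
}

/* Model of the address-space part of a context switch. */
static void context_switch_model(struct task_struct *prev,
				 struct task_struct *next)
{
	struct mm_struct *oldmm = prev->active_mm;

	if (!next->mm) {
		/* Kernel thread: borrow what is already loaded, no switch. */
		next->active_mm = oldmm;
		oldmm->mm_count++;
	} else {
		switch_mm_model(oldmm, next->mm);
	}

	if (!prev->mm) {
		/* prev was only a borrower: hand its reference back. */
		prev->active_mm = NULL;
		oldmm->mm_count--;
	}
}

/* User task A -> kernel thread K -> user task A again.
 * Returns how many simulated address-space switches were needed. */
static int demo_lazy_switch(void)
{
	struct mm_struct mm_a = { 1 };
	struct task_struct a = { &mm_a, &mm_a };
	struct task_struct k = { NULL, NULL };

	tlb_switches = 0;
	context_switch_model(&a, &k);   /* K borrows mm_a */
	assert(k.active_mm == &mm_a && mm_a.mm_count == 2);
	context_switch_model(&k, &a);   /* mm_a is still loaded */
	assert(k.active_mm == NULL && mm_a.mm_count == 1);
	return tlb_switches;
}
```

In this model, running a kernel thread between two time slices of the same process costs no address-space switch at all; a switch is only counted when the next task with a real mm uses a different mm_struct from the one left loaded.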

mm_struct has two reference counts, mm_users and mm_count.

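mm_users: the number of processes accessing the user-space portion of this mm_struct, such as the page tables and file mappings. Threads, for example, increment this count so that the mm_struct is not torn down too early. When the count drops to 0, exit_mmap() removes all mappings and frees the page tables, and then mm_count is decremented.
mm_count: the number of anonymous users of the mm_struct. An anonymous user does not care about the contents of user space and is merely borrowing the mm_struct; kernel threads using lazy TLB switching are an example. When this count drops to 0, the mm_struct itself can safely be freed.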
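
The two-stage release described by these two counters can be sketched as follows. This is a user-space model, not kernel code: mmput_model() and mmdrop_model() are hypothetical names chosen to echo the kernel's mmput() and mmdrop(), the two bool fields are invented purely to make the state observable, and atomic operations and locking are omitted.

```c
#include <assert.h>
#include <stdbool.h>

struct mm_struct {
	int mm_users;        /* real users of the user-space portion */
	int mm_count;        /* lazy users, plus one while real users exist */
	bool mappings_live;  /* illustration only: mappings and page tables present */
	bool freed;          /* illustration only: the structure has been released */
};

/* Drop one anonymous reference; release the structure on the last one. */
static void mmdrop_model(struct mm_struct *mm)
{
	if (--mm->mm_count == 0)
		mm->freed = true;
}

/* Drop one real reference; tear down user space on the last one. */
static void mmput_model(struct mm_struct *mm)
{
	if (--mm->mm_users == 0) {
		mm->mappings_live = false;  /* stands in for exit_mmap() */
		mmdrop_model(mm);           /* give up the real users' shared reference */
	}
}

/* Two threads share an mm while a kernel thread borrows it lazily.
 * Returns the state after both threads have exited. */
static struct mm_struct demo_two_stage_release(void)
{
	struct mm_struct mm = { 2, 1, true, false };

	mm.mm_count++;       /* lazy user takes an anonymous reference */
	mmput_model(&mm);    /* first thread exits: nothing is torn down yet */
	assert(mm.mappings_live);
	mmput_model(&mm);    /* second thread exits: user space is torn down */
	return mm;
}
```

After demo_two_stage_release() the mappings are gone but the structure is still alive, because the lazy user's reference keeps mm_count at 1; only a final mmdrop_model() call releases it.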

New questions:
Q1: What is an anonymous user?
Q2: What are mm_users and mm_count for?

Q1: What is an anonymous user?

Looking this up, I found Linus's reply to the question "Is there a brief description someplace on how "mm" vs. "active_mm" in the task_struct are supposed to be used?":
https://www.kernel.org/doc/Documentation/vm/active_mm.txt

The email introduces two concepts I had not come across before: “real address spaces” and “anonymous address spaces”.
an anonymous address space doesn’t care about the user-level page tables at all, so when we do a context switch into an anonymous address space we just leave the previous address space active.
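This describes anonymous address spaces clearly enough; real address spaces are simply the opposite case. The email also says that kernel threads are basically treated as anonymous address spaces. With that, the term "anonymous user" makes sense: it is a task, such as a kernel thread, that runs on a borrowed address space. Q1 is resolved.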

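This also gave me a deeper understanding of the original question (why does task_struct have active_mm in addition to mm?):
Kernel threads do not care about real address spaces, that is, a kernel thread can run on top of any process's address space. When the context is switched to a kernel thread, the real address space that is still loaded has to be recorded somewhere, and task_struct->active_mm is that record. For an ordinary process tsk->active_mm is the same as tsk->mm; for a kernel thread tsk->mm is NULL and tsk->active_mm points to the borrowed user address space, which is how the kernel knows which address space the current execution flow is running on.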

Q2: What are mm_users and mm_count for?

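For Q2, I combined the Linux virtual memory management book with Linus's email reply. mm_struct carries two counters because the real users of an address space and the lazy (anonymous) users that merely borrow it have different lifetimes, which matters on multiprocessor systems. As the email points out, the last real user may exit on another CPU while a lazy user still has the address space active, so an mm_struct can end up being used only by lazy users. One counter therefore tracks the real users, and the other keeps the structure itself alive: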

a “mm_users” counter that is how many “real address space users” there are.
a “mm_count” counter that is the number of “lazy” users (ie anonymous users) plus one if there are any real users.

The initialization in INIT_MM supports this:

#define INIT_MM(name) \
{							\
	mm_rb:		RB_ROOT,			\
	pgd:		swapper_pg_dir, 		\
	mm_users:	ATOMIC_INIT(2), 		\
	mm_count:	ATOMIC_INIT(1), 		\
	mmap_sem:	__RWSEM_INITIALIZER(name.mmap_sem), \
	page_table_lock: SPIN_LOCK_UNLOCKED, 		\
	mmlist:		LIST_HEAD_INIT(name.mmlist),	\
}

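As shown, INIT_MM initializes mm_count to 1 even though mm_users is 2, which matches the rule that all real users together contribute a single reference to mm_count.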
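
The counting rule quoted from the email can be written down as a one-line formula and checked against the INIT_MM values. This is only an illustrative sketch: expected_mm_count() and check_init_mm_values() are hypothetical helpers written for this note, and it is an assumption here that init_mm has no lazy users at the moment it is statically initialized.

```c
#include <assert.h>

/* mm_count as described in the email: the number of lazy (anonymous)
 * users, plus one if there is at least one real user. */
static int expected_mm_count(int real_users, int lazy_users)
{
	return lazy_users + (real_users > 0 ? 1 : 0);
}

/* INIT_MM sets mm_users to 2 and mm_count to 1. Assuming no lazy users
 * exist yet, the rule gives 0 + 1 = 1, matching the initializer. */
static void check_init_mm_values(void)
{
	assert(expected_mm_count(2, 0) == 1);
}
```

The same formula also covers the case from the email where every real user has exited but one lazy user remains: expected_mm_count(0, 1) is still 1, so the mm_struct stays allocated.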

References:
[1] Linus Torvalds. https://www.kernel.org/doc/Documentation/vm/active_mm.txt. 1999-07-30
[2] 白洛. 深入理解Linux虚拟内存管理. 2006-1
[3] Mel Gorman. Understanding the Linux Virtual Memory Manager. 2004-5-9