- rCore on Windows platform
- rust: install Rust with the default settings, then set `rust-toolchain.toml` to the specified nightly version.
- qemu: the QEMU version must be 7.0.0; otherwise, the startup address won't be 0x0000000.
Required tools:
rustup target add riscv64gc-unknown-none-elf
cargo install cargo-binutils
rustup component add llvm-tools-preview
rustup component add rust-src
and the gdb tool: link
Here we implement the basic environment to run an application and print a "hello world" string on the QEMU simulator console.
We remove Rust's standard library, use only the `core` library, and set a `panic_handler`.
Introduce the relevant bootstrap process in QEMU.
In order for the implementation to interface properly with QEMU, we adjust the linker's behavior through a linker script (`src/linker.ld`) and set the config (`.cargo/config`).
The basics of the stack and function calls in assembly.
jal rd, imm[20:1]
    rd <- pc + 4
    pc <- pc + imm

jalr rd, (imm[11:0])rs
    rd <- pc + 4
    pc <- rs + imm

- rs: source register (x0 ~ x31)
- rd: destination register (x0 ~ x31)
- imm: immediate
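The register effects of the two jump instructions above can be modeled in a few lines. This is a toy sketch of the semantics only (plain `u64` values, no instruction decoding, and it omits jalr's clearing of the target's low bit):

```rust
// Toy model of the jal/jalr register effects described above.
fn jal(pc: u64, imm: i64) -> (u64, u64) {
    // returns (rd, new pc): rd <- pc + 4, pc <- pc + imm
    (pc + 4, pc.wrapping_add(imm as u64))
}

fn jalr(pc: u64, rs: u64, imm: i64) -> (u64, u64) {
    // returns (rd, new pc): rd <- pc + 4, pc <- rs + imm
    (pc + 4, rs.wrapping_add(imm as u64))
}

fn main() {
    // jal stores the return address and jumps pc-relative
    let (ra, pc) = jal(0x8000_0000, 0x100);
    assert_eq!(ra, 0x8000_0004);
    assert_eq!(pc, 0x8000_0100);
    // a plain `ret` is jalr x0, 0(ra): jump back to the saved return address
    let (_, pc2) = jalr(pc, ra, 0);
    assert_eq!(pc2, 0x8000_0004);
}
```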
- Callee-saved: registers the callee must preserve.
- Caller-saved: registers the caller must save if it needs them across a call.
Registers:

| register | saver |
|---|---|
| a0 ~ a7 (x10 ~ x17) | caller |
| t0 ~ t6 (x5 ~ x7, x28 ~ x31) | caller |
| s0 ~ s11 (x8 ~ x9, x18 ~ x27) | callee |

zero(x0), ra(x1), sp(x2), gp(x3), tp(x4), fp(s0)
SEE: Supervisor Execution Environment
AEE: Application Execution Environment
Privilege-level software and hardware co-design
ecall / eret

Privileged instructions:
- sret: return from S mode to U mode.
- wfi: wait for interrupt.
- sfence.vma: flush the TLB cache.
- CSR instructions to access sepc/stvec/scause/sscratch/stval/sstatus/satp.
Then we write 5 programs that run in user space.
These programs can be tested on Linux with `qemu-riscv64`, a command that is not available on Windows.
The CPU's cache of physical memory is divided into two parts:
- the data cache (d-cache)
- the instruction cache (i-cache)
which are used when the CPU accesses data and fetches instructions, respectively.
Here we only consider privilege-level changes between RISC-V U mode and S mode.
CSRs (Control and Status Registers) involved in a Trap:

| CSR name | function in a Trap |
|---|---|
| sstatus | current privilege-level info |
| sepc | if the current trap is an exception, records the address of the trapping instruction |
| scause | the reason for this trap |
| stval | additional information about this trap |
| stvec | the entry address of the trap handler code |
stvec = MODE[1:0] + BASE[63:2]
- MODE == 0: Direct mode
- MODE == 1: Vectored mode
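The bit layout above can be sketched as plain bit arithmetic. A minimal model of packing/unpacking the stvec fields (the real CSR is written with a `csrw` instruction, not a function call):

```rust
// stvec packs BASE[63:2] and MODE[1:0] into one 64-bit CSR value.
const MODE_DIRECT: u64 = 0;
const MODE_VECTORED: u64 = 1;

fn stvec_write(base: u64, mode: u64) -> u64 {
    // base must be 4-byte aligned so its low 2 bits are free for MODE
    assert_eq!(base & 0b11, 0);
    base | mode
}

fn stvec_base(stvec: u64) -> u64 {
    stvec & !0b11
}

fn stvec_mode(stvec: u64) -> u64 {
    stvec & 0b11
}

fn main() {
    let v = stvec_write(0x8020_0000, MODE_DIRECT);
    assert_eq!(stvec_base(v), 0x8020_0000);
    assert_eq!(stvec_mode(v), MODE_DIRECT);
    assert_eq!(stvec_mode(stvec_write(0x4000, MODE_VECTORED)), 1);
}
```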
ecall: U traps into S
- sstatus.SPP records the previous privilege level (U/S)
- sepc records the trapping instruction address
- scause/stval record trap information
- the CPU jumps to stvec, and the privilege level is set to S

sret: S returns to U
- sstatus.SPP selects the privilege level to return to (U/S)
- the CPU jumps to sepc
User stack and kernel stack (just wrappers around byte arrays).
#[repr(C)]
pub struct TrapContext {
pub x: [usize; 32],
pub sstatus: Sstatus,
pub sepc: usize,
}
General purpose registers x0 ~ x31, plus `sstatus` and `sepc`.
The overall process of Trap handling is as follows:
- First save the Trap context on the kernel stack via __alltraps.
- Jump to the trap_handler function written in Rust to complete Trap dispatch and handling.
- When trap_handler returns, use __restore to recover registers from the Trap context stored on the kernel stack.
- Finally, return to the application through an sret instruction.
Instructions for reading and writing CSR are a class of instructions that can complete multiple read and write operations without interruption.
We use Python to build the linker script, replacing the application start address with the correct address at which the i-th application should be placed.
Considering a one-time execution of an application as a task, an execution segment or idle segment within a time span of the application's execution is called a computing task slice or idle task slice. Switching from one program's task to another's is called task switching.
Comparing task switching with Trap control flow:
- It does not involve privilege-level switching.
- Part of it is done with the help of the compiler.
- Like Trap, it is transparent to the application.

In fact, task switching is a switch between the Trap control flows of two different applications in the kernel.
When an application traps into S mode, its trap control flow can call a special function __switch. After __switch returns, execution continues from the code following the call. That means the previous trap control flow of application A is paused and switched out, and the CPU handles another trap control flow of application B. Later, at the right time, the trap control flow of application A is switched back in from the trap control flow of some application C (not necessarily the one it originally switched to).
The function __switch changes the stack.
When performing the switch operation, we must save certain registers (the Task Context). The Task Context is stored in the TaskManager, inside a TaskControlBlock, which is responsible for saving the state of a task.
TaskContext *current_task_cx_ptr = &tasks[current].task_cx;
TaskContext *next_task_cx_ptr = &tasks[next].task_cx;
pub struct TaskContext {
ra: usize,
sp: usize,
s: [usize; 12],
}
- ra: records where the __switch function should jump to continue execution after returning, so that after the task switch completes, `ret` lands at the correct position.
- sp: the kernel stack pointer of the app.
- s: stores s0 ~ s11.
A TaskControlBlock
contains: TaskStatus
, TaskContext
.
- Software Interrupt
- Timer Interrupt
- External Interrupt
At the S privilege level, the interrupt-mask CSRs are sstatus and sie.
sie has bits for three interrupt types: ssie / stie / seie.
- When an interrupt happens, sstatus.sie is saved into sstatus.spie, and sstatus.sie is cleared to zero.
- After the interrupt handler completes, sret returns to the place interrupted by the interrupt and continues execution. Hardware restores sstatus.sie from the value of sstatus.spie.
In the RISC-V 64 architecture there is a 64-bit counter CSR mtime, and another 64-bit CSR mtimecmp; when the value of mtime exceeds mtimecmp, a timer interrupt is raised.
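The mtime/mtimecmp mechanism is how a periodic tick is programmed: on each timer interrupt, set mtimecmp one tick's worth of counter increments into the future. A sketch of that arithmetic, where `CLOCK_FREQ` is an assumed platform constant (12.5 MHz matches QEMU's virt board, but check your platform) and the real kernel would write mtimecmp through an SBI call:

```rust
// How far mtime advances per second is platform-defined (assumed here).
const CLOCK_FREQ: u64 = 12_500_000;
// Desired tick rate: 100 timer interrupts per second.
const TICKS_PER_SEC: u64 = 100;

/// Compute the next mtimecmp value: the timer interrupt fires
/// once mtime exceeds this value.
fn next_trigger(mtime_now: u64) -> u64 {
    mtime_now + CLOCK_FREQ / TICKS_PER_SEC
}

fn main() {
    // 12_500_000 / 100 = 125_000 counter increments per 10 ms tick
    assert_eq!(next_trigger(0), 125_000);
    assert_eq!(next_trigger(1_000_000), 1_125_000);
}
```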
- *const T / *mut T
- &T / &mut T
- Box<T>
- Rc<T> / Arc<T>
- RefCell<T>
- Mutex<T>
We use the crate buddy_system_allocator as the heap allocator, tagging it with the #[global_allocator] semantic item.
buddy_system_allocator::LockedHeap implements the GlobalAlloc interface.
We use a static mut byte array (in the kernel's .bss segment) as the heap area.
with the help of hardware:
virtual address -> CPU.MMU(Memory Management Unit) -> Physical address
Address Translation
- Every application's address space is limited to a constant bound; each application's virtual addresses range over $[0, bound)$.
- Physical memory is split into slots of the same size, except for the kernel-reserved space.
- Each slot has a base address and a visitable range $[base, base + bound)$.

In this case, the MMU only needs two registers: base and bound, plus a bitmap to track which slots are in use.
However, this may waste too much memory (internal fragmentation).
Improving it with per-application base/bound sizes helps, but may still waste too much memory (external fragmentation).
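The base/bound check itself is a one-liner. A minimal sketch of this translation scheme (not the paged MMU the chapter goes on to build):

```rust
// Base/bound address translation: a virtual address is valid iff it is
// below `bound`, and then maps to `base + va`.
struct BaseBoundMmu {
    base: usize,
    bound: usize,
}

impl BaseBoundMmu {
    fn translate(&self, va: usize) -> Option<usize> {
        if va < self.bound {
            Some(self.base + va)
        } else {
            None // out of [0, bound): access fault
        }
    }
}

fn main() {
    let mmu = BaseBoundMmu { base: 0x8040_0000, bound: 0x1_0000 };
    assert_eq!(mmu.translate(0x100), Some(0x8040_0100));
    assert_eq!(mmu.translate(0x1_0000), None);
}
```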
- The kernel manages physical memory in units of pages.
- Each application's virtual address space is divided into several virtual pages.
- Available physical memory is divided into several physical frames of the same size.
- Each application's address space consists of several virtual pages.
- VPN: Virtual Page Number
- PPN: Physical Page Number
- Each application has its own Page Table (residing in memory, managed by the kernel), recording which physical frame each virtual page maps to.
- In the Page Table, the key is the VPN and the value is the PPN.
- Protection bits rwx are also set per entry.
(VPN, Offset) -> MMU -> (PPN, Offset)
Here we use SV39, provided by the RISC-V architecture.
By default, the MMU is disabled; modifying the CSR satp enables it.
After that, addresses in S mode or U mode are all treated as virtual addresses: they must be translated into physical addresses by the MMU before physical memory is accessed. Addresses in M mode are still treated as physical addresses.
- MODE: 0-visit physical address/8-Sv39
- ASID: Address space identifier.
- PPN: The physical page number where the root page table is located.
- V(Valid): valid if current bit is 1
- R(Read)/W(Write)/X(Execute)
- U(User)
- G (Global)
- A(Accessed)
- D(Dirty)
Allocation on demand, using a trie-like multi-level page table.
- V == 0 => invalid (null) entry
- V == 1 && R/W/X == 0 => a valid page-table entry whose PPN points to the next-level page table
- V == 1 && R/W/X != 0 => a valid leaf entry containing the physical page number corresponding to the virtual address
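The walk over this trie is driven by slicing the virtual address into a 12-bit page offset and three 9-bit VPN fields, one per level. A sketch of the Sv39 field extraction (bit layout per the RISC-V privileged spec):

```rust
// Split an Sv39 virtual address into its page-walk indexes:
// offset = va[11:0]; vpn[0] = va[20:12], vpn[1] = va[29:21], vpn[2] = va[38:30].
fn sv39_indexes(va: u64) -> ([usize; 3], u64) {
    let offset = va & 0xfff;
    let mut vpn = [0usize; 3];
    for (i, slot) in vpn.iter_mut().enumerate() {
        // vpn[0] is the lowest 9-bit field, vpn[2] the highest;
        // the walk starts from vpn[2] at the root page table
        *slot = ((va >> (12 + 9 * i)) & 0x1ff) as usize;
    }
    (vpn, offset)
}

fn main() {
    let (vpn, off) = sv39_indexes(0x8020_0abc);
    assert_eq!(off, 0xabc);
    assert_eq!(vpn, [0, 1, 2]);
}
```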
TLB: Translation Lookaside Buffer.
sfence.vma with a virtual address operand refreshes only the mapping relevant to that address.
With paging enabled, when kernel code wants to access a virtual address, it must go through MMU address translation. We give each application a kernel stack, and a Trampoline.
Each kernel stack reserves a Guard Page as an empty hole.
Process: An execution process of a program under the management of the operating system.
- fork: fork a new process; everything is the same except the a0 register (the function return value). The hard-coded root is the initial process.
- waitpid: when a process calls the exit system call, the resources it occupies cannot be fully reclaimed at once (e.g. its process kernel stack). When the process exits, it is marked as a Zombie Process; its parent process then collects its exit state and reclaims all the resources it occupies through waitpid.
- exec: accepts a parameter path: &str, similar to char* in C.
Each process has its own unique process identifier.
PidHandle (PidAllocator): allocator <-alloc/dealloc-> pid (usize).
KernelStack: contains a pid.
// immutable
pid: PidHandle,
kernel_stack: KernelStack,
// mutable, wrapped in UPSafeCell:
trap_cx_ppn: PhysPageNum,
base_size: usize,
task_cx: TaskContext,
task_status: TaskStatus,
memory_set: MemorySet,
parent: Option<Weak<TaskControlBlock>>,
children: Vec<Arc<TaskControlBlock>>,
exit_code: i32,
- new(elf_data) -> TaskControlBlock
- exec(&self, elf_data)
- fork(self: &Arc) -> Arc
TaskManager: Round Robin
algorithm
ready_deque: VecDeque<Arc<TaskControlBlock>>
Processor: Maintain CPU state
current: Option<Arc<TaskControlBlock>>,
idle_task_cx: TaskContext,
run_task()
:
Here we cannot rely on the compiler to automatically release the borrow guards (task_inner, processor), because we call __switch before the automatic release would happen.
schedule(switched_task_cx_ptr: *mut TaskContext)
:
unsafe {
__switch(
switched_task_cx_ptr,
idle_task_cx_ptr,
);
}
- Create an initial process (initproc).
- Process scheduling mechanism: a process calls sys_yield to give up the CPU, or the kernel swaps out a process that has used up the time slice allocated in this round and swaps in the next process.
- Process generation mechanism (sys_fork/sys_exec).
- Process resource recycling mechanism: when a process calls sys_exit, or an error happens and it is terminated by the kernel (saving the exit code), its parent process can collect this process's information and reclaim its resources by calling sys_waitpid.
- Character input mechanism, supporting user_shell's getchar (sys_read).
After the kernel has been initialized, initproc is added by the task module.
We call TaskControlBlock::new to create a process control block; it accepts an ELF data slice (provided by get_app_data_by_name from the loader module).
+---------------------+
| |
| Analyze ELF file |
| |
+----------+----------+
|
v
+----------+----------+
| |
| Get Trap context |
| |
+----------+----------+
|
v
+----------+--------------------+
| |
| Allocate PID and kernel stack |
| |
+----------+--------------------+
|
v
+----------+----------------+
| |
| Push initial task context |
| |
+----------+----------------+
|
v
+----------+-------------------+
| |
| Create process control block |
| |
+----------+-------------------+
|
v
+----------+--------------+
| |
| Initialize Trap context |
| |
+----------+--------------+
|
v
+----------+-------------------+
| |
| Return process control block |
| |
+----------+-------------------+
Called via suspend_current_and_run_next() from the task module, used in sys_yield() and in the Interrupt::SupervisorTimer match arm.
In suspend_current_and_run_next, take_current_task takes out the currently running task, modifies its process control block status, and pushes it onto the tail of the ready queue.
After that, the schedule function is called to perform the scheduling and task switch.
Fork a child process:
create a new memory_set via from_existed_user(), allocate a new pid, create a kernel_stack, and set trap_cx.kernel_sp to the new kernel stack.
Trap::Exception(Exception::UserEnvCall) => {
    // jump to next instruction anyway
    let mut cx = current_trap_cx();
    cx.sepc += 4;
    // get system call return value
    let result = syscall(cx.x[17], [cx.x[10], cx.x[11], cx.x[12]]);
    // cx is changed during sys_exec, so we have to call it again
    cx = current_trap_cx();
    cx.x[10] = result as usize;
}
pub fn sys_fork() -> isize {
    let current_task = current_task().unwrap();
    let new_task = current_task.fork();
    let new_pid = new_task.pid.0;
    // modify trap context of new_task, because it returns immediately after switching
    let trap_cx = new_task.inner_exclusive_access().get_trap_cx();
    // we do not have to move to next instruction since we have done it before
    // for child process, fork returns 0
    trap_cx.x[10] = 0; // x[10] is a0 reg
    // add new task to scheduler
    add_task(new_task);
    new_pid as isize
}
Replace the current address space with a new one formed from the ELF.
// memory_set with elf program headers/trampoline/trap context/user stack
let (memory_set, user_sp, entry_point) = MemorySet::from_elf(elf_data);
There is no need to deal with the task context here, because the process itself is already executing, and only the suspended application needs to keep a task context on the kernel stack.
Recycle the resources by calling exit_current_and_run_next().
Set the child process status to Zombie; it will then be recycled by its parent process via waitpid.
graph TD;
A[Easy-fs File System] --> B[Boot Block];
A --> C[Super Block];
A --> D[Block Bitmap];
A --> E[Node Block];
A --> F[Data Block];
C --> G[Node Group];
C --> H[Data Group];
G --> I[Node 1];
G --> J[Node 2];
G --> K[...];
H --> L[Data Block 1];
H --> M[Data Block 2];
H --> N[...];
In the diagram above, the Easy-fs file system consists of a boot block, a super block, a block bitmap, node blocks, and data blocks.
- boot block: used to boot the file system.
- super block: contains metadata information about the file system.
- block bitmap: records the usage of blocks.
- node blocks: store directory and file information.
- data blocks: store the actual data of files.
The super block contains information about node groups and data groups.
Node groups are used to store node blocks, while data groups are used to store data blocks.
Node blocks contain metadata about files and directories, such as file name, file size, creation time, etc. Data blocks store the actual data content of files.
Overall, the Easy-fs file system adopts a simple and effective design approach by storing the metadata and data of the file system in different areas to improve the reliability and performance of the file system.
Add user library system calls:
- sys_open
- sys_close
easy-fs: implements a simple file system layout.
easy-fs-fuse: an application that runs in the development environment; it can test easy-fs, or package the applications developed for our kernel into an easy-fs file system image.
graph TB
A[easy-fs] -- Connects via abstract \n interface BlockDevice --> B(Underlying device driver)
A -- Isolates memory management of kernel \n via Rust's alloc crate --> C(OS kernel)
B -- Accesses virtio_blk virtual disk device \n via polling --> C(OS kernel)
A -- Isolates process management of OS kernel \n by avoiding direct access to \n process-related data and functions --> C
// easy-fs/src/block_dev.rs
pub trait BlockDevice : Send + Sync + Any {
// read the block which id equals to `block_id` data
// from disk to memory.
fn read_block(&self, block_id: usize, buf: &mut [u8]);
// write the block which id equals to `block_id` with data,
// from memory to disk.
fn write_block(&self, block_id: usize, buf: &[u8]);
}
Every time we want to read/write a disk block, we put the block data into a temporarily created cache area and write it back after several read/write operations.
For efficiency, we should call read/write_block as rarely as possible, because each call incurs a lot of overhead. The key to this is merging block reads and writes.
We have a BLOCK_CACHE_MANAGER to manage the caches we operate on.
It uses FIFO: each time we want to access a block by id, it searches its queue for a matching cache; if none matches, it adds a new cache, or finds a cache whose strong reference count is exactly 1 and replaces it with the one we want (if the queue is full and nothing can be replaced, it panics).
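The replacement policy just described can be sketched in user space with `Rc` standing in for the kernel's `Arc`: a cache is evictable exactly when the queue holds the only remaining strong reference to it. A simplified model (field names assumed, buffer contents omitted):

```rust
use std::collections::VecDeque;
use std::rc::Rc;

const CACHE_LIMIT: usize = 2; // small limit to exercise eviction

struct BlockCache {
    block_id: usize, // plus the 512-byte buffer, omitted here
}

struct BlockCacheManager {
    queue: VecDeque<(usize, Rc<BlockCache>)>,
}

impl BlockCacheManager {
    fn new() -> Self {
        Self { queue: VecDeque::new() }
    }

    fn get_block_cache(&mut self, block_id: usize) -> Rc<BlockCache> {
        // hit: return the existing cache
        if let Some((_, cache)) = self.queue.iter().find(|(id, _)| *id == block_id) {
            return Rc::clone(cache);
        }
        if self.queue.len() == CACHE_LIMIT {
            // evict the oldest cache nobody else still holds
            let pos = self
                .queue
                .iter()
                .position(|(_, c)| Rc::strong_count(c) == 1)
                .expect("run out of block cache");
            self.queue.remove(pos);
        }
        let cache = Rc::new(BlockCache { block_id });
        self.queue.push_back((block_id, Rc::clone(&cache)));
        cache
    }
}

fn main() {
    let mut mgr = BlockCacheManager::new();
    let held = mgr.get_block_cache(0); // still referenced: not evictable
    let _ = mgr.get_block_cache(1); // dropped immediately: evictable
    let _c = mgr.get_block_cache(2); // evicts block 1, not block 0
    assert_eq!(held.block_id, 0);
    assert!(mgr.queue.iter().any(|(id, _)| *id == 0));
    assert!(!mgr.queue.iter().any(|(id, _)| *id == 1));
}
```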
The exposed API conveniently accepts a closure:
impl BlockCache {
pub fn read<T, V>(&self, offset: usize, f: impl FnOnce(&T) -> V) -> V {
f(self.get_ref(offset))
}
pub fn modify<T, V>(&mut self, offset:usize, f: impl FnOnce(&mut T) -> V) -> V {
f(self.get_mut(offset))
}
}
graph LR
A[Super Block * 1]--- B[Inode Bitmap * n]
B --- C[Inode region * n]
C --- D[Data Bitmap * n]
D --- E[Data region * n]
B -- Manage --> C
D -- Manage --> E
C -- Manage --> E
Super Block: stored at the beginning of block number 0 on the disk.
Bitmap: in the easy-fs layout there are two kinds of bitmaps, managing index nodes and data blocks separately. Each consists of several blocks (each one: 512 bytes = 4096 bits). Each bit records one of two states:
- 0: unallocated
- 1: allocated
The bitmap must be able to find a bit whose state is 0 to allocate, and to recycle a bit (inode/block) on free.
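The alloc/dealloc logic above reduces to bit scanning and masking. A minimal sketch over 64-bit groups (the on-disk bitmap works the same way, just over cached 512-byte blocks):

```rust
// Bitmap allocator: scan 64-bit groups for a zero bit, set it, and
// return its global index; dealloc clears the bit again.
struct Bitmap {
    bits: Vec<u64>,
}

impl Bitmap {
    fn new(groups: usize) -> Self {
        Self { bits: vec![0; groups] }
    }

    fn alloc(&mut self) -> Option<usize> {
        for (i, group) in self.bits.iter_mut().enumerate() {
            if *group != u64::MAX {
                // trailing_ones counts the run of 1s from bit 0,
                // i.e. the position of the first zero bit
                let bit = group.trailing_ones() as usize;
                *group |= 1 << bit;
                return Some(i * 64 + bit);
            }
        }
        None // bitmap exhausted
    }

    fn dealloc(&mut self, id: usize) {
        assert!(self.bits[id / 64] & (1 << (id % 64)) != 0, "dealloc a free bit");
        self.bits[id / 64] &= !(1 << (id % 64));
    }
}

fn main() {
    let mut bm = Bitmap::new(1);
    assert_eq!(bm.alloc(), Some(0));
    assert_eq!(bm.alloc(), Some(1));
    bm.dealloc(0);
    assert_eq!(bm.alloc(), Some(0)); // freed bits are reused
}
```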
DiskInode:
Each file/directory is stored as a DiskInode on disk, including metadata: size (the size of the file/directory in bytes) and type_ (DiskInodeType).
graph TD
A(Block number \n is stored in a u32) --> B(Indexing method)
B --> C(Direct indexing)
C --> D(The direct array \n can point to a maximum of \n INODE_DIRECT_COUNT \n data blocks)
D --> E(Which allows you to find 14KiB of \n content through direct indexing)
B --> F(Indirect indexing)
F --> G(First-level indirect index indirect1)
G --> H(Points to a first-level index block)
H --> I(u32 points to a data block \n in the data block area \n that stores the file content)
F --> J(Second-level indirect index indirect2)
J --> K(Points to second-level \n index block located \n in the data block area)
K --> L(u32 points to a different \n first-level index block)
A --- M(DiskInode is sized to \n 128 bytes)
M --- N(Each block can hold exactly \n 4 DiskInodes)
N --- O(Direct index direct can be appropriately reduced)
O --- P(Space saved can be used to \n store other meta-data)
P --- Q(DiskInode's overall size \n can still be guaranteed to be \n 128 bytes)
Each file can be considered as a sequence of DataBlocks ([u8; BLOCK_SZ]).
But a directory must follow a special format: a sequence of entries, each a pair whose first element is the entry name and whose second is the inode number.
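Such a (name, inode number) pair is typically stored with a fixed size so a directory is just an array of equal-sized entries. A sketch under assumed sizes (a 27-byte name limit giving a 32-byte entry; the real limits are whatever easy-fs defines):

```rust
// Fixed-size directory entry: NUL-padded name plus an inode number.
const NAME_LENGTH_LIMIT: usize = 27; // assumed, so each entry is 32 bytes

#[repr(C)]
struct DirEntry {
    name: [u8; NAME_LENGTH_LIMIT + 1], // NUL-terminated
    inode_number: u32,
}

impl DirEntry {
    fn new(name: &str, inode_number: u32) -> Self {
        assert!(name.len() <= NAME_LENGTH_LIMIT);
        let mut buf = [0u8; NAME_LENGTH_LIMIT + 1];
        buf[..name.len()].copy_from_slice(name.as_bytes());
        Self { name: buf, inode_number }
    }

    fn name(&self) -> &str {
        // the stored name ends at the first NUL byte
        let len = self.name.iter().position(|b| *b == 0).unwrap();
        core::str::from_utf8(&self.name[..len]).unwrap()
    }
}

fn main() {
    let e = DirEntry::new("hello.txt", 5);
    assert_eq!(e.name(), "hello.txt");
    assert_eq!(e.inode_number, 5);
    // fixed size lets a directory be scanned as an array of entries
    assert_eq!(core::mem::size_of::<DirEntry>(), 32);
}
```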
Manages the Super Block, Inode Bitmap, Inode region, Data Bitmap and Data region.
Exposed data structure:
pub struct Inode {
block_id: usize,
block_offset: usize,
fs: Arc<Mutex<EasyFileSystem>>,
block_device: Arc<dyn BlockDevice>,
}
Implement:
// efs
// Acquire the root inode
fn root_inode(efs: &Arc<Mutex<Self>>) -> Inode
// Inode
// Find the inode(Only called by directory)
fn find(name: &str) -> Option<Arc<Inode>>
fn find_inode_id(name: &str, disk_inode: &DiskInode) -> Option<u32>
// collect the files/directories name
fn ls() -> Vec<String>
// create a file in root inode
fn create(name: &str) -> Option<Arc<Inode>>
// clear the file (or open with `CREATE`)
fn clear()
// file read and write
fn read_at(offset: usize, buf: &mut [u8]) -> usize
fn write_at(offset: usize, buf: &[u8]) -> usize
fn increase_size(new_size: u32, disk_inode: &mut DiskInode, fs)
Testing the efs system on Linux (the host system), using a file to emulate a block device.
After implementing the easy-fs file system, we can finally package these applications into an easy-fs image and put it on disk. When we want to execute an application, we only need to take the application, in ELF executable format, out of the file system and load it into memory for execution. This avoids the storage overhead and other problems of the previous chapters.
We use the crate virtio-drivers to support VirtIO block devices.
// Indicates a regular file or directory that is opened in the process
pub struct OSInode {
readable: bool,
writable: bool,
inner: Mutex<OSInodeInner>,
}
pub struct OSInodeInner {
offset: usize,
inode: Arc<Inode>,
}
pub trait File: Send + Sync {
/// If readable
fn readable(&self) -> bool;
/// If writable
fn writable(&self) -> bool;
/// Read file to `UserBuffer`
fn read(&self, buf: UserBuffer) -> usize;
/// Write `UserBuffer` to file
fn write(&self, buf: UserBuffer) -> usize;
}
impl File for OSInode {}
To support file management for processes, add the corresponding field: pub fd_table: Vec<Option<Arc<dyn File + Send + Sync>>>,
and then add the system call APIs:
- sys_open: the process opens a file and gets back a readable/writable fd.
- sys_close: the process closes the fd, guaranteeing legal access to files.
- sys_read: the process can read the file's contents into memory.
- sys_write: the process can write contents into the file.
File system initialization:
- open the block device.
- open the file system in block device.
- acquire the root inode in file system.
IPC: the OS adds two communication mechanisms: pipes and signals.
Everything in Linux is a file.
Implement the File trait for Stdin and Stdout (unit-like structures).
- fd == 0: standard input
- fd == 1: standard output
- fd == 2: standard error output
When a process forks, the child should fully inherit the parent's fd table, in order to share all open files.
sys_pipe(pipe: *mut usize) -> isize;
Pipe is based on file:
pub struct Pipe {
// read-end or write-end
readable: bool,
writable: bool,
buffer: Arc<Mutex<PipeRingBuffer>>,
}
impl File for Pipe {}
And then, in a further wrapping:
enum RingBufferStatus {
FULL,
EMPTY,
NORMAL,
}
pub struct PipeRingBuffer {
arr: [u8; RING_BUFFER_SIZE],
head: usize,
tail: usize,
status: RingBufferStatus,
write_end: Option<Weak<Pipe>>,
}
write_end: keeps a weak reference to the write end, because in some cases we need to confirm whether all write ends of the pipe have been closed, which is easy to check with this field.
Provide two APIs on Pipe:
fn read_end_with_buffer(buffer: Arc<Mutex<PipeRingBuffer>>) -> Self
fn write_end_with_buffer(buffer: Arc<Mutex<PipeRingBuffer>>) -> Self
Then, to make up a pipe:
/// Return (read_end, write_end)
pub fn make_pipe() -> (Arc<Pipe>, Arc<Pipe>) {
let buffer = Arc::new(Mutex::new(PipeRingBuffer::new()));
let read_end = Arc::new(
Pipe::read_end_with_buffer(buffer.clone())
);
let write_end = Arc::new(
Pipe::write_end_with_buffer(buffer.clone())
);
buffer.lock().set_write_end(&write_end);
(read_end, write_end)
}
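The reason the buffer keeps only a Weak to the write end can be shown in a user-space sketch with `Rc`/`Weak` standing in for the kernel's `Arc`: once every strong reference to the write end is dropped, `upgrade()` fails and the reader knows it has reached EOF. The names here mirror the structs above but the model is simplified:

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

struct WriteEnd; // stands in for the write-end Pipe

struct RingBuffer {
    // only a weak reference, so the buffer itself
    // does not keep the write end alive
    write_end: Option<Weak<WriteEnd>>,
}

impl RingBuffer {
    fn all_write_ends_closed(&self) -> bool {
        // if upgrade() fails, no strong reference to the write end remains
        self.write_end.as_ref().unwrap().upgrade().is_none()
    }
}

fn main() {
    let buffer = Rc::new(RefCell::new(RingBuffer { write_end: None }));
    let write_end = Rc::new(WriteEnd);
    buffer.borrow_mut().write_end = Some(Rc::downgrade(&write_end));
    assert!(!buffer.borrow().all_write_ends_closed());
    drop(write_end); // the last write end is closed
    assert!(buffer.borrow().all_write_ends_closed()); // reader sees EOF
}
```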
Adding command-line parameters and redirecting standard I/O:
modify the system call API:
pub fn sys_exec(path: &str, args: &[*const u8]) -> isize;
Since we want to pass the starting address of this vector to the kernel, and the kernel needs to get the number of command line arguments, we put a 0 at the end of args_addr, so that when the kernel sees it, it knows all the command line parameters have been collected.
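The user-side convention can be sketched as building the pointer array with a terminating null; the kernel counts arguments by scanning until it sees 0. `build_args_addr` is a hypothetical helper, not a function from the tutorial:

```rust
// Collect pointers to the argument strings and append a null pointer,
// so the receiver can count arguments by scanning until it sees 0.
fn build_args_addr(args: &[&str]) -> Vec<*const u8> {
    let mut v: Vec<*const u8> = args.iter().map(|s| s.as_ptr()).collect();
    v.push(core::ptr::null()); // terminating 0 marks the end
    v
}

fn main() {
    // each argument is NUL-terminated, as the kernel expects C-style strings
    let args = ["aa\0", "bb\0"];
    let addrs = build_args_addr(&args);
    assert_eq!(addrs.len(), 3);
    assert!(addrs[2].is_null());
    // the kernel-side count: positions before the null pointer
    let argc = addrs.iter().position(|p| p.is_null()).unwrap();
    assert_eq!(argc, 2);
}
```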
Then, in TaskControlBlock, we push the arguments onto the user stack, e.g. when accepting "aa" and "bb".
At the same time, we also modify the a0/a1 registers in the Trap context, so that a0 holds the number of command line parameters and a1 holds the starting address of the argv_base area.
sys_dup: duplicate an already-open file reference and allocate a new fd pointing at the same file.
pub fn sys_dup(fd: usize) -> isize {
let task = current_task().unwrap();
let mut inner = task.acquire_inner_lock();
// check the fd validity
if fd >= inner.fd_table.len() {
return -1;
}
if inner.fd_table[fd].is_none() {
return -1;
}
let new_fd = inner.alloc_fd();
inner.fd_table[new_fd] =
Some(Arc::clone(inner.fd_table[fd].as_ref().unwrap()));
new_fd as isize
}
The process of opening the file and replacing it takes place in the child process branch after the fork.
if !input.is_empty() {
let input_fd = open(input.as_str(), OpenFlags::RDONLY);
if input_fd == -1 {
println!("Error when opening file {}", input);
return -4;
}
let input_fd = input_fd as usize;
close(0);
assert_eq!(dup(input_fd), 0);
close(input_fd);
}
signal: processes should have an event-notification ability, like a software interrupt.
Signals are an asynchronous notification mechanism for inter-process communication in UNIX-like operating systems.
The receiver of a signal is a process, which has 3 ways to handle it:
- ignore: ignore the signal.
- capture: call the corresponding signal handler function.
- terminate: the process exits.
If a signal has no corresponding handler function, the kernel handles it in the default way.
A signal has 2 possible sources:
- arriving independently of the process's current execution: an asynchronous signal.
- triggered by the process's own execution: a synchronous signal.
Before a Trap completes and is about to return to user mode, the kernel checks whether the process being returned to has any pending signals. If a signal must be handled, there are 2 ways:
- Call the handler function registered for the signal via sigaction: the kernel stores the current process's Trap Context somewhere, then returns to the handler function in user mode. After the handler completes, it calls sigreturn to restore the Trap Context and return to user mode again to run the next instruction.
- Otherwise, the kernel handles the signal in the default way, then returns to user mode to run the next instruction.
- sys_sigaction: set up a signal handling routine.
- sys_procmask: set the signal mask of the process.
- sys_sigreturn: clear the stack frame and return from the signal handling routine.
Design and Implement Signaling Mechanisms:
To set up signal handling routines, add several fields to TaskControlBlock:
pub signal_mask: SignalFlags,
pub signal_actions: SignalActions,
- A process sends kill to itself or another process.
- The kernel detects certain events and sends a signal to the process, where the event has nothing to do with the execution of the receiving process, such as SIGCHLD.
- The current process's execution triggers some condition and traps into the kernel handler, and the kernel sends the corresponding signal to this process, such as SIGSEGV and SIGILL.
Add two more fields to TaskControlBlock:
pub handling_sig: isize,
pub trap_ctx_backup: Option<TrapContext>,
fn handle_signals()
fn check_pending_signals()
fn call_kernel_signal_handler(signal: SignalFlags)
fn call_user_signal_handler(signal: SignalFlags)
graph TB
A[handler_signals] --> B[check_pending_signals]
B -- SIGKILL, SIGSTOP, SIGCONT, SIGDEF --> C[call_kernel_signal_handler]
B -- else --> D[call_user_signal_handler]
C -- !frozen or killed --> E[suspend_current_and_run_next]
D -- !frozen or killed --> E
Here we consider a thread as a task:
- Thread Control Block (TCB)
- Process Control Block (PCB)
and then, adding Mutex
, Semaphore
and Condvar
.
Two example code files.
Each thread has its own user resources:
pub struct TaskUserRes {
/// TID allocated by current process.
pub tid: usize,
/// Used to calculate the thread user stack position.
pub ustack_base: usize,
/// Weak reference to the current process.
pub process: Weak<ProcessControlBlock>,
}
pub struct TaskControlBlock {
// immutable
pub process: Weak<ProcessControlBlock>,
pub kstack: KernelStack,
// mutable
inner: UPSafeCell<TaskControlBlockInner>,
}
pub struct TaskControlBlockInner {
/// Thread resource set.
pub res: Option<TaskUserRes>,
/// Indicates the physical page number of the physical page frame,
/// in which the Trap context in the application address space is placed.
pub trap_cx_ppn: PhysPageNum,
/// Save the task context of the suspended task in the task control block.
pub task_cx: TaskContext,
/// Maintain the execution state of the current process.
pub task_status: TaskStatus,
/// when the process system call `exit()` or execution meets error and terminated by kernel,
/// `exit_code` will save in its TCB, and then waiting for recycle its resources by it's parent process
/// by calling `waittid()`.
pub exit_code: Option<i32>,
}
pub struct ProcessControlBlock {
/// immutable
/// Process Identifier.
pub pid: PidHandle,
/// mutable
inner: UPSafeCell<ProcessControlBlockInner>,
}
pub struct ProcessControlBlockInner {
/// Whether the current process has become a zombie.
pub is_zombie: bool,
/// Represents the application address space.
pub memory_set: MemorySet,
/// Points to the parent process of the current process.
pub parent: Option<Weak<ProcessControlBlock>>,
/// Save the PCB of all child processes of the current process
/// in a `Vec` in the form of `Arc` smart pointers, so that they can be found more easily.
pub children: Vec<Arc<ProcessControlBlock>>,
/// when the process system call `exit()` or execution meets error and terminated by kernel,
/// `exit_code` will save in its PCB, and then waiting for recycle its resources by it's parent process
/// by calling `waitpid()`.
pub exit_code: i32,
/// Type description:
/// - Vec: dynamic length.
/// - Option: we can distinguish the file descriptor whether is
/// in free status(None) or being occupying(Some).
/// - Arc: sharing reference.
/// - dyn: maybe `Stdin` / `Stdout`
pub fd_table: Vec<Option<Arc<dyn File + Send + Sync>>>,
/// Record which signals have been received by the corresponding process
/// and have not yet been processed.
pub signals: SignalFlags,
/// Recording the threads generated by current process.
pub tasks: Vec<Option<Arc<TaskControlBlock>>>,
/// Used to allocate thread identifier.
pub task_res_allocator: RecycleAllocator,
pub mutex_list: Vec<Option<Arc<dyn Mutex>>>,
pub semaphore_list: Vec<Option<Arc<Semaphore>>>,
pub condvar_list: Vec<Option<Arc<Condvar>>>,
}
// os/src/task/manager.rs
/// lazy_static:
/// 1. TASK_MANAGER
/// 2. PID2TCB
pub fn add_task(task: Arc<TaskControlBlock>);
pub fn remove_task(task: Arc<TaskControlBlock>);
pub fn fetch_task() -> Option<Arc<TaskControlBlock>>;
pub fn pid2process(pid: usize) -> Option<Arc<ProcessControlBlock>>;
pub fn insert_into_pid2process(pid: usize, process: Arc<ProcessControlBlock>);
pub fn remove_from_pid2process(pid: usize);
// os/src/task/processor.rs
/// lazy_static:PROCESSOR
pub fn run_tasks();
pub fn take_current_task() -> Option<Arc<TaskControlBlock>>;
pub fn current_task() -> Option<Arc<TaskControlBlock>>;
pub fn current_process() -> Arc<ProcessControlBlock>;
pub fn current_user_token() -> usize;
pub fn current_trap_cx() -> &'static mut TrapContext;
pub fn current_trap_cx_user_va() -> usize;
pub fn current_kstack_top() -> usize;
pub fn schedule(switched_task_cx_ptr: *mut TaskContext);
// spin mutex
locked: UPSafeCell<bool>,
fn lock(&self) {
loop {
let mut locked = self.locked.exclusive_access();
if *locked {
drop(locked);
suspend_current_and_run_next();
continue;
} else {
*locked = true;
return;
}
}
}
fn unlock(&self) {
let mut locked = self.locked.exclusive_access();
*locked = false;
}
// -----------
// block mutex
locked: bool,
wait_queue: VecDeque<Arc<TaskControlBlock>>,
fn lock(&self) {
let mut mutex_inner = self.inner.exclusive_access();
if mutex_inner.locked {
mutex_inner
.wait_queue
.push_back(current_task().unwrap());
drop(mutex_inner);
block_current_and_run_next();
} else {
mutex_inner.locked = true;
}
}
fn unlock(&self) {
let mut mutex_inner = self.inner.exclusive_access();
assert!(mutex_inner.locked);
if let Some(waking_task) =
mutex_inner.wait_queue.pop_front()
{
wakeup_task(waking_task);
} else {
mutex_inner.locked = false;
}
}
// ------------
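A user-space analogue of the spin mutex above can be built on a real `AtomicBool` instead of `UPSafeCell`: `lock()` spins until it wins the compare-exchange (with `yield_now` playing the role of `suspend_current_and_run_next`), and `unlock()` stores false. This is a sketch, not the kernel's code; the `Counter` wrapper is added here just to demonstrate mutual exclusion:

```rust
use std::cell::UnsafeCell;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;

struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }
    fn lock(&self) {
        // try to atomically flip false -> true; retry (yielding) on failure
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            thread::yield_now(); // kernel version: suspend_current_and_run_next
        }
    }
    fn unlock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

// A shared counter protected only by the spin lock.
struct Counter {
    lock: SpinLock,
    value: UnsafeCell<i32>,
}
// Safe because `value` is only touched while holding `lock`.
unsafe impl Sync for Counter {}

fn main() {
    let c = Arc::new(Counter { lock: SpinLock::new(), value: UnsafeCell::new(0) });
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let c = Arc::clone(&c);
            thread::spawn(move || {
                for _ in 0..1000 {
                    c.lock.lock();
                    unsafe { *c.value.get() += 1 };
                    c.lock.unlock();
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // with correct mutual exclusion, no increment is lost
    assert_eq!(unsafe { *c.value.get() }, 4000);
}
```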
CAS, TAS for Atomic*:
- compare_exchange
- compare_and_swap
- test_and_set
fn compare_and_swap(ptr: *mut i32, expected: i32, new: i32) -> i32 {
let original = unsafe { *ptr };
if original == expected {
unsafe { *ptr = new; }
}
original
}
fn test_and_set(ptr: *mut i32, new: i32) -> i32 {
let original = unsafe { *ptr };
unsafe { *ptr = new };
original
}
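The two pseudo-code functions above describe what the hardware does atomically; in Rust, the same operations are exposed on the `Atomic*` types (`compare_exchange` for CAS, `swap` for TAS):

```rust
use std::sync::atomic::{AtomicI32, Ordering};

fn main() {
    let x = AtomicI32::new(5);
    // CAS: succeeds only when the current value matches `expected`,
    // returning Ok(old) on success and Err(current) on failure
    assert_eq!(x.compare_exchange(5, 10, Ordering::SeqCst, Ordering::SeqCst), Ok(5));
    assert_eq!(x.compare_exchange(5, 99, Ordering::SeqCst, Ordering::SeqCst), Err(10));
    // TAS-style: unconditionally store a new value, returning the old one
    assert_eq!(x.swap(0, Ordering::SeqCst), 10);
}
```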
Add a timer (TimerCondvar) using a BinaryHeap.
// fields:
pub count: isize,
pub wait_queue: VecDeque<Arc<TaskControlBlock>>
pub fn up(&self) {
let mut inner = self.inner.exclusive_access();
inner.count += 1;
if inner.count <= 0 {
if let Some(task) = inner.wait_queue.pop_front() {
wakeup_task(task);
}
}
}
pub fn down(&self) {
let mut inner = self.inner.exclusive_access();
inner.count -= 1;
if inner.count < 0 {
inner.wait_queue.push_back(current_task().unwrap());
drop(inner);
block_current_and_run_next();
}
}
pub wait_queue: VecDeque<Arc<TaskControlBlock>>
pub fn signal(&self) {
let mut inner = self.inner.exclusive_access();
if let Some(task) = inner.wait_queue.pop_front() {
wakeup_task(task);
}
}
pub fn wait(&self, mutex: Arc<dyn Mutex>) {
mutex.unlock();
let mut inner = self.inner.exclusive_access();
inner.wait_queue.push_back(current_task().unwrap());
drop(inner);
block_current_and_run_next();
mutex.lock();
}
struct Barrier {
mutex_id: usize,
condvar_id: usize,
count: UnsafeCell<usize>,
}
pub fn block(&self) {
mutex_lock(self.mutex_id);
let count = self.count.get();
// SAFETY: Here, the accesses of the count is in the
// critical section protected by the mutex.
unsafe { *count = *count + 1; }
if unsafe { *count } == THREAD_NUM {
condvar_signal(self.condvar_id);
} else {
condvar_wait(self.condvar_id, self.mutex_id);
condvar_signal(self.condvar_id);
}
mutex_unlock(self.mutex_id);
}