Linux Kernel VFS-Read(2)

2021SC@SDUSC

vfs_read

The function vfs_read lives in fs/read_write.c, just as sys_read does. In the last blog we analysed sys_read. Coincidentally, the OS experiment we did this term also covers the file system, which is a fascinating part. The experiment uses Nachos as its platform: a simulated operating system that runs as an ordinary program on Linux (I heard it can also run on Windows somehow). In the last blog I said that how the read system call gets to vfs_read is outside the scope of these blogs. However, after implementing it myself in the experiment (in a much simpler way, of course), I now understand a little better that system calls are handled as exceptions. When an exception is caught, the system checks its type, and a system call is one particular type. Then the real handler, such as sys_read, is called to handle this exception, that is, the system call.
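The two-step dispatch described above (first the exception type, then the system call number) can be sketched in user space. This is a toy model, not Linux or Nachos code. The trap causes, syscall numbers and handler names (toy_handle_trap, toy_sys_read and so on) are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy model: trap causes, syscall numbers and handlers are hypothetical,
 * not the real Linux or Nachos ones. */
enum toy_trap_cause { TRAP_SYSCALL, TRAP_PAGE_FAULT, TRAP_ILLEGAL_INSN };
enum { TOY_NR_READ, TOY_NR_WRITE, TOY_NR_MAX };

typedef long (*toy_syscall_fn)(long a0, long a1, long a2);

/* Stand-in handlers; a real sys_read would go on to call vfs_read. */
static long toy_sys_read(long fd, long buf, long count)
{
    (void)fd;
    (void)buf;
    return count;              /* pretend every requested byte was read */
}

static long toy_sys_write(long fd, long buf, long count)
{
    (void)fd;
    (void)buf;
    return count;
}

/* The "system call table": syscall number -> handler. */
static const toy_syscall_fn toy_syscall_table[TOY_NR_MAX] = {
    [TOY_NR_READ]  = toy_sys_read,
    [TOY_NR_WRITE] = toy_sys_write,
};

/* Entry point after the CPU traps into the kernel: check the exception
 * type first, then use the syscall number to pick the handler. */
long toy_handle_trap(enum toy_trap_cause cause, long nr,
                     long a0, long a1, long a2)
{
    if (cause != TRAP_SYSCALL)
        return -1;             /* other exceptions: not modelled here */
    if (nr < 0 || nr >= TOY_NR_MAX || toy_syscall_table[nr] == NULL)
        return -ENOSYS;        /* unknown system call number */
    return toy_syscall_table[nr](a0, a1, a2);
}
```

A table indexed by the syscall number makes the lookup a single array access. On x86-64, Linux keeps a table of this kind too, named sys_call_table.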

Let's get back to the topic. Here is the code (version 5.14.8):

ssize_t vfs_read(struct file *file, char __user *buf, size_t count, loff_t *pos)
{
	ssize_t ret;

	if (!(file->f_mode & FMODE_READ))
		return -EBADF;
	if (!(file->f_mode & FMODE_CAN_READ))
		return -EINVAL;
	if (unlikely(!access_ok(buf, count)))
		return -EFAULT;

	ret = rw_verify_area(READ, file, pos, count);
	if (ret)
		return ret;
	if (count > MAX_RW_COUNT)
		count =  MAX_RW_COUNT;

	if (file->f_op->read)
		ret = file->f_op->read(file, buf, count, pos);
	else if (file->f_op->read_iter)
		ret = new_sync_read(file, buf, count, pos);
	else
		ret = -EINVAL;
	if (ret > 0) {
		fsnotify_access(file);
		add_rchar(current, ret);
	}
	inc_syscr(current);
	return ret;
}

The function first checks the file's mode flags and the user buffer:

- If the file was not opened for reading, it returns -EBADF.
- If the file has no read method, it returns -EINVAL.
- If the buffer is not a valid user-space address range (access_ok), it returns -EFAULT.

Then rw_verify_area checks the remaining parameters to make sure the read can be done. For example, the position must not be negative, and position plus count must not overflow the file offset type. If something is wrong, ret carries the error and the function returns immediately.
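The checks at the top of vfs_read can be modelled in user space as below. This is a sketch under several assumptions:

- TOY_FMODE_READ and TOY_FMODE_CAN_READ are made-up stand-ins for the kernel's FMODE_READ and FMODE_CAN_READ bits.
- A NULL test replaces access_ok(), which really checks that the whole [buf, buf + count) range lies in the user part of the address space.

In the kernel, FMODE_CAN_READ is set at open time when f_op provides read or read_iter. That is why its absence gives -EINVAL rather than -EBADF.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Made-up stand-ins for FMODE_READ / FMODE_CAN_READ (include/linux/fs.h). */
#define TOY_FMODE_READ      0x1u
#define TOY_FMODE_CAN_READ  0x2u

struct toy_file {
    unsigned int f_mode;
};

/* Same order as the first checks in vfs_read(). */
ssize_t toy_read_precheck(const struct toy_file *file, const void *buf)
{
    if (!(file->f_mode & TOY_FMODE_READ))
        return -EBADF;     /* not opened for reading */
    if (!(file->f_mode & TOY_FMODE_CAN_READ))
        return -EINVAL;    /* no read or read_iter method */
    if (buf == NULL)
        return -EFAULT;    /* crude stand-in for access_ok() */
    return 0;
}
```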

int rw_verify_area(int read_write, struct file *file, const loff_t *ppos, size_t count)
{
	struct inode *inode;
	int retval = -EINVAL;

	inode = file_inode(file);
	if (unlikely((ssize_t) count < 0))
		return retval;

	/*
	 * ranged mandatory locking does not apply to streams - it makes sense
	 * only for files where position has a meaning.
	 */
	if (ppos) {
		loff_t pos = *ppos;

		if (unlikely(pos < 0)) {
			if (!unsigned_offsets(file))
				return retval;
			if (count >= -pos) /* both values are in 0..LLONG_MAX */
				return -EOVERFLOW;
		} else if (unlikely((loff_t) (pos + count) < 0)) {
			if (!unsigned_offsets(file))
				return retval;
		}

		if (unlikely(inode->i_flctx && mandatory_lock(inode))) {
			retval = locks_mandatory_area(inode, file, pos, pos + count - 1,
					read_write == READ ? F_RDLCK : F_WRLCK);
			if (retval < 0)
				return retval;
		}
	}

	return security_file_permission(file,
				read_write == READ ? MAY_READ : MAY_WRITE);
}

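In this code, the function checks four things in turn:

- count fits in ssize_t.
- The position is valid and pos + count does not overflow. This applies only if the file has a position; streams pass NULL.
- No mandatory lock conflicts with the range being read.
- Security modules such as SELinux allow the access, checked through security_file_permission.

Back in vfs_read, count is then capped at MAX_RW_COUNT, which is INT_MAX rounded down to a page boundary.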
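The position checks are easy to get wrong, so here is a user-space sketch of just that part of rw_verify_area. It rests on a few assumptions:

- The file does not allow unsigned offsets, so unsigned_offsets() is false (the usual case).
- int64_t stands in for loff_t.
- The mandatory-lock and security_file_permission() steps are left out.

The kernel writes the overflow test as (loff_t)(pos + count) < 0. In portable user-space C, signed overflow is undefined, so the sketch compares in unsigned arithmetic instead.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <sys/types.h>

/* Only the pos/count part of rw_verify_area(); int64_t plays loff_t. */
int toy_verify_pos(const int64_t *ppos, size_t count)
{
    if ((ssize_t)count < 0)
        return -EINVAL;                 /* count does not fit in ssize_t */
    if (ppos) {                         /* streams pass NULL: no position */
        int64_t pos = *ppos;

        if (pos < 0)
            return -EINVAL;             /* negative offset */
        /* pos + count must not go past INT64_MAX (LLONG_MAX). Both values
         * are at most INT64_MAX here, so the unsigned sum cannot wrap. */
        if ((uint64_t)pos + count > (uint64_t)INT64_MAX)
            return -EINVAL;
    }
    return 0;
}
```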

__vfs_read

Next, __vfs_read is called. You may not find it, because the function no longer exists in newer kernels; its code now lives directly in vfs_read, like this:

	if (file->f_op->read)
		ret = file->f_op->read(file, buf, count, pos);
	else if (file->f_op->read_iter)
		ret = new_sync_read(file, buf, count, pos);
	else
		ret = -EINVAL;

Through f_op, the read request finally leaves the VFS and reaches a concrete file system such as ext4. f_op points to the file's table of operations. If the table provides read, vfs_read calls it. Otherwise, if it provides read_iter, new_sync_read is used. If neither method exists, vfs_read returns -EINVAL.
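The if / else-if / else chain above is simply a table of function pointers at work. The user-space sketch below reproduces it with a hypothetical, reduced toy_fops that keeps only read and read_iter.

To keep it short, both hooks take the same arguments here. The real read_iter takes a struct kiocb * and a struct iov_iter *, which is why the kernel goes through new_sync_read first. fill_r and fill_i are made-up "file systems" that mark the buffer with 'r' or 'i', so you can see which path ran.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

struct toy_file;

/* Hypothetical, reduced file_operations: only the two hooks vfs_read()
 * chooses between, both given the same simplified signature. */
struct toy_fops {
    ssize_t (*read)(struct toy_file *, char *, size_t, long long *);
    ssize_t (*read_iter)(struct toy_file *, char *, size_t, long long *);
};

struct toy_file {
    const struct toy_fops *f_op;
};

/* Two fake file systems: one fills the buffer with 'r', one with 'i'. */
static ssize_t fill_r(struct toy_file *f, char *buf, size_t n, long long *pos)
{
    (void)f;
    memset(buf, 'r', n);
    *pos += (long long)n;
    return (ssize_t)n;
}

static ssize_t fill_i(struct toy_file *f, char *buf, size_t n, long long *pos)
{
    (void)f;
    memset(buf, 'i', n);
    *pos += (long long)n;
    return (ssize_t)n;
}

const struct toy_fops legacy_fops = { .read = fill_r };
const struct toy_fops iter_fops   = { .read_iter = fill_i };  /* like ext4 */
const struct toy_fops empty_fops  = { NULL, NULL };

/* Same if / else-if / else chain as in vfs_read(). */
ssize_t toy_dispatch_read(struct toy_file *f, char *buf, size_t n,
                          long long *pos)
{
    if (f->f_op->read)
        return f->f_op->read(f, buf, n, pos);
    else if (f->f_op->read_iter)
        return f->f_op->read_iter(f, buf, n, pos); /* kernel: new_sync_read() */
    else
        return -EINVAL;
}
```

When a table sets both hooks, read wins, exactly as in vfs_read.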

For ext4 the table is ext4_file_operations in fs/ext4/file.c. It provides read_iter but no read:

const struct file_operations ext4_file_operations = {
        .llseek         = ext4_llseek,
        .read_iter      = ext4_file_read_iter,
        .write_iter     = ext4_file_write_iter,
        .unlocked_ioctl = ext4_ioctl,
#ifdef CONFIG_COMPAT
        .compat_ioctl   = ext4_compat_ioctl,
#endif
        .mmap           = ext4_file_mmap,
        .mmap_supported_flags = MAP_SYNC,
        .open           = ext4_file_open,
        .release        = ext4_release_file,
        .fsync          = ext4_sync_file,
        .get_unmapped_area = thp_get_unmapped_area,
        .splice_read    = generic_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .fallocate      = ext4_fallocate,
};

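Sometimes the file being read is a block device node (for example /dev/sda opened directly) rather than a regular file on a file system. In that case its f_op is def_blk_fops instead. Remember that in Linux even devices are files, so they are reached through the same f_op dispatch; here read_iter leads to blkdev_read_iter. This is what the VFS is for in the end, right?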

const struct file_operations def_blk_fops = {
        .open           = blkdev_open,
        .release        = blkdev_close,
        .llseek         = block_llseek,
        .read_iter      = blkdev_read_iter,
        .write_iter     = blkdev_write_iter,
        .mmap           = generic_file_mmap,
        .fsync          = blkdev_fsync,
        .unlocked_ioctl = block_ioctl,
#ifdef CONFIG_COMPAT
        .compat_ioctl   = compat_blkdev_ioctl,
#endif
        .splice_read    = generic_file_splice_read,
        .splice_write   = iter_file_splice_write,
        .fallocate      = blkdev_fallocate,
};
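Which table a file gets is decided when its inode is set up. For device nodes and FIFOs this happens in init_special_inode() in fs/inode.c. It points the inode's i_fop at def_chr_fops, def_blk_fops or pipefifo_fops according to the file type. Regular files get whatever table their file system chooses, such as ext4_file_operations.

The user-space sketch below mirrors that decision with the standard S_IS* macros. toy_fops_for_mode is a hypothetical helper that only returns the table's name.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper mirroring init_special_inode(): returns the name of
 * the operations table an inode of this type would get. */
const char *toy_fops_for_mode(mode_t mode)
{
    if (S_ISCHR(mode))
        return "def_chr_fops";      /* character device node */
    if (S_ISBLK(mode))
        return "def_blk_fops";      /* block device node, e.g. /dev/sda */
    if (S_ISFIFO(mode))
        return "pipefifo_fops";     /* named pipe */
    if (S_ISSOCK(mode))
        return "no_open_fops";      /* socket inode: left as is */
    /* Regular files, directories, symlinks: not special, so the file
     * system picks the table itself (ext4_file_operations for ext4). */
    return "file system's own table";
}
```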

new_sync_read

ext4, which my teammate is analysing, lists read_iter (and no read) in its f_op, so vfs_read calls new_sync_read next. This function is also in fs/read_write.c.

static ssize_t new_sync_read(struct file *filp, char __user *buf, size_t len, loff_t *ppos)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct kiocb kiocb;
	struct iov_iter iter;
	ssize_t ret;

	init_sync_kiocb(&kiocb, filp);
	kiocb.ki_pos = (ppos ? *ppos : 0);
	iov_iter_init(&iter, READ, &iov, 1, len);

	ret = call_read_iter(filp, &kiocb, &iter);
	BUG_ON(ret == -EIOCBQUEUED);
	if (ppos)
		*ppos = kiocb.ki_pos;
	return ret;
}

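kiocb stands for kernel I/O control block. It describes a single I/O request: the file (ki_filp), the current position (ki_pos), flags, and, for asynchronous I/O, a completion callback (ki_complete).

iov_iter is an iterator over a list of buffers, here a single user buffer wrapped in an iovec. It records how much data has been transferred so far. Because it can describe user buffers, kernel buffers and pages alike, it is used in many places besides the VFS, such as networking and block I/O.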
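To see what an iov_iter does, here is a much reduced user-space iterator over struct iovec. The user-space struct from <sys/uio.h> has the same two fields, iov_base and iov_len, as the one new_sync_read builds.

toy_iov_iter and toy_copy_to_iter are hypothetical names. They only imitate the idea behind the kernel's iov_iter and copy_to_iter(): copy into several buffers in turn and remember how far we got. new_sync_read passes a single segment, but readv(2) can pass several, which is where this bookkeeping matters.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

/* Toy iterator over an array of struct iovec, loosely modelled on the
 * kernel's iov_iter (which also records direction and buffer type). */
struct toy_iov_iter {
    const struct iovec *iov;   /* current segment */
    size_t nr_segs;            /* segments left, including the current one */
    size_t iov_offset;         /* bytes already used in the current segment */
    size_t count;              /* total bytes still available */
};

void toy_iov_iter_init(struct toy_iov_iter *i, const struct iovec *iov,
                       size_t nr_segs, size_t count)
{
    i->iov = iov;
    i->nr_segs = nr_segs;
    i->iov_offset = 0;
    i->count = count;
}

/* Copy up to n bytes from src into the buffers and advance the iterator.
 * Returns the number of bytes copied, like copy_to_iter(), but uses plain
 * memcpy where the kernel would use a user-copy routine. */
size_t toy_copy_to_iter(const void *src, size_t n, struct toy_iov_iter *i)
{
    const char *from = src;
    size_t done = 0;

    if (n > i->count)
        n = i->count;
    while (done < n && i->nr_segs > 0) {
        size_t room = i->iov->iov_len - i->iov_offset;
        size_t chunk = (n - done < room) ? n - done : room;

        memcpy((char *)i->iov->iov_base + i->iov_offset, from + done, chunk);
        done += chunk;
        i->iov_offset += chunk;
        if (i->iov_offset == i->iov->iov_len) {  /* segment full: move on */
            i->iov++;
            i->nr_segs--;
            i->iov_offset = 0;
        }
    }
    i->count -= done;
    return done;
}
```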

init_sync_kiocb and iov_iter_init initialize these two structs. The name init_sync_kiocb means "initialize a kiocb for synchronous I/O". It fills in the file, its flags and its write hint, and leaves every other field zero:

static inline void init_sync_kiocb(struct kiocb *kiocb, struct file *filp)
{
        *kiocb = (struct kiocb) {
                .ki_filp = filp,
                .ki_flags = iocb_flags(filp),
                .ki_hint = file_write_hint(filp),
        };
}
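The pattern in init_sync_kiocb is plain C: assigning a compound literal with designated initializers overwrites the whole struct, and every field that is not named becomes zero or NULL. This is also why new_sync_read sets kiocb.ki_pos explicitly afterwards.

The sketch below uses a hypothetical, reduced toy_kiocb to show this. It also shows the kernel's convention, used by is_sync_kiocb(), that a NULL ki_complete marks a synchronous request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced kiocb: just enough fields to show the pattern. */
struct toy_kiocb {
    void *ki_filp;
    long long ki_pos;
    void (*ki_complete)(struct toy_kiocb *, long);
    int ki_flags;
};

/* Same pattern as init_sync_kiocb(): the compound literal replaces the
 * whole struct, so ki_pos and ki_complete end up 0 / NULL. */
static inline void toy_init_sync_kiocb(struct toy_kiocb *k, void *filp,
                                       int flags)
{
    *k = (struct toy_kiocb) {
        .ki_filp = filp,
        .ki_flags = flags,
    };
}

/* No completion callback means the request is synchronous. */
static inline int toy_is_sync_kiocb(const struct toy_kiocb *k)
{
    return k->ki_complete == NULL;
}
```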

After the initialization, new_sync_read calls call_read_iter. This helper is very simple, but it is the bridge that finally takes us to the operations of the individual file systems.

static inline ssize_t call_read_iter(struct file *file, struct kiocb *kio,
                                     struct iov_iter *iter)
{
        return file->f_op->read_iter(kio, iter);
}

In this function, the read_iter entry of f_op (the operations table of the target file system, such as ext4) is called. Up to this point all the work has been done in the virtual file system. From here on we have to go into each concrete file system, which reads the data according to its own on-disk structures and its own organization of the stored files.
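Putting new_sync_read and call_read_iter together, the path can be sketched end to end in user space. Everything here is a reduced, hypothetical model:

- toy_kiocb keeps only ki_filp and ki_pos.
- The iterator is a single struct iovec.
- memfs_read_iter is a made-up file system that reads from a string in memory, playing the role ext4_file_read_iter plays for ext4.

The kernel's BUG_ON(ret == -EIOCBQUEUED) is left out. A kiocb set up by init_sync_kiocb has no completion callback, so the file system must finish the read before returning, and this toy always does.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Reduced, hypothetical versions of the kernel structures. */
struct toy_file;
struct toy_kiocb {
    struct toy_file *ki_filp;
    long long ki_pos;
};
struct toy_fops {
    ssize_t (*read_iter)(struct toy_kiocb *, struct iovec *);
};
struct toy_file {
    const struct toy_fops *f_op;
    const char *data;           /* in-memory file contents */
    size_t size;
};

/* Like call_read_iter(): just an indirect call through f_op. */
static inline ssize_t toy_call_read_iter(struct toy_file *file,
                                         struct toy_kiocb *kio,
                                         struct iovec *iter)
{
    return file->f_op->read_iter(kio, iter);
}

/* Made-up file system: copy from data[ki_pos] and advance ki_pos. */
static ssize_t memfs_read_iter(struct toy_kiocb *kio, struct iovec *iter)
{
    struct toy_file *f = kio->ki_filp;
    size_t n;

    if (kio->ki_pos < 0)
        return -EINVAL;
    if ((unsigned long long)kio->ki_pos >= f->size)
        return 0;                                  /* at or past EOF */
    n = f->size - (size_t)kio->ki_pos;
    if (n > iter->iov_len)
        n = iter->iov_len;
    memcpy(iter->iov_base, f->data + kio->ki_pos, n);
    iter->iov_base = (char *)iter->iov_base + n;   /* consume the buffer */
    iter->iov_len -= n;
    kio->ki_pos += (long long)n;
    return (ssize_t)n;
}

const struct toy_fops memfs_fops = { .read_iter = memfs_read_iter };

/* Same steps as new_sync_read(): wrap buf in a one-segment iterator,
 * start at *ppos (or 0), call read_iter, write the new position back. */
ssize_t toy_new_sync_read(struct toy_file *filp, char *buf, size_t len,
                          long long *ppos)
{
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct toy_kiocb kiocb = { .ki_filp = filp, .ki_pos = ppos ? *ppos : 0 };
    ssize_t ret = toy_call_read_iter(filp, &kiocb, &iov);

    if (ppos)
        *ppos = kiocb.ki_pos;
    return ret;
}
```

The position lives in the kiocb only while the request runs. new_sync_read copies it back so that the next read(2) continues where this one stopped.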

This is a good place to understand the role the VFS plays in Linux.

LOL.

In the next blog, we will use ext4 as an example to finish the rest of the read path. Each file system has its own way of reading its files (its read_iter function) and handing the data back to the VFS, so we cannot analyse all of them.
