Lseek implementation


 
# 1  
Old 09-13-2011

Hi everybody,

I've been googling for ages now and have gotten kind of desperate... The question, however, might be rather trivial for the experts: what exactly, i.e. physically, does the POSIX function "lseek" do for a file? Does it trigger some kind of synchronization on disk? Or is it purely a file system operation?

Rationale:
I am running some benchmarks to get an idea of how our system (ext4 on Debian 5) behaves. I have 100 threads issuing small random read or write requests to disk (POSIX read/write with O_DIRECT), i.e. read, lseek, read, lseek, ... or write, lseek, write, lseek, ... The mean lseek response time while reading is negligible. However, the mean lseek response time while writing is approximately as high as the mean response time of a write itself (several ms), and I don't know why...
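For reference, the access pattern of one writer thread could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the actual benchmark code: the names `run_writer`, `now_sec` and `BLOCK`, the 4096-byte request size and the 256-block offset range are all made up for illustration. Passing `O_DIRECT` in `extra_flags` would approximate the real runs, provided the file system supports it.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT to callers on Linux */
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BLOCK 4096           /* request size and alignment (assumption) */

/* Monotonic time in seconds. */
static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/*
 * Hypothetical worker: performs `iters` rounds of write + lseek on `path`
 * and reports the mean latency of each call in seconds.
 * Returns 0 on success, -1 on error.
 */
int run_writer(const char *path, int extra_flags, int iters,
               double *mean_write, double *mean_seek)
{
    void *buf;
    double wsum = 0.0, ssum = 0.0;
    int fd, i;

    assert(iters > 0 && mean_write && mean_seek);

    /* O_DIRECT needs an aligned buffer; harmless without it. */
    if (posix_memalign(&buf, BLOCK, BLOCK) != 0)
        return -1;
    memset(buf, 'x', BLOCK);

    fd = open(path, O_WRONLY | O_CREAT | extra_flags, 0600);
    if (fd < 0) {
        free(buf);
        return -1;
    }

    for (i = 0; i < iters; i++) {
        off_t next = (off_t)(rand() % 256) * BLOCK;  /* random aligned offset */
        double t0 = now_sec();
        ssize_t n = write(fd, buf, BLOCK);
        double t1 = now_sec();
        off_t pos = lseek(fd, next, SEEK_SET);
        double t2 = now_sec();

        if (n != BLOCK || pos != next) {
            close(fd);
            free(buf);
            return -1;
        }
        wsum += t1 - t0;     /* time spent in write() */
        ssum += t2 - t1;     /* time spent in lseek() */
    }

    close(fd);
    free(buf);
    *mean_write = wsum / iters;
    *mean_seek = ssum / iters;
    return 0;
}
```

The real benchmark runs 100 such workers concurrently; a single worker on its own would not show any contention between threads.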

Any help is appreciated.
# 2  
Old 09-13-2011
ext4 uses generic_file_llseek for lseek, and I find this implementation for that in fs/read_write.c:
Code:
/**
 * generic_file_llseek - generic llseek implementation for regular files
 * @file:       file structure to seek on
 * @offset:     file offset to seek to
 * @origin:     type of seek
 *
 * This is a generic implemenation of ->llseek useable for all normal local
 * filesystems.  It just updates the file offset to the value specified by
 * @offset and @origin under i_mutex.
 */
loff_t generic_file_llseek(struct file *file, loff_t offset, int origin)
{
        loff_t rval;

        mutex_lock(&file->f_dentry->d_inode->i_mutex);
        rval = generic_file_llseek_unlocked(file, offset, origin);
        mutex_unlock(&file->f_dentry->d_inode->i_mutex);

        return rval;
}

/**
 * generic_file_llseek_unlocked - lockless generic llseek implementation
 * @file:       file structure to seek on
 * @offset:     file offset to seek to
 * @origin:     type of seek
 *
 * Updates the file offset to the value specified by @offset and @origin.
 * Locking must be provided by the caller.
 */
loff_t
generic_file_llseek_unlocked(struct file *file, loff_t offset, int origin)
{
        struct inode *inode = file->f_mapping->host;

        switch (origin) {
        case SEEK_END:
                offset += inode->i_size;
                break;
        case SEEK_CUR:
                /*
                 * Here we special-case the lseek(fd, 0, SEEK_CUR)
                 * position-querying operation.  Avoid rewriting the "same"
                 * f_pos value back to the file because a concurrent read(),
                 * write() or lseek() might have altered it
                 */
                if (offset == 0)
                        return file->f_pos;
                break;
        }

        if (offset < 0 || offset > inode->i_sb->s_maxbytes)
                return -EINVAL;

        /* Special lock needed here? */
        if (offset != file->f_pos) {
                file->f_pos = offset;

                file->f_version = 0;
        }

        return offset;
}

So really, nothing to it, and the only thing that could be blocking is that mutex...
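A quick userspace check is consistent with this reading. The sketch below seeks far past the end of a small file and then confirms via fstat that the file size is unchanged while the reported offset has moved. The function name `seek_touches_only_offset` and the temp-file template are made up for illustration, and the check only shows the visible effect of lseek; it says nothing about locking inside the kernel.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical check: returns 1 if lseek() moved the offset to `target`
 * without changing the file size, 0 if not, -1 on an unexpected error.
 */
int seek_touches_only_offset(off_t target)
{
    char path[] = "/tmp/lseek_demo_XXXXXX";
    struct stat st;
    int fd, ok;

    assert(target > 4);                 /* must land past the 4 bytes written */

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                       /* file disappears once fd is closed */

    if (write(fd, "data", 4) != 4 ||
        lseek(fd, target, SEEK_SET) != target ||
        fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }

    /* Size is still 4 bytes; only the offset of this open file changed. */
    ok = (st.st_size == 4) && (lseek(fd, 0, SEEK_CUR) == target);
    close(fd);
    return ok;
}
```

The file would only grow if something were written at the new offset afterwards.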

I think you've saturated the kernel with so many simultaneous system calls to the same inode that they're competing for i_mutex.

I don't think this'd happen if you hadn't opened it with O_DIRECT. Caching is your friend...
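The contention idea can be illustrated with a small userspace model. This is only a sketch under assumptions: `inode_lock` stands in for i_mutex, the sleeping "writer" stands in for a slow write that holds the same lock for its whole duration (something this thread has not verified for ext4), and all names are hypothetical. The point is that even a trivial offset update takes as long as the slowest lock holder ahead of it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <time.h>

static pthread_mutex_t inode_lock = PTHREAD_MUTEX_INITIALIZER; /* stand-in for i_mutex */
static long long file_pos;        /* stand-in for file->f_pos */
static atomic_int seeker_ready;   /* set once the seeker has started its clock */
static double seek_latency;       /* written by the seeker, read after join */

static double now_sec(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Model of lseek: a trivial update that must take the shared lock. */
static void *seeker(void *arg)
{
    double t0 = now_sec();

    (void)arg;
    atomic_store(&seeker_ready, 1);
    pthread_mutex_lock(&inode_lock);
    file_pos = 4096;
    pthread_mutex_unlock(&inode_lock);
    seek_latency = now_sec() - t0;
    return NULL;
}

/*
 * Hypothetical experiment: a "writer" holds the lock for hold_ms
 * milliseconds while a "seeker" tries to update the offset.
 * Returns the seeker's latency in seconds, or -1.0 on error.
 */
double contended_seek_latency(unsigned hold_ms)
{
    pthread_t t;
    struct timespec hold;

    assert(hold_ms > 0);
    hold.tv_sec = hold_ms / 1000;
    hold.tv_nsec = (long)(hold_ms % 1000) * 1000000L;

    atomic_store(&seeker_ready, 0);
    pthread_mutex_lock(&inode_lock);          /* writer enters */
    if (pthread_create(&t, NULL, seeker, NULL) != 0) {
        pthread_mutex_unlock(&inode_lock);
        return -1.0;
    }
    while (!atomic_load(&seeker_ready))       /* wait for the seeker to start */
        sched_yield();
    nanosleep(&hold, NULL);                   /* the slow "disk write" */
    pthread_mutex_unlock(&inode_lock);        /* writer leaves */
    pthread_join(t, NULL);
    return seek_latency;
}
```

Built with `cc -pthread`, a call such as `contended_seek_latency(50)` should report roughly the writer's hold time rather than the microseconds the update itself needs.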

Last edited by Corona688; 09-13-2011 at 11:05 AM..
# 3  
Old 09-13-2011
POSIX specifies programming APIs. It is silent on the implementation of those APIs.

However, the behavior you see is what I would expect. Writes by their very nature are going to take longer than reads. Reads can come from cache. Writes cannot.
# 4  
Old 09-13-2011
Thank you for your replies.

Quote:
Originally Posted by Corona688
ext4 uses generic_file_llseek for lseek, and I find this implementation for that in fs/read_write.c:
(...)
So really, nothing to it, and the only thing that could be blocking is that mutex...

I think you've saturated the kernel with so many simultaneous system calls to the same inode that they're competing for i_mutex.
(...)
I'm trying to wrap my mind around this... The mutex should be released after the lseek, right? Is the mutex held while writing? Otherwise the behaviour described below wouldn't make sense to me, as either lseek while reading would be slow as well or the mutex should be released rather quickly... :S

Quote:
Originally Posted by fpmurphy
(...)

However, the behavior you see is what I would expect. Writes by their very nature are going to take longer than reads. Reads can come from cache. Writes cannot.
I would hardly believe this statement to be generally true as writes can be asynchronous, but that is another story.

The point is that I'm having huge lseek latencies when running a benchmark where 100 threads are writing randomly into files compared to 100 threads randomly reading files:
a) read, lseek, read, lseek, read, lseek,...
mean read latency: ~4ms
mean lseek latency: ~0.001ms
b) write, lseek, write, lseek, ...
mean write latency: ~10ms
mean lseek latency: ~8ms
# 5  
Old 09-13-2011
Quote:
Originally Posted by Humudituu
I'm trying to wrap my mind around this... The mutex should be released after the lseek, right? Is the mutex held while writing? Otherwise the behaviour described below wouldn't make sense to me, as either lseek while reading would be slow as well or the mutex should be released rather quickly... :S
I suspect reads are happening faster than writes because a disk has its own internal cache, too.

That mutex must control more than just the file offset...

On thinking about this a little more, I think this happens because POSIX requires the ordering of some block operations to be preserved. It's pretty much just common-sense rules, like if one program reads a block after another writes to it, the reading program should get the new contents and not the old.

Forcing things to go in order is easy when you have cache. Just keep the cache consistent and everything's golden. Things don't have to wait for each other. Reads still happen randomly as needed, while disk writes happen in orderly groups, at times of the kernel's own choosing. ext2/3/4 are designed for this mode of operation.

When you switch to direct I/O, writes must happen in lock-step for consistency to be preserved. The order of reads doesn't matter as much.

I suspect you'd get better performance by writing to a raw disk device instead of a file on disk. That's the context I usually see O_DIRECT employed in.

Last edited by Corona688; 09-13-2011 at 02:27 PM..
 