cfq misbehaving on 2.6.11-1.14_FC3

cfq misbehaving on 2.6.11-1.14_FC3

Postby spaminos-ker » Sun, 12 Jun 2005 08:00:16 GMT

Hello, I am running into a very bad problem on one of my production servers.

* the config
Linux Fedora core 3 latest everything, kernel 2.6.11-1.14_FC3
AMD Opteron 2 GHz, 1 G RAM, 80 GB Hard drive (IDE, Western Digital)

I have a log processor running in the background, it's using sqlite for storing
the information it finds in the logs. It takes a few hours to complete a run.
It's clearly I/O bound (SleepAVG = 98%, according to /proc/pid/status).
I have to use the cfq scheduler because it's the only scheduler that is fair
between processes (or should be, keep reading).

* the problem
Now, after an hour or so of processing, the machine becomes very unresponsive
when trying to do new disk operations. I say new because existing processes
that stream data to disk don't seem to suffer so much.

Opening a blank new file in vi and saving it, on the other hand, takes about 5
minutes.
Logging in with ssh just times out (so I have to keep a connection open to
avoid being locked out) - that's where it's a really bad problem for me :)

Now, if I switch the disk to anticipatory or deadline, by setting
/sys/block/hda/queue/scheduler, things go back to regular times very quickly.
Saving a file in vi takes about 12 seconds (slow, but not unbearable,
considering the machine is doing a lot of things).
Logging in takes less than a second.
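For reference, switching the scheduler at runtime looks something like this (hda is the device from this report; the set of available schedulers depends on the kernel config):

```shell
# Show the available schedulers; the active one is shown in brackets.
cat /sys/block/hda/queue/scheduler
# e.g.: noop anticipatory deadline [cfq]

# Switch the device to deadline; takes effect immediately, no reboot needed.
echo deadline > /sys/block/hda/queue/scheduler
```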

I did a strace on the process that is causing havoc, and the pattern of usage
is:
* open files
* about 5000 combinations of llseek+read and llseek+write, in 1000-byte requests
* close files
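That pattern can be mimicked from the shell with dd (the scratch file and offsets below are made up for illustration; bs=1000 matches the 1000-byte requests):

```shell
WORK=$(mktemp)                                             # scratch file standing in for the sqlite db
dd if=/dev/zero of="$WORK" bs=1000 count=100 2>/dev/null   # seed it with 100 records

# llseek+read: seek to record 50, read one 1000-byte record
dd if="$WORK" of=/dev/null bs=1000 skip=50 count=1 2>/dev/null

# llseek+write: seek to record 75, overwrite one 1000-byte record in place
dd if=/dev/zero of="$WORK" bs=1000 seek=75 count=1 conv=notrunc 2>/dev/null

wc -c < "$WORK"   # still 100000 bytes: the write happened in place
rm -f "$WORK"
```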

The process is also niced to 8, but that doesn't seem to make any difference. I
found references to an "ionice" or "iorenice" syscall, but those don't seem to
exist anymore.
I thought the I/O scheduler was supposed to take process priority into account?

Is this a known problem? I also thought that timed cfq was supposed to take care
of such workloads?

Any idea on how I could improve the situation?

Thanks

Nicolas

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to  XXXX@XXXXX.COM 
More majordomo info at   http://www.**--****.com/ 
Please read the FAQ at   http://www.**--****.com/ 

Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby Andrew Morton » Sun, 12 Jun 2005 18:40:09 GMT



It might be useful to test 2.6.12-rc6-mm1 - it has a substantially
rewritten CFQ implementation.

Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby Andrew Morton » Wed, 15 Jun 2005 16:10:10 GMT




Bear in mind that after one minute, all of vi's text may have been
reclaimed from pagecache, so the above would have to do a lot of randomish
reads to reload vi into memory.  Try reducing the sleep interval a lot.


Don't know.  Try to work out (from vmstat or diskstats) how much reading is
going on.

Try stracing the check, see if your version of vi is doing a sync() or
something odd like that.


OK, well if the latency is mainly due to reads then one would hope that the
anticipatory scheduler would do better than that.

But what happened to this, from your first report?


Are you able to reproduce that 5-minute stall in the more recent testing?


Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby spaminos-ker » Thu, 16 Jun 2005 08:30:10 GMT



The read/write pattern of the background process is about 35% reads.

vi is indeed doing a sync on the open file, and that's where the time was
spent.
So I just changed my test to simply opening a file, writing some data to it,
and calling flush on the fd.

I also reduced the sleep to 1s instead of 1m, and here are the results:

cfq: 20,20,21,21,20,22,20,20,18,21 - avg 20.3
noop: 12,12,12,13,5,10,10,12,12,13 - avg 11.1
deadline: 16,9,16,14,10,6,8,8,15,9 - avg 11.1
as: 6,11,14,11,9,15,16,9,8,9 - avg 10.8

As you can see, cfq stands out (and it should stand out the other way).


I suspect the latency is due to writes: it seems (and correct me if I am wrong)
that write requests are enqueued in one giant queue, so the cfq algorithm
cannot be applied to those requests.

Either that, or there is a different queue that cancels out the benefits of cfq
when writing (because even though the writes are done the right way, this other
queue to the device holds way too much data).

But then, why would other i/o schedulers perform better in that case?

The most I got with this kernel is a 1-minute stall, so there is improvement
there. Still, a single process should not be able to cause this kind of stall
with cfq.

Nicolas


------------------------------------------------------------
video meliora proboque deteriora sequor
------------------------------------------------------------

Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby Jens Axboe » Sat, 18 Jun 2005 23:20:15 GMT





This doesn't look good (or expected) at all. In the initial posting you
mention this being an IDE drive - I want to make sure whether it's hda or
SATA-driven (e.g. sda or similar)?


That is correct. Each process has a sync queue associated with it; async
requests like writes go to a per-device async queue. The cost of
tracking who dirtied a given page was too large and not worth it.
Perhaps rmap could be used to look up who has a specific page mapped...


Yeah, the global write queue doesn't explain anything; the other
schedulers either share a read/write queue or have a separate single write
queue as well.

I'll try and reproduce (and fix) your problem.

-- 
Jens Axboe


Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby Andrea Arcangeli » Sun, 19 Jun 2005 01:00:26 GMT



I doubt it; the computing and locking cost for every single page write
would probably be too high. Doing it during swapping isn't a big deal,
since the cpu is mostly idle during swapouts, but doing it all the time
sounds a bit overkill.

A mechanism to pass down a pid would be much better. However, I'm unsure
where you could put the info while dirtying the page. If it were a uid,
it might be reasonable to have it in the address_space, but if you want
a pid as the index, then it'd need to go in the page_t, which would waste
tons of space. And having a pid in the address_space may not work well with
a database or some other app with multiple processes.

Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby Jens Axboe » Sun, 19 Jun 2005 03:20:11 GMT





We could cut the lookup down to per-request; it's not very likely that
separate threads would be competing for the exact same disk location.
But it's still not too nice...


The previous patch just added a pid_t to struct page, but I knew all
along that this was just for testing, I never intended to merge that
part.

-- 
Jens Axboe


Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby spaminos-ker » Sun, 19 Jun 2005 08:10:08 GMT



This is a regular IDE drive (a WDC WD800JB), no SATA, using hda.

I didn't mention it before, but this is on an AMD8111 board.


I don't know how all this works, but would there be a way to slow down the
offending writer by not allowing too many pending write requests per process?
Is there a tunable for the size of the write queue for a given device?
Reducing it would reduce the throughput, but the latency as well.
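There is such a tunable, though it is per-device rather than per-process: nr_requests caps how many requests can sit in the block layer queue for a device. Shrinking it trades throughput for latency (hda is the device from this report; the default value may differ by kernel):

```shell
# How many requests may be queued against this device.
cat /sys/block/hda/queue/nr_requests
# default is typically 128

# Allow fewer in-flight requests: lower worst-case latency, lower throughput.
echo 32 > /sys/block/hda/queue/nr_requests
```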

Of course, there has to be a way to get this to work right.

To go back to high latencies, maybe a different problem (but at least closely
related):

If I start in the background the command
dd if=/dev/zero of=/tmp/somefile2 bs=1024

and then run my test program in a loop, with
while true ; do time ./io 1; sleep 1s ; done

I get:

cfq: 47,33,27,48,32,29,26,49,25,47 -> 36.3 avg
deadline: 32,28,52,33,35,29,49,39,40,33 -> 37 avg
noop: 62,47,57,39,59,44,56,49,57,47 -> 51.7 avg

Now, cfq doesn't behave worse than the others, as expected (why it
behaved worse with the real daemons, I don't know).
Still, > 30 seconds has to be improved for cfq.

the test program being:

#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv) {
        int fd;
        ssize_t bytes;

        fd = open("/tmp/somefile", O_WRONLY | O_CREAT, S_IRWXU);
        if (fd < 0) {
                perror("Could not open file");
                return 1;
        }
        bytes = write(fd, &fd, sizeof(fd));
        if (bytes < (ssize_t) sizeof(fd)) {
                perror("Could not write");
                return 2;
        }
        /* with any argument, wait for the data to hit disk */
        if (argc != 1) {
                fsync(fd);
        }
        close(fd);
        return 0;
}


Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby Jens Axboe » Thu, 23 Jun 2005 18:40:08 GMT



The 2.4 SUSE kernel actually has something in place to limit in-flight
write requests against a single device. cfq will already limit the
number of write requests you can have in-flight against a single queue,
but it's request based and not size based.


The problem here is that cfq (and the other io schedulers) still
consider the io async even if fsync() ends up waiting for it to
complete. So there's no real QOS being applied to these pending writes,
and I don't immediately see how we can improve that situation right now.

What file system are you using? I ran your test on ext2, and it didn't
give me more than ~2 seconds latency for the fsync. Tried reiserfs now,
and it's in the 23-24 range.

-- 
Jens Axboe


Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby spaminos-ker » Fri, 24 Jun 2005 03:10:14 GMT


<I might sound stupid>
I still don't understand why async requests are in a different queue from the
sync ones.
Wouldn't it be simpler to consider all the IO the same and, like you pointed
out, treat synced IO as async + some sync (as in wait for completion) call
(fsync goes a little too far)?
</I might sound stupid>

I am using ext3 on Fedora Core 3.



Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby Jens Axboe » Fri, 24 Jun 2005 06:00:15 GMT





First, let's cover a little terminology. All io is really async in Linux;
the block io model is inherently async in nature. So sync io is really
just async io that is being waited on immediately. When I talk about
sync and async io in the context of the io scheduler, sync io refers
to io that is wanted right away. That would be reads or direct writes.
Async io is something that we can complete at will, where latency
typically doesn't matter. That would be normal dirtying of data that
needs to be flushed to disk.

Another property of sync io in the io scheduler is that it usually
implies that another sync io request will follow immediately (well,
almost) after one has completed. So there's a dependency relation between
sync requests that async requests don't share.

So there are different requirements for sync and async io. The io
scheduler tries to minimize latencies for async requests somewhat,
mainly just by making sure they aren't starved for too long. However,
when you do an fsync, you want to complete lots of writes, but the io
scheduler doesn't get this info passed down. If you keep flooding the
queue with new writes, this could take quite a while to finish. We could
improve this situation by only flushing out the needed data, or with just a
simple hack to only flush out already queued io (provided the fsync()
already made sure that the correct data is queued).
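The flooding effect is easy to observe from the shell (paths are illustrative; GNU dd's conv=fsync makes the small write end with an fsync, much like the test program earlier in the thread):

```shell
# Background writer flooding the queue with dirty data.
dd if=/dev/zero of=/tmp/flood bs=1024 count=100000 2>/dev/null &

# A tiny write that must fsync: its latency depends on how much
# already-queued write-out it ends up waiting behind.
time dd if=/dev/zero of=/tmp/small bs=1000 count=1 conv=fsync 2>/dev/null

wait
rm -f /tmp/flood /tmp/small
```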

I will try and play a little with this, it's definitely something that
would be interesting and worthwhile to improve.


Journalled file systems will behave worse for this, because they have to
tend to the journal as well. Can you try mounting that partition as ext2
and see what numbers that gives you?

-- 
Jens Axboe


Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby spaminos-ker » Sat, 25 Jun 2005 03:50:11 GMT



I did the tests again on a partition that I could mkfs/mount at will.

On ext3, I get about 33 seconds average latency.

And on ext2, as predicted, I see latencies averaging about 0.4 seconds.

I also tried reiserfs, and it gets about 22 seconds latency.

As you pointed out, it seems there is a flaw in the way IO queues and
journals (which are in some ways queues as well) interact in the presence of
flushes.

Nicolas


Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby Con Kolivas » Sat, 25 Jun 2005 08:40:04 GMT






I found the same, and the effect was blunted by noatime and 
journal_data_writeback (on ext3). Try them one at a time and see what you 
get.

Cheers,
Con



Re: cfq misbehaving on 2.6.11-1.14_FC3

Postby spaminos-ker » Sat, 25 Jun 2005 11:40:10 GMT



I had to move to a different box, but get the same kind of results (for ext3
default mount options).

Here are the latencies (all cfq) I get with different values for the mount
parameters

ext2 default
0.1s

ext3 default
52.6s avg

reiser defaults
29s avg for 5 minutes, then 12.9s avg

ext3 rw,noatime,data=writeback
0.1s avg

reiser rw,noatime,data=writeback
4s avg for 20 seconds, then 0.1s avg


So, indeed, adding noatime,data=writeback to the mount options improves things a
lot.
I also tried without the noatime, and that doesn't make much difference for me.
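For anyone wanting to apply the same workaround, the remount looks something like this (device and mount point are examples; note that data=writeback relaxes ext3's ordering guarantees, so weigh that against the latency win):

```shell
# Remount a live ext3 filesystem with the options that helped here.
mount -o remount,noatime,data=writeback /dev/hda2 /var/log

# Or persist it in /etc/fstab:
# /dev/hda2  /var/log  ext3  noatime,data=writeback  1 2
```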

That looks like a good workaround, I'll now try with the actual server and see
how things go.

Nicolas

