Performance testing of the 2.6.x kernel

An interesting article about benchmark tests run on the 2.6 kernel has appeared on kerneltrap.org. Below I summarize the opening of the article; the comments are numerous, so I leave them in the original English.

The original article is here.

In a recent post to lkml, Kenneth Chen summed it up: "the roller coaster ride continues for the 2.6 kernel on how it measures up in performance using the industry-standard database transaction processing benchmark. We took a measurement on 2.6.11 and found it is 13% down from the baseline." The baseline he refers to was a server running Red Hat Enterprise Linux 3, which was compared against the latest 2.6.11 as well as several earlier 2.6 kernels. In his benchmarks, 2.6.9 was 6% slower than the baseline on the database transaction tests, 2.6.8 was 23% slower, and 2.6.2 only 1% slower. Kenneth also noted that he intends to keep benchmarking each new kernel as it comes out.

Linux creator Linus Torvalds was concerned about how reproducible these numbers are, and asked that the tests be repeated against each daily kernel snapshot if possible, or failing that every two or three days. Linus explained: "Doing just release kernels means that there will be a two-month lag between telling developers that something pissed up performance. Doing it every day (or at least a couple of times a week) will be much more interesting." He pointed out that more frequent benchmarking makes it much easier to pinpoint where a problem was introduced and to fix it, and that daily snapshots can be obtained from kernel.org; it would also be very helpful to go back in time and test older kernels to look for abrupt changes in performance. Linus concluded that he would love to see frequent results for pretty much any benchmark, ideally condensed into a simple daily number or graph.

This is a free translation, done by me.

In short, I have always favored not chasing the latest kernel or the latest packages unless it is really necessary, because in free software the newest is not always the best; sometimes mistakes or flaws slip in that can introduce security problems or simply reduce performance, as in this case.

I have no doubt the problem will be fixed, or at least that steps will be taken to correct it, but it is definitely always worth waiting a little and using more stable, and possibly better-performing, software when you need something for commercial or production use.
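As a side note, Linus's "small matter of automation" boils down to: fetch each daily snapshot, build and boot it, run the benchmark, and log the result against a baseline. Here is a minimal sketch of that loop; the snapshot naming scheme, the build_and_boot.sh and run_benchmark.sh helpers, and the baseline figure are all hypothetical placeholders of mine, not something taken from the discussion.

```python
#!/usr/bin/env python3
"""Rough sketch of benchmarking a daily 2.6 kernel snapshot against a baseline.
The snapshot file name, the two helper scripts and the baseline throughput
are illustrative assumptions, not part of the original LKML thread."""

import csv
import datetime
import subprocess

SNAPSHOT_DIR = "https://www.kernel.org/pub/linux/kernel/v2.6/snapshots/"  # assumed layout
BASELINE_TPS = 10000.0  # hypothetical baseline throughput (e.g. the RHEL3 run)

def benchmark_snapshot(patch_name: str) -> float:
    """Download a snapshot patch, build/boot the kernel and return throughput.
    The shell steps are placeholders for a site-specific build and benchmark
    procedure (e.g. an OLTP-style database run)."""
    subprocess.run(["wget", "-q", SNAPSHOT_DIR + patch_name], check=True)
    subprocess.run(["./build_and_boot.sh", patch_name], check=True)   # hypothetical helper
    out = subprocess.run(["./run_benchmark.sh"], check=True,          # hypothetical helper
                         capture_output=True, text=True)
    return float(out.stdout.strip())  # assume the script prints transactions/sec

def main() -> None:
    today = datetime.date.today().isoformat()
    patch = f"patch-2.6-{today}.bz2"  # assumed snapshot naming scheme
    tps = benchmark_snapshot(patch)
    delta_pct = (tps - BASELINE_TPS) / BASELINE_TPS * 100.0
    with open("kernel-perf.csv", "a", newline="") as f:
        csv.writer(f).writerow([today, patch, f"{tps:.1f}", f"{delta_pct:+.1f}%"])

if __name__ == "__main__":
    main()
```

Run from cron once a day and fed into a plotting tool, something along these lines would produce exactly the "simple daily number (preferably a nice graph)" that Linus asks for.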

The full discussion, in English:

From: Chen, Kenneth W [email blocked]
To: "'Andrew Morton'" [email blocked]
Subject: Industry db benchmark result on recent 2.6 kernels
Date: Mon, 28 Mar 2005 11:33:19 -0800

The roller coaster ride continues for the 2.6 kernel on how it measure up in performance using industry standard database transaction processing benchmark. We took a measurement on 2.6.11 and found it is 13% down from the baseline.

We will be taking db benchmark measurements more frequently from now on with latest kernel from kernel.org (and make these measurements on a fixed interval). By doing this, I hope to achieve two things: one is to track base kernel performance on a regular base; secondly, which is more important in my opinion, is to create a better communication flow to the kernel developers and to keep all interested party well informed on the kernel performance for this enterprise workload.

With that said, here goes our first data point along with some historical data we have collected so far.

2.6.11  -13%
2.6.9   - 6%
2.6.8   -23%
2.6.2   - 1%
baseline (rhel3)

The glory detail on the benchmark configuration: 4-way SMP, 1.6 GHz Intel itanium2, 64GB memory, 450 73GB 15k-rpm disks. All experiments were done with exact same hardware and application software, except different kernel versions.

From: Linus Torvalds [email blocked]
Subject: Re: Industry db benchmark result on recent 2.6 kernels
Date: Tue, 29 Mar 2005 16:00:09 -0800 (PST)

On Mon, 28 Mar 2005, Chen, Kenneth W wrote:
>
> With that said, here goes our first data point along with some historical data
> we have collected so far.
>
> 2.6.11  -13%
> 2.6.9   - 6%
> 2.6.8   -23%
> 2.6.2   - 1%
> baseline (rhel3)

How repeatable are the numbers across reboots with the same kernel? Some benchmarks will depend heavily on just where things land in memory, especially with things like PAE or even just cache behaviour (ie if some frequenly-used page needs to be kmap'ped or not depending on where it landed).

You don't have the PAE issue on ia64, but there could be other issues. Some of them just disk-layout issues or similar, ie performance might change depending on where on the disk the data is written in relationship to where most of the reads come from etc etc. The fact that it seems to fluctuate pretty wildly makes me wonder how stable the numbers are.

Also, it would be absolutely wonderful to see a finer granularity (which would likely also answer the stability question of the numbers). If you can do this with the daily snapshots, that would be great. If it's not easily automatable, or if a run takes a long time, maybe every other or every third day would be possible?

Doing just release kernels means that there will be a two-month lag between telling developers that something pissed up performance. Doing it every day (or at least a couple of times a week) will be much more interesting.

I realize that testing can easily be overwhelming, but if something like this can be automated, and run in a timely fashion, that would be really great. Two months (or half a year) later, and we have absolutely _no_ idea what might have caused a regression. For example, that 2.6.2->2.6.8 change obviously makes pretty much any developer just go "I've got no clue".

In fact, it would be interesting (still) to go back in time if the benchmark can be done fast enough, and try to do testing of the historical weekly (if not daily) builds to see where the big differences happened. If you can narrow down the 6-month gap of 2.6.2->2.6.8 to a week or a few days, that would already make people sit up a bit - as it is it's too big a problem for any developer to look at.

The daily patches are all there on kernel.org, even if the old ones have been moved into /pub/linux/kernel/v2.6/snapshots/old/.. It's "just" a small matter of automation ;)

Btw, this isn't just for you either - I'd absolutely _love_ it for pretty much any benchmark. So anybody who has a favourite benchmark, whether "obviously relevant" or not, and has the inclination to make a _simple_ daily number (preferably a nice graph), go for it.

Linus

From: Chen, Kenneth W [email blocked]
Subject: RE: Industry db benchmark result on recent 2.6 kernels
Date: Tue, 29 Mar 2005 16:22:20 -0800

On Mon, 28 Mar 2005, Chen, Kenneth W wrote:
> With that said, here goes our first data point along with some historical data
> we have collected so far.
>
> 2.6.11  -13%
> 2.6.9   - 6%
> 2.6.8   -23%
> 2.6.2   - 1%
> baseline (rhel3)

Linus Torvalds wrote on Tuesday, March 29, 2005 4:00 PM
> How repeatable are the numbers across reboots with the same kernel? Some
> benchmarks will depend heavily on just where things land in memory,
> especially with things like PAE or even just cache behaviour (ie if some
> frequenly-used page needs to be kmap'ped or not depending on where it
> landed).

Very repeatable. This workload is very steady and resolution in throughput is repeatable down to 0.1%. We toss everything below that level as noise.

> You don't have the PAE issue on ia64, but there could be other issues.
> Some of them just disk-layout issues or similar, ie performance might
> change depending on where on the disk the data is written in relationship
> to where most of the reads come from etc etc. The fact that it seems to
> fluctuate pretty wildly makes me wonder how stable the numbers are.

This workload has been around for 10+ years and people at Intel studied the characteristics of this workload inside out for 10+ years. Every stones will be turned at least more than once while we tune the entire setup making sure everything is well balanced. And we tune the system whenever there is a hardware change. Data layout on the disk spindle are very well balanced.

> Also, it would be absolutely wonderful to see a finer granularity (which
> would likely also answer the stability question of the numbers). If you
> can do this with the daily snapshots, that would be great. If it's not
> easily automatable, or if a run takes a long time, maybe every other or
> every third day would be possible?

I sure will make my management know that Linus wants to see the performance number on a daily bases (I will ask for a couple of million dollar to my manager for this project :-))
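For reference, the percentages Kenneth reports are simply each run's throughput relative to the RHEL3 baseline, with anything inside his stated 0.1% repeatability treated as noise. A tiny illustration follows; the throughput figures are made up by me purely so that they reproduce the reported deltas, only the baseline-relative percentage and the 0.1% noise floor come from the thread.

```python
# Baseline-relative deltas with a noise floor, as described in the thread.
# Throughput numbers below are invented for illustration.

BASELINE = 10000.0       # hypothetical RHEL3 baseline throughput
NOISE_FLOOR_PCT = 0.1    # Chen: anything below 0.1% is tossed as noise

runs = {"2.6.2": 9900.0, "2.6.8": 7700.0, "2.6.9": 9400.0, "2.6.11": 8700.0}

for kernel, tps in runs.items():
    delta = (tps - BASELINE) / BASELINE * 100.0
    if abs(delta) < NOISE_FLOOR_PCT:
        delta = 0.0      # within measurement noise, treat as unchanged
    print(f"{kernel:8s} {delta:+.0f}%")
```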