mod_deflate on high traffic site?


jmcc

Active Member
My main site has been getting between 60K and 83K page impressions per day. Most of these are due to Google deep-spidering the site, and it is a sustained level of traffic (1.3M pages last month, 977K pages already this month). The box serving the site is a 3 GHz P4 and I've been thinking of implementing mod_deflate in order to reduce the bandwidth. I think Google's Googlebot spider can handle gzipped/compressed pages. Can the other major spiders handle compression, or is there any problem with mod_deflate and browsers? The other aspect is that the box is serving a page roughly every second. Will mod_deflate slow this rate down or significantly increase the load on the server?
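For reference, a typical mod_deflate setup looks something like the sketch below (Apache 2.x; the module path, MIME types and BrowserMatch workarounds are the stock examples, not necessarily this site's actual config). Compression is only applied when the client sends an Accept-Encoding: gzip header, so spiders and browsers that don't advertise support simply receive the uncompressed pages.

[CODE]
# Minimal mod_deflate sketch (Apache 2.x) -- paths and types are illustrative
LoadModule deflate_module modules/mod_deflate.so

<IfModule mod_deflate.c>
    # Compress text-like responses only; images are already compressed
    AddOutputFilterByType DEFLATE text/html text/plain text/css application/javascript

    # Stock workarounds for old Netscape 4.x and masquerading IE user agents
    BrowserMatch ^Mozilla/4         gzip-only-text/html
    BrowserMatch ^Mozilla/4\.0[678] no-gzip
    BrowserMatch \bMSIE             !no-gzip !gzip-only-text/html
</IfModule>
[/CODE]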

Regards...jmcc
 

niall

New Member
Will mod_deflate slow this rate down or significantly increase the load on the server?

As you've already stated, mod_deflate is basically trading CPU for bandwidth. In our case CPU is more expensive than bandwidth, so we don't bother with mod_deflate and its friends.

However, if you have aggressive caching behind mod_deflate, it might help lower your bandwidth consumption without hitting the CPU too hard. Then again, depending on how you're doing the caching, it might not be a great help with a searchbot crawling the site.
 

jmcc

Active Member
However, if you have aggressive caching behind mod_deflate, it might help lower your bandwidth consumption without hitting the CPU too hard. Then again, depending on how you're doing the caching, it might not be a great help with a searchbot crawling the site.
The main caching is in MySQL, as each hoster record is generated from a highly integrated table. The search option is also optimised. Any direct query of the type that Google is making is very low cost in DB terms. I guess I'll have to test various levels of compression to see which is best. However, the load on the box is very low. The DB access is completely read-only except for the periodic updates.

Regards...jmcc
 

louie

New Member
Have you got many pages that don't change very often, as well as navigation parts, etc.?

You might save a lot of DB queries by caching them as plain HTML pages for a short period of time (let's say 15-30 min). A simple time-difference script can check to see if the page has expired and either get new data from the DB or display the static one.

Caching pages helps as you can easily trim whitespace, new lines, etc., optimising the load time.
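A minimal sketch of that time-difference approach, assuming PHP; the cache path and the build_page() helper are hypothetical stand-ins for the site's own output code.

[CODE]
<?php
// Time-based HTML cache sketch -- $cacheDir and build_page() are hypothetical
$cacheDir  = '/var/cache/site/';
$cacheFile = $cacheDir . md5($_SERVER['REQUEST_URI']) . '.html';
$maxAge    = 30 * 60; // 30 minutes

if (is_file($cacheFile) && (time() - filemtime($cacheFile)) < $maxAge) {
    // Cached copy is still fresh: serve it and skip the DB entirely
    readfile($cacheFile);
    exit;
}

// Expired or missing: rebuild from the DB, trim whitespace, cache, serve
$html = build_page();                          // runs the usual DB queries
$html = preg_replace('/>\s+</', '><', $html);  // crude whitespace trimming
file_put_contents($cacheFile, $html);
echo $html;
[/CODE]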
 

jmcc

Active Member
Have you got many pages that don't change very often, as well as navigation parts, etc.?
Well, the 2004 to 2009 hoster pages don't change unless I add navigation elements or change advertising slots. The data itself doesn't change.

You might save a lot of DB queries by caching them as plain HTML pages for a short period of time (let's say 15-30 min). A simple time-difference script can check to see if the page has expired and either get new data from the DB or display the static one.
The number of pages involved (48M or so) makes it easier to use a DB approach. Each year's data would cover around 1.2 million hosters, but not every hoster's stats would be accessed. By tweaking MySQL's config, it is possible to cache the queries so that each subsequent query for the same page hits the memory cache rather than doing a new query on the table.
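As a rough illustration of that kind of my.cnf tweak, the query cache settings look something like this; the values are made up and would need tuning against the box's actual RAM and query mix.

[CODE]
# my.cnf sketch -- illustrative values only
[mysqld]
query_cache_type  = 1      # cache SELECT results
query_cache_size  = 128M   # memory pool for cached result sets
query_cache_limit = 1M     # don't cache individual results larger than this
[/CODE]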

Caching pages helps as you can easily trim whitespace, new lines, etc., optimising the load time.
When the traffic patterns are clearer, I might look at caching the high use pages. At the moment, the main traffic is Google deep spidering the website so almost every page being spidered is unique.

Regards...jmcc
 

jmcc

Active Member
It looks like you are on the ball on this. If you need any help let me know.
Thanks Louie,
It is still very much a learning process, as this is the busiest site that I've ever developed. That static HTML thing has given me an idea.

Regards...jmcc
 

jmcc

Active Member
I've added mod_deflate and it seems to be working. At the moment I've only tested it with Firefox and IE8. Googlebot seems to be accepting the compressed data, as does Yahoo's Slurp.

Regards...jmcc
 

mneylon

Administrator
Staff member
John
How are you monitoring performance?
If the site is using PHP, are you using any of the caching engines?

Michele
 

jmcc

Active Member
John
How are you monitoring performance?
If the site is using PHP, are you using any of the caching engines?
Tailing the logfile and using Webalizer, Michele.
I've also got a separate log for the compression ratios. So far it seems to be working. The thing I was worried about was how search engine spiders and proxies would handle the compressed data; however, it seems to be working well. I decided against caching at the PHP level as most of the queries are unique apart from those on the front page. I've tweaked the MySQL my.cnf so that a lot of the query caching is done there, which means a query hits the memory cache first (in theory) and the HD second. The load rarely goes above 1.0 except when I am updating. Googlebot has slowed down a bit over the last few days but the box seems to be able to handle at least two page requests a second without even breaking a sweat.
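For reference, the separate compression-ratio log that mod_deflate supports looks roughly like this; the note names and log path are the stock examples from the Apache docs rather than this site's actual config.

[CODE]
# Record per-request input/output sizes and the compression ratio
DeflateFilterNote Input  instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio  ratio

LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
CustomLog logs/deflate_log deflate
[/CODE]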

Regards...jmcc
 

mneylon

Administrator
Staff member
John

Apart from tweaking my.cnf, have you also tweaked MySQL's internals at all?

What kind of spec does the machine have?

Michele
 

jmcc

Active Member
John

Apart from tweaking my.cnf, have you also tweaked MySQL's internals at all?
Not really, Michele,
The main tweaks have been to do with keys and memory. A lot of the ideas are from the O'Reilly 'High Performance MySQL' book.

What kind of spec does the machine have?
A single P4 running at 3 GHz, 4 GB of RAM, and two 500 GB IDE hard drives. The advantage over something like a BBS is that I have a lot of control over what the user gets, as the data is largely read-only except for the monthly updates. The table types are MyISAM as they are the fastest for this kind of read-only access.
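A rough sketch of the keys-and-memory style of my.cnf tweak mentioned above, for a read-mostly MyISAM box; the numbers are illustrative rather than the actual settings.

[CODE]
# my.cnf sketch -- illustrative values for a read-mostly MyISAM box with 4 GB of RAM
[mysqld]
key_buffer_size  = 512M   # MyISAM index blocks held in memory
read_buffer_size = 2M     # per-thread buffer for sequential table scans
sort_buffer_size = 2M     # per-thread buffer for ORDER BY / GROUP BY sorts
[/CODE]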

Regards...jmcc
 

jmcc

Active Member
The raw stats from using mod_deflate seem good. The page counts below are roughly similar but the saving in bandwidth is significant.

Regards...jmcc
Pages: 62907  KBytes: 986414  (before mod_deflate)
Pages: 63432  KBytes: 348442  (after mod_deflate)
 

mneylon

Administrator
Staff member
That's an impressive saving - not sure if I'd be bothered with it .. but it's still cool :)
 

jmcc

Active Member
A lot of it is Googlebot. The interesting thing is that Googlebot, Yahoo Slurp, Baidu, MSNbot and Twiceler all accept compressed content, as do most browsers. From a load point of view, it is minuscule, and the deflate compression level is just the default.
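For what it's worth, the knob for that is mod_deflate's DeflateCompressionLevel directive; the value below is just an illustration of raising it from the default, not what this site uses.

[CODE]
<IfModule mod_deflate.c>
    # 1 = fastest / least compression, 9 = slowest / smallest output; unset = zlib's default
    DeflateCompressionLevel 9
</IfModule>
[/CODE]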

Regards...jmcc
 