18.7.08

Implementing Conditional 304 Gets for RSS and Magpie

HTTP Conditional Get for RSS Hackers

Given the massive confusion exhibited here, I've written a nice, simple guide on how to implement HTTP's Conditional GET mechanism, with regard to producers and consumers of RSS feeds.

This article presumes you are familiar with the mechanics of an HTTP query, and understand the layout of request, response, header and body.

What is a conditional get?

My full-length RSS feed is about 24,000 bytes long. It probably gets updated on average twice a day, but given the current tools, people still download the whole thing every hour to see if it's changed yet. This is obviously a waste of bandwidth. What they really should do is first ask whether it's changed, and only download it if it has.

The people who invented HTTP came up with something even better. HTTP allows you to say to a server in a single query: “If this document has changed since I last looked at it, give me the new version. If it hasn't, just tell me it hasn't changed and give me nothing.” This mechanism is called “Conditional GET”, and it would turn roughly 90% of those significant 24,000-byte queries into trivial 200-byte ones.
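The 90% figure can be checked with some quick arithmetic. The sketch below assumes one client polling hourly, two feed updates a day, and that each update is picked up by exactly one poll; the constant names are made up for illustration.

```python
FEED_BYTES = 24_000          # size of the full RSS feed
NOT_MODIFIED_BYTES = 200     # rough size of a 304 exchange
POLLS_PER_DAY = 24           # one request per hour
UPDATES_PER_DAY = 2          # assumption: each update is seen by one poll


def daily_bytes(conditional: bool) -> int:
    """Bytes one client transfers per day, with or without Conditional GET."""
    if not conditional:
        return POLLS_PER_DAY * FEED_BYTES
    full_downloads = min(UPDATES_PER_DAY, POLLS_PER_DAY)
    cheap_replies = POLLS_PER_DAY - full_downloads
    return full_downloads * FEED_BYTES + cheap_replies * NOT_MODIFIED_BYTES


plain = daily_bytes(conditional=False)
conditional = daily_bytes(conditional=True)
saving = 1 - conditional / plain
print(plain, conditional, f"{saving:.0%}")
```

Under those assumptions the daily transfer drops from 576,000 bytes to 52,400 bytes per client.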

Client implementation

The mechanism for performing a conditional get has changed slightly between HTTP versions 1.0 and 1.1. Like many things that changed between 1.0 and 1.1, you really have to do both to make sure you're satisfying everybody.

When you receive the RSS file from the webserver, check the response header for two fields: Last-Modified and ETag. You don't have to care what is in these headers; you just have to store them somewhere with the RSS file.

Next time you request the RSS file, include two headers in your request. Your If-Modified-Since header should contain the value you snagged from the Last-Modified header earlier. The If-None-Match header should contain the value you snagged from the ETag header.

If the RSS file has changed since you last requested it, the server will send you back the new RSS file in the perfectly normal way. However, if the RSS file has not changed, the server will respond with a ‘304’ response code (instead of the usual 200), where 304 means ‘Not Modified’. In the case of a 304, the response will have an empty body and the RSS file won't be sent back to you at all.

There's a temptation for clients to put their own date in the If-Modified-Since header, instead of just copying the one the server sent. This is a bad idea: what you should send back is exactly the same date the server sent you when you received the file. There are two reasons for this. Firstly, your computer's clock is unlikely to be exactly synchronised with the webserver's, so the server could still send you files by mistake. Secondly, if the server programmer has followed this guide (see below), it'll only work if you send back exactly what you received.
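As a rough illustration of the client side, here is a minimal Python sketch using only the standard library. The `cache` dictionary and both function names are hypothetical; a real aggregator would persist these values next to the stored feed. Note that `urllib` reports a 304 by raising `HTTPError`, so the "not modified" case is handled in the `except` branch.

```python
import urllib.error
import urllib.request


def conditional_headers(cache):
    """Echo back the validators exactly as the server sent them."""
    headers = {}
    if cache.get("last_modified"):
        headers["If-Modified-Since"] = cache["last_modified"]
    if cache.get("etag"):
        headers["If-None-Match"] = cache["etag"]
    return headers


def fetch_feed(url, cache):
    """Return the feed body, downloading it only if the server says it changed."""
    request = urllib.request.Request(url, headers=conditional_headers(cache))
    try:
        with urllib.request.urlopen(request) as response:
            # 200 OK: store the new body together with its validators.
            cache.update(
                body=response.read(),
                last_modified=response.headers.get("Last-Modified"),
                etag=response.headers.get("ETag"),
            )
    except urllib.error.HTTPError as error:
        if error.code != 304:
            raise
        # 304 Not Modified: the cached copy is still current.
    return cache.get("body")
```

The stored Last-Modified string is never parsed or regenerated; it is sent back byte for byte, which is the point made above.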

Server Implementation for Static Files

If you are using one of those weblogging tools that just sticks regular files on a regular webserver (e.g. Movable Type), your webserver will almost certainly already support Conditional GET. HTTP 1.1 has been a published standard since the late 1990s, and there's really not much of an excuse for anyone to not be following it.

One thing you'll have to watch out for, though, is if your site's RSS file is regenerated frequently even when it's not changed. If that happens, the server won't be able to keep track of the last modified time properly, and you'll get people downloading the file even when it's not changed. The solution is for the writers of weblogging tools to optimise their software to make sure that files are only updated if they've actually changed in some way. (i.e. have them generate the new file, compare it with the old one, and if they're the same leave the old one untouched.)
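The "compare before overwriting" idea might look something like this in Python. It is only a sketch: `write_if_changed` is a hypothetical helper name, and it assumes the feed is small enough to read into memory.

```python
from pathlib import Path


def write_if_changed(path, new_content: bytes) -> bool:
    """Write new_content to path only if it differs from what is already there.

    Returns True if the file was written, False if it was left untouched
    (so its modification time, and therefore Last-Modified, stays the same).
    """
    target = Path(path)
    if target.exists() and target.read_bytes() == new_content:
        return False
    target.write_bytes(new_content)
    return True
```

Because an unchanged feed is never rewritten, the webserver keeps reporting the old modification time and can keep answering with 304.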

Server Implementation for Dynamic Content

If you've got a weblogging tool that re-generates the RSS file every time a request is made, there's a little more work to do. This section is aimed more at the writers of the tools than at the user, because it's the tool writers that need to fix their software so that it follows the specs.

I'll concentrate purely on RSS files, but the concepts used here can be applied to any page in the weblog, and may further reduce the bandwidth usage for your users.

In your RSS feed generator, you'll have to keep track of two values: the time the file was last modified (converted to Greenwich Mean Time), and an “etag”. According to RFC2616, the etag is an “opaque value”, which means you can put anything you like in it, providing you stick double-quotes around the whole lot. The time in the Last-Modified header needs to be formatted in a certain way, though, the same format used in email headers. For example, ‘Mon, 17 Sep 2001 11:54:29 GMT’.
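In Python, for instance, the standard library can already produce that date format. The sketch below assumes the modification time is held as seconds since the Unix epoch; `http_date` and `quote_etag` are illustrative names, not part of any library.

```python
from email.utils import formatdate


def http_date(epoch_seconds: float) -> str:
    """Format a Unix timestamp as an HTTP date in GMT."""
    return formatdate(epoch_seconds, usegmt=True)


def quote_etag(value: str) -> str:
    """Wrap an opaque value in the double quotes an ETag requires."""
    return '"' + value + '"'
```

For example, `quote_etag("v42")` gives `"v42"` including the quotes, ready to be sent as the ETag header value.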

Whenever someone requests your RSS file, send those values for the Last-Modified and ETag headers. Every web scripting language allows you to add and remove headers like that at will; just check the manual if you don't know how.

Now for the other bit. Whenever someone requests your RSS file, check the headers of their request for an If-Modified-Since header or an If-None-Match header. If at least one of them is there, and every one that is there matches the value you were planning to send out with the file, then don't send the file. Once again, consult your manual to see how to send back a "304 Not Modified" reply instead of the "200 OK" that you normally would. If you send back the 304 reply, you don't have to generate the RSS file at all. Just send out the headers, followed by the blank line that shows the headers are done, and the client will know there's nothing else coming.
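The server-side check described above could be sketched as follows. This is framework-neutral Python under a few assumptions: request headers arrive as a plain dictionary keyed by their canonical names, and `build_feed` is a hypothetical callable that generates the RSS body only when it is actually needed.

```python
def is_not_modified(request_headers, last_modified, etag):
    """True if the client's validators match what we would send out."""
    since = request_headers.get("If-Modified-Since")
    none_match = request_headers.get("If-None-Match")
    if since is None and none_match is None:
        return False          # plain GET: always send the file
    if since is not None and since != last_modified:
        return False
    if none_match is not None and none_match != etag:
        return False
    return True               # every validator the client sent matched


def respond(request_headers, last_modified, etag, build_feed):
    """Return (status, headers, body) for a feed request."""
    headers = {"Last-Modified": last_modified, "ETag": etag}
    if is_not_modified(request_headers, last_modified, etag):
        return 304, headers, b""      # the feed is never generated
    return 200, headers, build_feed()
```

The straight string comparison here is the shortcut the guide recommends; it relies on clients echoing back exactly what they received.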

Technically, what you should do with an If-Modified-Since header is convert it to a date, and compare it with your stored date. However, 90% of the time you can get away with just doing a straight match, so it's probably not worth the effort.
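For anyone who does want the date comparison, a possible Python version is sketched below. It assumes the stored modification time is a timezone-aware `datetime`; `unchanged_since` is a made-up name, and an unparseable header is simply treated as "send the file".

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def unchanged_since(if_modified_since: str, last_modified: datetime) -> bool:
    """True if the feed has not changed after the date the client sent."""
    try:
        client_time = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # garbage in the header: fall back to a full response
    if client_time.tzinfo is None:
        # Assumption: a date without a usable zone is taken to be GMT.
        client_time = client_time.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return last_modified.replace(microsecond=0) <= client_time
```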

How do I calculate the Last-Modified date?

Easy. It's the time that the most-recently-changed item in the RSS file was modified. Something like that should be pretty easy to store and fetch.
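As a small sketch, assuming each feed item is a dictionary with a hypothetical `modified` key holding a timezone-aware `datetime`, the calculation is a one-liner:

```python
from datetime import datetime, timezone


def feed_last_modified(items):
    """Return the modification time of the most recently changed item."""
    return max(item["modified"] for item in items)


items = [
    {"title": "First post", "modified": datetime(2001, 9, 15, 8, 0, 0, tzinfo=timezone.utc)},
    {"title": "Second post", "modified": datetime(2001, 9, 17, 11, 54, 29, tzinfo=timezone.utc)},
]
latest = feed_last_modified(items)
```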

What should I put in an etag?

The Apache server builds its ETags for static files from file attributes such as the modification time and size. You don't need to copy that, though. All the ETag has to be is something that changes every time the file changes. So it could be a version number, a hash of the contents, or it could even be exactly the same as the Last-Modified date, just in double-quotes.
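Any of these would do as an ETag; which one is easiest depends on what your tool already stores. The function names below are hypothetical, and the choice of SHA-1 for the content hash is arbitrary.

```python
import hashlib


def etag_from_content(body: bytes) -> str:
    """ETag derived from a hash of the feed body."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def etag_from_version(version: int) -> str:
    """ETag derived from a counter bumped on every change."""
    return f'"v{version}"'


def etag_from_last_modified(last_modified: str) -> str:
    """ETag that simply reuses the Last-Modified string, in double quotes."""
    return '"' + last_modified + '"'
```

The content hash costs a pass over the generated feed, so it suits tools that already have the body in hand. The other two can be computed without generating the feed at all.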

2002-11-11 Update: A number of people have written to me to remind me of HTTP's Gzip Content-Encoding (compressing the files during transfer). This is a little beyond the scope of this essay. The worst thing you can do when suggesting a solution to a problem is to provide alternatives: people end up arguing about the alternatives instead of implementing the fix.

17.7.08

Tutorials on htaccess | LearnWebDesignOnline.com

Tutorials on htaccess | LearnWebDesignOnline.com: "Tutorials on htaccess

.htaccess (dot-htaccess) is the directory-level configuration file of the Apache web server. It has a lot of control over how your web server works. Here are some tutorials about it.

* Wikipedia entry of .htaccess
* htaccess-guide.com
* Apache tutorial
* Example usage of .htaccess

For example, the following in the .htaccess file

<FilesMatch "\.(inc|class)$">
Deny from all
</FilesMatch>

will make sure that people cannot browse .inc and .class files.

And ...

IndexIgnore *

will prevent browsers from getting a directory listing of the files on your server."

Some useful tips to optimize your PHP code « CarlosPSY’s Weblog

Some useful tips to optimize your PHP code « CarlosPSY’s Weblog: "Some useful tips to optimize your PHP code

I’ve found very useful tips to optimize your PHP Code, and accelerate the script execution.

1. If a method can be static, declare it static. Speed improvement is by a factor of 4.
2. echo is faster than print.
3. Use echo’s multiple parameters instead of string concatenation.
4. Set the maxvalue for your for-loops before and not in the loop.
5. Unset your variables to free memory, especially large arrays.
6. Avoid magic like __get, __set, __autoload
7. require_once() is expensive
8. Use full paths in includes and requires, less time spent on resolving the OS paths.
9. If you need to find out the time when the script started executing, $_SERVER['REQUEST_TIME'] is preferred to time()
10. See if you can use strncasecmp, strpbrk and stripos instead of regex
11. str_replace is faster than preg_replace, but strtr is faster than str_replace by a factor of 4
12. If the function, such as string replacement function, accepts both arrays and single characters as arguments, and if your argument list is not too long, consider writing a few redundant replacement statements, passing one character at a time, instead of one"

16.7.08

Index of /rdf

Name                             Size  Description
ATAG10.rdf                       22K   Authoring Tool Accessibility Guidelines 1.0
CCPP-struct-vocab.rdf            26K   Composite Capability/Preference Profiles CC/PP Structure and Vocabularies 1.0
CSS2.rdf                         12K   Glossary of Cascading Style Sheets, level 2 CSS2 Specification
DOM-Level-2-Events.rdf           4.6K  Glossary of Document Object Model (DOM) Level 2 Events
DOM-Level-2-HTML.rdf             5.5K  Glossary of Document Object Model (DOM) Level 2 HTML Specification
DOM-Level-2-Traversal-Range.rdf  5.1K  Document Object Model (DOM) Level 2 Traversal and Range Specification
DOM-Level-3-Events.rdf           27K   Document Object Model (DOM) Level 3 Events Specification
MathML2.rdf                      39K   Mathematical Markup Language (MathML) Version 2.0
P3P.rdf                          13K   The Platform for Privacy Preferences 1.0 (P3P1.0) Specification
PNG.rdf                          47K   Portable Network Graphics (PNG) Specification (Second Edition)
Process.rdf                      6.7K  World Wide Web Consortium Process Document
REC-xml-names.rdf                5.7K  Namespaces in XML
REC-xml.rdf                      40K   Extensible Markup Language (XML) 1.0
WCA-terms.rdf                    40K   Web Characterization Terminology Definitions Sheet
available_lang.rdf               6.8K
charreq.rdf                      7.7K  Requirements for String Identity Matching and String Indexing
copy.xsl                         166
di-gloss.rdf                     58K   Glossary of Terms for Device Independence
home2rss092.xsl.xml              4.6K
hypertext-terms.rdf              20K   Hypertext Terms
index.rdf                        20K
owl-guide.rdf                    17K   OWL Web Ontology Language Guide
qa-glossary.rdf                  9.2K  W3C QA - Quality Assurance glossary
qaframe-spec.rdf                 12K   QA Framework: Specification Guidelines
rdf-mt.rdf                       24K   RDF Semantics
rdf-syntax.rdf                   6.3K  Resource Description Framework (RDF) Model and Syntax Specification
rfc2616-sec1.rdf                 17K   Hypertext Transfer Protocol -- HTTP/1.1
ruby.rdf                         9.4K  Ruby Annotation
soap12-part1.rdf                 15K   SOAP Version 1.2 Part 1: Messaging Framework
used_lang.rdf                    902
uuag10.rdf                       91K   User Agent Accessibility Guidelines 1.0
voicexml20.rdf                   23K   Voice Extensible Markup Language (VoiceXML) Version 2.0
w3c-jargon.rdf                   19K   Glossary of W3C Jargon
wcag10.rdf                       27K   Web Content Accessibility Guidelines 1.0
weaving.rdf                      61K   Glossary of
ws-gloss.rdf                     99K   Web Services Glossary
xforms.rdf                       12K   XForms 1.0
xhtml-modularization.rdf         19K   Modularization of XHTML
xhtml1.rdf                       11K   XHTML 1.0: The Extensible HyperText Markup Language (Second Edition)
xkms2-req                        12K
xlink.rdf                        18K   XML Linking Language (XLink)
xml-names.rdf                    1.6K  Namespaces in XML 1.0
xml-names11.rdf                  9.2K  Namespaces in XML 1.1
xml11.rdf                        41K   Extensible Markup Language (XML) 1.1
xmlschema-2.rdf                  11K   XML Schema Part 2: Datatypes
xpath-datamodel                  21K
xpath-datamodel.rdf              21K   XQuery 1.0 and XPath 2.0 Data Model (XDM)
xpath.rdf                        7.4K  XML Path Language (XPath)
xpath20                          49K
xpath20.rdf                      49K   XML Path Language (XPath) 2.0
xptr-framework.rdf               8.0K  XPointer Framework
xquery                           79K
xquery.rdf                       79K   XQuery 1.0: An XML Query Language
xslt20                           81K
xslt20.rdf                       81K   XSL Transformations (XSLT) 2.0