9.1.08
Vote for the Search Blogs Awards of 2007
Best SEO Blog of 2007
* SEOmoz Blog
* Sebastian’s Pamphlets
* Search Engine Roundtable
* Graywolf’s Wolf-Howl
* Tropical SEO
* PageTrafficBlog
* SugarRae
* SEO Scoop
* Search Rank Blog
* SEO by the SEA
* Search Marketing Gurus
* SEO Book"
Custom Google Search Engine for Apache HTTPD Server

Apache CSE Custom Google Search Engine
Looking for mod_rewrite answers, a sample .htaccess file, or anything else related to the Apache HTTPD web server? Then use the free Apache HTTPD Search, a Custom Search Engine courtesy of Google.
Want to Contribute?
You can volunteer to contribute links and labels to this CSE.
Bridging XHTML, XML and RDF with GRDDL
GRDDL, a technology under development at W3C, makes it possible to incorporate semantics from XML vocabularies and XHTML conventions into the Semantic Web by re-using existing extensibility hooks of the Web. This paper explains the basic principles of its mechanisms and explores how it can be applied by various communities.
Table of Contents
Introduction
Bridging semantics across markup languages
GRDDL mechanisms
Specifying a Transformation For a Family of Documents
Specifying a Transformation For an Individual Document
Scenarios of applications
GRDDL status and future development
Specification
Implementations
Test Suite
Conclusion
Bibliography
Changelog
Introduction
Re-using the same technologies used for sharing documents on the Web to share information and data that can be processed directly by computers is an idea as old as the Web itself.
The Semantic Web, built on the Resource Description Framework (RDF), is the point of reference for sharing computer-processable information on the Web. However ...
8.1.08
PHP CURL Code Grabs Feed Subscribers from Google Reader
read more | digg story
dblog » curl keeps connections alive
Just in the last few days we modified curl to enable the SO_KEEPALIVE option on connections it creates. It basically means that curl will now detect connections that are idle after a certain amount of time, even though that time is around two hours by default, which is what most systems will have it set to.
The main problem that caused us to finally enable this (you can still disable it by using --no-keep-alive) is when people do long-lasting FTP transfers and they use a NAT, firewall or router that detects and removes what it considers are idle connections. An FTP transfer uses two connections, but the control connection that the commands are sent over is completely quiet while the actual data transfer is in progress, so when the transfer is done, the control connection has been nuked by the router/NAT. Of course curl survives this as well as it can, but it can't do proper error-checking etc. in this situation.
Funnily, there’s no really good fix for the FTP situation since the two hours SO_KEEPALIVE timeout will many times be too long to help (although most modern systems allows you to change the timeout or a system or application level), but the other “obvious” fix is to send a “NOOP” command on the control channel every once"
Internet Architecture Board - IAB Documents
The DNS 'wildcard' mechanism has been part of the DNS protocol since the original specifications were written twenty years ago, but the capabilities and limitations of wildcards are sufficiently tricky that discussions of both the protocol details of precisely how wildcards should be implemented and the operational details of how wildcards should or should not be used continue to the present day. This section attempts to explain the essential details of how wildcards work, but readers should refer to the DNS specifications ([RFC 1034] et sequentia) for the full details.
In essence, DNS wildcards are rules which enable an authoritative name server to synthesize DNS resource records on the fly. The basic mechanism is quite simple; the complexity is in the details and implications.
The most basic and by far the most common operation in the DNS protocols is a simple query for all resource records matching a given query name, query class, and query type. Assuming (for simplicity) that all the software and networks involved are working correctly, such a query will produce one of three possible results:
success
If the system finds a match for all three parameters, it returns the matching set of resource records;
no"
What are regular expressions?
POSIX regular expressions are used to match or capture portions of a field using wildcards and metacharacters. They are often used for text-manipulation tasks. Most of the filters included in Google Analytics use these expressions to match the data and perform an action when a match is achieved. For instance, an exclude filter is designed to exclude the hit if the regular expression in the filter matches the data contained in the field specified by the filter.
Regular expressions are text strings that contain characters, numbers, and wildcards. A list of common wildcards is contained in the table below. Note that these wildcard characters can be used literally by escaping them with a backslash '\'. For example, when entering www.google.com, escape the periods with a backslash: www\.google\.com
Wildcard Meaning
. match any single character
* match zero or more of the previous items
+ match one or more of the previous items
? match zero or one of the previous items
() remember contents of parentheses as an item
[] match one item in this list
- create a range in a list
| or
^ match to the beginning of the field
$ match to the end of"
7.1.08
The Granilus Blog: Using Apache for SSL & GZip Compression Offloading
Again, in conf/httpd.conf, uncomment the following line:
LoadModule deflate_module modules/mod_deflate.so
Then, add the following lines to the end of your conf/httpd.conf file:
AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css text/javascript
DeflateFilterNote Input instream
DeflateFilterNote Output outstream
DeflateFilterNote Ratio ratio
LogFormat '"%r" %{outstream}n/%{instream}n (%{ratio}n%%)' deflate
CustomLog logs/deflate.log deflate
These configurations will enable compression for HTML, plain-text, XML, CSS, and JavaScript responses, and will also log the compression ratios in a deflate.log file. This helps confirm that compression is working, and you can disable the logging once you no longer need it.
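With that LogFormat, each deflate.log entry pairs the request line with the output/input byte counts and the ratio note. An illustrative (made-up) entry:

"GET /index.html HTTP/1.1" 10295/43780 (23%)

Here 43780 bytes of HTML were compressed to 10295 bytes, roughly 23% of the original size.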
Improved printenv and test-cgi script
read more | digg story
19.12.07
Wireshark: Go deep.
Dec 14, 2007
Nmap, everyone's favorite network mapper, is 10 years old. Congratulations to Fyodor and the rest of the Nmap team!
mod_security tricks for .htaccess or httpd.conf
read more | digg story
16.12.07
CURL_MULTI Functions and Options
DESCRIPTION
curl_multi_setopt() is used to tell a libcurl multi handle how to behave. By using the appropriate options to curl_multi_setopt(3), you can change libcurl's behaviour when using that multi handle. All options are set with the option followed by the parameter param. That parameter can be a long, a function pointer, an object pointer or a curl_off_t type, depending on what the specific option expects. Read this manual carefully as bad input values may cause libcurl to behave badly! You can only set one option in each function call.
OPTIONS
CURLMOPT_SOCKETFUNCTION
Pass a pointer to a function matching the curl_socket_callback prototype. The curl_multi_socket(3) functions inform the application about updates in the socket (file descriptor) status by doing none, one or multiple calls to the curl_socket_callback given in the param argument. They update the status with changes since the previous time a curl_multi_socket(3) function was called. If the given callback pointer is NULL, no callback will be called. Set the callback's userp argument with CURLMOPT_SOCKETDATA. See curl_multi_socket(3) for more callback details.
CURLMOPT_SOCKETDATA
Pass a pointer to whatever you want passed to the curl_socket_callback's fourth argument, the userp pointer. This is not used by libcurl but only passed through as-is. Set the callback pointer with CURLMOPT_SOCKETFUNCTION.
CURLMOPT_PIPELINING
Pass a long set to 1 to enable or 0 to disable. Enabling pipelining on a multi handle will make it attempt to perform HTTP pipelining as far as possible for transfers using this handle. This means that if you add a second request that can use an already existing connection, the second request will be "piped" on the same connection rather than being executed in parallel. (Added in 7.16.0)
CURLMOPT_TIMERFUNCTION
Pass a pointer to a function matching the curl_multi_timer_callback prototype. This function will then be called when the timeout value changes. The timeout value is the latest time at which the application should call one of the "performing" functions of the multi interface (curl_multi_socket(3), curl_multi_socket_all(3) and curl_multi_perform(3)) - to allow libcurl to keep timeouts, retries etc. working. A timeout value of -1 means that there is no timeout at all, and 0 means that the timeout is already reached. Libcurl attempts to limit these calls to the times when the fixed future timeout actually changes. See also CURLMOPT_TIMERDATA. This callback can be used instead of, or in addition to, curl_multi_timeout(3). (Added in 7.16.0)
CURLMOPT_TIMERDATA
Pass a pointer to whatever you want passed to the curl_multi_timer_callback's third argument, the userp pointer. This is not used by libcurl but only passed through as-is. Set the callback pointer with CURLMOPT_TIMERFUNCTION. (Added in 7.16.0)
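A minimal C sketch of the calling convention described above - one option per curl_multi_setopt() call; the surrounding program is my own illustration, not from the manual:

#include <curl/curl.h>

/* Matches the curl_multi_timer_callback prototype: record the timeout
   libcurl asks for; -1 means no timeout is pending. */
static int timer_cb(CURLM *multi, long timeout_ms, void *userp)
{
    (void)multi;
    *(long *)userp = timeout_ms;
    return 0;
}

int main(void)
{
    long next_timeout_ms = -1;

    curl_global_init(CURL_GLOBAL_ALL);
    CURLM *multi = curl_multi_init();

    curl_multi_setopt(multi, CURLMOPT_PIPELINING, 1L);           /* attempt HTTP pipelining */
    curl_multi_setopt(multi, CURLMOPT_TIMERFUNCTION, timer_cb);  /* timeout-change callback */
    curl_multi_setopt(multi, CURLMOPT_TIMERDATA, &next_timeout_ms);

    /* ... add easy handles and drive transfers with the multi
       interface (curl_multi_perform() or curl_multi_socket()) ... */

    curl_multi_cleanup(multi);
    curl_global_cleanup();
    return 0;
}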
CURLMOPT_MAXCONNECTS
Pass a long. The set number will be used as the maximum number of simultaneously open connections that libcurl may cache. The default is 10, and libcurl will enlarge the size for each added easy handle to make it fit 4 times the number of added easy handles.
By setting this option, you can prevent the cache size from growing beyond the limit you set.
When the cache is full, libcurl closes the oldest connection in the cache to keep the number of open connections from increasing.
This option is for the multi handle's use only; when using the easy interface you should instead use the CURLOPT_MAXCONNECTS option.
(Added in 7.16.3)

13.12.07
Frequently Requested .htaccess examples
More .htaccess code available at htaccess example
These are generic snippets; no editing is required.
Note 1: You may not need Options FollowSymlinks, as it's commonly set by the server-wide config file already.
Note 2: You need RewriteEngine On only once per .htaccess file.
Fix for trailing slash problems
If 'www' drops from the URL, or you are asked twice for the password, this is the fix. It happens when you enter the URL of a directory without a trailing slash and your domain name contains 'www'.
ex. http://www.example.com/subdir <== No slash at the end.
Generally, you should ALWAYS put a slash after a directory name, but some robots and links don't follow this practice, so you may want to use this fix. In short, you only need this code if you use 'www.' in your domain name. I prefer URLs without 'www' these days, as they're shorter and easier.
Options +FollowSymlinks
RewriteEngine On
RewriteRule ^/*(.+/)?([^.]*[^/])$ http://%{HTTP_HOST}/$1$2/ [L,R=301]
If you want to cover both http and https:
Options +FollowSymlinks
RewriteEngine On
RewriteCond s%{HTTPS} ^((s)on|s.*)$ [NC]
RewriteRule ^/*(.+/)?([^.]*[^/])$ http%2://%{HTTP_HOST}/$1$2/ [L,R=301]
Note: These codes are very efficient compared to the code with a '-d' check, but they won't work with directories that have a dot (period) in their names (ex. /a_directory.name/).
Forcing www for your domain name
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_HOST} !^(www\.|$) [NC]
RewriteRule ^ http://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
If you want to cover both http and https:
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_HOST}//s%{HTTPS} ^([^.]{4,}|[^w.]?[^.][^w.]?[^.]?[^w.]?)\..*//((s)on|s.*) [NC]
RewriteRule ^ http%3://www.%{HTTP_HOST}%{REQUEST_URI} [L,R=301]
Combined code for Missing trailing slash problems and Force www
It may look complicated, but this code reduces wasteful redirects to your site.
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{REQUEST_URI}\\/%{HTTP_HOST}/www. ^/+(.+/)?[^.]*[^/]\\(/)(([^.]{4,}|[^w.]?[^.][^w.]?[^.]?[^w.]?)\..+/(www\.)|.*)$ [OR,NC]
RewriteCond %{HTTP_HOST}/www. ^(/)?(#)?(/)?(([^.]{4,}|[^w.]?[^.][^w.]?[^.]?[^w.]?)\..+/(www\.))$ [NC]
RewriteRule ^ http://%6%{HTTP_HOST}%{REQUEST_URI}%2 [L,R=301]
If you want to cover both http and https:
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{REQUEST_URI}\\/%{HTTP_HOST}/www.//s%{HTTPS} ^/+(.+/)?[^.]*[^/]\\(/)(([^.]{4,}|[^w.]?[^.][^w.]?[^.]?[^w.]?)\..+/(www\.)|.*)//((s)on|s.*)$ [OR,NC]
RewriteCond %{HTTP_HOST}/www.//s%{HTTPS} ^(/)?(/)?(([^.]{4,}|[^w.]?[^.][^w.]?[^.]?[^w.]?)\..+/(www\.))//((s)on|s.*)$ [NC]
RewriteRule ^ http%7://%5%{HTTP_HOST}%{REQUEST_URI}%2 [L,R=301]
Force to remove www from your domain name
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_HOST} ^www\.(.*)$ [NC]
RewriteRule ^ http://%1%{REQUEST_URI} [L,R=301]
If you want to cover both http and https:
Options +FollowSymlinks
RewriteEngine On
RewriteCond %{HTTP_HOST}//s%{HTTPS} ^www\.(.*)//((s)on|s.*)$ [NC]
RewriteRule ^ http%3://%1%{REQUEST_URI} [L,R=301]
12.12.07
Best FLV MP3 Flash Player
SETUP WIZARD
Use this page to generate the code you need for a specific setup of the players (I don't have a wizard for the rotator yet). You can also experiment a bit with this page to see what's possible with the players. Thanks a lot to Lars Nyboe Andersen for creating the first version of this wizard!
PLAYER PREVIEW
11.12.07
"Various .htaccess samples and tutorials" - DreamHost Knowledge Base
.htaccess
Many people have only taken the .htaccess file as far as using it for password protection and custom error documents, but there is a lot more that can be done with .htaccess than just these two features. The .htaccess file is a normal text file that you can edit in a program such as Notepad, just as easily as your everyday documents.
.htaccess is not a filename plus an extension; it is, in effect, a file extension with no filename. A file on Windows normally consists of a filename and an extension, such as document.doc, and Windows doesn't allow files with an extension but no filename. On UNIX, however, you can call a file whatever you want, extension or no extension.
Warning
Although using .htaccess on your virtual server hosting account is extremely unlikely to cause you any problems (if something is wrong, it simply won't work), you should be wary if you are using Microsoft FrontPage Extensions. The FrontPage extensions use the .htaccess file themselves, so you should not really edit it to add your own information. If you do want to (this is not recommended, but possible), you should download the .htaccess file from your server first (if it exists) and then add your code at the top of the file.
Creating the .htaccess File
To create a .htaccess file on Windows, just open a new document in Notepad and save it as .htaccess, making sure All files is selected in the Save as type drop-down menu so it doesn't get saved as .htaccess.txt. When you upload an .htaccess file to your account, make sure that the data transfer mode is set to ASCII, never BINARY, since it is a text file. While .htaccess files will work just by uploading them, we recommend that you CHMOD the .htaccess file to 644 (RW-R--R--). This makes the file readable by your web server but not writable by other users; the server itself refuses to serve files whose names start with .ht, so browsers cannot read it. If just anyone could read your .htaccess file, your security would be in big trouble.
When you create an .htaccess file, make sure that your text editor has word wrap disabled. If you don't, your text editor might add characters to the file that confuse the web server, resulting in a non-functional .htaccess file and a 500 server error on your website's home page. Also make sure that each command in the .htaccess file is on its own line; otherwise you will end up with an .htaccess file that causes problems on your account.
When you use a .htaccess file on your web server, the file affects the current directory and any of its sub-directories. If you place an .htaccess file in the root directory of your website, it will affect every directory on your website.
Custom Error Pages
Custom error pages enable you to customize the pages that are displayed when an error occurs. Not only do they make your website seem more professional, they can also save you some visitors. If a visitor sees a generic error page, they are likely to leave your site; if they see a helpful error page, they might stay, because they can simply click a link to go to another page within your site. You can create error pages for all error codes; however, many webmasters only make error pages for the 4 most common errors, which are:
- Error 401 - Authorization Required
- Error 403 - Forbidden
- Error 404 - Not Found
- Error 500 - Internal Server Error
To specify what the server should do when an error is found on your website, enter the following into an .htaccess file:
ErrorDocument errorcode /home/LOGIN/public_html/error-document.html
Change errorcode to the number of the error, and change the path to point at your error document. Simply repeat the above line for any other errors you want to cover. Once the file is uploaded, your visitors will be directed to the page that you specified.
Here's a sample .htaccess file with ErrorDocument enabled:
ErrorDocument 401 /401.html
ErrorDocument 403 /403.html
ErrorDocument 404 /404.html
ErrorDocument 500 /500.html
You can use full URLs as the path to your error documents for all error codes except 401, which must use a local path. Also, instead of specifying a URL for an error code, you can display a message. Here's an example:
ErrorDocument 404 "
Sorry, the document you requested could not be found.
"
This is quite useful if you only need to display a short message because it saves you having to create additional files. As you can see, you can use normal HTML code.
Here's another .htaccess file with ErrorDocument enabled. This time, we are displaying messages instead of going to a different URL:
ErrorDocument 401 "
Error 401
Authorization Required.
"
ErrorDocument 403 "
Error 403
Forbidden.
"
ErrorDocument 404 "
Error 404
Not Found.
"
ErrorDocument 500 "
Error 500
Internal Server Error.
"
Limit the Number of Concurrent Visitors to your Website
If you need to limit the number of concurrent visitors to your website, this can be set up with the MaxClients directive. Note that stock Apache only honours MaxClients in the main server configuration, so check whether your host supports it in .htaccess. Open a program such as Notepad and insert the following line of code:
MaxClients number
Change number to the maximum number of clients you want to allow access to your website.
Disable Directory Listings
Occasionally, you may not have a default index document in a directory. If a default document is not found, whenever a visitor types the directory name into their browser, a full listing of all the files in that directory will be displayed. This could be a security risk for your site. To prevent this without having to add a default index document to every folder, add the following line to your .htaccess file to stop a directory's contents from being listed:
Options -Indexes
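To turn listings back on for a particular subdirectory (standard Apache behaviour), place the opposite directive in that subdirectory's own .htaccess file:

Options +Indexes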
Deny/Allow Certain IP Addresses">Deny/Allow Certain IP Addresses
If you have problems with certain visitors to your website, you can easily ban them. There are two ways to ban visitors: by their IP address, or by the domain name they came from.
Here's an example showing you how to deny a user by their IP address:
order allow,deny
deny from 201.68.101.5
allow from all
The above code will deny the 201.68.101.5 IP address and allow everyone else to enter. If you want to deny a block of IP addresses, use this code:
order allow,deny
deny from 201.68.101.
allow from all
The above code will deny every IP address from 201.68.101.0 through 201.68.101.255 - 256 addresses in all. Here's an example showing how to deny a user by the domain name they came from:
order allow,deny
deny from www.theirdomain.com
allow from all
The above code will deny anyone coming from www.theirdomain.com and allow everyone else to enter. Here's an example showing how to deny users from a domain name and all subdomains within that domain:
order allow,deny
deny from .theirdomain.com
allow from all
The above code will deny anyone coming from theirdomain.com or any of its sub-domains, and allow everyone else to enter. To do the opposite and block everyone except yourself, use:
Order deny,allow
Deny from all
Allow from youripaddress
The above code will block all visitors from accessing your site except for yourself if you replace youripaddress with the IP address that was assigned to you by your ISP.
Deny Access To a Folder During a Specific Time
If for some reason you would like to block access to files in a directory during a specific time of day, you can do so by adding the following code to an .htaccess file.
RewriteEngine On
# If the hour is 16 (4 PM)
RewriteCond %{TIME_HOUR} ^16$
# Then deny all access
RewriteRule ^.*$ - [F,L]
# Multiple hour blocks
# If the hour is 4 PM or 5 PM or 8 AM
RewriteCond %{TIME_HOUR} ^(16|17|08)$
RewriteRule ^.*$ - [F,L]
Alternative Index Files
When a visitor accesses your website, the server checks the folder for an index file. Some examples of common index files are: index.htm, index.html, index.php, index.cgi, index.pl. The supported index files depend on how the server is set up. If the server cannot find an index file, it will try to display an index of all the files within the current directory; if that is disabled too, the server will display a 403 forbidden error. Using .htaccess, you can use a completely different index file instead of the defaults listed above. To do this, insert the following line into an .htaccess file:
DirectoryIndex pagename.html
Change pagename.html to the page that you would like to use as the index file.
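DirectoryIndex also accepts a list of filenames, which the server tries in the order given, so you can name your custom page first and keep the usual defaults as fallbacks:

DirectoryIndex pagename.html index.html index.php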
Redirection
Using Redirect in an .htaccess file enables you to redirect users from an old page to a new page without having to keep the old page. For example, if you use index.html as your index file and one day rename index.html to home.html, you can set up a redirect to send users from index.html to home.html. Redirect works by typing:
Redirect /home/LOGIN/public_html/path/to/old/file/old.html http://www.yourdomain.com/new/file/new.html
The first path to the old file must be a local UNIX path. The second path to the new file can be a local UNIX path, but can also be a full URL to link to a page on a different server.
Here are a few examples of some redirects:
Redirect / /new/
Redirect /index.html /default.html
Redirect /private/ http://www.anotherdomain.com/private/
Redirect /img/logo.gif http://www.photos.net/images/logo.gif
Protect Your .htaccess File
When a visitor tries to obtain access to your .htaccess or .htpasswd file, the server automatically generates a 403 forbidden error, even with the file permissions at their default settings. However, you can apply a bit more security to your .htaccess files by adding the following code:
<Files .htaccess>
order allow,deny
deny from all
</Files>
If you would like to redirect anything from http://domain.com to http://www.domain.com (so the www is always in the URL), you can accomplish this with the code below. This helps with search engine optimization by keeping the www and non-www versions of a page from being indexed as duplicates.
RewriteEngine On
RewriteCond %{HTTP_HOST} !^www\..* [NC]
RewriteRule ^(.*) http://www.%{HTTP_HOST}/$1 [R=301]
Prevent Image Hot Linking
Hot linking, or bandwidth stealing, is a common problem. It happens when people link to files and images on someone else's server and display them on their own website, so the bandwidth comes at the other person's expense. By adding the lines below, you can prevent hot linking to your website:
RewriteEngine On
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yourdomain\.com/.*$ [NC]
RewriteCond %{REQUEST_URI} !hotlink\.gif$
RewriteRule \.(gif|jpg)$ http://www.yourdomain.com/hotlink.gif [R,L]
Change yourdomain.com to your domain name. On the last line, change hotlink.gif to the path of an image that explains that hot linking is disabled on your server, or to a spacer image; the extra RewriteCond excludes that image itself from the rule so the redirect cannot loop.
Force Text Files to Download and Not Show in Your Browser
By default, if a text file (.txt) is requested, the contents of the file is shown in the browser and is not downloaded. This is because the default MIME type for .txt files specifies to show the files and not download them. You can however change this by adding the line below:
AddType application/octet-stream txt
Be warned though: every .txt file in the current directory and any subdirectories will be affected. If you only need to target a specific file, wrap the directive in a Files section instead (example.txt is a placeholder):
<Files "example.txt">
ForceType application/octet-stream
</Files>
Email Address">Specify the Server Administrators Email Address
When users on your website encounter an error, a page is displayed with details about the error, including the server administrator's email address. To change the email address that is shown, insert the following code (note that stock Apache only allows ServerAdmin in the main server or virtual-host configuration, so support for it here depends on your host):
ServerAdmin admin@yourdomain.com
Be sure to change admin@yourdomain.com to the server administrator's email address.
Specify a Custom Error Log
The ErrorLog directive allows you to specify the local UNIX path where your server error log is stored. This log records the errors that visitors have encountered on your website. To specify a custom error log on your account, insert the following code:
ErrorLog /logs/error_log.log
You can change the path and filename of the error log, but your path must start with a forward slash.
Enable Password Protection
Password protection is probably the most popular feature of .htaccess and is used all over the Internet. It is popular because it is very simple to set up and provides strong protection: unlike merely hiding pages, it cannot be bypassed by guessing URLs. When you set up password protection, you configure the protection options in a .htaccess file and set up usernames and passwords inside a .htpasswd file.
First, we are going to set up the usernames and passwords inside the .htpasswd file. The passwords inside a .htpasswd file are encrypted for added security, so you will need to use the htpasswd generator utility to create your usernames and passwords.
Once you have created the required usernames and passwords, you need to place them inside a .htpasswd file. Open a program such as Notepad, copy the username and password combinations that you generated using the htpasswd generator utility, and place each username/password combination on its own line. Here's a sample .htpasswd file with 3 username/password combinations specified:
user:XsexPxQgcBoTc
webmaster:LMmm0OcSGsnI2
admin:oZ8O/CyiGjtHE
Once your .htpasswd file contains all of the usernames and passwords required, save it as .htpasswd (again, be sure to select All files under Save as type if you are using Notepad). Leave the file where it is for now, as we now need to set up the .htaccess file.
Setting up the .htaccess file is quite simple: all you need to do is specify the path to the .htpasswd file, the name of the restricted area, which user(s) to require, and the authorization type.
The first thing to configure is the path to the .htpasswd file:
AuthUserFile /home/LOGIN/public_html/path/to/.htpasswd
Next, the name of the restricted area:
AuthName "Password Protected"
Then, the authorization type:
AuthType Basic
Finally, you need to specify which users are allowed to enter the restricted area. Even if you have, for example, 10 users in your .htpasswd file, you can allow only some of them:
require user admin
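You can also list several of the users from your .htpasswd file in the one directive:

require user admin webmaster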
Or, to allow all users that are listed in the .htpasswd file to access the restricted area:
require valid-user
Here's a sample .htaccess file setup for password protection. Copy the code below and change the path to the .htpasswd file, the name of the restricted area and what users to require. Leave the AuthType as it is:
AuthUserFile /pub/home/htdocs/.htpasswd
AuthName "Password Protected"
AuthType Basic
require valid-user
Open a program such as Notepad, insert the code, and save the file as .htaccess. Then upload .htpasswd and .htaccess to your account. Remember that you have to upload the .htpasswd to the directory specified in the AuthUserFile part of the .htaccess file. Also, remember that wherever you place the .htaccess file, that directory and any sub-directories will now be password protected. Attempt to access the protected directory and you will be prompted to enter a username and password.
The features that have been covered in this tutorial are the most commonly used features within a .htaccess file. There are many more different features that can be used. To learn more, check out Apache's website on Apache Directives.
Computer Security Edu .htaccess FAQ
NOTE
This page has been updated and moved into the new FAQ area. Find it here.
Old info
So you have one or more web pages that you want to publish via the web but you don't want to make them available to everyone? A solution is to use the htaccess password mechanism that is part of the Apache Web Server. This Apache feature allows you to publish static web pages to validated users via a web browser prompt for a username and password.
For added security, you can force users to access your pages using an SSL (Secure Socket Layer) connection. This means transmitted data is encrypted, so passwords and webpages cannot be read in cleartext over the internet.
Questions:
- How do I secure a Page?
- How do I restrict access to SSL connections (https)?
- How do I redirect non-secure connections to the secure address?
- Can I restrict access to a Group (e.g. members of comp9316)?
- Can I restrict access by both CSE password and UniPass?
- So my pages are completely secure now?
- Can I use htaccess for securing CGI scripts?
- Are there any alternatives?
- For more information
How do I secure a Page?
Let's say you want to restrict access to the directory called "/home/me/public_html/secret/" to just a small group of people. Then you need to create two files:
- /home/me/public_html/secret/.htaccess - which details the access restrictions for that directory
- .htpasswd - which has the username and password details. It doesn't need to be in the same directory.
The .htaccess file looks like this:
AuthUserFile /home/me/public_html/.htpasswd
AuthName "Access to Private Web Pages"
AuthType Basic
require valid-user
Note that the .htaccess file needs to be readable by the webserver for it to work. You should set its permissions to 644 (ie. chmod 644 .htaccess).
You then need to create the .htpasswd file, which just contains a username and crypt'd password separated by a ':' on each line, eg:
me:tz373OcXNjQF.
someoneelse:aSJeo1t2DvYyg
This can be created on any CSE linux machine, like this:
htpasswd -c /home/me/public_html/.htpasswd me
which will then prompt you for the password for "me" and add the entry to the .htpasswd file. Run "man htpasswd" for more details.
How do I restrict access to SSL connections (https)?
By adding the SSLRequireSSL directive to your .htaccess file, the page can only be accessed through an SSL connection (ie. with https at the start of the URL). However, to give a meaningful message when not using SSL, you can add a section like this to your .htaccess:
SSLRequireSSL
# no non-ssl access
order allow,deny
How do I redirect non-secure connections to the secure address?
The .htaccess example above denies access if the connection is not made using SSL. Alternatively, you can tell apache to redirect people to the secure page automatically, by using the following snippet for the non-ssl access instead:
# no non-ssl access
Redirect permanent / https://www.cse.unsw.edu.au/
The Redirect directive automatically appends anything beyond the matched /, so browsers are redirected to the correct page. For example, if the web page http://www.cse.unsw.edu.au/~foo/bar.html is protected from non-secure access with the above, apache will redirect the browser to https://www.cse.unsw.edu.au/~foo/bar.html (https instead of http).
Can I restrict access to a Group (e.g. members of comp9316)?
By using yp authentication, you can restrict access to web pages to certain CSE groups. Here's an example .htaccess to restrict access just to members of comp9316:
AuthType Basic
AuthName "Restricted Directory"
AuthYP On
require group @COMP9316
order allow,deny
allow from .unsw.edu.au
There is no need for a .htpasswd file here because all the password information is retrieved via YP. However, it is important to note that Web passwords are transmitted over the network in clear text, which might result in the user's CSE password being intercepted. That's why the example here is restricted to just machines in the unsw.edu.au domain. An even better idea is to force the use of SSL as shown above, which encrypts the entire communication including passwords. Under certain conditions (the user belonging to a large number of groups), the authorisation system may incorrectly reject valid group members. If this happens, ask SS to consider raising the netgroup priority for the group in question.
Can I restrict access by both CSE password and UniPass?
Yes, the following .htaccess file allows authentication by both CSE users (using YP) and UNSW users (using UniPass/RADIUS):
AuthName "CSE and Unipass Authentication Example"
AuthType Basic
# stuff to turn on UniPass authentication
AuthRadiusAuthoritative Off
AuthUseUnipass On
# stuff to turn on YP authentication
AuthYP on
AuthYPAuthoritative Off
SSLRequireSSL
# anyone with a valid CSE account:
require group @REASON
# list of valid UniPass usernames:
require user s1234567 s2345678
RedirectMatch /(.*)$ https://www.cse.unsw.edu.au/$1
So my pages are completely secure now?
No, not quite. They are restricted by htaccess when retrieved via the web, but because the files are world-readable (so the web server can serve them), they are also readable by anyone with a CSE account who browses the file system. In order to prevent people from browsing them this way you need to use the priv webonly script. The priv webonly program will make your "secret" directory group-owned by w3serv (the web server account) and remove world access permissions. This means only you and w3serv can access the files in the secret dir:
$ priv webonly ~/public_html/secretdir
Note that priv webonly on its own doesn't restrict access through the web server - you must use it in conjunction with a .htaccess file.
Can I use htaccess for securing CGI scripts?
Yes. It works just like for normal URLs, except that if you are using a Redirect, you need to redirect to cgi.cse.unsw.edu.au instead of www.cse.unsw.edu.au. Any file references (e.g. for AuthUserFile) will also need to be CGI-compatible; in other words, /home/username/public_html/ needs to be referenced as /web/username.
You should also keep in mind that files accessed by your CGI script are not controlled by .htaccess.