Posts Tagged ‘Search Engine’

July 31st, 2007

Control Google Bot With The New X-Robots-Tag

Google has extended its support for restricting the Google Bot, giving us web developers a new tool to stick in our belts. It was announced today on the Google Blog that you can now control access to the non-HTML files on your website with a simple HTTP header. The X-Robots-Tag header lets you do everything the normal robots meta tag does, but for PDF, Word, image, and any other documents you can think of that are served via HTTP. The same post also announced a new directive that lets you set a date after which a document should no longer be available in the search results. That new directive and the other directives currently supported by X-Robots-Tag are listed below:

  • INDEX|NOINDEX - Tells whether the page may be indexed or not
  • FOLLOW|NOFOLLOW - Tells whether crawlers may follow links provided on the page or not
  • ALL|NONE - ALL = INDEX, FOLLOW (default), NONE = NOINDEX, NOFOLLOW
  • NOODP - tells search engines not to use page titles and descriptions from the ODP on their SERPs.
  • NOYDIR - tells Yahoo! search not to use page titles and descriptions from the Yahoo! directory on the SERPs.
  • NOARCHIVE - Google specific, used to prevent archiving (cached page copy)
  • NOSNIPPET - Prevents Google from displaying text snippets for your page on the SERPs
  • UNAVAILABLE_AFTER: RFC 850 formatted timestamp - Removes a URL from Google’s search index a day after the given date/time

So how can the X-Robots-Tag help you better control the content that Google indexes? You can now tell the Google Bot that you do not want specific non-HTML documents, such as PDF, Word, and image files, cached on Google’s servers, or that a paper you have released on your website in PDF format should only be listed until a specific date. All you need to do is make your server include an additional X-Robots-Tag header in the response, which can be done with any modern language or web server. The response headers would look something like this:

Date: Tue, 31 Jul 2007 21:41:38 GMT
Server: Apache/1.3.37 (Unix) PHP/4.4.4
X-Powered-By: PHP/4.4.4
X-Robots-Tag: index, noarchive, nosnippet
Connection: close
Transfer-Encoding: chunked
Content-Type: application/pdf

You can do this with anything that can be served over HTTP, so this is a huge boost for those of us control freaks who like to decide exactly which parts of our content are searchable on Google.
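Here is a minimal sketch of how the header lines shown above could be assembled in code. The function names `toRfc850` and `buildXRobotsHeaders` are hypothetical, and sending `unavailable_after` on its own header line (so the comma inside the date cannot be confused with a directive separator) is my assumption rather than something the announcement prescribes.

```javascript
// Hypothetical helper: formats a Date as an RFC 850 style timestamp in GMT,
// i.e. "Weekday, DD-Mon-YY HH:MM:SS GMT".
function toRfc850(date) {
  const days = ["Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"];
  const months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];
  const pad = (n) => String(n).padStart(2, "0");
  return days[date.getUTCDay()] + ", " +
    pad(date.getUTCDate()) + "-" + months[date.getUTCMonth()] + "-" +
    pad(date.getUTCFullYear() % 100) + " " +
    pad(date.getUTCHours()) + ":" + pad(date.getUTCMinutes()) + ":" +
    pad(date.getUTCSeconds()) + " GMT";
}

// Hypothetical helper: returns the X-Robots-Tag header lines for a response.
// Simple directives share one line; unavailable_after (optional) gets its own
// line, which is an assumption made here to keep the date unambiguous.
function buildXRobotsHeaders(directives, unavailableAfter) {
  const headers = [];
  if (directives.length > 0) {
    headers.push("X-Robots-Tag: " + directives.join(", "));
  }
  if (unavailableAfter) {
    headers.push("X-Robots-Tag: unavailable_after: " + toRfc850(unavailableAfter));
  }
  return headers;
}

// Example: the directives from the response above, plus an expiry date.
const exampleHeaders = buildXRobotsHeaders(
  ["index", "noarchive", "nosnippet"],
  new Date(Date.UTC(2007, 11, 31, 23, 59, 59))
);
console.log(exampleHeaders.join("\n"));
```

How you attach these lines to the response depends on your server or framework; the sketch only builds the strings.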

Posted in Programming, SEO

May 15th, 2007

A Blog Owner’s Best Friend: Google Analytics

A major update has been pushed out for Google Analytics, as described in a post on Google Webmaster:

Webmaster tools from Google are indispensable for people who optimize their site for indexing in Google. Eighteen months ago, Google launched another free tool for webmasters - Google Analytics - which tells you about your visitors and the traffic patterns to your site using a JavaScript code snippet to execute tracking and reporting. This past Tuesday, Google Analytics launched a new version, with an easier-to-use interface that has more intuitive navigation and greater visibility for important metrics. We also introduced some collaboration and customization features such as email reports and custom dashboards.

I simply love this tool, and the data it provides is invaluable to the day-to-day operation of this website.

New Google Analytics

Posted in SEO

April 27th, 2007

Unfortunate Placement of Yahoo! Ad

Only a geek would find this funny. And if you really don’t understand it then you may want to brush up on your HTTP Status Codes.

Posted in SEO

April 24th, 2007

World Of HTTP/1.1 Status Codes

As a follow-up to my previous post on Proper URL Construction, I am going to dive more deeply into the status codes that control the redirects discussed in that article.

Most developers are familiar with the HTTP/1.0 status codes, which have recently been popularized by the SEO guys. We have all heard that you should use 301 Moved Permanently instead of 302 Moved Temporarily. What many of the SEO guys won’t tell you, because they don’t know any better, is that they are using the RFC 1945 HTTP/1.0 standard that was released in May 1996. That’s right, it is more than a decade old. The newer HTTP/1.1 standard, RFC 2616, was released in June 1999 and made some pretty drastic changes to the 3xx redirect status codes. The goal of this post is to familiarize developers with the HTTP/1.1 standard, specifically the changes to the 3xx redirect status codes. These changes can have a drastic effect on how you handle requests on your website and optimize your site for search engines.

History

In the middle-to-late 1990s, 302 Moved Temporarily was the most popular redirect code, but also an example of industry practice contradicting the standard. The HTTP/1.0 specification (RFC 1945) required the client to perform a temporary redirect (the original reason phrase was “Moved Temporarily”), but popular browsers implemented it as though it were a 303 See Other.

Note from 302 Found: RFC 1945 and RFC 2068 specify that the client is not allowed to change the method on the redirected request. However, most existing user agent implementations treat 302 as if it were a 303 response, performing a GET on the Location field-value regardless of the original request method. The status codes 303 and 307 have been added for servers that wish to make unambiguously clear which kind of reaction is expected of the client.

Therefore, HTTP/1.1 added status codes 303 and 307 to disambiguate between the two behaviors. However, the majority of web applications and frameworks still use the 302 status code as if it were the 303.

Proper Use of HTTP Redirects

The next part is a guide to the conditions that should be met in order to use each specific redirect.

301 Moved Permanently

  • The URL (or page) is going to permanently reside in a different location.
  • The domain should always be displayed a certain way (i.e. this domain is always displayed as coderjournal.com, so any traffic to www.coderjournal.com gets a 301 redirect to coderjournal.com).
  • This should be used for most static redirects that are not generated programmatically.
  • *NEW* This status was mostly designed to be used with GET and HEAD requests.

303 See Other

  • This is going to be the most common type of redirect that you want to use when you are programmatically changing where the user is located in your site during a POST back.
  • Any time you want to redirect a user to another URL after a POST from a form has occurred (i.e. The visitor to your site registers with your site and after they are done registering you want to direct them back to the home page, this is when you would use a 303 redirect).
  • *NEW* This status was designed to be used with POST requests specifically, so it should not be used for GET or HEAD requests.

307 Temporary Redirect

  • Anytime that you want to put up a temporary page (i.e. your site is under construction and you want all traffic temporarily redirected to a static HTML page).
  • *NEW* This should be used when you want to redirect a GET request to a different location each time the URL is requested.
  • *NEW* This should not be used with POST requests, because of this statement in the specification:

    the user agent MUST NOT automatically redirect the request unless it can be confirmed by the user, since this might change the conditions under which the request was issued.

302 Found

  • Use this for any condition not met above.
  • This should be used sparingly because there is a search engine penalty if it is used too much, a consequence of spammers abusing it in an exploit called Page Hijacking.
  • This is sort of the antithesis of 404 Not Found and should be used in a similar way. So if you have a page that is referenced but no longer exists, and you do not want to return a 404 but instead redirect the user to a random (not static, as defined for a 301) location, you would use a 302 redirect. (Note that this argument is very weak, and there is very little reason in an HTTP/1.1 world to use a 302 redirect.)
  • *NEW* This status should be used during GET requests for any semi-static URLs that may change in the future, but don’t change with each and every request. A good example of this on Coder Journal is my Essential Software Every Developer Needs, which I publish annually and which is located at http://www.coderjournal.com/essential-software/. It changes, but only once a year, so it is semi-static in terms of the internet.
  • *NEW* The 302 Found falls right between 301 Moved Permanently and 307 Temporary Redirect in terms of how permanent the URL is for GET requests.

An HTTP redirect response looks something like the following, which was taken from my own site when somebody queries www.coderjournal.com:

HTTP/1.1 301 Moved Permanently
Date: Tue, 24 Apr 2007 18:12:55 GMT
Server: Apache
Location: http://www.coderjournal.com/
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html; charset=iso-8859-1
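The guide above can be condensed into a small decision function. This is only a sketch of the rules as I have described them in this post: the function name `chooseRedirectStatus` and the lifetime labels `"permanent"`, `"semi-static"`, and `"temporary"` are made up for illustration and do not come from RFC 2616.

```javascript
// Hypothetical helper that mirrors the guide above.
// method:   the HTTP method of the request being redirected
// lifetime: "permanent", "semi-static" or "temporary" (labels invented here)
function chooseRedirectStatus(method, lifetime) {
  const m = method.toUpperCase();

  // After a form POST, send the browser to the result page with a GET.
  if (m === "POST") {
    return 303; // See Other
  }

  if (m === "GET" || m === "HEAD") {
    if (lifetime === "permanent") {
      return 301; // Moved Permanently
    }
    if (lifetime === "semi-static") {
      return 302; // Found: changes occasionally, not on every request
    }
    if (lifetime === "temporary") {
      return 307; // Temporary Redirect
    }
  }

  // "Use this for any condition not met above."
  return 302;
}

console.log(chooseRedirectStatus("POST", "temporary")); // 303
console.log(chooseRedirectStatus("GET", "permanent"));  // 301
```

The status code still has to be paired with a Location header, as in the response shown above.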

If you would like to learn more about how to perform the redirects described above in your favorite language, please read this article from Steven Hargrove.

Update (2008-5-20): I have updated my understanding of the different types of redirects that developers may want to use. See above for my new understandings.

Posted in How To, Programming, SEO

April 17th, 2007

Sitemap Auto Discovery And You

Last week all the major search engine providers announced that they were going to support a new specification at sitemaps.org that allows them to auto-discover your sitemap without you having to submit it.

Yahoo did a good job of summing up the advantages of putting your sitemap location in the robots.txt file:

All search crawlers recognize robots.txt, so it seemed like a good idea to use that mechanism to allow webmasters to share their Sitemaps. You agreed and encouraged us to allow robots.txt discovery of Sitemaps on our suggestion board. We took the idea to Google and Microsoft and are happy to announce today that you can now find your sitemaps in a uniform way across all participating engines.

If you want to see my implementation of this for my sitemap go to http://www.coderjournal.com/robots.txt. Further details about this can be found at http://sitemaps.org/protocol.htm or for your convenience I have included them below.

Specifying the Sitemap location in your robots.txt file

You can specify the location of the Sitemap using a robots.txt file. To do this, simply add the following line:

Sitemap: <sitemap_location>

The <sitemap_location> should be the complete URL to the Sitemap, such as: http://www.example.com/sitemap.xml

This directive is independent of the user-agent line, so it doesn’t matter where you place it in your file. If you have a Sitemap index file, you can include the location of just that file. You don’t need to list each individual Sitemap listed in the index file.
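To make the directive concrete, here is a rough sketch of how a crawler-side script might pull the Sitemap locations out of a robots.txt file. The function name `findSitemaps` is hypothetical, and the matching rules (case-insensitive directive name, one URL per line) are my assumptions about a reasonable reading of the text above, not a reference implementation.

```javascript
// Hypothetical helper: returns every Sitemap URL declared in a robots.txt body.
// The directive is independent of User-agent blocks, so each line is checked
// on its own regardless of where it appears in the file.
function findSitemaps(robotsTxt) {
  return robotsTxt
    .split(/\r?\n/)
    .map(function (line) { return line.trim(); })
    .filter(function (line) { return /^sitemap\s*:/i.test(line); })
    .map(function (line) {
      // Everything after the first colon is the sitemap location.
      return line.slice(line.indexOf(":") + 1).trim();
    });
}

const sampleRobotsTxt = [
  "User-agent: *",
  "Disallow: /private/",
  "Sitemap: http://www.example.com/sitemap.xml"
].join("\n");

console.log(findSitemaps(sampleRobotsTxt));
```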

Posted in SEO

April 10th, 2007

A Guide To Proper URL Construction

For many developers the URL is just a means to an end, so very little time is actually spent on planning and creating a URL that is both functional and user friendly. We have all seen the URLs that seem to go on forever. I am not going to dwell on those, because you can find them anywhere. Instead, I am going to go over what a good URL consists of, and some easy ways to increase the search engine ranking of your already developed application.

Search Engine Crawlers Are Like People

One thing that a web developer has to understand is that search engine crawlers are like people. Everybody understands that if your site content is not laid out in a way that is readable, people will not spend much time on your site. The same goes for search engine crawlers: if your site doesn’t conform to XHTML, or at the very least HTML, standards, the crawler isn’t going to spend much time indexing your site.

The very same goes for the URL of your website. If it is ugly and looks like http://somesite.com/default.aspx?a=0038383-838308380-8383&c=3&p=30203#page-2, it is very hard to determine which part of the URL changes the content displayed and what this content is actually supposed to be. I don’t even think the developer of this application could tell you. A friendlier version of the same URL might be written like http://somesite.com/authorname/google/correct-use-of-the-url.html#page-2. Just like in the content example above, the search engine crawler will have an easier time cataloging the nicer URL, because it uses real words instead of magic numbers that mean nothing except to the program.
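As a rough illustration of how a readable path segment like the one above can be produced, here is a minimal slug function. The name `slugify` and the exact rules (lowercase, runs of non-alphanumeric characters collapsed to a single hyphen) are assumptions made for this sketch; real blog engines apply their own rules.

```javascript
// Hypothetical helper: turns a page title into a keyword-rich URL segment.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // collapse anything that is not a letter or digit
    .replace(/^-+|-+$/g, "");    // drop hyphens left at either end
}

console.log(slugify("Correct Use of the URL")); // correct-use-of-the-url
```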

3 Tips For Constructing a Proper URL

  1. Remove Duplicate URLs
    Jeff Atwood recently wrote an article dealing with multiple URLs and the effects they have on your Search Engine Ranking:

    As a software developer, you may be familiar with the DRY principle: don’t repeat yourself. It’s absolute bedrock in software engineering, and it’s covered beautifully in The Pragmatic Programmer, and even more succinctly in this brief IEEE software article (pdf). If you haven’t committed this to heart by now, go read these links first…

    With URLs there are many ways to get to a website:

    1. http://www.coderjournal.com
    2. http://coderjournal.com
    3. http://www.coderjournal.com/index.html
    4. http://coderjournal.com/index.html

    Having these multiple URLs reference the same content decreases your search engine ranking, because PageRank is calculated per URL. So the best idea is to do a 301 redirect for the different patterns I listed above. In the case of Coder Journal I have URLs 2, 3, and 4 all redirecting to URL 1.

  2. Combine Domains
    Most people don’t know it, but this blog has multiple domains that get you to the same place.

    • http://www.coderjournal.com
    • http://coderjournal.net
    • http://coderjournal.org

    What we just went over about duplicate URLs applies to domain names as well, so it is wise to also do a 301 redirect from the extra domains. In the case of this blog I have the .net and .org domains doing a 301 redirect to my .com domain name.

  3. Increasing Your Surface Area With Keywords in URLs
    If you do almost any search on Google, you will notice that Google also highlights the keywords that show up in the URL. So a URL that looks like http://www.coderjournal.com/2007/04/new-novell-ad-campaign-mac-vs-pc-vs-linux-continued/ is going to attract a lot more attention on keyword searches than a URL for the same post that contains no recognizable words.
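The first two tips can be sketched as one function that decides whether a request should be answered with a 301 and where it should point. Everything named here is an assumption for illustration: the function `canonicalRedirect`, the constant names, the alias list, and the choice to strip a trailing index.html. A real site would wire the result into its own server or rewrite rules.

```javascript
// Assumed canonical host and aliases, based on the domains mentioned in this post.
const CANONICAL_HOST = "www.coderjournal.com";
const ALIAS_HOSTS = ["coderjournal.com", "coderjournal.net", "coderjournal.org"];

// Hypothetical helper: returns { status, location } when the requested URL is a
// duplicate of a canonical URL, or null when no redirect is needed.
function canonicalRedirect(requestUrl) {
  const url = new URL(requestUrl);
  let changed = false;

  // Tip 2: fold the alternate domains into the canonical one.
  if (ALIAS_HOSTS.includes(url.hostname)) {
    url.hostname = CANONICAL_HOST;
    changed = true;
  }

  // Tip 1: /index.html and / serve the same content, so keep only "/".
  if (url.pathname.endsWith("/index.html")) {
    url.pathname = url.pathname.slice(0, -"index.html".length);
    changed = true;
  }

  return changed ? { status: 301, location: url.href } : null;
}

console.log(canonicalRedirect("http://coderjournal.net/index.html"));
console.log(canonicalRedirect("http://www.coderjournal.com/"));
```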

The 3 tips that I gave you above are just the tip of the SEO iceberg. However, implementing one or all of them should increase your search engine ranking without affecting the functionality of your application. What more could you ask for?

Posted in SEO

March 26th, 2007

Google Pack: A Computer User’s Best Friend

Do you hate having to go to umpteen sites just to download the essential software to get your computer running?
Do you hate then having to go back to those umpteen sites to check for software updates, and then download and install them?

Well, I have the answer for you: it is called the Google Pack. Not only does the Google Pack include a wealth of Google software, it also includes many non-Google titles such as Skype, RealPlayer, Adobe Reader, Norton AntiVirus 2005 SE, Ad-Aware SE Personal, and Mozilla Firefox, and all of the software is downloaded and installed according to your preferences. As an added bonus, a service called Google Updater runs in the background and keeps all of the supported installed programs updated to their latest and greatest versions. See the image to the right for a screenshot of Google Updater.

So if you would like to check out the Google Pack just click the button below:


The following wealth of programs is one of the main reasons I recommend it along with Firefox on the left side of my site.

List of Software in Google Pack

  • Google Earth
  • Google Desktop
  • Picasa
  • Google Pack Screensaver
  • Google Toolbar for Internet Explorer
  • Mozilla Firefox with Google Toolbar
  • Norton AntiVirus 2005 Special Edition
  • Ad-Aware SE Personal
  • Adobe Reader
  • Google Talk
  • Google Video Player
  • RealPlayer
  • GalleryPlayerHD Images
  • Skype

Posted in Rant, SEO

March 8th, 2007

Track Cookie Usage with a Web Service via AJAX

Much of today’s sophisticated traffic monitoring software can track everything from the number of visits, where a visitor came from (both the referring site and the geographic location), and what search terms were used to find your site, to what type of browser they are using on what platform.

Many of the JavaScript-based solutions, such as Google Analytics, offer much more information about the browser than the ones that sit between the web and the web server, such as AWStats. This is because JavaScript exposes details such as the screen resolution and color depth. The downside is that if JavaScript is disabled you don’t get any information at all, so it is usually wise to use a combination of client- and server-based traffic monitoring.

However, one feature that they all seem to lack is tracking whether a user has cookies enabled in their browser. For the life of me I cannot understand why this is the case for the JavaScript-based client-side solutions. So I developed a small web service in .NET that allows me to track this information via an AJAX call. You can accomplish the same thing with about an hour of your time and the ASP.NET AJAX Library provided by Microsoft.

First, you need to create a web service called CookieSupport.asmx with the following code.

using System;
using System.ComponentModel;
using System.IO;
using System.Web.Services;

// Namespace used by the ASP.NET AJAX 1.0 release; pre-release builds
// exposed ScriptService under Microsoft.Web.Script.Services instead.
using System.Web.Script.Services;

namespace CookieTest
{
	/// <summary>
	/// Records whether the calling browser reports that cookies are enabled.
	/// </summary>
	[WebService(Namespace = "http://tempuri.org/")]
	[WebServiceBinding(ConformsTo = WsiProfiles.BasicProfile1_1)]
	[ToolboxItem(false)]
	[ScriptService]
	public class CookieSupport : System.Web.Services.WebService
	{
		[WebMethod]
		public void Supported(bool supported)
		{
			// Append to the log so each call adds a record instead of
			// overwriting the previous one. The using block flushes and
			// closes the writer, so no explicit Flush/Close is needed.
			using (StreamWriter writer = File.AppendText(@"c:\results.txt"))
			{
				writer.WriteLine(@"----------------------------------------
IPAddress = {0}
Cookies Supported = {1}
Location = {2}
Browser = {3}",
				   Context.Request.UserHostAddress,
				   supported,
				   Context.Request.Url,
				   Context.Request.UserAgent
				);
			}
		}
	}
}

Second, you need to add the script manager to the body of your page.

	<asp:ScriptManager id="scriptManager" runat="server">
		<Services>
			<asp:ServiceReference Path="CookieSupport.asmx" />
		</Services>
	</asp:ScriptManager>

Last, you need to add the following script to the head of your page.

// POST cookie support of browser back to web service
function PostCookiesSupport()
{
	var cookieEnabled = navigator.cookieEnabled;

	// Fallback for browsers older than IE4/NS6, which do not expose
	// navigator.cookieEnabled: set a test cookie and try to read it back.
	if (typeof navigator.cookieEnabled == "undefined") {
		document.cookie = "testcookie=1";
		cookieEnabled = document.cookie.indexOf("testcookie") != -1;
	}

	// Call the JavaScript proxy generated for the CookieSupport web service.
	CookieTest.CookieSupport.Supported(cookieEnabled);
}

You may have noticed that the call CookieTest.CookieSupport.Supported(cookieEnabled); uses the same namespace, class, and method names as the C# web service I listed in the first step. Microsoft took the idea of C# namespaces and translated it into JavaScript to make the service easy to remember and call.
This is facilitated through the ScriptServiceAttribute, which generates a JavaScript file, based on the methods of the WebService, that is imported into your page. You can view this generated JavaScript by visiting http://www.yoursite.com/CookieSupport.asmx/js. I have also included it below:

Type.registerNamespace('CookieTest');
CookieTest.CookieSupport=function() {
this._timeout = 0;
this._userContext = null;
this._succeeded = null;
this._failed = null;
}
CookieTest.CookieSupport.prototype={
Supported:Sys.Net._WebMethod._createProxyMethod(this,"Supported", "CookieTest.CookieSupport.Supported",false,"supported"),_get_path: function() { return CookieTest.CookieSupport.get_path(); },
    set_timeout: function(value) { this._timeout = value; },
    get_timeout: function() { return this._timeout; },
    set_defaultUserContext: function(value) { this._userContext = value; },
    get_defaultUserContext: function() { return this._userContext; },
    set_defaultSucceededCallback: function(value) { this._succeeded = value; },
    get_defaultSucceededCallback: function() { return this._succeeded; },
    set_defaultFailedCallback: function(value) { this._failed = value; },
    get_defaultFailedCallback: function() { return this._failed; }
}
CookieTest.CookieSupport._staticInstance = new CookieTest.CookieSupport();
CookieTest.CookieSupport.set_path = function(value) { CookieTest.CookieSupport._staticInstance._path = value; }
CookieTest.CookieSupport.get_path = function() { return CookieTest.CookieSupport._staticInstance._path; }
CookieTest.CookieSupport.set_timeout = function(value) { CookieTest.CookieSupport._staticInstance._timeout = value; }
CookieTest.CookieSupport.get_timeout = function() { return CookieTest.CookieSupport._staticInstance._timeout; }
CookieTest.CookieSupport.set_defaultUserContext = function(value) { CookieTest.CookieSupport._staticInstance._userContext = value; }
CookieTest.CookieSupport.get_defaultUserContext = function() { return CookieTest.CookieSupport._staticInstance._userContext; }
CookieTest.CookieSupport.set_defaultSucceededCallback = function(value) { CookieTest.CookieSupport._staticInstance._succeeded = value; }
CookieTest.CookieSupport.get_defaultSucceededCallback = function() { return CookieTest.CookieSupport._staticInstance._succeeded; }
CookieTest.CookieSupport.set_defaultFailedCallback = function(value) { CookieTest.CookieSupport._staticInstance._failed = value; }
CookieTest.CookieSupport.get_defaultFailedCallback = function() { return CookieTest.CookieSupport._staticInstance._failed; }
CookieTest.CookieSupport.set_path("/CookieSupport.asmx");
CookieTest.CookieSupport.Supported= function(supported,onSuccess,onFailed,userContext) {CookieTest.CookieSupport._staticInstance.Supported(supported,onSuccess,onFailed,userContext); }
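The last line of the generated proxy shows that Supported also accepts success and failure callbacks after the service argument. Below is a small sketch of how those could be used. The wrapper `reportCookieSupport`, its `log` parameter, and the stand-in object `fakeProxy` are all hypothetical and exist only so the snippet can run outside a browser; on a real page you would pass CookieTest.CookieSupport itself. Reading the error text through `get_message()` is an assumption about the error object handed to the failure callback, so the code falls back to a plain string conversion.

```javascript
// Hypothetical wrapper around the generated proxy's Supported method.
// proxy: an object with Supported(supported, onSuccess, onFailed)
// log:   any function that accepts a message string
function reportCookieSupport(proxy, cookieEnabled, log) {
  proxy.Supported(
    cookieEnabled,
    function onSuccess() {
      log("cookie support reported: " + cookieEnabled);
    },
    function onFailed(error) {
      // Assumed shape of the error object; fall back to String() otherwise.
      var message = error && typeof error.get_message === "function"
        ? error.get_message()
        : String(error);
      log("report failed: " + message);
    }
  );
}

// Stand-in for CookieTest.CookieSupport so the sketch runs without a browser.
var fakeProxy = {
  Supported: function (supported, onSuccess, onFailed) {
    onSuccess();
  }
};

var messages = [];
reportCookieSupport(fakeProxy, true, function (m) { messages.push(m); });
console.log(messages.join("\n"));
```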

This came in very useful for me in tracking the number of users that have cookies enabled, and it may be just as useful for anybody else who wants to connect their client-side script to a web service via AJAX. This new technology in ASP.NET opens up a lot of possibilities for joining your existing web services with the client-side browser without having to refresh the content of your page for each and every call that needs to be made.

As always happy coding.

Posted in SEO

March 7th, 2007

Send Google Maps to your BMW

Google Maps Germany has a new feature: if you have a BMW car that includes a navigation system and you happen to live in Germany, it’s easy to send the address of a local business to your car’s navigation system.

The “send” link from every Google Maps page will open a dialog that lets you fill your BMW account name and send an address plus some notes to your car. This service is free and it works only for businesses in Germany.
Source

Posted in Programming, SEO