Archive for the ‘How To’ Category

January 26th, 2010

Gracefully Failing

Today I noticed that my blog’s CDN (Google AppEngine) was failing to resolve.  After a little research I realized that WebSense, a corporate web filtering product and the scourge of intranet users who need to be productive, had decided that the IP address range Google AppEngine uses should be completely blocked.  And because most corporate IT departments blindly apply the WebSense rules they are sent, without first verifying that they make sense, my blog is completely without JavaScript and CSS on these corporate networks.

The good news however is that I now know that my blog gracefully downgrades and is still very readable without JavaScript or CSS running.

[Image: gracefully-failing]

This is an important test you should run on your blog, because this is how search bots, screen readers, and users behind content filters will see it.  A couple of tips to keep in mind when designing your website, so that it stays readable under any of the circumstances I just listed:

  1. Make sure the main content of the page is the first element right below your header.
  2. Make sure to use as many standard HTML elements as possible, such as H1, H2, H3, EM, STRONG, B, and I. This will keep your website looking good when CSS fails.
  3. Make sure that reading the content does not rely on JavaScript, because if JavaScript is turned off or the client doesn’t support it (search bots, screen readers, etc.), your website will be unreadable.
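One rough way to run this test without a filtered network is to look at the page the way a client that ignores scripts and styles would. The sketch below is only an illustration using Python's standard `html.parser`; the class name `TextOnlyView` and the helper `text_only` are made up for this example and are not part of any blog tooling.

```python
from html.parser import HTMLParser


class TextOnlyView(HTMLParser):
    """Collects the text left over when scripts and styles are ignored."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []       # visible text fragments, in document order
        self._skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.chunks.append(text)


def text_only(html):
    """Return the visible text fragments of an HTML string."""
    parser = TextOnlyView()
    parser.feed(html)
    parser.close()
    return parser.chunks


sample = (
    "<html><head><style>h1{color:red}</style>"
    "<script>document.write('x')</script></head>"
    "<body><h1>Title</h1><p>Main <em>content</em> here.</p></body></html>"
)
print(text_only(sample))
```

If the first fragments printed are your navigation or sidebar rather than your post, tip 1 above probably needs attention.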

Does your site gracefully fail?

Posted in How To

January 12th, 2010

Determining A Significant Change In A Web Page

The problem with many web pages today is that they include useless hidden pieces of constantly changing information that serve no purpose to the general web, or to anything else besides debugging.  A pretty good example is one that I found on my own blog, and removed, while experimenting with the code in this post:

<!-- 12 queries. 0.274 seconds. -->

This really serves no purpose to anybody but the few debuggers who might look at the code for performance reasons every so often.

You may be saying: what’s the big deal? It is a comment and it is not seen by the user, so who cares? Well, you are right, nobody does care, and this post isn’t going to change that fact. This type of debugging flair is going to continue for the entire lifespan of the web.

But it does cause problems for any service that wants to figure out whether your web page has changed over time, such as search engines, proxy caches, and the service I am working on that inspired this post.

The simplest method for determining a change

The simplest method for determining a change is to use a hashing algorithm, like MD5, which will give you a 128-bit (16 byte) hash of text of any length.

MD5("The quick brown fox jumps over the lazy dog")
    = 9e107d9d372bb6826bd81d3542a419d6

Even a small change results in a completely different hash value.

MD5("The quick brown fox jumps over the lazy dog.")
    = e4d909c290d0fb1ca068ffaddf22cbd0

So as you can see, just adding a period to the end of the string resulted in a completely different hash.  This is an important and desirable property of most hashing algorithms called the avalanche effect, because it lets you detect even the smallest of changes.

Because of the avalanche effect, you can store just the hash of your old value and compare it to the hash of your new value to determine whether a change has been made.  And for reasons I won’t get into in this post, this is desirable because you can use it to compare inputs of any size: 1 B, 1 KB, 1 MB, 1 GB, 1 TB, and so on.  In each case it produces only a simple-to-store 16 byte output.
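As a minimal sketch of the store-and-compare idea, here is the same comparison using Python's standard `hashlib`. The helper names `content_hash` and `has_changed` are invented for this illustration, and encoding the text as UTF-8 is an assumption.

```python
import hashlib


def content_hash(text):
    """Return the MD5 hex digest of the UTF-8 encoded text."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def has_changed(stored_hash, new_text):
    """Compare a previously stored digest with the digest of the new text."""
    return stored_hash != content_hash(new_text)


stored = content_hash("The quick brown fox jumps over the lazy dog")
print(stored)  # 9e107d9d372bb6826bd81d3542a419d6
print(has_changed(stored, "The quick brown fox jumps over the lazy dog"))   # False
print(has_changed(stored, "The quick brown fox jumps over the lazy dog."))  # True
```

Only the 32-character hex digest (16 bytes) has to be kept between checks, no matter how large the page is.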

The Problem

Many of you already know where I am going with this.  Many web pages contain this constantly changing debugging flair, which will produce a different hash with every request.

How can you determine if a change has been significant enough to warrant the CPU cycles to reprocess the web page?

Well, that was the problem I was stuck trying to figure out.  After I realized that a simple hash comparison wasn’t enough, and that I was spending too much time storing and processing pages that really hadn’t changed, I settled on a couple of simple rules for determining whether a change was significant enough to warrant reprocessing.

  1. Any change of 5 characters or fewer wouldn’t be counted.
  2. The whole web page had to change by more than 1% for the change to be counted as significant.

After I had these simple rules, I decided to turn to an O(ND) difference algorithm like the one used on StackOverflow.  After a little research I determined that the StackOverflow team was probably using this difference algorithm, which was written in C# and has been around and maintained since 2002.

Using the difference algorithm I created the following method to check for significant changes.

private bool IsChangeSignificant(string text1, string text2)
{
	var differences = Difference.DiffText(text1, text2, true, true, true);
	int text2SignificantChangeLength = 0;
	int text2Length = text2.Length;

	foreach (var diff in differences)
	{
		if (diff.insertedB > 5)
			text2SignificantChangeLength += diff.insertedB;
	}

	// if changes larger than 5 characters add up to more than 1% of the document,
	// then the change is considered significant
	// TODO: probably need a more robust solution in the future
	return ((double)text2SignificantChangeLength / (double)text2Length) > 0.01;
}

As my TODO indicates, I probably need to find a more robust solution in the future. However, this simple routine has so far given me the results I was looking for.  For the most part, the debugging flair no longer matters when comparing two web pages.
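For readers who want to experiment with the two rules outside of C#, here is a rough Python sketch. It is not a port of the C# diff library used above: it swaps in the standard library's `difflib.SequenceMatcher`, compares character by character, and the function name and parameters (`min_run`, `min_ratio`) are assumptions made for this example.

```python
import difflib


def is_change_significant(old_text, new_text, min_run=5, min_ratio=0.01):
    """Apply the two rules: ignore small runs, then require more than 1% change."""
    if not new_text:
        # Nothing was inserted into an empty page, so nothing counts.
        return False

    # autojunk=False keeps difflib from skipping frequent characters on long pages.
    matcher = difflib.SequenceMatcher(None, old_text, new_text, autojunk=False)

    changed = 0
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag in ("insert", "replace"):
            run = j2 - j1          # characters added on the new side
            if run > min_run:      # rule 1: skip runs of 5 characters or fewer
                changed += run

    # rule 2: the counted runs must exceed 1% of the new page
    return changed / len(new_text) > min_ratio


page = (
    "<html><body>" + "word " * 200
    + "<!-- 12 queries. 0.274 seconds. --></body></html>"
)
flair_only = page.replace("0.274", "0.312")
new_content = page.replace(
    "</body>", "<p>A brand new paragraph of real content.</p></body>"
)

print(is_change_significant(page, flair_only))   # False
print(is_change_significant(page, new_content))  # True
```

`SequenceMatcher` can be slow on large pages, so treat this as a way to try out thresholds rather than as something tuned for production use.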

However there are some downsides I can anticipate in the future. 

  1. A 1% change for a 1 KB web page is only about 10 characters, which is probably enough to weed out the debugging flair. However, a 1% change for a 1 MB web page is about 10,000 characters, which is well beyond the size of debugging flair changes.
  2. The CPU overhead of comparing text this way is more significant than comparing two hashes.  In my case it made sense for various reasons to incur this overhead; it might not for you.
  3. Treating only changes larger than 5 characters as significant is arbitrary, because it depends on where those characters occur within the web page.  A change in the main body is probably more significant than one in the footer.

Posted in C#, How To

January 1st, 2010

Show CodeRush Xpress 9.3 Menu in Visual Studio

As promised, here are the updated scripts for CodeRush Xpress 9.3, which was released a few weeks ago.

The Keyboard Command Way

Shift+Ctrl+Alt+O

The Registry Hack Way

I will state this again: I really wish DevExpress would stop treating the registry as a dumping ground and creating a new parallel registry path with each new install. It makes customized registry settings very difficult to keep up with. I could see it for each major version, but come on, is a new registry path really needed for each minor version?

Posted in How To

December 15th, 2009

Server Backup On The Cheap: Backup For Less Than $10.00 A Year

I have started and stopped this post probably about 10 times now.  I just didn’t feel it was that interesting.  But there probably isn’t a better time to publish it than this week, because a certain few widely known bloggers, who should have known better, had their websites go down without any backups.

[Image: codinghorror-down]

This post won’t be as flashy as Rob’s method, but it has worked very well for my WordPress, MySQL, and Windows deployment for two and a half years.  So here is the original post.


This method of backing up your web server very cheaply has been deployed on this server since mid-2007, and it has worked flawlessly, with very little maintenance, over the past two and a half years.  It has also saved my butt once or twice.

Server Setup

Before I jump into the specifics: this post is geared towards a WIMP (Windows, IIS, MySQL, and PHP) install, or more specifically a Windows WordPress install. Here are the tools that I am using to accomplish this:

Make sure you install anything from above that isn’t already on your server.

GoDaddy Domain w/ Free FTP

The first thing you need to do is buy a domain from GoDaddy if you don’t already have one, and then set up the free hosting that comes with your domain.  With this hosting you get an FTP account.  I am not going to cover this any further, because GoDaddy has many resources that will help you get started.

Set up your FTP access with a username and password, and put those aside; you will need them later.

Backup Batch File

From this point on we are going to get into the real meat of the backup process.  The first thing to set up is the batch file that will run as the scheduled task.

The first part of the batch file is the setup for the processing.

@echo OFF
CLS
Title Website Backup

@rem Date Configuration
for /f "tokens=1-4 delims=/ " %%a in ('date/t') do (
set weekday=%%a
set month=%%b
set day=%%c
set year=%%d
)

@rem Backup Configuration
set servername=localhost
set database=managedfusion coderjournal
set backupdir="C:\websites"
set supportdir=%backupdir%\_support
set databasedir=%backupdir%\_db_backups
set ftpcommands=%supportdir%\ftp-commands.txt
set logfile=%supportdir%\backup.log
set zipfile=%backupdir%\backup.%weekday%.7z
set sqlfile=%databasedir%\backup.%weekday%.sql
set zip="C:\program files\7-Zip\7z.exe"
set mysqldump="C:\Program Files\MySQL\MySQL Server 5.1\bin\mysqldump.exe"
set mysqluser=root
set mysqlpassword=FluffyBunnies1234

set start=%date% - %time%: Database Backup of %database% Started

A couple of things you are going to want to change right away are database, backupdir, and mysqlpassword.  (By the way, that is not my real password.)  Nothing really interesting here; I made this batch file very configurable so that I could use it in future deployments.  Another interesting setting to look at is sqlfile, which contains the %weekday% variable that inserts the day of the week into the file name.  This means that if something goes wrong, you will have the last week’s worth of data to look through.

The next thing we want to do is clean up any logs that were left around the last time the process was run, and start the logging process for this run of the batch file.
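To make the weekday rotation concrete, here is a small Python sketch of the same naming scheme. It is an illustration only: the function name `backup_names` is made up, the directory layout is copied from the configuration above, and the weekday abbreviations are hard-coded because the output of `date /t` that the batch file parses depends on the Windows regional settings.

```python
from datetime import date

# Hard-coded so the names do not depend on the machine's locale.
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def backup_names(day, backup_dir=r"C:\websites"):
    """Return the archive and SQL dump paths used for the given date."""
    weekday = WEEKDAYS[day.weekday()]  # Monday is 0
    return {
        "zipfile": rf"{backup_dir}\backup.{weekday}.7z",
        "sqlfile": rf"{backup_dir}\_db_backups\backup.{weekday}.sql",
    }


print(backup_names(date(2009, 12, 15))["zipfile"])  # C:\websites\backup.Tue.7z
print(backup_names(date(2009, 12, 22))["zipfile"])  # same name, one week later
```

Because a given weekday always maps to the same file name, each run overwrites the backup from seven days earlier, which is what caps the history at one week.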

@rem Remove Old Log
del /f /q %logfile%

@rem Start Logging
echo %start%
echo %start% >> %logfile%

The logging isn’t too detailed; it just records that the batch file started and finished, with timestamps so you can see how long it took.  Next we are going to use a program that comes with MySQL, called mysqldump.exe, which can dump the schema and the data in the databases.

@rem dump database. This is all one line
del %sqlfile%
%mysqldump% --user=%mysqluser% --password=%mysqlpassword% --comments --create-options --extended-insert --tz-utc --result-file=%sqlfile% --databases %database%
if not exist %sqlfile% goto FAIL_DUMP

This will dump the entire schema and data into a SQL file that can be used to rebuild your entire database as of the time the backup was run, should you ever need to.  After the database is backed up, which could take some time depending on its size, it is time to create one easily transferable file containing your entire database dump, as well as your images and anything else that is deployed on your website.  To do this I chose 7-Zip, which has a very nice command line tool, to compact everything into one file that is easy to transfer over FTP.

@rem Zip up database
del %zipfile%
%zip% a -t7z -p%mysqlpassword% -mx=1 -mhe=on -x!*.7z -r -y %zipfile% %backupdir%\*
if not exist %zipfile% goto FAIL_GZIP
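If you would rather drive the dump and compression steps from a script, the two commands above can be sketched in Python roughly as follows. The flags are copied from the batch file; the function names and the `run_step` helper are hypothetical, and nothing here has been run against a real server.

```python
import os
import subprocess


def mysqldump_command(exe, user, password, sql_file, databases):
    """Build the same mysqldump invocation the batch file uses."""
    return [
        exe,
        f"--user={user}",
        f"--password={password}",
        "--comments", "--create-options", "--extended-insert", "--tz-utc",
        f"--result-file={sql_file}",
        "--databases", *databases,
    ]


def sevenzip_command(exe, password, zip_file, backup_dir):
    """Build the same 7-Zip invocation the batch file uses."""
    return [
        exe, "a", "-t7z", f"-p{password}", "-mx=1", "-mhe=on",
        "-x!*.7z", "-r", "-y", zip_file, backup_dir + "\\*",
    ]


def run_step(command, expected_file):
    """Run a command and report whether its output file exists afterwards,
    mirroring the `if not exist ... goto FAIL_...` checks."""
    subprocess.run(command, check=False)
    return os.path.exists(expected_file)


dump = mysqldump_command(
    "mysqldump", "root", "secret", "backup.Tue.sql",
    ["managedfusion", "coderjournal"],
)
print(" ".join(dump))
```

Passing the arguments as a list avoids the quoting problems that come up when a path such as `C:\Program Files\...` contains spaces.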

The 7-Zip command also uses the MySQL password to encrypt the archive, just as an added layer of protection.  Next we are going to FTP our newly created zip file to the GoDaddy hosting set up previously.  In Windows, FTP can be done from the command line, but it takes an extra commands file that feeds the FTP command line a list of commands to execute against the server.  The code I am going to show you is part of the batch file.

@rem FTP archives offsite
ftp -n -s:%ftpcommands%

Very simple, right?  The next thing is the list of commands, which sits in another file, appropriately called ftp-commands.txt, right next to the batch file we have been creating.

open
myFTPhosting.com
user
myUSERaccount
FluffyBunnies1234
bin
put backup.Mon.7z
put backup.Tue.7z
put backup.Wed.7z
put backup.Thu.7z
put backup.Fri.7z
put backup.Sat.7z
put backup.Sun.7z
quit

You are going to want to change the following:

  • Line 2: your domain name for the FTP hosting you set up above
  • Line 4: your FTP username
  • Line 5: your FTP password (still not my password, I swear)

After you modify those lines, save the file and we can continue with the batch file.  The rest really doesn’t need much explaining, because it is just logging and error handling.

@REM All is well
GOTO SUCCESS

:FAIL_DUMP
SET message=%date% - %time%: Database Dump of %database% Failed
GOTO END
:FAIL_GZIP
SET message=%date% - %time%: Backup Compression of %database% Failed
GOTO END
:SUCCESS
SET message=%date% - %time%: Backup of %database% Completed Successfully
GOTO END

:END
ECHO %message%
ECHO %message% >> %logfile%
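For comparison, the FTP upload step can also be sketched with Python's standard `ftplib` instead of a commands file. This is only an illustration under a few assumptions: the function names are invented, the host and credentials are placeholders you would supply, and unlike the commands file above it uploads a single archive rather than attempting all seven.

```python
import ftplib
import os


def upload_backup(ftp, local_path):
    """Upload one file in binary mode, like `bin` followed by `put`."""
    remote_name = os.path.basename(local_path)
    with open(local_path, "rb") as handle:
        ftp.storbinary(f"STOR {remote_name}", handle)
    return remote_name


def upload_over_ftp(host, user, password, local_path):
    """Connect, log in, upload the file, and close the connection."""
    with ftplib.FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        return upload_backup(ftp, local_path)
```

Splitting the upload from the connection handling makes it possible to try `upload_backup` against a stand-in object before pointing it at a real FTP account.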

Scheduling Your Backup

The next thing we need to do is set up a schedule for how often we want to run the batch file we have just created.  Just open up the Task Scheduler and click Create Basic Task… to bring up the wizard window.

[Image: task-step1]

Just continue through the wizard and configure the settings how you want. I chose to run it daily at 1:00 AM.

[Image: task-step2]

Make sure to fill in the Start in field with the root of your website, even if that is not the location of your batch file.  The batch file is written on the assumption that it will be running in the root of the website.

[Image: task-step3]

After you have finished the wizard, kick the task off once just to make sure everything is running fine.

Sit Back And Relax

You now have a daily backup of your website, so sit back and relax.  If your server ever fails, or you need to switch hosts or reproduce the website on your local machine for some development, you have an easy backup of everything you need to get rolling in a relatively small amount of time.  As an added bonus, you don’t have to lose sleep over not having any backups of the website that you have spent countless hours creating.

Like I said, this process has been running without problems since I started this blog two and a half years ago, so you can rest assured that it will work for you if you follow the steps above exactly as they are laid out.

Posted in How To