What is URL Canonicalization?

Posted by reviews on Mar 12, 2008

First of all, what is canonicalization? Canonicalization (abbreviated c14n) is a process for converting data that has more than one possible representation into a “standard” canonical representation.

Example:

[[Egg_salad]]

[[egg salad]]

[[  egg_salad  ]]

As you will notice, each of these is slightly different from the others.
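
As a rough illustration, the three titles above can be reduced to one canonical form. The sketch below assumes wiki-style rules (drop the brackets, trim whitespace, treat underscores as spaces, capitalize the first letter); the function name `canonical_title` and these exact rules are made up for this example and are not taken from any particular wiki engine.

```python
def canonical_title(link: str) -> str:
    """Reduce a [[wiki link]] to one canonical title (assumed rules)."""
    title = link.strip()
    # Drop the surrounding [[ ]] brackets if present.
    if title.startswith("[[") and title.endswith("]]"):
        title = title[2:-2]
    # Treat underscores as spaces and collapse runs of whitespace.
    title = " ".join(title.replace("_", " ").split())
    # Capitalize only the first character, leave the rest untouched.
    return title[:1].upper() + title[1:]


variants = ["[[Egg_salad]]", "[[egg salad]]", "[[  egg_salad  ]]"]
# All three variants collapse to the same canonical title.
print({canonical_title(v) for v in variants})
```

Whatever rules you choose, the point is that every variant maps to exactly one representative.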

For URLs, canonicalization means resolving variations of the same URL to a single form, as seen below. The concern is whether Google treats the different variants of the same URL as one page. It’s important to create consistency across your website URLs so you

From Matt Cutts’ Page

Q: What is a canonical url? Do you have to use such a weird word, anyway?
A: Sorry that it’s a strange word; that’s what we call it around Google. Canonicalization is the process of picking the best url when there are several choices, and it usually refers to home pages. For example, most people would consider these the same urls:

  • www.example.com
  • example.com/
  • www.example.com/index.html
  • example.com/home.asp

But technically all of these urls are different. A web server could return completely different content for all the urls above. When Google “canonicalizes” a url, we try to pick the url that seems like the best representative from that set.

Q: So how do I make sure that Google picks the url that I want?
A: One thing that helps is to pick the url that you want and use that url consistently across your entire site. For example, don’t make half of your links go to http://example.com/ and the other half go to http://www.example.com/ . Instead, pick the url you prefer and always use that format for your internal links.

Q: Is there anything else I can do?
A: Yes. Suppose you want your default url to be http://www.example.com/ . You can make your webserver so that if someone requests http://example.com/, it does a 301 (permanent) redirect to http://www.example.com/ . That helps Google know which url you prefer to be canonical. Adding a 301 redirect can be an especially good idea if your site changes often (e.g. dynamic content, a blog, etc.).
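
The redirect idea in the last answer can be sketched in a few lines of Python. This is only an illustration, not how any particular web server is configured: the function name `canonical_redirect`, the choice of `www.example.com` as the preferred host, and the assumption that `/index.html` and `/home.asp` serve the same content as the home page are all mine.

```python
from urllib.parse import urlsplit, urlunsplit

PREFERRED_HOST = "www.example.com"           # assumption: the www form is preferred
DEFAULT_DOCS = {"/index.html", "/home.asp"}  # assumption: these serve the home page


def canonical_redirect(url: str):
    """Return (301, canonical_url) if the URL should redirect, else None."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Send the bare domain to the preferred www host.
    if host == "example.com":
        host = PREFERRED_HOST
    # An empty path and the default documents all mean the home page.
    path = parts.path or "/"
    if path in DEFAULT_DOCS:
        path = "/"
    canonical = urlunsplit((parts.scheme, host, path, parts.query, parts.fragment))
    return None if canonical == url else (301, canonical)


print(canonical_redirect("http://example.com/"))
print(canonical_redirect("http://www.example.com/"))
```

A request for the preferred URL returns `None` (serve the page as usual); every other variant is answered with a 301 pointing at the one URL you picked.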

Essentially, you would want to canonicalize the following situations:


The BIGGEST Search Engine Optimization Mistakes

Posted by reviews on Nov 28, 2007

There are arguably many, many mistakes that can be made as far as Search Engine Optimization (SEO) is concerned. Here I am listing some of the biggest and most common ones.


The Power of Google

Posted by reviews on Oct 12, 2007

Google seems to be growing exponentially. During the month of August alone they were reported to have had 31 billion queries throughout the world. In total, more than 37 billion searches were carried out across all Google sites. Second on the list was Yahoo!, which had a paltry 8.5 billion searches recorded during the month. Yahoo!, Ask, and MSN Search do not come anywhere near that level of usage. A recent poll did show Yahoo! with a slight lead as users’ favored search engine, yet these users still seem to be using Google anyway.

The Asia-Pacific region, including China, Japan and India, contained the greatest number of unique searchers, with 258 million conducting over 20 billion searches during the month.

Since Google has such dominance, as far as search engines go, they also can greatly influence the way websites are designed, how people market their sites, and, to that end, which websites people visit. They can also pretty much dictate the cost of advertising. This level of control over something as pervasive as the internet itself makes one think that they have more influence now than Microsoft ever had during its heyday.

I believe they are bordering on monopoly control of the internet. I wonder if they will eventually be targeted for regulation or some sort of government-instituted penalties, such as being broken up.


SERP Rank - The alternative to Page Rank

Posted by reviews on Oct 4, 2007

Most of us know or suspect that PageRank has little meaning these days.

Here’s my idea:

Someone (not necessarily Google) creates a toolbar and/or small site banners showing the current SERP for specific terms for that page.

1. Roll your mouse over the top of the banner and you then see the terms or keywords. The banner could also show a numerical value, such as “page 1, pos 3” or just “P1,p3”, for a page showing up on the first results page at position 3.

2. The toolbar could show the search terms and the “SERP rank”.

3. There could be a site that shows graphs of SERP changes and notifies the user when there are sudden drops or increases in the SERP. Perhaps this could be a paid-for service to cover costs.
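
To make the idea a bit more concrete, here is a minimal Python sketch of the two pieces: turning an overall result rank into the “page 1, pos 3” / “P1,p3” label, and flagging sudden changes between checks. The names `serp_label` and `sudden_changes`, the ten-results-per-page figure, and the threshold of 5 positions are all assumptions for illustration. How the rank itself would be obtained is left out.

```python
RESULTS_PER_PAGE = 10  # assumption: ten results per results page


def serp_label(rank: int, short: bool = False) -> str:
    """Turn an overall rank (1 = top result) into a page/position label."""
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    page, pos = divmod(rank - 1, RESULTS_PER_PAGE)
    page, pos = page + 1, pos + 1
    return f"P{page},p{pos}" if short else f"page {page}, pos {pos}"


def sudden_changes(ranks, threshold=5):
    """Yield (index, delta) where the rank moved by at least `threshold`.

    A positive delta means the page fell in the results (its rank number
    grew); a negative delta means it climbed.
    """
    for i in range(1, len(ranks)):
        delta = ranks[i] - ranks[i - 1]
        if abs(delta) >= threshold:
            yield i, delta


print(serp_label(3))              # page 1, pos 3
print(serp_label(3, short=True))  # P1,p3
```

The graphing site in point 3 would run something like `sudden_changes` over the stored history for each search term and send a notification for every change it yields.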

I want to know your thoughts on this. :)


Improving Search Engine Results Page (SERP) Rank

Posted by reviews on Oct 1, 2007

Wikipedia describes the Search Engine Results Page (SERP) as “the listing of web pages returned by a search engine in response to a keyword query. The results normally include a list of web pages with titles, a link to the page, and a short description showing where the keywords have matched content within the page. A SERP may refer to a single page of links returned, or to the set of all links returned for a search query.”

The SERPs are the single most important thing with respect to search engines and optimizing your website, and in particular, its individual pages. The SERPs are the “free” way of getting targeted traffic to your site. The usual alternative is paid advertising, which is anything but free. Search Engine Optimization is the science (or art) of optimizing a website to rank as well as possible in the SERPs for the range of search terms the webmaster is targeting. The higher the position in the SERP, preferably on the first page, the better. Most search engine users only look at the top choices on the first page.

Each search term, such as “reviewer” or “reviewer of sites”, will have different results on each search engine. You will even notice that the results may change from day to day for the exact same search term on the same search engine. There are many variables to consider when optimizing your site and pages, including what your competition is doing on their end for the given search terms. In my opinion, the best way to improve your SERPs for specific search terms is to do the following:

It does take time to work your way toward the top of the SERPs, especially if the keywords have a lot of competition, so you will need to be patient. The important thing is to not do anything that could get you penalized by Google, since it is usually your site’s SERPs that take the hit.


The Best Free SEO, Marketing, PPC Tools

Posted by reviews on Sep 6, 2007

Below is a list of the best tools on the internet for SEO, internet marketing, PPC (pay-per-click) and webmaster-related needs.

Backlinks (who is linking to you):

Domain Pop Backlink Checker - This is probably the best backlink tool out there. It sorts by site, so if you have sitewide backlinks, you won’t have to look at pages and pages of backlinks from the same site. Fantastic!

iWebTool’s Backlink Checker

Backlink Watch

Key Words (helpful for PPC - Adsense, YPN, etc):

Google Key Word Tool - You can’t beat Google’s own keyword tool!

SEO Key Word Tool

Assortment of Web-based Tools

SEOChat - On left side

Smart PageRank

WebConf

Browser based tools:

–Firefox Plugins–

Smart Pages plugin - Very handy tool!

SEOQuake - shows nofollow links, along with much more

Niche Watch Tool - Excellent tool for studying keywords. As it states, “This wonderful SEO extension provides you the technical information required to beat your competitor websites in serps.”

SearchStatus - Also shows nofollow links, plus much more

Alexa Sparky - Official tool for Alexa. Displays a nice graph along your bottom task bar. It also helps to boost your Alexa rank (not very relevant unless you sell blog posts).


Search Engine Optimization and Marketing Terms and Acronyms

Posted by reviews on Aug 31, 2007

302

Found - The requested resource resides temporarily under a different URI. Since the redirection might be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.
304
Not Modified - If the client has performed a conditional GET request and access is allowed, but the document has not been modified, the server SHOULD respond with this status code.
307
Temporary Redirect - The requested resource resides temporarily under a different URI. Since the redirection MAY be altered on occasion, the client SHOULD continue to use the Request-URI for future requests.
400
Bad Request - The request could not be understood by the server due to malformed syntax. The client SHOULD NOT repeat the request without modifications.
401
Unauthorized - The request requires user authentication. The response MUST include a WWW-Authenticate header field containing a challenge applicable to the requested resource.
403
Forbidden - The server understood the request, but is refusing to fulfill it. Authorization will not help and the request SHOULD NOT be repeated.
404
Not Found - The server has not found anything matching the Request-URI. No indication is given of whether the condition is temporary or permanent.
410
Gone - The requested resource is no longer available at the server and no forwarding address is known. This condition is expected to be considered permanent.
500
Internal Server Error - The server encountered an unexpected condition which prevented it from fulfilling the request.
501
Not Implemented - The server does not support the functionality required to fulfill the request. This is the appropriate response when the server does not recognize the request method and is not capable of supporting it for any resource.
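
If you want to look these codes up from a script, Python’s standard library already carries the names. The short sketch below uses `http.HTTPStatus` to print the reason phrase for each code listed above; the helper `status_kind` and its three labels are my own grouping for illustration.

```python
from http import HTTPStatus


def status_kind(code: int) -> str:
    """Group a status code by its first digit (only 3xx/4xx/5xx handled here)."""
    if 300 <= code < 400:
        return "redirection"
    if 400 <= code < 500:
        return "client error"
    if 500 <= code < 600:
        return "server error"
    return "other"


for code in (302, 304, 307, 400, 401, 403, 404, 410, 500, 501):
    # HTTPStatus(code).phrase is the standard reason phrase, e.g. "Not Found".
    print(code, HTTPStatus(code).phrase, f"({status_kind(code)})")
```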

Adsense

Algorithm

Anchor Text

Anchor text refers to the visible text for a hyperlink. For example:

<a href="http://www.reviewerofsites.com/">This is the anchor text</a>
ATW

AlltheWeb (search engine)

Authority

A website that is considered by Google or other search engines to be trusted and usually given a high PageRank.

BackLink
A web page that has a hyperlink to one of your pages, usually the home page.
B2B

Business to Business

Blog

BOT

Abbreviation for robot (also called a spider). It refers to software programs that scan the web. Bots vary in purpose from indexing web pages for search engines to harvesting e-mail addresses for spammers.

BOTW

Best Of The Web (Directory)

Cache
CAPTCHA
Completely Automated Public Turing test to tell Computers and Humans Apart

Cloaking
Cloaking describes the technique of serving a different page to a search engine spider than what a human visitor sees. This technique is abused by spammers for keyword stuffing. Cloaking is a violation of the Terms Of Service of most search engines and could be grounds for banning.
Canonicalization

Click Fraud

Conversion
Conversion refers to site traffic that follows through on the goal of the site (such as buying a product on-line, filling out a contact form, registering for a newsletter, etc.). Webmasters measure conversion to judge the effectiveness (and ROI) of PPC and other advertising campaigns. Effective conversion tracking requires the use of some scripting/cookies to track visitors’ actions within a website. Log file analysis is not sufficient for this purpose.

Content
Context
Relevance of one element of a web page’s content to other elements of the web page’s content.

CPL

Cost Per Lead
CPM
Cost Per Thousand
CPS
Cost Per Sale
CPC
Cost Per Click (Google’s PPC Program)
CTA
Content Targeted Advertising - It refers to the placement of relevant PPC ads on content pages for non-search engine websites.
CTR
Click Through Rate
CVC | CVC2
Card Verification Code

Data Center

Dedicated Server

Deep Link
A hyperlink from another website that is pointing to one of your website pages, other than the home page.
Directory
A web directory lists web sites by category and subcategory.

DMOZ
Directory MOZilla (a directory)
DNS
Domain Name System

Domain Name

Doorway Page
A doorway page exists solely for the purpose of driving traffic to another page. They are usually designed and optimized to target one specific keyphrase. Doorway pages rarely are written for human visitors. They are written for search engines to achieve high rankings and hopefully drive traffic to the main site. Using doorway pages is a violation of the Terms Of Service of most search engines and could be grounds for banning.
Duplicate Content

Dynamic IP

EPC

Earnings Per Click

FFA
Free For All - FFA sites post large lists of unrelated links to anyone and everyone. FFA sites and the links they provide are basically useless. Humans do not use them and search engines minimize their importance in ranking formulas.

GAP
Google Advertising Professionals
GOOGLE
Google (a search engine)
HTTP
Hypertext Transfer Protocol
HTTPS
HyperText Transfer Protocol Secure
IBL
Inbound Link (see Back Link)
KDA
Keyword Density Analyzer
KEI
Keyword Effectiveness Index
Keyword/Keyphrase
Keywords are words or terms which are used in search engine queries. Keyphrases are phrases consisting of multiple words that are used in search engine queries.
Keyword Stuffing
Keyword stuffing refers to the practice of adding (many) keywords to a web page for the benefit of influencing how a search engine ‘perceives’ the page. This is not for the benefit of human visitors.

Link Building
The process of finding or acquiring backlinks to your website or sites.
Link Farm

A link farm is a group of separate, highly interlinked websites for the purposes of inflating link popularity.

Link Juice

Link Popularity

Long Tail

LS

LookSmart (a PPC Directory)
LSA
Latent Semantic Analysis
LSI
Latent Semantic Indexing

Matt Cutts
Google Employee in charge of preventing or limiting SPAM in the Google search engine results pages. His blog.

Monetize
Nofollow
Noindex
OBL
Outbound Link

Organic Results

Page Rank (or PageRank)
PageRank is a numerical weighting (0 to 10, 10 being the highest) based upon the Google link analysis algorithm.
PFI

Pay For Inclusion

PPC
Pay Per Click
PPR
Pay Per Rank
PPV
Pay Per Visitor
PR
Google PageRank™ is a numerical weighting (0 to 10, 10 being the highest) based upon the Google link analysis algorithm.
PR0
PageRank Zero - the lowest actual PageRank given by Google.

Proxy

PSA
Public Service Ad
PubCon
Reciprocal Link

A two way hyperlink between websites.
Robots.txt

Robots.txt is a file which well behaved spiders read to determine which parts of a website they may visit.
SEM
Search Engine Marketeer
Search Engine Marketer
Search Engine Marketing
SEMPO
Search Engine Marketing Professional Organization
SEO
Search Engine Optimization
Search Engine Optimizer
SEP
Search Engine Placement
Search Engine Positioning
Search Engine Promotion
SERPs
Search Engine Results Pages
SEs
Search Engines
SES
Search Engine Strategies (a conference)
SEU
Search Engine Usability
Sitemap
Shared Server
SMM
Social Media Marketing
SMO
Social Media Optimization

Snap shot

SPAM

Sites Positioned Above Me

Spamdexing
Spamdexing describes efforts to spam a search engine’s index.

Spider
Also called a bot (or robot). Spiders are software programs that can scan the web, normally following hyperlinks (links) throughout a website (via internal links) and to other websites (via external links). They have many different purposes, from indexing web pages for search engines to finding e-mail addresses for email spammers.
SSL
Secure Sockets Layer

Static IP
An IP address that does not change.
Subdomain

Text Link
TLD
Top Level Domain
TOS
Terms of Service

URI
Uniform Resource Identifier
URL
Uniform Resource Locator
W3C

World Wide Web Consortium
Webmaster Tools
Wikipedia
Wikipedia is a multilingual, web-based, free content encyclopedia project, operated by the Wikimedia Foundation, a non-profit organization. It is the largest, most extensive and fastest growing encyclopedia currently available on the Internet.

Y!

Yahoo! (a directory)
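
As a footnote to the Robots.txt entry above, here is a small Python sketch of how a well behaved spider might check the file before visiting a page. It uses the standard library’s `urllib.robotparser`. The robots.txt content, the bot name `ExampleBot` and the helper `may_visit` are hypothetical, chosen only for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: all spiders are asked to stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
# parse() takes the file's lines; a real spider would download them first.
parser.parse(ROBOTS_TXT.splitlines())


def may_visit(url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the rules above allow `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


print(may_visit("http://www.example.com/index.html"))
print(may_visit("http://www.example.com/private/page.html"))
```

Keep in mind that robots.txt is only a request. Badly behaved spiders, such as the e-mail harvesters mentioned under Spider, simply ignore it.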