Source: Wikipedia http://en.wikipedia.org/wiki/Search_engine_optimization
Search engine optimization (SEO) is a set of methods aimed at improving the ranking of a website in search engine
listings, and could be considered a subset of search engine marketing. The term SEO (Search Engine Optimizers) also
refers to an industry of consultants who carry out optimization projects on behalf of clients' sites. Some
commentators, and even some SEOs, break down methods used by practitioners
into categories such as "white hat SEO" (methods generally
approved by search engines, such as building content and improving site
quality), or "black hat SEO" (tricks such as cloaking
and spamdexing). White hatters charge that black hat methods
are an attempt to manipulate search rankings unfairly. Black hatters
counter that all SEO is an attempt to manipulate rankings, and
that the particular methods one uses to rank well are irrelevant.
Search engines display different kinds
of listings in the search engine results pages (SERPs), including: pay per click advertisements, paid inclusion
listings, and organic search
results. SEO is primarily concerned with advancing the goals of a website
by improving the number and position of its organic search
results for a wide variety of relevant keywords.
SEO strategies can increase both the number and quality of visitors,
where quality means visitors who complete the action hoped for by the
site owner (e.g. purchase, sign up, learn something). Search engine
optimization is sometimes offered as a stand-alone service, or as a
part of a larger marketing effort, and can often be very effective when
incorporated into the initial development and design of a site.
For competitive, high-volume search terms,
the cost of pay per click advertising can be substantial. Ranking well in the organic
search results can provide the same targeted traffic at potentially
significant savings. Site owners may choose to optimize their sites
for organic search if the cost of optimization is less than the cost
of advertising.
Not all sites have identical goals for
search optimization. Some sites are seeking any and all traffic, and
may be optimized to rank highly for common search phrases. A broad search
optimization strategy can work for a site that has broad interest, such
as a periodical, a directory,
or a site that displays advertising with a CPM
revenue model. In contrast, many businesses try to optimize their sites
for large numbers of highly specific keywords that indicate readiness
to buy. Overly broad search optimization can hinder marketing strategy
by generating a large volume of low-quality inquiries that cost money
to handle, yet result in little business. Focusing on desirable traffic
generates better quality sales leads,
resulting in more sales. Search engine optimization can be very effective
when used as part of a smart niche marketing
strategy.
History
Early search engines
Webmasters and content providers began
optimizing sites for search engines in the mid-1990s,
as the first search engines were cataloging the early Web.
Initially, all a webmaster
needed to do was submit a site to the various engines which would run spiders,
programs to "crawl" the site, and store the collected data.
By default, an engine would scan an entire webpage for so-called
related search words, so a page with many different words matched more
searches, and a webpage containing a dictionary-type listing would match
almost all searches, limited only by unique names. The search engines
then sorted the information by topic, and served results based on pages
they had spidered. As the number of documents online kept growing, and
more webmasters realized the value of organic search listings, some
popular search engines began to sort their listings so they could display
the most relevant pages first. This was the start of friction between
search engines and webmasters that continues to this day.
At first, search engines were guided by
the webmasters themselves. Early versions of search algorithms
relied on webmaster-provided information such as category and keyword meta tags,
or index files in engines like ALIWEB.
Meta tags provided a guide to each page's content. When some webmasters
began to abuse meta tags, causing their pages to rank for irrelevant
searches, search engines abandoned their consideration of meta tags
and instead developed more complex ranking algorithms,
taking into account a more diverse set of factors that favored pages
focused on a limited number of words (countering dictionary-type pages), including:
- Text within the title tag
- Domain name
- URL directories and file names
- HTML tags: headings, bold and emphasized text
- Term frequency, both in the document and globally (often misunderstood and mistakenly referred to as keyword density)
- Keyword proximity
- Keyword adjacency
- Keyword sequence
- Alt attributes for images
- Text within NOFRAMES tags
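As a rough illustration of the term-frequency factor in the list above, the sketch below measures how often a term occurs in one document relative to that document's length, and how many documents in a small collection contain the term at all. The function names and sample pages are assumptions made up for this example; no search engine's actual formula is implied.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_frequency(term, text):
    """Share of the document's tokens that equal the term (0.0 if empty)."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return Counter(tokens)[term.lower()] / len(tokens)


def document_frequency(term, documents):
    """Number of documents in the collection that contain the term."""
    term = term.lower()
    return sum(1 for doc in documents if term in set(tokenize(doc)))


# Hypothetical sample pages, invented for this example.
pages = [
    "cheap flights and cheap hotels",
    "hotel reviews from travellers",
    "flights to paris",
]
print(term_frequency("cheap", pages[0]))
print(document_frequency("flights", pages))
```

A term that is frequent in one page but rare across the collection is a stronger signal than one that appears everywhere, which is why both counts are listed as a single factor.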
Pringle et al. (1998) [1]
also defined a number of attributes within the HTML source of a page
which were often manipulated by web content providers attempting to
rank well in search engines. But by relying so extensively on factors
that were still within the webmasters' exclusive control, search engines
continued to suffer from abuse and ranking manipulation. In order to
provide better results to their users, search engines had to adapt to
ensure their SERPs showed the most relevant search results, rather than
useless pages stuffed with numerous keywords by unscrupulous webmasters
using a bait-and-switch lure to display unrelated webpages. This led
to the rise of a new kind of search engine.
Organic search engines
Google was started by two PhD students at Stanford University, Sergey Brin
and Larry
Page, and brought a new concept
to evaluating web pages. This concept, called PageRank,
has been important to the Google algorithm from the start [2].
PageRank relies heavily on incoming links
and uses the logic that each link to a page is a vote for that page's
value. The more incoming links a page has, the more "worthy"
it is. The value of each incoming link itself varies directly based
on the PageRank of the page it comes from and inversely on the number
of outgoing links on that page.
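The voting logic described above can be sketched as a simple iterative calculation. This is a simplified illustration, not Google's implementation: the damping factor of 0.85, the fixed iteration count, the handling of pages without outgoing links, and the four-page graph are all assumptions introduced for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank over a dict mapping page -> list of pages it links to."""
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Each outgoing link passes on an equal share of the page's rank,
                # so a vote is worth more from a high-rank page with few links.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Assumption: a page with no outgoing links spreads its rank evenly.
                for other in pages:
                    new_rank[other] += damping * rank[page] / n
        rank = new_rank
    return rank


# Hypothetical four-page web: A, B and C all link to D; D links back to A.
graph = {"A": ["D"], "B": ["D"], "C": ["D"], "D": ["A"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

In this toy graph, D collects the most votes, and A outranks B and C because its single incoming link comes from the highly ranked page D.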
With help from PageRank, Google proved
to be very good at serving relevant results. Google became the most
popular and successful search engine. Because PageRank measured an off-site
factor, Google felt it would be more difficult to manipulate than on-page
factors.
However, webmasters had already developed
link-manipulation tools and schemes to influence the Inktomi search
engine. These methods proved to be equally applicable to Google's algorithm.
Many sites focused on exchanging, buying, and selling links on a massive
scale. PageRank's reliance on the link as a vote of confidence in a
page's value was undermined as many webmasters sought to garner links
purely to influence Google into sending them more traffic, irrespective
of whether the link was useful to human site visitors.
Further complicating the situation, the
default behavior was still to scan an entire webpage for
so-called related search-words, and a webpage containing a dictionary-type
listing would still match almost all searches (except special names)
at an even higher priority given by link-rank. Dictionary pages and
link schemes could severely skew search results.
It was time for Google -- and other search
engines -- to look at a wider range of off-site factors. There were
other reasons to develop more intelligent algorithms. The Internet was
reaching a vast population of non-technical users who were often unable
to use advanced querying techniques to reach the information they were
seeking, and the sheer volume and complexity of the indexed data was
vastly different from that of the early days. Search engines had to
develop predictive, semantic, linguistic
and heuristic algorithms. Around the same time as the work
that led to Google, IBM
had begun work on the Clever Project [3],
and Jon Kleinberg
was developing the HITS algorithm.
A proxy for the PageRank metric is still
displayed in the Google Toolbar,
but PageRank is only one of more than 100 factors that Google considers
in ranking pages.
Today, most search engines keep their
methods and ranking algorithms secret, both to compete at finding the most
valuable search results and to deter spam pages from clogging those
results. A search engine may use hundreds of factors in ranking the
listings on its SERPs; the factors themselves and the weight each carries
may change continually. Algorithms can differ widely: a webpage that
ranks #1 in a particular search engine could rank #200 in another search
engine.
Much current SEO thinking on what works
and what doesn't is speculation and informed guesswork. Some SEOs
have carried out controlled experiments to gauge the effects of different
approaches to search optimization.
The following factors are speculation
on some of the considerations search engines may presently be using
or which could be built into their algorithms. A number of these are
taken from one of Google's patent applications [4],
and may give some indication as to what is in the pipeline. Some are
pure speculation. It's also good to keep in mind that Google has over
180 patents and patent applications assigned to it at the US Patent and
Trademark Office (USPTO), and a number of those include possible
insights into other factors, and other directions that the search engine
may follow, some of which may not be consistent with this list.
- Age of site
- Length of time domain has
been registered
- Age of content
- Frequency of content: regularity
with which new content is added
- Text size: number of words
above 200-250 (not affecting Google in 2005)
- Age of link and reputation
of linking site
- Standard on-site factors
- Negative scoring for on-site
factors (for example, a dampening for websites with extensive keyword
meta-tags indicative of having been optimized, or "SEO-ed")
- Uniqueness of content
- Related terms used in content
(the terms the search engine associates as being related to the main
content of the page)
- Google PageRank (only used
in Google's algorithm)
- External links, the anchor
text in those external links and in the sites/pages containing those
links
- Citations and research sources
(indicating the content is of research quality)
- Stem-related terms in the
search engine's database (finance/financing)
- Incoming backlinks and anchor
text of incoming backlinks
- Negative scoring for some
incoming backlinks (perhaps those coming from low value pages, reciprocated
backlinks, etc.)
- Rate of acquisition of backlinks:
too many too fast could indicate "unnatural" link buying activity
- Text surrounding outward links
and incoming backlinks. A link following the words "Sponsored Links"
could be ignored
- Use of "rel=nofollow"
to suggest that the search engine should ignore the link
- Depth of document in site
- Metrics collected from other
sources, such as monitoring how frequently users hit the back button
when SERPs send them to a particular page
- Metrics collected from sources
like the Google Toolbar, Google AdWords/AdSense
programs, etc.
- Metrics collected in data-sharing
arrangements with third parties (like providers of statistical programs
used to monitor site traffic)
- Rate of removal of incoming
links to the site
- Use of sub-domains, use of
keywords in sub-domains and volume of content on sub-domains… and
negative scoring for such activity
- Semantic connections of hosted
documents
- Rate of document addition
or change
- IP of hosting service and
the number/quality of other sites hosted on that IP
- Other affiliations of linking
site with the linked site (do they share an IP? have a common postal
address on the "contact us" page?)
- Technical matters like use
of 301 to redirect moved pages, showing a 404 server header rather than
a 200 server header for pages that don't exist, proper use of robots.txt
- Hosting uptime
- Whether the site serves different
content to different categories of users (cloaking)
- Broken outgoing links not
rectified promptly
- Unsafe or illegal content
- Quality of HTML coding, presence
of coding errors
- Actual click through rates
observed by the search engines for listings displayed on their SERPs
- Hand ranking by humans of
the most frequently accessed SERPs
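The "technical matters" item in the list above (a 301 for moved pages, a 404 rather than a 200 for pages that don't exist) can be illustrated with a small sketch of how a site might choose its response status. The page set, the redirect table, and the respond function are hypothetical, invented for this example.

```python
from http import HTTPStatus

# Hypothetical site data, invented for this example.
PAGES = {"/index.html", "/products.html"}
MOVED = {"/old-products.html": "/products.html"}


def respond(path):
    """Return (status, redirect target or None) for a requested path."""
    if path in MOVED:
        # A permanent redirect tells crawlers the page has a new address.
        return HTTPStatus.MOVED_PERMANENTLY, MOVED[path]
    if path in PAGES:
        return HTTPStatus.OK, None
    # Missing pages report 404 instead of a 200 "soft" error page.
    return HTTPStatus.NOT_FOUND, None


for path in ("/old-products.html", "/products.html", "/missing.html"):
    status, target = respond(path)
    print(path, int(status), target)
```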
The relationship between SEO and the search engines
The first mentions of Search Engine Optimization
don't appear on Usenet until 1997, a few years after the launch of the
first Internet search engines. The operators of search engines recognized
quickly that some people from the webmaster community were making efforts
to rank well in their search engines, and even manipulating the page
rankings in search results. In some early search engines, such as Infoseek,
ranking first was as easy as grabbing the source code of the top-ranked
page, placing it on your website, and submitting a URL to instantly
index and rank that page.
Due to the high value and targeting of
search results, there is potential for an adversarial relationship between
search engines and SEOs. In 2005, an annual conference named AIRWeb
was created to discuss bridging the gap and minimizing the sometimes
damaging effects of aggressive web content providers.
Some more aggressive site owners and
SEOs generate automated sites or employ techniques which eventually
get domains banned from the search engines. Many search engine optimization
companies, which sell services, employ long-term, low-risk strategies,
and most SEO firms that do employ high-risk strategies do so on their
own affiliate, lead-generation, or content sites, instead of risking
client websites.
Some SEO companies employ aggressive
techniques that get their client websites banned from the search results.
The Wall Street Journal
profiled a company which allegedly used high risk techniques and failed
to disclose those risks to its clients.[5] Wired
reported the same company sued a blogger for mentioning that they were
banned.[6]
Google's Matt Cutts later confirmed that
Google did in fact ban Traffic Power and some of its clients.[7]
Google has enforced webpage restrictions
for years, for example against hidden text (background and foreground colors
of the same hue). In 2006, Google could automatically punish a non-standard
website by blocking its search results from the next day for 30-35 days
(or longer), pending a reinclusion request. If the site was reinstated, Google
could revert its index to old, expired, or deleted webpages from a year earlier,
delaying the re-indexing of the current website for a total of 2-4 months.
Yahoo and MSN Search do not automatically
punish entire websites for small amounts of accidental hidden text.
Not surprisingly, Google's market share of daily searches has fallen
rapidly from 75% to 56% over the past few years, as other search engines
find many valuable webpages that Google has banned and cannot display
due to Google's severely limited index. In early 2006, MSN Search typically
re-indexed small websites every 14 days, and Yahoo also re-indexed quickly,
much faster than Google, but all three (MSN, Yahoo, and Google) could require
more than a month to index a new page (new file name) on an old website.
Some search engines have also reached
out to the SEO industry, and are frequent sponsors and guests at SEO
conferences and seminars. In fact, with the advent of paid inclusion,
some search engines now have a vested interest in the health of the
optimization community. All of the main search engines provide information and guidelines
to help with site optimization, including Google, Yahoo, MSN
and Ask.com.
Google has a Sitemaps program
to help webmasters learn if Google is having any problems indexing their
website, and it also provides data on Google traffic to the website.
Yahoo! has SiteExplorer,
which provides a way to submit URLs for free (like MSN/Google),
determine how many pages are in the Yahoo index and drill down on inlinks
to deep pages. Yahoo! has an Ambassador Program
and Google has a program for qualifying Google Advertising Professionals.
Getting into search engines' listings
New sites do not need to be "submitted"
to search engines to be listed. A simple link from an established site
will get the search engines to visit the new site and begin to spider
its contents. It can take a few days or even weeks from the acquisition
of a link from such an established site for all the main search engine
spiders to commence visiting and indexing the new site.
Once the search engine has found the
new site, it will generally visit and start to index the pages on the
site, as long as all the pages are linked to with anchor tag hyperlinks.
Pages which are accessible only through Flash
or JavaScript links may not be findable by the spiders.
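A minimal sketch of why this matters, assuming a spider that discovers pages only by reading href attributes on anchor tags: a link that exists only inside script code never appears as an anchor, so it is not collected. The AnchorCollector class, the sample markup, and the example.com URLs are assumptions for illustration, built on Python's standard html.parser module.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class AnchorCollector(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            self.links.append(urljoin(self.base_url, href))


# Hypothetical page: one ordinary anchor link and one script-only "link".
sample_html = """
<a href="/products.html">Products</a>
<span onclick="window.location='/hidden.html'">Hidden</span>
"""

collector = AnchorCollector("http://www.example.com/")
collector.feed(sample_html)
print(collector.links)
```

Only the products page is collected; the page reachable through the onclick handler stays invisible to this kind of spider.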
Search engine crawlers may look at a
number of different factors when crawling
a site, and many pages from a site may not be indexed by the search
engines until they gain more PageRank or links or traffic. Distance
of pages from the root directory of a site may also be a factor in whether
or not pages get crawled, as well as other importance metrics. Cho et al. (1998) [8]
described some standards for deciding which pages are visited
and sent by a crawler to be included in a search engine's index.
Webmasters can instruct spiders to not
index certain files or directories through the standard robots.txt
file in the root directory of the domain. Standard practice requires
a search engine to check this file upon visiting the domain, though
a search engine crawler will keep a cached copy of this file as it visits
the pages of a site, and may not update that copy as quickly as a webmaster
does. The web developer can use this feature to prevent pages such as
shopping carts or other dynamic, user-specific content from appearing
in search engine results, as well as keeping spiders from endless loops
and other spider traps.
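As a brief illustration, the sketch below shows how a well-behaved crawler might consult robots.txt rules before fetching a page, using Python's standard urllib.robotparser module. The rules, the "ExampleBot" user agent, and the example.com URLs are hypothetical; here the rules are parsed from a string rather than downloaded from a live site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for an example shop site.
robots_txt = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# The shopping cart is excluded; ordinary pages remain crawlable.
cart_allowed = parser.can_fetch("ExampleBot", "http://www.example.com/cart/view")
page_allowed = parser.can_fetch("ExampleBot", "http://www.example.com/products.html")
print(cart_allowed, page_allowed)
```

Because crawlers may cache the file, a change to these rules can take some time to be honored.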
For those search engines that have their
own paid submission (like Yahoo), it may save some time to pay a nominal
fee for submission, though Yahoo's paid submission program does not
guarantee inclusion in their search results.