Getting Your Site Ready for Google
If you want people and Google to notice your site, you’ve got to make it presentable. That means paying attention to visual, interactive, and technical details. While Google doesn’t give a hoot about color scheme or whether your humor site is actually funny, it does care about many of the same things your visitors do. If your site is difficult for Google to roam through and read, the search engine is unlikely to index it properly and show it in search results as you’d like.
Tip: As a Webmaster, it behooves you to keep up with the latest trends in search engines and how best to prepare your site for maximum indexing, impact, and, ultimately, visitors. (Geeks and other techno-wonks call this process search engine optimization, or SEO.) Two fabulous resources for all things Webmaster- and search-engine-related are Search Engine Watch (www.searchenginewatch.com) and Webmaster World (www.webmasterworld.com).
The Fundamental Steps
Here’s how you can win friends and influence Google.
Don’t hide indoors
Google tracks only what is actually on the Web and readily accessible. Just because your site is on a server (a networked computer that holds the files that make up a Web site) doesn’t mean other people can see it. If you trap your site behind a corporate firewall, at the end of a DSL or cable-modem link that doesn’t allow traffic to your home server, or make it unreachable in any other way to the general public, Google will never find it. It may sound obvious, but many a fledgling Webmaster has missed this point.
If you set up your site at work and you can’t reach it from outside your corporate network, chances are your company doesn’t like its employees running Web sites from its computers and has set up its system to prevent it. Check with your IT department about the company’s policy and where best to put your site on its system.
If you’ve set up your site at home, make sure your Internet service provider lets you run a server over their network. Many don’t, so it’s important to ask. But they may well provide some space for your site on their servers. In fact, many individuals’ sites actually live on their ISP’s servers.
Google and your visitors are likely to be put off by complex URLs that are hard to decode and differentiate from one another. Humans like easily readable, memorable addresses. But Google has its own logic for avoiding complicated URLs. The problem, as Google sees it, is that complex addresses often point to dynamic pages: those that your site has created temporarily, in response to a query. And a dynamic page suggests to Google that your site may have a large database underlying it, one for which it would take Google’s spiders eons to discern all the possible ways people could view the data.
Tip: You can spot dynamic pages easily: they include a “?” in the URL.
For example, say your site sells hosiery, and it’s connected to your huge database containing descriptions and prices for thousands of pairs of socks and leggings. When somebody searches your Web site for blue children’s stockings, your site might generate a page just for that person, showing the eight items that match the query. If Google catches a whiff of this setup, it flees in terror, assuming that to properly track your site, it would need to index thousands or millions of pages, many of which might show the same things in a different order.
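If you'd rather not eyeball every address for a “?”, a few lines of code can do the spotting for you. The following is only a minimal sketch in Python: the function names looks_dynamic and query_parameters and the www.example.com addresses are made up for illustration, and the check says nothing about how Google itself decides what counts as dynamic.

```python
from urllib.parse import urlsplit, parse_qs


def looks_dynamic(url):
    """Return True if the URL has a query string (the part after the "?")."""
    return bool(urlsplit(url).query)


def query_parameters(url):
    """Return the query-string variables as a dict of name -> list of values."""
    return parse_qs(urlsplit(url).query)


# Hypothetical addresses for the hosiery site described above.
urls = [
    "http://www.example.com/socks/blue.html",
    "http://www.example.com/search.cgi?color=blue&size=child",
]

for url in urls:
    kind = "dynamic" if looks_dynamic(url) else "static"
    print(kind, url, query_parameters(url))
```

The more variables the second function turns up for a given address, the more that address resembles the kind of URL the text above warns about.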
The most well-known system for creating dynamic pages is called CGI, which stands for Common Gateway Interface and is the “Look at me, I’m building Web pages on the fly” of file types. CGI scripts are bits of programming code you can set to build Web pages on request out of databases and other bits and bobs. If you’re using CGI scripts, Google may not properly index your site, which is what’s happening if Google seems to know about all of your site except for the parts served up by a CGI script. If the situation is dire enough (that is, Google is ignoring you altogether), you might want to consider reconfiguring your system.
Warning: Other dynamic pages that Google may consider too hot to handle include those built with templating systems like PHP, JSP, and ASP, and those named after programming languages like Perl (.pl) and Python (.py), to name a few. Because it knows what it’s getting, Google is more comfortable with sites consisting of pages that you always have up (known as static pages). URLs with endings like .html and .htm indicate stability and are thus Google-friendly.
Alternatively, you can try massaging your Web site application or content management system so that it produces clear and simple URLs rather than a litany of session variables strung together like so many Christmas lights.
Provide a clear path into your site
Google, like everyone else, hates wasting time on superfluous pages. The most serious offender is the splash page, that annoying intro page you sometimes have to view or click through before you get to a site’s real home page. Splash pages typically feature Flash animations that can suck important minutes out of your day, but offer nary a real link to anything. Google and many visitors take a dim view of splash pages. Do everyone a favor and skip them. You can show off your graphic sensibility and your Flash skills on real pages in your site.
Identify yourself clearly and concisely
Your friendly, homey design touches and inviting color scheme draw people in by the ton. But they don’t mean a thing to poor color-blind and design-sense-deprived Google. All the Google robots and spiders have to go by are the metadata: the details you embed in your Web pages’ HTML code, like the title (<title>Hosiery R Us</title>, for example).
Google may interpret your metadata hierarchically when it’s trying to decide how relevant your page is to a particular search. For example, if you have a first-order heading like <h1>Socks</h1> followed by a word set off in italics, like <i>plaid</i>, Google might consider the word in the heading more important than the word in italics. Which is just what you want, if somebody is looking for socks. But if your site is primarily about plaid items, and you want people searching for plaid to find you near the top of their results, you probably ought to make the word “plaid” a heading and not just an italicized comment.
Here are more tricks to help Google read between the lines, and therefore index your site appropriately:
Title your pages properly. Nothing says “half-baked” quite like a site where all the pages have the same title, an utterly meaningless title, or no title at all. Title or subtitle different sections of your site appropriately. Take the 11 seconds to add <title>Al’s Auto Parts: Support</title> to your HTML code.
Tip: Watch out for the dreaded “New Page” title that some Web page software automatically slaps onto any new HTML page. Microsoft FrontPage, for instance, automatically dubs your new pages serially as New Section 3.1, New Section 3.2, and so on, which is definitely not what you want to see in your Google results.
Provide meta tags. Meta tags are bits of detail about your site that you can embed in HTML tags. The cool thing is, they’re invisible to your visitors but useful to Web robots. Google doesn’t say how much attention it pays to meta tags, but it can’t hurt to add them, and they might just be useful to another search engine, too. Useful options include a description (as in, <meta name="description" content="A blog about computers, politics, and the punk rock underworld." />), some keywords (for example, <meta name="keywords" content="fenders, hoses, wiper blades" />), and perhaps even who’s put it together (<meta name="author" content="Jonny Slick" />).
Tip: There is some evidence that excessive keywords are off-putting to search engines, especially if they’re repetitive and obvious, like “free, free, free, sale, sale, sale, Viagra, Viagra, Viagra…” When it comes to keywords, focus and frugality are your friends.
Augment your pictures with alt tags. Ever wonder how Google Images knows what’s a photo of your Aunt Sarah on her 101st birthday and what’s a snap of your summer holiday in Spain? Mostly, Google takes hints from nearby text. But what if your nearby text mentions Aunt Sarah and ugly Uncle Phil? You can help ensure that Google understands and properly indexes your pictures by giving them descriptive titles in alt or “alternative” information tags.
Tagging pictures is particularly important because most image-editing software automatically names pictures things like camera_1.jpg or set55_02.tif, which never helps anyone, Google or human, figure out that you’re really offering a lovely photo of Monarch butterflies migrating or a diagram of the food chain. And if you’ve renamed those pictures butterfly.jpg and diagram.tif, you haven’t helped much, either. But when you associate an alt tag with a picture, you can give explicit details, like this: <img src="butterfly.jpg" alt="Monarch butterflies migrating over Kansas" />. The alt tag then appears as your picture loads on your Web page and with your picture in Google Images, helping everyone find your meticulous migration study.
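The three tricks above (a real title, a description meta tag, and alt text on every picture) are easy to check by machine before you publish. Here is one possible sketch using Python's built-in html.parser module. The PageAudit class, the sample page, and its description text are invented for this illustration, and the checks are only a rough guess at what's worth flagging, not a statement of what Google requires.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collect a page's title, named meta tags, and images lacking alt text."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.images_missing_alt = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name"):
            self.meta[attrs["name"].lower()] = attrs.get("content") or ""
        elif tag == "img" and not (attrs.get("alt") or "").strip():
            # Record the file name so the culprit is easy to find.
            self.images_missing_alt.append(attrs.get("src") or "")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


# A made-up page, loosely based on the examples in this section.
sample = """<html><head>
<title>Al's Auto Parts: Support</title>
<meta name="description" content="Help with fenders, hoses, and wiper blades." />
</head><body>
<img src="butterfly.jpg" alt="Monarch butterflies migrating over Kansas" />
<img src="camera_1.jpg" />
</body></html>"""

audit = PageAudit()
audit.feed(sample)
audit.close()

print("Title:", audit.title.strip() or "(none)")
print("Description:", audit.meta.get("description", "(none)"))
print("Images missing alt text:", audit.images_missing_alt)
```

Note that this sketch also flags images whose alt text is deliberately empty, which is a simplification you may want to relax for purely decorative pictures.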
Don’t fence yourself in with an overabundance of frames
Frames (pieces of Web pages you can designate to appear independently, like a scrolling column that moves while the navigation bar stays put) are confusing to robots and people alike. Use them sparingly, if you must use them at all, and label them clearly (for example, <frame src="menu.html" name="menubar"> and <frame src="home.html" name="content">). You should also provide a <noframes> option just in case the spider or, indeed, your visitor’s browser doesn’t know what to do with frames.
Tip: Danny Sullivan’s excellent article, “Search Engines and Frames” (www.searchenginewatch.com/Webmasters/article.php/2167901), provides much-needed advice on keeping your frames search-engine friendly.
Find and fix broken links
Google’s spiders, like nearly all Web site visitors, have zero interest in guessing where this or that link should have taken them. In fact, spiders have nothing but links to go on to find the rest of your site; don’t stop them short with a broken link. Before you publish a new article or add any new links within your site, preview the content in your browser and make sure that any links you’ve embedded do in fact point where they’re supposed to.
While you’re at it, do your pals downstream a favor and make sure your outbound links (your links to other Web sites) are still valid. Remember: The next downstream site the Google spider doesn’t find could be your own.
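Previewing every link by hand gets old quickly, so here is a rough sketch of how the in-site half of the job could be automated in Python. It assumes you already have your pages' HTML in hand, keyed by address. The names LinkCollector and find_broken_internal_links, the www.example.com pages, and the misspelled link are all hypothetical. Outbound links are deliberately skipped, because checking them means actually requesting each remote page.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Gather the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def find_broken_internal_links(pages):
    """Return (page, href) pairs whose target is on the same host but unknown.

    `pages` maps each page's full URL to its HTML source.
    """
    broken = []
    for page_url, html in pages.items():
        collector = LinkCollector()
        collector.feed(html)
        collector.close()
        for href in collector.hrefs:
            # Resolve relative links and drop any "#fragment" part.
            target = urldefrag(urljoin(page_url, href)).url
            same_site = urlsplit(target).netloc == urlsplit(page_url).netloc
            if same_site and target not in pages:
                broken.append((page_url, href))
    return broken


# A made-up two-page site with one misspelled link.
site = {
    "http://www.example.com/index.html":
        '<a href="socks.html">Socks</a> '
        '<a href="legings.html">Leggings</a> '
        '<a href="http://www.searchenginewatch.com/">Search Engine Watch</a>',
    "http://www.example.com/socks.html":
        '<a href="index.html#top">Home</a>',
}

for page, href in find_broken_internal_links(site):
    print("Broken link on", page, "->", href)
```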
Let Google in
Make sure you aren’t fencing Google out with robots.txt files (notes telling robots that they can’t look at all or part of your site) or meta rules (notes telling robots that they can’t perform certain behaviors on a particular page, like indexing it, caching it, or following links to other pages).
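To see whether a robots.txt file is fencing Google out, you can run its rules through Python's standard urllib.robotparser module. What follows is a sketch under a couple of assumptions: Googlebot is the name Google's crawler goes by, while the allowed helper, the sample rules, and the www.example.com addresses are invented for the example. It covers only robots.txt, not the per-page meta rules.

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt that keeps every robot out of /private/
# and, perhaps by accident, keeps Googlebot out of the whole site.
robots_txt = """\
User-agent: *
Disallow: /private/

User-agent: Googlebot
Disallow: /
"""


def allowed(rules, user_agent, url):
    """Return True if `rules` (robots.txt text) lets `user_agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(user_agent, url)


home = "http://www.example.com/index.html"
for agent in ("Googlebot", "SomeOtherBot"):
    verdict = "may" if allowed(robots_txt, agent, home) else "may NOT"
    print(agent, verdict, "fetch", home)
```

If the check reports that Googlebot may not fetch a page you want indexed, the Disallow line under its User-agent entry is the place to look.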
Tip: For even more loving detail on making your Web site inviting to Google, be sure to take a stroll through Brett Tabke’s Search Engine Optimization Template (www.clickmojo.com/more/122_0_1_0_M/). Brett is the proprietor of WebmasterWorld.com and knows an awful lot about search engine optimization.
Google: The Missing Manual, 2nd Edition
By J.D. Biersdorfer, Rael Dornfest, Matthew MacDonald, Sarah Milstein
Pub Date: March 2006
Print ISBN-10: 0-596-10019-1
Print ISBN-13: 978-0-59-610019-3