
Technical SEO Strategies: Auditing, XML Sitemap, Migration, Log File, GET and POST SEO, 404 Error.

Technical SEO is the set of optimizations to a site's internal structure that affect organic search results. The goal is to make pages faster, easier to understand, crawlable and indexable. Even if you create amazing content with images, videos, links and so on, Google will not put your site at the top of the search results if the essential part is missing: technical SEO, the base of the pyramid. Although the focus of technical SEO is to show search engines how the site works, it also aims to deliver the best user experience, and the company benefits in turn through increased ROI (Return on Investment). In this way, the investment in technical SEO, as in other areas, comes back as profit that can be applied to other business strategies.


In the article “Introduction to Technical SEO” we gave examples of practical SEO techniques covering HTTPS, Schema, Core Web Vitals, mobile, browsers, hreflang, canonicalization, robots.txt and the XML sitemap. Now let’s understand in depth, step by step, how these and other technical SEO parameters work.




• Technical SEO Step 1: Audit

• Strategy Step 2: Get the Indexing right

• XML Sitemap

• Migration

• Step 3 of the Strategy: Getting the Crawling right

• Crawl Budget

• Log File

• GET and POST SEO

• Site Availability

• 404 Error

• Conclusion


Technical SEO Step 1: Audit

First, before putting technical SEO into practice, it is necessary to audit the site to see if it has any errors, problems, points for improvement, and/or ideas for new content (as a result of keyword research).

The steps of a proper audit are:

  • Verify that the site is being crawled, indexed and rendered correctly by Google (a quick check sketch follows this list);

  • Check for on-page SEO problems;

  • Examine off-page SEO (check the sites linking to yours for possible problems with the links that point back to your site);

  • Check if the site offers a good user experience on mobile and desktop;

  • Check the keywords;

  • Analyze the competition;

  • Check whether the site has duplicate or low-quality content;

  • Check the crawl budget, i.e. the speed and number of pages that a search engine intends to crawl on the site;

  • Check for manual actions (Google penalties on the site);

  • Create and update reports, to track the site’s performance.
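As a quick illustration of the first check in the list above, here is a minimal Python sketch (assuming the requests library is installed; the domain is a hypothetical placeholder) that fetches the robots.txt and looks for an obvious noindex signal on the homepage:


import requests

SITE = "https://www.example.com"  # hypothetical domain used only for illustration

# 1. Is crawling allowed? Fetch robots.txt and print it for manual review.
robots = requests.get(f"{SITE}/robots.txt", timeout=10)
print("robots.txt status:", robots.status_code)
print(robots.text)

# 2. Is the homepage indexable? Look for a noindex signal in the
#    X-Robots-Tag header or in a <meta name="robots"> tag.
home = requests.get(SITE, timeout=10)
print("homepage status:", home.status_code)
if "noindex" in home.headers.get("X-Robots-Tag", "").lower():
    print("Blocked by X-Robots-Tag header")
if 'name="robots"' in home.text.lower() and "noindex" in home.text.lower():
    print("Possible noindex meta tag - inspect the HTML")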

Once the audit is done, it’s time to put technical SEO into practice with the following steps:


Strategy Step 2: Get The Indexing Right

Put XML Sitemap On The Site

Before you can even create an XML sitemap, you must make your site visible to Google by registering your domain/business; this is called local SEO. In short, you have to submit your complete URL here: https://www.google.com.br/intl/pt-BR/add_url.html, and create a Google My Business profile (https://www.google.com/intl/pt-PT_pt/business/) plus a Google Maps listing for your company/website. Once these steps are done, you can submit the XML sitemap in Google Search Console so that search engines can crawl the site/domain effectively and quickly.


This is the basic structure of an XML sitemap, which is a separate .xml file usually placed at the root of the website (not inside the HTML of a page):


<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
  </url>
</urlset>


But sitemaps can also be generated automatically by browser extensions such as “Sitemap Generator” or by plugins; in WordPress you can use “Yoast SEO” or “Rank Math” and simply follow the steps.
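If you prefer to generate the file yourself, a minimal Python sketch (the page URLs below are hypothetical placeholders) can write a valid sitemap.xml using only the standard library:


import xml.etree.ElementTree as ET

# Hypothetical list of pages to include in the sitemap.
pages = [
    "https://www.example.com/",
    "https://www.example.com/blog/",
    "https://www.example.com/contact/",
]

# Build the <urlset> root and one <url><loc> entry per page.
urlset = ET.Element("urlset", xmlns="http://www.sitemaps.org/schemas/sitemap/0.9")
for page in pages:
    url = ET.SubElement(urlset, "url")
    ET.SubElement(url, "loc").text = page

# Write the file with the XML declaration shown earlier.
ET.ElementTree(urlset).write("sitemap.xml", encoding="UTF-8", xml_declaration=True)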





Another way to create a sitemap without having a CMS is with the free “Screaming Frog” software (also very useful for checking other technical SEO parameters). Go to Mode > Spider, paste your homepage URL into the box labelled “Enter URL to spider” and hit “Start”. When the crawl is done, a tab in the bottom right corner will show “Completed” plus a number; if that number is 499 or below, go to Screaming Frog’s “XML Sitemap” option and press “Next” to save the file on your computer. You can then export it to SEO software such as “Ahrefs” or submit it to Google.


Once you have created the sitemap, it has to be sent to Google. In Google Search Console there is a tab called “Sitemaps” where you add/submit the URL of your sitemap; Google will process it and report that it was a “Success”. You can submit several sitemaps for the same website in Google Search Console. In fact, if a site is very extensive (for example, an e-commerce site), there should be several sitemaps split by categories, posts, pages, etc., rather than just a single general sitemap index.


However, the pages in the XML sitemap may have indexing errors: server errors, redirect errors such as a redirect loop, URLs blocked by the robots.txt file, URLs blocked by a “noindex” tag, or non-existent URLs (404 errors). To identify these problems, check the Index Coverage reports in Google Search Console. Each URL should be analyzed to correct the error that is preventing it from being indexed.
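To spot these problems outside Search Console as well, a small script can fetch the sitemap and report the status of each URL. A minimal sketch, assuming the requests library and a hypothetical sitemap location:


import requests
import xml.etree.ElementTree as ET

SITEMAP_URL = "https://www.example.com/sitemap.xml"  # hypothetical location
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Parse the sitemap and extract every <loc> entry.
root = ET.fromstring(requests.get(SITEMAP_URL, timeout=10).content)
urls = [loc.text for loc in root.findall("sm:url/sm:loc", NS)]

for url in urls:
    r = requests.get(url, timeout=10, allow_redirects=False)
    if r.status_code >= 500:
        print(url, "-> server error", r.status_code)
    elif r.status_code in (301, 302, 307, 308):
        print(url, "-> redirects to", r.headers.get("Location"))
    elif r.status_code == 404:
        print(url, "-> not found (404)")
    elif "noindex" in r.headers.get("X-Robots-Tag", "").lower():
        print(url, "-> blocked by a noindex header")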


Migration

Migration is one of the most challenging tasks for any SEO and can be done in several ways, depending on the situation and the resourcefulness of each SEO. It is a process that requires a lot of planning, knowledge of the techniques and thorough analysis to minimize possible losses in organic search results, visits and revenue. A site move should never be rushed, so follow these steps:

  • Identify which pages will actually be migrated; a site change is an opportunity to leave behind low-quality or error-ridden pages. Create a list of all the old URLs you want to keep;

  • Export the pages from Google Search Console;

  • Create reports covering organic and referral traffic, 404 error pages, keywords, indexed pages and Google Analytics data;

  • Export the sitemap of the old URLs;

  • Create the sitemap of the new URLs;

  • Prepare the robots.txt;

  • Verify that all pages are available to be crawled;

  • Launch the new robots.txt file: check that there are no directives preventing crawling of your new site;

  • Check all redirections;

  • Check the canonical tags;

  • Set up 301 redirects from the old URLs to the new URLs, and also check for 404 errors or other URL-related problems (a verification sketch follows this list);

  • Disconnect the old sitemap and add the new sitemap: you will have to transfer both sitemaps (old and new) to the migration target site;

  • Notify the Google Search Console about the new Sitemap;

  • Open the site again for indexing by Google;

  • Monitor the page reports, indexing and Google Analytics to confirm everything is flawless;

  • Before completely deleting the old hosting account (the old server), make sure that everything is working well with the new server and its hosting.
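After the 301 redirects go live, it helps to confirm them programmatically. A minimal sketch, assuming the requests library; the old-to-new URL mapping is a hypothetical example:


import requests

# Hypothetical mapping of old URLs to their new destinations.
redirect_map = {
    "https://old.example.com/products/": "https://www.example.com/shop/",
    "https://old.example.com/about-us/": "https://www.example.com/about/",
}

for old_url, expected in redirect_map.items():
    r = requests.get(old_url, timeout=10, allow_redirects=True)
    # The first hop should be a permanent (301) redirect.
    first_hop = r.history[0].status_code if r.history else r.status_code
    if first_hop != 301:
        print(old_url, "-> expected a 301, got", first_hop)
    elif r.url.rstrip("/") != expected.rstrip("/"):
        print(old_url, "-> lands on", r.url, "instead of", expected)
    else:
        print(old_url, "-> OK")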


Step 3 Of The Strategy: Getting The Crawling Right

First, we need to understand how Googlebot crawling works:

  1. The crawler builds a list of all the URLs it finds in links within the pages of a given domain, as well as in the pages listed in sitemaps.

  2. After this reconnaissance, Google prioritizes crawling new URLs that have not been crawled before and URLs that need to be crawled again because something on them has changed.

  3. This is how the system that captures all the content of the pages is built.

  4. Next, processing systems deal with canonicalization. Canonicalization is the act of telling Google which version of a page is the right one to be ranked.

  5. The renderer loads a page like a browser would with JavaScript and CSS files. This is done so that Google can see what most users will see.

  6. Finally comes indexing: the pages that Google wants to show users are stored in its index.

Crawl Budget

The crawl budget is the speed and number of pages that the search engine intends to crawl on a site. More crawling does not mean a site will rank better, but if a site’s pages are not crawled and indexed, they will not rank at all. The crawl budget can be a concern for newer sites, especially those with many pages, because if a site is not yet very popular, the search engine may not want to crawl it much. It can also be a concern for larger sites with many pages, or for sites that are not updated often.


All URLs and requests count towards your crawl budget. This includes alternate URLs such as AMP pages and hreflang versions, CSS and JavaScript files, XHR requests, sitemaps and RSS feeds. By submitting URLs for indexing in Google Search Console, or by using the Indexing API, any site owner can notify Google directly when pages are added or removed.


An SEO should speed up the crawling of the site’s pages. To do this, first identify which pages have the problem: in Google Search Console, the Crawl Stats report shows the crawl status and the date and time when pages were last crawled, or you can use log analysis tools (log files) such as Splunk for more complex checks. Another thing you can do is speed up the server and increase its resources, since Google crawls pages by downloading resources and then processing them. Off-page SEO also helps here: the more internal and external links (URLs) a site has, the better. You can also fix broken links, set up redirects, and use the Indexing API.
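For the Indexing API route, here is a minimal sketch of the notification call. It assumes you have already created a service account and obtained an OAuth2 access token with the indexing scope; the token and the page URL are placeholders:


import requests

ACCESS_TOKEN = "ya29.placeholder-token"  # placeholder: obtain via a service account
ENDPOINT = "https://indexing.googleapis.com/v3/urlNotifications:publish"

# Tell Google that a page was added or updated.
payload = {"url": "https://www.example.com/new-page/", "type": "URL_UPDATED"}
r = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=10,
)
print(r.status_code, r.text)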


Log File

A log file is an output file kept on a web server that records every request the server receives. Log files help technical SEO professionals better understand how sites are crawled. They are also one of the only ways to see the actual behavior of Googlebot on a website, and they provide useful data and valuable information for optimization and data-driven decisions. Log files are important because they contain information that is not available anywhere else. Log records are collected and kept by the website’s web server for a certain period of time.


A Log File typically looks like this:

27.300.14.1 - - [14/Sep/2017:17:10:07 -0400] "GET https://allthedogs.com/dog1/ HTTP/1.1" 200 "https://allthedogs.com" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"


  • The IP address of the client;

  • A timestamp of the request;

  • The method of accessing the site, which can be either GET or POST;

  • The URL that is requested, which contains the accessed page;

  • The HTTP Status Code of the requested page, which shows the success or failure of the request;

  • The user agent, which contains extra information about the client making the request, including the browser or bot (for example, whether it comes from mobile or desktop); the parsing sketch after this list extracts these fields.
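The fields above can be pulled out with a short parsing sketch. The regular expression below is an assumption written for the line format shown earlier (real server configurations vary), and the log file name is a placeholder:


import re
from collections import Counter

# Rough pattern for the log line format shown above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

googlebot_hits = Counter()
with open("access.log") as log:  # placeholder log file name
    for line in log:
        match = LOG_PATTERN.search(line)
        if match and "Googlebot" in match.group("user_agent"):
            googlebot_hits[match.group("url")] += 1

# The most crawled URLs show where Googlebot spends its crawl budget.
for url, hits in googlebot_hits.most_common(10):
    print(hits, url)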

How you access the logs depends on the hosting solution. In most cases, to analyze the log files you will first need to request access from a developer, who can pull them from the web server (Apache, NGINX, IIS) or from the CDN. For example, you can use Logflare, a Cloudflare application that stores log files in a BigQuery database, or get them from Sucuri, Kinsta CDN, Netlify CDN and Amazon CloudFront.


Other tools for analyzing log files are “Splunk”, “Logz.io” and “Screaming Frog Log File Analyser”. To learn in more detail how to access log file data: https://ahrefs.com/blog/log-file-analysis/.


GET And POST SEO

GET and POST are requests sent to the server. A GET request is used to ask for resources such as HTML files, listings of registered products or form results; the submitted data travels in the URL, whose length is traditionally limited (older guidance often cites around 255 characters). These forms can be crawled by Google. POST, on the other hand, is a method for transferring data to the server in the body of the request, which allows you to send larger pieces of information to be processed, such as images or customer data. Unlike the GET method, Google cannot crawl POST forms in most cases.
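The difference is easy to see with a couple of test requests in Python (the endpoints below are hypothetical examples):


import requests

# GET: the parameters travel in the URL itself, so the resulting address
# is a normal link that a crawler could follow.
r = requests.get("https://www.example.com/search",
                 params={"q": "technical seo"}, timeout=10)
print(r.url)  # e.g. https://www.example.com/search?q=technical+seo

# POST: the data travels in the request body, not in the URL,
# which is why these submissions are generally not crawled.
r = requests.post("https://www.example.com/contact",
                  data={"name": "Ana", "message": "Hello"}, timeout=10)
print(r.status_code)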


Site Availability

It is very important to check the availability of a site. If users come across an error page where they can’t access the site, or can’t find it at all, they won’t come back looking for it. When this error happens and Google can’t access the site, the pages are not indexed, and if it happens regularly, Google understands that the site no longer exists and removes it from the search results. These problems are usually related to the site’s hosting services, so make sure you talk to the IT team and let them know about them as soon as possible.
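A very small monitoring sketch (the site and the check interval are hypothetical) can warn you before users and Googlebot start hitting errors:


import time
import requests

SITE = "https://www.example.com"  # hypothetical site to monitor
CHECK_EVERY = 300                 # seconds between checks

while True:
    try:
        r = requests.get(SITE, timeout=10)
        if r.status_code != 200:
            print(f"ALERT: {SITE} returned {r.status_code}")
    except requests.RequestException as exc:
        print(f"ALERT: {SITE} is unreachable ({exc})")
    time.sleep(CHECK_EVERY)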


404 Error

Another unavailability problem is the 404 error (page not found), where the user cannot see the content of the page. Google tends to devalue these pages, so you should be aware of them and correct the URL; a 301 redirect is usually applied to the broken link. You can also choose to create a custom error page with a pleasant, dynamic layout and links to other pages, so that the user does not leave the site but moves on to other pages instead; this kind of internal linking can be advantageous. You can find these errors through Google Search Console or Dead Link Checker.
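How the custom error page is wired up depends on the platform. As one illustration, a minimal Flask sketch (the routes and links are hypothetical) that keeps the 404 status code but shows a friendly page with internal links:


from flask import Flask

app = Flask(__name__)

@app.errorhandler(404)
def page_not_found(error):
    # Keep the 404 status code, but show a helpful page with internal links
    # so the visitor can continue to other pages instead of leaving.
    html = (
        "<h1>Page not found</h1>"
        "<p>The page you were looking for does not exist, but these might help:</p>"
        '<ul><li><a href="/">Home</a></li>'
        '<li><a href="/blog/">Blog</a></li>'
        '<li><a href="/contact/">Contact</a></li></ul>'
    )
    return html, 404

if __name__ == "__main__":
    app.run()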



 

Conclusion

In this Part 1 article on practical technical SEO we covered indexing techniques and some crawling topics, as well as the beginning of a website review, the audit. In the Technical SEO Part 2 article we will talk about the remaining crawling techniques, such as servers/CDNs, browsers, robots directives, redirects, HTTP status codes and canonicalization.


Today, many companies need immediate results, but the truth is that they cannot implement SEO in-house while keeping their focus on the priorities of their core business. If you can’t handle these steps or don’t have the time to put them in place, Bringlink SEO ensures you get the brand visibility and growth you deserve.


Talk to us, send email to bringlinkseo@gmail.com.


 
