Organic search involves three main processes: crawling, indexing, and ranking. Crawling is the process of a search engine arriving at your website and following its links. The search engine then records those pages in its index; this is called indexing. After that, the search engine uses various signals to decide where your website's URLs should be placed for specific search queries; this is called ranking.
Most search engine optimizers give their full attention to the ranking part alone, ignoring the fact that they can't see their websites in search engine result pages if the search engine hasn't crawled and indexed those websites. With that said, it is important to understand that Indianapolis SEO can be successful only if the search engine has crawled the website and indexed it properly.
How can you determine whether your website has been indexed?
Google Search Console will tell you how many pages your XML sitemap contains and how many of those pages are indexed. However, this feature isn't designed for a thorough analysis of which pages Google hasn't indexed yet.
That means you would have to dig a bit deeper using manual techniques that take a lot of time. Fortunately, there is a small tool that can help you with this task: Python.
Let's first discuss how to check whether a single URL is indexed by Google. You can do this with the "info:" search operator, like this:
info:http://domain-name.com
If the URL is indexed, the results page will show a single search result: your website's link. If the URL isn't indexed, Google will return no result for it.
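If you want to try this check from Python rather than a browser, here is a minimal sketch using the standard library plus BeautifulSoup. The browser-like User-Agent and the "#search" container it looks for are assumptions about Google's current behavior and markup; Google throttles automated queries, so treat this as illustrative only:

import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

def is_indexed(url):
    # Ask Google for "info:<url>" and look for at least one result link.
    query = urllib.parse.quote_plus("info:" + url)
    request = urllib.request.Request(
        "https://www.google.com/search?q=" + query,
        headers={"User-Agent": "Mozilla/5.0"},  # plain urllib is often rejected
    )
    html = urllib.request.urlopen(request).read()
    soup = BeautifulSoup(html, "html.parser")
    # Assumed markup: an indexed URL yields at least one link in "#search".
    return bool(soup.select("#search a"))

print(is_indexed("http://domain-name.com"))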
Using Python to check indexing status of multiple URLs
For a single link, the trick mentioned above is certainly enough. But what can be done if there are more than 1,000 pages to check? If you had 100 people working for you on their PCs, you could give 10 URLs to each person and get the report within a few minutes. Or, you can use Python.
To start using this tool, first install Python 3. Then install a library called BeautifulSoup by running a simple command in Command Prompt:
pip install beautifulsoup4
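To confirm the installation worked, you can import the library (it installs under the package name bs4) and print its version:
python -c "import bs4; print(bs4.__version__)"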
Now you are ready to download the script. In the folder where the script is stored, create a simple text file and start adding links to it, one URL per line.
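For example, the file (called urls.txt here purely for illustration) might look like this:
http://domain-name.com/
http://domain-name.com/about/
http://domain-name.com/contact/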
Next, go to the Polipo folder and create a text file named config.txt. Enter the following lines in that file:
socksParentProxy = "localhost:9050"
socksProxyType = socks5
diskCacheRoot = ""
disableLocalInterface=true
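These settings assume the Tor service is running locally: socksParentProxy points Polipo at Tor's default SOCKS port (9050), so the script's requests reach Google through Tor rather than directly from your own IP. An empty diskCacheRoot disables Polipo's disk cache, and disableLocalInterface turns off its local web interface.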
Then open Command Prompt, navigate to the Polipo directory, and run the following command:
polipo.exe -c config.txt
With Polipo running, you can now run the Python script:
python indexchecker.py
When you run the script, you will be asked to enter the number of seconds you want as a delay between checking two URLs. The end result will be a CSV file in which every link is listed along with its status: 'TRUE' against a URL means the URL is indexed, and 'FALSE' means it isn't.
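The script itself isn't reproduced in this article, but here is a minimal sketch of how such a checker might work. It reuses the single-URL check from earlier, routes requests through Polipo on its default local HTTP port (8123), reads from urls.txt, and writes to results.csv; those file names, the port, and the parsing of Google's results page are all assumptions for illustration, not details of the original script.

import csv
import time
import urllib.parse
import urllib.request
from bs4 import BeautifulSoup

# Send every request through Polipo, which forwards to Tor (see config.txt).
opener = urllib.request.build_opener(urllib.request.ProxyHandler({
    "http": "http://localhost:8123",
    "https": "http://localhost:8123",
}))

def is_indexed(url):
    # Same "info:" check as the single-URL sketch shown earlier.
    query = urllib.parse.quote_plus("info:" + url)
    request = urllib.request.Request(
        "https://www.google.com/search?q=" + query,
        headers={"User-Agent": "Mozilla/5.0"},
    )
    soup = BeautifulSoup(opener.open(request).read(), "html.parser")
    return bool(soup.select("#search a"))

delay = float(input("Delay between URLs in seconds: "))
with open("urls.txt") as infile:
    urls = [line.strip() for line in infile if line.strip()]

with open("results.csv", "w", newline="") as outfile:
    writer = csv.writer(outfile)
    for url in urls:
        writer.writerow([url, "TRUE" if is_indexed(url) else "FALSE"])
        time.sleep(delay)  # pause so the checks aren't fired too quickly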