[Repost] Abusing the internet with popular search engine technologies

Author: c0ntex

When you think of a search engine, you probably conjure up an image of the handy, HTML-based site that can
help you find those RFCs, wiring diagrams, soft-warez, serial keys or questionable images, depending on your
taste. You've probably used generic search engines like Google, Yahoo, MSN, Lycos & Hotbot a trillion times
without even considering the possibility that they could be used to support an attack attempt.

Due to the sheer volume of HTTP based attacks, one has to consider that anything utilising HTTP is a serious
theoretical attack vector. Too many people feel happy in the knowledge that only HTTP holes are punched in
the firewall: if the server daemon is patched and user input locations are verified as safe, then there is no
need to worry. This is a serious misconception that one should avoid.

HTTP fingerprinting, SQL injection, malformed header injections, XSS, cookie fun: every conceivable form of
web based application or service attack can be managed by your security infrastructure. Yet what is the use
in having any of this preventative security to protect your data from prying eyes, when you have allowed
search bots to come in happily and crawl all your web servers?


The Berkeley teaching guide states that spiders are:
"Computer robot programs, referred to sometimes as "crawlers" or "knowledge-bots" or "knowbots" that are used
by search engines to roam the World Wide Web via the Internet, visit sites and databases, and keep the search
engine database of web pages up to date. They obtain new pages, update known pages, and delete obsolete ones.
Their findings are then integrated into the "home" database."


When any powerful technology is utilised by a curious mind, possessing advanced knowledge of the internal
workings, it becomes easy to transform the technology into a potentially malicious weapon. In this case, the
engine soon becomes an advanced data-gathering tool for attackers. Everyone knows about RFP's web mining tool
called Whisker, yet not everyone is aware that a search engine can do the same...and more.

Google is the most powerful search engine online, and it probably has been for the last four years or more,
crawling literally millions of websites and public archives daily. The amount of data that it contains would
crush any reasonable human mind due to the sheer scale. Given how much data is stored, and the indiscriminate
manner in which engines gather it, it is no wonder that sensitive information is sometimes
included.

The spiders that Google uses to harvest all these sites have no mind; they are merely an axon connected to a
central soma, acting like a neural network. Data passes down the axon if it satisfies a simple logic
test, and the soma then sends it on to the central brain. The only energy required to sustain crawling is a diet
similar to that of the pecking pigeons which perform the ranking inside the database clusters.

http://www.google.com/technology/pigeonrank.html

When testing a company site or domain, it is always useful to use Google, as its crawling software is superb
and will find some pretty interesting stuff you might otherwise overlook. It also has a very nice cache
function that will allow you to see "old" pages that were once available from that site. Attackers can use
this feature to compare pages, documents, structures and the like.


Some fictional public BBS sits in your DMZ, using flat text and CGI. The BBS contains a plain text or MD5-hashed
password file that sits in the web directory, and Mr Security forgets about it. One week later, a search engine
spider creeps along, crawls your site and just happens to find the password text. One month later we come along and
search your domain for password.txt, and guess what pops up.

Interestingly, it seems a user from the internal network has subscribed to the BBS, using the same password as
his domain password. Now things become more interesting, especially when you find out, after a very
authentic-sounding phone call to his office receptionist, that the user is a member of IT staff.
                                                                                         
      IT: domains, routers, switches, servers, databases.
      Human: laziness, imperfection, same password everywhere, domain admin account?


It's possible to block spiders by using the robots.txt file, provided the spider can read and chooses to obey
it; the file is a request, not an access control. An example robots.txt file looks like:


  User-agent: *  # This shoos all spiders off with a big stick;
  Disallow: /    # no flies around this server.


  User-agent: googlebot # Fend off googlebot:
  Disallow: /cgi-bin/
  Disallow: /images/


  # Tease vulnerable spiders:
  User-agent: AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA x 350
  Disallow: /GoogleBot PAYLOAD :-)/
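
To see how a well-behaved spider would read the first two rule sets, here is a minimal sketch using Python's
standard urllib.robotparser module. The helper name is_allowed, the bot name "anybot" and the host
www.example.com are placeholders invented for illustration. It only models a spider that obeys robots.txt; a
rude one will ignore the file entirely.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Rule set 1: every spider is turned away from the whole server.
BLOCK_ALL = """\
User-agent: *   # applies to every crawler
Disallow: /     # nothing may be fetched
"""

# Rule set 2: only googlebot is restricted, and only in two directories.
BLOCK_GOOGLEBOT_DIRS = """\
User-agent: googlebot
Disallow: /cgi-bin/
Disallow: /images/
"""

if __name__ == "__main__":
    site = "http://www.example.com"  # placeholder host, not from the article
    checks = [
        (BLOCK_ALL, "anybot", "/index.html"),
        (BLOCK_GOOGLEBOT_DIRS, "Googlebot", "/cgi-bin/status.cgi"),
        (BLOCK_GOOGLEBOT_DIRS, "Googlebot", "/index.html"),
        (BLOCK_GOOGLEBOT_DIRS, "anybot", "/cgi-bin/status.cgi"),
    ]
    for rules, agent, path in checks:
        verdict = "allowed" if is_allowed(rules, agent, site + path) else "blocked"
        print(f"{agent:10} {path:22} {verdict}")
```

Note that the second rule set says nothing about other spiders, so anything that is not googlebot is still
free to crawl /cgi-bin/.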


If you really suffer from arachnophobia, you could block the crawler bot parasite at your router / firewall.

NOTE: This text is not trying to encourage the use of Google as an attack tool; acting on information found could be
illegal. Yet it is worth considering the ramifications of what data you allow your users to place in their private web
directories.
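
One way to find out what your users have left lying around before a spider does is to walk the web root
yourself. The following is a rough sketch, not a complete audit tool: the function name find_exposed_files
and the two pattern sets are assumptions made for illustration, seeded with file names and types mentioned in
this text (password.txt, passwd, config.php, .mdb, .xls, .vsd). Extend them to suit your own site.

```python
import os
import sys
from typing import Iterator

# Illustrative patterns only; taken from file names and types named in this text.
SUSPECT_NAMES = {"password.txt", "passwd", "config.php", "config.cgi"}
SUSPECT_EXTENSIONS = {".mdb", ".xls", ".vsd"}


def find_exposed_files(web_root: str) -> Iterator[str]:
    """Yield paths (relative to web_root) of files that look sensitive."""
    for dirpath, _dirnames, filenames in os.walk(web_root):
        for name in filenames:
            lowered = name.lower()
            _, ext = os.path.splitext(lowered)
            if lowered in SUSPECT_NAMES or ext in SUSPECT_EXTENSIONS:
                yield os.path.relpath(os.path.join(dirpath, name), web_root)


if __name__ == "__main__":
    # Usage: python audit.py /path/to/webroot   (defaults to the current directory)
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in sorted(find_exposed_files(root)):
        print(path)
```

Matching is by file name only, so it will not notice a password dump saved under an innocent name.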

Example usage of the Google advanced search features:


  allinurl:            # Match pages with all the given terms in the URL
  allintitle:          # Match pages with all the given terms in the title
  filetype:            # Match only files of the given type
  intitle:             # Match pages with the given term in the title
  inurl:               # Match pages with the given term in the URL
  link:                # Match pages that link to the given URL
  site:                # Restrict results to the given site or domain

Specific:
  allinurl:session_login.cgi      # Nagios/NetSaint/Nagmin
  allintitle:/cgi-bin/status.cgi?      # Network monitoring
  allintitle:/cgi-bin/extinfo.cgi      # Nagios/NetSaint

  inurl:cpqlogin.htm        # Compaq Insight Agent
  allinurl:"Index of /WEBAGENT/"    # Compaq Insight Agent
  allinurl:/proxy/ssllogin        # Compaq Insight Agent
  allintitle:WBEM Login        # Compaq Insight Agent
  WBEM site:domain.com       # Compaq Insight Agent at domain.com

  inurl:vslogin OR vsloginpage      # Visitor System or Vitalnet

  allintitle:/exchange/root.asp      # Exchange Webmail
  "Index of /exchange/"        # Exchange Webmail Directory

  netopia intitle:192.168        # Netopia Router Config


General:
  site:blah.com filetype:cgi      # Check cgi source code

  allinurl:.gov passwd        # A blackhat favorite
  allinurl:.mil passwd        # Another blackhat favorite

  "Index of /admin/"
  "Index of /cgi-bin/"
  "Index of /mail/"
  "Index of /passwd/"
  "Index of /private/"
  "Index of /proxy/"
  "Index of /pls/"
  "Index of /scripts/"
  "Index of /tsc/"
  "Index of /www/"

  "Index of" config.php OR config.cgi

  intitle:"index.of /ftp/etc" passwd OR pass -freebsd -netbsd -openbsd
  intitle:"index.of" passwd -freebsd -netbsd -openbsd

  inurl:"Index of /backup"
  inurl:"Index of /tmp"
  inurl:auth filetype:mdb                 # MDB files
  inurl:users filetype:mdb
  inurl:config filetype:mdb

  inurl:clients filetype:xls      # General spreadsheets

  inurl:network filetype:vsd      # Network diagrams
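
When checking your own domain, each of the general queries can be narrowed with the site: operator, in the
same way as the "WBEM site:domain.com" example. A small sketch of that follows; the helper name scope_to_site
and the sample list are made up for illustration, and example.com stands in for a domain you are responsible
for. It only builds the query strings and does not contact any search engine.

```python
from typing import Iterable, List

# A handful of the general queries listed above, copied verbatim.
GENERIC_QUERIES = [
    '"Index of /admin/"',
    '"Index of /private/"',
    'inurl:users filetype:mdb',
    'inurl:clients filetype:xls',
    'inurl:network filetype:vsd',
]


def scope_to_site(domain: str, queries: Iterable[str]) -> List[str]:
    """Prefix each query with site:<domain> so results stay on one domain."""
    domain = domain.strip()
    if not domain or " " in domain:
        raise ValueError("domain must be a single host or domain name")
    return [f"site:{domain} {query}" for query in queries]


if __name__ == "__main__":
    # example.com is a placeholder; substitute your own domain.
    for query in scope_to_site("example.com", GENERIC_QUERIES):
        print(query)
```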


EOF