Google indexes only about 5% of the Internet.
The websites that we can reach through search engines are known as the Surface Web.
The remaining 95% of the Internet is called the Deep Web. Deep Web pages can't be
indexed by search engines, and there is a lot of illegal activity on the Dark Web
sites hidden within it. To address these problems, the US government is funding a
project to create a search engine for the Deep Web. According to Mike Stevens, an
information security training professor at IICS, this will not only allow the US
government to control the Internet completely but also help stop crimes such as human
trafficking.
The search engine, called Memex, indexes websites that normal search engines
can't index, and it presents its results graphically so that hidden links can be
identified.
The US government is focusing Memex on the problem of human trafficking, which
relies heavily on the Internet to attract clients. However, the government also
plans to use it against other cyber crime in the Deep Web.
The Dark Web could soon be a lot brighter with this
new search engine aimed at criminals. Memex relies heavily on indexing
forums, chat services, job postings and other hidden services that enable trade
on the Dark Web. Memex will track and map the connections between illicit
advertisements and the suspected criminals who post them.
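The article doesn't say how Memex links ads to their posters. Here is a minimal sketch. It assumes ads count as connected whenever they share a phone number or email address, which are the identifiers Memex reportedly uses to track criminals. The hypothetical functions `extract_identifiers` and `group_ads` pull those identifiers out with deliberately rough regular expressions. They then cluster the ads with a union-find structure. The sample ads are invented.

```python
import re
from collections import defaultdict

# Deliberately rough patterns for illustration; real extraction must handle
# international formats and obfuscation ("five five five ...").
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_identifiers(text):
    """Return phone numbers (digits only) and lower-cased emails found in text."""
    phones = {re.sub(r"\D", "", m) for m in PHONE_RE.findall(text)}
    emails = {m.lower() for m in EMAIL_RE.findall(text)}
    return phones | emails


def group_ads(ads):
    """Cluster ad IDs that share any identifier, using union-find."""
    parent = {ad_id: ad_id for ad_id in ads}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_seen = {}  # identifier -> first ad ID that mentioned it
    for ad_id, text in ads.items():
        for ident in extract_identifiers(text):
            if ident in first_seen:
                parent[find(ad_id)] = find(first_seen[ident])  # merge clusters
            else:
                first_seen[ident] = ad_id

    clusters = defaultdict(set)
    for ad_id in ads:
        clusters[find(ad_id)].add(ad_id)
    return sorted(sorted(c) for c in clusters.values())


# Hypothetical sample data.
SAMPLE_ADS = {
    "ad1": "Call 555-123-4567 tonight",
    "ad2": "New in town, (555) 123 4567",
    "ad3": "Contact jane.doe@example.com",
    "ad4": "Write to Jane.Doe@example.com or 555.987.6543",
    "ad5": "Nothing to see here",
}

if __name__ == "__main__":
    print(group_ads(SAMPLE_ADS))  # [['ad1', 'ad2'], ['ad3', 'ad4'], ['ad5']]
```

Each resulting cluster is a candidate group of ads posted by the same person. A real system would add more identifier types and weigh how strong each link is.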
Memex scans the Deep Web for ads that point users to
sites where child pornography or other forms of human slavery exist. It then indexes
those images, sources and websites so that the information can be used to map the
criminals, using phone numbers and email addresses to track them. Memex is
designed for ordinary users without a technical background. Its image search
relies on image metadata, such as the camera serial number, and on image
comparison to find exact matches.
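Here is a minimal sketch of the exact-match side of image search. It assumes that metadata such as a camera serial number has already been extracted into a plain dictionary. Reading real EXIF data would need an image library, so that step is left out. The record layout and the names `index_images` and `find_matches` are hypothetical, not Memex's interface. A SHA-256 digest identifies byte-identical copies of a file, and a separate index groups photos that report the same camera serial number.

```python
import hashlib
from collections import defaultdict


def sha256_of_bytes(data):
    """Exact-match fingerprint: byte-identical files give identical digests."""
    return hashlib.sha256(data).hexdigest()


def index_images(images):
    """Build lookups by content hash and by camera serial number.

    Each record is a hypothetical dict with 'url', 'data' (raw bytes) and
    'meta' (already-extracted metadata such as {'camera_serial': ...}).
    """
    by_hash = defaultdict(list)
    by_serial = defaultdict(list)
    for img in images:
        by_hash[sha256_of_bytes(img["data"])].append(img["url"])
        serial = img["meta"].get("camera_serial")
        if serial:
            by_serial[serial].append(img["url"])
    return by_hash, by_serial


def find_matches(query_bytes, query_serial, by_hash, by_serial):
    """Return (exact copies, photos from the same camera) as lists of URLs."""
    exact = by_hash.get(sha256_of_bytes(query_bytes), [])
    same_camera = by_serial.get(query_serial, []) if query_serial else []
    return exact, same_camera


# Hypothetical sample records.
SAMPLE_IMAGES = [
    {"url": "http://a.example/1.jpg", "data": b"\xff\xd8photo-one", "meta": {"camera_serial": "SN123"}},
    {"url": "http://b.example/x.jpg", "data": b"\xff\xd8photo-one", "meta": {}},
    {"url": "http://c.example/2.jpg", "data": b"\xff\xd8photo-two", "meta": {"camera_serial": "SN123"}},
]

if __name__ == "__main__":
    by_hash, by_serial = index_images(SAMPLE_IMAGES)
    print(find_matches(b"\xff\xd8photo-one", "SN123", by_hash, by_serial))
```

A cryptographic hash only catches exact copies. A resized or re-compressed image produces a completely different digest, so near-duplicate search usually relies on perceptual hashing or feature comparison instead.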
Memex has two crawlers, Ache and Nutch, and each uses the data it collects in its
own way. Both crawlers require a list of URLs to start from, which is called a
seeds list.
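A seeds list is commonly a plain text file with one URL per line. The hypothetical loader below shows the idea. It is not the parser Nutch or Ache actually uses. Treating `#` lines as comments and dropping non-HTTP(S) or malformed entries are assumptions of this sketch, so check each crawler's documentation for its exact seed format.

```python
from urllib.parse import urlsplit


def load_seeds(text):
    """Parse a seeds list: one URL per line, blanks and '#' comments skipped.

    Returns URLs in first-seen order, with duplicates and entries that are not
    absolute http(s) URLs dropped.
    """
    seeds = []
    seen = set()
    for line in text.splitlines():
        url = line.strip()
        if not url or url.startswith("#"):
            continue
        try:
            parts = urlsplit(url)
        except ValueError:  # e.g. malformed IPv6 brackets
            continue
        if parts.scheme not in ("http", "https") or not parts.netloc:
            continue
        if url not in seen:
            seen.add(url)
            seeds.append(url)
    return seeds


# Hypothetical seeds file contents.
SAMPLE_SEEDS = """# example seeds
http://forum.example.onion/
https://jobs.example.com/listings

http://forum.example.onion/
ftp://files.example.com/
not a url
"""

if __name__ == "__main__":
    print(load_seeds(SAMPLE_SEEDS))
```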
Nutch is developed by Apache and integrates with both Solr and
Elasticsearch, which sets it apart from Ache. Nutch crawls in rounds, and a round
cannot be interrupted once it has started. Nutch will keep running round after
round until it is asked to stop.
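The round-based behaviour can be modelled with a toy in-memory crawler. This is a simplified sketch and does not reproduce Nutch's code. Nutch runs each round as a series of batch jobs (generate, fetch, parse, updatedb). In this sketch, the hypothetical `get_links` callback stands in for fetching and parsing a page, and a `threading.Event` stands in for the request to stop. The flag is checked only between rounds, so a round that has started always finishes.

```python
import threading


def crawl_in_rounds(seeds, get_links, stop_event, max_rounds=None):
    """Crawl in whole rounds until the frontier empties or a stop is requested.

    `get_links(url)` is a hypothetical fetch-and-parse stand-in. Returns the
    list of rounds, each being the URLs fetched in that round.
    """
    fetched = set()
    frontier = list(dict.fromkeys(seeds))  # de-duplicate, keep order
    rounds = []
    while frontier and not stop_event.is_set():  # stop checked between rounds only
        if max_rounds is not None and len(rounds) >= max_rounds:
            break
        discovered = {}  # dict keeps discovery order without duplicates
        for url in frontier:  # the whole round runs to completion
            fetched.add(url)
            for link in get_links(url):
                discovered[link] = None
        rounds.append(frontier)
        frontier = [u for u in discovered if u not in fetched]
    return rounds


# Hypothetical link graph.
SAMPLE_GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d", "a"], "d": []}

if __name__ == "__main__":
    stop = threading.Event()  # another thread could call stop.set()
    print(crawl_in_rounds(["a"], lambda u: SAMPLE_GRAPH.get(u, []), stop))
```

On the real web the frontier almost never empties. In practice the loop ends only when the stop flag is set or a round limit is reached.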
The number of pages left
to crawl in a Nutch crawl grows significantly after each round. With Nutch, you
can begin with a seeds list of 100 pages, and it can find over 1,000
pages to crawl in the next round.
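The growth follows from simple arithmetic. If each of 100 seed pages links to about 11 pages that nobody has fetched yet, the next round already has about 1,100 pages. The toy link structure below is hypothetical and chosen so that no two pages share an outlink. On real sites links overlap, so growth is smaller than pages × links per page.

```python
def frontier_after_one_round(seeds, get_links):
    """Count distinct, not-yet-fetched URLs found by fetching every seed once."""
    seen = set(seeds)
    new_urls = set()
    for url in seeds:
        for link in get_links(url):
            if link not in seen:
                new_urls.add(link)
    return len(new_urls)


def synthetic_links(page_id):
    # Hypothetical toy web: page p links to 11p+1 .. 11p+11, so distinct
    # pages never share an outlink.
    return [page_id * 11 + j for j in range(1, 12)]


SEEDS = list(range(1000, 1100))  # 100 hypothetical seed pages

if __name__ == "__main__":
    print(frontier_after_one_round(SEEDS, synthetic_links))  # 1100
```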
Ache is developed by NYU. Ache differs from Nutch in that you have to
create a crawl model before you can run a crawl. Unlike Nutch, Ache can be
stopped at any time.
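Ache is a focused crawler. Its crawl model is a page classifier trained on example pages that are relevant and irrelevant to a topic, and the crawler uses that classifier to judge which fetched pages are relevant. The sketch below only illustrates the idea. It uses a tiny multinomial Naive Bayes classifier as the model, which is not Ache's own classifier or training tool. The class name, helper function and example pages are all invented.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


class RelevanceModel:
    """Toy multinomial Naive Bayes: relevant (True) vs irrelevant (False)."""

    def __init__(self, relevant_pages, irrelevant_pages):
        self.counts = {
            True: Counter(t for p in relevant_pages for t in tokenize(p)),
            False: Counter(t for p in irrelevant_pages for t in tokenize(p)),
        }
        self.totals = {label: sum(c.values()) for label, c in self.counts.items()}
        self.vocab = set(self.counts[True]) | set(self.counts[False])
        n = len(relevant_pages) + len(irrelevant_pages)
        self.log_prior = {
            True: math.log(len(relevant_pages) / n),
            False: math.log(len(irrelevant_pages) / n),
        }

    def _log_score(self, tokens, label):
        # Laplace (add-one) smoothing so a word unseen in one class
        # doesn't zero out that class's probability.
        denom = self.totals[label] + len(self.vocab)
        return self.log_prior[label] + sum(
            math.log((self.counts[label][t] + 1) / denom) for t in tokens
        )

    def is_relevant(self, text):
        tokens = [t for t in tokenize(text) if t in self.vocab]  # ignore unknown words
        return self._log_score(tokens, True) > self._log_score(tokens, False)


def links_to_follow(model, page_text, outlinks):
    """Focused crawling: only expand pages the model judges relevant."""
    return list(outlinks) if model.is_relevant(page_text) else []


# Hypothetical training examples.
REL = ["job abroad visa provided travel paid", "work abroad no documents needed travel arranged"]
IRR = ["chocolate cake recipe with butter", "football match results and scores"]
MODEL = RelevanceModel(REL, IRR)

if __name__ == "__main__":
    print(MODEL.is_relevant("travel abroad visa arranged"))  # True
```

This also suggests why Ache needs a model before it starts. Without the classifier it has no way to tell which pages belong to the topic.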
According to information security training experts, the Memex
project is still under development, but it is already available to the general
public. You can download and install Memex from GitHub by searching for
Memex-Explorer.