URL classification is based on URLs that real users are actively visiting, rather than on bot traffic. It uses a crowd-sourced approach to obtain a constant stream of URLs to analyze.
The continuous stream of URLs actively visited by 500 million end users is the primary in-house source of threat corpora and comes from a global network of customers across several markets. This combined, integrative approach allows us to continuously enhance, optimize and tune our malicious-detection capabilities in an ever-changing threat landscape.
WebTitan URL Classification takes an integrative, multi-vector approach to in-house analysis that combines the following methods:
Malicious detections are continuously sampled so that they can be profiled, tested and validated. The results of this sampling are then used to feed and train the supervised machine learning systems and to tune the efficiency, accuracy and overall effectiveness of malicious detection against internal key performance indicators.
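The sampling-and-feedback loop described above can be sketched roughly as follows. This is a minimal illustration, not WebTitan's actual pipeline: the `Detection` record, its analyst-verified `verified_malicious` label, and the use of precision as the KPI are all hypothetical assumptions chosen for the example.

```python
import random
from dataclasses import dataclass


@dataclass
class Detection:
    url: str
    verified_malicious: bool  # hypothetical: label assigned after an analyst reviews the sample


def sample_detections(detections, k, seed=None):
    """Draw a uniform random sample of up to k detections for review (without replacement)."""
    rng = random.Random(seed)
    return rng.sample(detections, min(k, len(detections)))


def precision(sample):
    """Share of sampled detections confirmed malicious; one possible (assumed) KPI."""
    if not sample:
        return 0.0
    confirmed = sum(1 for d in sample if d.verified_malicious)
    return confirmed / len(sample)


def relabel_for_training(sample):
    """Turn reviewed detections into (url, label) pairs a supervised model could be retrained on."""
    return [(d.url, int(d.verified_malicious)) for d in sample]
```

For example, if three of four sampled detections are confirmed, `precision` returns 0.75, and the relabeled pairs (including the one false positive, labeled 0) become fresh training data.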
One of the critical features of our URL classification is deep analysis made possible by full-path detection. In short, page- and path-level reporting gives analytical credibility to what is marked as malicious. Most malicious URLs in the databases are detailed down to the path level. For non-IP-based URLs, 88.35% are marked as malicious down to the path level. For IP-based URLs the figure is significantly higher, with 99.70% of URLs identified as having a path. This is extremely important because DNS-based systems typically work only at the domain level.
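To illustrate why path-level data matters, the sketch below contrasts a domain-only check (roughly what a DNS-based filter can do, since it never sees the path) with a path-level check. The blocklist entries and the simple prefix-matching rule are hypothetical assumptions for illustration only.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entries: (host, path prefix).
# An empty path prefix would list the whole host.
MALICIOUS = {
    ("example.com", "/uploads/invoice.php"),
    ("203.0.113.7", "/dl/"),
}


def domain_level_block(url, entries=MALICIOUS):
    """Domain-only check: block if the host appears anywhere in the list."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h for h, _ in entries)


def path_level_block(url, entries=MALICIOUS):
    """Path-level check: block only if the host matches and the path starts with a listed prefix."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    path = parts.path or "/"
    return any(host == h and path.startswith(p) for h, p in entries)
```

With these entries, `http://example.com/about` is blocked by the domain-level check but allowed by the path-level check, while `http://example.com/uploads/invoice.php?id=3` is blocked by both. The path-level check avoids overblocking a legitimate site that hosts one compromised page.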
Because the life cycle of malicious URLs varies, it is imperative to be able to inspect and detect URLs quickly and to confirm that they are still malicious. The Malicious Detection Service includes an automated revisit process in which malicious URLs are revisited on a set schedule. Each day, 300,000 malicious URLs are revisited to see whether they are still infected or are now clean. Because our malicious detection service obtains the full path, it can revisit that exact URL and obtain crucial results at a granular and highly accurate level.
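A scheduled revisit process like the one described above could be organized as a priority queue keyed by each URL's next due date, with a daily cap on re-checks. The sketch below is a hedged illustration: the 300,000 daily budget comes from the text, but the fixed `interval_days` schedule, the day-number clock and the `is_still_malicious` callback are hypothetical stand-ins for whatever the real service uses.

```python
import heapq
from dataclasses import dataclass, field

DAILY_BUDGET = 300_000  # revisits per day, per the figure quoted above


@dataclass(order=True)
class Entry:
    next_check: int                  # day number when the URL is next due
    url: str = field(compare=False)  # excluded from ordering; only next_check is compared


class RevisitQueue:
    def __init__(self, interval_days=7):
        self.interval = interval_days  # hypothetical fixed revisit interval
        self.heap = []

    def add(self, url, day):
        heapq.heappush(self.heap, Entry(day, url))

    def run_day(self, today, is_still_malicious, budget=DAILY_BUDGET):
        """Re-check up to `budget` due URLs; reschedule those still malicious, return clean ones."""
        cleaned = []
        checked = 0
        while self.heap and self.heap[0].next_check <= today and checked < budget:
            entry = heapq.heappop(self.heap)
            checked += 1
            if is_still_malicious(entry.url):
                heapq.heappush(self.heap, Entry(today + self.interval, entry.url))
            else:
                cleaned.append(entry.url)
        return cleaned
```

Because each entry stores the exact URL including its path, the check revisits the specific page that was flagged rather than the whole domain. URLs that come back clean are dropped from the queue, and any due URLs left over when the budget runs out simply wait for the next day's run.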
The detection systems use the following nine malicious categories:
Sites that are used to commit fraudulent online display advertising transactions through ad impression boosting techniques, including but not limited to ad stacking, iframe stuffing and hidden ads. Also includes sites that have high non-human web traffic with rapid, large and unexplained changes in traffic.
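As an illustration of one of the techniques named above, iframe stuffing, the sketch below scans HTML for iframes that are sized at one pixel or less or hidden with inline CSS. This is a simplified heuristic of our own, not the classifier's actual logic: the `HiddenIframeFinder` class and its `max_px` threshold are hypothetical, and real detection would also need to account for external stylesheets, scripts and rendered layout.

```python
from html.parser import HTMLParser


class HiddenIframeFinder(HTMLParser):
    """Collects src values of iframes that are tiny or hidden by inline style."""

    def __init__(self, max_px=1):
        super().__init__()
        self.max_px = max_px  # hypothetical threshold: iframes this small count as invisible
        self.hidden = []

    def handle_starttag(self, tag, attrs):
        if tag != "iframe":
            return
        a = dict(attrs)
        style = (a.get("style") or "").replace(" ", "").lower()
        sizes = [self._px(a.get(dim)) for dim in ("width", "height")]
        tiny = any(s is not None and s <= self.max_px for s in sizes)
        invisible = "display:none" in style or "visibility:hidden" in style
        if tiny or invisible:
            self.hidden.append(a.get("src"))

    @staticmethod
    def _px(value):
        """Parse values like '0', '1' or '1px' into an int; return None otherwise."""
        if value is None:
            return None
        value = value.strip().lower().removesuffix("px")
        return int(value) if value.isdigit() else None
```

Feeding a page through `HiddenIframeFinder().feed(html)` leaves the suspicious iframe sources in `.hidden`. A page that loads many such invisible ad frames is a candidate for this category, while a normally sized embedded video is ignored.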
Bots are compromised machines running software that hackers use to send spam and phishing messages and to launch denial-of-service attacks.
Internet servers used to send commands to infected machines, which are known as bots.
Compromised web pages appear to be legitimate but house malicious code or link to malicious websites hosting malware. These sites have been compromised by someone other than the site owner. If Firefox blocks a site as malicious, use this category. Examples include defaced pages and "hacked by" pages.
When viruses or spyware report information back to a particular URL, or check a URL for updates, that URL is considered a malware call-home address.
Web pages that host viruses, exploits, and other malware are considered Malware Distribution Points. Web Analysts may use this category if their anti-virus program triggers on a particular website.
Web pages that impersonate other web pages, usually with the intent of stealing passwords, credit card numbers or other information. Also includes web pages that are part of scams such as a "419" scam, in which a person is convinced to hand over money with the expectation of a big payback that never comes. Examples include cons, hoaxes and scams.
URLs that frequently occur in spam messages.
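One simple way to operationalize "URLs that frequently occur in spam messages" is to count how many distinct spam messages each URL appears in and flag those above a threshold. The sketch below assumes a corpus of message bodies already known to be spam; the `frequent_spam_urls` function, its regular expression and the `min_count` threshold are hypothetical choices for illustration.

```python
import re
from collections import Counter

# Rough URL matcher for plain-text bodies; stops at whitespace, angle brackets and quotes.
URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def frequent_spam_urls(messages, min_count=3):
    """Count each URL once per message; return URLs seen in at least min_count messages."""
    counts = Counter()
    for body in messages:
        # Strip trailing punctuation that often follows a URL in running text.
        urls = {u.rstrip(".,);") for u in URL_RE.findall(body)}
        counts.update(urls)
    return {url: n for url, n in counts.items() if n >= min_count}
```

Counting each URL once per message keeps a single message that repeats a link many times from inflating its score; the threshold would need tuning against real spam volumes.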
Software that reports information back to a central server, such as spyware or keystroke loggers. Also includes software that may have legitimate purposes but that some people may object to having on their systems.
As you can see, our classification system uses significant and continually optimized intelligence. Cybercriminals are constantly finding new ways to operate, hide online and exploit vulnerabilities. All of our solutions constantly adapt to meet these new modes of operation, and we continually learn from new data to impede cybercriminals at the source.