Research Hacking – Searching for Sensitive Documents on FTP; Captchas and the Google Governor

If you want to find *sensitive* documents using Google search (documents with impactful information which someone does not want revealed, more or less), I’ve found that in addition to targeting queries at specific domains and file types, an alternative and potent approach is to restrict your results to files residing on an FTP server.

The rationale is that while many FTP servers allow anonymous log-in and even more are indexed by Google, they are used more for uploading, downloading, and storing files than for viewing pages, and typically house more office-type documents (as well as software). Limiting your searches to FTP servers also significantly restricts the overall number of results returned, so choice keywords combined with a query that tells Google to bring back files that have “ftp://” but NOT “http://” or “https://” in the URL yield a high density of relevant results. This search type is easily executed:

[Screenshot - 12032013 - 08:10:35 AM]
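A sketch of the kind of query described, with a placeholder keyword and file type of my own (not from the original screenshot):

"meeting minutes" filetype:doc inurl:ftp -inurl:http -inurl:https

The inurl:ftp operator alone would also match web URLs that merely contain “ftp,” which is why the two negative operators are needed to leave only ftp:// results.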

A caveat one encounters before long with this method is that eventually Google will present you with a “captcha.” Many, many websites use captchas, and pretty much everyone who uses the internet has encountered one. The basic idea behind a captcha is to prevent people from using programs to send automated requests to a webserver; captchas are a main tool in fighting spam, thwarting the bots that mine the internet for email addresses and other data, and that register for online accounts and other services en masse. The captcha presents the user with a natural language problem to which they must provide an answer.

Google is also continuously updating its code to make it difficult to exploit Google “dorks,” queries using advanced operators similar to the one used above (but usually more technical and specific). Dorks are mostly geared toward penetration testers looking for web application and other vulnerabilities, but the cracker’s tools can easily be adapted for open source research.

[Screenshot - 12032013 - 08:13:41 AM]

Unless you are in fact a machine (sometimes you’re a machine, in which case there are solutions), this should be easily solved; lately, however, instead of returning me to my search after answering the captcha, Google has been sending me back to the first page of my query’s results (forcing me to start the browsing process over and to encounter another captcha). I’m calling it a Google Governor, as it seems to throttle searchers’ ability to employ high-powered queries.

The good news is that the workaround is really just smart searching. One thing you’ll notice upon browsing your results is that dozens of files from the same, irrelevant site will be presented. Eliminate these by adding -inurl:"term" (which tells Google NOT to return URLs containing exactly that term). Further restrict your results by omitting sites in foreign domains (especially useful with acronym-based keyword searches): -site:cz -site:nk.
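Putting those exclusions together, a refined query might look like this (the keyword and the excluded site are hypothetical examples of my own):

"DVIC" filetype:pdf inurl:ftp -inurl:http -inurl:https -inurl:"irrelevantsite" -site:cz -site:nk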

When you find an FTP site which looks interesting, copy and paste the URL into a client like FileZilla for easier browsing.
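If you prefer the terminal, a client like lftp can take the URL directly; a generic sketch, with a placeholder host and path, assuming the server accepts anonymous log-in:

$ lftp ftp://anonymous@ftp.example.com/pub/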

To give you an idea of the sensitivity of documents that can be found: One folder was titled “[Name] PW and Signature,” which contained dozens of files with passwords as well as .crt, .pem, and .key files; another titled “admin10” contained the file “passwords.xls.” This was the site of a Department of Defense and Department of Homeland Security contractor – the document contains the log-in credentials for bank accounts, utilities, and government portals. This particular document is of more interest to the penetration tester; for our purposes it serves as a meter for the sensitivity of the gigabytes of files that accompanied it on the server. The recklessness of the uploader exposed internal details of dozens of corporations and their business with government agencies.

The hopefully sufficiently blurred “passwords.xls”

*As of this writing, the FTP server mentioned above is no longer accessible.

Forensic Indexing, Metadata, and the DVIC Privacy Policy

When doing research on a subject that has some measure of obscurity by design, such as the fusion center in Philadelphia, the Delaware Valley Intelligence Center (DVIC), I often find the only way to fill in the gaps is to “data-mine” for documents. I use quotes because data-mining strictly involves aggregating and analyzing more fragmented bits of *data*, while I deal more in *information*, and data-mining usually implies a much more intensive level of computation applied to a much larger corpus than I will discuss here.

You can get hands-on with data mining. This is Tree-Map; I use a program called BaseX. They’re similar, great for browsing structured data like XML.

A more appropriate term would be “forensic indexing,” in that I am applying basic methods of digital forensics, like metadata extraction, to a general knowledge management system for a large collection of documents, too large realistically to open one by one. And I’ve just made it sound more organized than it usually is.

In the case of the DVIC, what this meant was using an application which automates queries to metasearch engines, as well as enumerating a specified domain, to find relationships and other information. I used FOCA. I saved the documents resulting from this search in separate folders according to which domain I had chosen for the search. I collected around 1,800 documents.

I then ran a simple command line program called pdfgrep, using the command pdfgrep -n -i "dvic" *.pdf to bring back a list of every line containing the phrase “dvic” in every PDF file in the same directory, ignoring case and tagging each hit with its file name and page number. One such query returned:

[filename]pg#: "text"
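To get a rough ranking of which files mention the term most often, pdfgrep’s grep-style -c (count) flag can be piped through sort; a sketch, assuming the PDFs are in the current directory:

$ pdfgrep -ci "dvic" *.pdf | sort -t: -k2 -nr | head

Each output line is a file name and its match count, so sorting numerically on the field after the colon floats the densest documents to the top.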

As you might imagine if you have followed the Declaration’s coverage, I was a bit confused. I went to the corresponding folder on my desktop and opened the file in my reader:

[Screenshot - 11062013 - 05:33:45 PM]

This document is titled “Nebraska Information Analysis Center,” another fusion center which, it just so happens, is missing a document from the fusion center association website. Where metadata comes into play, and why I had missed this by manually “googling” until now, is in how FOCA searches for documents: by file name, which appears in the document’s metadata and gives its file path (its URI) on the machine that stores it. You can sometimes do something similar by typing inurl:[term] into Google, but then you would have to know the exact name of the file to get relevant results. The name of this file is “Delaware-Valley-Intelligence-Center-Privacy-PolicyMar-2013.” It would have been very difficult to come up with this by educated accident.
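Once the exact name has surfaced, a query like the following would target the document directly (the filetype operator is my addition; the file is a PDF), though you would have had to know the name first:

inurl:Delaware-Valley-Intelligence-Center-Privacy-PolicyMar-2013 filetype:pdf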

[Screenshot - 11062013 - 05:11:50 PM]

So while there are still serious questions about the date gap between the beginning of a “cell” and the submission of a policy, and concerns about the lack of a full-time privacy officer, among others, it seems that everyone who was sure that a policy was completed and approved by the DHS was quite correct, and I’d like to thank them for adding accurate memory to their graciously-given time to discuss the subject. It seems that a March draft was somewhere in its life labeled as the Nebraska Information Analysis Center’s policy, perhaps at the National Fusion Center Association website, where the “comprehensive” list is found, by whomever didn’t link it to the analysis center website.

This is only one elucidation among many from recent developments, the fruits of fresh approaches, and, as mentioned, more documents to parse. Read the Declaration.

Perl Crawler Script “fb-crawl” Lets You Automate and Organize Your Facebook Stalking

While browsing for scripts that might make my often very high-volume webmining for research less time-consuming and more automated, I came upon the following on Google Code: a script that crawls/scrapes Facebook friends and adds their information to a database. It can be used for social graph analysis and refined Facebook searching.


– Multithreaded
– Aggregates information from multiple accounts


This is very useful for social engineering and market research, and could also very easily find fans among the more unsavory Wall creepers. They don’t even have to be programming-competent, so most neck-bearded shiftless layabouts and of course Anons can do it. You only have to plug in your FB email address and a MySQL password (you can download and click-to-install MySQL with simple prompts if you don’t have it).


Crawl your friends’ Facebook information, wall, and friends:
$ ./ -u email@address -i -w -f

Crawl John Smith’s Facebook information, wall, and friends:
$ ./ -u email@address -i -w -f -name 'John Smith'

Crawl Facebook information for friends of friends:
$ ./ -u email@address -depth 1 -i

Crawl Facebook information of John Smith’s friends of friends:
$ ./ -u email@address -depth 1 -i -name 'John Smith'

Extreme: Crawl friends of friends of friends of friends with 200 threads:
$ ./ -u email@address -depth 4 -t 200 -i -w -f

Users of the script can also aggregate information about relationship status by location or by school, essentially allowing stalkers to create automated queries for lists of potential victims.


Find local singles:
SELECT `user_name`, `profile` FROM `info` WHERE `current_city` = 'My Current City, State' AND `sex` = 'Female' AND `relationship` = 'Single'

Find some Harvard singles:
SELECT `user_name`, `profile` FROM `info` WHERE `college` = 'Harvard University' AND `sex` = 'Female' AND `relationship` = 'Single'
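The “aggregate” part can be pushed into SQL itself; assuming the same `info` table the script builds (a sketch of my own, not from the script’s documentation), a GROUP BY would tabulate singles by city:

SELECT `current_city`, COUNT(*) AS `n` FROM `info` WHERE `relationship` = 'Single' GROUP BY `current_city` ORDER BY `n` DESC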

And if a stalker wants to make an even handier database of GPS-located targets, there are plug-ins:

To load a plug-in, use the -plugins option:
$ ./ -u email@address -i -plugins
This plug-in adds the user’s coordinates to the database using the Google Geocoding API.

And as no stalker wants to terrorize someone age-inappropriate, they can sort by DoB as well:
This plug-in converts the user’s birthday to MySQL date (YYYY-MM-DD) format.

From IACP 2013: GBI’s Vernon Keenan and others on “Using Social Media as an Investigative Tool”

For an even more complete picture of how cops are making social media a part of their everyday operations, I’m also reposting video from another panel at the International Association of Chiefs of Police conference, which I made while covering it last week, entitled “Using Social Media as an Investigative Tool,” featuring Vernon Keenan of the Georgia Bureau of Investigation.

Keenan’s comments about the privacy climate in the aftermath of Edward Snowden’s revelations were reported by Reuters.

I also attended “Leveraging Concepts and Techniques of Social Media Monitoring and Analytics to Enhance Special Event Security and Executive Protection Capabilities,” about which I will be publishing further.


Top Cop: There’s a ‘Huge Social Media Component’ to Policing These Days – South Deering – Chicago

I was interested to see that Erica Demarest of DNAinfo Chicago was able to obtain comment from Police Superintendent McCarthy regarding my report from IACP:



Photo credit: DNAinfo/Erica Demarest

After a panel in Philadelphia last week, reports circulated that a “senior representative” from the Chicago Police Department claimed the city’s cops were working with Facebook to permanently block users who post what’s deemed criminal content.

During the panel — which was hosted by the International Association of Chiefs of Police — a panelist claimed Facebook could identify and permanently block a person’s phone or computer from using the site.

McCarthy wouldn’t address the claims, but did say Chicago cops use social media to aid in their investigations.

“Obviously, there’s a huge social media component to law enforcement these days,” the superintendent said Monday in the South Chicago Police District station, 2255 E. 103rd St.

But “I don’t want to speak about investigative prowess … because it can compromise some of the advantages that we’re finding.”

The top cop said the police department plans to expand its use of social media in coming years.

via Top Cop: There’s a ‘Huge Social Media Component’ to Policing These Days – South Deering – Chicago.

Chicago PD on Stopping Incidents Organized Through Social Media Before They Start

In the same panel where a Chicago police official shared his department’s collaboration with Facebook to block criminal posting from the social media site, that officer claimed that the Chicago police have in fact already had success “getting in front” of activity that their surveillance of the internet predicted would be a public safety threat.

Here the officer recounts occasions where the Chicago PD has had success “in various areas of the city getting in front of” events, he says, “everything from the cyber banging all the way to the flash-mob type incidents.” The officer does not reveal the specific method or application used in these operations.

He also cites other occasions where the social media surveillance “enhanced prosecution,” noting that in these cases a warrant was obtained.

Anonymous in Context: The Politics and Power behind the Mask – Gabriella Coleman in the Centre for International Governance Innovation

In Internet Governance Paper No. 3:


[Screenshot from 2013-10-11 17:21:49]

Since 2010, digital direct action, including leaks, hacking and mass protest, has become a regular feature of political life on the Internet. The source, strengths and weakness of this activity are considered in this paper through an in-depth analysis of Anonymous, the protest ensemble that has been adept at magnifying issues, boosting existing — usually oppositional — movements and converting amorphous discontent into a tangible form.  This paper, the third in the Internet Governance Paper Series, examines the intersecting elements that contribute to Anonymous’ contemporary geopolitical power: its ability to land media attention, its bold and recognizable aesthetics, its participatory openness, the misinformation that surrounds it and, in particular, its unpredictability.

via Anonymous in Context: The Politics and Power behind the Mask.