Tag Archive for 'google'

Content is NOT important

Content is NOT important. Content is NOT important. Content is NOT important. Content is NOT important. Content is NOT important. Content is NOT important. Content is NOT important. Content is NOT important. Content is NOT important. Content is NOT important. Content is NOT important. Content is NOT important. Content is NOT important. Content is NOT important. (…)

I could make an entire post like that with nothing but rubbish content and it wouldn’t matter. Why? Because content doesn’t matter, backlinks do.

Ok, that might sound a bit exaggerated… And I must admit, whenever I work on a site I spend more time working on the content than on anything else (even in my blackhat ventures like [YACG], it’s pretty evident that I make a big deal out of content).

In fact, you know what? Scratch everything I just said. Content IS important, but it all depends on what you are trying to achieve. I make a big deal out of it because of the usability of the website: good content = returning visitors, and we all know that.

But in the particular case of rankings, backlinks are much more important than content. Let’s take as an example the keyword “click here”:

Google results for "click here"

Bing results for "click here"

Interesting. Now let’s take a look at “worst band in the world”:

Google results for "worst band in the world"

*I firebug’ed out the first results for the sake of the length of this post

It’s evident that there’s not a single mention of “click here” or “worst band in the world” in those sites, but still they are ranking #1 for those keywords. Why? Simple, there’s a bunch of backlinks with that anchor text pointing to those sites.
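To make the anchor text idea concrete, here is a minimal sketch that tallies how many backlinks use a given keyword as their anchor, grouped by the page they point to. The `$backlinks` rows and the `countAnchorMatches()` helper are made up for illustration. A real list would come from a backlink report, and this says nothing about how Google actually weighs those links.

```php
<?php
// Hypothetical sample data: each backlink is [target URL, anchor text].
// These rows are invented for the example, not real backlink counts.
$backlinks = [
    ['http://get.adobe.com/reader/', 'Click here'],
    ['http://get.adobe.com/reader/', 'click here'],
    ['http://get.adobe.com/reader/', 'Adobe Reader'],
    ['http://example.com/', 'click here '],
];

/**
 * Count, per target URL, the backlinks whose anchor text equals $keyword.
 * The comparison ignores case and surrounding whitespace.
 */
function countAnchorMatches(array $backlinks, string $keyword): array
{
    $needle = strtolower(trim($keyword));
    $counts = [];
    foreach ($backlinks as [$target, $anchor]) {
        if (strtolower(trim($anchor)) === $needle) {
            $counts[$target] = ($counts[$target] ?? 0) + 1;
        }
    }
    arsort($counts); // target with the most matching anchors first
    return $counts;
}

print_r(countAnchorMatches($backlinks, 'click here'));
```

Note that the page itself never has to contain the keyword: the tally only looks at what other sites wrote in their links.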

(Bullshit alert!)

I’ve done some extensive testing over the past couple of years using neural networks, statistical analysis, midgets and large quantities of vodka… And I think I figured out how Google’s algorithm works. I’m planning on releasing a paper later on, but here’s the basic formula:

Google's algorithm

Where K is Google’s magic constant (100), 0.1 is the position you want (0.1 for #1, 0.2 for #2, 1.1 for #11 etc…), x1 is the number of words in the page you want to rank (without counting the keyword you want to rank for), x2 is the number of times the keyword appears on the page and f(x1,x2) is an approximation to the number of backlinks with the keyword as anchor that you need.

You don’t believe me? Let’s try it out with http://get.adobe.com/reader/ for “click here” on #1 position. For this experiment I used this word count tool and Yahoo! Site Explorer.

Google's Algorithm (2)

Google's Algorithm (3)

According to Yahoo! Site Explorer, http://get.adobe.com/reader/ has 652,799 backlinks.

Yes, this is the moment when you go all HOLY-MOTHER-OF-GOD at your computer screen and nearby peers.

(Bullshit alert is over!)

Bottom line: Looking for rankings? Focus on backlinks.

Google Captcha Extraction

Please start by reading this post where I explain everything about this code, thanks!

This is the code for the Google Image Captcha, and the one for the audio captcha is below the jump :)

//This script pulls CAPTCHAs URL from $urlGoogle, then gets the CAPTCHA and saves them to folder $saveGoogle from the range $startImage to $endImage.
	$urlGoogle = "https://www.google.com/accounts/NewAccount?service=mail&continue=http%3A%2F%2Fmail.google.com%2Fmail%2Fe-11-10ba05aeaa8e9b701e5151437f9a44d3-64aeae753cc34f1c864f7edc97a046ccdc96987b&type=2";
	$saveGoogle = "google/";
	$startImage = 0;
	$endImage = 999;
 
	//These two lines force the output to be constantly flushed and updated for the user. (ideally)
	ob_implicit_flush(true);
	ob_end_flush();
	echo "Script Started.\n";
 
	//Pull in the CAPTCHA image as a string with cURL, and save to a file. The curl extension must first be enabled in php.ini.
	for ($i=$startImage;$i<=$endImage;$i++) {
		//First extract a unique URL for each CAPTCHA from the $urlGoogle.
		$ch = curl_init();
		curl_setopt($ch, CURLOPT_URL, $urlGoogle."&rand=".$i);
		curl_setopt($ch, CURLOPT_HEADER, 0);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
		//If you're having difficulties with SSL, this may need to be enabled.
		//curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
		$result = curl_exec($ch);
		//Enable this if you're having difficulties.
		//echo "Error is: ".curl_error($ch);
		curl_close($ch);
 
		//Parse out the URL, and retrieve the CAPTCHA for it.
		$result = substr($result,strpos($result,"gaia captchahtml desc"));
		$resultArray = explode('"',$result);
		$ch = curl_init();
		curl_setopt($ch, CURLOPT_URL, rawurldecode($resultArray[2]));
		curl_setopt($ch, CURLOPT_HEADER, 0);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
		//If you're having difficulties with SSL, this may need to be enabled.
		//curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
		$image = curl_exec($ch);
		//Enable this if you're having difficulties.
		//echo "Error is: ".curl_error($ch);
		curl_close($ch);
 
		//Save CAPTCHA to a file with the same name as $i.
		if(!is_dir($saveGoogle)) mkdir($saveGoogle);
		$fh = fopen($saveGoogle.$i.".jpg","w");
		fwrite($fh,$image);
		fclose($fh);
 
		//Don't allow it to timeout.
		set_time_limit(40);
		//Output occasional progress.
		if ($i%10 == 0) {
			echo $i." CAPTCHA captured.\n";
			flush();
		}
	}
 
	echo "Script Complete.";
	//-maluc

About this captcha:

length: 5-8
range: a-z
case-sensitive: no
background: always white
overlay: none
text color: solid blue,green,or red. single color.
size: 2000-3900 bytes
width: always 200px
height: always 70px
other: tilting seemingly random, 5 chars is rare, red is rare, shade of solid colors may change between captchas

Here is the code for the Google Audio Captcha:

//This script pulls CAPTCHAs URL from $urlGoogleAudio, then gets the CAPTCHA and saves them to folder $saveGoogleAudio from the range $startSound to $endSound.
	$urlGoogleAudio = "https://www.google.com/accounts/NewAccount?service=mail&continue=http%3A%2F%2Fmail.google.com%2Fmail%2Fe-11-10ba05aeaa8e9b701e5151437f9a44d3-64aeae753cc34f1c864f7edc97a046ccdc96987b&type=2";
	$saveGoogleAudio = "googleaudio/";
	$startSound = 0;
	$endSound = 999;
 
	//These two lines force the output to be constantly flushed and updated for the user. (ideally)
	ob_implicit_flush(true);
	ob_end_flush();
	echo "Script Started.\n";
 
	//Pull in the CAPTCHA sound as a string with cURL, and save to a file. The curl extension must first be enabled in php.ini.
	for ($i=$startSound;$i<=$endSound;$i++) {
		//First extract a unique URL for each CAPTCHA from the $urlGoogleAudio.
		$ch = curl_init();
		curl_setopt($ch, CURLOPT_URL, $urlGoogleAudio."&rand=".$i);
		curl_setopt($ch, CURLOPT_HEADER, 0);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
 
		//If you're having difficulties with SSL, this may need to be enabled.
		//curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
		$result = curl_exec($ch);
		//Enable this if you're having difficulties.
		//echo "Error is: ".curl_error($ch);
		curl_close($ch);
 
		//Parse out the URL, and retrieve the CAPTCHA for it.
		$result = substr($result,strpos($result,"wavURL"));
		$resultArray = explode('"',$result);
		$ch = curl_init();
		curl_setopt($ch, CURLOPT_URL, str_replace('\75',"=",$resultArray[1]));
		curl_setopt($ch, CURLOPT_HEADER, 0);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
		//If you're having difficulties with SSL, this may need to be enabled.
		//curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
		$sound = curl_exec($ch);
		//Enable this if you're having difficulties.
		//echo "Error is: ".curl_error($ch);
		curl_close($ch);
 
		//Save CAPTCHA to a file with the same name as $i.
		if(!is_dir($saveGoogleAudio)) mkdir($saveGoogleAudio);
		if(strlen($sound) > 146) {
			$fh = fopen($saveGoogleAudio.$i.".wav","w");
			fwrite($fh,$sound);
			fclose($fh);
		}
		else $i--;
 
		//Don't allow it to timeout.
		set_time_limit(40);
		//Output occasional progress.
		if ($i%10 == 0) {
			echo $i." CAPTCHA captured.\n";
			flush();
		}
	}
 
	echo "Script Complete.";
	//-maluc

And info about the audio captchas as well:

length: not certain (5-10?)
range: 0-9
case-sensitive: N/A
background: equally loud gibberish and noise, really gets in the way.
size: 200044-440044 bytes
other: way too hard for a human – don’t know how blind people do it. pace varies but pitch seems to remain fairly similar.

Google Human Reviewer Guidelines (Leaked)

When trying to decide if a page is Spam, it is helpful to ask yourself this question: If I remove the scraped (copied) content, the ads, and the links to other pages, is there anything of value left? If the answer is no, the page is probably Spam.
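That question can be turned into a rough sketch. Everything here is an assumption for illustration: the block types (`scraped`, `ad`, `link`, `original`), the `looksLikeSpam()` helper and the 50-word default threshold are hypothetical, and the guidelines themselves rely on a human's judgement rather than a word count.

```php
<?php
/**
 * Rough version of the reviewer question: drop scraped content, ads and
 * links to other pages, then check whether anything of value is left.
 * Block types and the word threshold are hypothetical.
 */
function looksLikeSpam(array $blocks, int $minWords = 50): bool
{
    $ignored = ['scraped', 'ad', 'link'];
    $remainingWords = 0;
    foreach ($blocks as $block) {
        // Only count words in blocks that would survive the removal.
        if (!in_array($block['type'], $ignored, true)) {
            $remainingWords += str_word_count($block['text']);
        }
    }
    return $remainingWords < $minWords;
}

// Made-up page: an ad, a copied paragraph, a link and one original word.
$page = [
    ['type' => 'ad',       'text' => 'Buy cheap pills now'],
    ['type' => 'scraped',  'text' => 'Some paragraph copied from another site'],
    ['type' => 'link',     'text' => 'More great offers'],
    ['type' => 'original', 'text' => 'Hello'],
];

var_dump(looksLikeSpam($page, 5));
```

Labelling the blocks is the hard part, and that is exactly what the human reviewer is there for.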

Download it here: quality-rater-guidelines-2007.pdf!

And before you start bitching… This file is everywhere now. At [YACG]’s forum, BlackHatWorld, Syndk8, IranJava…

Google Bowling + DDoS = Pwnage!

Guess you all know about the ongoing controversy about someone abusing Eli’s tool QUIT and a DDoS against BlueHatSeo.com. From what I’ve read, I think somebody is pissed at Eli because his blog is disclosing too much information (?). Who knows what the real motive is… But this made me realize that no matter how good things seem to be going for your sites and rankings… You can NEVER get too comfy.

I know a DDoS may not seem that bad: your site goes offline, you burn all your bandwidth and after a couple of days you’re back online… right?

Well, maybe… But what if the perpetrator of the DDoS decides to take it one step further… Like a DDoS + Google Bowling attack:

Fatality

That’s right, it would be a fatality. A long-lasting DDoS attack and some Google Bowling that results in de-indexing could be your site’s worst nightmare… And if it’s executed correctly, there is no going back for your site… Ever.
Now if you want to make it suck even more, add some hacking and some imagination to it.