Google Captcha Extraction

Please start by reading this post where I explain everything about this code, thanks!

This is the code for the image CAPTCHA; the code for the audio CAPTCHA is below the jump :)

<?php
//This script pulls each CAPTCHA's URL from $urlGoogle, downloads the CAPTCHA, and saves it to the folder $saveGoogle, for the range $startImage to $endImage.
	$urlGoogle = "https://www.google.com/accounts/NewAccount?service=mail&continue=http%3A%2F%2Fmail.google.com%2Fmail%2Fe-11-10ba05aeaa8e9b701e5151437f9a44d3-64aeae753cc34f1c864f7edc97a046ccdc96987b&type=2";
	$saveGoogle = "google/";
	$startImage = 0;
	$endImage = 999;
 
	//These lines force the output to be flushed as it is produced, so the user sees progress (ideally).
	ob_implicit_flush(true);
	if (ob_get_level() > 0) ob_end_flush(); //Only end the buffer when one is actually active, to avoid a notice.
	echo "Script Started.\n";
 
	//Pull in the CAPTCHA image as a string with cURL, and save to a file. The curl extension must first be enabled in php.ini.
	for ($i=$startImage;$i<=$endImage;$i++) {
		//First extract a unique URL for each CAPTCHA from the $urlGoogle.
		$ch = curl_init();
		curl_setopt($ch, CURLOPT_URL, $urlGoogle."&rand=".$i);
		curl_setopt($ch, CURLOPT_HEADER, 0);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
		//If you're having difficulties with SSL, this may need to be enabled.
		//curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
		$result = curl_exec($ch);
		//Enable this if you're having difficulties.
		//echo "Error is: ".curl_error($ch);
		curl_close($ch);
 
		//Parse out the URL, and retrieve the CAPTCHA for it.
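		$markerPos = strpos((string)$result,"gaia captchahtml desc");
		if ($markerPos === false) continue; //Marker not found (request failed or the page changed); skip this one.
		$result = substr($result,$markerPos);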
		$resultArray = explode('"',$result);
		$ch = curl_init();
		curl_setopt($ch, CURLOPT_URL, rawurldecode($resultArray[2]));
		curl_setopt($ch, CURLOPT_HEADER, 0);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
		//If you're having difficulties with SSL, this may need to be enabled.
		//curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
		$image = curl_exec($ch);
		//Enable this if you're having difficulties.
		//echo "Error is: ".curl_error($ch);
		curl_close($ch);
 
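		//Save CAPTCHA to a file with the same name as $i (failed or empty downloads are skipped).
		if(!is_dir($saveGoogle)) mkdir($saveGoogle);
		if($image !== false && strlen($image) > 0) {
			$fh = fopen($saveGoogle.$i.".jpg","wb"); //"wb" keeps the binary data intact on Windows.
			fwrite($fh,$image);
			fclose($fh);
		}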
 
		//Don't allow it to timeout.
		set_time_limit(40);
		//Output occasional progress.
		if ($i%10 == 0) {
			echo $i." CAPTCHA captured.\n";
			flush();
		}
	}
 
	echo "Script Complete.";
	//-maluc

About this captcha:

length: 5-8
range: a-z
case-sensitive: no
background: always white
overlay: none
text color: solid blue, green, or red; a single color per captcha.
size: 2000-3900 bytes
width: always 200px
height: always 70px
other: tilt is seemingly random; 5 characters is rare; red is rare; the shade of the solid color may change between captchas
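
The properties above can double as a quick sanity check on whatever the script saved. Here is a minimal sketch, assuming the observed ranges hold for every sample (they are only what I saw, not anything Google guarantees). The function name `matchesImageCaptchaProfile` is made up for this example and is not part of the script above.

```php
<?php
// Hypothetical helper: returns true when a file's size and dimensions match
// the properties listed above (2000-3900 bytes, 200px wide, 70px high).
function matchesImageCaptchaProfile(int $bytes, int $width, int $height): bool
{
    if ($bytes < 2000 || $bytes > 3900) {
        return false; // Outside the observed size range, probably not a CAPTCHA.
    }
    return $width === 200 && $height === 70;
}

// Usage sketch for a file saved by the script (path is an example):
// $info = getimagesize("google/0.jpg"); // [width, height, ...] or false
// $ok = $info !== false
//     && matchesImageCaptchaProfile(filesize("google/0.jpg"), $info[0], $info[1]);
```

Anything that fails the check is most likely an error page saved with a .jpg extension, so it is worth looking at by hand before throwing it away.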

Here is the code for the Google Audio Captcha:

<?php
//This script pulls each CAPTCHA's URL from $urlGoogleAudio, downloads the CAPTCHA, and saves it to the folder $saveGoogleAudio, for the range $startSound to $endSound.
	$urlGoogleAudio = "https://www.google.com/accounts/NewAccount?service=mail&continue=http%3A%2F%2Fmail.google.com%2Fmail%2Fe-11-10ba05aeaa8e9b701e5151437f9a44d3-64aeae753cc34f1c864f7edc97a046ccdc96987b&type=2";
	$saveGoogleAudio = "googleaudio/";
	$startSound = 0;
	$endSound = 999;
 
	//These lines force the output to be flushed as it is produced, so the user sees progress (ideally).
	ob_implicit_flush(true);
	if (ob_get_level() > 0) ob_end_flush(); //Only end the buffer when one is actually active, to avoid a notice.
	echo "Script Started.\n";
 
	//Pull in the CAPTCHA audio as a string with cURL, and save to a file. The curl extension must first be enabled in php.ini.
	for ($i=$startSound;$i<=$endSound;$i++) {
		//First extract a unique URL for each CAPTCHA from the $urlGoogleAudio.
		$ch = curl_init();
		curl_setopt($ch, CURLOPT_URL, $urlGoogleAudio."&rand=".$i);
		curl_setopt($ch, CURLOPT_HEADER, 0);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
 
		//If you're having difficulties with SSL, this may need to be enabled.
		//curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
		$result = curl_exec($ch);
		//Enable this if you're having difficulties.
		//echo "Error is: ".curl_error($ch);
		curl_close($ch);
 
		//Parse out the URL, and retrieve the CAPTCHA for it.
		$result = substr($result,strpos($result,"wavURL"));
		$resultArray = explode('"',$result);
		$ch = curl_init();
		curl_setopt($ch, CURLOPT_URL, str_replace('\75',"=",$resultArray[1]));
		curl_setopt($ch, CURLOPT_HEADER, 0);
		curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
		curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
		//If you're having difficulties with SSL, this may need to be enabled.
		//curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
		$sound = curl_exec($ch);
		//Enable this if you're having difficulties.
		//echo "Error is: ".curl_error($ch);
		curl_close($ch);
 
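		//Save CAPTCHA to a file with the same name as $i.
		if(!is_dir($saveGoogleAudio)) mkdir($saveGoogleAudio);
		//Responses of 146 bytes or fewer are treated as failed downloads.
		if($sound !== false && strlen($sound) > 146) {
			$fh = fopen($saveGoogleAudio.$i.".wav","wb"); //"wb" keeps the binary data intact on Windows.
			fwrite($fh,$sound);
			fclose($fh);
		}
		else $i--; //Step back so the same number is retried (this retries indefinitely if the server keeps failing).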
 
		//Don't allow it to timeout.
		set_time_limit(40);
		//Output occasional progress.
		if ($i%10 == 0) {
			echo $i." CAPTCHA captured.\n";
			flush();
		}
	}
 
	echo "Script Complete.";
	//-maluc
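
A note on the `str_replace('\75',"=",...)` call in the script: `\75` is a JavaScript-style octal escape for `=` (octal 75 is ASCII 61). The script only handles that one escape. If other escapes ever turned up in the URL (an assumption, I have only seen `\75`), a more general decoder could look like this minimal sketch. The name `decodeOctalEscapes` is made up for this example.

```php
<?php
// Hypothetical helper: decode JavaScript-style octal escapes (\0 to \377)
// into the characters they stand for, e.g. \75 becomes "=".
function decodeOctalEscapes(string $s): string
{
    return preg_replace_callback(
        '/\\\\([0-3]?[0-7]{1,2})/', // A literal backslash followed by 1-3 octal digits.
        function (array $m): string {
            return chr(octdec($m[1])); // Octal digits -> byte value -> character.
        },
        $s
    );
}

// Single quotes keep the backslashes literal in PHP.
echo decodeOctalEscapes('id\75abc\46type\75wav'), "\n"; // prints: id=abc&type=wav
```

For the script above the plain `str_replace` is enough; this only matters if the page starts escaping more characters.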

And info about the audio captchas as well:

length: not certain (5-10?)
range: 0-9
case-sensitive: N/A
background: equally loud gibberish and noise, really gets in the way.
size: 200044-440044 bytes
other: way too hard for a human – don’t know how blind people do it. pace varies but pitch seems to remain fairly similar.
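
As with the images, these properties can be used to check a download before keeping it. A minimal sketch, assuming the files are ordinary RIFF/WAVE files (the .wav extension suggests so, but I have not verified every sample) and that the observed size range holds. The name `looksLikeAudioCaptcha` is made up for this example.

```php
<?php
// Hypothetical helper: returns true when the data is within the observed size
// range (200044-440044 bytes) and starts with a standard WAV header.
function looksLikeAudioCaptcha(string $data): bool
{
    $bytes = strlen($data);
    if ($bytes < 200044 || $bytes > 440044) {
        return false; // Outside the observed size range.
    }
    // A WAV file begins with "RIFF", a 4-byte chunk size, then "WAVE".
    return substr($data, 0, 4) === "RIFF" && substr($data, 8, 4) === "WAVE";
}

// Usage sketch for a file saved by the script (path is an example):
// var_dump(looksLikeAudioCaptcha(file_get_contents("googleaudio/0.wav")));
```

This is a stricter test than the `strlen($sound) > 146` check in the script, which only rules out very short responses.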

2 Responses to “Google Captcha Extraction”


  • There is an easier way, guys:

    [code]
    for ($i=0; $i < 1000; $i++) file_put_contents("googleCaptcha-$i",file_get_contents("http://www.google.fr/sorry/image?id=$i"));
    [/code]

    So keep your curl blabla for things which really need it :)

  • :lol: well done busin3ss3 :) and nice comment mister FB ;)
