
Yahoo! Captcha Extraction

For those of you interested in security, you should definitely check out sla.ckers.org. I've read some real gems over there on web app security, and the site has inspired a few of my posts before. This time I found something I just had to share with you guys. Don't worry, I contacted maluc (the original author of the post) for permission to post his stuff over here.

When it comes to testing a captcha and its weaknesses, you always need a large sample to work with. If you're planning to train or write an OCR engine, it's always useful, and sometimes necessary, to have plenty of samples to play with.

I'm going to start by posting this code to extract a large sample of Yahoo! captchas:

<?php
//This script requests CAPTCHA URLs from $urlYahoo, then downloads each CAPTCHA and saves it to the folder $saveYahoo, numbered from $startImage to $endImage.
$urlYahoo = "https://edit.yahoo.com/reg_json?PartnerName=yahoo_default&RequestVersion=1&ApiName=GetCaptcha&3841320";
$saveYahoo = "yahoo/";
$startImage = 0;
$endImage = 999;

//These two lines make output flush immediately so the user sees progress (ideally).
ob_implicit_flush(true);
if (ob_get_level() > 0) ob_end_flush();
echo "Script Started.\n";

//Create the output folder once, before the loop.
if (!is_dir($saveYahoo)) mkdir($saveYahoo);

//Pull in each CAPTCHA image as a string with cURL and save it to a file. The curl extension must first be enabled in php.ini.
for ($i = $startImage; $i <= $endImage; $i++) {
    //First, request a unique CAPTCHA URL from $urlYahoo.
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $urlYahoo."&rand=".$i);
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    //Skip certificate verification (insecure, but avoids SSL setup problems).
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    //If you're still having difficulties with SSL, uncomment this line too.
    //curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    $result = curl_exec($ch);
    //Uncomment this if you're having difficulties.
    //echo "Error is: ".curl_error($ch);
    curl_close($ch);
    if ($result === false) continue;

    //Parse out the image URL (the 8th quote-delimited token of the reply) and retrieve the CAPTCHA.
    $resultArray = explode('"', $result);
    if (!isset($resultArray[7])) continue;
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, stripslashes($resultArray[7]));
    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, 0);
    //If you're still having difficulties with SSL, uncomment this line too.
    //curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 0);
    $image = curl_exec($ch);
    //Uncomment this if you're having difficulties.
    //echo "Error is: ".curl_error($ch);
    curl_close($ch);
    if ($image === false) continue;

    //Save the CAPTCHA to a file named after $i, in binary mode.
    $fh = fopen($saveYahoo.$i.".jpg", "wb");
    fwrite($fh, $image);
    fclose($fh);

    //Reset the execution time limit on every pass so the script doesn't time out.
    set_time_limit(40);
    //Output occasional progress.
    if ($i % 10 == 0) {
        echo $i." CAPTCHA captured.\n";
        flush();
    }
}

echo "Script Complete.";
//-maluc
?>

This script downloads a sample of 1000 captchas (0.jpg through 999.jpg) and saves them to the yahoo/ subfolder. If you want more or fewer captchas, just edit the $endImage variable (and $startImage if you want to change where the numbering begins).
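The script grabs the image URL as the eighth quote-delimited token of the JSON reply ($resultArray[7]), which silently breaks if the fields ever move around. A slightly more defensive sketch, assuming only that the reply contains the image URL as a JSON string with escaped slashes (the response layout is an assumption, and extractCaptchaUrl is a hypothetical helper name), scans every quoted string and returns the first one that looks like an http(s) URL:

```php
<?php
// Hypothetical helper: find the CAPTCHA image URL in the reg_json reply
// without relying on its position. Assumes the URL appears as a quoted
// JSON string whose slashes are escaped as "\/".
function extractCaptchaUrl(string $response): ?string
{
    // Collect the contents of every double-quoted string in the reply.
    if (!preg_match_all('/"([^"]*)"/', $response, $m)) {
        return null;
    }
    foreach ($m[1] as $token) {
        // Remove the JSON backslash escapes before checking the scheme.
        $candidate = stripslashes($token);
        if (preg_match('#^https?://#i', $candidate)) {
            return $candidate;
        }
    }
    return null; // no URL-looking string found
}
```

Inside the loop you would swap the explode/stripslashes pair for `$url = extractCaptchaUrl($result); if ($url === null) continue;` and pass $url to CURLOPT_URL.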

He even made some initial notes for those interested in this particular captcha:

length: 4-6
range: a-z,A-Z,2-8
case-sensitive: no
background: always white
text color: always black
overlay: 1-3 random line paths, always black
size: between 1800 and 3200 bytes
width: always 290px
height: always 80px
other: tilting and bending randomly, 4 chars is rare, each letter either 2D sans-serif or 3D serif, some letters not used or used in only one case
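If you're building a training set, it can help to turn these notes into checks. Below is a sketch of hypothetical helpers (the names are mine, not maluc's) that encode the annotations above: a label filter for 4-6 characters drawn from a-z, A-Z and 2-8, a case-insensitive answer comparison, and a sanity check on file size and dimensions using PHP's built-in getimagesize() and filesize(). The thresholds come straight from the list; whether every sample you download respects them is an assumption worth verifying.

```php
<?php
// Hypothetical helpers encoding maluc's notes on the Yahoo! CAPTCHA:
// 4-6 chars from a-z, A-Z, 2-8; not case-sensitive; 290x80 px; 1800-3200 bytes.

function isPlausibleAnswer(string $answer): bool
{
    // Length 4-6, only letters and the digits 2 through 8.
    return preg_match('/^[a-zA-Z2-8]{4,6}$/', $answer) === 1;
}

function answersMatch(string $expected, string $guess): bool
{
    // The CAPTCHA is not case-sensitive, so compare case-insensitively.
    return strcasecmp($expected, $guess) === 0;
}

function matchesProfile(int $bytes, int $width, int $height): bool
{
    // Size and dimension ranges taken from the annotation list.
    return $bytes >= 1800 && $bytes <= 3200 && $width === 290 && $height === 80;
}

function sampleLooksValid(string $path): bool
{
    // getimagesize() returns false for missing files or non-images;
    // @ suppresses the warning it emits in that case.
    $info = @getimagesize($path);
    if ($info === false) {
        return false;
    }
    return matchesProfile(filesize($path), $info[0], $info[1]);
}
```

Running sampleLooksValid() over every file in the yahoo/ folder is a cheap way to catch truncated downloads or error pages that were saved as .jpg.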

You can read the original post at sla.ckers.org. maluc also did the same for the Google and Hotmail captchas, so be sure to check those out as well.

Props to maluc one more time ;)