“Have I Been Pwned” breach site partners with… the FBI!


In case you’ve never heard of it, Have I Been Pwned, or HIBP as it is widely known, is an online service run out of Queensland in Australia by a data breach researcher called Troy Hunt.

The idea behind HIBP is straightforward: to give you a quick way of checking your own online accounts against data breaches that are already known to be public.

Of course, you’d hope that a company that suffered a data breach would let you know itself, so you wouldn’t need a third party website like HIBP to find out.

But there are numerous problems with relying on the combined goodwill and ability of a company that’s just suffered a breach, not least that the scale of the breach might not be obvious at first, if the company even realises at all.

And even if the company does do its best to identify the victims of the breach, it may not have up-to-date contact data for you; its warning emails might get lost in transit; or it might not be sure which users were affected.

In case you’re unsure, the word pwned is pronounced to rhyme with owned, and it’s what you might call doubleslang – a new jargon word created by deliberately misspelling the existing jargon word “owned”, used to describe a database or a computer system that has been breached by an attacker.

Ironically, perhaps, the fact that it’s hard for a company to be certain how many records were stolen during an attack can have two different outcomes:

  • The company might fail to inform everyone who was actually affected, due to underestimating the extent of the attack.
  • The company might decide to tell all its customers that they might have been affected, even those who weren’t, due to being unable to estimate the extent of the attack at all.

Indeed, Hunt’s HIBP database started back in 2013, when Adobe suffered a massive data breach that proved just how hard it can be even for a large and well-established company to figure out what happened after a cyberattack.

The art-and-design software giant admitted in October 2013 that its network had been breached, with its Chief Security Officer claiming that “certain information relating to 2.9 million Adobe customers” had been stolen.

That estimate was soon increased to 38 million, but the breach ultimately turned out to have exposed the encrypted-but-highly-crackable passwords of about 150 million accounts, making the breach 50 times bigger than first thought.

Check for yourself

Hunt therefore set out to collect and collate personal information from data breaches that had already become public and make it securely searchable via his HIBP service.

After all, this was stolen data that was as good as available to anyone with enough patience to hunt it down for themselves for evil purposes, so why not try to use it for good instead?

The first 10 breach data dumps that he processed were as follows [link gives JSON data]:

HIBP breach name  Date added     Took place    Notes
----------------  ----------     ----------    -------------------------------------------------
Vodafone          2013-11-30     2013-11-30    IDs, credit cards and SMS messages.
Adobe             2013-12-04     2013-10-04    153 million Adobe accounts.
Stratfor          2013-12-04     2011-12-24    860,000 accounts, 10,000s of credit cards, 100s of GBs of email.
Yahoo             2013-12-04     2012-07-11    500,000 usernames and passwords.
Sony              2013-12-04     2011-06-02    Numerous breaches, from PSN to Sony Pictures. 
Gawker            2013-12-04     2010-12-11    Information about 1.3M users.
PixelFederation   2013-12-06     2013-12-04    38,000 gamers' account details.
Snapchat          2014-01-02     2014-01-01    4.6 million usernames and phone numbers. 
BattlefieldHeroes 2014-01-23     2011-06-26    500,000 gamers' usernames and passwords.
WPT               2014-02-01     2014-01-04    175,000 World Poker Tour usernames and passwords.

Astonishingly, his service now includes billions of records from 538 breaches over the past eight years.

But did they get your password?

Fortunately, not every breached data record directly exposes the victim’s password, even if password data was amongst the information stolen.

Organisations that care about cybersecurity avoid storing actual passwords at all, typically saving a one-way hashed representation of your password instead.

This hashed version of the password can be quickly computed from the real password, which only ever needs to be stored temporarily in memory, but a cryptographic hash can’t be wrangled backwards to extract the original password, or indeed to learn anything about it.

Hashing stored passwords doesn’t absolve you from keeping the hashes secure, of course, because stolen hashes can be “cracked” one-at-a-time by trying passwords one after the other, based on a list of likely choices known as a dictionary.

The hashing process is a second layer of defence: the more unusual your choice of password, and the longer it is, the less likely it is that a crook will be able to find a hash to match it in a stolen database, and therefore the less useful a database of stolen hashes will be.
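To make the dictionary attack described above concrete, here is a minimal sketch in Python. It assumes the stolen hashes are plain, unsalted SHA-1 values; the function names and the tiny five-word dictionary are made up for illustration, and a real cracking tool would try millions or billions of guesses.

```python
import hashlib

def sha1_hex(password: str) -> str:
    """Return the SHA-1 hash of a password as 40 uppercase hex digits."""
    return hashlib.sha1(password.encode("utf-8")).hexdigest().upper()

def dictionary_attack(stolen_hashes, dictionary):
    """Map each stolen hash to the dictionary word that produces it, if any."""
    wanted = set(stolen_hashes)
    cracked = {}
    for guess in dictionary:
        digest = sha1_hex(guess)
        if digest in wanted:
            cracked[digest] = guess
    return cracked

# Hypothetical "stolen" hashes: two common passwords and one long random one.
stolen = [
    sha1_hex("123456"),
    sha1_hex("iloveyou"),
    sha1_hex("P6GZ54EN5OTV"),
]

# A deliberately tiny dictionary of likely choices.
dictionary = ["password", "123456", "qwerty", "iloveyou", "abc123"]

cracked = dictionary_attack(stolen, dictionary)
for digest, password in cracked.items():
    print(digest, "->", password)
```

The two popular passwords fall immediately, while the hash of the long random password stays uncracked because no guess in the dictionary matches it.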

Note that properly-stored authentication databases don’t just store a hash of your password, they also store a unique random string of characters colloquially known as a salt that is combined with your password before it’s hashed. This ensures that if two users choose the same password, their hashes are nevertheless completely different, so every possible password needs to be tried separately for every possible user. If salts are used, there’s no way to compute a general-purpose lookup table that converts hashes directly back to passwords, because you’d need a new lookup table for each user.
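The effect of a salt can be sketched in a few lines of Python. This is only an illustration, not a description of how any particular service stores its passwords: the choice of PBKDF2 with SHA-256, the 16-byte salt and the iteration count are all assumptions made for the example.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # assumed work factor, chosen only for this sketch

def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, hash), generating a fresh random salt if none is given."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest

def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Re-hash the supplied password with the stored salt and compare."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected)

# Two users pick the same password, but each gets a different salt...
salt_a, hash_a = hash_password("iloveyou2")
salt_b, hash_b = hash_password("iloveyou2")

# ...so the stored hashes differ, yet each still verifies correctly.
print(hash_a.hex())
print(hash_b.hex())
print(verify_password("iloveyou2", salt_a, hash_a))
```

Because the salt is stored alongside the hash, the server can always repeat the calculation at login time, but an attacker has to repeat every guess for every user.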

What if the passwords weren’t hashed?

But what about passwords that were acquired by crooks in their raw, unhashed form?

That’s not supposed to happen, but:

  • Sloppy internet services sometimes store plaintext passwords on purpose, even though they know they shouldn’t, although that’s fortunately less and less common these days.
  • Keylogging malware on your laptop can capture passwords as you type them in and upload the raw data directly to crooks who use the passwords themselves, sell them on to other crooks, or both.
  • Memory-scraping malware on servers can sniff out passwords while they are being checked, even if they are purged from memory immediately after use and never get written to disk.
  • Poor coding by a service provider could result in passwords being saved in plaintext form by mistake, for example to a logfile, where they might go unnoticed by the Good Guys for months or even years.

Google notoriously admitted in 2019 that it had inadvertently, albeit only occasionally, been logging unencrypted passwords for 14 years.

Facebook admitted, at about the same time, to a similar blunder affecting millions of Facebook and Instagram accounts.

In that sort of situation, you probably wouldn’t expect your password to show up in a public dump that might end up on HIBP, given that your password probably wasn’t exposed due to any specific hacking incident at any particular company.

Worse still, if your password gets sniffed out and collected in its raw form, then the crooks can simply start using it right away without doing any hash cracking first, and neither the randomness nor the length of your password would help to protect it better.

Sure, you’re much more likely to guess the password iloveyou2 than the password P6GZ54EN5OTV, but if you acquire the password in its original form then you don’t need to guess at all, so that even C5eblGtr35fDn3TW$/"eeX is no safer than 123456.

Hunt therefore also offers a public service called Pwned Passwords, where you can look up your own password in a database of just over 600 million already-recovered passwords, whether those passwords were stolen due to a large-scale corporate data breach, a carefully planned ransomware attack, a long-running malware infestation, or any other cause.

Assuming that you use a password manager, or choose long and complex passwords of your own that don’t follow any obvious pattern, it’s reasonable to assume that each of your passwords is globally unique…

…so that if you find your password on Hunt’s Pwned Passwords list (which is a whopping 10GB download) then it’s equally reasonable to assume that it’s not there by chance.

It’s there because it’s no longer a secret: someone else already stole it, stored it for later, and then either leaked it themselves, got hacked, sold it on, or dumped it publicly for nuisance value.

In short, you’d jolly well better change it right away!

Avoiding a 10GB download

If you don’t have the time or energy to download 10GB or more of Pwned Passwords data, you can look up your password without giving it away directly.

Hunt stores the 600 million passwords as SHA-1 hashes, so they all come out as 20-byte numbers, each represented as 40 hex digits. (Two hex digits of 4 bits each make up one byte of 8 bits.)

You simply hash your own password and look up the hash in two stages, as shown below, so you never directly reveal what password you were interested in.

Let’s assume your password is ucanttouchthis. (Don’t choose this one – as you will see below, numerous others have thought of it already!)

Take the first five hex digits of your SHA-1 password hash and visit a special URL that ends with those digits, which denote a 20-bit number from 0 to just over a million (2^20 = 1,048,576).

That brings up a page of approximately 600 password hashes for that 5-digit prefix, and you search through that much more manageable list for the final 35 hex digits of your hash, like this:

$ echo -n 'ucanttouchthis' | sha1sum
2b355435e608aad0476ce74001d44aada409c1ab  -

# First 5 digits are 2B355 in hex
# You're looking for the remaining digits 435E...C1AB

$ curl https://api.pwnedpasswords.com/range/2B355
0060C6035CFE881ED8490EE2CBAC18247B5:2
02475EE4CCEA7E427D129134D879B56C67C:5
02FBDEF169D2AC92C53D132CBC5D9DDAB4F:1
039864D5A4F176ACF5F43D86B348DDB95F3:1
041F4A10B74CD813905BD39D78DEA151A84:1
. . . .
42C90D0D51A2FE0F8FC026C971B9D00975E:4
435E608AAD0476CE74001D44AADA409C1AB:29   <-- FOUND! 29 people chose this one
437D7DF02F1A8E026DDDB4562408349F514:2
. . . .
FDC80988BBAD077D55ECF2845A53BEA423A:1
FE2D8DFE4473E34DD26F3EBDFD69B49564F:2
FE89BBEC3DA79E0D8AECDF831876040B18F:6
FE970AFD7CB1B928119427AAFA4283EAF20:1
FFCBEE88564A963B41549D864A5D12F9B9C:2
$
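If you would rather script the two-stage lookup than do it by hand, the same steps can be sketched in Python. This is a rough equivalent of the commands above, not an official client: the function names are ours, and the User-Agent string is an arbitrary label we made up for the request.

```python
import hashlib
import urllib.request

# The same endpoint used in the curl example above.
RANGE_URL = "https://api.pwnedpasswords.com/range/"

def split_hash(password: str):
    """Return (5-digit prefix, 35-digit suffix) of the SHA-1 hash, in uppercase hex."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

def count_in_range(suffix: str, body: str) -> int:
    """Search a reply made of SUFFIX:COUNT lines; return 0 if the suffix is absent."""
    for line in body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate.upper() == suffix:
            return int(count)
    return 0

def pwned_count(password: str) -> int:
    """Look a password up online. Only the 5-digit prefix leaves your computer."""
    prefix, suffix = split_hash(password)
    request = urllib.request.Request(
        RANGE_URL + prefix,
        headers={"User-Agent": "pwned-range-sketch"},  # arbitrary, made-up label
    )
    with urllib.request.urlopen(request) as reply:
        body = reply.read().decode("utf-8")
    return count_in_range(suffix, body)
```

Calling pwned_count('ucanttouchthis') performs the same request as the curl command shown above. The search for the 35-digit suffix happens entirely on your own computer, so the service never sees your full hash, let alone your password.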

You can even add a header to the web request to say “pad out the number of replies”. Between 800 and 1000 hashes are then included every time (some of them bogus), so that the length of the reply doesn’t identify which prefix you searched for.

$ curl -H 'Add-Padding: true' https://api.pwnedpasswords.com/range/2B355
[. . . Your hash will definitely come back if it is present in the  . . .]
[. . . database, but there will be no predictable reply length from . . .]
[. . . which an observer could infer which prefix you searched for. . . .]

If you aren’t comfortable using command line tools such as curl or wget, you can just paste the link with the 5-digit prefix into your browser and then search with Ctrl-F in the single page that comes back.

If you download the raw Pwned Password data and divide it into the same 2^20 sections as Hunt himself, you will know exactly how many hashes end up in each of the one million sections, a number that will vary randomly from section to section. You will therefore be able to predict how long the reply for each section will be, even if it’s encrypted, and therefore to infer which prefix was used simply from the length of the reply. Adding fake data so that the replies have randomised, variable lengths makes this sort of prediction impossible.
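The padding idea can be simulated in a few lines of Python. This is our own toy model, not Hunt’s code: the function names are invented, and the convention that bogus entries carry a count of zero is an assumption (it matches our reading of the public API documentation, but check that for yourself before relying on it).

```python
import secrets

def fake_suffix() -> str:
    """A random 35-hex-digit string that stands in for a bogus hash suffix."""
    return secrets.token_hex(18)[:35].upper()

def pad_reply(real_lines, low=800, high=1000):
    """Server side: add bogus zero-count lines up to a random total length."""
    target = low + secrets.randbelow(high - low + 1)
    padded = list(real_lines)
    while len(padded) < target:
        padded.append(f"{fake_suffix()}:0")
    # Shuffle so the bogus lines aren't all conveniently at the end.
    secrets.SystemRandom().shuffle(padded)
    return padded

def strip_padding(lines):
    """Client side: drop the zero-count lines before searching the reply."""
    return [line for line in lines if not line.endswith(":0")]

# Two real entries taken from the reply shown earlier.
real = [
    "0060C6035CFE881ED8490EE2CBAC18247B5:2",
    "435E608AAD0476CE74001D44AADA409C1AB:29",
]

padded = pad_reply(real)
print(len(padded), "lines sent,", len(strip_padding(padded)), "of them real")
```

Run it a few times and the number of lines sent changes on every run, even though the real content stays the same, which is exactly what stops an observer from matching reply lengths against known section sizes.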

What next?

And that brings us to the headline, right here at the end.

HIBP is going to start receiving password hashes for its database from none other than the US Federal Bureau of Investigation (FBI)!

As Hunt himself explains:

[FBI investigators] play integral roles in combatting everything from ransomware to child abuse to terrorism and in the course of their investigations, they regularly come across compromised passwords. Often, these passwords are being used by criminal enterprises to exploit the online assets of the people who created them. Wouldn’t it be great if we could do something meaningful to combat that?

And so, the FBI reached out and we began a discussion about what it might look like to provide them with an avenue to feed compromised passwords into HIBP and surface them via the Pwned Passwords feature.

In other words, if your password ends up in the hands of a crook in a way that neither you nor any of your service providers are likely to have noticed, you are unlikely ever to receive a breach notification warning about any sort of “compromise”…

…but there’s now a place that you can check securely for breached passwords anyway, even if you can never be sure exactly how the crooks acquired those passwords in the first place.


SOME FUN TO FINISH

As shown above, the Pwned Passwords database includes a count of the number of times each password hash appears in the database. Loosely speaking, any hashes that appear more than once stand for poorly chosen passwords. After all, if you choose randomly enough then the chance of anyone else picking the same password as you can be considered vanishingly small.

So we took the 20 most prevalent password hashes in the database, and set out to see how quickly we could guess them right off the top of our heads. It took us less than two minutes to guess 17 of them:

Rank  Password    SHA-1 Hash                                Appearances
----  ----------  ----------------------------------------  -----------
  1:  123456      7C4A8D09CA3762AF61E59520943DC26494F8941B   24,230,577
  2:  123456789   F7C3BC1D808E04732ADF679965CCC34CA7AE3441    8,012,567
  3:  qwerty      B1B3773A05C0ED0176787A4F1574FF0075F7521E    3,993,346
  4:  password    5BAA61E4C9B93F3F0682250B6CF8331B7EE68FD8    3,861,493
  5:  111111      3D4F2BF07DC1BE38B20CD6E46949A1071F9D0E3D    3,184,337
  6:  12345678    7C222FB2927D828AF22F592134E8932480637C0D    3,026,692
  7:  abc123      6367C48DD193D56EA7B0BAAD25B19455E529F5EE    2,897,638
  8:  1234567     20EABE5D64B0E216796E834F52D61FD0B70332FC    2,562,301
  9:  12345       8CB2237D0679CA88DB6464EAC60DA96345513964    2,493,390
 10:  password1   E38AD214943DAAD1D64C102FAEC29DE4AFE9DA3D    2,427,158
 11:  1234567890  01B307ACBA4F54F55AAFC33BB06BBBF6CA803E9A    2,293,209
 12:  123123      601F1889667EFAEBB33B8C12572835DA3F027F78    2,279,322
 13:  000000      C984AED014AEC7623A54F0591DA07A85FD4B762D    1,992,207
 14:  iloveyou    EE8D8728F435FD550F83852AABAB5234CE1DA528    1,655,692
 15:  1234        7110EDA4D09E062AA5E4A390B0A572AC0D2C0220    1,371,079
 16:  - - - - -   B80A9AED8AF17118E51D4D0C2D7872AE26E2109E    1,205,102
 17:  qwertyuiop  B0399D2029F64D445BD131FFAA399A42D2F8E7DC    1,117,379
 18:  123         40BD001563085FC35165329EA1FF5C5ECBDBBEEF    1,078,184
 19:  - - - - -   AB87D24BDC7452E55738DEB5F868E1F16DEA5ACE    1,000,081
 20:  - - - - -   AF8978B1797B72ACFFF9595A5A2A373EC3D9106D      994,142

We managed to figure out the last three (#16, #19 and #20) in a couple of minutes more by looking back at old Naked Security articles about “the worst passwords ever” and using those as inspiration.