Building Your Own GPU Enabled Private Cloud
Last Updated: 2013-09-05 17:01:06 UTC
by Rob VandenBrink (Version: 1)
With one "extracurricular" project winding up, I figured it was time to start the next one, and playing with the new crop of GPUs for hash and password cracking seems like a fun way to go.
At first glance, using specialized hardware like a GPU would seem to mean working on a physical machine, with a VM not in the cards. Not so: it's actually pretty easy to make it fly in a VM, with a bit of planning. For me, it also means that I don't need to find a spot for a new server.
First of all, you'll need a short list of "must haves":
- A hypervisor that supports VT-d - I'm using VMware ESXi (this is NOT something you want to try in Workstation)
- A motherboard and CPU that support VT-d. I'm using a Tyan board and a XEON E3 processor.
- Be sure that your system board will support PCIe x16 cards. You don't need x16 throughput, even an x2 slot will do nicely, but it needs to be able to accept an x16 card (my board has an x8 slot)
- If you plan to use more than one GPU card, be sure the system board has enough slots, and that they are far enough apart (GPUs generally take 2 slots). Also, with more cards, a tower configuration will tend to overheat the top card(s) - be sure you have lots of fans in the case, and try to end up with the cards mounted vertically after all is said and done.
- Be sure you've got a power supply with lots of connectors and power - the card I ended up buying needed both an 8 pin and a 6 pin PCIe power connector. I've got a 650Watt modular power supply to play with in this machine, so all is well.
- Finally, the right GPU.
For folks like me that are on a budget, there are two main choices in GPUs - NVIDIA and AMD.
While cards from both vendors perform great for graphics, AMD has an edge in crypto work - it seems to have better integer computing support, so tools like Hashcat or John the Ripper tend to run quicker.
In a virtual environment, the AMD cards seem to work better with VT-d (called Device Passthrough in the ESXi interface). If you want to use NVIDIA GPUs, you'll actually install drivers in ESXi, and you'll be confined to the most expensive NVIDIA cards (Quadro 6000, 5000, 4000, or the Tesla or Grid cards). This is actually pretty cool, as you can spread the GPUs across multiple VMs for Virtual Desktop applications like CAD and the like. But splitting the power of a GPU card across multiple VMs defeats the whole point of building a VM for cracking.
For my lab, I chose an AMD RADEON 7970 - it's got great processing power and it was on sale that week. The 7900s seem to be right at the knee of the curve, right where more processing power starts to cost you disproportionately more money.
So, once all the prerequisites are in place, we're ready to go.
1/ First, install your card.
2/ Next, over to ESXi, we'll need to enable Device Passthrough (VT-d) for our new device. You'll find this in Server Settings / Advanced / Edit. Select the new card (which also selects the PCIe slot that it's in), and save. You'll need to reboot the server after this is done.
3/ Next, over to our VM. We'll go to the "Edit Settings / Add Hardware" screen, and add this new PCI device. Once this is done, vMotion and HA will no longer be possible for this VM, since it's tied to a specific PCIe slot in the server. Even a cold migrate (migration with the VM powered off) will involve some jumping through hoops - removing the card, migrating, then re-adding the card after the migrate (you'll of course need identical hardware on the destination server).
4/ After installing the correct AMD driver in the VM, we see our card! I left the card at stock values for everything, nothing was overclocked or outside of the default settings.
5/ Next we'll need to install the OpenCL SDK in the VM (downloaded from AMD).
At this point, you'll be able to use the processing power of the GPU in any app written for it - I'm using Hashcat and John the Ripper, they both work great!
Running the hashcat benchmark (oclHashcat-lite64.exe -b) sees the card as a "Tahiti" (the codename for the 7900 series) and gives us some really impressive numbers - for instance 8765.0M/s for MD5 (yes, that's in MILLION hashes per second). While real throughput on the "non-lite" version will be slightly slower, these numbers are all pretty close to the truth.
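To put that MD5 number in perspective, here's a rough back-of-the-envelope sketch in Python (it is not part of the benchmark run). It assumes a single unsalted MD5 hash, passwords drawn from the 95 printable ASCII characters, and a worst-case search of every candidate of a given length. The function names are purely illustrative.

```python
# Rough exhaustive-search time at the benchmarked MD5 rate.
MD5_RATE = 8765.0e6   # hashes per second, the MD5 benchmark figure quoted above
ALPHABET = 95         # assumption: printable ASCII characters only


def seconds_to_exhaust(length, rate=MD5_RATE, alphabet=ALPHABET):
    """Worst-case time to try every password of exactly `length` characters."""
    return alphabet ** length / rate


def humanize(seconds):
    """Express a duration in the largest convenient unit."""
    for unit, size in (("years", 365 * 86400), ("days", 86400),
                       ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:,.1f} {unit}"
    return f"{seconds:,.1f} seconds"


if __name__ == "__main__":
    for length in range(6, 13):
        print(f"{length:2d} chars: {humanize(seconds_to_exhaust(length))}")
```

Each extra character multiplies the search time by the alphabet size, which is why password length matters so much more than a modest change in hardware speed. Real attacks use dictionaries and rules rather than a blind search, so treat these figures as an upper bound for random passwords only.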
Just for fun, I installed the identical setup on a similar but PHYSICAL machine (3.5 GHz i7 quad core, as opposed to the 3.3 GHz XEON quad in my ESXi server). You can see from the table below that the throughput on hash calculations is very close, with the i7 setup a bit slower. It's in situations like this where you'll see the features in "server class" processors make a difference - things like larger CPU cache for instance. My ESXi server was running my kid's Minecraft server (with him and all his friends on it), plus we were streaming video off of another VM running DLNA services for our TV, and hashcat in the VM is still consistently faster than the physical host running a workstation CPU of similar specs.
The numbers for both the virtual and physical servers are shown below. From this, we can draw a few critical conclusions:
- Hashing and encryption algorithms have worked well in the past: as CPU power has increased, we've been able to stay ahead of the curve with better encryption (DES followed by 3DES then AES for instance). While you could always brute-force short strings like passwords, the additional computation involved in each successive algorithm meant that at any point in time, cracking the current algorithm on current hardware would take too long to be practical (unless you had nation-state budgets that is) - essentially this is Moore's Law in action. The power these new GPU cards bring to the table gives the hardware side of the equation a "leapfrog effect" - we're increasing the decryption capability by several orders of magnitude - by lots of zeros! And I'm not seeing a fundamental shift on the other side, no new "1,000 or 1,000,000 times harder" algorithm that makes it "difficult enough" to make brute forcing passwords impractical. Our best defense today is longer passwords - this is an area where size does matter, and bigger is better. But what's really needed is an alternative to passwords, or a whole other method of storing them.
- MD5 and SHA1 should no longer be used to store passwords, EVER - with this kind of throughput available to attackers with even minimal budgets, it's just too easy to crack these still commonly used algorithms. You should be able to draw your own conclusions as to what's a better way to go (look towards the bottom of the list, or look at what's not on the list yet).
- PBKDF2 (RFC 2898) is not currently on Hashcat's list of supported algorithms. This algorithm isn't widely deployed yet, but its goal is to "eat" a much higher number of compute cycles, making it ideal for password storage (especially if SHA256 is used instead of the default SHA1). This may be our best bet for password storage, short term (I don't have benchmarks for it yet). We are, however, seeing GPU support for this algorithm in John the Ripper.
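As a minimal sketch of what PBKDF2-based password storage can look like, the snippet below uses `hashlib.pbkdf2_hmac` from Python's standard library with SHA256. The iteration count, salt length and function names are illustrative assumptions, not benchmarked values or recommendations from the numbers in this article.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000   # assumption: illustrative work factor, tune for your hardware
SALT_BYTES = 16        # assumption: length of the random per-password salt


def hash_password(password, salt=None, iterations=ITERATIONS):
    """Derive a PBKDF2-HMAC-SHA256 key; returns (salt, iterations, derived_key)."""
    if salt is None:
        salt = os.urandom(SALT_BYTES)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, key


def verify_password(password, salt, iterations, expected_key):
    """Recompute the key with the stored salt and compare in constant time."""
    _, _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected_key)


if __name__ == "__main__":
    salt, iterations, key = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, iterations, key))
    print(verify_password("wrong guess", salt, iterations, key))
```

The point of the iteration count is that every guess costs an attacker the same multiple of work it costs the server, so a rate measured in billions of plain hashes per second drops by roughly that factor. Storing the salt and iteration count alongside the derived key lets the work factor be raised later without breaking existing records.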
|Hash Type||Benchmark Value (On VM)||Benchmark Value (On Physical)|
|NetNTLMv1-VANILLA / NetNTLMv1+ESS||7624.8M/s||7034.0M/s|
|vBulletin < v3.8.5||2492.4M/s||2427.9M/s|
|SSHA-1(Base64), nsldaps, Netscape LDAP SSHA||2361.5M/s||2314.3M/s|
|SHA-1(Base64), nsldap, Netscape LDAP SHA||2276.0M/s||2295.8M/s|
|vBulletin > v3.8.5||1697.7M/s||1628.0M/s|
|descrypt, DES(Unix), Traditional DES||47052.0k/s||44934.1k/s|
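For anyone who wants to compare the two columns directly, here's a small Python sketch that normalizes the rate strings from the table (note the mix of M/s and k/s) and reports how far ahead the VM was on each hash. The helper names are hypothetical, and the rows are simply copied from the table above.

```python
UNITS = {"k": 1e3, "M": 1e6, "G": 1e9}

# (hash type, VM rate, physical rate) copied from the benchmark table
BENCHMARKS = [
    ("NetNTLMv1-VANILLA / NetNTLMv1+ESS", "7624.8M/s", "7034.0M/s"),
    ("vBulletin < v3.8.5", "2492.4M/s", "2427.9M/s"),
    ("SSHA-1(Base64), nsldaps, Netscape LDAP SSHA", "2361.5M/s", "2314.3M/s"),
    ("SHA-1(Base64), nsldap, Netscape LDAP SHA", "2276.0M/s", "2295.8M/s"),
    ("vBulletin > v3.8.5", "1697.7M/s", "1628.0M/s"),
    ("descrypt, DES(Unix), Traditional DES", "47052.0k/s", "44934.1k/s"),
]


def parse_rate(text):
    """Convert a string such as '7624.8M/s' or '47052.0k/s' to hashes per second."""
    if not text.endswith("/s"):
        raise ValueError(f"unexpected rate format: {text!r}")
    body = text[:-2]                      # drop the trailing '/s'
    return float(body[:-1]) * UNITS[body[-1]]


def percent_faster(vm, physical):
    """Percentage by which the VM rate exceeds the physical rate."""
    return (parse_rate(vm) / parse_rate(physical) - 1) * 100


if __name__ == "__main__":
    for name, vm, physical in BENCHMARKS:
        print(f"{percent_faster(vm, physical):+6.1f}%  {name}")
```

A positive value means the VM was ahead on that hash type; a negative one means the physical host was.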
Unfortunately there's no way to use this algorithm where we need it most, afaik: Windows AD or openldap.
Sep 7th 2013
1. I&A needs to become more modular; again, *nix in general sets a good example with the PAM structure. Windows AD could vastly benefit here. OpenLDAP theoretically could adapt faster.
2. If GPUs are being leveraged by cracking practitioners to leap ahead of secured authentication storage, then why are GPUs not used to enhance secure authentication storage? Time to start some R&D. How large of a key/salt can I use without impacting operations? Storage is cheap, GPU cycles are cheap, why not?
Sep 7th 2013