Friday, September 18, 2009
RAID vs SSD
ZFS (and other advanced filesystems) will now do partial reconstruction of a failed drive (that is, they don't have to bit-copy the entire drive, only the parts that are actually in use), which helps. But there are still problems. ZFS's pathological case results in rebuild times of 2-3 WEEKS for a 1TB drive in a RAID-Z (similar to RAID-5). It's all due to the horribly small throughput, low maximum IOPS, and high latency of the hard drive.
SSDs, on the other hand, are nowhere near the problem. They've got considerably more throughput than a hard drive and, more importantly, THOUSANDS of times better IOPS. Frankly, more than any other reason, I expect the enormous IOPS advantage of the SSD to sound the death knell of HDs in the next decade. By 2020, expect HDs to be gone from everything, even in places where HDs still have better GB/$. The rebuild rates and maintenance of HDs simply can't compete with flash.
Note: IOPS = I/O operations per second, or the number of read/write operations (regardless of size) that a disk can service. HDs top out around 350, consumer SSDs do just under 10,000, and high-end SSDs can do up to 100,000.
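To see where a weeks-long rebuild can come from, here's a back-of-envelope sketch of a resilver that is bound by random I/O rather than sequential throughput. The 8KB average record size and the 100 IOPS "seek-bound" figure are assumptions for illustration; the 350 and 10,000 IOPS figures come from the note above.

```python
# Back-of-envelope estimate: rebuild time when the resilver is bound by
# random IOPS rather than sequential throughput. All numbers are
# illustrative assumptions, not benchmarks.

def rebuild_hours(used_bytes, avg_record_bytes, effective_iops):
    """Hours to touch every used record at a given random-I/O rate."""
    records = used_bytes / avg_record_bytes
    return records / effective_iops / 3600

USED = 1e12          # ~1TB of used data on the failed drive
RECORD = 8 * 1024    # assumed average record size of 8KB

for name, iops in [("HD (~350 IOPS, best case)", 350),
                   ("HD (~100 IOPS, seek-bound)", 100),
                   ("consumer SSD (~10,000 IOPS)", 10_000)]:
    print(f"{name}: {rebuild_hours(USED, RECORD, iops):,.1f} hours")

# The HD best case comes out around 4 days; a seek-bound HD lands in the
# ~2 week range described above, while the SSD finishes in a few hours.
```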
Wednesday, July 02, 2008
Welcome to the Aperi Blog
The Aperi project announces the first release of the Storage Network Simulator. The simulator is a tool that enables you to simulate a storage area network (SAN) through software. You can create a SAN configuration, add devices to the SAN, create arbitrary connections between devices, and remove connections between devices. Using this tool to create a simulated SAN environment can help when you:
- Have limited or no access to hardware and software when developing and testing SRM applications
- Need to perform "what-if" analysis before you plan to extend or reconfigure your SAN
- Have "off-line" access to SAN devices without impacting the performance of the real network (such as the SNIA lab or any SAN in the world)
The SAN Simulator provides an increase in productivity and efficiency for Aperi development and testing by removing the dependence on device availability.
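For readers who haven't worked with a software SAN model before, here's a minimal sketch of the idea: a configuration is just devices plus connections you can add and remove, and "what-if" analysis means running queries against that model instead of real hardware. The class and method names below are hypothetical, not the Aperi simulator's actual API.

```python
# Minimal sketch of a software SAN model: devices plus arbitrary
# connections you can add and remove. The class and method names are
# hypothetical; this is not the Aperi simulator's API.

class SimulatedSAN:
    def __init__(self):
        self.devices = {}          # name -> device type
        self.links = set()         # unordered pairs of device names

    def add_device(self, name, kind):
        """Add a host, switch, or storage subsystem to the configuration."""
        self.devices[name] = kind

    def connect(self, a, b):
        self.links.add(frozenset((a, b)))

    def disconnect(self, a, b):
        self.links.discard(frozenset((a, b)))

    def neighbors(self, name):
        return [d for link in self.links if name in link
                for d in link if d != name]

# Build a tiny simulated fabric and "fail" a link without touching
# any real hardware.
san = SimulatedSAN()
san.add_device("host1", "host")
san.add_device("switch1", "fc-switch")
san.add_device("array1", "subsystem")
san.connect("host1", "switch1")
san.connect("switch1", "array1")
san.disconnect("switch1", "array1")   # what-if: lose a path
print(san.neighbors("switch1"))       # -> ['host1']
```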
Tuesday, July 01, 2008
Storage Management - Green IT not green
Note that the term “under storage” is substituted for “under management.” Truth be told, data management in distributed computing environments is extraordinarily lax. The best analogy for distributed storage is a huge and growing junk drawer. This point is underscored by data collated by Sun Microsystems after performing nearly 10,000 storage assessments at client facilities. Per Sun’s statistics, for every hard disk deployed by a company, roughly 30 percent of its capacity contains useful data accessed regularly as part of day-to-day operations. Another 40 percent must be retained for reasons of historical value, regulatory or legal compliance, or because it is intellectual property. Rarely referenced, this data belongs in an archive, preferably tape or optical, because those media consume far fewer kilowatt-hours than disk-based systems do. Simple arithmetic leaves roughly 30 percent of capacity for the junk in the drawer.
This point is never brought up in the articles you read in the trades. Instead, vendors posit a number of hardware and software value-add solutions as silver bullets for Green IT. Virtualization, de-duplication, compression, re-driving arrays with larger disk drives, leveraging MAID (massive arrays of idle disks, a portion of which spin down when not in use), and thin provisioning are just a few of the green panaceas being discussed. Most involve plugging additional hardware into the wall, which is hardly an intelligent way to reduce power consumption.
All of these techniques deliver tactical value at best: unmanaged data will continue to grow over time and eliminate whatever short-term power reductions the new technologies deliver. They are simply re-arranging deck chairs on the Titanic. Getting to green in IT ultimately and strategically comes down to managing data better. It costs a company virtually nothing to sort out its data junk drawer, to apply processes for classifying data so that it can be migrated over time into an archive, and to deploy storage resource management tools to spot wasted space, ownerless files, and junk data in its repositories.
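As a simplistic example of the kind of classification pass described above, the sketch below walks a directory tree and totals up capacity that hasn't been read recently and is therefore an archive candidate. The /data path and the 180-day threshold are arbitrary assumptions, not recommendations.

```python
# Minimal sketch of a data-classification pass: flag capacity that has
# not been read in a while as an archive candidate. Path and threshold
# are arbitrary assumptions.
import os
import time

ARCHIVE_AFTER_DAYS = 180

def classify(root):
    cutoff = time.time() - ARCHIVE_AFTER_DAYS * 86400
    active_bytes = archive_bytes = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path)
            except OSError:
                continue  # broken symlink, permission problem, etc.
            if st.st_atime < cutoff:
                archive_bytes += st.st_size
            else:
                active_bytes += st.st_size
    return active_bytes, archive_bytes

if __name__ == "__main__":
    active, stale = classify("/data")   # hypothetical mount point
    total = active + stale or 1
    print(f"archive candidates: {stale / total:.0%} of scanned capacity")
```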
Thursday, December 06, 2007
Storage - SAN, HBA, iSCSI, TOE, VM
- Centralized Storage and the Impact on VMware TCO
- 11 Reasons to Choose Qlogic iSCSI HBA’s over Software
- Configuring iSCSI in a VMware ESX Server 3 Environment
- SAN vs DASD - Cheap SAN gear
Tuesday, November 06, 2007
Best Server Storage Setup?
Secondly, pick some disks. Price out the various available drives and compare their $/GB rates. There will be a sweet spot where you get the best ratio, probably around the 400GB or 500GB size these days.
Even though the 750GB Seagates appear to provide less bang-for-buck than smaller drives (400GB, 300GB), the higher data storage density pays off in a big way. Cramming more data into a single box means amortizing the power/heat cost of the non-disk components better, and also allows better utilization of your floor space (which is going to become very important if you really are looking to scale this into the multi-petabyte range).
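To make the amortization argument concrete, here's a quick sketch comparing raw $/GB against effective $/GB once a fixed per-box cost is spread across the bays it fills. The drive prices and the overhead figure are made-up placeholders, not real quotes.

```python
# Compare drives on raw $/GB and on effective $/GB once the fixed cost
# of the box (chassis, PSU, board, rack space) is amortized across the
# bays it fills. Prices and the per-box overhead are made-up
# placeholders for illustration, not real quotes.

DRIVES_PER_BOX = 12
BOX_OVERHEAD = 1500.0          # assumed non-disk cost per box, in dollars

drives = {                     # capacity_gb: assumed street price
    300: 70.0,
    400: 90.0,
    500: 110.0,
    750: 220.0,
}

for gb, price in sorted(drives.items()):
    raw = price / gb
    per_box_gb = gb * DRIVES_PER_BOX
    effective = (price * DRIVES_PER_BOX + BOX_OVERHEAD) / per_box_gb
    print(f"{gb}GB: raw ${raw:.3f}/GB, with box overhead ${effective:.3f}/GB")

# With these placeholder numbers the raw $/GB sweet spot sits at 400-500GB,
# but the per-box overhead shrinks per GB as the drives get bigger, which
# is the density argument made above.
```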
Tuesday, July 24, 2007
Redhat GFS
The difference is how it tries to solve the problem. NFS works over IP and accesses files at the inode level. This requires the server system or device to be running RPC and the NFS protocol. Most network filesystems work in a similar way: you have servers, and clients accessing the servers via some protocol.
Now imagine a filesystem designed for servers that allows them to access the filesystem at a block level directly via a shared bus. Let's say a parallel SCSI bus (or any bus that allows more than one host, e.g. iSCSI, Fibre Channel, FireWire). Imagine how fast it would be to access a shared disk over Fibre Channel! The problem is that if two servers mount the filesystem at the same time it would normally corrupt the filesystem. People with SANs (Storage Area Networks) solve this problem by making mini virtual hard drives and setting ACLs on them so only one host can access a given virtual hard drive at a time. This can lead to wasted space.
GFS solves the SAN problem by using a Distributed Lock Manager (DLM). No single host is the server of the filesystem; writes and locks are coordinated via the DLM. Now multiple hosts *can* share a virtual hard drive or real block device and not corrupt the filesystem. If a host dies, no problem: there is no server for the filesystem!
Let's give an example. Say you have a FireWire drive enclosure. Now plug that FireWire hard drive into two computers. (This, by the way, may still require a patch to sbp so that Linux will tell the enclosure to allow both hosts to talk to it at the same time.) Now that the hard drive is talking to both computers, you could run GFS on it and access the data at the block level from both systems. Now start serving email via IMAP (load balanced), *both hot*, no standby. Now kill a box. IMAP still works. No remounting, no resynchronization.
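To make the lock-coordination idea concrete, here's a toy sketch of two "hosts" sharing one block device and taking a per-block lock before writing. This is only a conceptual model; GFS's real DLM is a distributed network service coordinating real cluster nodes, not an in-process lock table.

```python
# Toy illustration of DLM-style coordination: two "hosts" (threads here)
# share one block device and must take a lock on a block before writing
# it. Conceptual sketch only; not GFS's actual DLM protocol.
import threading

BLOCK_SIZE = 4096
device = bytearray(BLOCK_SIZE * 16)          # pretend shared block device

class ToyLockManager:
    """Hands out one exclusive lock per block number."""
    def __init__(self):
        self._guard = threading.Lock()
        self._locks = {}

    def lock_for(self, block_no):
        with self._guard:
            return self._locks.setdefault(block_no, threading.Lock())

dlm = ToyLockManager()

def host_write(host_id, block_no, payload):
    lock = dlm.lock_for(block_no)
    with lock:                               # exclusive access to this block
        start = block_no * BLOCK_SIZE
        device[start:start + len(payload)] = payload
        print(f"host{host_id} wrote block {block_no}")

# Two hosts hammer the same block; the lock keeps each write whole.
threads = [threading.Thread(target=host_write, args=(i, 3, bytes([i]) * 64))
           for i in (1, 2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```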
Pretty amazing if you ask me! This technology is pretty rare. IBM has GPFS. SGI has Clustered XFS. Both are pretty expensive. GFS? RedHat just re-GPL'd it! Microsoft? Ummm. I think they are just now getting logical volume management.
GFS also has nice features like journaling (kinda required for this sorta thing), ACLs, quotas, and online resizing.
Monday, May 14, 2007
Hitachi's Universal Storage Platform V is virtually huge
The USPV offers a performance boost over previous TagmaStore systems. In addition, the new hardware ships with thin provisioning software, a technology yet to be implemented by Hitachi's high-end rivals. The new hardware can handle 3.5 million input/output operations per second, a 40 per cent boost over its predecessor, launched in 2004. The USPV also offers a 4GB/sec Fibre Channel switch backplane for connections to disk drives and hosts. The array now supports 16 controller pairs for a total of 224 front-end Fibre Channel ports and 112 FICON or ESCON host ports. The device holds up to 1,152 drives.
While internal storage has stayed the same at 332TB, virtualized external storage gets a major boost, from the previous model's 32PB up to 247PB.
"This is a big box for big users," said John Webster, principal IT advisor at Illuminata. "It's clearly not for the faint of heart. You've really got to know what you're doing with a device like this."
Hitachi promises a major improvement in disk utilization with the array's use of thin provisioning. While the technology isn't new, the system is the first high-end device of its kind to use it.
Thin provisioning is a technology debuted by 3PAR where physical disk capacity is used only as needed for virtual volumes. It replaces the traditional method where large portions of storage capacity are allocated to applications but often remain unused.
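As a rough illustration of the concept (not Hitachi's or 3PAR's implementation), a thin volume promises a large virtual size but only grabs physical blocks from a shared pool the first time each virtual block is written:

```python
# Rough sketch of thin provisioning: the volume advertises a large
# virtual size but only consumes physical blocks on first write.
# Conceptual model only, not Hitachi's or 3PAR's implementation.

class ThinVolume:
    def __init__(self, virtual_blocks, pool):
        self.virtual_blocks = virtual_blocks   # size promised to the host
        self.pool = pool                       # shared free physical blocks
        self.mapping = {}                      # virtual block -> physical block

    def write(self, vblock, data):
        if vblock not in self.mapping:
            if not self.pool:
                raise RuntimeError("physical pool exhausted")
            self.mapping[vblock] = self.pool.pop()   # allocate on first write
        # ... write `data` to the mapped physical block ...

    def allocated(self):
        return len(self.mapping)

# A 1,000,000-block virtual volume backed by a 1,000-block pool is fine,
# as long as the host only ever touches a small fraction of it.
pool = list(range(1000))
vol = ThinVolume(virtual_blocks=1_000_000, pool=pool)
for vb in range(250):
    vol.write(vb, b"x")
print(f"promised {vol.virtual_blocks} blocks, physically allocated {vol.allocated()}")
```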
Wednesday, April 18, 2007
Smugmug - Amazon S3
But wait! It gets even better! Because of the stupid way the tax law operates in this country, I would actually have to pay taxes on the $423K I spent buying drives (yes, as if the money I spent were actually profit. Dumb.). So I’d have to pay an additional ~$135K in taxes. Technically, I’d get that back over the next 5 years, so I didn’t want to include it as “savings”, but as you can imagine, the cash flow implications are huge. In a very real sense, the actual cash I’ve conserved so far is about $474,000.
But wait! It gets even better! Amazon has been so reliable over the last 7 months (considerably more reliable than our own internal storage, which I consider to be quite reliable), that just last week we made S3 an even more fundamental part of our storage architecture. I’ll save the details for a future post, but the bottom line is that we’re actually going to start selling up to 90% of our hard drives on eBay or something. So costs I had previously assumed were sunk are actually about to be recouped. We should get many hundreds of thousands of dollars back in cash.
I expect our savings from Amazon S3 to be well over $1M in 2007, maybe as high as $2M.