Nonrelational Database Systems, Not-Only SQL or NoSQL?

Hadoop


딜레이라마 2017. 2. 9. 20:04

Sharding

The term sharding describes the logical separation of records into horizontal partitions. The idea is to spread data across multiple storage files (or servers) as opposed to storing it all contiguously. The separation of values into those partitions is performed on fixed boundaries: you have to set fixed rules ahead of time to route values to their appropriate store. With that comes the inherent difficulty of having to reshard the data when one of the horizontal partitions exceeds its capacity.

Resharding is a very costly operation, since the storage layout has to be rewritten. This entails defining new boundaries and then horizontally splitting the rows across them. Massive copy operations can take a huge toll on I/O performance and temporarily elevate storage requirements. You may also still be receiving updates from client applications, and these have to be reconciled with the data being moved during the resharding process.

This can be mitigated by using virtual shards, which define a much larger key partitioning range, with each server assigned an equal number of these shards. When you add more servers, you can reassign shards to the new server. This still requires that the data be moved over to the added server.

Sharding is often a simple afterthought or is completely left to the operator. Without proper support from the database system, this can wreak havoc on production systems.
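The routing rules and the virtual-shard idea described above can be sketched roughly as follows. This is a minimal illustration, not how any particular database implements sharding: the boundary values, the number of virtual shards, the server names, and the function names (`range_shard`, `virtual_shard`, `assign`, `add_server`) are all assumptions made up for the example.

```python
import bisect
import hashlib

# Fixed-boundary sharding: rules set ahead of time route each key to a store.
# Hypothetical boundaries: keys < "h" -> shard 0, keys < "p" -> shard 1, rest -> shard 2.
BOUNDARIES = ["h", "p"]


def range_shard(key: str) -> int:
    """Return the index of the fixed range partition that holds `key`."""
    return bisect.bisect_right(BOUNDARIES, key)


# Virtual shards: far more shards than servers (12 is an arbitrary choice here).
NUM_VIRTUAL_SHARDS = 12


def virtual_shard(key: str) -> int:
    """Map a key to a virtual shard; this mapping never changes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_VIRTUAL_SHARDS


def assign(servers: list[str]) -> dict[int, str]:
    """Round-robin assignment of virtual shards to servers."""
    return {s: servers[s % len(servers)] for s in range(NUM_VIRTUAL_SHARDS)}


def add_server(assignment: dict[int, str], new_server: str) -> tuple[dict[int, str], list[int]]:
    """Hand whole virtual shards to a new server.

    Returns the updated assignment and the shards whose data must be copied.
    """
    updated = dict(assignment)
    owners = sorted(set(assignment.values()))
    target = len(assignment) // (len(owners) + 1)
    moved: list[int] = []
    while len(moved) < target:
        # Take one shard from whichever existing server currently owns the most.
        counts = {srv: 0 for srv in owners}
        for srv in updated.values():
            if srv in counts:
                counts[srv] += 1
        donor = max(owners, key=lambda srv: counts[srv])
        shard = max(sh for sh, srv in updated.items() if srv == donor)
        updated[shard] = new_server
        moved.append(shard)
    return updated, moved


before = assign(["s1", "s2", "s3"])
after, moved = add_server(before, "s4")
```

With fixed boundaries, outgrowing a partition means choosing new boundaries and splitting rows. With virtual shards, the key-to-shard mapping stays put and only the shard-to-server table changes. The data of the shards listed in `moved` still has to be copied to the new server, as the text notes.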

Let us stop here, though, and, to be fair, mention that a lot of companies are using RDBMSes successfully as part of their technology stack. For example, Facebook and Google each have a very large MySQL setup, and for their purposes it works sufficiently well. These database farms suit the given business goals and may not be replaced anytime soon. The question here is: if you were to start implementing a new product and knew that it needed to scale very fast, wouldn’t you want to have all the options available instead of using something you know has certain constraints?

Nonrelational Database Systems, Not-Only SQL or NoSQL?

Over the past four or five years, the pace of innovation to fill that exact problem space has gone from slow to insanely fast. It seems that every week another framework or project is announced to fit a related need. We saw the advent of the so-called NoSQL solutions, a term coined by Eric Evans in response to a question from Johan Oskarsson, who was trying to find a name for an event in that newly emerging data storage system space (see “NoSQL” on Wikipedia). The term quickly rose to fame as there was simply no other name for this new class of products. It was (and is) discussed heavily, as it was also deemed the nemesis of “SQL” or was meant to bring the plague to anyone still considering using traditional RDBMSes… just kidding.

The tagword is actually a good fit: it is true that most new storage systems do not provide SQL as a means to query data, but rather a different, often simpler, API-like interface to the data. On the other hand, tools are available that provide SQL dialects to NoSQL data stores, and they can be used to form the same complex queries you know from relational databases. So, limitations in querying no longer differentiate RDBMSes from their nonrelational kin. The difference is actually on a lower level, especially when it comes to schemas or ACID-like transactional features, but also regarding the actual storage architecture. A lot of these new kinds of systems do one thing first: throw out the limiting factors in truly scalable systems (a topic that is discussed in “Dimensions” (page 13)). For example, they often have no support for transactions or secondary indexes. More importantly, they often have no fixed schemas so that the storage can evolve with the application using it.
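To make the "simpler, API-like interface" and "no fixed schemas" points concrete, here is a toy in-memory sketch. The `SchemalessStore` class and its `put`, `get`, and `scan` methods are hypothetical names invented for this illustration. They do not correspond to the API of any real NoSQL product, and the sketch deliberately has no transactions or secondary indexes.

```python
from typing import Iterator


class SchemalessStore:
    """Toy key-value store: rows are looked up by key only, fields are free-form."""

    def __init__(self) -> None:
        self._rows: dict[str, dict[str, object]] = {}

    def put(self, key: str, **fields: object) -> None:
        # No schema check: any field name is accepted and merged into the row.
        self._rows.setdefault(key, {}).update(fields)

    def get(self, key: str) -> dict[str, object]:
        # Return a copy so callers cannot mutate stored rows by accident.
        return dict(self._rows.get(key, {}))

    def scan(self, prefix: str = "") -> Iterator[tuple[str, dict[str, object]]]:
        # Rows come back in key order; the key is the only access path.
        for key in sorted(self._rows):
            if key.startswith(prefix):
                yield key, dict(self._rows[key])


store = SchemalessStore()
store.put("user:1", name="Ann")
# A later version of the application starts writing a new field;
# existing rows are untouched and no schema migration is needed.
store.put("user:2", name="Bob", email="bob@example.com")
```

Here `store.get("user:1")` has no `email` field while `store.get("user:2")` does. The application, not the store, decides how to handle rows written by older versions.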

There are many overlapping features within the group of nonrelational databases, but some of these features also overlap with traditional storage solutions. So the new systems are not really revolutionary, but rather, from an engineering perspective, are more evolutionary.

Even projects like Memcached are lumped into the NoSQL category, as if anything that is not an RDBMS is automatically NoSQL. This creates a kind of false dichotomy that obscures the exciting technical possibilities these systems have to offer. And there are many; within the NoSQL category, there are numerous dimensions you could use to classify where the strong points of a particular system lie.