A Manager's Guide to NoSQL

Author
  • Erik Weibust

Introduction
Software design and development have undergone tremendous change over the last 30 years. Once a particular change captures the interest and imagination of the community, innovation accelerates, becomes self-propelled, and the pace of change turns exponential. One such change in the last 5 years has been the rise of NoSQL database technology.

Software applications have become highly interactive and now run across a variety of delivery platforms and infrastructure. A modern application may have to support millions of concurrent users, and data requirements have expanded from application data alone to include usage and analytics data. Application behavior has changed from static data capture and display to dynamic, context-driven interaction. Relational database technology has lagged behind these changes. Database providers have relied on 30-year-old technological concepts and have applied multiple band-aids to existing platforms to meet modern requirements.


Glossary of a few terms you need to know as you read on:

Database Schema is a well-defined, strict representation of a real-world domain (such as the elements of a shopping application) within a database. All items stored under a database schema are expected to conform to the rules and constraints set by the schema design, and no single item can vary from the definition.
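
As a rough illustration of what conforming to a schema means in practice, the sketch below uses Python's built-in sqlite3 module. The `products` table, its columns, and the `try_insert` helper are assumptions made up for this example. Items that break the schema's rules are rejected by the database.

```python
import sqlite3

# In-memory database; the "products" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE products (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price REAL NOT NULL CHECK (price >= 0)
    )
    """
)

# A row that conforms to the schema is accepted.
conn.execute("INSERT INTO products (id, name, price) VALUES (1, 'Coffee mug', 7.5)")


def try_insert(sql):
    """Return True if the insert is accepted, False if the database rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.Error:
        return False


# Missing the required "name" value: rejected by the NOT NULL constraint.
missing_name = try_insert("INSERT INTO products (id, price) VALUES (2, 3.0)")

# A column the schema does not define: rejected as well.
extra_column = try_insert(
    "INSERT INTO products (id, name, price, color) VALUES (3, 'Pen', 1.0, 'blue')"
)
```

Both `missing_name` and `extra_column` end up `False`, and only the conforming row is stored.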

Database Replication is the process of sharing data between the primary and one or more redundant databases to improve reliability, fault-tolerance, or accessibility. Typically, data is copied to the backup location as soon as it is written, so that it is available for recovery and for read-only access.
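
The idea can be sketched in a few lines of Python. The `Primary` and `Replica` classes here are hypothetical in-memory stand-ins, not any real database's API, and a real system would copy data over a network and handle failures along the way.

```python
class Replica:
    """A read-only copy of the data (hypothetical, in-memory)."""

    def __init__(self):
        self.data = {}

    def read(self, key):
        return self.data.get(key)


class Primary:
    """Accepts writes and copies each one to every replica immediately."""

    def __init__(self, replicas):
        self.data = {}
        self.replicas = replicas

    def write(self, key, value):
        self.data[key] = value
        # Copy the write to each redundant database as part of the write.
        for replica in self.replicas:
            replica.data[key] = value


replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.write("order:1001", {"status": "shipped"})

# Any replica can now serve the read, and the data survives
# the loss of the primary.
status_from_replica = replicas[1].read("order:1001")["status"]
```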

Sharding (or horizontal partitioning) is a database design principle whereby the contents of a database table are split across physical locations by rows, rather than by columns (as happens in normalization, where the resulting tables are tied back together using referential integrity). Each partition forms part of a shard. Together the shards provide the complete data set, and because each shard holds only a portion of the rows, reads and writes can be faster.
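
A minimal sketch of row-based sharding follows, assuming three shards and a simple hash of the row key to choose between them. The names `SHARD_COUNT`, `shard_for`, `put`, and `get` are invented for this example, and production systems often use more elaborate schemes such as range partitioning or consistent hashing.

```python
import hashlib

SHARD_COUNT = 3  # assumed number of physical locations
shards = [dict() for _ in range(SHARD_COUNT)]  # each dict stands in for one shard


def shard_for(key):
    """Pick a shard from a stable hash of the row's key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARD_COUNT


def put(key, row):
    """Write the row to the one shard that owns its key."""
    shards[shard_for(key)][key] = row


def get(key):
    """Read from the one shard that owns the key; the others are not touched."""
    return shards[shard_for(key)].get(key)


for user_id in ["alice", "bob", "carol", "dave", "erin", "frank"]:
    put(user_id, {"user": user_id})

# Each shard holds only some of the rows; together they hold all of them.
rows_per_shard = [len(shard) for shard in shards]
```

Because the same key always hashes to the same shard, a lookup only needs to contact one location.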


What is NoSQL?
NoSQL is the name given to the engineering movement that birthed these next-generation databases. NoSQL stands for Not only SQL. The common misunderstanding is that it stands for No SQL, which is not true. NoSQL databases were created to solve real-world needs that existing relational databases were unable to solve. They are non-relational, distributed, schema-less and horizontally scalable with commodity hardware.

NoSQL databases are typically:

  1. Schema-less: Data can be inserted without being in a particular form. The format of the data can change at any time without affecting existing data. The unique identifier is the only required value for a data element.
  2. Auto-sharding: Sharding is, by design, an out-of-the-box feature. NoSQL databases are built to be distributed and sharded without further changes to the application design. They are built to support data replication, high availability and fail-over.
  3. Queryable across shards: Distributed query support is available because the data is sharded.
  4. Simple to operate: Maintaining a NoSQL cluster does not require complex software or several layers of IT personnel and security measures. Of course, that does not mean reduced security for your data.
  5. Cached: Caching is built in and low latency is the expectation. Caching is transparent to application developers and infrastructure teams.
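
To make the schema-less point concrete, here is a small sketch of a document collection held in a plain Python dictionary. The `insert` helper and the `_id` field name are assumptions for illustration, not a specific product's API. The only thing every item must have is a unique identifier.

```python
import uuid

collection = {}  # stands in for a document collection


def insert(document):
    """Store any dictionary; only a unique identifier is required."""
    doc_id = document.get("_id") or str(uuid.uuid4())
    collection[doc_id] = {**document, "_id": doc_id}
    return doc_id


# Two items with different shapes live side by side.
first = insert({"name": "Coffee mug", "price": 7.5})
second = insert({"name": "E-book", "price": 4.0, "formats": ["epub", "pdf"]})

fields_first = sorted(collection[first])
fields_second = sorted(collection[second])
```

Adding the `formats` field to the second item required no change to the first item or to any table definition.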

In relation to Gartner’s Hype Cycle diagram, NoSQL is perhaps at the Slope of Enlightenment stage, with some of the NoSQL offerings having made tremendous strides toward maturity in the last 2 years.

Gartner's Hype Cycle

What are my Options?
There are many options to consider when choosing a NoSQL solution. They are mostly open source and schema-less. The key distinguishing factor among NoSQL databases is how each one is designed to store data.

  • Key-value Storage: Membase, Redis, Riak
  • Graph Storage: Neo4j, InfoGrid, Bigdata
  • Wide-column Storage: Cassandra, HBase (built on Hadoop)
  • Document Storage: MongoDB, CouchDB
  • Eventually Consistent Key-Value Storage: Amazon Dynamo, Voldemort
  • NewSQL: Almost relational, but simpler and more easily scalable than a traditional RDBMS. Examples are VoltDB and ScaleDB
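
The storage models above differ mainly in the shape the data takes. The sketch below represents one made-up customer in four of those shapes using plain Python structures. All names and values are invented for illustration, and real products add indexing, distribution, and query languages on top.

```python
import json

# Key-value: an opaque value looked up by its key.
key_value = {"customer:42": '{"name": "Ada", "city": "Dallas"}'}

# Document: a structured, nested record whose fields can be addressed.
document = {
    "_id": 42,
    "name": "Ada",
    "address": {"city": "Dallas"},
    "orders": [1001, 1002],
}

# Wide-column: row key -> column family -> columns.
wide_column = {
    "42": {
        "profile": {"name": "Ada", "city": "Dallas"},
        "orders": {"1001": "shipped", "1002": "pending"},
    }
}

# Graph: nodes plus explicit relationships (edges) between them.
nodes = {42: {"label": "Customer", "name": "Ada"}, 1001: {"label": "Order"}}
edges = [(42, "PLACED", 1001)]

# The same question ("which city?") is answered differently in each model.
city_kv = json.loads(key_value["customer:42"])["city"]  # application parses the value
city_doc = document["address"]["city"]                  # follow the nested field
city_wc = wide_column["42"]["profile"]["city"]          # row, family, then column

# A graph is suited to relationship questions ("what did customer 42 place?").
orders_placed = [dst for src, rel, dst in edges if src == 42 and rel == "PLACED"]
```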

How do I get buy-in from the team (above and below me)?
In most organizations, new technology (or whatever is considered the latest and greatest) is met with apprehension at best and suspicion at worst. The best proven way to introduce something into the organization is to build prototypes of real-world scenarios, highlighting the advantages specific to your organization.

The most common place to introduce a NoSQL engine in your organization is an application-logging prototype. With technology such as a NoSQL database, which is more of an infrastructure element, it is important to demonstrate business continuity with the new technology compared to existing technologies, thus demonstrating minimal risk to business stakeholders. Your developers have likely already heard of this technology and may be highly interested in and motivated to use NoSQL databases. It is up to you to educate yourself on the new technology, and then educate your organization on the benefits of NoSQL based on the results of your prototype. Lastly, you can make the point that NoSQL is not an invention waiting to be implemented. Rather, it grew out of necessity at companies like Google and Amazon, which built such systems, ran them in production, and published their designs, which in turn inspired open-source implementations for the community at large.

Next Steps
For more details on each NoSQL option, visit www.nosql-database.org. We will also publish follow-up blog posts on selected NoSQL databases in the coming weeks here at Credera.com. The follow-ups will be in-depth reviews of the selected NoSQL databases, with sample data and use cases for each.
