ODBMS Industry Watch

Feb 8 10

One size does not fit all: “document stores”, “nosql databases”, ODBMSs.

by Roberto V. Zicari

I was asked to compare and contrast ODBMS systems with the new NoSQL data stores out there, so over the last few weeks I have asked several people for their views.

Here are some resources I recently published on this:
– On NoSQL technologies – Part II. Roberto V. Zicari (editor) – February 10, 2010.
– The Object Database Language “Galileo” – 25 years later. Renzo Orsini – February 10, 2010.
– Relational Databases, Object Databases, Key-Value Stores, Document Stores, and Extensible Record Stores: A Comparison. Rick Cattell – January 11, 2010.
– On NoSQL technologies – Part I. R. Zicari (editor) et al. – December 14, 2009.

Below you’ll find a few more selected replies. I do not claim this list is complete, but I hope it will help trigger a discussion. By the way, this list keeps growing, so check the blog regularly for new updates.

RVZ

Leon Guzenda (Objectivity):
“I’ve always been amused by the many relational database enthusiasts who continue to insist that the technology is adequate for all purposes. It is a great and flexible technology and it has many applications. However, you don’t have to look far before you come across instances where developers have chosen not to use it. Why is it that Microsoft Word doesn’t break its documents down into chapter, section, paragraph, sentence and letter tables or columns and store them in SQL Server? It clearly could, but manipulating the data structures would be tiresome. Storing them in an object database, on the other hand, is efficient and flexible. Reading a document from disk into memory would probably only involve a few method invocations and I/Os, rather than the dozens of index, join and read operations that SQL would need. You could apply the same argument to Microsoft Excel.

Wikipedia lists many dozens of Windows file types. I’m sure that one could easily find hundreds of them. I’m also sure that the developers of some of those file formats might have found it convenient to store their formatted data in a relational database, but they clearly didn’t. I’d also hazard a guess that most of them could be more easily stored in an ODBMS, particularly as that’s why ODBMSs were developed in the first place – to overcome the limitations of the relational model.

If you look at the various types of data storage and management that are being clumped under the NoSQL banner you soon find that some of them are using subsets of relational technology and others are simply distributed file systems. I don’t think that there’s anything wrong with that. If you don’t need to do concurrent updates or ad hoc queries on content then why pay for the overheads of locking, transaction journals and indices? Almost all of the files that users store in online repositories such as Picasa, Flickr, Youtube and Facebook are written once, read many times and seldom, if ever, updated. There may be advantages to indexing the files, but you certainly wouldn’t want to break them down into rows and columns and there aren’t any significant advantages in storing them as BLOBs in an RDBMS. Files work just fine. However, you do need a highly scalable and efficient way to put them somewhere and find them again, which is where sharding can be useful. It’s not a new technique, though some people seem to think that it is. Objectivity/DB, being a distributed, federated ODBMS, has used sharding since 1988 to split its federations into convenient logical and physical chunks, but still provide a single logical view of connected object graphs.

The people who don’t like the NoSQL paradigm have some good points though. Throwing away a lot of the lessons learned from the building and refinement of relational database technology would be a bad thing to do. It’s certainly not worth the time and effort of rediscovering the problems and reinventing solutions to them.

In 1976, International Computers Limited (ICL) released a hardware and software technology called the Content Addressed File System. It used special disks and microprocessors to drop records into any convenient location on the disks, with some minimal local indexing. It found them again by examining their contents. The microprocessors handled the predicate operations and they could often handle many concurrent queries in a single rotation of each disk. It wasn’t a great marketing success, as ICL targeted the mainframe datastores rather than the workstation or emerging PC market. However, it did solve some pretty tricky problems at the time and the idea is still valid.

The XAM file system protocol makes it possible to store file metadata with each file and to have the storage infrastructure conduct searches for files that match predicates and values supplied by the application that needs them. Like CAFS, it would provide an ideal repository for the kinds of data that YouTube and the other sites I mentioned previously need to store. You could use SQL, SPARQL, LINQ or any other query language to find things, but you wouldn’t be using relational storage structures for the actual data. Somebody at an XLDB Workshop mentioned that the bulk of their data processing consists of scanning emails and attached files for viruses and other malware. They also spend most of their RDBMS cycles on scanning tables for marketing reasons, rather than executing queries. You could argue that content addressable hardware would be a perfect solution for them.

So, I regard the NoSQL category as a mixed bag and I don’t see it as being a threat to relational or object DBMSs. As we all know, each has its place in our technology toolkit. Much of the need for NoSQL variants may go away if content addressable hardware steps up to the challenge. However, I do wish that the people who are spending time developing these capabilities would look around a bit more before they start reinventing the wheel.”
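To make the sharding idea in Guzenda’s reply concrete, here is a minimal Python sketch, assuming write-once files and simple hash-modulo placement: each file name is hashed to pick one of several shards, so a reader can find a file again by recomputing the hash instead of consulting a central index. The `ShardedStore` class and its methods are hypothetical, and in-memory dicts stand in for storage nodes; this is not how Objectivity/DB partitions its federations.

```python
import hashlib


class ShardedStore:
    """Hypothetical write-once store that spreads files over N shards."""

    def __init__(self, num_shards):
        # Each dict stands in for one storage node.
        self.shards = [{} for _ in range(num_shards)]

    def shard_for(self, name):
        # Use a stable digest rather than Python's per-process salted hash(),
        # so every client computes the same shard for the same name.
        digest = hashlib.sha256(name.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, name, data):
        index = self.shard_for(name)
        shard = self.shards[index]
        if name in shard:
            # Write once, read many: refuse to overwrite an existing file.
            raise ValueError(name + " is already stored")
        shard[name] = data
        return index

    def get(self, name):
        # Readers recompute the shard, so no central lookup table is needed.
        return self.shards[self.shard_for(name)][name]


store = ShardedStore(num_shards=4)
where = store.put("photos/cat.jpg", b"jpeg bytes")
print(where, store.get("photos/cat.jpg"))
```

One design caveat of this sketch: because placement is `hash % num_shards`, changing the number of shards remaps most names; consistent hashing is a common way to limit that reshuffling.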

Hamid Pirahesh (IBM Fellow):
“There is heavy activity in the Bay Area in NoSQL. There is VC money going into those companies, and they do extensive blogging.
There are also XML DBs, which go beyond relational. Hybridization with relational turned out to be very useful. For example, DB2 has a huge investment in XML, which has been extensively published and also commercialized. MonetDB did substantial work in that area early on as well.”

Borislav Iordanov (HyperGraphDB):
“I think we’ve just realized that different representations are suitable for different domains, and that it is possible to persist those representations natively without having to ultimately translate everything into a rigid schema. A very important driver of those NoSQL DBs is their dynamic nature. I think many people like them for the same reason they like dynamic languages (no static typing, no long compile times, etc.): change is easier, prototyping is faster, and getting it into production is not that risky. The problem with them, of course, is the lack of integrity and consistency guarantees, something which ODBMSs still try to provide at some level while remaining more flexible and offering richer structures than RDBMSs. Another problem with RDBMSs is SQL itself, a language that is very tightly coupled to the storage meta-model. Here again ODBMSs do better, offering standardization but also more openness and flexibility, and perhaps they come as a middle ground between anarchistic NoSQL and totalitarian SQL 🙂”

Michael Stonebraker (MIT):
“Greene’s reply is perfectly reasonable. I think the “one size does not fit all” mantra, which I have been espousing for some time, is a good way to think about things. After all, the NoSQL folks are themselves pretty diverse.”

Mårten Gustaf Mickos (previously CEO of MySQL AB):
“I think Kaj had a great response. Generally, in the early days of a new term, in my mind it doesn’t make sense to try to define it exactly or narrowly. It’s only a term, and it will take years before we know how significant it is and whether it is justified as a category of its own. For instance, it took a long time for Web 2.0 to become an acknowledged term with a useful and reasonably well-defined meaning.”

Dirk Bartels (Versant):
“The “NoSQL” movement is a reflection of different and entirely new application requirements that are not orthogonal to SQL and relational databases. What makes this discussion really interesting, in my opinion, is that it has its roots with application developers.

The last thing an application developer wants to do is spend (or should I say waste) time implementing a database system. Application developers want to concentrate on their domain, get their apps out and compete in their markets. Today, pretty much all data persistence requirements are implemented with SQL, in particular with several open source choices, such as MySQL and PostgreSQL, readily available at no cost. SQL is what developers learn in college, and therefore it is often the only database technology considered today.

So the fact that these application developers are stepping out of their comfort zone, tossing the SQL databases and spending their precious resources inventing their own data management layer should tell us something.

From time to time, advances in compute infrastructure are disruptive. The PC, Client / Server, Internet, and lately Cloud Computing have been or will be catalysts for significant changes. Is it possible that “No SQL” means that certain new types of applications are simply not a good fit for the relational data model provided via SQL?

Taking a closer look, these applications still require typical database functionality also found in a SQL database, for example some query support, long-term and reliable data storage, and scalability, just to name a few. Nevertheless, there must be issues with SQL that make the pain of sticking with it just too big. I haven’t done a lot of research on this subject, but I suspect that the issues revolve around the data model (too rigid), the transaction processing model (too linear), the scalability model (horizontal scale-out solutions too expensive), the programming model (too cumbersome) and probably more.

I remember when IT didn’t get the PC, the Graphical User Interface, the Internet, etc. I am not surprised that many traditional IT people are not getting it today. I expect the NoSQL movement to gain momentum and to continue to evolve rapidly, likely in several directions. These days, applications and their data management requirements are significantly more complex. In my opinion it is just a matter of time before developers realize that the traditional relational model, invented for high-volume transaction processing, may not be the best choice for application domains a SQL database is simply not designed for.

Is this a call for object databases, a long-overlooked database technology that has matured over the past 20-odd years? I think it’s possible; the future will tell. NoSQL application developers should at least give object databases a good look; it may save them a lot of time and headaches down the road.”

Dwight Merriman (CEO of 10gen):
A comparison of document-oriented and object-oriented databases is fascinating as they are philosophically more different than one might at first expect. In both we have a somewhat standardized document/object representation — typically JSON in currently popular document-oriented stores, perhaps ODL in ODBMS. The nice thing with JSON is that at least for web developers, JSON is already a technology they use and are familiar with. We are not adding something new for the web developer to learn. In a document store, we really are thinking of “documents”, not objects. Objects have methods, predefined schema, inheritance hierarchies. These are not present in a document database; code is not part of the database.
While some relationships between documents may exist, pointers between documents are deemphasized. The document store does not persist “graphs” of objects: it is not a graph database. (Graph databases/stores are another new NoSQL category – what is the difference between a graph database and an ODBMS? An interesting question.) Schema design is important in document databases. One doesn’t think in terms of “persist what I work with in RAM from my code”. We still define a schema. This schema may vary from the internal “code schema” of the application. For example, in the document-oriented database MongoDB, we have collections (analogous to tables) of JSON documents, and explicit declaration of indexes on specific fields for the collection.
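The document model Merriman describes (collections of JSON documents, related data embedded rather than referenced, and indexes declared explicitly per field) can be sketched in plain Python. The `Collection` class below is a hypothetical in-memory stand-in written for illustration; its method names only loosely echo MongoDB terminology and it is not the MongoDB driver API.

```python
import json


class Collection:
    """Hypothetical in-memory stand-in for a document collection (not MongoDB's API)."""

    def __init__(self):
        self.docs = []
        # field name -> {field value: [document positions]}
        # Assumes indexed fields hold hashable scalars (strings, numbers).
        self.indexes = {}

    def create_index(self, field_name):
        # Indexes are declared explicitly, one field at a time.
        index = {}
        for pos, doc in enumerate(self.docs):
            if field_name in doc:
                index.setdefault(doc[field_name], []).append(pos)
        self.indexes[field_name] = index

    def insert(self, doc):
        # A JSON round trip keeps only plain data: no methods, no class hierarchy.
        doc = json.loads(json.dumps(doc))
        self.docs.append(doc)
        pos = len(self.docs) - 1
        for field_name, index in self.indexes.items():
            if field_name in doc:
                index.setdefault(doc[field_name], []).append(pos)

    def find(self, field_name, value):
        # Use the index when one was declared, otherwise scan the collection.
        if field_name in self.indexes:
            return [self.docs[p] for p in self.indexes[field_name].get(value, [])]
        return [doc for doc in self.docs if doc.get(field_name) == value]


posts = Collection()
posts.create_index("author")
# The comments are embedded in the post instead of referenced by id.
posts.insert({
    "author": "dwight",
    "title": "Document stores",
    "comments": [{"by": "kaj", "text": "Nice post"}],
})
print(posts.find("author", "dwight")[0]["comments"][0]["by"])
```

Reading the post returns its comments in the same document, with no second lookup; the index on `author` exists only because it was declared.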

We think this approach has some merits — a decoupling of data and code. Code tends to change fast. Embedding is an important concept in document stores. It is much more common to nest data within documents than have references between documents. Why the deemphasis of relationships? A couple reasons:

First, with arbitrary graphs of objects, it is difficult to process the graph from a client without many client/server turnarounds. Thus, one might run code server-side. A goal with document databases is to maintain the client/server paradigm and keep code biased to the client (albeit with some exceptions such as map/reduce).

Second, a key goal in the “NoSQL” space is horizontal scalability. Arbitrary graphs of objects would be difficult to partition among servers in a guaranteed performant manner.
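The first reason, client/server round trips, can be illustrated with a small Python sketch under stated assumptions: a hypothetical `CountingServer` that counts each client request, and the same small tree stored either as documents that reference each other by id or as one document with the children embedded. Walking the referenced version costs one round trip per node, while the embedded version arrives in a single request.

```python
class CountingServer:
    """Hypothetical document server that counts client requests."""

    def __init__(self, docs):
        self.docs = docs
        self.round_trips = 0

    def get(self, doc_id):
        # Every call models one client/server round trip.
        self.round_trips += 1
        return self.docs[doc_id]


# Referenced layout: each node lists its children by id.
referenced = {
    "root": {"name": "root", "children": ["a", "b"]},
    "a": {"name": "a", "children": ["c"]},
    "b": {"name": "b", "children": []},
    "c": {"name": "c", "children": []},
}

# Embedded layout: the same tree nested inside one document.
embedded = {
    "root": {"name": "root", "children": [
        {"name": "a", "children": [{"name": "c", "children": []}]},
        {"name": "b", "children": []},
    ]},
}


def names_referenced(server, doc_id):
    # Depth-first walk that must fetch every child from the server.
    doc = server.get(doc_id)
    names = [doc["name"]]
    for child_id in doc["children"]:
        names.extend(names_referenced(server, child_id))
    return names


def names_embedded(doc):
    # Same walk, but entirely on data already held by the client.
    names = [doc["name"]]
    for child in doc["children"]:
        names.extend(names_embedded(child))
    return names


ref_server = CountingServer(referenced)
ref_names = names_referenced(ref_server, "root")

emb_server = CountingServer(embedded)
emb_names = names_embedded(emb_server.get("root"))

print(ref_names, ref_server.round_trips)  # 4 round trips
print(emb_names, emb_server.round_trips)  # 1 round trip
```

The trade-off runs the other way for data that is shared or updated independently, which is where references remain useful.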

Eric Falsken (db4o):
“A NoSQL database (like Google’s BigTable data behind their Gears API) is an awkward sort of “almost-SQL” or “SQL-like” database.

But it ends up being a columnar database. What you call a “document store” is a row-based database, where records are stored together in a single clump, called the row. By eliminating strongly typed columns, they can speed up I/O by many factors (data written to one place rather than many places), just in the insert/select operation. By intelligent use of indexes, they should be able to achieve some astounding benchmarks. The complexity of object relationships is their shared drawback. Being unable to handle things like inheritance and polymorphism is what stops them from becoming object databases. You can think of db4o as a “document-oriented database” which has been extended to support object-oriented principles (each level of inheritance is a “document”, and these all get related together).”
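Falsken’s point about writing data “to one place rather than many places” can be illustrated with a toy Python sketch of the two layouts, assuming hypothetical in-memory lists in place of disk pages: a row-oriented store appends each record as a single unit, while a column-oriented store appends each field to its own array. It only counts logical write locations; real I/O costs depend on page layout, buffering and compression.

```python
records = [
    {"id": 1, "name": "Ada", "city": "London"},
    {"id": 2, "name": "Alan", "city": "Manchester"},
]

# Row-oriented layout: each record is kept together as one unit.
row_store = []

# Column-oriented layout: each field lives in its own array.
column_store = {"id": [], "name": [], "city": []}


def insert_row(record):
    row_store.append(dict(record))
    return 1  # one location written


def insert_columnar(record):
    writes = 0
    for field_name, column in column_store.items():
        # Missing fields are padded with None to keep columns aligned.
        column.append(record.get(field_name))
        writes += 1
    return writes


row_writes = sum(insert_row(r) for r in records)
col_writes = sum(insert_columnar(r) for r in records)

# Reading one field: the column store touches a single array,
# the row store has to visit every record.
cities_from_columns = column_store["city"]
cities_from_rows = [r["city"] for r in row_store]

print(row_writes, col_writes)  # 2 6
```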

Peter Neubauer (Neo Technology):
“We have not modeled an ODBMS on Neo4j yet, but if you look at e.g. the Ruby bindings, it fits very naturally into dynamic language paradigms, mapping the domain models almost directly onto nodes and relationships. We have written a couple of blogs on the topic, most recently Emil’s classification of the NoSQL space, and there are approaches to turn Neo4j into something that resembles an OODBMS:

1. JRuby bindings, largely hiding Neo4j as the backend from the code, but still exposing the Traverser APIs alongside the normal Ruby collections etc. for deep graph traversals.

2. Jo4neo by Taylor Cowan, which persists objects via Java annotations.

3. Neo4j traversers, and Gremlin, for deep and fast graph traversals (really not OODBMS-like, but VERY useful in today’s data sets).

It would be very interesting to have more conversation on these topics!”
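The deep traversals Neubauer mentions boil down to walking relationships hop by hop from a start node. As a rough illustration, here is a depth-limited breadth-first traversal in Python over a hypothetical adjacency dict; it is not the Neo4j Traverser API or Gremlin, just the underlying idea.

```python
from collections import deque


def traverse(graph, start, max_depth):
    """Breadth-first traversal returning (node, depth) pairs up to max_depth hops."""
    seen = {start}
    queue = deque([(start, 0)])
    order = []
    while queue:
        node, depth = queue.popleft()
        order.append((node, depth))
        if depth == max_depth:
            # Do not expand beyond the requested number of hops.
            continue
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, depth + 1))
    return order


# Hypothetical "knows" relationships between people.
friends = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
    "dave": ["frank"],
}

result = traverse(friends, "alice", 2)
print(result)
```

Each hop simply follows stored relationships from the current node, which is why graph stores keep adjacency close to the node rather than resolving it through joins.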

Jan Lehnardt (CouchDB):
“For me, NoSQL is about choice. OODBs give users a choice. By that definition though, Excel is a NoSQL storage solution and I wouldn’t support that idea 🙂 I think, as usual, common sense needs to be applied. Prescriptive categorisation rarely helps anyone but those who are in the business of categorising.”

Miguel-Angel Sicilia (University of Alcalá):
“The NoSQL movement, according to Wikipedia today, promotes “non-relational data stores that do not need a fixed schema”. I do not believe ODBMSs really fit with that view of data management. Also, other aspects of the NoSQL “philosophy” set ODBMSs far apart from them. However, NoSQL focuses on several problems of traditional relational databases for which ODBMSs can be a good option, so the two can be considered “cousins” in some sense. I do not see NoSQL and ODBMSs as overlapping, but as complementary solutions for non-traditional data management problems.”

Manfred Jeusfeld (Tilburg University):
“I am a bit frightened by this development. Basically, people are stepping back to the time prior to database systems. I heard similar ideas at a Dagstuhl seminar from IT experts at STATOIL (the Norwegian oil giant). They found that non-database data stores are much faster, and they appear to be willing to dump ACID for increased performance in some of their systems.
This is typical of a programmer’s attitude. They want data storage and access to be optimized for their application and do not care too much about interoperability and reliability. If every data item can be referenced by a 64-bit memory address, why would we need a query language to access the data? Following memory pointers is certainly much faster. OODBs can serve as a bridge between the pure programming view and the database view.”

Peter Norvig (Director of Research at Google Inc.):
“You should probably use what works for you and not worry about what people call it.”

Floyd Marinescu (Chief Editor, InfoQ):
“I think that web development has become so old and mature now that people are discovering that certain types of applications are better off with different solutions than your standard-doctrine three-tier system with a SQL database. Plus, the whole web services phenomenon has opened people’s minds to the notion of infrastructure as a service, and that is driving this as well. This is my anthropological view of things. 😉”

Matthew Barker (Versant Corporation):
“As Robert (Greene) mentioned, when everyone thought “one size fits all”, SQL evolved from a “simple query language” to a “do-it-all” tool including creating, modifying, and reorganizing data, not just querying. Many object databases such as Versant have an “SQL-like” query language, but its capabilities are limited to actually querying the database, not updates, creates, deletes, etc. When you use SQL to modify data, you break the OO paradigm of safety and encapsulation; in a large application, it very easily becomes a monster that is difficult if not impossible to control. If we rein it in and use SQL for its original purpose, querying data, then SQL can fit in nicely with object databases – but the monster it has become does not fit into object database technologies.”

Martin F. Kraft (Software Architect):
“NoSQL is a very interesting topic, though a standardized API like the W3C proposal would be challenging to adopt, and it would be even more challenging for it to outperform native legacy OODBMS (non-SQL) queries.
I see the need to improve SQL performance in object-oriented use, as with J2EE, and I understand that some of the SQL performance implementations for OO, like GemFire, are doing great, but they don’t solve the underlying root cause: the SQL overhead in non-SQL data queries like object traversal and key/value lookup.

So far I would rather use RDBMSs and OODBMSs where they perform best; as Mr. Greene said, “one size does not fit all”. Some object databases provide (slow) SQL interfaces, and NoSQL should not mean no-QL.”

Kenneth Rugg (Progress):
“I was at the “Boston Big Data Summit” a few weeks ago and the NoSQL movement came up as a topic. The moderator, Curt Monash of dbms2.com, had a pretty funny quote on the topic. He said: ‘The NoSQL movement is a lot like the Ron Paul campaign – it consists of people who are dissatisfied with the status quo, whose dissatisfaction has a lot to do with insufficient liberty and/or excessive expenditure, and who otherwise don’t have a whole lot in common with each other.’”

Andy Riebs (Software Engineer at HP):
“Interesting blog! (Stonebraker sounds a bit too much like he’s defending his baby.) Greene’s comments are sensible. Following through on the “one size doesn’t fit all” theme, just how many simple databases are best implemented as flat text files? The “NoSQL” discussions are reminiscent of the old “RISC vs. CISC” arguments. While people usually understood the notion of a simpler instruction set, no one noticed that pipelining and huge register sets were introduced in the same package. Now you can find all those elements in most architectures.
In the same sense, one presumes that many of the good ideas that survive the “NoSQL” debates will, in fact, end up in relational databases. Will that make the resulting products any more or less relational? Some problems will best be resolved with non-relational, non-SQL tools. Best bet for a NoSQL technology that will survive? Harvesting meaningful data from the log files of data centers with 20,000 servers! With a proper MapReduce implementation, it will be a thousand times more effective to distribute the processing to the source of the data and return only pre-processed results to the consuming application, rather than hundreds of gigabytes of raw data. Of course, the real winner will be the one who can implement SQL on top of a MapReduce engine!”
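Riebs’s log-harvesting scenario follows the classic MapReduce shape: each server maps over its own log and ships back only a small partial result, and a reduce step merges the partials. The Python sketch below assumes a hypothetical log format of `METHOD STATUS PATH` per line and counts HTTP status codes; a real deployment would run `map_local` on each server through a framework such as Hadoop rather than in one process.

```python
from collections import Counter
from functools import reduce


def map_local(log_lines):
    """Runs next to the data: count status codes in one server's log."""
    counts = Counter()
    for line in log_lines:
        parts = line.split()
        # Assumed format: METHOD STATUS PATH; skip malformed lines.
        if len(parts) >= 2:
            counts[parts[1]] += 1
    return counts


def reduce_counts(left, right):
    """Merge two partial results; only these small summaries cross the network."""
    return left + right


# Hypothetical logs, one list of lines per server.
logs_by_server = {
    "web1": ["GET 200 /", "GET 404 /missing", "GET 200 /about"],
    "web2": ["POST 500 /api", "GET 200 /"],
}

partials = [map_local(lines) for lines in logs_by_server.values()]
total = reduce(reduce_counts, partials, Counter())
print(total)
```

The raw lines never leave their server in this design; only one small `Counter` per server is sent to the reducer.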

Tobias Downer (MckoiDDB):
“Interesting read. I don’t think you can really nail down exactly what a ‘NoSQL’ technology is. It’s a rejection of the prevailing popular opinion of the last decade, which is that a data management system that doesn’t support SQL is no way to manage ‘real’ data, and it’s a forum for advocates of certain scalable data technologies to promote themselves.
It’s been successful at getting people interested and excited about data systems outside the world of SQL, which I think is really what the aim was. I wouldn’t count on the word being in our lexicon for very long, though, or on any products seriously branding themselves with this label, because these systems are likely to eventually come back around and support SQL-like query languages in the future.”

Warren Davidson (Objectivity):
“This is of interest to me, and should be to anyone interested in market change. Change is always difficult, but in the database world it seems especially so when you see how often an RDBMS is used even though it might be a technically poor decision. Since change is risky, in order for Objectivity, Versant or Progress to get people to adopt their technology, you need momentum and corroborating support. The NoSQL movement lends credence to the first notion of change: one size does not fit all. So having multiple technologies saying the same thing establishes credibility for change to begin, and from there people can start making the right technical choice, which may or may not be an ODBMS. But this is OK; the industry needs the market to embrace a very simple concept of ‘the right database for the right application’. They don’t do that today, but the cloud movement is going in this direction. And the NoSQL movement may help everyone.

As they say, a rising tide lifts all boats. 🙂 ”

Stefan Edlich (Beuth Hochschule):
“There are some databases which are document-oriented and NoSQL, such as CouchDB and SimpleDB. And there are document-oriented ones which are not NoSQL, such as Lotus or Jackrabbit (a really weird system, I think). I think the interesting tools and user group are the NoSQL group, which (hopefully) excludes the latter. So the article you mentioned describes NoSQL with products storing documents as attribute data, and not documents as pure byte/data documents (which Jackrabbit does).”

Daniel Weinreb (previously co-founder Object Design):
“They’re being used not just as caches but as primary stores of data. There’s one called Tokyo Tyrant (and Tokyo Cabinet) that I’ve heard is particularly good.”

William Cook (University of Texas at Austin):
” I think it is important! Facebook is using this style of data store. I’m not sure about the performance implications, but it needs to be studied carefully.”

Raphael Jolly (Databeans):
“That a database could be designed without SQL is not a surprise to me since Databeans has no query language and is meant to be queried in Java (native queries). In addition, I’ll happily believe that “relational databases are tricky to scale”.
However, the subject of extending Databeans with distributed computing capability has been on my mind for a long time and I presently have no idea how it could be done. What is interesting about NoSQL is how they mean to perform queries, i.e. through MapReduce. I don’t know whether everything that can be expressed in SQL is amenable to MapReduce (this is probably not the case), but obviously a fair amount of what is done today on the internet is, the killer app being… search engines.
In summary, I tend to agree with this comment by alphadogg: ‘The main reason this [relegating to niche usage] will happen with the various key-value stores that are now in vogue is that they, like the predecessors, are built to solve a very, very specific issue on high-volume, low-complexity data, which is not the norm. The NoSQL moniker is indicative of a misplaced focus. NO structured query language? Really? We want to go back to chasing pointers and recreating wrappers for access?’
My perception is that even when they literally have no SQL-based queries, object databases are very different from NoSQL technologies as currently understood because, as is clearly explained in your references, there is more to ODBMS than just the query language. Specifically: ACID transaction constraints, which “NoSQL” stores seem to relax quite a bit.
These constraints are difficult to manage in a distributed setting. One has to consider advanced concurrency control techniques. But with a careful design, nothing seems to prevent a fully structured approach.
In this respect, DHTs are clearly limited compared to classical object databases. Recently, I was reading about such an attempt at a distributed design, yet with “strict” transactions (download the book “XSTM: Object Replication for Java, .NET & GWT” as .pdf).”

Steve Graves (McObject):
” I thought alphadogg had good comments, although he has a relational/SQL bias.”

Jonathan Gennick (Apress):
“It is an interesting discussion. I have heard the term “NoSQL”. I did find the comment about relational databases not supporting key/value stores amusing: ‘…and index key/value based data, another key characteristic of “NoSQL” technology. The schema seems very simple but may be challenging to implement in a relational database because the value type is arbitrary.’
In Oracle, one simply needs a table as follows:

CREATE TABLE key_value (
the_key NUMBER,
the_value BLOB);

There you go! Key/value. How much simpler can you get?”
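Gennick’s table can be exercised with Python’s built-in `sqlite3` module. This is a sketch under assumptions: SQLite stands in for Oracle, `INTEGER PRIMARY KEY` is added so each key is unique and indexed, and values are serialized to JSON bytes by hypothetical `put`/`get` helpers. It also shows the trade-off behind the quoted comment: the database stores the value as opaque bytes, so it cannot index or query inside it.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# Same two columns as the Oracle table above; PRIMARY KEY is an added
# assumption so each key is unique and lookups use an index.
conn.execute(
    "CREATE TABLE key_value (the_key INTEGER PRIMARY KEY, the_value BLOB)"
)


def put(key, value):
    # The database only sees opaque bytes; the application owns the format.
    blob = json.dumps(value).encode("utf-8")
    conn.execute(
        "INSERT OR REPLACE INTO key_value (the_key, the_value) VALUES (?, ?)",
        (key, blob),
    )


def get(key):
    row = conn.execute(
        "SELECT the_value FROM key_value WHERE the_key = ?", (key,)
    ).fetchone()
    return None if row is None else json.loads(row[0])


put(1, {"name": "Ada", "tags": ["math", "computing"]})
put(2, [1, 2, 3])
put(2, "replaced")  # same key: INSERT OR REPLACE overwrites
print(get(1), get(2), get(3))
```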

Feb 1 10

Data Stores vs. ODBMSs

by Roberto V. Zicari

I asked Anat Gafni, VP of Engineering at db4objects, for her opinion on how ODBMSs compare with new “data stores”, such as “document stores” and “nosql databases”.

RVZ: Anat, systems such as CouchDB, MongoDB, SimpleDB, Voldemort, Scalaris, etc. provide less functionality than OODBs but a distributed “object” cache over multiple machines. How do they compare with respect to ODBMSs?

Anat: We can categorize “stores” and see if they are more similar or different than OODBs, in a couple of ways:

By each dimension of the purpose of OODBS:

1. persistent (could be accomplished by other methods like: replicating to other machines, using non-volatile caches, etc.)
2. Being queriable
3. Scalable (beyond what can be in cache, but could be distributed instead)
4. Objects vs. Relations

Arguable properties:
5. can express and query based on complex relationships among data items
6. can be shared among multiple “clients”

Many of these other databases are similar to OODBs in items 1, 3 and 4. I am not sure they have capabilities in 3, 5 and 6 above.

There is a lot of interest, in the USA in particular, in internet-based applications and cloud computing. BigTable and similar systems look to me more like an algorithm based on traditional stores than a database.

Anat Gafni has over 20 years of experience in managing software development and product strategy. At db4objects she is responsible for managing engineering and support. Anat earned her Ph.D. in Computer Science from the University of Southern California, and an MA degree from Boston University in Math and Computer Science.

Jan 17 10

Call for Contributions: 3rd International Conference on Objects and Databases 2010 (ICOODB)

by Roberto V. Zicari

I’d like to inform you that the 3rd International Conference on Objects and Databases (ICOODB) will take place September 28-30, 2010, at the Goethe University Frankfurt, in Frankfurt am Main, Germany.

ICOODB 2010 is the third in a series of international conferences aimed at promoting the exchange of information and ideas between members of the objects and databases communities.

A key feature of the conference is its goal to bring together developers, users and researchers. At the same time, the conference aims to meet the needs of the different sub-communities. The conference therefore consists of three different tracks offered as a tutorial & workshop day, an industry day and a research day.

The call for contributions is open. For 2010 we extended the scope of the conference. We recognize that the world of data management is changing. The linkage to service platforms, operation within scalable (cloud) platforms, object-relational bindings, NoSQL databases, and new approaches to concurrency control are all becoming hot topics both in academia and industry. We therefore also encourage papers in any of these areas.

The general topics of interest therefore include, but are not restricted to:

Object Database Modeling and Design
object data models
design of object databases
semantics of object databases

Object Database Frameworks and Software Engineering
application development and application frameworks
software engineering issues
constraint models and mechanisms
event models and mechanisms
object-oriented frameworks for data management
ontologies and object stores

Object Databases
object storage systems
object query languages
transaction management for object databases
access structures and indexing in object databases
distributed object databases
architecture and engineering of object database engines
evaluation of object databases
benchmarks
novel applications of object databases
object databases in education

Integration of Object-Oriented Programming Languages with databases
Integrating objects and databases
Novel design of Programming Languages for objects and databases

Object/Relational mappings
Novel design and/or Implementation of O/R Systems.
Benchmarks

Cloud Data Stores
Novel design and/or Implementation of Cloud Data Stores
Novel Application of Cloud Data Stores
Benchmarks

Document Stores
Novel design and/or Implementation of Document Stores
Novel Application of Document Stores

NoSQL Databases
Novel design and/or Implementation of NoSQL Databases
Novel Application of NoSQL Databases
Benchmarks

Here are the links to the call for contributions:

Call for Research Papers.

Call for Industry Presentations.

Call for Tutorial Proposals.

Jan 12 10

Rick Cattell on “Relational Databases, Object Databases, Key-Value Stores, Document Stores, and Extensible Record Stores: A Comparison.”

by Roberto V. Zicari

What’s new at ODBMS.ORG in 2010?

I have extended the focus of ODBMS.ORG to include, besides object database technologies, new developments in data management, such as the linkage to service platforms, operation within scalable (cloud) platforms, object-relational bindings, NoSQL databases and new approaches to concurrency control.

In 2010, ODBMS.ORG will offer educational resources in all of these areas.

I have just published a new expert article by Rick Cattell on this topic:
“Relational Databases, Object Databases, Key-Value Stores, Document Stores, and Extensible Record Stores: A Comparison”.

Rick Cattell, formerly at Sun Microsystems, and co-creator of JDBC, and chair of the Object Data Management Group (ODMG), explains: “Traditionally, the obvious platform for most database applications has been a relational DBMS. You might use a specialized parallel relational DBMS if you required high throughput for “data warehousing”, or an object database system if your application had unusual functionality or performance requirements, e.g. for in-memory caching or fast relationship traversal. However, an RDBMS like Oracle or MySQL has usually been the answer. This has changed somewhat recently.

There is now recognition in database research that “one size does not fit all”, for example in the widely-referenced paper by Stonebraker and colleagues.

And in the Web 2.0 industry, many companies have abandoned traditional RDBMSs for so-called “NoSQL” data stores that provide much higher scalability, or they have built a distributed caching layer on top of RDBMSs. More scalable RDBMSs are also coming to market,” says Cattell.

Rick's article “Relational Databases, Object Databases, Key-Value Stores, Document Stores, and Extensible Record Stores: A Comparison” is available for free download as a PDF.
It is worth reading.

In fact, we already started looking at new developments in data management, such as NoSQL databases and cloud stores, last year.
An example is the article “On NoSQL technologies. Part I” (PDF) which presents interviews on new data stores with Patrick Linskey, Robert Charles Greene, Kaj Arno and Giuseppe Maxia.

RVZ

Jan 4 10

“NoSQL technologies” interview with John Clapperton

by Roberto V. Zicari

Happy New Year!

We start the new year with a topic we already covered in 2009: “NoSQL technologies”.

This time I asked John Clapperton for his opinion. John Clapperton BSc CEng MBCS CITP is proprietor and author of the ‘VOSS’ virtual object storage system, which extends Smalltalk with integrated database management, providing transparent access and transaction processing of persistent, versioned Smalltalk objects. Previously, John worked on database applications and research at Standard Telephones & Cables, Unilever Research, Acorn Computers and Deductive Systems.

RVZ: John, are object databases “NoSQL” technologies?

John Clapperton: The absence of SQL in “NoSQL” databases is less an a priori choice than a consequence of their simplified schema capability, imposed in the interests of higher performance, being unable to support the full set of SQL language constructs. It does not follow, therefore, that object databases from which SQL has been excluded for the opposite reason, as a language unable to address their more general representational capabilities, should be automatically included in the NoSQL classification.

A person thinking of adopting a NoSQL database will have certain capabilities in mind, so the question is really “Might an object database have ‘NoSQL’ capabilities?”

These include:

1) Data partition (which is application dependent).
2) Optimistic locking (which helps only if most accesses are read-only).
3) Relaxation of ACID transaction rules by:

a) Data replication with eventual consistency, and/or

b) Suppression of transaction logging and/or flushing, and/or

c) Data storage in fast but volatile memory, sacrificing durability.
4) Fast navigational access to arbitrary data structures.

and in principle, an object database is capable of any or all of these.
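To make item 2 a little more concrete, here is a minimal sketch of optimistic locking with per-object version numbers. It assumes a simple in-memory store; the `VersionedStore` class, its methods and the `order:1` key are hypothetical illustrations, not part of VOSS or any other product mentioned here.

```python
import threading


class ConflictError(Exception):
    """Raised when an object was changed by someone else since it was read."""


class VersionedStore:
    """Hypothetical in-memory store: each key maps to a (version, value) pair."""

    def __init__(self):
        self._data = {}
        # Guards only the short validate-and-write step, never a whole transaction.
        self._lock = threading.Lock()

    def read(self, key):
        # Readers hold no lock while they work; they just remember the version they saw.
        with self._lock:
            return self._data.get(key, (0, None))

    def commit(self, key, expected_version, new_value):
        # Validate at commit time: succeed only if nobody committed in between.
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise ConflictError(
                    f"{key!r}: expected v{expected_version}, found v{current_version}"
                )
            self._data[key] = (current_version + 1, new_value)
            return current_version + 1


store = VersionedStore()
v, _ = store.read("order:1")
store.commit("order:1", v, {"status": "new"})  # first writer wins, version becomes 1
try:
    store.commit("order:1", v, {"status": "stale"})  # second writer still holds version 0
except ConflictError as e:
    print("retry needed:", e)
```

This reflects Clapperton's caveat: when conflicting writes are rare, validation almost always succeeds and no locks are held while the application works; when many transactions update the same objects, they keep failing validation and retrying.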

The characteristics of an object database are its ability to manage arbitrarily complex object structures and to represent relationships by explicit named references. These have the potential for better performance by, respectively, reducing the required number of file writes for (de-normalised) data structures, and fast navigation of direct references instead of relational joins. However, against that must be set the cost of serialising arbitrary object structures for durable storage and instantiating the same on retrieval, compared with the simpler handling of pre-defined rows in a relational database.
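A rough illustration of the contrast drawn above, using plain Python objects: in an object graph, a relationship is a direct reference that is simply followed, while a relational-style layout stores foreign keys and recovers the same relationship by matching values in a join. The classes and data are hypothetical, and the nested-loop join is deliberately naive; real relational engines use indexes or hash joins.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    name: str


@dataclass
class Order:
    order_id: int
    customer: Customer  # explicit named reference to another object


# Object style: the relationship *is* the reference.
alice = Customer("Alice")
orders = [Order(1, alice), Order(2, alice)]
print(orders[0].customer.name)  # one dereference, no lookup by key

# Relational style: rows hold foreign keys; the relationship is rebuilt by a join.
customer_rows = [{"customer_id": 10, "name": "Alice"}]
order_rows = [
    {"order_id": 1, "customer_id": 10},
    {"order_id": 2, "customer_id": 10},
]


def join_orders_customers(order_rows, customer_rows):
    """Naive nested-loop join on customer_id (illustrative only)."""
    return [
        {**o, "name": c["name"]}
        for o in order_rows
        for c in customer_rows
        if o["customer_id"] == c["customer_id"]
    ]


joined = join_orders_customers(order_rows, customer_rows)
print(joined)
```

What the sketch leaves out is the other half of Clapperton's trade-off: the cost of serialising such object graphs for durable storage and re-instantiating them on retrieval.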

Normalisation of behaviour, encapsulated in class definitions in language-persistence ODBMSs such as Logic Arts’ VOSS for Smalltalk, reducing implicit replication in application programs and queries, may have an advantage in NoSQL applications, but it’s not clear to me how significant that might be, given that its benefit is in managing complexity whereas NoSQL applications tend to be simpler.

Dec 17 09

“Nonschematic” databases.

by Roberto V. Zicari

Carl Olofson, Research Vice President, Database Management and Data Integration Software Research at IDC, sent me this note, in which he discusses the term “NoSQL” in relation to object databases.

RVZ

Carl Olofson:
I would shy away from this term. A number of analysts (including myself) consider it a somewhat sloppy term intended to convey a certain spirit of rebellion. It actually derives from the core idea that the so-called “No-SQL” databases do not require schemas, and since most DBMSs are relational, it is simpler to say “NoSQL” than the more obscure “NoDDL”.
In fact, an OODBMS does require a schema, and the data structure, which is tied to the application object model, is key to how it operates, and especially to its transparent operational nature. The so-called “NoSQL” database, which I call a “nonschematic” database, is one that requires no schema to be defined before data is loaded. One usually does define a schema afterward, through a process of data discovery and definition. If you know of an OODBMS that can accept undefined data, and allow schema definition after the fact, that could qualify. Otherwise, I would shy away from the term altogether.
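To illustrate Olofson's “nonschematic” idea, here is a minimal sketch: records are loaded without any schema declared in advance, and a schema is derived afterwards by scanning the data. The JSON records and the field-to-type-set representation of the discovered schema are assumptions made for illustration; they do not describe how any particular product performs data discovery.

```python
import json
from collections import defaultdict

# Records loaded as-is: no DDL was run before they arrived.
raw = [
    '{"id": 1, "name": "widget", "price": 9.5}',
    '{"id": 2, "name": "gadget", "tags": ["new"]}',
    '{"id": "3", "name": "gizmo", "price": 12}',
]
records = [json.loads(line) for line in raw]


def discover_schema(records):
    """Return ({field: set of observed type names}, {field: number of records containing it})."""
    types = defaultdict(set)
    counts = defaultdict(int)
    for rec in records:
        for field, value in rec.items():
            types[field].add(type(value).__name__)
            counts[field] += 1
    return dict(types), dict(counts)


schema, presence = discover_schema(records)
for field in sorted(schema):
    status = "required" if presence[field] == len(records) else "optional"
    print(field, sorted(schema[field]), status)
```

The discovered schema surfaces what a schema declared up front would have rejected at load time: `id` arrives as both an integer and a string, and `price` and `tags` appear only in some records.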

Dec 2 09

Are object databases “NoSQL” technologies? Part II

by Roberto V. Zicari

I asked another ODBMS vendor for their opinion on the topic of “NoSQL databases”: Luis Ramos, Principal Systems Engineer at Progress Software.

RVZ: Luis, how do you position yourself with respect to the so-called “NoSQL” databases?

Luis Ramos: We view many of the characteristics of the growing “NoSQL” movement as a market reaction to the realities of present day cloud-based data requirements, where ACID properties are not as important as performance, the bulk of the data’s schema is not as complex, and the corresponding queries are relatively simple. Gone could be the days of complex relational schemas and the DBAs that are needed to maintain and administer them. Similar phenomena have been seen in other areas. For example in programming languages, the reaction against the very complex and error prone C++ led to the popularity of Java.

In many respects, object databases can be classified as “NoSQL” technology. They satisfy many of the pivotal characteristics of today’s “NoSQL” data stores. Object databases have been around since the late 1980s, initially in response to the needs and requirements of the CAD market. At that time, CAD practitioners needed an approach to data management that was fundamentally different from that provided by relational databases. Consequently, a whole new breed of non-relational (object-oriented) databases emerged. Customers from other markets, whose requirements could not be met by SQL databases, followed. Call it the original “NoSQL” movement? We certainly agree with Robert Greene’s stipulation that “one size does not fit all.”
An alternative way to put it is “You can put lipstick on a ‘relational table’ but it's still a ‘relational table’”.

The schema-free characteristic that one finds in many “NoSQL” technologies is not entirely new. It was a requirement of many eCommerce applications developed in the 90s. There are object databases that support this nicely, enabling applications to store, manage, and index key/value based data, another key characteristic of “NoSQL” technology. The schema seems very simple but may be challenging to implement in a relational database because the value type is arbitrary.

Horizontal scaling is another key requirement that object databases support more easily. Multi-terabyte databases have been successfully deployed. These object database systems have a client-centric (rather than a server-centric) architecture. Data is distributed to the client and queries are performed on the client instead of on monolithic servers. Consequently, the data can be partitioned, replicated, and scaled much more easily without being tied down to the hardware limitations of a single server computer.

So indeed, object database systems could be considered “NoSQL” technologies. They can be used as a persistent data store as well as a cache.

Nov 25 09

On the evolution of “non-relational databases”.

by Roberto V. Zicari

What is the opinion of the relational database community on the so-called NoSQL databases?
I asked Giuseppe Maxia, MySQL Community Team Lead. Giuseppe is a system analyst with 20 years of IT experience; he has worked as a database consultant and designer for several years.

RVZ

RVZ: Why NoSQL databases?

Giuseppe Maxia:
The evolution of non-relational databases (NRDB: I prefer this name to no-SQL) is rightfully puzzling. Their usefulness and efficiency are difficult to quantify in general terms, and a comparison with relational database systems is far from objective.
There are cases where you can easily demonstrate that NRDB scale better than their relational counterpart. But only with a lot of ifs and buts.
Basically, the highest traffic web sites such as Facebook or Digg can’t live with a database alone. There are two factors that limit their simple adoption of a relational schema:

1) the high traffic requires that the same values are fetched many times from the database. This repeated fetching becomes a bottleneck. To overcome this limitation, there are auxiliary servers, such as memcached, which keep the most requested items in a fast network of in-memory storage systems. For all practical purposes, this technology converts the majority of the data into a series of key-value records.

2) when a site reaches a high number of registered users (or a high number of items to trade), a single server can’t contain the database anymore. There is no way of fitting 300 million Facebook users into a single server. Thus, they do “sharding”, i.e. a logical split of the data into tables, databases, and remote servers. With such organization, the relational model is conceptually broken, and the data looks more and more like a collection of key-value sets.
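The two techniques Maxia describes can be sketched together: a cache-aside lookup in front of the database (the role memcached plays) and hash-based routing of each user to one of several shards. This is a minimal in-process sketch under stated assumptions: plain dicts stand in for memcached and for the database servers, and `NUM_SHARDS`, the `user:<id>` key scheme and the helper functions are hypothetical.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical number of database servers

# Each dict stands in for one database server holding a slice of the users.
shards = [dict() for _ in range(NUM_SHARDS)]
cache = {}  # stands in for memcached: key -> value, kept in memory


def shard_for(user_id: int) -> int:
    """Route a key to a shard with a hash that is stable across processes."""
    digest = hashlib.sha256(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS


def save_user(user_id: int, profile: dict) -> None:
    shards[shard_for(user_id)][user_id] = profile
    cache.pop(f"user:{user_id}", None)  # invalidate so the next read sees the new value


def load_user(user_id: int):
    key = f"user:{user_id}"
    if key in cache:  # cache hit: no database access at all
        return cache[key]
    profile = shards[shard_for(user_id)].get(user_id)  # cache miss: ask the owning shard
    if profile is not None:
        cache[key] = profile
    return profile


for uid in range(1, 9):
    save_user(uid, {"name": f"user{uid}"})
print(load_user(5), [len(s) for s in shards])
```

Both techniques reduce data access to lookups by key, which is Maxia's point: a query such as “all users in a given city” now has to visit every shard, and no single relational schema spans the whole data set. Note also that simple modulo routing relocates most keys whenever `NUM_SHARDS` changes; consistent hashing is a common way to limit that.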

In both the above cases, you see that there is a trend to converting the relational data into key-values. The administrators of such sites start asking themselves why they keep bearing the burden of a relational database overhead since they can’t have its main advantage, namely the precise and mathematically proven organization of data. In this scenario, the key-value databases are becoming popular among those users who are forced to break relational integrity.

Add to it the large number of developers who never managed to understand the relational model, and you can explain why the non-relational database systems are gaining momentum. The drawback is that NRDB can’t retain meaningful metadata information, or, if they do, they achieve it through internal extension to the key-value model that is not easily exportable.
The immediate effect of the above points is that more and more systems that are based on non-relational storage are now entirely dependent on the application that uses them, a situation that brings us back to the COBOL times. This kind of storage is convenient only for either simple applications or for organizations that can afford to employ a large number of developers to cope with the increased complexity of the applications. For the rest of us, relational databases are still the best way of storing data.

Cheers

Giuseppe

Nov 24 09

Are object databases “NoSQL” technologies?

by Roberto V. Zicari

This time, I wanted to ask the opinion of an ODBMS vendor on the topic of “NoSQL databases”. I therefore asked Robert Charles Greene, V.P. Open Source Operations, at Versant Corporation.

RVZ: Robert, you represent an ODBMS vendor, what is your opinion of the so called “NoSQL databases”? Are object databases “NoSQL” technologies?

Robert Charles Greene:
I find that lots of folks are getting all worked up over the so-called “No SQL” movement. I guess it's because one can easily make assumptions and draw a would-be obvious analogy to a “No Relational” movement, and that would certainly be something to get worked up over.

As the object database guy, I see the core message being conveyed as “one size does not fit all” when it comes to data management. That's a far cry from abandoning the SQL approach to data management and in my mind leaves little to defend, though some seem to feel threatened enough by the catch phrase to sound the alarm.

In some sense, this notion that “one size does not fit all” is an important change in attitude, because for many years the “one size fits all” view was prevalent. Only as the internet opened up to the masses, and large-scale concurrency and data generation ushered in a new era, has the relational way of doing data management truly begun to break down, opening the door to alternatives.

The “right tool for the job” has once again become a mantra of the software development community and, equally important, the mantra of the decision makers in Enterprise I.T. As evidence, one has to look no further than the proliferation of data warehousing solutions outside the realm of relational database technology, ironically, to support ad hoc queries and analytics, the founding pillars that brought the relational database such high esteem in the past. Indeed, necessity may well be the mother of invention; if not, it would most certainly be the father of adoption. So, if the RDB is no longer the king of query, what is there to get all worked up about if necessity drives adoption in even more directions?

So, what is this NoSQL movement all about, and does it warrant the public espousal of opinions? Well, as stated above, this is an important change in attitude which will bring valuable choices to our industry, making us better equipped to deal with today's infrastructure challenges, so yes, it is indeed worth discussing.

Michael Stonebraker decided it was important to comment on this “movement” and gave an interesting NoSQL perspective here (courtesy of ACM).

I largely agree with the technical elements of his perspective, though I would suggest as in the above, the slightly different perspective that the core message is, “one size does not fit all”. I encourage the reader to then keep this in mind as they engage in a broader understanding of what these exciting new technologies provide.

Also, it is worth pointing out, while many of the technologies involved in the NoSQL movement do sacrifice ACID as a means to achieve their end in both performance and scalability, most object databases are ACID compliant and one might argue are the original NoSQL movement.

But let's not digress: as even Michael asserts, the NoSQL movement is not about SQL. So, while object databases are by and large “NoSQL” technologies, they are not a kind of query-less technology. Indeed, while today's modern object databases embrace the requirement for distributed parallel query processing, they also hold true to the core tenets of large-scale distribution, object clustering and parallel processing, all in the context of an ACID-compliant transaction. These features surround a robust environment for dealing with arbitrarily complex object models, an area in which many of the NoSQL movement participants fall short.

In summary, the “one size does not fit all” change in attitude is healthy and beneficial for all.
To that end, the object database, a continuing NoSQL movement participant, is one more tool in the developer's tool chest, enabling successful implementation of complex software systems at scale.

Cheers,
-Robert

Nov 23 09

Kaj Arnö and Michael Stonebraker on “NoSQL databases”

by Roberto V. Zicari

This time, I asked Kaj Arnö (MySQL) what he thinks of “NoSQL databases”. Read his reply below.

RVZ

RVZ: What is your opinion of the so called “NoSQL databases”?

Kaj Arnö:
NoSQL is a catchy name, which in char(5) captures a lot of thinking.
To be technical, it’s not merely about removing SQL, but about removing most relational database overhead (where SQL, although dominant, is just an implementation of a query language). And some of that overhead is clearly not necessary all the time. It’s a lot of protocol to implement all aspects of ACID compliance, and it isn’t always needed. Especially in the early days of MySQL, we were accused of cutting corners, for instance through MyISAM not being fully ACID. Still, MyISAM was used a lot, and it still is. Coming back to the NoSQL debate, I would say that the MySQL idea of cutting overhead is gaining traction in other tools, which may choose to cut larger chunks or different corners. That’s a healthy development, since the shortcuts to be taken depend on the class of application.

Kaj

Kaj joined MySQL in 2001, after 14 years as an entrepreneur. Serving as VP Services, VP Engineering and in other executive roles at MySQL, he has been the VP in charge of MySQL Community Relations since 2005, continuing in that position at Sun Microsystems. A native of Finland, Kaj has lived in Munich since 2006. He devotes his free time to launching Runnism, the Religion of Running.

Moreover, Professor Michael Stonebraker recently published a post on “No SQL” databases and their performance compared with classical relational database systems.

In his post, titled “The “NoSQL” Discussion has Nothing to Do With SQL”, Prof. Stonebraker argues that “blinding performance depends on removing overhead. Such overhead has nothing to do with SQL, but instead revolves around traditional implementations of ACID transactions, multi-threading, and disk management. To go wildly faster, one must remove all four sources of overhead, discussed above. This is possible in either a SQL context or some other context.”
The link to Stonebraker's blog post (courtesy of ACM).

I also published an article by David Chappell: “Introducing Windows Azure”. The paper describes Microsoft's Windows Azure. In fact, the “Tables” abstraction in Windows Azure is similar to some “nosql databases”. You can download the paper (PDF) here.
