
"Trends and Information on AI, Big Data, Data Science, New Data Management Technologies, and Innovation."


Nov 21 12

Two cons against NoSQL. Part II.

by Roberto V. Zicari

This post is the second part of a series of feedback I received from various experts, with obviously different points of view, on:

Two cons against NoSQL data stores:

Cons1. "It's very hard to move data out from one NoSQL system to some other system, even another NoSQL system. There is a very hard lock-in when it comes to NoSQL. If you ever have to move to another database, you basically have to re-implement a lot of your applications from scratch."

Cons2. "There is no standard way to access a NoSQL data store. All tools that already exist for SQL have to be recreated for each of the NoSQL databases. This means that it will always be harder to access data in NoSQL than in SQL. For example, how many NoSQL databases can export their data to Excel? (Something every CEO wants to get sooner or later)."

You can also read Part I here.

RVZ
————

J. Chris Anderson, Couchbase, cofounder and Mobile Architect:
On Cons1: JSON is the de facto format for APIs these days. I've found moving between NoSQL stores to be very simple, just a matter of swapping out a driver. It is also usually quite simple to write a small script to migrate the data between databases. Now, there aren't pre-packaged tools for this yet, but it's typically one page of code to do. There is some truth that if you've tightly bound your application to a particular query capability, there might be some more work involved, but at least you don't have to redo your stored data structures.
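As an illustration of how small such a script can be, here is a minimal sketch assuming a MongoDB source and a CouchDB target on their default local ports; the database, collection, and file names are invented for the example and are not taken from the interview.

```python
# A minimal sketch of the "one page of code" migration described above:
# dump every document from a MongoDB collection to newline-delimited JSON,
# then bulk-load the file into CouchDB over its HTTP API.
# Database names, URLs, and field handling are illustrative assumptions.
import json

import requests                      # plain HTTP client for CouchDB
from pymongo import MongoClient      # official MongoDB driver

SOURCE_DB, SOURCE_COLLECTION = "appdb", "users"        # hypothetical names
TARGET_COUCH_URL = "http://localhost:5984/users"       # hypothetical target

def dump_to_jsonl(path="users.jsonl"):
    """Write each source document as one JSON object per line."""
    coll = MongoClient()[SOURCE_DB][SOURCE_COLLECTION]
    with open(path, "w") as out:
        for doc in coll.find():
            doc["_id"] = str(doc["_id"])   # ObjectId is not JSON-serializable
            out.write(json.dumps(doc, default=str) + "\n")
    return path

def load_into_couchdb(path):
    """Bulk-insert the dumped documents using CouchDB's _bulk_docs endpoint."""
    docs = [json.loads(line) for line in open(path)]
    requests.put(TARGET_COUCH_URL)     # create the database if it is missing
    resp = requests.post(TARGET_COUCH_URL + "/_bulk_docs", json={"docs": docs})
    resp.raise_for_status()

if __name__ == "__main__":
    load_into_couchdb(dump_to_jsonl())
```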

On Cons2: I'm from the "it's a simple matter of programming" school of thought, e.g. just write the query you need, and a little script to turn it into CSV. If you want to do all of this without writing code, then of course the industry isn't as mature as RDBMS. It's only a few years old, not decades. But this isn't a permanent structural issue, it's just an artifact of the relative freshness of NoSQL.
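A hedged sketch of that kind of "little script": take the documents a query returns, whatever store they came from, and flatten them into a CSV file Excel can open. The field names and sample data are hypothetical.

```python
# A sketch of the "little script" idea: write one CSV row per document,
# keeping only the requested fields. The fields and sample rows are invented.
import csv

def docs_to_csv(docs, fields, path="report.csv"):
    """Flatten an iterable of dicts into a CSV file Excel can open."""
    with open(path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        for doc in docs:
            writer.writerow({f: doc.get(f, "") for f in fields})

# Example with an in-memory result set; in practice `docs` would be the cursor
# returned by whatever NoSQL driver the application already uses.
docs_to_csv([{"name": "Acme", "country": "DE", "revenue": 1200},
             {"name": "Globex", "country": "US", "revenue": 3400}],
            fields=["name", "country", "revenue"])
```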

—-
Marko Rodriguez, Aurelius, cofounder:
On Cons1: NoSQL is not a mirror image of SQL. SQL databases such as MySQL, Oracle, PostgreSQL all share the same data model (table) and query language (SQL).
In the NoSQL space, there are numerous data models (e.g. key/value, document, graph) and numerous query languages (e.g. JavaScript MapReduce, Mongo Query Language, Gremlin).
As such, the vendor lock-in comment should be directed to a particular type of NoSQL system, not to NoSQL in general. Next, in the graph space, there are efforts to mitigate vendor lock-in. TinkerPop provides a common set of interfaces that allow the various graph computing technologies to work together and it allows developers to "plug and play" the underlying technologies as needed. In this way, for instance, TinkerPop's Gremlin traversal language works for graph systems such as OrientDB, Neo4j, Titan, InfiniteGraph, RDF Sail stores, and DEX, to name several.
To stress the point, with TinkerPop and graphs, there is no need to re-implement an application as any graph vendor’s technology is interoperable and swappable.
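As an illustration of the "plug and play" idea, here is a sketch using TinkerPop's later Python variant (gremlinpython) rather than the Java/Blueprints stack described above; the traversal stays the same and only the connection URL points at a different TinkerPop-enabled backend. The server URL and property names are assumptions.

```python
# Illustrative only: the traversal below is unchanged whether the server
# behind the URL is backed by one TinkerPop-enabled graph store or another.
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

def friends_of(name, gremlin_url="ws://localhost:8182/gremlin"):
    """Return the names of the people the given person 'knows'."""
    conn = DriverRemoteConnection(gremlin_url, "g")   # swap backends here
    try:
        g = traversal().withRemote(conn)
        return g.V().has("person", "name", name).out("knows").values("name").toList()
    finally:
        conn.close()

print(friends_of("marko"))   # same code against any TinkerPop-compatible graph
```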

On Cons2: Much of the argument above holds for this comment as well. Again, one should not see NoSQL as the space, but the particular subset of NoSQL (by data model) as the space to compare against SQL (table). In support of SQL, SQL and the respective databases have been around for much longer than most of the technologies in the NoSQL space. This means they have had longer to integrate into popular data workflows (e.g. Ruby on Rails). However, it does not mean “that it will always be harder to access data in NoSQL than from SQL.” New technologies emerge, they find their footing within the new generation of technologies (e.g. Hadoop) and novel ways of processing/exporting/understanding data emerge. If SQL was the end of the story, it would have been the end of the story.

—–
David Webber, Oracle, Information Technologist:
On Cons1: Well, it makes sense. Of course, it depends on what you are using the NoSQL store for – if it is a niche application – or an innovative solution – then a "one-off" may not be an issue for you. Do you really see people using NoSQL as their primary data store? As with any technology – knowing when to apply it successfully is always the key. And these aspects of portability help inform when NoSQL is appropriate. There are obviously more criteria as well that people should reference to understand when NoSQL would be suitable for their particular application. The good news is that there are solid and stable choices available should they decide NoSQL is their appropriate option. BTW – in the early days of SQL too – even with the ANSI standard – it was a devil to port across SQL implementations – not just syntax, but performance and technique issues – I know – I did three such projects!

—-
Wiqar Chaudry, NuoDB, Tech Evangelist:
On Cons1: The answer to the first scenario is relatively straightforward. There are many APIs, like REST, and third-party ETL tools that now support popular NoSQL databases. The right way to think about this is to put yourself in the shoes of multiple different users. If you are a developer then it should be relatively simple, and if you are a non-developer then it comes down to what third-party tools you have access to and those with which you are familiar. However, re-educating yourself to migrate can be time-consuming if you have never used these tools. In terms of migrating applications from one NoSQL technology to another, this is largely dependent on how well the data access layer has been abstracted from the physical database. Unfortunately, since there is limited or no support for ORM technologies, this can indeed be a daunting task.
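That abstraction point can be made concrete with a small repository-style sketch: the application codes against one interface, and only the adapter behind it changes when the store does. The class names, connection details, and the MongoDB/CouchDB pairing are hypothetical.

```python
# A sketch of abstracting the data access layer: the application talks to a
# small repository interface, and swapping stores means swapping the adapter.
from abc import ABC, abstractmethod

class UserRepository(ABC):
    """What the application is allowed to know about user storage."""
    @abstractmethod
    def get(self, user_id): ...
    @abstractmethod
    def save(self, user_id, data): ...

class MongoUserRepository(UserRepository):
    def __init__(self, uri="mongodb://localhost:27017"):
        from pymongo import MongoClient
        self._coll = MongoClient(uri)["appdb"]["users"]

    def get(self, user_id):
        return self._coll.find_one({"_id": user_id})

    def save(self, user_id, data):
        self._coll.replace_one({"_id": user_id}, dict(data, _id=user_id), upsert=True)

class CouchDBUserRepository(UserRepository):
    def __init__(self, url="http://localhost:5984/users"):
        import requests
        self._url, self._http = url, requests

    def get(self, user_id):
        r = self._http.get(f"{self._url}/{user_id}")
        return r.json() if r.ok else None

    def save(self, user_id, data):
        # (updating an existing CouchDB doc would also need its _rev; omitted here)
        self._http.put(f"{self._url}/{user_id}", json=data).raise_for_status()

# The application depends only on UserRepository; changing databases means
# constructing a different adapter, not rewriting application code.
def greet(repo: UserRepository, user_id):
    user = repo.get(user_id)
    return f"Hello, {user['name']}" if user else "Unknown user"
```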

On Cons2: This is a fair assessment of NoSQL. It is limited when it comes to third-party tools and integration. So you will be spending time doing custom design.
However, it's also important to note that the NoSQL movement was really born out of necessity. For example, technologies such as Cassandra were designed by private companies to solve a specific problem that a particular company was facing. Then the industry saw what NoSQL could do and everyone tried to adopt the technology as a general-purpose database. With that said, what many NoSQL companies have ignored is the tremendous opportunity to take from SQL-based technologies the goodness that is applicable to 21st century database needs.

—–
Robert Greene, Versant, Vice President, Technology:
On Cons1: Yes, I agree that this is difficult with most NoSQL solutions and that is a problem for adopters.
Versant has taken the position of trying to be first to deliver enterprise connectivity and standards into the NoSQL community. Of course, we can only take that so far, because many of the concepts that make NoSQL attractive to adopters simply do not have an equivalent in the relational database world. For example, horizontal scale-out capabilities are only loosely defined for relational technologies, but certainly not standardized. Specifically in terms of moving data in/out of other systems, Versant has developed a connector for the Talend Open Studio, which has connectivity to over 400 relational and non-relational data sources, making it easy to move data in and out of Versant depending on your needs. For the case of Excel, while it is certainly not our fastest interface, having recognized the need for data access from accepted tools, Versant has developed ODBC/JDBC capabilities which can be used to get data from Versant databases into things like Excel, Toad, etc.

On Cons2: Yes, I also agree that this is a problem for most NoSQL solutions, and again Versant is moving to bring better standards-based programming APIs to the NoSQL community. For example, in our Java language interface, we support JPA (Java Persistence API), which is the interface application developers get whenever they download the Java SDK. They can create an application using JPA and execute against a Versant NoSQL database without implementing any relational mapping annotations or XML.
Versant thinks this is a great low-risk way for enterprise developers to test out the benefits of NoSQL with limited risk. For example, if Versant does not perform much faster than the relational databases, run on much less hardware, and scale out effectively to multiple commodity servers, then they can simply take Hibernate or OpenJPA, EclipseLink, etc., drop it into place, do the mapping exercise and then point it at their relational database with nothing lost in productivity.
In the .NET world, we have an internal implementation that supports LINQ and will be made available in the near future to interested developers. We are also supporting other standards in the area of production management, having SNMP capabilities so we can be integrated into tools like OpenView and others where IT folks can get a unified view of all their production systems.

I think we as an engineering discipline should not forget our lessons learned in the early 2000s. Some pretty smart people helped many realize that what is important is the life cycle of your application objects, some of which are persistent, and that what is important is providing the appropriate abstraction for things like transaction demarcation, caching activation, state tracking (new, changed, deleted), etc. These are all features common to any application, and developers can easily abstract them away to be database-implementation independent, just like we did in the ORM days. It's what we do as good software engineers: find the right abstractions and refine and reuse them over time. It is important that the NoSQL vendors embrace such an approach to ease the development burden of the practitioners that will use the technology.

—-
Jason Hunter, MarkLogic, Chief Architect:
On Cons1: When choosing a database, being future-proof is definitely something to consider. You never know where requirements will take you or what future technologies you’ll want to leverage. You don’t want your data locked into a proprietary format that’s going to paint you into a corner and reduce your options. That’s why MarkLogic chose XML (and now JSON also) as its internal data format. It’s an international standard. It’s plain-text, human readable, fully internationalized, widely deployed, and supported by thousands upon thousands of products. Customers choose MarkLogic for several reasons, but a key reason is that the underlying XML data format will still be understood and supported by vendors decades in the future. Furthermore, I think the first sentence above could be restated, “It’s very hard to move the data out from one SQL to some other system, even other SQL.” Ask anyone who’s tried!

On Cons2: People aren't choosing NoSQL databases because they're unhappy with the SQL language. They're picking them because NoSQL databases provide a combination of feature, performance, cost, and flexibility advantages. Customers don't pick MarkLogic to run away from SQL; they pick MarkLogic because they want the advantages of a document store, the power of integrated text search, the easy scaling and cost savings of a shared-nothing architecture, and the enterprise reliability of a mature product. Yes, there's a use case for exporting data to Excel. That's why MarkLogic has a SQL interface as well as REST and Java interfaces. The SQL interface isn't the only interface, nor is it the most powerful (it limits MarkLogic to the subset of functionality expressible in SQL), but it provides an integration path.

Related Posts
Two Cons against NoSQL. Part I. by Roberto V. Zicari on October 30, 2012

Resources
NoSQL Data Stores:
Blog Posts | Free Software | Articles, Papers, Presentations| Documentations, Tutorials, Lecture Notes | PhD and Master Thesis

##

Nov 12 12

Big Data Analytics– Interview with Duncan Ross

by Roberto V. Zicari

“The biggest technical challenge is actually the separation of the technology from the business use! Too often people are making the assumption that big data is synonymous with Hadoop, and any time that technology leads business things become difficult.” –Duncan Ross.

I asked Duncan Ross (Director Data Science, EMEA, Teradata), what is in his opinion the current status of Big Data Analytics industry.

RVZ

Q1. What is in your opinion the current status of Big Data Analytics Industry?

Duncan Ross: The industry is still in an immature state, dominated by a single technology whilst at the same time experiencing an explosion of different
technological solutions. Many of the technologies are far from robust or enterprise ready, often requiring significant technical skills to support the software even before analysis is attempted.
At the same time there is a clear shortage of analytical experience to take advantage of the new data. Nevertheless the potential value is becoming increasingly clear.

Q2. What are the main technical challenges for Big Data analytics?

Duncan Ross: The biggest technical challenge is actually the separation of the technology from the business use! Too often people are making the assumption that big data is synonymous with Hadoop, and any time that technology leads business, things become difficult. Part of the problem is the difficulty of use that comes with the technology. It's reminiscent of the command line technologies of the 70s – it wasn't until the GUI became popular that computing could take off.

Q3. Is BI really evolving into Consumer Intelligence? Could you give us some examples of existing use cases?

Duncan Ross: BI and big data analytics are far more than just Consumer Intelligence. Already more than 50% of IP traffic is non-human, and M2M will become increasingly important. From the connected vehicle alone we're already seeing behaviour-based insurance pricing and condition-based maintenance. Individual movement patterns are being used to detect the early onset of illness.
New measures of voice of the customer are allowing companies to reach out beyond their internal data to understand the motivations and influence of their customers. We’re also seeing the growth of data philanthropy, with these approaches being used to benefit charities and not-for-profits.

Q4. How do you handle large volume of data? When dealing with petabytes of data, how do you ensure scalability and performance?

Duncan Ross: Teradata has years of experience dealing with petabyte-scale data. The core of both our traditional platform and the Teradata Aster big data platform is a shared-nothing MPP system with a track record of proven linear scalability. For low information density data we provide a direct link to HDFS and work with partners such as Hortonworks.

Q5. How do you analyze structured data; semi-structured data, and unstructured data at scale?

Duncan Ross: The Teradata Aster technology combines the functionality of MapReduce within the well understood framework of ANSI SQL, allowing complex programmatic analysis to sit alongside more traditional data mining techniques. Many MapReduce functions have been simplified (from the users' perspective) and can be called easily and directly – but more advanced users are free to write their own functions. By parallelizing the analysis within the database you get extremely high scalability and performance.

Q6. How do you reduce the I/O bandwidth requirements for big data analytics workloads?

Duncan Ross: Two methods: firstly, by matching analytical approach to technology – set-based analysis using traditional SQL-based approaches, and programmatic and iterative analysis using MapReduce-style approaches.
Secondly, by matching data 'temperature' to different storage media: hot data on SSD, cool data on fast disk drives, and cold data on cheap, large (slow) drives.
The skill is to automatically rebalance without impacting users!

Q7. What is the tradeoff between Accuracy and Speed that you usually need to make with Big Data?

Duncan Ross: In the world of data mining this isn't really a problem, as our approaches are based around sampling anyway. A more important distinction is between speed of analysis and business demand. We're entering a world where data requires us to work with far more agility than we have in the past.

Q8. Brewer’s CAP theorem says that for really big distributed database systems you can have only 2 out of 3 from Consistency (“atomic”), Availability and (network) Partition Tolerance. Do you have practical evidence of this? And if yes, how?

Duncan Ross: No. Although it may be true for an arbitrarily big system, in most real-world cases this isn't too much of a problem.

Q9. Hadoop is a batch processing system. How do you handle Big Data Analytics in real time (if any)?

Duncan Ross: Well, we aren't using Hadoop, and as I commented earlier, equating Hadoop with Big Data is a dangerous assumption. Many analyses do not require anything approaching real time, but as closeness to an event becomes more important, we can look to scoring within an EDW environment or even embedding code within an operational system.
To do this requires you, of course, to understand the eventual use of your analysis when starting out.
A great example of this approach is eBay's Hadoop-Singularity-EDW configuration.

Q10. What are the typical In-database support for analytics operations?

Duncan Ross: It's clear that moving the analysis to the data is more beneficial than moving data to the analysis. Teradata has great experience in this area.
We have examples of fast Fourier transforms, predictive modelling, and parameterised modelling all happening in highly parallel ways within the database. I once built and estimated 64,000 models in parallel for a project.

Q11. Cloud computing: What role does it play with Analytics? What are the main differences between Ground vs Cloud analytics?

Duncan Ross: The cloud is a useful approach for evaluating and testing new approaches, but has some significant drawbacks in terms of data security. Of course
there is a huge difference between public and private cloud solutions.
………………
Duncan Ross, Director Data Science, EMEA, Teradata.
Duncan has been a data miner since the mid 1990s. He was Director of Advanced Analytics at Teradata until 2010, leaving to become Data Director of Experian UK. He recently rejoined Teradata to lead their European Data Science team.

At Teradata he has been responsible for developing analytical solutions across a number of industries, including warranty and root cause analysis in manufacturing, and social network analysis in telecommunications.
These solutions have been developed directly with customers and have been deployed against some of the largest consumer bases in Europe.

In his spare time Duncan has been a city councillor and chair of a national charity, founded an award-winning farmers' market, and is one of the founding Directors of the Society of Data Miners.

Related Posts

Managing Big Data. An interview with David Gorbet by Roberto V. Zicari on July 2, 2012

Big Data: Smart Meters — Interview with Markus Gerdes. by Roberto V. Zicari on June 18, 2012

Big Data for Good. by Roberto V. Zicari on June 4, 2012

Analytics at eBay. An interview with Tom Fastner. by Roberto V. Zicari on October 6, 2011

Big Data Resources
Big Data and Analytical Data Platforms: Blog Posts | Free Software | Articles| PhD and Master Thesis|

##

Oct 30 12

Two Cons against NoSQL. Part I.

by Roberto V. Zicari

Two cons against NoSQL data stores read like this:

1. It's very hard to move data out from one NoSQL system to some other system, even another NoSQL system. There is a very hard lock-in when it comes to NoSQL. If you ever have to move to another database, you basically have to re-implement a lot of your applications from scratch.

2. There is no standard way to access a NoSQL data store.
All tools that already exist for SQL have to be recreated for each of the NoSQL databases. This means that it will always be harder to access data in NoSQL than in SQL. For example, how many NoSQL databases can export their data to Excel? (Something every CEO wants to get sooner or later).

These are valid points. I wanted to start a discussion on this.
This post is the first part of a series of feedback I received from various experts, with obviously different points of view.

I plan to publish Part II, with more feedback later on.

You are welcome to contribute to the discussion by leaving a comment if you wish!

RVZ
————

1. It's very hard to move the data out from one NoSQL system to some other system, even another NoSQL system.

Dwight Merriman (founder of 10gen, maker of MongoDB): I agree it is still early and I expect some convergence in data models over time. By the way, I am having conversations with other NoSQL product groups about standards, but it is super early so nothing is imminent.
50% of the NoSQL products are JSON-based document-oriented databases.
So that is the greatest commonality. Use that and you have some good flexibility, and JSON is standards-based and widely used in general, which is nice. MongoDB, CouchDB, and Riak, for example, use JSON. (MongoDB internally stores "BSON".)

So moving data across these would not be hard.

1. If you ever have to move to another database, you basically have to re-implement a lot of your applications from scratch.

Dwight Merriman: Yes. Once again, I wouldn't assume that to be the case forever, but it is for the present. Also, I think there is a bit of an illusion of portability with relational. There are subtle differences in the SQL, medium differences in the features, and there are giant differences in the stored procedure languages.
I remember at DoubleClick long ago we migrated from SQL Server to Oracle and it was a HUGE project. (We liked SQL Server; we just wanted to run on a very, very large server — i.e. we wanted vertical scaling at that time.)

Also: while porting might be work, given that almost all these products are open source, the potential "risks" of lock-in, I think, drop an order of magnitude — with open source the vendors can't charge too much.

Ironically people are charged a lot to use Oracle, and yet in theory it has the portability properties that folks would want.

I would anticipate SQL-like interfaces for BI tool integration in all the products in the future. However, that doesn't mean that is the way one will write apps. I don't really think that, even when present, those are ideal for application development productivity.

1. For example, how many NoSQL databases can export their data to Excel? (Something every CEO wants to get sooner or later).

Dwight Merriman: So with MongoDB what I would do would be to use the mongoexport utility to dump to a CSV file and then load that into Excel. That is done often by folks today. And when there is nested data that isn't tabular in structure, you can use the new Aggregation Framework to "unwind" it to a more matrix-like format for Excel before exporting.
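A sketch of that "unwind, then export" flow, done from the Python driver rather than the mongoexport command line; it assumes a hypothetical orders collection where each document holds a list of line items.

```python
# Unwind nested line items into one flat row each, then write a CSV for Excel.
# The collection layout and field names are assumptions for the example.
import csv

from pymongo import MongoClient

def export_line_items(path="line_items.csv"):
    orders = MongoClient()["shop"]["orders"]
    pipeline = [
        {"$unwind": "$items"},                       # one output row per line item
        {"$project": {"_id": 0,
                      "order_id": "$_id",
                      "sku": "$items.sku",
                      "quantity": "$items.quantity"}},
    ]
    with open(path, "w", newline="") as out:
        writer = csv.DictWriter(out, fieldnames=["order_id", "sku", "quantity"])
        writer.writeheader()
        for row in orders.aggregate(pipeline):
            writer.writerow(row)

if __name__ == "__main__":
    export_line_items()
```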

You’ll see more and more tooling for stuff like that over time. Jaspersoft and Pentaho have mongo integration today, but the more the better.

John Hugg (VoltDB Engineering): Regarding your first point about the issue with moving data out from one NoSQL to some other system, even other NoSQL.
There are a couple of angles to this. First, data movement itself is indeed much easier between systems that share a relational model.
Most SQL relational systems, including VoltDB, will import and export CSV files, usually without much issue. Sometimes you might need to tweak something minor, but it’s straightforward both to do and to understand.

Beyond just moving data, moving your application to another system is usually more challenging. As soon as you target a platform with horizontal scalability, an application developer must start thinking about partitioning and parallelism. This is true whether you’re moving from Oracle to Oracle RAC/Exadata, or whether you’re moving from MySQL to Cassandra. Different target systems make this easier or harder, from both development and operations perspectives, but the core idea is the same. Moving from a scalable system to another scalable system is usually much easier.

Where NoSQL goes a step further than scalability is in relaxing consistency and transactions in the database layer. While this simplifies the NoSQL system, it pushes complexity onto the application developer. A naive application port will be less successful, and a thoughtful one will take more time.
The amount of additional complexity largely depends on the application in question. Some apps are more suited to relaxed consistency than others. Other applications are nearly impossible to run without transactions. Most lie somewhere in the middle.

To the point about there being no standard way to access a NoSQL data store. While the tooling around some of the most popular NoSQL systems is improving, there’s no escaping that these are largely walled gardens.
The experience gained from using one NoSQL system is only loosely related to another. Furthermore, as you point out, non-traditional data models are often more difficult to export to the tabular data expected by many reporting and processing tools.

By embracing the SQL/Relational model, NewSQL systems like VoltDB can leverage a developer’s experience with legacy SQL systems, or other NewSQL systems.
All share a common query language and data model. Most can be queried at a console. Most have familiar import and export functionality.
The vocabulary of transactions, isolation levels, indexes, views and more are all shared understanding. That’s especially impressive given the diversity in underlying architecture and target use cases of the many available SQL/Relational systems.

Finally, SQL/Relational doesn't preclude NoSQL-style development models. Postgres, Clustrix and VoltDB support MongoDB/CouchDB-style JSON documents in columns. Functionality varies, but these systems can offer features not easily replicated on their NoSQL inspiration, such as JSON sub-key joins or multi-row/key transactions on JSON data.
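As an illustration of that last point, here is a hedged sketch using PostgreSQL's JSON column support via psycopg2; the table name, document keys, and connection string are invented for the example, and it is not meant to describe any particular vendor's implementation.

```python
# Storing and querying JSON documents inside an ordinary SQL database.
# Table, keys, and connection string are hypothetical.
import json

import psycopg2

conn = psycopg2.connect("dbname=appdb user=app")   # hypothetical connection
cur = conn.cursor()

cur.execute("CREATE TABLE IF NOT EXISTS events (id serial PRIMARY KEY, doc json)")
cur.execute("INSERT INTO events (doc) VALUES (%s)",
            [json.dumps({"user": "alice", "action": "login", "ip": "10.0.0.7"})])
conn.commit()                                       # the insert ran in a normal transaction

# Sub-key access with the ->> operator, inside an ordinary SQL statement.
cur.execute("SELECT doc->>'user', doc->>'action' FROM events WHERE doc->>'action' = %s",
            ["login"])
print(cur.fetchall())
```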

1. It's very hard to move the data out from one NoSQL system to some other system, even another NoSQL system. There is a very hard lock-in when it comes to NoSQL. If you ever have to move to another database, you basically have to re-implement a lot of your applications from scratch.

Steve Vinoski (Architect at Basho): Keep in mind that relational databases are around 40 years old while NoSQL is 3 years old. In terms of the technology adoption lifecycle, relational databases are well down toward the right end of the curve, appealing to even the most risk-averse consumer. NoSQL systems, on the other hand, are still riding the left side of the curve, appealing to innovators and the early majority who are willing to take technology risks in order to gain advantage over their competitors.

Different NoSQL systems make very different trade-offs, which means they’re not simply interchangeable. So you have to ask yourself: why are you really moving to another database? Perhaps you found that your chosen database was unreliable, or too hard to operate in production, or that your original estimates for read/write rates, query needs, or availability and scale were off such that your chosen database no longer adequately serves your application.
Many of these reasons revolve around not fully understanding your application in the first place, so no matter what you do there’s going to be some inconvenience involved in having to refactor it based on how it behaves (or misbehaves) in production, including possibly moving to a new database that better suits the application model and deployment environment.

2. There is no standard way to access a NoSQL data store.
All tools that already exist for SQL have to be recreated for each of the NoSQL databases. This means that it will always be harder to access data in NoSQL than in SQL. For example, how many NoSQL databases can export their data to Excel? (Something every CEO wants to get sooner or later).

Steve Vinoski: Don’t make the mistake of thinking that NoSQL is attempting to displace SQL entirely. If you want data for your Excel spreadsheet, or you want to keep using your existing SQL-oriented tools, you should probably just stay with your relational database. Such databases are very well understood, they’re quite reliable, and they’ll be helping us solve data problems for a long time to come. Many NoSQL users still use relational systems for the parts of their applications where it makes sense to do so.

NoSQL systems are ultimately about choice. Rather than forcing users to try to fit every data problem into the relational model, NoSQL systems provide other models that may fit the problem better. In my own career, for example, most of my data problems have fit the key-value model, and for that relational systems were overkill, both functionally and operationally. NoSQL systems also provide different tradeoffs in terms of consistency, latency, availability, and support for distributed systems that are extremely important for high-scale applications. The key is to really understand the problem your application is trying to solve, and then understand what different NoSQL systems can provide to help you achieve the solution you’re looking for.

1. It's very hard to move the data out from one NoSQL system to some other system, even another NoSQL system. There is a very hard lock-in when it comes to NoSQL. If you ever have to move to another database, you basically have to re-implement a lot of your applications from scratch.

Cindy Saracco (IBM Senior Solutions Architect; these comments reflect my personal views and not necessarily those of my employer, IBM):
Since NoSQL systems are newer to market than relational DBMSs and employ a wider range of data models and interfaces, it’s understandable that migrating data and applications from one NoSQL system to another — or from NoSQL to relational — will often involve considerable effort.
However, I’ve heard more customer interest around NoSQL interoperability than migration. By that, I mean many potential NoSQL users seem more focused on how to integrate that platform into the rest of their enterprise architecture so that applications and users can have access to the data they need regardless of the underlying database used.

2. There is no standard way to access a NoSQL data store.
All tools that already exist for SQL have to be recreated for each of the NoSQL databases. This means that it will always be harder to access data in NoSQL than in SQL. For example, how many NoSQL databases can export their data to Excel? (Something every CEO wants to get sooner or later).

Cindy Saracco: From what I’ve seen, most organizations gravitate to NoSQL systems when they’ve concluded that relational DBMSs aren’t suitable for a particular application (or set of applications). So it’s probably best for those groups to evaluate what tools they need for their NoSQL data stores and determine what’s available commercially or via open source to fulfill their needs.
There’s no doubt that a wide range of compelling tools are available for relational DBMSs and, by comparison, fewer such tools are available for any given NoSQL system. If there’s sufficient market demand, more tools for NoSQL systems will become available over time, as software vendors are always looking for ways to increase their revenues.

As an aside, people sometimes equate Hadoop-based offerings with NoSQL.
We’re already seeing some “traditional” business intelligence tools (i.e., tools originally designed to support query, reporting, and analysis of relational data) support Hadoop, as well as newer Hadoop-centric analytical tools emerge.
There's also a good deal of interest in connecting Hadoop to existing data warehouses and relational DBMSs, so various technologies are already available to help users in that regard. IBM happens to be one vendor that's invested quite a bit in different types of tools for its Hadoop-based offering (InfoSphere BigInsights), including a spreadsheet-style analytical tool for non-programmers that can export data in CSV format (among others), Web-based facilities for administration and monitoring, Eclipse-based application development tools, text analysis facilities, and more. Connectivity to relational DBMSs and data warehouses is also part of IBM's offerings. (Anyone who wants to learn more about BigInsights can explore links to articles, videos, and other technical information available through its public wiki.)

Related Posts

On Eventual Consistency– Interview with Monty Widenius. by Roberto V. Zicari on October 23, 2012

On Eventual Consistency– An interview with Justin Sheehy. by Roberto V. Zicari, August 15, 2012

Hadoop and NoSQL: Interview with J. Chris Anderson by Roberto V. Zicari

##

Oct 23 12

On Eventual Consistency– Interview with Monty Widenius.

by Roberto V. Zicari

"For analytical things, eventual consistency is OK (as long as you can know, after you have run them, whether they were consistent or not). For real-world use involving money or resources it's not necessarily the case." — Michael "Monty" Widenius.

In a recent interview, I asked Justin Sheehy, Chief Technology Officer at Basho Technologies, maker of Riak, the following two questions, related to the subject of eventual consistency:

Q1. “How do you handle updates if you do not support ACID transactions? For which applications this is sufficient, and when this is not?”

Q2. “You said that Riak takes more of the “BASE” (Basically Available, Soft state, Eventual consistency) approach. Did you use the definition of eventual consistency by Werner Vogels? Reproduced here: “Eventual consistency: The storage system guarantees that if no new updates are made to the object, eventually (after the inconsistency window closes) all accesses will return the last updated value.”
You would not wish to have an “eventual consistency” update to your bank account. For which class of applications is eventual consistency a good system design choice? “

On the same subject, I did a follow up interview with Michael “Monty” Widenius, the main author of the original version of the open-source MySQL database, and currently working on a branch of the MySQL code base, called MariaDB.

RVZ

Q. Justin Sheehy`s reply to Q1: “Riak takes more of the “BASE” approach, which has become accepted over the past several years as a sensible tradeoff for high-availability data systems. By allowing consistency guarantees to be a bit flexible during failure conditions, a Riak cluster is able to provide much more extreme availability guarantees than a strictly ACID system.”
When do you think a “BASE” approach to consistency is justified?

“Monty” Widenius: The big questions are:
a) How are conflicts resolved? Who wins when there are conflicting updates on two nodes and the communication between the nodes is temporarily down?
b) Can a user at any point read data that is not consistent?
c) How long can the conflict window be?

The answers to the above questions tell us how suitable the solution is for different applications. For analytical things, eventual consistency is OK (as long as you can know, after you have run them, whether they were consistent or not). For real-world use involving money or resources it's not necessarily the case.

Q. How do you handle consistency in MariaDB and at the same time ensuring scalability and availability? Aren’t you experiencing the limitations of the CAP Theorem?

“Monty” Widenius: We are using the traditional approaches with transactions or synchronous replication when you need guaranteed consistent answers.

We also provide asynchronous updates to slaves when you can tolerate a lag for the data on the slaves. However, as we only make things visible when the whole transaction has run on either master or slave, you always have things consistent.

So when it comes to CAP, it's up to the user to define where he wants to make his tradeoff: speed, reliability, or ease of management.

Q. Justin Sheehy`s reply to Q2: “That definition of Eventual Consistency certainly does apply to Riak, yes. I would most certainly include updates to my bank account as applications for which eventual consistency is a good design choice. In fact, bankers have understood and used eventual consistency for far longer than there have been computers in the modern sense. Traditional accounting is done in an eventually-consistent way and if you send me a payment from your bank to mine then that transaction will be resolved in an eventually consistent way. That is, your bank account and mine will not have a jointly-atomic change in value, but instead yours will have a debit and mine will have a credit, each of which will be applied to our respective accounts.”

"Monty" Widenius: The question is how much time passes before things become consistent, and where things will be inconsistent in the meantime. For example, at no point in time should there be more money on my account than I have the right to use.
The reason why banks in the past have been using eventual consistency is that the banks' computer systems simply have not kept up with the rest of the world.
In many places there is still human interaction needed to get money onto the account! (especially for larger amounts).
Still, if you ask any bank, they would prefer to have things always consistent if they could!

Q. Justin says that “this question contains a very commonly held misconception. The use of eventual consistency in well-designed systems does not lead to inconsistency. Instead, such systems may allow brief (but shortly resolved) discrepancies at precisely the moments when the other alternative would be to simply fail”.

"Monty" Widenius: In some cases it's better to fail. For example, it's common that an ATM will not give out money when the line to the bank is down. Giving out money in that situation is probably always the wrong choice. The other question is whether things are 100% guaranteed to be consistent down to the millisecond during normal operations.

Q. Justin says: “to rephrase your statement, you would not wish your bank to fail to accept a deposit due to an insistence on strict global consistency.”

"Monty" Widenius: Actually you would, if you can't verify the identity of the user. Certainly the end user would not want the deposit to be accepted if there is a record of the deposit in only a single place.

Q. Justin says: "It is precisely the cases where you care about very high availability of a distributed system that eventual consistency might be a worthwhile tradeoff."
What is your take on this? Is Eventual Consistency a valid approach also for traditional banking applications?

"Monty" Widenius: That is what banks have traditionally used. However, if they had a choice between eventual consistency and always consistent, they would always choose the latter if it were possible within their resources.

———————
Michael "Monty" Widenius is the main author of the original version of the open-source MySQL database and a founding member of the MySQL AB company. Since 2009, Monty has been working on a branch of the MySQL code base, called MariaDB.

Related Posts

On Eventual Consistency– An interview with Justin Sheehy. by Roberto V. Zicari, August 15, 2012

MariaDB: the new MySQL? Interview with Michael Monty Widenius. by Roberto V. Zicari on September 29, 2011

##

Oct 8 12

Scaling MySQL and MariaDB to TBs: Interview with Martín Farach-Colton.

by Roberto V. Zicari

“While I believe that one size fits most, claims that RDBMS can no longer keep up with modern workloads come in from all directions. When people talk about performance of databases on large systems, the root cause of their concerns is often the performance of the underlying B-tree index”— Martín Farach-Colton.

Scaling MySQL and MariaDB to TBs while remaining performant: is it possible? On this topic I interviewed Martín Farach-Colton, Tokutek co-founder & Chief Technology Officer.

RVZ

Q1. What is TokuDB?

Farach-Colton: TokuDB® is a storage engine for MySQL and MariaDB that uses Fractal Tree Indexes. It enables MySQL and MariaDB to scale from GBs to TBs and provides improved query performance, excellent compression, and online schema flexibility. TokuDB requires no changes to MySQL applications or code and is fully ACID and MVCC compliant.

While I believe that one size fits most, claims that RDBMS can no longer keep up with modern workloads come in from all directions. When people talk about performance of databases on large systems, the root cause of their concerns is often the performance of the underlying B-tree index. This makes sense, because almost every system out there that indexes big data does so with a B-tree. The B-tree was invented more than 40 years ago and has been fighting hardware trends ever since.
For example, although some people really do have OLAP workloads and others really do have OLTP workloads, I believe that most users are forced to choose between analytics and live updates by the performance profile of the B-tree, and that their actual use case requires the best of both worlds. The B-tree forces compromises, which the database world has internalized.

Fractal Tree Indexes® replace B-tree indexes and are based on my academic research on so-called Cache-Oblivious Analysis, which I’ve been working on with my Tokutek co-founders, Michael Bender and Bradley Kuszmaul. Fractal Tree Indexes speed up indexing — that is, they are write optimized — without giving up query performance. The best way to get fast queries is to define the right set of indexes, and Fractal Tree Indexes let you do that. Legacy technologies have staked out claims on some part of the B-tree query/indexing tradeoff curve. Fractal Tree Indexes put you on a higher-performing tradeoff curve. Query-optimal write-optimized indexing is all about making general-purpose databases faster. For some of our customers’ workloads, it’s as much as two orders of magnitude faster.

Q2. You claim to be able to “scale MySQL from GBs to TBs while improving insert and query speed, compression, replication performance, and online schema flexibility.” How do you do that?

Farach-Colton: In a Fractal Tree Index, all changes — insertions, deletions, updates, schema changes — are messages that get injected into the tree. Even though these messages get injected into the root and might get moved several times before getting to a leaf, all queries will see all relevant messages. Thus, injecting a message is fast, and queries reflect the effects of a message, e.g., changing a field or adding a column, immediately.

In order to make indexing fast, we make sure that each I/O to disk does a lot of work. I/Os are slow. For example, if accessing the fastest part of memory is like walking across a room, getting something from disk is like walking from New York to St Louis. The analogy in Germany would be to walk from Berlin to Rome — Germany itself isn’t big enough to hold a disk I/O on this scale!

To see how Fractal Tree Indexes work, first consider a B-tree. A B-tree delivers each item to its final destination as it arrives. It’s as if Amazon were to put a single book in a truck and drive it to your home. A Fractal Tree Index manages a set of buffers — imagine regional warehouses — so that whenever you move a truck, it is full of useful information.
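That buffering idea can be illustrated with a deliberately simplified sketch: updates are appended as messages to a buffer at the top of the structure and pushed down in one batched pass only when the buffer fills, while lookups check pending messages on the way down so they always see the latest value. This is a toy two-level illustration of the general write-optimized buffering concept, not Tokutek's actual Fractal Tree data structure.

```python
# Toy write-optimized index: cheap buffered writes, reads that see pending
# messages, and batched flushes. Fanout and buffer sizes are tiny on purpose.
import bisect

class BufferedIndex:
    FANOUT, BUFFER_LIMIT = 4, 8

    def __init__(self):
        self.pivots = []                       # split keys separating the leaves
        self.leaves = [dict()]                 # ordered buckets of key -> value
        self.root_buffer = []                  # pending (key, value) messages

    def put(self, key, value):
        """Cheap write: append a message; real work happens only when the buffer fills."""
        self.root_buffer.append((key, value))
        if len(self.root_buffer) >= self.BUFFER_LIMIT:
            self._flush()

    def get(self, key):
        """Reads consult pending messages first, so they always reflect the latest put."""
        for k, v in reversed(self.root_buffer):    # newest message wins
            if k == key:
                return v
        return self._leaf_for(key).get(key)

    def _leaf_for(self, key):
        return self.leaves[bisect.bisect_right(self.pivots, key)]

    def _flush(self):
        # One batched pass delivers many messages per leaf touched: the
        # "full truck" instead of one book per trip.
        for key, value in self.root_buffer:
            self._leaf_for(key)[key] = value
        self.root_buffer.clear()
        self._split_full_leaves()

    def _split_full_leaves(self):
        i = 0
        while i < len(self.leaves):
            leaf = self.leaves[i]
            if len(leaf) <= self.FANOUT:
                i += 1
                continue
            keys = sorted(leaf)
            mid = keys[len(keys) // 2]
            self.leaves[i:i + 1] = [{k: leaf[k] for k in keys if k < mid},
                                    {k: leaf[k] for k in keys if k >= mid}]
            bisect.insort(self.pivots, mid)

idx = BufferedIndex()
for n in range(20):
    idx.put(n, n * n)
print(idx.get(7))    # -> 49, whether or not that message has been flushed yet
```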

The result is that insertion speed is dramatically higher. Query speeds are higher because you keep more indexes. And compression is higher because we can afford to read and write data in larger chunks. Compression on bigger chunks of data is typically more effective, and that rule of thumb has panned out for our customers.

Q3 Do you have any benchmark results to support your claims? If yes, could you please give us some details?

Farach-Colton: We have a benchmarking page. Our solutions page gives details of performance in the field from our customers.

Some of the highlights include up to 80x insertion performance, up to 25x compression, schema changes that take seconds instead of hours, and dramatic savings in overall disk I/O utilization.

Q4. You developed your own benchmark iiBench, why not use TPC benchmarks?

Farach-Colton: Our benchmarking page also shows performance on TPC and Sysbench workloads. iiBench is meant to expand the range of comparison for storage engines, not to replace existing benchmarks.

Benchmarks are developed to measure performance where vendors differ. iiBench measures how a database performs on a mixed insertion/query workload. B-trees are notoriously slow at updating indexes on the fly, so no one has bothered to produce a benchmark that measures performance on such workloads. We believe (and our customers confirm for us) that this Real-time OLAP (aka Analytical OLTP) workload is critical. Furthermore, iiBench seems to be filling a need in that there has been some third-party adoption, most notably by Mark Callaghan at Facebook.

Q5. How is iiBench defined? What kind of queries did you benchmark and with which data size?

Farach-Colton: At best, all I could do is summarize it here and risk omitting details important to some of your readers. Instead, let me refer you to our website, where you can read it in detail here, and the Facebook modifications can be found here.

Q6. How do you handle MySQL slave lag (delays)?

Farach-Colton: The bottleneck that causes slave lag is the single-threaded nature of master-slave replication. InnoDB has single-threaded indexing that is much slower than its multi-threaded indexing. Thus, the master can support a multi-threaded workload that is substantially higher than that achievable on the single-threaded slave. (There are plans to improve this in upcoming releases of InnoDB.)

TokuDB has much higher insertion rates, and in particular, very high single-threaded insertion rates. Thus, the slave can keep up with a much more demanding workload. (Please see our benchmark page for details.)

Q7. You have developed a technique called “Fractal Tree indexes.” What is it? How does it compare with standard Tree indexes?

Farach-Colton: Much of this question was answered in question 2. Here, I’ll add that Fractal Tree Indexes are the first write-optimized data structure used as an index in databases. The Log-Structured Merge Tree (LSM) also achieves very high indexing rates, but at the cost of so-called read amplification, which means that indexing is much faster than for B-trees, but queries end up being much slower.

Fractal Tree Indexes are as fast as LSMs for indexing, and as fast as B-trees for queries. Thus, they dominate both.

A final note on SSDs: B-trees are I/O bound, and SSDs have much higher IOPS than rotational disks. Thus, it would seem that B-trees and SSDs are a marriage made in heaven. However, consider the fact that when a single row is modified in a B-tree leaf, the entire leaf must be re-written to disk. So, in addition to the problems of write amplification caused by wear leveling and garbage collection on SSDs, B-trees induce a much greater level of write amplification, because small changes induce large writes. Write-optimized indexes largely eliminate both types of write amplification, as well as potentially extending the life of SSDs.

Q8. What is special about your data compression technique?

Farach-Colton: Compression is like a freight train, slow to get started but effective once it’s had a chance to get up to speed. Thus, one of the biggest factors for how much compression is achieved is how much you compress at a time. Give a compression algorithm some data 4KB at a time, and the compression will be poor. Give it 1MB or more at a time, and the same compression algorithm will do a much better job. Because of the structure of our indexes, we can compress larger blocks at a time.
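The block-size effect is easy to observe with any general-purpose compressor. The small experiment below compresses the same bytes in 4 KB, 64 KB, and 1 MB blocks; the input is synthetic, repetitive log-like text, and the exact ratios will vary with the data and the compressor.

```python
# Compress the same data in blocks of different sizes and compare the totals.
import zlib

data = b"".join(b"2012-10-08 host-%03d GET /products/%04d 200\n" % (i % 50, i % 800)
                for i in range(30000))                    # roughly 1.3 MB of text

def compressed_size(blob, block_size):
    """Compress independently in fixed-size blocks and sum the output sizes."""
    return sum(len(zlib.compress(blob[i:i + block_size]))
               for i in range(0, len(blob), block_size))

for block in (4 * 1024, 64 * 1024, 1024 * 1024):
    size = compressed_size(data, block)
    print("block %7d bytes -> %6d bytes compressed (%.1fx)"
          % (block, size, len(data) / size))
```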

I should also point out that one of the problems that dogs databases is aging or fragmentation, in which the leaves of the B-tree index get scattered on disk, and range queries become much slower. This is because leaf-to-leaf seeks become slower, when the leaves are scattered. When leaves are clustered, they can be read quickly by exploiting the prefetching mechanisms of disks. The standard solution is to periodically rebuild the index, which can involve substantial down time. Fractal Tree indexes do not age, under any workload, for the same reason that they compress so well: by handling much larger blocks of data than a B-tree can, we are able to keep the data much more clustered, and thus it is never necessary to rebuild an index.

Q9. How is TokuDB different with respect to SchoonerSQL, NuoDB and VoltDB?

Farach-Colton: VoltDB is an in-memory database. The point is therefore to get the highest OLTP performance for data that is small enough to fit in RAM.
By comparison, TokuDB focuses on very large data sets — too big for RAM — for clients who are interested in moving away from the OLAP/OLTP dichotomy, that is, for clients who want to keep data up to date while still querying it through rich indexes.

SchoonerSQL has a combination of technologies involving optimization for SSDs as well as scale-out. TokuDB is a scale-up solution that improves performance on a per-node basis.

NuoDB is also a scale-out solution, in this case cloud-based, though I am less familiar with the details of their innovation at a more technical level.
——–

Martín Farach-Colton is a co-founder and chief technology officer at Tokutek. He was an early employee at Google, where he worked on improving performance of the Web Crawl and where he developed the prototype for the AdSense system.
An expert in algorithms and information retrieval, Prof. Farach-Colton is also a Professor of Computer Science at Rutgers University.

Related Posts

Interview with Mike Stonebraker. by Roberto V. Zicari on May 2, 2012

A super-set of MySQL for Big Data. Interview with John Busch, Schooner. by Roberto V. Zicari on February 20, 2012

Re-thinking Relational Database Technology. Interview with Barry Morris, Founder & CEO NuoDB. by Roberto V. Zicari on December 14, 2011

MariaDB: the new MySQL? Interview with Michael Monty Widenius. by Roberto V. Zicari on September 29, 2011

##

Sep 19 12

Hadoop and NoSQL: Interview with J. Chris Anderson

by Roberto V. Zicari

“The missing piece of the Hadoop puzzle is accounting for real time changes. Hadoop can give powerful analysis, but it is fundamentally a batch-oriented paradigm.” — J. Chris Anderson.

How is Hadoop related to NoSQL databases? What are the main performance bottlenecks of NoSQL data stores? On these topics I interviewed J. Chris Anderson, co-founder of Couchbase.

RVZ

Q1. In order to analyze Big Data, the current state of the art is a parallel database or NoSQL data store, with a Hadoop connector.
What about performance issues arising with the transfer of large amounts of data between the two systems? Can the use of connectors introduce delays and data silos, and increase TCO?

Chris Anderson : The missing piece of the Hadoop puzzle is accounting for real time changes. Hadoop can give powerful analysis, but it is fundamentally a batch-oriented paradigm. Couchbase is designed for real time applications (with all the different trade-offs that implies) yet also provides query-ability, so you can see inside the data as it changes.

We are seeing interesting applications where Couchbase is used to enhance the batch-based Hadoop analysis with real time information, giving the effect of a continuous process.
So hot data lives in Couchbase, in RAM (even replicas in RAM for HA fast-failover). You wouldn’t want to keep 3 copies of your Hadoop data in RAM, that’d be crazy.
But it makes sense for your working set.

And this solves the data transfer costs issue you mention, because you essentially move the data out of Couchbase into Hadoop when it cools off.
That’s much easier than maintaining parallel stores, because you only have to copy data from Couchbase to Hadoop as it passes out of the working set.

For folks working on problems like this, we have a Sqoop connector and we’ll be talking about it with Cloudera at our CouchConf in San Francisco on September 21.

Q2. Wouldn’t a united/integrated platform (data store + Hadoop) be a better solution instead?

Chris Anderson : It would be nice to have a unified query language and developer experience (not to mention goodies like automatically pulling data back
out of Hadoop into Couchbase when it comes back into the working set). I think everyone recognizes this.

We’ll get there, but in my opinion the primary interface will be via the real time store, and the Hadoop layer will become a commodity. That is why there is so much competition for the NoSQL brass ring right now.

Q3. Could you please explain in your opinion the tradeoff between scaling out and scaling up? What does it mean in practice for an application
domain?

Chris Anderson : Scaling up is easier from a software perspective. It’s essentially the Moore’s Law approach to scaling — buy a bigger box. Well, eventually you run out of bigger boxes to buy, and then you’ve run off the edge of a cliff. You’ve got to pray Moore keeps up.

Scaling out means being able to add independent nodes to a system. This is the real business case for NoSQL. Instead of being hostage to Moore’s Law, you can grow as fast as your data. Another advantage to adding independent nodes is you have more options when it comes to matching your workload. You have more flexibility when you are running on commodity hardware — you can run on SSDs or high-compute instances, in the cloud, or inside your firewall.

Q4. James Phillips a year ago said that “it is possible we will see standards begin to emerge, both in on-the-wire protocols and perhaps in query languages, allowing interoperability between NoSQL database technologies similar to the kind of interoperability we’ve seen with SQL and relational database technology.” What is your take now?

Chris Anderson : That hasn’t changed but the industry is still young and everyone is heads-down on things like reliability and operability. Once these products become more mature there will be time to think about standardization.

Q5. There is a scarcity of benchmarks to substantiate the many claims of scalability made by NoSQL vendors. NoSQL data stores do not qualify for the TPC-C benchmark, since they relax ACID transaction properties. How can you then measure and compare the performance of the various NoSQL data stores instead?

Chris Anderson : I agree. Vendors are making a lot of claims about latency, throughput and scalability without much proof. There are a few benchmarks starting
to trickle out from various third parties. Cisco and SolarFlare published one on Couchbase (see here) and are putting other vendors through the same tests. I know there will be other third party benchmarks comparing Couchbase, MongoDB, and Cassandra that will be coming out soon. I think the Yahoo YCSB benchmarks will be another source of good comparisons.
There are bigger differences between vendors than people are aware of and we think many people will be surprised by the results.

Q6. What are in your opinion the main performance bottlenecks for NoSQL data stores?

Chris Anderson : The three classes of bottleneck correspond to the major areas of hardware: network, disk, and memory. Couchbase has historically been very
fast at the network layer – it’s based on Memcached which has had a ton of optimizations for interacting with network hardware and protocols.
We’re essentially as fast as one can get in the network layer, and I believe most NoSQL databases that use persistent socket connections are also free of significant network bottlenecks. So the network is only a bottleneck for REST or other stateless connection models.

Disk is always going to be the slowest component as far as the inherent latencies of non-volatile storage, but any high-performance database will paper over this by using some form of memory caching or memory-based storage. Couchbase has been designed specifically to decouple the disk from the rest of the system. In the extreme, we’ve seen customers survive prolonged disk outages while maintaining availability, as our memory layer keeps on trucking, even when disks become unresponsive. (This was during the big Amazon EBS outage that left a lot of high-profile sites down due to database issues.)

Memory may be the most interesting bottleneck, because it is the source of non-determinism in performance. So if you are choosing a database for performance reasons you'll want to take a look at how its memory layer is architected.
Is it decoupled from the disk? Is it free of large locks that can pause unrelated queries as the engine modifies in-memory data structures? Over time does it continue to perform, or does the memory layout become fragmented? These are all problems we’ve been working on for a long time at Couchbase, and we are pretty happy with where we stand.

Q7. Couchbase is the result of the merger (more than one year ago) of CouchOne (document store) and Membase (key-value store). How has your product offering changed since the merger?

Chris Anderson : Our product offering hasn’t changed a bit since the merger. The current GA product, Couchbase Server 1.8.1, is essentially a continuation of the Membase Server 1.7 product line. It is a key-value database using the binary memcached interface. It’s in use all around the web, at companies like Orbitz, LinkedIn, AOL, Zynga, and lots of others.

With our upcoming 2.0 release, we are expanding from a key value database to a document database. This means adding query and index support. We’re even including an Elastic Search adapter and experimental geographic indices. In addition, 2.0 adds cross-datacenter replication support so you can provide high-performance access to the data at multiple points-of-presence.

Q8. How do you implement concurrency in Couchbase?

Chris Anderson : Each node is inherently independent, and there are no special nodes, proxies, or gatekeepers. The client drivers running inside your application server connect directly to the data node for a particular item, which gives low latency but also greater concurrency: a given application server will be talking to multiple backend database nodes at any given time.

For the memory layer, we are based on memcached, which has a long history of concurrency optimizations. We support the full memcached feature set, so operations like CAS write, INCR and DECR are available, which is great for concurrent workloads. We’ve also added some extensions for locking, which facilitates reading and updating an object-graph that is spread across multiple keys.
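
To make the CAS-based approach concrete, here is a minimal optimistic-concurrency sketch in Python. It assumes a hypothetical key-value client exposing memcached-style gets()/cas() calls; it is not the official Couchbase SDK, just an illustration of the read-modify-write retry pattern described above.

```python
import json


def add_friend(client, user_key, friend_id, max_retries=10):
    """Append a friend id to a user document without losing concurrent updates.

    `client` is a hypothetical key-value client with memcached-style
    gets()/cas() operations; this is illustrative, not the Couchbase SDK.
    """
    for _ in range(max_retries):
        raw, cas_token = client.gets(user_key)   # read the value plus its CAS token
        doc = json.loads(raw)
        doc.setdefault("friends", [])
        if friend_id not in doc["friends"]:
            doc["friends"].append(friend_id)
        # cas() succeeds only if the key is unchanged since our read;
        # otherwise another writer got there first and we retry on fresh data.
        if client.cas(user_key, json.dumps(doc), cas_token):
            return doc
    raise RuntimeError("gave up updating %s after %d attempts" % (user_key, max_retries))
```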

At the disk layer, for Couchbase Server 2.0, we are moving away from SQLite, toward our own highly concurrent storage format. We’re append-only, so once data is written to disk, there’s no chance of corruption. The other advantage of tail-append writes is that you can do all kinds of concurrent reads of the file, even while writes are happening. For instance a backup can be as easy as `cp` or `rsync` (although we provide tools to manage backing up an entire cluster).

Q9. Couchbase does not support SQL queries. How do you query your database instead? What are the lessons learned so far from your users?

Chris Anderson : Our incremental indexing system is designed to be native to our JSON storage format. So the user writes JavaScript code to inspect the document, and pick out which data to use as the index key. It’s essentially putting the developer directly in touch with the index data structure, so while it sounds primitive, there is a ton of flexibility to be had there. We’ve got a killer optimization for aggregation operations, so if you’ve ever been burned by a slow GROUP BY query, you might want to take a look.
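
For illustration, the sketch below mimics that style of indexing in Python (Couchbase 2.0 views are actually written in JavaScript; Python is used here only for consistency with the other examples). A user-supplied map function inspects each JSON document and emits the entries that become a sorted index, which range queries and GROUP BY-style aggregations can then walk.

```python
# Conceptual sketch of an incremental map index, not Couchbase's engine.
def map_doc(doc):
    """User-supplied function: inspect a JSON document, emit index entries."""
    if doc.get("type") == "order":
        # index orders by customer so we can later ask "orders for customer X"
        yield (doc["customer_id"], doc["total"])


def build_index(documents):
    """The engine applies map_doc to every document and keeps entries sorted by key."""
    entries = []
    for doc in documents:
        entries.extend(map_doc(doc))
    return sorted(entries)     # range queries and aggregations walk this sorted structure


docs = [
    {"type": "order", "customer_id": "c1", "total": 30},
    {"type": "order", "customer_id": "c2", "total": 12},
    {"type": "user", "name": "alice"},
]
print(build_index(docs))       # [('c1', 30), ('c2', 12)]
```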

Despite all the power, we know users are also looking for more traditional query approaches. We’re working on a few things in this area.
The first one you will see is our Elastic Search integration, which will simplify querying your Couchbase cluster. Elastic Search provides a JSON-style query API, and we’ve already seen many of our users integrate it with Couchbase, so we are building an official adapter to better support this use case.

Q10. How do you handle both structured and unstructured data at scale?

Chris Anderson : At scale, all data is messy. I’ve seen databases in the 4th normal form accumulate messy errors, so a schema isn’t always a protection. At scale, it’s all about discovering the schema, not imposing it.

If you fill a Couchbase cluster with enough tweets, wall posts, and Instagram photos, you’ll start to see that even though these APIs all have different JSON formats, it’s not hard to pick out some things in common. With our flexible indexing system, we see users normalize data after the fact, so they can query heterogeneous data, and have it show up as a result set that is easy for the application to work with.

This fits with the overall model of a document database: rather than try to “model the world” with a relational schema, your aim becomes to “capture the user’s intent” and make sense of it later. When your goal is to scale up to tens of millions of users (in maybe only a few days), the priority becomes to capture the data and provide a compelling high-performance user experience.

Q11. Couchbase is the sponsor of the Couchbase Server NoSQL database open source project. How do you ensure participation of the open source developer community and avoid incompatible versions of the system? Are you able, with an open source project, to produce a highly performant system?

Chris Anderson : All our code is available under the Apache 2.0 license, and I’d wager that we’ve never heard of the majority of our users, much less asked them for money. When someone’s business depends on Couchbase, that’s when they come to us, so I’m comfortable with the open source business model. It has some distinct advantages over the Oracle-style license model.

The engineering side of me admits that we haven’t always been the best at engendering participation in the Couchbase open source project. A few months ago, if you tried to compile your own copy of Couchbase, e.g. to provide a patch or bug-fix, you’d be on a wild goose chase for days before you got to the fun part. We’ve fixed that now, but it’s worth noting that open-source friction hurts in more ways than one, as smoothing the path for new external contributors also means new engineering hires can get productive on our tool-chain faster.

So I’ve taken a personal interest in our approach to external contributions. The first step is cleaning up the on-ramps: we already have decent docs for contributing, we just need to make them more prominent. The goal is to have a world-class open-source contributor experience, and we don’t take it lightly.

We do have a history of *very open development* which I am proud of. Not only can you see all the patches that make it into the code base, you can see all the patches that don’t make it through code review, and the comments on them as they happen. Have a look here for an example of how to do open development right.

Q12. How do you compare NewSQL databases, which claim to offer both ACID transaction properties and scalability, with Couchbase?

Chris Anderson : The CAP theorem is well-known these days, so I don’t need to go into the reasons why NewSQL is an uphill battle. I see NewSQL technologies as primarily aimed at developers who don’t want to learn the lessons about data that the first decade of the web taught us. So there will likely be a significant revenue stream based on applications that need to be scaled up, without being re-written.

But we’ve asked our users what they like about NoSQL, and one of the biggest answers, as important to most of them as scalability and performance, was that they felt more productive with a JSON document model than with rigid schemas. Requirements change quickly, and being able to change your application without working through a DBA is a real advantage. So I think NewSQL is really for legacy applications, and the future is with NoSQL document databases.

Q13. No single data store is best for all uses. What are the applications that are best suited for Couchbase, and which ones are not?

Chris Anderson : I got into Couch in the first place because I see document databases as the 80% solution. It might take a few years, but I fully expect the schema-free document model to become ascendant over the relational model for most applications, most of the time. Of course there will still be uses for relational databases, but the majority of applications being written these days could work just fine on a document database, and as developer preferences change, we’ll see more and more serious applications running on NoSQL.

So from that angle, you can see that I think Couchbase is a broad solution.
We’ve seen great success in everything from content management systems to ad-targeting platforms, as well as simpler things like providing a high-performance data store for CRUD operations, or more enterprise-focused use cases like offloading query volume from mainframe applications.

A geekier way to answer your question is to talk about the use cases where you pretty much don’t have a choice: it’s Couchbase or bust. Those are the use cases where you need consistent sub-millisecond access latency, for instance because you have a time-based service-level agreement.

If you need consistent high performance while you are scaling your database cluster, that’s when you need Couchbase.
So for instance social gaming has been a great vertical for us, since the hallmark of a successful game is that it may go from obscurity to a household name in just a few weeks. Too many of those games crash and burn when they are running on previous-generation technology.
It’s always been possible to build a backend that can handle internet-scale applications; what’s new with Couchbase is the ability to scale from a relatively small backend to one handling hundreds of thousands of operations per second, at the drop of a hat.

Again, the use-cases where it’s Couchbase or bust are a subset of what our users are doing, but they are a great way to illustrate our priorities. We have plenty of users who don’t have high-performance requirements, but they still enjoy the flexibility of a document database.

If you need transactions across multiple data items, with atomic rollback, then you should be using a relational database. If you need foreign-key constraints, you should be using a relational database.

However, before you make that decision, you may want to ask if the performance tradeoffs are worth it for your application. Often there is a way to redesign something to make it less dependent on schemas, and the business benefits from the increased scale and performance you can get from NoSQL may make it a worthwhile tradeoff.

_________________________

Chris Anderson is a co-founder of Couchbase, focussed on the overall developer experience. In the early days of the company he served as CFO and President, but never strayed too far from writing code. His background includes open-source contributions to the Apache Software Foundation as well as many other projects. Before he wrote database engine code, he cut his teeth building applications and spidering the web. These days he gets a kick out of Node.js, playing bass guitar, and enjoying family time.

======

Related Posts

Measuring the scalability of SQL and NoSQL systems. by Roberto V. Zicari on May 30, 2011

Next generation Hadoop — interview with John Schroeder. by Roberto V. Zicari on September 7, 2012

Integrating Enterprise Search with Analytics. Interview with Jonathan Ellis. by Roberto V. Zicari on April 16, 2012

Resources

ODBMS.org resources on:
Big Data and Analytical Data Platforms: Blog Posts | Free Software | Articles | PhD and Master Thesis|

NoSQL Data Stores: Blog Posts | Free Software | Articles, Papers, Presentations| Documentations, Tutorials, Lecture Notes | PhD and Master Thesis

Sep 7 12

Next generation Hadoop — interview with John Schroeder.

by Roberto V. Zicari

“There are only a few Facebook-sized IT organizations that can have 60 Stanford PhDs on staff to run their Hadoop infrastructure. The others need it to be easier to develop Hadoop applications, deploy them and run them in a production environment.”— John Schroeder.

How easy is it to use Hadoop? What are the next-generation Hadoop distributions? On these topics, I interviewed John Schroeder, Cofounder and CEO of MapR.

RVZ

Q1. What is the value that Apache Hadoop provides as a Big Data analytics platform?

John Schroeder: Apache Hadoop is a software framework that supports data-intensive distributed applications. Apache Hadoop provides a new platform to analyze and process Big Data. With data growth exploding and new unstructured sources of data expanding, a new approach is required to handle the volume, variety and velocity of data. Hadoop was inspired by Google’s MapReduce and Google File System (GFS) papers.

Q2. Is scalability the only benefit of Apache Hadoop then?

John Schroeder: No, you can build applications that aren’t feasible using traditional data warehouse platforms.
The combination of scale and the ability to process unstructured data, along with the availability of machine learning algorithms and recommendation engines, creates the opportunity to build new game-changing applications.

Q3. What are the typical requirements of advanced Hadoop users as well as those new to Hadoop?

John Schroeder: Advanced users of Hadoop are looking to go beyond batch uses of Hadoop to support real-time streaming of content. Advanced users also need multi-tenancy to balance production and decision support workloads on their shared Hadoop cloud.
New users need Hadoop to become easier. There are only a few Facebook-sized IT organizations that can have 60 Stanford PhDs on staff to run their Hadoop infrastructure. The others need it to be easier to develop Hadoop applications, deploy them and run them in a production environment.

Q4. Why is that? Please give us some practical examples of applications.

John Schroeder: Product recommendations, ad placements, customer churn, patient outcome predictions, fraud detection and sentiment analysis are just a few examples that improve with real time information.
Organizations are also looking to expand Hadoop use cases to include business critical, secure applications that easily integrate with file-based applications and products. Requirements for data protection include snapshots to provide point-in-time recovery and mirroring for business continuity.
With mainstream adoption comes the need for tools that don’t require specialized skills and programmers. New Hadoop developments must be simple for users to operate and to get data in and out. This includes direct access with standard protocols using existing tools and applications.

Q5. What are in your opinion the core limitations that limit the adoption of Hadoop in the enterprise? How do you contribute in taking Big Data into mainstream?

John Schroeder: MapR has knocked down, and continues to knock down, the barriers to Hadoop adoption. Hadoop needed five-9s availability and the ability to run in a “lights-out” datacenter, so we transformed Hadoop into a reliable compute and dependable storage platform. Hadoop use cases were too narrow, so we expanded access to Hadoop data for industry-standard file-based processing.
The MapR Control Center makes it easy to administer, monitor and provision Hadoop applications.
We improved Hadoop economics by dramatically improving performance. We just released our multi-tenancy features, which were key to our recently announced Amazon and Google partnerships.
Next on our roadmap is continuing to expand the use cases and making additional progress in moving Hadoop from batch to real time.

Q6. What are the benefits of having an automated stateful failover?

John Schroeder: Automated stateful failover provides high availability and continuous operations for organizations.
Even with multiple hardware or software outages and errors, applications will continue running without any administrator action required.

Q7. What is special about MapR architecture? Please give us some technical detail.

John Schroeder: MapR rethought and rebuilt the entire internal infrastructure while maintaining complete compatibility with Hadoop APIs. MapR opened up Hadoop to programming, products and data access by providing a POSIX storage layer, NFS access, ODBC/JDBC and REST. The MapR strategy is to expand the use cases appropriate for Hadoop while avoiding proprietary features that would result in vendor lock-in. We rebuilt the underlying storage services layer and eliminated the “append only” limitation of the Hadoop Distributed File System (HDFS). MapR writes directly to block devices, eliminating the inefficiencies caused by layering HDFS on top of a Linux local file system. In other Hadoop implementations, those layering inefficiencies continually rob the entire system of performance.
The storage layer rearchitecture also enabled us to implement snapshot and mirroring capabilities. The MapR Distributed NameNode eliminates the well-known file scalability limitation allowing us to optimize the Hadoop shuffle algorithm.
MapR provides transparent compression on the client making it easy to reduce data transmission over the network or on disk.
Finally MapR eliminated the impact of periodic Java garbage collection.
This kind of performance increase is simply impossible with the other current implementations because they are limited by the architecture itself.

Q8. There are several commercial Hadoop distribution companies on the market (e.g. Cloudera, Datameer, Greenplum, Hortonworks, Platform Computing to name a few). What is special about MapR?

John Schroeder: Datameer is not a Hadoop distribution provider; it provides an analytic application and is a partner of MapR. Platform Computing is also not a Hadoop distribution provider. MapR is the only company to provide an enterprise-grade Hadoop distribution.

Q9. What do you mean by an enterprise-grade Hadoop distribution? Isn’t, for example, Cloudera (to name one) also an enterprise Hadoop distribution?

John Schroeder: There is no other alternative in the market for full HA, business continuity, real-time streaming, standard file-based access through NFS, full database access through ODBC, and support for mission-critical SLAs.

Q10. How do you support this claim? Did you do a market analysis? What other systems did you look at?

John Schroeder: We performed a complete review of available Hadoop distributions. The recent selections of MapR by Amazon as an integrated offering into their Amazon Elastic MapReduce service and by Google for their Google Compute Engine are further validations of MapR’s differentiated Hadoop offering.
With MapR users can mount a cluster using the standard NFS protocol. Applications can write directly into the cluster with no special clients or applications. Users can use every file-based application that has been developed over the last 20 years, ranging from BI applications to file browsers and command-line utilities such as grep or sort directly on data in the cluster. This dramatically simplifies development. Existing applications and workflows can be used and just the specific steps requiring parallel processing need to be converted to take advantage of the MapReduce framework.
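
As a small illustration of that file-based access model, the snippet below reads cluster data through an NFS mount using ordinary file I/O, with no Hadoop client involved. The mount path and the log format are assumptions made for the example, not values prescribed by MapR.

```python
# Illustration of "standard file-based access" to an NFS-mounted cluster.
import collections

MOUNT = "/mapr/my.cluster.com/logs/web.log"   # hypothetical NFS-mounted file

status_counts = collections.Counter()
with open(MOUNT) as log:                      # plain file I/O on cluster data
    for line in log:
        fields = line.split()
        if len(fields) > 8:
            status_counts[fields[8]] += 1     # assumes Apache-style access logs

print(status_counts.most_common(5))
```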

MapR also delivers ease of data management. With clusters expanding well into the petabyte range, simplifying how data is managed is critical. MapR uniquely supports volumes to make it easy to apply policies across directories and file contents without managing individual files.
These policies include data protection, retention, snapshots, and access privileges.

Additionally, MapR delivers business critical reliability. This includes full HA, business continuity and data protection features. In addition to replication, MapR includes snapshots and mirroring. Snapshots provide for point-in-time recovery. Mirroring provides backup to an alternate cluster, data center or between on-premise and private Hadoop clouds. These features provide a level of protection that is necessary for business critical uses.

Q11. What are the next generations of Hadoop distributions?

John Schroeder: The first generation of Hadoop surrounded the open source Apache project with services and management utilities. MapR is the next generation of Hadoop: it combines standards-based innovations developed by MapR with open source components, resulting in the industry’s only differentiated distribution that meets the needs of both the largest and the emerging Hadoop installations.
MapR’s extensive engineering efforts have resulted in the first software distribution for Hadoop that provides extreme high performance, unprecedented scale, business continuity and is easy to deploy and manage.

Q12. Do you have any results to show us to support these claims “high performance, unprecedented scale”?

John Schroeder: We have many examples of high performance and scale. Google recently unveiled the Google Compute Engine with MapR on stage at the Google I/O conference. We demonstrated a 1,256-node cluster performing a TeraSort in 1 minute and 20 seconds. One of our customers, comScore, presented a session at the Hadoop Summit and showed how they process 30B internet events a day using MapR. As for scale differences, we have a customer with 18 billion files in a single MapR cluster. By comparison, the largest clusters of other distributions max out around 200 million files.

Q13. What functionalities still need to be added to Hadoop to serve new business critical and real-time applications?

John Schroeder: Other Hadoop distributions present customers with several challenges including:
• Getting data in and out of Hadoop. Other Hadoop distributions are limited by the append-only nature of the Hadoop Distributed File System (HDFS) that requires programs to batch load and unload data into a cluster.
• Deploying Hadoop into mission critical business projects. The lack of reliability of current Hadoop software platforms is a major impediment for expansion.
• Protecting data against application and user errors. Hadoop has no backup and restore capabilities. Users have to contend with data loss or resort to very expensive solutions that reside outside the actual Hadoop cluster.
According to the industry research firm ESG, half of the companies it surveyed plan to leverage commercial distributions of Hadoop as opposed to the open source version. This trend indicates organizations are moving from experimental and pilot projects to mainstream applications with mission-critical requirements that include high availability, better performance, data protection, security, and ease of use.

Q14. There is work to be done training developers in learning advanced statistics and software (such as Hadoop) to ensure adoption in the Enterprise. Do you agree with this? What is your role here?

John Schroeder: Simply put, the limitations of the Hadoop Distributed File System require wholesale changes to existing applications and extensive development of new ones. MapR’s next-generation storage services layer provides full random read/write support and provides direct access with NFS. This dramatically simplifies development. Existing applications and workflows can be used, and just the specific steps requiring parallel processing need to be converted to take advantage of the MapReduce framework.

Q15. Are customers willing to share their private data?

John Schroeder: In general customers are concerned with the protection and security of their data. That said, we see growing adoption of Hadoop in the cloud. Amazon has a significant web-services business around Hadoop and recently added MapR as part of their Elastic MapReduce offering. Google has also announced the Google Compute Engine and integration with MapR.

Q16. Data quality from different sources is a problem. How do you handle this?

John Schroeder: Data quality issues can be similar to those in a traditional data warehouse. Scrubbing can be built into the Hadoop applications using algorithms similar to those used during ELT.
ETL and ELT can both accomplish data scrubbing. The storage/compute resources and ability to combine unlike datasets provide significant advantages to Hadoop-based ELT.

There are different views with respect to this issue. IT personnel who are used to traditional data warehouses are typically concerned with data quality and ETL processes. The advantage of Hadoop is that you can have disparate data from many different sources and different data types in the same cluster. Some advanced users have pointed out that “quality” issues are actually valuable information that can provide insight into issues, anomalies and opportunities. With Hadoop, users have the flexibility to process and analyze the data as it is; analytics are not dependent on having a pre-defined schema.
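
To illustrate the scrubbing-during-ELT point above, here is a minimal Hadoop Streaming-style mapper sketch in Python; the field layout and the cleaning rules are invented for the example, and real pipelines would substitute their own validation logic. Such a script would typically be wired into a job with something like the standard streaming jar (hadoop jar hadoop-streaming*.jar -mapper scrub.py ...).

```python
#!/usr/bin/env python
# Hadoop Streaming-style mapper: scrub records as they are loaded (ELT).
# Record format is hypothetical: user_id<TAB>country<TAB>amount.
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        continue                         # drop structurally broken records
    user_id, country, amount = fields
    try:
        amount = float(amount)
    except ValueError:
        continue                         # drop records with a non-numeric amount
    country = country.strip().upper() or "UNKNOWN"
    print("\t".join([user_id, country, "%.2f" % amount]))
```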

Q17. Moving older data online. Is this a business opportunity for you?

John Schroeder: The advantage of Hadoop is performing compute on data. It makes much more sense to perform analytics directly on large data stores, so you send only the results over the network instead of dragging the entire data set over the network for processing. Making this use case viable requires a highly reliable cluster with full data protection and business continuity features.

Q18. Yes, but what about big data that is not digitized yet? This is what I meant by moving older data online.

John Schroeder: Most organizations are looking for a solution to help them cope with fast-growing digital sources of machine-generated content such as log files, sensor data, etc. Images, video and audio are also a fast-growing data source that can provide valuable analytics.

_______________________
John Schroeder, Cofounder and CEO, MapR.

John has led companies creating innovative and disruptive business intelligence, database management, storage and virtualization technologies, from early-stage ventures through their success as large public companies. John founded MapR to produce the next-generation Hadoop distribution, expanding the use cases beyond batch Hadoop to include real-time, business-critical, secure applications that easily integrate with file-based applications and products.

Related Posts

On Eventual Consistency– An interview with Justin Sheehy. on August 15, 2012

On Big Graph Data. on August 6, 2012

Managing Big Data. An interview with David Gorbet on July 2, 2012

Interview with Mike Stonebraker. on May 2, 2012

Integrating Enterprise Search with Analytics. Interview with Jonathan Ellis. on April 16, 2012

A super-set of MySQL for Big Data. Interview with John Busch, Schooner. on February 20, 2012

On Big Data Analytics: Interview with Florian Waas, EMC/Greenplum. on February 1, 2012

Resources

ODBMS.org Free Downloads and Links
In this section you can download free resources on Big Data and Analytical Data Platforms (Blog Posts | Free Software| Articles| PhD and Master Thesis)

##

Aug 27 12

On Impedance Mismatch. Interview with Reinhold Thurner

by Roberto V. Zicari

“Many enterprises sidestep applications with “Shadow IT” to solve planning, reporting and analysis problems” — Reinhold Thurner.

I am coming back to the topic of “Impedance Mismatch”.
I have interviewed one of our experts, Dr. Reinhold Thurner, founder of Metasafe GmbH in Switzerland.

RVZ

Q1. In a recent interview, José A. Blakeley and Rowan Miller of Microsoft said that “the impedance mismatch problem has been significantly reduced, but not entirely eliminated”. Do you agree?

Thurner: Yes, I agree, with some reservations, and only for the special case of the impedance mismatch between a conceptual model, a relational database and an OO program. However, even an advanced ORM is not really a solution for the more general case of complex data, which affects any programmer (not only OO programmers) and especially end users.

Q2. Could you please explain better what you mean here?

Thurner: My reservations concern the tools and the development process: several standalone tools (model designer, mapper, code generator, schema loader) are connected by intermediate files. It is difficult, if not impossible, to develop a transparent model transformation which relieves the developer of the necessity to “think” on both levels – the original model and the transformed model – at the same time. The conceptual models can be “painted” easily, but they cannot be “executed” and tested with test data.
They are practically disjoint from the instance data. It takes a lot of discipline to prevent changes in the data structures from being applied directly to the final database, with the consequence that the conceptual model is lost.
To paraphrase a document about ADO.NET: “Most significant applications involve a conceptual design phase early in the application development lifecycle. Unfortunately, however, the conceptual data model is captured inside a database design tool that has little or no connection with the code and the relational schema used to implement the application. The database design diagrams created in the early phases of the application life cycle usually stay “pinned to a wall”, growing increasingly disjoint from the reality of the application implementation with time.”

Q3. You are criticizing the process and the tools – what is the alternative?

Thurner: I compare this tool architecture with the idea of an “integrated view of conceptual modeling, databases and CASE” (actually the title of one of your books). The basic ideas already existed in the early ’90s but were not realized because the means to implement a “CASE database” were missing: modeling concepts (OMG), languages (Java, C#), frameworks (Eclipse), big cheap memory, powerful CPUs, big screens, etc. Today we are in a much better position, and it is now feasible to create a data platform (i.e. a database for CASE) for tool integration. As José A. Blakeley argues, ‘(…) modern applications and data services need to target a higher-level conceptual model based on entities and relationships (…)’. A modern data platform is a prerequisite to support such a concept.

Q4. Could you give us some examples of typical (impedance mismatch) problems still existing in the enterprise? How are they practically handled in the enterprise?

Thurner: As a consequence of the problems with the impedance mismatch, some applications don’t use database technology at all or develop a thick layer of proprietary services which are in fact a sort of private DBMS.
Many enterprises sidestep applications with “Shadow IT” to solve planning, reporting and analysis problems – i.e. spreadsheets instead of databases, mail for multi-user access and data exchange, security by obscurity and a lot of manual copy and paste.
Another important area is development tools: Development tools deal with a large number of highly interconnected artifacts which must be managed in configurations and versions. These artifacts are still stored in files, libraries and some in relational databases with a thick layer on top. A proper repository would provide better services for a tool developer and helps to create products which are more flexible and easier to use.
Data management and information dictionary: IT governance (COBIT) stipulates that a company should maintain an “information dictionary” which contains all “information assets”, their logical structure, the physical implementations and the responsibilities (RACI charts, data stewards). The Common Warehouse Metamodel (OMG) describes the model of the most common types of data stores – which is a good start: but companies with several DBMSs, hundreds of databases, servers and programs accessing thousands of tables and IMS segments need a proper database to store the instances to make the “information model” real. Users of such a dictionary (designers, programmers, testers, integration services, operations, problem management etc.) need an easy-to-use query language to access these data in an explorative manner.

Q5. If ORM technology cannot solve this kind of problem, what are the alternatives?

Thurner: The essence of ORM technology is to create a bridge between the “existing eco-system of databases based on the relational model and the conceptual model”. The “relational model” is not the one and only possible approach to persist data. Data storage technology has moved up the ladder from sequential files to index-sequential, to multi-index, to CODASYL, to hierarchical (IMS), and to today’s market leader, the RDBMS. This is certainly not the end, and the future seems to become very colorful. As Michael Stonebraker explains, “In summary, there may be a substantial number of domain-specific database engines with differing capabilities off into the future.” See his paper “One Size Fits All – An Idea Whose Time Has Come and Gone”.
ADO.NET has been described as “a part of the broader Microsoft Data Access vision” and covers a specific subset of applications. Is the “other part” the “executable conceptual model” which was mentioned by Peter Chen in a discussion with José Blakeley about “The Future of Database Systems”?
I am convinced that an executable conceptual model will play an important role for the aforementioned problems: a DBMS with an entity-relationship model implements the conceptual model without an impedance mismatch. To succeed it needs, however, all the properties José mentioned, like queries, transactions, access rights and integrated tools.

Q6. You started a company which developed a system called Metasafe-Repository. What is it?

Thurner: It started long ago with a first version developed in C, which was used, for example, in a reengineering project to manage a few hundred types, one million instances and about five million bidirectional relationships. In 2006 we decided to redevelop the system from scratch in Java, and the necessary tools with the Eclipse framework. We started with the basic elements – a multi-level architecture based on the entity-relationship model, integration of models and instance data, ACID transactions, versioning and user access rights. During development the system morphed from the initial idea of a repository service to a complete DBMS. We developed a set of entirely model-driven tools – modeler, browser, import/export utilities, Excel interface, ADO driver for BIRT, etc.
Metasafe has a multilevel structure: an internal metamodel, the global data model, external models as subsets (views) of the global data model, and the instance data – in OMG terms it stretches from M3 to M0. All types (M2, M1) are described by meta-attributes as defined in the metamodel. User access rights to models and instance data are tied to the external models. Entity instances (M0) may exist in several manifestations (Catalog, Version, Variant). An extension of the data model, e.g. by additional data types, entity types, relationship types or submodels, can be defined using the metaModeler tool (or via the API by a program). From the moment the model changes are committed, the database is ready to accept instance data for the new types without unloading/reloading the database.

Q7. Is the Metasafe repository the solution to the impedance mismatch problem?

Thurner: It is certainly a substantial step forward because we made the conceptual model and the database definition one and the same. We take the conceptual model literally by its word: If an ‘Order has Orderdetails’, we tell the database to create two entity types ‘Order’ and ‘Orderdetails’ and the relation ‘has’ between them. This way Metasafe implements an executable conceptual model with all the required properties of a real database management system: an open API, versioning, “declarative, model related queries and transactions” etc. Our own set of tools and especially the integration of BIRT (the eclipse Business Intelligence and Reporting Tool) demonstrate how it can be done. Our graphical erSQL query builder is even integrated into the BIRT designer. The erSQL queries are translated on the fly and BIRT accesses our database without any intermediate files.
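
Purely as a generic illustration (this is not Metasafe’s API), the Python sketch below shows what it means for the conceptual model to be the database definition: entity and relationship types such as ‘Order has Orderdetails’ are declared at run time, and instances can be stored against them immediately, with no separate mapping layer.

```python
# Generic sketch of an "executable conceptual model"; names and methods are
# invented for illustration and do not reflect Metasafe's actual interface.
class Model:
    def __init__(self):
        self.entity_types, self.relationship_types = set(), {}
        self.entities, self.relationships = {}, []

    def define(self, source, relation, target):
        """Take 'Order has Orderdetails' literally: two entity types plus a relation."""
        self.entity_types.update([source, target])
        self.relationship_types[relation] = (source, target)

    def create(self, entity_type, key, **attrs):
        assert entity_type in self.entity_types, "unknown entity type"
        self.entities[(entity_type, key)] = attrs
        return (entity_type, key)

    def relate(self, relation, source_key, target_key):
        assert relation in self.relationship_types, "unknown relationship type"
        self.relationships.append((relation, source_key, target_key))


m = Model()
m.define("Order", "has", "Orderdetails")          # the conceptual model...
order = m.create("Order", 4711, customer="ACME")  # ...is immediately usable
detail = m.create("Orderdetails", 1, product="bolt", qty=100)
m.relate("has", order, detail)
```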

Q8. What is the API of the Metasafe repository?

Thurner: Metasafe provides an object-oriented Java-API for all methods required to search, create, read, update, delete the elements of the database – i.e. schemas, user groups /users, entities, relationships and their attributes – both on type- and on instance-level. All the tools of Metasafe (modeler, browser, import/export, query builder etc.) are built with this public API. This approach has led to an elaborate set of methods to support an application programmer. The erSQL query-builder and also the query-translator and processor were implemented with this API. An erSQL query can be embedded in a java-program to retrieve a result-set (including its metadata) or to export the result-set.
In early versions we had a C# version in parallel, but we discontinued this branch when we started with the development of the tools based on Eclipse RCP. The re-implementation of the core in C# would be relatively easy. I think that the tools, too, could be re-implemented because they are entirely model-driven.

Q9. How does Metasafe’s query language differ from the Microsoft Entity Framework’s built-in query capabilities (i.e. Language Integrated Query (LINQ))?

Thurner: It is difficult to compare, because Metasafe’s erSQL query language was designed with respect to the special nature of an entity-relationship model with heavily cross-linked information. So the erSQL query language maps directly to the conceptual model. “End users” can also create queries with the graphical query builder by pointing and clicking on the graphical representation of the conceptual model to identify the path through the model and to collect the information of interest.

The queries are translated on the fly and processed by the query processor. The validation and translation of a query into a command structure for the query processor is a matter of milliseconds. The query processor returns result sets of metadata and typed instance data. The query result can also be exported as an Excel table or as an XML file. In “read mode” the result of each retrieval step (instance objects and their attributes) is returned to the invoking program instead of building the complete result set. A query represents a sort of “user” model and is also documented graphically. “End users” can easily create queries and retrieve data from the database. erSQL and the graphical query builder are fully integrated in BIRT to create reports on the fly.
The present version supports only information retrieval. We plan to extend it with a ” … for update” feature which locks all selected entity instances for further operation.
E.g. an update query for {an order and all its order items and products} would lock “the order” until backout or commit.

Q10. There are concerns about the performance and the overhead generated by ORM technology. Is performance an issue for Metasafe?

Thurner: Performance is always an issue when the number of concurrent users and the size and complexity of the data grow. The system works quite well for medium-sized systems with a few hundred types, a few million instances and a few GBs. The performance depends on the translation of the logical requests into physical access commands and on the execution of the physical access to the persistence layer. Metasafe uses very limited functionality of an RDBMS (currently SQL Server, Derby, Oracle) for persistence. Locking, transactions and multi-user management are handled by Metasafe; the locking tables are kept in memory. After a commit it writes all changes in one burst to the database. We could of course use an in-memory DBMS to gain performance. E.g. VoltDB, with its direct transaction access, could be integrated easily and would certainly lead to superior performance.
We also have another kind of performance in mind – the user performance. For many applications the number of milliseconds to execute a transaction is less important than the ability to quickly create or change a database and to create and launch queries in a matter of minutes. Metasafe is especially helpful for this kind of application.

Q11. What problems is Metasafe designed to solve?

Thurner: Metasafe is designed as a generic data platform for medium-sized (XX GB) model-driven applications. The primary purpose is support for applications with large, complex and volatile data structures, such as tools, models, catalogs or process managers. Metasafe could be used to replace some legacy repositories.
Metasafe is certainly the best data platform (repository) for the construction of an integrated development environment. Metasafe can also serve as the DBMS for business applications.
We are also evaluating the possibility of using the Metasafe DBMS as a data platform for portable devices such as phones and tablet computers: this could be a real killer application for application developers.

Q12. How do you position Metasafe in the market?

Thurner: I had the vision of an entity-relationship-based database system as a future data platform and decided to develop Metasafe to a really useful level without the pressure of the market (namely, first-time users). Now we have our product at the necessary level of quality and we are planning the next steps. It could be the “open source approach” for a limited version, or integration into a larger organization.
We have a number of applications and POCs, but we have no substantial customer base yet, which would require an adequate support and sales organization. We have no intention of converting a successful development setup into a mediocre service and sales organization. We are not under time pressure and are looking at a number of possibilities.

Q13. How can the developer community test your system?

Thurner: We provide an evaluation version upon request.

—————————-
Related Posts

Do we still have an impedance mismatch problem? – Interview with José A. Blakeley and Rowan Miller. by Roberto V. Zicari on May 21, 2012

Resources

“Implementing the Executable Conceptual Model (ECM)” (download as .pdf),
by Dr. Reinhold Thurner, Metasafe.

ODBMS.org Free Resources on:
Entity Framework (EF) Resources
ORM Technology
Object-Relational Impedance Mismatch

##

Aug 15 12

On Eventual Consistency– An interview with Justin Sheehy.

by Roberto V. Zicari

“I would most certainly include updates to my bank account as applications for which eventual consistency is a good design choice. In fact, bankers have understood and used eventual consistency for far longer than there have been computers in the modern sense” –Justin Sheehy.

On the subject of new data models and eventual consistency, I interviewed Justin Sheehy, Chief Technology Officer of Basho Technologies.

RVZ

Q1. What are in your opinion the main differences and similarities of a key-value store (à la Dynamo), a document store (à la MongoDB), and an “extensible record” store (à la BigTable) when using them in practice?

Justin Sheehy: Describing the kv-store, doc store, and column family data models in general is not the same as describing specific systems like Dynamo, MongoDB, and BigTable. I’ll do the former here as I am guessing that is the intention of the question. Since the following couple of questions ask for differences, I’ll emphasize the similarity here.

All three of these data models have two major things in common: values stored in them are not rigidly structured, and are organized mainly by primary key. The details beyond those similarities, and how given systems expose those details, certainly vary. But the flexibility of semi-structured data and the efficiency of primary-key access generally apply to most such systems.

Q2. When is a key-value store particularly well suited, and when is a document store preferable instead? For which kinds of applications and for what kinds of data management requirements?

Justin Sheehy: The interesting issue with this question is that “document store” is not well-established as having a specific meaning. Certainly it seems to apply to both MongoDB and CouchDB, but those two systems have very different data access semantics. The closest definition I can come up with quickly that covers the prominent systems known as doc stores might be something like “a key-value store which also has queryability of some kind based on the content of the values.”

If we accept that definition then you can happily use a document store anywhere that a key-value store would work, but would find it most worthwhile when your querying needs are richer than simply primary key direct access.

Q3. What is Riak? A key-value store or a document store? What are the main features of the current version of Riak?

Justin Sheehy: Riak began by being called a key-value store, before the current popularity of the term “document store”, but it is certainly a document store by any reasonable definition that I know — such as the one I gave above. In addition to access by primary key, values in Riak can be queried by secondary key, range query, link walking, full-text search, or map/reduce.

Riak has many features, but the core reasons that people come to Riak over other systems are Availability, Scalability, and Predictability. For people whose business demands extremely high availability, easy and linear scalability, or predictable performance over time, Riak is worth a look.

Q4. How do you achieve horizontal scalability? Do you use “shared nothing” horizontal scaling – replicating and partitioning data over many servers? What performance metrics do you have for that?

Justin Sheehy: We use a number of techniques to achieve horizontal scalability. Among them is consistent hashing, an approach invented at Akamai and successfully used by many distributed systems since then. This allows for constant time routing to replicas of data based on the hash of the data’s primary key.
Data is partitioned to servers in the cluster based on consistent hashing, and replicated to a configurable number of those servers. By partitioning the data to many “virtual nodes” per host, growth is relatively easy, as new hosts simply (and automatically) take over some of the virtual nodes that were previously owned by existing cluster hosts.
Yes, in terms of data location Riak is a “shared nothing” system.
One (of many) demonstrations of this scalability was performed by Joyent here.
That benchmark is approximately 2 years old, so various specific numbers are quite outdated, but the important lesson in it remains and is summed up by this graph late in this post.
It shows that as servers were added, the throughput (as well as the capacity) of the overall system increased linearly.
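
The consistent-hashing idea described above can be sketched in a few lines of Python. This is a conceptual illustration only, not Riak’s implementation, which also handles replication, hinted hand-off and ring membership changes.

```python
# Minimal consistent hashing with virtual nodes: each host owns many points
# on a hash ring, and a key is routed to the host owning the next point.
import bisect
import hashlib


def _hash(value):
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    def __init__(self, hosts, vnodes_per_host=64):
        self._points = []                          # sorted (hash, host) pairs
        for host in hosts:
            for i in range(vnodes_per_host):
                self._points.append((_hash("%s#%d" % (host, i)), host))
        self._points.sort()

    def owner(self, key):
        """Constant-time-ish routing: find the next virtual node clockwise."""
        h = _hash(key)
        idx = bisect.bisect(self._points, (h, ""))
        return self._points[idx % len(self._points)][1]


ring = Ring(["node1", "node2", "node3"])
print(ring.owner("user:42"))    # the same key always routes to the same host
```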

Q5. How do you handle updates if you do not support ACID transactions? For which applications is this sufficient, and when is it not?

Justin Sheehy: Riak takes more of the “BASE” approach, which has become accepted over the past several years as a sensible tradeoff for high-availability data systems. By allowing consistency guarantees to be a bit flexible during failure conditions, a Riak cluster is able to provide much more extreme availability guarantees than a strictly ACID system.

Q6. You said that Riak takes more of the “BASE” approach. Did you use the definition of eventual consistency by Werner Vogels?
Reproduced here: “Eventual consistency: The storage system guarantees that if no new updates are made to the object, eventually (after the inconsistency window closes) all accesses will return the last updated value”. You would not wish to have an “eventual consistency” update to your bank account. For which class of applications is eventual consistency a good system design choice?

Justin Sheehy: That definition of Eventual Consistency certainly does apply to Riak, yes.

I would most certainly include updates to my bank account as applications for which eventual consistency is a good design choice. In fact, bankers have understood and used eventual consistency for far longer than there have been computers in the modern sense. Traditional accounting is done in an eventually-consistent way and if you send me a payment from your bank to mine then that transaction will be resolved in an eventually consistent way. That is, your bank account and mine will not have a jointly-atomic change in value, but instead yours will have a debit and mine will have a credit, each of which will be applied to our respective accounts.

This question contains a very commonly held misconception. The use of eventual consistency in well-designed systems does not lead to inconsistency. Instead, such systems may allow brief (but shortly resolved) discrepancies at precisely the moments when the other alternative would be to simply fail.

To rephrase your statement, you would not wish your bank to fail to accept a deposit due to an insistence on strict global consistency.

It is precisely the cases where you care about very high availability of a distributed system that eventual consistency might be a worthwhile tradeoff.
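
A toy sketch of the double-entry argument above: a transfer is recorded as two independent ledger entries that may be applied at different times, and the balances converge once both have been applied. This is illustrative Python only, not any bank’s or Riak’s code.

```python
from collections import defaultdict

ledgers = defaultdict(list)              # account -> list of signed entries

def apply_entry(account, amount):
    ledgers[account].append(amount)

def balance(account):
    return sum(ledgers[account])

# A $100 transfer from 'you' to 'me' is two entries, not one atomic update.
apply_entry("you", -100)                 # your bank applies the debit now...
print(balance("you"), balance("me"))     # -100 0   (brief, visible discrepancy)
apply_entry("me", +100)                  # ...mine applies the credit later
print(balance("you"), balance("me"))     # -100 100 (eventually consistent)
```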

Q7. Why is Riak written in Erlang? What are the implications for the application developers of this choice?

Justin Sheehy: Erlang’s original design goals included making it easy to build systems with soft real-time guarantees and very robust fault-tolerance properties. That is perfectly aligned with our central goals with Riak, and so Erlang was a natural fit for us. Over the past few years, that choice has proven many times to have been a great choice with a huge payoff for Riak’s developers. Application developers using Riak are not required to care about this choice any more than they need to care what language PostgreSQL is written in. The implications for those developers are simply that the database they are using has very predictable performance and excellent resilience.

Q8. Riak is open source. How do you engage the open source community and how do you make sure that no inconsistent versions are generated?

Justin Sheehy: We engage the open source community everywhere that it exists. We do our development in the open on github, and have lively conversations with a wider community via email lists, IRC, Twitter, many in-person venues, and more.
Mark Phillips and others at Basho are dedicated full-time to ensuring that we continue to engage honestly and openly with developer communities, but all of us consider it an essential part of what we do. We do not try to prevent forks. Instead, we are part of the community in such a way that people generally want to contribute their changes back to the central repository. The only barrier we have to merging such code is about maintaining a standard of quality.

Q9. How do you optimize access to non-key attributes?

Justin Sheehy: Riak stores index content in addition to values, encoded by type and in sorted order on disk. A query by index certainly is more expensive than simply accessing a single value directly by key, as the indices are distributed around the cluster — but this also means that the size of the index is not constrained by a single host.

Q10. How do you optimize access to non-key attributes if you do not support indexes in Riak?

Justin Sheehy: We do support indexes in Riak.

Q11. How does Riak compare with a new generation of scalable relational systems (NewSQL)?

Justin Sheehy: The “NewSQL” term is, much like “NoSQL”, a marketing term that doesn’t usefully define a technical category. The primary argument made by NewSQL proponents is that some NoSQL systems have made unnecessary tradeoffs. I personally consider these NewSQL systems to be a part of the greater movement generally dubbed NoSQL despite the seemingly contradictory names, as the core of that movement has nothing to do with SQL — it is about escaping the architectural monoculture that has gripped the commercial database market for the past few decades. In terms of technical comparison, some systems placing themselves under the NewSQL banner are excellent at scalability and performance, but I know of none whose availability and predictability can rival Riak.

Q12. Please give some examples of use cases where Riak is currently in use. Is Riak in use for analyzing Big Data as well?

Justin Sheehy: A few examples of companies relying on Riak in their business can be found here.
While Riak is primarily about highly-available systems with predictable low-latency performance, it does have analytical capabilities as well and many users make use of map/reduce and other such programming models in Riak. By most definitions of “Big Data”, many of Riak’s users certainly fall into that category.

Q. Anything you wish to add?

Justin Sheehy: Thank you for your interest. We’re not done making Riak great!

_____________

Justin Sheehy
Chief Technology Officer, Basho Technologies

As Chief Technology Officer, Justin Sheehy directs Basho’s technical strategy, roadmap, and new research into storage and distributed systems.
Justin came to Basho from the MITRE Corporation, where as a principal scientist he managed large research projects for the U.S. Intelligence Community including such efforts as high assurance platforms, automated defensive cyber response, and cryptographic protocol analysis.
He was central to MITRE’s development of research for mission assurance against sophisticated threats, the flagship program of which successfully proposed and created methods for building resilient networks of web services.
Before working for MITRE, Justin worked at a series of technology companies, including five years at Akamai Technologies, where he was a senior architect for systems infrastructure, giving him a broad as well as deep background in distributed systems.
Justin was a key contributor to the technology that enabled fast growth of Akamai’s networks and services while allowing support costs to stay low. Justin performed both undergraduate and postgraduate studies in Computer Science at Northeastern University.

______________________
Related Posts

On Data Management: Interview with Kristof Kloeckner, GM IBM Rational Software.

On Big Data: Interview with Dr. Werner Vogels, CTO and VP of Amazon.com

Resources
ODBMS.org — Free Downloads and Links
In this section you can download resources covering the following topics:
Big Data and Analytical Data Platforms
Cloud Data Stores
Object Databases
NewSQL,
NoSQL Data Stores
Graphs and Data Stores
Object-Oriented Programming
Entity Framework (EF) Resources
ORM Technology
Object-Relational Impedance Mismatch
XML, RDF Data Stores,
RDBMS

##

Aug 6 12

On Big Graph Data.

by Roberto V. Zicari

” The ultimate goal is to ensure that the graph community is not hindered by vendor lock-in” –Marko A. Rodriguez.
“There are three components to scaling OLTP graph databases: effective edge compression, efficient vertex centric query support, and intelligent graph partitioning” — Matthias Broecheler.

Titan is a new distributed graph database available in alpha release. It is an open source Apache2 project maintained and funded by Aurelius. To learn more about it, I have interviewed Dr. Marko A. Rodriguez and Dr. Matthias Broecheler, cofounders of Aurelius.

RVZ

Q1. What is Titan?

MATTHIAS: Titan is a highly scalable OLTP graph database system optimized for thousands of users concurrently accessing and updating one huge graph.

Q2. Who needs to handle graph data, and why?

MARKO: Much of today’s data is composed of a heterogeneous set of “things” (vertices) connected by a heterogeneous set of relationships (edges) — people, events, items, etc. related by knowing, attending, purchasing, etc. The property graph model leveraged by Titan espouses this world view. This world view is not new as the object-oriented community has a similar perspective on data.
However, graph-centric data aligns well with the numerous algorithms and statistical techniques developed in both the network science and graph theory communities.

Q3. What are the main technical challenges when storing and processing graphs?

MATTHIAS: At the interface level, Titan strives to strike a balance between simplicity and performance, so that developers can think in terms of graphs and traversals without having to worry about the persistence and efficiency details. This is achieved both by using the Blueprints API and by extending it with methods that allow developers to give Titan “hints” about the graph data. Titan can then exploit these “hints” to ensure performance at scale.

Q4. Graphs are hard to scale. What are the key ideas that make it so that Titan scales? Do you have any performance metrics available?

MATTHIAS: There are three components to scaling OLTP graph databases: effective edge compression, efficient vertex centric query support, and intelligent graph partitioning.
Edge compression in Titan comprises various techniques for keeping the memory footprint of each edge as small as possible and storing all edge information in one consecutive block of memory for fast retrieval.
Vertex centric queries allow users to query for a specific set of edges by leveraging vertex centric indices and a query optimizer.
Graph data partitioning refers to distributing the graph across multiple machines such that frequently co-accessed data is co-located. Graph partitioning is an (NP-)hard problem, and this is an aspect of Titan where we will see the most improvement in future releases.
The current alpha release focuses on balanced partitioning and multi-threaded parallel traversals for scale.
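
To illustrate what a vertex-centric index buys you, here is a conceptual Python sketch (not Titan’s storage format): each vertex keeps its edges sorted by label and sort key, so a query such as “the ‘follows’ edges of v, in timestamp order” touches only a small contiguous slice instead of scanning every edge of a high-degree vertex.

```python
# Conceptual vertex-centric index: edges of one vertex, sorted for fast slicing.
import bisect


class Vertex:
    def __init__(self):
        self._edges = []                      # sorted (label, sort_key, target) tuples

    def add_edge(self, label, sort_key, target):
        bisect.insort(self._edges, (label, sort_key, target))

    def query(self, label, limit=None):
        """Return edges with the given label, in sort-key order, via binary search."""
        lo = bisect.bisect_left(self._edges, (label,))
        hi = bisect.bisect_right(self._edges, (label, float("inf"), None))
        hits = self._edges[lo:hi]
        return hits[:limit] if limit else hits


v = Vertex()
v.add_edge("follows", 20120801, "alice")
v.add_edge("follows", 20120815, "bob")
v.add_edge("likes", 20120816, "post:7")
print(v.query("follows", limit=2))   # only the 'follows' slice is touched
```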

MARKO: To your question about performance metrics, Matthias and his colleague Dan LaRocque are currently working on a benchmark that will demonstrate Titan’s performance when tens of thousands of transactions are concurrently interacting with Titan. We plan to release this benchmark via the Aurelius blog.
[Edit: The benchmark is now available here.]

Q5. What is the relationship of Titan to other open source projects you were previously involved with, such as TinkerPop? Is Titan open source?

MARKO: Titan is a free, open source project, licensed under Apache 2, maintained and funded by Aurelius. Aurelius (our graph consulting firm) developed Titan in order to meet the scalability requirements of a number of our clients.
In fact, Pearson is a primary supporter and early adopter of Titan. TinkerPop, on the other hand, is not directly funded by any company and as such, is an open source group developing graph-based tools that any graph database vendor can leverage.
With that said, Titan natively implements the Blueprints 2 API and is able to leverage the TinkerPop suite of technologies: Pipes, Gremlin, Frames, and Rexster.
We believe this demonstrates the power of the TinkerPop stack — if you are developing a graph persistence store, implement Blueprints and your store automatically gets a traversal language, an OGM (object-to-graph mapper) framework, and a RESTful server.

Q6. How is Titan addressing the problem of analyzing Big Data at scale?

MATTHIAS: Titan is an OLTP database that is optimized for many concurrent users running short transactions, e.g. graph updates or short traversals, against one huge graph. Titan significantly simplifies the development of scalable graph applications such as those behind Facebook, Twitter, and similar services.
Interestingly enough, most of these large companies have built their own internal graph databases.
We hope Titan will allow organizations to not reinvent the wheel. In this way, companies can focus on the value their data adds, not on the “plumbing” needed to process that data.

MARKO: In order to support the type of global OLAP operations typified by the Big Data community, Aurelius will be providing a suite of technologies that will allow developers to make use of global graph algorithms. Faunus is a Hadoop connector that implements a multi-relational path algebra developed by Joshua Shinavier and myself. This algebra allows users to derive smaller, “semantically rich” graphs that can then be computed on effectively within the memory confines of a single machine. Fulgora will be the in-memory processing engine. Currently, as Matthias has shown in prototype, Fulgora can store ~90 billion edges on a machine with 64 GB of RAM for graphs with a natural, real-world topology. Titan, Faunus, and Fulgora form Aurelius’ OLAP story.

Q7. How do you handle updates?

MATTHIAS: Updates are bundled in transactions which are executed against the underlying storage backend. Titan can be operated on multiple storage backends and currently supports Apache Cassandra, Apache HBase and Oracle BerkeleyDB.
The degree of transactional support and isolation depends on the chosen storage backend. For non-transactional storage backends, Titan provides its own fine-grained locking system to achieve consistency while maintaining scalability.
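A minimal sketch of what this looks like through the Blueprints TransactionalGraph interface, assuming a Titan graph already opened against one of the supported backends; note that the transaction-demarcation calls shown here (stopTransaction with a Conclusion) are the Blueprints 2.0/2.1 style and were later replaced by commit()/rollback(), so treat the exact method names as illustrative.

```java
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.TransactionalGraph;
import com.tinkerpop.blueprints.TransactionalGraph.Conclusion;
import com.tinkerpop.blueprints.Vertex;

public class TransactionSketch {
    // "graph" would be a Titan instance opened against Cassandra, HBase, or
    // BerkeleyDB; only the generic Blueprints TransactionalGraph interface is used.
    public static void befriend(TransactionalGraph graph, Object idA, Object idB) {
        try {
            // In Blueprints 2, a transaction is opened implicitly by the first
            // read or write on the graph.
            Vertex a = graph.getVertex(idA);
            Vertex b = graph.getVertex(idB);
            Edge knows = graph.addEdge(null, a, b, "knows");
            knows.setProperty("since", System.currentTimeMillis());

            // Flush the bundled updates to the storage backend.
            graph.stopTransaction(Conclusion.SUCCESS);
        } catch (RuntimeException e) {
            // Undo the partial work; isolation guarantees depend on the backend.
            graph.stopTransaction(Conclusion.FAILURE);
            throw e;
        }
    }
}
```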

Q8. Do you offer support for declarative queries?

MARKO: Titan implements the Blueprints 2 API and as such, supports Gremlin as its query/traversal language. Gremlin is a data flow language for graphs whereby traversals are prescriptively described using path expressions.
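For readers who prefer Java to the Gremlin-Groovy shell, roughly the same kind of traversal can be expressed with Gremlin's Java pipeline API. This is a hedged sketch: the "knows" label and the friends-of-friends use case are chosen purely for illustration, and because Titan is Blueprints-enabled, the identical pipeline runs against TinkerGraph, Titan, or any other Blueprints graph.

```java
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.gremlin.java.GremlinPipeline;

import java.util.List;

public class GremlinTraversalSketch {
    // A friends-of-friends traversal expressed as a Gremlin pipeline.
    public static List<Vertex> friendsOfFriends(Graph g, Object userId) {
        Vertex user = g.getVertex(userId);
        return new GremlinPipeline<Vertex, Vertex>(user)
                .out("knows")   // people the user knows...
                .out("knows")   // ...and the people they know in turn
                .dedup()        // suppress vertices reached along multiple paths
                .toList();
    }
}
```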

MATTHIAS: With respect to a declarative query language, the TinkerPop team is currently designing a graph-centric language called “Troll.” We invite anybody interested in graph algorithms and graph processing to help in this effort.
We want to identify the key graph use cases and then build a language that addresses those most effectively. Note that this is happening in TinkerPop and any Blueprints-enabled graph database will ultimately be able to add “Troll” to their supported languages.

Q9. How does Titan compare with other commercial graph databases and RDF triple stores?

MARKO: As Matthias has articulated previously, Titan is optimized for thousands of concurrent users reading and writing to a single massive graph. Most popular graph databases on the market today are single-machine databases and simply can’t handle the scale of data and number of concurrent users that Titan can support. However, because Titan is a Blueprints-enabled graph database, it provides the same perspective on graph data as other graph databases.
In terms of RDF quad/triple stores, the biggest obvious difference is the data model. RDF stores make use of a collection of triples composed of a subject, predicate, and object. There is no notion of key/value pairs associated with vertices and edges like Blueprints-based databases. When one wants to model edge weights, timestamps, etc., RDF becomes cumbersome. However, the RDF community has a rich collection of tools and standards that make working with RDF data easy and compatible across all RDF vendors.
For example, I have a deep appreciation for OpenRDF.
Similar to OpenRDF, TinkerPop hopes to make it easy for developers to migrate between various graph solutions whether they be graph databases, in-memory graph frameworks, Hadoop-based graph processing solutions, etc.
The ultimate goal is to ensure that the graph community is not hindered by vendor lock-in.
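To make the contrast with RDF concrete, here is a hedged sketch: in a Blueprints property graph the weight hangs directly on the edge, whereas plain RDF needs an auxiliary construct such as reification to say anything about the :knows statement itself. The triples in the comment are schematic and not tied to any particular store.

```java
import com.tinkerpop.blueprints.Edge;
import com.tinkerpop.blueprints.Graph;
import com.tinkerpop.blueprints.Vertex;
import com.tinkerpop.blueprints.impls.tg.TinkerGraph;

public class EdgePropertyVsRdfSketch {
    public static void main(String[] args) {
        Graph g = new TinkerGraph();
        Vertex alice = g.addVertex(null);
        Vertex bob = g.addVertex(null);

        // Property graph: the weight lives directly on the relationship.
        Edge knows = g.addEdge(null, alice, bob, "knows");
        knows.setProperty("weight", 0.8);

        // Plain RDF has no place to hang that weight on the triple
        //   :alice :knows :bob .
        // A common workaround is reification (schematic):
        //   :s1 rdf:type rdf:Statement .
        //   :s1 rdf:subject :alice .
        //   :s1 rdf:predicate :knows .
        //   :s1 rdf:object :bob .
        //   :s1 :weight "0.8"^^xsd:double .
        g.shutdown();
    }
}
```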

Q10. How does Titan compare with respect to NoSQL data stores and NewSQL databases?

MATTHIAS: Titan builds on top of the innovation at the persistence layer that we have seen in recent years in the NoSQL movement. At the lowest level, a graph database needs to store bits and bytes and therefore has to address the same issues around persistence, fault tolerance, replication, synchronization, etc. that NoSQL solutions are tackling.
Rather than reinventing the wheel, Titan is standing on the shoulders of giants by being able to utilize different NoSQL solutions for storage through an abstract storage interface. This allows Titan to cover all three sides of the CAP theorem triangle — please see here.

Q11. Prof. Stonebraker argues that “blinding performance depends on removing overhead. Such overhead has nothing to do with SQL, but instead revolves around traditional implementations of ACID transactions, multi-threading, and disk management. To go wildly faster, one must remove all four sources of overhead, discussed above. This is possible in either a SQL context or some other context.” What is your opinion on this?

MATTHIAS: We absolutely agree with Mike on this. The relational model is a way of looking at your data through tables, and SQL is the language you use when you adopt this tabular view. There is nothing intrinsically inefficient about tables or relational algebra. But it’s important to note that the relational model is simply one way of looking at your data. We promote the graph data model, which is the natural data representation for many applications where entities are highly connected with one another. Using a graph database for such applications will make developers significantly more productive and change the way they derive value from their data.
——–

Dr. Marko A. Rodriguez is the founder of the graph consulting firm Aurelius. He has focused his academic and commercial career on the theoretical and applied aspects of graphs. Marko is a cofounder of TinkerPop and the primary developer of the Gremlin graph traversal language.

Dr. Matthias Broecheler has been researching and developing large-scale graph database systems for many years in both academia and in his role as a cofounder of the Aurelius graph consulting firm. He is the primary developer of the distributed graph database Titan.
Matthias focuses most of his time and effort on novel OLTP and OLAP graph processing solutions.

————
Related Posts

“Applying Graph Analysis and Manipulation to Data Stores.” (June 22, 2011)

“Marrying objects with graphs”: Interview with Darren Wood. (March 5, 2011)

Resources on Graphs and Data Stores
Blog Posts | Free Software | Articles, Papers, Presentations| Tutorials, Lecture Notes

##