
"Trends and Information on AI, Big Data, Data Science, New Data Management Technologies, and Innovation."

This is the Industry Watch blog. To see the complete ODBMS.org
website with useful articles, downloads and industry information, please click here.

Oct 11 10

Presentations of ICOODB Frankfurt 2010.

by Roberto V. Zicari

I have published on ODBMS.ORG most of the industry presentations given at the ICOODB Frankfurt 2010 conference.

Here are the relevant links:

TUTORIALS:
1. “Object Databases” (PDF, 75 pages), by Michael Grossniklaus, Politecnico di Milano.
2. “Patterns of Data Modeling” (PDF, 49 pages), by Michael R. Blaha, Modelsoft Consulting Corp.
–>Download Link.

NoSQL Workshop:
1. “Approaches to Data Modeling in Non-Relational Systems Using Apache Cassandra”, by Gary Dusbabek, Rackspace
2. “Dinner in the sky with MongoDB.”, by Marc Boeker, ONchestra.
3. “Scale Out vs. Scale In: a face-off between Cassandra and Redis.”, by Tim Lossen, wooga.
4. “The Graph DB Landscape and SonesDB.”, by Achim Friedland, Sones.
5. “Neo4j for deep spatial and social intelligence.”, by Peter Neubauer, Neo Technology.
6. “Mastering Massive Data Volumes with Hypertable.”, by Doug Judd, Hypertable Inc.
—> Link to Download all presentations (.PDF).

ICOODB KEYNOTES and Industry Track Presentations:
1. “Efficient Development of Event-Driven Systems with Versant Object Database.” by Guenter Ressell-Herbert, Versant
2. “Accelerating Application Development with Objects.” by Eric Falsken, German Viscuso, Roman Stoffel, db4objects.
3. “The Synergy Between the Object Database, Graph Database, Cloud Computing and NoSQL Paradigms.” by Leon Guzenda, Objectivity.
4. “Unifying Remote Data, Remote Procedures and Web Services.” KEYNOTE by William Cook, University of Texas at Austin.
5. “Searching the Web of Objects” KEYNOTE by Ricardo Baeza-Yates, VP, Yahoo! Research, Europe and Latin America.
—> Download presentations Link.

6. “State of MariaDB” and “Dynamic Columns in MariaDB”, by Michael (Monty) Widenius, MariaDB.
—> Download link

A lot to read…

Around 200 researchers from around the world attended the conference.
You can see some photos here.

RVZ

Oct 1 10

Best Object Databases Lecture Notes for ETH Zurich!

by Roberto V. Zicari

The winners of the ODBMS.ORG “Best Object Databases Lecture Notes” Award 2010 are Dr. Michael Grossniklaus and Prof. Moira Norrie, ETH Zürich, Switzerland, for their Lecture Notes “Object-Oriented Databases”.

Second place for:
“Object Database Tutorial”
by Dr. Rick Cattell, Independent Consultant, USA.

Third place for:
“Modern Database Techniques”
by Prof. Martin Hulin, Hochschule Ravensburg-Weingarten, Germany.

The Award Ceremony was held on September 29, 2010, at the 3rd International Conference on Objects and Databases (ICOODB 2010) in Frankfurt.

The Awards recognize the most complete and up to date lecture notes on Object Databases, that have been, or have strong potential to be, instrumental to the teaching of theory and practice in the field of objects and databases. Any Lecture Notes published in ODBMS.ORG during the years 2004-2010 were eligible for the 2010 award.

“This is a very nice recognition to the award winners, and it also encourages others to contribute educational materials that others can use. Very good.” Prof. Alfonso Cardenas, Computer Science Department, UCLA.

Sep 27 10

Object Database Technologies and Data Management in the Cloud.

by Roberto V. Zicari

One of our experts, Dr. Michael Grossniklaus, has recently been awarded a grant by the Swiss National Science Foundation (SNF) for a fellowship as an advanced researcher in David Maier’s group at Portland State University. There, he will be investigating the use of object database technology for cloud data management.
I asked Michael to elaborate on his research plan and share it with our ODBMS.ORG community.

Q1. People from different fields have slightly different definitions of the term Cloud Computing. What is the common denominator of most of these definitions?

MG: Many of the differences stem from the fact that people use the term Cloud Computing both to denote a vision at the conceptual level and technologies at the implementation level. A nice collection of no less than twenty-one definitions can be found here.
In terms of vision, the common denominator of most definitions is to look at processing power, storage and software as commodities that are readily available from large infrastructures. As a consequence, cloud computing unifies elements of distributed, grid, utility and autonomic computing. The term elastic computing is also often used in this context to describe the ability of cloud computing to cope with bursts or spikes in the demand of resources on an on-demand basis. As for technologies, there is a consensus that cloud computing corresponds to a service-oriented stack that provides computing resources at different levels. Again, there are many variants of cloud computing stacks, but the trend seems to go towards three layers. At the lowest level, Infrastructure-as-a-Service (IaaS) offers resources such as processing power or storage as a service. One level above, Platform-as-a-Service (PaaS) provides development tools to build applications based on the service provider’s API. Finally, on the top-most level, Software-as-a-Service (SaaS) describes the model of deploying applications to clients on demand.

Q2. With the emergence of cloud computing, new data management systems have surfaced. Why?

MG: I see new data management systems such as NoSQL databases and MapReduce systems mainly as a reaction to the way in which cloud computing provides scalability. In cloud computing, more processing power typically translates to more (cheap, shared-nothing) computing nodes, rather than migrating or upgrading to better hardware. Therefore, cloud computing applications need to be parallelizable in order to scale. Both NoSQL and MapReduce advocate simplicity in terms of data models and data processing, in order to provide light-weight and fault-tolerant frameworks that support automatic parallelization and distribution.
In comparison to existing parallel and distributed (relational) databases however, many established data management concepts, such as data independence, declarative query languages, algebraic optimization and transactional data processing, are often omitted. As a consequence, more weight is put on the shoulders of application developers that now face new challenges and responsibilities. Acknowledging the fact that the initial vision was maybe too simple, there is already a trend of extending MapReduce systems with established data management concepts. Yahoo’s PigLatin and Microsoft’s Dryad have introduced a (near-) relational algebra and Facebook’s HIVE supports SQL, to name only a few examples. In this sense, cloud computing has triggered a “reboot” of data management systems by starting from a very simple paradigm and adding classical features back in, whenever they are required.
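The simplicity of the MapReduce model mentioned above can be illustrated with a minimal single-process sketch. It is not taken from the interview and does not use any real MapReduce framework; the function names (map_phase, shuffle, reduce_phase, run_job) are hypothetical. Because each call to map_phase and reduce_phase depends only on its own input, a framework could run these calls on separate shared-nothing nodes.

```python
from collections import defaultdict


def map_phase(record):
    """Emit (key, value) pairs for one input record; here (word, 1)."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Group values by key, as a framework would do between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values for one key; here a simple sum."""
    return key, sum(values)


def run_job(records):
    """Run map, shuffle and reduce sequentially in a single process."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


counts = run_job(["cloud data", "Cloud services", "data data"])
print(counts)
```

Note that nothing in this sketch provides a schema, a declarative query language, an optimizer or transactions; those are exactly the responsibilities that shift to the application developer.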

Q3. What is in your opinion the direction into which cloud computing data management is evolving? What are the main challenges of cloud computing data management?

MG: Data management in cloud computing will take place on a massively parallel and widely distributed scale. Based on these characteristics, several people have argued that cloud data management is more suitable for analytic rather than transactional data processing. Applications that need mostly read-only access to data and perform updates in batch mode are, therefore, expected to profit the most from cloud computing. At the same time, analytical data processing is gaining importance both in industry in terms of market share and in academia through novel fields of application, such as computational science and e-science. Furthermore, of the classical data management concepts mentioned above, ACID transactions are the notable exception since, so far, nobody has proposed to extend MapReduce systems with transactional data processing. This might be another indication that cloud data management is evolving in the direction of analytical data processing.
At the time of answering these questions, I see three main challenges for data management in cloud computing: massively parallel and widely distributed data storage and processing, integration of novel data processing paradigms as well as the provision of service-based interfaces. The first challenge has been identified many times and is a direct consequence of the very nature of cloud computing. The second challenge is to build a comprehensive data processing platform by integrating novel paradigms with existing database technology. Often cited paradigms include data stream processing systems, service-based data processing or the above-mentioned NoSQL databases and MapReduce systems. Finally, the third challenge is to provide service-based interfaces for this new data processing platform in order to expose the platform itself as a service in the cloud, which is also referred to as “Database-as-a-Service” (DaaS) or “Cloud Data Services”.

Q4. What is the impact of cloud computing on data management research so far?

MG: Most of the challenges mentioned above are already being addressed in some way by the database research community. In particular, parallel and distributed data management is a well-established field of research, which has contributed many results that are strongly related to cloud data management. Research in this area investigates whether and how existing parallel and distributed databases can scale up to the level of parallelism and distribution that is characteristic of cloud computing. While this approach is more “top down”, there is also the “bottom up” approach of starting with an already highly parallel and widely distributed system and extending it with classical database functionality. This second approach has led to the extended MapReduce systems that were mentioned before. While these extended approaches already partially address the second challenge of cloud data management (integrating novel data processing paradigms), there are also research results that take this integration even further, such as HadoopDB and Clustera. The third challenge is being addressed as part of the research on programmability of cloud data services in terms of languages, interfaces and development models.
The impact of cloud computing on data management research is also visible in recent calls for papers of both established and emerging workshops and conferences. Furthermore, there are several additional initiatives dedicated to supporting cloud data management research. For example, the MSR Summer Institute 2010 held at the University of Washington brought together a number of database researchers to discuss the current challenges and opportunities of cloud data services.

Q5. In your opinion, is there a relationship between cloud computing and object database technologies? If yes, please explain.

MG: Yes, there are multiple connections between cloud data management and object database technology, which relate to all of the previously mentioned challenges. According to a recent article in Information Week, businesses are likely to split their data management into (transactional) in-house and (analytical) cloud data processing. This requirement corresponds to the first challenge of supporting highly parallel and widely distributed data processing. In this setting, objects and relationships could prove to be a valuable abstraction to bridge the gap between the two partitions.
Introducing the concept of objects in cloud data management systems also makes sense from the perspective of addressing the second challenge of integrating different data processing paradigms. One advantage of MapReduce is that it can cast base data into different implicit models. The associated disadvantage is that the data model is constructed on the fly and, thus, type checking is only possible to a limited extent. To support typing of MapReduce queries, the same base data instances could be exposed using different object wrappers. Microsoft has recently proposed “Orleans”, a next-generation programming model for cloud computing that features a higher level of abstraction than MapReduce. In order to integrate different processing paradigms, Orleans introduces the notion of “grains” that serve as a unit of computation and data storage.
Finally, object database technologies can also contribute to addressing the third challenge, i.e. providing service-based interfaces for cloud data management. Since object data models and service-oriented interfaces are closely related, it makes a lot of sense to consider object database technology, rather than introducing additional mapping layers. The concept of orthogonal persistence, that is an essential feature of most recent object databases, is particularly relevant in this context. In their ICOODB 2009 paper, Dearle et al. have suggested that orthogonal persistence could be extended in order to simplify the development of cloud applications. Instead of only abstracting from the storage hierarchy, this extended orthogonal persistence would also abstract from replication and physical location, giving transparent access to distributed objects. Even though Orleans is built on top of the Windows Azure Platform that provides a relational database (SQL Azure), the vision of grains is to support transparent replication, consistency and persistence.
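The idea from the answer above of exposing the same base data instances through different object wrappers can be sketched as follows. This is only an illustration under stated assumptions: the raw rows, the class names (Observation, DailyVolume) and the wrap method are hypothetical and do not come from Orleans, from any object database, or from the interview. The point is that each wrapper gives a typed view over the same untyped tuples, so conversion errors surface when a record is wrapped rather than deep inside a query.

```python
from dataclasses import dataclass

# Hypothetical base data: untyped string tuples, as a MapReduce job might see them.
RAW = [
    ("herschel", "2010-05-01", "12.5"),
    ("herschel", "2010-05-02", "7.5"),
]


@dataclass(frozen=True)
class Observation:
    """One typed view: a full observation record."""
    instrument: str
    date: str
    size_gb: float

    @classmethod
    def wrap(cls, row):
        instrument, date, size = row
        return cls(instrument, date, float(size))  # fails early on bad data


@dataclass(frozen=True)
class DailyVolume:
    """A second typed view over the same rows: data volume per day only."""
    date: str
    size_gb: float

    @classmethod
    def wrap(cls, row):
        _, date, size = row
        return cls(date, float(size))


# Two "queries" over the same base data, each through its own wrapper.
total = sum(Observation.wrap(row).size_gb for row in RAW)
by_day = {v.date: v.size_gb for v in map(DailyVolume.wrap, RAW)}
print(total, by_day)
```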

Q6. Do you know of any application domains where object database technologies are already used in the Cloud?

MG: Among the major object database vendors, I am only aware of Objectivity having a version of their product that is ready to be deployed on cloud infrastructures such as Amazon EC2 and GoGrid. However, I have not yet seen any concrete case study showing how their clients are using this product. That being said, it might be interesting to point out that many of the applications that are currently deployed using object databases are very close to the envisioned use case of cloud data management. For example, Objectivity has been applied in the Space Situational Awareness Foundational Enterprise (SSAFE) system and in several data-intensive science applications, for example at the Stanford Linear Accelerator Center (SLAC). Similarly, the European Space Agency (ESA) has chosen Versant to gather and analyze the data transmitted by the Herschel telescope. All of these applications deal with large or even huge amounts of data and require analytical data processing in the sense that was described before.

Q7. What issues would you recommend as a researcher to tackle to go beyond the current state of the art in cloud computing data management?

MG: There is ample opportunity to tackle interesting and important issues along the lines of all three challenges mentioned before. However, if we abstract even more, there are two general research areas that will need to be tackled in order to deliver the vision of cloud data management.
The first area addresses research questions “under the hood”, for example: How can existing parallel and distributed databases scale up to the level of cloud computing? What traditional database functionality is required in the context of cloud data management and how can it be supported? How can traditional databases be combined with other data processing paradigms such as MapReduce or data stream processing? What architectures will lead to fast and scalable data processing systems? The second important area is how cloud data services are provided to clients and, thus, the following research questions are situated “on the hood”: What interfaces should be offered by cloud data services? Do we still need declarative query languages or is a procedural interface the way to go? Is there even a need for entirely new programming models? Can cloud computing be made independent of or orthogonal to the development of the application business logic? How are cloud data management applications tested, deployed and debugged? Are existing database benchmarks sufficient to evaluate cloud data services or do we need new ones?
Of course, these lists of research questions are not exhaustive and merely highlight some of the challenges. Nevertheless, I believe that in answering these questions, one should always keep an eye on recent and also not-so-recent contributions from object databases. As outlined above, many developments in cloud data services have introduced some kind of object notion and, therefore, contributions from object databases can serve two purposes. On the one hand, technologies such as orthogonal persistence can serve as valuable starting points and inspiration for novel developments. On the other hand, we should also learn from previous approaches in order not to reinvent the wheel and not to repeat some of the mistakes that were made before.

Acknowledgement
Michael Grossniklaus would like to thank Moira C. Norrie, David Maier, Bill Howe and Alan Dearle for interesting discussions on this topic and the valuable exchange of ideas.

Michael Grossniklaus
Michael received his doctorate in computer science from ETH Zurich in 2007. His PhD thesis examined how object data models can be extended with versioning to support context-aware data management. In addition to conducting research, Michael has been involved in several courses as a lecturer. Together with Moira C. Norrie, he developed a course on object databases for advanced students which he taught for several years. Currently, Michael is a senior researcher at the Politecnico di Milano, where he both contributes to the “Search Computing” project and works on reasoning over data streams. He has recently been awarded a grant by the Swiss National Science Foundation (SNF) for a fellowship as an advanced researcher in David Maier’s group at Portland State University, where he will be investigating the use of object database technology for cloud data management.

Sep 17 10

New Resources

by Roberto V. Zicari

I published some new resources.

1. A new and interesting User Report: 33/10 by Tilmann Zäschke. Tilmann used to work for the European Space Agency. His task there was to implement a persistence backend for the Herschel Space Observatory. The Herschel Space Observatory is a satellite that performs observations in the far infrared spectrum, in particular observing very old objects with a high red-shift. The lifetime of the satellite is limited to 3-4 years, during which it is expected to produce 15TB of data. You can download the User Report: 33/10.

2. An article by German Viscuso: “Using Object Database db4o as Storage Provider in Voldemort.” Voldemort’s local persistence component allows for different storage engines to be plugged in. In his article German shows how to create a new storage engine that uses db4o as storage engine in Voldemort. You can download the paper (PDF).
The source code is available under the Apache 2.0 license.

3. Three revised TechView Product Reports:
– db4o TechView Product Report, updated July 2010.
– Objectivity/DB TechView Product Report, updated June 2010.
– ObjectStore TechView Product Report, updated August 2010.
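
The pluggable storage engine design described in item 2 can be sketched in a few lines. This is a hypothetical illustration only: the StorageEngine interface, the InMemoryEngine class and the KeyValueStore class below are assumptions made for this sketch and are not Voldemort's or db4o's actual API (both are Java libraries; see German's article and source code for the real implementation).

```python
from abc import ABC, abstractmethod


class StorageEngine(ABC):
    """Hypothetical contract that every pluggable local store must fulfil."""

    @abstractmethod
    def get(self, key):
        """Return the stored value, or None if the key is unknown."""

    @abstractmethod
    def put(self, key, value):
        """Store or overwrite the value for a key."""

    @abstractmethod
    def delete(self, key):
        """Remove a key; return True if it existed."""


class InMemoryEngine(StorageEngine):
    """Simplest possible engine; an object database could be wrapped the same way."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        return self._data.pop(key, None) is not None


class KeyValueStore:
    """The store only talks to the interface, so engines can be swapped freely."""

    def __init__(self, engine: StorageEngine):
        self._engine = engine

    def get(self, key):
        return self._engine.get(key)

    def put(self, key, value):
        self._engine.put(key, value)

    def delete(self, key):
        return self._engine.delete(key)


store = KeyValueStore(InMemoryEngine())
store.put("user:1", b"alice")
print(store.get("user:1"))
```

Swapping in a different engine then only requires another subclass of the hypothetical StorageEngine interface; the code that uses KeyValueStore stays unchanged.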

Sep 7 10

Best Object Databases Lecture Notes: Three Selected Finalists Announced.

by Roberto V. Zicari

The jury has selected three finalists for the ODBMS.ORG “Best Object Databases Lecture Notes” Award 2010.

The three finalists are:

“Object Database Tutorial”
by Rick Cattell, Independent Consultant, USA.

“Object-Oriented Databases”
by Michael Grossniklaus and Moira Norrie, ETH Zürich, Switzerland.

“Modern Database Techniques”
by Martin Hulin, Hochschule Ravensburg-Weingarten, Germany.

You can download the three Lecture Notes here.

The Awards recognize the most complete and up to date lecture notes on Object Databases, that have been, or have strong potential to be, instrumental to the teaching of theory and practice in the field of objects and databases. Any Lecture Notes published in ODBMS.ORG during the years 2004-2010 were eligible for the 2010 award.

The jury panel was composed of:
Prof. Suad Alagic, University of Southern Maine, USA
Prof. Dr. Alfonso F. Cárdenas, UCLA, USA
Leon Guzenda, Objectivity, USA
John McHugh, Progress Software, USA
Prof. Renzo Orsini, University of Venice, Italy
Prof. Tore J.M. Risch, University of Uppsala, Sweden
Prof. Nicolas Spyratos, University of Paris South, France
Prof. Roberto V. Zicari, Goethe University Frankfurt, Germany.

The Award Ceremony will be on September 29, 2010, at the 3rd International Conference on Objects and Databases (ICOODB 2010) in Frankfurt.

Aug 24 10

Universal antipatterns.

by Roberto V. Zicari

I published two chapters of the new book of one of our distinguished experts: Dr. Michael Blaha.
Michael is one of the early pioneers in database modeling, together with his colleagues William J. Premerlani and James E. Rumbaugh. His classic textbook “Object-Oriented Modeling and Design” has been translated into many languages.

Dr. Blaha's new book is about Patterns of Data Modeling.

One of the chapters I published describes Universal antipatterns.
An antipattern is a characterization of a common software flaw. The idea is that as you construct models, you should be alert for antipatterns and correct them. When you find an antipattern, you should substitute the correction. Universal antipatterns are antipatterns that you should avoid for all applications. The chapter is available for free download (.pdf) in the Expert Section.

The other chapter is about Models.
Models provide the means for building quality software in a predictable manner. Models let developers think deeply about software and cope with large size and complexity. Although models are beneficial, they can be difficult to construct. That is where patterns come in. This chapter is also available for free download (.pdf).

If you are in Frankfurt in late September for ICOODB Frankfurt, you can consider attending Dr. Blaha's tutorial “Patterns of Data Modeling” on September 28, 2010.

RVZ

Aug 5 10

The OO7J benchmark.

by Roberto V. Zicari

I just published a very interesting resource in ODBMS.ORG, the dissertation of Pieter van Zyl, from the University of Pretoria.

The title of the dissertation is: “Performance investigation into selected object persistence stores” and presents the OO7J benchmark.

OO7J is a Java version of the original OO7 benchmark (written in C++) by Mike Carey, David DeWitt and Jeff Naughton (University of Wisconsin-Madison). The original benchmark tested ODBMS performance.

OO7J also benchmarks ORM tools. Currently there are implementations for Hibernate on PostgreSQL and MySQL, db4o and Versant.

You can download the dissertation (187 pages PDF) : LINK

The code is available on Sourceforge: LINK

RVZ

Jul 29 10

Marten G. Mickos on Innovation.

by Roberto V. Zicari

It has been a while since my last interviews on Innovation….

I put a few questions to Marten G. Mickos, CEO of Eucalyptus and former CEO of MySQL.

1. What is “Innovation” for you?
Marten: Peter Drucker said it best: Innovation is change that creates a new dimension of performance. Whereas invention typically is something technical, an innovation can relate to anything a business does, as long as it creates a new dimension of performance. Google’s business model is an innovation, as is the usability of Apple’s products.

2. What does it take to become a successful innovator?
Marten: Innovators many times start by solving problems they have experienced themselves. In that way they know the problem well and they have a genuine desire to solve it. But sometimes it’s not enough. You also need to look around you and verify that the problem you experienced is sufficiently generic, common and real so that the solving of it truly creates a new dimension of performance.

3. Is there a price to pay to be an innovator? Which one?
Marten: There is a price to pay for everything we do in life, but there is a higher price for not doing things. It’s time consuming to be an innovator, and it takes a lot of passion and determination to keep going until the innovation succeeds (or until you know it didn’t).
Innovation is risky business. We all know about the successes, but we may not always realize how many failures there are for each success. If you want to be an innovator, you must be prepared both for the huge success you are dreaming of and the possible failure that is statistically perhaps even more likely than success.

4. What are the rewards to be an innovator?
Marten: The main reward is the feeling of accomplishment – the knowledge of having contributed to something that makes the world a better place.
Many innovations also provide ample financial reward. But not all do.
For instance, the Finnish telecom executive who came up with the idea for text messaging didn’t make money on it.

5. What are in your opinion the top 3 criteria for successful innovation?
Marten: A new dimension of performance says it all. Examples: Open source software has enabled the world to produce applications and web services in a way never before possible. Google has enabled people to find information in a way never before possible, and they have enabled small and large vendors to reach the most interested customers in an efficient way. Amazon is enabling us to consume books faster and easier than before (with Kindle) and through their cloud offering they enable entrepreneurs to start a company without buying a single computer server. Those are great innovations.

6. Given your previous experience as CEO at MySQL, do you think becoming an innovator can be taught? If yes, how?
Marten: I certainly believe that you can get trained for being an innovator.
But more than it can be taught, it can be learned. What I mean is that it isn’t necessarily in a classroom that you learn about innovation. It is by doing it in practice, and by learning from those who have done it before. Mark Twain said it so well: Don’t let education get in the way of learning.

7. What specific programs do you think foster innovation?
Marten: The ecosystem of startup companies, angel investors and VCs.
Conferences and camps (such as Maker Faire and TED) that focus on wild ideas and big bold plans.

8. What would you recommend to young people who wish to pursue innovation?
Marten: To start immediately, and keep innovating until they hit gold. It may take ten attempts or it may take a thousand attempts. I would also recommend young people to keenly observe some of the greatest innovators in the world, and innovators they have access to.

9. In your opinion how can we create a culture that supports and sustains innovation?
Marten: By allowing people to fail. Michael Jordan said it well: I’ve failed over and over and over again in my life and that is why I succeed.

10. What do you think stops/slows down innovation?
Marten: Complacency and rigidity. Fear of failure. Fear of success. Not thinking big enough. Not allowing naïve dreams.

11. What is in your opinion the influence that a “location” (country/region) plays with respect to the possibility to be a successful innovator?
Marten: We are all influenced by our daily interaction with the world around us. If the interaction is conducive to innovation, we will have more of it. Silicon Valley is such a place. It’s on average probably the most innovative place on the planet, especially in tech. But that doesn’t mean that you couldn’t create bigger innovations elsewhere (Skype comes to mind). In today’s world you can have your daily interaction electronically over long distances and over many time zones, so locations that previously were unfavorable are now OK. Innovators are inherently unconventional. Many times they succeed not thanks to the location, but in spite of it.

12. What would you recommend to make a “location” attractive for innovation?
Marten: Build on what you have. Take whatever innovators there are at the location in question, and help them help others to innovate. Keep feeding this system with whatever it needs, and watch how the level of innovation slowly but surely improves.

Jul 28 10

New and old Data stores

by Roberto V. Zicari

I would like to mention the keynote panel to be held at ICOODB Frankfurt on September 29, 2010:

“New and old Data stores”
Panelists:
Ulf Michael (Monty) Widenius, main author of the original version of the open-source MySQL database.
Michael Keith, architect at Oracle
Patrick Linskey, Apache OpenJPA project.
Robert Greene, Chief Strategist Versant
Leon Guzenda, Chief Technology Officer Objectivity.
Peter Neubauer, COO NeoTechnology

This promises to be an interesting panel…
RVZ

Jul 15 10

3rd International Conference on Objects and Databases (ICOODB 2010)

by Roberto V. Zicari

I'd like to mention an event that should be of interest to the ODBMS.ORG community: the 3rd International Conference on Objects and Databases (ICOODB 2010).
The conference will take place on September 28-30, 2010, at the Goethe University Frankfurt, in Frankfurt am Main, Germany.

The program consists of
2 tutorials: “Object Databases” and “Patterns of Data Modeling”,
1 NoSQL Workshop,
10 research papers,
10 industry presentations,
3 keynotes and 1 keynote panel

Among the keynote speakers:
Ulf Michael (Monty) Widenius, main author of the original version of the open-source MySQL database,
Ricardo Baeza-Yates, VP, Yahoo! Research, Europe and Latin America,
and several more.

You can have a look at the Program Overview.

If you are interested in attending, please note that the deadline for early registration is August 31, 2010. You can register online.
