Thursday, February 18, 2010

HyperGraphDB @ NoSQL Live

NoSQL has picked up a lot of steam lately. HyperGraphDB being a NoSQL DB par excellence, we will be joining the upcoming conference organized by 10gen, the maker of MongoDB:

"NoSQL Live from Boston is a full-day interactive conference that brings together the thinking in this space. Picking up where other NoSQL events have left off, NoSQL Live goes beyond understanding the basics of these databases to how they are used in production systems. It features panel discussions on use cases, lightning talks, networking session, and a NoSQL Lab where attendees can get a practical view of programming with NoSQL databases."

The conference takes place on March 11, 2010. For more information and to register (fees are really small!), go to http://www.10gen.com/events.

Borislav Iordanov will be on the graph databases panel as well as in the live hacking session later in the afternoon. The talk will focus on real-world graph database applications and reveal some of the interesting architectural bits behind HyperGraphDB. More extensive chats and demos will follow during the live session.

For more information on HyperGraphDB, visit http://www.kobrix.com/hgdb.jsp

7 comments:

  1. As it's the 12th, I guess the conference has taken place; links I will be following up.
    There are a few questions I have.
    1. Can you point me towards any comparison of NeoDB and Hypergraphdb, do they cover the same ground? How do they differ?
    2. The relationship between graph databases and
    2.1. OWL, how would OWL be consumed, or would it?
    2.2. more generally RDF, and then XML, after all there are XML databases that parse in the XML. How do they compare? I'm sure I have missed something(s), but what?
    3. One of the problems I have encountered is in keeping various .properties files aligned. One approach is to use something like magic lenses such as the augeas implementation. But, at the same time, I have wanted to rewrite these properties out of their ANT context into a Maven POM context. A job for hypergraphdb? Ideas?
    4. Moving on, I have noticed the fascinating post about using hypergraphdb to create a neural net.
    4.1. Would you agree that what is happening here is in line with Rickard Öberg? http://www.qi4j.org/ for background and http://www.qi4j.org/qi4j/351.html where he discusses the relationship between algorithms and OOP. BTW, he also arrives at the need for atoms and mentions the same focus, the business case, that you emphasise in your background paper, Rapid Software Evolution.
    4.2. I notice that Neo4J has an example of a spreading activation algorithm (token passing), http://wiki.github.com/tinkerpop/gremlin/pagerank - I expect this means that either db could also be used to implement Random Indexing - sparse matrices - as developed by P. Kanerva and M. Sahlgren
    Some of this may be touched on in the Disko project. Again, ideas?
    Sorry for such a long comment, but not sure how/if to email privately.

  2. Hi semanticC,

    A good place to discuss HyperGraphDB would be the discussion forum:

    http://groups.google.com/group/hypergraphdb?hl=en

    This is a long list of topics raised indeed :) Let me try to cover them one by one, perhaps in separate responses:

    1) Such a comparison should ideally be done independently and I am not aware of any. For starters, HyperGraphDB has a much more general data model than Neo. In fact, the name is maybe a bit misleading from a functionality perspective because now it's being labeled as "another graph database", which it is, but it is also an OO database, a relational database (albeit nonsql) etc. In HyperGraphDB, edges point to an arbitrary number of things, including nodes and other edges. Neo is a classical graph of nodes and directed edges between any two nodes. In addition, HGDB has a type system while Neo doesn't. So HGDB has in effect a dynamic schema that you can introspect, reason about and change. Besides the data models, the storage models are quite different: HyperGraphDB has a general two-layered architecture where a big part of the storage layout can be customized. Neo uses linked lists to store its graph and claims that this makes traversals faster (probably true) and that this is all you need to do with a graph, you don't need indices, pattern mining etc. (here, I disagree). HGDB relies heavily on a lot of indexing for more complicated graph-related queries & algorithms. In sum, HyperGraphDB has pretty much the most versatile data model I know of, and subsumes Neo and others easily. Whether that sort of generality comes at the expense of performance remains to be seen. As you've probably realized from the neural net post, HGDB gives you more representational choices so performance has to be measured more globally, at an application level, through a design that makes intelligent use of what HGDB has to offer.
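
    The data model difference described above can be illustrated with a minimal sketch in plain Python. This is not the HyperGraphDB API: the names ToyHyperGraph, Atom, add and links_to are hypothetical and exist only for this illustration. It assumes nothing more than what the paragraph states, namely that a link may point to any number of targets and that a target may itself be a link, and it keeps a simple incidence index so that "which links point at this atom?" does not require a scan.

```python
from dataclasses import dataclass
from itertools import count


@dataclass
class Atom:
    """A node or a link; both share one handle space (hypothetical model)."""
    handle: int
    value: object
    targets: tuple = ()  # empty for plain nodes, any arity for links


class ToyHyperGraph:
    """In-memory toy store for illustration only, not the HyperGraphDB API."""

    def __init__(self):
        self._ids = count(1)
        self.atoms = {}       # handle -> Atom
        self.incidence = {}   # handle -> set of handles of links pointing at it

    def add(self, value, *targets):
        """Add a node (no targets) or a link (one or more targets)."""
        for t in targets:
            if t not in self.atoms:
                raise KeyError(f"unknown target handle {t}")
        handle = next(self._ids)
        self.atoms[handle] = Atom(handle, value, tuple(targets))
        self.incidence[handle] = set()
        for t in targets:
            self.incidence[t].add(handle)
        return handle

    def links_to(self, handle):
        """Handles of all links that have `handle` among their targets."""
        return sorted(self.incidence[handle])


g = ToyHyperGraph()
alice = g.add("Alice")
bob = g.add("Bob")
book = g.add("Book")
gave = g.add("gave", alice, bob, book)      # one link with three targets
claim = g.add("reported-by", gave, alice)   # a link whose target is a link
```

    In a classical graph of binary directed edges, the ternary "gave" relation would need an extra intermediary node, and the statement about "gave" would need yet another one. Here both are ordinary atoms, and g.links_to(gave) finds the higher-order link through the index.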

    more on the others later....perhaps at the end I'll sum up my responses in a separate blog.

  3. Hi Boris,
    Thanks so much for your reply. It would be great if the other questions inspire a blog post.
    If anyone is interested the NoSQL conference is previewed and will be written up here http://radar.oreilly.com/2010/02/nosql-conference-coming-to-bos.html - and it is a good discussion. Boris contributes too!
    There are still many things I cannot get my head around. I can see the 'representational choices', the ability to define functions working directly on the data using the HGDB API.
    I expect this is a good thing in the way that, for example, annotations are better than XML, everything is in the place where it will be used, which facilitates concentrating on the task. But other benefits? Here I cannot see.
    Moving on again, I am reminded of the efforts of Henry Story to create a framework to import RDF, inspired by Active Record. I am very unclear about all of this. Did I read somewhere that there is a standardisation of the syntax for the import statements of RDF namespaces? Anyway, the idea would be to make the referenced ontology available in code, presumably it would already be in Sesame as the graph db backend? All of this seems relevant to HGDB. First you have mentioned the type system, so how to model the types? I had thought that OWL was a good way of both modelling and sharing those models. But if so, what of the other aspect of HGDB, its ability to deal with semi-structured data, how to fit the two together? I am thinking about Collective Entity Resolution as perhaps one sort of solution and simply in code, how they might interact, as another area.
    Moving up towards the goal of evolutionary software, I have long thought that it must be possible to describe software using OWL. I assumed that reasoning would take the place of a lot of code when there is a well constructed model. Of course that brings me back to what role reasoning has in NoSQL. I know it is built into AllegroGraph.
    As I say, many thoughts, but I don't really understand the ramifications of NoSQL at the moment. Perhaps I am missing the point altogether?

  4. 2) OWL, RDF etc. are on the short list. There have been several unrelated efforts that would need to be eventually merged into a comprehensive XML/RDF/OWL semantic stack. And this is a pretty desirable extension, but lack of manpower is preventing it from moving forward at a faster pace (any volunteer help in that area is welcome). Currently, there is some support of OWL 1.0 in Disko. There's RDF support via Sail kindly contributed by Ian Holsman. And there's a serious (but interrupted) attempt to support XML Schema at the storage level with XSD types being synthesized with bytecode generation. We have a very immediate need for OWL 2.0 support and will probably start working on it within the next couple of months.

    The strategy for supporting those semantic technologies is by implementing custom HGDB types and indices so that they look "native" to the database. Ideally, XML Schema will be supported by generating HGDB types for each XSD type, which will then allow for very efficient support of XML documents that are based on that schema. RDF is then simply a triple or quadruple store with appropriately precomputed joins to make SPARQL run efficiently. And finally, OWL is much more involved and it's where it should get a bit more interesting. But the same principle of creating custom type constructors applies. For example, OWLTypeConstructor will take an OWL class definition and generate and store an OWL class that's also a HGDB type within the system. This way, there's a direct representation of the OWL class in HyperGraphDB, which at the same time does the intuitive thing of an OO class of managing its instances (acting like a factory for its instances). The point here is that the architecture and data model of HyperGraphDB solve problems in a rather natural way: (1) n-ary relationships are directly represented rather than having intermediary nodes as would be required by a classical graph model; (2) since edges can point to edges, everything is automatically reified, so higher-order relationships that arise frequently in RDF/OWL are also directly represented; (3) the ability to plug into the type system and storage allows for optimizing how data is actually stored on disk.
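
    The type constructor principle can be sketched in a few lines of plain Python. This is a toy under stated assumptions, not OWLTypeConstructor and not the HyperGraphDB type system: ToyTypeConstructor, ToyType, the dictionary-based class definition and its keys (name, properties, subclass_of) are all hypothetical, and no OWL semantics beyond simple property inheritance is modelled. It only shows the shape of the idea: a constructor turns a declarative class definition into a stored type, and that type then acts as the factory and registry for its own instances.

```python
class ToyType:
    """A stored type that also creates and tracks its own instances."""

    def __init__(self, name, properties, parent=None):
        self.name = name
        self.parent = parent
        self.properties = dict(properties)   # property name -> Python type
        self.instances = []

    def all_properties(self):
        """Own properties plus everything inherited from parent types."""
        inherited = self.parent.all_properties() if self.parent else {}
        return {**inherited, **self.properties}

    def make(self, **values):
        """Factory: validate the values against the type, then register them."""
        expected = self.all_properties()
        if set(values) != set(expected):
            raise ValueError(f"{self.name} expects {sorted(expected)}")
        for key, py_type in expected.items():
            if not isinstance(values[key], py_type):
                raise TypeError(f"{key} must be {py_type.__name__}")
        instance = {"type": self.name, **values}
        self.instances.append(instance)
        return instance


class ToyTypeConstructor:
    """Turns declarative class definitions into ToyType objects."""

    def __init__(self):
        self.types = {}   # type name -> ToyType

    def construct(self, definition):
        parent_name = definition.get("subclass_of")
        if parent_name is not None and parent_name not in self.types:
            raise KeyError(f"unknown parent type {parent_name}")
        parent = self.types.get(parent_name)
        new_type = ToyType(definition["name"], definition["properties"], parent)
        self.types[new_type.name] = new_type
        return new_type


constructor = ToyTypeConstructor()
person = constructor.construct({"name": "Person", "properties": {"name": str}})
author = constructor.construct(
    {"name": "Author", "subclass_of": "Person", "properties": {"books": int}}
)
boris = author.make(name="Boris", books=1)
```

    Because the type is itself a stored object, it can be introspected (author.all_properties()) and can enforce its own rules when instances are created, which is the "dynamic schema" aspect in miniature.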

  5. 3) That would be a nice application for HyperGraphDB. It's probably an example where the ability to handle large volumes of data is less important than the power of the data model used. There would be a lot of work to create mappings between all sorts of config file formats and HyperGraphDB, using something like lenses perhaps. But the interesting part is in coming up with a representation and appropriate metadata that allows you to deal with dependencies across config files. This would be a challenging problem I think. Focusing on build scripts and conversion b/w Ant and Maven is a good example. But it would be much nicer to have a model that resolves co-referring (configuration) expressions automatically. Let's say you load a bunch of config files (property, build scripts, what not) and the system already has a lot of rules about how they are interdependent. For example, it would detect if you are trying to use the same port number in two different applications. Or it would translate the config settings from one program to another (Ant->Maven). Also, because everything would be loaded in a single DB, it would be able to detect easily all the places where a value is being used and help resolve what happens when that value changes. Say, some server URL is being referred to in many places. A tool like this will be able to tell you all the configuration settings that would be affected by a change in the URL. Hmm, yeah...it would be nice :) Perhaps something like this could be developed piecemeal using Seco (http://www.kobrix.com/seco.jsp), but Seco itself needs more work unfortunately.
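
    Two of the checks mentioned above (finding every place a value is used, and spotting the same port in two applications) can be sketched in plain Python. Everything here is an assumption made for illustration: ConfigStore, parse_properties, the file names, the keys and the example.org URL are hypothetical, the parser handles only simple key=value lines (real .properties files also allow other separators and line continuations), and "a key ending in port" stands in for the richer interdependency rules the paragraph has in mind.

```python
from collections import defaultdict


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


class ConfigStore:
    """Loads several config files into one value-indexed store (toy model)."""

    def __init__(self):
        self.by_value = defaultdict(list)   # value -> [(file, key), ...]

    def load(self, filename, text):
        for key, value in parse_properties(text).items():
            self.by_value[value].append((filename, key))

    def usages(self, value):
        """Every (file, key) whose setting equals `value`."""
        return list(self.by_value.get(value, []))

    def port_conflicts(self):
        """Port values claimed by keys ending in 'port' in more than one file."""
        conflicts = {}
        for value, uses in self.by_value.items():
            port_uses = [u for u in uses if u[1].endswith("port")]
            if len({filename for filename, _ in port_uses}) > 1:
                conflicts[value] = port_uses
        return conflicts


store = ConfigStore()
store.load(
    "web.properties",
    "server.port=8080\napi.url=http://db.example.org/api\n",
)
store.load(
    "admin.properties",
    "# admin console\nadmin.port = 8080\nbackend.url=http://db.example.org/api\n",
)

affected = store.usages("http://db.example.org/api")
conflicts = store.port_conflicts()
```

    Here affected lists the two settings that would have to change if the server URL moved, and conflicts reports that both files claim port 8080. A graph-backed version would replace the flat index with typed links between settings, which is where cross-format rules such as Ant->Maven translation could attach.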

  6. Thanks again.
    3. I have got to a certain point developing this in Python. I am going to put a note about this on my blog.

  7. I'll be brief on point (4). I wanted to take a look at Qi4J before responding, especially since you mentioned that it was in line with the RSE paper. Also, I've yet to write about this, but I've been obsessed with the problem of context for roughly the past 10 years and I've found all treatments unsatisfactory, including my own scribbles in various notebooks. Qi4J also seems to recognize this problem, as many others have, but in a very superficial way. I think it's a much deeper problem that can't be solved by a sexy framework. My first impression is that it is actually not in line with the evolutionary programming ideas in RSE. But I need to look deeper at it, which I will since it's an innovative approach as far as Java frameworks go. However, I must admit I was horrified by the little I saw. That attitude might be reversed though when I understand better the consequences of this design.

    As for sparse memory, I have Kanerva's book somewhere and I will need to refresh my memory, but it would be an interesting application. I just haven't seen that model applied much so I didn't even attempt to see how it would map to a hypergraph. But it could be a worthwhile direction to pursue.

    Best,
    Boris
