Monday, December 13, 2010

HyperGraphDB 1.1 Released

Kobrix Software is pleased to announce the first official release of HyperGraphDB version 1.1.

HyperGraphDB is a general purpose, free open-source data storage mechanism. Geared toward modern applications with complex and evolving domain models, it is suitable for semantic web, artificial intelligence, social networking or regular object-oriented business applications.

HyperGraphDB is a Java based product built on top of the Berkeley DB storage library.

Key Features of HyperGraphDB include:
  • Powerful data modeling and knowledge representation.
  • Graph-oriented storage.
  • N-ary, higher order relationships (edges) between graph nodes.
  • Graph traversals and relational-style queries.
  • Customizable indexing.
  • Customizable storage management.
  • Extensible, dynamic DB schema through custom typing.
  • Out of the box Java OO database.
  • Fully transactional and multi-threaded, MVCC/STM.
  • P2P framework for data distribution.
In addition, the project includes several practical domain specific components for semantic web, reasoning and natural language processing. For more information, documentation and downloads, please visit the HyperGraphDB Home Page.

Wednesday, August 4, 2010

HyperGraphDB at Strange Loop 2010

I will be giving a brief talk on HyperGraphDB at the Strange Loop conference in St. Louis on October 14. The talk will focus on the HyperGraphDB data model and architecture, and on why it is well suited for complex software systems, as opposed to other models: SQL, NoSQL, or conventional graph databases.

This conference is highly recommended! Judging by the program and the list of speakers, it is truly as the organizers promote it: from developers, for developers. It is about cutting-edge technology, about everything hot going on these days in the world of software development; it is technical and it looks fun. So, please come by!


You are encouraged to register with the website, and interact with the speakers online before the conference.


Friday, July 9, 2010

HyperGraphDB at IWGD 2010

The architecture of HyperGraphDB will be presented at the First International Workshop on Graph Databases during WAIM 2010 (the Web-Age Information Management conference), taking place July 15-17 in Jiuzhai Valley, China. For more information, please see:

The presentation will be given by Borislav Iordanov and will focus on the unique HyperGraphDB data model and type system, and discuss some of the architectural choices and their impact on performance. The accompanying paper can be found here:

Thursday, June 17, 2010

HyperGraphDB 1.1 Alpha Released

Kobrix Software is pleased to announce the release of HyperGraphDB 1.1 Alpha. HyperGraphDB is a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. Designed specifically for artificial intelligence and semantic web projects, it can also be used as an embedded object-oriented database for projects of all sizes.

This is an initial, alpha release of the next version of HyperGraphDB. A complete list of changes is available at:

To download, please visit the HyperGraphDB project home page at Google.

For more information about HyperGraphDB's features, documentation and additional pointers, please visit the HyperGraphDB home page at Kobrix.

Sunday, May 30, 2010

Poetry About Our Art

About 14 years ago, at the peak of the object-oriented programming craze, I came across a publication called On the Origin of Objects, randomly placed amidst the stream of software/OO engineering books at a very reliable computer science bookstore. Surely, I thought, this would get to the bottom of things: somebody had dissected the notion of objects and outlined all the fundamentals I would need to be a successful programmer, without the need to skim through endless pages describing ridiculous design processes, trivial principles, inapplicable rules of thumb and what not. It turned out the book had nothing to do with object-oriented programming.

On the Origin of Objects (amazon link) is about metaphysics, with computation as the starting point. The author is one of the deepest and most original thinkers I've come across: Brian Cantwell Smith (wikipedia link). Needless to say, I couldn't read it at the time; I wasn't ready for it. That came about 7 years later, and it was a memorable, mentally reinvigorating experience. Smith writes beautifully and dances around his insights with grace and depth. He writes about the kind of computation we do day to day, the real stuff, and he puts the common conceptual problems we programmers face on the center stage of philosophy, in a way that gives our work those extra dimensions that scientists seem to have always enjoyed: a fundamental, very real connection to the physical world, including, at an even deeper level, a connection with us intentional beings; a set of problems that arise naturally from the practice of our profession, yet quickly reach the most difficult metaphysics in a way that no other practice does.

There are a few (not many) articles by Prof. Smith that you can find on the internet, all of them worth reading. However, the purpose of this blurb is to bring to the attention of whoever comes across it his latest work. For the past years, I've been eagerly monitoring and awaiting the publication of Age of Significance, which is supposed to be in 7 volumes. The book website hadn't changed until just two months ago, when it was announced that individual chapters will be published monthly. So far, only the introduction has been posted, and I believe that an attentive read would make my seeming infatuation with this work understandable. Originally, I intended to write a summary of that introduction, highlighting the main points, most of which I'm already familiar with from previous writings (Smith's and others), but I wouldn't want to butcher it. It is philosophy at its best. And it is about the foundations of computing, which we (should, to say the least) care about. I will just quote the conclusion for the hardcore philosophy skeptics:

Throughout the whole project, I have received countless communications from programmers and computer scientists - over coffee, in the midst of debugging sessions, in conferences, at bars and via email - hoping that it might be possible to say what they feel they know, the absence of which, in some cases, has led to almost existential frustration.

That is pretty much how I've felt more often than not as a programmer. And that is why, to me, Smith's work is pure poetry, as philosophy used to be seen at the time of Plato anyway.


Tuesday, March 9, 2010

Seco 0.3 Released

Kobrix Software is pleased to announce the release of Seco 0.3. Seco, formerly known as Scriba, is a scripting development environment for JVM-based dynamic languages. Seco has been in active development and use for the past several years, and it is a perfect companion for the serious Java developer.

Key features include:

- Support for many popular JVM languages, with syntax highlighting and code completion for most.
- Advanced script editing interface based on structured notebooks as popularized by the Mathematica system.
- A WYSIWYG HTML editor for documentation.
- An infinite, zoomable 2D canvas for arbitrary layout of components, container nesting and more.
- Full workspace automatically persisted in an embedded HyperGraphDB database.
- Support for importing third-party libraries in multiple evaluation contexts.
- Based on the JSR 223 standard for language interoperability - all languages share the same runtime context.
- Real-time collaboration and exchange of components and notebooks via a P2P network.

Seco is perfect not only for prototyping, testing and experimentation; it is also the ideal tool for learning a given JVM language or a new Java library. It can be used to build complete interactive applications embedded within the environment itself, similarly to a live system like Squeak!

Seco is free, open-source, LGPL licensed software.

To download and for more information, please visit the Seco home page at Kobrix.

Thursday, February 18, 2010

HyperGraphDB @ NoSQL Live

NoSQL has picked up a lot of steam lately. HyperGraphDB being a NoSQL DB par excellence, we will be joining the upcoming conference organized by 10gen, the maker of MongoDB:

"NoSQL Live from Boston is a full-day interactive conference that brings together the thinking in this space. Picking up where other NoSQL events have left off, NoSQL Live goes beyond understanding the basics of these databases to how they are used in production systems. It features panel discussions on use cases, lightning talks, networking session, and a NoSQL Lab where attendees can get a practical view of programming with NoSQL databases."

The conference takes place on March 11, 2010. For more information and to register (fees are really small!), go to:

Borislav Iordanov will be on the graph databases panel, as well as in the live hacking session later in the afternoon. The talk will focus on real-world graph database applications and reveal some of the interesting architectural bits behind HyperGraphDB. Expect more extensive chats and demos during the live session.

For more information on HyperGraphDB, visit

Monday, February 8, 2010

Is HyperGraphDB an Object-Oriented Database?

Back in the 90s, the presumed "killer" of RDBMSes was the ODBMS. Today it is NoSQL. Why RDBMSes are prey to be killed, and why any other approach should be a voracious predator rather than a gentle companion, has never been clear to me. Industry fads are always a bit ridiculous in retrospect, but fortunately the technical advances that fuel them at the beginning often follow their own independent paths, eventually contributing their fair share to our beloved profession. Strangely, OO databases are now being categorized as NoSQL. So the old and new predators join forces in a cooperative onslaught. Why not? Whatever it takes to get the crowd's attention. Since HyperGraphDB was announced as a graph database, it fits the NoSQL bill, which is good for promotion. But we've received "criticism" in the past that it was actually more of an OO database than a graph database, so why not simply call it that?

Well, for starters, objects in memory form a graph, so at a certain abstraction level we are talking about essentially the same thing. But more interestingly, does HyperGraphDB fit the accepted definition of what constitutes an object-oriented database? According to ODBMS.ORG:

"An object database management system (ODBMS, also referred to as object-oriented database management system or OODBMS), is a database management system (DBMS) that supports the modelling and creation of data as objects. This includes some kind of support for classes of objects and the inheritance of class properties and methods by subclasses and their objects."

The Object-Oriented Database System Manifesto from 1989 is still the main reference for the core features of an OO database. So let's examine (admittedly, a bit crudely) that paper's defining list and see how it applies to HyperGraphDB:
  1. Complex Objects built from simpler ones by applying constructors to them. HyperGraphDB has that - type constructors are fundamental to representing complex values.
  2. Object Identity: an object has an existence which is independent of its value. This means one can change the value while preserving the identity. The authors note "that identity-based models are the norm in imperative programming languages: each object manipulated in a program has an identity and can be updated. This identity either comes from the name of a variable or from a physical location in memory. But the concept is quite new in pure relational systems, where relations are value-based." HyperGraphDB has identity at its very basis: the atom handle is like a memory location in a universal addressing space. Atom identity in HyperGraphDB is in fact more fundamental than anything else.
  3. Encapsulation, which in a database context is taken to mean "that an object encapsulates both program and data". This is supported via Java. When storing Java objects, the current implementation does not store the program part (the bytecode of a class' methods) because there's really no need for it. Naturally, this wouldn't be hard to achieve with a different set of type constructors that do store the program. In fact, this is something that we plan to do with Seco.
  4. Types and Classes - the system should offer some form of data structuring mechanism, be it classes or types. Thus the classical notion of database schema will be replaced by that of a set of classes or a set of types. The distinction between types & classes comes into play mostly at the Java level (HyperGraphDB's "host language" at the moment). Nevertheless, HyperGraphDB's types cover both the notion of a class as a factory of objects with a well-defined extent and of a type as a semantic notion obeying certain composition rules. The core notion of substitutability is expressed with HGSubsumes links. More extensive checking and enforcement is something left for actual type & type constructor implementations.
  5. Class or Type Hierarchies with various forms of inheritance being distinguished by the authors - substitution, inclusion, constraint and specialisation. HyperGraphDB has a type hierarchy with multiple inheritance via multiple HGSubsumes links between types, but it doesn't make such fine-grained distinctions between the different kinds. Such distinctions are left open to the application. When mapping Java classes to HyperGraphDB types, the HGSubsumes link created between a class and its parent corresponds to "specialisation inheritance". A HGSubsumes link between a class and an implemented interface may correspond to any/all of the other kinds.
  6. Overriding, overloading and late-binding are notions at the programming language level that usually apply to operations rather than data and as such are supported only to the extent that HyperGraphDB is being used from an OO language (Java). At the data level, we note that an object property is always fully stored, regardless of its declared type. For instance, if a bean has a property of declared type A, but the actual value is of a subclass B, B will be used as the stored type instead of A. So overriding is supported. In addition, HyperGraphDB supports properties with the same name but different types within a single record: one could have a property "x" of type int and a property "x" of type String within the same complex type. So, overloading is supported as well!
  7. Computation Completeness is required, but the authors "are not advocating here that designers of object-oriented database systems design new programming languages: computational completeness can be introduced through a reasonable connection to existing programming languages"... which HyperGraphDB does, again via the JVM.
  8. Extensibility is required in the following sense: there is a means to define new types and there is no distinction in usage between system defined and user defined types. HyperGraphDB's type system is open and extensible from the very high-level type-constructor-constructors...down to the primitive types which could be replaced as well. So this requirement is met with applause.
  9. Persistence should be orthogonal, i.e., each object, independent of its type, is allowed to become persistent as such (i.e., without explicit translation). It should also be implicit: the user should not have to explicitly move or copy data to make it persistent. Yep, check-mark, we've got it.
  10. Secondary storage management with "clear independence between the logical and the physical level of the system". Check-mark here too.
  11. Concurrency - yes.
  12. Recovery - yes, thanks to the very reliable BerkeleyDB.
  13. Ad Hoc Query Facility which lets you express non-trivial queries concisely, is efficient and is application independent. HyperGraphDB meets that requirement, but not with flying colors at this point. More mature DBs have better querying capabilities, and we hope to get there soon.
In conclusion, HyperGraphDB is a full-fledged OO database according to the most official definition.


PS: Perhaps the most prominent OO database in the Java world these days is db4o. I haven't used it, but skimming through tutorials and docs, I don't see what it can do that HyperGraphDB can't. Its querying options might be better (native queries are quite an advanced concept), and its optimizer might be more advanced, but besides that, I challenge readers to tell us what HGDB is missing as a competitor in the object-oriented database space.

Monday, January 25, 2010

A Feedforward Neural Net with HyperGraphDB

One obvious application of a graph database such as HyperGraphDB is the implementation of artificial neural networks (NNs). In this post, I will show one such implementation based on the practical and informative book "Programming Collective Intelligence" by Toby Segaran. In chapter 4, the author shows how one can incorporate user feedback to improve search result ranking. The idea is to train a neural network to rank possible URLs given a keyword combination. It is assumed that users will perform a search, then examine the list of results (with their titles, summaries, types of document etc.) and finally click on the one that's most likely to contain the information they are looking for. That click action provides the feedback needed to train the neural network. With training, the network will rank the most relevant documents for a keyword combination increasingly well. Not only that, but it will make rather good guesses for searches it has never seen before.

If you are not familiar with neural networks and don't own the aforementioned book, the main Wikipedia article on the subject is always a good place to start. I can also recommend this introductory book, where a couple of chapters are freely available. For those too busy/lazy for a thorough introduction to NNs, here is a brief summary of the particular neural net we'll be implementing: a 3-layered feedforward NN. It consists of 3 layers of neurons connected with synapses. The neurons and synapses are abstract models of their brain counterparts. Each synapse connects two neurons, its input and output, and has an associated real number called its strength. There's an input layer, a hidden layer and an output layer, as shown in this picture. The network is feedforward because neurons from the input layer "feed" the hidden layer neurons, which in turn "feed" the output layer neurons, with no looping back. The NN is executed in steps: first, all input neurons are assigned a real number that comes from the "outside world": that's their activation level. Second, the activation level of all hidden neurons is calculated based on the input neurons connected to them and the strengths of those connections (synapses). Finally, the same calculation is applied to the output neurons, yielding a set of real numbers, which is the NN's output. To give the correct outputs, the network is trained by an algorithm known as backpropagation: run the network to produce output, compare with the expected output and adjust the synaptic strengths to reduce the delta between the two.
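The activation step just described can be illustrated in plain Java. This is a standalone sketch with made-up inputs and weights, independent of the HyperGraphDB code shown later:

```java
// Activation of a single neuron: the weighted sum of its inputs is
// passed through an activation function (tanh here), as described above.
public class ActivationDemo
{
    static double activate(double[] inputs, double[] weights)
    {
        double sum = 0.0;
        for (int i = 0; i < inputs.length; i++)
            sum += inputs[i] * weights[i];
        return Math.tanh(sum);
    }

    public static void main(String[] args)
    {
        double[] inputs = { 1.0, 1.0 };    // two active input neurons
        double[] weights = { 0.5, -0.2 };  // made-up synaptic strengths
        System.out.println(activate(inputs, weights)); // tanh(0.3)
    }
}
```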

In "Programming Collective Intelligence", the input neurons correspond to individual search keywords and their activation level is set to 1 while the output neurons correspond to URLs and their final activation level provides their ranking. The hidden neurons correspond to combination of keywords (i.e. actual searches performed). In our implementation, the inputs and outputs are going to be arbitrary HyperGraphDB atoms that are part of whatever representation of whatever domain one is working with.

As in most software, neural network implementations usually have two representations: a runtime RAM representation and a permanent storage representation (e.g. in an SQL database). But one of the core principles behind HyperGraphDB is to eliminate this distinction and embed the database in-process, as a direct extension of RAM. This way one doesn't have to worry about caching frequently used data, about what's in RAM and what's not, at least not as much. For example, while the set of connections between two consecutive layers is usually represented as a weight matrix, we will now see how a direct HyperGraphDB representation can be used without much penalty!

When you think about representing a neural network as a graph, the most intuitive way is by creating a node for each neuron and a link for each synaptic connection:
public class Neuron { ... }

public class Synapse extends HGPlainLink
{
    private double strength;

    public Synapse(HGHandle...args) { super(args); }
    public HGHandle getInput() { return getTargetAt(0); }
    public HGHandle getOutput() { return getTargetAt(1); }
    public double getStrength() { return strength; }
    public void setStrength(double strength) { this.strength = strength; }
}
To get all inputs of a neuron:
List inputs = hg.getAll(graph, hg.and(hg.type(Synapse.class), hg.orderedLink(hg.anyHandle(), neuronHandle)));
This will work just fine. When the time comes to activate a neuron, we loop through the above list and apply whatever formula we have chosen for our model. But there's a better way: each neuron can simply be a link binding together its input neurons, and it can hold the strengths of all its synapses in a double[]:
public class Neuron extends HGPlainLink
{
    private double [] weights;

    public Neuron(HGHandle...inputNeurons)
    {
        super(inputNeurons);
    }

    public double fire(Map input, ...) { ... }

    // etc...
}
In this way, the inputs are readily available and the activation of a neuron can be easily calculated. Notice how the generality of the hypergraph model allows us to represent the same graph structure, but in a much more compact way! Had we stored the synapses as explicit links, we would have ended up with an explosion of extra entities in the database and with a less efficient way to execute the network, where a separate runtime representation would probably have been warranted. Here we make use of the ability of an edge to hold an arbitrary number of atoms, including other edges: the artificial neurons are interlinked recursively, in a sense.

With this schema, a very large network can be represented, with only portions of it loaded in memory as the need arises. A search application for a corporate site with, say, several hundred thousand URLs and a few thousand recurring searches will yield a NN whose number of HyperGraphDB atoms is a small multiple of those figures.

Now, the input layer in our representation doesn't actually consist of Neuron instances, but of arbitrary atoms. For a search application those will simply be search terms. The output layer however is made of Neurons and we need a mechanism that associates an output Neuron with an output atom. One option is to create a Neuron subclass that holds a reference to the associated atom. Another option is to create a new link for the association:
public class NeuronOutput extends HGPlainLink
{
    public NeuronOutput(HGHandle...args)
    {
        super(args);
    }

    public HGHandle getNeuron()
    {
        return getTargetAt(0);
    }

    public HGHandle getOutputAtom()
    {
        return getTargetAt(1);
    }
}
With both approaches, some bookkeeping will be required to remove the output Neuron if the output atom itself is removed. The NeuronOutput link representation incurs a slight performance hit because of the extra hop from output neuron to output atom, but that should be negligible for our purposes.

With that in place, we create a NeuralNet class to hold runtime information about an active portion of the whole network stored in the database. The class is intended to be lightweight, instantiated on demand and used within a single thread (e.g. for processing a search request). It implements the feedforward algorithm, which should be fast, and the training algorithm, which can afford to be slower because it will only be used during idle time. For example, a search engine could process the accumulated feedback at night by training the neural net on the daily activity. To represent the activation of neurons at a given layer, we have an auxiliary ActivationMap class: a HashMap that returns a default value (usually 0) for missing keys. Activating a neuron involves a calculation where all inputs, multiplied by their synaptic strengths, are summed, and then an activation function is applied to the sum. The activation function could be a threshold function or a sigmoid function like tanh, which we use here. An abbreviated version of the NeuralNet class looks like this:
public class NeuralNet
{
    private HyperGraph graph;
    private ActivationFunction activationFunction = new TanhFun();
    Map inputLayer = null;
    Map hiddenLayer = null;
    Map outputLayer = null;

    private Map activateNextLayer(Map previousLayer, Map nextLayer)
    {
        for (Map.Entry in : previousLayer.entrySet())
        {
            Collection downstream = hg.getAll(graph,
                /* ... query for the neurons fed by in.getKey(), elided ... */);
            for (Neuron n : downstream)
                if (!nextLayer.containsKey(n))
                    nextLayer.put(graph.getHandle(n),
                          , activationFunction));
        }
        return nextLayer;
    }

    public NeuralNet(HyperGraph graph)
    {
        this.graph = graph;
    }

    public Map getOutputLayer()
    {
        return outputLayer;
    }

    public void feedforward(ActivationMap inputs)
    {
        inputLayer = inputs;
        hiddenLayer = activateNextLayer(inputLayer, new ActivationMap(0.0));
        outputLayer = activateNextLayer(hiddenLayer, new ActivationMap(0.0));
    }

    public void train(Collection inputs, Collection outputs,
                      HGHandle selectedOutput)
    {
        Collection outputNeurons = updateNeuralStructure(inputs, outputs);
        ActivationMap inputMap = new ActivationMap(0.0);
        for (HGHandle in : inputs)
            inputMap.put(in, 1.0);
        selectedOutput = hg.findOne(graph,
            hg.apply(hg.targetAt(graph, 0),
                /* ... NeuronOutput lookup condition, elided ... */));
        Map outputMap = new HashMap();
        for (HGHandle h : outputNeurons)
            outputMap.put(h, 0.0);
        outputMap.put(selectedOutput, 1.0);
        // ... run feedforward on inputMap, then backpropagate against outputMap ...
    }
}
The feedforward method takes an input map and executes the network, loading only relevant neurons on demand. The result of the execution will be stored in the outputLayer member variable.
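As an aside, the ActivationMap helper described earlier can be sketched as a HashMap that falls back to a default value for missing keys. This is a plain-Java approximation, not necessarily the actual implementation in the HyperGraphDB codebase:

```java
import java.util.HashMap;

// A map of neuron handle -> activation level that returns a default
// (usually 0.0) for neurons that haven't been activated yet.
public class ActivationMap<K> extends HashMap<K, Double>
{
    private final double defaultValue;

    public ActivationMap(double defaultValue)
    {
        this.defaultValue = defaultValue;
    }

    @Override
    public Double get(Object key)
    {
        Double v = super.get(key);
        return v != null ? v : defaultValue;
    }
}
```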

The train method takes a collection of input atoms, a collection of output atoms and the single output atom that was selected amongst all outputs. It starts by making sure that all output atoms have a corresponding neuron that is connected to an appropriate hidden neuron for this particular input combination. All this is done in the call to updateNeuralStructure. Then the method creates an ActivationMap representing the expected activation of the output layer: 0 for all outputs except the selected one, which has an activation of 1. Then the network is executed on the input to produce its own result for the output layer (stored in the outputLayer member variable). Finally, backpropagation adjusts the synaptic strengths based on the difference between expected and calculated output. A version of train that takes a full map of expected output values can be trivially added here.
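The arithmetic of a single backpropagation weight update, assuming the tanh activation used here, looks roughly like this. It is a standalone sketch with a hypothetical learning rate, not the actual HyperGraphDB code:

```java
// One backpropagation weight update for an output neuron with tanh
// activation: the error (expected - actual) is scaled by tanh's
// derivative (1 - y^2) and by the activation of the input feeding
// the weight, then applied at the given learning rate.
public class BackpropStep
{
    static double updatedWeight(double weight, double learningRate,
                                double inputActivation,
                                double actualOutput, double expectedOutput)
    {
        double delta = (expectedOutput - actualOutput)
                       * (1.0 - actualOutput * actualOutput); // tanh derivative
        return weight + learningRate * delta * inputActivation;
    }
}
```

When the actual output already equals the expected one, the delta is zero and the weight is left unchanged, which is exactly the fixed point training converges toward.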

Explaining how backpropagation works and the math behind it is beyond the scope of this post. So we'll leave it at that. The full code is available from HyperGraphDB's SVN repository:

This post is intended mainly as a sample usage of HyperGraphDB. However, the code should be usable in a real-world application. Get it from SVN, try it, and write back if you have questions, bug reports or improvement suggestions.


Monday, January 4, 2010

HyperGraphDB 1.0 Released

Kobrix Software is pleased to announce the first official release of the HyperGraphDB database system. HyperGraphDB 1.0 is a general purpose, extensible, portable, distributed, embeddable, open-source data storage mechanism. Designed specifically for artificial intelligence and semantic web projects, it can also be used as an embedded object-oriented database for projects of all sizes.

HyperGraphDB is a Java based product built on top of the Berkeley DB storage library. It can be used as a single in-process database bound to a location on the local disk or within a "cloud" of networked database instances communicating and sharing data in a P2P (peer-to-peer) fashion.

Key Features of HyperGraphDB include:
  • Storage of generalized hypergraphs.
  • Open, extensible type system.
  • Basic query system and graph traversal algorithms.
  • Out-of-box support for Java object storage.
  • Thread-safe transactions.
  • P2P framework for data distribution.
For more information, documentation and downloads, please visit the HyperGraphDB Home Page.

Friday, January 1, 2010

Parsing the Google Wiki Format

Since we moved most of our projects to the Google Code servers and started using Google's wiki for documentation, I've been wanting to use that wiki as a publishing tool for anything related to a specific project, as this saves the coding and administrative effort of managing everything on our server. It avoids having to install and administer, or code up and administer, something that's been done a gazillion times already and that is covered by blogs and wikis. Google offers an API to access the blogs, so we can embed the blog on the website easily, and there are enough examples on the internet of how to do that. The easiest thing is to just copy and modify the examples from the Google Code Playground. However, I couldn't find an API, nor third-party code around the internet, for the wikis. Google's wikis are a bit lousy, admittedly, but they are good enough for most purposes, so I decided to write a parser that translates a wiki page to HTML, and now we can display docs on our website. Here is that code, in the hope that it could benefit someone else.

First, Google stores the wiki pages of a project in that project's SVN repository. The URL to the latest version of a wiki page looks like this:

"http://" + project + "" + page + ".wiki";

Where project is the name of your project at Google and page is the name of your page. So a plain wiki file can be downloaded with a standard HTTP client.

So the parser, a class called GoogleWikiViewer, just downloads a .wiki file from SVN, converts it to HTML and sends it back to the browser from a simple Java servlet that looks like this:

protected void doPost(HttpServletRequest request,
                      HttpServletResponse response)
    throws ServletException, IOException
{
    // ... your HTML preamble here, e.g. response.getWriter().println("");
    String project = request.getParameter("project");
    String wikipage = request.getParameter("page");
    if (!(project == null || project.length() == 0
          || wikipage == null || wikipage.length() == 0))
    {
        GoogleWikiViewer viewer = new GoogleWikiViewer(project, wikipage, "wikishow");
        // ... finish HTML document output
    }
}

In the above, the 'wikishow' parameter is the name of the very servlet using this code snippet. GoogleWikiViewer uses it to construct URLs of the form "wikishow?project=theproject&page=thepage". I know the trend these days is to write this stuff in JavaScript, but JavaScript takes longer to debug; in any case, if there's interest, this could easily be ported to JavaScript and made into an AJAX library. For now, it's just a simple, standalone Java class:


Please comment with bug reports/fixes. Bear in mind that this was written in a couple of hours; it doesn't cover everything (e.g. comments and gadgets are not supported) and it probably has a bug here and there. But the HyperGraphDB wiki pages are displayed correctly, which means it covers all commonly used features.
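To give a flavor of the kind of rewriting involved, here is a toy converter handling just two of Google's wiki constructs (bold and italic). This is a hypothetical illustration, not code from GoogleWikiViewer, which covers headings, links, code blocks and much more:

```java
// Toy wiki-to-HTML conversion: turns Google's *bold* and _italic_
// markup into <b> and <i> tags using simple regex substitution.
public class MiniWikiConverter
{
    static String toHtml(String wiki)
    {
        return wiki.replaceAll("\\*([^*]+)\\*", "<b>$1</b>")
                   .replaceAll("_([^_]+)_", "<i>$1</i>");
    }

    public static void main(String[] args)
    {
        System.out.println(toHtml("This is *bold* and _italic_ text."));
    }
}
```

A real converter has to be more careful than this, e.g. not rewriting markup inside {{{code}}} blocks, which is where most of the parsing effort goes.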