Saturday, February 1, 2014

Application Components with AngularJS

This post is about self-contained business components with AngularJS that can be instantiated at arbitrary places in one's application. 

With AngularJS, one writes the HTML plainly and then implements behavior in JavaScript via controllers and directives. The controllers are about the model of which the HTML is the view, while directives are about extra tags and attributes with which you can extend the HTML. You are supposed to implement business logic in controllers and UI logic in directives. Good. But there are situations where the distinction is not so clear cut, in particular when you are building a UI by reusing business functionality in multiple places.

In a large application, it often happens that the same piece of functionality has to be available in different contexts, in different flows of the user interaction. So it helps to be able to easily package that functionality and plug it in wherever it's needed. For example, a certain domain object should be editable in place, or we need the ability to select among a list of dynamically generated domain objects. Those types of components are application components because they encapsulate reusable business logic and are even tied to a specific backend, as opposed to, say, UI components, which are meant to be (re)used in different applications and have much wider applicability. Because application components are not instantiated that often (as opposed to UI components, which are much more universal), it is frequently tempting to copy&paste code rather than create a reusable software entity. Unless, that is, the componentization is really easy and requires almost no extra work.

If you are using AngularJS, here is a way to easily achieve this sort of encapsulation: 
  1. Put the HTML in a file as a self-contained template "partial" (i.e. without the top-level document HTML tags). 
  2. Have its controller JavaScript be somehow included in the main HTML page.
  3. Plug it in any other part of the application, like other HTML templates for example. 
This last part cannot be done with AngularJS's API. We have to write some glue code. Since we will be plugging in by referring to our component in an HTML template, we have to write a custom directive. Instead of writing a separate directive for each component as the AngularJS documentation recommends, we will write one directive that will handle all our components. To be sure, there is a generic directive to include HTML partials in AngularJS, the ng-view directive, but it's limited to swapping the main content of a page, which is too coarse-grained. By contrast, our directive can be used anywhere, nested recursively etc. Here is an example of its usage:

<be-plug name="shippingAddressList">
  <be-model-link from-child-scope="currentSelection" 
       from-parent-scope="shippingAddress">
</be-model-link></be-plug>

This little snippet assumes we have an HTML template file called shippingAddressList.ht that lets the user select one among several addresses to ship the shopping cart contents to. We have a top-level tag called be-plug and a nested tag called be-model-link. The be-model-link tag associates attributes of the component's model with attributes of the model (i.e. scope in AngularJS's terms) of the enclosing HTML. More on that below. Here is the implementation:

app.directive('bePlug', function($compile, $http) {
  return {
    restrict:'E',
    scope : {},
    link:function(scope, element, attrs, ctrl) {
      var template = attrs.name + ".ht";
      $http.get(template).then(function(x) {
        element.html(x.data);
        $compile(element.contents())(scope); 
        // modelLinks may be absent if no be-model-link tag was nested
        $.each(scope.modelLinks || {}, function(atParent, atChild) {
          // Find a parent scope that has the 'atParent' property
          var parentScope = scope;
          while (parentScope != null && 
                 !parentScope.hasOwnProperty(atParent))
            parentScope = parentScope.$parent;
          if (parentScope == null) 
            throw new Error("No scope with property " + atParent + 
                  ", be-plug can't link models");
          scope.$$childHead.$watch(atChild, function(newValue) {
            parentScope[atParent] = newValue;
          });
          parentScope.$watch(atParent, function(newValue) {
            scope.$$childHead[atChild] = newValue;
          });            
        });
      });
    }
  };
});

Let's deconstruct the above code. First, make sure you are familiar with how to write directives in AngularJS and that you understand what AngularJS scopes are. Next, note that we are creating a scope for our directive by declaring a scope:{} object. The purpose is twofold: (1) to avoid polluting the parent scope and (2) to make sure our scope has a single child, so that we have a handle on the scope of the component we are including.

Good. Now, let's look at the gist of the directive, its link method. (I'm sure there is some valid reason that method is named "link". Perhaps because we are "linking" an HTML template to its containing element. Or to a model via the scope? Something like that.) In any case, that's where DOM manipulation is done. So here's what's happening in our implementation:
  • We fetch the HTML template from the server. By naming convention, we expect the file to have extension .ht. The rest of the relative path of the template file is given in the name attribute.
  • Once the template is loaded, we set it as the HTML content of the directive's element. So the resulting DOM will have a be-plug DOM node which the browser will happily ignore and inside that node there will be our component's HTML template.
  • Then we "compile" the HTML content using AngularJS's $compile service. This method call is essentially the whole point of the exercise. This is what allows AngularJS to bind model to view, to process any nested directives recursively etc. In short, this is what makes our textual content inclusion into a "runtime component instance". Well, this and also the following:
  • ...the binding of scope attributes between our enclosing element and the component we are including. This binding is achieved in the $.each loop by observing variable changes in the scopes of interest.
That last point needs a bit more explaining. The HTML code that includes our component presumably has some associated model scope with attributes pertaining to business logic. On the other hand, the included component acquires its own scope with its own set of attributes as defined by its own controller. The two scopes end up in a parent-child relationship with the directive's scope (a third one) in between. From an application point of view, we probably have one or several chained parent scopes holding relevant model attributes and we'd want to somehow connect the data in our component model to the data in the enclosing scope. In the example above, we are connecting the shippingAddress attribute of our main application scope to the currentSelection attribute of the address selection component. In the context of the enclosing logic, we are dealing with a "shipping address", but in the context of the address selection component which simply displays a choice of addresses to pick from we are dealing with a "current selection". So we are binding the two otherwise independent concepts.

To implement this sort of binding of a given pair of model attributes, we need to know: the parent scope, the child scope, the name of the attribute in the parent scope and the name of the attribute in the child scope. To collect the pairs of attributes, we rely on a nested tag called be-model-link implemented as follows:

app.directive('beModelLink', function() {
  return {
    restrict:'E',    
    link:function(scope, element, attrs, ctrl) {
      if (!scope.modelLinks)
        scope.modelLinks = {};
      scope.modelLinks[attrs.fromParentScope] = attrs.fromChildScope;
    }
  };
});

Because we have not declared a private scope for the be-model-link directive, the scope we get is that of the parent directive. This gives us the chance to put some data in it. And the data we put is the mapping from parent to child model attributes, in the form of a modelLinks object. Note that we refer to this modelLinks object in the setup of variable watching in the be-plug directive, where we loop over all its properties and use AngularJS's $watch mechanism to monitor changes on either side and effect the same change on the linked attributes. To find the correct parent scope, we walk up the chain and get the first one which has the stated from-parent-scope attribute, throwing an error if we can't find it. The child scope is easy because there is only one child scope to our directive.

That's about it. We are essentially doing server-side includes like in the good (err..bad) old days, except that because of the interactive nature of the "thing", with AJAX and all, and the whole runtime environment created by AngularJS, we have a fairly dynamic component. Hope you find this useful. 

Friday, September 6, 2013

Better-Setter-Getter

I must confess that it's very difficult to bring myself to write code with a long life expectancy in an alternative JVM language. Scala is slowly taking over and that's great. But if I'm working on an API, an open, general-purpose one, or some "important" layer within a system, the thought of not using Java doesn't even cross my mind. Not that those other languages are not good. But anyway, in this post I just wanted to complain that I'm tired of seeing, let alone writing, getters and setters for bean properties. Since first-class properties don't seem to be on the table for Java, can we at least come up with a better naming convention? What would it take to have a tool-supported, alternative Java Beans spec? The addition of closures in Java 8 is perhaps an opportunity and an excuse to revise this (e.g. event listeners can be specified more concisely). Perhaps it's already being discussed in the depths of the Oracle.

But let's start with the 'getFoo'/'setFoo' pair. For a "real world" example:
public class Customer {
  private String email;
  public void setEmail(String email) { this.email = email; }
  public String getEmail() { return email; }
}

That convention is ugly and verbose. For one thing, why the need for the 'set' and 'get' prefixes? It is clear that if you are passing an argument to a function then you'd be setting. If there's no argument you are not setting, so you are probably getting, as evidenced by the return value. Moreover, the fact that a setter returns no value is pure waste since you could always return this. Why is that important? Method chaining! Yes, method chaining is not to be underestimated and it's not just a trick of the conciseness-conscious coder. Since we in the Western world read left to right, programming languages are written and evaluated left to right. As a consequence, symbols to the left of the current position (of reading, writing or evaluating) tend to establish a context that could simply be relied upon rather than repeated. When you put a semi-colon in Java, that's like a reset: context goes away, next statement. But if you are setting up a bunch of properties, as is frequently the case, your next statement will start by re-establishing the context you already had. That's more verbiage there on the screen staring at you, and definitely kind of ugly. But it's also more work for the thing doing the evaluation. Values are returned on the call stack, so when your setter returns this, the next set operation doesn't need to push the object on the stack since it's already there. So there's a performance improvement there. Probably quite negligible, but I'd like to point it out because it ties to the observation that, in the kind of programming that consists of moving data in one form from one place to another form somewhere else, which is the bulk of so-called "business" applications, less code does mean better performance.

The difference between:
Customer customer = new Customer();
customer.setFirstName("Bill");
customer.setLastName("Watterson");
customer.setOccupation("cartoonist");

and this:
Customer customer = new Customer().firstName("Bill").lastName("Watterson").occupation("cartoonist");

is not just a matter of taste. Well, unless you really have bad taste that is.
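For what it's worth, the prefix-less, chainable accessor convention argued for here already thrives in the JavaScript world (jQuery and D3 both use it). Here is a quick sketch of the idea in JavaScript, with a hypothetical Customer: calling an accessor with no argument reads the property, calling it with an argument sets it and returns the object for chaining.

```javascript
function Customer() {
  var props = {};
  var self = this;
  // Generate one combined getter/setter per property name.
  ['firstName', 'lastName', 'occupation'].forEach(function (name) {
    self[name] = function (value) {
      if (arguments.length === 0) return props[name]; // no argument: get
      props[name] = value;                            // argument: set...
      return self;                                    // ...and allow chaining
    };
  });
}

var customer = new Customer()
  .firstName('Bill')
  .lastName('Watterson')
  .occupation('cartoonist');
console.log(customer.firstName()); // Bill
```

The same shape is what a hypothetical betterjava.jar convention would give Java beans: one method name per property, overloaded on arity, with the setter form returning this.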

All it would take is for a couple of IDEs to support the new convention and a small lib that could quickly become ubiquitous. Or perhaps a big lib, something called betterjava.jar that would fix bad API designs from the early, and why not later, days of Java. I'm sure people aren't short of ideas as the Java standard APIs aren't short of problems.

Monday, November 19, 2012

eValhalla User Management

[Previous in this series: eValhalla Setup]

In this installment of the eValhalla development chronicle, we will be tackling what's probably the one feature common to most web-based applications - user management. We will implement:
  1. User login
  2. A "Remember me" checkbox for auto-login of users on the same computer
  3. Registration with email validation
The end result can be easily taken and plugged into your next web app! Possible future improvements would be adding captchas during registration and/or login, and the ability to edit more extensive profile information.

I have put comments wherever I felt appropriate in the code, which should be fairly straightforward anyway. So I won't be walking you through it line by line. Rather, I will explain what the code does and comment on design decisions or on less obvious parts. First, let me say a few words about how user sessions are managed.

User Sessions

Since we are relying on a RESTful architecture, we can't have the server hold user sessions. We need to store user session information at the client and transfer it with each request. Well, the whole concept of a user session is kind of irrelevant here since the application logic mostly resides at the client and the server is mainly consulted as a database. Still, the data needs protection and we need the notion of a user with certain access rights. So we have to solve the problem of authentication. Each request needs to carry enough information for the server to decide whether the request should be honored or not. We cannot just rely on the user id because that way anybody could send a request with anybody else's id. To authenticate the client, the server will first supply it with a special secret key, an authentication token, that the client must send on each request along with the user id. To obtain that authentication token, the client must however identify itself by using a password. And that's the purpose of the login operation: obtaining an authentication token for use in subsequent requests. The client will then keep that token together with the user id and submit them as HTTP headers on every request. The natural way to do that with JavaScript is to store the user id and authentication token as cookies.
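To make that concrete, here is a sketch of the client side of this scheme: read the two cookies and turn them into per-request headers. The cookie and header names are made up for illustration; eValhalla's actual names may differ.

```javascript
// Parse a document.cookie-style string into a name/value map.
function parseCookies(cookieText) {
  var cookies = {};
  cookieText.split(';').forEach(function (pair) {
    var i = pair.indexOf('=');
    if (i > 0)
      cookies[pair.slice(0, i).trim()] =
        decodeURIComponent(pair.slice(i + 1).trim());
  });
  return cookies;
}

// Build the authentication headers to attach to every request.
function authHeaders(cookieText) {
  var cookies = parseCookies(cookieText);
  return {
    'X-User-Id': cookies.userId,
    'X-Auth-Token': cookies.authToken
  };
}

authHeaders('userId=42; authToken=abc');
// => { 'X-User-Id': '42', 'X-Auth-Token': 'abc' }
```

In an Angular application these headers would typically be set once as $http defaults rather than computed call by call.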

This authentication mechanism is commonly used when working within a RESTful architecture. For more on the subtleties of the approach, just google "user authentication and REST applications". One question is why not just send the password with every request instead of a separate token. That's possible, but more risky - a token is generated randomly and it expires, hence it is harder to guess. The big issue however is XSS (cross-site scripting) attacks. In brief, with XSS an attacker inserts HTML code into a field that gets displayed, supposedly as just static text, to other users (e.g. a blog title), and that code simply makes an HTTP request to a malicious server, submitting all the user's private cookies with it. To avoid such attacks, we will have to pay special attention to HTML sanitization. That is, we have to properly escape every HTML tag displayed as static text. We can also put the authentication token in an HttpOnly cookie for extra security.
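As a rough illustration of what that sanitization amounts to (a real application should rely on a vetted escaping function, e.g. the one its template engine provides), escaping boils down to neutralizing the handful of HTML-significant characters:

```javascript
// Minimal HTML escaping: defuse the characters that can open a tag,
// close an attribute, or start an entity.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')   // must come first, or we'd re-escape our own output
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

escapeHtml('<script>steal()</script>');
// => '&lt;script&gt;steal()&lt;/script&gt;'
```

With this applied to every user-supplied value rendered as text, the injected markup in a blog title displays as literal characters instead of executing.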

Implementation - a User Module

Since user management is so common, I made a small effort to build a relatively self-contained module for it. There are no standard packaging mechanisms for the particular technology stack we're using, so you'd just have to copy&paste a few files:

  • /html/ahguser.ht - contains the HTML code for login and registration dialogs as well as top-level Login and Register links that show up right aligned. This depends on the whole Angular+AngularUI+Bootstrap+jQuery environment. 
  • /javascript/ahguser.js - contains the Angular module 'ahgUser' that you can use in your Angular application. This depends on the backend:
  • /scala/evalhalla/user/package.scala - the evalhalla mJson-HGDB backend REST user service. 

The backend can be easily packaged in a jar, mavenized and all, and this is something that I might do near the end of the project. 

The registration process validates the user's email by emailing them a randomly generated UUID (a HyperGraphDB handle) and requiring that they provide it back for validation before they can legitimately log into the site. Since the backend code is simple enough, repetitive even, let's look at one method only, the register REST endpoint:

@POST
@Path("/register")
def register(data:Json):Json = {
  return transact(Unit => {
    normalize(data);
    // Check if we already have that user registered
    var profile = db.retrieve(
        jobj("entity", "user", 
             "email", data.at("email")))
    if (profile != null)
      return ko().set("error", "duplicate")
    // Email validation token
    var token = graph.getHandleFactory()
          .makeHandle().toString()
    db.addTopLevel(data.set("entity", "user")
                       .set("validationToken", token)
                       .delAt("passwordrepeat"))
    evalhalla.email(data.at("email").asString(), 
      "Welcome to eValhalla - Registration Confirmation",
      "Please validate registration with " + token)
    return ok
  })
}

So we see our friends db, transact, jobj etc. from last time. The whole thing is a Scala transaction closure, with Scala's syntax being much nicer than java.lang.Callable. As a reminder, note that most of the functions and objects referred to here are globally declared in the evalhalla package. For example, db is an instance of the HyperNodeJson class. The call to normalize just ensures the email is lower-case, because email addresses are in general case-insensitive. While the logic is fairly straightforward, let me make a few observations about the APIs.

User profile lookup is done with Json pattern matching. Recall that the database stores arbitrary Json structures (primitives, arrays, objects and nulls) as HyperGraphDB atoms. It doesn't have the notion of "document collections" like MongoDB, for example. This is because each Json structure is just a portion of the whole graph. So to distinguish between different types of objects we define a special JSON property called entity that we reserve as a type name attached to all of our top-level Json entities. Here we are dealing with entities of type "user". Now, each atom in HyperGraphDB, and consequently each user profile, has a unique identifier - the HyperGraphDB handle (a UUID). There is no notion of a primary key enforced at the database level. We know that an email should uniquely identify a user profile, so we perform the lookup with the Json object {entity:"user", email:<the email>} as a pattern. But this uniqueness is enforced at the application level, because new profiles are added only via this register method. I've explained the db.addTopLevel method on the HGDB-mJson wiki page, but here is a summary for the impatient: db.add will create a duplicate version of the whole Json structure recursively, and db.assertAtom will only create something if it doesn't exist yet, whereas db.addTopLevel will create a new database atom only for the top-level JSON, performing an assert operation for each of its components.
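Conceptually, the pattern lookup is simple. Ignoring the actual graph storage and indexing, db.retrieve behaves roughly like the following illustrative matcher (this is not the real HyperNodeJson code, just a sketch of the semantics): an object matches a pattern when it has every property the pattern names, with an equal value.

```javascript
// Does 'entity' contain every key of 'pattern' with a (structurally) equal value?
function matchesPattern(entity, pattern) {
  return Object.keys(pattern).every(function (key) {
    return JSON.stringify(entity[key]) === JSON.stringify(pattern[key]);
  });
}

var users = [
  { entity: 'user', email: 'a@example.com', name: 'Alice' },
  { entity: 'user', email: 'b@example.com', name: 'Bob' }
];

var found = users.filter(function (u) {
  return matchesPattern(u, { entity: 'user', email: 'a@example.com' });
});
// found holds only Alice's profile
```

Extra properties on the stored object (here, name) don't prevent a match; the pattern only constrains the properties it mentions.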

Finally, note that before adding the profile to the database, we delete the passwordrepeat property. This is because we are storing in the database whatever JSON object the client gives us. And the client is giving us all fields coming from the HTML registration form, pretty much as the form is defined. So we get rid of that unneeded field. Granted, it would actually be better design to remove that property at the client side, since the repeat-password validation is done there. But I wanted to illustrate the fact that the JSON data is really flowing as is from the HTML form directly to the database, with no mappings or translations of any sort needed. 

So far, so good. Let's move on to the UI side.

Client-Side Application

In addition to AngularJS, I've incorporated the Bootstrap front-end library by Twitter. It looks good and it has some handy components. I've also included the Angular-UI library which has some extras like form validation and it plays well with Bootstrap.

The whole application resides on one top-level HTML page /html/index.html. The main JavaScript program that drives it resides in the /javascript/app.js file. The idea is to dynamically include fragments of HTML code inside a reserved main area while some portions of the page such as the header with user profile functionality remain stable. By convention, I use the extension '.ht' for such HTML fragments instead of '.html' to indicate that the content is not a complete HTML page. Here's the main page in its essence:


<div ng-controller="MainController">
<div ng-include="ng-include" src="'/ahguser.ht'"></div>
<!-- some extra text here... -->
<hr/>
<a href="#m">main</a> |
<a href="#t">test</a>
<hr/>
<ng-view/>  <!-- this is where HTML fragments get included -->

The MainController function is in app.js and it will eventually establish some top-level scoped variables and functions, but it's currently empty. The user module described above is included as a UI component by the <div ng-include...> tag. The login and register links that you see at the top right of the screen, with all their functionality, come from that component. Finally, just for the fun of it, I made a small menu with two links to switch the main view from one page fragment to another: 'main' and 'test'. This is just to set up a blueprint for future content.

A Note on AngularJS

Let me explain a bit the Angular portion of that code, since this framework, as pretentious as it is, lacks in documentation and is far from being an intuitive API. In fact, I wouldn't recommend it yet. My use of it is experimental and the first impressions are a bit ambivalent. Yes, eventually you get stuff to work. However, it's bloated and it fails the first test of any over-reaching framework: simple things are not easy. It seems built by a team of young people who use the word awesome a lot, so we'll see if it meets the second test, that complicated things should be possible. Over the years, I've managed to stay away from other horrific, bloated frameworks aggressively marketed by big companies (e.g. EJBs), but here I may be just a bit too pessimistic. Ok, enough.

Angular reads and interprets your HTML before it gives it to the browser. That opens many doors. In particular, it allows you to define custom attributes and tags to implement some dynamic behaviors. It has the notion of an application and the notion of modules. An application is associated with a top-level HTML tag, usually the 'html' tag itself by setting the custom ng-app='appname' attribute. Then appname is declared as a module in JavaScript:

var app = angular.module('appname', [array of module dependencies])

It's not clear what's special about an application vs. mere modules, presumably nothing. Then functionality is attached to markup (the "view") via Angular controllers. Those are JavaScript functions that you write and Angular calls to set up the model bound to the view. Controller functions take any number of parameters and Angular uses a nifty trick here. When you call the toString method of a JavaScript function, it returns its full text as it was originally parsed. That includes the formal arguments, exactly with the names you have listed in the function declaration (unless you've used some sort of minification/obfuscation tool). So Angular parses that argument list and uses the names of the parameters to determine what you need in your function. For example, when you declare a controller like this:

function MainController($scope, $http) {
}
A call to MainController.toString() returns the string "function MainController($scope, $http) { }". Angular parses that string and determines that you want "$scope" and "$http". It recognizes those names and passes in the appropriate arguments for them. The name "$scope" is a predefined AngularJS name that refers to an object to be populated with the application model. Properties of that object can be bound to form elements, or displayed in a table or whatever. The name "$http" refers to an Angular service that allows you to make AJAX calls. As far as I understand it, any global name registered as a service with Angular can be used in a controller parameter list. There's a dependency injection mechanism that takes care of, among other things, hooking up services in controllers by matching parameter names with globally registered functions. I still haven't figured out what the practical benefit of that is, as opposed to having global JavaScript objects yourself....perhaps in really large applications where different clusters of the overall module dependency graph use the same names for different things.
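To see the trick in isolation, here is a stripped-down version of what the injector's name extraction amounts to (the real Angular implementation also strips comments and supports an explicit array annotation form for minified code):

```javascript
// Extract the formal parameter names from a function's source text,
// the way Angular's injector infers dependencies.
var ARGS_RE = /^[^(]*\(\s*([^)]*)\)/;

function annotate(fn) {
  var match = fn.toString().match(ARGS_RE);
  if (!match) return [];
  return match[1].split(',')
    .map(function (s) { return s.trim(); })
    .filter(function (s) { return s.length > 0; });
}

function MainController($scope, $http) { }

annotate(MainController); // => ['$scope', '$http']
```

Given those names, the injector just looks each one up in its registry of services and calls the controller with the corresponding instances.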

Beginnings of a Data Service

One of the design goals in this project is to minimize the amount of code handling CRUD data operations. After all, CRUD is pretty standard and we are working with a schema-less, flexible database. So we should be able to do CRUD on any kind of structured object we want without having to predefine its structure. In all honesty, I'm not sure this will be as easy as it sounds. As mentioned before, the main difficulty is security access rules. It's certainly doable, but we shall see what kind of complexities it leads to in the subsequent iterations. For now, I've created a small class called DataService that allows you to perform a simple JSON pattern lookup as well as all CRUD operations on any entity identified by its HyperGraphDB handle.

One can experiment with this interface by making $.ajax calls in the browser's REPL. The interface is certainly going to evolve and change, but here are a couple of calls you can try out. I've made the '$http' service available as a global variable:

$http.post("/rest/data/entity", {entity:'story', content:'My story starts with....'})
$http.get("/rest/data/list?pattern=" + JSON.stringify({entity:'story'})).success(function(A) { console.log(A); })
The above creates a new entity in the DB and then retrieves it via a query for all entities of that type (i.e. entity:'story'). You can also play around with $http.put and $http.delete.

Conclusion

All right, this concludes the 2nd iteration of eValhalla. We've implemented a big part of what we'd need for user management. We've explored AngularJS as a viable UI framework and we've laid the groundwork of a data-centered REST service. To get it, follow the same steps as before, but check out the phase2 Git tag instead of phase1:


  1. git clone https://github.com/publicvaluegroup/evalhalla.git
  2. cd evalhalla
  3. git checkout phase2
  4. sbt
  5. run


Coming Up...

On the next iteration, we will do some data modeling and define the main entities of our application domain. We will also implement a portion of the UI dealing with submission and listing of stories. 

Sunday, November 4, 2012

HyperGraphDB 1.2 Final Released

Kobrix Software is pleased to announce the release of HyperGraphDB 1.2 Final.

Several bugs were found and corrected during the beta testing period, most notably having to do with indexing.

Go directly to the download page.

HyperGraphDB is a general purpose, free open-source data storage mechanism. Geared toward modern applications with complex and evolving domain models, it is suitable for semantic web, artificial intelligence, social networking or regular object-oriented business applications.
This release contains numerous bug fixes and improvements over the previous 1.1 release. A fairly complete list of changes can be found at the Changes for HyperGraphDB, Release 1.2 wiki page.
  1. Introduction of a new HyperNode interface together with several implementations, including subgraphs and access to remote database peers. The ideas behind it are documented in the blog post HyperNodes Are Contexts.
  2. Introduction of a new interface HGTypeSchema and generalized mappings between arbitrary URIs and HyperGraphDB types.
  3. Implementation of storage based on the BerkeleyDB Java Edition (many thanks to Alain Picard and Sebastian Graf!). This version of BerkeleyDB doesn't require native libraries, which makes it easier to deploy and, in addition, performs better for smaller datasets (under 2-3 million atoms).
  4. Implementation of parameterized pre-compiled queries for improved query performance. This is documented in the Variables in HyperGraphDB Queries blog post.
HyperGraphDB is a Java based product built on top of the Berkeley DB storage library.

Key Features of HyperGraphDB include:
  • Powerful data modeling and knowledge representation.
  • Graph-oriented storage.
  • N-ary, higher order relationships (edges) between graph nodes.
  • Graph traversals and relational-style queries.
  • Customizable indexing.
  • Customizable storage management.
  • Extensible, dynamic DB schema through custom typing.
  • Out of the box Java OO database.
  • Fully transactional and multi-threaded, MVCC/STM.
  • P2P framework for data distribution.
In addition, the project includes several practical domain specific components for semantic web, reasoning and natural language processing. For more information, documentation and downloads, please visit the HyperGraphDB Home Page.

Many thanks to all who supported the project and actively participated in testing and development!

Wednesday, October 24, 2012

RESTful Services in Java - a Busy Developer's Guide

The Internet doesn't lack expositions on REST architecture, RESTful services, and their implementation in Java. But here is another one. Why? Because I couldn't find something concise enough to point readers of the eValhalla blog series to.

What is REST?
The acronym stands for Representational State Transfer. It refers to an architectural style (or pattern) thought up by one of the main authors of the HTTP protocol. Don't try to infer what the phrase "representational state transfer" could possibly mean. It sounds like there's some transfer of state going on between systems, but that's a bit of a stretch. Mostly, there's transfer of resources between clients and servers. The clients initiate requests and get back responses. The responses are resources in some standard media type such as XML or JSON or HTML. But, and that's a crucial aspect of the paradigm, the interaction itself is stateless. That's a major architectural departure from the classic client-server model of the 90s. Unlike classic client-server, there's no notion of a client session here. 

REST is offered not as a protocol, but as an architectural paradigm. However, in reality we are pretty much talking about HTTP, of which REST is an abstraction. The core aspects of the architecture are (1) resource identifiers (i.e. URIs); (2) different possible representations of resources, or internet media types (e.g. application/json); (3) CRUD operation support for resources via the HTTP methods GET, PUT, POST and DELETE. 

Resources are in principle decoupled from their identifiers. That means the environment can deliver a cached version or it can load balance somehow to fulfill the request. In practice, we all know URIs are actually addresses that resolve to particular domains, so there's at least that level of coupling. In addition, resources are decoupled from their representation. A server may be asked to return HTML or XML or something else. There's content negotiation going on, where the server may or may not offer the desired representation. The CRUD operations have constraints on their semantics that may or may not appear obvious to you. The GET, PUT and DELETE operations require that a resource be identified, while POST is supposed to create a new resource. The GET operation must not have side-effects. So, all other things being equal, one should be able to invoke GET many times and get back the same result. PUT updates a resource, DELETE removes it, and therefore they both have side-effects just like POST. On the other hand, just like GET, PUT may be repeated multiple times, always to the same effect. In practice, those semantics are roughly followed. The main exception is the POST method, which is frequently used to send data to the server for some processing without necessarily expecting it to create a new resource. 
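A toy in-memory resource store makes the idempotency distinction concrete (illustrative only, no HTTP involved): repeating a PUT with the same body leaves the store in the same state, while repeating a POST keeps creating new resources.

```javascript
function Store() {
  this.resources = {};
  this.nextId = 1;
}
// POST: create a new resource each time -- not idempotent.
Store.prototype.post = function (data) {
  var id = this.nextId++;
  this.resources[id] = data;
  return id;
};
// PUT: update the identified resource -- repeatable with the same effect.
Store.prototype.put = function (id, data) { this.resources[id] = data; };
// GET: read only, no side effects.
Store.prototype.get = function (id) { return this.resources[id]; };
// DELETE: remove the identified resource.
Store.prototype.del = function (id) { delete this.resources[id]; };

var store = new Store();
store.post({ name: 'a' });   // creates resource 1
store.post({ name: 'a' });   // creates resource 2 -- a second, distinct resource
store.put(1, { name: 'b' });
store.put(1, { name: 'b' }); // same state as after the first PUT
```

This is exactly why browsers warn before re-submitting a POSTed form but happily retry a GET.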

Implementing RESTful services revolves around implementing those CRUD operations for various resources. This can be done in Java with the help of a Java standard API called JAX-RS.

REST in Java = JAX-RS = JSR 311
In the Java world, when it comes to REST, we have the wonderful JAX-RS. And I'm not being sarcastic! This is one of those technologies that the Java Community Process actually got right, unlike so many other screw ups. The API is defined as JSR 311 and it is at version 1.1, with work on version 2.0 under way. 

The beauty of JAX-RS is that it is almost entirely driven by annotations. This means you can turn almost any class into a RESTful service: simply annotate a POJO with JSR 311 annotations and it becomes a REST endpoint. Such an annotated POJO is called a resource class in JAX-RS terms.

Some of the JAX-RS annotations are at the class level, some at the method level and others at the method parameter level. Some are available both at class and method levels. Ultimately the annotations combine to make a given Java method into a RESTful endpoint accessible at an HTTP-based URL. The annotations must specify the following elements:

  • The relative path of the Java method - this is accomplished with the @Path annotation.
  • The HTTP verb, i.e. which CRUD operation is being performed - one of the @GET, @PUT, @POST or @DELETE annotations. 
  • The media type accepted (i.e. the representation format) - the @Consumes annotation.
  • The media type returned - the @Produces annotation.

The last two are optional. If omitted, all media types are assumed possible. Let's look at a simple example and take it apart:

import javax.ws.rs.*;
@Path("/mail")
@Produces("application/json")
public class EmailService
{
    @POST
    @Path("/new")
    public String sendEmail(@FormParam("subject") String subject,
                            @FormParam("to") String to,
                            @FormParam("body") String body)  {
        return "new email sent";
    }
    
    @GET
    @Path("/new")
    public String getUnread()  {
        return "[]";
    }
   
    @DELETE
    @Path("/{id}")
    public String deleteEmail(@PathParam("id") int emailid)  {
        return "delete " + emailid;
    }
    
    @GET
    @Path("/export")
    @Produces("text/html")
    public String exportHtml(@QueryParam("searchString") 
                             @DefaultValue("") String search) {
        return "<table><tr>...</tr></table>";
    }
}
The class defines a RESTful interface for a hypothetical HTTP-based email service. The top-level path mail is relative to the root application path. The root application path is associated with the javax.ws.rs.core.Application subclass that you write to plug into the runtime environment. Then we've declared with the @Produces annotation that all methods in this service produce JSON. This is just a class-level default that one can override for individual methods, as we've done in the exportHtml method.

The sendEmail method defines a typical HTTP post where the content is sent as an HTML form. The intent here would be to post to http://myserver.com/mail/new a form for a new email that should be sent out. As you can see, the API allows you to bind each separate form field to a method parameter. Note also that there is a different method for the exact same path: if you do an HTTP get at /mail/new, the Java method annotated with @GET will be called instead. Presumably the semantics of get /mail/new would be to obtain the list of unread emails.

Next, note how the path of the deleteEmail method is parameterized by an integer id of the email to delete. The curly braces indicate that "id" is actually a parameter. The value of that parameter is bound to whatever method argument is annotated with @PathParam("id"). Thus if we do an HTTP delete at http://myserver.com/mail/453 we would be calling the deleteEmail method with argument emailid=453.

Finally, the exportHtml method demonstrates how we can get a handle on query parameters. When you annotate a parameter with @QueryParam("x"), the value is taken from the HTTP query parameter named x. The @DefaultValue annotation provides a default in case that query parameter is missing. So, calling http://myserver.com/mail/export?searchString=RESTful will call the exportHtml method with a parameter search="RESTful".
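To see how the {id} template binding works conceptually, here is a toy matcher in plain Java. This is only an illustration of the idea, not how JAX-RS actually implements its routing:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of how a template like "/mail/{id}" binds path parameters.
public class PathTemplate {
    // Matches a concrete path against a template; returns the bound
    // parameters, or null if the path does not match the template.
    public static Map<String, String> match(String template, String path) {
        String[] t = template.split("/");
        String[] p = path.split("/");
        if (t.length != p.length) return null;
        Map<String, String> params = new HashMap<>();
        for (int i = 0; i < t.length; i++) {
            if (t[i].startsWith("{") && t[i].endsWith("}"))
                params.put(t[i].substring(1, t[i].length() - 1), p[i]);
            else if (!t[i].equals(p[i]))
                return null;
        }
        return params;
    }
}
```

Matching "/mail/{id}" against "/mail/453" yields the binding id=453, which is the value the runtime would inject into the @PathParam("id") parameter.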

To expose this service, first we need to write an implementation of javax.ws.rs.core.Application. That's just a few lines:

import java.util.HashSet;
import java.util.Set;

public class MyRestApp extends javax.ws.rs.core.Application {
   @Override
   public Set<Class<?>> getClasses() {
      HashSet<Class<?>> S = new HashSet<Class<?>>();
      S.add(EmailService.class);
      return S;
   }
}

How this gets plugged into your server depends on your JAX-RS implementation. Before we leave the API, I should mention that there's more to it. You do have access to Request and Response objects. You have annotations to access other contextual information and metadata like HTTP headers, cookies, etc. And you can provide custom serialization and deserialization between media types and Java objects.

RESTful vs Web Services
Web services (SOAP, WSDL) were heavily promoted in the past decade, but they didn't become as ubiquitous as their fans had hoped. Blame XML. Blame the rigidity of the XML Schema strong typing. Blame the tremendous overhead, the complexity of deploying and managing a web service. Or, blame the frequent compatibility nightmares between implementations. Reasons are not hard to find and the end result is that RESTful services are much easier to develop and use. But there is a flip side!

The simplicity of RESTful services means that one has less guidance in how to map application logic to a REST API. One of the issues is that instead of the rich programmatic types we have in programming languages, we have just the Java primitives and media types. Fortunately, JAX-RS allows us to implement whatever conversions we want between actual Java method arguments and what gets sent on the wire. The other issue is the limited set of operations that a REST service can offer. While with web services you define the operation and its semantics just as in a general-purpose programming language, with REST you're stuck with get, put, post and delete. So, free from the type mismatch nightmare, but tied to only 4 possible operations. This is not as bad as it seems if you view those operations as abstract, meta operations.

The key point when designing RESTful services, whether you are exposing existing application logic or creating new logic, is to think in terms of data resources. That's not so hard since most of what common business applications do is manipulate data. First, because every single thing is identified as a resource, one must come up with an appropriate naming scheme. Because URIs are hierarchical, it is easy to devise a nested structure like /productcategory/productname/version/partno. Second, one must decide what kinds of representations are to be supported, both in output and input. For a modern AJAX webapp, we'd mostly use JSON. I would recommend JSON over XML even in a B2B setting where servers talk to each other.

Finally, one must categorize each business operation as one of GET, PUT, POST and DELETE. This is probably a bit less intuitive, but it's just a matter of getting used to it. For example, instead of thinking about a "Checkout Shopping Cart" operation, think about POSTing a new order. Instead of thinking about a "Login User" operation, think about GETting an authentication token. In general, every business operation manipulates some data in some way. Therefore, every business operation can fit into this crude CRUD model. Clearly, most read-only operations should be a GET. However, sometimes you have to send a large chunk of data to the server, in which case you should use POST. For example, you could post some very time-consuming query that requires a lot of text to specify. Then the resource you are creating is, for example, the query result. Another way to decide whether you should POST or not is whether you have a unique resource identifier. If not, then use POST. Obviously, operations that cause some data to be removed should be a DELETE. The operations that "store" data are PUT and, again, POST. Deciding between those two is easy: use PUT whenever you are modifying an existing resource for which you have an identifier. Otherwise, use POST. 
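The guidelines above can be condensed into a rough decision helper. This is my own summary of the heuristics just described, certainly not part of any spec:

```java
// Rough rule of thumb for mapping a business operation to an HTTP verb,
// condensing the guidelines above.
public class VerbChooser {
    public static String choose(boolean readsOnly, boolean removesData,
                                boolean hasResourceId) {
        if (readsOnly && hasResourceId) return "GET";  // identifiable, side-effect-free read
        if (readsOnly) return "POST";                  // e.g. a large query payload
        if (removesData) return "DELETE";
        return hasResourceId ? "PUT" : "POST";         // update known vs. create new
    }
}
```

So "Checkout Shopping Cart" (creates data, no identifier yet) maps to POST, while "Update Profile" (stores data at a known identifier) maps to PUT.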

Implementations & Resources
There are several implementations to choose from. Since I haven't tried them all, I can't offer specific comments. Most of them used to require a servlet container. The Restlet framework by Jerome Louvel never did, and that's why I liked it. Its documentation leaves something to be desired and if you look at its code, it's over-architected to a comical degree, but then what ambitious Java framework isn't. Another newcomer that is strictly about REST and seems lightweight is Wink, an Apache incubated project. I haven't tried it, but it looks promising. And of course, one should not forget the reference implementation Jersey. Jersey has the advantage of being the most up-to-date with the spec at any given time. Originally it was dependent on Tomcat. Nowadays, it seems it can run standalone, so it's on par with Restlet, which I mentioned first because that's what I have mostly used. 

Here are some further reading resources, may their representational state be transferred to your brain and properly encoded from HTML/PDF to a compact and efficient neural net:
  1. The Wikipedia article on REST is not in very good shape, but still a starting point if you want to dig deeper into the conceptual framework. 
  2. Refcard from Dzone.com: http://refcardz.dzone.com/refcardz/rest-foundations-restful#refcard-download-social-buttons-display 
  3. Wink's User Guide seems well written. Since it's an implementation of JAX-RS, it's a good documentation of that technology.
  4. http://java.dzone.com/articles/putting-java-rest: A fairly good show-and-tell introduction to the JAX-RS API, with a link in there to a more in-depth description of REST concepts by the same author. Worth the read. 
  5.  http://jcp.org/en/jsr/detail?id=311: The official JSR 311 page. Download the specification and API Javadocs from there.
  6. http://jsr311.java.net/nonav/javadoc/index.html: Online access of JSR 311 Javadocs.
If you know of something better, something nice, please post it in a comment and I'll include in this list.

PS: I wonder if people still start new projects with Servlets or JSP/JSF these days. I would be curious as to what the rationale would be to pick those over AJAX + RESTful services communicating via JSON. As I said above, this entry is intended to help readers of the eValhalla blog series, which chronicles the development of the eValhalla project following precisely the AJAX+REST model.

Saturday, October 20, 2012

eValhalla Setup

[Previous in this series: eValhalla Kick Off, Next: eValhalla User Management]

The first step in eValhalla after the official kick off is to set up a development environment with all the selected technologies. That's the goal for this iteration. I'll quickly go through the process of gathering the needed libraries and implementing a simple login form that ties everything together.

Technologies

Here are the technologies for this project:
  1. Scala programming language - I had a dilemma. Java has a much larger user base and therefore should have been the language of choice for tutorial/promotional material on HGDB and JSON storage with it. However, this is actually a serious project meant to go live eventually, and I needed an excuse to code up something more serious with Scala. Scala has enough accumulated merits, so Scala it is. However, I will show some Java code as well, just in the form of examples equivalent to the main code.
  2. HyperGraphDB with mJson storage - that's a big part of my motivation to document this development. I think HGDB-mJson are a really neat pair and more people should use them to develop webapps. 
  3. Restlet framework - this is one of the very few implementations of JSR 311 that is sort of lightweight and has some other extras when you need them. 
  4. jQuery - That's a no brainer.
  5. AngularJS - Another risky choice, since I haven't used this before. I've used KnockoutJS and Require.js, both great frameworks and well-thought out. I've done some ad hoc customization of HTML tags, tried various template engines, AngularJS promises to give me all of those in a single library. So let's give it a shot.
Getting and Running the Code

Before we go any further, I urge you to get, build and run the code. Besides Java and Scala, I encourage you to get a Git client (Git is now supported on Windows as well), and you need the Scala Build Tool (SBT). Then, on a command console, issue the following commands:
  1. git clone https://github.com/publicvaluegroup/evalhalla.git
  2. cd evalhalla
  3. git checkout phase1
  4. sbt
  5. run
Note the 3rd step of checking out the phase1 Git tag - every blog entry is going to be a separate development phase so you can always get the state of the project at a particular blog entry. If you don't have Git, you can download an archive from:

https://github.com/publicvaluegroup/evalhalla/zipball/phase1

All of the above commands will take a while to execute the first time, especially if you don't have SBT yet. But at the end you should see something like this on your console:

[info] Running evalhalla.Start 
No config file provided, using defaults at root /home/borislav/evalhalla
checkpoint kbytes:0
checkpoint minutes:0
Oct 18, 2012 12:01:01 AM org.restlet.engine.connector.ClientConnectionHelper start
INFO: Starting the internal [HTTP/1.1] client
Oct 18, 2012 12:01:01 AM org.restlet.engine.connector.ServerConnectionHelper start
INFO: Starting the internal [HTTP/1.1] server on port 8182
Started with DB /home/borislav/evalhalla/db

and you should have a running local server accessible at http://localhost:8182. Hit that URL, type in a username and a password and hit login.

Architectural Overview

The architecture is made up of a minimal set of REST services that essentially offer user login and access-controlled data manipulation to a JavaScript client-side application. The key will be to come up with an access policy that deals gracefully with a schema-free database.

The data itself consists of JSON objects stored as a hypergraph using HGDB-mJson. From the client side we can create new objects and store them. We can then query for them or delete them. So it's a bit like the old client-server model from the 90s. HyperGraphDB supports strongly typed data, but we won't be taking advantage of that. Instead, each top-level JSON object will have a special property called entity that will contain the type of the database entity as a string. This way, when we search for all users for example, we'll be searching for all JSON objects with property entity="user".
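To make the entity-property convention concrete, here is a toy stand-in in plain Java, using maps instead of mJson objects and a list instead of HyperGraphDB. It only illustrates the idea of tagging schema-free objects with an entity string and querying by it:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy illustration of the entity-property convention: schema-free objects
// tagged with an "entity" string, queried by matching on that property.
public class EntityStore {
    private final List<Map<String, Object>> objects = new ArrayList<>();

    // Convenience factory for a two-property JSON-like object.
    public static Map<String, Object> obj(String k1, Object v1, String k2, Object v2) {
        Map<String, Object> m = new HashMap<>();
        m.put(k1, v1);
        m.put(k2, v2);
        return m;
    }

    public void add(Map<String, Object> o) { objects.add(o); }

    // Find all objects whose "entity" property equals the given type,
    // roughly what a pattern query for {"entity": "user"} does.
    public List<Map<String, Object>> byEntity(String type) {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> o : objects)
            if (type.equals(o.get("entity")))
                result.add(o);
        return result;
    }
}
```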

There are many reasons to go for REST+AJAX rather than, say, servlets. I hardly feel the need to justify it - it's stateless, you don't have to deal with dispatching, you just design an API, it's more responsive, we're in 2013 soon after all. The use of JSR 311 allows us to switch server implementations easily. It's pretty well-designed: you annotate your classes and methods with the URI paths they must be invoked for. Then a method's parameters can be bound to portions of the URI, to HTTP query parameters, to form fields etc.

I'm not sure yet what the REST services will be exactly, but the idea is to keep them very generic so the core could just be plugged into another webapp, and writing future applications could be done entirely in JavaScript.

Project Layout

The root folder contains the SBT build file build.sbt, a project folder with some plugin configurations for SBT and a src folder that contains all the code following Maven naming conventions which SBT adopts. The src/main/html and src/main/javascript folders contain the web application. When you run the server with all default options, that's where it serves up the files from. This way, you can modify them and just refresh the page.  Then src/main/scala contains our program and src/main/java some code for Java programmers to look at and copy & paste. The main point of the Java code is really to help people that want to use all those cool libraries but prefer to code in Java instead.

To load the project in Eclipse,  use SBT to generate project files for you. Here's how:
  1. cd evalhalla
  2. sbt
  3. eclipse
  4. exit
Then you'll have a .project and a .classpath file in the current directory, so you can go to your Eclipse and just do "Import Project". Make sure you've run the code before though, in order to force SBT to download all dependencies.

Code Overview

Ok, let's take a look at the code now, all under src/main. First, look at html/index.html, which gets loaded as the default page. It contains just a login form and the interesting part is the function eValhallaController($scope, $http). This function is invoked by AngularJS due to the ng-controller attribute in the body tag. It provides the data model of the HTML underneath and also a login button event handler, all through the $scope parameter. The properties are associated with HTML elements via ng-model and buttons with functions via ng-click. An independent tutorial on AngularJS, one of few since it's pretty new, can be found here.

The doLogin function posts to /rest/user/login. That's bound to the evalhalla.user.UserService.authenticate method (see the user package). The binding is done through the standard JSR 311 Java annotations, which also work in Scala. I've actually done an equivalent version of this class in Java at java/evalhalla/UserServiceJava. A REST service is essentially a class where some of the public methods represent HTTP endpoints. An instance of such a class must be completely stateless; a JSR 311 implementation is free to create fresh instances for each request. The annotations work by deconstructing an incoming request's URI into relative paths at the class level and then at the method level. So we have the @Path("/user") annotation (or @Path("/user1") for the Java version so they don't conflict). Note the @Consumes and @Produces annotations at the class level that basically say that all methods in this REST service accept JSON content and return back JSON results. Note further how the authenticate method takes a single Json parameter and returns a Json value. Now, this is mjson.Json and JSR 311 doesn't know about it, but we can tell it how to convert to/from an input/output stream. This is done in the java/evalhalla/JsonEntityProvider.java class (which I haven't ported to Scala yet). This entity provider and the REST services themselves are plugged into the framework at startup, so before examining the implementation of authenticate, let's look at the startup code.

The Start.scala file contains the main program and the JSR 311 eValhalla application implementation class. The application implementation is only required to provide all REST services as a set of classes that the JSR 311 framework introspects for annotations and for the interfaces they implement. So the entity converter mentioned above, together with both the Scala and Java version of the user service are listed there. The main program itself contains some boilerplate code to initialize the Restlet framework and asks it to serve up some files from the html and javascript folders and it also attaches the JSR 311 REST application under the 'rest' relative path.

An important line in main is evalhalla.init(). This initializes the evalhalla package object defined in scala/evalhalla/package.scala. This is where we put all application global variables and utility methods, and where we initialize the HyperGraphDB instance. Let's take a closer look. First, configuration is optionally provided as a JSON-formatted file, the only possible argument to the main program. All properties of that JSON are optional and have sensible defaults. With respect to deployment configuration, there are two important locations: the database location and the web resources location. The database location, specified with dbLocation, is by default taken to be db under the working directory from where the application is run. So, for example, if you've followed the above instructions to run the application from the SBT command prompt for the first time, you'd have a brand new HyperGraphDB instance created under EVALHALLA_HOME/db. The web resources served up (html, javascript, css, images) are configured with siteLocation, the default being src/main, so you can modify source and refresh. Here is how the database is created. You should be able to easily follow the Scala code even if you're mainly a Java programmer.

    val hgconfig = new HGConfiguration()
    hgconfig.getTypeConfiguration().addSchema(new JsonTypeSchema())
    graph = HGEnvironment.get(config.at("dbLocation").asString(), hgconfig)
    registerIndexers
    db = new HyperNodeJson(graph)

Note that we are adding a JsonTypeSchema to the configuration before opening the database. This is important for the mJson storage implementation that we are mostly going to rely on. Then we create the graph database, create indices (for now just an empty stub) and last but not least create an mJson storage view on the graph database - a HyperNodeJson instance. Please take a moment to go through the wiki page on HGDB-mJson. The graph and db variables above are global variables that we will be accessing from everywhere in our application. 

Some other points of interest here are the utility methods:

  def ok():Json = Json.`object`("ok", true);
  def ko(error:String = "Error occurred") = Json.`object`("ok", false, "error", error);

Those are global as well and offer some standard result values from REST services that the client side can rely on. Whenever everything goes well on the server, it returns an ok() object that has a boolean true ok property. If something went wrong, the ok boolean of the JSON returned by a REST call is false and the error property provides an error message. Any other relevant data, success or failure, is embedded within those ok or ko objects. 
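For Java programmers following along, the same convention could be mirrored with plain maps. This is a hypothetical equivalent for illustration, not code from the project:

```java
import java.util.HashMap;
import java.util.Map;

// A plain-Java mirror of the ok()/ko() result convention used by the services.
public class Results {
    public static Map<String, Object> ok() {
        Map<String, Object> r = new HashMap<>();
        r.put("ok", true);
        return r;
    }

    public static Map<String, Object> ko(String error) {
        Map<String, Object> r = new HashMap<>();
        r.put("ok", false);
        r.put("error", error);
        return r;
    }
}
```

A client then only ever needs to branch on the ok property and, when it is false, read the error message.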

Lastly, it is common to wrap pieces of code in transactions. After all, we are developing a database backed applications and we want to take full advantage of the ACID capabilities of HyperGraphDB. Scala makes this particularly easy because it supports closures. So we have yet another global utility method that takes a closure and runs it as a HGDB transaction:

  def transact[T](code:Unit => T) = {
    try{
    graph.getTransactionManager().transact(new java.util.concurrent.Callable[T]() {
      def call:T = {
        return code()
      }
    });
    }
    catch { case e:scala.runtime.NonLocalReturnControl[T] => e.value}
  }

This will always create a new transaction. Because BerkeleyDB JE, which is the default storage engine as of HyperGraphDB 1.2, doesn't support nested transactions, one must make sure transact is not called within another transaction. So when we want to have a transaction but would happily be embedded in some top-level one, we can call another global utility function: ensureTx, which behaves like transact except it won't create a new transaction if one is already in effect.
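The difference between transact and ensureTx can be modeled with a thread-local nesting counter. This is a toy model of the described behavior in Java, not HyperGraphDB code:

```java
import java.util.concurrent.Callable;

// Toy model of transact vs. ensureTx: transact always opens a new
// transaction, while ensureTx reuses one that is already in effect.
public class TxModel {
    private static final ThreadLocal<Integer> depth = ThreadLocal.withInitial(() -> 0);
    public static int transactionsOpened = 0;

    public static <T> T transact(Callable<T> code) throws Exception {
        transactionsOpened++;                 // a brand new transaction every time
        depth.set(depth.get() + 1);
        try { return code.call(); }
        finally { depth.set(depth.get() - 1); }
    }

    public static <T> T ensureTx(Callable<T> code) throws Exception {
        if (depth.get() > 0)                  // already inside a transaction: reuse it
            return code.call();
        return transact(code);                // otherwise open one
    }
}
```

Calling ensureTx inside a transact closure opens no second transaction, which is exactly the nesting problem it exists to avoid.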

Ok, armed with all these preliminaries, we are now able to examine the authenticate method:

    @POST
    @Path("/login")
    def authenticate(data:Json):Json = {
        return evalhalla.transact(Unit => {
            var result:Json = ok();
            var profile = db.retrieve(jobj("entity", "user", 
                        "email", data.at("email").asString().toLowerCase()))
            if (profile == null) { // automatically register user for now         
              val newuser = data.`with`(jobj("entity", "user"));
              db.addTopLevel(newuser);
            }
            else if (!profile.is("password", data.at("password")))
                result = ko("Invalid user or password mismatch.");
            result;   
        });
    }

The @POST annotation means that this endpoint will be matched only for an HTTP post method. First we do a lookup for the user profile. We do this by pattern matching. We create a Json object that first identifies that we are looking for an object of type user by setting the entity property to "user". Then we provide another property, the user's email, which we know is supposed to be unique so we can treat it as a primary key. However, note that neither HyperGraphDB nor its Json storage implementation provides a notion of a primary key other than HyperGraphDB atom handles. The HyperNodeJson.retrieve method returns only the first object matching the pattern. If you want an array of all objects matching the pattern, use HyperNodeJson.getAll.

Note the jobj method call in there: this is a rename, in the import section, of the Json.object factory method. It is necessary because object is a keyword in Scala. Besides an import rename, another way to use a keyword as a method name in Scala is to wrap it in backquotes, as is done with data.`with` above, which is basically an append operation, merging the properties of one Json object into another. The db.addTopLevel call is explained in the HGDB-mJson wiki. Also, you may want to refer to the mJson API Javadoc.

One last point about the structure of the authenticate method: there are no local returns. The result variable contains the result and it is written as the last expression of the function and therefore returned as the result. I actually like local returns (i.e. return statements in the middle of the method following if conditions or within loops or whatever), but the way Scala implements them is by throwing a RuntimeException. However, this exception gets caught inside the HyperGraphDB transaction, which has a catch-all clause and treats exceptions as true error conditions rather than some control flow mechanism. This can be fixed inside HyperGraphDB, but avoiding local returns is not such a bad coding practice anyway.

Final Comments

Scala is new to me so take my Scala coding style with a grain of salt. Same goes with AngularJS. I use Eclipse and the Scala IDE plugin from update site  http://download.scala-ide.org/releases-29/stable/site (http://scala-ide.org/download/nightly.html#scala_ide_helium_nightly_for_eclipse_42_juno for Eclipse 4.2). Some of the initial Scala code is translated from equivalent Java code from other projects. If you haven't worked with Scala, I would recommend giving it a try. Especially if, like me, you came to Java from a richer industrial language like C++ and had to give up a lot of expressiveness.

I'll resist the temptation to make this into an in-depth tutorial of everything used to create the project. I'll say more about whatever felt not that obvious and give pointers, but mostly I'm assuming that the reader is browsing the relevant docs alongside reading the code presented here. This blog is mainly a guide.

In the next phase, we'll do the proverbial user registration and put in place some security mechanisms.

Tuesday, October 9, 2012

eValhalla Kick Off

[Next in this series eValhalla Setup]

As promised in this previous blog post, I will now write a tutorial on using the mJson-HGDB backend for implementing REST APIs and web apps. This will be more extensive than originally hinted at. I got involved in a side project to implement a web site where people can post information about failed government projects. This should be pretty straightforward, so a perfect candidate for a tutorial. I decided to use that opportunity to document the whole development in the form of a blog series, so this will include potentially useful information for people not familiar with some of the web 2.0 libraries such as jQuery and AngularJS which I plan to use. All the code will be available on GitHub. I am actually a Mercurial user; something turned me off from Git when I looked at it before (perhaps just its obnoxious author), but I decided to use this little project as an opportunity to pick up a few new technologies. The others will be Scala (instead of Java) and AngularJS (instead of Knockoutjs).


About eValhalla

Valhalla - the hall of Odin into which the souls of heroes slain in battle and others who have died bravely are received.

The aim is to provide a forum for people involved in (mainly software) projects within government agencies to report anonymously on those projects' failures. Anybody can create a login, without providing much personal information and be guaranteed that whatever information is provided remains confidential if they so choose. Then they can describe projects they have insider information about and that can be of interest to the general public. Those projects could be anything from small scale, internal-use only, local government, to larger-scale publicly visible nation-level government projects.

I won't go into a "mission statement" type of description here. You can see it as a "wiki leaks" type transparency effort, except we'd be dealing with information that is in the public domain, but that so far hasn't had an appropriate outlet. Or you can see it as a fun place to let people vent their frustrations about mis-management, abuses, bad decisions, incompetence etc. Or you can see it as a means to learn from experience in one particular type of software organization: government IT departments. And those are a unique breed. What's unique? Well, the hope is that such an online outlet will make that apparent.

Requirements Laundry List

Here is the list of requirements, verbatim as sent to me by the project initiator:
  • enter project title
  • enter project description
  • enter project narrative
  • enter location
  • tag with failure types
  • tag with subject area/industry sector
  • tag with technologies
  • enter contact info
  • enter project size
  • enter project time frame (year)
  • enter project budget
  • enter outcome (predefined)
  • add lessons learned
  • add pic to project
  • ability to comment on project
  • my projects dashboard (ability to add, edit, delete)
  • projects can be saved as draft and made public later
  • option to be anonymous when adding specific projects
  • ability to create profile (username, userpic, email, organization, location)
  • ability to edit and delete profile
  • administrator ability to feature projects on main page
  • search for projects based on above criteria and tags
  • administrator ability to review projects prior to them being published
Nevermind that the initiator in question is currently pursuing a Ph.D. in requirements engineering. Those are a good enough start. We'll have to implement classic user management and then we have our core domain entity: a failed project, essentially a story decorated with various properties and tags, and commented on. As usual, we'd expect requirements to change as development progresses: new ideas will pop up, old ideas will die and force refactorings, etc. Nevermind; I will maintain a log of those development activities and, if you are following, do not expect anything described to be set in stone.

Architecture

Naturally, the back-end database will be HyperGraphDB with its support of plain JSON-as-a-graph storage as described here. We will also make use of two popular JavaScript libraries: jQuery and AngularJS as well as whatever related plugins come in handy.

Most of the application logic will reside on the client-side. In fact, we will be implementing as much business logic in JavaScript as security concerns allow us. The server will consist entirely of REST services based on the JSR 311 standard so they can be easily run on any of the server software supporting that standard. To make this a more or less self-contained tutorial, I will be describing most of those technologies and standards along the way, at least to the extent that they are being used in our implementation. That is, I will describe as much as needed to help you understand the code. However, you need familiarity with Java and JavaScript  in order to follow.

The data model will be schema-less, yet our JSON will be structured. We will document whenever certain properties are expected to be present, and we will follow conventions that help us navigate the data model more easily.

Final Words

So stay tuned. The plan is to start by setting up a development environment, then implement the user management part, then submission of new projects (my project dashboard), online interactions, search, administrative features to manage users, the home page etc.