Monday, November 19, 2012

eValhalla User Management

[Previous in this series: eValhalla Setup]

In this installment of the eValhalla development chronicle, we will be tackling what's probably the one common feature of most web-based application - user management. We will implement:
  1. User login
  2. A "Remember me" checkbox for auto-login of users on the same computer
  3. Registration with email validation
The end result can be easily taken and plugged in your next web app! Possible future improvements would be adding captchas during registration and/or login and the ability to edit more extensive profile information.

I have put comments wherever I felt appropriate in the code, which should be fairly straightforward anyway. So I won't be walking you line by line. Rather, I will explain what it does and comment on design decisions or on less obvious parts.   First, let me say a few words about how user sessions are managed.

User Sessions

Since we are relying on a RESTful architecture, we can't have a server hold user sessions. We need to store user session information at the client and transfer it with each request. Well, the whole concept of a user session is kind of irrelevant here since the application logic is mostly residing at the client and the server is mainly consulted as a database. Still, the data needs protection and we need the notion of a user with certain access rights. So we have to solve the problem of authentication. Each request needs to carry enough information for the server to decide whether the request should be honored or not. We cannot just rely on the user id because this way anybody can send a request with anybody else's id. To authenticate the client, the server will first supply it with a special secret key, an authentication token,  that the client must send on each request along with the user id. To obtain that authentication token, the client must however identify themselves by using a password. And that's the purpose of the login operation: obtaining an authentication token for use in subsequent requests. The client will then keep that token together with the user id and submit them as HTTP headers on every request. The natural way to do that with JavaScript is storing the user id and authentication tokens as cookies.

This authentication mechanism is commonly used when working within a RESTful architecture. For more on the subtleties of that approach, just google "user authentication and REST applications". One question is why not just send the password with every request instead of a separate token. That's possible, but more risky - a token is generated randomly and it expires, hence it is harder to guess. The big issue however is XSS (cross-site scripting) attacks. In brief, with XSS an attacker insert HTML code into a field that gets displayed supposedly as just static text to other users (e.g. a blog title) and the code simply does an HTTP request to a malicious server submitting all the users' private cookies with it. To avoid them, we will have to pay special attention on HTML sanitation. That is, we have to properly escape every HTML tag displayed as static text. We can also put that authentication token in an HTTPOnly cookie for extra security.

Implementation - a User Module

Since user management is so common, I made a small effort to build a relatively self-contained module for it. There are no standard packaging mechanisms for the particular technology stack we're using, so you'd just have to copy&paste a few files:

  • /html/ahguser.ht - contains the HTML code for login and registration dialogs as well as top-level Login and Register links that show up right aligned. This depends on the whole Angular+AngularUI+Bootstrap+jQuery environment. 
  • /javascript/ahguser.js - contains the Angular module 'ahgUser' that you can use in your Angular application. This depends on the backend:
  • /scala/evalhalla/user/package.scala - the evalhalla mJson-HGDB backend REST user service. 

The backend can be easily packaged in a jar, mavenized and all, and this is something that I might do near the end of the project. 

The registration process validates the user's email by emailing them a randomly generated UUID (a HyperGraphDB handle) and requiring that they provide it back for validation before they can legitimately log into the site. Since the backend code is simple enough, repetitive even, let's look at one method only, the register REST endpoint:

@POST
@Path("/register")
def register(data:Json):Json = {
  return transact(Unit => {
    normalize(data);
    // Check if we already have that user registered
    var profile = db.retrieve(
        jobj("entity", "user", 
             "email", data.at("email")))
    if (profile != null)
      return ko().set("error", "duplicate")
    // Email validation token
    var token = graph.getHandleFactory()
          .makeHandle().toString()
    db.addTopLevel(data.set("entity", "user")
                       .set("validationToken", token)
                       .delAt("passwordrepeat"))
    evalhalla.email(data.at("email").asString(), 
      "Welcome to eValhalla - Registration Confirmation",
      "Please validate registration with " + token)
    return ok
  })
}

So we see our friends db, transact, jobj etc. from last time. The whole thing is a Scala transaction closure with the much nicer than java.lang.Callable Scala syntax. As a reminder, note that most of the functions and objects referred here are globally declared in the evalhalla package. For example, db is an instance of the HyperNodeJson class. The call to normalize just ensures the email is lower-case, because email in general are case-insensitive. While the logic is fairly straightforward, let me make a few observations about the APIs.

User profile lookup is done with a Json pattern matching.  Recall that the database stores arbitrary Json structures (primitives, arrays, objects and nulls) as HyperGraphDB atoms. It doesn't have the notion of "document collections" like MongoDB for example. This is because each Json structure is just a portion of the whole graph. So to distinguish between different types of objects we define a special JSON property called entity that we reserve as a type name attached to all of our top-level Json entities. Here we are dealing with entities of type "user". Now, each atom in HyperGraphDB and consequently each user profile has a unique identifier - the HyperGraphDB handle (a UUID). There is no notion of a primary key enforced at the database level. We know that an email should uniquely identify a user profile so we perform the lookup with the Json object {entity:"user", email:<the email>} as a pattern. But this uniqueness is enforced at the application level because new profiles are added only precisely via this register method. I've explained the db.addTopLevel method on the  HGDB-mJson wiki page, but here is a summary for the impatient: while db.add will create a duplicate version of the whole Json structure recursively and while db.assertAtom will only create something if it doesn't exist yet, db.addTopLevel will create a new database atom only for the top-level JSON, but perform an assert operation for each of its components.

Finally, note that before adding the profile to the database, we delete the passwordrepeat property. This is because we are storing in the database whatever JSON object the client gives us. And the client is giving us all fields coming from the HTML registration form, pretty much as the form is defined. So we get rid of that unneeded field. Granted, it would be actually better design to remove that property at the client-side since the repeat password validation is done there. But I wanted to illustrate the fact that the JSON data is really flowing as is from the HTML form directly to the database, with no mappings or translations of any sort needed. 

So far, so good. Let's move on the UI side.

Client-Side Application

In addition to AngularJS, I've incorporated the Bootstrap front-end library by Twitter. It looks good and it has some handy components. I've also included the Angular-UI library which has some extras like form validation and it plays well with Bootstrap.

The whole application resides on one top-level HTML page /html/index.html. The main JavaScript program that drives it resides in the /javascript/app.js file. The idea is to dynamically include fragments of HTML code inside a reserved main area while some portions of the page such as the header with user profile functionality remain stable. By convention, I use the extension '.ht' for such HTML fragments instead of '.html' to indicate that the content is not a complete HTML page. Here's the main page in its essence:


<div ng-controller="MainController">
<div ng-include="ng-include" src="'/ahguser.ht'"></div>
<!-- some extra text here... -->
<hr/>
<a href="#m">main</a> |
<a href="#t">test</a>
<hr/>
<ng-view/>  <!-- this is where HTML fragments get included -->

The MainController function is in app.js and it will eventually establish some top-level scoped variables and functions, but it's currently empty. The user module described above is included as a UI component by the <div ng-include...> tag. The login and register links that you see at the top right of the screen with all their functionality come from that component. Finally, just for the fun of it, I made a small menu with two links to switch the main view from one page fragment to another: 'main' and 'test'. This is just to setup a blueprint for future content.

A Note on AngularJS

Let me explain a bit the Angular portion of that code since this framework, as pretentious as it is, lacks in documentation and it's far from being an intuitive API. In fact, I wouldn't recommend yet. My use of it is experimental and the first impressions are a bit ambivalent. Yes, eventually you get stuff to work. However, it's bloated and it fails the first test of any over-reaching framework: simple things are not easy. It seems built by a team of young people that use the word awesome a lot, so we'll see if it meets the second test, that complicated things should be possible. Over the years, I've managed to stay away from other horrific, bloated frameworks, aggressively marketed by a big companies (e.g. EJBs), but here I may be just a bit too pessimistic. Ok, enough.

Angular reads and interprets your HTML before it gives it to the browser. That opens many doors. In particular, it allows you to define custom attributes and tags to implement some dynamic behaviors. It has the notion of an application and the notion of modules. An application is associated with a top-level HTML tag, usually the 'html' tag itself by setting the custom ng-app='appname' attribute. Then appname is declared as a module in JavaScript:

var app = angular.module('appname', [array of module dependencies])

It's not clear what's special about an application vs. mere modules, presumably nothing. Then functionality is attached to markup (the "view") via Angular controllers. Those are JavaScript functions that you write and Angular calls to setup the model bound to the view. Controller functions take any number of parameters and Angular uses a nifty trick here. When you call the toString method of a JavaScript function, it returns its full text as it was originally parsed. That includes the formal arguments, exactly with the names you have listed in the function declaration (unless you've used some sort of minimization/obfuscation tool). So Angular parses that argument list and uses the names of the parameters to determine what you need in your function. For example, when you declare a controller like this:

function MainController($scope, $http) {
}
A call to MainController.toString() returns the string "function MainController($scope, $http) { }". Angular parses that string and determines that you want "$scope" and "$http". It recognizes those names and passes in the appropriate arguments for them. The name "$scope" is a predefined AngularJS name that refers to an object to be populated with the application model. Properties of that object can be bound to form elements, or displayed in a table or whatever. The name "$http" refers to an Angular service that allows you to make AJAX calls. As far as I understand it, any global name registered as a service with Angular can be used in a controller parameter list. There's a dependency injection mechanism that takes care of, among other things, hooking up services in controllers by matching parameter names with globally registered functions. I still haven't figured out what the practical benefit of that is, as opposed to having global JavaScript objects yourself....perhaps in really large applications where different clusters of the overall module dependency graph use the same names for different things.

Beginnings of a Data Service

One of the design goals in this project is to minimize the amount of code handling CRUD data operations. After all, CRUD is pretty standard and we are working with a schema-less flexible database. So we should be able to do CRUD on any kind of structured object we want without having to predefine its structure. In all honesty, I'm not sure this will be as easy as it sounds. As mentioned before, the main difficulty are security access rules. It's certainly doable, but we shall see what kind of complexities it leads to in the subsequent iterations. For now, I've created a small class called DataService that allows you to perform a simple JSON pattern lookup as well as all CRUD operations on any entity identified by its HyperGraphDB handle.

One can experiment with this interface by making $.ajax calls in the browser's REPL. The interface is certainly going to evolve and change, but here are a couple of calls you can try out. I've made the '$http' service available as a global variable:

$http.post("/rest/data/entity", {entity:'story', content:'My story starts with....'})
$http.get("/rest/data/list?pattern=" + JSON.stringify({entity:'story'})).success(function(A) { console.log(A); })
The above creates a new entity in the DB and then retrieves it via a query for all entities of that type (i.e. entity:'story'). You can also play around with $http.put and $http.delete.

Conclusion

All right, this concludes the 2nd iteration of eValhalla. We've implemented a big part of what we'd need for user management. We've explored AngularJS as a viable UI framework and we've laid the groundwork of a data-centered REST service. To get it, follow the same steps as before, but checkout the phase2 GIT tag instead of phase1:


  1. git clone https://github.com/publicvaluegroup/evalhalla.git
  2. cd evalhalla
  3. git checkout phase2
  4. sbt
  5. run


Coming Up...

On the next iteration, we will do some data modeling and define the main entities of our application domain. We will also implement a portion of the UI dealing with submission and listing of stories. 

Sunday, November 4, 2012

HyperGraphDB 1.2 Final Released

Kobrix Software is pleased to announce the release of HyperGraphDB 1.2 Final.

Several bugs were found and corrected during the beta testing period, most notably having to do with indexing.

Go directly to the download page.

HyperGraphDB is a general purpose, free open-source data storage mechanism. Geared toward modern applications with complex and evolving domain models, it is suitable for semantic web, artificial intelligence, social networking or regular object-oriented business applications.
This release contains numerous bug fixes and improvements over the previous 1.1 release. A fairly complete list of changes can be found at the Changes for HyperGraphDB, Release 1.2 wiki page.
  1. Introduction of a new HyperNode interface together with several implementations, including subgraphs and access to remote database peers. The ideas behind are documented in the blog post HyperNodes Are Contexts.
  2. Introduction of a new interface HGTypeSchema and generalized mappings between arbitrary URIs and HyperGraphDB types.
  3. Implementation of storage based on the BerkeleyDB Java Edition (many thanks to Alain Picard and Sebastian Graf!). This version of BerkeleyDB doesn't require native libraries, which makes it easier to deploy and, in addition, performs better for smaller datasets (under 2-3 million atoms).
  4. Implementation of parametarized pre-compiled queries for improved query performance. This is documented in the Variables in HyperGraphDB Queries blog post.
HyperGraphDB is a Java based product built on top of the Berkeley DB storage library.

Key Features of HyperGraphDB include:
  • Powerful data modeling and knowledge representation.
  • Graph-oriented storage.
  • N-ary, higher order relationships (edges) between graph nodes.
  • Graph traversals and relational-style queries.
  • Customizable indexing.
  • Customizable storage management.
  • Extensible, dynamic DB schema through custom typing.
  • Out of the box Java OO database.
  • Fully transactional and multi-threaded, MVCC/STM.
  • P2P framework for data distribution.
In addition, the project includes several practical domain specific components for semantic web, reasoning and natural language processing. For more information, documentation and downloads, please visit the HyperGraphDB Home Page.

Many thanks to all who supported the project and actively participated in testing and development!