Friday, January 1, 2010

Parsing the Google Wiki Format

Since we moved most of our projects to the Google Code servers and started using Google's wiki for documentation, I've wanted to use that wiki as a publishing tool for anything related to a specific project. That saves us from installing, coding up, and administering on our own server something that has been done a gazillion times and is already covered by blogs and wikis. Google offers an API to access the blogs, so embedding a blog on the website is easy, and there are enough examples on the internet of how to do that. The easiest approach is to copy and modify the examples from the Google Code Playground.

However, I couldn't find an API for the wikis, nor any third-party code around the internet. Google's wikis are a bit lousy, admittedly, but they are good enough for most purposes, so I decided to write a parser that translates a wiki page to HTML, and now we can display docs on our website. Here is that code, in the hope that it could benefit someone else.

First, Google stores the wiki pages of a project in that project's SVN repository. The URL to the latest version of a wiki page looks like this:

"http://" + project + ".googlecode.com/svn/wiki/" + page + ".wiki";

Here project is the name of your project at Google and page is the name of your page, so a plain wiki file can be downloaded with a standard HTTP client.
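To make the download step concrete, here is a minimal sketch of building that URL and reading the raw wiki text. It is not taken from GoogleWikiViewer: the class name WikiFetchSketch, the method names, and the "hypergraphdb" / "IntroPage" values are made up for illustration, and it assumes the .wiki file is UTF-8 encoded.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class WikiFetchSketch {
    // Builds the SVN URL of the latest version of a wiki page,
    // using the same concatenation shown above.
    static String wikiUrl(String project, String page) {
        return "http://" + project + ".googlecode.com/svn/wiki/" + page + ".wiki";
    }

    // Downloads the raw wiki text with the JDK's built-in HTTP support.
    // Assumption: the file is UTF-8 encoded.
    static String fetch(String project, String page) throws IOException {
        URL url = new URL(wikiUrl(project, page));
        StringBuilder text = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                text.append(line).append('\n');
            }
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // Hypothetical project and page names; only the URL is printed here,
        // nothing is downloaded.
        System.out.println(wikiUrl("hypergraphdb", "IntroPage"));
    }
}
```

Calling fetch(project, page) returns the page source as a single string, which is what a parser would then walk through line by line.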

The parser, a class called GoogleWikiViewer, downloads a .wiki file from SVN, converts it to HTML and sends it back to the browser from a simple Java servlet that looks like this:

protected void doPost(HttpServletRequest request,
                      HttpServletResponse response)
        throws ServletException, java.io.IOException
{
    // ... your HTML preamble here, e.g. response.getWriter().println("");
    String project = request.getParameter("project");
    String wikipage = request.getParameter("page");
    // Render the page only when both parameters are present and non-empty.
    if (!(project == null || project.length() == 0
          || wikipage == null || wikipage.length() == 0))
    {
        // "wikishow" is the name of this servlet, used when building links.
        GoogleWikiViewer viewer = new GoogleWikiViewer(project, wikipage, "wikishow");
        viewer.toHtml(response.getWriter());
    }
    // ... finish HTML document output
}


In the above, the 'wikishow' argument is the name of the servlet that contains this code snippet. GoogleWikiViewer uses it to construct URLs of the form "wikishow?project=theproject&page=thepage", so that links between wiki pages come back to the same servlet. I know the trend these days is to write this stuff in JavaScript, but JavaScript takes longer to debug, and in any case, if there's interest, this could easily be ported to JavaScript and made into an AJAX library. For now, it's just a simple, standalone Java class:

Download GoogleWikiViewer.java.
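To illustrate the "wikishow?project=theproject&page=thepage" link form mentioned above, here is a rough sketch of how a bracketed wiki link such as [PageName some label] could be rewritten to point back at the servlet. This is not the code in GoogleWikiViewer.java: the class WikiLinkSketch, its methods, and the regular expression are assumptions made for the example. It only handles links to other wiki pages, leaves bracketed external URLs untouched, and does not HTML-escape the label.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WikiLinkSketch {
    // Matches "[PageName]" or "[PageName some label]". A bracketed URL such as
    // "[http://example.com label]" does not match, because ':' is not a word character.
    private static final Pattern LINK =
            Pattern.compile("\\[(\\w+)(?:\\s+([^\\]]+))?\\]");

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on a Java platform.
            throw new IllegalStateException(e);
        }
    }

    // Builds a URL of the form "servlet?project=theproject&page=thepage".
    static String viewerUrl(String servlet, String project, String page) {
        return servlet + "?project=" + encode(project) + "&page=" + encode(page);
    }

    // Replaces each wiki page link in one line of text with an HTML anchor.
    static String rewriteLinks(String line, String servlet, String project) {
        Matcher m = LINK.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String page = m.group(1);
            String label = m.group(2) != null ? m.group(2) : page;
            // '&' is written as "&amp;" because the URL sits inside an HTML attribute.
            String href = viewerUrl(servlet, project, page).replace("&", "&amp;");
            String anchor = "<a href=\"" + href + "\">" + label + "</a>";
            m.appendReplacement(out, Matcher.quoteReplacement(anchor));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(
                rewriteLinks("See [GettingStarted the guide].", "wikishow", "theproject"));
    }
}
```

Passing the servlet name in as a parameter, as the GoogleWikiViewer constructor does, keeps the class independent of wherever the servlet happens to be mapped.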

Please comment with bug reports/fixes. Bear in mind that this was written in a couple of hours, so it doesn't cover everything (e.g. comments and gadgets are not supported) and it probably has a bug here and there. But the HyperGraphDB wiki pages get displayed correctly, which suggests it covers the commonly used features.

2 comments:

  1. Thanks, could be very handy!

    What licence have you got on this? I'd like to adapt it to convert from Google .wiki to one of GitHub's wiki formats, maybe markdown.

  2. Pick your license...:)

    Bear in mind that it is really a quick & dirty thing, probably with bugs and certainly incomplete. Not sure if it will be easily adaptable. If you do find/fix bugs, it would be great to send back your new version!
