Landed me a modelling gig…

Continuing on from my Previous Posts, I’ve started a personal project to have a play with some things and ideas that I find interesting at the moment. These posts outline the decisions and challenges I stumble across along the way. It’s mostly just me thinking out loud but hopefully somebody will find my rambling interesting or useful…

After much hemming and hawing, tooing-and-froing over which DBMS to use and what overall approach to take to database interaction I have settled on using Oracle. In order to keep a clean separation between the database layer and the application, all communication will be through stored procedures – no direct table access. Hopefully this should minimise the dependencies on the data model itself, making refactoring much simpler should the need arise.

There is a fairly immediate downside to this approach, however. Since I won’t be accessing the tables directly I can’t really utilise Hibernate (or any ORM for that matter). Hibernate makes data retrieval and persistence ridiculously easy and mostly transparent to the developer. This can be a massive timesaver during the development phase but I tend to lean towards static code generation. Code generation gives similar benefits in terms of a reduction in dev effort – especially given the monotonous nature of wiring up your DAOs to your data model. The reasons I prefer code generation are number one, it’s faster at runtime – there’s no intermediate layers, no extra marshaling. Secondly, it’s easier to read/debug and finally, it’s easier to tweak by hand if needs be. If your model changes you can always regenerate anyway so I’m not too pushed on the need for a live mapping at runtime.

The first thing I did, after my eureka database selection moment was to hunt down a good tool to start putting together my data model and generate the DDL. Instantly I started to have flashbacks of Oracle’s delightful Designer and the hardship and misery that usually followed taking the fateful  decision to click that seemingly innocent little icon. I may be placing a tad too much emotion and just a little superstition on what, after all, is just an inanimate piece of software. Maybe, but sweet merciful Jesus, the misery and heartache that bloody application caused me.

It’s been a good 2.5 years since I’ve had to even think about designing a database in Oracle so I’m glad to say that the situation is much improved. Oracle now offer a product called “Oracle SQL Developer Data Modeler” which so far, seems pretty solid. I’ve also used MySQL Workbench over the last few years and feel it deserves a mention as it too is a great little bit of kit.

Anyway, (late) last night, with my tools at the ready, I dived into creating my data model. It started off with easy questions, do I use the singular or plural form for my table names? Do I prefix my objects? Do I prefix my column names? After that the questions started to get a bit more interesting. Since I’m aiming for this to be a hosted solution, it’s going to have to deal with multiple users’ datasets. What’s the best way to segregate everyone’s data? I have a table in my model which lists the different “instances” of the application and some metadata around that instance, such as URL, status and what not. It’s fairly easy then to associate other records with an instance by having a foreign key on all the main entities – users, projects, customers to tie them back to a particular instance. There is a risk with this approach. If I make a mess of one of my queries and forget to filter to the current instance’s data only I could expose one customer’s data to another and I don’t think they’d be best pleased with “sorry it was 2am when I wrote that code” as an excuse. Another more laborious approach would be to physically separate each instance’s data and have a DB schema per instance. This would reduce the chance of a coding error leaking one person’s data to another. The maintenance of this design would be a great deal more complicated. Another problem with this approach is the permissions needed to create each schema. Ideally the schema setup would be automated, just a task that gets carried out in the database when saving a new instance’s details to the database. Suddenly you need your database user to have much greater privileges which creates more risk of divilment in the event the application/server/database/random staff member is compromised.

My main reason for starting this project was to play with new technologies and new approaches to application design but what I’ve ended up with is a boring old, seen it a million times before, not sexy, bog standard relational database in an old school, been around the block RDBMS – but it’s the right tool for the job.

My next step is to look into the architecture of the client. I’ve decided the client will be a HTML5 + JavaScript app that consumes JSON data from a REST API. There are a number of JavaScript frameworks out there aimed at developing rich apps in this fashion such as extJS, JavaScriptMVC and even frameworks like Cappuccino (a JavaScript implementation of Apple’s Cocoa frameworks). Now I just need to pick one…

Posted in Personal Project | Leave a comment

Decisions, decisions…

Continuing on from my Previous Post, I’ve started a personal project to have a play with some things and ideas that I find interesting at the moment. These posts outline the decisions and challenges I stumble across along the way. It’s mostly just me thinking out loud but hopefully somebody will find my rambling interesting or useful…

It seems like for every decision I make, I end up with 2 new decisions as a result.

Here’s a run down of the stuff I’ve decided on so far..

First and foremost, what the hell am I actually building? I’ve decided to build a hosted time tracking, invoicing and accounting system targeted at services companies such as solicitors and accountants. Anyone who knows me will have already burst out in fits of uncontrollable laughter at this point. It’s a long story…

After some very nerdy discussion I’ve settled on the following setup:

  • A Data Access Layer written in spring to communicate with the database (more on that later)
  • A spring service layer. This is where the bulk of the application logic will lie
  • A spring-ws layer to expose the service layer operations as a SOAP webservice
  • A Groovy + Grails project to expose a RESTful JSON API with the same functionality as the SOAP API (It might be easier to combine the 2 endpoints in one project but that’s for another day).
  • A HTML5 + JavaScript based client to consume the JSON API. I had the opportunity to work on a large project that had a similar setup with a rich JS client communicating with JSON endpoints exposed from a PHP application. I was skeptical at first, mostly due to having all the presentation HTML in templates within the JavaScript but it I was impressed with how well it worked. One of the big pluses to this architecture is that with a “dumb” client (which is the way it should be – the heavy lifting is in the service layer) is how easy it is to integrate other clients. This will make it easy to integrate with other systems as well as offering up different flavour clients such as mobile or an Outlook plugin.

 

The setup outlined will suit all the CRUD stuff quite nicely, creating users and projects, entering timesheets, and viewing basic entities. It starts to get a bit flaky around some of the more involved features such a invoicing and revenue calculations. These features will depend on pulling data from a number of different places and changes to any of the underlying data will of course change the end figures. Calculating this information on the fly becomes unsustainable  very quickly, particularly as the number of timesheet entries or users grows.

To get around this I’m pulling all the heavy lifting out of the page request lifecycle and making the operations asynchronous. The triggering event (adding a timesheet entry, changing a project’s details) will remain synchronous but the rest of the operation won’t. My plan is to setup a separate java service for each discrete area, one for revenue, one for invoicing etc. These services will subscribe to events on a message queue, notifying them of upstream changes that they’re interested in. The invoicing service for example will listen for “Timesheet Entry Created” events and update the list of projects that will require invoicing at the end of the month. The last step of creating a timehseet entry will be to add the event notification to the message queue.

This approach has a number of benefits. First of all the user operation is nice and quick – the synchronous call just adds a simple record to the database. The user doesn’t have to wait around for calculations to complete that were triggered by their action. The same is also true at the other end. As the list of work to invoice is kept up to date with each change we don’t need a big query to figure out what needs invoicing when a user happens across that screen at the end of the month. From an architecture point of view it’s also quite neat as any number of services can subscribe to the various events without the producer having any knowledge of their existence.

This approach allows us to create a number of expert systems, each only concerned with their own tasks, oblivious to where their input data comes from or how it’s generated. Of course we could achieve similar results with scheduled jobs but processing the changes as they occur rather than in bulk at set intervals will result in a more responsive system with closer to realtime figures available to the UI. Anyway, an asynchronous architecture is just more interesting than some cronjobs will ever be!

 

Now that we’ve figured out how the system hangs together, we need somewhere to put all that data. Given the nature of the data generated from a timesheet/accounting system, I think a traditional RDBMS will suit the project better than a key-value data store. Getting the data into the system is one thing but the strength of an information system of this kind to any  company is the data they can extract from it – timesheet data for different departments and projects over different periods of time, revenue figures for the same. This is where I leave myself open to correction/criticism/flaming but I feel this kind of data extraction is best suited by SQL and an RDBMS. In theory the various services could compute summary data for various time periods and projects and put them into something like MongoDB but it just doesn’t seem like a good fit.

 

So SQL it is, but which RDBMS. Realistically I’ve only considered three options here, MySQL, Oracle and PostgreSQL. I don’t really know all that much about PostgreSQL to be honest so I haven’t really given it as much thought as I probably should. MySQL has open source on its side – it’s free. It’s also fairly lightweight and reliable. Oracle on the other hand is sell-your-first-born-expensive. They do offer Oracle XE which is free but limited to a single core of a CPU and 11gb of data storage. Oracle offers very strong analytical constructs within their SQL engine and a very mature stored procedures platform in PL/SQL. MySQL’s support for sprocs is more recent and not as fully featured as Oracle’s.

 

The choice of database platform is made more complicated by another decision I’ve yet to make – use ORM or stored procedures to access the database. I’m quite fond of using stored procedures – I like the neat separation of concerns. The database logic stays in the database with the application having no concern for the underlying data model. If I use sprocs exclusively the user the application users to connect to the database doesn’t need SELECT or UPDATE privileges on any tables – it just needs EXECUTE on the sprocs. This adds an extra layer of protection against SQL injection and other security concerns. I’m leaning towards exposing all query results as refcursors from sproc calls but this approach results in more coding than straight SQL queries from the application. If I chose not to use sprocs then I could leverage the benefits of Hibernate to map my objects to the database model, significantly reducing the code I have to write. This would work brilliantly for simple CRUD but I’m not sure how well it will work for the calculations required, such as revenue etc. I think for these I’ll end up falling back to making sproc calls for the calculations and I don’t really like the idea of mixing and matching different approaches. The use of Hibernate is mainly a development time concern which isn’t really at the top of my list. I could probably write some code generators easily enough that would create my DAOs, mapping my sproc results to my domain objects. I like this approach as it’s run once rather than a runtime configuration. Hibernate does support mapping to sproc calls, just as long as the sproc only has a single OUT parameter, and that the OUT parameter is a refcursor (not a problem) and that it’s also the first parameter. If I’m going with Hibernate then it makes more sense to use MySQL. if I’m not using Hibernate and go down the sproc route then it makes more sense to use Oracle as I can utilise the more advanced analytical features and PL/SQL. I’ve a systems architect friend who has been singing Hibernate’s praises but I’m stubborn so…

What say ye, Hibernate + MySQL or sprocs + Oracle?

I think I’ll give it some more thought and decide later…

 

Posted in Personal Project | Leave a comment

A New Project to Play With

So I’ve decided to start a new personal project to allow me to play around with a few things I find interesting. I also figured I’d catch up with 1999 and blog about it as I go – mainly to see if I can glean any extra suggestions from others or just to be told why my architecture is a steaming pile-o-crap.

I haven’t quite decided what I’m building yet – I have a few ideas floating around but indecision is rife at the moment. It’ll most likely be some sort of SaaS business-y application. Probably doesn’t make for the most interesting subject matter in the world but meh. I’d like to build an event driven system anyway, I think I’m settled on that much. That’ll give me a chance to have a play around with some nice message queue and asynchronous stuff so that should be interesting. I’ve always been a traditional SQL database kinda guy so maybe a look at some other storage solutions would be interesting too. That doesn’t necessarily mean NoSQL but I had a serious look at MongoDB for a project last year and was really impressed. PHP has been the main staple of my diet for way too long now so I think I’ll avoid it for this project. I quite like the idea of using Groovy + Grails for driving the front-end and building any heavy lifting stuff as loosely coupled services written in Spring – maybe tied together with RabbitMQ or something similar. Another part of me wants to make the whole thing service based – including the front-end and just expose the data to a HTML5 + JS application using Node JS. This would also fit quite nicely with the event driven stuff I want to do on the backend.

Posted in Personal Project | Leave a comment