A first look at CouchDB


Recently I started working with Apache CouchDB. It was a "pain-driven" decision and not a rational one in any way.

I had to move quickly. The project is an investment, and I had already failed at it once.

Here are the thoughts that led me to CouchDB.


I love having my schema in Git

I attempted the original version of the project with PostgreSQL. Postgres is amazing, and I love it. I believe relational databases are a great way to structure your data.

Even with NoSQL, you have to structure your data. In the case of CouchDB, you use JSON. This means your structure does not come from your database engine; it comes from your application layer.

At least, this is the case for me. I can imagine only a few document types that are truly "unstructured". Maybe something like a Word document converted to JSON, I don't know. In all the use cases I have worked on so far, I could easily see the relational structure of the data.

With CouchDB you can move your data definitions from SQL's CREATE statements to plain application classes.

In my case, the "schema" would be defined in PHP classes. Great, because it means my schema lives in Git.
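To illustrate the idea, here is a minimal Python sketch (the project itself uses PHP, and this is not its actual code). A plain class acts as the "schema", and a CouchDB document is simply its JSON serialization. The `Customer` and `Address` classes and the `to_document` helper are hypothetical; the `type` field is a common CouchDB convention for telling document kinds apart, not something the database requires.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Customer:
    # Hypothetical model: this class definition plays the role of the "schema".
    _id: str                # CouchDB document id
    name: str
    address: Address        # nested object instead of a separate table plus a join
    type: str = "customer"  # convention for distinguishing document kinds
    tags: list = field(default_factory=list)


def to_document(model) -> str:
    """Serialize a model (including nested dataclasses) to a JSON document."""
    return json.dumps(asdict(model), sort_keys=True)


if __name__ == "__main__":
    ada = Customer("customer:ada", "Ada", Address("Main St 1", "Berlin"), tags=["vip"])
    print(to_document(ada))
```

Because the class lives in the code base, every change to the data structure is versioned together with the application code.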

Note: there are tools like FlywayDB that help you organize and version your schemas and their changes too. I don't find that as nice as plain classes, but it's still helpful. FlywayDB is my preferred tool, as it's lightweight and easy to use.

The powers of relational databases

PostgreSQL is a powerful beast. Recent versions even have JSON data types that let you store and index "unstructured" data. Still, designing a relational database requires some thought. It's another server, and you need to know more about it than about, for example, CouchDB.

With the JSON type, you have to understand both worlds, and so do your co-workers.

People need to understand SQL. That's not a big issue for me, but in a project as small as mine it is nice to drop such a requirement when you consider hiring new blood.

Developing the first version of my software with PostgreSQL cost me too much time. I was hoping to reduce the development time with CouchDB, and so far I have not been disappointed.

Once I had my PHP models, I already had my database model. No need for an ORM.

Map/Reduce

Map/Reduce always seemed like big magic to me. When I read about it in the context of CouchDB, I was a bit afraid. But it is not that difficult once you understand the basics.

Mapping lets you change your "view" of the data: you simply emit new keys from each document. It feels like constantly creating new composite keys in a relational database.

Reducing aggregates the mapped rows. It's similar to SQL's COUNT(*) or GROUP BY.
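To demystify this a little, here is a minimal Python simulation of what a CouchDB view does. In CouchDB itself, the map function is JavaScript stored in a design document, and `_count` and `_sum` are built-in reducers whose grouping is controlled by the `group_level` query parameter. The Python functions and sample order documents below are hypothetical and only mimic those semantics.

```python
from collections import defaultdict


def map_orders(doc):
    """Map function: for each order document, emit([customer, year], total)."""
    if doc.get("type") == "order":
        yield [doc["customer"], doc["year"]], doc["total"]


def build_view(docs, map_fn):
    """Run the map function over every document; sort rows by key, then doc id."""
    rows = [
        (key, value, doc["_id"])
        for doc in docs
        for key, value in map_fn(doc)
    ]
    rows.sort(key=lambda row: (row[0], row[2]))
    return rows


def reduce_rows(rows, group_level, reducer):
    """Group rows by the first `group_level` key elements, then reduce each group."""
    groups = defaultdict(list)
    for key, value, _doc_id in rows:
        groups[tuple(key[:group_level])].append(value)
    return {group: reducer(values) for group, values in groups.items()}


docs = [
    {"_id": "o1", "type": "order", "customer": "ada", "year": 2015, "total": 30},
    {"_id": "o3", "type": "order", "customer": "bob", "year": 2016, "total": 5},
    {"_id": "o2", "type": "order", "customer": "ada", "year": 2016, "total": 20},
    {"_id": "c1", "type": "customer", "name": "Ada"},  # ignored by the map function
]

rows = build_view(docs, map_orders)
print(reduce_rows(rows, 1, len))  # like "_count" with group_level=1: {('ada',): 2, ('bob',): 1}
print(reduce_rows(rows, 1, sum))  # like "_sum" with group_level=1: {('ada',): 50, ('bob',): 5}
```

The composite key `[customer, year]` is what makes this feel like a relational composite key: truncating it with `group_level` gives you per-customer totals, while the full key gives per-customer, per-year totals.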

Working with my data feels natural. No joins are necessary; I work with JSON and dot notation.

There is one "drawback" I noticed: creating a new view and mapping the data in a new way can cost you some deployment time. I tested with a large dataset, and deploying a simple change literally blocked me for minutes.

The reason is that CouchDB has to run the new map function over every single document before the view can answer queries. The result is stored in a B-tree index, which in turn stays very fast to query even on large datasets.

For my small project with only a little data (just 10,000 records), this was not an issue. But it was good to know what to expect when my data grows.

In the end, Map/Reduce felt easy and understandable, at least in the simple way I used it.

Cloudant + CouchDB replications

With IBM's Cloudant you can have a hosted version of CouchDB at a reasonable cost. I won't use it heavily, so I will most likely stay under $50 per month. Usage under that limit is not charged, which means I can use the service for free.

For development, I found it more useful to have my data on my local machine.

It's super easy to set up. You just add a "replication" in Futon (the web administration tool that ships with CouchDB), and as if by magic your data is downloaded from the Internet to your local machine.
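Futon is a GUI, but the same thing can be done through CouchDB's HTTP API by POSTing a JSON body to the `/_replicate` endpoint. Below is a minimal sketch using only Python's standard library. The server and database URLs are placeholders (`ACCOUNT` is not a real Cloudant account), authentication is left out, and the request is only built here, not sent.

```python
import json
import urllib.request


def replication_request(server, source, target, continuous=False):
    """Build (but do not send) a POST to CouchDB's /_replicate endpoint."""
    body = {
        "source": source,          # e.g. the remote Cloudant database URL
        "target": target,          # e.g. the local database URL
        "create_target": True,     # create the target database if it is missing
        "continuous": continuous,  # keep syncing changes instead of a one-shot copy
    }
    return urllib.request.Request(
        url=server.rstrip("/") + "/_replicate",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = replication_request(
    "http://localhost:5984",
    "https://ACCOUNT.cloudant.com/mydb",  # hypothetical remote database
    "http://localhost:5984/mydb",         # hypothetical local database
)
# urllib.request.urlopen(req) would start the replication (credentials omitted here).
```

Sending the request to the local server makes this a "pull" replication: the local CouchDB fetches the remote data itself.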

You can also deploy this way: develop your Map/Reduce functions locally, then replicate them to your remote server.

This approach is much nicer and easier than anything I ever saw with relational databases. Yes, I hate SQL dumps. They always remind me how old I am.
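Views live in design documents (documents whose id starts with `_design/`), so deploying them boils down to replicating those documents. A hedged sketch: the design document `_design/orders` and its view `by_customer_year` are hypothetical, and the replication body uses the `doc_ids` field of `/_replicate` to copy only that one document. It assumes the design document has already been saved in the local database and that the replication runs with admin rights on the target, since CouchDB only lets database admins write design documents.

```python
import json

# Hypothetical design document holding one view.
design_doc = {
    "_id": "_design/orders",
    "language": "javascript",
    "views": {
        "by_customer_year": {
            # CouchDB evaluates this string as JavaScript, not Python.
            "map": "function (doc) { if (doc.type === 'order') { emit([doc.customer, doc.year], doc.total); } }",
            "reduce": "_sum",  # built-in reducer
        }
    },
}


def deploy_body(local_db_url, remote_db_url, design_id):
    """Replication body that copies only the given design document to the remote."""
    return {"source": local_db_url, "target": remote_db_url, "doc_ids": [design_id]}


body = deploy_body(
    "http://localhost:5984/mydb",         # hypothetical local database
    "https://ACCOUNT.cloudant.com/mydb",  # hypothetical remote database
    design_doc["_id"],
)
print(json.dumps(body, indent=2))
```

Keep in mind that the remote server still has to build the view index once the design document arrives, which is the deployment delay described earlier for large datasets.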

Apache <3, and the incredible docs

Of course, I love Apache CouchDB because it comes from the family: the Apache Software Foundation. I have followed the project for quite a while, and I know what kind of people are working on it. In general, I would prefer solutions from the ASF to any other Open Source solution, if the quality of the software is comparable.

CouchDB has a great community, and there are a lot of helpful people around. The docs are also fantastic. They may be the best technical docs I have ever seen.

The irrational demon

I failed to complete the project with PostgreSQL once because I ran out of time (not the only reason, but it's the official version). Doing exactly the same thing again, just a bit better, felt tiring.

I wanted to rewrite it, but I still didn't have time.

I simply wanted the CouchDB people to be right about the time savings.

And of course it is exciting to build something new with a brand-new shiny tool.

And this curiosity is most likely the actual driver behind my decision to try CouchDB. If you don't like that answer, pick one of the other exciting reasons I mentioned earlier in this post.

That said, CouchDB has so far been the right choice, at least for my little project. And the fun factor was immense. I am looking forward to the other things I will learn.


Tags: Apache, Apache CouchDB, NoSQL, Open Source