Scalable Corner - Software Development Blog: 2007

Monday, 17 December 2007

(Book) Content Networking: Architecture, Protocols and Practice

I decided to start a new practice: I will post short reviews of books I like. I know that everyone is doing that and the internet is full of opinions on any subject one can imagine. As it turns out, it is rather hard to write good material but extremely easy to give feedback :). Nevertheless I'll put in my 10 cents as well. I have my own reasons for doing so. I like reading - especially computer science books. I remember those dark times of the USSR when it was extremely difficult to find a good source of knowledge. I had to scour book stores and libraries just to find a small piece of valuable up-to-date information. Someone could say I was unlucky or was taking the wrong approach to information retrieval in those days. But I believe it was just because of the rotten Soviet system :). The situation has changed dramatically since then. I have more books on my hard drive than I'm able to read in my lifetime. As a result I have become very selective and try to pay attention only to those books which relate to my interests as closely as possible and are the best of their kind (of course taking into account my current level of knowledge in the area). Evaluating books requires both time and some experience in the field. So I just hope that my choice can be useful for someone else.

The book I've recently finished is called "Content Networking: Architecture, Protocols and Practice". As you might suspect (surprise! surprise!) it is entirely about content delivery :). It provides an excellent and quite thorough overview for newbies and helps more experienced readers fill in blank spots (even if you aren't aware of their existence now) and systematize what they know. After reading it I have a strange feeling that I've left the extremely overpopulated but noble league of wheel inventors in the area of content networking.

Ok, let's try to find out what it is about in detail. Here is the list of all chapters with my brief explanation of each. It is not the official table of contents (you can find that on Amazon using the look inside feature).

Chapter 1. Introduction. Brief history of content networking evolution.
Chapter 2. Describes how the End-to-End principle guided the original architecture of the Internet. HTTP from the protocol design and architecture point of view. Multicast concepts.
Chapter 3. Describes why closer is better and how caching helps to solve problems. You will find out how caches are located within clients, the network, and at the server farms.
Chapter 4. Media streaming protocols. Even though streams are continuous, making them different from web pages, specialized techniques bring the benefits of caching to multimedia streaming content.
Chapter 5. Content retrieval requires navigation and this chapter describes how the DNS routes requests to Web servers, Web caches and Web switches. It also describes how Web switches improve operations and how global request routing connects clients to the best service node.
Chapter 6. Describes why and how peer-to-peer networks challenge traditional client-server networks. You will learn not only concrete algorithms of the most successful P2P networks but also find out which architectural benefits in network reliability and scalability are provided by P2P.
Chapter 7. Describes the abstract model for a presence and instant messaging system. It describes a variety of standards-based and proprietary approaches to building such systems. This interactive content delivery presents another challenge for networks that focus traffic or introduce delays.
Chapter 8. Introduces the arena of the content service provider. People expect instantaneous delivery of information, custom-tailored to their specific needs and preferences. People expect an integrated solution that converges their various communication needs adding value beyond basic content transport. The underpinning technology and architectures are given in this chapter.
Chapter 9. Describes building of the content networks designed to serve enterprise users, network operators, content distribution companies and content service providers. You will see how network elements are combined and work together to fulfil their needs.
Chapter 10. Shows the role of standards. Unfolds history and working processes of IETF and W3C. Explains how the creative tension between proprietary solutions and standard solutions continuously energizes Working Groups to identify, describe and solve more challenging problems.
Chapter 11. Summary and Outlook.

With best wishes for enjoyable reading, Me.

BTW I removed the link list with recommended books in the right column and began using a handy and pretty Amazon widget. As it soon turned out, the famous Firefox add-on "Adblock Plus" cuts out all books from the widget leaving only its header :). If you use Adblock and can't make it work correctly please let me know - I'll try to find another solution for recommended books.

Saturday, 16 June 2007

Implementing Tag Cloud - The Nasty Way (Part 2)

In the previous part of this article I started the discussion of the tag cloud construction problem. Let's have a short recap.

We found out that a relational persistence model is not adequate enough for this purpose due to the huge computational overhead it creates. Then we came to the conclusion that tag cloud should not be calculated each time it is requested. Instead it could be kept between requests as a hash table persisted in the binary stream. Unfortunately this simple solution led us to another problem - we need to adjust the hash table each time someone changes tags of any entity in the system. The problem was escalated further after it had become obvious that we need to keep counters for all tags in the system (not just a few hundred tags from the tag cloud). If we had focused on the cloud tags only we would lose the possibility of adding new tags which became popular to the cloud as well as removing ones which lost their popularity. We would have no idea which of two tags is more popular without usage information about at least one of them.

In this part I will present an algorithm effective enough to cope with the issues mentioned above.

The picture below describes the physical architecture I will use to demonstrate the concept (click on the picture to enlarge).

The architecture presented consists of two clusters: web cluster and data storage cluster. The web cluster is built in the following way:

the shared-nothing approach is used: each server of the cluster is completely independent of its neighbors, can operate even if it is the only server in the cluster and it has the standard predefined configuration and the applications installed;
servers in the web cluster are capable of communicating with each other by means of group communication software;
all logic is performed by web servers and is mostly driven by incoming web requests – there are no separate application servers in the system;
there is a load balancer distributing web requests among web servers based on some simple rule (for example randomly with uniform distribution or in a round-robin fashion).

The organization of the data storage cluster is not important as long as it is reliable and provides consistent data to the web cluster. But we have to define the schema to store all the kinds of data we have so far:

all tags together with their usage counters,
the binary representation of the tag cloud hash table,
some additional information which will be required by the algorithm itself.

We will suppose that our data storage is a relational database (or a cluster of relational databases) as it is the most common approach. Please take a look at the picture below.

As you can see there are three tables in the schema:

1. "tag" – contains the list of all tags stored in the system. The table can have millions of rows especially if you have a multilingual site. The table has three columns:

id – an integer representing the identifier of the tag in the system,
name – the tag itself,
count – the total number of tag usages.

2. "tag_list_version" – contains all versions of the tag list which is represented by the “tag” table. Each time the “tag” table is changed a new row should be added to the “tag_list_version” table indicating that a new version of the tag list was created. The table has three columns:

version – an integer representing an increasing counter for tag list changes,
tag_cloud – a binary field which contains a binary representation of the tag cloud hash table which corresponds to the version of the tag list,
created – date and time of the version creation.

3. "tag_increment" – contains all increments ever applied to the tag list. Each new version of the tag list is formed from the previous one by applying several tag list increments. A tag list increment (hereinafter increment) is an algorithm-specific data structure which is similar to the tag cloud. It is also a hash table of <tag, usage counter> pairs with the only difference – counters can be negative. A negative value means that during the period when the increment was collected a particular tag was removed more times than it was added. I will say more about increments below. Here are the columns:

id – an integer representing the identifier of the increment,
increment – a binary field which contains a binary representation of the increment hash table,
created – date and time of the increment persistence,
version – the version of the tag list which was formed by applying a set of increments (including this one) to the preceding version in a single step.
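To make the schema a bit more tangible, here is a minimal sketch of the three tables using Python's built-in sqlite3 module. Only the table and column names come from the description above; the column types, keys and defaults are my assumptions, and a real installation would use its own database engine.

```python
import sqlite3

# Assumed DDL: table and column names follow the description above,
# while types, keys and defaults are illustrative guesses.
SCHEMA = """
CREATE TABLE tag (
    id      INTEGER PRIMARY KEY,
    name    TEXT    NOT NULL UNIQUE,
    "count" INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE tag_list_version (
    version   INTEGER PRIMARY KEY,
    tag_cloud BLOB,
    created   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE tag_increment (
    id        INTEGER PRIMARY KEY,
    increment BLOB    NOT NULL,
    created   TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP,
    version   INTEGER NOT NULL REFERENCES tag_list_version(version)
);
"""


def create_schema(connection: sqlite3.Connection) -> None:
    """Create the three tables in the given database."""
    connection.executescript(SCHEMA)


connection = sqlite3.connect(":memory:")
create_schema(connection)
```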

Okay, we covered the minimum required to start the algorithm description. Let’s move on and pick up some logical reasoning.

Reason 1. Why don’t you give me a KISS?!

According to point 3 of the web cluster description given above, web servers are the only type of servers which perform any actions with tags except permanent storage. These actions are usually a direct response to appropriate user actions such as adding or removing tags of an entity. It looks like we can make each web server update the tag cloud each time a tag operation is requested. Isn’t it a good and simple option? For example the user updates a set of tags on a photo - the server performs the direct user request and … immediately updates the tag cloud as well. Everything is excellent – the tag cloud is always up-to-date plus no full recalculation is ever required. Aren’t we done? Unfortunately no. There is one significant drawback – performance. Do you remember that we have millions of users registered in our system? Tag operations are frequent. They are performed not only when users modify tags directly but also when they add and delete entities. It doesn’t sound reasonable to add a relatively slow procedure to frequent operations. The tag cloud update involves a few time-consuming activities:

Network transportation. The tag cloud isn’t small – it has entire tags plus usage counters plus the table of hash values and indexes.
Database updates. The tag list must be updated. The tag cloud is updated if necessary.
Synchronization. Multiple servers compete for the possibility to update the same data structures – it makes the overall system throughput slower.

Conclusion: the solution is possible but it is far from optimal under serious load.

Reason 2. Is laziness infectious?

Let’s try to improve the previous case. What if we didn’t update the tag cloud immediately in response to every user action but collected the updates for some time and applied all of them at once later? It will definitely lighten the load on the system but we give up the constantly up-to-date state of the tag cloud. Do we really need the tag cloud to be up-to-date all the time?

As I wrote in the previous part of this article the tag cloud changes slowly. It will hardly ever change with each request. And even if the tag cloud should have changed but didn’t, the user is not able to detect this fact. The user neither has any idea what the other users do nor has the possibility to influence the data strongly enough himself. The only way to reveal delayed tag cloud updates is to show both tags and tag counters to the user. So we just discovered one of the reasons why popular web applications seldom do that :). Tags of the tag cloud are usually presented arranged relative to each other by popularity – there are no absolute values.

Okay, taking into account everything said above - delaying the tag cloud update sounds like a good idea. How can we implement it? We need to accumulate changes and store them somehow until the tag cloud update is performed. As you might suspect it is the primary responsibility of that new structure called the tag list increment. The increment is a simple hash table holding all tags either added or removed between two tag cloud updates. The counter which corresponds to each tag in the table is initially set to zero. It is incremented each time the tag is added and decremented each time the tag is removed. Each web server has its own increment and keeps it in memory. As you can see we still have to count tags but do it with almost no overhead using an efficient in-memory structure.
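A per-server increment can be sketched in a few lines of Python. This is only an illustration of the idea: the class and method names are hypothetical, and the lock is my assumption that several request threads share one increment inside a web server.

```python
import threading
from collections import Counter


class TagIncrement:
    """In-memory <tag, counter> table kept by a single web server."""

    def __init__(self):
        self._counters = Counter()      # missing tags start at zero
        self._lock = threading.Lock()   # assumed: many request threads

    def tag_added(self, tag):
        with self._lock:
            self._counters[tag] += 1

    def tag_removed(self, tag):
        with self._lock:
            self._counters[tag] -= 1    # counters may go negative

    def drain(self):
        """Return the collected counters and start a fresh increment."""
        with self._lock:
            snapshot = dict(self._counters)
            self._counters.clear()
        return snapshot


increment = TagIncrement()
increment.tag_added("art")
increment.tag_added("art")
increment.tag_removed("cool")
collected = increment.drain()   # {"art": 2, "cool": -1}
```

The drain step is where a server would hand the increment over for merging; what happens with it afterwards is discussed below.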

The last open question is when to start the tag cloud update procedure. It is obvious that the more different tags we have in the increment the longer the update procedure will be. The procedure consists of merging the increment into the “tag” table and then adjusting the tag cloud based on the tag usage information obtained during merging. In other words if we collected 5,000 different tags in the increment we have to perform 5,000 INSERT/UPDATE statements. This operation is long and (this is the most important!) it is expensive from the database point of view.

But the situation is not bad, right? Can we just monitor increments and force the tag cloud update to start as soon as the increment has some reasonable amount of tags (for example 100)? Unfortunately we can’t. We cannot predict the simultaneous behavior of users at any point of time in the future. The tags increment on a concrete web server can grow to the size of 100 tags in 10 minutes or in 10 milliseconds depending on the current load and its nature. The process is entirely driven by users.

We have another option – we could spoonfeed the data storage sending statements in turn and making pauses after each one. If we don’t use a single transaction for the whole process we can significantly decrease the load on the database increasing the overall update time. Unfortunately this trick doesn’t help us either because we have a web cluster and there might be a lot of web servers (for example www.myspaces.com installation has more than 1000 web servers). If each server of the cluster spoonfeeds the database with its own increment for a relatively long period of time they will interfere with each other and bring the database to its knees.
Anyway, it looks like we got two options. Which one do you prefer?

A lot of servers sending a bunch of 5000 SQL statements each from time to time.
A lot of servers sending 100 SQL statements each per unit of time for a longer period of time with much higher probability of interfering.

To tell the truth neither of them looks pretty. Let’s try something else.

Conclusion: we got rid of the additional overhead on each tag operation. But we still have a problem – increments are big and merging is expensive. Things become worse as we increase the number of servers in the web cluster.

Reason 3. Divide et impera!

Don’t worry – it is the final one :). Let’s take a look at the conclusion of the previous reason. We decided that increments can become big quickly and there is not much we can do about that. But a big increment is not a problem in itself. The problem is in its merging. Every server has to merge its own tag increment as soon as it outgrows a certain threshold. It would be much better if only one server were responsible for merging. It would solve the problem of competition and we could spoonfeed the database as long as we need. So what’s the problem? Let’s just select a single web server and make the others send their increments to this one instead of merging. The chosen server should collect a few increments and commit all of them into the database at the same time. Of course the chosen server has its own tag increment as well and it will send it to itself as the others do. What does the improvement we made give us?

Benefits:

There is no competition for the database resources.
The total amount of merges became lower as now both tags and increments are aggregated.
Merging is more efficient as increments usually have a lot of tags in common. If we merge all increments into a single one before committing to the database we will save a lot of database operations.
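The last benefit is easy to see with a small Python sketch. The function name merge_all_in_one is borrowed from the pseudocode further down; the body, and the decision to drop tags whose changes cancel out, are my assumptions about one reasonable way to do it.

```python
from collections import Counter


def merge_all_in_one(increments):
    """Fold several <tag, counter> increments into a single one."""
    merged = Counter()
    for increment in increments:
        # Counter.update adds the counters, negative values included.
        merged.update(increment)
    # Assumed optimization: a net change of zero needs no database write.
    return {tag: delta for tag, delta in merged.items() if delta != 0}


collected = [
    {"art": 3, "cool": 1},
    {"art": 2, "wedding": -1},
    {"cool": -1, "wedding": 4},
]
single_increment = merge_all_in_one(collected)   # {"art": 5, "wedding": 3}
```

Committed one by one these three increments would cost six INSERT/UPDATE statements; merged first they cost two.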

Drawbacks:

The tag cloud update is delayed more. But it is still not a problem (as a matter of fact it is just a question of the configuration – we can manipulate all thresholds to get the optimal variant).
The update procedure became longer as we have more tags to merge (tags which are different in collected tag increments). And this is also okay. The merging can take longer now and we can spoonfeed (see the first point of the benefits).
We created a single point of failure. If the chosen server goes down – we are in serious trouble. Tag collection stops completely and we lose increments until the server is back online.

So here it is! All that’s left is to fix the “single point of failure” drawback. I’d like to note that we don’t need precise values of tag counters to build the tag cloud if we have a lot of users. That’s why we can sometimes afford to lose an entire increment. And that’s why we didn’t speak about single points of failure when discussing reason 2 – any web server can and will go down from time to time, losing the tags from its current increment which is not persisted yet.

But a failure of the centralized collection is much more serious. To prevent that we have to add some redundancy. It would be good to have two or three servers selected for increment collection (let’s name those servers "collectors"). Using the group communication system we can create a separate communication group especially for collectors. Each collector must register itself in this group at startup. All web servers (also called "emitters") must send increments to the group instead of communicating directly with individual servers. It means that all collectors in the group will get all increments. Now it is much harder to lose an increment in case of failure as every collector has a copy of it.

Let's decide how the group of collectors should actually work. It is obvious that:

Only one of the collectors can merge at any point of time in order not to start the resources competition again.
Any increment must be committed only once. Other copies must be discarded as soon as the increment is committed to prevent duplication.
All collectors must share (or have access to) the full information about merge operations done so far. Due to the possibility of failures and subsequent restarts collectors won't be in sync from the point of view of increments they had, have or will have.

Let's look at an example. Consider an increment sent by an emitter and delivered to two out of three collectors. Because of a network malfunction the increment is delayed and has not reached the third server yet. Meanwhile one of the two servers has met the threshold and started merging. The network delay is big enough that the merge finishes before the increment is finally delivered to the last server. As a result the increment is obsolete on arrival and must be discarded at once.

All these points are addressed in the final algorithm below. Please pay special attention to them. The correct implementation is crucial for the stability of the system.


THE ALGORITHM

state STARTED:

event - Spontaneously_started:

communication_groups.enter("collectors");
tag_list_version = database.tag_list.get_last_version();
increment_identifiers =
 database.tag_list.get_increment_identifiers(tag_list_version);
processed_increment_identifiers.clear();
processed_increment_identifiers.add(increment_identifiers);
become(PROCESSING);

state PROCESSING:

event - Receiving_tag_increment:

increment = network.receive();
if(!processed_increment_identifiers.contains(increment.id))
then
 pending_increments.add(increment);
if(pending_increments.count() > threshold)
then
 increments_list = pending_increments.remove_first_n(threshold);
 call asynchronously Merge_increments(increments_list);
become(PROCESSING);

procedure - Merge_increments(increments_list):

lock = lock_service.acquire_application_wide_lock("collectors");
last_version = database.tag_list.get_last_version();
if(tag_list_version == last_version)
then
 tag_list_version = tag_list_version + 1;
 transaction = database.begin_transaction();
 database.tag_list.insert_new_version(tag_list_version);
 database.tag_list.insert_increment_identifiers(
  tag_list_version, increments_list
 );
 single_increment = merge_all_in_one(increments_list);
 single_increment = 
  database.tag_list.merge_with(single_increment);
 tag_cloud = database.tag_list.get_cloud(tag_list_version - 1);
 tag_cloud.merge_with(single_increment);
 database.tag_list.insert_tag_cloud(
  tag_list_version, tag_cloud
 );
 transaction.commit();
 communication_groups.
  send_notification_to("emitters", "tag list is updated");
else
 versions = database.tag_list.
  get_all_versions_greater(tag_list_version);
 foreach(version in versions)
 do
  identifiers = 
   database.tag_list.get_increment_identifiers(version);
  pending_increments.remove_by_identifiers_if_any(identifiers);
  processed_increment_identifiers.add(identifiers);
 tag_list_version = database.tag_list.get_last_version();
lock.release();

I suppose it could be called pseudocode :). Tell me if I am wrong and the code is not self-explanatory or even unreadable. I hope that after the long discussion we had you are more than capable of understanding the main ideas behind this algorithm. I will not go through it line by line (at least for now) and will let you think it over.
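For those who prefer something they can run, here is a rough single-process Python sketch of the two paths inside Merge_increments. It is an approximation under loud assumptions, not a port: InMemoryTagStore is a hypothetical stand-in for the three database tables, a threading.Lock stands in for the application-wide lock (a real cluster needs a lock shared between servers), the merge runs synchronously, the batch is picked inside the lock, and the tag cloud itself and the emitter notification are left out.

```python
import threading
from collections import Counter


class InMemoryTagStore:
    """Hypothetical stand-in for the database tables."""

    def __init__(self):
        self.tag_counts = Counter()      # plays the role of the "tag" table
        self.version_ids = {0: set()}    # version -> ids of merged increments
        self.lock = threading.Lock()     # stands in for the cluster-wide lock

    def last_version(self):
        return max(self.version_ids)


class Collector:
    def __init__(self, store, threshold):
        self.store = store
        self.threshold = threshold
        self.version = store.last_version()
        self.processed_ids = set(store.version_ids[self.version])
        self.pending = {}                # increment id -> {tag: delta}

    def receive(self, increment_id, increment):
        if increment_id in self.processed_ids:
            return                       # obsolete on arrival, discard
        self.pending[increment_id] = increment
        if len(self.pending) > self.threshold:
            self.merge()

    def merge(self):
        with self.store.lock:
            last = self.store.last_version()
            if last != self.version:
                # Another collector merged first: forget what it committed
                # and catch up with the latest version.
                for version in range(self.version + 1, last + 1):
                    ids = self.store.version_ids[version]
                    for increment_id in ids:
                        self.pending.pop(increment_id, None)
                    self.processed_ids |= ids
                self.version = last
                return
            # We are up to date: commit the oldest pending increments.
            batch_ids = list(self.pending)[: self.threshold]
            merged = Counter()
            for increment_id in batch_ids:
                merged.update(self.pending.pop(increment_id))
            self.store.tag_counts.update(merged)
            self.version += 1
            self.store.version_ids[self.version] = set(batch_ids)
            self.processed_ids.update(batch_ids)


store = InMemoryTagStore()
first, second = Collector(store, 2), Collector(store, 2)
for increment_id in (1, 2, 3):
    first.receive(increment_id, {"art": 1})    # first commits ids 1 and 2
for increment_id in (1, 2, 3):
    second.receive(increment_id, {"art": 1})   # second only catches up
```

After both loops the store has counted "art" twice, not four times, and the second collector keeps only increment 3 pending. A copy of increment 1 arriving late at the second collector would be dropped, which is the situation from the example above.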

You are welcome to ask questions and any comments are highly appreciated as usual.

Talk to you soon. Happy thinking.

Friday, 15 June 2007

Implementing Tag Cloud - The Nasty Way (Part 1)

I am going to write about Tag Cloud. I am pretty sure all of you know what it is. I can't say that the majority of Web2.0 applications really need it. But if you think this feature will make your application better - it is definitely necessary to know a little bit about its implementation.
Yes, I know that I am cheating here. The way of implementation described in this post is probably not the best one. Firstly it is for big guys only. Secondly it can be significantly improved.
Don't try this method unless you have a lot of time and millions of users :). You've been warned!

Let's start with a few simple questions. Why discuss Tag Cloud at all? It seems like a common feature and it shouldn't be tricky to implement. But have you ever thought about how the most noble applications (such as www.flickr.com - yep, I'm a worshiper of this product) construct Tag Cloud and keep it up-to-date taking into account the huge amount of data moving back and forth?

Here are just two problems you could uncover trying to find the answer to this question (in fact there is much more to discuss but it could take a lot of time - more than I have):

1. Relational database.

Yes, a relational database is a problem when a normalized data model is used (anyone want a piece of me now? :)). Why?

Let's assume you have 10,000,000 users. They form the first table - "user".
Let's further assume each user uploads at least 1,000 photos (not a big deal in the age of digital photography). So we have 10,000,000,000 photos - cool isn't it? And it is the second table - "photo".
And finally let's add some tags - I usually have from 3 to 6 tags per item. If everyone were like me we would have about 45,000,000,000 tags on average. These are the third and the fourth tables - "tag" and "photo_tags" relationship respectively.

It seems we could end up with quite long tables (especially the one for a relationship). And now we need the final step of this story. Here it is:

In order to extract the information we would need to make ... JOINS! Could anyone say how many records the Cartesian product of all four tables will have? ;)

2. The system restart.

Okay. There are a lot of relational algebra gurus to hire and you might have bought a database engine capable of performing JOINs of tables with infinite number of rows in a finite amount of time. So far so good.

Let's assume your system was up and running until you decided to upgrade the database schema (for example to reach the 13th normal form :)). You planned a downtime, announced the date to your users and stopped the system. The upgrade was done quickly. You started the application again and ... your application now needs to calculate the tag cloud! I can even make things "easier" for you - your system has to calculate a tag cloud for each of those 10,000,000 people to build personal tag clouds! According to the data model we saw above a calculation like this requires intensive row counting in the relationship table(s). And taking into account the number of rows it seems like your database requires an additional super ability now - it should be able to count rows as quickly as it JOINs tables ;).

For those of you who are suspicious of my words I suggest a little experiment - take your favorite database engine, generate a table with at least 100,000,000 records, execute the following statement: "SELECT COUNT(*) FROM Table;" and measure the execution time. I promise you will be surprised. I also suggest playing with your data a little. Try to add some conditions to the statement above, add indexes and check the same statements again, measure the size of the indexes, the time of their construction and defragmentation etc. This should give you a good feeling for the problem.
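If you want a starting point for the experiment, here is a small harness sketch in Python with the built-in sqlite3 module. The table name photo_tags and its two columns are just placeholders, and an in-memory SQLite database with 100,000 rows says nothing about your production engine, so treat it as scaffolding: point it at your own database and raise the row count.

```python
import sqlite3
import time


def time_count(row_count):
    """Fill a table with row_count rows and time SELECT COUNT(*)."""
    connection = sqlite3.connect(":memory:")
    connection.execute(
        "CREATE TABLE photo_tags (photo_id INTEGER, tag_id INTEGER)"
    )
    rows = ((i, i % 1000) for i in range(row_count))
    connection.executemany("INSERT INTO photo_tags VALUES (?, ?)", rows)
    connection.commit()

    started = time.perf_counter()
    (counted,) = connection.execute(
        "SELECT COUNT(*) FROM photo_tags"
    ).fetchone()
    elapsed = time.perf_counter() - started
    connection.close()
    return counted, elapsed


counted, elapsed = time_count(100_000)
print(f"counted {counted} rows in {elapsed:.4f} s")
```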

(By the way do you know my favorite joke from Chuck Norris Facts? - "Chuck Norris counted to infinity - twice.")

Let's move on now and try to think about alternative ways of tags cloud manipulation and persistence.

Firstly there is no need for relational data storage. It doesn't mean you should throw away that thing you paid so much money for - just leave it for more suitable data. A standard hash table with tags and their respective usage counters persisted as a binary stream will do the trick. It is pretty much enough to represent the entire tag cloud (as well as a personal tag cloud). Of course you shouldn't keep all tags which exist in the system in a single hash table. The tag cloud consists of the most popular tags only (approximately 200 tags in our system) and it is adjusted all the time to be up-to-date.

You can still use the database by storing hash tables as BLOBs (which is probably the easiest way as you don't need an additional type of data storage).
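As a rough illustration, the Python sketch below keeps the most popular tags in a dictionary, turns it into bytes and stores it in a BLOB column. The JSON encoding, the tag_cloud_store table and the function names are all my assumptions; any compact binary format would do just as well.

```python
import json
import sqlite3


def serialize_cloud(tag_counts, size=200):
    """Keep the `size` most used tags and encode them as bytes."""
    top = sorted(tag_counts.items(), key=lambda item: (-item[1], item[0]))
    return json.dumps(dict(top[:size])).encode("utf-8")


def deserialize_cloud(blob):
    """Decode bytes produced by serialize_cloud back into a dict."""
    return json.loads(blob.decode("utf-8"))


connection = sqlite3.connect(":memory:")
# Hypothetical single-purpose table holding the persisted hash table.
connection.execute(
    "CREATE TABLE tag_cloud_store (id INTEGER PRIMARY KEY, cloud BLOB)"
)

usage = {"art": 120, "cool": 95, "wedding": 80, "misc": 1}
connection.execute(
    "INSERT INTO tag_cloud_store (id, cloud) VALUES (1, ?)",
    (serialize_cloud(usage, size=3),),
)

(stored,) = connection.execute(
    "SELECT cloud FROM tag_cloud_store WHERE id = 1"
).fetchone()
restored = deserialize_cloud(stored)   # {"art": 120, "cool": 95, "wedding": 80}
```

Reading the cloud back is a single-row lookup, which is the whole point: no JOINs and no counting at request time.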

Secondly there is no need to recalculate the tag cloud (as well as personal tag clouds). It can be created once and adjusted all the time according to changes in tags applied to all entities in the system. Moreover these adjustments can be infrequent. A tag cloud is usually rather static. Just look at any tag cloud of the popular photo sharing applications and you will notice that words like "art", "cool", "wedding" etc. are always somewhere near the top. Unfortunately you still have to keep all tags and their usage counters somewhere in the database (perhaps in a separate table) and update those counters all the time to give "new" tags a chance to enter the tag cloud. But it is nothing compared to the process of full tag cloud recalculation.

The personal tag clouds can be treated similarly. The same representation of a tag cloud as a persisted hash table works here as well. The situation is much simpler though, as you deal with a smaller number of tags (the vocabulary of an ordinary person is about 5000 words - the real number of words used as tags will be considerably lower). As a result you can have all tags of a person in a single hash table - not just the most popular ones as in the case of the global tag cloud. Put this persisted hash table in the "user" table in an additional column and you will always have a personal tag cloud at your disposal. Don't forget to update it each time the user changes his or her tags.

So what did we get so far? We managed to replace the extremely slow process of tag cloud calculation (that must be performed only when the tag cloud is requested) with the process of continuous tag cloud adjustment (that should be done each time a tag change is requested). In other words the infrequent requests (such as getting the tag cloud) are made very quick but frequent requests (such as adding/deleting/changing tags) are made slower. Strange result, isn't it? Don't worry. In the second part I'll show a concrete distributed algorithm you can use to make tag cloud adjustment quick to the point where it doesn't slow down the ordinary tag operations. And we will have only one major drawback - the complexity of the implementation.

That's all for now :). I hope you find these ideas useful.
Don't hesitate to ask questions and express your opinion.

Good luck!

Thursday, 24 May 2007

Log Me Gently, Baby!

Hi Guys,

I haven't been here for almost a month. That's kind of a bad thing because I have plans to keep my blog updated each week. Let's fix this a little bit and talk about something strange... for example logging!

You may say there is nothing to discuss. It is not a big deal. Almost every platform and/or framework has a logging infrastructure - just use that. It seems like all you need to do is to insert a call to a particular logging method/function with an appropriate log level and a message wherever in the code you find it reasonable and ... that's it! Enjoy the results! And you are absolutely right. Get yourself busy with the business part of the application and don't waste your time reinventing the wheel. The end of story.

For those who don't believe in the happy ending let's try to add some concrete environment restrictions and see what we really have:

The web application I work on is a heavily loaded one with a lot of simultaneous requests (roughly speaking hundreds per second per machine in the web cluster) - this is the first piece.
One of the primary architecture goals is to serve each request as quickly as possible to get the maximum throughput. According to the last measurements on the testing platform the average response time for the majority of functions is about 50ms (yes, I don't believe it either ... and this makes me read the performance testing report again and again every day - but it is always quite the same :) ). To tell the truth we have a strict limit of 150ms per request. And of course I don't want to spoil the achieved results - the second piece seems to be in place.
Big applications are complex - complex and expensive to develop, to install, to maintain. That's why we are always struggling hard to escape unnecessary infrastructure elements, especially if they require deep understanding from each team member in order to be used :). And this is the third piece.

That's what I have... Now let's take a look at what I need.

When you create a highly available solution you need a lot of things you probably don't care much about in other application types. These things include but are not limited to:

live hardware and software reconfiguration,
live data migration,
non-stoppable deployment of new application versions,
carefully planned and quick database updates,
live system patches,
etc.

There are a lot of excellent technical challenges to face, as well as growing overall system complexity. And not surprisingly it is necessary to monitor all this complexity - the most critical parts may even require real-time monitoring. What choice do we have? Can you think of at least a few options here? Probably the conclusion is too obvious. It is just a simple axiom - logs are your closest friends in situations like this. The more complex your system is, the more time and money logs will save you if applied right.
Here is a brief list of questions which should be addressed to create a good logging subsystem in a highly available application under serious load. And even if you have answered all of them, you still can't be sure that your logging is 100% okay...

Apply each question to the particular situation you have and try to think carefully about any consequences which may follow if you skip the corresponding logging functionality:

How many steps do I need to set up basic logging? Do I need to bring in a complex configuration? How dumb could the system administrators be?
Is my logging synchronous or asynchronous? Can I make the time-consuming logging operations (for example logging to a database or a file) asynchronous?
Is my logging friendly to multiple concurrent threads (or even processes)? Does it use intensive locking of any data structure which may lead to thread blocking? Is it possible to use non-locking versions of the same structure instead?
Can I use batch logging? Is the logging subsystem flexible enough to strike a balance between the speed of batch logging and the loss of log information held in memory between batch writes in case of an unpredictable failure?
Can I change the logging configuration without affecting the application (at least without restarting it)?
Can I add some pre-processing logic before the messages are logged, for example write filters that keep logs clean of duplicate error notifications?
Is the logging subsystem flexible enough to deliver logs over the network to a specially selected machine (or even a cluster of machines) dedicated to log storage and processing?
Are there good tools for viewing logs in the format I choose? Do they support real-time log acquisition and visualization?
How gracefully can the logging subsystem fail? Can it switch to different log targets in case of failure? Is it self-repairing as soon as the main problem is resolved (for example, if the database temporarily goes down, can it write to a backup location and then redirect messages to the database once it's back online)?
Which "standard" configuration possibilities of the logging subsystem do I have (changeable log message format; thresholds for messages coming from concrete places in the application; pluggable log targets such as files, system eventlog, e-mail, database, network connection, group communication software connections; log levels that can be easily adjusted depending on the situation)?
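
A few of the questions above (asynchronous logging, friendliness to concurrent threads, filtering of duplicate notifications) can be illustrated with a small sketch. It is written in Python purely for illustration, which is not necessarily the platform of the application discussed here. It relies on the standard library's QueueHandler and QueueListener; the names DuplicateFilter and build_async_logger are made up for this example, and the duplicate check is deliberately naive (it only looks at the immediately preceding message).

```python
import logging
import logging.handlers
import queue


class DuplicateFilter(logging.Filter):
    """Hypothetical filter: drop a record that repeats the previous one.

    The state is not protected by a lock, so under heavy concurrency the
    suppression is best-effort only. That is an assumption of this sketch.
    """

    def __init__(self):
        super().__init__()
        self._last = None

    def filter(self, record):
        key = (record.name, record.levelno, record.getMessage())
        if key == self._last:
            return False  # same message as last time: keep the log clean
        self._last = key
        return True


def build_async_logger(name, target_handler):
    """Return (logger, listener). Callers only enqueue records; the slow
    I/O of target_handler happens on the listener's background thread."""
    log_queue = queue.Queue(-1)  # unbounded; a bounded queue needs a drop policy
    queue_handler = logging.handlers.QueueHandler(log_queue)
    queue_handler.addFilter(DuplicateFilter())

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(queue_handler)
    logger.propagate = False  # do not also log synchronously via the root logger

    listener = logging.handlers.QueueListener(log_queue, target_handler)
    listener.start()
    return logger, listener


if __name__ == "__main__":
    # A stream handler stands in for a slow target such as a file or a database.
    logger, listener = build_async_logger("webapp", logging.StreamHandler())
    logger.error("database is unreachable")
    logger.error("database is unreachable")  # suppressed as a duplicate
    logger.info("request served")
    listener.stop()  # processes everything still queued, then joins the thread
```

The trade-off is the one raised in the batch logging question: records sitting in the in-memory queue are lost if the process dies before the listener writes them out.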

As you can see, there is a lot of stuff behind such a simple thing as logging, isn't there?
I hope this post helps you to think about your old acquaintance in a new way and lets it reveal its true power.

Talk to you later!

Tuesday, 1 May 2007

Cloning and YouTube.com

A couple of months ago we started a new project. I took the noble position of Software Architect and a small team of developers to create a big and complex solution - although it's not quite true, let's call it a clone of YouTube and Flickr at the same time :). Yes, I know - the internet is full of them and constructing just another one should be a boring thing... But for me it was an excellent opportunity! Why? Let me explain...

After the tremendous business success of the first web2.0 applications, a lot of our company's customers began ordering clones of them. It was a slow process with a lot of hesitation and afterthought. After Google acquired www.youtube.com every internet business owner started having nightmares. All of a sudden everyone understood how incredibly profitable their business could have been, even without any visible business model, if they had been more active. As a result everybody rushed forward but unfortunately with the same level of stupidity - the vast majority decided to catch the same train. Yes, they started dreaming about YouTube clones! It is not hard to construct something similar to YouTube (and even easier to order its construction from an outsourcing company like the one I work for - you don't even need to get your hands dirty managing developers or coding directly). The task could easily be fulfilled by a team of four web developers in two to three months (plus a tester to make the application stable). So if it is so reachable and cheap, why not try? And they tried and tried a lot... The only problem here is that this estimate remains true only as long as the load on the site is small - in other words when you have hundreds (not thousands) of visitors per day and they don't upload much stuff. But this was sufficient. Despite wanting to succeed, almost all the customers we had until now wanted to bite off just a tiny piece of the YouTube market. To tell the truth, they merely dreamed about becoming at most a middle-sized internet company in the distant future. They preferred to be born small, hope to grow big and die huge (just kidding ... sure, we all want to get on top and be there forever :) ). So we were making blind copies of the same site again and again but on a very small scale. We were copying the design and the functionality, trying to use the same open source software as YouTube does and so on.
It was boring, stupid work, usually assigned to teams formed mostly of students, as they were doing work like this for the first time and didn't feel sick... Of course the quality of the solutions was not high, but good enough to work most of the time and make customers happy.

But finally one of the customers decided to be born huge at once. He came to a wise and simple conclusion - not to create dumb clones of famous systems but to provide some flexible tooling which helps in the construction of media sites. Unfortunately I can't provide the full description of the business idea but it is not necessary. The only thing worth mentioning is that due to the specific market orientation such non-trivial requirements as scalability, high availability (99.999% up-time) and total visibility became the core features of the architecture.
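
To put the 99.999% figure into perspective, here is a small back-of-the-envelope calculation. It is only a sketch in Python: the function name is made up for illustration and a 365-day year is assumed.

```python
def allowed_downtime_seconds(availability_percent, period_seconds):
    """Downtime budget that a given availability leaves within a period."""
    return period_seconds * (1 - availability_percent / 100)


YEAR_SECONDS = 365 * 24 * 3600  # assumption: a non-leap year

budget = allowed_downtime_seconds(99.999, YEAR_SECONDS)
print(f"{budget / 60:.2f} minutes of downtime per year")  # 5.26 minutes of downtime per year
```

In other words, "five nines" leaves a little over five minutes per year for all outages combined, planned or not.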

Why did I write such a long story? Each technical solution has its own business context. I'm planning not only to share the most difficult technical parts and possible solutions on the pages of this blog but also to discuss their applicability to concrete business areas...

Stay tuned!

Guy Kawasaki - The Art of The Start

When this man speaks everybody listens! This time Guy Kawasaki is talking to entrepreneurs about The Art of The Start - both the book and the strategy of achievement. If you haven't seen this video yet I highly recommend that you do. It is smart and funny, like everything I have heard from Guy so far ...

As The Matter of Testing

Okay, a few words about testing now...

As I wrote before I'm working on a huge project involving a lot of technology and hardware:

database clusters,
web clusters,
computational clusters for concrete algorithms,
all of the above distributed across a few data centers,
a mix of Linux and Windows platforms, etc.

To my mind the project is not typical for an outsourcing company. As you might suspect, outsourcing companies usually don't have the resources and processes established for the development of big systems. Also they don't bother much about quality (no matter what they tell you and which certificates they have), which is crucial in the construction of a 24x7 solution. But this is the core of the business - they do not sell products, they sell people (more on this in separate posts). Nevertheless every company has a QA department. So do we. The only problem is that the QA staff is not technical in nature. In other words we don't have software engineers in testing at all. To be honest, there are few projects which really need something more advanced than just clicking through the user interface to make sure it works and meets the specification. The small number of exceptional projects which need sophisticated testing tend to have some developers assigned to this activity inside the project team. Sounds not bad - at least we have testing in place. The only problem is that there are a few questions each team has to answer first:

how to set up the testing process?
what and how to test?
how to accumulate the results?
what conclusions can be made on the basis of the data collected?

As you have probably understood, a team needs to create the whole testing process from scratch, knowing literally nothing about how to do it efficiently. In our case an additional question arose: how to do it right for a 24x7 system?

Fortunately we are not alone on the planet :). Here is an excellent video from Google which answers some of the questions above. Enjoy!

Wednesday, 18 April 2007

The first post

Hi!

My name is Yury. I'm a software engineer in one of the small post-Soviet countries. I really like to create software and hope to find some new friends sharing my passion through this blog. I have never run a blog before and this is my first try. So please be kind and patient. I'm planning to cover a wide range of topics and hope you will find some of the information useful.

Best regards,
Yury
