Fundamental performance and scalability concepts

Web application performance and scalability (2005)

Before exploring performance and scalability issues in web applications, it's worth taking a closer look at some universal concepts that apply equally across programming languages, permanent storage technologies and operating systems. In this chapter we'll explore the following concepts in detail:

Performance and scalability
Latency and throughput
Processes and threads
Concurrency and Parallelism
Bits, bytes and encodings
Sessions
Asynchronous design
Decoupling
Caching
Fault tolerance, replication and synchronization
Distributed computing

These concepts are recurring themes in the book, which is why it's important to familiarize yourself with them from the outset.

Performance and scalability

Performance is a web application's ability to execute at an acceptable level within a given time-span. This makes performance a relative term, since the term 'acceptable level' depends on the nature of a web application and its users.

A web application for a non-profit organization with a few dozen users is more tolerant in what it considers an acceptable level than a web application that processes financial securities orders in real-time for an equal number of users. Waiting two or three seconds to process a comment in a web application for a non-profit is surely not a cause for investing more resources to bring its response time down to one second or less. On the other hand, a web application taking two or three seconds to process financial securities orders is unacceptable to even a few dozen users -- since financial securities prices can change in real-time -- and in such cases enhancing an application's performance level is vital to fulfilling its purpose.

Web application performance is difficult to quantify because it depends on many variables, just like gas mileage performance on cars. A car's gas mileage performance can vary depending on driving conditions (city vs. highway), tire pressure, air intake, general maintenance, fuel quality, among other things. Similarly, a web application's two or three-second delay could be due to a web server, application business logic, permanent storage system, hosting provider's infrastructure or even an end user's Internet connection.

In a similar fashion, tackling a car's gas mileage performance on different fronts can have different cost-benefit ratios. An adjustment to tire pressure could yield a 0.5 miles per gallon performance boost, while better fuel quality could yield a 2.0 miles per gallon performance boost. Each change requires resources -- in either time or direct expenditure -- which may or may not be worth the cost.

Web applications are similar. You could invest considerable time re-factoring an application's business logic (i.e. code) only to get a negligible performance boost, while a minimal adjustment to a web server or permanent storage system could have a compounded effect on performance. So, just as with a car's gas mileage, changes to a web application require varying degrees of intervention which may or may not be worth the cost.

In essence, undertaking multiple changes in various tiers can boost web application performance. However, a cost-benefit analysis of each change is vital to ensuring the investment is worth the effort for a particular type of web application. Throughout the book you will learn several performance enhancing techniques for various tiers, all of which will help you gauge the cost-benefit ratios of incorporating such changes into web applications.

Scalability, on the other hand, is a web application's ability to adapt to growing demands without incurring major design changes. Scalability and performance are often associated, because performance issues are frequently the root cause of scalability issues.

If a web application is capable of serving twice as many users and performance remains the same, it's said that the web application is scalable, since no changes were necessary to serve a larger number of users. However, once performance starts to deteriorate -- on account of increasing demand -- it's said that the web application is not scalable.

A web application's scalability is also a relative term, since it depends on the amount of changes needed to support increasing demand. The greater the changes needed to support increasing demand, the less scalable a web application; by the same token, the fewer the changes required to support increasing demand, the more scalable a web application.

But much like the closely related topic of performance, scalability also has cost-benefit ratios you need to take into account. Applications designed from the outset with the most scalable choices can often be costly, especially when you consider increasing demand may never materialize. The flip side is that making heedless scalability choices at the outset can make a web application more costly to scale in the long run.

Cloud computing services & scalability

One of the main advantages of adopting cloud computing services -- such as Amazon's EC2, Google's App Engine or Microsoft's Azure -- is that they have built-in support for addressing many scalability issues, which effectively lowers the barriers and costs of building scalable web applications.

Instead of incorporating more scalable technologies as a web application grows -- which are difficult to streamline -- or provisioning a web application from the outset with scalable technologies that can go unused -- which are expensive -- a cloud computing service offers a 'pay as you grow' approach.

Its value proposition is that once you buy into a service, web applications scale with minimum effort compared to most non-cloud computing scalability techniques. Both cloud computing and non-cloud computing scalability techniques are highlighted throughout the book.

Latency and throughput

Latency is the delay incurred between a request and it being fulfilled by a responding party. In the context of performance and scalability, you will see the term latency applied to various areas that make up a web application, such as the following:

Latency as the time elapsed between a user's web browser making a request and receiving a response.
Latency as the time elapsed between an application making a request query to a permanent storage system and receiving a response.
Latency as the time elapsed between a permanent storage system writing data to a hard drive and receiving a confirmation response.
Latency as the time elapsed between a request made to physical memory for reading data and receiving a response.

As you can infer from these last examples, and depending on the circumstances, latency is measured in seconds, milliseconds or even nanoseconds. Much like performance, it's often the sum of latencies across a web application that makes latency noticeable. And in a similar fashion, there are varying cost-benefit ratios for latencies across application tiers that can make tackling certain latency bottlenecks more worthwhile than others.

Throughput expresses the number of requests a responding party can handle per unit of time. Latency and throughput have a similar relation to that of performance and scalability.

A web application capable of maintaining low latency for increasing requests has a high throughput -- just like a web application capable of maintaining a certain level of performance for increasing requests has better scalability. As a web application's latency increases, fewer requests are served in the same unit of time, thus reducing throughput.
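
The relation between latency and throughput can be sketched with a small measurement helper. The following Python snippet is only an illustration under simplifying assumptions: the names measure, fast_handler and slow_handler are hypothetical, requests are served one at a time by a single worker, and time.sleep stands in for real work.

```python
import time


def measure(handler, request_count):
    """Call handler request_count times, one after another.

    Returns (average latency in seconds, throughput in requests per second).
    """
    latencies = []
    started = time.perf_counter()
    for _ in range(request_count):
        request_started = time.perf_counter()
        handler()
        latencies.append(time.perf_counter() - request_started)
    elapsed = time.perf_counter() - started
    return sum(latencies) / len(latencies), request_count / elapsed


def fast_handler():
    time.sleep(0.01)  # hypothetical request that takes about 10 ms


def slow_handler():
    time.sleep(0.03)  # hypothetical request that takes about 30 ms


fast_latency, fast_throughput = measure(fast_handler, 5)
slow_latency, slow_throughput = measure(slow_handler, 5)
print(f"fast: {fast_latency:.3f} s per request, {fast_throughput:.1f} requests/s")
print(f"slow: {slow_latency:.3f} s per request, {slow_throughput:.1f} requests/s")
```

With a single worker, the handler with the higher latency necessarily serves fewer requests in the same unit of time, which is the drop in throughput described above.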

Throughput is also measured in various units -- not just requests per second -- depending on the medium. For permanent storage systems it's common to see throughputs expressed in reads per second, writes per second or queries per second; for web servers throughput is generally measured in requests per minute or second; for CPUs in cycles per second; whereas for hard drives in revolutions per minute.

Various parts of the book will use the terms latency and throughput to further explain performance and scalability techniques, so it's important you're comfortable with their definitions and their relation to one another.

Processes and threads

Understanding processes and threads plays an important part in optimizing a web application for high levels of performance and scalability. In addition, much of the supporting software used in web applications makes reference to terms like 'thread safe' -- such as a web framework -- or 'multi-threaded' -- such as a web server -- all of which make understanding processes and threads important.

Let's start by analyzing a process from the vantage point of an operating system (OS). Figure 1-1 illustrates the output of executing the ps aux command on a Linux server, which displays an OS's running processes. Figure 1-2, on the other hand, illustrates the task manager on a Windows server invoked with the keyboard combination Ctrl-Alt-Del, which is also used to display an OS's running processes.

Figure 1-1 - Linux running processes displayed using ps aux command

Figure 1-2 - Windows running processes displayed in task manager

Each of the lines in the previous figures represents a process. The processes run by an OS vary in nature. They range from administrative programs a user often isn't even aware of (i.e. doesn't use directly, but the OS requires to run properly) to programs belonging to a web application itself (e.g. web server, application run-times like Java/Python/PHP).

Each process takes up resources in an OS in the form of CPU and memory. If you look at the previous figures, you will note each line representing a process has these metrics displayed under the column headers CPU and Memory. Run-time environments like Java/Python/PHP or web servers often have predetermined amounts of memory assigned to their processes in order to guarantee a minimum level of performance. In fact, it's even possible to modify a process's scheduling priority using a command like nice on Unix/Linux OSs or using the context menu (i.e. right-mouse click) of a process on Windows OSs, so that a process gets higher or lower priority over CPU resources than other processes (e.g. on Unix/Linux a process with a lower nice value is favored by the scheduler, while a process with a higher nice value yields CPU time to other processes).
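
As a small illustration of scheduling priority, the following Python sketch uses the standard library's os.nice call, which is available on Unix/Linux only. The function name lower_own_priority is hypothetical. Keep in mind that a higher nice value means a lower priority, and that an unprivileged process can normally only raise its own nice value.

```python
import os


def lower_own_priority(increment=1):
    """Raise this process's nice value by increment (Unix/Linux only).

    Returns the nice value before and after the change.
    """
    before = os.nice(0)         # an increment of 0 just reports the current value
    after = os.nice(increment)  # a higher nice value means a lower CPU priority
    return before, after


before, after = lower_own_priority(1)
print(f"nice value before: {before}, after: {after}")
```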

Such resource assignments are one of the first things you need to understand when discussing processes, which also implies an OS has a finite amount of resources to execute processes at any given time. To layman users and even system administrators, processes are all that's running in an OS. However, as an application designer it's vital that you understand each of these processes is potentially composed of threads. In the simplest of terms, a thread is a process within a process. So even though an OS might represent a program as a single process, such a process might be composed of a pair or dozens of threads each performing a certain task.
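
To make the relation between a process and its threads more tangible, here is a minimal Python sketch using the standard threading module. The names report and run_threads are hypothetical. Each thread records the process ID it runs under along with its own thread identifier; a barrier keeps all threads alive at the same time so their identifiers can be compared.

```python
import os
import threading


def report(barrier, results, index):
    barrier.wait()  # wait until every thread has started
    results[index] = (os.getpid(), threading.get_ident())


def run_threads(count=4):
    barrier = threading.Barrier(count)
    results = [None] * count
    threads = [
        threading.Thread(target=report, args=(barrier, results, i))
        for i in range(count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


results = run_threads()
process_ids = {pid for pid, _ in results}
thread_ids = {ident for _, ident in results}
print(f"{len(thread_ids)} threads ran inside {len(process_ids)} process")
```

All the threads report the same process ID, which is why an OS process listing shows a single entry no matter how many threads the program uses.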

Under various circumstances thread usage often appears unwarranted, however the use of threads is unequivocally linked to performing more than one task simultaneously.

One of the classic examples of thread usage has to do with applications involving graphical user interfaces (GUI). Though unrelated to web applications and in this particular case having little to do with performance and scalability issues, these types of applications being popular are sure to make understanding the fundamentals of processes and threads easier.

A GUI application such as a video game generally involves a series of simultaneous tasks: an action figure moving across the screen; moving the mouse pointer so the action figure can select an item on the screen; pressing key combinations on the keyboard so the action figure moves faster; hearing background music in accordance with the actions performed by the action figure. That's four tasks in total.

If you look at the processes running on an OS, you won't see a process for the background music, another for the mouse actions, another for the keyboard actions and yet another for the action figure. You will see a single process. More importantly, each of these tasks is performed as a thread, because each task executes simultaneously.

But what happens if threads aren't used in a GUI application such as this? It's possible that while the action figure moves across the screen, the mouse pointer starts to sputter, the keyboard becomes unresponsive or you cease to hear the background music. Why? Since each task depends on the behavior of the action figure, each task would need to finish before another one could start. Even though it might be possible to avoid using threads -- incurring delays in terms of milliseconds between each task -- the eye can detect such behaviors more easily in a GUI, hence the reliance on threads.

Now let's take the case of an area that's closer to the realm of web applications, a web server. A web server receives incoming requests and dispatches the appropriate information out to requesting users, all in all a simple task.

However, if a web server runs as a single process, this requires that each incoming request take over the process up until sending the information to the requesting user. So what happens if a web server receives a sudden influx of 10 user requests? Each request has to wait until all preceding requests come to an end. Such an occurrence can immediately translate into a performance and scalability issue, with some users incurring greater latency than others.

The first web servers to appear in the market relied on multiple processes. This was an acceptable design trade-off at first, with each process attending incoming requests as they arrived. However, it soon became clear that a single web server process could easily attend various requests by relying on threads, sparing the extra load on a host OS required to run multiple web server processes, with each process requiring more resources.

This brought about multi-threaded web servers, which can serve more user requests per process than if a web server didn't rely on threads -- the last case being one request per process. I should also mention that even though multi-threaded web servers are capable of handling a greater number of user requests, it's still common to see multiple multi-threaded web server processes being run simultaneously to back high-traffic web applications. To this end, multi-threaded web servers have not entirely supplanted the practice of running multiple web server processes; multi-threading has just augmented the request capacity handled per web server process.
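
The difference between a single-threaded and a multi-threaded server can be sketched in a few lines of Python. This is not a real web server: handle_request, serve_serially and serve_with_threads are hypothetical names, and time.sleep stands in for the time a request spends waiting on I/O. The sketch handles the same 10 requests first one at a time and then with a pool of 10 threads inside a single process.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def handle_request(request_id, work_seconds=0.05):
    time.sleep(work_seconds)  # stands in for reading a file or querying storage
    return f"response {request_id}"


def serve_serially(request_ids):
    # Each request waits for all preceding requests to finish.
    return [handle_request(request_id) for request_id in request_ids]


def serve_with_threads(request_ids, workers=10):
    # One process, several threads: requests are handled simultaneously.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, request_ids))


def timed(function, *args):
    started = time.perf_counter()
    result = function(*args)
    return result, time.perf_counter() - started


request_ids = list(range(10))
serial_result, serial_time = timed(serve_serially, request_ids)
threaded_result, threaded_time = timed(serve_with_threads, request_ids)
print(f"serial: {serial_time:.2f} s, threaded: {threaded_time:.2f} s")
```

Both approaches return the same responses, but in the serial case the last request waits for the nine requests before it, while in the threaded case all ten wait at roughly the same time.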

Thread safety - Shared memory and non shared memory

Another aspect associated with threads is thread safety, a concept that is especially important if you will be the designer of an application using threads. Since threads execute simultaneously, this can have adverse consequences on the data on which threads work.

If proper care is not taken, a thread can inadvertently modify or read data while another thread is in the process of modifying or reading the same data. Hence the term thread-safe, indicating that threads are safe from interfering with one another.

The process of achieving thread safety -- and as a consequence designing multi-threaded applications -- is highly specific to the programming language you use. However, there are two broad groups in which to classify threading techniques: shared memory and non shared memory. Shared memory basically stands for shared data, but given that threads work on data residing in memory, the term memory is common.

Thread safety is more difficult to achieve using shared memory due to the inherent problems of multiple threads attempting to work on the same data. These problems can include:

Race conditions.- Occur when two threads access shared data simultaneously. Both threads read the same value, then each thread carries out the operation on the data. Next, both threads race to write the new value for the data. This is problematic because the operation performed on the data by one of the threads is effectively lost. Only the value written by the last thread is preserved.
Deadlocks.- Threads use locks to prevent race conditions and avoid simultaneous access. However, deadlocks can also occur under such circumstances. Deadlocks occur when two threads attempt to lock common shared data. If thread 1 requires locking variables A & B and thread 2 requires locking variables B & A, in the first step thread 1 locks variable A and thread 2 locks variable B, at which time a deadlock occurs. Thread 1 is not able to continue because it cannot lock variable B -- since it's locked by thread 2 in step one. Similarly, thread 2 is not able to continue because it cannot lock variable A -- since it's locked by thread 1 in step one.
Priority failures.- Occur when a group of threads executes in such a way that the work performed on data by certain threads isn't performed in time to make it useful to other threads in the group. An opposite condition to priority failures can occur when a thread is given such a high priority that it 'starves' other threads from finishing their work. In thread parlance this is resource starvation.
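
Two of the problems above can be illustrated with Python's standard threading module. This is a sketch under stated assumptions, not a complete recipe: all names are hypothetical. The first part guards a shared counter with a lock so that the read-modify-write sequence of one thread cannot be interleaved with that of another (the race condition case). The second part has every thread acquire lock_a before lock_b, a common convention that avoids the circular wait described in the deadlock case.

```python
import threading

counter = 0
counter_lock = threading.Lock()


def add_many(times):
    global counter
    for _ in range(times):
        with counter_lock:  # only one thread at a time may update counter
            counter += 1


lock_a = threading.Lock()
lock_b = threading.Lock()
shared = {"a": 0, "b": 0}


def update_both(delta_a, delta_b):
    # Every thread takes lock_a first and lock_b second, so no thread
    # can hold lock_b while waiting for lock_a.
    with lock_a:
        with lock_b:
            shared["a"] += delta_a
            shared["b"] += delta_b


def run(jobs):
    threads = [threading.Thread(target=target, args=args) for target, args in jobs]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


run([(add_many, (10_000,)), (add_many, (10_000,))])
run([(update_both, (1, 2)), (update_both, (3, 4))])
print(counter, shared)
```

Without counter_lock, some increments could be lost, because two threads may read the same value before either writes its result back.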

Programming languages that rely on shared memory to use threads are subject to the previous problems. No matter what type of constructs or keywords a programming language offers to use threads, it's only design experience that can help ensure thread-safety in shared memory designs.

An alternative to using threads with shared memory is to use threads with non shared memory, which means threads not sharing data between one another. In such cases, design experience shifts to avoiding threads that need to work on the same data or relying on programming languages that enforce such a design.

Creating threads without the need to work on the same data (i.e. non shared memory) is no simpler than creating thread-safe designs using threads that work on the same data (i.e. shared memory). For example, listing 1-1 illustrates the concept of shared memory design in pseudo-code.

Listing 1-1 - Pseudo-code illustrating shared memory design with global variable

variable  x=0 

// Thread-safety construct needed in method
function computationA() { 
     x = 1
}

// Thread-safety construct needed in method
function computationB() { 
    x = 2 
}

// Thread-safety construct needed in method 
function computationC() {
    x = 10 
}

Thread.call(computationA)
print x 
Thread.call(computationB)
print x
Thread.call(computationC)
print x

Listing 1-1 illustrates a particularly common pattern in most programming languages, three functions that do operations on a global variable. Global variables are a prime example of a shared memory scenario. Under most circumstances, global variables are more practical and quicker to make, which is why they are widely used. For the purposes of this discussion, their most serious drawback appears when using threads.

By calling the three functions in a serial way -- no threads -- there is no possibility for the global variable to end up with a conflicting value. The serial execution order would consist of calling function A, then function B, followed by function C. The corresponding printed values of x would be 1, 2 and 10.

However, by using threads, there is no guarantee that threads will be executed in the order A, B and then C. Remember, the purpose of threads is to execute them simultaneously, so it's possible for function B to take longer to finish than function C. So what would be the printed value of x using threads? Undetermined, unless you rely on a programming language's constructs to make sure the global variable is synchronized and not prone to the shared memory conflicts mentioned earlier (i.e. race conditions, deadlocks, etc.).

Depending on the programming language, this process can consist of special keywords or designated blocks to ensure thread safety. In essence, you are free to use threads, but the only safety net you have to enforce thread safety is through your design experience.

Such design techniques -- many of which vary depending on the programming language -- are wide in scope, but I will mention a few in pseudo-code so you can gain a better understanding of them. Listing 1-2 illustrates a modified version of the same shared memory design but without using a global variable.

Listing 1-2 - Pseudo-code illustrating modified shared memory design without global variable

function computationA(variable  x) {
     x = 1 
     return x
}

function computationB(variable  x) {
     x = 2
     return x
}

function computationC(variable  x) {
     x = 10 
     return x
}

variable  x=0 
print Thread.call(computationA(x))
print Thread.call(computationB(x))
print Thread.call(computationC(x))

This last listing relies on functional design. The same three functions and threads are still used, but notice how the functions have no side-effects, which is to say none of the functions modifies data outside of its scope. The only data modification that takes place is in each function, applied to the input value and returned once modified.

Even though the execution order of threads is not guaranteed -- the printing order for each thread can vary -- unlike listing 1-1 there is no danger of one thread unintentionally overwriting the value of x over another. A call to function A always returns 1, a call to function B always returns 2 and a call to function C always returns 10.
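
A rough Python counterpart to listing 1-2 could look like the following sketch, which uses the standard concurrent.futures module. The function names mirror the pseudo-code and are otherwise hypothetical. Each function only rebinds its own local parameter, so the x defined outside the functions is never touched, no matter in which order the threads run.

```python
from concurrent.futures import ThreadPoolExecutor


def computation_a(x):
    x = 1  # rebinds the local name only
    return x


def computation_b(x):
    x = 2
    return x


def computation_c(x):
    x = 10
    return x


x = 0
computations = {"A": computation_a, "B": computation_b, "C": computation_c}

with ThreadPoolExecutor(max_workers=3) as pool:
    # Submit all three computations; they may run in any order.
    futures = {name: pool.submit(function, x) for name, function in computations.items()}
    # Collect each result by name, independent of completion order.
    results = {name: future.result() for name, future in futures.items()}

print(results, x)
```

Because results are collected from each future by name rather than read from a shared variable, the outcome is the same on every run.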

Thus functional design makes ensuring thread-safety easier. Not a guarantee by any means, since guaranteeing the execution order of threads still requires some type of synchronization construct. But as you can attest, functional design makes applications less convoluted, easier to debug in case threading problems arise and less prone to the shared memory threading problems discussed earlier.

The degree of functional design is often associated with programming languages. Some programming languages have inherently more functional behavior than others. Functional programming languages such as Haskell make it easier to create designs like the one in listing 1-2 due to the language's rules -- you effectively gain crutches or a safety net -- whereas languages like Java or Python require more design forethought to use functional design.

Programming languages also rely on other alternatives to make executing multiple threads simultaneously easier. One particular approach that builds on non shared memory principles is message passing. In programming languages using this approach -- such as Erlang -- there is no concept of 'threads' per se, but rather each task is a self-contained process. If communication is necessary between processes (i.e. 'threads'), message passing between processes guarantees both data integrity and orderly execution (i.e. 'thread-safety'). Listing 1-3 shows the pseudo-code for this design.

Listing 1-3 - Pseudo-code illustrating message passing between processes

Ping(N, Process_Pong)
     SEND_MESSAGE (Ping, self.Process) TO Process_Pong
     WAIT_ON_MESSAGE_TO_MATCH:
              Pong(): 
                   print "Ping() received pong"
     END_ON_MESSAGE_TO_MATCH
     IF N=0: 
        SEND_MESSAGE Finished TO Process_Pong
        print "Ping() finished"
     ELSE: 
        Ping(N-1, Process_Pong)

Pong()
     WAIT_ON_MESSAGE_TO_MATCH:
               Finished:
                    print "Pong() finished"
               (Ping, Process_Ping): 
                    print "Pong() received ping"
                    SEND_MESSAGE  Pong() TO Process_Ping
                    Pong()
     END_ON_MESSAGE_TO_MATCH

Process_Pong = Create(Pong())
Create( Ping(3, Process_Pong) )

When you look at this pseudo-code listing don't think of it as functions -- even though it has this structure -- but rather as processes. Notice how message passing enforces functional style by inputting not just a variable, but the entire process (i.e. 'thread') representing the task. The events taking place in this pseudo-code are the following:

1.- The creation of a Pong() process (Line 23) assigned to the variable Process_Pong.
2.- The Pong() process (Line 13 to 21) doesn't do anything until it receives a message (Line 14), so it stands-by.
3.- The creation of a Ping() process (Line 24) assigned to two input variables, a 3 and Process_Pong -- the last of which represents the Pong() process (Line 13 to 21).
4.- The Ping() process (Line 1 to 11) immediately sends the message (Ping, self.Process) (Line 2) to the Pong() process via the local Process_Pong reference. Where self.Process represents the Ping() process. The Ping() process will then wait (Line 3) until it receives a message.
5.- Since the Pong() process is waiting for a message (Line 14), the (Ping, Process_Ping) message sent in the last step will result in a match (Line 17).
6.- Upon matching (Line 17), the message 'Pong() received ping' is printed (Line 18), the message Pong() is sent to the Ping() process via the local Process_Ping reference (Line 19), and the Pong() process is re-called (Line 20) so it will once again wait for a message (Line 14).
7.- Since the only message the Ping() process is waiting for is Pong(), a match occurs (Line 4).
8.- Upon matching, the message 'Ping() received pong' is printed (Line 5).
9.- Then a conditional based on the N variable occurs (Line 7). If N is different from 0, the Ping() process executes with N-1 (Line 11). This re-initiates the sequence starting from step 4.
10.- Once N equals 0, the Pong() process receives message Finished via the local Process_Pong reference (Line 8), in addition to the 'Ping() finished' message being printed (Line 9).
11.- Since among the messages the Pong() process is waiting for is Finished (Line 15), a match occurs, resulting in the 'Pong() finished' message being printed (Line 16).

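This sequence of events results in messages being printed in the following order. Note that because the check on N takes place after each exchange (Line 7), starting the Ping() process with a value of 3 produces four exchanges -- one each for N equal to 3, 2, 1 and 0:

      Pong() received ping
      Ping() received pong
      Pong() received ping
      Ping() received pong
      Pong() received ping
      Ping() received pong
      Pong() received ping
      Ping() received pong
      Ping() finished
      Pong() finished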

In this case, data integrity and orderly execution (i.e. 'thread safety') guarantees are due to the message passing design. Each process waits for certain messages before it proceeds with its execution, ensuring an orderly sequence. In addition, integrity is also assured because a process is self-contained. A process cannot randomly access data in another process -- as in threaded shared memory designs -- the only way to communicate with data in another process is by passing messages to it.
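
The same exchange can be approximated in Python with threads that communicate only through queues. This is a loose translation of listing 1-3 and not how a language like Erlang implements processes: the queue.Queue objects act as hypothetical mailboxes, loops replace the recursive calls, and the log list exists only to collect the printed messages for inspection.

```python
import queue
import threading

log = []  # used only to record the output, not to exchange data


def pong(inbox):
    while True:
        message = inbox.get()  # block until a message arrives
        if message == "finished":
            log.append("Pong() finished")
            return
        _, reply_to = message  # a ("ping", sender mailbox) pair
        log.append("Pong() received ping")
        reply_to.put("pong")


def ping(n, inbox, pong_inbox):
    while True:
        pong_inbox.put(("ping", inbox))
        inbox.get()  # block until the pong reply arrives
        log.append("Ping() received pong")
        if n == 0:
            pong_inbox.put("finished")
            log.append("Ping() finished")
            return
        n -= 1


pong_inbox = queue.Queue()
ping_inbox = queue.Queue()
pong_thread = threading.Thread(target=pong, args=(pong_inbox,))
ping_thread = threading.Thread(target=ping, args=(3, ping_inbox, pong_inbox))
pong_thread.start()
ping_thread.start()
ping_thread.join()
pong_thread.join()
print("\n".join(log))
```

Starting with a value of 3 yields four ping/pong exchanges, since the check on n happens after each exchange. The order of the exchanges is fixed by the messages themselves. Only the two final 'finished' lines may swap places, because nothing forces one thread to log before the other at that point.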

This approach is substantially different from that of more mainstream languages like Java or Python, in part because it requires thinking of a problem in another set of abstractions, more functional in nature, which may not be easy depending on the problem an application is trying to solve. Still, this is just another alternative for executing multiple 'threads' simultaneously in a safe way.

Another mechanism for achieving thread-safe design is the Actor model. The Actor model operates on the same principles as message passing. Languages like Java and Python have libraries to support this type of design. Under such circumstances, tasks are not 'threads' but rather Actors, with each Actor being self-contained and relying on message passing to communicate safely with other Actors.

The Actor model is a special case of safely executing multiple tasks simultaneously. Because the Actor model is a general-purpose model of computation rather than a feature of one language, you are likely to find libraries for many programming languages that allow you to use this type of design.

In addition to these approaches, programming languages also have special APIs for dealing with these same problems. For example, more recent Java versions offer an API called the fork-join framework, which provides a set of features for tackling these same issues.

Future sections of the book will explore a series of these approaches using specific programming languages. For the moment, being aware of the difference between processes and threads, as well as the approaches and difficulties of making threads -- or similar abstractions (e.g. Actors) -- execute simultaneously, should suffice. In the next section, I will discuss two topics related to threads that illustrate the importance and need of threads in the context of performance and scalability.

Concurrency and Parallelism

Concurrency is executing simultaneous tasks. As you learned in the last section, threads offer a means to do this. Whether it's the case of a GUI application using threads to improve responsiveness or a web server attending a greater number of requests, threads allow the execution of simultaneous tasks.

However, in addition to the different approaches (i.e. shared memory or non shared memory) you can use to incorporate threads in a web application's design, there is the issue of the resources needed to execute threads. You're already aware thread design requires contemplating memory resources. But in addition to memory resources, a thread can also need other resources like a CPU or data stored on a permanent storage system to complete its logic.

Like any other resource, CPUs and a permanent storage system (i.e. its hard drives) are available in limited quantities to a web application, something that can eventually become a performance and scalability concern.

If a web application is capable of running concurrent threads and the tasks performed by each thread need a substantial amount of CPU cycles or the retrieval of data, you can have a problem. For if threads only have access to a single CPU or a single access point is available to get data, it won't matter how many threads an application uses, latency and throughput will suffer since each thread competes for the same resource.

The solution to this problem is to increase the number of CPUs or access points for stored data (i.e. hard drives), therefore allowing parallel execution of an application's tasks. Parallelism thus consists of executing simultaneous tasks, leveraging more than a single resource point.
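
A back-of-the-envelope model helps show why adding resource points matters more than adding threads for CPU-bound work. The Python sketch below rests on idealized assumptions that rarely hold exactly in practice: tasks are independent, each takes the same CPU time, and scheduling overhead is ignored. The function name ideal_elapsed is hypothetical.

```python
import math


def ideal_elapsed(tasks, seconds_per_task, cpus):
    """Idealized wall-clock time for equal, independent, CPU-bound tasks.

    With fewer CPUs than tasks, tasks run in 'rounds' of at most cpus
    tasks each, however many threads the application creates.
    """
    rounds = math.ceil(tasks / cpus)
    return rounds * seconds_per_task


for cpus in (1, 2, 4, 8, 16):
    print(f"{cpus:>2} CPU(s): {ideal_elapsed(8, 2.0, cpus):.1f} s for 8 tasks")
```

Under these assumptions eight 2-second tasks take 16 seconds on one CPU and 2 seconds on eight CPUs. A sixteenth CPU brings no further gain, since there are only eight tasks to run.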

Generally it's an operating system or run-time environment that enforces the act of parallelism. For example, in a CPU intensive multi-threaded application, the operating system or run-time environment delegates threads to whatever CPUs are available at any given moment. In such cases, since CPUs offer the same capabilities, threads are executed in parallel on a first come, first served basis.

Parallelism performed on data access points is a little different. It presents some of the same problems as shared memory threading, discussed earlier. Since parallel data consists of reading or writing to multiple locations (i.e. hard drives), it's essential that a synchronization process be established, ensuring the data in one location is the same as other locations. An upcoming section describes this synchronization process along with the related concept of replication.

Since parallelism implies taking advantage of multiple resources -- whether CPUs or data access points -- it's a term closely related to concurrency. The logic behind this relation is simple: concurrency implies simultaneous execution and parallelism the usage of multiple resources, so what better way to enhance performance and scalability than to exploit multiple resources simultaneously ?

Throughout the book you will see the terms concurrency and parallelism used in a variety of areas, so understanding their meaning is important for applying a series of performance and scalability techniques.

Bits, bytes and encodings

Technology is constantly evolving to address limits imposed by what's considered mainstream design. In software, there is no clearer pattern of this than the use of 8-bit, 16-bit, 32-bit, 64-bit and 128-bit technology. Its usage influences everything from character encodings, physical memory limits (RAM), CPUs, screen resolutions and permanent storage systems up to the operating system itself and, as you'll likely and correctly assume, an application's performance and scalability.

A bit is the basic unit used to express binary state; it can have a value of either 0 or 1. By using several bits, it's possible to represent an increasing number of states, which can then be used to represent things like integers, colors and memory addresses, to name a few areas. Table 1-1 shows a series of bit ranges and the number of states they can represent.

Table 1-1 - Bit ranges and number of states

Bit range	States exponential	States
8-bits	2^8	256
16-bits	2^16	65,536
32-bits	2^32	4,294,967,296
64-bits	2^64	18,446,744,073,709,551,616
128-bits	2^128	340,282,366,920,938,463,463,374,607,431,768,211,456

If you look closely at the number of states in each bit range, they are likely to look familiar as limits on a variety of software related topics. An 8-bit color palette will consist of 256 colors, the limit to 16-bit integers is 65,536 values, 32-bit systems will have 4,294,967,296 memory addresses, and so on.


	You may see some of the states in table 1-1 expressed minus 1 (e.g. 255, 65,535, 4,294,967,295); this is due to counting from 0, which makes the highest value one less than the number of states.
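The number of states for any bit range can be checked directly, since it is 2 raised to the number of bits. A minimal Python sketch (the function name is illustrative):

```python
def states(bits: int) -> int:
    """Number of distinct states representable with the given number of bits."""
    return 2 ** bits

for bits in (8, 16, 32, 64, 128):
    print(f"{bits}-bits: {states(bits):,} states, highest value {states(bits) - 1:,}")
```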

In the area of performance and scalability, having a 32-bit OS vs. a 64-bit OS or running a 32-bit application vs. a 64-bit application can have wide implications. These issues can range from the more widely known physical memory limits, a file's largest size, to performance related topics having to do with an application's run-time (e.g. Java). Where pertinent, each of the book's sections will elaborate on x-bit architectures and their influence on performance and scalability.

In addition to bits, many system metrics are also expressed in bytes. For the topics presented in this book, there isn't a need to detail any deeper correlation between bits and bytes, except that a byte consists of 8 bits. For expressing larger quantities of bytes though -- such as those related to physical memory, file sizes or bandwidth -- there is a certain notation used throughout the book that is best explained here.

The technology sector has long relied on the metric system to express quantities in bytes. You will often hear the terms kilobyte, megabyte and gigabyte to express memory, storage capacities and bandwidth. Such prefixes in the metric system represent the following numbers: kilo=1000, mega=1000^2 and giga=1000^3.

However, in technology related areas saying that a kilobyte equals 1000 bytes is often incorrect, it's generally 1024 bytes. The reason behind this somewhat awkward math is that technology relies on expressing binary states, which are either 1s or 0s. This creates a base-2 notation system, where a kilo expresses 2 to the power 10 (2^10), which in fact equals 1024.

Many in the technology sector have long been aware of this math; however, this is not to say it isn't cause for confusion. Storage systems often express 1000 byte files as 1 kilobyte -- which is linguistically correct. But is a 1 kilobyte file of the 1000 byte (decimal) kind ? Or of the 1024 byte (binary) kind ? There is no clear answer of course, with larger quantities leading to potentially larger gaps. Even though in areas like physical memory it's assumed 1 kilobyte is always 1024 bytes, the definition is still ambiguous -- after all, linguistically kilo is 1000.
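To see how the gap widens with larger quantities, this small Python sketch compares the binary and decimal readings of the same prefix. The function name is an illustrative choice.

```python
def binary_vs_decimal_gap(power: int) -> float:
    """Percentage by which 1024**power exceeds 1000**power."""
    return (1024 ** power / 1000 ** power - 1) * 100

for name, power in (("kilo", 1), ("mega", 2), ("giga", 3), ("tera", 4)):
    print(f"{name}: binary reading is {binary_vs_decimal_gap(power):.1f}% larger")
```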

The gist of the matter is that the technology sector has relied on metric system prefixes to express binary quantities as a convenience. This has long required a mental note on the fact that kilo in most technology contexts is not 1000 but rather 1024, the same applying for the remaining metric system prefixes.

In 1998 to address this issue, the International Electrotechnical Commission (IEC), endorsed by Institute of Electrical and Electronics Engineers (IEEE) and the International Committee for Weights and Measures (CIPM) defined a special notation by which to express binary prefixes. Table 1-2 shows a list of decimal and binary prefixes, including their abbreviations.

Table 1-2 - Decimal and binary prefixes, including abbreviations

Prefix	Quantity in units (bytes or bits)	Expression/Abbreviation for bytes	Expression/Abbreviation for bits*
Decimal
Kilo	1000	Kilobyte/KB	Kilobit/Kb
Mega	1000^2	Megabyte/MB	Megabit/Mb
Giga	1000^3	Gigabyte/GB	Gigabit/Gb
Tera	1000^4	Terabyte/TB	Terabit/Tb
Peta	1000^5	Petabyte/PB	Petabit/Pb
Exa	1000^6	Exabyte/EB	Exabit/Eb
Zetta	1000^7	Zettabyte/ZB	Zettabit/Zb
Yotta	1000^8	Yottabyte/YB	Yottabit/Yb
Binary
Kibi	1024	Kibibyte/KiB	Kibibit/Kib
Mebi	1024^2	Mebibyte/MiB	Mebibit/Mib
Gibi	1024^3	Gibibyte/GiB	Gibibit/Gib
Tebi	1024^4	Tebibyte/TiB	Tebibit/Tib
Pebi	1024^5	Pebibyte/PiB	Pebibit/Pib
Exbi	1024^6	Exbibyte/EiB	Exbibit/Eib
Zebi	1024^7	Zebibyte/ZiB	Zebibit/Zib
Yobi	1024^8	Yobibyte/YiB	Yobibit/Yib
*Note the only difference in abbreviations between bit and byte expressions is that byte expressions use uppercase (B) and bit expressions use lowercase (b).

As you can see in table 1-2, these binary prefixes effectively clear the ambiguous nature of metric prefixes, making 1000 bytes a kilobyte and 1000^2 bytes a megabyte, while 1024 bytes are a kibibyte, 1024^2 bytes a mebibyte, and so on.
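The two notations can be sketched as a small formatting helper. This is only an illustration; the function name, the two-decimal rounding and the cut-off at the tera/tebi prefixes are assumptions made for brevity.

```python
DECIMAL = ("B", "KB", "MB", "GB", "TB")
BINARY = ("B", "KiB", "MiB", "GiB", "TiB")

def format_bytes(size: int, binary: bool = True) -> str:
    """Express a byte count with binary (1024-based) or decimal (1000-based) prefixes."""
    base, units = (1024, BINARY) if binary else (1000, DECIMAL)
    value = float(size)
    for unit in units[:-1]:
        if value < base:
            return f"{value:.2f} {unit}"
        value /= base
    return f"{value:.2f} {units[-1]}"

print(format_bytes(1_000_000, binary=False))  # decimal reading of one million bytes
print(format_bytes(1_000_000))                # binary reading of the same quantity
```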

The problem is that even though these terms clear the ambiguous nature of metric based prefixes in technology, metric based prefixes are -- like anything related to standards -- so entrenched in the technology sector that transitioning to binary prefixes is slow.

However, the book will use these relatively new -- or rather slowly adopted -- binary prefix notations for expressing bit and byte quantities where applicable.

Another fundamental topic in web applications is encodings. In fact, not only are encodings relevant to performance and scalability topics, they're also an important topic for web application usability. If different encodings are used or required throughout a web application, it can make a web application unusable or require considerable efforts to deal with encoding conversions.

There are many types of encodings, whether you're dealing with images, sound clips, video clips or character data (i.e. letters, numbers and symbols).

When an image, audio clip or video clip is used with a web application, it's encoded from its original form (camera, microphone, DVD). When a web application is accessed, the receiving party decodes the image, audio clip or video clip in order to view it or listen to it. This act of encoding and decoding has led the available mechanisms for this process to be called codecs -- the combination of coder-decoder.

There are well over 100 codecs available for images, audio clips and video clips, with their differences ranging from compression ratios to licensing-royalty schemes. In web applications though, there's a set of popular codecs used in the majority of cases. For images common codecs include JPEG, GIF, and PNG, for audio clips common codecs include AAC, Vorbis, & WMA and for video clips common codecs include H.264, VP8 and WMV.

Some of these codecs are likely to sound familiar to you, but others may not. Image codecs are the more widely known because they're also used to classify image types. But have you ever heard Vorbis audio clips ? Or seen H.264 video clips ? You probably have, but as a Quicktime audio clip or Flash video clip, respectively.

The reason audio and video codecs are not as widely known is that audio clips and video clips are distributed in container formats (e.g. Quicktime, Flash). Container formats are not as important to performance issues as codecs -- container formats are used for adding things like meta-data (subtitles, menus) and frame/bit rates to both types of media clips. For this reason, further discussions in the book on audio clips and video clips are based on codecs. Part II of the book discusses detailed performance considerations for image, audio and video codecs.

Character data which consists of the letters, numbers and symbols used in web applications also uses an encoding process similar to images, audio clips and video clips. Character data is encoded and decoded in several parts of a web application.

When you write character data in a text editor or IDE for a web application, it's encoded. When a user's browser receives a web application's content, it's decoded. When data is saved to a web application's permanent storage system (e.g. RDBMS), it's encoded. When a web application's business logic code reads data from a permanent storage system, it's decoded. This same encoding and decoding process can take place at several other parts of a web application that require reading and writing character data.
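A minimal Python sketch of one such round trip follows. The sample text is arbitrary and UTF-8 is used only as an example encoding; the same two calls apply to any encoding a platform supports.

```python
text = "R\u00e9sum\u00e9 for Jos\u00e9"   # "Résumé for José"

encoded = text.encode("utf-8")      # characters -> bytes (e.g. before saving or sending)
decoded = encoded.decode("utf-8")   # bytes -> characters (e.g. after reading or receiving)

print(encoded)
print(decoded == text)
```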

But unlike images, audio clips and video clips that use the term codecs to refer to the encoding process, character data encoding operates on the basis of character encodings, also called character sets, charsets, character maps or charmaps.

Similar to codecs, there are more than 50 character encodings to choose from. The differences in character encodings are due to different representations of letters, numbers and symbols between countries (e.g. Germany, Japan, France and China use certain character representations in their languages not used in languages in other countries), in addition to manufacturers designing software for a particular country market. But similar to a certain group of codecs popular in most web applications, there's also a set of popular character encodings for web applications that include: ASCII, ISO-8859-1, Windows-1252 and UTF-8.

In terms of performance, character encodings are important because they influence the space needed to represent characters. Some character encodings are known as single-byte encodings, while others are multi-byte encodings. Of the multi-byte kind, variable-width encodings are the more versatile, since they can represent characters using 1-byte, 2-bytes or more bytes depending on the character they're trying to represent.

Choosing a web application's character encoding solely on the basis of performance (i.e. the amount of space it uses to represent characters) is generally a bad idea, since it can seriously impact a web application's usability. With variable-width encodings, you can choose a character encoding that's versatile to single or multiple bytes, which is often a better tradeoff -- favoring usability -- than one of strictly following byte efficiency by using single-byte encodings. To understand this, it's necessary to describe the consequences of using single-byte character encodings in a web application.

If you write web applications targeting a particular region in the world, single-byte character encodings like ISO-8859-1, Windows-1252 and even ASCII, can represent the most efficient character encodings. However, even though you're getting performance efficiency by using a single byte to represent each character, you're limiting a web application's usability because single-byte encodings are only capable of representing up to 256 characters (1 byte = 8 bits = 2^8 positions = 256 characters).
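The 256 character ceiling is easy to observe. The following Python sketch, with a hypothetical helper name, checks whether a piece of text can be represented at all in a given encoding.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character in text has a position in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(can_encode("cafe", "ascii"))               # plain Latin letters
print(can_encode("caf\u00e9", "ascii"))          # é is outside ASCII's 128 positions
print(can_encode("caf\u00e9", "iso-8859-1"))     # é has a position in ISO-8859-1
print(can_encode("\u65e5\u672c", "iso-8859-1"))  # 日本 has no position in 256 slots
```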

ASCII, which was one of the earliest character encodings, in fact only uses 7 bits out of the possible 8 bits in a byte to represent characters. The reasoning behind this logic in ASCII -- made around the 1960's -- was that when you counted all the possible alphanumeric characters used by computers, the letters A to Z upper and lower case, the numbers 0 to 9 and special characters like %, *, ? among other things, the sum came to less than 100 characters. Given that 7 bits or 2^7 positions equals 128, 7-bits for character representations was enough, leaving the remaining 8th bit as a parity bit to detect transmission errors -- this again was the 1960's when network transmissions were in their infancy and not very reliable.
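To make the role of the 8th bit concrete, here is a small Python sketch that attaches a parity bit to a 7-bit ASCII code. Even parity is assumed purely for illustration (the text does not say which scheme was used), and the function name is hypothetical.

```python
def with_even_parity(code: int) -> int:
    """Place an even-parity bit in the 8th (highest) bit of a 7-bit ASCII code."""
    if not 0 <= code < 128:
        raise ValueError("ASCII codes fit in 7 bits (0-127)")
    parity = bin(code).count("1") % 2   # 1 when the count of 1 bits is odd
    return code | (parity << 7)

print(format(with_even_parity(ord("A")), "08b"))  # 'A' = 1000001, two 1 bits
print(format(with_even_parity(ord("C")), "08b"))  # 'C' = 1000011, three 1 bits
```

A receiver that counts an odd number of 1 bits in a byte knows at least one bit was altered in transit.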

ASCII served its purpose until computers needed to represent more characters. This brought the need to either make use of the 8th bit in ASCII to increase the total number of character representations in a byte to 256 or use multi-byte character encodings to break the 256 limit on single-byte character encodings.

The first change came with single-byte/8-bit character encodings. Among these single-byte/8-bit character encodings came ISO-8859-1 and Windows-1252. With 256 positions available to represent characters -- double the amount available in ASCII -- many new characters could be supported. The British were now able to represent their pound sign £ on their applications, the French their cedilla ç on their applications and the Spanish their vowels with accents á, é, í, ó, ú on their own applications.

Everyone happy, right ? Not exactly. Even though 256 characters were enough to accommodate web applications targeting British, French and Spanish citizens, web applications requiring to represent characters in Arabic, Hebrew or Nordic languages (e.g. Swedish, Danish, Norwegian), got squeezed out of this 256 character space. This led to multiple ISO-8859 regional character encodings: ISO-8859-6 supporting characters from the Arabic alphabet, ISO-8859-8 supporting Hebrew letters and ISO-8859-10 supporting Nordic languages, as well as the equivalent Windows character encoding variations, like Windows-1256 supporting Arabic characters and Windows-1255 supporting Hebrew characters.

This need to encode characters in a single byte leads to usability problems. For example, the 232nd position in the ISO-8859-1 character encoding represents the è character (Latin small letter e with grave accent). But since other languages need space for their own characters and can do without this particular character, the 232nd position in the ISO-8859-6 character encoding represents the و character (Arabic Letter waw), whereas the 232nd position in the ISO-8859-8 character encoding represents the ט character (Hebrew letter tet) and the 232nd position in the ISO-8859-10 represents the č character (Grapheme or latin c with háček).
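The collision at position 232 can be reproduced with Python's built-in codecs, decoding the very same byte under each of the four encodings:

```python
raw = bytes([232])  # a single byte holding position 232 (0xE8)

for encoding in ("iso-8859-1", "iso-8859-6", "iso-8859-8", "iso-8859-10"):
    print(encoding, "->", raw.decode(encoding))
```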

Care to guess what happens if a web application's permanent storage system uses the ISO-8859-1 character encoding to store data and the business logic code attempts to read it as Windows-1256 ? Or a web application's HTML content is written with the ISO-8859-6 character encoding but accessed with a browser incapable of detecting this encoding ? You'll see character data as either squiggles, question marks (e.g. �,�,�) or characters with different meaning (e.g. instead of an expected è character, you could see ט, و or č). Yes, you gain performance using a single byte to represent each character, but this is the potential usability penalty you pay for doing so.
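A short Python sketch of this mismatch follows. The helper name and sample word are illustrative; the helper stores text with one encoding and reads the bytes back with another, substituting a replacement mark for bytes the reader can't interpret.

```python
def read_as(text: str, stored_with: str, read_with: str) -> str:
    """Encode text with one encoding, then decode the bytes with another."""
    return text.encode(stored_with).decode(read_with, errors="replace")

print(read_as("cr\u00e8me", "iso-8859-1", "iso-8859-1"))  # matching encodings
print(read_as("cr\u00e8me", "iso-8859-1", "iso-8859-8"))  # a Hebrew letter appears
print(read_as("cr\u00e8me", "iso-8859-1", "ascii"))       # a replacement mark appears
```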

Meanwhile, as this single-byte character encoding fragmentation process took place, users in other parts of the world like China, Japan and Korea found the idea of using a single byte to encode characters laughable. Languages like Chinese, Japanese and Korean rely heavily on ideographs -- symbols representing the meaning of a word, not the sounds of a language -- so single-byte character encodings limited to representing 256 characters prove inadequate for these languages, which can have thousands of characters. These special symbols used in Chinese, Japanese and Korean are often called CJK characters, with CJK being an acronym made up from each language's initial. To address such needs, web applications targeting Chinese, Japanese and Korean speakers need to use multi-byte character encodings.

This brought about a series of multi-byte character encodings, such as ISO-2022-JP and Shift-JIS to support Japanese, ISO-2022-KR and KS X 1001 to support Korean, as well as GB-2312 and GBK to support Chinese. By using multiple bytes, an enormous number of positions are made available for character representations (e.g. 2 bytes with 8 bits each have a potential for 2^16=65,536 character representations and 3 bytes with 8 bits each have a potential for 2^24=16,777,216 character representations), more than enough to satisfy thousands of ideographs in Chinese, Japanese and Korean.

But would you dare guess what happens if a web application's permanent storage system uses the ISO-2022-JP character encoding to store data and you use a run-time (e.g. Java, Python, Ruby) built using ISO-8859-1 to read this data and do a business process ? Or a web application's HTML content is written with the GBK character encoding but accessed with a browser in Europe incapable of detecting this encoding ? You'll be back to seeing squiggles, question marks or characters with different meaning. This is because multi-byte character encodings also define meanings for each position in their bytes (e.g. the 232nd position in a multi-byte character encoding is used to represent a meaningful character to that particular encoding, like a kanji, hiragana or katakana character, similar to how the 232nd position in an Arabic focused character encoding is used to represent an Arabic letter or the 232nd position in a Hebrew focused character encoding is used to represent a Hebrew letter).

So is there a solution to all this character encoding madness ? Yes, it's called Unicode, a standard that assigns a unique number to every character and whose encodings are among the leading ones used in software. There are actually several Unicode encodings, but for web applications UTF-8 is the dominant choice. UTF-16 and UTF-32 are the other Unicode encodings, but UTF-16 is primarily used in run-time platforms like Java and Windows OS, whereas UTF-32 is used in several Unix OS. The differences consist of UTF-8 using one to four 8-bit bytes to represent characters, UTF-16 using one or two 16-bit code units to represent characters and UTF-32 using a single 32-bit code unit to represent characters.
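The size differences between the three can be observed in Python. In this sketch the helper name is illustrative, and the little-endian codec variants are chosen only so that no byte-order mark is added to the counts.

```python
def unit_sizes(char: str) -> dict:
    """Bytes needed for one character in each Unicode encoding (no byte-order mark)."""
    return {
        "utf-8": len(char.encode("utf-8")),
        "utf-16": len(char.encode("utf-16-le")),
        "utf-32": len(char.encode("utf-32-le")),
    }

# A Latin letter, an accented letter, a CJK ideograph and an emoticon
for char in ("A", "\u00e8", "\u65e5", "\U0001F600"):
    print(repr(char), unit_sizes(char))
```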

Getting back to UTF-8, which is the preferred Unicode choice for web applications: how is it that UTF-8 solves the problem of multiple character encodings getting in the way of each other ? The answer is simple: Unicode assigns every character a single position (called a code point), and UTF-8 encodes that position identically on every system in the world supporting UTF-8. This means position 232 will always represent the è character (Latin small letter e with grave accent), whether it's accessed on a system in Saudi Arabia, Israel or Norway.

So what happens if a web application requires using characters like و (Arabic Letter waw), ט (Hebrew letter tet) or č (Grapheme or latin c with háček) ? Each of these characters also gets its own exclusive position, so it's unequivocally represented on any system in the world. UTF-8 can do this because it's designed to use up to 4-bytes to represent characters.
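In Python, a character's exclusive Unicode position and the bytes UTF-8 uses for it can both be inspected. The helper name below is an illustrative choice.

```python
def describe(char: str) -> tuple:
    """Return a character's Unicode position and its UTF-8 byte sequence."""
    return ord(char), char.encode("utf-8")

for char in ("\u00e8", "\u0648", "\u05d8", "\u010d"):   # è, و, ט, č
    position, encoded = describe(char)
    print(char, "position", position, "bytes", encoded.hex(" "))
```

Each of the four characters reports a different position, so none of them displaces another.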

UTF-8 defines these exclusive positions for characters used in languages with Latin letters (e.g. English, Spanish, Italian, Portuguese, German, etc.), as well as Arabic, Hebrew, Nordic languages, Chinese, Japanese, Korean, in addition to other characters used in Ethiopic, Cherokee, Mongolian, Thai and Tibetan, to name a few. UTF-8 also defines exclusive positions for special characters like currency symbols, emoticons and even music symbols, also to name a few. In essence, UTF-8 provides software representations for practically every character imaginable that's used in computers. You can take a look at the entire set of UTF-8 characters by consulting references like the Unicode charts or a Unicode character chart with named and HTML entities.

The first issue that can come to mind about supporting this amount of characters in UTF-8 is the average number of bytes needed to represent characters. If encoding a basic character like the letter A required using 4-bytes, this would translate into a fourfold byte growth requiring more bandwidth, memory and storage for each character, all this overhead just to support characters in Ethiopic, Cherokee or Mongolian that a web application will never use.

You can relax on this particular issue. UTF-8 is a variable-width character encoding, which means that even though it can use up to 4-bytes to represent characters, it isn't required to use all 4-bytes to do so. Certain characters are represented using just 1-byte, which makes these byte growth concerns unwarranted.

UTF-8 designers took a very clever approach to the way characters are assigned among this possible 4-byte representation. A single byte in UTF-8 supports 7 bits for character representations (2^7 = 128 characters or code points). Can you guess which characters got the privilege of being represented as a single-byte ? The most common set used in computers of course, the same character set defined in ASCII in the 1960's. What this means is that the most prevalent set of characters used in software, even encoded using UTF-8, need just 1 byte per character representation, just like in single-byte character encodings such as ASCII, ISO-8859-1 and Windows-1252.

So why doesn't UTF-8 use all 8-bits in the 1st byte to represent characters ? Since UTF-8 is a variable-width character encoding, it needs a way to indicate if a character is made up of a single-byte or multiple-bytes. Therefore one bit of the 1st byte in UTF-8 character representations is reserved for this purpose. Following this same rationale, 2 bytes in UTF-8 support 11 bits for character representations (2^11 = 2048 characters or code points), 3 bytes in UTF-8 support 16 bits for character representations (2^16 = 65,536 characters or code points) and 4 bytes support 21 bits for character representations (2^21 = 2,097,152 characters or code points). Here again, the reason UTF-8 doesn't use the entire 8-bit spectrum across the 4 available bytes is that UTF-8 reserves bits in each byte to determine if it's a single-byte code point, a multi-byte code point or a continuation of a multi-byte code point.
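The reserved bits can be made visible by encoding a code point by hand. The following Python function is a simplified sketch written for illustration: it applies the 7, 11, 16 and 21 bit layouts described above, but skips the validity checks a production encoder performs (for instance rejecting values outside the Unicode range).

```python
def utf8_encode(code_point: int) -> bytes:
    """Encode one code point by hand, following the UTF-8 bit layout."""
    if code_point < 0x80:                    # 7 bits:  0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:                   # 11 bits: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:                 # 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])

# The hand-written layout agrees with Python's built-in encoder
for char in ("A", "\u00e8", "\u65e5", "\U0001F600"):
    print(repr(char), utf8_encode(ord(char)) == char.encode("utf-8"))
```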

But wait, does this mean UTF-8 requires 2-bytes to represent characters that in character encodings like ISO-8859 and Windows-1252 could be represented with 1-byte ? Yes, since one bit in the 1st byte of UTF-8 characters is reserved. But don't get hung-up on small details. How many web applications have you written made up entirely of characters in the upper boundary -- above ASCII characters -- of single-byte character sets ? Characters like £, ç, á, é, í, ó, ú ? Not many I would think. Since such characters are used sparingly, the tradeoff between using 2-bytes instead of 1-byte for such characters, is well worth it when you consider the usability benefits of UTF-8.
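To put a number on this tradeoff, the following Python sketch measures the extra bytes UTF-8 needs over ISO-8859-1 for a given text. The helper name and the sample sentence are illustrative.

```python
def utf8_overhead(text: str) -> float:
    """Extra bytes UTF-8 needs over ISO-8859-1 for the same text, as a percentage."""
    single = len(text.encode("iso-8859-1"))
    variable = len(text.encode("utf-8"))
    return (variable - single) / single * 100

sample = "The caf\u00e9 charges \u00a33 for a cr\u00e8me br\u00fbl\u00e9e"
print(f"{utf8_overhead(sample):.1f}% more bytes in UTF-8")
```

Text made up only of ASCII characters shows no overhead at all, and the overhead grows only with the share of accented characters.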

But wait, what about CJK characters, won't UTF-8 need 3-bytes or even 4-bytes to represent them ? Yes, but again this shouldn't be an issue considering the usability benefits of UTF-8. Web applications that require CJK characters need multi-byte character encodings one way or another, so UTF-8 adds at most a modest overhead for characters that would need multiple bytes anyway.
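For a sense of scale, this Python sketch compares the bytes needed for a short Japanese word in two regional multi-byte encodings and in UTF-8. The helper name and the sample word are illustrative.

```python
def byte_length(text: str, encoding: str) -> int:
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))

text = "\u65e5\u672c\u8a9e"   # 日本語 ("Japanese language"), three characters

for encoding in ("shift_jis", "euc_jp", "utf-8"):
    print(encoding, byte_length(text, encoding), "bytes")
```

Every encoding needs more than one byte per character here; UTF-8 uses three bytes where the regional encodings use two.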

Even though 4 bytes in UTF-8 leave room for 2,097,152 positions, the Unicode standard caps the range at 1,114,112 code points, and the most recent Unicode standard version 6.0 released in late 2010 defines a little over 100,000 characters or code points -- 109,449 to be exact. Among these 100,000 characters you'll find definitions for the barrage of characters described earlier like Cherokee and Mongolian, all the way up to Vedic Sanskrit. Considering there's still room to add roughly 1,000,000 character representations, UTF-8/Unicode is likely to accommodate characters for some civilizations to come.

As you've now learned, choosing a character encoding can impact several parts of a web application, from the configuration and installation of a permanent storage system, to the creation of static content and business logic development. Throughout the book and where pertinent you'll see several notes indicating the importance of character encoding selection, even though I would recommend you consider UTF-8 character encoding for all your web application needs, given its versatility.

Sessions

The subject of sessions will be a recurring theme throughout the book, one that you will realize is often the crux of most performance and scalability issues. To explain sessions, it's best to describe why they are needed and how web applications create them.

The first time a user visits a web application, the underlying web framework creates a session on the user's behalf. This session holds personalized information needed to customize a web application to the needs of a user, during the time the session is active.

Since a session's information is held on the server, all the following requests made by a user will need to have an identifier so the web application can detect which session to associate with each request. This is due to the stateless HTTP protocol on which the interaction between user (i.e. web browser) and web application (i.e. server) occurs. If a user didn't provide an identifier of this kind on subsequent requests, a user would always appear anonymous.

Practically all modern web frameworks rely on one of two forms to provide a user with a session identifier: a cookie or a URL parameter appended to all user requests. The latter is a technique called URL rewriting, which is common when a user has disabled the use of cookies. Figure 1-3 illustrates the various scenarios for assigning a session identifier to a user.

Figure 1-3 - Session identifier assignment process

As you can see in figure 1-3, the first step represents a user making a request to a web application's main page -- index -- illustrating the structure of a request's HTTP headers. The second step consists of the server responding with the requested web application content. Notice that in this last response, the contents of the index page along with the HTTP header Set-Cookie: sessionid=999999 are sent.

Generally, web frameworks include this last header automatically, providing a unique value for each user. It's also worth mentioning, most web frameworks offer some type of function (API) for customizing and setting this header. In addition, a cookie's value can also have more data, including an expiration date or be set for specific URLs of a site, based on criteria either set by the web framework or your own requirements. However, I will describe the use of cookies in detail in part II of the book.
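As a framework-neutral illustration, Python's standard http.cookies module can build such a header. The function name, path and lifetime below are assumptions chosen for the sketch, not values prescribed by any particular web framework.

```python
from http.cookies import SimpleCookie

def build_session_cookie(session_id: str) -> SimpleCookie:
    """Build a session cookie; the path and lifetime are illustrative choices."""
    cookie = SimpleCookie()
    cookie["sessionid"] = session_id
    cookie["sessionid"]["path"] = "/"
    cookie["sessionid"]["max-age"] = 3600   # expire after one hour
    return cookie

# Prints a Set-Cookie header line carrying the identifier and its attributes
print(build_session_cookie("999999").output())
```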

When the browser receives the index page, it will inspect the HTTP headers and notice the Set-Cookie header, at which point the browser attempts to save the cookie locally, associating it with the site that sent it. Next, one of three things can happen.

If a user's browser accepts cookies -- represented as case a) in figure 1-3 -- any subsequent requests made to a site's web application will contain the HTTP header Cookie: sessionid=999999. In this case, a request for the reports page would contain this last header. Upon arrival, the web framework is able to identify the incoming request for reports as a returning user with sessionid=999999 and personalize the response with data stored in a user's session (e.g. 'Welcome John Smith'). As a side note, similar to setting the HTTP header Set-Cookie, most web frameworks provide some type of function (API) for extracting the value of the HTTP header Cookie, at which point it's possible to personalize the response data based on this value.
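A minimal sketch of this lookup, again using Python's standard http.cookies module, could look as follows. The in-memory SESSIONS dictionary, the function name and the fallback message are hypothetical stand-ins for what a web framework manages internally.

```python
from http.cookies import SimpleCookie

# Hypothetical in-memory session store kept on the server, keyed by session identifier
SESSIONS = {"999999": {"name": "John Smith"}}

def greeting(cookie_header: str) -> str:
    """Personalize a response using the session named in a Cookie header value."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("sessionid")
    session = SESSIONS.get(morsel.value) if morsel else None
    if session is None:
        return "Welcome, please enable cookies to continue"
    return f"Welcome {session['name']}"

print(greeting("sessionid=999999"))  # returning user with a known session
print(greeting(""))                  # no cookie sent: the user appears anonymous
```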

The second case -- illustrated as case b) in figure 1-3 -- represents a user's browser rejecting the saving of cookies. In this case, subsequent requests made to a site's web application would not include the HTTP header Cookie.

When such a request arrives for the reports page, the web framework will be incapable of extracting a session identifier, at which time personalization of the reports page will fail. This second case illustrates the most common response to users rejecting a session identifier in the form of a cookie: telling them to activate cookie reception in order to continue.

Some web frameworks though, support an alternative to tracking a session identifier when users (i.e. browsers) reject cookies -- illustrated as case c) in figure 1-3. In such cases, when a web framework receives a request for a page like reports that requires a session identifier to personalize, it will resort to URL rewriting.

URL rewriting is often a fall-back option when a web framework detects that the default cookie -- sent on the first response made by an application -- is not being sent on subsequent request HTTP headers.

So for example, if a user requests the reports page without a cookie, a user would be re-directed to the originating document -- in this case index -- but with its various URLs rewritten and appended with a session identifier. This would result in a user requesting a URL in the form reports?sessionid=999999, allowing the web framework to personalize the content and respond appropriately.
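The rewriting step itself can be sketched with Python's standard urllib.parse module. The function name is hypothetical; web frameworks that support URL rewriting ship their own equivalent.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_session_id(url: str, session_id: str) -> str:
    """Append a sessionid query parameter to a URL, keeping existing parameters."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query) + [("sessionid", session_id)]
    return urlunsplit(parts._replace(query=urlencode(query)))

print(add_session_id("reports", "999999"))
print(add_session_id("reports?page=2", "999999"))
```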

Though URL rewriting solves the problem of cookie blocking, its use is often discouraged for various reasons, especially when the alternative of telling a user to activate cookies is relatively easy and less resource intensive.

One of the primary drawbacks of URL rewriting is that a page's links (<a>), forms (<form>) and other elements relying on URLs need to be rewritten dynamically. This is both a performance and management problem. If you have 100 URLs on a page, rewriting them for each user will be a performance hit and a process that can easily be avoided by forcing a user to activate cookies. From a management perspective, you need to ensure the usage of a web framework's HTML markup syntax (e.g. <html:a>,<html:form>) for creating URL related content. This is the only way to ensure HTML markup syntax with URLs is rewritten, since URL related resources using standard HTML markup (e.g. <a>,<form>) aren't detectable by a framework for rewriting.

Another drawback that is often cited for URL rewriting with a session identifier involves search engine indexing and bookmarking, since an application page never has a 'well known' address. If you attempt to bookmark the content for a page like reports?sessionid=999999 it might cease to work after a few hours or days when the server expires such a session. Equally, if a search engine attempts to index your page, it's likely to get its own session identifier (sessionid=777777); afterward any person making a search is sent to an address like reports?sessionid=777777, which similarly may stop working after a predetermined time.

Now that you know how web applications track sessions, I will elaborate on the data that is often tracked in sessions. Session data is kept in one of three places: a cookie, a user's session managed by the web framework or a permanent storage system.

So let's assume the user with a sessionid=999999 cookie has now identified himself for the first time as 'John Smith' and you want to present him with a page 'Welcome John Smith'.

The easiest way to do this at first is by appending 'John Smith' to the value of a cookie. But next, let's say you request John Smith's address and he enters: 'Main Street #255, New York, New York'. You could still append this information to a user's cookie, but you would be pushing the limits and purpose of a cookie.

Cookies are limited to 4096 bytes (4 KiB) of information, mainly because a user's browser sends this information with each request it makes to a web application. In addition, it's considered a bad security practice to place personal information in a cookie (e.g. 'John Smith, Main Street #255, New York, New York'). Generally, simpler and less revealing indicators (e.g. user=355545, c=42,r=32353,hometown=ENS) are used, which are later associated with data managed by a web framework (i.e. on the server).
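As a minimal sketch of the cookie guidance above, the following Python snippet builds a cookie with the standard library and refuses any cookie that exceeds the 4096 byte limit. The build_cookie function and the cookie names are hypothetical, chosen only for illustration.

```python
from http.cookies import SimpleCookie

COOKIE_LIMIT_BYTES = 4096  # the per-cookie limit cited in the text


def build_cookie(name: str, value: str) -> str:
    """Return a 'name=value' cookie string, refusing cookies over the limit."""
    cookie = SimpleCookie()
    cookie[name] = value
    header_value = cookie[name].OutputString()
    if len(header_value.encode("utf-8")) > COOKIE_LIMIT_BYTES:
        raise ValueError("cookie exceeds 4096 bytes")
    return header_value


# An opaque indicator reveals nothing about the user and stays tiny.
small_cookie = build_cookie("user", "355545")
```

The opaque value (user=355545) is only useful to the server, which can associate it with the personal data it manages.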

So with no more reliance on cookies, what do you do next? You are left with a user's session managed by the web framework. Since a web framework receives a session identifier in the form of a cookie on each request, a web framework can create a session space to save any type of data required on subsequent requests.

This means that information like 'John Smith' and 'Main Street #255, New York, New York' can be saved into a user's session and can later be retrieved from any other part of an application to personalize content. There is no fixed limit on what you can hold in a user's session -- held by the web framework on the server -- to build custom content, other than the server's own resources.
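To make the idea of a session space concrete, here is a minimal sketch in Python. It assumes a hypothetical in-memory dictionary keyed by session identifier. Real web frameworks provide their own session facilities, so treat the names below as illustrative only.

```python
import secrets

# Hypothetical session space: session identifier -> data saved for that user.
sessions: dict[str, dict[str, str]] = {}


def create_session() -> str:
    """Create an empty session and return its identifier (sent as a cookie)."""
    session_id = secrets.token_hex(16)
    sessions[session_id] = {}
    return session_id


def get_session(session_id: str) -> dict[str, str]:
    """Look up the session space for the identifier received on a request."""
    return sessions[session_id]


def welcome(session_id: str) -> str:
    """Personalize content from data saved on an earlier request."""
    return f"Welcome {get_session(session_id)['name']}"


sid = create_session()
get_session(sid)["name"] = "John Smith"
get_session(sid)["address"] = "Main Street #255, New York, New York"
```

Calling welcome(sid) from any other part of the application returns 'Welcome John Smith', without the name ever traveling in a cookie.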

However, bear in mind that John Smith is only one user. What if 1,000 users had an open session with similar information? And what if this included not only name and address, but dozens of items like telephone, favorite foods and favorite books? You would need a lot of resources.

The problem is that each item kept in a user's session takes up resources in the form of memory. So even though 1,000 sessions holding only a user's name take up few resources, once you add other fields like addresses or favorite books, the amount of memory needed to support 1,000 sessions grows quickly, in proportion to both the number of sessions and the amount of data held in each one.
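A rough back-of-the-envelope estimate shows how session memory adds up. The figures below (24 fields, 64 bytes per field) are assumptions picked for illustration, and the estimate ignores the per-object overhead a real runtime adds.

```python
def session_memory_bytes(sessions: int, fields_per_session: int, avg_field_bytes: int) -> int:
    """Rough estimate of memory held by session data, ignoring runtime overhead."""
    return sessions * fields_per_session * avg_field_bytes


name_only = session_memory_bytes(1000, 1, 64)           # 64,000 bytes
full_profile = session_memory_bytes(1000, 24, 64)       # 1,536,000 bytes
ten_times_users = session_memory_bytes(10000, 24, 64)   # 15,360,000 bytes
```

Each extra field and each extra user multiplies the total, which is why session data that looks trivial for one user becomes significant across an entire user base.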

At this juncture you will need to consider a storage system for various reasons: a server crash, in which case a user's session data would be lost since it was in memory; wanting to save this information permanently, for a user's future sessions; or offloading session data to permanent storage to reduce resource usage.

Realizing it's not convenient to store a lot of data in a user's session and that it's actually rare you will need every single piece of data to be in a user's session at any given time, let's analyze the extreme case of moving all session data to a storage system. You would only leave something like an id in a user's session and rely on this id to access all data from permanent storage when it's needed.

Though minimizing the data maintained in a user's session is apparently a great idea to cut resource consumption, taken to the extreme it generates another problem with the storage system. Moving session data to a storage system consists of two operations: one is writing it to the storage system and the other is reading it from the storage system when it's needed. This process places a heavy load on the Input/Output (I/O) operations performed on the hard drives on which the storage system is located, as well as occasionally on the CPU cycles needed by the storage system to synthesize this work (e.g. to format the data and find it).

Since you've restricted a user's session data -- the one managed by the web framework -- to only an id, let's assume the same 1,000 users access the application. Now the application needs to do an initial 1,000 write operations for each field a user submits, as well as consume CPU cycles to synthesize the data in the format required by the storage system. In addition, each time a user needs personalized content, a web application also needs to read this data from the storage system -- potentially leading to 1,000 read operations per field, plus the CPU cycles needed by the storage system to find such data. This type of scenario can also lead to a quick drop in performance due to the amount of I/O operations and CPU cycles needed for this process.
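The scenario above can be sketched with Python's built-in sqlite3 module standing in for a permanent storage system. The table layout, function names and counters are assumptions made for this example. An in-memory database is used so the sketch runs anywhere, which means it counts operations rather than measuring real disk I/O.

```python
import sqlite3

# Stand-in for a permanent storage system (in-memory for the sake of the sketch).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE profile (user_id INTEGER, field TEXT, value TEXT, "
    "PRIMARY KEY (user_id, field))"
)

io_counts = {"writes": 0, "reads": 0}


def save_field(user_id: int, field: str, value: str) -> None:
    """Write one field to storage; every call is one write operation."""
    db.execute("INSERT OR REPLACE INTO profile VALUES (?, ?, ?)", (user_id, field, value))
    io_counts["writes"] += 1


def load_field(user_id: int, field: str):
    """Read one field from storage; every call is one read operation."""
    row = db.execute(
        "SELECT value FROM profile WHERE user_id = ? AND field = ?", (user_id, field)
    ).fetchone()
    io_counts["reads"] += 1
    return row[0] if row else None


# 1,000 users each submit one field, then each request one personalized page.
for user_id in range(1000):
    save_field(user_id, "name", f"User {user_id}")
for user_id in range(1000):
    load_field(user_id, "name")
```

After the two loops, io_counts holds 1,000 writes and 1,000 reads for a single field. Every additional field or page view adds to those totals.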

As you will now realize, even though sessions are essential to the lifeblood of an application, a careful balance in where session data is stored at any given moment is essential to achieving sustainable performance and scalability results. Overuse a web framework's session management and you will need vast amounts of memory to accommodate the data. Underutilize it and you will shift performance and scalability problems to your permanent storage system in the form of I/O and CPU demands.

The costs of performance and scalability go up as you add more personalization features to a web application. Since personalization requires knowing a user, this leads to managing a user's session data in either cookies, a web framework's in-memory facility or permanent storage.

Throughout the book, you will learn various approaches to managing sessions and reducing the burden they impose on an application's performance and scalability.

Asynchronous design

Communication in a system takes place in one of two forms: synchronous or asynchronous. Because of how they behave, these two forms are also called blocking and non-blocking, respectively. Figure 1-4 illustrates the process of synchronous and asynchronous communication.

Figure 1-4 - Synchronous and asynchronous communication

As you can see in figure 1-4, in synchronous communication a requesting party blocks itself from doing any other work until it has received a response for its request. In asynchronous communication, a requesting party can continue doing work without receiving a response for its request. As a result, asynchronous communication allows a requesting party to do more work, in turn increasing an application's throughput.
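The difference between the two forms can be sketched in Python, using time.sleep to simulate a blocking request and asyncio.sleep to simulate a non-blocking one. The 0.05 second delay and the function names are arbitrary choices for this illustration, not measurements of any real system.

```python
import asyncio
import time

DELAY = 0.05  # simulated response time in seconds (an arbitrary value)


def blocking_request(n: int) -> int:
    time.sleep(DELAY)  # the requester can do nothing else while it waits
    return n


async def non_blocking_request(n: int) -> int:
    await asyncio.sleep(DELAY)  # the requester is free to start other requests
    return n


def run_synchronously(count: int):
    start = time.perf_counter()
    results = [blocking_request(n) for n in range(count)]
    return results, time.perf_counter() - start


def run_asynchronously(count: int):
    async def main():
        return await asyncio.gather(*(non_blocking_request(n) for n in range(count)))

    start = time.perf_counter()
    results = asyncio.run(main())
    return list(results), time.perf_counter() - start


sync_results, sync_elapsed = run_synchronously(10)
async_results, async_elapsed = run_asynchronously(10)
```

Both runs return the same ten results, but the synchronous run waits for each response in turn while the asynchronous run overlaps the waiting, so it completes more work in the same time span.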

In topics related to web applications, asynchronous communication presents itself in many areas. These areas include web servers reading data asynchronously from hard drives, web browsers making data requests asynchronously (a.k.a. AJAX) and application APIs designed to do tasks asynchronously.

Not surprisingly, asynchronous communication is often associated with the topic of threads, since threads allow an application to do tasks simultaneously. With one thread continuing to work while the other waits for a response, a process is technically asynchronous, since work isn't stopped. However, asynchronous communication is broader and doesn't necessarily need threads. AJAX designs and messaging middleware software (e.g. WebSphere MQ, Tibco Rendezvous, Microsoft Message Queuing) are asynchronous in nature, yet don't need to use threads.
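As a small illustration that asynchronous communication doesn't require threads, the following Python sketch passes messages through an asyncio.Queue between a producer and a consumer that both run on a single thread. The queue is only an in-process stand-in for the idea of messaging and doesn't represent the API of any middleware product.

```python
import asyncio
import threading


async def producer(queue: asyncio.Queue, thread_ids: set) -> None:
    for n in range(3):
        await queue.put(f"message {n}")  # hand the message off, no reply awaited
        thread_ids.add(threading.get_ident())
    await queue.put(None)  # sentinel telling the consumer to stop


async def consumer(queue: asyncio.Queue, received: list, thread_ids: set) -> None:
    while True:
        message = await queue.get()
        if message is None:
            break
        thread_ids.add(threading.get_ident())
        received.append(message)


async def main():
    queue = asyncio.Queue()
    received, thread_ids = [], set()
    await asyncio.gather(
        producer(queue, thread_ids),
        consumer(queue, received, thread_ids),
    )
    return received, thread_ids


received, thread_ids = asyncio.run(main())
```

The set thread_ids ends up with a single entry, showing that the producer and consumer interleaved their work without a second thread.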

As you will learn in various parts of the book, asynchronous communication can take place in many places that are not directly associated with the use of threads. In addition, you will also learn how asynchronous communication can enhance an application's performance and scalability.

Decoupling

Decoupling is the act of separating or disconnecting. In software, you will often hear the terms 'tightly coupled' and 'loosely coupled', implying that it's difficult or easy to do decoupling.

One of the most common scenarios involving decoupling occurs in object-oriented systems, made up of classes. If two classes depend on one another directly, it's said that they are tightly coupled. This means that it's difficult to keep design changes in one class from affecting the other.

In order to ease the separation process, object-oriented systems rely on interfaces. By using interfaces, classes shield themselves from depending on one another, making them loosely coupled. Figure 1-5 illustrates this process.

Figure 1-5 - Decoupling mechanism -- Classes and interfaces in OO systems.
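The following is a minimal sketch of this mechanism in Python, using typing.Protocol to play the role of an interface. The Storage, MemoryStorage, CountingStorage and ProfileService names are hypothetical and exist only for this example.

```python
from typing import Protocol


class Storage(Protocol):
    """The interface: the only thing ProfileService knows about storage."""

    def save(self, key: str, value: str) -> None: ...
    def load(self, key: str) -> str: ...


class MemoryStorage:
    """One implementation, keeping data in a dictionary."""

    def __init__(self) -> None:
        self._data: dict[str, str] = {}

    def save(self, key: str, value: str) -> None:
        self._data[key] = value

    def load(self, key: str) -> str:
        return self._data[key]


class CountingStorage(MemoryStorage):
    """A second implementation that also counts how often it is read."""

    def __init__(self) -> None:
        super().__init__()
        self.reads = 0

    def load(self, key: str) -> str:
        self.reads += 1
        return super().load(key)


class ProfileService:
    """Depends on the Storage interface, not on a concrete class."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage

    def rename(self, user_id: str, name: str) -> None:
        self._storage.save(user_id, name)

    def greeting(self, user_id: str) -> str:
        return f"Welcome {self._storage.load(user_id)}"


service = ProfileService(MemoryStorage())
service.rename("355545", "John Smith")

counting = CountingStorage()
other_service = ProfileService(counting)
other_service.rename("355545", "John Smith")
```

Because ProfileService only relies on the interface, either storage class can be swapped in without changing a line of ProfileService.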

This decoupling mechanism is not exclusive to the use of interfaces and classes in OO systems, but is widely used in many software areas to ease maintenance and upgrading tasks. In the process, it becomes easier to create sub-systems from a larger system. This ability to create sub-systems is an important factor in resolving performance and scalability issues. By having sub-systems, it's possible to shift workloads across different application tiers more easily, dividing a tier's workload across separate operating systems.

If an application consists of loosely coupled components, you will find it easier to apply a 'divide and conquer' approach to solving its performance and scalability bottlenecks than if it consisted of a monolithic (i.e. tightly coupled) design.

As you explore other sections of the book, you will see the term decoupling applied to various areas ranging from web services to permanent storage (i.e. database) design.

Caching

Caching is the act of copying data to a place closer to where it's required. As such, it's used as a mechanism to increase throughput and cut latency in web applications.

Caching is possible at many points through which a web application's data crosses. This ranges from CPU caching, permanent storage caching, business logic (i.e. web framework) caching, internet service provider caching to web browser caching. Each caching strategy is different in the way it's implemented and the way you as a web application designer can influence it.

For example, CPUs apply caching based solely on the algorithms provided by a CPU manufacturer, so there is no way to alter how this caching strategy works. The same applies to Internet service provider caching, given it's a strategy used by third-party providers. Permanent storage, business logic and web browser caching are different, since they all fall within the realm of a web application's design.

The act of caching takes place in a cache, which is the place where a copy of data is held. Depending on the caching strategy, a cache can take the form of physical memory (RAM), memory integrated into a CPU's die or even a folder on a user's desktop in the case of web browsers.

Irrespective of the caching strategy and cache type, the purpose of caching in all its forms is simple: to avoid investing resources getting data that hasn't changed at its origin. Occasionally and based on the caching strategy, it's determined that the data present in a cache is stale. This triggers a request to the data's origin for a newer version, which is once again cached, until it's again determined it has gone stale.
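One simple way to decide that cached data is stale is to give each entry a time-to-live. The sketch below assumes that strategy, which is just one of many. The TTLCache class, the 60 second lifetime and the fake clock used to make the example deterministic are all illustrative choices.

```python
import time


class TTLCache:
    """Cache that treats an entry as stale once it is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch      # function that gets data from its origin
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}       # key -> (value, time it was stored)

    def get(self, key):
        entry = self._entries.get(key)
        now = self._clock()
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]                  # fresh copy: origin is not contacted
        value = self._fetch(key)             # missing or stale: go to the origin
        self._entries[key] = (value, now)
        return value


origin_calls = []


def fetch_from_origin(key):
    origin_calls.append(key)
    return f"content for {key}"


fake_now = [0.0]  # controllable clock so the example doesn't have to wait
cache = TTLCache(fetch_from_origin, ttl_seconds=60, clock=lambda: fake_now[0])

cache.get("/reports")   # not cached yet: fetched from the origin
cache.get("/reports")   # still fresh: served from the cache
fake_now[0] = 61.0
cache.get("/reports")   # stale after 61 seconds: fetched from the origin again
```

Of the three requests, only two reach the origin. The middle request is the saving that caching provides.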

This ongoing process leads to a term widely used in caching: hit ratio. A hit ratio expresses the percentage of requests fulfilled by data present in a cache. A high hit ratio indicates effective caching, whereas a low hit ratio indicates poor caching.

Caching hit ratios are influenced by three things: the caching strategy, the size of a cache and the type of data stored in a cache. I won't discuss the particularities of caching strategies since they are so varied, but generally speaking, a low hit ratio indicates a caching strategy requires reconfiguration. The size of a cache can also influence its hit ratio. If a cache is too small it can lead to low hit ratios, since it has a limited amount of space in which to store data. Another factor influencing cache hit ratios is the type of data being cached. If the data being cached is too volatile (e.g. real-time stock quotes), hit ratios can plummet since it becomes necessary to constantly get data from its origin.

Throughout the book you will find various discussions on caching, including how to implement caching strategies on a web application's tiers, as well as the degree of effectiveness each one has given a certain set of circumstances.
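To see how cache size alone can change a hit ratio, the following sketch uses Python's functools.lru_cache, which applies a least-recently-used strategy and reports its own hits and misses. The request pattern (100 requests cycling over 4 keys) and the cache sizes are assumptions chosen to make the effect obvious.

```python
from functools import lru_cache


def hit_ratio(cache_size: int, requests: list) -> float:
    """Replay requests through an LRU cache and return hits / total requests."""

    @lru_cache(maxsize=cache_size)
    def fetch(key):
        return key  # stands in for getting data from its origin

    for key in requests:
        fetch(key)
    info = fetch.cache_info()
    return info.hits / (info.hits + info.misses)


requests = [1, 2, 3, 4] * 25  # 100 requests cycling over 4 distinct keys

too_small = hit_ratio(2, requests)   # cache holds 2 of the 4 keys
big_enough = hit_ratio(4, requests)  # cache holds all 4 keys
```

With room for only 2 keys, each key is evicted before it's requested again, so the hit ratio is 0. With room for all 4 keys, only the first 4 requests miss and the hit ratio is 0.96.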

Fault tolerance, replication and synchronization

Fault tolerance is a web application's ability to recover from an outage with minimal to no downtime. Since having an outage goes against the very principles of performance and scalability, contemplating fault tolerance is an important characteristic in any performance and scalability design.

Ensuring fault tolerance in a web application entails having identical working components, with one capable of absorbing the other's workload in case of failure, all within an acceptable time frame. Events requiring fault tolerance capabilities are generally caused by hardware failures or other external factors (e.g. security attacks) that can then ripple through a web application's tiers.

With few exceptions, hardware failures always disrupt applications. Exceptions could include the failure of something like a fan or a single CPU on a multi-CPU machine. However, failures in components like a motherboard or network card can immediately make a web application inaccessible. Failures in components like hard drives or physical memory can be even more critical: since a web application's data always interacts directly with these types of components, such failures can lead to data loss in addition to a web application's outage.

To enable fault tolerance, you can use replication. Replication consists of creating identical working components, which can range from a web application's web servers to its user sessions or permanent storage systems, among other things, for the purpose of providing either fault tolerance or real-time parallel access.

Replication in the context of fault tolerance is similar to the act of making backups, which consist of copying a web application's data for historical or disaster recovery purposes. However, there is a subtle difference between the act of replication and backups that I will describe with an example next.

If a hard drive fails and its data is only in backups, re-establishing a system's data back to its original state can entail hours or days, in addition to the incurred downtime. This is due to the fact that backups are mostly saved to secondary storage (e.g. tape drives). In addition, most web application backups consist of raw data and not ancillary data -- such as the OS itself or a software suite like a permanent storage system. Therefore re-establishing this ancillary data can further extend a system's downtime.

If a hard drive fails, re-establishing a system to its original state with replicated data can either go unnoticed or take minutes at the most. This is due to the fact that replication involves having identical copies of the same data, including all -- if any -- ancillary data and hardware needed to restore an application to its original state. In essence, replication is a real-time -- often called 'hot' -- data backup.

Even though replication is a natural fit to ensure fault tolerance, replication also brings with it another characteristic that can serve a web application's performance and scalability: parallelism. In a previous section you learned about the concept of parallelism and how it consists of increasing an application's throughput by leveraging more than one access point to do tasks simultaneously.

Since using replication makes a web application's data available at more than a single location in real-time, it becomes an opportunity to leverage multiple access points in parallel, in turn increasing an application's data throughput.
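A minimal sketch of this idea follows, assuming a hypothetical ReplicatedStore class in which every write is copied to all replicas and reads are spread across them in round-robin order. Python dictionaries stand in for the replicated storage systems.

```python
from itertools import cycle


class ReplicatedStore:
    """Keeps identical copies of data and spreads reads across them."""

    def __init__(self, replica_count: int) -> None:
        self._replicas = [dict() for _ in range(replica_count)]
        self._next_replica = cycle(range(replica_count))
        self.reads_per_replica = [0] * replica_count

    def write(self, key, value) -> None:
        # Every replica receives the write, so all copies stay identical.
        for replica in self._replicas:
            replica[key] = value

    def read(self, key):
        # Each read goes to the next replica in turn (round-robin).
        index = next(self._next_replica)
        self.reads_per_replica[index] += 1
        return self._replicas[index][key]


store = ReplicatedStore(replica_count=3)
store.write("report", "quarterly figures")
values = [store.read("report") for _ in range(9)]
```

The nine reads are shared evenly, three per replica, so no single access point has to serve them all.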

Using replication though -- whether for fault tolerance or parallel purposes -- comes hand in hand with another important topic: synchronization. Synchronization ensures the components on which replication is undertaken are consistent with one another. Since replication is the act of having multiple points that can fulfill the same duties, it's vital that all replicated points -- to the extent possible -- are identical to the rest at any given moment.

Given this last behavior, replication and synchronization are always said to take place between a master and slave, where a slave depends on a master's information. Figure 1-6 illustrates this replication and synchronization process along with a master/slave architecture.

Figure 1-6 - Replication and synchronization - Master/Slave architecture

The master/slave architecture in figure 1-6 is a staple of replication and synchronization processes. As you can see, there is often a blurry line between what makes up a master and a slave, since some masters have slaves but are also slaves themselves, because they need to synchronize with another master. This type of architecture is very common in performance and scalability designs, such as permanent storage clusters, web server farms and DNS load balancing, among other things.

The act of synchronizing replicated points is an expensive proposition if it's performed too often (e.g. a slave being updated every minute from a master), but by the same token, it can have adverse consequences if it's performed too little (e.g. a slave being updated every day from a master) since it can propagate stale information.

To mitigate replication and synchronization issues, a series of techniques are often used. One such technique is server affinity, which guarantees a user's requests are always handled by the same server (i.e. master or slave), thus ensuring consistency between multiple requests. Another technique often used in permanent storage systems consists of separating masters and slaves along the lines of 'readers' and 'writers'. This simplifies the synchronization process, since 'readers' (which only read information) are less prone to inconsistencies than 'writers' (which write information), whose information is constantly updated.
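Both techniques can be sketched in a few lines of Python. The server names, the hash-based mapping and the statement-prefix check below are assumptions made for illustration. Load balancers and storage systems implement these ideas in their own ways.

```python
import hashlib

SERVERS = ["server-a", "server-b", "server-c"]   # hypothetical server names
READERS = ["reader-1", "reader-2"]               # hypothetical read-only slaves
WRITER = "writer"                                # hypothetical master


def _bucket(text: str, buckets: int) -> int:
    """Map text to a stable bucket number using a hash of its bytes."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def server_for(session_id: str) -> str:
    """Server affinity: the same session identifier always maps to the same server."""
    return SERVERS[_bucket(session_id, len(SERVERS))]


def storage_target(statement: str) -> str:
    """Send statements that change data to the writer and everything else to a reader."""
    verb = statement.lstrip().split(None, 1)[0].upper()
    if verb in {"INSERT", "UPDATE", "DELETE"}:
        return WRITER
    return READERS[_bucket(statement, len(READERS))]


first_request = server_for("999999")
second_request = server_for("999999")
```

Since first_request and second_request are always the same server, a user's session data only has to be present on that one server.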

As the book progresses, I will describe a series of technologies that rely heavily on the concepts of fault tolerance, replication and synchronization.

Distributed computing

Distributed computing is described as multiple systems interacting with one another to meet a common goal. The simplest example of distributed computing is best explained when you browse the Internet. Your browser -- representing one system -- interacts with a web site's server -- representing a second system -- to meet the common goal of sharing information.

Distributed computing scenarios can be much more elaborate than the one just outlined. When individual systems are incapable of accommodating either data or processing needs, they adopt a distributed nature to meet their common goal. For this reason, it's a common occurrence to solve performance and scalability problems using distributed computing. Given the nature of distributed computing, the term is closely associated with many of the earlier concepts.

For example, applying parallelism -- which is the act of executing simultaneous tasks by leveraging more than a single resource point -- fits perfectly with the concept of distributed computing, whereby multiple systems interact with one another to achieve a common goal.

Another case would be the relation to latency and throughput. Since distributed computing consists of multiple systems working with one another, latency and throughput between systems are critical to their operation.

Yet another close tie is found to fault tolerance, replication and synchronization. Multiple systems working toward a common goal must communicate with one another. This can range from resolving issues like determining the data present in a particular system and transferring data to multiple systems, to detecting if a certain system suddenly fails. For this reason, topics like fault tolerance, replication and synchronization are closely intertwined with distributed computing.

But even though the purpose of distributed computing is as simple as achieving the same goal as an individual system, there are many complexities to designing applications based on distributed computing principles. In fact, there is a set of guiding principles known as the 8 fallacies of distributed computing, which are the following:

The network is reliable
Latency is zero
Bandwidth is infinite
The network is secure
Topology doesn't change
There is one administrator
Transport cost is zero
The network is homogeneous

These guiding principles are fallacies because designers of distributed computing applications often assume they're true. In reality, if multiple systems are interacting with one another and you want to successfully achieve a common goal, you have to assume the antithesis of these principles: the network is not reliable, latency is never zero, and so on, down to the network not being homogeneous.
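As one small example of designing against the first fallacy, the sketch below retries a remote call that can fail, waiting a little longer before each new attempt. The call_with_retries helper, the delay values and the simulated flaky call are all hypothetical. A production design would also need timeouts and a limit on total waiting time.

```python
import time


def call_with_retries(operation, attempts=3, base_delay=0.01, sleep=time.sleep):
    """Run operation, retrying on ConnectionError; attempts must be at least 1."""
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(base_delay * 2 ** attempt)  # wait longer before each retry
    raise last_error


calls = {"count": 0}


def flaky_remote_call():
    """Simulated remote call that fails twice before it succeeds."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = call_with_retries(flaky_remote_call)
```

The caller receives 'ok' on the third attempt. If every attempt fails, the last error is raised so the caller can decide what to do next.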

As the book progresses, you'll encounter several technologies that are based on the principles of distributed computing, many of which offer the leading edge in terms of performance and scalability for web applications.
