Monday, March 17, 2008

Synchronization and Locks

How does synchronization work? With locks. Every object in Java has a built-in lock that only comes into play when the object has synchronized method code. When we enter a synchronized non-static method, we automatically acquire the lock associated with the current instance of the class whose code we're executing (the this instance). Acquiring a lock for an object: is also known as getting the lock, or locking the object, locking on the object, or synchronizing on the object. We may also use the term monitor to refer to the object whose lock we're acquiring. Technically the lock and the monitor are two different things, but most people talk about the two interchangeably, and we will too.

Since there is only one lock per object, if one thread has picked up the lock, no other thread can pick up the lock until the first thread releases (or returns) the lock. This means no other thread can enter the synchronized code (which means it can't enter any synchronized method of that object) until the lock has been released. Typically, releasing a lock means the thread holding the lock (in other words, the thread currently in the synchronized method) exits the synchronized method. At that point, the lock is free until some other thread enters a synchronized method on that object. Remember the following key points about locking and synchronization:

* Only methods (or blocks) can be synchronized, not variables or classes.

* Each object has just one lock.

* Not all methods in a class need to be synchronized. A class can have both synchronized and non-synchronized methods.

* If two threads are about to execute a synchronized method in a class, and both threads are using the same instance of the class to invoke the method, only one thread at a time will be able to execute the method. The other thread will need to wait until the first one finishes its method call. In other words, once a thread acquires the lock on an object, no other thread can enter any of the synchronized methods in that class (for that object).

* If a class has both synchronized and non-synchronized methods, multiple threads can still access the class's non-synchronized methods! If you have methods that don't access the data you're trying to protect, then you don't need to synchronize them. Synchronization can cause a hit in some cases (or even deadlock if used incorrectly), so you should be careful not to overuse it.

* If a thread goes to sleep, it holds any locks it has—it doesn't release them.

* A thread can acquire more than one lock. For example, a thread can enter a synchronized method, thus acquiring a lock, and then immediately invoke a synchronized method on a different object, thus acquiring that lock as well. As the stack unwinds, locks are released again. Also, if a thread acquires a lock and then attempts to call a synchronized method on that same object, no problem. The JVM knows that this thread already has the lock for this object, so the thread is free to call other synchronized methods on the same object, using the lock the thread already has.

* You can synchronize a block of code rather than a method.

Because synchronization does hurt concurrency, you don't want to synchronize any more code than is necessary to protect your data. So if the scope of a method is more than needed, you can reduce the scope of the synchronized part to something less than a full method—to just a block. We call this, strangely, a synchronized block, and it looks like this:

class SyncTest {
public void doStuff() {
System.out.println("not synchronized");
synchronized(this) {
System.out.println("synchronized");
}
}
}

When a thread is executing code from within a synchronized block, including any method code invoked from that synchronized block, the code is said to be executing in a synchronized context. The real question is, synchronized on what? Or, synchronized on which object's lock?

When you synchronize a method, the object used to invoke the method is the object whose lock must be acquired. But when you synchronize a block of code, you specify which object's lock you want to use as the lock, so you could, for example, use some third-party object as the lock for this piece of code. That gives you the ability to have more than one lock for code synchronization within a single object.

Or you can synchronize on the current instance (this) as in the code above. Since that's the same instance that synchronized methods lock on, it means that you could always replace a synchronized method with a non-synchronized method containing a synchronized block. In other words, this:

public synchronized void doStuff () {
System.out.println("synchronized");
}

is equivalent to this:

public void doStuff() {
synchronized(this) {
System.out.println("synchronized");
}
}

These methods both have the exact same effect, in practical terms. The compiled bytecodes may not be exactly the same for the two methods, but they could be—and any differences are not really important. The first form is shorter and more familiar to most people, but the second can be more flexible.

So What About Static Methods? Can They Be Synchronized?
static methods can be synchronized. There is only one copy of the static data you're trying to protect, so you only need one lock per class to synchronize static methods—a lock for the whole class. There is such a lock; every class loaded in Java has a corresponding instance of java.lang.Class representing that class. It's that java.lang.Class instance whose lock is used to protect the static methods of the class (if they're synchronized). There's nothing special you have to do to synchronize a static method:

public static synchronized int getCount() {
return count;
}


Again, this could be replaced with code that uses a synchronized block. If the method is defined in a class called MyClass, the equivalent code is as follows:

public static int getCount() {
synchronized(MyClass.class) {
return count;
}
}

Wait—what's that MyClass.class thing? That's called a class literal. It's a special feature in the Java language that tells the compiler (who tells the JVM): go and find me the instance of Class that represents the class called MyClass. You can also do this with the following code:

public static void classMethod() {
Class c1 = Class.forName("MyClass");
synchronized (cl) {
// do stuff
}
}


Here is a sample code to test "synchronize"

class InSync extends Thread {
StringBuffer letter;
public InSync(StringBuffer letter) { this.letter = letter; }
public void run() {
synchronized(letter) { // #1
for(int i = 1;i<=100;++i) System.out.print(letter); System.out.println(); char temp = letter.charAt(0); ++temp; // Increment the letter in StringBuffer: letter.setCharAt(0, temp); } // #2 } public static void main(String [] args) { StringBuffer sb = new StringBuffer("A"); new InSync(sb).start(); new InSync(sb).start(); new InSync(sb).start(); } }

Just for fun, try removing lines I and 2 then run the program again. It will be unsynchronized—watch what happens.

What Happens If a Thread can't Get the Lock?
If a thread tries to enter a synchronized method and the lock is already taken, the thread is said to be blocked on the object's lock. Essentially, the thread goes into a kind of pool for that particular object and has to sit there until the lock is released and the thread can again become runnable/running. Just because a lock is released doesn't mean any particular thread will get it. There might be three threads waiting for a single lock, for example, and there's no guarantee that the thread that has waited the longest will get the lock first.

When thinking about blocking, it's important to pay attention to which objects are being used for locking.

Threads calling non-static synchronized methods in the same class will only block each other if they're invoked using the same instance. That's because they each lock on this instance, and if they're called using two different instances, they get two locks, which do not interfere with each other.

Threads calling static synchronized methods in the same class will always block each other—they all lock on the same Class instance.

A static synchronized method and a non-static synchronized method will not block each other, ever. The static method locks on a Class instance while the non-static method locks on the this instance—these actions do not interfere with each other at all.

For synchronized blocks, you have to look at exactly what object has been used for locking. (What's inside the parentheses after the word synchronized?) Threads that synchronize on the same object will block each other. Threads that synchronize on different objects will not.

Table 9-1: Methods and Lock Status

Give Up Locks

Keep Locks

Class Defining the Method

wait ()

notify() (Although the thread will probably exit the synchronized code shortly after this call, and thus give up its locks.)

java.lang.Object


join()

java.lang.Thread


sleep()

java.lang.Thread


yield()

java.lang.Thread

So When Do I need To Synchronize?

When we use threads, we usually need to use some synchronization somewhere to make sure our methods don't interrupt each other at the wrong time and mess up our data. Generally, any time more than one thread is accessing mutable (changeable) data, you synchronize to protect that data, to make sure two threads aren't changing it at the same time (or that one isn't changing it at the same time the other is reading it, which is also confusing). You don't need to worry about local variables—each thread gets its own copy of a local variable. Two threads executing the same method at the same time will use different copies of the local variables, and they won't bother each other. However, you do need to worry about static and non-static fields (instance variables), if they contain data that can be changed (not final).

For changeable data in a non-static field, you usually use a non-static method to access it. By synchronizing that method, you will ensure that any threads trying to run that method using the same instance will be prevented from simultaneous access. But a thread working with a different instance will not be affected, because it's acquiring a lock on the other instance. That's what we want—threads working with the same data need to go one at a time, but threads working with different data can just ignore each other and run whenever they want to; it doesn't matter.

For changeable data in a static field, you usually use a static method to access it. And again, by synchronizing the method you ensure that any two threads trying to access the data will be prevented from simultaneous access, because both threads will have to acquire locks on the Class object for the class the static method's defined in. Again, that's what we want.

However—what if you have a non-static method that accesses a static field? Or a static method that accesses a non-static field (using an instance)? In these cases things start to get messy quickly, and there's a very good chance that things will not work the way you want. If you've got a static method accessing a non-static field, and you synchronize the method, you acquire a lock on the Class object. But what if there's another method that also accesses the non-static field, this time using a non-static method? It probably synchronizes on the current instance (this) instead. Remember that a static synchronized method and a non-static synchronized method will not block each other—they can run at the same time. Similarly, if you access a static field using a non-static method, two threads might invoke that method using two different this instances. Which means they won't block each other, because they use different locks. Which means two threads are simultaneously accessing the same static field—exactly the sort of thing we're trying to prevent.

It gets very confusing trying to imagine all the weird things that can happen here. To keep things simple: in order to make a class thread-safe, methods that access changeable fields need to be synchronized.

Access to static fields should be done from static synchronized methods. Access to non-static fields should be done from non-static synchronized methods. For example:

public class Thing {
private static int staticField;
private int nonstaticField;
public static synchronized int getStaticField() {
return staticField;
}
public static synchronized void setStaticField(int staticField) {
Thing.staticField = staticField;
}
public synchronized int getNonstaticField() {
return nonstaticField;
}
public synchronized void setNonstaticField(int nonstaticField) {
this.nonstaticField = nonstaticField;
}
}

What if you need to access both static and non-static fields in a method? Well, there are ways to do that, but it's beyond what you need for the exam. You will live a longer, happier life if you JUST DON'T DO IT. Really. Would we lie?

Thread-Safe Classes

When a class has been carefully synchronized to protect its data (using the rules just given, or using more complicated alternatives), we say the class is "thread-safe." Many classes in the Java APIs already use synchronization internally in order to make the class "thread-safe." For example, StringBuffer and StringBuilder are nearly identical classes, except that all the methods in StringBuffer are synchronized when necessary, while those in StringBuilder are not. Generally, this makes StringBuffer safe to use in a multithreaded environment, while StringBuilder is not. (In return, StringBuilder is a little bit faster because it doesn't bother synchronizing.) However, even when a class is "thread-safe," it is often dangerous to rely on these classes to provide the thread protection you need. (C'mon, the repeated quotes used around "thread-safe" had to be a clue, right?) You still need to think carefully about how you use these classes, As an example, consider the following class.

import java.util.*;
public class NameList {
private List names = Collections.synchronizedList(new LinkedList());
public void add(String name) {
names.add(name);
}
public String removeFirst() {
if (names.size() > 0)
return (String) names.remove(0);
else
return null;

}
}

The method collections.synchronizedList() returns a List whose methods are all synchronized and "thread-safe" according to the documentation (like a Vector—but since this is the 21st century, we're not going to use a Vector here). The question is, can the NameList class be used safely from multiple threads? It's tempting to think that yes, since the data in names is in a synchronized collection, the NameList class is "safe" too. However that's not the case—the removeFirst() may sometimes throw a NoSuchElementException. What's the problem? Doesn't it correctly check the size() of names before removing anything, to make sure there's something there? How could this code fail? Let's try to use NameList like this:

public static void main(String[] args) {
final NameList nl = new NameList();
nl.add("Ozymandias");
class NameDropper extends Thread {
public void run() {
String name = n1.removeFirst();
Systein.out.println(name);
}
}
Thread t1 = new NameDropper();
Thread t2 = new NameDropper();
t1.start();
t2.start();
}

What might happen here is that one of the threads will remove the one name and print it, then the other will try to remove a name and get null. If we think just about the calls to names.size() and names.get(0), they occur in this order:

  • Thread t1 executes names.size(), which returns 1.

  • Thread t1 executes names.remove(0), which returns Ozymandias.

  • Thread t2 executes names.size(), which returns 0.

  • Thread t2 does not call remove(0).

The output here is

Ozymandias
null

However, if we run the program again something different might happen:

  • Thread t1 executes names.size(), which returns 1.

  • Thread t2 executes names.size(), which returns 1.

  • Thread t1 executes names.remove(0), which returns Ozymandias.

  • Thread t2 executes names.remove(0), which throws an exception because the list is now empty.

The thing to realize here is that in a "thread-safe" class like the one returned by synchronizedList(), each individual method is synchronized. So names.size() is synchronized, and names.remove(0) is synchronized. But nothing prevents another thread from doing something else to the list in between those two calls. And that's where problems can happen.

There's a solution here: don't rely on Collections.synchronizedList(). Instead, synchronize the code yourself:

import java.util.*;
public class NameList {
private List names = new LinkedList();
public synchronized void add(String name) {
names.add(name);
}
public synchronized String removeFirst() {
if (names.size() > 0)
return (String) names.remove(0);
else
return null;
}
}

Now the entire removeFirst() method is synchronized, and once one thread starts it and calls names.size(), there's no way the other thread can cut in and steal the last name. The other thread will just have to wait until the first thread completes the removeFirst() method.

The moral here is that just because a class is described as "thread-safe" doesn't mean it is always thread-safe. If individual methods are synchronized, that may not be enough—you may be better off putting in synchronization at a higher level (i.e., put it in the block or method that calls the other methods). Once you do that, the original synchronization (in this case, the synchronization inside the object returned by Collections.synchronizedList()) may well become redundant.


No comments:

Post a Comment