Using weak references

Introduction

A weak reference is tightly tied to the concept of the garbage collector, so it is most at home in garbage-collected languages such as C# or Java.

To explain weak references we first have to cover, in simple terms, what the garbage collector is and the basics of how it works.

Basics of the garbage collector

Traditional imperative languages have always had to handle memory for reference types by hand: whenever we wanted a new class instance or a block of memory, we had to allocate it, which in turn reserved space on the heap.

Then the memory is marked as used up so whenever we ask for another chunk of memory we don't get the same location. Pretty simple.

Obviously, at some point we will stop needing the first chunk of memory, so we have to tell the system that this part of the heap is available again. That is what is known as freeing the memory, and it is a problem that has been plaguing computer programs for more than three decades. If we forget to release that memory, it never gets freed, and we never "recover" it for reuse. With time (or a very fast loop) we end up consuming all available memory, which leads to a performance hit and ultimately to a program crash.
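To make the allocate, use and free cycle concrete, here is a minimal sketch in C#, a language that normally hides all of this from us. It assumes unmanaged memory obtained through Marshal.AllocHGlobal; the names ManualMemoryDemo and RoundTrip are made up for illustration.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical demo class, only meant to illustrate manual memory handling
public static class ManualMemoryDemo
{
  // Allocates a small unmanaged block, uses it, and releases it by hand
  public static int RoundTrip(int value)
  {
    // Reserve 4 bytes outside the garbage-collected heap
    IntPtr block = Marshal.AllocHGlobal(sizeof(int));
    try
    {
      Marshal.WriteInt32(block, value);
      return Marshal.ReadInt32(block);
    }
    finally
    {
      // Forgetting this call is exactly the leak described above:
      // nobody else will release this block for us
      Marshal.FreeHGlobal(block);
    }
  }
}
```

The try/finally is there so that the block is released even if the code using it throws.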

The garbage collector provides a solution to that problem by taking care of releasing that memory for us.

Basically, it can be seen as a background process in charge of collecting the memory that can no longer be reached. Conceptually, the garbage collector keeps track of all objects and of the references to those objects so that, at any moment, it can tell whether an object can still be reached. If it cannot, because no reference to that object exists, the collector frees the associated memory.

Strong references

Enough of the garbage collector introduction (see note 1). To understand weak references we first need a short introduction to strong references.

Weak references are a different kind of reference from the usual strong reference. An object that the program can reach through a strong reference is never collected by the garbage collector, and neither are objects that can be reached through a chain of strong references.

This is the standard way of functioning, which keeps our objects alive until they cannot be used anymore (as they cannot be reached by the program anymore). So if we have:

MyClass classInstance = new MyClass();

we have a strong reference to the memory allocated to hold the class instance. The moment that reference goes out of reach, the garbage collector is free to collect that memory.

Weak references

So what happens when we want a reference that doesn't hold too tightly to the memory? What if we want to reference an object, that is, be able to access it, without interfering with the garbage collector's ability to collect it? We use a weak reference.

Weak references allow us to reference an object, but they are not counted in the chain of references, so if an object can only be reached through weak references, it can be collected by the garbage collector.
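A minimal sketch of the difference, using the .NET WeakReference class. The names WeakReferenceDemo, CreateWeaklyHeld and Run are made up for this example, and what the second line prints depends on the runtime and build settings, so take it as an illustration rather than a guarantee.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical demo class comparing a strongly held and a weakly held object
public static class WeakReferenceDemo
{
  // Creates an object that is referenced only weakly once this method returns.
  // NoInlining keeps the temporary strong reference out of the caller's frame.
  [MethodImpl(MethodImplOptions.NoInlining)]
  public static WeakReference CreateWeaklyHeld()
  {
    return new WeakReference(new object());
  }

  public static void Run()
  {
    object stronglyHeld = new object();
    WeakReference weakToStrong = new WeakReference(stronglyHeld);
    WeakReference weakOnly = CreateWeaklyHeld();

    // Ask for a collection so the effect is visible right away
    GC.Collect();
    GC.WaitForPendingFinalizers();

    // Still reachable through the local variable, so it cannot be collected
    Console.WriteLine("Strongly held object alive: " + weakToStrong.IsAlive);

    // Reachable only through a weak reference, so it is eligible for collection
    Console.WriteLine("Weakly held object alive: " + weakOnly.IsAlive);

    // Keeps the strong reference alive up to this point
    GC.KeepAlive(stronglyHeld);
  }
}
```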

A real example

Suppose we have a class that needs to keep track of some data held in other classes.

For example, suppose we have Clients and Products mapped into memory by some mechanism that caches them, keeping only a limited number of them in memory. Say we keep 5000 clients in memory; if the system asks for client number 5001, we select one of the previous 5000 and remove it from the cache.

Now, suppose that each product has a reference to the client that bought it so that, given a product, you can easily access its client.

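using System;
using System.Collections.Generic;
using System.Linq;

public class Client
{
  public UInt32 Id {get; set;}
  public string Name {get; set;}
  public string Surname {get; set;}
  public string Address {get; set;}
}

public class Product
{
  public UInt32 Id {get; set;}
  public string Name {get; set;}
  public double SellPrice {get; set;}
  public Client Owner {get; private set;}
}

public static class Clients
{
  // Maximum number of clients kept in the local cache
  private const int MAX_CLIENTS = 5000;

  private static Dictionary<string, Client> mClients = new Dictionary<string, Client>();

  public static Client FindClient(string clientName)
  {
    // Search in the local list
    Client cli;
    if (mClients.TryGetValue(clientName, out cli))
      return cli;

    // If not found locally, retrieve from db
    cli = GetClientFromDb(clientName);

    // Keep the size of our local data: evict one entry when the cache is full
    // (a Dictionary has no guaranteed order, so this is not necessarily the oldest)
    if (mClients.Count >= MAX_CLIENTS)
      mClients.Remove(mClients.Keys.First());

    mClients.Add(cli.Name, cli);

    return cli;
  }

  // Hypothetical placeholder standing in for the real database lookup
  private static Client GetClientFromDb(string clientName)
  {
    return new Client { Name = clientName };
  }
}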

So we have a class (Clients) which gives access to all the clients in the system and keeps them in an internal list, not letting that list grow above a certain size. When we reach that limit, an item is removed from the list and, in theory, freed.

With the situation above, however, if any product has an owner defined, that client will never be collected for as long as the product exists, because there is a strong reference from the product to the client. Therefore, although our list of clients will never hold more than its maximum (5000, for example), the actual number of clients in memory can be much higher!

So, for the example to work correctly, we need to turn the strong reference from the product to the client into a weak reference. In .NET this would be something like:

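using System;

public class Product
{
  // Weak reference to the owner, so the product does not keep the client alive
  private WeakReference mClient;

  public UInt32 Id {get; set;}
  public string Name {get; set;}
  public double SellPrice {get; set;}
  public Client Owner
  {
    get
    {
      // Target is null once the client has been collected;
      // mClient itself is null if no owner was ever set
      return mClient != null ? (Client)mClient.Target : null;
    }
    private set
    {
      mClient = new WeakReference(value);
    }
  }
}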

So now our Product class holds a weak reference to the client. In the getter we check whether the object is still alive, and only if the client is still there do we return it to the caller.

This way, if the Garbage Collector reclaims the object, the property returns null, but if the Client has not been reclaimed we can still access it. There might still be more than 5000 clients in memory, but in this case we are not getting in the way of the Garbage Collector.
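Since .NET 4.5 there is also a generic WeakReference<T>, whose TryGetTarget method checks and retrieves the object in a single step. A sketch of the same property written that way could look like this. The Customer and OwnedProduct types are hypothetical stand-ins for Client and Product, used only to keep the example self-contained, and the setter is public here so the example can be exercised.

```csharp
using System;

// Hypothetical stand-in for the Client class
public class Customer
{
  public string Name { get; set; }
}

// Hypothetical stand-in for the Product class
public class OwnedProduct
{
  private WeakReference<Customer> mOwner;

  public Customer Owner
  {
    get
    {
      Customer owner;
      // TryGetTarget hands back a strong reference if the object is still alive
      if (mOwner != null && mOwner.TryGetTarget(out owner))
        return owner;
      return null;
    }
    set
    {
      mOwner = value == null ? null : new WeakReference<Customer>(value);
    }
  }
}
```

Whoever calls the getter receives a normal strong reference, so the customer cannot disappear while the caller is still using it.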

Performance vs memory. The best of both worlds

Let's look at another situation where weak references might be useful.

Suppose we are retrieving some information from the database in a somewhat costly operation (maybe several joins are involved, or a great quantity of objects). Let's say that operation brings back data that gets stored in several objects taking 1 MB of memory. We use them for whatever we need and then we face a dilemma: do we keep them in memory and avoid asking for them the next time, or do we save the memory and pay the performance cost of querying them again?

One possible solution is to leave that decision to the component in charge of the memory, the Garbage Collector. We can hold our objects through weak references so that, if the Garbage Collector decides to reclaim them, we might "lose" part or all of our objects, but if it doesn't we will still be able to use them. The result is a cache that never stands in the collector's way: its memory is given back as soon as the collector wants it.
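A minimal sketch of such a cache follows. WeakCache is a hypothetical helper written for this article, not a .NET class, and the loader delegate stands in for the costly database query. It is not thread safe and never removes dead entries from the dictionary, so treat it as an illustration of the idea only.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of .NET
public class WeakCache<TKey, TValue> where TValue : class
{
  private readonly Dictionary<TKey, WeakReference<TValue>> mEntries =
    new Dictionary<TKey, WeakReference<TValue>>();

  // Stands in for the costly operation, for example a database query
  private readonly Func<TKey, TValue> mLoader;

  public WeakCache(Func<TKey, TValue> loader)
  {
    mLoader = loader;
  }

  public TValue Get(TKey key)
  {
    WeakReference<TValue> entry;
    TValue value;

    // Cache hit: the entry exists and its target has not been collected
    if (mEntries.TryGetValue(key, out entry) && entry.TryGetTarget(out value))
      return value;

    // Cache miss, or the value was collected: run the costly operation again
    value = mLoader(key);
    mEntries[key] = new WeakReference<TValue>(value);
    return value;
  }
}
```

A caller would create it with something like new WeakCache<string, Report>(LoadReportFromDb), where Report and LoadReportFromDb are again made-up names, and simply call Get every time the data is needed.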

1. The garbage collector is a very complex topic that deserves a full article of its own (maybe one day). For more detailed information, Wikipedia has a good article on the garbage collection process.