Friday, February 15, 2008

ARM Blocks in Scala, Part 2

Update: Here's a better approach to Automatic Resource Management in Scala.

I didn't plan on writing a follow-up to my first post on ARM Blocks in Scala, but I think I can improve on a few things. I don't plan on any future posts on this topic, but I will not hesitate to write another one if sufficient progress can be made.

Here's where we left off with Arm:


object Arm {
  type CloseType = { def close() }

  def manage(resources: CloseType*)(block: => Unit)
            (implicit exceptionHandler: (Exception) => Unit) = {
    try {
      block
    } finally {
      resources.foreach( resource => {
        try {
          resource.close()
        } catch {
          case e: Exception =>
            try {
              exceptionHandler(e)
            } catch {
              case fatal: Throwable => fatal.printStackTrace() //last resort
            }
        }
      })
    }
  }
}


Motivation

I chose to write a follow-up because of a couple of important omissions that I made in my previous post. The first and foremost is that of resource initialization (as "helium" pointed out). The problem is that resource initialization happens outside of the scope of the Arm.manage method. This violates the principal motivation for ARM Blocks by forcing the developer to correctly handle resource disposal in the case that one or more resources are initialized successfully and others are not.
For example:

val reader = new BufferedReader(new FileReader("test.txt"))
val writer = new BufferedWriter(new FileWriter("test_copy.txt"))

If reader initializes successfully and writer does not, reader needs to be closed. This is exactly the type of problem we were trying to avoid with ARM Blocks in the first place. I didn't have a solution for this problem at the time, but I forgot to mention it in my post - a mistake on my part.
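To make the leak concrete, here is a minimal sketch. Tracked is a hypothetical stand-in for the reader and writer (it is not part of Arm): it records when it is closed and can be told to fail in its constructor, the way a FileWriter might.

```scala
import scala.collection.mutable.ListBuffer

// Names of the resources that have been closed so far.
val closed = ListBuffer.empty[String]

// Hypothetical stand-in for a reader or writer.
class Tracked(name: String, failOnInit: Boolean = false) {
  if (failOnInit) throw new java.io.IOException("cannot open " + name)
  def close(): Unit = closed += name
}

var reader: Tracked = null
var writer: Tracked = null

try {
  reader = new Tracked("reader")                    // succeeds
  writer = new Tracked("writer", failOnInit = true) // throws
  // The call to Arm.manage would go here, but it is never reached.
} catch {
  case e: java.io.IOException => println("init failed: " + e.getMessage)
}

// reader was opened, yet nothing has closed it.
println(closed.isEmpty) // prints "true"
```

The only way to avoid this by hand is another try/finally around the initialization, which is the boilerplate ARM Blocks are supposed to remove.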

Secondly, I didn't mention how an exception within the block of code itself should/would be handled. This means that one would have to wrap the entire call to Arm.manage in a try/catch block in order to handle any exceptions that may occur inside the block itself. Not ideal.

In this post, I hope to address these two concerns. The solution I present here is by no means the best solution, and I am confident it can be improved upon as well.

For starters, let's take care of exceptions thrown inside the block, which is the easier of the two issues:

object Arm {
  type CloseType = { def close() }

  def manage(resources: CloseType*)(block: => Unit)
            (implicit exceptionHandler: (Exception) => Unit) = {
    try {
      try {
        block
      } catch {
        case e: Exception => exceptionHandler(e)
      }
    } finally {
      resources.foreach( resource => {
        try {
          if (resource != null)
            resource.close()
        } catch {
          case e: Exception =>
            try {
              exceptionHandler(e)
            } catch {
              case fatal: Throwable => fatal.printStackTrace() //last resort
            }
        }
      })
    }
  }
}

Here we reuse the existing exception handler function. This is more convenient than declaring two different exception handlers, but the handler cannot tell whether the exception it receives occurred while closing a resource or in the block itself. We will correct this after we address resource initialization.
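A small sketch of that ambiguity follows. It uses a condensed variant of manage with two simplifying assumptions of mine: java.lang.AutoCloseable replaces the structural CloseType, and the handler is passed explicitly rather than implicitly. FlakyResource is a hypothetical resource whose close() always fails.

```scala
import scala.collection.mutable.ListBuffer

// Condensed variant of manage (assumptions: AutoCloseable, explicit handler).
def manage(resources: AutoCloseable*)(block: => Unit)(handler: Exception => Unit): Unit =
  try {
    try { block } catch { case e: Exception => handler(e) }
  } finally {
    resources.foreach { resource =>
      try { if (resource != null) resource.close() }
      catch { case e: Exception => handler(e) }
    }
  }

// Hypothetical resource whose close() always throws.
class FlakyResource extends AutoCloseable {
  def close(): Unit = throw new IllegalStateException("close failed")
}

// Messages of every exception the handler has seen.
val seen = ListBuffer.empty[String]

manage(new FlakyResource) {
  throw new RuntimeException("block failed")
} { e => seen += e.getMessage }

// Both failures arrive at the same handler as plain Exceptions.
println(seen.mkString(", ")) // prints "block failed, close failed"
```

Only the messages differ here; the handler has no reliable way to know which phase produced which exception.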

The issue of initialization is a more complicated one. The approach taken here is to define an initialization function, much like our exception handling function. It should be implicit as well, so that it can be omitted when resources have already been initialized and no initialization is necessary. Here's an attempt:

object Arm {
  implicit def defaultInitializer() = {}
  implicit def defaultExceptionHandler(e: Exception) = {}

  type CloseType = { def close() }

  def manage(resources: (() => CloseType)*)(block: => Unit)
            (implicit initializer: () => Unit, exceptionHandler: (Exception) => Unit) = {
    try {
      try {
        initializer()
        block
      } catch {
        case e: Exception => exceptionHandler(e)
      }
    } finally {
      resources.foreach( resource => {
        try {
          val value = resource()
          if (value != null)
            value.close()
        } catch {
          case e: Exception =>
            try {
              exceptionHandler(e)
            } catch {
              case fatal: Throwable => fatal.printStackTrace() //last resort
            }
        }
      })
    }
  }
}

Here we define a default initializer as well as a default exception handler. Defining the exception handler is not really necessary because of the implicit identity function in Scala's Predef, but defining the default initializer may be helpful. In Arm.manage, the initialization and exception handling functions are included in the list of implicit parameters. Only one such list is allowed in the Scala language, and it must be the last parameter list.

Also, if we are to use an initialization function, the resources themselves cannot be passed as arguments, because arguments are passed by value: they are evaluated at the call site, before the initializer runs, so Arm.manage would only ever see the uninitialized references. Instead, we defer evaluation by accepting an arbitrary number of zero-argument functions that return resources (remember that methods can be passed where functions are expected in Scala). This is a necessary change, but adds a bit of clutter when it comes to usage. We will save an example until later because of a deficiency in this implementation.
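The difference between passing a resource by value and passing a function that returns it can be seen in isolation. This is a minimal sketch; Resource, byValue, byThunk and init are hypothetical names used only for illustration.

```scala
// Hypothetical placeholder for a real resource.
class Resource(val name: String)

var resource: Resource = null

def init(): Unit = { resource = new Resource("test.txt") }

// By value: the argument was already evaluated at the call site,
// so running the initializer afterwards cannot change what we hold.
def byValue(r: Resource)(initializer: () => Unit): Resource = {
  initializer()
  r
}

// Zero-argument function: evaluated only when invoked,
// which here happens after the initializer has run.
def byThunk(r: () => Resource)(initializer: () => Unit): Resource = {
  initializer()
  r()
}

val a = byValue(resource)(() => init())       // still null
resource = null                               // reset for the second call
val b = byThunk(() => resource)(() => init()) // the initialized Resource

println(a == null)  // prints "true"
println(b.name)     // prints "test.txt"
```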

In this version of Arm.manage, when an exception occurs and our handler is called, we have no idea if the exception happened in initialization, in the block of code, or when trying to close the resources.

Let's address this issue:

object Arm {
  abstract class ManagementException(val cause: Exception) extends Exception(cause) {
    override def getCause() = cause
  }
  class InitializationException(cause: Exception) extends ManagementException(cause)
  class ExecutionException(cause: Exception) extends ManagementException(cause)
  class ClosingException(cause: Exception) extends ManagementException(cause)

  implicit def defaultInitializer() = {}
  implicit def defaultExceptionHandler(e: ManagementException) = {}

  type CloseType = { def close() }

  def manage(resources: (() => CloseType)*)(block: => Unit)
            (implicit initializer: () => Unit, exceptionHandler: (ManagementException) => Unit) = {
    try {
      var executeBlock = true

      try {
        initializer() //initialize the resources
      } catch {
        //forward exceptions to the handler
        case e: Exception =>
          executeBlock = false //do not continue if initialization fails
          exceptionHandler(new InitializationException(e))
      }

      if (executeBlock) {
        try {
          block // execute the block
        } catch {
          //forward exceptions to the handler
          case e: Exception => exceptionHandler(new ExecutionException(e))
        }
      }
    } finally {
      //close all of the resources properly
      resources.foreach( resource => {
        try {
          val value = resource()
          if (value != null)
            value.close()
        } catch {
          case e: Exception =>
            try {
              exceptionHandler(new ClosingException(e))
            } catch {
              case fatal: Throwable => fatal.printStackTrace() //last resort
            }
        }
      })
    }
  }
}

So now, our exception handler signature takes a ManagementException, which has the original exception as its cause - accessible by e.cause or e.getCause(). We know, based on the type of ManagementException, where it is coming from. Alternatively, we could define three different exception handling methods, but I think it's easier to use if we only have to define (at most) one exception handler.

Let's see an example:

var reader: BufferedReader = null
var writer: BufferedWriter = null

def getReader() = reader
def getWriter() = writer

def initResources() = {
  reader = new BufferedReader(new FileReader("test.txt"))
  writer = new BufferedWriter(new FileWriter("test_copy.txt"))
}

def handle(e: ManagementException) = {
  e match {
    case ie: InitializationException =>
      println("Failed to initialize: " + ie.getCause())
    case ee: ExecutionException =>
      println("Could not copy files: " + ee.getCause())
    case ce: ClosingException =>
      println("Failed to close a resource: " + ce.getCause())
  }
}

//copy a file, line by line
manage(getReader, getWriter) {
  var line = reader.readLine
  while (line != null) {
    writer.write(line)
    writer.newLine
    line = reader.readLine
  }
} (initResources, handle)

This one passes the methods for initialization and exception handling as explicit parameters.

Here's the same example with a different approach:

var reader: BufferedReader = null
var writer: BufferedWriter = null

implicit def initResources() = {
  reader = new BufferedReader(new FileReader("test.txt"))
  writer = new BufferedWriter(new FileWriter("test_copy.txt"))
}

implicit def handle(e: ManagementException) = {
  e match {
    case ie: InitializationException =>
      println("Failed to initialize: " + ie.getCause())
    case ee: ExecutionException =>
      println("Could not copy files: " + ee.getCause())
    case ce: ClosingException =>
      println("Failed to close a resource: " + ce.getCause())
  }
}

//copy a file, line by line
manage(() => reader, () => writer) {
  var line = reader.readLine
  while (line != null) {
    writer.write(line)
    writer.newLine
    line = reader.readLine
  }
}

This one uses implicit defs for our initialization and exception handling functions. For this to work, we cannot import the implicit defs of these functions from Arm because they would conflict. I should point out that if you provide one function explicitly, you have to provide the other explicitly as well. Also, this example defines getters for reader and writer "inline".
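The all-or-nothing nature of the implicit parameter list can be sketched on its own. Everything here is hypothetical: run merely mimics the shape of Arm.manage's implicit list, and the implicits are declared as function-typed vals (an assumption of this sketch, rather than the implicit defs used above) that write to a log.

```scala
import scala.collection.mutable.ListBuffer

val log = ListBuffer.empty[String]

// Hypothetical method with the same shape of implicit list as Arm.manage.
def run(block: => Unit)
       (implicit initializer: () => Unit, handler: Exception => Unit): Unit = {
  initializer()
  try { block } catch { case e: Exception => handler(e) }
}

implicit val defaultInit: () => Unit =
  () => log += "default init"
implicit val defaultHandler: Exception => Unit =
  e => log += "default handler: " + e.getMessage

// Neither argument given: both are taken from the implicit scope.
run { throw new RuntimeException("boom") }

// Both arguments given explicitly.
run { throw new RuntimeException("bang") } (
  () => log += "custom init",
  e => log += "custom handler: " + e.getMessage
)

// Giving only one does not compile, because the whole list
// must be supplied once any part of it is written out:
// run { () } (() => log += "custom init")

log.foreach(println)
```

When only one function needs customizing, the other default has to be passed by name, for example run { ... } (defaultInit, myHandler).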

Room for Improvement

The new syntax is not nearly as easy to read as the original. This concession was necessary, though, for the sake of completeness. The problem of an exception thrown from the handler still remains from the first post, as well.

A couple of readers pointed me toward scalax, whose ManagedResource class provides the same functionality using for-comprehensions. I took a look at it and it looks pretty well done, but I haven't yet had an opportunity to use it, so I can't comment on how it works in practice. I would be interested to see an example that manages more than one resource.
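To illustrate the general idea of for-comprehensions over resources, here is a minimal sketch with two resources. Managed and Tracked are hypothetical classes written for this example; this is not scalax's actual API, and it uses java.lang.AutoCloseable instead of a structural type.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical wrapper: opens the resource only when the body is about
// to run, and closes it once the body has finished or thrown.
class Managed[A <: AutoCloseable](open: => A) {
  def foreach(body: A => Unit): Unit = {
    val resource = open
    try body(resource) finally resource.close()
  }
}

val events = ListBuffer.empty[String]

// Hypothetical resource that records when it is opened and closed.
class Tracked(name: String) extends AutoCloseable {
  events += "open " + name
  def close(): Unit = events += "close " + name
}

// Desugars to nested foreach calls, one per generator.
for {
  src <- new Managed(new Tracked("src"))
  dst <- new Managed(new Tracked("dst"))
} events += "copy"

println(events.mkString(", "))
// prints "open src, open dst, copy, close dst, close src"
```

Because the second resource is opened inside the first one's foreach, a failure while opening dst would still close src, which is exactly the initialization problem described earlier.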

9 comments:

  1. I'd say this:

    def manage(resources: (=> CloseType)*)...

    Then there's no need for "() =>", and Scala can automatically create closures for you. There's no need for the separate initialization either when using callbacks rather than direct objects. That is, you can say:

    manage(new FileInputStream(...), new FileOutputStream(...)) ...

    Scala will turn those into anonymous functions automatically, and they won't be evaluated until referenced in your manage function. So just make sure to keep track there of the ones you've referenced so you can dispose of them.

    I haven't tested this, but I'm pretty sure it would work.

  2. I had the same thought, but the syntax you recommend does not compile. It says, "no by-name parameter type allowed here". I don't know why, but by-name params do not appear to work with varargs. Perhaps it's a bug - I will bring it up in the Scala mailing list.

    It's too bad really, because that would be a definite improvement. Thanks for mentioning it.

  3. I'm getting a different error message (different version of Scala, maybe, and I'm just on whatever version of the Eclipse plugin), but I also can't seem to get the automatic functions with varargs to work. Please post your findings after asking the mailing list. And thanks for reviewing this whole topic in the meantime.

  4. I found the discussion on the mailing list archives so far. Maybe worst case for now, you could write different versions with 1 parameter, 2, 3, and maybe 4, or so. It's a hack, but anyone putting too many resources in one block is likely abusing the system.

    Also, make sure to call the constructors in the call to manage (rather than beforehand), or else it will be too late to close them as needed.

    Or maybe just one resource per block is easier to follow? Not sure. Thanks for checking into this.

  5. For example, like this:

    def manage[ResourceA <: {def close}, ResourceB <: {def close}](a: => ResourceA, b: => ResourceB)(action: (ResourceA, ResourceB) => Unit) {
      manage(a) {a =>
        manage(b) {b =>
          action(a, b)
        }
      }
    }

  6. Ya, I'm leaning toward one resource per block for now. It's similar to how C# dealt with that problem if I'm not mistaken.

    I'll probably write a follow-up post that simplifies things.

  7. Sorry to spam so much, but I have another comment. See this post on Scala ARM blocks. The key point here is the use of the 'for' keyword, which I've mostly ignored on purpose, but it allows for automatically nesting repeated blocks. Carsten Saager also discusses the initialization problem, but doesn't seem to address it. However, it works fine. I tried the "=> Type" syntax, and it worked like a charm for multiple resources. I'm tempted to put up a blog post myself sometime on it, but maybe I'll wait to see what you post first.

  8. The technique that Carsten talks about in that post has been implemented in scalax (which he pointed out in the comments of my first post):
    ManagedResource

    I'm becoming more and more convinced that theirs is the right approach. It's more elegant than what I've ended up with.
