Let’s see if I am in that 10% or not…
As I’ve found myself using Ruby more and more lately, I thought I’d try to implement the binary search using that language.
Here’s my stab:
The script includes code to run the binary search through the test cases for the exercise, created by Steve Witham. And once the file reading worked, all tests actually passed (or the code incorrectly reports them as passed… ;).
Good training this!
/M
]]>As part of the migration I wanted to extract all the old posts from Wordpress into the proper markdown format used by Jekyll.
Jekyll includes a migration script for Wordpress, but this used a direct database connection to the Wordpress MySQL server to extract a minimum amount of information for each blog post. To avoid missing information that I later might want after having shut down the Wordpress server, I wanted to base the migration on a full XML export from Wordpress. This way I could go back and extract more information from the old posts, if, and when, needed.
Seeing this as a chance to play with Ruby - a language I’ve only dabbled in previously - I set out to build a simple Wordpress XML->Jekyll converter. The converter reads the Wordpress export and converts all blog posts into separate files with a proper Yaml frontmatter. It also converts the syntax highlighting markup, code markup and headings into the corresponding markdown conventions. The rest of the HTML is left alone.
For the images I just copied the files in the old Wordpress structure into the new blog directory layout so that the URLs still match. Not the nicest way, but I was too lazy to do anything about it just yet.
Anywho - the source for the Wordpress XML converter is available in the blog branch of my fork of Octopress. The source is also given below. To use, just run with the filename of the Wordpress XML export as the first argument.
It’s my first publicly available Ruby program - so be careful. Improvements welcome!
/M
]]>It feels good to have a server that handles the C10K problem - and it has a fairly decent configuration language, IMHO.
I’m also looking forward to using Nginx more in the days to come as the web frontend for a development tools server at work, hosting, among other things:
Will post more about this later!
/M
]]>Finding the time and inspiration to write between the day job and private commitments turns out to be a challenge. Instead of seeing this as a failure, however, I will try to learn from the experience and attempt to find a new way of relating to the blog. Perhaps by being less long-winded (!) in each post I could get a few more out there. There are still a lot of things going on that I would like to capture, but the previous form is clearly not working.
So - let’s try to shake things up and see how that works!
/M
]]>This time we will finally get the Ioke source code, build it, and test the Ioke REPL!
Ioke uses Git as the source code versioning tool, with a public repository set up on GitHub (soon I’ll be linking like Jeff Atwood).
So - let’s get Ioke using Git by cloning Ola’s repository:
~/bin$ cd ~/work
~/work$ git clone git://github.com/olabini/ioke
Initialized empty Git repository in /home/melwin/work/ioke/.git/
remote: Counting objects: 9221, done.
remote: Compressing objects: 100% (2941/2941), done.
remote: Total 9221 (delta 5782), reused 8656 (delta 5351)
Receiving objects: 100% (9221/9221), 45.95 MiB | 399 KiB/s, done.
Resolving deltas: 100% (5782/5782), done.
Great - let’s quickly move on to the build step:
~/work$ cd ioke
~/work/ioke$ ant
...
[java] 2606 examples, 0 failures
jar:
[jar] Building jar: /home/melwin/work/ioke/lib/ioke.jar
BUILD SUCCESSFUL
Total time: 22 seconds
Running just ant will execute the default target in the build.xml file. This target will make sure Ioke is compiled and the test suite is run.
At the end is a summary of the test results - if these are not all successful, you might have a problem with the environment or you might have checked out a broken build from the repository.
Now all is installed and Ioke is compiled, so let’s fire up the REPL. This is done by just running the main Ioke binary:
~/work/ioke/$ bin/ioke
iik>
When doing this we get the iik> REPL prompt. From here we can execute most Ioke code. Let’s try to print a string:
iik> "Hello World!" println
Hello World!
+> nil
iik>
Here we create a Text (Ioke’s string type) literal and send it the message println, which causes the text to be printed to the console. The return value of println is nil (similar to null), which is printed by the REPL itself. After that we get the prompt back and can execute another line.
Let’s try a little more complex example - the factorial:
iik> fact = method(n, if(n > 1, n * fact(n pred), 1))
+> fact:method(n, if(n >(1), n *(fact(n pred)), 1))
Note how the REPL prints the method definition in a canonical form - it makes it obvious how Ioke parses the code, which parts are messages and which are arguments.
Let’s run our new factorial function:
iik> fact(3)
+> 6
iik> fact(10)
+> 3628800
iik> fact(50)
+> 30414093201713378043612608166064768844377641568960512000000000000
Ioke handles arbitrarily big numbers, so we don’t need to think about what fits in a certain number of bits.
The Iik REPL also comes with a simple debugger, which helps with handling conditions. For instance, let’s try to run the fact method, passing a message which doesn’t mean anything (yet):
iik> fact(f)
*** - couldn't find cell 'f' on 'Ground_0xC360E7' (Condition Error NoSuchCell)
f [<init>:1:5]
The following restarts are available:
0: storeValue (Store value for: f)
1: useValue (Use value for: f)
2: abort (restart: abort)
3: quit (restart: quit)
dbg:1> 1
dbg:1:newValue> 5
+> 5
+> 120
The above output shows that when we try to reference something that doesn’t exist we get a chance to provide that value. Here I decided to provide a value using the useValue restart and entered the value 5. This let the method continue executing and the result was printed.
To exit the REPL, just type: exit
Next up are the Emacs modes for Ioke - stay tuned!
/M
]]>It took a bit longer than anticipated to get to this point, as I got sidetracked with the Ioke type checking activities. But now that Ioke S is out we can continue with the series. So! We’re getting closer… Only a few more things to go.
As the current version of Ioke is implemented to run on a Java Virtual Machine, we need to get hold of one of those to be able to continue. The JVM brand we will install here is the latest stable one produced by Sun.
What we need is Java SE Development Kit 6u11 for Linux, Multi-language; the filename for this is jdk-6u11-linux-i586.bin. Note that this is not the .rpm.bin, which is also available.
Download the file from http://java.sun.com/javase/downloads/index.jsp (I’m sure you can find it) and put it in the ~/work directory.
To unpack the file, let’s make it executable and then run it:
~$ cd ~/work
~/work$ cd ~/work
~/work$ chmod a+x jdk-6u11-linux-i586.bin
~/work$ ./jdk-6u11-linux-i586.bin
... lots of legalese here...
Once the license agreement is accepted, the JVM will unpack itself into the directory ~/work/jdk1.6.0_11. Let’s move it to where we want it and add java and javac to the ~/bin directory as before:
~/work$ mv jdk1.6.0_11 ../opt
~/work$ cd ~/bin
~/bin$ ln -s ../opt/jdk1.6.0_11/bin/java
~/bin$ ln -s ../opt/jdk1.6.0_11/bin/javac
And test it:
~/bin$ java -version
java version "1.6.0_11"
Java(TM) SE Runtime Environment (build 1.6.0_11-b03)
Java HotSpot(TM) Client VM (build 11.0-b16, mixed mode, sharing)
That done - let’s move on to Ant.
Ioke uses Apache Ant to handle the build process, so to build Ioke we need to get Ant installed.
Let’s download the Ant distributable package and unpack it in one go - as we did with Git:
~/bin$ cd ~/opt
~/opt$ wget -O - http://www.apache.org/dist/ant/binaries/apache-ant-1.7.1-bin.tar.bz2 | tar xjv
And add it to the ~/bin directory again so it's on the path:
~/opt$ cd ~/bin
~/bin$ ln -s ../opt/apache-ant-1.7.1/bin/ant
Let’s try it out:
~/bin$ ant -version
Apache Ant version 1.7.1 compiled on June 27 2008
Looking good!
Soon to come: How to get the Ioke source and build it.
/M
Update: Part 5: Ioke and the REPL
]]>The Java methods usually operate on the specific data contained within Ioke objects. This data corresponds to different Java classes, depending on what the Ioke object should hold. For instance, an Ioke List contains inside the data area a Java List. To operate on this List the Java code needs to get hold of the List reference - and in a lot of cases it does this by assuming the data object is of a certain type and casting to this type. If the data object is of a different type, a Java ClassCastException is thrown, which makes the interpreter quit.
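To make the problem concrete, here is a small illustrative fragment (my own sketch, not code taken from the Ioke code base) of the cast pattern described above:
// Illustration only - this mirrors the unchecked cast used in many of the JavaMethods.
// If 'on' is not actually backed by an IokeList (say, a Number was passed instead),
// the cast throws a raw java.lang.ClassCastException rather than signalling an Ioke
// condition, and the interpreter dies.
List<Object> receiverList = ((IokeList) IokeObject.data(on)).getList();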
Since Ola mailed the call to arms a number of things have happened. A few people (including myself) have volunteered to help out with adding the validation code. Additionally, a few supporting methods and additions have been added to the Ioke code base to simplify adding type checks to the existing code. And it’s these additions that I will talk about here.
As an example of what the new type validations can look like, I will use the List + method that Ola also mentioned in his example.
This is the previous code from IokeList.java:
obj.registerMethod(runtime.newJavaMethod("returns a new list that contains the receivers elements and the elements of the list sent in as the argument.", new JavaMethod("+") {
private final DefaultArgumentsDefinition ARGUMENTS = DefaultArgumentsDefinition
.builder()
.withRequiredPositional("otherList")
.getArguments();
@Override
public DefaultArgumentsDefinition getArguments() {
return ARGUMENTS;
}
@Override
public Object activate(IokeObject method, IokeObject context, IokeObject message, Object on) throws ControlFlow {
List<Object> args = new ArrayList<Object>();
getArguments().getEvaluatedArguments(context, message, on, args, new HashMap<String, Object>());
List<Object> newList = new ArrayList<Object>();
newList.addAll(((IokeList)IokeObject.data(on)).getList());
newList.addAll(((IokeList)IokeObject.data(args.get(0))).getList());
return context.runtime.newList(newList, IokeObject.as(on));
}
}));
The key to the new validation is two new classes, which extend the functionality of the normal JavaMethod class with type-checking functionality: TypeCheckingJavaMethod and TypeCheckingArgumentsDefinition.
When using these to define the Java method we can “annotate” the arguments definition with types for both the arguments and the receiver. By then implementing the appropriate activate(...), the superclass takes care of evaluating and validating (converting as appropriate) the receiver and arguments.
For instance, for the List + example, the above code gets changed to the following:
obj.registerMethod(runtime.newJavaMethod("returns a new list that contains the receivers elements and the elements of the list sent in as the argument.", new TypeCheckingJavaMethod("+") {
private final TypeCheckingArgumentsDefinition ARGUMENTS = TypeCheckingArgumentsDefinition
.builder()
.receiverMustMimic(runtime.list)
.withRequiredPositional("otherList").whichMustMimic(runtime.list)
.getArguments();
@Override
public TypeCheckingArgumentsDefinition getArguments() {
return ARGUMENTS;
}
@Override
public Object activate(IokeObject self, Object on, List<Object> args, Map<String, Object> keywords, IokeObject context, IokeObject message) throws ControlFlow {
List<Object> newList = new ArrayList<Object>();
newList.addAll(((IokeList)IokeObject.data(on)).getList());
newList.addAll(((IokeList)IokeObject.data(args.get(0))).getList());
return context.runtime.newList(newList, IokeObject.as(on));
}
}));
Note that most of the boilerplate code for arguments handling (which most JavaMethods do) is removed, leaving a clean implementation of the necessary logic - to concatenate two lists in this case.
A few things are done: the method is now a TypeCheckingJavaMethod, the arguments definition declares that both the receiver and the otherList argument must mimic List, and the new activate(...) receives the already evaluated arguments directly.
The relevant tests for this, which should be added to the list_spec.ik test file, could look like:
describe(List,
describe("+",
it("should validate type of receiver",
x = Origin mimic
x cell("+") = List cell("+")
fn(x + [3]) should signal(Condition Error Type IncorrectType)
)
it("should validate type of argument",
fn([1,2,3] + 3) should signal(Condition Error Type IncorrectType)
)
)
)
So, for you to do: pick some of the remaining JavaMethods, add the corresponding type checks, and add specs for them.
Let’s get crackin’!
(but first - back to normal work… :)
/M
]]>The emacs-starter-kit is a base set of configuration for Emacs. It contains a number of useful elisp libraries, with a slight focus on dynamic languages.
To install it, perform the following steps (note that we move any existing Emacs configuration out of the way first to avoid stomping what you currently have):
~/work/emacs$ cd
~$ mv .emacs.d .emacs.d.old
~$ mv .emacs .emacs.old
~$ git clone git://github.com/technomancy/emacs-starter-kit.git .emacs.d
If you now start Emacs again you’ll see that the menu bar and the toolbar are gone. This is the default in emacs-starter-kit, as most Emacs users don’t find them useful. For new users the menu bar can sometimes come in handy; to get it back temporarily, just press F1.
If you want to add your own customizations to Emacs when using emacs-starter-kit, just add an Emacs Lisp file called username.el, or hostname.el, in the ~/.emacs.d directory. For instance, to make the menu bar always visible: open a new file with C-x C-f and type in ~/.emacs.d/username.el (in my case melwin.el), add the line (menu-bar-mode 1), and save the file with C-x s. Quit Emacs (C-x c), restart, and you’ll see that the menu bar is shown.
emacs-starter-kit includes, among many other things, the very nice Magit Git mode for Emacs, which gives you a nice interface for working with a Git repository.
Let’s use this mode to commit our recent changes to the configuration file to our local clone of the emacs-starter-kit repository. This helps us track changes we make and also makes a backup of the file in case we screw (sorry, mess) something up.
Note 1: To move easily between Emacs windows using the keyboard, just press Shift and the arrow key pointing in the direction you want to move.
Note 2: To only show the current Emacs window - press: C-x 1
That’s it - now the change has been committed to the local clone of the emacs-starter-kit Git repository.
To see the log of all commits, press l (lowercase L) in the magit-status buffer:
To look at a certain commit, just press Enter on it and a view of the diff will be shown. This makes it quite easy to browse through commits in a repository. At the top of the log is the most recent commit, which in this case is the file we just added.
To update emacs-starter-kit with the latest changes in the GitHub repository, just press F in the magit-status buffer or run git pull in the ~/.emacs.d directory.
Now that we have Git and Emacs set up we can finally move to Ioke. In the next post we’ll go through installing the latest Java JDK and getting and compiling the Ioke source code.
Join me then!
/M
Update: Part 4: Java and Ant
]]>In this post we will install GNU Emacs from source and also get a basic configuration set up using the emacs-starter-kit.
If you already have Emacs installed and have an old configuration laying around you probably want to make a backup of this before following the below instructions. This post assumes that there is no Emacs installed and that the user doesn’t have any Emacs configuration in the home directory.
Git is an efficient Distributed Version Control System. Although the primary VCS for Emacs is still CVS - and it looks like they are moving towards Bazaar - we’ll get the Emacs sources from the Emacs Git mirror. We want to be cool, right?
~$ cd work
~/work$ git clone --depth 1 git://git.sv.gnu.org/emacs.git
Initialized empty Git repository in /home/melwin/work/emacs/.git/
remote: Counting objects: 46400, done.
remote: Compressing objects: 100% (24410/24410), done.
remote: Total 46400 (delta 41204), reused 25501 (delta 21836)
Receiving objects: 100% (46400/46400), 74.38 MiB | 213 KiB/s, done.
Resolving deltas: 100% (41204/41204), done.
Checking out files: 100% (2837/2837), done.
With the --depth 1 parameter we limit the history so that we only get the latest version of the files - in this case we’re not interested in the full history.
Note: As we’re getting the bleeding edge source code of Emacs, it could happen that the build is broken. I’ve never had this happen to me, but in case you get strange errors when building Emacs, this might be the reason. Usually such problems are fixed quickly, so try to do a git pull a bit later to update the downloaded source.
Previously, to get fun things like multi-tty support and smooth fonts, we had to get specific feature branches of Emacs. Nowadays, however, all the things we want are merged into the Emacs master branch. Before building, the only thing we need to do is to make sure the necessary development libraries are installed. In case you wonder how I came up with this list: the good ol’ method of trial and error. I simply ran the configure command and checked the error messages. This helped me identify the needed libraries. Now you can reap the benefits by just doing the following:
~/work$ sudo apt-get install libgtk2.0-dev libxpm-dev libjpeg-dev libgif-dev libtiff-dev
...
… and then run the classical configure, make and make install:
~/work$ cd emacs
~/work/emacs$ ./configure --prefix=$HOME/opt/emacs-23.0.60
...
~/work/emacs$ make
...
~/work/emacs$ make install
...
Phew. That burned some CPU cycles…! Once done, let’s add it to our bin directory, just as we did with Git.
~/work/emacs$ cd ~/bin
~/bin$ ln -s ../opt/emacs-23.0.60/bin/emacs
~/bin$ ln -s ../opt/emacs-23.0.60/bin/emacsclient
~/bin$ ls
emacs emacsclient git
Time to test. Just run emacs:
~/bin% emacs
This should show you an Emacs window with a pretty(?) GNU.
If you’re completely new to Emacs, this might be a good opportunity to run the Emacs tutorial. Emacs is a self-documenting editor, which means that most things that you might want to learn about Emacs can be found inside the editor itself - including information about internal functions and libraries.
To run the tutorial, just press Ctrl+h t (that is, control and h, release control, then press t), or, in Emacs lingo, C-h t (which is the convention I’ll use from now on).
Once you’ve learned enough (you did complete the whole thing, right?), just quit using C-x c.
Note: To update the source code and rebuild, do a git pull in the emacs directory, then make distclean (in case some build files changed), and then perform the compile and installation as per the above again.
Next up is emacs-starter-kit, to get more functionality in Emacs and a decent default configuration to help us get going - stay tuned!
/M
Update: Part 3: emacs-starter-kit
]]>The components we will be installing are Git, GNU Emacs, emacs-starter-kit, the Sun Java JDK, Apache Ant and, of course, Ioke itself.
These instructions are written for a non-expert who might not have too much experience with compiling things, using Git or Emacs. If you’re an expert you will find nothing new or exciting here! The environment I use is a freshly installed Kubuntu Intrepid Ibex box, although most of it should be similar, if not identical, to other Ubuntu/Debian based distributions.
The components listed above will all be installed manually - not using the package system. This is to allow us to get the freshest versions of what we need without depending on the packages in the distribution to be updated (or hunted down in alternative repositories). The only thing we will install from the distribution are the compilers, tools and development libraries used to compile the software.
Why do we not use the package system? Well, that’s a valid question. Most of the above can be installed from packages in custom repositories. It’s simple and convenient. However, here we will not use it, for two reasons: we want the freshest versions of the software without waiting for distribution packages to catch up, and we want to install everything under our own home directory without needing root access.
To keep control over our custom built software we’ll install it in dedicated directories under $HOME/opt instead of in the normal locations like /usr/bin. This allows us to get everything installed without needing root access (except to install the compilers etc. from the distribution).
Always sound advice to start with - especially if you are hitchhiking. I’m going to try to cover each step in detail, but if you think something is unclear - just leave a comment and I’ll try to clarify. So - grab your towel and let’s get going!
For Emacs, emacs-starter-kit and Ioke we will use Git to retrieve the sources. Therefore the first thing we need to install is Git.
In the command line instructions below, all lines prefixed by the prompt <dir>$ are commands to be typed in. The rest are output or my comments.
Open a terminal window and create a new work directory under your home using the following instructions. In this directory we will work with downloaded source files and build the software.
~$ mkdir ~/work
~/work$ cd ~/work
~/work$ wget -O - http://kernel.org/pub/software/scm/git/git-1.6.1.tar.bz2 | tar xjv
... many lines...
The last command downloads the git source package and expands it in one go by piping it directly from wget to tar, which is told to expand it using bz2 with the j parameter. This is a convenient way to get packages off the net and unpacked, especially when there is no need to keep the package itself around.
The result is that we now have a git-1.6.1 directory in the work directory. Let’s see how we compile it:
~/work$ cd git-1.6.1/
~/work/git-1.6.1$ head INSTALL
Git installation
Normally you can just do "make" followed by "make install", and that
will install the git programs in your own ~/bin/ directory. If you want
to do a global install, you can do
$ make prefix=/usr all doc info ;# as yourself
# make prefix=/usr install install-doc install-html install-info ;# as root
~/work/git-1.6.1$
So - two commands. Simple enough! However, before we compile - we need to make sure all the relevant packages are installed:
~/work/git-1.6.1$ sudo apt-get install libcurl4-openssl-dev zlib1g-dev libexpat-dev tk8.5 asciidoc docbook2x
Depending on your internet connection, this could take quite a while, as we need to download the texlive distribution, among other things, to build all of the Git documentation. This is not strictly necessary, but here we’ll just do it for completeness’ sake.
Let’s build git and make sure it’s installed under our $HOME/opt directory as we said in the beginning:
~/work/git-1.6.1$ make prefix=~/opt/git-1.6.1 all doc info
... lots of lines...
Compiling Git will take some time as well. Go get a coffee (or perhaps a Pan Galactic Gargle Blaster - sweet like nectar).
Once the compile is done - install it. Note that we don’t need to do this as root, as we’re installing under the user home directory:
~/work/git-1.6.1$ make prefix=~/opt/git-1.6.1 install install-doc install-html install-info
Let’s add it to the user’s path by linking it into the private bin directory. This depends on the standard Ubuntu bash shell profile script, which adds ~/bin to the PATH variable. If a different shell is used you need to perform the appropriate steps yourself.
~/work/git-1.6.1$ mkdir ~/bin
~/work/git-1.6.1$ cd !$
~/bin$ ln -s ../opt/git-1.6.1/bin/git
Now close the shell/terminal, open a new one - and try the git command:
~$ git --version
git version 1.6.1
If you get the above output - great! You’re done! Sit back and relax for a bit before moving on to the next section.
However, if you don’t get the version output, but instead see the following message, review the previous instructions and make sure it works ok before continuing.
#INCORRECT OUTPUT - SOMETHING WAS MISSED!
#GO BACK AND REVIEW
~$ git
The program 'git' is currently not installed. You can install it by typing:
sudo apt-get install git-core
-bash: git: command not found
If you still can’t get it to work - drop me a comment!
There are loads of good Git tutorials and information. You can find several on the official Git page and at GitHub (go sign up if you haven’t already, and fork me!).
Here is a quick run through of a few common Git commands you could try out with your freshly brewed cup of Git:
~$ cd
~$ mkdir gittest
~$ cd gittest/
~/gittest$ git init
Initialized empty Git repository in /home/melwin/gittest/.git/
~/gittest$ echo Test file! > test.txt
~/gittest$ git add test.txt
~/gittest$ git commit -m "Initial import."
[master (root-commit)]: created 79c091d: "Initial import."
1 files changed, 1 insertions(+), 0 deletions(-)
create mode 100644 test.txt
~/gittest$ echo Add line. >> test.txt
~/gittest$ git diff test.txt
diff --git a/test.txt b/test.txt
index 1cbaf90..3746f9e 100644
--- a/test.txt
+++ b/test.txt
@@ -1 +1,2 @@
Test file!
+Add line.
~/gittest$ git commit -a -m "Add new line."
[master]: created b20f9a4: "Add new line."
1 files changed, 1 insertions(+), 0 deletions(-)
If you can follow the above - great! First part finished. Next up is to install GNU Emacs from source and the emacs-starter-kit, which provides a decent default set of configuration for Emacs.
Stay tuned!
/M
Update: Part 2: Emacs
]]>Ioke is a general purpose language. It is a strongly typed, extremely dynamic, prototype-based object oriented language. It is homoiconic and its closest ancestors are Io, Smalltalk, Ruby and Lisp - but it’s quite a distance from all of them.
So what does this all mean? It means it’s fun! The language syntax is very regular (although not quite as regular as lisp) - code is data, data is code and everything is a message. It’s also one of relatively few languages that provide macros similar to those in lisp. In Ioke this means that a message stream can be operated on (modified, transformed, etc) before it’s evaluated, using normal Ioke methods. Anyone who groks lisp should be familiar with the power this gives.
In Ioke, a chain of messages is separated by space, with the first message being sent to the current receiver and subsequent messages are sent to the result of the previous one. For instance, in the code
"ioke rocks" upper println
the text literal “ioke rocks” results in an internal message which creates a Text object. To this object the upper message is sent, which creates an upper case copy of the Text object. To this new object the println message is sent, which prints the upper case text to the console.
As the Ioke reader handles space separated tokens, it’s quite easy to create macros to process non-Ioke code. As a learning exercise I decided to try to implement a JSON parser. Nothing too fancy - it doesn’t even have to be secure - just enough to parse simple JSON into Ioke objects.
JSON is usually parsed in one of two ways - into custom objects/classes or into collections like dictionaries and arrays. For this exercise I decided to create Ioke dictionaries and arrays and I want the result to work like this:
json({
"string" : "string1",
"int" : 1234,
"arr" : ["item1", "item2"],
"dict" : {"key1":"value1"}
}) println ;; => {dict={key1=value1}, arr=[item1, item2], int=1234, string=string1}
json(["string",
1234,
["item1", "item2"],
{"key1": "val1"}
]) println ;; => [string, 1234, [item1, item2], {key1=val1}]
Now, looking at JSON, it’s suspiciously similar to Ioke - [] is used for arrays, {} is used for “dictionaries”, comma separates entries, ” quotes strings, etc. The main problem is the colon “:” used for separating keys and values in a dictionary. As Ioke is a dynamic language we can redefine how messages are handled. Instead of using colon to define symbols, we can redefine it to instead create pairs suitable for a dict:
Text cell(":") = macro(call resendToMethod("=>"))
“:” is originally defined on DefaultBehavior Literals, but in JSON the “:” message is always sent to Text, so it’s enough to redefine it there.
This single redefinition allows us to parse JSON directly with the Ioke reader - very nice.
But… Redefining “:” for all Text objects is not really good (especially not when Ola adds concurrency constructs!). Another alternative would be to redefine how the text objects are created and create a new “:” cell just for our JSON text objects - but I never got this to work, as the internal:createText message works with a raw Java string, which I couldn’t work with in a macro/method…
Update: With Ola’s recent commit, the previously described workaround can now be simplified to the following code.
Now the above redefinition of the “:” message can be used, but is visible only in the scope of a let:
json = macro(let(Text cell(":"), DefaultBehavior cell("=>"),
call argAt(0)
))
;; Used as:
json({
"string" : "string1",
"int" : 1234,
"arr" : ["item1", "item2"],
"dict" : {"key1":"value1"}
}) println
json(["string",
1234,
["item1", "item2"],
{"key1": "val1"}
]) println
As can be seen, the Ioke macro functionality gives us nice control over how messages are handled, and the environment can be changed before the arguments are evaluated. I’m looking forward to the day when we have full Java interop and can use these powerful constructs to control Java code!
Source code to the above example is committed to my Ioke fork at: git://github.com/melwin/ioke.git
/M
PS: The thing that confuses me the most: how the heck are you supposed to pronounce Ioke?
Update: Sam Aaron shared the fact that Ola pronounces it eye-oh-key in the following interview: http://www.akitaonrails.com/2008/11/22/rails-podcast-brasil-qcon-special-ola-bini-jruby-ioke
]]>http://github.com/melwin/ioke/commit/54cfd7e54de0be910385c6ec805693fd3ed4e294
The keyword definitions I borrowed (read: stole) from Sam Aaron’s TextMate bundle (also in the Ioke repository). As Sam just said in #ioke on freenode: “what goes around comes around”. :)
Highlighting test:
m = #/({areaCode}\d{3})-({localNumber}\d{5})/ =~ number
describe("start",
it("should return the start index of group zero, which is the whole group",
(#/foo/ =~ "foobar") start should == 0
(#/foo/ =~ "abcfoobar") start should == 3
(#/foo/ =~ "foobar") start(0) should == 0
(#/foo/ =~ "abcfoobar") start(0) should == 3
)
it("should return the start index of another group",
(#/(..) (..) (..)/ =~ "fooab cd efbar") start(2) should == 6
)
it("should return the start index from the name of a named group",
(#/({one}..) ({two}..) ({three}..)/ =~ "fooab cd efbar") start(:two) should == 6
)
it("should return -1 for a group that wasn't matched",
(#/(..)((..))?/ =~ "ab") start(2) should == -1
(#/({no}..)(({way}..))?/ =~ "ab") start(:way) should == -1
(#/(..)((..))?/ =~ "ab") start(10) should == -1
(#/({no}..)(({way}..))?/ =~ "ab") start(:blarg) should == -1
)
it("should validate type of receiver",
Regexp Match should checkReceiverTypeOn(:start)
)
)
x = #/bla #{"foo"} bar/
/M
]]>For instance, consider the following method:
public Collection getByFilter1(String cacheName, Filter f) {
NamedCache c = CacheFactory.getCache(cacheName);
return c.entrySet(f);
}
A query is executed across all nodes containing cache data in a cluster. The filter acts on the values in the cache, not the keys. And as all values are usually not available locally on every node the query needs to execute on the separate nodes.
The query above is correct, but not very efficient. In the above case we both query and retrieve the data from the separate nodes. A more efficient way would be to only get the keys for the matching values from the nodes and then retrieve the values from the local near cache:
public Collection getByFilter2(String cacheName, Filter f) {
NamedCache c = CacheFactory.getCache(cacheName);
Set keys = c.keySet(f);
return c.getAll(keys).values();
}
However, in this case we still perform the query every time.
Now - the clever part. Not so much on my side, but the Coherence engineers have thought things through and given the Filter implementations good hashCode and equals implementations. This, together with the fact that they are serializable, makes them possible to use as keys in a cache! Sweet! Without changing our method’s interface we can add a query cache so that each query is only performed once.
public Collection getByFilter3(String cacheName, Filter f) {
NamedCache c = CacheFactory.getCache(cacheName);
NamedCache queryC = CacheFactory.getCache(cacheName + ".querycache");
Set keys = (Set)queryC.get(f);
if(keys == null) {
keys = c.keySet(f);
queryC.put(f, keys);
}
return c.getAll(keys).values();
}
Note that we only save the keys in the query cache. This is to avoid having several caches with the same data. When near caching is used, getting the data for the keys can still be a local-only operation. Compared to getting the data from the separate nodes it’s several orders of magnitude faster - depending on the usage patterns.
Of course if queries are different every time, the query cache will not help much. But in most high load applications the same data tends to be needed several times.
Additionally, the properties queried on should in most cases be indexed. This is important to avoid too much overhead when searching for an entry in a cache.
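As a small, hedged illustration (the cache name and the getCustomerId property are assumptions of mine, not from this post), creating such an index with the Coherence API looks like this:
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.extractor.ReflectionExtractor;

public class IndexSetup {
    public static void main(String[] args) {
        NamedCache cache = CacheFactory.getCache("mycache");
        // Index the property the filters query on; 'false' means the index keeps no ordering.
        cache.addIndex(new ReflectionExtractor("getCustomerId"), false, null);
    }
}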
One thing to think about when adding query caches is: how is data updated?
As part of updating the value caches the query caches should preferably be emptied of the outdated cached queries. This could be done programmatically “manually” when the data is updated, or by hooking in code to clean the entries from the query cache using the map listener mechanism. The relevant code could do a query using the ContainsFilter.
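As a sketch of what that could look like (my own illustration - the class name and registration are assumptions, not code from this post), a listener on the value cache could evict every cached query whose stored key set contains the key that just changed:
import com.tangosol.net.CacheFactory;
import com.tangosol.net.NamedCache;
import com.tangosol.util.MapEvent;
import com.tangosol.util.MultiplexingMapListener;
import com.tangosol.util.extractor.IdentityExtractor;
import com.tangosol.util.filter.ContainsFilter;

public class QueryCacheInvalidator extends MultiplexingMapListener {
    private final NamedCache queryCache;

    public QueryCacheInvalidator(String cacheName) {
        this.queryCache = CacheFactory.getCache(cacheName + ".querycache");
    }

    @Override
    protected void onMapEvent(MapEvent evt) {
        // The query cache values are Sets of keys, so a ContainsFilter with an
        // IdentityExtractor matches every cached query result that includes the changed key.
        ContainsFilter affected = new ContainsFilter(IdentityExtractor.INSTANCE, evt.getKey());
        // Drop those entries so the next lookup re-runs the query against fresh data.
        for (Object filterKey : queryCache.keySet(affected)) {
            queryCache.remove(filterKey);
        }
    }
}

// Registered on the value cache, for example:
// CacheFactory.getCache("mycache").addMapListener(new QueryCacheInvalidator("mycache"));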
To summarize: a little code can go a long way to improve performance without affecting the interface used by an application. This is good when query-heavy applications are adapted to use a distributed cache like Coherence.
/M
]]>Instead I tried the WP-Syntax plugin, but this didn’t support Symbol literals:
val sym = 'foobar
println("Symbol is: " + sym)
val other = 'barfoo
This seems to be because of the single quote being interpreted as a quotation character. A fix seems to be removing the single quote from the Geshi language definition file scala.php in the wp-syntax/geshi/geshi directory. While there, one can also specify a regex for symbols and give them their own color:
val sym = 'foobar
println("Symbol is: " + sym)
val other = 'barfoo
Also check out the scala.php in the LAMP repository - it seems to add some additional keywords and other colors. I used this and made the changes described above.
Full new scala.php for Geshi after the break.
<?php
/*************************************************************************************
* scala.php
* --------
* Author: Geoffrey Washburn (washburn@acm.ogr)
* Copyright: (c) 2004 Nigel McNie (http://qbnz.com/highlighter/)
* Release Version: ???
* Date Started: 2008/01/03
*
* Scala language file for GeSHi.
*
* CHANGES
* -------
* 2007/01/03
* - Created by copying the Java highlighter
*
* TODO
* -------------------------
* * Finish
*
*************************************************************************************
*
* This file is part of GeSHi.
*
* GeSHi is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*
* GeSHi is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with GeSHi; if not, write to the Free Software
* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
*
************************************************************************************/
$language_data = array (
'LANG_NAME' => 'Scala',
'COMMENT_SINGLE' => array(1 => '//'), /* import statements are not comments! */
'COMMENT_MULTI' => array('/*' => '*/'),
'CASE_KEYWORDS' => GESHI_CAPS_NO_CHANGE,
'QUOTEMARKS' => array('"', '"""'),
'ESCAPE_CHAR' => '\\',
'KEYWORDS' => array(
1 => array(
/* Scala keywords, part 1: control flow */
'case', 'default', 'do', 'else', 'for',
'if', 'match', 'while'
),
2 => array(
/* Scala keywords, part 2 */
'return', 'throw',
'try', 'catch', 'finally',
'abstract', 'class', 'def', 'extends',
'final', 'forSome', 'implicit', 'import',
'lazy', 'new', 'object', 'override', 'package',
'private', 'protected',
'requires', 'sealed', 'super', 'this', 'trait', 'type',
'val', 'var', 'with', 'yield'
),
3 => array(
/* Scala keywords, part 3: standard value types */
'unit', 'Unit', 'boolean', 'Boolean', 'int', 'Int', 'Any', 'AnyVal', 'Nothing',
),
4 => array(
/* other reserved words in Scala: literals */
/* should be styled to look similar to numbers and Strings */
'false', 'null', 'true'
),
5 => array(
/* Scala reference types */
'AnyRef', 'Null', 'List', 'String', 'Integer', 'Option', 'Array'
)
),
'SYMBOLS' => array(
':', '*', '&', '%', '!', ';', '<', '>', '?', '_', '=', '=>',
'<-', '<:', '<%', '>:', '#', '@'
),
'CASE_SENSITIVE' => array(
GESHI_COMMENTS => true,
/* all Scala keywords are case sensitive */
1 => true, 2 => true, 3 => true, 4 => true, 5 => true ),
'STYLES' => array(
'KEYWORDS' => array(
1 => 'color: #b1b100;',
2 => 'color: #000000; font-weight: bold;',
3 => 'color: #993333;',
4 => 'color: #b13366;',
5 => 'color: #aaaadd;'
),
'SYMBOLS' => array(
0 => 'color: #FFAA00;'
),
'COMMENTS' => array(
1 => 'color: #808080; font-style: italic;',
'MULTI' => 'color: #808080; font-style: italic;'
),
'ESCAPE_CHAR' => array(
0 => 'color: #000099; font-weight: bold;'
),
'BRACKETS' => array(
0 => 'color: #66cc66;'
),
'STRINGS' => array(
0 => 'color: #ff0000;'
),
'NUMBERS' => array(
0 => 'color: #cc66cc;'
),
'METHODS' => array(
1 => 'color: #006600;',
2 => 'color: #006600;'
),
'SCRIPT' => array(
),
'REGEXPS' => array(
0 => 'color: #008000;'
)
),
'URLS' => array(
1 => '',
2 => '',
3 => '',
4 => ''
),
'OOLANG' => true,
'OBJECT_SPLITTERS' => array(
1 => '.'
),
'REGEXPS' => array(
0 => "'[a-zA-Z_][a-zA-Z0-9_]*"
),
'STRICT_MODE_APPLIES' => GESHI_NEVER,
'SCRIPT_DELIMITERS' => array(
),
'HIGHLIGHT_STRICT_BLOCK' => array(
)
);
?>
Although I haven’t used it too much yet, Scala is definitely one of the languages I find most interesting right now. Many customers I work with are heavily Java focused, and getting a more flexible and powerful language with superb Java interoperability to use on the JVM feels very liberating. Now I just need to convince the customers that Scala is the future… :] But if Gosling likes it it must be good, right?
A few months ago Jonas Bonér wrote about how Scala actors can be clustered with Terracotta. I really enjoyed the article and I think the idea of distributed, redundant actors is very appealing. The actor paradigm is a nice way of developing concurrent applications (see the intro in Jonas’ blog entry for more info) and if we can liberate the actors from the confines of a single JVM and easily distribute them over multiple hosts - all the better.
I’m not going to compare Coherence and Terracotta here. In short, Coherence provides, among other things, a distributed caching and code execution mechanism without single points of failure. Coherence can be downloaded for evaluation purposes from the Oracle web site.
The idea I wanted to try was to have Scala actors that store their state in the distributed Coherence cache and run as normal Scala actors on a single node in the cluster at a time. The node the actor runs on should be allocated by Coherence and if the node fails, the actor should be automatically started on another node with maintained state and without any lost messages.
Also, I wanted this to work as similarly to normal Scala actors as possible with compatibility between the two.
Before investigating the proof of concept solution, let’s look at the result and what it gives us.
Here’s a simple test application with a normal Scala actor. It uses the recommended way of creating actors with actor { ... }:
package coherencetest1
import java.util.Date
import scala.actors.Actor._
object ActorTest {
def main(args : Array[String]) : Unit = {
actor {
var pings : Int = 0
println("Actor started.")
self ! ('ping, 1)
loop {
react {
case ('ping, i : Int) =>
pings = pings + 1
println(new Date + " - Got ping: " + i + " Total pings: " + pings)
Thread.sleep(1000)
self ! ('ping, i+1)
}
}
}
}
}
When this code is run, a simple actor that sends a message to itself is created and started. It sleeps for 1 second to pace the execution and to simulate a task that takes time to perform (of course, normally you shouldn’t sleep in real actors as you tie up the thread).
When the code is run, the following is displayed:
Actor started.
Sun Jun 29 15:57:16 CEST 2008 - Got ping: 1 Total pings: 1
Sun Jun 29 15:57:17 CEST 2008 - Got ping: 2 Total pings: 2
Sun Jun 29 15:57:18 CEST 2008 - Got ping: 3 Total pings: 3
Sun Jun 29 15:57:19 CEST 2008 - Got ping: 4 Total pings: 4
Sun Jun 29 15:57:20 CEST 2008 - Got ping: 5 Total pings: 5
Sun Jun 29 15:57:21 CEST 2008 - Got ping: 6 Total pings: 6
...
Nothing too fancy, but a decent test case for our actor distribution. An important aspect of this actor is that it defines a local variable pings and prints a message in the initialization part, before the loop and react. The value of the local var must be maintained and the initialization code must only be run once, and not when an actor is started on a new node after a failure.
Let’s make it a distributed Actor:
package coherencetest1
import java.util.Date
import scala.actors.coherence.CoActor._
@serializable
object DactorTest {
def main(args : Array[String]) : Unit = {
dactor {
var pings : Int = 0
println("Actor started.")
self ! ('ping, 1)
loop {
react {
case ('ping, i : Int) =>
pings = pings + 1
println(new Date + " - Got ping: " + i + " Total pings: " + pings)
Thread.sleep(1000)
self ! ('ping, i+1)
}
}
}
}
}
What have we done here? Three things: we import the CoActor object instead of the standard Actor object, we mark the enclosing object as @serializable, and we create the actor with dactor { ... } instead of actor { ... }.
The first point is simple - we need access to the new functionality, so we import the new CoActor object instead of the standard Actor object.
For number two - this is slightly nasty. If I interpret things correctly: as the code block created as a parameter to react needs to be serializable (so that the actor can be distributed over the network), all enclosing types need to be serializable. I struggled with this for a while and the only option seems to be creating a proper named serializable type… But since I want to be able to create an actor in-line, we need to do it this way.
For the last point - dactor { ... } is simply the function used to create a distributed actor instead of a normal actor.
Let’s run it:
2008-06-29 16:11:18.779 Oracle Coherence 3.3.1/389 <Info> (thread=main, member=n/a): Loaded operational configuration from resource "jar:file:/opt/coherence-3.3.1/lib/coherence.jar!/tangosol-coherence.xml"
2008-06-29 16:11:18.785 Oracle Coherence 3.3.1/389 <Info> (thread=main, member=n/a): Loaded operational overrides from resource "jar:file:/opt/coherence-3.3.1/lib/coherence.jar!/tangosol-coherence-override-dev.xml"
2008-06-29 16:11:18.786 Oracle Coherence 3.3.1/389 <D5> (thread=main, member=n/a): Optional configuration override "/tangosol-coherence-override.xml" is not specified
Oracle Coherence Version 3.3.1/389
Grid Edition: Development mode
Copyright (c) 2000-2007 Oracle. All rights reserved.
2008-06-29 16:11:19.042 Oracle Coherence GE 3.3.1/389 <Info> (thread=main, member=n/a): Loaded cache configuration from resource "file:/crypt/dev/scala/CoherenceTest1/config/scalacoherence.xml"
2008-06-29 16:11:19.331 Oracle Coherence GE 3.3.1/389 <Warning> (thread=main, member=n/a): UnicastUdpSocket failed to set receive buffer size to 1428 packets (2096304 bytes); actual size is 714 packets (1048576 bytes). Consult your OS documentation regarding increasing the maximum socket buffer size. Proceeding with the actual value may cause sub-optimal performance.
2008-06-29 16:11:19.459 Oracle Coherence GE 3.3.1/389 <D5> (thread=Cluster, member=n/a): Service Cluster joined the cluster with senior service member n/a
2008-06-29 16:11:22.662 Oracle Coherence GE 3.3.1/389 <Info> (thread=Cluster, member=n/a): Created a new cluster with Member(Id=1, Timestamp=2008-06-29 16:11:19.343, Address=192.168.54.1:8088, MachineId=24065, Location=process:31397@dellicious, Edition=Grid Edition, Mode=Development, CpuCount=2, SocketCount=1) UID=0xC0A836010000011AD4A9DCAF5E011F98
2008-06-29 16:11:22.834 Oracle Coherence GE 3.3.1/389 <D5> (thread=DistributedCache, member=1): Service DistributedCache joined the cluster with senior service member 1
Actor started.
Sun Jun 29 16:11:23 CEST 2008 - Got ping: 1 Total pings: 1
Sun Jun 29 16:11:24 CEST 2008 - Got ping: 2 Total pings: 2
Sun Jun 29 16:11:25 CEST 2008 - Got ping: 3 Total pings: 3
Sun Jun 29 16:11:26 CEST 2008 - Got ping: 4 Total pings: 4
Sun Jun 29 16:11:27 CEST 2008 - Got ping: 5 Total pings: 5
Sun Jun 29 16:11:28 CEST 2008 - Got ping: 6 Total pings: 6
...
After the Coherence initialization (which happens automatically and which I’ve trimmed from the outputs below) the actor starts up as expected. However, if we start this on two nodes - there will be two actors created, and no way for a new JVM to get hold of a reference to a specific existing actor. To handle this, let’s specify a name for the actor that we create, using the dactor(name : Symbol) { ... } function:
...
object DactorTest {
def main(args : Array[String]) : Unit = {
dactor('pingActor) {
var pings : Int = 0
...
This simply means: give me a reference to pingActor, but if it doesn’t exist, create it with the following body. This mechanism makes it easy to have a single instance of an actor even if the same application is running on multiple nodes, without having to explicitly check if an actor has already been created or not.
Now we can run the program on two different nodes. After the actor has started and is running on one node, I’ll kill that node:
(Side-by-side console output from Node 1 and Node 2 not reproduced here.)
First Node 1 started up and ran the actor until Node 2 started. At this point the actor was distributed to Node 2 (determined by the automatic cache partitioning done by Coherence) and started there. As can be seen, the local state (the total pings) was persisted and transferred over. When Node 2 was killed the actor was migrated back and started on Node 1. Note that the state of the actor is persisted for each message, so a sudden shutdown of a JVM is not a problem.
One might wonder why the message for ping number 6 and 19 can be seen in both outputs - this happens as the actor was migrated while the actor thread was sleeping - before the react body was complete. This causes the new node to rerun the message (since the processing of the message didn’t complete on the old node) and the support code in the old node makes sure all messages sent by the old actor are discarded as it’s been terminated. It’s a bit tricky coding actors to be fully idempotent as not everything is handled in a transaction, but limiting side effects to sending messages at the end of the processing makes it fairly reliable.
Here’s a slightly more complex example:
package coherencetest1
import java.util.Date
import scala.actors.coherence.CoActor._
@serializable
object DactorTest2 {
def main(args : Array[String]) : Unit = {
init
readLine
val numActors = 80
val actors = for(id <- 1 to numActors)
yield dactor {
loop {
react {
case 'ping =>
println(new Date + " - Actor " + id + " got ping.")
reply(('pong, id))
}
}
}
actors.map(_ ! 'ping).force
var pongs = 0
while(pongs < numActors) {
receive {
case ('pong, x : Int) =>
pongs = pongs + 1
}
}
println("Got " + pongs + " pongs.")
readLine
}
}
In this example 80 distributed actors are created and sent a ping message. After that the main thread receives all the pong replies. The output of this, when run on 4 nodes, looks like this:
(Side-by-side console output from Node 1, Node 2, Node 3 and Node 4 not reproduced here.)
The program was started on all 4 nodes, but only the first node passes the first readLine. The other nodes just init the distributed actor framework and wait. As can be seen, the actors were distributed over the running nodes as expected - however, with small numbers like this the distribution can be a bit uneven (compare Node 3 and Node 4).
The solution that allows the distribution of serializable actors is based on the Coherence backing map listeners. These can be used to get notifications of which specific node is the master for a certain object. As there is only one master for an object at any point, and a new master is allocated automatically if a master fails, we can use this to determine where the actor should run.
The object returned from dactor { ... } is a Proxy object - very similar to the Proxy used in the standard Scala remote actors. In fact, this whole thing is built in a way similar to the standard Scala remote actors, with Proxy objects acting on behalf of the sender on the receiver side and the receiver on the sender side.
Additionally, the Coherence grid invocation mechanism allows us to deliver messages to a running actor directly on the node where it is running.
When a new dactor is created, the following happens:
When a message is sent to a distributed actor using a Proxy object the following happens:
If the distributed actor does a reply or sends a message to the sender, the Proxy which represents the non-distributed actor gets called as follows:
The distributed actors are a bit limited in what they can do, as they always need to be serializable and I didn’t want to change any of the standard Scala code. For instance, when a synchronous invocation is made the return channel is stored in the Scala actor. The return channel implementation used in Scala isn’t serializable, so I decided to not implement this feature for now.
Basically, only message sending (!), reply, sender, loop and react are allowed in the distributed actors. However, they can interoperate with normal actors as can be seen in this example:
val distActor = dactor('distActor) {
loop {
react {
case ('ping) =>
reply('pong)
}
}
}
actor {
distActor ! 'ping
loop {
react {
case x =>
println("Actor got message: " + x)
}
}
}
The Proxy objects can be used by actors as they serialize correctly.
Distributing (or GridEnabling(tm), or whatever the word du jour is) actors - so that an application can easily use the processing power and resilience that multiple computers give, while hiding the complexity from the developer - is a nice way to fairly easily scale up an actor-based application. To add more processing power or redundancy, just fire up new nodes.
The proof of concept I made here just scratches the surface, but it’s interesting to see that it can be done with Coherence while maintaining the syntax and behavior expected by Scala developers.
The highly experimental and hackish source code for the proof of concept is available in the git repository at:
http://martin.elwin.com/git/scala-coherence.git
Dependencies are Scala (2.7.1 recommended) and Oracle Coherence. There are some scripts for Linux which are trivial to adapt to other operating environments.
/M
]]>One of the fairly recent changes I’ve made is to switch to a tiling window manager. For those too lazy to read the Wikipedia entry: a tiling window manager automatically arranges the open windows to fill the available space of the screen, saving you the job of arranging the windows yourself. That sounds simple, but can be quite complex in practice.
I started out using Ion3 in mid 2007. Ion3 is nice and smooth and worked great and really got me hooked on tiled window managers. However, as a big fan of open source, I was a bit concerned by the author’s approach to controlling the development of Ion (see this or this for instance) so I switched to wmii. wmii was also quite nice, but I never got into it as well as Ion, for some reason.
That’s when I found xmonad - a very nice window manager written in Haskell. Haskell is also used as the configuration language, which makes the configuration possibilities close to endless (actually, they probably are endless, as Haskell is Turing equivalent (although there are people who doubt it!) :).
Today, some 6 months later, I use xmonad 0.7 together with KDE 3.5.9 (similar to the setup described on the Haskell wiki) on Kubuntu 8.04 and am very happy with it.
However, xmonad is just a tiling window manager, and I also want a status bar to allow me to see chosen pieces of information at a glance. I use KDE3 and have the KDE3 panel kicker running, but it’s hidden unless I put the mouse in the bottom left corner. I don’t use it normally and only keep it around to access the system tray (which I almost never need). Instead of the ugly and bulky kicker I want a smaller, slimmer alternative…
Enter xmobar - a text based status bar, also written in Haskell. xmobar comes with a number of plugins to show different aspects of the system on the status bar - date, cpu, memory, network activity, etc. A few things it doesn’t do out of the box, which I want to see, are number of new mails, current keyboard layout (I use US mainly, but switch to Swedish for the national chars) and speaker mute state. Luckily, xmobar provides a plugin to execute shell commands, and using this mechanism we can put anything on the status bar, as long as we can figure out a command to run to produce the wanted text.
I use Maildir mail storage (together with dovecot to provide an IMAP interface to my mail), and since I don’t automatically split mails into groups/folders it’s enough to just check the number of mails in my Maildir’s new folder. The xmobar command for this ends up as:
Run Com "sh" ["-c", "\"ls ~/Maildir/new | wc -l\""] "mailcount" 600
This command invokes the sh shell and passes a command string using the -c switch. The command string contains the actual shell command we want to execute: ls ~/Maildir/new | wc -l
Executing sh and passing a command string allows us to use the pipe to pass the output of ls to wc to get the actual file count back.
The command is mapped to the mailcount alias, and the interval is set to 600 tenths of a second = 60 seconds.
Since I use KDE3, the keyboard layout switching is handled by the kxkb application (it shows up as a flag in the tray if multiple layouts are configured). KDE3 uses DCOP (unlike KDE4, which uses D-Bus) for IPC. Available DCOP services can easily be interrogated from the command line using the… you guessed it… dcop command.
Running dcop without parameters shows a list of available applications. On my system it shows:
% dcop
konsole-9099
kdebluetooth
kicker
kxkb
guidance-6223
kded
adept_notifier
kmix
knotify
kio_uiserver
klauncher
khotkeys
kwalletmanager
digikam-6484
klipper
ksmserver
knetworkmanager
Woohoo! kxkb is there, so let’s dig deeper. By a bit of trial and error we can probe the internals of the dcop-exposed application:
% dcop kxkb
qt
MainApplication-Interface
kxkb
% dcop kxkb kxkb
QCStringList interfaces()
QCStringList functions()
bool setLayout(QString layoutPair)
QString getCurrentLayout()
QStringList getLayoutsList()
void forceSetXKBMap(bool set)
% dcop kxkb kxkb getCurrentLayout
us
Sweet! Now we have the dcop “path” to find the current keyboard layout. Let’s add it to the xmobar configuration:
Run Com "dcop" ["kxkb", "kxkb", "getCurrentLayout"] "kbd" 20
Using dcop again we can find the mute state:
% dcop kmix
qt
MainApplication-Interface
Mixer0
kmix
kmix-mainwindow#1
% dcop kmix Mixer0
QCStringList interfaces()
QCStringList functions()
void setVolume(int deviceidx,int percentage)
void setMasterVolume(int percentage)
void increaseVolume(int deviceidx)
void decreaseVolume(int deviceidx)
int volume(int deviceidx)
int masterVolume()
void setAbsoluteVolume(int deviceidx,long int absoluteVolume)
long int absoluteVolume(int deviceidx)
long int absoluteVolumeMin(int deviceidx)
long int absoluteVolumeMax(int deviceidx)
void setMute(int deviceidx,bool on)
void setMasterMute(bool on)
void toggleMute(int deviceidx)
void toggleMasterMute()
bool mute(int deviceidx)
bool masterMute()
int masterDeviceIndex()
void setRecordSource(int deviceidx,bool on)
bool isRecordSource(int deviceidx)
void setBalance(int balance)
bool isAvailableDevice(int deviceidx)
QString mixerName()
int open()
int close()
% dcop kmix Mixer0 masterMute
true
Perfect - now we can create the last xmobar run command:
Run Com "dcop" ["kmix", "Mixer0", "masterMute"] "mute" 20
So, here’s the final .xmobarrc:
Config { font = "xft:Consolas-8"
, bgColor = "black"
, fgColor = "grey"
, position = Bottom
, commands = [ Run Network "eth0" ["-L","0","-H","32","--normal","green","--high","red"] 50
, Run Network "wlan0" ["-L","0","-H","32","--normal","green","--high","red"] 51
, Run Cpu ["-L","3","-H","50","--normal","green","--high","red"] 52
, Run Memory ["-t","Mem: <usedratio>%"] 54
, Run Date "%a %b %_d %H:%M:%S" "date" 10
, Run StdinReader
, Run Com "dcop" ["kxkb", "kxkb", "getCurrentLayout"] "kbd" 20
, Run Com "sh" ["-c", "\"ls ~/Maildir/new | wc -l\""] "mailcount" 600
, Run Com "dcop" ["kmix", "Mixer0", "masterMute"] "mute" 20
]
, sepChar = "%"
, alignSep = "}{"
, template = "<fc=#ee9a00>%date%</fc> | %cpu% | %memory% | %eth0% - %wlan0% } %StdinReader% { Mail: %mailcount% | Kbd: %kbd% | Mute: %mute%"
}
And a small screenshot for your viewing pleasure:
]]>The scala.xml.XML object has helper functions to create Scala XML structures from the usual suspects: Strings, InputStream, Reader, File, etc. But a DOM Document or Element is missing from the list.
Going via a Byte array, Char array or String is simple. For instance, the following outputs the DOM to a char array, which is then used as the source of the Scala XML creation:
Using Char array
//dom is the DOM Element
val charWriter = new CharArrayWriter()
TransformerFactory.newInstance.newTransformer.transform(new DOMSource(dom), new StreamResult(charWriter))
val xml = XML.load(new CharArrayReader(charWriter.toCharArray))
This works fine, and is reasonably fast. However, it does allocate some unnecessary memory (the char array) and performs some unnecessary parsing (of the char array) - both of which we’d really like to avoid.
How do we do this? Well, here’s one option that I came up with:
Using SAX
val saxHandler = new NoBindingFactoryAdapter()
saxHandler.scopeStack.push(TopScope)
TransformerFactory.newInstance.newTransformer.transform(new DOMSource(dom), new SAXResult(saxHandler))
saxHandler.scopeStack.pop
val xml = saxHandler.rootElem
What’s going on here? Well, the Scala XML library uses SAX to parse XML and create the XML structure. One way of generating SAX events is to walk a DOM tree, which is handled by the javax.xml.transform.Transformer with a DOMSource as input and a SAXResult as output. The extension of the DefaultHandler needed for handling the SAX events is implemented by scala.xml.parsing.FactoryAdapter, which is extended by the NoBindingFactoryAdapter used to construct the XML structure. Because of this, we can do violence on the API and use the NoBindingFactoryAdapter directly as a SAX DefaultHandler - nice! The scopeStack calls are done to maintain the scope information, which I stole from the loadXML method in the FactoryAdapter class.
However, let’s take a moment to reflect on this. Using the Scala XML library in this way is not really proper. Even though it’s possible, I’ve not seen it described as a supported way of using the library, so it should only be done with the understanding that the next release of Scala might remove the possibility.
[Update 2008-07-02: Burak Emir kindly added a comment; “Don’t worry, the SAX factory adapter is not going to go away.” - good to know!]
That said - let’s consider this an exploration of possibilities which could potentially lead to an update of the Scala XML API to allow a DOM to be used as a source instead…!
A quick test gives the following results.
The test data is a 6897 byte XML file containing 118 elements with some 4 different namespaces.
I ran each test for 1000 iterations, with a full garbage collection before the first iteration. For every 100 iterations I printed the delta of the free memory, and I timed the complete 1000 iterations.
Char array: 100 iterations use around 28 MB, full test: 1414 ms
SAX: 100 iterations use around 18 MB, full test: 970 ms
So, in conclusion, not an overwhelming difference, but around 1/3 faster and 1/3 less memory consumed. Can we do better? I’m not sure. :)
The next step is to do Scala XML to DOM… This could be more interesting. I see two options: implementing the DOM interfaces directly on top of the Scala XML nodes, or serializing the Scala XML and parsing the result into a DOM.
Option 1 would be more efficient - but the DOM API isn’t fun to implement. Option 2 would be much simpler, but would probably be less efficient and require more allocations. Gotta think about this one…
/M
]]>Oh, right, this isn’t Slashdot…
Anywho, here goes:
While at a customer, a few colleagues and I were joking about Use Case diagrams, which in their most basic form tend to be somewhat meaningless. However, in more complex requirements definitions, they certainly can help define actors and sections of functionality to go into certain releases, for instance.
As I was at the time pondering how best to visualize the customer’s SOA architecture and service dependencies by generating Graphviz diagrams, I thought - hey, why not use Graphviz to quickly produce some snazzy Use Case diagrams?
Well… the snazziness is debatable, and I didn’t manage it as quickly as I had hoped. But all considered, I think they turned out quite well.
To begin with, I have to figure out how to render a stick figure for a node. When using the postscript output, you can create custom shapes, but since I want to create just a PNG image, I need to use an image as the custom shape file.
With Inkscape I managed to produce the following:
Very nice.
Saving this as stick.png, we can create a simple diagram:
1.dot
digraph G {
"User" [shapefile="stick.png"];
"Log In" [shape=ellipse];
"User"->"Log In" [arrowhead=none]
}
This can be run through the Graphviz dot program thus:
cat 1.dot | dot -Tpng > 1.png
Which produces:
Hmmm… Not quite what we wanted. Let’s make a few changes:
We want to lay the graph out left to right, get rid of the box drawn around the stick figure, and move the label below the figure. The most difficult of these is moving the label, which seems to require creating a cluster subgraph containing the label and the node with the custom shape. An example of this can be seen in this Graphviz tutorial. Another alternative is using HTML in the label - but that looks ugly. So, doing this, and refactoring a bit to create the nodes explicitly, we get:
2.dot
digraph G {
rankdir=LR;
subgraph clusterUser {label="User"; labelloc="b"; peripheries=0; user};
user [shapefile="stick.png", peripheries=0, style=invis];
login [label="Log In", shape=ellipse];
user->login [arrowhead=none];
}
… which gives us:
Not too bad. Let’s add some more use cases:
3.dot
digraph G {
rankdir=LR;
labelloc="b";
peripheries=0;
/* Actor Nodes */
node [shape=plaintext, style=invis];
subgraph clusterUser {label="User"; user};
user [shapefile="stick.png"];
subgraph clusterAdmin {label="Administrator"; admin};
admin [shapefile="stick.png"];
/* Use Case Nodes */
node [shape=ellipse, style=solid];
log_in [label="Log In"];
log_in_pwd [label="Log In Password"];
log_in_cert [label="Log In Certificate"];
manage_user [label="Manage User"];
change_email [label="Change Email"];
change_pwd [label="Change Password"];
/* Edges */
edge [arrowhead="oarrow"];
admin->user;
edge [arrowhead=none];
user->log_in;
admin->manage_user;
edge [arrowtail="vee", label="<<extend>>", style=dashed];
log_in->manage_user;
log_in->log_in_pwd;
log_in->log_in_cert;
manage_user->change_email;
manage_user->change_pwd;
}
Run through dot, this produces the following image:
The obvious problem with this is that the nodes aren’t placed where we really want them to be. Automatic node placement in a directed graph is one of the strong points of dot, but the default placement doesn’t really correspond to how we want our Use Case diagram to look. So, let’s add some hints to help dot place the nodes the way we like:
4.dot
digraph G {
rankdir=LR;
labelloc="b";
peripheries=0;
/* Actor Nodes */
node [shape=plaintext, style=invis];
subgraph clusterUser {label="User"; user};
subgraph clusterAdmin {label="Administrator"; admin};
{
rank=min;
user [shapefile="stick.png"];
admin [shapefile="stick.png"];
}
/* Use Case Nodes */
node [shape=ellipse, style=solid];
{
rank=same;
log_in [label="Log In"];
manage_user [label="Manage User"];
}
log_in_pwd [label="Log In Password"];
log_in_cert [label="Log In Certificate"];
change_email [label="Change Email"];
change_pwd [label="Change Password"];
/* Edges */
edge [arrowhead="oarrow"];
admin->user;
edge [arrowhead=none];
user->log_in;
admin->manage_user;
edge [arrowtail="vee", label="<<extend>>", style=dashed];
log_in->manage_user;
log_in->log_in_pwd;
log_in->log_in_cert;
manage_user->change_email;
manage_user->change_pwd;
}
Which produces:
Now, I don’t see most people working with requirements modeling getting too excited by this. The ones I’ve worked with tend to prefer graphical tools. But Graphviz is an excellent option when one wants to visualize data sets by automatically generating diagrams - and the more that can be done automatically, the better!
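As a small illustration of that kind of automation, a few lines of shell are enough to turn a plain dependency list into a rendered graph (a minimal sketch; deps.txt is a hypothetical file with one "from to" pair per line):
{
  echo 'digraph deps {'
  echo '  rankdir=LR;'
  awk '{ printf "  \"%s\" -> \"%s\";\n", $1, $2 }' deps.txt
  echo '}'
} | dot -Tpng > deps.png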
/M
]]>This time around I decided to get rid of the Windows partition previously used for dual booting (haven’t booted into Windows in 6 months). I also wanted to switch from the extremely useful and amazing Truecrypt to using Linux native LUKS encryption for my work and private data. I originally used Truecrypt on Linux with an NTFS file system to make the encrypted drive compatible with both Windows and Linux, but since then I’ve switched to an ext3 file system and don’t need the capability to mount it both under Linux and Windows. If you do, make sure to try - nay, use! - Truecrypt. It’s very very nice.
So, the 200GB disk ended up being partitioned like so:
$sudo fdisk -l /dev/sda

Disk /dev/sda: 200.0 GB, 200049647616 bytes
255 heads, 63 sectors/track, 24321 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000cbcfd

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1        3647    29294496   83  Linux
/dev/sda2            3648        7294    29294527+  83  Linux
/dev/sda3            7295       23707   131837422+  83  Linux
/dev/sda4           23708       24321     4931955   82  Linux swap
That is, 30GB for root, 30GB to be encrypted, 135GB for data and the rest for swap.
Of course, with Hardy you can encrypt the root file system just by selecting to do so in the alternate installer, but I don’t feel a need to go that way right now - maybe later.
Now, slowly approaching the main topic of this post: while playing around with the new disk and the old data I was thinking about putting in place a new backup strategy. At the same time (I can do anything while simultaneously going through my subscriptions in Google Reader:) I happened upon the JWZ post about backups. JWZ, who obviously is a smart guy, is onto something. I’m already doing backups (of part of my data) using rsync to my server at home - not a full disk image, admittedly, which was JWZ’s prescription. However, I wanted to try something different and achieve a few other things: compression, encryption, and a single self-contained snapshot file that can be mounted on any modern Linux system.
For a while I was considering ZFS (which I in a moment of temporary megalomania started porting to a native Linux kernel module, strictly for private use! - maybe something for another post) with its nice snapshot and compression support, but this idea was quickly discarded for being a bit too unnatural for most Linux setups. I added the additional requirement that the solution should work without having too many non-standard dependencies.
CromFS has a nice feature set, but it’s a FUSE file system and it’s not available in Hardy. The comparison chart on the CromFS page, however, led me to SquashFS. SquashFS is available out-of-the-box in Hardy, and by apt-getting the squashfs-tools package, everything needed is installed. The nice HowTo gives a good introduction to the commands required to use it. Note - SquashFS, like most compressed file systems, is read-only. This suits me perfectly as I want to make a snapshot for backup purposes, but it might not be what you want.
The idea I got was to use SquashFS as the compressed snapshot file system and wrap it in LUKS for encryption. By doing this I get a single compressed file which I can mount on a modern Linux distribution to access the data. It’s securely and (somewhat) efficiently stored.
Luckily, what I wanted to do is very similar to what the Gentoo guide on how to burn an encrypted CD image describes - namely wrapping an already existing file system image in a LUKS encrypted container. Most of the following is based on that Gentoo guide.
There are a few steps to the process: creating the SquashFS image, figuring out how large the LUKS container needs to be, creating and formatting the container, and finally transferring the image into it.
The trickiest (which isn’t really that tricky) part is figuring out how large to make the LUKS container. As we don’t know beforehand how large the SquashFS image is, we need to create it and then calculate how large the LUKS container should be.
Let’s start with creating the SquashFS image:
$sudo mksquashfs /crypt /tmp/cryptbackup.sqsh -e /crypt/stuff
Parallel mksquashfs: Using 2 processors
Creating little endian 3.1 filesystem on /tmp/cryptbackup.sqsh, block size 131072.
[==========================================================================] 2297/2297 100%
Exportable Little endian filesystem, data block size 131072, compressed data, compressed metadata, compressed fragments, duplicates are removed
Filesystem size 13606.69 Kbytes (13.29 Mbytes)
21.57% of uncompressed filesystem size (63077.89 Kbytes)
Inode table size 38419 bytes (37.52 Kbytes)
32.62% of uncompressed inode table size (117780 bytes)
Directory table size 33629 bytes (32.84 Kbytes)
58.10% of uncompressed directory table size (57880 bytes)
Number of duplicate files found 115
Number of inodes 3176
Number of files 1877
Number of fragments 45
Number of symbolic links 945
Number of device nodes 0
Number of fifo nodes 0
Number of socket nodes 0
Number of directories 354
Number of uids 5
....
Number of gids 12
....
This creates a new compressed SquashFS image from the data in the /crypt directory (the contents of this directory become the contents of the image root). The -e flag excludes all files in the given directories - here, everything in /crypt/stuff. Note that it might be more efficient to store this image on a separate disk, if available. Even an external USB 2.0 mounted disk might be faster to write to while reading the data from the main disk.
The SquashFS image was compressed by about 50% in my case - 10GB of data stored in a 5GB image. Of course, the compression ratio achieved depends on the type of data stored.
Step 1 done, we now need to create a LUKS container to store the SquashFS image in. How big should it be? Well… The Gentoo guide linked to above calculates the size of the LUKS overhead by checking the difference between a LUKS container and the mapped block device. It turns out that this overhead is 1032 blocks (each block being 512 bytes), no matter what the block size is. Googling this seems to confirm it, so for now I’m assuming that LUKS always adds 1032 blocks of overhead.
The size in 512 byte blocks of the SquashFS image can be found by doing:
$ls -l --block-size=512 /tmp/cryptbackup.sqsh
-rwx------ 1 root root 27216 2008-05-11 20:55 /data/cryptbackup.sqsh
Which in the above case indicates that the file is 27216 512 byte blocks large (this is a test file…).
Adding 1032 blocks gives us the size needed for the LUKS container - 28248 blocks - let’s create it (while letting the shell handle the calculation for us):
$sudo dd if=/dev/zero of=/tmp/cryptbackupluks.img bs=512 count=1 seek=$((27216+1032))
Note that this creates a sparse file on most modern file systems, so it’s quite quick. We don’t need to fill it with random numbers or anything as the whole container will be updated when we write the SquashFS image to it.
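To convince yourself that the container file really is sparse, you can compare its apparent size with the space it actually occupies on disk (a quick check, using the same path as above):
$ls -lh /tmp/cryptbackupluks.img
$du -h /tmp/cryptbackupluks.img
ls -lh reports the full apparent size, while du -h should show only a few kilobytes of actually allocated blocks.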
Now, let’s map it. First locate an available loop device:
$sudo losetup -f
/dev/loop0
loop0 is available - no loops on this system.
Set up the container file as a loop device:
$sudo losetup /dev/loop0 /tmp/cryptbackupluks.img
Then make it a LUKS volume:
$sudo cryptsetup luksFormat /dev/loop0
WARNING!
========
This will overwrite data on /dev/loop0 irrevocably.
Are you sure? (Type uppercase yes): YES
Enter LUKS passphrase:
Verify passphrase:
Command successful.
Make sure you remember the password… ;P
And open the device:
$sudo cryptsetup luksOpen /dev/loop0 cryptbackup
Enter LUKS passphrase:
key slot 0 unlocked.
Command successful.
Now the device is available as /dev/mapper/cryptbackup, ready to accept our SquashFS image.
Let’s validate the overhead of LUKS:
$echo $((`sudo blockdev --getsize /dev/loop0`-`sudo blockdev --getsize /dev/mapper/cryptbackup`))
1032
Sweet! So, the size of the mapped device should be the same as our SquashFS image:
$sudo blockdev --getsize /dev/mapper/cryptbackup
27217
Hmmm… Close enough… ;P The extra block actually comes from the dd command above: seek=28248 combined with count=1 writes one block at offset 28248, so the container file ends up 28249 blocks large rather than 28248. Anywho, it is certainly big enough for our 27216 block image. Let’s transfer it:
$sudo dd if=/tmp/cryptbackup.sqsh of=/dev/mapper/cryptbackup bs=512
27216+0 records in
27216+0 records out
13934592 bytes (14 MB) copied, 0.0544451 s, 256 MB/s
Done and done. To verify that it works we can mount the file system:
$sudo mkdir /mnt/cryptbackup
$sudo mount /dev/mapper/cryptbackup /mnt/cryptbackup
If all went well, ls /mnt/cryptbackup should now give the contents of the original directory.
To unmount, do:
$sudo umount /mnt/cryptbackup
$sudo cryptsetup luksClose cryptbackup
$sudo losetup -d /dev/loop0
Now remove the old SquashFS image /tmp/cryptbackup.sqsh and store the LUKS container /tmp/cryptbackupluks.img in a safe location. I use a portable external hard disk and the server at home to save the backup images. To mount the image later, just run a few of the above commands again, slightly modified:
$LOOP=`sudo losetup -s -f /tmp/cryptbackupluks.img`
$sudo cryptsetup luksOpen $LOOP cryptbackup
Enter LUKS passphrase:
key slot 0 unlocked.
Command successful.
$sudo mount /dev/mapper/cryptbackup /mnt/cryptbackup
Now, all that remains is to create some helper scripts to avoid having to write all this every time I want to make a backup…
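As a starting point, here is a rough sketch of what such a helper script could look like. It just strings together the steps above; the script name, the exclude directory and the output location are examples of my own and would need adjusting, and cryptsetup will still prompt for the passphrase interactively:
#!/bin/sh
# make-crypt-backup.sh - a sketch of the manual steps above (run as root)
# Usage: sudo ./make-crypt-backup.sh /crypt /media/usbdisk/cryptbackupluks.img
set -e

SRC=${1:?source directory}     # directory to snapshot, e.g. /crypt
OUT=${2:?output LUKS image}    # where to put the resulting LUKS container
SQSH=$(mktemp /tmp/cryptbackup.XXXXXX)

# 1. Create the compressed, read-only SquashFS snapshot (the exclude dir is just an example)
mksquashfs "$SRC" "$SQSH" -noappend -e "$SRC/stuff"

# 2. Size the container: image size in 512-byte blocks plus the 1032 block LUKS overhead
BLOCKS=$(( ( $(stat -c %s "$SQSH") + 511 ) / 512 ))
dd if=/dev/zero of="$OUT" bs=512 count=1 seek=$(( BLOCKS + 1032 - 1 ))

# 3. Wrap the container in LUKS and open it (cryptsetup asks for the passphrase here)
LOOP=$(losetup -s -f "$OUT")
cryptsetup luksFormat "$LOOP"
cryptsetup luksOpen "$LOOP" cryptbackup

# 4. Transfer the SquashFS image into the encrypted container
dd if="$SQSH" of=/dev/mapper/cryptbackup bs=512

# 5. Clean up
cryptsetup luksClose cryptbackup
losetup -d "$LOOP"
rm -f "$SQSH"
A matching mount helper would simply wrap the three commands from the last listing in the same way.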
A drawback of this method is that it takes quite a while to perform the backups. It’s not incremental either, so the backups will presumably take longer and longer to make each time (as more data is accumulated). Next thing might be to try to create a base image with SquashFS and then do incremental backups with UnionFS or something… Hmmm……
/M
]]>