11.28.06

Interview concerning GPL Java

Posted in Uncategorized, Java at 7:17 am by skoobi

Dalibor Topic has been interviewed (post here) about Sun’s recent announcement that it will release Java under the GPL, and about what this means for the Kaffe project.

As far as I am concerned, I strongly believe that open-sourcing Java will open the door to more innovation around this technology and to its wider adoption. Until now, when it came to high-level languages, the only truly Free (Libre) option on the Linux/*NIX desktop was Mono/C# (more and more GNOME applications are written in C#). Adding competition by releasing Java as Open Source will most probably lead to better, easier-to-maintain applications.

Another important point about the GPL’ing of Java is that Java itself will benefit from the change. Look at GCJ, for instance, which tries to get rid of the JVM and compile Java directly to machine code. It remains to be proven whether this approach actually makes Java faster, but in any case it would help integrate Java with the Unix way of doing things, instead of having Java on one side and the whole OS on the other. Little to no effort has been made so far to mix high-level languages with the system. GCJ is currently struggling to implement both the language and the classlib aspects of Java. Once the classlib is released as Open Source software, GCJ developers will be able to concentrate on the compiler itself and on the integration of Java with C/C++ code, which will be an incentive for people who don’t like JVMs to use Java (after all, Java will then just be a simpler C++ that provides a decent library by default).

11.27.06

HOWTO: Use Direct Web Remoting (DWR) with Spring Framework and Java5 Annotations

Posted in Java, Web Development at 12:20 am by skoobi

DWR is an Open Source Java library for building AJAX-enabled web sites. Since the term “AJAX” is used to describe pretty much anything, from rich Web User Interfaces to auto-completing combo boxes to asynchronous communication between the web client and the server, here is a more technical introduction to what DWR actually does:

DWR is basically a Remote Method Invocation framework that lets you “export” server-side Java objects to the JavaScript-enabled web client, thanks to a transparent communication layer that generates the client stubs dynamically (at runtime). In other words, you have server-side Java methods that you want to call from the client. DWR takes these classes/methods and serves a JavaScript file (generated on the fly) containing matching JavaScript classes/methods which, once called, handle the remote call, the marshalling/unmarshalling, etc.

DWR does not provide any UI abstraction layer. In order to create nifty graphics, you have to complement it with another framework such as the Dojo toolkit or Script.aculo.us.

Since the official documentation isn’t particularly clear on how to get DWR to work with Spring-managed beans and Java5 Annotations, here is a mini-HOWTO that completes it. (Anyone is free to use this HOWTO under any OSI-approved Open Source license.) Nothing is easier to understand than a simple example, so we are going to create a client-side logger API that logs events on the server side using Log4j. (A real-world example should use an abstraction layer like Commons-Logging, and should be written as a backend for one of the popular JavaScript logging frameworks, like Log4Javascript, instead of reinventing yet another logging API.)

As a side note, DWR is part of the typical bleeding-edge web framework stack that I wrote about in a previous post. Surely, not everybody uses this stack, but it gives an idea of the current trend in the Java world. That world is full of innovation: every other day a new framework arrives on the scene, and beginners are more and more afraid of the overall complexity of the platform (if we can call it a platform, because it looks more like a set of unrelated tools that people struggle to use together, as opposed to Microsoft’s .Net stack, which we can safely call a “solution” because of the tight integration between its components). Innovation is undoubtedly a very good thing, and nothing should prevent it from happening. However, I believe it is also very important to document the best practices and tools, and to try to gather the community around a limited set of paradigms and frameworks. It doesn’t make sense to have a different framework for each programmer in this world. And in my humble opinion, using such an RMI-like system to communicate between Java classes and JavaScript is definitely one of the best and most efficient (from the developer’s viewpoint) practices around.

Java class

The first step is to create the annotated Java5 class that will be exported to the JavaScript client. Every exported method must carry the @RemoteMethod annotation, and the @Create annotation is used at the class level to tell DWR that the class is instantiated by the Spring Framework. The beanName parameter must match the name of the bean in Spring’s application context file.

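package fr.cvf.vodoo.ihm.portal.dwr;

import org.apache.log4j.Logger;
import org.directwebremoting.annotations.Create;
import org.directwebremoting.annotations.Param;
import org.directwebremoting.annotations.RemoteMethod;
import org.directwebremoting.spring.SpringCreator;

// Exported to the browser by DWR; the instance is the Spring bean "clientSideLogger".
@Create(creator = SpringCreator.class, creatorParams = { @Param(name = "beanName", value = "clientSideLogger") })
public class ClientSideLogger {

    private static final Logger logger = Logger.getLogger(ClientSideLogger.class);

    /**
     * @param message text sent by the browser, logged at DEBUG level
     * @see org.apache.log4j.Category#debug(java.lang.Object)
     */
    @RemoteMethod
    public void debug(String message) {
        logger.debug(message);
    }

    /**
     * @param message text sent by the browser, logged at ERROR level
     * @see org.apache.log4j.Category#error(java.lang.Object)
     */
    @RemoteMethod
    public void error(String message) {
        logger.error(message);
    }

    /**
     * @param message text sent by the browser, logged at FATAL level
     * @see org.apache.log4j.Category#fatal(java.lang.Object)
     */
    @RemoteMethod
    public void fatal(String message) {
        logger.fatal(message);
    }

    /**
     * @param message text sent by the browser, logged at INFO level
     * @see org.apache.log4j.Category#info(java.lang.Object)
     */
    @RemoteMethod
    public void info(String message) {
        logger.info(message);
    }

    /**
     * @param message text sent by the browser, logged at WARN level
     * @see org.apache.log4j.Category#warn(java.lang.Object)
     */
    @RemoteMethod
    public void warn(String message) {
        logger.warn(message);
    }
}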

Spring Configuration

In order to instantiate the ClientSideLogger class with Spring, we can create the following dwr.xml file, that is going to create a “clientSideLogger” singleton :

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:aop="http://www.springframework.org/schema/aop" xmlns:tx="http://www.springframework.org/schema/tx"
xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans.xsd
http://www.springframework.org/schema/aop
http://www.springframework.org/schema/aop/spring-aop.xsd
http://www.springframework.org/schema/tx
http://www.springframework.org/schema/tx/spring-tx.xsd">

<bean id="clientSideLogger" class="fr.cvf.vodoo.ihm.portal.dwr.ClientSideLogger" />

</beans>

Web.xml glue

The next step is to write the usual glue (set up the DWR servlet, the Spring context listener, etc.).

In your web.xml, you can add :

<context-param>
<param-name>contextConfigLocation</param-name>
<param-value>classpath:spring/dwr.xml</param-value>
</context-param>

<listener>
<listener-class>org.springframework.web.context.ContextLoaderListener</listener-class>
</listener>

<!-- DWR servlet, which should list all annotated classes -->
<servlet>
<servlet-name>dwr-invoker</servlet-name>
<display-name>DWR Servlet</display-name>
<servlet-class>org.directwebremoting.servlet.DwrServlet</servlet-class>
<init-param>
<param-name>debug</param-name>
<param-value>true</param-value>
</init-param>
<init-param>
<param-name>classes</param-name>
<param-value>
fr.cvf.vodoo.ihm.portal.dwr.ClientSideLogger
</param-value>
</init-param>
</servlet>

<servlet-mapping>
<servlet-name>dwr-invoker</servlet-name>
<url-pattern>/dwr/*</url-pattern>
</servlet-mapping>

Javascript Code

The final step is to do the actual logging through the automagically generated JavaScript API.

First, make sure the following javascript files are included :

<script type='text/javascript' src='/dwr/interface/ClientSideLogger.js'></script>
<script type='text/javascript' src='/dwr/engine.js'></script>
<script type='text/javascript' src='/dwr/util.js'></script>

And you can use the API :

ClientSideLogger.error("error message");

That’s all you have to do !

11.26.06

HOWTO: Setup SYMPA, WWS with Apache2/FastCGI on Debian/Ubuntu Edgy

Posted in Unix / Linux, Tips and Documentation at 10:24 am by skoobi

Debian/Ubuntu sympa packages rely on apache1, so one may have problems running sympa on Ubuntu Edgy, for instance.

This simple HOWTO explains how to configure Sympa and WWS to run with Apache2 and FastCGI. FastCGI is used instead of CGI for performance reasons. (Basically, CGI forks a new instance of the script for every request, whereas a FastCGI-enabled script keeps running as a daemon and handles requests as they come, much like a decent system such as the Java Servlet API.)

Installation

Install the sympa package, and link the FastCGI configuration file it ships for apache1 into the apache2 configuration folder:

# apt-get install sympa libapache2-mod-fastcgi

# ln -s /etc/sympa/httpd.conf-fcgi /etc/apache2/conf.d/sympa-fcgi

# dpkg-reconfigure -plow sympa

A wizard will come up. Make sure to:

  • Use a database if you want to use WWS
  • Select “Other” when asked what type of web server you are running
  • Tell the wizard that you want FastCGI enabled

Once this is done, check the /etc/sympa/wwsympa.conf file, and make sure

use_fast_cgi is set to 1

and the last step is to restart apache :

/etc/init.d/apache2 restart
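
The check on wwsympa.conf can also be scripted. Here is a minimal sketch; fastcgi_enabled is a name I made up, and it assumes that the file stores its settings as "name value" pairs separated by whitespace, with comments starting with #.

```shell
#!/bin/sh
# Sketch: succeed only if use_fast_cgi is set to 1 in a wwsympa.conf-style file.
# Assumption: settings are written as "name value", separated by whitespace.
fastcgi_enabled() (
    conf="$1"
    # Keep the last uncommented use_fast_cgi line, in case it appears twice.
    value=$(awk '$1 == "use_fast_cgi" { v = $2 } END { print v }' "$conf")
    [ "$value" = "1" ]
)

# Usage, with the path from this HOWTO:
# fastcgi_enabled /etc/sympa/wwsympa.conf && echo "FastCGI enabled" || echo "FastCGI NOT enabled"
```

The function only reads the file, so it is safe to run before and after dpkg-reconfigure.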

Now, my personal thoughts about mailing-list systems. My impression is that there is no perfect Open Source mailing-list system (although several of them “do the job”). In my opinion, a good mailing-list system would:

  • Be entirely configurable through a Web UI. Sympa does a pretty good job at this, since most of the settings can be tweaked from WWS. However, the UI is pretty ugly (sure, one can tweak the templates, but…)
  • Provide a mailing list as well as a Web forum. To make it simple, something like Google Groups, which also lets you post (and send subscribe/unsubscribe orders) via email. It is then up to each subscriber to choose between email and Web.
  • Re-use a well-known templating system (Smarty in the PHP world, Freemarker in the Java world, …). Why re-invent yet another templating language when very good ones already exist? Having to learn one templating language per web application is cumbersome.
  • Be extensible: it should provide a decent plugin system that people can use to add modules without touching a single line of the base code. Requiring the modification of a 9980-line (cat /usr/lib/cgi-bin/sympa/wwsympa.fcgi | wc -l) Perl script is, in my humble opinion, very bad practice. This implies coding against interfaces, and using some kind of IoC framework like the Spring Framework.
  • Be independent of the persistence layer, through the use of a sophisticated persistence engine such as Hibernate. Anyone can then configure it to use their preferred database engine.
  • Have extensible authentication. A security framework such as Acegi could be used so that anyone can easily have the mailing-list system authenticate users against the configured authentication backend (be it a database, system/PAM, Single-Sign-On, etc.). It doesn’t make sense to re-implement every authentication backend in every webapp, since some frameworks already do the job.

HOWTO: Use more than 3 virtual interfaces with Xen (by using IP Aliasing)

Posted in Unix / Linux, Tips and Documentation at 8:11 am by skoobi

Prerequisite: You have Xen running correctly with 3 virtual interfaces or fewer. This HOWTO explains how to get more of them to work on Ubuntu Edgy.

Xen does not support more than 3 virtual interfaces on the guest machines (the so-called DomU). This is stated in the Xen FAQ, and attempting to use more results in what Ernie Fontes experienced in this post.

The usual trick for having more interfaces on a stand-alone system is to use IP Aliasing. Ubuntu Linux, among others, supports IP Aliasing without any problem. However, according to my tests, IP Aliasing alone does not seem to work in a Xen DomU.

For some reason that I can neither explain nor understand, there is still a way to use more than 3 virtual interfaces in a Xen DomU: add interfaces the official Xen way, and use IP Aliasing on top of that. It is weird, but it works:

  • Using Xen virtual interfaces alone is limited to 3 interfaces.
  • Using IP Aliasing alone creates interfaces that cannot be pinged from the outside.
  • BUT using exactly 3 Xen virtual interfaces, and adding more interfaces on top with IP Aliasing, works beautifully…

Here is a quick HOWTO explaining this procedure :

Xen Configuration

Add 3 interfaces for the DomU (you might replace xenbr1 by xenbr0 if your bridge name is the standard one) :

vif = [ 'bridge=xenbr1','bridge=xenbr1','bridge=xenbr1' ]

And you can create your domain using the usual xm create command.

xm create config.cfg

xm console domU-name

DomU Network configuration

You can now configure your domain interfaces and the aliases. For the sake of giving a complete example, here is how to achieve that under Debian/Ubuntu :

Let’s say that we want to configure the 216.240.153.78, 216.240.138.247, 216.240.146.76 IPs for, respectively, eth0, eth1 and eth2, and 216.240.134.6 as well as 216.240.128.182 for the 2 aliases eth0:0 and eth0:1.

/etc/network/interfaces

auto eth0
iface eth0 inet static
address 216.240.153.78
netmask 255.255.255.0
gateway 216.240.153.1

auto eth1
iface eth1 inet static
address 216.240.138.247
netmask 255.255.255.0
gateway 216.240.138.1

auto eth2
iface eth2 inet static
address 216.240.146.76
netmask 255.255.255.0
gateway 216.240.146.1

auto eth0:0
iface eth0:0 inet static
address 216.240.134.6
netmask 255.255.255.0
gateway 216.240.134.1

auto eth0:1
iface eth0:1 inet static
address 216.240.128.182
netmask 255.255.255.0
gateway 216.240.128.1

You can now tell the system to reconfigure the network (/etc/init.d/networking restart), and if it still doesn’t work (especially for the aliases), you can restart the DomU (xm reboot domU-name).
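
Since the whole point is whether the aliases answer from the outside, it helps to test every configured address in one go. The sketch below is only an illustration: configured_addresses is a hypothetical helper that pulls the "address" lines out of an interfaces file, and the ping loop in the usage comment assumes the Linux iputils ping options -c and -W.

```shell
#!/bin/sh
# Sketch: print every IP listed on an "address" line of an ifupdown file.
configured_addresses() (
    awk '$1 == "address" { print $2 }' "$1"
)

# Usage, from another machine that holds a copy of the DomU's file:
# for ip in $(configured_addresses interfaces); do
#     ping -c 1 -W 2 "$ip" > /dev/null && echo "$ip reachable" || echo "$ip NOT reachable"
# done
```

Running the loop from a different host matters here, because the aliases may well answer locally while being unreachable from the outside.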

Now, if someone has an idea of why this tip works, I am really really interested to know. Because right now, it looks like magical stuff that I’m not even sure of how I discovered ;-)

11.22.06

Awstats, Libperl-Storage, and endianness (Byte Order) issues

Posted in Unix / Linux, Tips and Documentation at 9:12 am by skoobi

If you have recently migrated your AWStats installation to a different architecture (Pentium4 to AMD64, for instance), awstats may report the following error:

Warning: Error while retrieving hashfile: Byte order is not compatible at ../../lib/Storable.pm (autosplit into ../../lib/auto/Storable/_retrieve.al) line 331, at (eval 8) line 1

This happens because awstats caches DNS entries with Perl’s Storable module, whose native file format depends on the machine (byte order and word size). Others have experienced the same problem, like on this forum.

The solution is actually quick. Let’s say your stats are in /stats, you can do :

find /stats -name '*.hash' -exec rm {} \;
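
If you prefer to look before you delete, the same find expression can be used as a dry run first. This is just a sketch; hash_caches is a made-up helper name, and the directory is whatever your own stats directory is.

```shell
#!/bin/sh
# Sketch: list the *.hash cache files below a directory without touching them.
hash_caches() (
    find "$1" -type f -name '*.hash' -print
)

# Usage:
# hash_caches /stats                                              # review the list
# hash_caches /stats | while IFS= read -r f; do rm -- "$f"; done  # then delete
```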

HOWTO: Apache2 + Awstats setup on Debian/Ubuntu (Edgy Eft)

Posted in Unix / Linux, Tips and Documentation at 9:07 am by skoobi

Here is a simple HOWTO explaining how to configure AWstats to analyze Apache2 logs, and provide detailed statistics, under Ubuntu Edgy Eft. This should also work for other Ubuntu versions, as well as any Debian derivative.

Apache

The first step is to activate Logging in Apache, so that Awstats has something to analyze. For instance, you can add something similar in your VirtualHost configuration :

ErrorLog /var/log/apache2/sirika.com-error.log
CustomLog /var/log/apache2/sirika.com-access.log combined

Another important thing is to configure a few things for awstats in apache, like where the icons are, and more importantly, to activate CGI-scripts (since AWstats is written in perl…) . This can be done thanks to the following /etc/apache2/conf/awstats.conf :

# This provides worldwide access to everything below the directory
# Security concerns:
# * Raw log processing data is accessible too for everyone
# * The directory is by default writable by the httpd daemon, so if
# any PHP, CGI or other script can be tricked into copying or
# symlinking stuff here, you have a looking glass into your server,
# and if stuff can be uploaded to here, you have a public warez site!

Options None
AllowOverride None
Order allow,deny
Allow from all
# This provides worldwide access to everything below the directory
# Security concerns: none known

Options None
AllowOverride None
Order allow,deny
Allow from all

# This provides worldwide access to everything in the directory
# Security concerns: none known
Alias /awstats-icon/ /usr/share/awstats/icon/

# This (hopefully) enables _all_ CGI scripts in the default directory
# Security concerns: Are you sure _all_ CGI scripts are safe?
ScriptAlias /cgi-bin/ /usr/lib/cgi-bin/

Awstats

The next step is to install awstats, with all the necessary Perl modules. Several optional awstats plugins depend on Perl modules that must be installed first. liburi-perl, for instance, is needed if you use “decodeutfkeys”, even though it is NOT listed as recommended or suggested in the awstats package.

sudo apt-get install awstats libnet-dns-perl libnet-ip-perl libgeo-ipfree-perl liburi-perl libnet-xwhois-perl

Once this is working, it is now necessary to configure awstats, to tell it which logs it should monitor, and where it should write its working files (stats). This is done by creating a file /etc/awstats/awstats.website.conf (replace website by your apache2 virtual host name, for instance, and do NOT forget the .conf !).

sudo cp /etc/awstats/awstats.conf /etc/awstats/awstats.website.conf

Editing this file should be pretty straightforward, since it is well commented. In particular, pay attention to the following entries

LogFile="/var/log/apache2/sirika.com-access.log"

SiteDomain="sirika.com"

HostAliases="www.sirika.com"

DirData="/srv/data/stats/sirika.com"

LogFile should point to the log file configured in Apache2.

SiteDomain is the main domain name, as configured in Apache2.

HostAliases should list ALL the aliases listed in Apache2’s VirtualHost configuration. Usually, you will want the same domain prefixed with www (or without the www prefix if it is already part of the main domain name). Having to repeat this is really annoying and error-prone, and the lack of a global definition of a “virtual host” on the system is one of the issues I pointed out in “10 things that still suck under linux”. Virtual hosts and aliases should be defined once, globally. On a perfect system, every single concept/thing would be configured once, and only once. Anyways…

DirData should point to an empty directory, whose content will be managed by Awstats.
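
When you have more than a couple of virtual hosts, editing these four entries by hand gets old quickly. Here is a rough sketch of how the per-site file could be derived from the template instead. site_conf is a hypothetical helper of mine; it assumes that each directive starts at the beginning of a line in the template, that the only alias you want is the www-prefixed domain, and that none of the values contain a | or & character (which would confuse sed).

```shell
#!/bin/sh
# Sketch: print a per-site AWStats config by rewriting four directives
# of a template file. Nothing is written to disk by the function itself.
site_conf() (
    template="$1"; domain="$2"; logfile="$3"; datadir="$4"
    sed -e "s|^LogFile=.*|LogFile=\"$logfile\"|" \
        -e "s|^SiteDomain=.*|SiteDomain=\"$domain\"|" \
        -e "s|^HostAliases=.*|HostAliases=\"www.$domain\"|" \
        -e "s|^DirData=.*|DirData=\"$datadir\"|" \
        "$template"
)

# Usage, with the values from this HOWTO:
# site_conf /etc/awstats/awstats.conf sirika.com \
#     /var/log/apache2/sirika.com-access.log /srv/data/stats/sirika.com \
#     > /etc/awstats/awstats.sirika.com.conf
```

Every other line of the template is passed through untouched, so the comments of the original file are preserved.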

Cron

Everything is now configured. On a perfect system, the setup would stop here. Awstats could be automagically notified of every change in the logs: it could register to a so-called “http access event”, and its internal behaviour could define its policy for updating the stats (update synchronously, update asynchronously when the system is idle, etc.). This is one of the things I pointed out in “10 things that still suck under linux”. However, we’re not there yet, so we need to run the script every day to update the stats (yeah, that’s the reality today…).

So, this can be done thanks to the following /etc/cron.daily/awstats (do not forget to chmod +x /etc/cron.daily/awstats after creating it) :

#!/bin/sh

/usr/share/doc/awstats/examples/awstats_updateall.pl -awstatsprog=/usr/lib/cgi-bin/awstats.pl now > /dev/null

This will update the statistics for all the hosts defined in /etc/awstats/*, on a daily basis. Yes, it’s not as beautiful as having a full-featured event system in which every application could subscribe to events generated by others, but it has the merit of working…

Logrotate

What happens when your apache logs get rotated (and possibly gzipped, etc) by logrotate (apt-cache show logrotate for more information), and awstats still hasn’t analyzed the end of the logs that is about to be rotated ?

To avoid this situation, it is necessary to tell logrotate to launch awstats BEFORE rotating the logs. This can be done by adding the following lines to /etc/logrotate.d/apache2 :

prerotate
/etc/cron.daily/awstats
endscript

Permissions

And of course, permissions must be tweaked :

  • Since Awstats runs as the web user when displaying stats (it is a CGI script), the web user needs read access to /srv/data/stats/*
  • Additionally, you may want to provide the “update now” button on your website stats. In that case, the web user also needs write access to /srv/data/stats/*
  • Finally, awstats needs access to the apache2 logs to create the stats. This is not a problem when it is run from a cron script, since it then runs as root. But in the case of “update now”, it runs as the web user, so the web user needs read access to the logs. The default permissions are 660 with root:adm, so www-data doesn’t have access to them.

The problem with traditional permissions is that there is no decent way of specifying default permissions. So, we are going to use ACLs for that. You can find more information about them here (Using POSIX ACLs to complement traditional Linux permissions). So, this gives, for instance :

# read write execute access for the web user to the stats directories
find /srv/data/stats -type d -exec setfacl -m "g:www-data:rwx" {} \;

# read write execute access for the web user to FUTURE stats files (default ACL)
find /srv/data/stats -type d -exec setfacl -d -m "g:www-data:rwx" {} \;

# read write access for the web user to the existing stats files
# (no -d here: default ACLs can only be set on directories)
find /srv/data/stats -type f -exec setfacl -m "g:www-data:rw-" {} \;

# read only access for the web user to the logs directory
find /var/log/apache2 -type d -exec setfacl -m "g:www-data:r-x" {} \;

# read only access for the web user to FUTURE log files (default ACL)
find /var/log/apache2 -type d -exec setfacl -d -m "g:www-data:r-x" {} \;

# read only access for the web user to the existing apache2 logs
find /var/log/apache2 -type f -exec setfacl -m "g:www-data:r--" {} \;

And it should work. The last thing to do is to protect access to your logs, if you don’t want your users to see them. This can be done using a .htaccess file, and there are plenty of tutorials on the web that explain how to achieve that.

11.18.06

The typical bleeding-edge Web Application Framework Stack

Posted in Java at 6:20 am by skoobi

Most developers now agree that heavyweight frameworks like EJB2 should be avoided. Maintenance is a nightmare and, more importantly, developers don’t enjoy anything that involves generating stubs and skeletons (the same applies to technologies like CORBA).

So, what is the recommended technology stack for developing web applications today? It looks like today’s buzz goes toward:

  • Apache Maven2 for build management and for managing profiles (development, integration, production), along with its set of plugins that ease application development, like the Jetty6 plugin, which avoids constantly redeploying wars to Tomcat.
  • A so-called “lightweight” JEE framework, like the famous Spring Framework, that everybody raaaves about.
  • An Object Relational Mapping framework, like Hibernate, or any of the Java Persistence API implementations (Hibernate EntityManager, OpenEJB, or Oracle TopLink)
  • An MVC framework, which is either Action/Request Based (The 2 most popular ones being Spring MVC and Webwork2/Struts2 ), or Component Based (The 2 most popular ones being Apache Tapestry and Java Server Faces )
  • A security framework like Acegi to handle authentication and authorization.
  • In addition, other secondary frameworks are often used, like DWR for transparent AJAX, or the client side Dojo Toolkit Javascript framework.

From a methodology viewpoint, the current hype is about :

  • Test-Driven Development, thanks to unit-testing frameworks like JUnit+Easymock and functional-testing frameworks like Canoo Webtest.
  • Inversion of Control: each component (technical or functional) must be described by an interface, and the inversion-of-control container (usually Spring) takes care of instantiating the right objects, which avoids the need to create hundreds of Factories and Singletons.
  • Aspect-Oriented Programming (AOP): user code should not be polluted by cross-cutting concerns such as security, performance, transaction management, etc. AOP is what lets you factor this kind of code out into reusable aspects.
  • Domain Driven Development (DDD), to design rich domain POJOs. Rich domain objects are smart and need to access technical objects like DAOs, but accessing the implementations directly would lower the testability of the application. So DDD often means using something like Spring’s @Configurable annotation together with Spring’s AspectJ integration to magically dependency-inject domain objects (which cannot be managed by Spring directly, since they can be instantiated anywhere).

From my experience, integrating all these frameworks doesn’t work out-of-the-box. It is often necessary to play with the miscellaneous versions of each framework, and it sometimes causes headaches. Classpath issues and human errors are also a great cause of trouble.

If the proliferation of frameworks continues (and I believe it will), integrated stacks (like AppFuse, but backed by commercial support) will become a de facto requirement for kickstarting development. How many developers out there actually use the whole IoC/AOP/DDD/TDD buzz? Judging by the number of problems I run into when using all these frameworks together, I have the feeling that not so many people do…

A simple example of this is Spring’s declarative, annotation-based (@Transactional) transaction management. Nobody tells you that you cannot use @Transactional on a Spring MVC controller, or that you have to activate CGLib proxying to use @Transactional on Webwork Actions. Sure, once you’ve run into the problem and analyzed the cause, the reasons seem obvious, but I still believe that something needs to be fixed in the integration of all these frameworks. Everything works as expected in isolation, but once you start to deal with dozens of frameworks, problems arise more often, and their cause isn’t always obvious. Access to the right frameworks should be made easier, to improve the overall quality of applications in the software industry.

Xen and SELinux : anything in common ?

Posted in Unix / Linux at 5:32 am by skoobi

Xen is definitely a great piece of software. It is currently the only viable (truly) Open Source solution to build secure virtual systems by isolating software in their own sandbox, and being able to set CPU/Memory restrictions on each of the sub systems.

However, each subsystem has to be managed and upgraded separately. This means that each subsystem is a (nearly) complete system that must be administered on its own. Another aspect is user management, since some users may need to be propagated. An LDAP directory can be used to avoid the ugly NIS-like propagation, but one needs to define a policy for how the users are laid out in the directory and how the directory is used, since not all virtual machines may be accessed by all users… User management also implies the usual sharing of /home, for which most people use the old and broken (though working) NFS.

Monitoring is also an important topic in this area: open source monitoring solutions like OpenNMS must be leveraged in order to monitor all the servers. This is another layer of complexity that isn’t necessarily needed.

So now, what I am wondering is why all the buzz goes to Xen, and nobody really cares about SELinux (except maybe Red Hat, which seems to provide decent SELinux support in its distribution). Ubuntu, in any case, does not seem to make SELinux a priority, as Michael Dolan points out.

Sure, Xen and SELinux are not meant to tackle the same problems. Xen is a virtualization layer, whereas SELinux is a security layer. However, the problem, I believe, is that people tend to use Xen to tackle security problems that SELinux could solve without the need for additional systems. Of course, for complex needs, Xen+SELinux could be envisioned, but the philosophy behind virtualization is that the system is dumb from a security perspective, whereas SELinux tries to fix the heart of the problem: making a multi-user system secure.

In fact, why would anyone want to set up a full-blown virtual server just to run a DNS server, if a security stack could protect the rest of the system from damage in case the DNS daemon gets hacked?

11.17.06

RAID 5 vs RAID 50 (informal comparison/benchmark)

Posted in Unix / Linux at 3:38 pm by skoobi

OK, so you just bought that wonderful 4U server, with dual-core Xeon/Core 2 Duo, redundant gigabit ethernet connections, redundant power supplies, KVM over IP / IPMI management card, and most importantly, this wonderful fully hardware-based RAID controller (Semi-Soft RAID has been a disaster for me, by the way).

Now, the question is: should you use RAID 5, RAID 50 or something else? Choosing between simple RAID levels like RAID 0 and RAID 1 is pretty easy: RAID 0 is for performance (especially writes), whereas RAID 1 is for data safety (even though it also helps reads). Please note that RAID 1 is very expensive in disk space…

Now, what about the “standard” RAID 3, 4, 5, 5E, 6, 6E levels, or the “nested” ones such as RAID 0+1, 10, 30, 100, 50, 60?

In fact, the right choice depends on your specific needs (which is the lovely sentence everyone will use to avoid giving his opinion).

Now, in practice, my personal opinion on that :

  • RAID 3 and RAID 4 are only theoretical RAID levels that nobody actually uses.
  • RAID 5E, RAID 6 and RAID 6E are probably very good RAID levels, but the RAID card I use doesn’t support them, so I suspect these levels are only available on higher-end models. If you are lucky enough to try one of those, I would be happy to hear about the performance.
  • Most of the other nested RAID levels either are not supported by hardware cards, or are very expensive to implement. So, if you can afford them, they might be worth evaluating.

After these considerations, and considering that you want a certain balance between security, performance and price, you might want to compare RAID 5 and RAID 50, which are both good compromises between these concerns.

To put it simply, let’s consider you have 8 disks. RAID 5 stripes the data over 7 disks and uses the 8th one for parity, so that it can rebuild the data if one disk fails (this is a little simplistic, since no disk is dedicated to parity and the parity blocks are rotated over the disks, but you get the idea).

RAID 5+0 creates 2 RAID 5 arrays, each with half of the disks, and creates a RAID 0 array (striping) on top of them. As a result, it is more costly, since 2 disks’ worth of space is used for parity instead of one, but it is more robust to failures (it survives 2 disk failures, as long as they hit different RAID 5 sub-arrays).
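
To make the cost difference concrete, here is the arithmetic as a small sketch. The function names are mine, and it assumes disks of equal size with one disk’s worth of parity per RAID 5 array (so one per sub-array for RAID 50).

```shell
#!/bin/sh
# Sketch: usable capacity in GB, assuming equal-sized disks.
# RAID 5 loses one disk's worth of space to parity.
raid5_usable_gb() (
    disks="$1"; size_gb="$2"
    echo $(( (disks - 1) * size_gb ))
)

# RAID 50 loses one disk's worth of space per RAID 5 sub-array.
raid50_usable_gb() (
    disks="$1"; size_gb="$2"; groups="$3"
    echo $(( (disks - groups) * size_gb ))
)

# Usage, with 8 disks of 300 GB:
# raid5_usable_gb 8 300       # prints 2100
# raid50_usable_gb 8 300 2    # prints 1800
```

So on this kind of setup, going from RAID 5 to RAID 50 costs 300 GB of usable space.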

It is often said that RAID 5+0 is more efficient for writes… So I wanted to measure how much better it is, if at all, on a 3ware 9550SX 8-port SATA II RAID controller with write cache enabled, and 8 SATA 300 GB Maxtor DiamondMax 10 disks.

The result is a set of informal benchmarks that are probably not representative of the way I am going to use the disks anyway, but that have the merit of allowing quick conclusions.

First of all, using RAID 5:

sudo dd if=/dev/sda of=/dev/null bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 32.7535 seconds, 328 MB/s

dd if=/dev/zero of=out bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 88.8292 seconds, 121 MB/s

dd if=./out of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 45.3929 seconds, 237 MB/s

dd if=/dev/zero of=ok bs=1M count=1
500 MB / sec
dd if=ok of=/dev/null
630 MB / s
dd if=/dev/zero of=ok bs=1 count=1
40/43 kB / sec (Yes, KB)

dd if=ok of=/dev/null
0+1 records in
0+1 records out
1 byte (1 B) copied, 1.5e-05 seconds, 66.7 kB/s

dd if=./out of=/dev/null
20971520+0 records in
20971520+0 records out
10737418240 bytes (11 GB) copied, 49.5834 seconds, 217 MB/s

dd if=/dev/zero of=ok bs=1k count=1
44 MB / sec

dd if=ok of=/dev/null
2+0 records in
2+0 records out
1024 bytes (1.0 kB) copied, 1.7e-05 seconds, 60.2 MB/s

rm -f ok ; dd if=/dev/zero of=ok bs=1k count=4
4+0 records in
4+0 records out
4096 bytes (4.1 kB) copied, 3.4e-05 seconds, 120 MB/s
120/ 130 MB /s

rm -f ok ; dd if=/dev/zero of=ok bs=1k count=16
16+0 records in
16+0 records out
16384 bytes (16 kB) copied, 8.4e-05 seconds, 195 MB/s

time ( for (( i=0; i < 100000 ; i++ )) ; do dd if=/dev/zero of=$i.out bs=1k count=1 2> /dev/null; done )

real 10m0.205s
user 1m45.403s
sys 8m2.134s

time ( for (( i=0; i < 100000 ; i++ )) ; do dd if=/dev/zero of=$i.out bs=1 count=1 2> /dev/null; done )

real 3m19.841s
user 1m43.242s
sys 1m34.582s

time ( for (( i=0; i < 100000 ; i++ )) ; do dd if=/dev/zero of=$i.out bs=1 count=1 2> /dev/null; done )

real 10m52.149s
user 1m46.703s
sys 9m4.218s

time ( for (( i=0; i < 1000 ; i++ )) ; do dd if=/dev/zero of=$i.out bs=1 count=1 2> /dev/null; done )

real 0m20.931s
user 0m1.068s
sys 0m3.260s

And now, parts of the results on RAID 50 :

dd if=/dev/sda of=/dev/null bs=1M count=10240
Password:
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 40.1916 seconds, 267 MB/s

dd if=/dev/zero of=out bs=1M count=10240
10240+0 records in
10240+0 records out
10737418240 bytes (11 GB) copied, 117.762 seconds, 91.2 MB/s

dd if=/dev/zero of=ok bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.085963 seconds, 12.2 MB/s

dd if=ok of=/dev/null
2048+0 records in
2048+0 records out
1048576 bytes (1.0 MB) copied, 0.001741 seconds, 602 MB/s

dd if=/dev/zero of=ok bs=1 count=1
1+0 records in
1+0 records out
1 byte (1 B) copied, 3.5e-05 seconds, 28.6 kB/s
dd if=ok of=/dev/null
0+1 records in
0+1 records out
1 byte (1 B) copied, 1.7e-05 seconds, 58.8 kB/s

dd if=/dev/zero of=ok bs=1k count=1
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 3.4e-05 seconds, 30.1 MB/s

dd if=ok of=/dev/null
2+0 records in
2+0 records out
1024 bytes (1.0 kB) copied, 1.9e-05 seconds, 53.9 MB/s

time ( for (( i=0; i < 1000 ; i++ )) ; do dd if=/dev/zero of=$i.out bs=1M count=1 2> /dev/null; done )

real 0m15.759s
user 0m1.068s
sys 0m3.180s

time ( for (( i=0; i < 100000 ; i++ )) ; do dd if=/dev/zero of=$i.out bs=1k count=1 2> /dev/null; done )

real 9m46.112s
user 1m44.779s
sys 7m59.798s

So, my conclusion (which, I repeat, is not rocket science) is that RAID 50 isn’t that much better than RAID 5, and is sometimes actually worse, while being more expensive. As a result, my personal choice has been RAID 5.
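
To put a number on “sometimes actually worse”, the dd figures above can be compared directly. The sketch below is plain arithmetic on the reported throughputs; pct_change is a made-up helper name.

```shell
#!/bin/sh
# Sketch: relative change, in percent, when going from throughput $1 to $2.
pct_change() (
    LC_ALL=C awk -v a="$1" -v b="$2" 'BEGIN { printf "%.1f\n", (b - a) / a * 100 }'
)

# Usage, with the MB/s values reported by dd (RAID 5 first, RAID 50 second):
# pct_change 121 91.2    # 10 GB sequential write: prints -24.6
# pct_change 328 267     # 10 GB raw read:         prints -18.6
```

In other words, on these two large sequential tests RAID 50 came out roughly 19 to 25 percent slower than RAID 5 on this controller, which is only as meaningful as the informal benchmark itself.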

11.04.06

Funny quote: Java vs PHP

Posted in Uncategorized at 1:25 pm by skoobi

The “Can Java CMS match PHP ones” article contains a funny, yet sadly true, sentence:

Maybe while the Java world was engaged in talking of high end, super techie stuff, with the words ‘enterprise’, ‘transactions’ and ‘SOA’ embedded in every sentence, the PHP guys actually went out and created a lot of simple yet very useful software.

And we can also cite the following article : Why I hate frameworks which goes in the same direction, and is pretty funny too ;-)