Code Breeze !: October 2015

Friday, October 30, 2015

How to use embedded Java DB (Derby) in maven project

Sheng Wang 12:10 AM Database, Java SE, maven 4 Comments

For now, Java DB is just Apache Derby under a different name, so the rest of this article calls it Derby. It comes with the JDK installation (although the binary normally used in a Maven project is not the one installed in the local JDK directory).

Using embedded Java DB means the database runs in the same JVM as your application. The Java DB engine starts when you first connect to it through JDBC, and when the application exits, the database exits too. If you run Java DB entirely in memory, the data is gone when the JVM stops. Alternatively, you can store the data on the local file system so it remains available across multiple runs.

Java DB (Derby) is mostly used for convenience in development. No external database is needed even when your code has to work with an RDBMS.

0. What you need

JDK 6+ (JDK 7 in this demo)
Maven 3.2 +

1. POM file

Only one dependency is needed to use the Derby database.

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
  xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>

  <groupId>com.shengwang.demo</groupId>
  <artifactId>javadb-derby-embedded-basic</artifactId>
  <version>1.0</version>
  <packaging>jar</packaging>

  <name>javadb-derby-embedded-basic</name>
  <url>http://maven.apache.org</url>

  <dependencies>
    <dependency>
      <groupId>org.apache.derby</groupId>
      <artifactId>derby</artifactId>
      <version>10.8.3.0</version>
    </dependency>
  </dependencies>

  <!-- Use Java 1.7 -->
  <build>
    <plugins>
      <plugin>
        <groupId>org.apache.maven.plugins</groupId>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>2.5.1</version>
        <configuration>
          <source>1.7</source>
          <target>1.7</target>
        </configuration>
      </plugin>
    </plugins>
  </build>
</project>

Since the version of Java DB that comes with JDK 7 is 10.8.3.x, we also use 10.8.3.0 in our Maven project. If you only use an in-memory database (see below), the version used in Maven doesn't really matter. But if the database is stored on the file system and you want to inspect its content after the application finishes, you'd better use the same version as the one from the JDK, so the 'ij' tool in $JAVA_HOME/db/bin can open the database without version conflicts.

2. When the embedded database starts

The database starts when your Java code tries to connect to it using standard JDBC. How Derby works depends on the way you connect to it, in other words on the connection URL. Suppose we need to connect to a database named 'demo'.

In-memory database, url looks like: jdbc:derby:memory:demo;create=true

'demo' is the database name and can be any string you choose; "memory" is a keyword that tells Derby to run in all-in-memory mode.

File-based database, url looks like: jdbc:derby:c:\Users\shengw\MyDB\demo;create=true

'c:\Users\shengw\MyDB\demo' is the directory where the database files are saved on the local file system. (In Java source code on Windows it's actually written as jdbc:derby:c:\\Users\\shengw\\MyDB\\demo;create=true because of String escaping.)

'create=true' is a Derby connection attribute that creates the database if it doesn't exist. For an in-memory database, this attribute is mandatory.
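The two URL forms above can be assembled with a small helper. This is only a sketch: the class DerbyUrls and its method names are made up for illustration and are not part of Derby. It also assumes that Derby accepts forward slashes in Windows paths, which avoids the double-backslash escaping.

```java
// Hypothetical helper, not part of Derby: builds the two kinds of connection URL.
final class DerbyUrls {
  private DerbyUrls() {}

  // In-memory database: the "memory" keyword plus the mandatory create=true.
  static String inMemory(String dbName) {
    return "jdbc:derby:memory:" + dbName + ";create=true";
  }

  // File-based database: dbDirectory is where Derby keeps its files.
  // Assumption: forward slashes also work on Windows, so backslashes are replaced.
  static String fileBased(String dbDirectory) {
    return "jdbc:derby:" + dbDirectory.replace('\\', '/') + ";create=true";
  }

  public static void main(String[] args) {
    System.out.println(inMemory("demo"));
    System.out.println(fileBased("c:\\Users\\shengw\\MyDB\\demo"));
  }
}
```

The resulting String can be passed to DriverManager.getConnection() exactly like the hand-written URLs.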

3. A complete example

This is a complete hello-world-level example of using an embedded Derby database in a Maven project. HelloJavaDb.java is listed below.

package com.shengwang.demo;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class HelloJavaDb {
  Connection conn;

  public static void main(String[] args) throws SQLException {
    HelloJavaDb app = new HelloJavaDb();

    app.connectionToDerby();
    app.normalDbUsage();
  }

  public void connectionToDerby() throws SQLException {
    // -------------------------------------------
    // URL format is
    // jdbc:derby:<local directory to save data>
    // -------------------------------------------
    String dbUrl = "jdbc:derby:c:\\Users\\shengw\\MyDB\\demo;create=true";
    conn = DriverManager.getConnection(dbUrl);
  }

  public void normalDbUsage() throws SQLException {
    Statement stmt = conn.createStatement();

    // drop table
    // stmt.executeUpdate("Drop Table users");

    // create table
    stmt.executeUpdate("Create table users (id int primary key, name varchar(30))");

    // insert 2 rows
    stmt.executeUpdate("insert into users values (1,'tom')");
    stmt.executeUpdate("insert into users values (2,'peter')");

    // query
    ResultSet rs = stmt.executeQuery("SELECT * FROM users");

    // print out query result
    while (rs.next()) { 
      System.out.printf("%d\t%s\n", rs.getInt("id"), rs.getString("name"));
    }
  }
}

The demo uses a Derby database, creates a table 'users', inserts 2 rows into the table and prints the query result set. The whole Maven project hierarchy is:

After running the HelloJavaDb example, you can verify the database, because we are not using Derby in all-in-memory mode. After the run, the database files will appear on the local file system like this.

If you connect to the database from the command line, you can see the 2 rows added by your Java code. 'ij' is a tool provided by Derby that works as a SQL client.

How to use Java parallel Fork/Join framework - Hello world example

Sheng Wang 11:54 PM Concurrency, High Level Concurrency, Java SE 3 Comments

Since Java 7, the fork/join framework has been part of the Java API. The main difference between fork/join and other multi-threading mechanisms like executors or thread pools is this: traditional multithreading focuses on "let every task have the chance to run simultaneously", while the fork/join framework focuses on "saturate the CPU, make full use of the hardware resources".

0. Why fork/join?

So what's the problem with traditional thread pools, or why did fork/join need to be introduced when there are already many ways to work in parallel? The fork/join framework is normally used with a single task that is BIG. Suppose the CPU has 4 cores. To make maximum use of the quad-core CPU, you don't want any core idle while others are still busy. But if you split the big task into 4 subtasks and give one to each core using a traditional thread pool, then whenever one thread finishes early, its core becomes idle while the remaining cores are still struggling. That wastes CPU resources. The fork/join framework prevents this from happening: every core keeps busy until the whole job is done. When a thread (CPU core) is idle or done with its workload, it tries to help other threads instead of just sitting there doing nothing. In the fork/join framework, this is called work stealing.

Work stealing is the key feature of the fork/join framework.

1. Working theory

In brief, the fork/join framework also uses a thread pool, java.util.concurrent.ForkJoinPool, but unlike a traditional thread pool, every thread in this pool has its own queue. Every thread can access every other thread's queue. The queue of each thread can be treated as a workload buffer.

1.1 What happens when a thread first gets a task

The fork/join framework starts this way. Suppose the first thread to get the task is called thread A:

step 1. When thread A gets a task, if it's small enough, it does the real calculation; if it's still big, the task is cut into 2 subtasks.

step 2. Thread A keeps working on one of the subtasks and puts the other into thread A's queue (its own queue).

step 3. An idle thread, thread B, can take subtasks out of thread A's queue, which is called work stealing. Then on thread B, the same process repeats from step 1.

After a big task is submitted to one thread initially, it soon propagates to ALL threads of the fork/join thread pool. It's worth mentioning that thread A keeps cutting recursively: cut task -> queue 1/2 -> cut 1/2 task -> queue 1/4 -> cut 1/4 -> queue 1/8 ... until the task is small enough. Recursion is also a feature of the fork/join framework.

By default the fork/join thread pool has exactly as many threads as the number of hardware threads your CPU can run simultaneously. For example, on a quad-core CPU with Hyper-Threading (2 threads on each physical core), the pool will have 4*2 = 8 threads. So after a task has been given to the fork/join pool, all threads and all CPU cores will be occupied.
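The default pool size can be checked directly. The following is a minimal sketch (the class name PoolSizeDemo is made up for illustration) that compares the number of processors reported by the JVM with the parallelism of a default ForkJoinPool, and also shows that the parallelism can be set explicitly through the constructor.

```java
import java.util.concurrent.ForkJoinPool;

class PoolSizeDemo {
  public static void main(String[] args) {
    // Number of hardware threads the JVM reports as available.
    int cpus = Runtime.getRuntime().availableProcessors();

    // The no-argument constructor takes its parallelism from availableProcessors().
    ForkJoinPool defaultPool = new ForkJoinPool();
    // The parallelism can also be chosen explicitly.
    ForkJoinPool smallPool = new ForkJoinPool(2);

    System.out.printf("CPUs=%d, default parallelism=%d, explicit parallelism=%d%n",
        cpus, defaultPool.getParallelism(), smallPool.getParallelism());

    defaultPool.shutdown();
    smallPool.shutdown();
  }
}
```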

1.2 What happens when a thread finishes a subtask

If thread X gets a task and splits it into 2 subtasks, it puts one half into its queue and starts working on the other half. When the second half is done, it checks whether the first half is done.

If the first half is done, thread X can go on to steal work.
If the first half has been stolen and is being processed by another thread, thread X has to wait until that half finishes.
If the first half is still in the queue, thread X starts to process the first half itself recursively, which means cutting the first half, queueing 1/4 and working on the other 1/4.

2. Java API

At the API level, call fork() to put a subtask into the queue, call compute() to process a piece of work, and call join() to wait for the rest to finish. These 3 methods are the key methods of the fork/join framework, and they are also where the framework's name comes from.

Always call fork() before compute() and join(), so other threads have the chance to share the workload.

In package java.util.concurrent, there are 4 classes key to the fork/join framework.

ForkJoinPool - Thread pool for the fork/join framework. Implements the ExecutorService interface.
ForkJoinTask - Abstract class with fork() and join() methods; parent class of the next 2 classes.
RecursiveTask - Abstract class that extends ForkJoinTask; its only abstract method is compute().
RecursiveAction - Abstract class that extends ForkJoinTask; its only abstract method is compute().

The only difference between RecursiveTask and RecursiveAction is that RecursiveTask's compute() returns a value, but RecursiveAction's compute() doesn't. (A task returns a result, an action doesn't.)
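To illustrate the "action" side, here is a minimal sketch of a RecursiveAction. The class ToLowerCaseAction and its THRESHOLD value are hypothetical, chosen only for this illustration: the task converts a char array to lower case in place, so there is nothing to return and compute() is void.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Hypothetical example task: lower-cases a range of a char array in place.
class ToLowerCaseAction extends RecursiveAction {
  private static final long serialVersionUID = 1L;
  private static final int THRESHOLD = 1_000; // assumed "small enough" size
  private final char[] letters;
  private final int start; // inclusive
  private final int stop;  // exclusive

  ToLowerCaseAction(char[] letters, int start, int stop) {
    this.letters = letters;
    this.start = start;
    this.stop = stop;
  }

  @Override
  protected void compute() { // void: an action has no result
    if (stop - start < THRESHOLD) {
      for (int i = start; i < stop; i++) {
        letters[i] = Character.toLowerCase(letters[i]);
      }
    } else {
      int mid = start + (stop - start) / 2;
      ToLowerCaseAction left = new ToLowerCaseAction(letters, start, mid);
      ToLowerCaseAction right = new ToLowerCaseAction(letters, mid, stop);
      left.fork();     // queue the left half so another thread can steal it
      right.compute(); // work on the right half in this thread
      left.join();     // wait for the left half; there is no value to collect
    }
  }

  public static void main(String[] args) {
    char[] letters = new char[10_000];
    Arrays.fill(letters, 'A');
    ForkJoinPool pool = new ForkJoinPool();
    pool.invoke(new ToLowerCaseAction(letters, 0, letters.length));
    pool.shutdown();
    System.out.println(letters[0] + " " + letters[letters.length - 1]);
  }
}
```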

3. Demo

We have a big char array with 100M items. Every item in this array is one upper-case letter from A-Z. The application counts how many times the letter 'A' appears in this big array. Using the fork/join framework, the array is divided into small areas for each thread to go through. Let's first see the main class.

package com.shengwang.demo;

import java.util.concurrent.ForkJoinPool;

public class ForkJoinDemo {
  private static final int ARRAY_SIZE = 100_000_000;
  private static char[] letterArray = new char[ARRAY_SIZE];

  private static int countLetterUsingForkJoin(char key) {
    int total = 0;
    ForkJoinPool pool = new ForkJoinPool(); // create thread pool for fork/join
    CountLetterTask task = new CountLetterTask(key, letterArray, 0, ARRAY_SIZE);
    total = pool.invoke(task); // submit the task to fork/join pool

    pool.shutdown();
    return total;
  }

  public static void main(String[] args) {
    char key = 'A';
    // fill the big array with A-Z randomly
    for (int i = 0; i < ARRAY_SIZE; i++) {
      letterArray[i] = (char) (Math.random() * 26 + 65); // A-Z
    }

    int count = countLetterUsingForkJoin(key);
    System.out.printf("Using ForkJoin, found %d '%c'\n", count, key);
  }
}

The main class is simple: main() first fills a big array with random upper-case letters, then calls countLetterUsingForkJoin(), in which a ForkJoinPool is created and the task is submitted to it. After the whole task finishes and the final result is available, the pool shuts down and the result is returned. The task class CountLetterTask is the kernel of this demo and is shown below.

package com.shengwang.demo;

import java.util.concurrent.RecursiveTask;

class CountLetterTask extends RecursiveTask<Integer> {

  private static final long serialVersionUID = 1L;
  private static final int ACCEPTABLE_SIZE = 10_000;
  private char[] letterArray;
  private char key;
  private int start;
  private int stop;

  public CountLetterTask(char key, char[] letterArray, int start, int stop) {
    this.key = key;
    this.letterArray = letterArray;
    this.start = start;
    this.stop = stop;
  }

  @Override
  protected Integer compute() {
    int count = 0;
    int workLoadSize = stop - start;
    if (workLoadSize < ACCEPTABLE_SIZE) {
      // String threadName = Thread.currentThread().getName();
      // System.out.printf("Calculation [%d-%d] in Thread %s\n",start,stop,threadName);
      for (int i = start; i < stop; i++) {
        if (letterArray[i] == key)
          count++;
      }
    } else {
      int mid = start + workLoadSize / 2;
      CountLetterTask left = new CountLetterTask(key, letterArray, start, mid);
      CountLetterTask right = new CountLetterTask(key, letterArray, mid, stop);

      // fork (push to queue)-> compute -> join
      left.fork();
      int rightResult = right.compute();
      int leftResult = left.join();
      count = leftResult + rightResult;
    }
    return count;
  }
}

Let's go through class CountLetterTask. It extends RecursiveTask<Integer>, which means the final result of the task is an Integer. To avoid copying the original big array, a reference to it is passed in as a constructor parameter. The current task's range is defined by the start (inclusive) and stop (exclusive) indexes in the array. The criterion for deciding whether the current task is small enough is defined as the constant ACCEPTABLE_SIZE. Here a subtask that deals with fewer than 10k array items is considered "small enough".

The most interesting part is the compute() method. It first checks whether the current task is small enough; if so, it does the real calculation. If not, the array range is divided into 2 parts, so one task becomes two subtasks, each of which is also a CountLetterTask instance. It puts the first part into the queue and then calls compute() on the second half. The task is recursively cut smaller until it's "small enough". Then join() is called to make sure the whole task is done. Remember that fork() has to run before compute() and join().

4. Run

From the screenshot, CPU resources are fully used for the big task. (Since the task takes less than 30ms to finish on my PC, the screenshot actually comes from an even bigger array processed in a loop many times.)

The most used thread methods in practice before thread.start()

Sheng Wang 8:25 PM Concurrency, Java SE, Low Level Concurrency No Comments

The java.lang.Thread class is the core of low-level multi-threading in Java. Almost every Java developer knows that starting a new thread is very simple: first, create an instance of Thread or a subclass of Thread, then call the start() method of that instance. Voilà, the new thread is in the runnable state and ready to run.

In a nontrivial project, there are quite possibly some other methods that need to be invoked after you create the thread instance and before start() gets called. Let's take a look at these candidates in a hello-world-level demo.

package com.shengwang.demo;

class MyDummyTask implements Runnable {

  @Override
  public void run() {
    String threadName = Thread.currentThread().getName();
    System.out.printf("Start working in %s\n", threadName);
    throw new RuntimeException("some thing wrong");
  }

}

public class MyThread {

  public static void main(String[] args) throws InterruptedException {
    MyDummyTask r = new MyDummyTask();
    Thread t = new Thread(r);

    //-----------------------------
    // opt 1. set thread name
    //-----------------------------
    t.setName("helloThread");

    //-----------------------------
    // opt 2. set Exception handler 
    //-----------------------------
    t.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {

      @Override
      public void uncaughtException(Thread t, Throwable e) {
        System.out.println(formatException(t, e));
      }
      
      private String formatException(Thread t, Throwable e) {
        StackTraceElement[] ste = e.getStackTrace();
        String exceptionRootLocation = ste.length > 0 ? ste[0].toString() : "Root cause not specified";
        
        StringBuilder sb = new StringBuilder();
        sb.append("Thread Exception [");
        sb.append(t.getName());
        sb.append("]: ");
        sb.append(e.toString());
        sb.append(" at <");
        sb.append(exceptionRootLocation);
        sb.append(">");
        return sb.toString();
      }

    });

    //-----------------------------
    // opt 3. set daemon 
    //-----------------------------
    t.setDaemon(true);
    
    t.start();
    t.join();
  }
}

1. Set thread's name

This is the most likely thing you will want to do before starting a new thread: call setName(). The default thread name is 'Thread-n' if the thread is created by hand, or 'pool-x-thread-y' if it is created by the default ThreadFactory of Executors. Giving it a meaningful name is very helpful.

2. Set the exception handler

In production, it's not rare to see a thread in a multi-threaded application raise an exception. This exception needs to be handled or logged so the application can be improved. The developer should set a customized handler to do this using setUncaughtExceptionHandler().

In Java, when an uncaught exception happens in a thread, the order in which it is handled is: the thread's uncaughtExceptionHandler (like the one we set in the demo above) -> the thread group's uncaughtException() method -> the default uncaught exception handler. You can choose which layer you want to cut in at.
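The handling order can be observed with two failing threads, one with its own handler and one without. This is a minimal sketch: the class HandlerOrderDemo, the thread names and the handled list are made up for illustration, and it assumes the threads run in an ordinary thread group that does not override uncaughtException().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class HandlerOrderDemo {
  // Records which layer handled which thread's exception.
  static final List<String> handled =
      Collections.synchronizedList(new ArrayList<String>());

  // Creates a thread whose task always fails with an uncaught exception.
  static Thread failingThread(String name) {
    return new Thread(new Runnable() {
      @Override
      public void run() {
        throw new IllegalStateException("something wrong");
      }
    }, name);
  }

  static void startAndWait(Thread t) {
    t.start();
    try {
      t.join(); // returns after the thread, including its handler call, has finished
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  public static void main(String[] args) {
    // Last layer: reached only when the thread has no handler of its own.
    Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      @Override
      public void uncaughtException(Thread t, Throwable e) {
        handled.add("default handler <- " + t.getName());
      }
    });

    Thread plain = failingThread("plainThread");

    Thread custom = failingThread("customThread");
    // First layer: the thread's own handler takes precedence over the default one.
    custom.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      @Override
      public void uncaughtException(Thread t, Throwable e) {
        handled.add("thread handler <- " + t.getName());
      }
    });

    startAndWait(plain);
    startAndWait(custom);
    System.out.println(handled);
  }
}
```

The exception from plainThread falls through to the default handler, while the one from customThread stops at the handler set on the thread itself.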

3. Set Daemon

This is much more optional and depends on your application. The JVM waits for every non-daemon thread to finish before the application can exit. So if abruptly stopping the sub-threads is not a concern, you can set them to daemon so the application can quit fast.

4. More

In a real project, concurrency is based more on java.util.concurrent.ExecutorService than on creating Thread instances directly. So a customized ThreadFactory is typically used to set these properties on the thread instances before they join any kind of thread pool.
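A customized ThreadFactory might look like the following minimal sketch. The class NamedThreadFactory and the "worker" prefix are hypothetical names for this illustration; the factory simply applies the three settings discussed above (name, exception handler, daemon flag) to every thread it creates.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: sets name, exception handler and daemon flag on each new thread.
class NamedThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger counter = new AtomicInteger(1);

  NamedThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable r) {
    Thread t = new Thread(r);
    t.setName(prefix + "-" + counter.getAndIncrement()); // e.g. worker-1, worker-2
    t.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler() {
      @Override
      public void uncaughtException(Thread thread, Throwable e) {
        System.out.println("Thread Exception [" + thread.getName() + "]: " + e);
      }
    });
    t.setDaemon(true);
    return t;
  }

  public static void main(String[] args) throws InterruptedException {
    ExecutorService pool = Executors.newFixedThreadPool(2, new NamedThreadFactory("worker"));
    pool.execute(new Runnable() {
      @Override
      public void run() {
        System.out.println("Start working in " + Thread.currentThread().getName());
      }
    });
    pool.shutdown();
    pool.awaitTermination(5, TimeUnit.SECONDS);
  }
}
```

Note that the uncaught exception handler only fires for tasks passed to execute(). With submit(), the exception is captured in the returned Future instead.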

Understand Java primitives wrapper class comparison

Sheng Wang 10:47 PM Java SE No Comments

Java has a wrapper class for every primitive type, such as Integer for int, Boolean for boolean, Short for short, Character for char, etc. We also all know that when comparing two Java objects, "a==b" only checks whether a and b refer to the same object, while "a.equals(b)" normally compares the real values that instances a and b represent. That is true.

So in short, always use equals() to compare primitive wrapper class instances.

In this article, let's use some demos to help you get a clearer understanding.

All demos used in this article have been tested under JDK 1.7.

Demo 1

package com.shengwang.demo;

public class DemoMain {

  public static void main(String[] args) {
    Integer i1,i2;
    
    i1 = new Integer(5);
    i2 = new Integer(5);

    System.out.println(i1==i2);  // false, i1,i2 refer to 2 objects
  }
}

This demo is very straightforward: since i1 and i2 refer to 2 different Integer instances, the "i1==i2" condition returns false.

Demo 2

package com.shengwang.demo;

public class DemoMain {

  public static void main(String[] args) {
    Integer i1,i2;
    
    i1 = 5;
    i2 = 5;

    System.out.println(i1==i2);  // true, why?
  }
}

This demo 2 is very similar to demo 1, but uses i1=5 to initialize the Integer instance instead of the keyword new. The output shows that i1==i2 is true. Why? Because in Java, an internal java.lang.Integer instance is created automatically before the constant int 5 is assigned to a java.lang.Integer variable. This is called auto-boxing. To save memory, auto-boxing reuses one cached instance per value for small values (the language guarantees this for int values from -128 to 127), which means i1 and i2 do point to the same internal Integer object. So the output of demo 2 is true. For values outside that range, auto-boxing may create a new object each time, and "==" can return false.

Let's step a little bit further.

package com.shengwang.demo;

class ClassA {
  public Integer i1 = 5;
}

class ClassB {
  public Integer i2 = 5;
}

public class DemoMain {

  public static void main(String[] args) {
    System.out.println(new ClassA().i1 == new ClassB().i2); // true again
  }
}

It still uses a primitive constant to initialize the wrapper class instance by auto-boxing. The same rule applies even when the variables are in different class instances.

Recap

Only use equals() to compare Java primitive wrapper class instances.
If you wonder why "==" sometimes also appears to give the right result: that's because the JVM tries to save memory, so auto-boxing of a small constant (from -128 to 127 for Integer) always uses the same internal object when the primitive value is the same.
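The reuse of boxed objects only covers small values: the Java language guarantees it for int values from -128 to 127. The following minimal sketch (class name IntegerCacheDemo is made up for illustration) shows why "==" is unreliable just outside that range, while equals() keeps working.

```java
class IntegerCacheDemo {
  public static void main(String[] args) {
    Integer small1 = 127, small2 = 127;
    Integer big1 = 128, big2 = 128;

    // 127 is inside the guaranteed cache range, so both variables share one object.
    System.out.println(small1 == small2);  // true
    // 128 is outside that range; with default JVM settings this is typically false,
    // because auto-boxing creates two separate objects.
    System.out.println(big1 == big2);
    // equals() compares the values, so it is reliable for any value.
    System.out.println(big1.equals(big2)); // true
  }
}
```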

Use Java regular expression for multiline string

Sheng Wang 8:48 PM Java SE, Regular Expression No Comments

There are 2 points worth pointing out when using Java regular expressions to parse a multi-line string.

The period "." does NOT match a line break unless you set an extra flag on your pattern instance.
The "^" and "$" mean different things in single-line (default) and multi-line mode.

1. period "." does not match line break

Usually the period "." is used to match anything, but by default it doesn’t match line breaks such as "\r" and "\n". The Pattern flag Pattern.DOTALL needs to be set explicitly to make it match line breaks. This is important to know when you use matches(). The String, Pattern and Matcher classes all have a matches() method, which tries to match the whole input string against the regex pattern. Since "\n" breaks patterns like ".*", the whole-string match fails and you may get an unexpected result.

    String input = "11abc22\n33abc44";  //multi-line input
    String reg = ".*abc.*";
    
    Pattern p0 = Pattern.compile(reg);
    Matcher m0 = p0.matcher(input);
    System.out.println(m0.matches()); // print false 
  
    Pattern p1 = Pattern.compile(reg,Pattern.DOTALL); // set DOTALL flag
    Matcher m1 = p1.matcher(input);
    System.out.println(m1.matches()); // print true

The regex input is a multi-line string. Since "\n" doesn't fit the pattern ".*" by default, the first m0.matches() call returns false.

To make the period "." also match line breaks, add the Pattern.DOTALL flag to the pattern.

2. Meaning of "^" and "$" in single and multi line mode

The default mode for Java regular expression is single line mode. Use Pattern.MULTILINE to turn on multi-line mode.

The meaning of "^" and "$" in single and multi line mode.

	In default mode	In multi-line mode
^	The beginning of the whole input String	The beginning of every line.
$	The end of the whole input String	The end of every line.

Let's check the following demo code RegexMultiline.java for better understanding.

package com.shengwang.demo;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMultiline {
  // ##x=6##
  // ##x=8##
  static String input = "##x=6##\n##x=8##";
  
  public static void main(String[] args) {
    
    String reg = "^.*x=(\\d+).*$";
        
    // case 1
    System.out.println("--> single line mode"); // default single-line
    Pattern p1 = Pattern.compile(reg);
    searchPatternInMultiLineText(p1);
    
    
    // case 2
    System.out.println("--> single line mode with DOTALL"); // single-line + dotall
    Pattern p2 = Pattern.compile(reg,Pattern.DOTALL);
    searchPatternInMultiLineText(p2);
    
    
    // case 3
    System.out.println("--> multi line mode");
    Pattern p3 = Pattern.compile(reg, Pattern.MULTILINE); // multi-line mode
    searchPatternInMultiLineText(p3);

  }
  
  public static void searchPatternInMultiLineText(Pattern p) {
    boolean isFound = false;
    
    Matcher m = p.matcher(input);
    while (m.find()) {
      System.out.println("x="+m.group(1));
      isFound = true;
    }
    
    if (!isFound) {
      System.out.println("No pattern found");
    }
  }
}

Run this demo and the result is:

--> single line mode
No pattern found
--> single line mode with DOTALL
x=8
--> multi line mode
x=6
x=8

Let's go through the demo code. There are 3 cases that try to match the same input text with the same pattern "^.*x=(\\d+).*$". The only differences among them are the flags on the pattern instances.

Case 1: the pattern instance is in single-line mode (the default mode, with no extra flag). The ^ and $ match the beginning and end of the whole input text. Since "\n" cannot fit the pattern ".*", and in fact nothing in the pattern matches "\n", the pattern match fails.

Case 2: single-line mode with the DOTALL flag, so ".*" can now cover "\n". The ^ and $ match the beginning and end of the whole input text. Since matching is greedy by default, the first ".*" consumes as much as possible, so the whole input yields only one match.

Case 3: multi-line mode. The ^ and $ match the beginning and end of every line. Since there are 2 lines in the input String, there are 2 matches, one for each line.

Be careful when using Java regular expressions to match multi-line input text: the same input and the same pattern can give different results in different modes.
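The flags can also be combined, because they are int bit masks. As a minimal sketch (the class RegexCombinedFlags and the helper findX are made up for illustration), the same input and pattern are run with MULTILINE alone and with DOTALL and MULTILINE together. With both flags set, the greedy ".*" swallows the line break again, so only the last value is found even though ^ and $ could match at every line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexCombinedFlags {
  // Hypothetical helper: collects group 1 of every match found with the given flags.
  static List<String> findX(String input, int flags) {
    Pattern p = Pattern.compile("^.*x=(\\d+).*$", flags);
    Matcher m = p.matcher(input);
    List<String> values = new ArrayList<String>();
    while (m.find()) {
      values.add(m.group(1));
    }
    return values;
  }

  public static void main(String[] args) {
    String input = "##x=6##\n##x=8##";
    // MULTILINE only: one match per line.
    System.out.println(findX(input, Pattern.MULTILINE));                  // [6, 8]
    // Flags are bit masks, so several can be combined with the | operator.
    System.out.println(findX(input, Pattern.DOTALL | Pattern.MULTILINE)); // [8]
  }
}
```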

Case-insensitive regular expression in Java

Sheng Wang 6:45 PM Java SE, Regular Expression No Comments

The java.util.regex.Pattern class has several very helpful flags. One of them is CASE_INSENSITIVE.

Just add this flag when creating the pattern instance. Everything else follows normal regex usage.

Here is a hello world level demo for case-insensitive regular expression in Java.

package com.shengwang.demo;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexCaseInsensitive {

  public static void main(String[] args) {
    String input = "ABC";
    String regex = "abc";
    boolean  matchResult;
    Pattern p = Pattern.compile(regex, Pattern.CASE_INSENSITIVE); // add flag here
    Matcher m = p.matcher(input);
    matchResult = m.matches();
    System.out.println("matchResult="+matchResult); // print "matchResult=true"
  }
}

In fact, case-insensitive matching can also be enabled by adding the embedded flag expression (?i). But I personally prefer the Pattern.CASE_INSENSITIVE flag for code readability.

For the most basic usage of the Java regular expression API, check out the article “The basic regular expression API in Java”.
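For comparison, here is a minimal sketch of the embedded flag form (class name RegexEmbeddedFlag is made up for illustration). One practical difference is that the embedded form also works in places that only accept a regex String, such as String.matches().

```java
import java.util.regex.Pattern;

class RegexEmbeddedFlag {
  public static void main(String[] args) {
    // (?i) at the start of the regex turns on case-insensitive matching.
    System.out.println(Pattern.compile("(?i)abc").matcher("ABC").matches()); // true
    // Without any flag the match is case-sensitive.
    System.out.println(Pattern.compile("abc").matcher("ABC").matches());     // false
    // The embedded form also works where only a regex String is accepted.
    System.out.println("ABC".matches("(?i)abc"));                            // true
  }
}
```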

The basic regular expression API in Java

Sheng Wang 1:07 AM Java SE, Regular Expression No Comments

Basic regular expression usage in Java involves 2 classes, java.util.regex.Pattern and java.util.regex.Matcher. The most basic usage has 3 steps.

Create a pattern instance.
Create a matcher instance from the pattern instance. The String under test is set in this step.
Check the match result using methods of the matcher.

The following demo code is a hello-world-level example of how to use the Java regular expression API.

package com.shengwang.demo;

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HelloRegex {

  public static void main(String[] args) {
    String input = "Hello World 123#456";
    // -------------------------------------------
    // Step 1, define Pattern instance
    // -------------------------------------------
    Pattern p = Pattern.compile("(\\d+)"); // use static method

    // -------------------------------------------
    // Step 2, define matcher instance
    // -------------------------------------------
    Matcher m = p.matcher(input); // create matcher from pattern

    // -------------------------------------------
    // Step3, use loop to go through every match
    // -------------------------------------------
    while (m.find()) {
      System.out.println("" + m.group(1));
    }
  }
}

In this demo, we use the find() method to do the real matching. There are 2 other methods, matches() and lookingAt(). All 3 methods return a boolean that indicates whether the match succeeded or failed. The differences among them are:

matches() tries to match the whole input.
find() tries to match a substring of the input.
lookingAt() tries to match a substring that must start at the beginning of the input. (Think of it as a startsWith operation.)

Using the Pattern + Matcher classes normally means we want to do some manipulation of the matched result. If you only want a boolean result to verify an input String, there is no need for Pattern + Matcher; use the matches() method of the String class instead.
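The three methods can be compared side by side on the same pattern. This is a minimal sketch (class name RegexThreeMethods and the sample inputs are made up for illustration); a fresh matcher is created for each call so the calls don't influence each other.

```java
import java.util.regex.Pattern;

class RegexThreeMethods {
  public static void main(String[] args) {
    Pattern digits = Pattern.compile("\\d+");

    // Input starts with digits but has trailing letters.
    System.out.println(digits.matcher("123abc").matches());   // false: "abc" is left over
    System.out.println(digits.matcher("123abc").lookingAt()); // true: digits at the beginning
    System.out.println(digits.matcher("123abc").find());      // true: digits somewhere

    // Input has digits only at the end.
    System.out.println(digits.matcher("abc123").lookingAt()); // false: input starts with "abc"
    System.out.println(digits.matcher("abc123").find());      // true: "123" is a substring

    // Input consists of digits only.
    System.out.println(digits.matcher("123").matches());      // true: the whole input matches
  }
}
```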

  String input = "888ABC999";
  boolean  matchResult = input.matches("\\d+ABC\\d+"); // match whole string
