As Hibernate official document says:
Hibernate have two orthogonal notions here: when is the association fetched and how is it fetched.
There are many words on different document to describe ORM fetch like fetch strategy, fetch type, fetch mode, but in essence just the concept of WHEN and HOW.
In this article, a One-To-Many example will be used to demonstrate the concept. A Client entity and a PurchaseOrder Entity are used in the demo code. One client can have many purchase orders. There are 5 clients in database and every client has 10 orders.
0. Environment and version
- Java 7 (JPA2.0)
- Hibernate 4.x
The following example and discuss are based on this environment.
1. Concept in short
1.1 WHEN to fetch
- Eager : Assemble SQL and access database immediately when an entity is created. JPA compatible
- Lazy - A wrapper class called 'proxy' is used for the entity, when an entity is created. The database access only happens when entity's value gets used. JPA compatible. Default behavior.
- Extra Lazy - Try not to fetch the whole collection into memory unless absolutely needed. Not JPA compatible
When means at what time will hibernate assemble a SQL clause to access the database.
1.2 How to fetch
If a entity have associations, how the associations of the entity get loaded. In our Client-To-PurchaseOrder(One-To-Many) demo, when a client entity is loaded, how to get his purchase orders.
- Join - Use SQL outer join to get associations. Be careful about the return list, it may not like what you expect, see the demo below. JPA compatible. Default behavior for Eager.
- Select - Use a separate select to get only 1 entity's association, will cause the 'N+1' problem. JPA compatible. Default behavior FOR Lazy.
- Batch - Use a separate select to get a batch size entities' association. Can be treated as a kind of improvement of 'Select' Not JPA compatible
- Subselect - Use a separate select to get all associations for all entities retrieved in a previous query. Not JPA compatible
JPA compatible means in java code, only classes or annotations from javax.persistence.* are used, no need to import any thing from org.hibernate.*. Hibernate can just be used as persistence provider in JPA configure file persistence.xml, which makes the the application not close coupled with Hibernate.
2. Code for demo
2.1 The Entity definitions
The entity PurchaseOrder is plain. Nothing's tricky here.
package com.shengwang.demo.model;
import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.JoinColumn;
import javax.persistence.ManyToOne;
import javax.persistence.Table;
@Entity
@Table(name="purchase_order")
public class PurchaseOrder {
@Id
@GeneratedValue(strategy=GenerationType.IDENTITY)
@Column(name="order_id")
private int orderId;
@Column(name="order_desc")
private String orderDesc;
@ManyToOne(cascade=CascadeType.PERSIST)
@JoinColumn(name="client_id")
private Client client;
public int getOrderId() {
return orderId;
}
public String getOrderDesc() {
return orderDesc;
}
public Client getClient() {
return client;
}
public void setOrderId(int orderId) {
this.orderId = orderId;
}
public void setOrderDesc(String orderDesc) {
this.orderDesc = orderDesc;
}
public void setClient(Client client) {
this.client = client;
}
}
All magic happens in the Entity Client. which has a Set variable orders associated with PurcharseOrder to represent the one-to-many mapping. We will change the annoatation for Set variable orders to get different fetch when-how combinations.
package com.shengwang.demo.model;
import java.util.HashSet;
import java.util.Set;
import javax.persistence.CascadeType;
import javax.persistence.Column;
import javax.persistence.Entity;
import javax.persistence.FetchType;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.OneToMany;
import javax.persistence.Table;
@Entity
@Table(name="client")
public class Client {
@Id
@GeneratedValue(strategy=GenerationType.IDENTITY)
@Column(name="client_id")
private int clientId;
@Column(name="client_name")
private String clientName;
//---------------------------------
// This is where the magic happens
//---------------------------------
@OneToMany(fetch=FetchType.LAZY,mappedBy="client")
private Set<PurchaseOrder> orders = new HashSet<PurchaseOrder>();
public int getClientId() {
return clientId;
}
public String getClientName() {
return clientName;
}
public Set<PurchaseOrder> getPurchaseOrders() {
return orders;
}
public void setClientId(int clientId) {
this.clientId = clientId;
}
public void setClientName(String clientName) {
this.clientName = clientName;
}
public void setPurchaseOrders(Set<PurchaseOrder> orders) {
this.orders = orders;
}
}
2.2 demo code snippet
Following code snippet is used to show how the different fetch combinations effect. We'll keep running this code on different when-how fetch combinations to demonstrate differents effects
@Override
@Transactional
public void printOrdersForEveryClient() {
List <Client> clients;
Criteria criteria;
criteria = sessionFactory.getCurrentSession().createCriteria(Client.class);
clients = criteria.list(); // checkpoint-1
System.out.println("clients.size="+clients.size());
for (Client client: clients) {
System.out.println("client : "+ client.getClientId());
System.out.println(""+client.getPurchaseOrders().size()); // checkpoint-2
Set <PurchaseOrder> orders = client.getPurchaseOrders(); // checkpoint-3
for (PurchaseOrder order : orders) {
System.out.println(order.getOrderId()+","+order.getOrderDesc());
}
}
}
The method printOrdersForEveryClient() is supposed to print out purchase orders for all clients one by one. The comments checkpoint1-4 is added as anchors to locate where the database accessing happens.
3. Behaviors of When+How fetch combinations
Let's try different When-How fetch combinations on the one-to-many(client-to-purchase order) mapping, see what really happens. All the following SQL clauses are NOT the original Hibernate generated ones, but simplified to make them more understandable.
3.1 Eager + Join
Changing the definition of Client entity, set when to eager by "fetch=FetchType.EAGER", set how to join by @Fetch(FetchMode.JOIN). Since join is the default behavior for eager, annotation @Fetch(FetchMode.JOIN) can be omitted.
@Entity
@Table(name="client")
public class Client {
// omit other fields
//---------------------------------
// This is where the magic happens
//---------------------------------
@OneToMany(fetch=FetchType.EAGER,mappedBy="client",cascade=CascadeType.PERSIST)
private Set<PurchaseOrder> orders = new HashSet<PurchaseOrder>();
// omit getters & setters
}
The database accessing happens only in 1 place:
select a.*, b.* from client a left outer join purchase_order b on a.client_id=b.client_id at checkpoint-1
Suppose there are 5 rows in client table and every client have 10 purchase orders in purchase_order table. The result list size at checkpoint-1 is not 5, but 5x10 = 50. This will make the method printOrdersForEveryClient() doesn't work as design.
3.2 Lazy + Select (default behavior)
The default when is lazy, and the default how for lazy is select.
The database accessing happens in 2 places:
select * from client at checkpoint-1
select * from purchase_order where client_id=? at checkpoint-2
Since the checkpoint-2 is in the for loop, every loop will issue a sql query to database, perfectly demonstrate the so-called 'N+1' problems.
You may also notice that although lazy fetch is used, the database accessing immediately happens when querying the clients. That's because the lazy fetch only works for query entity by primary key.
3.3 Eager+ Select
Changing the definition of Client entity, set when to eager by "fetch=FetchType.EAGER", set how to join by @Fetch(FetchMode.SELECT)
@Entity
@Table(name="client")
public class Client {
// omit other fields
//---------------------------------
// This is where the magic happens
//---------------------------------
@OneToMany(fetch=FetchType.EAGER,mappedBy="client",cascade=CascadeType.PERSIST)
@Fetch(FetchMode.SELECT)
private Set<PurchaseOrder> orders = new HashSet<PurchaseOrder>();
// omit getters & setters
}
The database accessing happens only in 1 place, but N+1 SQL clauses:
select * from client at checkpoint-1
select * from purchase_order where client_id=? at checkpoint-1
select * from purchase_order where client_id=? at checkpoint-1
select * from purchase_order where client_id=? at checkpoint-1
select * from purchase_order where client_id=? at checkpoint-1
select * from purchase_order where client_id=? at checkpoint-1
The query in java also creates N+1 SQL clauses, but all at same time. One to get all clients and 5 to get orders for each client. (There are 5 clients in the client table)
3.4 Lazy + Batch
Changing the definition of Client entity, set when to lazy by "fetch=FetchType.LAZY", set how to batch by @Fetch(FetchMode.SELECT) and @BatchSize(size=3) . Since select is the default behavior for lazy, annotation @Fetch(FetchMode.SELECT) can be omitted.
@Entity
@Table(name="client")
public class Client {
// omit other fields
//---------------------------------
// This is where the magic happens
//---------------------------------
@OneToMany(fetch=FetchType.LAZY,mappedBy="client",cascade=CascadeType.PERSIST)
@BatchSize(size=3)
private Set<PurchaseOrder> orders = new HashSet<PurchaseOrder>();
// omit getters & setters
}
The database accessing happens in 2 places:
select * from client at checkpoint-1
select * from purchase_order where client_id in (?, ?, ?) at checkpoint-2
Unlike the select, which accesses database N+1 times, batch fetch accesses database (N/batchSize +1 ) times. In our demo, there are 5 clients in the table, batch size is 3, total database accessing counts = celling(5/3)+1=3.
3.5 Eager + Batch
Changing the definition of Client entity, set when to eager by "fetch=FetchType.EAGER", set how to batch by @Fetch(FetchMode.SELECT) and @BatchSize(size=3)
@Entity
@Table(name="client")
public class Client {
// omit other fields
//---------------------------------
// This is where the magic happens
//---------------------------------
@OneToMany(fetch=FetchType.LAZY,mappedBy="client",cascade=CascadeType.PERSIST)
@Fetch(FetchMode.SELECT)
@BatchSize(size=3)
private Set<PurchaseOrder> orders = new HashSet<PurchaseOrder>();
// omit getters & setters
}
The database accessing happens in 1 place:
select * from client at checkpoint-1
select * from purchase_order where client_id in (?, ?, ?) at checkpoint-1
Compare to Lazy+Batch, Eager+Batch create same amount SQL and exactly same SQL clauses. but will no wait for the first time accessing contents of client entity, but immediately load all clients' associations when query client entity.
3.6 Lazy + Subselect
Changing the definition of Client entity, set when to lazy by "fetch=FetchType.LAZY", set how to subselect by @Fetch(FetchMode.SUBSELECT)
@Entity
@Table(name="client")
public class Client {
// omit other fields
//---------------------------------
// This is where the magic happens
//---------------------------------
@OneToMany(fetch=FetchType.LAZY,mappedBy="client",cascade=CascadeType.PERSIST)
@Fetch(FetchMode.SUBSELECT)
private Set<PurchaseOrder> orders = new HashSet<PurchaseOrder>();
// omit getters & setters
}
The database accessing happens in 2 places:
select * from client at checkpoint-1
select a.* from purchase_order a where a.client_id in (select b.client_id from client b) at checkpoint-2
Unlike the select(N+1) or batch (N/batchsize+1), subselect only access database once to get all associations for the previous query. The first time in the loop cause the second SQL and rest of the loop won't access database anymore. Totally the code accesses database twice.
3.7 Eager+ Subselect
Changing the definition of Client entity, set when to eager by "fetch=FetchType.EAGER", set how to subselect by @Fetch(FetchMode.SUBSELECT)
@Entity
@Table(name="client")
public class Client {
// omit other fields
//---------------------------------
// This is where the magic happens
//---------------------------------
@OneToMany(fetch=FetchType.EAGER,mappedBy="client",cascade=CascadeType.PERSIST)
@Fetch(FetchMode.SUBSELECT)
private Set<PurchaseOrder> orders = new HashSet<PurchaseOrder>();
// omit getters & setters
}
The database accessing happens in 1 place:
select * from client at checkpoint-1
select a.* from purchase_order a where a.client_id in (select b.client_id from client b) at checkpoint-1
Compare to Lazy+Subselect, Eager+Subselect create same amount SQL and exactly same SQL clauses. but will no wait for the first time accessing contents of client entity, but immediately load all clients' associations when query client entity.
3.8 Extra Lazy+ Batch
Changing the definition of Client entity, set when to extra lazy by "fetch=FetchType.LAZY" and @LazyCollection(LazyCollectionOption.EXTRA), set how to batch by @Fetch(FetchMode.SUBSELECT) and @BatchSize(size=3)
@Entity
@Table(name="client")
public class Client {
// omit other fields
//---------------------------------
// This is where the magic happens
//---------------------------------
@OneToMany(fetch=FetchType.LAZY,mappedBy="client",cascade=CascadeType.PERSIST)
@LazyCollection(LazyCollectionOption.EXTRA)
@Fetch(FetchMode.SELECT)
@BatchSize(size=3)
private Set<PurchaseOrder> orders = new HashSet<PurchaseOrder>();
// omit getters & setters
}
The database accessing happens in 3 places:
select * from client at checkpoint-1
select count(order_id) from purchase_order where client_id= ? at checkpoint-2
select * from purchase_order where client_id in (?, ?, ?) at checkpoint-3
Compare to Lazy+Batch, Extra Lazy+Batch create one more SQL clause to avoid getting all collections contents back when possible. Extra Lazy is not supported by other persistence providers like EclipseLink. In EclipseLink developer has to manually define a query to get the collection's size without get all contents back.
4. More to notice
The batch size doesn't mean how many rows will be fetched from the associations, but means how many associations to be fetched. For example using batch fetch, if there are 1000 purchase orders for each client, batch size = 3, every batch SQL will fetch 3 clients' associations, totally 3000 rows, from purchase_order table. If only part of the collections are needed, pagination is the solution.