Exchange 2010/2013: Database Leveling Script

It is common to randomly choose mailbox databases when creating or migrating user mailboxes in Exchange. I actually recommend this practice unless you are setting up a tiered user/storage environment. Unfortunately this may result in an unequal distribution of data which, in turn, can result in an environment where mailbox databases are wildly different in size. In this post I will discuss an approach to leveling the databases so they are equal in size by moving mailboxes between them.

Imagine if you will, a scenario where you are given several buckets of indefinite depth filled with random objects of varying weight. How do you go about making each bucket as close to one another in weight by moving the least amount of objects from one bucket to another? This sounds like something directly out of a computer logic 101 course or possibly even a bad interview question but it is actually has some real world value. One of the practical applications of this would be leveling out an Exchange DAG or set of databases so that they are equal in size (perhaps after adding a new database).

I’ve tucked this question away mentally for a few years now with the intention of eventually searching online for the algorithm which must already exist for this scenario. This chilly holiday season, while the children and wife slept soundly with dreams of Christmas treats, I stayed awake and finally started looking for some algorithmic solution to this issue. I dusted off my math neurons and started white boarding out what needs to be done. To my surprise the solution was more complicated than I initially imagined.

Note: If you don’t care at all about how this works then you can blissfully ignore everything below. Here is a shortcut to the code instead.

Leveling Two Bins

To start, it is good to know the basic steps you might follow to automatically redistribute objects with the fewest moves possible between two bins with the end goal of making the bin overall sizes equal. This will allow us to come up with some general rules and, in the process, add some more requirements we didn’t realize we had.

For each of these examples I’ve included the total, average, and the absolute difference of the bins at each move.

Example 1

Here is an easy example where the bins are filled unequally and which do not redistribute evenly.

For the first move we postulate our first rule. Move the largest object possible from the larger bin to the smaller bin. So the first move is logical, move the object weighing 5 from bin 2 to bin 1.

This reduces the difference between the bins down from 12 to 2 so it is a great start. Following our prior logic, let’s go ahead and move another object from Bin 2 to Bin 1:

Uh oh, after this the difference between the two bins has actually gone up so we have gone too far. The second rule becomes obvious; move the largest object from the larger bin to the smaller bin as long as it does not result in the bin size difference increasing. In this case the only object we could move was 5, which is greater than the difference between the two bins so we wouldn’t even try to move it.

So in this example we stop at move 1 as we can no longer move any more elements from the larger bin to the smaller bin without increasing the difference between the two bins.

Example 2

Here we have a few more objects per bin. The first steps we’ve already seen in example 1.

At the third move, were we to follow our prior rule, we would end up in an endless loop as we would be moving the same element back to bin 2 that we moved to bin 1 in the prior steps. So, instead, we move over the next element from bin 1 to 2 which has not been previously moved. This adds another rule; once an object has been moved between bins it is invalidated for future moves.

If we follow this through to the end the bins come out even. Here is one more rule which probably doesn’t need to be stated but…. If the difference between two bins becomes zero or there are no valid items which can be moved between bins we are done processing.

Example 3

Here is a simplified version of example 3. Following our previous rules the very first move leaves us in a position where there is no change in the absolute difference between bins.

If you look at the bin totals after the move has been performed in Step 1 you will see that Bin 1 becomes larger than Bin 2. In these cases wouldn’t it make sense to attempt to swap an element (if it exists) from Bin 1 which is less than this future difference created by Step 1? Well it is actually very important we do so in these scenarios. This is largely for when we apply this algorithm to more than two bins.

This gives us another rule which sounds more complex than it actually is: When moving an object from one bin to another causes the smaller bin to become greater than the larger bin then attempt to swap back another object which is less than or equal to the future difference between the bins.

If you have been reading up to this point and following along you can go back to example 2 and reprocess with our additional rules to come up with the same results in a few less cycles (but same number of object moves).

Example 4

In this example let’s start out with wildly unbalanced bins and perform the first move.

After the first move we have followed all of the rules, are done processing, and have the fewest number of moves possible to get the bins as equal as we possibly can get them to be. But in order to get things equal we had to move one object which is 25 times larger than all the other objects combined.

Logically it makes sense to move the one largest object as it gets the results we are looking to achieve. But realistically I’d rather expend less effort moving more small items between the two bins than worry so much about moving the fewest number of elements overall. As long as the results are the same in the end.

This gives us one more objective we weren’t originally looking for when we started. Our updated objective is to move as few objects as possible between the bins without surpassing a predefined percentage of total bin effort threshold.

So in our prior example, if the predefined effort threshold is the average bin size (where any object in any bin is ignored if it is larger than or equal to the average bin size), then only the two smaller items get moved. This leaves the larger item in bin 2 but causes one more move to occur, which we are fine with.

Flowchart – Leveling 2 Bins

If we use the logic we have deduced to this point for leveling two bins we end up with a fairly simple flow chart. It is worth adding that the choice to not move any object which is equal to or greater than the average bin size is variable. Playing around with this number may be required to get the best results.

For the most part this will allow us to balance out any set of bins with the same results in a predictable manner. Unfortunately, adding in more bins complicates the entire process so we are not done yet.

Leveling More Than Two Bins

If you start adding more and more bins you still need some of the same bits of information to make decisions from when determining which objects you will need to move. The most important information will be the size differences between them. You would think that this would create very large tables of differences as you get more and more bins. But actually the table size is relatively small. You only need unique differences between each pair of bins. For those who like their combinational algebra the equation for the number of elements in our table goes something like this:

n = Number of bins

r = Number of bins to find unique combinations between (in our case 2)

$\binom{n}{r} = \frac{n!}{r!(n-r)!}$

So for three bins we end up with a table of differences totaling a whopping three elements. If you were to go up to 5 bins, that increases to 10. If you had 10 bins then it only increases to 45 differences. Anyway, you get the picture. If we were attempting to level thousands of bins then it would be beneficial to look into breaking down the bins into reasonable subsets based on their average size (differentiating between bins which are below and above average and possibly combining them in subsets together based on your need). Then using multithreaded code to do the processing.

In my case I’m only looking to get a reasonable approximation of the best way to level no more than several bins so I’m not going to overcomplicate the solution. As it is easy to manually deduce the best way to level 3 bins that is what I’ll be using for my examples.

Example 1

I’ve included the absolute difference in size between each unique set of bins. The very first conundrum we run into is which set of bins to process first as there are two sets which have the same absolute difference. The answer is that it doesn’t matter at all so I simply start with bins 1 and 2.

If I follow our prior flowchart then we will move the 4 object from bin 2 to bin 1 first. This will cause bin 1 to become greater than bin 2 so we evoke our swap rule. The swap takes place between the bins as there is an object in bin 1 which is less than the future difference between the bins after the initial move from bin 2.

After the first step has been performed we recalculate all the differences where there are bins which were involved in step 1. In an example of three bins all combinations of bins will be affected by any single object swap (It is important to note that this may not always be the case, as you add more bins there are more combinations which are not affected by a single move between bins).

The very next step occurs between bin 3 and 1 as it has the largest difference of the three combinations. We do the same thing we did in step one where we move an object from bin 3 to bin 1, evoke our swap rule due to bin 1 becoming larger than bin 3, and swap back an object. When we are done with step 2 we have fully leveled our bins in only 4 total object moves.

Script

Using these examples and some powershell trickery I’ve come up with a method of leveling databases in exchange by moving mailboxes between them. There are some interesting techniques employed which may benefit a curious crowd which are not Exchange administrators as well. You can download the most recent version of this script from the Microsoft technet gallery. As always I welcome feedback.

1. 1:28 AM, 09/18/2018

Hello! Thanks for your script. My problem ist, that the ExchangeMailboxData.csv ist filled but there is just an empty Mailbox-Moves.txt. When running “Original database size information:” is filled for any database by 0

Exchange 2010, 2 DAG, about 8000 mailboxes

Where could be the problem? 🙁

2. 7:49 AM, 02/22/2018

Hi Zachary,
Great idea with that script.
I’m currently running it against 28 db’s with 1700 mailboxes. The hardware is pretty competent but the script has been running for almost 5 hours now. Is that to be expected?

• 8:37 AM, 02/23/2018

It can take days depending on the mailbox sizes. I’d run it on a separate workstation if you aren’t already. I’ve never gone back to optimize this script either, I’m sure there are faster bin packing algorithms out there for this kind of thing (hell, I know there is). Sorry it is rather sluggish anyway. If it does complete please shoot me a note to let me know how it turns out for ya.

Cheers,
Zach

• 10:05 AM, 02/23/2018

It finished this morning.
Now I have excluded a large database with inactive mailboxes and the script finishes in about 30 minutes which is OK for me.
I’m experimenting with filtering out system- health- and discovery mailboxes (doesn’t work with the $IGNORED_MAILBOXES = @(‘HealthMailbox’,’Discovery Search Mailbox’,’SystemMailbox’) filtering) Also, I have problems with users with identities with special characters like æ, ø and å in them. Can I change the script to use alias in stead of display name? • 10:32 AM, 02/23/2018 Hold your horses! I think I have solved it by changing this line (added “-encoding Unicode”:$Mailboxes | Export-Csv -NoTypeInformation -encoding Unicode $DataFile The Display names in ExchangeMailboxData.csv look OK now, I’m waiting for the distribution phase to finish before I can see if the commands in Mailbox-Moves.txt are OK too. • 12:56 PM, 02/23/2018 The unicode did it: the new-moverequest-commands are fine now. Also: the filtering of system- and health mailboxes works too, I didn’t know it happened AFTER ExchangeMailboxData.csv was created. • 7:49 PM, 02/23/2018 Glad it worked for you with a bit of modification. Thanks for sharing your tips and tricks as well. 3. 3:47 PM, 11/04/2015 So just to make sure, this script wont attempt to move any mailboxes correct? I am confuse my the New-MailboxRequest Command in the script and items that are not commented out. Please help • 1:18 PM, 11/05/2015 No, it just spits out a file with move request commands you can run should you decide to do so. 4. 12:57 AM, 08/21/2015 doesn’t work for me. Getting same error as Jonas 5. 9:08 AM, 05/27/2015 Thanks for the script! Works like a charm! I had to add the following lines though to make it work: ‘-Encoding unicode’ to the Export-CSV command since my usernames contained Swedish letters and the following line at the beginning of the Mailbox-Moves to make it complete PS script: Add-PSSnapin Microsoft.Exchange.Management.PowerShell.E2010 Thanks a million! You rock! • 3:09 PM, 05/27/2015 And this is why I continue to release my scripts 🙂 Thanks for reporting on the changes needed to work in your environment, I really appreciate it! 6. 4:17 AM, 01/28/2015 Hi, nice script But When i run i get thiss erro, ples help! We have 64 DB’s in a DAG 16000 users. Exception calling “Add” with “2” argument(s): “Key cannot be null. Parameter name: key” At C:\temp\GU-New-ExchangeRebalancingReport.ps1:346 char:5 +$Databases.Add($DB,$DBSize)
+ ~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ CategoryInfo : NotSpecified: (:) [], MethodInvocationException
+ FullyQualifiedErrorId : ArgumentNullException

WARNING: ConvertTo-HashArray: Something weird happened! – Index operation failed; the array index evaluated to null.
Average database size: NaN
Original database size information:

Future database size information:

Total Moves Required = 0
Mailboxes which should be moved:

• 12:54 PM, 05/20/2015

7. 9:54 AM, 10/22/2014

This is by far one of the most useful scripts that I have run across. My colleague and I were discussing this issue and were about to embark on this journey until I ran across your site. Thanks again.

• 3:44 PM, 10/22/2014

Glad you find it of use, of all the scripts I have written I’m oddly most proud of this one (even though it really needs some polish). Let me know if you are able to improve upon it or are able to use it in your own environment effectively.

8. 3:06 PM, 10/19/2014

Does this script work for databases that are not in DAG? I have 4 databases on the same server and I want to balance mailboxes between those databases. I tried running this script but I get errors.

• 3:45 PM, 10/22/2014

Hello, It should work with some minor modifications. I’ll try to add that in the next version of this script.

9. 6:52 PM, 01/28/2014

Thanks Zachary!, just found your site from another script you posted (firewall requirements script). I am currently in the process of rebalancing my databases, but working on a much grander scale, 100,000 mailboxes across multiple client databases. For my purposes, I will want to use this logic across only a specific subset of my mailboxes (specific department OU, or only specific dbs). ‘That’ looks like an easy modification.. 🙂

• 8:50 AM, 01/29/2014

Hey Eric, I’d be extremely interested to know how well the script works out for you. Please shoot me an email so we can discuss further and I’ll consider us even 🙂 [email protected]

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Follow