Updated: Nov 19, 2020
A first question you probably have when you read about creating millions of records is "Who really needs to create millions of records?" Sometimes it's not "millions"; it's anything between a few thousand to hundreds of thousands of records. But the need is the same: a flexible tool that can insert (and delete of course) many SObject records and will allow:
Companies of any size create sandboxes for User Acceptance Testing (UAT).
AppExchange ISV/Consulting partners create orgs with sample data for demos or for a realistic simulation of their app.
Testers or business users generate their testing data in a sandbox.
Architects create Large Data Volumes (LDV) for stress testing of their designs.
So, what is needed to start working with it?
An anonymous window to execute Apex scripts.
A Lightning component to monitor the progress.
Forceea inherently supports Templates, so the Template creation process is simpler. A Template will not create data; it's a "description" of the structure of the data we want to create. When we construct a Template we define:
The SObjects that will be created.
The number of records of each SObject.
What fields will be populated.
The structure of field values.
A Template is a Map<String, FObject>, so our Template will start with the initialization of this Map:
Map<String, FObject> template = new Map<String, FObject>();
Before starting our Template we should have a good understanding of the SObjects and fields we need, what are the relationships between the SObjects and what data we want for each field.
Here are our (hypothetical) requirements:
Record type: the record type with name MajorAccount.
Name: Account-1, Account-2, etc.
Industry: any picklist value except Banking and Services.
AnnualRevenue: a random integer number between 1M and 10M.
Rating: any picklist value.
Type: any random value between Prospect, Customer and Analyst.
Shipping address: any (real) address from U.S.
Record type: the record type with name BigOpp.
Name: <Account> - <text>, where <Account> is the name of the related account and <text> is a text of random words between 20 and 40 chars.
Amount: a random number between 10K and 1M, rounded to nearest 100.
StageName: any picklist value except Closed Won and Closed Lost.
Type: New Business.
CloseDate: any date between 1 Jan. 2020 and 30 June 2020.
AccountId: the 1st account to the 1st opportunity, the 2nd account to the 2nd opportunity and so on. If we have no more accounts, start from the 1st account, then to the 2nd, etc.
For every 1 account we're going to create 10 opportunities.
First, we "add" the Account definitions in our template:
template.put('Accounts', new FObject(Account.SObjectType) .setNumberOfRecords(10) .setDefinition(Account.Name, 'static value(Account-)') .setDefinition(Account.Name, 'serial type(number) from(1) step(1) scale(0)') .setDefinition(Account.Industry, 'random type(picklist) except(Banking,Services)') .setDefinition(Account.AnnualRevenue, 'random type(number) from(1000000) to(10000000) scale(0)') .setDefinition(Account.Rating, 'random type(picklist)') .setDefinition(Account.Type, 'random type(list) value(Prospect,Customer,Analyst)') .setDefinition(Account.ShippingStreet, 'random type(street) group(shipping)') .setDefinition(Account.ShippingPostalCode, 'random type(postalCode) group(shipping)') .setDefinition(Account.ShippingCity, 'random type(city) group(shipping)') .setDefinition(Account.ShippingState, 'random type(state) group(shipping)') .setDefinition(Account.ShippingCountry, 'random type(country) group(shipping)') );
The order of the field definitions is important! Forceea generates the values for the first field definition, then for the second, etc.
The Name field has 2 definitions. The first generates the same (static) value "Account-" and the second serial numbers (1,2,3,..)
We "grouped" all address definitions in order to "link" the correct street to the correct city, postal cod, etc.
If we had a Billing address, we could copy the value from the Shipping, e.g.
setDefinition(Account.BillingCity, 'copy field(ShippingCity)')
Now we are going to set the Opportunity definitions:
template.put('Opportunitites', new FObject(Opportunity.SObjectType) .setNumberOfRecords(100) .setDefinition(Opportunity.AccountId, 'serial lookup(Account) mode(cyclical) source(forceea)') .setDefinition(Opportunity.Name, 'copy field(AccountId) from(Account.Name)') .setDefinition(Opportunity.Name, 'static value(" - ")') .setDefinition(Opportunity.Name, 'random type(text) minLength(20) maxLength(40)') .setDefinition(Opportunity.Amount, 'random type(number) from(10000) to(1000000) scale(2)') .setDefinition(Opportunity.StageName, 'random type(picklist) except(Closed Won,Closed Lost)') .setDefinition(Opportunity.Type, 'static value(New Business)') .setDefinition(Opportunity.CloseDate, 'random type(date) from(2020-01-01) to(2020-6-30)') );
The FObjectAsync class
Now we can proceed with the actual insertion of records. Our main tool is the FObjectAsync class.
When we insert or delete records asynchronously, Forceea uses Queueable Apex to execute one or more jobs. These jobs have some higher governor limits (e.g. 60,000ms total CPU time and 200 SOQL queries), which is definitely positive for our data generation needs.
If you think "I'm going to create x accounts and y opportunities", forget this way. Forceea works with iterations! An iteration is the number of records (for each SObject) defined in the Template we use. Our template creates 10 accounts and 100 opportunities, so 1 iteration will create 10 accounts and 100 opportunities.
Another important detail is Partitioning, which has two parts:
Template: you define the Partition field for each SObject with the method setPartitionFieldName.
FObjectAsync: you define the Partition field value for all SObjects with the method setPartitionFieldValue.
The Partition field value should be a string which will identify (or "partition") the inserted records. As a best practice, use a value with a few characters, even a single letter (uppercase or lowercase).
When inserting records, Forceea checks:
If there is a Partition field defined in each SObject.
If there is a Partition field value.
If both conditions are valid, Forceea will insert the value in the partition field of each record.
So, let’s say that the Partition field for Account is ForceeaPartition__c and the Partition field value is df. In this case, Forceea will insert the value: • df1 into the records inserted in Job 1. • df2 into the records inserted in Job 2. • df3 into the records inserted in Job 3. etc.
Insert records asynchronously
Now we are going to insert 1,000 iterations, so we’ll insert 1,000 x 10 = 10K accounts and 1,000 x 100 = 100K opportunities.
Open an Anonymous Apex window and enter the following lines:
new FObjectAsync(template) .setNumberOfIterations(1000) .setNumberOfJobs(20) .setPartitionFieldValue('df') .insertRecords();
The default number of (parallel asynchronous) jobs is 30. Here we require 20 jobs.
The partition value is "df".
Execute the code and then go to the Data Factory tab of the Forceea Lightning app.
In the Log panel Forceea displays information about the execution of each job.
The Messages panel contains an overview of the async process.
The Progress panel will let you know how many iterations have been inserted.
Finally, the Job Status panel displays a visual indication of the status for each job (black: pending, green: successful, red: failure, orange: terminated).
Forceea will follow this procedure during the async insertion process:
Benchmarks the operation by inserting 1 iteration in the first batch. The transaction is rolled back, so it doesn’t permanently insert any records.
Executes the second batch of any job, which creates and insert records of each SObject defined in the Template, with as many iterations as possible (remember the benchmarking).
If there are no errors and there are more iterations to be inserted, a third batch is created, and so on.
When all iterations assigned to a job have been inserted, the job ends with a successful completion.
When we have a serial definition, Forceea will insert the records without any gaps in the serialization!
Delete records asynchronously
The deletion process follows almost the same logic:
new FObjectAsync(template) .setNumberOfJobs(20) .setPartitionFieldValue('df') .deleteRecords();
Execute the above Apex code and then go to the Data Factory tab to watch the progress. Forceea will follow these steps during the async deletion process:
Reverses the order of SObjects in the Template, so the last SObject will get the first position, etc.
If all SObjects in the Template have a Partition field and FObjectAsync has a Partition field value, a number of jobs are enqueued for parallel processing (each job will delete all records of different partitions), otherwise it enqueues only 1 job (no partitioning).
The deletion starts from the SObject in the first position, executing the first batch of each job, which benchmarks the transaction to calculate the maximum number of records that can be deleted in every batch. This first benchmarking batch deletes up to 200 records.
If there are no errors and there are more records to be deleted, a second batch is created after the completion of the first batch, and so on.
When all SObject records assigned to a job have been deleted, the job moves to the second SObject, etc.
Important: if Forceea finds in the Template a definition for the RecordTypeId field of an SObject, it will delete the records of this Record Type only.
Forceea will stop the execution of a job when an error is encountered, except from the errors related to record locking, where it will raise an error only after the 5th occurrence of the UNABLE_TO_LOCK_ROW error.
Using existing lookup records
Forceea will take care of all the complex orchestration of the asynchronous process. The parallel processing offers an advantage, but it's based on the assumption that we won't query any existing records from the database, otherwise we may have record locking.
For example, if we have a custom SObject Language__c and we have the lookup field Language__c on Opportunity, to get random IDs for this field we would use:
setDefinition(Opportunity.Language__c, 'random lookup(Language__c) source(salesforce)')
If the above definition raises the UNABLE_TO_LOCK_ROW error (unable to obtain exclusive access to this record), then your only option is to use 1 job only with setNumberOfJobs(1).
Nobody can say that data generation is simple or without issues. Under the hood, the data generation process is quite complex, but it shouldn't be to the user; Forceea will gracefully handle all the complexity.
I strongly believe that an admin, a tester or even a business user, with no Apex knowledge, can insert/delete records asynchronously using FObjectAsync and existing Templates, which a developer or advanced admin could create.
This article was published at Nerd @ Work