
Reading in external text files...


cdaters


I have a question.

 

In some rules that I have, I am reading in a tab-delimited external data file containing some 60K+ records with the following code:

 

var transactionList = new ExternalDataFileEx("/jobs/job1/data_files/transaction_history_data.txt", "\t");
var recipient_id = Field("Recipient ID");
var matches = [];

// External data file records are numbered 1 through recordCount.
for (var i = 1; i <= transactionList.recordCount; i++) {

 var idVariable = transactionList.GetFieldValue(i, "Recipient ID");
 var dateVariable = transactionList.GetFieldValue(i, "Date");
 var paymentVariable = transactionList.GetFieldValue(i, "Payment Amount");

 if (recipient_id == idVariable) {
   matches.push(dateVariable + paymentVariable);
 }

}

return matches.join("<br>\n");

 

Does it make sense to do it this way, or maybe load it into the JavaScript Globals so that I can use it across multiple rules?
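
Something like this sketch is what I have in mind (assuming that code in the JavaScript Globals runs once per job, and that a variable declared there is visible in every rule):

// In the JavaScript Globals: open the data file once,
// so every rule can reference transactionList without re-opening the file.
var transactionList = new ExternalDataFileEx("/jobs/job1/data_files/transaction_history_data.txt", "\t");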

 

Or would it make more sense to load it in as a text file resource and then reference it in my rule like:

 

var transactionList = Resource("transactionHistoryFile");

 

I am then running this through an array, matching up related data between the two databases (my main input file and the external data file) using a "key" column in each, and returning a list that shows a sort of "transaction history" for each record in my input file.

 

My main input file is my mail-list/recipient file containing 9K+ records (recipients). My external data file contains about 60K+ records of transaction data. We are generating letters/statements for each recipient.

 

As you might imagine, it is taking a while to compose. I am chunking these out into 500-record PDFs for printing, and I am trying to get an idea of a better way that might help speed composition up?


There are a couple of inefficiencies I can see in your code. First, you have to iterate all 60K records of the XDF for each of the 9K records of your main data. Multiplying 60,000 by 9,000, you're talking over half a billion (with a "B") iterations overall! Also, on each iteration, you're fetching the date and payment amount even when you're not actually adding them to your result array, so you're calling transactionList.GetFieldValue over 1.5 billion times.

 

This should speed things up quite a bit:

var matches = [];
var transactionList = new ExternalDataFileEx("/jobs/job1/data_files/transaction_history_data.txt", "\t");
// Build (or retrieve the cached) cursor sorted on the key field.
var RecipientCursor = transactionList.SortBy("Recipient ID");
// Fetch only the numbers of the records whose Recipient ID matches.
var recs = RecipientCursor.FindRecords(Field("Recipient ID"));
for (var i in recs)
{
 var dateVariable = transactionList.GetFieldValue(recs[i], "Date");
 var paymentVariable = transactionList.GetFieldValue(recs[i], "Payment Amount");
 matches.push(dateVariable + paymentVariable);
}

return matches.join("<br>\n");

The cursor caches the numbers of the records that match each value of the selected field, so you don't have to iterate all the records again each time, and you don't have to do anything at all with the records that don't match.

 

I don't think there will be any additional advantage to moving this to the JavaScript Globals or OnRecordStart, since external data files and cursors are automatically cached for the whole job.

 

Adding the XDF as a resource won't speed anything up either, but it will ensure that the XDF gets collected up with the job, which can be handy.

 

Beyond that, you could speed things up more if you could combine the two data files into a single multi-line record data file.
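
For example (a purely hypothetical layout, reusing your field names), the combined tab-delimited file might carry the Recipient ID on every line and be sorted by it, so that each contiguous group of lines composes as one multi-line record:

Recipient ID	Date	Payment Amount
1001	01/15/2018	$25.00
1001	03/02/2018	$40.00
1002	02/10/2018	$100.00

That way, each recipient's transactions arrive as part of the record itself, and no per-record lookup into a second file is needed at all.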

Edited by Dan Korn

Dan, thank you for your help. I have had to tweak a couple of things to work between the data that I have and the other (transaction total) rule that I also incorporated with your help.

 

Here are some interesting observations.

 

On our now-primary VDP person's Mac (a Mac mini with 8GB RAM running macOS High Sierra), my original rules actually took a little longer than your code above, but let's just call it a wash and say that a 500-record output file took about an hour either way!

 

On my PC, using those same earlier rules, it took a little over 30 minutes for the same result.

 

Using the streamlined code you were able to assist me with, I now get a 500-record output file in 6 minutes! (Mind blown!)

 

So this job actually consists of about 23,000 records (mail recipients), with a second transaction-history database containing a little over 65,000 records of transaction data. Some recipients had only 1 record of transaction history, while others had maybe 30 or 40. I was brought into this project on Monday, and by yesterday we were utilizing 1 PC and 2 Macs to power through this job (it had to drop in the mail and be date-stamped by today). Ugh!

 

Needless to say, I am trying to move our primary VDP person over to a PC to do this work moving forward instead!

 

Thank you for your help!
