  #1  
Old February 20th, 2020, 01:08 PM
dbarbee dbarbee is offline
Junior Community Member
 
Join Date: Jul 2012
Location: Minnesota
Posts: 13
Default Pulling text from large tagged document

I'm setting up a variable letter that we merge frequently. I'm provided a Microsoft Word document that is completely static. I plan on copying and pasting this into a FusionPro text box, which preserves the formatting.

There are elements within this Microsoft Word document that I would like to use in other parts of the variable letter. I am attempting to create these variables as JavaScript global variables. Here is my code (in JavaScript Globals) so far:

Code:
var frame = FindTextFrame('Letter');
var letter = frame.content.split('</para>');
var lines = [];

//Strips out tagged text formatting in the letter, and places the plain text in the 'lines' array
for (var i=0; i<letter.length; i++){
    if (letter[i] != '')
        lines.push(Trim(RawTextFromTagged(letter[i])));
    }

lines = lines.filter(String); // tidies up the array by dropping empty strings

//Assumes the date is in the 3rd line of the letter. Would like to make this more robust, i.e. find the line that matches "Wednesday, April 1, 2020";
var EventDate = lines[2];

//Looks for the Company Name, and returns the two lines following.
for (var i=0; i<lines.length; i++){
    if (lines[i] == 'Company Name'){
        var EventAddress = lines[i+1];
        var EventCity = lines[i+2];
        break;
        }
    }

//Looks for "Guest" and returns everything after the colon;
for (var i=0; i<lines.length; i++){
    if (lines[i].search('Guest') == 0){
        var PresenterLine = lines[i].split(':');
        var GuestSpeaker = PresenterLine[1];
        break;
        }
    }

//Looks for a phone number... not working. Needs to match "(###)###-####." because it's usually at end of sentence.
for (var i=0; i<lines.length; i++){
    var words = lines[i].replace(') ',')').split(' '); //Removes space after parentheses so phone number ends up as one word.
    for (var j=0; j<words.length; j++){
        if (words[j].match(/\(?[\d]{3}\)?[\d]{3}?[\d]{4}$\./)){
            var CompanyPhone = words[j];
            break;
            }
        }	
    }
The biggest issue is I'm having trouble matching the phone number. Most of the time, the phone number is in the format "(###)###-####" but can deviate slightly. It's always at the end of the sentence, so it will end up with a period at the end.

I'm also wondering if there is a better way of extracting the date from this letter. It will always be on its own line in the format: "Wednesday, April 1, 2020"

Last edited by dbarbee; February 20th, 2020 at 01:22 PM..
  #2  
Old February 20th, 2020, 02:11 PM
Dan Korn's Avatar
Dan Korn Dan Korn is offline
FusionPro Senior Engineer / Forum Moderator
 
Join Date: Aug 2008
Location: Chicago, IL
Posts: 4,504
Default Re: Pulling text from large tagged document

Sounds like an interesting project.

If you could post any kind of example, at the very least the Word document, but preferably your collected template, that would make it a lot easier to follow what you're doing and offer specific suggestions, not just for how to extract the data from the Word document, but also for how to apply the extracted values in other places in the job.
__________________
Dan Korn
FusionPro Developer / JavaScript Guru / Forum Moderator
PTI Marketing Technologies | Printable | MarcomCentral
LinkedIn

I am not a Support engineer, and this forum is not a substitute for Support. My participation on this forum is primarily as a fellow user (and a forum moderator). I am happy to provide help and answers to questions when I can; however, there is no guarantee that I, or anyone else on this forum, will be able to answer all questions or fix any problems. If I ask for files to clarify an issue, I might not be able to look at them personally. I am not able to answer private messages, emails, or phone calls unless they go through proper Support channels. Please direct any sales or pricing questions to your salesperson or inquiries@marcom.com.

Complex template-building questions, as well as all installation and font questions or problems, should be directed to FusionProSupport@marcom.com. Paid consulting work may be required to fulfill your template-building needs.

This is a publicly viewable forum. Please DO NOT post fonts, or other proprietary content, to this forum. Also, please DO NOT post any "live" data with real names, addresses, or any other personal, private, or confidential data.

Please include the specific versions of FusionPro, Acrobat, and your operating system in any problem reports or help requests. I recommend putting this information in your forum signature. Please also check your composition log (.msg) file for relevant error or warning messages.

Please post questions specific to the MarcomCentral Enterprise and Web-to-Print applications in the MarcomCentral forum. Click here to request access. Or contact your Business Relationship Manager (BRM/CPM) for assistance.

Please direct any questions specific to EFI's Digital StoreFront (DSF) to EFI support.

How To Ask Questions The Smart Way

The correct spellings are JavaScript, FusionPro, and MarcomCentral (each with two capital letters and no spaces). Acceptable abbreviations are JS, FP, and MC (or MCC). There is no "S" at the end of "Expression" or "Printable"! The name of the product is FusionPro, not "Fusion". "Java" is not the same as JavaScript.

Check out the JavaScript Guide and JavaScript Reference! FusionPro 8.0 and newer use JavaScript 1.7. Older versions use JavaScript 1.5.

return "KbwbTdsjqu!spdlt\"".replace(/./g,function(w){return String.fromCharCode(w.charCodeAt()-1)});
  #3  
Old February 20th, 2020, 03:12 PM
dbarbee dbarbee is offline
Junior Community Member
 
Join Date: Jul 2012
Location: Minnesota
Posts: 13
Default Re: Pulling text from large tagged document

Attached is a version of the document in question.
Attached Files
File Type: zip Sample Letter.zip (35.4 KB, 5 views)
  #4  
Old February 20th, 2020, 06:02 PM
Dan Korn's Avatar
Dan Korn Dan Korn is offline
FusionPro Senior Engineer / Forum Moderator
 
Join Date: Aug 2008
Location: Chicago, IL
Posts: 4,504
Default Re: Pulling text from large tagged document

Quote:
Originally Posted by dbarbee View Post
Attached is a version of the document in question.
Thanks, that helps me better envision what you're trying to accomplish.

It looks like you already have it mostly working. The way you're parsing the lines of text from the frame is pretty clever.

Though I have to add a bit of a disclaimer here, in that I always recommend against this kind of fuzzy logic to parse out already formatted or composed output. It's a bit like trying to unmake soup into its ingredients. You're always better off dealing with the source data as much as possible. Presumably the Word document was created via some kind of mail merge, based on some original "raw" data. If you can get your hands on that original data, that would make things much more straightforward. But I assume that you don't have access to that, which is why you're trying to do this extraction in the first place.

The other caveat I would add is that I (or someone else here in the community) can help you to figure out some JavaScript magic to parse the data you supplied in that one Word document, but it's hard to know exactly how well that parsing logic will work based on only one set of data, which is all you have provided. If you could supply a couple more examples of these Word documents (i.e. data records), it would give me (or anyone else) a better idea of how much variability we're dealing with in the data, and how robust the parsing code needs to be to handle various edge cases.

All that said, I would suggest a couple of things. First, if you put this parsing logic into OnRecordStart, then you can call FusionPro.Composition.AddVariable for each extracted variable, which will allow you to use those composition variables directly in text frames, without having to actually create global JavaScript variables and rules for each. (Alternatively, you could call FusionPro.Composition.AddTextReplacement to directly replace markers in text, without even needing to insert text variables.) Also, you can make just one pass through the lines to find what you need.
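As a rough illustration of the "one pass" idea (this is not the code from the attached template), the sketch below collects every field in a single loop. The helper name extractFields and the property names are hypothetical; the markers "Company Name" and "Guest" come from the first post. It is written as plain JavaScript so it can be tried outside FusionPro, and the hand-off to FusionPro.Composition.AddVariable is shown only as a comment.

```javascript
// Sketch only: extractFields and the property names are hypothetical.
// "lines" is assumed to be an array of trimmed, untagged lines of text.
function extractFields(lines) {
    var captured = {};
    for (var i = 0; i < lines.length; i++) {
        var line = lines[i];

        // Date: the first line that starts with an English weekday name.
        if (!captured["EventDate"] &&
            /^(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)/.test(line)) {
            captured["EventDate"] = line;
        }

        // Address: the two lines that follow the "Company Name" line.
        if (line == "Company Name" && i + 2 < lines.length) {
            captured["EventAddress"] = lines[i + 1];
            captured["EventCity"] = lines[i + 2];
        }

        // Guest speaker: everything after the first colon, trimmed.
        if (line.indexOf("Guest") == 0 && line.indexOf(":") > -1) {
            captured["GuestSpeaker"] =
                line.substring(line.indexOf(":") + 1).replace(/^\s+|\s+$/g, "");
        }

        // Phone: the first thing that looks like a 10-digit number.
        var phone = line.match(/\(*\d{3}[\D]*\d{3}[\D]*\d{4}/);
        if (phone && !captured["CompanyPhone"]) {
            captured["CompanyPhone"] = phone[0];
        }
    }
    return captured;
}

// In OnRecordStart the result would then be handed to FusionPro, e.g.:
// for (var name in captured)
//     FusionPro.Composition.AddVariable(name, captured[name]);
```

Keeping the parsing in one function that returns a plain object also makes it easy to test the logic against several sample letters before wiring it into the template.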

The attached template shows how to do this. Note that I've removed many of the rules in favor of simply calling FusionPro.Composition.AddVariable. I've also removed everything in the JavaScript Globals. Now, if you do need other logic to massage that data, then you'll need to either put that logic into OnRecordStart, like I've done for the "Specialist Name Only" field, or you'll need to move the first line of OnRecordStart (var capturedVars = {};) to the JavaScript Globals and then, in other rules, do something like this:
Code:
if (FusionPro.inValidation)
    Rule("OnRecordStart");

var val = capturedVars["Specialist Name Only"];
// do something with val...

As for finding the phone number, I got it to work with this:
Code:
line.match(/\(*\d{3}[\D]*\d{3}[\D]*\d{4}/);
Though when you say that the format "can deviate slightly," as noted above, this is where I would need to know a little more about those variations in order to write code to handle them.
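For anyone comparing the two patterns, here is a small side-by-side sketch of why the expression from the first post could not match. The sample string is made up; both patterns are copied from this thread.

```javascript
var original  = /\(?[\d]{3}\)?[\d]{3}?[\d]{4}$\./;  // pattern from the first post
var suggested = /\(*\d{3}[\D]*\d{3}[\D]*\d{4}/;     // pattern suggested above

var sample = "(555)555-5555.";

// "$" asserts the end of the input, so the "\." that follows it can never
// match anything. The pattern also makes no allowance for the hyphen.
var originalResult = sample.match(original);    // null

// The suggested pattern is not anchored, so it matches anywhere in the
// string and stops after the last four digits, leaving the period out.
var suggestedResult = sample.match(suggested);  // ["(555)555-5555"]
```

Because the suggested pattern is unanchored, there is also no need to split the line into words first; it can be run against the whole line.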

Parsing the date is a bit trickier. If you know it's always going to be a line starting with an English weekday name, then it seems pretty simple:
Code:
if (/^((Monday)|(Tuesday)|(Wednesday)|(Thursday)|(Friday)|(Saturday)|(Sunday))/.test(line))
If the line doesn't always start with a weekday name, that's trickier. But I would need to know all the possible formats to look for in order to code up something to handle those other cases. (See previous comment about not knowing about any other records of data other than the single one provided.)
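If the date really is always on its own line in the exact form "Wednesday, April 1, 2020", a stricter test is possible. The sketch below is only one way to do it: the names WEEKDAY_DATE and findDateLine are hypothetical, and it assumes each line has already been trimmed and stripped of tags. It does not check that the month name or the day number is valid.

```javascript
// Weekday, Month D, YYYY on a line by itself (assumed format).
var WEEKDAY_DATE = /^(Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday), [A-Z][a-z]+ \d{1,2}, \d{4}$/;

// Returns the first line that looks like a full date, or "" if none does.
function findDateLine(lines) {
    for (var i = 0; i < lines.length; i++) {
        if (WEEKDAY_DATE.test(lines[i]))
            return lines[i];
    }
    return "";
}
```

Anchoring both ends keeps an ordinary sentence that merely begins with a weekday name ("Friday is fine.") from being picked up as the event date.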

All that is also in the attached template.

Hope this helps, and thanks again for sharing the template!
Attached Files
File Type: pdf Invite - Clean - Dan-3.pdf (81.7 KB, 5 views)

Last edited by Dan Korn; February 20th, 2020 at 06:08 PM..
  #5  
Old February 21st, 2020, 08:35 AM
dbarbee dbarbee is offline
Junior Community Member
 
Join Date: Jul 2012
Location: Minnesota
Posts: 13
Default Re: Pulling text from large tagged document

This is brilliant, and I learned several new things. Thank you for this.

The deviations in the phone number are mostly minor. The three formats I've seen are below, but it's mostly presented exactly like the sample:
(555)555-5555
(555) 555-5555
555-555-5555
  #6  
Old February 21st, 2020, 10:31 AM
Dan Korn's Avatar
Dan Korn Dan Korn is offline
FusionPro Senior Engineer / Forum Moderator
 
Join Date: Aug 2008
Location: Chicago, IL
Posts: 4,504
Default Re: Pulling text from large tagged document

Quote:
Originally Posted by dbarbee View Post
This is brilliant, and I learned several new things. Thank you for this.
Glad I could help! I learned a couple things too.
Quote:
Originally Posted by dbarbee View Post
The deviations in the phone number are mostly minor. The three formats I've seen are below, but it's mostly presented exactly like the sample:
(555)555-5555
(555) 555-5555
555-555-5555
I think the Regular Expression I came up with will handle those cases as well. Post back if you run into something that it doesn't match.
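A quick way to check that claim is to run the pattern against the three formats listed above, each placed at the end of a made-up sentence. The second pattern, tighterPattern, is not from this thread; it is a hypothetical stricter variant that assumes those three formats are the only ones that occur.

```javascript
var phonePattern = /\(*\d{3}[\D]*\d{3}[\D]*\d{4}/;      // from the earlier post
var tighterPattern = /\(?\d{3}\)?[ -]?\d{3}-\d{4}/;     // hypothetical alternative

var samples = [
    "Call (555)555-5555.",
    "Call (555) 555-5555.",
    "Call 555-555-5555."
];

var found = [];
var foundTighter = [];
for (var i = 0; i < samples.length; i++) {
    var m = samples[i].match(phonePattern);
    var t = samples[i].match(tighterPattern);
    found.push(m ? m[0] : null);
    foundTighter.push(t ? t[0] : null);
}
// Both arrays end up as:
// ["(555)555-5555", "(555) 555-5555", "555-555-5555"]
```

The trade-off is that [\D]* accepts any run of non-digit characters between the groups, so the looser pattern tolerates unexpected separators but could also match digits that are not a phone number. The tighter one rejects anything outside the three known formats.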
  #7  
Old February 21st, 2020, 12:30 PM
dbarbee dbarbee is offline
Junior Community Member
 
Join Date: Jul 2012
Location: Minnesota
Posts: 13
Default Re: Pulling text from large tagged document

One question: Is it possible to create resources or variables like this in OnJobStart? Thinking about the efficiency/speed in composition.
  #8  
Old February 21st, 2020, 01:31 PM
Dan Korn's Avatar
Dan Korn Dan Korn is offline
FusionPro Senior Engineer / Forum Moderator
 
Join Date: Aug 2008
Location: Chicago, IL
Posts: 4,504
Default Re: Pulling text from large tagged document

Quote:
Originally Posted by dbarbee View Post
One question: Is it possible to create resources or variables like this in OnJobStart? Thinking about the efficiency/speed in composition.
Variables are generally per-record. That's the whole idea of variable data.

You could create text replacements on a per-job basis.

Or you could set up that capturedVars object as a global, populate it in OnJobStart, then just do the last few lines in OnRecordStart to iterate through that object and call FusionPro.Composition.AddVariable for each property.
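A minimal sketch of that split, written so it can run outside FusionPro. The function names and the parseLetter argument are hypothetical, and addVariable is passed in as a parameter; in a template it would be a call to FusionPro.Composition.AddVariable, assumed here to take a name and a value. Whether the text frame's content can be read as early as OnJobStart is something to verify in your own template.

```javascript
// Would live in JavaScript Globals.
var capturedVars = {};

// Would run once per job, in OnJobStart. parseLetter is a hypothetical
// stand-in for the frame-parsing logic and returns a name/value object.
function onJobStartSketch(parseLetter) {
    capturedVars = parseLetter();
}

// Would run for every record, in OnRecordStart. It only republishes the
// values that were captured once, so no parsing is repeated per record.
function onRecordStartSketch(addVariable) {
    for (var name in capturedVars) {
        if (capturedVars.hasOwnProperty(name))
            addVariable(name, capturedVars[name]);
    }
}
```

This only pays off if the letter text is the same for every record, which matches the "completely static" document described in the first post.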
  #9  
Old February 21st, 2020, 02:03 PM
dbarbee dbarbee is offline
Junior Community Member
 
Join Date: Jul 2012
Location: Minnesota
Posts: 13
Default Re: Pulling text from large tagged document

Is there a way to reference variables created with FusionPro.Composition.AddVariable in other rules? In particular, I need to return a Graphic of the presenter.

Last edited by dbarbee; February 21st, 2020 at 02:43 PM..
  #10  
Old February 21st, 2020, 03:51 PM
Dan Korn's Avatar
Dan Korn Dan Korn is offline
FusionPro Senior Engineer / Forum Moderator
 
Join Date: Aug 2008
Location: Chicago, IL
Posts: 4,504
Default Re: Pulling text from large tagged document

Quote:
Originally Posted by dbarbee View Post
Is there a way to reference variables created with FusionPro.Composition.AddVariable in other rules? In particular, I need to return a Graphic of the presenter.
I talked about this in my previous post:
Quote:
Originally Posted by Dan Korn View Post
Now, if you do need other logic to massage that data, then you'll need to either put that logic into OnRecordStart, like I've done for the "Specialist Name Only" field, or you'll need to move the first line of OnRecordStart (var capturedVars = {};) to the JavaScript Globals and then, in other rules, do something like this:
Code:
if (FusionPro.inValidation)
    Rule("OnRecordStart");

var val = capturedVars["Specialist Name Only"];
// do something with val...
Note that in OnRecordStart, you can also call FusionPro.Composition.AddGraphicVariable(), so you could do something in that loop like:
Code:
FusionPro.Composition.AddGraphicVariable("Specialist Photo", Resource(capturedVars["Specialist Name Only"]));
Or call another mapping if the resource names don't exactly correspond to the names in the data.
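One possible shape for such a mapping is sketched below. Every name in it is hypothetical (the speaker names, the resource names, and the "DefaultPhoto" fallback); only the AddGraphicVariable and Resource calls in the closing comment come from this thread.

```javascript
// Hypothetical lookup from names as they appear in the letter to the
// graphic resource names defined in the template.
var photoResourceByName = {
    "Jane Doe": "JaneDoePhoto",
    "John Smith": "JohnSmithPhoto"
};

// Returns the mapped resource name, or the fallback for unknown names.
function photoResourceName(speakerName, fallback) {
    return photoResourceByName.hasOwnProperty(speakerName)
        ? photoResourceByName[speakerName]
        : fallback;
}

// In OnRecordStart this could feed the call shown above, e.g.:
// FusionPro.Composition.AddGraphicVariable("Specialist Photo",
//     Resource(photoResourceName(capturedVars["Specialist Name Only"], "DefaultPhoto")));
```

A fallback keeps the composition from failing on a name that was typed slightly differently in the Word document.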

Last edited by Dan Korn; February 21st, 2020 at 06:40 PM..