Gareth Horton
July 12th, 2002, 02:44 PM
Hi Tim,
The statement "The emails look something like this:" is the deciding factor in how the data in this file is extracted.
If the files are well formatted (always structured as in the example), they can be processed in Monarch by starting with the CONDITIONS section of the file. Choose a 2-line Sample as a Detail, trap literally on the word CONDITIONS, and highlight the second line, which can consist of multi-line entries. Click on the Advanced tab in the Field Properties window and choose an End Field On option that will work well (Blank Field Values... or Minimum Action...). Change the Type to Memo. This will extract all of the multi-line text to one field. Later, in the Table window, calculated fields may be created to split the lines if necessary.
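For readers who want to see the same idea outside Monarch, here is a rough Python sketch of "trap on a literal word, then read a memo field until a blank line". It is only an analogy, not how Monarch works internally. The function name `extract_memo` and the assumption that the email is already split into plain lines (without the "> " quote prefix) are mine.

```python
def extract_memo(lines, trap="CONDITIONS:"):
    """Return the text that follows the trap line, up to the next blank line.

    Hypothetical helper: mimics trapping literally on a heading and ending
    the field on a blank value. Lines are joined with single spaces.
    """
    memo = []
    capturing = False
    for line in lines:
        text = line.strip()
        if not capturing:
            # Look for the literal trap text at the start of the line.
            if text.startswith(trap):
                capturing = True
            continue
        if not text:
            # A blank line ends the multi-line field.
            break
        memo.append(text)
    return " ".join(memo)


sample = [
    "Comments: multiple lines of text that start here",
    "",
    "CONDITIONS:",
    "first line of text",
    "second line of text",
]
print(extract_memo(sample))
```

If the memo text itself can contain blank lines, a different end condition (for example, the next known heading) would be needed.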
Next I chose from CALL BACK ARRANGEMENTS to the Comments line (just before CONDITIONS) as a multi-line Sample, trapping literally on CALL BACK.
I did this because it was the largest Sample that is preceded by and ends with a multi-line entry. Choosing the largest block that ends with multi-line data reduces the number of Templates required for data extraction.
Repeat the above to extract the multi-line data and highlight all other fields in the Sample. Repeat for each multi-line sample block.
Choosing Sample blocks that begin after a multi-line data field and end with a multi-line data field allows the flexibility to extract the multi-line data (comments) using the Advanced End Field On options without interfering with the static labeled data. I captured everything with a Detail and 7 Appends.
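As a loose illustration of why a block should end with its multi-line field, the Python sketch below parses one block of "Label: value" lines and appends any unlabeled line to the field before it. This is not a Monarch feature. The names `LABEL` and `parse_block` are hypothetical, and the pattern assumes labels contain only letters, spaces, and slashes, so a continuation line that happens to look like "Word: text" would be misread as a new field.

```python
import re

# Assumed label shape: letters, spaces, or slashes, followed by a colon.
LABEL = re.compile(r"^([A-Za-z][A-Za-z /]*):\s*(.*)$")


def parse_block(lines):
    """Return (label, value) pairs; unlabeled lines extend the previous value."""
    fields = []
    for line in lines:
        text = line.strip()
        if not text:
            continue
        match = LABEL.match(text)
        if match:
            fields.append([match.group(1), match.group(2)])
        elif fields:
            # Continuation of a multi-line value such as Comments.
            fields[-1][1] = (fields[-1][1] + " " + text).strip()
    return [(label, value) for label, value in fields]


block = [
    "CALLER CALL BACK:",
    "Caller Name: Refused",
    "Interview Specialist: name",
    "Date: 07/30/2001 04:21 PM",
    "Comments: multiple lines of text that start here",
    "and continue on a second line",
]
for label, value in parse_block(block):
    print(label, "=", value)
```

Because the only multi-line value is the last field in the block, the continuation rule never has a chance to swallow one of the fixed, labeled lines.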
Of course this all goes out the window if the structure of the report changes.
You may want to consider submitting the actual report to the Model Building Service available on our website:
http://www.datawatch.com/services/model-building.asp
I hope this is helpful.
DEE MOORE
Tim wrote:
> I have a bunch of emails that I need to trap information from. There are a lot of lines of data to trap. Each section of data has a name followed by a colon (:). Any help would be appreciated. The emails look something like this:
> SUBJECT: claim number
>
> SYNOPSIS OF ISSUES REPORTED / ALLEGED:
> multiple lines of text
>
> TYPE OF CALL: Original Issue
> DATE OF REPORT: 07/25/2001 10:55 PM
> INTERVIEW SPECIALIST: NAME
>
> DBA: NAME
> UNIT/NUMBER: UNKNOWN
> ADDRESS: STREET ADDRESS
> CITY/ST/ZIP: CITY, ST #####
>
> OPTIONAL CALLER INFORMATION:
> Name: Name Refused
>
> WHO IS RESPONSIBLE:
> Name: name
> Gender:
> Age:
> Title: title
> Tenure: Approximately xx years
> Name: name
> Gender: Male
> Age:
> Title: title
> Tenure: Approximately 2 years
> Name: UNKNOWN
> Gender: Male
> Age: Approximately xx-xx years of age
> Title: Factory Worker
> Tenure: Approximately six years
> Description of person: multiple lines of text
> starting here
> Name: name
> Gender:
> Age:
> Title: tile
> Tenure: Approximately 10 years
>
> WHAT:
> multiple lines of text
>
> WHEN:
> Ongoing for approximately four months, specific dates unknown
>
> WHERE:
> At the above location
>
> HOW:
> multiple lines of text
>
> HOW LONG HAS THIS BEEN OCCURRING AND HOW OFTEN IN THE PAST?
> Approximately four months
>
> HOW DO YOU KNOW ABOUT THIS? Caller was involved in the incident.
>
> IS THERE ANY DOCUMENTATION THAT WOULD HELP THE COMPANY INVESTIGATE
> THIS? No.
>
> HAVE YOU REPORTED THIS TO ANYONE IN MANAGEMENT? No.
>
>
> IS THERE ANYONE ELSE WHO KNOWS ABOUT THIS? Yes
> Name: Refused
> Title, work area, or responsibility: multiple lines
> of text
>
> INTERVIEW NOTES:
> multiple lines of text
>
> CALL BACK ARRANGEMENTS:
> WCB in two weeks.
>
> DISSEMINATION:
> Disseminated to: name
> Disseminated by: name
> Date: 07/26/2001 02:39 PM
> Via: EMAIL
>
> CALLER CALL BACK:
> Caller Name: Refused
> Interview Specialist: name
> Date: 07/30/2001 04:21 PM
> Comments: multiple lines of text that start here
>
> CONDITIONS:
> multiple lines of text
[ May 02, 2006, 05:53 PM: Message edited by: Todd Niemi ]