View Full Version : Removing "artifacts"
Chickenman
December 30th, 2009, 05:15 AM
We are exporting some Btrieve customer data to Access and then bringing the Access table into Monarch for cleanup, analysis etc.
In the Access table, the Street Address field (column) has a '?' in a box in the first position of most, but not all, street address fields. When brought into Monarch, this is carried forth as a box (square).
I am attempting to use Strip function but don't know how to define this character. :mad:
Is there a way to clean up either in Access or Monarch?
When stepping through the wizard to import into Monarch, the Sample Data looks like:
\vHIGHWAY 57 should be HIGHWAY 57
\x162001 Main St should be 2001 Main St
\x0E39 Meadoow Lane should be 39 Meadow Lane
etc
all those preceded by '\n' come in fine :confused:
CM
OllyInMunich
December 30th, 2009, 06:20 AM
Hello Chickenman,
If you weren't working in Access it would be easy - a combination of the Options Input for treating non ASCII characters, ignoring print control characters etc, and if necessary, a run through MSRP or Utility to clean it up before taking the data into Monarch. Can you get the data from Btrieve into a format like CSV instead?
Best wishes,
Olly
Chickenman
December 30th, 2009, 06:25 AM
Hello Olly,
Co-incidentally I am looking at export options now to see if that will be a possibility.
Access was chosen as the entire database is being exported to one file with automatic generation of separate tables for each of the database tables similar to the V10 option to export summaries. Slick, but as necessary I can forego that convenience.
Thanx for the quick reply.
CM
OllyInMunich
December 30th, 2009, 06:31 AM
Hello Chickenman,
I'm not familiar enough with Access to know the options, but I would have thought that it would be possible to treat the characters within Access before importing to Monarch as well. Perhaps I'm biased, but if I've ever got to handle data I'll try to use Monarch rather than mucking about in Microsoft Office...
If it's just one field causing problems, perhaps a calculated field could be defined in the Access table that strips out the first n characters?
Good luck!
Olly
Grant Perkins
December 30th, 2009, 06:35 AM
CM,
As I understand it the ? in Access is the equivalent of the 'box' in 'text' files and basically indicates some sort of generic spurious character that the application has no information about in order to help it decide what to do.
In the sample you posted the characters are inconsistent and so may take some considerable work to successfully and reliably remove them on a line by line basis. Therefore I would be tempted to go back to fixing in the Access import.
It looks like an interpretation/compatibility issue to me leaving an incomplete or partial conversion into Access. It's a while since I was involved in anything like that so I doubt I can offer anything useful directly. However the Btrieve software is a product of Pervasive Software these days and there is a support forum at cs.pervasive.com that might contain the information that solves your issue. Or a pointer to it. There are so many potential variables in the components that I think only someone totally familiar with the conversion implementation and the tool versions in use will be able to dig through the potential solutions.
This Wikipedia link may be of interest.
http://en.wikipedia.org/wiki/Newline
in particular the "In programming languages" section which will likely explain the "\n" puzzle.
I think of the input was a print file rather than an Access table you could robably uise the 'Ignore Unused Print Control characters' option to strip the line starts but that's not really something one can do with a database input.
HTH.
Grant
Edit: Looks like Olly had similar thoughts while I was creating this entry!
Chickenman
December 30th, 2009, 06:36 AM
Olly,
I have very limited Access experience so you may very well be correct.
A quick web search revealed a number of low cost utilities that claim to strip the junk out of Access, but this is pretty much a one-shot deal so I hesitate to obtain yet another piece of software, go down the learning cure (albeit short), etc
CM
OllyInMunich
December 30th, 2009, 06:43 AM
Hello CM,
If all the bad data starts with "\xyz123 " then a calculated field in Monarch to remove this would do a lot of the work for you.
Something like:
if(left(field;1)="\";lsplit(field;2;" ";2);field)
might help?
Best wishes,
Olly
Chickenman
December 30th, 2009, 06:52 AM
Olly,
The beginning characters are all over the lot so that doesn't seem possible.
Grant,
Yes, it is running over Pervasive SQL. Didn't see anything in the forum that looked useful. I tried exporting from Monarch to Excel and using a variant of parsing carriage returns from address in Excel to no avail.
I'm an operations guy and need quick easy solutions as these are just tools. If I were a DBA, this might be the kind of challenge that would break up a dull day...:rolleyes:
CM
Grant Perkins
December 30th, 2009, 07:03 AM
Olly,
The beginning characters are all over the lot so that doesn't seem possible.
Grant,
Yes, it is running over Pervasive SQL. Didn't see anything in the forum that looked useful. I tried exporting from Monarch to Excel and using a variant of parsing carriage returns from address in Excel to no avail.
Pretty sure they are 'print control' characters but once passed through the Access part they become ascii. No time ot check this at the moemnt but maybe you could export from Access and run the file through Monarch Utility in one form or another and them re-import to access?
How are you (or whoever) getting the the data from Btrieve to Access? SQL? Is there a setting in there somewhere to deal with the line start characters? Or maybe a wrong ddf (data dafinition file)?
I'm an operations guy and need quick easy solutions as these are just tools. If I were a DBA, this might be the kind of challenge that would break up a dull day...:rolleyes:
CM
I totally agree with that! No friendly DBA available to you today?
Grant
Chickenman
December 30th, 2009, 07:49 AM
It's sort of an on-the-fly DDF and that's the rub - it can pull the fields but cannot determine format and translation for all of them.
I now have the export as csv and will fiddle with the utility as this is now a burr under my saddle.
At the end of the day it would be no big trick to just have someone brute force hit all the cells and delete the offending character - just over 2000 rows would occupy someone for a couple hours. But we like the "sufficient level of technology indistinguishable from magic" approach over here :D
Chickenman
December 30th, 2009, 08:22 AM
I appreciate the help offered by Olly and Grant.
While it would have been nice to simply work through Monarch, in the end an Excel was obtained from the export and the offending character in column D was got rid of by =RIGHT(D2,LEN(D2)-1) so we can go on with life :p
CM
Grant Perkins
December 30th, 2009, 08:29 AM
I appreciate the help offered by Olly and Grant.
While it would have been nice to simply work through Monarch, in the end an Excel was obtained from the export and the offending character in column D was got rid of by =RIGHT(D2,LEN(D2)-1) so we can go on with life :p
CM
So is this a one-off exercise (assume it is based on the previous post to the above)? In which case a pragmatic resolution trumps any other.
Grant
Chickenman
December 30th, 2009, 08:38 AM
Yes, as much as we like to play around and learn new things we chose the route that got us off the dime as this is a tiny part of a much larger project.
CM
Data Kruncher
December 30th, 2009, 01:36 PM
I'm a little late to this party, but it might be useful to know that you can determine precisely what a mystery character is using the Asc() function.
If the problem child is the first character in a string, then
Asc(Left(MyField,1))
will return the numeric ASCII code of the character. The first "normal" code is the space character (ASCII 32), and the last is the tilde ~ character (ASCII 126).
Additionally, under the Options -> Input menu, you can elect to "ignore unused print control characters (0-31)", where 0 to 31 refers to ASCII values.
HTH,
Kruncher
Chickenman
December 30th, 2009, 01:49 PM
[CM immediately stops impatiently drumming his fingers while waiting on Sandy to make the scene :) ]
OK, that could be useful - but I created calc field and get a variety of digits, so can I now use that info to strip off the characters? Not intuitive if so.
The Input Options form is grayed out for Report Translation - assume because this is a database...
Data Kruncher
December 30th, 2009, 02:00 PM
Right, right, right... database source.
Let's see if I can get this right. Assuming that problem characters only appear in the first position (I know, big assumption):
If(Asc(Left(MyField,1))<32 .Or. Asc(Left(MyField,1))>126,Substr(MyField,2,Len(MyField)-1),MyField)
Chickenman
December 30th, 2009, 02:09 PM
Perfecto!
Yes, always first character - thanx so much for the help - I'll memorialize this little gem in my User Defined Functions.
CM
Data Kruncher
December 30th, 2009, 02:13 PM
Trying to squeeze in time for homework, huh? :D Good on you. I know that it's not always easy, but it will pay off.
Enjoy the rest of the day. :)
Grant Perkins
December 30th, 2009, 04:03 PM
Hmm.
Interesting.
Must confess that I thought that the Access phase has already 'converted' the characters to text in a form that would not respond to the ASC() function. Indeed I ran a C&P to create a text file from the sample and got a negative response to a couple of typical settings (as a report file) so decided that was not the way to go.
One lives and learns.
Grant
Powered by vBulletin™ Version 4.1.0 Copyright © 2012 vBulletin Solutions, Inc. All rights reserved.