thraxil.org:

question on text processing

by emile Fri 02 May 2003 16:28:41

given two nearly identical text files without endline characters, where the latter file differs by a substring of arbitrary length and position, is there a tool for a unix platform that can generate a "patch file" (a new file denoting which substrings to delete from the original and what to add to produce the new file), and then later take that patch file and apply it to the original to regenerate the edit? diff does this on a line-by-line basis and merge can rejoin them, but i need a tool that analyses a string without endlines.

comments

i would probably just use tr or sed to convert some other reasonably common character (maybe space) to '\n' on each file, run diff on them as normal, then after merging, convert the '\n' back to that character. e.g.:

% tr ' ' '\n' < foo > foo.tmp
% tr ' ' '\n' < bar > bar.tmp
% diff foo.tmp bar.tmp > test.patch

and then remember to do a

% tr '\n' ' ' < patched.tmp > output_file

after you merge.
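A minimal sketch of that round trip as one script, assuming patch is the tool used to apply the diff (the comment only says "merge") and that neither file contains a newline of its own. The file names follow the comment; the sample sentences and the scratch directory are made up for illustration. The edits are recorded per word rather than per character, so this only helps when the chosen separator occurs reasonably often in the text.

```shell
#!/bin/sh
# Sketch of the space-to-newline trick. foo, bar, *.tmp, test.patch and
# output_file are the names from the comment; the sample text and the
# scratch directory are illustrative assumptions.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Two single-line files with no endline character anywhere.
printf '%s' 'the quick brown fox jumps over the lazy dog' > foo
printf '%s' 'the quick red fox leaps over the lazy dog' > bar

# Put one word on each line so diff has lines to compare.
tr ' ' '\n' < foo > foo.tmp
tr ' ' '\n' < bar > bar.tmp

# diff exits with status 1 when the files differ, which is expected here.
diff foo.tmp bar.tmp > test.patch || true

# Later: apply the stored patch to the converted original.
# -o writes the result to a new file, -s keeps patch quiet.
patch -s -o patched.tmp foo.tmp test.patch

# Turn the newlines back into spaces.
tr '\n' ' ' < patched.tmp > output_file

# Confirm that the regenerated file matches the edited one byte for byte.
cmp output_file bar && echo "round trip ok"
```

Only test.patch and the original foo need to be kept around; the .tmp files can be regenerated whenever the patch has to be applied.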
hehehe ... we get the beauty of munge.exe in winblowz ... which does this very easily.
munge <a href="http://www.ss64.com/nt/munge.html">looks</a> to me like a weak copy of <a href="http://www.gnu.org/manual/sed/html_mono/sed.html">sed</a>. the whole "not working on files larger than 2MB" thing is pretty sad. and even sed pales in comparison to the power of commandline perl. "perl -pi -e 's/foo/bar/g' *.html" is just the beginning.
Yeah, munge is limited. In fact I've had to write my own substring routines in batch files (really not that hard given the NTreskit) ... however ... at least in NT ... scripting is pretty slow. I found myself wondering if I could whip up a C++ program and execute it faster than my script. Regardless, laziness prevailed: I ran my script overnight and the job was completed.
if i were for some reason forced to use windows for any length of time, i'm pretty sure that i would quickly become a fan of <a href="http://www.cygwin.com/">cygwin</a>.
When in windows you learn to love sysinternals.com. Mark Russinovich wrote a pile of tools that hack into the winblowz kernel and give the user some serious privileges and information ... most of which give you a relatively unix level of control (lol) ... or at least unix level of OS awareness ... which is key in my job. Also, since my company has infinite financial resources, we have full access to all of microsoft's secret APIs ... allowing full kernel application development. I wish I could say I feel privileged ... I really just feel all dirty. Oh well ... it's a paycheck until I can get the fuck away from the computer industry and go do something less banal ... like mountain climbing or killing people for the navy seals.
a company i was contracting with did its dev in linux then just automated a process to translate pathnames in files and released the "windows" version with cygwin. mmm, laziness.
