compare files and get lines not in both files
acornejo
02-02-2006, 04:22 PM
Hi
I'm quite new to this, and I need to compare 2 text files that contain several lines and get a report of which lines are NOT in both files. I've written a VBScript that reports which lines are in both files; however, I can't get it to work for the ones that are not.
I can't just negate the comparison, since otherwise I'd get an entry for every line of the second file that doesn't match each line of the first file.
This is what I have:
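CONST FORREADING = 1, FORWRITING = 2
' fs (a Scripting.FileSystemObject), TxtFile1, TxtFile2 and ResultFile are set elsewhere
FUNCTION Compare
  Dim f1, f2, rf, OrigLine, CmpLine
  SET f1 = fs.OpenTextFile(TxtFile1, FORREADING)
  SET rf = fs.OpenTextFile(ResultFile, FORWRITING, TRUE)
  ' For each line of file 1, reopen and scan the whole of file 2
  DO UNTIL f1.AtEndOfStream
    OrigLine = f1.ReadLine
    SET f2 = fs.OpenTextFile(TxtFile2, FORREADING)
    DO UNTIL f2.AtEndOfStream
      CmpLine = f2.ReadLine
      ' A match means the line is in both files
      IF OrigLine = CmpLine THEN
        rf.WriteLine OrigLine
      END IF
    LOOP
    f2.Close
  LOOP
  rf.Close
  f1.Close
END FUNCTION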
Any ideas?
Regards
Alvaro
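The usual fix for the problem described above (a negated comparison produces one entry per non-matching pair) is to keep a "found" flag while scanning the other file and to report a line only after the whole scan finishes without a match. A minimal sketch in JavaScript, written with ES3 features only so it resembles what the WSH JScript engine accepts, and assuming both files have already been read into arrays of lines; the function names are hypothetical:

```javascript
// Hypothetical helper: returns the lines of `first` that never appear in `second`.
// Assumes both files were already read into arrays of strings (one per line).
function linesOnlyInFirst(first, second) {
  var result = [];
  for (var i = 0; i < first.length; i++) {
    var found = false;
    // Scan `second`, stopping early once a match is seen.
    for (var j = 0; j < second.length && !found; j++) {
      if (first[i] === second[j]) {
        found = true;
      }
    }
    // Only after the full scan do we know the line is missing.
    if (!found) {
      result.push(first[i]);
    }
  }
  return result;
}

// Lines not in both files = lines only in file 1 plus lines only in file 2.
function linesNotInBoth(file1, file2) {
  return linesOnlyInFirst(file1, file2).concat(linesOnlyInFirst(file2, file1));
}
```

For example, linesNotInBoth(["a", "b", "c"], ["b", "d"]) gives ["a", "c", "d"]. Every line is still checked against every line of the other file, so this has the same N^2 cost raised in the reply below and suits small files only.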
cprasley
04-11-2006, 03:36 PM
Is the order of the output significant? Your current method of matching every line in one file against every line in the other file is an N^2 problem and therefore won't scale very well to larger files. You might want to consider sorting both files first, and then lock-stepping your way through them using a less/equals/greater test. When you get an "equals" result then you advance both files to the next line, and when you get a "less"/"greater" result you print out the line that sorts lower and advance that file only.
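The lock-step walk can be sketched in JavaScript as below. It assumes both inputs are already sorted in the same order that the script's comparison uses (here the plain string `<`, i.e. character-code order); if the external sort collates differently from the script's comparison (for example, ignoring case), the walk can fall out of step. The name `sortedDiff` is hypothetical:

```javascript
// Hypothetical helper: walks two sorted arrays in lock-step and returns
// the lines that appear in only one of them.
// Assumes `a` and `b` are sorted in the same order that `<` uses below.
function sortedDiff(a, b) {
  var onlyA = [], onlyB = [];
  var i = 0, j = 0;
  while (i < a.length || j < b.length) {
    if (j >= b.length || (i < a.length && a[i] < b[j])) {
      // "less": a[i] sorts lower, so it cannot be in b; advance a only
      onlyA.push(a[i]);
      i++;
    } else if (i >= a.length || b[j] < a[i]) {
      // "greater": b[j] sorts lower, so it cannot be in a; advance b only
      onlyB.push(b[j]);
      j++;
    } else {
      // "equals": the line is in both files; advance both
      i++;
      j++;
    }
  }
  return { onlyA: onlyA, onlyB: onlyB };
}
```

Each file is read once, so the cost after sorting is proportional to the total number of lines. A line that occurs twice in one file and once in the other is reported once, as the unmatched extra copy.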
You would launch the two sorts using something like:
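Set WshShell = CreateObject("WScript.Shell")
' Each Exec runs TYPE <file> | SORT in its own cmd.exe; the sorted lines arrive on StdOut
Set File1 = WshShell.Exec("cmd /C " & Chr(34) & "TYPE " & Chr(34) & FileName1 & Chr(34) & " | SORT" & Chr(34))
Set File2 = WshShell.Exec("cmd /C " & Chr(34) & "TYPE " & Chr(34) & FileName2 & Chr(34) & " | SORT" & Chr(34))
DO UNTIL File1.StdOut.AtEndOfStream AND File2.StdOut.AtEndOfStream
   ' ...ReadLine from File1.StdOut / File2.StdOut and compare (less/equals/greater);
   ' each pass must read from at least one stream or the loop never ends
LOOP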
If order is significant then you should look at downloading the GNU utilities and using DIFF to compare the files.
acornejo
04-12-2006, 11:59 AM
Hi Prasley
Thanks for your answer. It did provide me some clues to have the script run faster.
However, I've discovered that the sort does not do what I expected. If the values are 1,21,12,5,10,3,2,99, then the "sorted" data is 1,10,12,2,21,3,5,99! It is sorting character by character, not by value.
I would expect to have the sorted data as 1,2,3,5,10,12,21,99
Is there any way to make SORT return the values ordered by value? The help for the SORT command doesn't mention any parameter for this.
Regards, and thanks again for your time
Alvaro
cprasley
04-29-2006, 10:32 PM
The sort provided by Microsoft does not have any command-line option to select value-based sorting. It treats each line as a pure ASCII string and sorts them alphabetically.
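The difference between the two orderings can be reproduced with the numbers from the previous post. JavaScript's default Array sort() also compares elements as strings, so it yields the same alphabetical order, while passing a comparison function gives numeric order. This only illustrates the two orderings; it says nothing about how SORT is implemented internally:

```javascript
// The values from the previous post, as they would be read from a text file.
var lines = ["1", "21", "12", "5", "10", "3", "2", "99"];

// Default sort: compares as strings, character by character.
var alphabetical = lines.slice().sort();

// Numeric sort: the comparison function converts each line to a number.
var numeric = lines.slice().sort(function (x, y) {
  return Number(x) - Number(y);
});

// alphabetical: 1,10,12,2,21,3,5,99
// numeric:      1,2,3,5,10,12,21,99
```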
I didn't know you needed to sort the text file by value, so I gave you a general implementation.
If you need the results sorted by value, there are a few different ways to tackle the problem. One would be to install a non-MS sort program, such as the one in the GNU utilities (which also include a DIFF program that would do most of what you're trying to accomplish anyway). Another would be to install the MS "Services for Unix" package, which provides a Unix sort program. A "nothing extra required" solution would be for your script to read the two data files and rewrite them to temporary files, sort the temporary files, and then do the lock-step walk through them. In that scenario you could force the numbers to become right-justified using something like var_out = RIGHT(SPACE(8) & var_in, 8), so that a two-digit number would have 6 leading spaces when it was written to the temp file. After the sort, you could strip the leading blanks away (LTRIM does this) when writing the results.
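The right-justify trick can be checked with a short JavaScript sketch. It assumes non-negative integers with no more digits than the pad width, and a string sort that places the space character before the digits (as character-code order does). `padLeft` and `sortByValueViaPadding` are hypothetical helpers; `padLeft` plays the role of the RIGHT(...) expression in the reply above:

```javascript
// Hypothetical helper: right-justifies a line to `width` characters with
// leading spaces, keeping the last `width` characters like VBScript's RIGHT.
function padLeft(s, width) {
  var spaces = "";
  for (var k = 0; k < width; k++) {
    spaces += " ";
  }
  var padded = spaces + s;
  return padded.substring(padded.length - width);
}

// Pad, sort as plain strings, then strip the leading blanks again.
function sortByValueViaPadding(lines, width) {
  var padded = [];
  for (var i = 0; i < lines.length; i++) {
    padded.push(padLeft(lines[i], width));
  }
  padded.sort(); // plain string sort: space sorts before "0".."9"
  var result = [];
  for (var j = 0; j < padded.length; j++) {
    result.push(padded[j].replace(/^ +/, ""));
  }
  return result;
}
```

With a width of 8, sortByValueViaPadding(["1", "21", "12", "5", "10", "3", "2", "99"], 8) returns the values in numeric order, 1 through 99. Because all padded lines have the same length, shorter numbers carry more leading spaces and therefore sort first.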
Depending on the size of your input files, it may be easier to simply read both files into memory and do the sort and compare operations entirely inside the script. For example, if you use the JScript array object then you can call the sort() method and pass it a function that does the comparison.
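A minimal sketch of the in-memory approach, assuming the lines of each file are already in arrays (for example, collected with ReadLine) and that every line holds a number. An object serves as a lookup table to find the lines missing from the other file, and sort() is given a comparison function so the report comes out in numeric order. The function name is hypothetical, and only ES3 features are used:

```javascript
// Hypothetical helper: returns the lines that are not in both arrays,
// sorted by numeric value. Assumes every line holds a number.
function numericLinesNotInBoth(file1, file2) {
  // Lookup table of the lines in one file. The "k" prefix keeps lines such
  // as "hasOwnProperty" from overwriting built-in object properties.
  function toLookup(lines) {
    var seen = {};
    for (var i = 0; i < lines.length; i++) {
      seen["k" + lines[i]] = true;
    }
    return seen;
  }
  var in1 = toLookup(file1), in2 = toLookup(file2);

  var result = [];
  for (var i = 0; i < file1.length; i++) {
    if (!in2.hasOwnProperty("k" + file1[i])) result.push(file1[i]);
  }
  for (var j = 0; j < file2.length; j++) {
    if (!in1.hasOwnProperty("k" + file2[j])) result.push(file2[j]);
  }

  // The comparison function makes sort() order by value, not by characters.
  result.sort(function (x, y) {
    return Number(x) - Number(y);
  });
  return result;
}
```

For example, numericLinesNotInBoth(["1", "21", "12", "5"], ["12", "10", "3", "1"]) returns ["3", "5", "10", "21"]. The lookup table avoids the every-line-against-every-line scan, at the price of holding both files in memory.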