Re^7: Sort alphabetically from file

by james28909 (Deacon)
on Jun 18, 2019 at 00:42 UTC


in reply to Re^6: Sort alphabetically from file
in thread Sort alphabetically from file

No, shift removes the first element of @ARGV on each call, returning the element it removed.

so shift physically removes the entry in @ARGV? this is interesting and i did not realize that. also thanks for pointing out the problem there, i actually did some reading on matching myself, and was trying some new stuff and forgot to change it back to how it was, so i will go back and change that right now. thanks.

the provided sample data was not '\t'. as far as i can tell it was not tab separated. it was a double space. there is no reason under the sun (that i can think of) to have two white spaces between data, actually spaces between data is kind of flawed in itself really, you're better off using a comma or some other separator that is not normally used. anyways, the same regex would match both cases as well. if you want to pick that apart then you'll have a blast if you go further down in the comments below and find the person who matched two words in different columns. that would throw off your data quicker than removing a white space.

EDIT: you also took what i said out of context for the most part, what i said was: As long as that does not corrupt your data set it should be fine (and i am sure it is fine)

Replies are listed 'Best First'.
Re^8: Sort alphabetically from file
by hippo (Chancellor) on Jun 18, 2019 at 09:03 UTC
    so shift physically removes the entry in @ARGV? this is interesting and i did not realize that

    Indeed it does:

    $ cat shifter.pl
    #!/usr/bin/env perl
    use strict;
    use warnings;

    do {
        print "ARGV is @ARGV\n";
        print "Shifting " . shift . " off ...\n";
    } until $#ARGV < 0;
    $ ./shifter.pl x y z
    ARGV is x y z
    Shifting x off ...
    ARGV is y z
    Shifting y off ...
    ARGV is z
    Shifting z off ...
    $

    See perldoc -f shift for more.
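
    As a small complementary sketch (not part of the original reply): a bare shift only defaults to @ARGV at file scope. Inside a subroutine it defaults to @_ instead. The sub name first_arg is hypothetical, and @ARGV is assigned by hand here only so the sketch runs without command-line arguments.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Assumption: we fill @ARGV ourselves so the example needs no arguments.
@ARGV = qw(x y z);

# At file scope, a bare shift is the same as shift @ARGV.
my $first = shift;
print "Got '$first', ARGV is now '@ARGV'\n";    # Got 'x', ARGV is now 'y z'

# Inside a subroutine, a bare shift is the same as shift @_.
sub first_arg {
    my $arg = shift;
    return $arg;
}

print "first_arg returned '", first_arg('a', 'b'), "'\n";    # first_arg returned 'a'
print "ARGV is still '@ARGV'\n";                              # ARGV is still 'y z'
```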

Re^8: Sort alphabetically from file
by haukex (Chancellor) on Jun 19, 2019 at 19:40 UTC
    the provided sample data was not '\t'. as far as i can tell it was not tab separated. it was a double space. there is no reason under the sun (that i can think of) to have two white spaces between data

    Well, even if you can't think of a reason, the OP's file format appears to use it ;-) The main point here is this: we don't know the OP's real file format. Depending on where the source code was copied-and-pasted from, tabs could have been converted to spaces. The data is so simplistic that it most likely isn't the real data the OP is working with. And if it is, then it's most likely a homework assignment, and if I was an instructor, in my next assignment I might specifically design my input file format to allow for single whitespace characters in a column and require two or more whitespace characters between columns, just to teach people about how to handle strange situations like that. People tend to get pretty creative in their file formats.
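
    A minimal sketch of how such a format might be handled, assuming columns are separated by either a single tab or a run of two or more spaces, while a single space is allowed inside a column. The helper name split_columns and the sample line are made up for illustration; they are not the OP's data.

```perl
use strict;
use warnings;

# Hypothetical helper. ASSUMPTION: a column separator is one tab
# or two or more spaces; a single space stays inside its column.
sub split_columns {
    my ($line) = @_;
    chomp $line;
    return split /\t| {2,}/, $line;
}

my @cols = split_columns("New York  United States\n");
print join('|', @cols), "\n";    # New York|United States
```

    The same call would also split a tab-separated line, which matters here because we cannot tell from the post whether tabs were converted to spaces on the way.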

    spaces between data is kind of flawed in itself really, you're better off using a comma or some other separator that is not normally used

    I absolutely agree!

    you also took what i said out of context for the most part, what i said was: As long as that does not corrupt your data set it should be fine (and i am sure it is fine)

    If it had just been the first part of the sentence, without the part in parentheses, then I think it would have been a great way to word it. But the part in parentheses expresses a level of certainty that we just can't have. Even re-reading the sentence now, I don't see another way to understand the wording of that sentence; if I'm mistaken, please feel free to explain what you meant. I quoted that part because that's what I was objecting to, plus a little more so the quote would make more sense. And if someone was missing context, your post is still there :-) I've updated my post though.

    To put it a different way: It sounded like you were saying not to worry about it, but not thinking about these kinds of issues is what contributes to people designing some "strange" file formats :-)

    In such cases I find it better to ask the OP to be specific about their file format (providing a hex dump if necessary), to design a solution as robust and defensively coded as possible based on the data given (i.e. it rejects data that isn't exactly like the sample data), and/or to provide a solution but explain all of the assumptions and limitations.
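
    One way the "reject anything that isn't exactly like the sample" idea could look, as a rough sketch only. It assumes a made-up format of exactly two columns of word characters separated by exactly two spaces; the names hex_dump and parse_line are hypothetical, and the error message includes a hex dump so that a tab (09) can be told apart from a space (20).

```perl
use strict;
use warnings;

# Hypothetical helper: render each character of a string as its hex code.
sub hex_dump {
    my ($str) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $str;
}

# Hypothetical strict parser. ASSUMPTION: exactly two columns of word
# characters separated by exactly two spaces; anything else is rejected.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    $line =~ /\A(\w+) {2}(\w+)\z/
        or die "Unexpected line format, hex: " . hex_dump($line) . "\n";
    return ($1, $2);
}

my ($left, $right) = parse_line("apple  fruit\n");
print "$left, $right\n";    # apple, fruit

# A tab-separated line is rejected instead of being silently accepted.
my $ok = eval { parse_line("apple\tfruit\n"); 1 };
print "Rejected: $@" unless $ok;
```

    Dying on the first surprise is deliberately strict: it forces the question about the real file format to be answered before any data is rewritten.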

    By the way, it looks like you've edited your post without mentioning the edit. Please see "How do I change/delete my post?", in particular "It is uncool to update a node in a way that renders replies confusing or meaningless".

      I am not sure why y'all are jumping my ass over this. you know, i know, he knows, she knows, they know, everyone knows that the supplied data is 99.999% NOT data they are actually parsing. In other words, the data put up in the OP was spoof data, to show us what their file is like. the supplied while loop and search/match case would work for either file format tbh.

      also i did go back and edit, because i was asked to, and noted to that person that i did edit the post. i will make sure to go back and re-edit it for the sake of the node's cleanliness.

      i worded the sentence "As long as that does not corrupt your data set it should be fine (and i am sure it is fine)" like that because i wanted them to know that it /could/ corrupt their data set, but as long as it stayed like that then the code i posted would work for the original format, would clean it up, and would work for future data sets that used said code. It is not up to me to make sure the person copy/pastes the right code. it is, however, up to me to write some code that can at least help or give an idea of what needs done. all they need to do is add a space in the print statement to print it back to a file exactly as intended.

      EDITED: removed last two sentences, also the other post has been updated. if there is anything else you would like me to do, don't hesitate.

      EDITED: had to put a space in between "last" and "two"

        I am not sure why y'all are jumping my ass over this.

        Sorry, but my initial comment was two brief sentences, and your response was much longer and you seemed to object, so I explained where I was coming from.

        In other words, the data put up in the OP was spoof data, to show us what their file is like.

        Unfortunately, I've seen it happen too many times that a wisdom seeker will forget some aspect of their input file format, someone will write some code for the sample data, and the OP will come back with "oh wait, actually this doesn't work for my real data, because my real data actually looks like this ...".

        it /could/ corrupt their data set, but as long as it stayed like that then the code i posted would work for the original format

        When you put it like that, it is clearer, thanks.

        I mentioned the edit because significant unmarked edits mean I can't be certain whether other parts of the post might have been edited too, and so it's harder to go back and double-check whether I maybe overlooked or misunderstood something. Thanks for marking the edits.

