Beefy Boxes and Bandwidth Generously Provided by pair Networks
more useful options
 
PerlMonks  

Uniquely identifying each & every html template

by Nik
on Jan 21, 2013 at 12:25 UTC ( #1014433=perlquestion: print w/ replies, xml ) Need Help??
Nik has asked for the wisdom of the Perl Monks concerning the following question:

I use this .htaccess file to rewrite every .html request to counter.py
# ==================================================================== +================================ ============= RewriteEngine On RewriteCond %{REQUEST_FILENAME} -f RewriteRule ^/?(.+\.html) /cgi-bin/counter.py?htmlpage=$1 [L,PT,QSA] # ==================================================================== +================================ =============
counter.py script is created for creating, storing, increasing, displaying a counter for each webpage for every website i have.
It's supposed to identify each webpage by a and then do it's database stuff from there
# ==================================================================== +================================ ============= # open current html template and get the page ID number # ==================================================================== +================================ ============= f = open( '/home/nikos/public_html/' + page ) # read first line of the file firstline = f.readline() # find the ID of the file and store it pin = re.match( r'<!-- (\d+) -->', firstline ).group(1) # ==================================================================== +================================ =============
It works as expected and you can see it works normally by viewing: http//superhost.gr (bottom down its the counter)
What is the problem you ask?!
Problem is that i have to insert at the very first line of every .html template of mine, a unique string containing a number like:

index.html somefile.html other.html nikos.html cool.html to HELP counter.py identify each webpage at a unique way.

Well.... its about 1000 .html files inside my DocumentRoot and i cannot edit ALL of them of course! Some of them created by Notepad++, some with the use of Dreamweaver and some others with Joomla CMS Even if i could embed a number to every html page, it would have been a very tedious task, and what if a change was in order? Edit them ALL back again? Of course not.

My question is HOW am i suppose to identify each and every html webpage i have, without the need of editing and embedding a string containing a number for them. In other words by not altering their contents. ============================================== BEFORE you answer me consider the following specification please: An .html page must retain its database counter value even if its: (renamed && moved && contents altered) ============================================== original attributes of the file:

filename: index.html
filepath: /home/nikos/public_html/
contents: <html> Hello </html>

get modified to:

filename: index2.html
filepath: /home/nikos/public_html/folder/subfolder/
contents: <html> Hello, people </html>

The file is still the same, even though its attributes got modified. We want counter.py script to still be able to "identify" the .html page, hence its counter value, in order to allow it to increase properly.

Thank you ALL in advance for your help.

p.s. The above i'am trying to implement with Python, so sorry for not being a Perl question, BUT, i need your ideas on which method solves this specific problem. Later on, it this will be a coding issue, for now its just a specification issue. Thank you all in advance.

Comment on Uniquely identifying each & every html template
Select or Download Code
Re: Uniquely identifying each & every html template
by Anonymous Monk on Jan 21, 2013 at 13:17 UTC
    Not perl, not a perl question, not a python question. not a Mod_rewrite forum ask elsewhere.
      This is a "specification" problem that i need your ideas for to find a working solution.

      I then can be implemented in Perl or Python. BUT for the time being this is on theoretical level only. Please help.
Re: Uniquely identifying each & every html template
by Arunbear (Parson) on Jan 21, 2013 at 13:33 UTC
    Maybe ... create a database table containing the name of each file along with an auto increment primary key id, and use that id in each file to identify it. Updating the files to add these id's could be done with a script of course.

    If you rename a file, you'd have to update this table as well, but the id will be the same.

Re: Uniquely identifying each & every html template
by BrowserUk (Pope) on Jan 21, 2013 at 13:38 UTC
    My question is HOW am i suppose to identify each and every html webpage i have, without the need of editing and embedding a string containing a number for them.

    The full pathname of each file is already a unique identifier, why do you need a number also?

    Converting the unique name to a unique number can be done in any number of ways:

    • Put a list of all the paths into a file and then number them;
    • On *nix, the node id should be unique;
    • md5 the path;
    • crc64 the path;
    • A custom hashing function:
      sub id{ my $n = 0; $n = ( $n << 1 ^ $_ ) for unpack 'Q(X7Q)*', $_[0]; +$n };; print id( $_ ), ' : ', $_ for qw[ /home/nikos/public_html/folder/subfolder/index2.html /home/nikos/public_html/folder/subfoldre/index2.html /homa/nikos/public_html/folder/subfolder/index2.html /home/nikos/public_html/folder/subfoldre/index.html /home/nikos/public_html/folder/subfolder/index.html ];; 14395874431259872676 : /home/nikos/public_html/folder/subfolder/inde +x2.html 18362554206005274164 : /home/nikos/public_html/folder/subfoldre/inde +x2.html 14252040718160727460 : /homa/nikos/public_html/folder/subfolder/inde +x2.html 534634562378736008 : /home/nikos/public_html/folder/subfoldre/index. +html 2010048469067125824 : /home/nikos/public_html/folder/subfolder/index +.html

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Why there is a need to Convert the unique filename to a unique number? How is this gonna help?
      The hash is a mathematical algorithm of producing a string based on a contents file BUT in my case i stated that:

      the .html file can be:

      1. renamed
      2. moved
      3. contents altered (hash is based on files contents, so that will fail)

      We need some other attribute to identify a file because relying on the above attributes will only produce DOUBLE counters for the modifies file.
        Why there is a need to Convert the unique filename to a unique number? How is this gonna help?

        Why are you asking me the same question as I asked you?

        The hash is a mathematical algorithm of producing a string based on a contents file

        I showed the hash operating on the file path, not its content.

        the .html file can be: 1. renamed 2. moved 3. contents altered

        Thus, you want to be able to consider two files with different names, different locations and different contents as "the same file".

        Without you arranging to place some piece of information within those files that can be searched for, to uniquely identify them; those criteria mean that any two files could be considered the same, which is a nonsense; which is why I ignored the possibility that you actually meant that; and assumed your description was lacking precision.

        However, inserting a piece of information -- say a custom html-like tag or html comment -- into each html file -- regardless of whether the are 100's 1000's or 100s of 1000s would be the works of a few minutes. At least it would be for Perl.


        With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

Log In?
Username:
Password:

What's my password?
Create A New User
Node Status?
node history
Node Type: perlquestion [id://1014433]
help
Chatterbox?
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others musing on the Monastery: (5)
As of 2014-12-27 03:43 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    Is guessing a good strategy for surviving in the IT business?





    Results (176 votes), past polls