Monday, December 09, 2013

Easy way to remove the protocol from URLs

A very easy way to remove http:// and https:// from URLs stored in a file.

Let's assume you have a file (allURLS.txt) with one URL per line, and you want to remove the leading http:// or https:// and store the result in the file cleanedUrls.txt.

Here is a very easy way:

grep "^http://" allURLS.txt | cut -b 1-7 --complement > cleanedUrls.txt

grep "^https://" allURLS.txt | cut -b 1-8 --complement >> cleanedUrls.txt


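cut -b i-j selects bytes i through j of each line; with the --complement option (a GNU cut extension) it outputs everything except those bytes. Since http:// is 7 bytes long and https:// is 8, each command drops exactly the protocol prefix.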
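
As an alternative to running one command per protocol, a single sed substitution can handle both prefixes in one pass and keep the lines in their original order. This is only a sketch, not the approach described above: sample_urls.txt and sample_cleaned.txt are hypothetical file names used so the example runs on its own. Unlike the grep approach, lines that do not start with either prefix pass through unchanged instead of being dropped.

```shell
# Hypothetical sample input; in practice you would use your own file.
printf '%s\n' \
  'http://example.com/a' \
  'https://example.org/b' \
  'ftp://example.net/c' > sample_urls.txt

# Strip a leading http:// or https:// from every line in one pass.
# -E enables extended regular expressions, so "s?" means "an optional s".
# '#' is used as the s-command delimiter to avoid escaping the slashes.
sed -E 's#^https?://##' sample_urls.txt > sample_cleaned.txt

cat sample_cleaned.txt
```

The ^ anchor keeps the substitution from touching an "http://" that appears later in a line, for example inside a query string.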


2 comments:

  1. What you're saying is completely true. I know that everybody must say the same thing, but I just think that you put it in a way that everyone can understand. I'm sure you'll reach so many people with what you've got to say.

  2. Nice post, things explained in detail. Thank you.

