Monday, December 09, 2013

Easy way to remove the protocol from URLs

A very easy way to remove http:// and https:// from URLs stored in a file.

Let's assume you have a file (allURLS.txt) with one URL per line, and you want to remove the leading http:// or https:// from each one and store the result in the file cleanedUrls.txt.

Here is a very easy way:

grep "^http://" allURLS.txt | cut -b 1-7 --complement > cleanedUrls.txt

grep "^https://" allURLS.txt | cut -b 1-8 --complement >> cleanedUrls.txt


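cut -b i-j selects bytes i through j of each line; with the --complement option (available in GNU cut) it outputs everything else instead. "http://" is 7 bytes long and "https://" is 8, hence the ranges 1-7 and 1-8.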
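
As an alternative sketch, not the method used above, the same cleanup can probably be done in a single pass with sed. It handles both protocols at once, keeps the lines in their original order, and leaves lines without a protocol untouched. The sample URLs below are made up for illustration, and the -E flag assumes a sed that supports extended regular expressions (GNU and BSD sed both do).

```shell
# Sample input (hypothetical URLs, for illustration only).
printf '%s\n' 'http://example.com/a' 'https://example.org/b' 'http://example.net' > allURLS.txt

# Strip a leading http:// or https:// from every line in one pass.
# -E enables extended regular expressions, so "s?" means "an optional s".
# '#' is used as the s-command delimiter to avoid escaping the slashes.
sed -E 's#^https?://##' allURLS.txt > cleanedUrls.txt

# Show the result.
cat cleanedUrls.txt
```

Because the pattern is anchored with ^, only a protocol at the very start of a line is removed; an http:// that appears later in a URL (for example inside a query string) is left alone.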