Untar and gunzip all files in a directory

I needed a little script to expand a bunch of tarred and gzipped files in a directory and I needed a log of what happened.. and didn’t happen (errors). So, here it is.


#!/bin/bash
 # jcz 16-mar-12
 # filename: untgzall.sh
 # untars (with gzip) all tar gzips in the directory
 # where it is run. Logs all file contents
 # and errors to a text file in the directory
 # in which it is run
 #
 ##################################

############################
 # Global script variables block
 ############################
 # Date and other variables pretty self explanatory, S is seconds
 # date format is currently YYYYMMDD_HHMMSS
 dater=$(date)
 dayer=$(date +%a%F%H%m)
 namer=$(whoami)
 hoster=$(hostname)
 directory=$(pwd)
 filenamer=$(date +%a_%F_%H_%M_%S)_untgzlog
 # sets day of the week
 set $(date)
 logger=$filenamer.txt
 ############################
 # END Global script variables block
 ############################
 echo "Welcome, $namer. I'm running in $directory and I will expand all tarred and gzipped files to here."
 echo ""
 echo "I see the following files to expand. I will write them down for you now"
 ls *.tar.gz 2> deleteme.txt
 ls *.tgz 2> deleteme.txt
 echo ""
 echo "Please review the file $filenamer in this folder when I'm done."
 echo ""
 echo ""
 echo "************* RUNNING ****************"

echo "[START]" >>$logger
 echo "" >>$logger
 echo "" >>$logger
 echo "********** START RUN LOG HEADER ***************" >> $logger
 echo "Dater:" $dater >> $logger
 echo "Username:" $namer >> $logger
 echo "Computer:" $hoster >> $logger
 echo "Directory:" $directory >> $logger
 echo "" >>$logger
 echo "********** END RUN LOG HEADER ***************" >> $logger
 echo "" >>$logger
 echo "" >>$logger
 echo "" >>$logger

# The & characters after the commands log all output (stdout and stderr) to the log file
 echo "I see the following .tar.gz files to expand. I will write them down for you now" >> $logger
 ls -lh *.tar.gz &>> $logger
 echo "" >>$logger
 echo "" >>$logger
 echo "" >>$logger

for zp in *.tar.gz
 do
 echo "---- START $zp ARCHIVE INFO ----" >> $logger
 stat $zp &>> $logger
 echo "---- END $zp ARCHIVE INFO ----" >> $logger
 echo "" >>$logger
 echo "~~~~~~~~~~~~~ START FILES IN ARCHIVE $zp ~~~~~~~~~~~" >> $logger
 echo "Expanding: $zp"
 tar xzvf $zp &>> $logger
 echo "~~~~~~~~~~~~~ END FILES IN ARCHIVE $zp ~~~~~~~~~~~" >> $logger
 echo "" >>$logger
 echo "" >>$logger
 echo "" >>$logger
 done
 echo "" >>$logger
 echo "" >>$logger
 echo "" >>$logger
 echo "I see the following .tgz files to expand. I will write them down for you now" >> $logger
 ls -lh *.tgz &>> $logger
 echo "" >>$logger
 echo "" >>$logger
 echo "" >>$logger

for tz in *.tgz
 do
 echo "---- START $tz ARCHIVE INFO ----" >> $logger
 stat $tz &>> $logger
 echo "---- END $tz ARCHIVE INFO ----" >> $logger
 echo "" >>$logger
 echo "~~~~~~~~~~~~~ START FILES IN ARCHIVE $tz ~~~~~~~~~~~" >> $logger
 echo "Expanding: $tz"
 tar xzvf $tz &>> $logger
 echo "~~~~~~~~~~~~~ END FILES IN ARCHIVE $tz ~~~~~~~~~~~" >> $logger
 echo "" >>$logger
 echo "" >>$logger
 echo "" >>$logger
 done

echo "[END]" >>$logger
 echo "" >>$logger
 unix2dos $filenamer.txt
 echo ""
 echo ""
 echo "************* DONE ****************"

Which gives us the following output at the command line

</pre>
$ untgzall.sh
 Welcome, john.zastrow. I'm running in /home/john.zastrow and I will expand all tarred and gzipped files to here.

I see the following files to expand. I will write them down for you now
 test.tar.gz
 deleteme.tgz test.tgz

Please review the file Tue_2012-04-10_21_47_05_untgzlog in this folder when I'm done.
 ************* RUNNING ****************
 Expanding: test.tar.gz
 Expanding: deleteme.tgz
 Expanding: test.tgz
 unix2dos: converting file Tue_2012-04-10_21_47_05_untgzlog.txt to DOS format ...
 ************* DONE ****************

and the following output in the file called Tue_2012-04-10_21_47_05_untgzlog.txt. Notice there file that errored out.

</pre>
[START]
 ********** START RUN LOG HEADER ***************
 Dater: Tue, Apr 10, 2012 9:47:04 PM
 Username: john.zastrow
 Computer: DIVL-GY4K3R1
 Directory: /home/john.zastrow

********** END RUN LOG HEADER ***************

I see the following .tar.gz files to expand. I will write them down for you now
 -rw-r--r-- 1 john.zastrow Domain Users 11M Mar 13 04:00 test.tar.gz

---- START test.tar.gz ARCHIVE INFO ----
 File: `test.tar.gz'
 Size: 10862048 Blocks: 10608 IO Block: 65536 regular file
 Device: 92b3f5b8h/2461267384d Inode: 804736958415954494 Links: 1
 Access: (0644/-rw-r--r--) Uid: (57187/john.zastrow) Gid: (10513/Domain Users)
 Access: 2012-03-13 13:10:00.000000000 -0400
 Modify: 2012-03-13 04:00:45.000000000 -0400
 Change: 2012-04-10 21:46:01.572102800 -0400
 Birth: 2012-03-13 13:09:44.433265300 -0400
 ---- END test.tar.gz ARCHIVE INFO ----

~~~~~~~~~~~~~ START FILES IN ARCHIVE test.tar.gz ~~~~~~~~~~~
 tar: This does not look like a tar archive
 tar: Skipping to next header
 tar: Exiting with failure status due to previous errors
 ~~~~~~~~~~~~~ END FILES IN ARCHIVE test.tar.gz ~~~~~~~~~~~
 I see the following .tgz files to expand. I will write them down for you now
 -rw-r--r-- 1 john.zastrow Domain Users 51 Apr 10 21:46 deleteme.tgz
 -rw-r--r-- 1 john.zastrow Domain Users 189 Apr 10 21:45 test.tgz

---- START deleteme.tgz ARCHIVE INFO ----
 File: `deleteme.tgz'
 Size: 51 Blocks: 1 IO Block: 65536 regular file
 Device: 92b3f5b8h/2461267384d Inode: 3659174697428854 Links: 1
 Access: (0644/-rw-r--r--) Uid: (57187/john.zastrow) Gid: (10513/Domain Users)
 Access: 2012-04-10 21:46:56.546599300 -0400
 Modify: 2012-04-10 21:46:56.562199400 -0400
 Change: 2012-04-10 21:46:56.562199400 -0400
 Birth: 2012-04-10 21:46:56.546599300 -0400
 ---- END deleteme.tgz ARCHIVE INFO ----

~~~~~~~~~~~~~ START FILES IN ARCHIVE deleteme.tgz ~~~~~~~~~~~

gzip: stdin: not in gzip format
 tar: Child returned status 1
 tar: Error is not recoverable: exiting now
 ~~~~~~~~~~~~~ END FILES IN ARCHIVE deleteme.tgz ~~~~~~~~~~~

---- START test.tgz ARCHIVE INFO ----
 File: `test.tgz'
 Size: 189 Blocks: 1 IO Block: 65536 regular file
 Device: 92b3f5b8h/2461267384d Inode: 4785074604212675 Links: 1
 Access: (0644/-rw-r--r--) Uid: (57187/john.zastrow) Gid: (10513/Domain Users)
 Access: 2012-04-10 21:45:03.602401000 -0400
 Modify: 2012-04-10 21:45:03.680401100 -0400
 Change: 2012-04-10 21:45:03.680401100 -0400
 Birth: 2012-04-10 21:45:03.602401000 -0400
 ---- END test.tgz ARCHIVE INFO ----

~~~~~~~~~~~~~ START FILES IN ARCHIVE test.tgz ~~~~~~~~~~~
 deleteme.txt
 ~~~~~~~~~~~~~ END FILES IN ARCHIVE test.tgz ~~~~~~~~~~~

[END]

Spatialite and Spatial Indexes

This post is part of a series [1][2][3][4]

Tonight I continued dabbling with my little project and experimenting with spatial indexes in Spatialite. I quickly realized that while using indexes benefitted the queries, the questions were too easy and the queries were finishing very quickly regardless of using indexes or not (nice problem to have). Therefore, the benefits of using the indexes were being swamped out by little errors in timings.

So, I made a bigger dataset. As shown below, the number of features I’m testing against went from 183 to 2429. You’d think that would be enough, but stay tuned…

Bigger sheds

A bigger, badder testing dataset. Note the previous data set at 183 features (purple) and the much larger count of features in the new, blue polys.

So, armed with this larger dataset I proceeded to test sensitivity to using indexes.

I dinked around with trying to write own my R-tree index queries (the “some index” examples below) and they helped. Then Sandro (author of spatialite) replied to a post I made seeking guidance. There he re-wrote my query (best examples below)and suggested I read about a new, easier way to use spatial indexes here http://www.gaia-gis.it/gaia-sins/SpatialIndex-Update.pdf

The Original Query

So here is the original query.

[cc lang='sql' ]

SELECT HU_12_NAME FROM huc12
WHERE ST_Contains(Geometry, MakePoint(-70.250,43.802)) = 1

[/cc]

Faced with 2429 polygons, the original query takes about 1.9 seconds on average. My goal is to get results back from anywhere in the country in 1 second or less. Clearly this type of query isn’t going to cut it.

Timings without index

Timings without index

I’m also including the explain plans here. Obviously without an index the query has to do a full scan of the table to figure out which records it needs. Scans are bad. Scans of many records are very bad.

Explain plan with no index

Explain plan with no index

A Better Query

Ok, so to scan less you need to do some filtering based on something to reduce the records you’re talking to. In this case, we use the bboxes from the index tables to grab a more limited set of features to actually perform the more intensive ST_Contains test with just to make sure the point ACTUALLY falls within the set of polygons the BBOX suggests.

[cc lang='sql' ]

SELECT HU_12_NAME FROM huc12
WHERE ST_Contains(Geometry, MakePoint(-70.250,43.802))
AND
ROWID IN (
SELECT pkid from idx_huc12_Geometry where xmin < -69 and xmax > -71)

[/cc]

OK, now I was getting ~1.1 seconds pretty reliably. Yep, that’s detectable change between the two queries. So, we have enough records to see the benefits of the improved queries. This was an improvement, but not enough.

Timings with some index

Timings with some index

The explain plan gets better. We see the filtering and while we’re scanning, it’s more manageable.

Explain plan with some index

Explain plan with some index

 Best Query

So then came Sandro’s reply. In it he guided me to this type of query.. which rocks.

[cc lang='sql' ]

SELECT HU_12_NAME FROM huc12
WHERE ST_Contains(Geometry, MakePoint(-70.250,43.802)) = 1
AND ROWID IN (
SELECT ROWID
FROM SpatialIndex
WHERE f_table_name = ‘huc12′
AND search_frame = MakePoint(-70.250,43.802));

[/cc]

And here we see true magic. The original query with no indexes ran for 1.9 seconds. Some index sped things up to 1.1. Now, hold on to your socks, Sandro’s more proper query comes in at 0.16 seconds. That kind of performance really gives me hope.

Timings with some index

Timings with some index

We don’t learn too much more from the explain plan. But hey, who needs to ask more questions of that kind of speed up.

Explain plan with optimal virtual index

Explain plan with optimal virtual index

That was fun. What does it do when stuck into PHP?

First let’s just do the same old “no index” version.

PHP page with more features and no index in query

PHP page with more features and no index in query

About 2.5 seconds on average. No surprises there.

How about with some spatial index.

PHP page with more features and some spatial index in query

PHP page with more features and some spatial index in query

 

About 1.8 seconds on average. Shaved quite a bit off, but we’re still way over the 1 second goal. How about with Sandro’s query?

PHP page with more features and best spatial index in query

PHP page with more features and best spatial index in query

Cool. We’re below the 1 second mark. I even saw a 0.7 second timing every so often. But, we’re very close to 1 second, so I’m starting to sweat a little. But wait, there’s more. I’ve still got those little filler queries happening on the page, and one of them is a bulky SELECT DISTINCT which is going to scan my larger dataset now. So, what happens when we get rid the crud?

PHP page with more features, best spatial index, and extra queries removed

PHP page with more features, best spatial index, and extra queries removed

Yes! At 0.26 seconds we’re way below 1 second again. Phew! What about the original query? What happens when we remove the cruft from that page but still use the un-optimized query?

PHP page with no index, but extra queries removed

PHP page with no index, but extra queries removed

1.9 seconds. Some improvement from the first page, but not where we need it. No worries, we’re moving on!

Example with PHP and Spatialite, part 2

This post is part of a series [1][2][3][4]

So I’m ready to take the next steps with my little project. This is a continuation of my previous post about my little journey. At this point I am connecting to a physical database file that I loaded with some sample data (12-digit watersheds). Now I’m going to practice with queries and you can see the results below.

Here are the base data.

12-digit US watersheds (HUC12) and the example data set used here. Found in Southern Maine, Cumberland

And here are some close ups of the data. These are fairly dense polygons.

Example of polygon vertices

Example of polygon vertices

In fact, it looks like this query is testing the relationship between the point and polygons formed by 144,700 coordinate pairs (vertices) by scanning without the help of an index.

Lots of little points to check

Lots of little points to check

At this point I’m just going to perform basic queries without using spatial indexes. You will almost always want to use spatial indexes, but I’m going to practice this in phases so these examples won’t use indexes.

Note that unlike tradition database indexes, spatial databases like Spatialite and PostGIS (and their GiST/R-tree indexes) do not use indexes for spatial queries unless you explicitly tell them to (though PostGIS seems to use them by default some of the time). You must smartly craft the use of indexes the same way that you do the SQL itself… or at least it seems this way to me.

In Spatialite, the indexes are just tables and you have to add subqueries to your query to grab the bounding rectangles from the Rtrees to pre-filter your queries for the index-driven speed-up.

And here are the spatial indexes that spatialite sees.

spatialite_indexes

Spatial indexes used in spatialite

So what’s in these indexes? Boxes…as we see below. Hopefully you can imagine how we get boxes from Xmin, Ymin, Xmax, Ymax extents. There is one box for each polygon HUC12 feature (note the PK_UID is the primary key of the main geometry layer). These simple boxes are much simpler to test for spatial relationships that the multitude of vertices we saw above. But also much less accurate; especially for funny shaped things like watersheds. But, we can use these simple boxes to pre-filter the number of features that have to be tested by the more accurate (but lengthy) spatial test – hence speeding up the overall operation in many cases.

What is in a name... or an Rtree index.

What is in a name... or an Rtree index.

 

Below is an example of the spatial query used in the code below. Translated, it says, “show me the name of the HUC12 that contains this point.”

The free gui provided by spatialite and a spatial query

The free gui provided by spatialite and a spatial query

Here are the files in the project so far. Of course you’re not normally going to be putting a  loadable extension library (libspatialite.so) in a web server file directory. But, this is just practice.

Files so far for this project

Files so far for this project

Here’s the code of db.php. This isn’t using spatial indexes, so it’s scanning 183 features and a whole bunch of vertices to figure out which polygon actually contains that point… and doing a couple simpler things like opening a connection, asking some simple questions, and closing the connection all in about 0.4 seconds.

[cc lang='php' ]

<html>
<head>
<title>Testing SpatiaLite on PHP</title>
</head>
<body>
<h1>Testing SpatiaLite on PHP</h1>

<?php
// Start TIMER
// ———–
$stimer = explode( ‘ ‘, microtime() );
$stimer = $stimer[1] + $stimer[0];
// ———–
try {
/*** connect to SQLite database ***/
$db = new SQLite3(‘db.sqlite’);

/*** a little message to say we did it ***/
echo ‘database connected’;
}
catch(PDOException $e)
{
echo $e->getMessage();
}
# loading SpatiaLite as an extension
$db->loadExtension(‘libspatialite.so’);

# reporting some version info
$rs = $db->query(‘SELECT sqlite_version()’);
while ($row = $rs->fetchArray())
{
print “<h3>SQLite version: $row[0]</h3>”;
}
$rs = $db->query(‘SELECT spatialite_version()’);
while ($row = $rs->fetchArray())
{
print “<h3>SpatiaLite version: $row[0]</h3>”;
}

/* SELECT HU_12_NAME FROM huc12 WHERE ST_Contains(Geometry, MakePoint(-70.250,43.802));
*/
/*
* Create a query
*/
$sql = “SELECT DISTINCT Count(*), ST_GeometryType(Geometry), “;
$sql .= “ST_Srid(Geometry) FROM huc12″;
$rs = $db->query($sql);
while ($row = $rs->fetchArray())
{
# read the result set
$msg = “There are “;
$msg .= $row[0];
$msg .= ” entities of type “;
$msg .= $row[1];
$msg .= ” SRID=”;
$msg .= $row[2];
print “<h3>$msg</h3>”;
}

$sql = “SELECT HU_12_NAME FROM huc12 WHERE ST_Contains(Geometry, MakePoint(-70.250,43.802))”;
$rs = $db->query($sql);
while ($row = $rs->fetchArray())
{
# read the result set
$msg = “Your point is in the HUC12: “;
$msg .= $row[0];
print “<h3>$msg</h3>”;
}
/*
* do not forget to release all handles !
*/

# closing the DB connection
$db->close();

// End TIMER
// ———
$etimer = explode( ‘ ‘, microtime() );
$etimer = $etimer[1] + $etimer[0];
echo ‘<p style=”margin:auto; text-align:center”>’;
printf( “Script timer: <b>%f</b> seconds.”, ($etimer-$stimer) );
echo ‘</p>’;
// ———

?>

</body>
</html>

[/cc]

 

Not too bad, but I want this faster because I want to feed it much larger data in my final project.

Results of the first try at this

Results of the first try at this

Example with PHP and Spatialite, part 1

This post is part of a series [1][2][3][4]

So, PHP supports SQLite out of the box (http://www.sqlite.org/, now at 3.7.10), making it a nice combo if you want to do some reads from your page. My impression is that SQLite is not recommended if you want stuff with database writes and you have more than a couple visitors. But reading seems to be fine.

I think this fits my use cases just fine as I just want to hang out some very basic utility services that can run on the “single beige box in the corner” or the “beige cloud in the sky” with few resources needed. First, I want to create a simple REST service that, when passed a pair of long/lat coordinates, will do nothing but return that name of the county they are in. Then I’ll do one for watershed identifiers (USDA WBD HUC12 to be exact), and eventually maybe I’ll work up to a nearest place service (http://www.geonames.org/) and so on. Maybe even some downstream/upstream routing with Spatialite’s network utilities [1], [2], [3].

Of course, these are spatial functions so I’m using the spatial extension to SQLite called Spatialite found at http://www.gaia-gis.it/gaia-sins/. I find Spatialite to be a profoundly elegant amalgam of existing projects (SQLite and GEOS) and new, efficient and pragmatic programming that fills an empty niche. Here that niche is don’t make me deploy anything more than I need. I simply want some basic, re-usable services that don’t do much, including have their data updated – and I don’t want to run a heavy spatial infrastructure just to cheaply answer some basic questions on a $6/month virtual, private, cloud LAMP box with 256MB of memory or this little Pentium 4 appliance running in my snack drawer at work.

So, I began compiling spatialite, and at the time I was using 3.0beta1a, so I just kept running with it. I’m still learning the basics of spatialite so I dinked around a bit. Then I followed the instructions for getting spatialite running within PHP  [here] and/or [here] (not sure which one is the official guide. The site has been migrating to a new infrastructure lately).

After making way too many typos, I got it working and am getting the expected output. I also added some timer code which tells me that from my Ubuntu VM running on my 6-month-old laptop I’m completing these ~30,000 operations in about 6 seconds against the in-memory database, including opening and closing the connection to a database and tables that are created each page load.

Sample spatialite with PHP screen

Sample spatialite with PHP screen

My next exercise will be to figure out how to connect to an existing disk-based DB and try some simpler operations. My goal will be to get my operations out the door in about 1 second on modest hardware under no load.


Continue reading

PostGIS Install and Cloning Ubuntu Systems Notes

[cc lang='bash' ]sudo apt-get remove –purge[/cc]

[cc lang='bash' ]apt-get install deborphan debfoster
debfoster
deborphan
deborphan –guess-all[/cc]

[cc lang='bash' ]apt-get autoremove
sudo apt-get remove –purge postgresql-client
sudo apt-get remove –purge postgresql-client-8.4
sudo apt-get remove –purge postgresql-client-common
apt-get clean[/cc]

still didn’t work.
[cc lang='bash' ]jcz@dell390:/usr/local/src/postgis-1.5.2$ ls -l /usr/bin/pg*
-rwxr-xr-x 1 root root 26260 2011-02-02 03:56 /usr/bin/pg
-rwxr-xr-x 1 root root 25912 2011-04-20 10:27 /usr/bin/pg_config
-rwxr-xr-x 1 root root 13860 2010-07-06 20:21 /usr/bin/pgrep
jcz@dell390:/usr/local/src/postgis-1.5.2$ sudo rm /usr/bin/pg_config[/cc]

then
[cc lang='bash' ]sudo ln -s /usr/local/pgsql/bin/pg_config /usr/bin/pg_config
apt-get install libxml2-dev[/cc]

and finally got
[cc lang='bash' ]PostGIS is now configured for i686-pc-linux-gnu

————– Compiler Info ————-
C compiler:           gcc -g -O2
C++ compiler:         g++ -g -O2

————– Dependencies ————–
GEOS config:          /usr/local/bin/geos-config
GEOS version:         3.2.2
PostgreSQL config:    /usr/bin/pg_config
PostgreSQL version:   PostgreSQL 9.0.4
PROJ4 version:        47
Libxml2 config:       /usr/bin/xml2-config
Libxml2 version:      2.7.7
PostGIS debug level:  0

——– Documentation Generation ——–
xsltproc:
xsl style sheets:
dblatex:
convert:           [/cc]

[cc lang='bash' ]#!/bin/sh
sudo ln -s /usr/local/pgsql/bin/createlang /usr/bin/createlang
sudo ln -s /usr/local/pgsql/bin/dropdb /usr/bin/dropdb
sudo ln -s /usr/local/pgsql/bin/initdb /usr/bin/initdb
sudo ln -s /usr/local/pgsql/bin/pg_ctl /usr/bin/pg_ctl
sudo ln -s /usr/local/pgsql/bin/createdb /usr/bin/createdb
sudo ln -s /usr/local/pgsql/bin/createuser  /usr/bin/createuser
sudo ln -s /usr/local/pgsql/bin/pg_dump /usr/bin/pg_dump
sudo ln -s /usr/local/pgsql/bin/pgsql2shp /usr/bin/pgsql2shp
sudo ln -s /usr/local/pgsql/bin/pg_upgrade /usr/bin/pg_upgrade
sudo ln -s /usr/local/pgsql/bin/shp2pgsql /usr/bin/shp2pgsql

/usr/local/pgsql/bin/initdb -D /usr/local/pgsql/data
/usr/local/pgsql/bin/postgres -D /usr/local/pgsql/data >logfile 2>&1 &
/usr/local/pgsql/bin/createdb test
/usr/local/pgsql/bin/psql test

make
make install
createlang plpgsql yourtestdatabase
psql -d yourtestdatabase -f postgis/postgis.sql
psql -d yourtestdatabase -f spatial_ref_sys.sql[/cc]

how to generate a list of installed packages and use it to reinstall packages
[cc lang='bash' ]sudo apt-get update
sudo apt-get dist-upgrade
sudo dpkg –get-selections | grep -v deinstall | awk ‘{print $1}’ > 164.ubuntu-files_b.txt
sudo cat 164.ubuntu-files_b.txt | xargs sudo aptitude install[/cc]

NOTE: WordPress interprets two dashes (- -) as one dash (–). When you’re putting this into your CLI, make sure it’s dropping two dashes ‘- -’ without the space between them.

———————————————————————–

Update 20-Aug-2011

One liners install process instructions for Ubuntu 11

wget http://postgis.refractions.net/download/postgis-1.5.3.tar.gz
wget http://wwwmaster.postgresql.org/download/mirrors-ftp/source/v9.0.4/postgresql-9.0.4.tar.gz
wget http://download.osgeo.org/geos/geos-3.3.0.tar.bz2
wget http://download.osgeo.org/proj/proj-4.7.0.tar.gz

 

sudo apt-get install libreadline6-dev zlib1g-dev libxml2 libxml2-dev bison openssl libssl-dev
sudo apt-get yum install -y


mkdir -p /usr/local/src

cd /usr/local/src

tar zxvf postgresql-8.3.7.tar.gz

cd postgresql-8.3.7

./configure –with-openssl –enable-integer-datetimes

make

make install

cd /usr/local/src/postgresql-8.3.7/contrib/

make all

make install

cp /usr/local/src/postgresql-8.3.7/contrib/start-scripts/linux /etc/init.d/postgresql

chmod 775 /etc/init.d/postgresql

update-rc.d /etc/init.d/postgresql defaults

adduser postgres -d /usr/local/pgsql

echo ‘PATH=$PATH:/usr/local/pgsql/bin; export PATH’ > /etc/profile.d/postgresql.sh
echo ‘MANPATH=$MANPATH:/usr/local/pgsql/man; export MANPATH >> /etc/profile.d/pgmanual.sh

chmod 775 /etc/profile.d/postgresql.sh
chmod 775 /etc/profile.d/pgmanual.sh

mkdir -p /var/log/pgsql
chown -R postgres:postgres /var/log/pgsql/

mkdir /usr/local/pgsql/data

chown -R postgres:postgres /usr/local/pgsql/data

su – postgres

/usr/local/pgsql/bin/initdb -U postgres -E=UTF8 /usr/local/pgsql/data

These steps should be done as the postgres user.  As root, issue: `su – postgres` (no password needed), the postgresql.conf and pg_hba.conf configuration files are located in /usr/local/pgsql/data/

Using the ‘nano‘ editor, (or vi), modify the postgresql.conf to allow the installation to listen for remote connections. Also, while we’re in here let’s configure the logging to create the log file in /var/log/pgsql/. The main cause of not being able to connect to a PostgreSQL database is because of a misconfiguration in this file.

listen_addresses = ‘*’
port = 5432
log_destination = ‘stderr’
logging_collector = on
log_directory = ‘/var/log/pgsql/’
log_filename = ‘postgresql-%Y-%m-%d’
log_line_prefix = ‘ %t %d %u ‘

Now, edit the pg_hba.conf and configure some network rules. Add the line in red to match your LAN address range. Set access from other computers to use md5 authentication. You can also set the other methods to md5, (and others) but for managability, leave the local connections set to ‘trust’ for now. The order of rules in this file matters.

# TYPE DATABASE USER CIDR-ADDRESS METHOD
# “local” is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
host all all 192.168.0.0/24 md5
# IPv6 local connections:
host all all ::1/128 trust

/etc/init.d/postgresql start

 

find to copy files into single directory

Useful little one liners. This one makes copy of subset of dir/ and below based on finding files that match the criteria. In this case, I wanted all .doc files copied into a single place.

I run most of this stuff in Windows on Cygwin, so I use the:

[cce_bash]

-print0 | xargs -0
[/cce_bash]

part to handle the spaces in file and directory names.

[cce_bash]

find /cygdrive/f/dir1/ -name '*.doc' -print0 | xargs -0 cp -a --target-directory=/cygdrive/c/Temp --parents
[/cce_bash]

My MySQL latest config file

I never record this stuff and I always wish I did. So here’s a working MySQL config file that I’m using on a linux virtual machine with 2GB of memory. Notes and all so I don’t keep having to look stuff up.

# MySQL config file for APPSRV VPS with 2GB of memory
# 25-March 25-2011
# jcz.
# For MySQL 5.1
# The following options will be passed to all MySQL clients
[client]
port        = 3306
socket        = /var/lib/mysql/mysql.sock

Continue reading

File listing ditties

Here are some simple bash scripts to list files into a text files that can be used to catalog stuff. Most of the time I use the last one. 
[cce_bash]

filer=$(find . -mtime -1)
sizer=$(ls -lah $filer | awk ‘{ print $5″\t” $6″\t” $7″\t” $8″\t” $9″\t\n” }’)
echo $sizer

[/cce_bash]

or this one

[cce_bash]
ls -ghG –full-time | awk ‘{ print $1″\t” $3 “\t” $4 “\t” $7 $8 $9 $10 $11 $12 “\n” }’ > files.txt
cat files.txt
[/cce_bash]
The script above makes output as follows. The extra lines just made it easier for me to read. This one will output files with spaces in the name by removing the spaces (lumping the words together). If you don’t like that, just put the ” ” between $7 $8 $9 etc. above.

[john.zastrow@appsrv ~]$ cat files.txt
total
drwxr-xr-x.     4.0K    2011-03-24      Desktop
drwxr-xr-x.     4.0K    2011-03-24      Documents
drwxr-xr-x.     4.0K    2011-03-24      Downloads
-rwxrwxrwx      126     2011-04-29      filer1.sh
-rwxrwxrwx      158     2011-04-29      filer2.sh
-rwxrwxrwx      123     2011-04-29      filer3.sh
-rw-rw-r--      0       2011-04-29      files.txt
-rwxrwxrwx      73      2011-04-29      inter.sh

Then this script

 

?

[cce_bash]
#!/bin/sh
date > statfile.txt
echo -e “File_type \t Modified_date \t Change_date \t File_bytes \t File_name” >> statfile.txt
stat –printf “%F \t %y \t %z \t %s \t %N\n” * >> statfile.txt
cat statfile.txt

[/cce_bash]

 is just a little different and produces the following output. Notice that in both script I’m using tabs so these text files should come into a spreadsheet program nicely as below. Also notice the use of –printf so that I can embed the tabs right into stat’s output thereby more gracefully handling filenames with spaces in them (plus I quote them for good measure). A good use of this would be to combine find with it.

Remove header from a text file

I’m processing some Darwin-schema biodiversity text files in preparation for GISing them and I need to trim the headers from the tops of the files before importing them to the database. Of course, *NIX command line utilities have been my friend here. The following will trim the first line off the file. Just up the number to trim more lines off. I’m using Cygwin on Windows to run these.

sed '1d' file.txt # trims the line then replaces file.txt, so be careful.

awk 'FNR&gt;1{print}' file.txt &gt; newfile.txt # I ended up using this to be safe. It was easier to test.

tail -n+2 /path/to/file &gt; /path/to/output # can be tested with echo -e "foo\nbar\nbaz" | tail -n+2

Also in my case, I wanted column headers left in file, and thankfully the headers were pretty much the same across all the files. So I grep searched in the directory with all the files for the name of the first uniquely worded column and asked grep to also return the line numbers of where it found that word. Then I use that number (minus one) to put into my header deleting command.

&gt; grep -n 'Institution' *
HerpnetUtahBottomHalfProviders_January25_2010.txt:35:Institution	Collection	Catalog number text ...
HerpnetUtahTopHalfProviders_January25_2010.txt:27:Institution	Collection	Catalog number text...

Then, for example, I would use

 awk 'FNR&gt;34{print}' HerpnetUtahBottomHalfProviders_January25_2010.txt &gt; HerpnetUtahBottomHalfProviders_January25_2010_trimmed.txt

I love using Excel to craft a slew of commands then copy/paste them into the shell to batch process tens of files in my case.