MySQL Functions

I just want to record these for future reference. I’m actually using the first now.

From the MySQL 5.0 Online manual

An example of how to make the first letter in a string uppercase – analogous to UCFIRST

[cce_sql]

SELECT CONCAT(UPPER(SUBSTRING(firstName, 1, 1)), LOWER(SUBSTRING(firstName FROM 2))) AS properFirstName

[/cce_sql]

a user-defined function in MySQL 5.0+ similar to PHP’s substr_count(), since I could not find an equivalent native function in MySQL. (If there is one please tell me!!!)

[cce_sql]

delimiter ||
DROP FUNCTION IF EXISTS substrCount||
CREATE FUNCTION substrCount(s VARCHAR(255), ss VARCHAR(255)) RETURNS TINYINT(3) UNSIGNED LANGUAGE SQL NOT DETERMINISTIC READS SQL DATA
BEGIN
DECLARE count TINYINT(3) UNSIGNED;
DECLARE offset TINYINT(3) UNSIGNED;
DECLARE CONTINUE HANDLER FOR SQLSTATE ’02000′ SET s = NULL;

SET count = 0;
SET offset = 1;

REPEAT
IF NOT ISNULL(s) AND offset > 0 THEN
SET offset = LOCATE(ss, s, offset);
IF offset > 0 THEN
SET count = count + 1;
SET offset = offset + 1;
END IF;
END IF;
UNTIL ISNULL(s) OR offset = 0 END REPEAT;

RETURN count;
END;

||

delimiter ;

[/cce_sql]

Use like this:

[cce_sql]

SELECT substrCount(‘/this/is/a/path’, ‘/’) `count`;

[/cce_sql]

`count` would return 4 in this case. Can be used in such cases where you might want to find the “depth” of a path, or for many other uses.

It’s pretty easy to create your own string functions for many examples listed here

[cce_sql]
## Count substrings

CREATE FUNCTION substrCount(x varchar(255), delim varchar(12)) returns int
return (length(x)-length(REPLACE(x, delim, ”)))/length(delim);

SELECT substrCount(‘/this/is/a/path’, ‘/’) as count;

[/cce_sql]

+——-+
| count |
+——-+
|     4 |
+——-+

[cce_sql]

SELECT substrCount(‘/this/is/a/path’, ‘is’) as count;

[/cce_sql]

+——-+
| count |
+——-+
|     2 |
+——-+

[cce_sql]

## Split delimited strings

CREATE FUNCTION strSplit(x varchar(255), delim varchar(12), pos int) returns varchar(255)
return replace(substring(substring_index(x, delim, pos), length(substring_index(x, delim, pos – 1)) + 1), delim, ”);

select strSplit(“aaa,b,cc,d”, ‘,’, 2) as second;

[/cce_sql]

+——–+
| second |
+——–+
| b      |
+——–+

[cce_sql]

select strSplit(“a|bb|ccc|dd”, ‘|’, 3) as third;

[/cce_sql]

+——-+
| third |
+——-+
| ccc   |
+——-+

[cce_sql]

select strSplit(“aaa,b,cc,d”, ‘,’, 7) as 7th;

[/cce_sql]

+——+
| 7th  |
+——+
| NULL |
+——+

[cce_sql]

## Upper case first letter, UCFIRST or INITCAP

CREATE FUNCTION ucfirst(x varchar(255)) returns varchar(255)
return concat( upper(substring(x,1,1)),lower(substring(x,2)) );

select ucfirst(“TEST”);

[/cce_sql]

+—————–+
| ucfirst(“TEST”) |
+—————–+
| Test            |
+—————–+

[cce_sql]

##Or a more complicated example, this will repeat an insert after every nth position.

drop function insert2;
DELIMITER //
CREATE FUNCTION insert2(str text, pos int, delimit varchar(124))
RETURNS text
DETERMINISTIC
BEGIN
DECLARE i INT DEFAULT 1;
DECLARE str_len INT;
DECLARE out_str text default ”;
SET str_len=length(str);
WHILE(i<str_len) DO
SET out_str=CONCAT(out_str, SUBSTR(str, i,pos), delimit);
SET i=i+pos;
END WHILE;
– trim delimiter from end of string
SET out_str=TRIM(trailing delimit from out_str);
RETURN(out_str);
END//
DELIMITER ;

select insert2(“ATGCATACAGTTATTTGA”, 3, ” “) as seq2;

[/cce_sql]

+————————-+
| seq2                    |
+————————-+
| ATG CAT ACA GTT ATT TGA |
+————————-+

—————————-

ITIS Organism Flat table from MySQL

I need to make a table to look up common (vernacular) names for some organisms. So I imported the USGS ITIS database from text files into MySQL and created this little view. Hopefully this helps someone else.

CREATE ALGORITHM=UNDEFINED DEFINER=`root`@`localhost` SQL SECURITY DEFINER VIEW `v_flat_table` AS
SELECT
  `taxonomic_units`.`tsn`         AS `tsn`,
  `taxonomic_units`.`unit_ind1`   AS `unit_ind1`,
  `taxonomic_units`.`unit_name1`  AS `unit_name1`,
  `taxonomic_units`.`unit_ind2`   AS `unit_ind2`,
  `taxonomic_units`.`unit_name2`  AS `unit_name2`,
  `taxonomic_units`.`unit_ind3`   AS `unit_ind3`,
  `taxonomic_units`.`unit_name3`  AS `unit_name3`,
  `taxonomic_units`.`unit_ind4`   AS `unit_ind4`,
  `taxonomic_units`.`parent_tsn`  AS `parent_tsn`,
  `taxonomic_units`.`update_date` AS `update_date`,
  `taxon_unit_types`.`rank_name`  AS `rank_name`,
  `vernaculars`.`vernacular_name` AS `vernacular_name`,
  `longnames`.`completename`      AS `completename`
FROM (((`taxon_unit_types`
     JOIN `taxonomic_units`
       ON (((`taxon_unit_types`.`rank_id` = `taxonomic_units`.`rank_id`)
            AND (`taxon_unit_types`.`kingdom_id` = `taxonomic_units`.`kingdom_id`))))
    LEFT JOIN `vernaculars`
      ON ((`vernaculars`.`tsn` = `taxonomic_units`.`tsn`)))
   LEFT JOIN `longnames`
     ON ((`longnames`.`tsn` = `taxonomic_units`.`tsn`)))

Loading tab-delimited data to local MySQL

I’m looking at some data from the Avian Knowledge Network and Microsoft Access just isn’t up to dealing with the volume of records. So I switched over to MySQL.

I’m using the Positive Observation Essentials format as queried from their database. Here’s an example of the data:

I did use a sed command (created and verified in Excel) on the original files to fix the spaces in the column names (if you are in the business of making data for people, never, ever, ever, ever create large tabular files with spaces or special characters in the column names. Avoid dashes as well, and preferably use all caps). Note that the commands below will REPLACE your files. So be sure to back up your originals in case anything goes wrong.

find . -name ‘*AKN*’ -exec sed -i ‘s/Record\ Number/Record_Number/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Global\ Unique\ Identifier/Global_Unique_Identifier/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Collection\ Code/Collection_Code/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Scientific\ Name/Scientific_Name/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Latitude/Latitude/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Longitude/Longitude/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Country/Country/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/State\ Province/State_Province/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Observation\ Count/Observation_Count/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Obs\ Count\ At\ Least/Obs_Count_At_Least/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Obs\ Count\ At\ Most/Obs_Count_At_Most/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Observation\ Date/Observation_Date/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Duration\ In\ Hours/Duration_In_Hours/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Time\ Observations\ Started/Time_Observations_Started/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Time\ Observations\ Ended/Time_Observations_Ended/g’ {} \;
find . -name ‘*AKN*’ -exec sed -i ‘s/Sampling\ Event\ Identifier/Sampling_Event_Identifier/g’ {} \;

Then I create a basic staging table (commands also from Excel). I don’t know why I used InnoDB, when MyIsam would have been faster. But, notice the absence of indexes for faster loading.

CREATE TABLE `nv_akn` (
  `Record_Number` VARCHAR(30) DEFAULT NULL,
  `Global_Unique_Identifier` VARCHAR(30) DEFAULT NULL,
  `Collection_Code` VARCHAR(30) DEFAULT NULL,
  `Scientific_Name` VARCHAR(150) DEFAULT NULL,
  `Latitude` VARCHAR(30) DEFAULT NULL,
  `Longitude` VARCHAR(30) DEFAULT NULL,
  `Country` VARCHAR(50) DEFAULT NULL,
  `State_Province` VARCHAR(50) DEFAULT NULL,
  `Observation_Count` VARCHAR(30) DEFAULT NULL,
  `Obs_Count_At_Least` VARCHAR(30) DEFAULT NULL,
  `Obs_Count_At_Most` VARCHAR(30) DEFAULT NULL,
  `Observation_Date` VARCHAR(30) DEFAULT NULL,
  `Duration_In_Hours` VARCHAR(30) DEFAULT NULL,
  `Time_Observations_Started` VARCHAR(30) DEFAULT NULL,
  `Time_Observations_Ended` VARCHAR(30) DEFAULT NULL,
  `Sampling_Event_Identifier` VARCHAR(50) DEFAULT NULL
) ENGINE=INNODB DEFAULT CHARSET=latin1

I’m just doing this locally, so XAMPP is my friend. So from the xampp /mysql/bin directory, I ran mysql.exe. I chose my database and ran the following on the text files produced by the above sed cleaners.

mysql> LOAD DATA LOCAL INFILE ‘c:/xampp/mysql/bin/Nevada_Pos_obs_Essent_15-MAR-2011.txt’ INTO TABLE nv_akn
FIELDS TERMINATED BY ‘\t’
LINES TERMINATED BY ‘\n’ IGNORE 1 LINES;

I used LOCAL since the database is on my workstation. Note the full path to the windows file, with forward slashes. Fields are tab-delimited, lines seem to just use carriage returns (or at least it doesn’t look like I need another line ender) and I’m ignoring the column header row.

Remove header from a text file

I’m processing some Darwin-schema biodiversity text files in preparation for GISing them and I need to trim the headers from the tops of the files before importing them to the database. Of course, *NIX command line utilities have been my friend here. The following will trim the first line off the file. Just up the number to trim more lines off. I’m using Cygwin on Windows to run these.

sed '1d' file.txt # trims the line then replaces file.txt, so be careful.

awk 'FNR&gt;1{print}' file.txt &gt; newfile.txt # I ended up using this to be safe. It was easier to test.

tail -n+2 /path/to/file &gt; /path/to/output # can be tested with echo -e "foo\nbar\nbaz" | tail -n+2

Also in my case, I wanted column headers left in file, and thankfully the headers were pretty much the same across all the files. So I grep searched in the directory with all the files for the name of the first uniquely worded column and asked grep to also return the line numbers of where it found that word. Then I use that number (minus one) to put into my header deleting command.

&gt; grep -n 'Institution' *
HerpnetUtahBottomHalfProviders_January25_2010.txt:35:Institution	Collection	Catalog number text ...
HerpnetUtahTopHalfProviders_January25_2010.txt:27:Institution	Collection	Catalog number text...

Then, for example, I would use

 awk 'FNR&gt;34{print}' HerpnetUtahBottomHalfProviders_January25_2010.txt &gt; HerpnetUtahBottomHalfProviders_January25_2010_trimmed.txt

I love using Excel to craft a slew of commands then copy/paste them into the shell to batch process tens of files in my case.

SYSSTAT: SAR/IOSTAT

The pidstat command is used to monitor processes and threads currently being managed by the Linux kernel. It can also monitor the children of those processes and threads.

With its -d option, pidstat can report I/O statistics, providing that you have a recent Linux kernel (2.6.20+) with the option CONFIG_TASK_IO_ACCOUNTING compiled in. So imagine that your system is undergoing heavy I/O and you want to know which tasks are generating them. You could then enter the following command:

$ pidstat -d 2
Linux 2.6.20 (localhost)    09/26/2007
10:13:31 AM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
10:13:31 AM     15625      1.98  16164.36      0.00  dd
10:13:33 AM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
10:13:33 AM     15625      4.00  20556.00      0.00  dd
10:13:35 AM       PID   kB_rd/s   kB_wr/s kB_ccwr/s  Command
10:13:35 AM     15625      0.00  10642.00      0.00  dd

When no PID’s are explicitly selected on the command line (as in the case above), the pidstat command examines all the tasks managed by the system but displays only those whose statistics are varying during the interval of time.

The sar utility (System Activity Reporter) is a system activity reporter that is quite popular with HP/UX and Solaris, and sar is also available for AIX. Just like top, sar gives detailed information about Oracle tasks from the UNIX level. You will be able to see the overall consumption of CPU, disk, memory, and Journal File System (JFS) buffer usage. There are three major flags that you can use with sar:

  • sar –u Shows CPU activity
  • sar –w Shows swapping activity
  • sar –b Shows buffer activity

NOTE: Each flavor of UNIX has a different implementation of sar. For example, some of the key flags used in the Sun version of sar are not available on HP/UX. The examples in this book show the HP/UX version of sar.

The output from sar reports usually shows a time-based snapshot of activity. This is true for all reports that you’ll see in this section. When you issue the sar command, you pass two numeric arguments. The first represents the time interval between samples, and the second represents the number of samples to take. For example:

L 6-4

sar –u 10 5

The sar command in this example is requesting five samples taken at 10-second intervals.

sar –w: The Memory Switching and Swapping Activity Report

The sar –w command is especially useful if you suspect that your database server
is experiencing a memory shortage. The following example shows the swapping activity report that you get from sar:

L 6-6

>sar -w 5 5

HP-UX corp-hp1 B.11.00 U 9000/800    12/25/01

07:19:33 swpin/s bswin/s swpot/s bswot/s pswch/s
07:19:38    0.00     0.0    0.00     0.0     261
07:19:43    0.00     0.0    0.00     0.0     231
07:19:48    0.00     0.0    0.00     0.0     326
07:19:53    0.00     0.0    0.00     0.0     403
07:19:58    0.00     0.0    0.00     0.0     264

Average     0.00     0.0    0.00     0.0     297

The column descriptions are as follows:

  • swpin/s Number of process swap-ins per second.
  • swpot/s Number of process swap-outs per second.
  • bswin/s Number of 512-byte swap-ins per second.
  • bswot/s Number of 512-byte swap-outs per second.
  • pswch/s Number of process context switches per second.

With sar you can watch realtime the network usage:

# sar -n DEV 1 0
Linux 2.6.22-15-generic (xXxXx)  07/09/2008
11:26:36 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
11:26:37 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
11:26:37 AM      eth0      5.05      0.00      0.86      0.00      0.00      0.00      0.00
11:26:37 AM     IFACE   rxpck/s   txpck/s    rxkB/s    txkB/s   rxcmp/s   txcmp/s  rxmcst/s
11:26:38 AM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00
11:26:38 AM      eth0      4.00      0.00      0.45      0.00      0.00      0.00      0.00

iostat
Display a single history since boot report for all CPU and
Devices.

iostat -d 2
Display a continuous device report at two second intervals.

iostat -d 2 6
Display six reports at two second intervals for all devices.

iostat -x hda hdb 2 6
Display six reports of extended statistics at two second inter-
vals for devices hda and hdb.

iostat -p sda 2 6
Display six reports at two second intervals for device sda and
all its partitions (sda1, etc.)

Display Disk IO Statistics using sar command

# sar –d

Linux 2.6.9-42.ELsmp (dev-db)        01/01/2009
12:00:01 AM    DEV              tps    rd_sec/s  wr_sec/s
12:05:01 AM    dev2-0           1.65      1.28     45.43
12:10:01 AM    dev8-1          4.08      8.11     21.81

Skipped..

Average:       dev2-0           4.66    120.77     69.45
Average:       dev8-1          1.89      3.17      8.02

Display networking Statistics using sar command

# sar -n DEV | more

Linux 2.6.9-42.ELsmp (dev-db)        01/01/2009
12:00:01 AM     IFACE   rxpck/s   txpck/s   rxbyt/s   txbyt/s   rxcmp/s   txcmp/
s  rxmcst/s
12:05:01 AM        lo      0.17      0.16     25.31     23.33      0.00      0.0
0      0.00
12:10:01 AM      eth0     52.92     53.64  10169.74  12178.57      0.00      0.0
0      0.00

# sar -n SOCK |more

Linux 2.6.9-42.ELsmp (dev-db)        01/01/2009
12:00:01 AM    totsck    tcpsck    udpsck    rawsck   ip-frag
12:05:01 AM        50        13         3         0         0
12:10:01 AM        50        13         4         0         0
12:15:01 AM        53        13         5         0         0

Quick SCP

Logged into source host

scp -r -p * root@remotehost:/home/user/public_html

This will copy *all* files to the directory /home/user/public_html in the remote server remotehost. The -p preserves the modification and access times, as well as the permissions of the source-file in the destination-file. The -r copies the contents of the source-file (directory in this case) recursively.

Updated find to list script

This is an update to the earlier script.


#!/bin/sh
# v1 jcz 30-dec-2009
# This is a silly little script that will search
# for files of a certain type and create a text file of the results
# TODO:
#  - Everything
# - Fix this script to run under cygwin 1.7.X after working fine under 1.5.X

############################
# enable for debugging #####
############################
# set -vx

############################
#  Global script variables block
############################
# Date and other variables pretty self explanatory, S is seconds
# date format is currently YYYYMMDD_HHMMSS
 dater=$(date +%Y-%m-%d_%H:%M:%S)
 dayer=$(date +%a)
 namer=$(whoami)
 hoster=$(hostname)
 directory=$(pwd)
 filenamer=$(date +%Y%m%d_%H%M%S).txt
# sets day of the week for incremental backups
 set $(date)

############################
#  Clear the screen and introduce the user to the script
############################

clear
echo ""
echo "WELCOME TO THE FIND TO LIST SCRIPT"
echo ""

############################
#  Wait for the user to enter a new file extension and capture the value as a variable
############################
echo ""
echo -n "Enter file extension to search for, without the leading dot (e.g. txt): "
read fileext
echo ""
echo ""

############################
#  Wait for the user to enter a new file destination
############################
echo -n "Enter a new log file destination without ending slash (e.g., /cygdrive/c ): "
read filedest
echo ""
echo ""

############################
#  Create the file for the script named after the file extension
############################
echo "----" >> $filedest/$filenamer
# echo "----" > $filedest/$fileext_files_from_$directory_on_$dater.txt
echo "File created on: "$dater  >> $filedest/$filenamer
echo "Script was run in: "$directory >> $filedest/$filenamer
echo "By user: " $namer  >> $filedest/$filenamer
echo "Searching for files ending in: " $fileext >> $filedest/$filenamer
echo "This file was written to: " $filedest/$filenamer >> $filedest/$filenamer
echo "The command issued was: find . -name '*.'$fileext -type f -print0 | xargs -0  stat -c 'file: %N | bytes: %s | modtime: %y | changetime: %z'" >> $filedest/$filenamer
echo "***************************"  >>$filedest/$filenamer
echo "" >> $filedest/$filenamer

find . -name '*.'$fileext -type f -print0 | xargs -0  stat -c 'file: %N | bytes: %s | modtime: %y | changetime: %z' >> $filedest/$filenamer

echo -n "Hit enter to continue "
read none

echo ""
echo "* Now I will show you the file and be done"
echo ""
echo -n "Hit enter to list or Ctrl-c to quit "
read none
less $filedest/$filenamer

UPDATE_ON column in MySQL with trigger

I needed to update a column with the date and time whenever a record changed in my database. So here is the recipe for the trigger in MySQL 5.0.

DELIMITER $$

CREATE
    /*[DEFINER = { user | CURRENT_USER }]*/
    TRIGGER `my_database`.`UpdatedOn` BEFORE UPDATE
    ON `my_database`.`my_table`
    FOR EACH ROW BEGIN
set NEW.update_on = now();
    END$$

DELIMITER ;