469 lines
20 KiB
Plaintext
469 lines
20 KiB
Plaintext
|
|
NAME
|
|
parse_date - parses a date string into a timespec struct.
|
|
|
|
SYNOPSIS
|
|
#include "timeutils.h"
|
|
|
|
int parse_date(struct timespec *result, char const *p,
|
|
struct timespec const *now)
|
|
|
|
LDADD libcommon.la
|
|
|
|
DESCRIPTION
|
|
Parse a date/time string, storing the resulting time value into *result.
|
|
The string itself is pointed to by *p. Return 1 if successful.
|
|
*p can be an incomplete or relative time specification; if so, use
|
|
*now as the basis for the returned time.
|
|
|
|
|
|
This function is based upon gnulib's parse-datetime.y-dd7a871.
|
|
|
|
Below is a plain text version of the gnulib parse-datetime.texi-dd7a871 manual
|
|
describing the input strings that are recognized.
|
|
|
|
Any future modifications to the util-linux parser that affect input strings
|
|
should be noted below.
|
|
|
|
|
|
1 Date input formats
|
|
********************
|
|
|
|
First, a quote:
|
|
|
|
Our units of temporal measurement, from seconds on up to months,
|
|
are so complicated, asymmetrical and disjunctive so as to make
|
|
coherent mental reckoning in time all but impossible. Indeed, had
|
|
some tyrannical god contrived to enslave our minds to time, to
|
|
make it all but impossible for us to escape subjection to sodden
|
|
routines and unpleasant surprises, he could hardly have done
|
|
better than handing down our present system. It is like a set of
|
|
trapezoidal building blocks, with no vertical or horizontal
|
|
surfaces, like a language in which the simplest thought demands
|
|
ornate constructions, useless particles and lengthy
|
|
circumlocutions. Unlike the more successful patterns of language
|
|
and science, which enable us to face experience boldly or at least
|
|
level-headedly, our system of temporal calculation silently and
|
|
persistently encourages our terror of time.
|
|
|
|
... It is as though architects had to measure length in feet,
|
|
width in meters and height in ells; as though basic instruction
|
|
manuals demanded a knowledge of five different languages. It is
|
|
no wonder then that we often look into our own immediate past or
|
|
future, last Tuesday or a week from Sunday, with feelings of
|
|
helpless confusion. ...
|
|
|
|
--Robert Grudin, `Time and the Art of Living'.
|
|
|
|
This section describes the textual date representations that GNU
|
|
programs accept. These are the strings you, as a user, can supply as
|
|
arguments to the various programs. The C interface (via the
|
|
`parse_datetime' function) is not described here.
|
|
|
|
1.1 General date syntax
|
|
=======================
|
|
|
|
A "date" is a string, possibly empty, containing many items separated
|
|
by whitespace. The whitespace may be omitted when no ambiguity arises.
|
|
The empty string means the beginning of today (i.e., midnight). Order
|
|
of the items is immaterial. A date string may contain many flavors of
|
|
items:
|
|
|
|
* calendar date items
|
|
|
|
* time of day items
|
|
|
|
* time zone items
|
|
|
|
* combined date and time of day items
|
|
|
|
* day of the week items
|
|
|
|
* relative items
|
|
|
|
* pure numbers.
|
|
|
|
We describe each of these item types in turn, below.
|
|
|
|
A few ordinal numbers may be written out in words in some contexts.
|
|
This is most useful for specifying day of the week items or relative
|
|
items (see below). Among the most commonly used ordinal numbers, the
|
|
word `last' stands for -1, `this' stands for 0, and `first' and `next'
|
|
both stand for 1. Because the word `second' stands for the unit of
|
|
time there is no way to write the ordinal number 2, but for convenience
|
|
`third' stands for 3, `fourth' for 4, `fifth' for 5, `sixth' for 6,
|
|
`seventh' for 7, `eighth' for 8, `ninth' for 9, `tenth' for 10,
|
|
`eleventh' for 11 and `twelfth' for 12.
|
|
|
|
When a month is written this way, it is still considered to be
|
|
written numerically, instead of being "spelled in full"; this changes
|
|
the allowed strings.
|
|
|
|
In the current implementation, only English is supported for words
|
|
and abbreviations like `AM', `DST', `EST', `first', `January',
|
|
`Sunday', `tomorrow', and `year'.
|
|
|
|
The output of the `date' command is not always acceptable as a date
|
|
string, not only because of the language problem, but also because
|
|
there is no standard meaning for time zone items like `IST'. When using
|
|
`date' to generate a date string intended to be parsed later, specify a
|
|
date format that is independent of language and that does not use time
|
|
zone items other than `UTC' and `Z'. Here are some ways to do this:
|
|
|
|
$ LC_ALL=C TZ=UTC0 date
|
|
Mon Mar 1 00:21:42 UTC 2004
|
|
$ TZ=UTC0 date +'%Y-%m-%d %H:%M:%SZ'
|
|
2004-03-01 00:21:42Z
|
|
$ date --rfc-3339=ns # --rfc-3339 is a GNU extension.
|
|
2004-02-29 16:21:42.692722128-08:00
|
|
$ date --rfc-2822 # a GNU extension
|
|
Sun, 29 Feb 2004 16:21:42 -0800
|
|
$ date +'%Y-%m-%d %H:%M:%S %z' # %z is a GNU extension.
|
|
2004-02-29 16:21:42 -0800
|
|
$ date +'@%s.%N' # %s and %N are GNU extensions.
|
|
@1078100502.692722128
|
|
|
|
Alphabetic case is completely ignored in dates. Comments may be
|
|
introduced between round parentheses, as long as included parentheses
|
|
are properly nested. Hyphens not followed by a digit are currently
|
|
ignored. Leading zeros on numbers are ignored.
|
|
|
|
Invalid dates like `2005-02-29' or times like `24:00' are rejected.
|
|
In the typical case of a host that does not support leap seconds, a
|
|
time like `23:59:60' is rejected even if it corresponds to a valid leap
|
|
second.
|
|
|
|
1.2 Calendar date items
|
|
=======================
|
|
|
|
A "calendar date item" specifies a day of the year. It is specified
|
|
differently, depending on whether the month is specified numerically or
|
|
literally. All these strings specify the same calendar date:
|
|
|
|
1972-09-24 # ISO 8601.
|
|
72-9-24 # Assume 19xx for 69 through 99,
|
|
# 20xx for 00 through 68.
|
|
72-09-24 # Leading zeros are ignored.
|
|
9/24/72 # Common U.S. writing.
|
|
24 September 1972
|
|
24 Sept 72 # September has a special abbreviation.
|
|
24 Sep 72 # Three-letter abbreviations always allowed.
|
|
Sep 24, 1972
|
|
24-sep-72
|
|
24sep72
|
|
|
|
The year can also be omitted. In this case, the last specified year
|
|
is used, or the current year if none. For example:
|
|
|
|
9/24
|
|
sep 24
|
|
|
|
Here are the rules.
|
|
|
|
For numeric months, the ISO 8601 format `YEAR-MONTH-DAY' is allowed,
|
|
where YEAR is any positive number, MONTH is a number between 01 and 12,
|
|
and DAY is a number between 01 and 31. A leading zero must be present
|
|
if a number is less than ten. If YEAR is 68 or smaller, then 2000 is
|
|
added to it; otherwise, if YEAR is less than 100, then 1900 is added to
|
|
it. The construct `MONTH/DAY/YEAR', popular in the United States, is
|
|
accepted. Also `MONTH/DAY', omitting the year.
|
|
|
|
Literal months may be spelled out in full: `January', `February',
|
|
`March', `April', `May', `June', `July', `August', `September',
|
|
`October', `November' or `December'. Literal months may be abbreviated
|
|
to their first three letters, possibly followed by an abbreviating dot.
|
|
It is also permitted to write `Sept' instead of `September'.
|
|
|
|
When months are written literally, the calendar date may be given as
|
|
any of the following:
|
|
|
|
DAY MONTH YEAR
|
|
DAY MONTH
|
|
MONTH DAY YEAR
|
|
DAY-MONTH-YEAR
|
|
|
|
Or, omitting the year:
|
|
|
|
MONTH DAY
|
|
|
|
1.3 Time of day items
|
|
=====================
|
|
|
|
A "time of day item" in date strings specifies the time on a given day.
|
|
Here are some examples, all of which represent the same time:
|
|
|
|
20:02:00.000000
|
|
20:02
|
|
8:02pm
|
|
20:02-0500 # In EST (U.S. Eastern Standard Time).
|
|
|
|
More generally, the time of day may be given as
|
|
`HOUR:MINUTE:SECOND', where HOUR is a number between 0 and 23, MINUTE
|
|
is a number between 0 and 59, and SECOND is a number between 0 and 59
|
|
possibly followed by `.' or `,' and a fraction containing one or more
|
|
digits. Alternatively, `:SECOND' can be omitted, in which case it is
|
|
taken to be zero. On the rare hosts that support leap seconds, SECOND
|
|
may be 60.
|
|
|
|
If the time is followed by `am' or `pm' (or `a.m.' or `p.m.'), HOUR
|
|
is restricted to run from 1 to 12, and `:MINUTE' may be omitted (taken
|
|
to be zero). `am' indicates the first half of the day, `pm' indicates
|
|
the second half of the day. In this notation, 12 is the predecessor of
|
|
1: midnight is `12am' while noon is `12pm'. (This is the zero-oriented
|
|
interpretation of `12am' and `12pm', as opposed to the old tradition
|
|
derived from Latin which uses `12m' for noon and `12pm' for midnight.)
|
|
|
|
The time may alternatively be followed by a time zone correction,
|
|
expressed as `SHHMM', where S is `+' or `-', HH is a number of zone
|
|
hours and MM is a number of zone minutes. The zone minutes term, MM,
|
|
may be omitted, in which case the one- or two-digit correction is
|
|
interpreted as a number of hours. You can also separate HH from MM
|
|
with a colon. When a time zone correction is given this way, it forces
|
|
interpretation of the time relative to Coordinated Universal Time
|
|
(UTC), overriding any previous specification for the time zone or the
|
|
local time zone. For example, `+0530' and `+05:30' both stand for the
|
|
time zone 5.5 hours ahead of UTC (e.g., India). This is the best way to
|
|
specify a time zone correction by fractional parts of an hour. The
|
|
maximum zone correction is 24 hours.
|
|
|
|
Either `am'/`pm' or a time zone correction may be specified, but not
|
|
both.
|
|
|
|
1.4 Time zone items
|
|
===================
|
|
|
|
A "time zone item" specifies an international time zone, indicated by a
|
|
small set of letters, e.g., `UTC' or `Z' for Coordinated Universal
|
|
Time. Any included periods are ignored. By following a
|
|
non-daylight-saving time zone by the string `DST' in a separate word
|
|
(that is, separated by some white space), the corresponding daylight
|
|
saving time zone may be specified. Alternatively, a
|
|
non-daylight-saving time zone can be followed by a time zone
|
|
correction, to add the two values. This is normally done only for
|
|
`UTC'; for example, `UTC+05:30' is equivalent to `+05:30'.
|
|
|
|
Time zone items other than `UTC' and `Z' are obsolescent and are not
|
|
recommended, because they are ambiguous; for example, `EST' has a
|
|
different meaning in Australia than in the United States. Instead,
|
|
it's better to use unambiguous numeric time zone corrections like
|
|
`-0500', as described in the previous section.
|
|
|
|
If neither a time zone item nor a time zone correction is supplied,
|
|
timestamps are interpreted using the rules of the default time zone
|
|
(*note Specifying time zone rules::).
|
|
|
|
1.5 Combined date and time of day items
|
|
=======================================
|
|
|
|
The ISO 8601 date and time of day extended format consists of an ISO
|
|
8601 date, a `T' character separator, and an ISO 8601 time of day.
|
|
This format is also recognized if the `T' is replaced by a space.
|
|
|
|
In this format, the time of day should use 24-hour notation.
|
|
Fractional seconds are allowed, with either comma or period preceding
|
|
the fraction. ISO 8601 fractional minutes and hours are not supported.
|
|
Typically, hosts support nanosecond timestamp resolution; excess
|
|
precision is silently discarded.
|
|
|
|
Here are some examples:
|
|
|
|
2012-09-24T20:02:00.052-05:00
|
|
2012-12-31T23:59:59,999999999+11:00
|
|
1970-01-01 00:00Z
|
|
|
|
1.6 Day of week items
|
|
=====================
|
|
|
|
The explicit mention of a day of the week will forward the date (only
|
|
if necessary) to reach that day of the week in the future.
|
|
|
|
Days of the week may be spelled out in full: `Sunday', `Monday',
|
|
`Tuesday', `Wednesday', `Thursday', `Friday' or `Saturday'. Days may
|
|
be abbreviated to their first three letters, optionally followed by a
|
|
period. The special abbreviations `Tues' for `Tuesday', `Wednes' for
|
|
`Wednesday' and `Thur' or `Thurs' for `Thursday' are also allowed.
|
|
|
|
A number may precede a day of the week item to move forward
|
|
supplementary weeks. It is best used in expression like `third
|
|
monday'. In this context, `last DAY' or `next DAY' is also acceptable;
|
|
they move one week before or after the day that DAY by itself would
|
|
represent.
|
|
|
|
A comma following a day of the week item is ignored.
|
|
|
|
1.7 Relative items in date strings
|
|
==================================
|
|
|
|
"Relative items" adjust a date (or the current date if none) forward or
|
|
backward. The effects of relative items accumulate. Here are some
|
|
examples:
|
|
|
|
1 year
|
|
1 year ago
|
|
3 years
|
|
2 days
|
|
|
|
The unit of time displacement may be selected by the string `year'
|
|
or `month' for moving by whole years or months. These are fuzzy units,
|
|
as years and months are not all of equal duration. More precise units
|
|
are `fortnight' which is worth 14 days, `week' worth 7 days, `day'
|
|
worth 24 hours, `hour' worth 60 minutes, `minute' or `min' worth 60
|
|
seconds, and `second' or `sec' worth one second. An `s' suffix on
|
|
these units is accepted and ignored.
|
|
|
|
The unit of time may be preceded by a multiplier, given as an
|
|
optionally signed number. Unsigned numbers are taken as positively
|
|
signed. No number at all implies 1 for a multiplier. Following a
|
|
relative item by the string `ago' is equivalent to preceding the unit
|
|
by a multiplier with value -1.
|
|
|
|
The string `tomorrow' is worth one day in the future (equivalent to
|
|
`day'), the string `yesterday' is worth one day in the past (equivalent
|
|
to `day ago').
|
|
|
|
The strings `now' or `today' are relative items corresponding to
|
|
zero-valued time displacement, these strings come from the fact a
|
|
zero-valued time displacement represents the current time when not
|
|
otherwise changed by previous items. They may be used to stress other
|
|
items, like in `12:00 today'. The string `this' also has the meaning
|
|
of a zero-valued time displacement, but is preferred in date strings
|
|
like `this thursday'.
|
|
|
|
When a relative item causes the resulting date to cross a boundary
|
|
where the clocks were adjusted, typically for daylight saving time, the
|
|
resulting date and time are adjusted accordingly.
|
|
|
|
The fuzz in units can cause problems with relative items. For
|
|
example, `2003-07-31 -1 month' might evaluate to 2003-07-01, because
|
|
2003-06-31 is an invalid date. To determine the previous month more
|
|
reliably, you can ask for the month before the 15th of the current
|
|
month. For example:
|
|
|
|
$ date -R
|
|
Thu, 31 Jul 2003 13:02:39 -0700
|
|
$ date --date='-1 month' +'Last month was %B?'
|
|
Last month was July?
|
|
$ date --date="$(date +%Y-%m-15) -1 month" +'Last month was %B!'
|
|
Last month was June!
|
|
|
|
Also, take care when manipulating dates around clock changes such as
|
|
daylight saving leaps. In a few cases these have added or subtracted
|
|
as much as 24 hours from the clock, so it is often wise to adopt
|
|
universal time by setting the `TZ' environment variable to `UTC0'
|
|
before embarking on calendrical calculations.
|
|
|
|
1.8 Pure numbers in date strings
|
|
================================
|
|
|
|
The precise interpretation of a pure decimal number depends on the
|
|
context in the date string.
|
|
|
|
If the decimal number is of the form YYYYMMDD and no other calendar
|
|
date item (*note Calendar date items::) appears before it in the date
|
|
string, then YYYY is read as the year, MM as the month number and DD as
|
|
the day of the month, for the specified calendar date.
|
|
|
|
If the decimal number is of the form HHMM and no other time of day
|
|
item appears before it in the date string, then HH is read as the hour
|
|
of the day and MM as the minute of the hour, for the specified time of
|
|
day. MM can also be omitted.
|
|
|
|
If both a calendar date and a time of day appear to the left of a
|
|
number in the date string, but no relative item, then the number
|
|
overrides the year.
|
|
|
|
1.9 Seconds since the Epoch
|
|
===========================
|
|
|
|
If you precede a number with `@', it represents an internal timestamp
|
|
as a count of seconds. The number can contain an internal decimal
|
|
point (either `.' or `,'); any excess precision not supported by the
|
|
internal representation is truncated toward minus infinity. Such a
|
|
number cannot be combined with any other date item, as it specifies a
|
|
complete timestamp.
|
|
|
|
Internally, computer times are represented as a count of seconds
|
|
since an epoch--a well-defined point of time. On GNU and POSIX
|
|
systems, the epoch is 1970-01-01 00:00:00 UTC, so `@0' represents this
|
|
time, `@1' represents 1970-01-01 00:00:01 UTC, and so forth. GNU and
|
|
most other POSIX-compliant systems support such times as an extension
|
|
to POSIX, using negative counts, so that `@-1' represents 1969-12-31
|
|
23:59:59 UTC.
|
|
|
|
Traditional Unix systems count seconds with 32-bit two's-complement
|
|
integers and can represent times from 1901-12-13 20:45:52 through
|
|
2038-01-19 03:14:07 UTC. More modern systems use 64-bit counts of
|
|
seconds with nanosecond subcounts, and can represent all the times in
|
|
the known lifetime of the universe to a resolution of 1 nanosecond.
|
|
|
|
On most hosts, these counts ignore the presence of leap seconds.
|
|
For example, on most hosts `@915148799' represents 1998-12-31 23:59:59
|
|
UTC, `@915148800' represents 1999-01-01 00:00:00 UTC, and there is no
|
|
way to represent the intervening leap second 1998-12-31 23:59:60 UTC.
|
|
|
|
1.10 Specifying time zone rules
|
|
===============================
|
|
|
|
Normally, dates are interpreted using the rules of the current time
|
|
zone, which in turn are specified by the `TZ' environment variable, or
|
|
by a system default if `TZ' is not set. To specify a different set of
|
|
default time zone rules that apply just to one date, start the date
|
|
with a string of the form `TZ="RULE"'. The two quote characters (`"')
|
|
must be present in the date, and any quotes or backslashes within RULE
|
|
must be escaped by a backslash.
|
|
|
|
For example, with the GNU `date' command you can answer the question
|
|
"What time is it in New York when a Paris clock shows 6:30am on October
|
|
31, 2004?" by using a date beginning with `TZ="Europe/Paris"' as shown
|
|
in the following shell transcript:
|
|
|
|
$ export TZ="America/New_York"
|
|
$ date --date='TZ="Europe/Paris" 2004-10-31 06:30'
|
|
Sun Oct 31 01:30:00 EDT 2004
|
|
|
|
In this example, the `--date' operand begins with its own `TZ'
|
|
setting, so the rest of that operand is processed according to
|
|
`Europe/Paris' rules, treating the string `2004-10-31 06:30' as if it
|
|
were in Paris. However, since the output of the `date' command is
|
|
processed according to the overall time zone rules, it uses New York
|
|
time. (Paris was normally six hours ahead of New York in 2004, but
|
|
this example refers to a brief Halloween period when the gap was five
|
|
hours.)
|
|
|
|
A `TZ' value is a rule that typically names a location in the `tz'
|
|
database (http://www.twinsun.com/tz/tz-link.htm). A recent catalog of
|
|
location names appears in the TWiki Date and Time Gateway
|
|
(http://twiki.org/cgi-bin/xtra/tzdate). A few non-GNU hosts require a
|
|
colon before a location name in a `TZ' setting, e.g.,
|
|
`TZ=":America/New_York"'.
|
|
|
|
The `tz' database includes a wide variety of locations ranging from
|
|
`Arctic/Longyearbyen' to `Antarctica/South_Pole', but if you are at sea
|
|
and have your own private time zone, or if you are using a non-GNU host
|
|
that does not support the `tz' database, you may need to use a POSIX
|
|
rule instead. Simple POSIX rules like `UTC0' specify a time zone
|
|
without daylight saving time; other rules can specify simple daylight
|
|
saving regimes. *Note Specifying the Time Zone with `TZ': (libc)TZ
|
|
Variable.
|
|
|
|
1.11 Authors of `parse_datetime'
|
|
================================
|
|
|
|
`parse_datetime' started life as `getdate', as originally implemented
|
|
by Steven M. Bellovin (<smb@research.att.com>) while at the University
|
|
of North Carolina at Chapel Hill. The code was later tweaked by a
|
|
couple of people on Usenet, then completely overhauled by Rich $alz
|
|
(<rsalz@bbn.com>) and Jim Berets (<jberets@bbn.com>) in August, 1990.
|
|
Various revisions for the GNU system were made by David MacKenzie, Jim
|
|
Meyering, Paul Eggert and others, including renaming it to `get_date' to
|
|
avoid a conflict with the alternative Posix function `getdate', and a
|
|
later rename to `parse_datetime'. The Posix function `getdate' can
|
|
parse more locale-specific dates using `strptime', but relies on an
|
|
environment variable and external file, and lacks the thread-safety of
|
|
`parse_datetime'.
|
|
|
|
This chapter was originally produced by François Pinard
|
|
(<pinard@iro.umontreal.ca>) from the `parse_datetime.y' source code,
|
|
and then edited by K. Berry (<kb@cs.umb.edu>).
|
|
|