Unix wc command in C

Posted on 20-11-2005 at 20:28:54

Hello

I would like a C program that is identical to the wc command.

The program must count the number of characters, words and lines of a file passed as a parameter.

Example:
wc fichier.txt
number of characters: 658
number of words: 137
number of lines: 32

Thanks for your help

Posted on 20-11-2005 at 20:31:42

Well, what do you want, for us to code it for you from start to finish?

Posted on 20-11-2005 at 20:35:29

This looks like a student assignment.
You should look for a solution yourself, and only come asking for help after having studied the problem, made a start, and gotten stuck on an obstacle you can describe.

However, I found the following program in my drawers:

Code:

/*
File wc.c - a sample word count program
Written and submitted to public domain by Jay Elkes
April, 1992
*/
#include <stdio.h>
#include <ctype.h>
int main (int argc, char *argv[])
{
    FILE *infileptr;
    const char *infile;
    long int nl = 0;
    long int nc = 0;
    long int nw = 0;
    int state = 0;              /* 0 = between words, 1 = inside a word */
    const int NEWLINE = '\n';
    int c;
    /* The program name itself is the first command line argument so we
       ignore it (argv[0]) when showing user entered parameters. */
    switch (argc - 1)
    {
    case (0):
        fprintf(stderr, "no parameters\n");
        return 12;
    case (1):
        break;
    default:
        fprintf(stderr, "too many parameters\n");
        return 12;
    }
    /* Use argv[1] directly rather than copying it into a fixed-size
       buffer, which a long path could overflow. */
    infile = argv[1];
    infileptr = fopen(infile, "rb");
    if (infileptr == NULL)
    {
        fprintf(stderr, "Cannot open %s\n", infile);
        return 12;
    }
    while ((c = getc(infileptr)) != EOF)
    {
        ++nc;
        if (c == NEWLINE)
            ++nl;
        if (isspace(c))         /* c is an unsigned char value here */
            state = 0;
        else if (state == 0)
        {
            state = 1;
            ++nw;
        }
    }
    /* Final Housekeeping */
    fclose(infileptr);
    printf("%ld Lines, %ld Words, %ld Characters\n", nl, nw, nc);
    return 0;
}

Posted on 20-11-2005 at 20:36:48

You open the file and read it with fgets(), for example; at the end of each loop iteration you increment the line count.
For the character count you can take the strlen() of the line, and words are separated by spaces, so work it out from there; then you close the file and print your results.
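A minimal sketch of this fgets() approach, assuming a hypothetical helper count_with_fgets() and a 256-byte line buffer. Two details keep the counts in line with wc: a line is only counted when the chunk read ends with '\n' (a line longer than the buffer comes back in several pieces), and the in-word flag is kept between reads for the same reason. Words are split on any isspace() character rather than only on ' '. Because strlen() stops at the first '\0', a file containing NUL bytes would be under-counted; the getc() loop in the program above does not have that limitation.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Counters filled in by count_with_fgets() (hypothetical type). */
struct counts {
    long lines;
    long words;
    long chars;
};

/* Count lines, words and characters of an already opened stream by
   reading it with fgets(). Returns 0 on success, -1 on read error. */
int count_with_fgets(FILE *fp, struct counts *out)
{
    char line[256];
    int in_word = 0; /* kept across calls: a long line may span several reads */

    out->lines = out->words = out->chars = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        size_t len = strlen(line); /* stops at an embedded '\0' */
        size_t i;

        out->chars += (long)len;
        if (len > 0 && line[len - 1] == '\n')
            out->lines++;          /* count only complete lines */
        for (i = 0; i < len; i++) {
            if (isspace((unsigned char)line[i]))
                in_word = 0;
            else if (!in_word) {
                in_word = 1;
                out->words++;
            }
        }
    }
    return ferror(fp) ? -1 : 0;
}
```

In main(), this would be called on the stream returned by fopen(argv[1], "rb"), followed by fclose() and a printf() of the three fields.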

Or you do a popen() on wc and print its results.
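A sketch of the popen() variant, assuming a POSIX system with wc on the PATH; the function name counts_via_popen() is hypothetical. The file name goes through the shell, so it is wrapped in single quotes, and names that themselves contain a quote are simply rejected here.

```c
#define _POSIX_C_SOURCE 200809L /* declares popen() and pclose() */
#include <stdio.h>
#include <string.h>

/* Run "wc -l -w -c 'PATH'" through the shell and read back its counts.
   POSIX wc prints newlines, words and bytes in that order, followed by
   the file name. Returns 0 on success, -1 on any failure. */
int counts_via_popen(const char *path,
                     long *lines, long *words, long *bytes)
{
    char cmd[1024];
    FILE *out;
    int n, status;

    /* Assumption: a path containing ' would break the shell quoting. */
    if (strchr(path, '\'') != NULL)
        return -1;
    n = snprintf(cmd, sizeof cmd, "wc -l -w -c '%s'", path);
    if (n < 0 || (size_t)n >= sizeof cmd)
        return -1;                      /* path too long for cmd */

    out = popen(cmd, "r");
    if (out == NULL)
        return -1;
    n = fscanf(out, "%ld %ld %ld", lines, words, bytes);
    while (getc(out) != EOF)            /* drain the file name part */
        ;
    status = pclose(out);
    return (n == 3 && status == 0) ? 0 : -1;
}
```

This is the least work, but it starts a shell and a second process for every file, and it obviously does not teach how wc itself counts.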

Edit: or just look above..

Message edited by Flaie on 20-11-2005 at 20:37:50

---------------
Always wear a camera!

Posted on 20-11-2005 at 20:36:53

EDIT: beaten to it several times over, so no point adding anything

Message edited by Elmoricq on 20-11-2005 at 20:39:04

Posted on 20-11-2005 at 22:40:06

JustOne72 wrote:

I would like a C program that is identical to the wc command.

You forgot

Either to post the code you wrote and say what's wrong with it.
Or to give your teacher's email address so we can send him the code directly...
Or to switch your brain on.

Message edited by Emmanuel Delahaye on 20-11-2005 at 23:03:21

---------------
Information about programming and the C language: http://www.bien-programmer.fr No Wi-Fi at home: http://www.cpl-france.org/

Posted on 20-11-2005 at 23:42:55

Code:

/* wc - print the number of bytes, words, and lines in files
Copyright (C) 85, 91, 1995-2002 Free Software Foundation, Inc.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2, or (at your option)
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software Foundation,
Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. */
/* Written by Paul Rubin, phr@ocf.berkeley.edu
and David MacKenzie, djm@gnu.ai.mit.edu. */
#include <config.h>
#if HAVE_INTTYPES_H
# include <inttypes.h>
#endif
#include <stdio.h>
#include <getopt.h>
#include <sys/types.h>
/* Get mbstate_t, mbrtowc(), wcwidth(). */
#if HAVE_WCHAR_H
#include <wchar.h>
#endif
/* Get iswprint(), iswspace(). */
#if HAVE_WCTYPE_H
# include <wctype.h>
#endif
#if !defined iswprint && !HAVE_ISWPRINT
# define iswprint(wc) 1
#endif
#if !defined iswspace && !HAVE_ISWSPACE
#define iswspace(wc) \
((wc) == (unsigned char) (wc) && ISSPACE ((unsigned char) (wc)))
#endif
/* Include this after wctype.h so that we `#undef' ISPRINT
(from Solaris's euc.h, from widec.h, from wctype.h) before
redefining and using it. */
#include "system.h"
#include "closeout.h"
#include "error.h"
#include "human.h"
#include "safe-read.h"
/* Some systems, like BeOS, have multibyte encodings but lack mbstate_t. */
#if HAVE_MBRTOWC && defined mbstate_t
#define mbrtowc(pwc, s, n, ps) (mbrtowc) (pwc, s, n, 0)
#endif
#ifndef HAVE_DECL_WCWIDTH
"this configure-time declaration test was not run"
#endif
#if !HAVE_DECL_WCWIDTH
extern int wcwidth ();
#endif
/* If wcwidth() doesn't exist, assume all printable characters have
width 1. */
#if !defined wcwidth && !HAVE_WCWIDTH
#define wcwidth(wc) ((wc) == 0 ? 0 : iswprint (wc) ? 1 : -1)
#endif
/* The official name of this program (e.g., no `g' prefix). */
#define PROGRAM_NAME "wc"
#define AUTHORS N_ ("Paul Rubin and David MacKenzie" )
/* Size of atomic reads. */
#define BUFFER_SIZE (16 * 1024)
/* The name this program was run with. */
char *program_name;
/* Cumulative number of lines, words, chars and bytes in all files so far.
max_line_length is the maximum over all files processed so far. */
static uintmax_t total_lines;
static uintmax_t total_words;
static uintmax_t total_chars;
static uintmax_t total_bytes;
static uintmax_t max_line_length;
/* Which counts to print. */
static int print_lines, print_words, print_chars, print_bytes;
static int print_linelength;
/* Nonzero if we have ever read the standard input. */
static int have_read_stdin;
/* The error code to return to the system. */
static int exit_status;
/* If nonzero, do not line up columns but instead separate numbers by
a single space as specified in Single Unix Specification and POSIX. */
static int posixly_correct;
static struct option const longopts[] =
{
{"bytes", no_argument, NULL, 'c'},
{"chars", no_argument, NULL, 'm'},
{"lines", no_argument, NULL, 'l'},
{"words", no_argument, NULL, 'w'},
{"max-line-length", no_argument, NULL, 'L'},
{GETOPT_HELP_OPTION_DECL},
{GETOPT_VERSION_OPTION_DECL},
{NULL, 0, NULL, 0}
};
void
usage (int status)
{
if (status != 0)
fprintf (stderr, _("Try `%s --help' for more information.\n" ),
program_name);
else
{
printf (_("\
Usage: %s [OPTION]... [FILE]...\n\
" ),
program_name);
fputs (_("\
Print byte, word, and newline counts for each FILE, and a total line if\n\
more than one FILE is specified. With no FILE, or when FILE is -,\n\
read standard input.\n\
-c, --bytes print the byte counts\n\
-m, --chars print the character counts\n\
-l, --lines print the newline counts\n\
" ), stdout);
fputs (_("\
-L, --max-line-length print the length of the longest line\n\
-w, --words print the word counts\n\
" ), stdout);
fputs (HELP_OPTION_DESCRIPTION, stdout);
fputs (VERSION_OPTION_DESCRIPTION, stdout);
printf (_("\nReport bugs to <%s>.\n" ), PACKAGE_BUGREPORT);
}
exit (status == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
}
static void
write_counts (uintmax_t lines,
uintmax_t words,
uintmax_t chars,
uintmax_t bytes,
uintmax_t linelength,
const char *file)
{
char buf[LONGEST_HUMAN_READABLE + 1];
char const *space = "";
char const *format_int = (posixly_correct ? "%s" : "%7s" );
char const *format_sp_int = (posixly_correct ? "%s%s" : "%s%7s" );
if (print_lines)
{
printf (format_int, human_readable (lines, buf, 1, 1));
space = " ";
}
if (print_words)
{
printf (format_sp_int, space, human_readable (words, buf, 1, 1));
space = " ";
}
if (print_chars)
{
printf (format_sp_int, space, human_readable (chars, buf, 1, 1));
space = " ";
}
if (print_bytes)
{
printf (format_sp_int, space, human_readable (bytes, buf, 1, 1));
space = " ";
}
if (print_linelength)
{
printf (format_sp_int, space, human_readable (linelength, buf, 1, 1));
}
if (*file)
printf (" %s", file);
putchar ('\n');
}
static void
wc (int fd, const char *file)
{
char buf[BUFFER_SIZE + 1];
ssize_t bytes_read;
uintmax_t lines, words, chars, bytes, linelength;
int count_bytes, count_chars, count_complicated;
lines = words = chars = bytes = linelength = 0;
/* If in the current locale, chars are equivalent to bytes, we prefer
counting bytes, because that's easier. */
#if HAVE_MBRTOWC && (MB_LEN_MAX > 1)
if (MB_CUR_MAX > 1)
{
count_bytes = print_bytes;
count_chars = print_chars;
}
else
#endif
{
count_bytes = print_bytes + print_chars;
count_chars = 0;
}
count_complicated = print_words + print_linelength;
/* We need binary input, since `wc' relies on `lseek' and byte counts. */
SET_BINARY (fd);
/* When counting only bytes, save some line- and word-counting
overhead. If FD is a `regular' Unix file, using lseek is enough
to get its `size' in bytes. Otherwise, read blocks of BUFFER_SIZE
bytes at a time until EOF. Note that the `size' (number of bytes)
that wc reports is smaller than stats.st_size when the file is not
positioned at its beginning. That's why the lseek calls below are
necessary. For example the command
`(dd ibs=99k skip=1 count=0; ./wc -c) < /etc/group'
should make wc report `0' bytes. */
if (count_bytes && !count_chars && !print_lines && !count_complicated)
{
off_t current_pos, end_pos;
struct stat stats;
if (fstat (fd, &stats) == 0 && S_ISREG (stats.st_mode)
&& (current_pos = lseek (fd, (off_t) 0, SEEK_CUR)) != -1
&& (end_pos = lseek (fd, (off_t) 0, SEEK_END)) != -1)
{
off_t diff;
/* Be careful here. The current position may actually be
beyond the end of the file. As in the example above. */
bytes = (diff = end_pos - current_pos) < 0 ? 0 : diff;
}
else
{
while ((bytes_read = safe_read (fd, buf, BUFFER_SIZE)) > 0)
{
bytes += bytes_read;
}
if (bytes_read < 0)
{
error (0, errno, "%s", file);
exit_status = 1;
}
}
}
else if (!count_chars && !count_complicated)
{
/* Use a separate loop when counting only lines or lines and bytes --
but not chars or words. */
while ((bytes_read = safe_read (fd, buf, BUFFER_SIZE)) > 0)
{
register char *p = buf;
while ((p = memchr (p, '\n', (buf + bytes_read) - p)))
{
++p;
++lines;
}
bytes += bytes_read;
}
if (bytes_read < 0)
{
error (0, errno, "%s", file);
exit_status = 1;
}
}
#if HAVE_MBRTOWC && (MB_LEN_MAX > 1)
# define SUPPORT_OLD_MBRTOWC 1
else if (MB_CUR_MAX > 1)
{
int in_word = 0;
uintmax_t linepos = 0;
mbstate_t state;
uintmax_t last_error_line = 0;
int last_error_errno = 0;
# if SUPPORT_OLD_MBRTOWC
/* Back-up the state before each multibyte character conversion and
move the last incomplete character of the buffer to the front
of the buffer. This is needed because we don't know whether
the `mbrtowc' function updates the state when it returns -2, -
this is the ISO C 99 and glibc-2.2 behaviour - or not - amended
ANSI C, glibc-2.1 and Solaris 2.7 behaviour. We don't have an
autoconf test for this, yet. */
int prev = 0; /* number of bytes carried over from previous round */
# else
const int prev = 0;
# endif
memset (&state, 0, sizeof (mbstate_t));
while ((bytes_read = safe_read (fd, buf + prev, BUFFER_SIZE - prev)) > 0)
{
const char *p;
# if SUPPORT_OLD_MBRTOWC
mbstate_t backup_state;
# endif
bytes += bytes_read;
p = buf;
bytes_read += prev;
do
{
wchar_t wide_char;
size_t n;
# if SUPPORT_OLD_MBRTOWC
backup_state = state;
# endif
n = mbrtowc (&wide_char, p, bytes_read, &state);
if (n == (size_t) -2)
{
# if SUPPORT_OLD_MBRTOWC
state = backup_state;
# endif
break;
}
if (n == (size_t) -1)
{
/* Signal repeated errors only once per line. */
if (!(lines + 1 == last_error_line
&& errno == last_error_errno))
{
char hr_buf[LONGEST_HUMAN_READABLE + 1];
last_error_line = lines + 1;
last_error_errno = errno;
error (0, errno, "%s:%s", file,
human_readable (lines + 1, hr_buf, 1, 1));
}
p++;
bytes_read--;
}
else
{
if (n == 0)
{
wide_char = 0;
n = 1;
}
p += n;
bytes_read -= n;
chars++;
switch (wide_char)
{
case '\n':
lines++;
/* Fall through. */
case '\r':
case '\f':
if (linepos > linelength)
linelength = linepos;
linepos = 0;
goto mb_word_separator;
case '\t':
linepos += 8 - (linepos % 8);
goto mb_word_separator;
case ' ':
linepos++;
/* Fall through. */
case '\v':
mb_word_separator:
if (in_word)
{
in_word = 0;
words++;
}
break;
default:
if (iswprint (wide_char))
{
int width = wcwidth (wide_char);
if (width > 0)
linepos += width;
if (iswspace (wide_char))
goto mb_word_separator;
in_word = 1;
}
break;
}
}
}
while (bytes_read > 0);
# if SUPPORT_OLD_MBRTOWC
if (bytes_read > 0)
{
if (bytes_read == BUFFER_SIZE)
{
/* Encountered a very long redundant shift sequence. */
p++;
bytes_read--;
}
memmove (buf, p, bytes_read);
}
prev = bytes_read;
# endif
}
if (bytes_read < 0)
{
error (0, errno, "%s", file);
exit_status = 1;
}
if (linepos > linelength)
linelength = linepos;
if (in_word)
words++;
}
#endif
else
{
int in_word = 0;
uintmax_t linepos = 0;
while ((bytes_read = safe_read (fd, buf, BUFFER_SIZE)) > 0)
{
const char *p = buf;
bytes += bytes_read;
do
{
switch (*p++)
{
case '\n':
lines++;
/* Fall through. */
case '\r':
case '\f':
if (linepos > linelength)
linelength = linepos;
linepos = 0;
goto word_separator;
case '\t':
linepos += 8 - (linepos % 8);
goto word_separator;
case ' ':
linepos++;
/* Fall through. */
case '\v':
word_separator:
if (in_word)
{
in_word = 0;
words++;
}
break;
default:
if (ISPRINT ((unsigned char) p[-1]))
{
linepos++;
if (ISSPACE ((unsigned char) p[-1]))
goto word_separator;
in_word = 1;
}
break;
}
}
while (--bytes_read);
}
if (bytes_read < 0)
{
error (0, errno, "%s", file);
exit_status = 1;
}
if (linepos > linelength)
linelength = linepos;
if (in_word)
words++;
}
if (count_chars < print_chars)
chars = bytes;
write_counts (lines, words, chars, bytes, linelength, file);
total_lines += lines;
total_words += words;
total_chars += chars;
total_bytes += bytes;
if (linelength > max_line_length)
max_line_length = linelength;
}
static void
wc_file (const char *file)
{
if (STREQ (file, "-" ))
{
have_read_stdin = 1;
wc (0, file);
}
else
{
int fd = open (file, O_RDONLY);
if (fd == -1)
{
error (0, errno, "%s", file);
exit_status = 1;
return;
}
wc (fd, file);
if (close (fd))
{
error (0, errno, "%s", file);
exit_status = 1;
}
}
}
int
main (int argc, char **argv)
{
int optc;
int nfiles;
program_name = argv[0];
setlocale (LC_ALL, "" );
bindtextdomain (PACKAGE, LOCALEDIR);
textdomain (PACKAGE);
atexit (close_stdout);
exit_status = 0;
posixly_correct = (getenv ("POSIXLY_CORRECT" ) != NULL);
print_lines = print_words = print_chars = print_bytes = print_linelength = 0;
total_lines = total_words = total_chars = total_bytes = max_line_length = 0;
while ((optc = getopt_long (argc, argv, "clLmw", longopts, NULL)) != -1)
switch (optc)
{
case 0:
break;
case 'c':
print_bytes = 1;
break;
case 'm':
print_chars = 1;
break;
case 'l':
print_lines = 1;
break;
case 'w':
print_words = 1;
break;
case 'L':
print_linelength = 1;
break;
case_GETOPT_HELP_CHAR;
case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
default:
usage (1);
}
if (print_lines + print_words + print_chars + print_bytes + print_linelength
== 0)
print_lines = print_words = print_bytes = 1;
nfiles = argc - optind;
if (nfiles == 0)
{
have_read_stdin = 1;
wc (0, "" );
}
else
{
for (; optind < argc; ++optind)
wc_file (argv[optind]);
if (nfiles > 1)
write_counts (total_lines, total_words, total_chars, total_bytes,
max_line_length, _("total" ));
}
if (have_read_stdin && close (0))
error (EXIT_FAILURE, errno, "-" );
exit (exit_status == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
}

You're welcome.

Message edited by masklinn on 20-11-2005 at 23:44:01

---------------
I mean, true, a cancer will probably destroy its host organism. But what about the cells whose mutations allow them to think outside the box by throwing away the limits imposed by overbearing genetic regulations? Isn't that a good thing?

Posted on 21-11-2005 at 01:21:12

LOL, 596 lines to count lines, characters and words!!

Posted on 21-11-2005 at 07:01:55

gocho wrote:

LOL, 596 lines to count lines, characters and words!!

Uh, did you actually read it? And understand it?

Because judging a source by its line count when it contains that many comments, preprocessor directives and "print my program's help" functions is a bit silly.

What's more, this program handles n files, avoids the tricky edge cases, and is certainly much faster than the loop with fgetc().
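To illustrate the "n files" point: a sketch (hypothetical names wc_files() and count_stream()) that reuses the getc() state machine of the public-domain program above and, like the GNU main() (its nfiles > 1 test), prints a total line only when several files are given. It covers none of the GNU version's options, multibyte characters or standard input.

```c
#include <ctype.h>
#include <stdio.h>

/* Per-file and total counters (hypothetical type for this sketch). */
struct counts {
    unsigned long lines, words, bytes;
};

/* Same getc() state machine as the public-domain program above,
   adding the counts of FP into *C. */
static void count_stream(FILE *fp, struct counts *c)
{
    int ch, in_word = 0;

    while ((ch = getc(fp)) != EOF) {
        c->bytes++;
        if (ch == '\n')
            c->lines++;
        if (isspace(ch))
            in_word = 0;
        else if (!in_word) {
            in_word = 1;
            c->words++;
        }
    }
}

/* Print "lines words bytes name" for each file and, when there is more
   than one file, a final "total" line. Unreadable files are reported
   on stderr and skipped. Returns the number of failures. */
int wc_files(int nfiles, char **names, FILE *out, struct counts *total)
{
    int i, failures = 0;

    total->lines = total->words = total->bytes = 0;
    for (i = 0; i < nfiles; i++) {
        struct counts c = {0, 0, 0};
        FILE *fp = fopen(names[i], "rb");

        if (fp == NULL) {
            perror(names[i]);
            failures++;
            continue;
        }
        count_stream(fp, &c);
        fclose(fp);
        fprintf(out, "%7lu %7lu %7lu %s\n", c.lines, c.words, c.bytes, names[i]);
        total->lines += c.lines;
        total->words += c.words;
        total->bytes += c.bytes;
    }
    if (nfiles > 1)
        fprintf(out, "%7lu %7lu %7lu total\n",
                total->lines, total->words, total->bytes);
    return failures;
}
```

From main(), something like `wc_files(argc - 1, argv + 1, stdout, &total)` gives the multi-file behaviour; the %7 column width mirrors the non-POSIX format in the GNU write_counts().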

By the way, this one is the real wc. The one that won't take 15 years to give me the byte count of a 150 MB file.

Of course, for the original poster and his student assignment it's overkill, but then Masklinn's goal wasn't really to help him either.

Message edited by Elmoricq on 21-11-2005 at 07:13:43

Posted on 21-11-2005 at 10:29:14

Elmoricq wrote:

The one that won't take 15 years to give me the byte count of a 150 MB file

Exactly, I wondered about that myself, and here are some interesting results (Celeron 2.8 GHz PC, Windows XP).
The test file is a large 219 MB .zip (hence read in binary mode).

Test 1 uses fgetc():

while ((c = fgetc (fp)) != EOF)

Test 2 uses fread() with a buffer of BUFSIZ bytes, reading one block of BUFSIZ bytes per call:

while ((n = fread (buf, sizeof buf, 1, fp)) > 0)

Test 3 uses fread() with a buffer of BUFSIZ bytes, requesting BUFSIZ items of one byte per call:

while ((n = fread (buf, 1, sizeof buf, fp)) > 0)
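The "(at least)" in Test 2's output follows from how fread() counts: it returns the number of complete items read, so with one item of BUFSIZ bytes per call the last, partial block is dropped. 230493696 is exactly 450183 × 512, which suggests (an inference, not stated in the post) that BUFSIZ was 512 with that compiler and that the last 201 bytes were not counted. A small sketch reproducing both counting styles, with hypothetical function names:

```c
#include <stdio.h>

/* Test 2 style: one item of sizeof buf bytes per call.
   fread() returns the number of COMPLETE items, so a final partial
   block returns 0 and is silently left out of the total. */
unsigned long count_blocks(FILE *fp)
{
    char buf[BUFSIZ];
    unsigned long total = 0;

    while (fread(buf, sizeof buf, 1, fp) == 1)
        total += sizeof buf;
    return total;                 /* a lower bound: "(at least)" */
}

/* Test 3 style: sizeof buf items of one byte each.
   fread() returns how many bytes were actually read, including the
   last short read, so the total is exact. */
unsigned long count_bytes(FILE *fp)
{
    char buf[BUFSIZ];
    unsigned long total = 0;
    size_t n;

    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        total += n;
    return total;
}
```

Both versions ask the C library for the same amount of data per call; only the way the result is reported differs, which is why Test 2 and Test 3 should not be expected to differ much in speed.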

Results:

g:/asma.zip has 230493897 bytes
"Test 1" done in 105.61 s
g:/asma.zip has 230493696 bytes (at least)
"Test 2" done in 0.58 s
g:/asma.zip has 230493897 bytes
"Test 3" done in 0.56 s

I can already hear the cries of "down with byte-by-byte reading", etc. OK?

Let's continue by changing the order of the tests:

g:/asma.zip has 230493696 bytes (at least)
"Test 2" done in 80.92 s
g:/asma.zip has 230493897 bytes
"Test 1" done in 50.25 s
g:/asma.zip has 230493897 bytes
"Test 3" done in 0.61 s

Things are no longer so clear...
Other sequences give other results.

g:/asma.zip has 230493897 bytes
"Test 3" done in 75.41 s
g:/asma.zip has 230493897 bytes
"Test 1" done in 43.48 s
g:/asma.zip has 230493696 bytes (at least)
"Test 2" done in 0.52 s

Conclusion: only the test that runs first is meaningful. After that, caching completely skews the results. Beware of hasty conclusions, and describe your test conditions...

In the end, fread() is less than twice as fast as fgetc()...
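Taking only the timings of the tests that ran first in each sequence (105.61 s for fgetc(), 80.92 s for block fread(), 75.41 s for one-byte-item fread()), the figures work out as below. They come from three separate runs, so they are indicative only:

```latex
\frac{105.61}{80.92} \approx 1.31, \qquad
\frac{105.61}{75.41} \approx 1.40, \qquad
1 - \frac{80.92}{105.61} \approx 0.23
```

The first two ratios are the "less than twice as fast" claim; the last value is the roughly 23% time saving of Test 2 over Test 1.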

Message edited by Emmanuel Delahaye on 21-11-2005 at 10:45:57

Posted on 21-11-2005 at 10:44:18

Yes, but if you only count bytes with wc, and you're not in one of the tricky cases, it uses lseek, which is still faster than reading the whole file.

For a file of more than 200 MB, it takes 3 or 4 s to give me the byte count, which is quite reasonable.
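The lseek() shortcut is the first branch of the wc() function in the GNU source posted earlier (fstat() + S_ISREG + two lseek() calls). A stripped-down POSIX sketch of that idea, with hypothetical names byte_count() and path_byte_count(), could look like this; unlike GNU wc, it uses plain read() with an EINTR retry instead of gnulib's safe_read().

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Number of bytes from the current position of FD to end of input.
   For a regular file, two lseek() calls give the answer without
   reading any data; for pipes, terminals, etc. it falls back to
   reading. Returns -1 on error. */
long long byte_count(int fd)
{
    struct stat st;
    off_t cur, end;
    char buf[16 * 1024];
    ssize_t n;
    long long total = 0;

    if (fstat(fd, &st) == 0 && S_ISREG(st.st_mode)
        && (cur = lseek(fd, 0, SEEK_CUR)) != -1
        && (end = lseek(fd, 0, SEEK_END)) != -1)
        /* The position may be past the end of the file: report 0,
           not a negative number. */
        return end > cur ? (long long)(end - cur) : 0;

    for (;;) {
        n = read(fd, buf, sizeof buf);
        if (n > 0)
            total += n;
        else if (n == 0)
            return total;               /* end of input */
        else if (errno != EINTR)
            return -1;                  /* real error; EINTR just loops again */
    }
}

/* Convenience wrapper (hypothetical): open PATH read-only and count it. */
long long path_byte_count(const char *path)
{
    long long n;
    int fd = open(path, O_RDONLY);

    if (fd == -1)
        return -1;
    n = byte_count(fd);
    close(fd);
    return n;
}
```

Counting from the current position rather than returning st_size is what makes the `(dd ibs=99k skip=1 count=0; ./wc -c) < /etc/group` example in the GNU comments report 0.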

And we can still see that block reading is faster, by "about 20%" in your single example, which is not negligible; and I think the time could be improved further with a buffer size suited to the file size (for a 200 MB file, a 1024-byte buffer is a bit small).
But caching certainly skews the results badly. And I put "about 20%" in quotes because other factors come into play, such as what the hard disk is doing at that moment, how much time the system decides to give you, file fragmentation, etc.
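To try the buffer-size idea, a sketch could read blocks of a caller-chosen size with fread() and count newlines with memchr(), the same approach as the lines-only loop in the GNU wc() function posted earlier. The function name and its bufsize parameter are hypothetical; whether a larger buffer actually helps depends on the C library, the OS cache and the disk, which is exactly the measurement problem discussed in this thread.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count bytes and newlines of FP, reading blocks of BUFSIZE bytes.
   Returns 0 on success, -1 on allocation or read error. */
int count_lines_bytes(FILE *fp, size_t bufsize,
                      unsigned long *lines, unsigned long *bytes)
{
    char *buf = malloc(bufsize);
    size_t n;

    if (buf == NULL)
        return -1;
    *lines = *bytes = 0;
    while ((n = fread(buf, 1, bufsize, fp)) > 0) {
        const char *p = buf, *end = buf + n;

        *bytes += n;
        /* memchr() jumps straight to the next '\n' in the block */
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            ++*lines;
            ++p;
        }
    }
    free(buf);
    return ferror(fp) ? -1 : 0;
}
```

Timing it with, say, 1 KB, 64 KB and 1 MB buffers, each on a freshly rebooted or cache-flushed system, would be one way to test the suggestion without the ordering effect seen above.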

Message edited by Elmoricq on 21-11-2005 at 10:48:05

Posted on 21-11-2005 at 10:45:49

Elmoricq wrote:

By the way, this one is the real wc. The one that won't take 15 years to give me the byte count of a 150 MB file

Yay, GNU WC for the win

Taken straight from Textutils.
