Create NumPy array from Text file
A Python NumPy tutorial on creating multidimensional arrays from text files such as CSV and TSV, with example source code in Python and Jupyter.
1 Min · 18 Feb, 2019

## 1. Intro

NumPy provides helper functions for creating arrays from text files such as CSV and TSV. Real-world data often lives in files on disk, so loading it directly into an array saves considerable development and analysis time.

``````numpy.loadtxt(fname, dtype=<class 'float'>, comments='#', delimiter=None, converters=None, skiprows=0, usecols=None, unpack=False, ndmin=0, encoding='bytes', max_rows=None)
``````

NumPy's `loadtxt()` function is an efficient way to load data from text files in which every row has the same number of values.
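As a minimal sketch of how a few of these parameters behave, the snippet below reads from an in-memory `io.StringIO` buffer instead of a file on disk. The sample values and the variable names `text` and `arr` are assumptions made for illustration only.

```python
import io

import numpy as np

# Hypothetical in-memory data standing in for a small text file.
text = io.StringIO(
    "# rainfall in mm\n"
    "12.0 12.0 14.0\n"
    "13.0 11.0 13.5\n"
    "10.0 9.5 8.0\n"
)

# With delimiter=None (the default), values are split on whitespace.
# The leading line starting with '#' is treated as a comment and skipped.
# max_rows=2 stops reading after the first two data rows.
arr = np.loadtxt(text, dtype=float, max_rows=2)

print(arr.shape)  # (2, 3)
print(arr.dtype)  # float64
```

Because `fname` also accepts file-like objects, the same call works unchanged once the buffer is swapped for a real file path.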

## 2. NumPy array from CSV file

We have a CSV file with Delhi rainfall data, in millimeters, for every month of 2017 and 2018.

rain-fall.csv file:

``````12.0, 12.0, 14.0, 16.0, 19.0, 12.0, 11.0, 14.0, 17.0, 19.0, 11.0, 11.5
13.0, 11.0, 13.5, 16.7, 15.0, 11.0, 12.0, 11.0, 19.0, 18.0, 13.0, 12.5
``````

We will create a NumPy array from this CSV file using the `numpy.loadtxt()` function. Its `delimiter` argument sets the character that separates values, which lets the same call handle many text formats.

``````#%%
import numpy as np

# Create an array from rain-fall.csv, keeping rainfall data in mm
array_rain_fall = np.loadtxt(fname="rain-fall.csv", delimiter=",")
print("NumPy array: \n", array_rain_fall)
print("Shape: ", array_rain_fall.shape)
print("Data Type: ", array_rain_fall.dtype.name)
``````

OUTPUT:

``````NumPy array:
[[12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5]
[13. 11. 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]]
Shape:  (2, 12)
Data Type:  float64
``````

### 2.1 Error when rows have different column counts

When creating a NumPy array with `numpy.loadtxt()`, make sure every row in the CSV has the same number of columns; otherwise the call raises an error.

Here we call `numpy.loadtxt()` on rain-fall-wrong.csv, in which the rows have different column counts.

``````#%%
# Check error when different column counts in rows
array_rain_fall_wrong = np.loadtxt(
    fname="rain-fall-wrong.csv", delimiter=","
)
``````

OUTPUT:

``````ValueError: Wrong number of columns at line 2
``````
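One way to find the offending rows before loading is to count the fields on each line. The sketch below is only an illustration: the helper `column_counts` and the sample text are hypothetical, and the exact wording of the `ValueError` message differs between NumPy versions.

```python
import io

import numpy as np

# Hypothetical sample: the second row is missing one value.
ragged_text = "12.0, 12.0, 14.0\n13.0, 11.0\n"


def column_counts(text, delimiter=","):
    """Return the number of fields on each non-empty line."""
    return [
        len(line.split(delimiter))
        for line in text.splitlines()
        if line.strip()
    ]


counts = column_counts(ragged_text)
print(counts)  # [3, 2]

load_failed = False
try:
    np.loadtxt(io.StringIO(ragged_text), delimiter=",")
except ValueError as err:
    # loadtxt refuses rows whose column count differs from the first row.
    load_failed = True
    print("Could not load:", err)
```

If the counts are not all equal, the file needs to be repaired, or the load restricted with `usecols` to columns that every row has.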

### 2.2 Skipping rows and columns in CSV

We can skip rows and columns while creating a NumPy array from a CSV file. This is useful when the file contains row and column names.

To do so, pass the `skiprows` and `usecols` arguments to `loadtxt()`.

rain-fall-row-col-names.csv file:

``````Year, Jan, Feb, Mar, Apr, May, Jun, July, Aug, Sep, Oct, Nov, Dec
2017, 12.0, 12.0, 14.0, 16.0, 19.0, 12.0, 11.0, 14.0, 17.0, 19.0, 11.0, 11.5
2018, 13.0, 11.0, 13.5, 16.7, 15.0, 11.0, 12.0, 11.0, 19.0, 18.0, 13.0, 12.5
``````

``````#%%
# Skip first row and first column
array_rain_fall_named = np.loadtxt(
    fname="rain-fall-row-col-names.csv",
    delimiter=",",
    skiprows=1,
    usecols=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
)
print("NumPy array: \n", array_rain_fall_named)
print("Shape: ", array_rain_fall_named.shape)
print("Data Type: ", array_rain_fall_named.dtype.name)
``````

OUTPUT:

``````NumPy array:
[[12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5]
[13. 11. 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]]
Shape:  (2, 12)
Data Type:  float64
``````
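Listing every column index by hand gets tedious for wide files. As a hedged sketch, the column indices can be derived from the header row, and the label column can be loaded separately with an integer `dtype`. The shortened sample data and the names `header`, `data` and `years` are assumptions for illustration.

```python
import io

import numpy as np

# Hypothetical, shortened version of the named rainfall file.
csv_text = (
    "Year, Jan, Feb, Mar\n"
    "2017, 12.0, 12.0, 14.0\n"
    "2018, 13.0, 11.0, 13.5\n"
)

# Read the column names from the first line with plain string handling.
header = [name.strip() for name in csv_text.splitlines()[0].split(",")]

# Skip the header row and take every column except the first (the year).
data = np.loadtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skiprows=1,
    usecols=range(1, len(header)),
)

# Load the year column on its own as integers.
years = np.loadtxt(
    io.StringIO(csv_text),
    delimiter=",",
    skiprows=1,
    usecols=0,
    dtype=int,
)

print(header[1:])    # ['Jan', 'Feb', 'Mar']
print(data.shape)    # (2, 3)
print(years)         # [2017 2018]
```

Keeping the labels in separate variables leaves `data` as a purely numeric array, which is what `loadtxt()` handles best.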

### 2.3 Create NumPy array from a gzipped file

Gzip is helpful for reducing file size, especially for text. When the file name ends in .gz, `numpy.loadtxt()` decompresses the file first and then processes it as usual.

This works for delimited text files with any delimiter.

``````#%%
# Create array from gzipped csv
array_rain_fall_zip = np.loadtxt(
    fname="rain-fall-row-col-names.csv.gz",
    delimiter=",",
    skiprows=1,
    usecols=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
)
print("NumPy array: \n", array_rain_fall_zip)
print("Shape: ", array_rain_fall_zip.shape)
print("Data Type: ", array_rain_fall_zip.dtype.name)
``````

OUTPUT:

``````NumPy array:
[[12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5]
[13. 11. 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]]
Shape:  (2, 12)
Data Type:  float64
``````
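To try this without an existing archive, a small gzipped CSV can be produced with Python's standard `gzip` module. This is a sketch under assumptions: the file name `sample.csv.gz`, the temporary directory and the sample values are all hypothetical.

```python
import gzip
import os
import tempfile

import numpy as np

# Hypothetical sample data, not the tutorial's rainfall file.
csv_text = "12.0, 12.0, 14.0\n13.0, 11.0, 13.5\n"

with tempfile.TemporaryDirectory() as tmp_dir:
    gz_path = os.path.join(tmp_dir, "sample.csv.gz")

    # Write the text through gzip so the file on disk is compressed.
    with gzip.open(gz_path, "wt") as handle:
        handle.write(csv_text)

    # The .gz extension tells loadtxt to decompress before parsing.
    from_gz = np.loadtxt(gz_path, delimiter=",")

print(from_gz.shape)  # (2, 3)
```

The array is read inside the `with` block because the temporary directory, and the file in it, are removed when the block exits.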

## 3. Create NumPy array from TSV

TSV (Tab Separated Values) files store tabular data as plain text, with tabs between values. We create a NumPy array from a TSV file by passing `"\t"` as the `delimiter` argument of `numpy.loadtxt()`.

``````#%%
# Create array from tsv files
array_rain_fall_tab = np.loadtxt(
    fname="rain-fall-row-col-names.tsv",
    delimiter="\t",
    skiprows=1,
    usecols=(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12),
)
print("NumPy array: \n", array_rain_fall_tab)
print("Shape: ", array_rain_fall_tab.shape)
print("Data Type: ", array_rain_fall_tab.dtype.name)
``````
``````

OUTPUT:

``````NumPy array:
[[12. 12. 14. 16. 19. 12. 11. 14. 17. 19. 11. 11.5]
[13. 11. 13.5 16.7 15. 11. 12. 11. 19. 18. 13. 12.5]]
Shape:  (2, 12)
Data Type:  float64
``````
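A quick way to check the tab delimiter end to end is a round trip: write an array as TSV and read it back. The sketch below uses `numpy.savetxt()`, which this tutorial does not otherwise cover, so treat it as an illustrative assumption; the file name `sample.tsv` and the sample values are hypothetical.

```python
import os
import tempfile

import numpy as np

# Hypothetical sample values for the round trip.
rain_fall = np.array([[12.0, 12.0, 14.0, 16.7], [13.0, 11.0, 13.5, 12.5]])

with tempfile.TemporaryDirectory() as tmp_dir:
    tsv_path = os.path.join(tmp_dir, "sample.tsv")

    # Write one decimal place per value, separated by tabs.
    np.savetxt(tsv_path, rain_fall, fmt="%.1f", delimiter="\t")

    # Peek at the raw text to confirm the separator is a tab.
    with open(tsv_path) as handle:
        first_line = handle.readline()

    # Read the file back with the same delimiter.
    round_trip = np.loadtxt(tsv_path, delimiter="\t")

print(repr(first_line))                       # '12.0\t12.0\t14.0\t16.7\n'
print(np.array_equal(rain_fall, round_trip))  # True
```

The round trip is exact here only because every sample value has a single decimal place, which matches the `fmt="%.1f"` format.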

## 4. Conclusion

In this tutorial we learned key techniques for creating NumPy arrays from data stored in plain text files such as CSV and TSV. These functions are handy both for data exploration and for program development.

Please download source code related to this tutorial here. You can run the Jupyter notebook for this tutorial here.

Mrityunjay
© 2021, All Rights Reserved